Just for the record: I finally managed to finish my experiments regarding

[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468

and the related code changes.

For those interested, here's the detailed story:
"S08-64" System (beo-24): - 2x 64-bit Dual-Core Opteron270 @ 2 Ghz - 8 GB memory - MonetDB/XQuery 0.20, 64-bit, 64-bit OIDs, --enable-optimize (gcc 4.1.2) "S16-32" System (core-1): - 4x 64-bit Dual-Core Opteron870 @ 2 Ghz - 16 GB memory - MonetDB/XQuery 0.20, 64-bit, 32-bit OIDs, --enable-optimize (gcc 4.1.2) Document: http://mirror.openstreetmap.nl/planet/planet-071003.osm.bz2 (extracted: 19 GB XML file) "SR" Shredding read-only: pf:add-doc(".../planet-071003.osm","planet-071003.osm") "SU" Shredding updateable: pf:add-doc(".../planet-071003.osm","planet-071003.osm","planet-071003.osm",5) "QR"/"QU" Count query: count(doc("planet-071003.osm")//*) Configurations: m: without Peter's mmap fix in gdk_posix.mx (i.e., using rev. 1.143 of gdk_posix.mx) M: with Peter's mmap fix in gdk_posix.mx (i.e., using rev. 1.143.2.1 of gdk_posix.mx) h: without Peter's new string hash function in gdk_atoms.mx (i.e., using rev. 1.134 of gdk_atoms.mx) H: with Peter's new string hash function in gdk_atoms.mx (i.e., using rev. 1.134.6.1 of gdk_atoms.mx) Results (wall-clock times): S08-64: SR QR SU QU[1] QU[2] mh 659m34s 1m09s 81m06s ERROR - mH - - 383m17s ERROR - Mh - - 77m03s 344m32s 1m34s MH 644m01s 0m49s 390m42s 342m47s 1m36s S16-32: SR QR SU QU[1] QU[2] mh 127m59s 0m17s 43m14s ERROR - mH 110m33s 0m16s 26m26s ERROR - Mh 128m11s 0m18s 44m00s 100m50s 1m15s MH 191m42s 0m17s 25m43s 101m37s 1m21s (NB: "SR" includes building of indices, while "SU" does not; consequently, "QR" can exploit the indices built during "SR", while "QU[1]" has to build the indices first, and only "QU[2]" can exploit them.) 

The mmap fix in gdk_posix.mx appears to be sufficient to prevent the
remap ERROR reported (for a system & configuration similar to "S08-64") in

[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468

I'll leave further interpretation of the above results to the interested
reader.

Stefan

On Tue, Oct 16, 2007 at 01:46:47PM +0200, Peter Boncz wrote:
> Hi,
>
> Hm, I cannot really understand the purpose of the question. And what is
> wrong with performance fixes?
>
> Both fixes are related to the same bug:
> - the remap failing is addressed by the gdk_posix fix
> - the shredding in the bug report taking excessively long is addressed by
>   the gdk_atoms fix
>
> Indeed, any hash function can have collisions; it all depends on the
> distribution.
>
> Peter
>
> PS: most probably, this mail (sent from my home account) will be rejected by
> the sourceforge mailing list -- and I cannot send through CWI from home, as
> secure mail sending is not supported by CWI staff for Microsoft
> mail clients.
>
>
> -----Original Message-----
> From: Stefan Manegold [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 16 October 2007 12:01
> To: Peter Boncz
> Cc: [email protected]
> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx,
> MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143,
> 1.143.2.1
>
>
> Peter,
>
> which part of your changes fixes the problem with updateable shredding of
> large XML documents as reported in
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
> ?
>
> The new string hash function in gdk_atoms.mx or the file descriptor fixes in
> gdk_posix.mx?
>
> The former looks more like a performance fix to me --- too many collisions
> should only slow the system down, but not compromise its
> functionality/correctness, right?
> Also, with the new string hash function, ("too") many collisions can still
> occur with certain datasets ...
>
> Stefan
>
>
> On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
> > Update of /cvsroot/monetdb/MonetDB/src/gdk
> > In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
> >
> > Modified Files:
> >       Tag: MonetDB_1-20
> >       gdk_atoms.mx gdk_posix.mx
> > Log Message:
> >
> > [checkin on behalf of Peter]
> >
> > fixing XQuery bug
> > [ 1811229 ] [ADT] Adding large document, with update support
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
> >
> > gdk_atoms.mx:
> > - hash collisions in strings that consist of digits only (a common case!);
> >   we now use a fast derivative of the Bob Jenkins function
> >
> >   Really bad collisions: in the case of the 20GB document of the bug report,
> >   shredding took 8 hours before, 1 hour after this change.
> >
> >   NOTE: this change affects the binary format (string heaps) and all product
> >         families, as the hash function is a compiled-in macro!
> >         In particular, lookup operations and joins on SQL (Monet4/5) columns
> >         consisting of digits only, but stored in a VARCHAR, should be faster
> >         after this check-in.
> >
> > gdk_posix.mx:
> > - we lost track of the file descriptor for large heaps (the file desc is given
> >   to the mmap-monitoring-thread to close later), such that the remap function
> >   could fail (when it was given the illegal file descriptor 0)
> >
> >   NOTE: this change only affects XQuery, as only it uses remap()

--
| Dr. Stefan Manegold | mailto:[EMAIL PROTECTED]        |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
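
To illustrate why digit-only strings can be a bad case for a string hash, as the check-in log describes, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: weak_hash is a hypothetical, deliberately poor hash (it is NOT the old MonetDB macro), and one_at_a_time is Bob Jenkins' well-known one-at-a-time hash, used as a stand-in for the "fast derivative" mentioned in the log, whose actual code is not shown in this thread.

```python
def weak_hash(s: str, buckets: int) -> int:
    """Hypothetical poor hash: sum of byte values (not MonetDB's function)."""
    return sum(s.encode()) % buckets


def one_at_a_time(s: str, buckets: int) -> int:
    """Bob Jenkins' one-at-a-time hash, kept to 32 bits (stand-in only)."""
    h = 0
    for b in s.encode():
        h = (h + b) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h % buckets


def buckets_used(hash_fn, keys, buckets: int) -> int:
    """Count how many distinct buckets the keys are spread over."""
    return len({hash_fn(k, buckets) for k in keys})


# Digit-only keys, as in the "common case" from the check-in log.
keys = [str(i) for i in range(100_000)]
print("weak hash buckets used:    ", buckets_used(weak_hash, keys, 1024))
print("one-at-a-time buckets used:", buckets_used(one_at_a_time, keys, 1024))
```

For these keys the byte sums of weak_hash all lie between 48 and 285, so most of the 1024 buckets stay empty and the chains in the used ones grow long. This only slows lookups down, it does not make them incorrect, which matches the point made in the quoted discussion.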
