Peter, Stefan Manegold wrote: > Peter, > > the purpose and/or reason of/for my question is the fact that we cannot > release the binary packages that Sjoerd built yesterday, because the binary > database format has been modified without increasing the GDK version number. > We simply forgot the latter. > Hence, a new Mserver cannot notice in case it is started on an pld database, > and instead of complaining properly, it will just behave strangely, crash > and/or corrup the database. This is not good. > Consequently, Sjoerd needs to re-do the building once again, and we have to > think of how to solve the above problem. > Obviously, one solution is th "simply" increase the GDK version number. > But then, users have to abandon or dump/retore their databases now and with > the next release (GDK-2). > An other alternative would be to undo the string hash function changes in > the release branch, keeping the same binary format for now, and include both > binary format changes in one release. > In particular if the string hash function changes are "only" for performance > reasons, both Sjoerd, Niels, and I would favor the second solution. The same holds for me and Fabian. I would even go as far as completely undoing this patch, aside from the absolute minimum necessary to avoid known and demonstrated bugs (with proper functional tests).
> After all, we and our users have been able to live with the "old" string > hash function for years. Those that really require the new one can switch > to the development version. Yes, a few months ago we have (after strong discussions) decided that *any* interface issue should be discussed thoroughly upfront. This involves both internal/external APIs, anything that may severely complicate live for others. This also was enforced for checking in features in the stable. In the check-in messages on the stable it was repeatedly mentioned that any change of this kind was subject to approval by Sjoerd. Again, I see too many last minute actions with possibly far reaching consequences. I consider a change in storage layout a sufficient discontinuity to not include this without a few months internal use. Otherwise, we might have also released the GDK2 code already. > > I'm busy with testing --- on a 8GB dual-dual-core 2GHz Opteron running > 64-bit FedoraCore 6 and and optimized 64-bit Mserver with 64-bit OIDs, > shredding the 19 GB file read-only took 11 hours, both with the old and the > new hash functions (Mserver grew to 10 GB res, >50 GB virt). Sorry, but this is a performance test and not a functionality test. As such it can not and should not block a release. > Shredding updateable without your two fixes took only 2 hours, but the > resulting DB is currupt as reported in > [ 1811229 ] [ADT] Adding large document, with update support > http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56 > . > Shredding updateable with your two fixes is still busy, running for more > than 6 hours, now... > > Stefan > > > On Tue, Oct 16, 2007 at 01:46:47PM +0200, Peter Boncz wrote: >> Hi, >> >> Hm, I cannot really understand the purpose of the question. And what is >> wrong with performance fixes? >> >> both fixes are related to the same bug: >> - the remap failing is addressed by the gdk_posix fix >> - the shredding in the bug report taking excessively long is addressed by >> the gdk_atoms fix >> >> indeed, any hash function can have collisions.. it all depends on the >> distribution. >> >> Peter >> >> PS most probably, this mail (sent from my home account) will be rejected by >> the sourceforge mailing list -- and I cannot sent through CWI from home as >> the secure mail sending is not supported by CWI staff for microsoft >> emailers. >> >> >> -----Original Message----- >> From: Stefan Manegold [mailto:[EMAIL PROTECTED] >> Sent: dinsdag 16 oktober 2007 12:01 >> To: Peter Boncz >> Cc: [email protected] >> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, >> MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, >> 1.143.2.1 >> >> >> Peter, >> >> which part of your changes do fix the problem with updatedable shredding of >> large XML documents as reporten in >> [ 1811229 ] [ADT] Adding large document, with update support >> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56 >> 967&atid=482468 >> ? >> >> The new has string function in gdk_atoms.mx or the file descriptor fixes in >> gdk_posix.mx? >> >> The former looks for like a performance fix to me --- too many collisions >> should only slows the system down, but not copromize its >> fucntionallity/correctness, right? >> Also with the new string has functions ("too") many collisions can still >> occur with certain datasets ... >> >> Stefan >> >> >> On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote: >>> Update of /cvsroot/monetdb/MonetDB/src/gdk >>> In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103 >>> >>> Modified Files: >>> Tag: MonetDB_1-20 >>> gdk_atoms.mx gdk_posix.mx >>> Log Message: >>> >>> [checkin on behalf of Peter] >>> >>> fixing XQuery bug >>> [ 1811229 ] [ADT] Adding large document, with update support >>> >> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56 >> 967&atid=482468 >>> gdk_atoms.mx: >>> - hash collisions in strings that consists of digits only (a common case!) >>> we now use a fast derivative of the Bob Jenkins function from now on >>> >>> Really bad collisions, in case of the 20GB document of the bug report, >>> shredding took 8 hours before, 1 hour after this change. >>> >>> NOTE: this change affects the binary format (string heaps) and all >> product >>> families, as the hash function is a compiled-in macro! >>> In particular, lookup operations and joins on SQL (Monet4/5) >> columns >>> consisting of digits only, but stored in a VARCHAR, should be >> faster >>> after this check-in. >>> >>> gdk_posix.mx >>> - we lost track of the file descriptor for large heaps (the file desc is >> given >>> to the mmap-monitoring-thread to close later), such that the remap >> function >>> could fail (when it was given the illegal file descriptor 0) >>> >>> NOTE: this change only affects xquery it only uses remap() >>> >>> >>> Index: gdk_posix.mx >>> =================================================================== >>> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v >>> retrieving revision 1.143 >>> retrieving revision 1.143.2.1 >>> diff -u -d -r1.143 -r1.143.2.1 >>> --- gdk_posix.mx 4 Sep 2007 17:55:20 -0000 1.143 >>> +++ gdk_posix.mx 14 Oct 2007 20:31:33 -0000 1.143.2.1 >>> @@ -615,7 +615,7 @@ >>> MT_mmap_tab[i].writable = writable; >>> MT_mmap_tab[i].fd = fd; >>> MT_mmap_tab[i].pincnt = 0; >>> - fd = -1; >>> + fd = -fd; >>> } >>> (void) pthread_mutex_unlock(&MT_mmap_lock); >>> return fd; >>> @@ -1051,9 +1051,7 @@ >>> } >>> if (ret != (void *) -1L) { >>> hdl->fixed = ret; >>> - fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE)); >>> - if (fd <= 0) >>> - hdl->hdl = (void *) 0; /* MT_mmap_new keeps the fd */ >>> + hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, >>> (mode & >> MMAP_WRITABLE)); >>> } >>> return ret; >>> } >>> @@ -1061,13 +1059,12 @@ >>> void * >>> MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len) >>> { >>> - void *ret; >>> - >>> - ret = mmap(hdl->fixed, >>> + int fd = (int) (ssize_t) hdl->hdl; >>> + void *ret = mmap(hdl->fixed, >>> len, >>> ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ, >>> ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) : >> MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0), >>> - (int) (ssize_t) hdl->hdl, >>> + (fd < 0)?-fd:fd, >>> off); >>> >>> if (ret != (void *) -1L) { >>> @@ -1083,9 +1080,7 @@ >>> MT_mmap_close(MT_mmap_hdl *hdl) >>> { >>> int fd = (int) (ssize_t) hdl->hdl; >>> - >>> - if (fd) >>> - close(fd); >>> + if (fd > 0) close(fd); >>> hdl->hdl = NULL; >>> } >>> >>> >>> Index: gdk_atoms.mx >>> =================================================================== >>> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v >>> retrieving revision 1.134 >>> retrieving revision 1.134.6.1 >>> diff -u -d -r1.134 -r1.134.6.1 >>> --- gdk_atoms.mx 2 May 2007 16:16:58 -0000 1.134 >>> +++ gdk_atoms.mx 14 Oct 2007 20:31:32 -0000 1.134.6.1 >>> @@ -1878,13 +1878,19 @@ >>> rotates all characters together. It is optimized to process 2 characters >>> at a time (adding 16-bits to the hash value each iteration). >>> @h >>> -#define GDK_STRHASH(x,y) { \ >>> - str _c = (str) (x); \ >>> - for((y)=0; _c[0] && _c[1]; _c+=2) { \ >>> - (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] << >> 8) ^ _c[0];\ >>> - } \ >>> - (y) ^= _c[0]; \ >>> +#define GDK_STRHASH(x,y) {\ >>> + str _key = (str) (x);\ >>> + int _i;\ >>> + for (_i = y = 0; _key[_i]; _i++) {\ >>> + y += _key[_i];\ >>> + y += (y << 10);\ >>> + y ^= (y >> 6);\ >>> + }\ >>> + y += (y << 3);\ >>> + y ^= (y >> 11);\ >>> + y += (y << 15);\ >>> } >>> + >>> @c >>> hash_t >>> strHash(str s) >>> >>> >>> ------------------------------------------------------------------------- >>> This SF.net email is sponsored by: Splunk Inc. >>> Still grepping through log files to find problems? Stop. >>> Now Search log events and configuration files using AJAX and a browser. >>> Download your FREE copy of Splunk now >> http://get.splunk.com/ >>> _______________________________________________ >>> Monetdb-checkins mailing list >>> [EMAIL PROTECTED] >>> https://lists.sourceforge.net/lists/listinfo/monetdb-checkins >> -- >> | Dr. Stefan Manegold | mailto:[EMAIL PROTECTED] | >> | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | >> | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | >> | The Netherlands | Fax : +31 (20) 592-4312 | > ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Monetdb-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/monetdb-developers
