Hi, my name is Amy Adams and I'm the Berkeley DB Product Manager for
Sleepycat Software. I wanted to respond to Support Request #950, which
you had open with us.

> From: [EMAIL PROTECTED]
> Subject: Re: prefix comparison costs pages [#950]
>
> Sleepycat Software writes:
>>
>> If you specified the default prefix function as your prefix function, and
>> got different results than specifying no prefix function, I'm interested
>> in tracking that down, that sounds like a problem.
>
> Sorry for not being clear. Here is what I do :
>
> 1) Leave the defaults (i.e. the default prefix function is on)
>    Get 548 internal pages.
> 2) Assign the compare function (to point to the default compare function)
>    and leave the prefix function to 0 (i.e. no prefix function is used)
>    Get 546 internal pages.

One of our engineers has now reviewed this Support Request, and he
replies as follows:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The dataset in question consisted of English words with two-digit
numerical suffixes. This is exactly the sort of dataset that prefix
compression won't help at all -- the distinguishing bytes between many
of the data are the -last- few.

I have a weird suspicion that he misinterpreted "prefix compression" --
it is sort of counterintuitive, as it compresses (or rather ignores)
identical -suffixes-, not prefixes.

Anyway, using a similar dataset (the first 20000 words in
/usr/share/dict/words, each inserted 100 times with a different
two-digit suffix, for a total of 2,000,000 keys), I get 26 internal
pages with either the default or with no prefix function; the prefix
function doesn't (and can't) win anything, but it's also no loss.

Plop a few repetitions of the alphabet onto the end of every key, and
the prefix function shines; now, we get 91 internal pages with the
function and a whopping 364 without.
This seems reasonable given the relative sizes of the truncated and
non-truncated keys, and the size differential between these keys and
the ones in the last test.

In short, as far as I can tell, the prefix function is working exactly
as it ought to.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Thank you for your email, and please let us know if you disagree with
our assessment or if we can be of any further assistance.

Regards,
Amy Adams

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Amy Adams                      Berkeley DB Product Manager
Sleepycat Software Inc.        [EMAIL PROTECTED]
394 E. Riding Dr.              http://www.sleepycat.com
Carlisle, MA 01741-1601
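
Editorial illustration of the engineer's point, not part of the original
message: the sketch below models a prefix function as "how many leading
bytes of the larger key are needed to tell it apart from the smaller
one". It is a simplified stand-in written for this note, not Berkeley
DB's code, and the names default_prefix and separator_lengths are
hypothetical. The tiny word list is likewise made up for the example.

```python
def default_prefix(a: bytes, b: bytes) -> int:
    """Leading bytes of b needed to distinguish it from a (assumes a <= b)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1  # include the first differing byte
    # No difference within the shorter key: one key is a prefix of the other.
    if len(a) < len(b):
        return len(a) + 1
    if len(b) < len(a):
        return len(b) + 1
    return len(b)  # identical keys


def separator_lengths(keys):
    """Separator length between each pair of adjacent keys in sorted order."""
    ordered = sorted(keys)
    return [default_prefix(a, b) for a, b in zip(ordered, ordered[1:])]


# Words with two-digit suffixes: the distinguishing byte is usually the last.
words = [b"apple", b"banana", b"cherry"]
suffixed = [w + b"%02d" % n for w in words for n in range(3)]

# The same keys with two repetitions of the alphabet appended to each.
padded = [k + b"abcdefghijklmnopqrstuvwxyz" * 2 for k in suffixed]

if __name__ == "__main__":
    print("key lengths (suffixed):", sorted(len(k) for k in suffixed))
    print("separators  (suffixed):", separator_lengths(suffixed))
    print("key lengths (padded):  ", sorted(len(k) for k in padded))
    print("separators  (padded):  ", separator_lengths(padded))
```

For the suffixed keys, most separators are as long as the keys
themselves, so truncation has nothing to remove. For the padded keys the
separators keep the same lengths while the full keys grow by 52 bytes,
which is the situation in which the engineer reports far fewer internal
pages with the prefix function than without it.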
