Hi! On Jan 11, Matt W wrote: > Hi, > > ----- Original Message ----- > From: "Sergei Golubchik" > To: "Steven Roussey" > Sent: Wednesday, December 10, 2003 7:44 AM > Subject: Re: 4.1.1 FTS 2-level? > > > > Hi! > > > > On Dec 09, Steven Roussey wrote: > > > Does Mysql 4.1.1 have the two level index system integrated into > > > it for full text searches? > > > > What do you mean ? > > Is it used to optimize searches ? No. > > > > Still there could be some speedup because, e.g, MyISAM will use > > binary search in the key pages instead of linear one, etc. > > You're right. Wow! > > I assume you were comparing it to 4.0.x, not 4.1.0. Once the data rows > are cached, the index search in boolean mode seems to be about *7 times* > faster than 4.0. :-) 2 test searches on the same data went from 48s -> > 7 and 35 -> 5. Nice!
Cool :) > So are these faster index searches only the result of binary vs linear > search? (I don't know the exact difference, but binary sounds good. > ;-)) The actual full-text code itself is NOT any more optimized than > 4.0? As far as I remember - no. The difference is simple - if one has, say, a string or words like "just,simply,words,zyx" that is, comma-separated words in alphabetic order. Then to find a particular word one needs to compare word1 ("just") with this particular word, then word "simply", etc. It is because words - as a sequences of characters - may have different lengths. So the only way to find where in the string starts the word N is to read all the words before it. If it's an array of, say, integers int array[100] then i-th entry is simply array[i], and to find a word in a sorted array one can use binary search. These are two different types of pages of a btree in MyISAM index. If an index includes variable-length segments or(and) compression is used then different key entries on the page may have different lengths, MyISAM will use linear search for each key page, time is linear - O(N). (for keys starting from a char/varchar - like a fulltext index - MyISAM uses a special variant of the linear search, that does not unpacks each key for comparison, it's ~2-4 times faster than "normal" linear search, but still it's linear - O(N)) Otherwise MyISAM will use binary search, time is logarithmic - O(log N) In normal fulltext index key entry is <varchar, float, offset> In second level it's only <float, offset> - and MyISAM uses binary seacrh for the pages on the 2nd level. > Are the 2-level indexes solely for FTS to use, or can MyISAM use them in > general for any indexes? Fulltext only. > Just wondering, since you said "Is it used to > optimize searches? No." Which sounds like it's being used for > *storage*, just not the word count statistics for optimization, etc. Yes, for storage. But it also used for word count statistics, as new there is only one entry per word (well, for common enough words, rare words don't matter), and this entry is the place where per-word statistics is be stored. > And my index file was reduced from 1.74G in 4.0 to 1.59G, so I thought > maybe this is where some space was saved. Yes. MyISAM does compress fulltext indexes with prefix compression. That is for each word it stores only the difference from the previous word on this page. If the words are identical, second entry will take one or two bytes only (I don't remember exactly). And the first entry on the page is always not compressed. So, by converting all entries for some particular word to 2-level structure saves these one-two bytes per word and length(word)*(number_of_pages-1) Usually - with prefix compression - one get about ~250 entries on the page (I mean for the same common word), thus number_of_pages is, approximately number_or_rows_with_the_word/250 > BTW, would (re)building the index be slower with 4.1 for any reason? I > thought maybe it was, but I'd have to try again to be sure. I didn't compare :( But out of my head I don't know any reason why it should be. Regards, Sergei -- __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik <[EMAIL PROTECTED]> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany <___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]