RE: 4.1.1 FTS 2-level?

2004-01-14 Thread Steven Roussey
Thanks for the additional information. When 4.1.2 comes out, I'll give it a
test and return with some stats on real world result times (for my data set
at least).

-steve-



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 4.1.1 FTS 2-level?

2004-01-12 Thread Matt W
Hi,

- Original Message -
From: Sergei Golubchik
To: Steven Roussey
Sent: Wednesday, December 10, 2003 7:44 AM
Subject: Re: 4.1.1 FTS 2-level?


 Hi!

 On Dec 09, Steven Roussey wrote:
  Does Mysql 4.1.1 have the two level index system integrated into it
for full
  text searches?

 What do you mean ?
 Is it used to optimize searches ? No.

 Still there could be some speedup because, e.g, MyISAM will use binary
 search in the key pages instead of linear one, etc.

 Regards,
 Sergei

You're right.  Wow!

I assume you were comparing it to 4.0.x, not 4.1.0.  Once the data rows
are cached, the index search in boolean mode seems to be about *7 times*
faster than 4.0. :-)  2 test searches on the same data went from 48s -
7 and 35 - 5.  Nice!  Of course, if the data rows have to be read from
disk, the full-text code can do nothing to improve those reads. :*(

So are these faster index searches only the result of binary vs linear
search?  (I don't know the exact difference, but binary sounds good.
;-))  The actual full-text code itself is NOT any more optimized than
4.0?

Are the 2-level indexes solely for FTS to use, or can MyISAM use them in
general for any indexes?  Just wondering, since you said Is it used to
optimize searches? No.  Which sounds like it's being used for
*storage*, just not the word count statistics for optimization, etc.
And my index file was reduced from 1.74G in 4.0 to 1.59G, so I thought
maybe this is where some space was saved.

BTW, would (re)building the index be slower with 4.1 for any reason?  I
thought maybe it was, but I'd have to try again to be sure.


Matt


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 4.1.1 FTS 2-level?

2004-01-11 Thread Sergei Golubchik
Hi!

On Jan 11, Matt W wrote:
 Hi,
 
 - Original Message -
 From: Sergei Golubchik
 To: Steven Roussey
 Sent: Wednesday, December 10, 2003 7:44 AM
 Subject: Re: 4.1.1 FTS 2-level?
 
 
  Hi!
 
  On Dec 09, Steven Roussey wrote:
   Does Mysql 4.1.1 have the two level index system integrated into
   it for full text searches?
 
  What do you mean ?
  Is it used to optimize searches ? No.
 
  Still there could be some speedup because, e.g, MyISAM will use
  binary search in the key pages instead of linear one, etc.

 You're right.  Wow!
 
 I assume you were comparing it to 4.0.x, not 4.1.0.  Once the data rows
 are cached, the index search in boolean mode seems to be about *7 times*
 faster than 4.0. :-)  2 test searches on the same data went from 48s -
 7 and 35 - 5.  Nice!

Cool :)

 So are these faster index searches only the result of binary vs linear
 search?  (I don't know the exact difference, but binary sounds good.
 ;-))  The actual full-text code itself is NOT any more optimized than
 4.0?

As far as I remember - no.

The difference is simple - if one has, say, a string or words like

 just,simply,words,zyx

that is, comma-separated words in alphabetic order.
Then to find a particular word one needs to compare word1 (just) with
this particular word, then word simply, etc. It is because words - as
a sequences of characters - may have different lengths. So the only way
to find where in the string starts the word N is to read all the words
before it.

If it's an array of, say, integers

  int array[100]

then i-th entry is simply array[i], and to find a word in a sorted array
one can use binary search.

These are two different types of pages of a btree in MyISAM index.
If an index includes variable-length segments or(and) compression is used
then different key entries on the page may have different lengths,
MyISAM will use linear search for each key page, time is linear - O(N).
(for keys starting from a char/varchar - like a fulltext index - MyISAM
uses a special variant of the linear search, that does not unpacks each
key for comparison, it's ~2-4 times faster than normal linear search,
but still it's linear - O(N))

Otherwise MyISAM will use binary search, time is logarithmic - O(log N)

In normal fulltext index key entry is varchar, float, offset
In second level it's only float, offset - and MyISAM uses binary
seacrh for the pages on the 2nd level.

 Are the 2-level indexes solely for FTS to use, or can MyISAM use them in
 general for any indexes?

Fulltext only.

 Just wondering, since you said Is it used to
 optimize searches? No.  Which sounds like it's being used for
 *storage*, just not the word count statistics for optimization, etc.

Yes, for storage.
But it also used for word count statistics, as new there is only one
entry per word (well, for common enough words, rare words don't matter),
and this entry is the place where per-word statistics is be stored.

 And my index file was reduced from 1.74G in 4.0 to 1.59G, so I thought
 maybe this is where some space was saved.

Yes.
MyISAM does compress fulltext indexes with prefix compression.
That is for each word it stores only the difference from the previous
word on this page. If the words are identical, second entry will take
one or two bytes only (I don't remember exactly). And the first entry on
the page is always not compressed.

So, by converting all entries for some particular word to 2-level
structure saves these one-two bytes per word and
length(word)*(number_of_pages-1)

Usually - with prefix compression - one get about ~250 entries on the
page (I mean for the same common word), thus number_of_pages is,
approximately number_or_rows_with_the_word/250

 BTW, would (re)building the index be slower with 4.1 for any reason?  I
 thought maybe it was, but I'd have to try again to be sure.

I didn't compare :(

But out of my head I don't know any reason why it should be.

Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: 4.1.1 FTS 2-level?

2003-12-10 Thread Sergei Golubchik
Hi!

On Dec 09, Steven Roussey wrote:
 Does Mysql 4.1.1 have the two level index system integrated into it for full
 text searches?

What do you mean ?
Is it used to optimize searches ? No.

Still there could be some speedup because, e.g, MyISAM will use binary
search in the key pages instead of linear one, etc.

Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



4.1.1 FTS 2-level?

2003-12-09 Thread Steven Roussey
Does Mysql 4.1.1 have the two level index system integrated into it for full
text searches?

Thanks. :)

-steve-




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]