Re: Distributed Fulltext?

Brian DeFeyter Wed, 13 Feb 2002 13:31:30 -0800

I sorta like that idea. I don't know exactly what you can and can't do
as far as indexing inside of HEAP tables... but the in-memory index would
likely differ in size and layout from the on-disk one. Then you can expand
the idea and use the X/(num slices) on (num slices) boxes technique (split
X rows evenly across the boxes), sending the query to each and compiling
all of the results.
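A minimal sketch of that slicing technique, assuming each row lands on slice `doc_id % num_slices`. Every box holds one slice, the same query goes to all of them in parallel, and the per-slice hits are compiled into one list. The `Slice` class and its whole-word `search` are hypothetical stand-ins for a MySQL box answering a fulltext query.

```python
from concurrent.futures import ThreadPoolExecutor


class Slice:
    """Hypothetical stand-in for one box holding a slice of the table."""

    def __init__(self):
        self.docs = {}  # doc_id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, word):
        # Naive whole-word match; a real box would run a fulltext query.
        return [d for d, t in self.docs.items() if word in t.split()]


def build_slices(docs, num_slices):
    """Place each document on exactly one slice, chosen by doc_id % num_slices."""
    slices = [Slice() for _ in range(num_slices)]
    for doc_id, text in docs.items():
        slices[doc_id % num_slices].add(doc_id, text)
    return slices


def scatter_gather(slices, word):
    """Send the query to every slice in parallel and compile the results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: s.search(word), slices))
    return sorted(d for part in partials for d in part)


docs = {1: "foo bar", 2: "baz", 3: "foo qux", 4: "foo", 5: "bar"}
print(scatter_gather(build_slices(docs, 3), "foo"))  # [1, 3, 4]
```

Because no row lives on more than one slice, the compiled list has no duplicates and needs no cross-box coordination beyond the final merge.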


Your comment about only a few common words being searched
makes me wonder if a reverse-stopword function would be valuable (ie:
only index words in a list you generate). It could probably be done other
ways too with what's available. Maybe a pseudo-bitmap index
inside a MySQL table?
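A rough sketch of a reverse-stopword index combined with a pseudo-bitmap, assuming small integer document ids and a lowercase whitelist. Only words on the generated list are indexed, and each word's posting list is kept as an integer bitmap, so a multi-word AND query becomes a bitwise AND. The function names are hypothetical.

```python
def build_bitmap_index(docs, whitelist):
    """Index only the words on `whitelist` (a "reverse stopword" list).

    Each indexed word maps to an int used as a bitmap: bit n is set when
    document n contains that word.
    """
    index = {word: 0 for word in whitelist}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            if word in index:  # words not on the list are never indexed
                index[word] |= 1 << doc_id
    return index


def search_all(index, words):
    """Return the ids of documents containing every word (bitwise AND)."""
    if not words:
        return []
    bits = -1  # in Python, -1 has every bit set
    for word in words:
        bits &= index.get(word.lower(), 0)  # unindexed word -> no matches
    return [n for n in range(bits.bit_length()) if bits >> n & 1]


docs = {0: "MySQL fulltext search", 1: "distributed search cluster", 2: "MySQL replication"}
index = build_bitmap_index(docs, {"mysql", "search"})
print(search_all(index, ["mysql", "search"]))  # [0]
print(search_all(index, ["search"]))           # [0, 1]
```

The trade-off is explicit: a query for a word that is not on the list (here "replication") simply finds nothing, in exchange for a much smaller index.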

 - bdf

On Wed, 2002-02-13 at 12:09, Mike Wexler wrote:
> My understanding is that part of how Google and AltaVista get such high speeds 
> is to keep everything in memory. Is it possible to create a HEAP table with a 
> full text index? If so, does the full text index take advantage of being in 
> memory? For example, I would imagine that if you were keeping the whole index in 
> memory, details like the index page size, and the format of the pointers/record 
> numbers would be different.
> 
> Then you could just do something roughly like:
> 
> CREATE TABLE fooFast TYPE=HEAP SELECT * FROM fooSlow;
> ALTER TABLE fooFast ADD FULLTEXT (a, b, c);
> 
> Or maybe you could just have fooSlow on one server. And then have it replicated 
> on N other servers. But on the other servers you could alter the table type so 
> it was a heap table. So you would have one persistent table and a bunch of 
> replicated heap tables. And all the searches could go against the heap tables.
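Mike's replicated-HEAP-table layout mostly comes down to sending writes to the one persistent table and spreading searches over the in-memory replicas. A minimal round-robin sketch, where the `ReplicaPool` class and host names are hypothetical and replication itself is left to the server:

```python
import itertools


class ReplicaPool:
    """Round-robin searches across replicated in-memory copies of one table.

    Writes go only to the persistent master; replication (not shown) keeps
    the replicas in sync, so any replica can answer a search.
    """

    def __init__(self, master, replicas):
        self.master = master
        self._next = itertools.cycle(replicas)

    def host_for_search(self):
        return next(self._next)

    def host_for_write(self):
        return self.master


pool = ReplicaPool("db-master", ["heap-1", "heap-2", "heap-3"])
print([pool.host_for_search() for _ in range(4)])  # ['heap-1', 'heap-2', 'heap-3', 'heap-1']
```

Unlike slicing, every replica holds the whole table, so this scales query throughput but not the size of the index each box must keep in memory.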
> 
> 
> Brian Bray wrote:
> > It seems to me like the best solution that could be implemented as-is 
> > would be to keep a random int column in your table (with a range of say 
> > 1-100) and then have fulltext server 1 pseudo-replicate records with
> > the random number in the range of 1-10, server 2 11-20, server 3
> > 21-30 and so on.
> > 
> > Then run your query on all 10 servers, merge the result sets, and
> > re-sort the merged set by score if you use the score column.
> > 
> > The problem with splitting the index up by word is that it messes up all
> > your scoring and ranking. For example, what if you search using 5
> > keywords, all starting with letters from different groups? Each server
> > only sees one of the words, so you're going to get a pretty bad score for
> > each match, and it could totally break boolean searches.
> > 
> > --
> > Brian Bray
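Brian Bray's random-bucket scheme can be sketched roughly as follows, assuming the 1-100 range and 10 buckets per server he describes. Each server is assumed to return its hits already sorted by descending score, so the merge only has to interleave them. The function names are hypothetical.

```python
import heapq
import random

BUCKETS = 100            # random int column ranges over 1..100
BUCKETS_PER_SERVER = 10  # server 1 gets 1-10, server 2 gets 11-20, ...


def assign_bucket(rng):
    """Value stored in the random int column when a row is inserted."""
    return rng.randint(1, BUCKETS)


def server_for_bucket(bucket):
    """0-based index of the fulltext server that pseudo-replicates this row."""
    return (bucket - 1) // BUCKETS_PER_SERVER


def merge_by_score(result_sets):
    """Merge per-server results, each already sorted by descending score.

    Rows are (score, doc_id) pairs; heapq.merge with reverse=True keeps the
    combined list in descending score order without a full re-sort.
    """
    return list(heapq.merge(*result_sets, reverse=True))


print(merge_by_score([[(0.9, 4), (0.2, 7)], [(0.5, 1)], [(0.7, 3), (0.1, 9)]]))
```

This assumes scores from different servers are comparable. Since each server computes relevance from its own slice of the data, that is only approximately true, and holds best when rows are spread evenly across buckets.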
> > 
> > 
> > 
> > 
> > Brian DeFeyter wrote:
> > 
> >>On Thu, 2002-02-07 at 15:40, Tod Harter wrote:
> >>[snip]
> >>
> >>
> >>>Wouldn't be too tough to write a little query routing system if you are using 
> >>>perl. Use DBD::Proxy on the web server side, and just hack the perl proxy 
> >>>server so it routes the query to several places and returns a single result 
> >>>set. Ordering could be achieved as well. I'm sure there are commercial 
> >>>packages out there as well. I don't see why the individual database servers 
> >>>would need to do anything special.
> >>>
> >>>
> >>[snip]
> >>
> >>If I'm understanding you correctly, I think you're referring to routing
> >>based on the first character of the word. That would work for cases
> >>where the query is searching for a word that begins with a certain
> >>character... however, fulltext searches also return results with the term
> >>in the middle.
> >>
> >>ie: a search for 'foo' could return:
> >>foo.txt
> >>foobar
> >>
> >>but also could return:
> >>thisisfoo
> >>that_is_foolish
> >>
> >>I could be wrong, but it's my understanding that MySQL stores its
> >>fulltext index based on all the 'unique words' found. For such a system
> >>as you mentioned above, you'd probably have to create your own fulltext
> >>indexing system to determine: a) where to store the data 'segments' and
> >>b) how to route queries.  It seems like this could probably be done much
> >>more efficiently inside of the server.
> >>
> >> - Brian
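To make the "where to store segments / how to route queries" question concrete, here is a minimal term-partitioned sketch, assuming words are assigned to shards by their first character and only whole-word lookups are needed (the case where first-character routing works). It also illustrates Brian Bray's objection: an AND query over words on different shards can only be resolved after collecting postings from each. All names are hypothetical.

```python
from collections import defaultdict


def shard_for_word(word, num_shards):
    """Route a word to a shard by its first character (hypothetical scheme)."""
    return ord(word[0].lower()) % num_shards


def build_term_shards(docs, num_shards):
    """Each shard holds the posting lists for the words routed to it."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        for word in text.lower().split():
            shards[shard_for_word(word, num_shards)][word].add(doc_id)
    return shards


def search_and(shards, words):
    """AND query: fetch each word's postings from its shard, then intersect.

    The intersection has to happen after the per-shard lookups, because no
    single shard sees all the query words.
    """
    postings = [shards[shard_for_word(w, len(shards))].get(w.lower(), set())
                for w in words]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: "foo bar", 2: "foo zed", 3: "bar zed"}
shards = build_term_shards(docs, 3)
print(search_and(shards, ["foo", "bar"]))  # [1]
```

Note that a word with the term "in the middle" (e.g. finding 'foo' inside 'thisisfoo') cannot be routed this way at all, since its first character says nothing about the substring.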
> >>
> >>
> >>
> >>---------------------------------------------------------------------
> >>Before posting, please check:
> >>   http://www.mysql.com/manual.php   (the manual)
> >>   http://lists.mysql.com/           (the list archive)
> >>
> >>To request this thread, e-mail <[EMAIL PROTECTED]>
> >>To unsubscribe, e-mail <[EMAIL PROTECTED]>
> >>Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> >>
> >>
> >>
> > 
> > 
> > 
> 
> 
> 
> 


