Steve Rapaport wrote:
> On Friday 08 February 2002 06:14 pm, James Montebello wrote:
>
>>Distribution is how Google gets its speed. You say clustering won't
>>solve the problem, but distributing the indices across many processors
>>*is* going to gain you a huge speed increase through sheer parallelism.
>>
>
> True, but not enough. The BEST parallelism can do in a compute-bound
> application is divide the time by the number of processors. That's assuming
> a PERFECT routing system. (Correct me if I'm wrong here)
There are actually some exceptions. Specifically, if you have a very large
problem set and parallelism allows you to move the problem set into a faster
storage medium, you can sometimes see a speedup greater than the number of
processors. Let's say you are doing a full text search and you have a 20 GB
full text index. It may not be feasible to build a machine with 20 GB of RAM
for storing the index in RAM, but it might be more feasible to store 1/20th of
the index on each of 20 machines with 1 GB of RAM apiece. And with RAM being
>1000 times as fast as hard disk, you could get a huge win.
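
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
The per-GB scan costs are made-up assumptions (I only take RAM to be 1000 times
as fast as disk, per the figure above), and it ignores routing and result
merging entirely, so treat it as an illustration and not a benchmark:

```python
# Back-of-the-envelope model; every number here is an illustrative assumption.

def search_time(index_gb, seconds_per_gb):
    """Time to scan an index of the given size at the given per-GB cost."""
    return index_gb * seconds_per_gb

INDEX_GB = 20            # total full text index size, as in the example
MACHINES = 20            # one 1 GB shard held in RAM on each machine
DISK_S_PER_GB = 1000.0   # assumed cost of scanning 1 GB from disk
RAM_S_PER_GB = 1.0       # assumed: RAM is 1000x as fast as disk

# One machine scanning the whole index from disk.
single_disk = search_time(INDEX_GB, DISK_S_PER_GB)

# Twenty machines each scanning their own shard from RAM, in parallel,
# so the elapsed time is the time for a single shard.
sharded_ram = search_time(INDEX_GB / MACHINES, RAM_S_PER_GB)

speedup = single_disk / sharded_ram
print(f"speedup: {speedup:.0f}x on {MACHINES} machines")
```

Under those assumptions the speedup is the number of machines multiplied by the
storage speed ratio, which is how 20 machines can beat a factor of 20.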
>
> So to make the routing system + parallelism add up to
> a MILLION times better performance, you would need at
> least a MILLION processors. I doubt that even Google is
> doing that.
>
>
>>Google uses thousands of processors to handle its index, and any given
>>search is going to be spread over 100s of processors.
>>
>
> Right, so we can expect Google to do, say, 10,000 times (10^4) better
> than Mysql at a Fulltext search. But in fact we're seeing
> a million, 10^6, being generous. It's that extra factor
> of a hundred (more likely a thousand, my estimates
> were very generous) that I'm getting all fussy about.
>
>
>>Asking
>>a general purpose RDBMS to be really good at 10 different things is asking
>>a bit much. Fulltext searches are well down the list.
>>
>
> Here we get out of the realm of hard numbers and into opinions. But here's
> mine: If mysql bothers to support fulltext searches, it's presumably because
> there's some demand for them in some circumstances. The level of scaling and
> optimization that could reasonably be expected is whatever it takes to make
> the feature useful (i.e. perform similarly) in similar cases to other MySQL
> features. With a regular index, I can do a hash lookup on 23 million records
> subsecond. With a fulltext index (on a small field, only 40 characters) my
> time slips to 3 to 180 seconds. That extra little factor of 100 is my
> problem. Distributing over 4 processors wouldn't really help much. And
> because people don't always type a company name the exact same way,
> FULLTEXT really is the best way to do this.
>
> So Monty et al, my request is this: please put on the wish list, enhancement
> of FULLTEXT search to approximately match the performance of an indexed
> search on 25 million records, on the same hardware and with other things
> held equal.
>
> Steve Rapaport
>
> ---------------------------------------------------------------------
> Before posting, please check:
> http://www.mysql.com/manual.php (the manual)
> http://lists.mysql.com/ (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php