Richard Jones wrote:
The data i'm dealing with is stored over a few mysql dbs on different
machines, horizontally partitioned so each user is assigned to a single db.
The queries i'm doing can be done in SQL in parallel over all machines then
combined, which i've tested - it's unacceptably slow :(
There is rather a lot of data; each profile can contain in excess of 1000
artists, and there are almost 1million profiles created. I'm only adding the
top N artists for each user to a lucene index, which is fast and still yields
good results (even with just the top 100 artists each).
I think this could be done reasonably well with SQL in a DBMS like Oracle RAC
or something like mysql cluster.. sadly the former doesnt scale financially,
and the latter requires everything to be held in RAM .. which is why i'm
using lucene.
If you're willing to continue subsetting / summarizing the data out into
Lucene, how about subsetting it out into a dedicated MySQL instance for
this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int =
roughly 1 GB of data, which would easily fit into RAM. Queries should
be pretty fast off of that. Good luck!
--MDC
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]