Richard Jones wrote:

The data i'm dealing with is stored over a few mysql dbs on different machines, horizontally partitioned so each user is assigned to a single db. The queries i'm doing can be done in SQL in parallel over all machines then combined, which i've tested - it's unacceptably slow :(

There is rather a lot of data; each profile can contain in excess of 1000 artists, and there are almost 1million profiles created. I'm only adding the top N artists for each user to a lucene index, which is fast and still yields good results (even with just the top 100 artists each).

I think this could be done reasonably well with SQL in a DBMS like Oracle RAC or something like mysql cluster.. sadly the former doesnt scale financially, and the latter requires everything to be held in RAM .. which is why i'm using lucene.

If you're willing to continue subsetting / summarizing the data out into Lucene, how about subsetting it out into a dedicated MySQL instance for this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int = roughly 1 GB of data, which would easily fit into RAM. Queries should be pretty fast off of that. Good luck!

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to