Re: selecting data for computation

Dan Nelson Thu, 24 Jun 2010 11:59:42 -0700

In the last episode (Jun 24), Tim Gustafson said:
> I'm working on a compute job that queries mySQL for work packets,
> processes the work packet and then updates the row in mySQL with the
> results of the job.
> 
> Currently, from each compute instance I'm doing something like:
> 
> select * from work_packets where work_data is null order by rand() limit 1;
> 
> That more or less works, and occasionally two jobs will process the same
> row and I'll waste a little bit of CPU time, but as a higher percentage of
> jobs get completed, duplicate work happens more and more.
[..]
> I also thought about doing something like:
> 
> select * from work_packets where work_data is null and (select
> get_lock(work_packets.id, 0) = 1 order by rand() limit 1;
> 
> but that seems to significantly increase the time it takes to run this
> query from about 2 seconds to about 13 seconds.  Since the jobs only take
> about 10 seconds of CPU time each, this represents in an actual slow-down
> as opposed to a speed-up.


That query won't execute (the subquery expression is incomplete); if you
really are using a subquery there, try it plain, with no order by rand():

select * from work_packets where work_data is null and get_lock(id,0)=1 limit 1;

My guess is your initial reason for ORDER BY RAND() was to minimize
duplicate work, but if you're using locks it'll never happen, so you might
as well just grab the first available packet.  That query shouldn't even
take 2 seconds, unless you have 50k rows in the table and no index on
work_data...
 
-- 
        Dan Nelson
        dnel...@allantgroup.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Re: selecting data for computation

Reply via email to