On 16 Apr 2009, at 16:44, Mark H. Wood wrote:

> On Tue, Apr 14, 2009 at 04:46:59PM +0200, Ruijgrok, P.T. (Peter)  
> wrote:
>> Problem 2:
>> ---------------
>> checker -d 10h  processes approximately 250 bitstreams / hour
>>
>> checker -d 1874/12233 (collection handle) processes approximately  
>> 20.000
>> bitstreams / hour
>>
>> So running the checker every night for eg 10 hours is not  
>> possible.....
>>
>>
>> Cause ( I think):
>> ----------------------
>> select bitstream_id
>>            from most_recent_checksum where to_be_processed = true
>>            order by date_trunc('milliseconds',  
>> last_process_end_date),
>>            bitstream_id ASC LIMIT 1;
>>
>> This statement requests the next bitstream to check and it takes +10
>> seconds to find the next bitstream on our 320.000+   dspace.
>
> I don't *know* what is causing this, and I'm not yet a DBMS
> performance expert, but my eye is attracted to the ORDER BY clause.
> The monitor is having to build a temporary table of some probably very
> sizable subset of 320.000 records, and sort it.  There is no index on
> last_process_end_date.  That may not be significant, but I suppose
> that the query planner might be able to eliminate the sorting if that
> column were indexed.  It might be able to make other optimizations as
> well, but I'm just guessing.


My first suggestion would be to take out that call to date_trunc().  
Even if last_process_end_date is indexed, the results of truncating it  
won't be. It's unnecessary and probably a significant performance hit.

So, time

select bitstream_id
            from most_recent_checksum where to_be_processed = true
            order by date_trunc('milliseconds', last_process_end_date),
            bitstream_id ASC LIMIT 1;

against

select bitstream_id
            from most_recent_checksum where to_be_processed = true
            order by last_process_end_date,
            bitstream_id ASC LIMIT 1;

and see what the difference is.

--
Simon Brown <st...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH



------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and 
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today. 
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to