On Tue, 14 Apr 2009, Ruijgrok, P.T. (Peter) wrote:

> I had serious performance problems with the bitstream checker, running
> DSpace 1.4.x.
> We have 320,000+ bitstreams, and the number is increasing continuously.

Sadly, the current DSpace codebase has some serious scalability issues. 
(And Java's MD5 implementation isn't the fastest, either, but that's not 
the main culprit.)

For our instance, which has a separate server hosting the filesystem 
(which itself resides on a SAN), I wrote a Perl script to do the 
checksumming. It runs continuously in a loop, and manages nearly 500,000
bitstreams in 6 to 10 hours, depending on the load on the fileserver. It 
uses the md5sum binary from solarisfreeware.com.
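
For anyone wanting to try something similar, one pass of such a checker 
might look roughly like the Python sketch below. To be clear, this is not 
our Perl script: it uses Python's hashlib rather than the md5sum binary, 
and the function names, the dict of expected checksums and the 
relative-path layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bitstreams(expected, asset_root):
    """Compare stored checksums against the files on disk.

    expected:   dict mapping a relative file path to its stored MD5 hex
                digest, loaded once up front (hypothetical layout).
    asset_root: directory that the relative paths are resolved against.

    Returns a list of (relative_path, status) for every problem found,
    where status is "MISSING" or "MISMATCH".
    """
    problems = []
    for rel_path, stored in expected.items():
        full_path = Path(asset_root) / rel_path
        if not full_path.is_file():
            problems.append((rel_path, "MISSING"))
        elif md5_of_file(full_path) != stored.lower():
            problems.append((rel_path, "MISMATCH"))
    return problems
```

Running this in an outer loop, with a log line written per problem, would 
give the continuous behaviour described above.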

It puts almost no load on the database, because it only queries the 
checksums from the bitstream table once, at the start. Output is logged 
continuously, and our local Nagios server monitors for any checksum 
errors.
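
The monitoring side can be very small. Below is a rough Python sketch of a 
Nagios-style check over the checker's log. The log format (lines 
containing the words MISMATCH or MISSING) and the function names are 
hypothetical; only the exit codes (0 for OK, 2 for CRITICAL) follow the 
usual Nagios plugin convention.

```python
import sys

OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def scan_log(lines):
    """Return the log lines that report a checksum problem.

    Assumes (hypothetically) that the checker writes the word MISMATCH
    or MISSING on every line that describes a problem.
    """
    return [line.rstrip("\n") for line in lines
            if "MISMATCH" in line or "MISSING" in line]


def nagios_status(lines):
    """Return (exit_code, message) in the shape a Nagios plugin reports."""
    bad = scan_log(lines)
    if bad:
        message = "CRITICAL - %d checksum problem(s), first: %s" % (
            len(bad), bad[0])
        return CRITICAL, message
    return OK, "OK - no checksum problems logged"


def main(argv):
    """Read the log file named in argv[1], print the status, return the code."""
    with open(argv[1]) as handle:
        code, message = nagios_status(handle)
    print(message)
    return code
```

A wrapper script would call sys.exit(main(sys.argv)) so that Nagios sees 
the exit code.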

This also has the advantage that it doesn't load the (Tomcat) webapp box, 
which already has enough work to do.

It also means that the same script can run on our backup servers (which 
also use disk; we couldn't manage with tape).

We've taken this approach with other things as well, such as our 
thumbnails. These don't use the DSpace code: we wanted to keep something 
as user-interface-centric as thumbnails separate from the actual archive 
contents, and the DSpace code was also too slow and crashed our server.


Best,

--
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 16/04/2009 : The Moon is Waning Gibbous (59% of Full)

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
