On Tue, 14 Apr 2009, Ruijgrok, P.T. (Peter) wrote:

> I had serious performance problems with the bitstream checker, running
> DSpace 1.4.x.
> We have +320.000 bitstreams and increasing continuously.

Sadly, the current DSpace codebase has some serious scalability issues. (And Java's MD5 implementation isn't the fastest, either, but that's not the main culprit.)

For our instance, which has a separate server hosting the filesystem (which itself resides on a SAN), I wrote a Perl script to do the checksumming. It runs continuously in a loop, and gets through nearly 500,000 bitstreams in 6 to 10 hours, depending on the load on the fileserver. It uses the md5sum binary from solarisfreeware.com.

It puts almost no load on the database, because it only queries the checksums from the bitstream table once, at the start. Output is logged continuously, and our local Nagios server monitors for any checksum errors.

This also has the advantage that it doesn't load the (Tomcat) webapp box, which already has enough work to do. It also means that the same script can run on our backup servers (which also use disk; we couldn't manage with tape).

We've taken this approach with other things as well, such as our thumbnails (which don't use the DSpace code, because we wanted to separate something as user-interface-centric as that from the actual archive contents; the DSpace code was also too slow and crashed our server).

Best,

-- 
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 16/04/2009 : The Moon is Waning Gibbous (59% of Full)

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
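
For readers who want to try the approach described in the message above, here is a minimal sketch of one checking pass. It is not the Perl script from the message. It is written in Python, it uses the standard library's `hashlib` rather than an external md5sum binary, and the names `md5_of_file`, `verify_bitstreams`, `expected` and `asset_root` are hypothetical. How the path-to-checksum map is read from the database is deliberately left out; the sketch only assumes it was loaded once, before the pass starts.

```python
import hashlib
import logging
from pathlib import Path
from typing import Dict, Tuple

log = logging.getLogger("checksum-checker")  # hypothetical logger name


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bitstreams(expected: Dict[str, str], asset_root: Path) -> Tuple[int, int]:
    """Make one pass over all files and return (ok_count, error_count).

    `expected` maps a path relative to `asset_root` to the stored MD5 hex
    digest. Assumption: it was loaded from the database once, up front,
    so the pass itself never touches the database.
    """
    ok = errors = 0
    for rel_path, stored in expected.items():
        try:
            actual = md5_of_file(asset_root / rel_path)
        except OSError as exc:
            # A missing or unreadable file is reported like a mismatch.
            log.error("CHECKSUM ERROR %s: cannot read file (%s)", rel_path, exc)
            errors += 1
            continue
        if actual == stored.lower():
            ok += 1
        else:
            log.error("CHECKSUM ERROR %s: stored %s, computed %s",
                      rel_path, stored, actual)
            errors += 1
    log.info("pass complete: %d ok, %d errors", ok, errors)
    return ok, errors
```

To mimic the continuous behaviour, one could call `verify_bitstreams` inside an endless loop, with a logging handler that writes to a file, and have a monitoring system alert on lines containing "CHECKSUM ERROR". That marker string is likewise just a convention chosen for this sketch.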