Re: [Dspace-tech] Speed problem in postgres during batch ingesting
This is a nice article, but I can't tell whether the server is dedicated to this DSpace or how much memory is allocated to the Java processes. On a fairly loaded server, it took me 15 seconds per document on a 680,000-item DSpace (with 512 MB allocated).

On Tue, Jan 27, 2009 at 4:09 PM, Stuart Lewis wrote:
> > Given that the test in the paper uses neither postgres nor the DSpace
> > import tool, that seems unlikely.
>
> 30,000 items shouldn't pose a big problem for any mature DBMS (e.g.
> Postgres / MySQL etc). If there are problems at that scale, they are more
> likely to be in other parts of the system.
>
> We've recently finished testing DSpace ingest to 333,000 items using
> Postgres. Again, this wasn't using the batch importer, but instead using
> SWORD. Deposits into an empty repository took about 1.5 seconds each, and
> at a third of a million items they took about 7 seconds. So the problems
> probably aren't with Postgres. For details see:
>
> http://blog.stuartlewis.com/2009/01/19/dspace-at-a-third-of-a-million-items/
>
> Cheers,
>
> Stuart

--
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
On Tue, Jan 27, 2009 at 10:27 AM, Stuart Lewis wrote:
> Hi Ilias,
>
> > I am using the dspace import tool for batch ingesting in a Postgres
> > database and I am facing extremely slow feedback in each record commitment.
>
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf

The paper says that DSpace 1.4 was used with MySQL. I presume DSpace was locally modified to work with MySQL. Is there any chance these changes could be contributed to DSpace 1.5, please? I am really keen to use DSpace with MySQL because it might enable me to put a DSpace repository on my own web site (my web space provider only supports MySQL, not Postgres).

--
Regards,
Andrew M.
http://www.andrewpetermarlow.co.uk
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
> Given that the test in the paper uses neither postgres nor the DSpace
> import tool, that seems unlikely.

30,000 items shouldn't pose a big problem for any mature DBMS (e.g. Postgres / MySQL). If there are problems at that scale, they are more likely to be in other parts of the system.

We've recently finished testing DSpace ingest to 333,000 items using Postgres. Again, this wasn't using the batch importer, but SWORD instead. Deposits into an empty repository took about 1.5 seconds each, and at a third of a million items they took about 7 seconds each. So the problems probably aren't with Postgres. For details see:

http://blog.stuartlewis.com/2009/01/19/dspace-at-a-third-of-a-million-items/

Cheers,

Stuart
_
Gwasanaethau Gwybodaeth Information Services
Prifysgol Aberystwyth Aberystwyth University
E-bost / E-mail: stuart.le...@aber.ac.uk
Ffon / Tel: (01970) 622860
_
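To put Stuart's figures in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that deposits run one after another and that the per-item time rises in a straight line from about 1.5 s (empty repository) to about 7 s (333,000 items). Only those two end points come from the message; the linear growth and the function name are assumptions.

```python
def estimate_total_ingest_seconds(n_items: int, first_s: float, last_s: float) -> float:
    """Estimate the total time to deposit n_items one after another.

    ASSUMPTION: the per-item time grows linearly from first_s (empty
    repository) to last_s (repository holding n_items). Only the two end
    points are reported; the straight line between them is a guess.
    """
    # Sum of an arithmetic series: n terms running from first_s to last_s.
    return n_items * (first_s + last_s) / 2.0


if __name__ == "__main__":
    total = estimate_total_ingest_seconds(333_000, 1.5, 7.0)
    print(f"{total:,.0f} s (about {total / 86_400:.1f} days)")
```

With those numbers the estimate comes to roughly 1.4 million seconds, about 16 days of continuous single-stream depositing. The real curve may not be linear, so treat this as an order-of-magnitude figure only.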
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
On 27 Jan 2009, at 14:14, Stuart Lewis wrote:
> Hi Tom,
>
>>> The following paper talks about this, and how DSpace performs when
>>> ingesting 1 million items:
>>>
>>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra,
>>> James Seamans, George R. Thoma, National Library of Medicine, Bethesda,
>>> Maryland, USA
>>>
>>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>>
>> That paper doesn't use the DSpace importer, so I fail to see how it
>> can claim the importer scales well.
>
> I don't think it does make such a claim. The claim it makes is that
> DSpace can still provide an acceptable level of ingest performance when
> loaded with 1 million items.
>
> The original email asked "Is there any known problem with the maximum size
> of dspace database using postgres or in the import tool?" so the first part
> of that, subject as you said to having a 'busy DSpace', is hopefully to some
> extent answered by that paper.

Given that the test in the paper uses neither postgres nor the DSpace import tool, that seems unlikely.

--
Simon Brown - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
Hi Tom,

>> The following paper talks about this, and how DSpace performs when ingesting
>> 1 million items:
>>
>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
>> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
>> USA
>>
>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> That paper doesn't use the DSpace importer, so I fail to see how it can
> claim the importer scales well.

I don't think it does make such a claim. The claim it makes is that DSpace can still provide an acceptable level of ingest performance when loaded with 1 million items.

The original email asked "Is there any known problem with the maximum size of dspace database using postgres or in the import tool?", so the first part of that, subject as you said to having a 'busy DSpace', is hopefully answered to some extent by that paper.

Cheers,

Stuart
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
Hi Simon,

>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
>> Seamans, George R. Thoma, National Library of Medicine, Bethesda,
>> Maryland, USA
>>
>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Can I enquire as to where within the standard DSpace toolset the
> SIPIngestManager, as used in the tests in this paper, may be found? I
> haven't been able to locate it.

It sounds like it is a custom tool they wrote.

Thanks,

Stuart
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
On Tue, 27 Jan 2009, Stuart Lewis wrote:
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
> USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller
> chunks?

That paper doesn't use the DSpace importer, so I fail to see how it can claim the importer scales well.

I can tell you from a lot of first-hand experience that the DSpace importer doesn't scale: it gets slower as you have more items in your DSpace instance, and it also slows down with each item in the batch.

In addition, if you have a busy DSpace instance, there may be issues with file locking, where deleted file handles don't get recovered properly.

best,
--
Tom De Mulder - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 27/01/2009 : The Moon is Waning Crescent (3% of Full)
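To check whether an import slows down item by item within a batch, as Tom describes, one option is to time each item yourself and fit a least-squares line to the durations. The sketch below assumes you already have a list of per-item durations in seconds (for example, from timestamps recorded by your own wrapper script; no particular importer log format is assumed), and the function name `per_item_slowdown` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def per_item_slowdown(durations: Sequence[float]) -> float:
    """Least-squares slope of item duration against item position.

    Returns roughly how many extra seconds each successive item takes.
    A clearly positive slope means the batch is slowing down as it runs;
    a slope near zero means the per-item time is roughly constant.
    """
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two timings to fit a slope")
    x_bar = (n - 1) / 2  # mean of the positions 0..n-1
    y_bar = mean(durations)
    num = sum((i - x_bar) * (d - y_bar) for i, d in enumerate(durations))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # Hypothetical timings (seconds) for five consecutive items in one batch.
    timings = [1.5, 1.6, 1.8, 2.1, 2.3]
    print(f"slope: {per_item_slowdown(timings):+.3f} s per item")
```

A slope near zero suggests a steady per-item cost; a clearly positive slope is consistent with the per-item slowdown described above. Comparing slopes across successive batches may also hint at whether the slowdown tracks overall repository size.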
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
I would recommend two things:
1) import in batches of 200 documents (otherwise their XML representation in memory becomes very large);
2) run PostgreSQL maintenance (ANALYZE) after loading the first 200, and again after loading a large part (the first 10 thousand records).

Christophe

Stuart Lewis wrote:
> Hi Ilias,
>
> > I am using the dspace import tool for batch ingesting in a Postgres
> > database and I am facing extremely slow feedback in each record commitment.
> > Initially, the speed was normal but when the items tend to be around 30
> > thousand, the speed of each commitment is unacceptable.
> > Is there any known problem with the maximum size of dspace database
> > using postgres or in the import tool?
> >
> > Any comments will be helpful.
>
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
> USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller
> chunks?
>
> Thanks,
>
> Stuart

Christophe Dupriez, IT specialist
DESTIN inc., SSEB (Développement de Systèmes de Traitement de l'Information)
rue des Palais 44, boîte 1, B-1030 Bruxelles, Belgique
Tel: +32/2/216.66.15 - Fax: +32/2/242.97.25 - Mobile: +32/475.77.62.11
christophe.dupr...@destin.be - http://www.destin.be
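Christophe's two suggestions can be sketched as a small planning helper: split the items into batches of 200 and flag the batches after which the running total first reaches 200 and 10,000 items, which is when an ANALYZE would be run. This is only a sketch under stated assumptions: the item names, the helper name `plan_batches`, and passing the thresholds as a parameter are illustrative, and the actual import and ANALYZE commands are left as comments because they depend on your installation.

```python
from typing import List, Sequence, Tuple

BATCH_SIZE = 200               # batch size suggested in the message above
ANALYZE_AFTER = (200, 10_000)  # item counts after which to run ANALYZE


def plan_batches(
    items: Sequence[str],
    batch_size: int = BATCH_SIZE,
    analyze_after: Sequence[int] = ANALYZE_AFTER,
) -> List[Tuple[List[str], bool]]:
    """Split items into consecutive batches of at most batch_size.

    Each batch is paired with a flag that is True when the running total of
    imported items first reaches one of the analyze_after thresholds, i.e.
    when an ANALYZE should be run before starting the next batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    pending = sorted(analyze_after)
    plan: List[Tuple[List[str], bool]] = []
    imported = 0
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        imported += len(batch)
        run_analyze = False
        # Consume every threshold the running total has now reached.
        while pending and imported >= pending[0]:
            pending.pop(0)
            run_analyze = True
        plan.append((batch, run_analyze))
    return plan


if __name__ == "__main__":
    # Hypothetical item directory names, e.g. one per Simple Archive Format item.
    items = [f"item_{n:05d}" for n in range(10_500)]
    for i, (batch, analyze) in enumerate(plan_batches(items)):
        # Run the DSpace import tool on this batch here (command not shown:
        # it depends on your DSpace version and setup) ...
        if analyze:
            # ... then run ANALYZE, e.g. psql -c "ANALYZE;" <your_database>
            print(f"after batch {i} ({len(batch)} items): run ANALYZE")
```

For 10,500 items this gives 53 batches (the last one holding 100 items), with ANALYZE flagged after the 1st and the 50th batch. Keeping the thresholds as a parameter makes it easy to add further ANALYZE points for larger imports.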
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
On 27 Jan 2009, at 10:27, Stuart Lewis wrote:
> Hi Ilias,
>
>> I am using the dspace import tool for batch ingesting in a Postgres
>> database and I am facing extremely slow feedback in each record
>> commitment.
>> Initially, the speed was normal but when the items tend to be around 30
>> thousand, the speed of each commitment is unacceptable.
>> Is there any known problem with the maximum size of dspace database
>> using postgres or in the import tool?
>>
>> Any comments will be helpful.
>
> The following paper talks about this, and how DSpace performs when
> ingesting 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda,
> Maryland, USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into
> smaller chunks?

Can I enquire as to where within the standard DSpace toolset the SIPIngestManager, as used in the tests in this paper, may be found? I haven't been able to locate it.

--
Simon Brown
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
Hi Ilias,

> I am using the dspace import tool for batch ingesting in a Postgres
> database and I am facing extremely slow feedback in each record commitment.
> Initially, the speed was normal but when the items tend to be around 30
> thousand, the speed of each commitment is unacceptable.
> Is there any known problem with the maximum size of dspace database
> using postgres or in the import tool?
>
> Any comments will be helpful.

The following paper talks about this, and how DSpace performs when ingesting 1 million items:

Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland, USA

http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf

Is this one big import of 30,000 items, or do you break them up into smaller chunks?

Thanks,

Stuart
[Dspace-tech] Speed problem in postgres during batch ingesting
Hi,

I am using the DSpace import tool for batch ingesting into a Postgres database, and each record commit is extremely slow. Initially the speed was normal, but as the number of items approached 30 thousand, the time for each commit became unacceptable. Is there any known problem with the maximum size of a DSpace database using Postgres, or with the import tool?

Any comments will be helpful.

Thanks,
Ilias Stavrakis