Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-28 Thread François Parmentier
This is a nice article, but I can't see whether the server is dedicated to
this DSpace instance or how much memory is allocated to the Java processes.

On a fairly loaded server, it took me 15 seconds per document on a DSpace
instance holding 680,000 items (with 512 MB allocated).

On Tue, Jan 27, 2009 at 4:09 PM, Stuart Lewis wrote:

> > Given that the test in the paper uses neither postgres nor the DSpace
> > import tool, that seems unlikely.
>
> 30,000 items shouldn't pose a big problem for any mature DBMS (e.g. Postgres
> / MySQL etc). If there are problems at that scale, they are more likely to
> be in other parts of the system.
>
> We've recently finished testing DSpace ingest to 333,000 items using
> Postgres. Again, this wasn't using the batch importer, but instead using
> SWORD. Deposits into an empty repository took about 1.5 seconds each, and at
> a third of a million items they took about 7 seconds. So the problems
> probably aren't with Postgres. For details see:
>
>
> http://blog.stuartlewis.com/2009/01/19/dspace-at-a-third-of-a-million-items/
>
> Cheers,
>
>
> Stuart
> _
>
> Gwasanaethau Gwybodaeth  Information Services
> Prifysgol Aberystwyth  Aberystwyth University
>
> E-bost / E-mail: stuart.le...@aber.ac.uk
> Ffon / Tel: (01970) 622860
> _
>
>
>
> --
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> ___
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>


Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Andrew Marlow
On Tue, Jan 27, 2009 at 10:27 AM, Stuart Lewis wrote:

> Hi Ilias,
>
> > I am using the dspace import tool for batch ingesting in a Postgres
> > database and I am facing extremely slow feedback in each record
> > commitment.
>
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>

The paper says that DSpace 1.4 was used with MySQL. I presume that DSpace was
locally modified to work with MySQL. Is there any chance of contributing these
changes to DSpace 1.5, please? I am really keen to use DSpace with MySQL
because this might let me put a DSpace repository on my own web site (my
web hosting provider only supports MySQL, not Postgres).

-- 
Regards,

Andrew M.
http://www.andrewpetermarlow.co.uk


Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Stuart Lewis
> Given that the test in the paper uses neither postgres nor the DSpace
> import tool, that seems unlikely.

30,000 items shouldn't pose a big problem for any mature DBMS (e.g. Postgres
/ MySQL etc). If there are problems at that scale, they are more likely to
be in other parts of the system.

We've recently finished testing DSpace ingest to 333,000 items using
Postgres. Again, this wasn't using the batch importer, but instead using
SWORD. Deposits into an empty repository took about 1.5 seconds each, and at
a third of a million items they took about 7 seconds. So the problems
probably aren't with Postgres. For details see:

http://blog.stuartlewis.com/2009/01/19/dspace-at-a-third-of-a-million-items/
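As a rough illustration of what the two figures above imply, the sketch below
interpolates linearly between them (about 1.5 s per deposit into an empty
repository, about 7 s at 333,000 items). The linear model is an assumption made
only for illustration, as are the names `per_item_seconds` and `total_hours`;
the measurements themselves do not say how the time grows between those points.

```python
# Rough linear model of per-deposit time, built from the two figures above.
# ASSUMPTION: time grows linearly with repository size; only two
# measurements were reported, not the shape of the curve.

T_EMPTY = 1.5     # seconds per SWORD deposit into an empty repository
T_FULL = 7.0      # seconds per SWORD deposit at ~333,000 items
N_FULL = 333_000  # repository size at the second measurement


def per_item_seconds(n_items: int) -> float:
    """Estimated time for one deposit when the repository holds n_items."""
    return T_EMPTY + (T_FULL - T_EMPTY) * n_items / N_FULL


def total_hours(n_items: int) -> float:
    """Approximate time to deposit n_items one at a time, starting empty.

    Under a linear model the mean deposit time is the average of the
    first and last deposit times.
    """
    mean_seconds = (per_item_seconds(0) + per_item_seconds(n_items)) / 2
    return n_items * mean_seconds / 3600


if __name__ == "__main__":
    print(f"{per_item_seconds(30_000):.1f} s per deposit at 30,000 items")
    print(f"{total_hours(333_000):.0f} h to reach 333,000 items")
```

Under this assumed model, a deposit at 30,000 items (the size at which the
original poster saw problems) would take about 2 s, and reaching 333,000 items
would take roughly 393 hours of deposit time in total.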

Cheers,


Stuart


Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Simon Brown

On 27 Jan 2009, at 14:14, Stuart Lewis wrote:

> Hi Tom,
>
>>> The following paper talks about this, and how DSpace performs when ingesting
>>> 1 million items:
>>>
>>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
>>> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
>>> USA
>>>
>>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>>
>> That paper doesn't use the DSpace importer, so I fail to see how it can
>> claim the importer scales well.
>
> I don't think it does make such a claim. The claim it makes is that DSpace
> can still provide an acceptable level of ingest performance when loaded with
> 1 million items.
>
> The original email asked "Is there any known problem with the maximum size
> of dspace database using postgres or in the import tool?" so the first part
> of that, subject as you said to having a 'busy DSpace', is hopefully to some
> extent answered by that paper.


Given that the test in the paper uses neither postgres nor the DSpace  
import tool, that seems unlikely.

--
Simon Brown  - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH





Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Stuart Lewis
Hi Tom,

>> The following paper talks about this, and how DSpace performs when ingesting
>> 1 million items:
>> 
>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
>> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
>> USA
>> 
>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
> 
> That paper doesn't use the DSpace importer, so I fail to see how it can
> claim the importer scales well.

I don't think it does make such a claim. The claim it makes is that DSpace
can still provide an acceptable level of ingest performance when loaded with
1 million items.

The original email asked "Is there any known problem with the maximum size
of dspace database using postgres or in the import tool?" so the first part
of that, subject as you said to having a 'busy DSpace', is hopefully to some
extent answered by that paper.
 
Cheers,


Stuart


Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Stuart Lewis
Hi Simon,

>> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
>> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
>> USA
>> USA
>> 
>> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
> 
> Can I enquire as to where within the standard DSpace toolset the
> SIPIngestManager, as used in the tests in this paper, may be found? I
> haven't been able to locate it.

It sounds like it is a custom tool they wrote.

Thanks,


Stuart


Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Tom De Mulder
On Tue, 27 Jan 2009, Stuart Lewis wrote:

> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
> USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller
> chunks?

That paper doesn't use the DSpace importer, so I fail to see how it can 
claim the importer scales well.

I can tell you from a lot of first-hand experience that the DSpace importer
doesn't scale: it gets slower as you have more items in your DSpace
instance, and it also slows down with each item in the batch.

In addition, if you have a busy DSpace instance, there may be file-locking
issues where deleted file handles don't get recovered properly.
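One way to tell apart the two slowdowns described here (within a batch versus
as the repository grows) is to time each item and look at the trend. The
sketch below fits a least-squares slope to a list of per-item timings. How the
timings are collected is left open, and the function name `slowdown_per_item`
is hypothetical.

```python
from statistics import fmean


def slowdown_per_item(timings: list[float]) -> float:
    """Least-squares slope, in seconds per item, of per-item import times.

    timings[i] is the elapsed time for the i-th item of a batch. How the
    timings are collected (e.g. from timestamps in a log) is left open.
    A slope near 0 means steady throughput; a clearly positive slope means
    each item tends to take longer than the one before.
    """
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two timings to fit a slope")
    x_mean = (n - 1) / 2            # mean of the item indices 0..n-1
    y_mean = fmean(timings)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(timings))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den
```

Fitting the slope separately for each batch, and comparing the first-item time
from one batch to the next, would help separate a within-batch slowdown from
one driven by the total number of items already in the repository.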


best,

--
Tom De Mulder  - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 27/01/2009 : The Moon is Waning Crescent (3% of Full)



Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Christophe Dupriez

I would recommend two things:
1) Import in batches of 200 documents (otherwise their XML representation in
memory becomes very big).
2) Run PostgreSQL maintenance (ANALYZE) after loading the first 200 documents,
and again after loading a large part (e.g. the first 10,000 records).
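A minimal sketch of this schedule, assuming the items are already laid out one
directory per item (as in DSpace's Simple Archive Format) and listed in
`item_dirs`. The function name `plan_batches` is hypothetical, and the sketch
only decides how to group items and when to run ANALYZE; it does not invoke the
DSpace importer or connect to PostgreSQL, since the exact import command line
depends on the DSpace version and installation.

```python
from collections.abc import Iterator

BATCH_SIZE = 200               # items per import run, as suggested above
ANALYZE_AFTER = (200, 10_000)  # running totals after which to run ANALYZE


def plan_batches(
    item_dirs: list[str],
    batch_size: int = BATCH_SIZE,
    analyze_after: tuple[int, ...] = ANALYZE_AFTER,
) -> Iterator[tuple[list[str], bool]]:
    """Yield (batch, run_analyze_afterwards) pairs.

    Each batch is meant to be imported as a separate run of the importer.
    The flag is True when that batch takes the running total of loaded
    items up to or past one of the analyze_after thresholds.
    """
    loaded = 0
    for start in range(0, len(item_dirs), batch_size):
        batch = item_dirs[start:start + batch_size]
        before = loaded
        loaded += len(batch)
        yield batch, any(before < t <= loaded for t in analyze_after)
```

For each pair, one would import the batch and, when the flag is set, run
`ANALYZE;` against the DSpace database (for example from psql) before moving
on to the next batch.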


Christophe

Stuart Lewis wrote:

> Hi Ilias,
>
>> I am using the dspace import tool for batch ingesting in a Postgres
>> database and I am facing extremely slow feedback in each record commitment.
>> Initially, the speed was normal but when the items tend to be around 30
>> thousand, the speed of each commitment is unacceptable.
>> Is there any known problem with the maximum size of dspace database
>> using postgres or in the import tool?
>>
>> Any comments will be helpful.
>
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
> USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller
> chunks?
>
> Thanks,
>
> Stuart


  


Christophe Dupriez, IT specialist
DESTIN inc. SSEB (Information Processing Systems Development)
rue des Palais 44, boîte 1, B-1030 Bruxelles, Belgique
E-mail: christophe.dupr...@destin.be
Tel: +32/2/216.66.15, Fax: +32/2/242.97.25, Mobile: +32/475.77.62.11
http://www.destin.be



Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Simon Brown

On 27 Jan 2009, at 10:27, Stuart Lewis wrote:

> Hi Ilias,
>
>> I am using the dspace import tool for batch ingesting in a Postgres
>> database and I am facing extremely slow feedback in each record commitment.
>> Initially, the speed was normal but when the items tend to be around 30
>> thousand, the speed of each commitment is unacceptable.
>> Is there any known problem with the maximum size of dspace database
>> using postgres or in the import tool?
>>
>> Any comments will be helpful.
>
> The following paper talks about this, and how DSpace performs when ingesting
> 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
> Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
> USA
>
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller
> chunks?


Can I enquire as to where within the standard DSpace toolset the  
SIPIngestManager, as used in the tests in this paper, may be found? I  
haven't been able to locate it.



Re: [Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Stuart Lewis
Hi Ilias,

>  I am using the dspace import tool for batch ingesting in a Postgres
> database and I am facing extremely slow feedback in each record commitment.
> Initially, the speed was normal but when the items tend to be around 30
> thousand, the speed of each commitment is unacceptable.
> Is there any known problem with the maximum size of dspace database
> using postgres or in the import tool?
> 
> Any comments will be helpful.

The following paper talks about this, and how DSpace performs when ingesting
1 million items:

Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James
Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland,
USA

http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf

Is this one big import of 30,000 items, or do you break them up into smaller
chunks?

Thanks,


Stuart


[Dspace-tech] Speed problem in postgres during batch ingesting

2009-01-27 Thread Hlias Stavrakis

Hi,
I am using the dspace import tool for batch ingesting in a Postgres 
database and I am facing extremely slow feedback in each record commitment.
Initially, the speed was normal but when the items tend to be around 30 
thousand, the speed of each commitment is unacceptable.
Is there any known problem with the maximum size of dspace database 
using postgres or in the import tool?


Any comments will be helpful.

Thanks,
Ilias Stavrakis

