Hi Sid,
 
    this is very useful information that should be shared with the whole DSpace 
community. Also, I don't have very good knowledge of Manakin, so maybe other 
people can help on that side.
 
If you are actually experiencing database performance issues, that could explain 
why your batch importing is slower in PROD: quite a few queries are executed 
against the database during an import, even for bitstreams. You could also have 
a performance issue with the disk I/O, which is handled by your RAID setup 
(depending on which kind of RAID you have).
Also, I'd need a DSpace expert to confirm, but if I'm not wrong, the number of 
subdirectories in the default assetstore can reach up to 1,000,000 (100 x 100 x 
100). That could be a problem if, for each bitstream, DSpace has to work out 
where it must place it (though I'm not sure this is really an issue).
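
To illustrate what I mean (this is only my rough understanding of the 
traditional assetstore layout; the class and method names below are made up 
for the example, not the actual DSpace code):

// Rough sketch of how the traditional DSpace assetstore derives a file path
// from a bitstream's internal ID: two digits per directory level, three
// levels deep, hence at most 100 x 100 x 100 = 1,000,000 leaf directories.
public class AssetstorePathSketch {

    static String relativePath(String internalId) {
        // e.g. "12/34/56/1234..." under the assetstore root directory
        return internalId.substring(0, 2) + "/"
                + internalId.substring(2, 4) + "/"
                + internalId.substring(4, 6) + "/"
                + internalId;
    }

    public static void main(String[] args) {
        // A made-up internal ID, similar in shape to the long numeric IDs DSpace generates.
        String internalId = "12345678901234567890123456789012345678";
        System.out.println(relativePath(internalId));  // prints 12/34/56/1234...5678
    }
}

If that layout is right, working out where a bitstream goes is cheap; it's 
more the sheer number of directories and files that could stress the 
filesystem.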
 
For the front end, it could be a Tomcat problem: if items containing a lot of 
bitstreams are hit frequently, this can produce very large sessions and let 
Tomcat eat up almost all of your server's memory. This would be due to the 
DSpace Context object that is kept in the user's session. Again, that is 
hypothetical, and I'd need a DSpace expert to confirm whether DSpace sessions 
can really use that much memory. Note that if your problems are fixed by 
restarting Tomcat, there's a good chance that part of the solution will be to 
give Tomcat more memory, and possibly the server as well.
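
If it does come down to memory, the usual knob is the heap size passed to the 
JVM running Tomcat, for example something like the following (values depend on 
your server's RAM; put it in setenv.sh or wherever your startup script sets 
the Java options):

JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx1024m"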
 
Also, if you are producing a thumbnail for each bitstream and extracting the 
documents' contents, you could be up to more than 1,500,000 bitstreams in 
total... But again, your problem could be more related to RAID and I/O access 
and database performance.
 
Is your PostgreSQL installed on another server or on the same machine?
 
By the way, if some pages are still served when the application stops 
responding, that's probably because of your browser's cache.
 
Hope it helps.
 
 

________________________________

From: Sid Byrd [mailto:[EMAIL PROTECTED]]
Date: Wed. 2008-02-06 16:33
To: Tellier, Stephane
Cc: Geneva Henry
Subject: Re: [Dspace-tech] RE : DSpace stability with large number of bitstreams



Hi.

I'm not sure if the number of bitstreams is causing our stability 
problems. Those started suddenly and seem to be database-related; I'm 
still trying to figure that one out. Bitstreams might be what causes 
batch imports to be so slow, though. On our test server, which has 
identical hardware but much less content, batch imports are quite 
fast, but on the production server they take about a second per file 
to import -- this can be many hours in total for a large batch.

The highest number of bitstreams for one item is about 4800. We have 
several dozen items with over 1000 bitstreams. These large items are 
all books consisting of one or two main TEI-encoded XML files plus 
full-size and thumbnail images for each page. Most are in a collection 
that gets about 40,000 item views per month. We have actually bypassed 
two big scalability problems serving these items already. First, users 
now rarely view most of the bitstreams in the books directly. 
Instead, they usually view a cached HTML derivative (stored in DSpace) 
that draws its thumbnails from duplicate copies of the thumbnail files 
that Apache serves directly. Even though DSpace also archives the 
thumbnails, I couldn't figure out how to have DSpace be responsible 
for serving them without it choking and dying on the hundreds of 
simultaneous bitstream requests sent by the user's browser. The other 
DSpace scalability problem we had to bypass for these large items had 
to do with item list views (like search results) in Manakin: the 
intermediate DRI XML included full METS for each item and it was 
choking Cocoon. The developers at A&M changed Manakin to only 
reference the METS at a separate lazily-loaded internal URL so the DRI 
wouldn't bloat unnecessarily, and that problem went away. So we've 
already tackled the most obvious problems with serving these large 
items, although there could be other subtler issues. I haven't yet 
properly profiled what happens during a batch import that makes it so 
slow.


I'll answer the questions about general stability anyway: Everything 
so far is in the default assetstore on a local RAID. A df on the RAID 
looks like this:
Filesystem     Size  Used  Avail  Capacity      iused      ifree  %iused  Mounted on
/dev/disk2s3   1.8T  591G   1.2T       32%  154887214  333462470     32%  /Volumes/vol1
DSpace just stops responding, at least on the front page, although 
rather bizarrely, it sometimes still serves pages I have open in a 
browser. Restarting Tomcat usually fixes things.
Postgres has a cron job that runs vacuumdb every night.

-Sid


On Feb 6, 2008, at 11:27 AM, Tellier, Stephane wrote:

> Geneva,
>
>     since you don't have a lot of items, this cannot be related to 
> the browse index problems in version 1.4.2 and earlier. It's the 
> first time I've seen such an issue (related to the number of bitstreams) 
> in the time I've been following the DSpace community, so I'm very 
> curious about the cause.
> Can you tell us about the average number of bitstreams each item 
> possesses?
> What's the highest number of bitstreams for one item?
> Are the items with a lot of bitstreams frequently accessed 
> by users?
> In which cases do you experience sluggish responses? When you 
> access an item? When you're doing a search? When you're opening a 
> bitstream?
> Do you know, approximately, the average number of concurrent 
> users accessing your DSpace instance?
> What do you mean by "platform going down"? The machine or Tomcat 
> only? Do you restart only Tomcat to resolve that issue or do you 
> have to restart the server entirely?
> Are the bitstreams registered or not? Do they reside in the default 
> assetstore or on another server (SAN, SRB)? If they reside on the 
> same server, what's the capacity of its drive (space, and number of 
> inodes for directories in the case of a Unix/Linux system)?
>
> Thanks for your responses; this will help.
>
> From: [EMAIL PROTECTED] on behalf of Geneva 
> Henry
> Date: Wed. 2008-02-06 10:01
> To: Richard Rodgers
> Cc: dspace-tech@lists.sourceforge.net
> Subject: Re: [Dspace-tech] DSpace stability with large number of 
> bitstreams
>
> Sluggish response and the platform goes down rather frequently. We 
> have
> a test server and development server running, with the test server
> configured the same as the production server except for the content
> load. The test environment works just fine so that's why we're 
> thinking
> it's the number of bitstreams.
>
>                 regards,
>                 geneva
>
>  ^~^
> (o o)
> /`v`\ who?
> |'''|
> |\'/|
>  """
> Geneva Henry
> Executive Director, Digital Library Initiative
> Rice University
> Fondren Library -- MS-44
> P.O. Box 1892
> Houston, TX  77251-1892
> (overnight delivery address: 6100 Main St., Houston, TX 77005)
> voice:(713)348-2480 | [EMAIL PROTECTED]
>
>
>
> Richard Rodgers wrote:
> > Hi geneva:
> >
> > Can you characterize the stability problems a bit more? Errors or
> > sluggish responses, etc.?
> >
> > Thanks,
> >
> > Richard
> >
> > Quoting Geneva Henry <[EMAIL PROTECTED]>:
> >
> >
> >> On our production DSpace server we're managing around 12,243 
> items, but
> >> around 430,000 bitstreams. We've seen a lot of issues with the 
> server's
> >> stability and are wondering if it's related to this large number of
> >> bitstreams being managed. Has anyone else experienced stability 
> problems
> >> as the number of bitstreams has scaled up significantly?
> >>
> >> --
> >>              regards,
> >>              geneva
> >>
> >> ^~^
> >> (o o)
> >> /`v`\ who?
> >> |'''|
> >> |\'/|
> >> """
> >> Geneva Henry
> >> Executive Director, Digital Library Initiative
> >> Rice University
> >> Fondren Library -- MS-44
> >> P.O. Box 1892
> >> Houston, TX  77251-1892
> >> (overnight delivery address: 6100 Main St., Houston, TX 77005)
> >> voice:(713)348-2480 | [EMAIL PROTECTED]
> >>
> >>
> >
>



