[Dspace-tech] Long Delay in doing Submit - Part 2

2008-02-15 Thread George Kozak
Hi...
Back in January, I wrote to the list about a problem we are having 
(at Cornell) with DSpace after we migrated from v. 1.3.2 to v. 1.4.2.

Basically, if someone logs into DSpace by clicking on My DSpace and 
then clicks on the Start a New Submission button, there can be a 
delay of anywhere from 30 seconds to several minutes before the 
Submit: Choose Collection screen appears.  Clicking on the Start a 
New Submission button in My DSpace can run the CPU usage up over 
40%.  This doesn't happen if one goes directly to the Collection to 
which he or she is authorized to submit and clicks on the Submit to 
this Collection button there.

Randall Floyd of Indiana University thought that this might be a 
cleanup program in PostgreSQL, but I run vacuumdb nightly and do a 
re-index daily.

Claudia Jurgen of TU Dortmund suggested that this may be the result 
of a complicated Community/SubCommunity/Collection structure (we have 
118 Communities/Sub-Communities and 359 Collections).

I have been looking at alternatives.  I have put a message on the My 
DSpace main page telling people to go to the collection they want to 
submit to and click on the Submit to this Collection button there, 
but (as you can imagine) most people aren't doing that, and I am 
beginning to receive complaints from our users.

Is anyone else having this problem?

Does anyone have any ideas as to what I can do to make this a little 
bit better?   Thank you!

(PS.  I am running a SunFire-480-R with 4GB of memory and 8GB of swap 
space.  I am running PostgreSQL 7.2.3 and Tomcat 4.0.6.)

***
George Kozak
Digital Library Specialist
Library Systems
501 Olin Library
Cornell University
607-255-8924
***
[EMAIL PROTECTED]  


___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Long Delay in doing Submit - Part 2

2008-02-15 Thread Desmond Elliott
I hypothesise that retrieving the ResourcePolicy for each of the 359 
Collections and then determining if a user is a member of the Group 
associated with the Collection is why there is a long delay in 
submissions from the MyDSpace interface.

George could insert log.info() calls into the SubmitServlet.java file 
at line 306 and after line 309, then re-compile and re-deploy. These 
calls would log the start and end times of the operation, both for 
users who are administrators and for users who are not, to help 
determine whether this is the reason for the long delays. If the logs 
show a clear difference in elapsed time, the log calls can be moved 
further down the call tree to determine exactly where most of the 
computation occurs.
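
The instrumentation described above could look roughly like the sketch 
below. It is not DSpace code: SubmitTimingSketch and timed() are 
hypothetical names, java.util.logging stands in for whatever logger 
SubmitServlet.java actually uses, and the lambda syntax assumes a much 
newer Java than the one this thread was written against. The array in 
main() is only a placeholder for the real collection lookup.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of DSpace.
final class SubmitTimingSketch {
    private static final Logger LOG = Logger.getLogger("SubmitTimingSketch");

    /** Runs the operation, logs its elapsed wall-clock time, and returns its result. */
    static <T> T timed(String label, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            LOG.info(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Placeholder for the call being measured (assumption: 359 collections).
        int[] collections = timed("collection lookup (simulated)", () -> new int[359]);
        System.out.println("collections returned: " + collections.length);
    }
}
```

Logging the same label for an administrator and for an ordinary user 
would give two directly comparable numbers. The wrapper can then be 
moved onto calls deeper in the call tree.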

The SubmitServlet adds a Collection[] to the select-collection step of 
the submission process [-1] line 308.

This Collection[] is created through a call to the 
Collection.findAuthorized(context) static method [0] line 1150.

The findAuthorized() method calls the findAll() method, [0] line 273, 
which returns a list of all Collections in the collection database 
table. This should not be an expensive operation, since it only queries 
the database and retrieves 359 rows.

A line contained inside the findAll() method block, line 285, determines 
if the Collection object to be added to the Collection[] is cached in 
memory. This should not be an expensive operation since the fromCache() 
method determines if the Collection object is cached in a HashMap - 
which offers O(1) access, [1] line 289.
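
To make the cache argument concrete, here is a minimal sketch of a 
HashMap-backed object cache. The class name, the key format, and the 
put() method are assumptions made for illustration. Only the idea of a 
fromCache() lookup in a HashMap comes from the analysis above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a per-request object cache; not the DSpace Context class.
final class ObjectCacheSketch {
    private final Map<String, Object> cache = new HashMap<>();

    // Hypothetical key scheme: type name plus numeric id.
    private static String key(Class<?> type, int id) {
        return type.getName() + ":" + id;
    }

    void put(Class<?> type, int id, Object value) {
        cache.put(key(type, id), value);
    }

    /** Returns the cached object, or null on a miss. Expected constant time per lookup. */
    Object fromCache(Class<?> type, int id) {
        return cache.get(key(type, id));
    }

    public static void main(String[] args) {
        ObjectCacheSketch cache = new ObjectCacheSketch();
        cache.put(String.class, 42, "collection-42");
        System.out.println(cache.fromCache(String.class, 42)); // hit
        System.out.println(cache.fromCache(String.class, 43)); // miss, prints null
    }
}
```

With 359 collections, even 359 such lookups are cheap, which is why 
this step is unlikely to explain a delay measured in minutes.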

The returned Collection[] is then examined, one Collection object at a 
time, to determine if the current user has authorisation to write to 
that Collection. The AuthorizeManager class method 
authorizeActionBoolean() [2] line 217 calls authorizeAction() [2] line 
131, which in turn calls authorize() [2] line 258.

The authorize() method in AuthorizeManager retrieves a ResourcePolicy 
and determines if the user is authorized as per the ResourcePolicy to 
have access to the collection.

Users who are admins do not have to step through the ResourcePolicy 
stage. George states that he suffers a delay when trying to submit from 
the MyDSpace page, but that it is shorter than the delay others are 
experiencing: 30 to 60 seconds versus 60 to 300 seconds. This is the 
only stage in the process where I can find a difference between Admins 
and Users.

Having to retrieve the ResourcePolicy for each of the 359 collection 
objects and determine if a user is part of that ResourcePolicy could be 
the choke point.
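
The suspected cost pattern can be shown with a toy model. None of the 
names below are DSpace classes. The in-memory map stands in for the 
ResourcePolicy and Group membership checks, and the counter stands in 
for the database work each check would trigger. The figure of 359 
collections and the admin shortcut are taken from this thread. 
Everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of "check every collection for every request"; not DSpace code.
final class AuthorizationCostSketch {
    /** Number of simulated policy lookups performed so far. */
    static int policyLookups = 0;

    /** Collection id -> ids of users allowed to submit (stand-in for policy plus group). */
    static final Map<Integer, Set<Integer>> POLICIES = new HashMap<>();

    static boolean canSubmit(int userId, boolean isAdmin, int collectionId) {
        if (isAdmin) {
            return true; // admins skip the policy stage, as described above
        }
        policyLookups++; // in a real system this would cost at least one query
        Set<Integer> allowed = POLICIES.get(collectionId);
        return allowed != null && allowed.contains(userId);
    }

    /** Checks every collection in turn, mirroring the loop described in the analysis. */
    static List<Integer> findAuthorized(int userId, boolean isAdmin, int collectionCount) {
        List<Integer> result = new ArrayList<>();
        for (int id = 0; id < collectionCount; id++) {
            if (canSubmit(userId, isAdmin, id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        POLICIES.put(7, Collections.singleton(1)); // user 1 may submit to collection 7 only

        List<Integer> forUser = findAuthorized(1, false, 359);
        System.out.println("user: " + forUser.size() + " collection(s), "
                + policyLookups + " policy lookups");

        policyLookups = 0;
        List<Integer> forAdmin = findAuthorized(2, true, 359);
        System.out.println("admin: " + forAdmin.size() + " collection(s), "
                + policyLookups + " policy lookups");
    }
}
```

In this model the ordinary user pays for one lookup per collection even 
though only one collection qualifies, while the admin pays for none. 
That asymmetry matches the shorter delay George reports for his own 
account, but it does not explain why an admin would see any delay at 
all.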

I appreciate any thoughts on my analysis of what could be causing the 
problem that George is experiencing.

[-1] 
http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/app/webui/servlet/SubmitServlet.java?view=markup
[0] 
http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/content/Collection.java?view=markup
[1] 
http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/core/Context.java?view=markup
[2] 
http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/authorize/AuthorizeManager.java?view=markup

-- 
Desmond Elliott   |  Hewlett-Packard Limited registered Office:
Research Associate|  Cain Road,
HP Labs   |  Bracknell,
Bristol, UK   |  Berks
+44 117 312 8526  |  RG12 1HN.
[EMAIL PROTECTED]|  Registered No: 690597 England



George Kozak wrote:

 Hi...
 Back in January, I wrote to the list about a problem we are having
 (at Cornell) with DSpace after we migrated from v. 1.3.2 to v. 1.4.2.

 Basically, if someone logs into DSpace by clicking on My DSpace and
 then clicks on the Start a New Submission button, there can be a
 delay of anywhere from 30 seconds to several minutes before the
 Submit: Choose Collection screen appears.  Clicking on the Start a
 New Submission button in My DSpace can run the CPU usage up over
 40%.  This doesn't happen if one goes directly to the Collection to
 which he or she is authorized to submit and clicks on the Submit to
 this Collection button there.

 Randall Floyd of Indiana University thought that this might be a
 cleanup program in PostgreSQL, but I run vacuumdb nightly and do a
 re-index daily.

 Claudia Jurgen of TU Dortmund suggested that this may be the result
 of a complicated Community/SubCommunity/Collection structure (we have
 118 Communities/Sub-Communities and 359 Collections).

 I have been looking at alternatives.  I have put a message on the My
 DSpace main page telling people to