Richard,
Thank you for the suggestions. I have implemented the Context parameter as
you suggested.
I am running the process from the command line. Unfortunately, I do not
see the -l parameter having any effect.
Unless I set the memory allocation to -Xmx3000m, the process continues to
fail. Here is the stack trace from the failure.
[dspace@dspace-dev-1 ~]$ /opt/dspace/bin/dspace curate -l 100 -v -i 10822/559389 -t exception -r -
Adding task: exception
Starting curation
Curating id: 10822/559389
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:559)
    at java.util.HashMap.addEntry(HashMap.java:851)
    at java.util.HashMap.put(HashMap.java:484)
    at org.dspace.storage.rdbms.TableRow.<init>(TableRow.java:57)
    at org.dspace.storage.rdbms.DatabaseManager.process(DatabaseManager.java:1084)
    at org.dspace.storage.rdbms.TableRowIterator.next(TableRowIterator.java:151)
    at org.dspace.content.Bundle.<init>(Bundle.java:105)
    at org.dspace.content.Item.getBundles(Item.java:1190)
    at org.dspace.content.Item.getBundles(Item.java:1223)
    at org.dspace.ctask.georgetown.ExceptionReport.performItem(ExceptionReport.java:192)
    at org.dspace.curate.AbstractCurationTask.performObject(AbstractCurationTask.java:137)
    at org.dspace.ctask.georgetown.ExceptionReport.distribute(ExceptionReport.java:140)
    at org.dspace.ctask.georgetown.ExceptionReport.perform(ExceptionReport.java:101)
    at org.dspace.curate.ResolvedTask.perform(ResolvedTask.java:88)
    at org.dspace.curate.Curator$TaskRunner.run(Curator.java:563)
    at org.dspace.curate.Curator.curate(Curator.java:260)
    at org.dspace.curate.Curator.curate(Curator.java:207)
    at org.dspace.curate.CurationCli.main(CurationCli.java:242)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
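For reference, here is a minimal, self-contained sketch of the behaviour I was expecting from -l 100: a cache that is emptied every N objects, so its size stays bounded no matter how many items are traversed. None of this is DSpace code. ObjectCache and traverse are hypothetical stand-ins I made up for illustration, and I am only assuming that the real context cache is meant to work along these lines.

```java
import java.util.HashMap;
import java.util.Map;

public class BoundedCacheSketch {

    /** Hypothetical stand-in for a context's object cache; not a DSpace class. */
    static class ObjectCache {
        private final Map<Integer, String> cached = new HashMap<>();

        void put(int id, String value) { cached.put(id, value); }
        int size() { return cached.size(); }
        void clear() { cached.clear(); }
    }

    /**
     * Visits ids 1..total, caching one entry per id and clearing the cache
     * whenever it reaches `limit` entries. Returns the largest size observed.
     */
    static int traverse(int total, int limit) {
        ObjectCache cache = new ObjectCache();
        int peak = 0;
        for (int id = 1; id <= total; id++) {
            cache.put(id, "item-" + id);
            peak = Math.max(peak, cache.size());
            if (cache.size() >= limit) {
                cache.clear(); // assumed analogue of clearing the context cache
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // 150,000 matches the size of the collection that fails for me.
        System.out.println("peak with limit 100: " + traverse(150000, 100));
        System.out.println("peak with no limit:  " + traverse(150000, Integer.MAX_VALUE));
    }
}
```

With a limit, the peak stays at the limit; without one, it grows with the collection. In the trace above the item loop runs inside my task's own distribute() method, so I do not know whether the limit is ever consulted on that path.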
Please let me know if you have any additional suggestions.
Thanks, Terry
On Tue, Dec 16, 2014 at 1:11 PM, Richard Rodgers <[email protected]> wrote:
>
> Hi Terry:
>
> I’m away from the office with limited network, but based on a quick look
> at the code, I have a few suggestions.
>
> You don’t indicate how you run this task, but I’m assuming that for the
> large collections at least, you are not doing it in the admin UI but with
> the command-line tool.
> It has a switch ‘-l’ for limiting the number of objects in a context. I’d
> recommend using a value like -l 100 to -l 1000, which should ensure that
> the context is cleared before getting too large.
> (Check the doc for command-line usage).
>
> I would also remove all the Context management in the task code itself
> (READ-ONLY or otherwise) - I’m not sure it’s needed, and indeed it may be a
> source of memory problems.
> Wherever you need a context reference (e.g. authorizeActionBoolean(context,
> item, Constants.READ, false)), use the API call Curator.curationContext()
> instead.
> This will ensure that the same context object is reused each time. You
> should not have to allocate any contexts yourself.
>
> Please report if you still encounter problems, and I’ll look more
> thoroughly.
>
> Thanks,
>
> Richard
>
> On Dec 16, 2014, at 2:49 PM, Terry Brady <[email protected]>
> wrote:
>
> I am experimenting with the Curation System. I have written a task to
> crawl a collection/community and identify specific items that are exception
> cases (restricted items, multiple bitstreams, non-standard bitstream type).
>
> https://gist.github.com/terrywbrady/24f6ddf24d9026149aff
>
> The process is working well for me, but I encounter memory/heap/garbage
> collection exceptions when I attempt to process my largest collection.
> That collection contains 150,000 items.
>
> I have discovered that I need to crank up the memory and turn on
> incremental garbage collection in order to get the process to complete.
>
> export JAVA_OPTS="-Xmx3000m -Xincgc"
>
> Since I am simply processing items in a read-only fashion, I am surprised
> that I have needed these settings in order to process my collection. Can
> you recommend a more efficient way to traverse the collection and the items?
>
> CONTEXT INITIALIZATION
> I attempted to set the READ_ONLY option to prevent result caching.
>
> Context context = new Context(Context.READ_ONLY);
>
> ITEM TRAVERSAL
> ItemIterator iter = ((Collection)dso).getAllItems();
> try {
>     while (iter.hasNext()) {
>         performObject(iter.next());
>     }
> } finally {
>     // close the iterator even if a task throws
>     iter.close();
> }
>
> ITEM ACCESS CHECK
> if (!AuthorizeManager.authorizeActionBoolean(context, item,
> Constants.READ, false)) {
>
> BUNDLE/BITSTREAM TRAVERSAL
> boolean hasAnon = true;
> for (Bundle bundle : item.getBundles("ORIGINAL")) {
>     for (Bitstream bs : bundle.getBitstreams()) {
>         count++;
>         String type = bs.getFormat().getMIMEType();
>
>         if (!isStandardMimeType(type)) {
>             errtype = type;
>             unsuppType++;
>         }
>
>         hasAnon = hasAnon
>             && AuthorizeManager.authorizeActionBoolean(context, bs, Constants.READ, false);
>     }
> }
>
> Does this sound like an issue that should be submitted as a bug report?
>
> I am running DSpace 4.2.
>
> Thanks, Terry
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://www.library.georgetown.edu/lit/code
> 425-298-5498
>
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
>
>
--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498