Hi Terry: I’m away from the office with limited network access, but based on a quick look at the code, I have a few suggestions.
You don’t indicate how you run this task, but I’m assuming that, for the large collections at least, you are not doing it in the admin UI but with the command-line tool. It has a switch ‘-l’ for limiting the number of objects in a context. I’d recommend using a value between -l 100 and -l 1000, which should ensure that the context is cleared before it gets too large. (Check the documentation for command-line usage.)

I would also remove all the Context management in the task code itself (READ-ONLY or otherwise). I’m not sure it’s needed, and indeed it may be a source of memory problems. Wherever you need a context reference (e.g. authorizeActionBoolean(context, item, Constants.READ, false)), use the API call Curator.curationContext() instead. This will ensure that the same context object is reused each time. You should not have to allocate any contexts yourself.

Please report if you still encounter problems, and I’ll look more thoroughly.

Thanks,
Richard

On Dec 16, 2014, at 2:49 PM, Terry Brady <terry.br...@georgetown.edu> wrote:

> I am experimenting with the Curation System. I have written a task to crawl
> a collection/community and identify specific items that are exception cases
> (restricted items, multiple bitstreams, non-standard bitstream type).
>
> https://gist.github.com/terrywbrady/24f6ddf24d9026149aff
>
> The process is working well for me, but I encounter memory/heap/garbage
> collection exceptions when I attempt to process my largest collection. That
> collection contains 150,000 items.
>
> I have discovered that I need to crank up the memory and turn on incremental
> garbage collection in order to get the process to complete.
>
> export JAVA_OPTS="-Xmx3000m -Xincgc"
>
> Since I am simply processing items in a read-only fashion, I am surprised
> that I have needed these settings in order to process my collection. Can you
> recommend a more efficient way to traverse the collection and the items?
>
> CONTEXT INITIALIZATION
> I attempted to set the READ_ONLY option to prevent result caching.
>
> Context context = new Context(Context.READ_ONLY);
>
> ITEM TRAVERSAL
> ItemIterator iter = ((Collection)dso).getAllItems();
> while (iter.hasNext())
> {
>     performObject(iter.next());
> }
> iter.close();
>
> ITEM ACCESS CHECK
> if (!AuthorizeManager.authorizeActionBoolean(context, item,
>         Constants.READ, false)) {
>
> BUNDLE/BITSTREAM TRAVERSAL
> boolean hasAnon = true;
> for (Bundle bundle : item.getBundles("ORIGINAL")) {
>     for (Bitstream bs : bundle.getBitstreams()) {
>         count++;
>         String type = bs.getFormat().getMIMEType();
>
>         if (isStandardMimeType(type)) {
>         } else {
>             errtype = type;
>             unsuppType++;
>         }
>
>         hasAnon = hasAnon &&
>             AuthorizeManager.authorizeActionBoolean(context, bs, Constants.READ, false);
>     }
> }
>
> Does this sound like an issue that should be submitted as a bug report?
>
> I am running DSpace 4.2.
>
> Thanks, Terry
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://www.library.georgetown.edu/lit/code
> 425-298-5498
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
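
To picture why a limit on the number of objects in a context keeps memory flat, here is a toy model. It is not DSpace code and does not call any DSpace API. It assumes that the context caches every object it loads and that the cache is emptied once it reaches the limit; the names CacheLimitSketch and peakCacheSize are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT DSpace code. It mimics a cache that is emptied
// whenever it reaches a configured limit (assumed behaviour of a cache limit).
class CacheLimitSketch {

    /**
     * Visits itemCount items, caching each one, and clears the cache
     * whenever it holds cacheLimit entries. Returns the largest cache
     * size seen during the run.
     */
    static int peakCacheSize(int itemCount, int cacheLimit) {
        List<Integer> cache = new ArrayList<>();
        int peak = 0;
        for (int id = 0; id < itemCount; id++) {
            cache.add(id);                        // loading an item caches it
            peak = Math.max(peak, cache.size());
            if (cache.size() >= cacheLimit) {
                cache.clear();                    // limit reached: drop cached objects
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int items = 150_000;                      // size of the large collection
        System.out.println("limit 500: peak = " + peakCacheSize(items, 500));
        System.out.println("no limit:  peak = " + peakCacheSize(items, Integer.MAX_VALUE));
    }
}
```

With 150,000 items the peak is 500 entries when the limit is 500, and 150,000 entries when the cache is never cleared. How closely the real context cache follows this model is an assumption; the point is only that a periodic clear bounds the number of live objects regardless of collection size.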