I am experimenting with the Curation System. I have written a task to
crawl a collection/community and identify specific items that are exception
cases (restricted items, multiple bitstreams, non-standard bitstream type).
https://gist.github.com/terrywbrady/24f6ddf24d9026149aff
The process is working well for me, but I encounter memory/heap/garbage
collection exceptions when I attempt to process my largest collection.
That collection contains 150,000 items.
I have discovered that I need to crank up the memory and turn on
incremental garbage collection in order to get the process to complete.
export JAVA_OPTS="-Xmx3000m -Xincgc"
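To confirm that these options actually reach the JVM that runs the task, a small check along these lines may help. This is only a sketch: HeapCheck is a hypothetical class name, not part of DSpace, and it relies on nothing beyond java.lang.Runtime.

```java
// Hypothetical helper (not part of DSpace): reports the heap limits
// that the running JVM actually received.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently in use (total minus free), in megabytes. */
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap MB:  " + maxHeapMegabytes());
        System.out.println("used heap MB: " + usedHeapMegabytes());
    }
}
```

Logging usedHeapMegabytes() every few thousand items from inside the task would show whether memory grows steadily during the traversal or spikes at one point.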
Since I am simply processing items in a read-only fashion, I am surprised
that I have needed these settings in order to process my collection. Can
you recommend a more efficient way to traverse the collection and the items?
CONTEXT INITIALIZATION
I attempted to set the READ_ONLY option to prevent result caching.
Context context = new Context(Context.READ_ONLY);
ITEM TRAVERSAL
ItemIterator iter = ((Collection) dso).getAllItems();
try
{
    while (iter.hasNext())
    {
        performObject(iter.next());
    }
}
finally
{
    // Close the iterator even if processing an item throws.
    iter.close();
}
ITEM ACCESS CHECK
if (!AuthorizeManager.authorizeActionBoolean(context, item,
Constants.READ, false)) {
BUNDLE/BITSTREAM TRAVERSAL
boolean hasAnon = true;
for (Bundle bundle : item.getBundles("ORIGINAL")) {
    for (Bitstream bs : bundle.getBitstreams()) {
        count++;
        String type = bs.getFormat().getMIMEType();
        // Count non-standard MIME types and remember the most recent one.
        if (!isStandardMimeType(type)) {
            errtype = type;
            unsuppType++;
        }
        // hasAnon stays true only while every bitstream passes the READ check.
        hasAnon = hasAnon
            && AuthorizeManager.authorizeActionBoolean(context, bs, Constants.READ, false);
    }
}
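The snippet above calls isStandardMimeType(), which is not shown here. A minimal sketch of such a helper follows; the class name MimeTypeCheck and the set of "standard" types are assumptions for illustration only, not the list used in the gist.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in for the isStandardMimeType() helper referenced above.
// The entries in STANDARD_TYPES are assumed values, not the list from the gist.
final class MimeTypeCheck {

    private static final Set<String> STANDARD_TYPES = new HashSet<>(Arrays.asList(
            "application/pdf",
            "text/plain",
            "image/jpeg"));

    /** Returns true if the MIME type is in the assumed standard set. */
    static boolean isStandardMimeType(String type) {
        if (type == null) {
            return false; // a missing MIME type is treated as non-standard
        }
        // Compare case-insensitively and ignore surrounding whitespace.
        return STANDARD_TYPES.contains(type.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isStandardMimeType("application/pdf"));
        System.out.println(isStandardMimeType("application/x-unknown"));
    }
}
```

A set lookup keeps the per-bitstream check constant-time, so it adds no meaningful cost across 150,000 items.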
Does this sound like an issue that should be submitted as a bug report?
I am running DSpace 4.2.
Thanks, Terry
--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498