I am experimenting with the Curation System.  I have written a task to
crawl a collection/community and identify specific items that are exception
cases (restricted items, multiple bitstreams, non-standard bitstream type).

https://gist.github.com/terrywbrady/24f6ddf24d9026149aff

The process is working well for me, but I encounter memory/heap/garbage
collection exceptions when I attempt to process my largest collection.
That collection contains 150,000 items.

I have discovered that I need to crank up the memory and turn on
incremental garbage collection in order to get the process to complete.

export JAVA_OPTS="-Xmx3000m -Xincgc"
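
To confirm that these options actually reach the JVM running the task, a small check like the one below may help. It is only a sketch: `HeapCheck` is a hypothetical helper, not part of DSpace, and it assumes the launcher in use passes JAVA_OPTS through to the JVM. It uses only standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of DSpace: reports the heap limit and
// the arguments the running JVM was started with.
class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Arguments passed to the JVM at startup (for example -Xmx3000m). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

The reported maximum is usually somewhat below the -Xmx value, since the JVM excludes part of the heap from the figure.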

Since I am simply processing items in a read-only fashion, I am surprised
that I have needed these settings in order to process my collection.  Can
you recommend a more efficient way to traverse the collection and the items?

CONTEXT INITIALIZATION
I attempted to set the READ_ONLY option to prevent result caching.

Context context = new Context(Context.READ_ONLY);

ITEM TRAVERSAL
        ItemIterator iter = ((Collection) dso).getAllItems();
        try {
            while (iter.hasNext()) {
                performObject(iter.next());
            }
        } finally {
            // Always release the iterator's database resources
            iter.close();
        }

ITEM ACCESS CHECK
        if (!AuthorizeManager.authorizeActionBoolean(context, item,
                Constants.READ, false)) {

BUNDLE/BITSTREAM TRAVERSAL
        boolean hasAnon = true;
        for (Bundle bundle : item.getBundles("ORIGINAL")) {
            for (Bitstream bs : bundle.getBitstreams()) {
                count++;
                String type = bs.getFormat().getMIMEType();

                if (!isStandardMimeType(type)) {
                    // Remember the most recent non-standard type for reporting
                    errtype = type;
                    unsuppType++;
                }

                hasAnon = hasAnon
                        && AuthorizeManager.authorizeActionBoolean(context, bs, Constants.READ, false);
            }
        }
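
The checks above can be tried outside DSpace with a self-contained sketch like the following. Everything in it is a stand-in: `FakeBitstream`, the `STANDARD_TYPES` set, and the label strings are assumptions made for illustration, and the boolean flags replace the real `AuthorizeManager` calls. It is not the code from the gist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in types for illustration only; these are NOT the DSpace classes.
class ExceptionCaseSketch {
    /** Simplified bitstream: a MIME type plus whether anonymous READ is allowed. */
    static final class FakeBitstream {
        final String mimeType;
        final boolean anonymousRead;

        FakeBitstream(String mimeType, boolean anonymousRead) {
            this.mimeType = mimeType;
            this.anonymousRead = anonymousRead;
        }
    }

    // Assumed set of "standard" types; the real list lives in the task itself.
    static final Set<String> STANDARD_TYPES =
            new HashSet<String>(Arrays.asList("application/pdf"));

    static boolean isStandardMimeType(String type) {
        return STANDARD_TYPES.contains(type);
    }

    /** Returns the exception labels that apply to one item's ORIGINAL bitstreams. */
    static List<String> classify(boolean itemAnonymousRead, List<FakeBitstream> originals) {
        List<String> flags = new ArrayList<String>();
        if (!itemAnonymousRead) {
            flags.add("restricted item");
        }
        if (originals.size() > 1) {
            flags.add("multiple bitstreams");
        }
        boolean hasAnon = true;
        String errType = null;
        for (FakeBitstream bs : originals) {
            if (!isStandardMimeType(bs.mimeType)) {
                errType = bs.mimeType; // keeps the last non-standard type seen
            }
            hasAnon = hasAnon && bs.anonymousRead;
        }
        if (errType != null) {
            flags.add("non-standard type: " + errType);
        }
        if (!hasAnon) {
            flags.add("restricted bitstream");
        }
        return flags;
    }

    public static void main(String[] args) {
        List<FakeBitstream> originals = Arrays.asList(
                new FakeBitstream("application/pdf", true),
                new FakeBitstream("application/zip", false));
        System.out.println(classify(true, originals));
    }
}
```

Because `classify` keeps only a few booleans and one string per item, nothing in the check itself should need to hold on to items or bitstreams after they have been examined.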

Does this sound like an issue that should be submitted as a bug report?

I am running DSpace 4.2.

Thanks, Terry

-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://www.library.georgetown.edu/lit/code
425-298-5498
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
