I developed a set of four related command line apps for DSpace:
    1) a lister / report generator
    2) a policy tool that adds / removes policies
    3) a metadata tool that adds / removes specific metadata values
    4) a bitstream replacer tool

All except the fourth work on a set of DSpace objects specified by two command
line arguments:
    --root  ROOT      - where ROOT is a handle or an object type followed by an ID
    --type  TYPE      - where TYPE is one of collection, item, bundle, or bitstream

--root COLLECTION.10  --type BITSTREAM
    means work on all bitstreams in the collection with ID 10
--root handle/12345  --type ITEM
    means work on all items contained in the object designated by the handle
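
To make the two ROOT forms concrete, here is a minimal Python sketch that classifies a --root value. It is not the tools' own parsing code: the function name parse_root and the returned tuples are hypothetical, and it assumes that a TYPE.ID value always has a numeric ID and that any value containing a slash is a handle.

```python
import re


def parse_root(root):
    """Classify a --root value as either a TYPE.ID pair or a handle."""
    # Form 1: an object type followed by a numeric ID, e.g. COLLECTION.10
    match = re.fullmatch(r"([A-Za-z]+)\.(\d+)", root)
    if match:
        return ("id", match.group(1).upper(), int(match.group(2)))
    # Form 2 (assumption): anything containing a slash is treated as a handle
    if "/" in root:
        return ("handle", root)
    raise ValueError(f"unrecognized ROOT value: {root!r}")
```

With these assumptions, parse_root("COLLECTION.10") yields a type/ID pair, while parse_root("handle/12345") is passed through as a handle.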

There is an additional argument, --doWorkFlowItems, that restricts sets to 
items in workflows and by extension to bundles or bitstreams in items in 
workflows.


The lister generates tsv or txt formatted output, printing properties of the
selected set of DSpace objects. Its --include option determines which
properties are printed. You can choose to print IDs and handles, as well as
policy information, or specify select item metadata fields. You can include an
item's 'withdrawn' status or a bundle's embargo state. Bitstream reports may
print mimeType, checksum, ...  When printing DSpace objects, you can also choose
to print properties of enclosing DSpace objects. For example, when printing
bitstreams in a collection, you can include bundle names, item handles, even
item metadata values by using options like these:
    --include 'object,name,mimeType,BUNDLE.name,ITEM.handle,ITEM.dc.contributor.author'
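
The structure of such an --include value can be illustrated with a short Python sketch that groups the requested fields by the object they refer to. This is only an illustration, not how the lister itself is implemented: the function name group_include_fields and the "SELF" key are hypothetical, and the sketch assumes that BUNDLE and ITEM are the only enclosing-object prefixes.

```python
def group_include_fields(include):
    """Group the fields of an --include value by the object they refer to."""
    enclosing = ("BUNDLE", "ITEM")  # assumed prefixes, taken from the example
    groups = {}
    for field in include.split(","):
        field = field.strip()
        if not field:
            continue
        prefix, sep, rest = field.partition(".")
        if sep and prefix in enclosing:
            # Property of an enclosing object, e.g. BUNDLE.name
            groups.setdefault(prefix, []).append(rest)
        else:
            # Property of the listed object itself, e.g. mimeType
            groups.setdefault("SELF", []).append(field)
    return groups
```

For 'object,name,mimeType,BUNDLE.name,ITEM.handle' this separates the three bitstream properties from the bundle name and the item handle.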

The lister works nicely with the other commands, since all four commands use
the same mechanism to select the objects they work on. For example, you might
use the lister to review which DSpace objects need policy or metadata changes.
After applying changes, it comes in handy for checking that the changes made
are in fact the ones that were intended.


The policy tool decides which action to apply to each DSpaceObject selected by 
the --root and --type parameters based on three options:
    --action   [ADD | DEL ]      - whether to add or delete policies
    --dspace_action  [READ | WRITE | REMOVE | ... ]
    --who [group  | eperson]

For example
    dspace bulk-pols -r handle/712657 -t BITSTREAM --action ADD --dspace_action WRITE --who EPERSON.monikam
        gives the eperson monikam WRITE privileges on all bitstreams
        contained in the object behind the given handle, which may be a
        community, collection, or item.

    dspace bulk-pols -r handle/712657 -t BITSTREAM -a DEL -d READ -w GROUP.Anonymous
        removes the READ permission from the Anonymous group
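
If you want to script such calls, a small Python helper can assemble the argument list. This is a sketch under assumptions: the helper name bulk_pols_command is hypothetical, and it relies on the short options -r, -t, -a, -d and -w behaving as in the examples above. It only builds the command; it does not run it.

```python
def bulk_pols_command(root, obj_type, action, dspace_action, who):
    """Build the argument list for one 'dspace bulk-pols' invocation."""
    if action not in ("ADD", "DEL"):
        raise ValueError("action must be ADD or DEL")
    return [
        "dspace", "bulk-pols",
        "-r", root,           # handle or TYPE.ID
        "-t", obj_type,       # collection, item, bundle, or bitstream
        "-a", action,         # ADD or DEL
        "-d", dspace_action,  # READ, WRITE, REMOVE, ...
        "-w", who,            # GROUP.name or EPERSON.name
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where the dspace launcher is on the PATH.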

The metadata tool works similarly to the policy tool. Of course, it only makes
sense to apply it to sets of items.

The bitstream replacer works on single bitstreams. It is related to the other
tools in that it selects the bitstream to work on in the same fashion, that is,
with --root and --type arguments.

I developed these commands in connection with a project here at Princeton,
where I needed to add a cover page to all bitstreams in original bundles in a
community. The lister gave me the list of bitstreams. Printing the list in txt
format allowed me to grep for name=ORIGINAL. I included the mimeType in the
listing so I would only work on pdf documents. Including the internalId
allowed me to use the file right from the assetstore and stick it into my 'add
the cover page' script. I replaced the old bitstream using the IDs printed
earlier to define the --root parameter to the bitstream replacer. Finally, I
used the lister to check on the access policies of the bitstreams. Right now I
run the lister command in a cronjob to watch the submission progress in one of
our communities.
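
The filtering step of that workflow could also be done on the tsv output instead of grepping the txt output. The Python sketch below is only an illustration: it assumes the tsv output starts with a header row whose column names match the --include fields, which the description above does not confirm, and the sample rows are invented for the example.

```python
import csv
import io


def select_original_pdfs(tsv_text):
    """Return the rows for PDF bitstreams that sit in an ORIGINAL bundle."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if row.get("BUNDLE.name") == "ORIGINAL"
        and row.get("mimeType") == "application/pdf"
    ]


# Invented sample listing; real lister output may be laid out differently.
SAMPLE = (
    "object\tmimeType\tinternalId\tBUNDLE.name\n"
    "BITSTREAM.1\tapplication/pdf\t1111\tORIGINAL\n"
    "BITSTREAM.2\ttext/plain\t2222\tORIGINAL\n"
    "BITSTREAM.3\tapplication/pdf\t3333\tLICENSE\n"
)

for row in select_original_pdfs(SAMPLE):
    print(row["object"], row["internalId"])
```

Each selected row carries the internalId needed to find the file in the assetstore and the object ID needed for the --root parameter of the bitstream replacer.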

I wrote more detailed documentation, which is part of the pull request that I
created for this code. Here at Princeton we are still running 1.8. The bulk-do
code mostly lives in its own package and should play well with version 3 (I
have not tried it). The PR is based on master. In other words, unless you
run a version older than 1.8, merging this into your version should be
relatively painless, and, it goes without saying, I'd help sort out conflicts.

The PR is here:
https://github.com/DSpace/DSpace/pull/560
and the documentation is here:
https://github.com/akinom/DSpace/blob/prq_bulk_commands/dspace-api/src/main/java/org/dspace/app/bulkdo/README.md

I believe this code would be useful for many DSpace administrators. It would
be straightforward to add a JSON/XML output format to offer this functionality
in the REST API. So please have a look, send feedback, and possibly step up as
a volunteer tester / reviewer.


Monika

—
Monika Mevenkamp
phone: 609-258-4161
123 693 Alexander Street, Princeton University, Princeton, NJ 08544