Yeah, the speed is not that crucial as long as it stays somewhere in the order of 
minutes or even a few hours. What I'm doing is transferring items from CRIS, 
which doesn't know which items DSpace already has, so I have to prune the 
records that are already in DSpace. This happens once a day (at night) by cron, 
so I can live with that speed. It's probably just the little computer scientist 
in me that had hoped for the most efficient solution.
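
For illustration, the pruning step could be sketched roughly like this in Ruby. This is only a sketch under assumptions: the record shape (a Hash with an :id key), the helper name prune_existing and the sample identifiers are all made up here, and the real matching would depend on which metadata field links a CRIS record to a DSpace item.

```ruby
require 'set'

# Hypothetical helper: keep only the CRIS records whose identifier is not
# already known to DSpace. The :id key is an assumed record shape.
def prune_existing(cris_records, dspace_ids)
  known = dspace_ids.to_set # constant-time membership checks
  cris_records.reject { |rec| known.include?(rec[:id]) }
end

# Made-up sample data, for illustration only.
cris_records = [
  { id: 'cris-1', title: 'A' },
  { id: 'cris-2', title: 'B' },
  { id: 'cris-3', title: 'C' }
]
dspace_ids = ['cris-2']

new_records = prune_existing(cris_records, dspace_ids)
puts new_records.map { |r| r[:id] }.join(', ') # prints "cris-1, cris-3"
```

With the identifiers held in a Set, the cost of the pruning itself is negligible next to fetching the identifiers from DSpace, which is where the time actually goes.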

Thanks for the numbers and testing!

Ilja
________________________________________
From: Monika Mevenkamp <mome...@gmail.com>
Sent: Thursday, September 1, 2016 7:05:12 PM
To: Ilja Sidoroff
Cc: DSpace Tech
Subject: Re: [dspace-tech] Querying items by metadata item via SOLR and REST

Does speed matter? Is this something you’ll have to do a lot, or is it one 
of those one-off scripts?

If you run this on the command line or from cron, it may not be so important. 
Especially with a cron job you may not care that much, as long as you can 
start it at midnight and it gets done by 7am.

Calling the JRuby script from the UI, i.e. calling it from Java, is possible, 
but I have not actually done that yet.

I don’t believe that calling Java via JRuby adds much performance overhead.

A bigger issue, as I see it, is that DSpace.findByMetadataValue returns an array 
of matching DSpaceObjects. If speed matters, this needs to be changed to return 
an iterator, which shouldn’t be too hard.
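
To illustrate the array-versus-iterator difference, here is a small Ruby sketch. It does not use the real dspace-jruby API: FakeResultSet is a made-up stand-in for a database cursor, and the counter only exists to show how many rows get produced in each case.

```ruby
# Hypothetical stand-in for a result set; NOT the dspace-jruby API.
class FakeResultSet
  include Enumerable

  attr_reader :produced

  def initialize(total)
    @total = total
    @produced = 0 # number of rows handed out so far
  end

  # Yields one id at a time instead of building an array up front.
  def each
    return enum_for(:each) unless block_given?
    1.upto(@total) do |id|
      @produced += 1
      yield id
    end
  end
end

# Eager: every row is materialized into an array before use.
puts FakeResultSet.new(5).to_a.inspect # prints [1, 2, 3, 4, 5]

# Lazy: rows are pulled on demand, so the caller can stop early.
rs = FakeResultSet.new(1_000_000)
puts rs.lazy.select(&:even?).first(2).inspect # prints [2, 4]
puts rs.produced                              # prints 4
```

In the lazy case only four of the million rows are ever produced, which is the kind of saving an iterator-returning method would allow.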

Why not just try and see? Since the script only reads data and does not 
change anything, there is no danger of disturbing your instance. Plus you can 
run this anywhere, as long as you have access to the database.

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven



> On Sep 1, 2016, at 11:48 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>
> Thanks! That script would indeed do what I need, but I'm a bit concerned 
> about scalability, since it will have to do one request per item. If 
> I have thousands of items, that might get a bit heavy? Or would it? I really 
> don't know how long, for instance, 10,000 item/id/metadata requests 
> would take.
>
> Ilja
>
> ________________________________________
> From: Monika Mevenkamp <mome...@gmail.com>
> Sent: Thursday, September 1, 2016 6:30:33 PM
> To: Ilja Sidoroff
> Cc: DSpace Tech
> Subject: Re: [dspace-tech] Querying items by metadata item via SOLR and REST
>
> Hi Ilja
>
> I have a script that, given a metadata field, e.g. pu.workflow.state, produces 
> a tab-separated list like this:
>
> field   id      handle  value
> pu.workflow.state       969     99999/fk4w099v32        approved
> pu.workflow.state       903     null    emailed
> pu.workflow.state       753     null    emailed
> pu.workflow.state       752     null    emailed
> pu.workflow.state       902     null    orphaned
>
>
> The script is written in JRuby and based on my dspace-jruby gem; see the 
> script here: <https://github.com/akinom/dspace-cli/blob/master/metadata/list_values.rb>.
> Both the gem and the script are available from GitHub: the jrdspace gem 
> <https://github.com/akinom/dspace-jruby> and dspace-cli 
> <https://github.com/akinom/dspace-cli>, which has a bunch of other scripts.
>
> The script is quite small; its ‘action’ is in the doit method:
>
> def doit(metadata_field)
>   puts ['field', 'id', 'handle', 'value'].join("\t")
>   dsos = DSpace.findByMetadataValue(metadata_field, nil, DConstants::ITEM)
>   dsos.each do |dso|
>     vals = dso.getMetadataByMetadataString(metadata_field).collect { |v| v.value }
>     puts [metadata_field, dso.getID, dso.getHandle.nil? ? "null" : dso.getHandle, vals].join("\t")
>   end
> end
>
> If you want to try this out, there are instructions on GitHub. If you want 
> to work in Java, look at the implementation of the DSpace.findByMetadataValue 
> method, which contains the SQL statement; see 
> <https://github.com/akinom/dspace-jruby/blob/master/lib/dspace/dspace.rb#L150-L171>.
>
> Monika
>
> On Sep 1, 2016, at 6:43 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>
> Hello,
>
> I am using DSpace 5.5.
>
> Am I correct that SOLR queries return only items that are in
> *collections* and not in the *workflow*? At least my search attempts
> indicate that.
>
> In the REST API, it likewise seems that GET /items returns only
> results that are in the collections. With POST
> /items/find-by-metadata-field, however, I can get all items in DSpace, both
> those in the collections and those in the workflow?
>
> What I need is a list of *all items* (both in the workflow and in the
> collections) that have a certain metadata field set, together with *the
> value of that field*. I don't see any other way of doing that except by a
> direct SQL query to the database. I have one for 5.x, but I'm not happy with
> it, since I would need to update it for 6.x etc. Is there any other way of
> doing this?
>
> Also, it seems that
>
> dspace import -d -m mapfile ...
>
> does not delete items currently in the workflow? Is this intentional or a bug?
>
> regards,
>
> Ilja Sidoroff
> University of Eastern Finland
>
> --
> You received this message because you are subscribed to the Google Groups 
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to dspace-tech+unsubscr...@googlegroups.com.
> To post to this group, send email to dspace-tech@googlegroups.com.
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.
>
