Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-04 Thread Ilja Sidoroff
Ari and Terry,

Thanks for the help! From brief testing, the new API seems to do what I need. 
One question: if I leave the 'limit' and 'offset' parameters out of the query, 
will I get all the items, or is there an upper limit on the number of results 
returned by one query?

Ilja
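
Whether the server applies a default cap when 'limit' and 'offset' are omitted is not settled in this thread. One way to avoid depending on the answer is to page explicitly. The following is a minimal sketch, assuming a caller-supplied function (the hypothetical `fetch_page`) that performs one request and returns the items of that page as a list:

```python
from typing import Callable, Iterator, List

Page = List[dict]


def iter_items(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[dict]:
    """Yield every item by requesting successive pages until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # a short (or empty) page means nothing is left
            return
        offset += limit


if __name__ == "__main__":
    # Stand-in for a real HTTP call: 250 fake items served in slices.
    fake_items = [{"id": i} for i in range(250)]

    def fake_fetch(limit: int, offset: int) -> Page:
        return fake_items[offset:offset + limit]

    print(sum(1 for _ in iter_items(fake_fetch)))
```

The loop stops on the first page that is shorter than `limit`, so a result set whose size is an exact multiple of `limit` costs one extra, empty request.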


On Thu, Sep 1, 2016 at 3:43 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
Hello,

I am using DSpace 5.5.

Am I correct that SOLR queries return only items that are in
*collections* and not those in the *workflow*? At least my search
attempts indicate so.

In the REST API, it seems that GET /items likewise returns only items
that are in collections, whereas POST /items/find-by-metadata-field
returns all items in DSpace, both those in collections and those in
the workflow. Is that right?

What I need is a list of *all items* (both in the workflow and in
collections) that have a certain metadata field set, together with
*the value of that field*. I don't see any way of doing that other
than a direct SQL query against the database. I have one for 5.x, but
I'm not happy with it, since I will need to update it for 6.x and so
on. Is there any other way of doing this?

Also, it seems that

dspace import -d -m mapfile ...

does not delete items currently in the workflow? Is this intentional or a bug?

regards,

Ilja Sidoroff
University of Eastern Finland
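
For reference, the POST /items/find-by-metadata-field call mentioned above can be prepared as in the sketch below. It assumes the request body is a JSON object with "key", "value" and "language" members, which is my reading of the DSpace 5 REST documentation and is not confirmed in this thread; the host name is a placeholder and `find_by_metadata_request` is a hypothetical helper. Under that assumption the endpoint needs a concrete value, so on its own it would not list every item that merely has the field set.

```python
import json
from urllib.request import Request


def find_by_metadata_request(base_url: str, key: str, value: str, language=None) -> Request:
    """Prepare (but do not send) a POST to /items/find-by-metadata-field."""
    # Assumed body shape: {"key": ..., "value": ..., "language": ...}
    body = json.dumps({"key": key, "value": value, "language": language}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/items/find-by-metadata-field",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = find_by_metadata_request("http://your.dspace.com:8080/rest", "dc.title", "Test")
    print(req.get_method(), req.full_url)
```

The request object can be passed to `urllib.request.urlopen` once the URL points at a real instance.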

--
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.




Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-04 Thread Ari
Hi Ilja,

We are aiming for a fully REST-based solution (dropping both the workflow and 
the XMLUI admin UI), so we do a lot of REST testing. 

This should work for filtered-items (items whose author name contains 
"Matti"):

 curl -g -H "Accept: application/json" \
   "http://your.dspace.com:8080/rest/filtered-items?query_field[]=dc.contributor.author&query_op[]=contains&query_val[]=Matti&collSel[]=&limit=100&offset=0&expand=parentCollection,metadata&filters=none" \
   | python -m json.tool
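
If the query is generated by a script rather than typed by hand, the same URL can be assembled programmatically. This is only a sketch: the parameter names are copied from the curl call above, the base URL is a placeholder, and `filtered_items_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def filtered_items_url(base_url, field, op, value, limit=100, offset=0):
    """Build a /filtered-items URL with the same parameters as the curl call above."""
    params = [
        ("query_field[]", field),
        ("query_op[]", op),
        ("query_val[]", value),
        ("collSel[]", ""),
        ("limit", limit),
        ("offset", offset),
        ("expand", "parentCollection,metadata"),
        ("filters", "none"),
    ]
    # safe="[]," keeps brackets and commas literal, matching the hand-written URL.
    return base_url.rstrip("/") + "/filtered-items?" + urlencode(params, safe="[],")


if __name__ == "__main__":
    print(filtered_items_url("http://your.dspace.com:8080/rest",
                             "dc.contributor.author", "contains", "Matti"))
```

Using `urlencode` also takes care of escaping values that contain spaces or other reserved characters.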


If you need to be logged in, then:


1. First, authenticate yourself:
curl --data "email=your.email&password=your_pass" \
  http://your.dspace.com:8080/rest/login -c cookies.txt


2. Test that authentication was successful:
curl -H "Accept: application/json" \
  http://your.dspace.com:8080/rest/status -b cookies.txt


- this should return something like this:
{"okay":true,"authenticated":true,"email":"your.email","fullname":"ari ",
"sourceVersion":null,"apiVersion":null}



Hope this helps,
Ari
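
In a script, the /rest/status response shown above can be checked before running further queries. A minimal sketch, assuming the payload has the shape of the sample in this message (`is_authenticated` is a hypothetical helper name):

```python
import json


def is_authenticated(status_body: str) -> bool:
    """Return True only if the /status payload reports a logged-in, healthy session."""
    status = json.loads(status_body)
    return bool(status.get("okay")) and bool(status.get("authenticated"))


if __name__ == "__main__":
    sample = ('{"okay":true,"authenticated":true,"email":"your.email",'
              '"fullname":"ari ","sourceVersion":null,"apiVersion":null}')
    print(is_authenticated(sample))
```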




Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-03 Thread Ilja Sidoroff
Terry,

I looked very briefly into this on Friday, but I didn't quite get how to create 
and execute queries without using the interactive web pages. The endpoint GET 
/rest/filtered-items seemed promising, but in the limited time I had I didn't 
see how to use it. I'll try to look into it a bit more.

Ilja




Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-03 Thread Ilja Sidoroff
On Fri, Sep 02, 2016 at 09:52:54AM -0400, Monika Mevenkamp wrote:
> Ilja 
> 
> Yep - that little CS guy sometimes gets in the way 
> 
> I myself do more and more in Ruby: much faster turnaround, since it is an 
> interpreted language, and without the strict typing and all that Java 
> verbosity it takes many fewer lines of code. I actually wrote the little 
> lister script you need in between other things yesterday. I even use JRuby 
> when I need to develop Java code and want to figure out what all those 
> mysterious, semi-documented DSpace Java methods are really doing 
> 
> I am eager (as you can probably tell) to promote my dspace / ruby gem. 
> So if you are interested in trying this out, I am very interested in helping 
> with / supporting this 

Monika,

I'm not sure if I will use your scripts directly, since I've already done the 
other parts of my production pipeline in golang; using (j)ruby in just one part 
might confuse others. However, I'm starting to see the advantage of using a 
JVM-based scripting language; an acquaintance of mine already uses Jython with 
DSpace. I myself prefer ruby over python (or maybe I'm just bored with python), 
so I will definitely keep your gem and script in mind.

Ilja



Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-02 Thread Terry Brady
Ilja,

In DSpace 6, the REST API will provide additional query capabilities.

https://wiki.duraspace.org/display/DSDOC6x/REST+Based+Quality+Control+Reports

While this may not solve your immediate issues, it might provide a good
future solution.

Terry




-- 
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
http://georgetown-university-libraries.github.io/

425-298-5498 (Seattle, WA)



Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-02 Thread Ilja Sidoroff
Yeah, the speed is not that crucial, as long as it stays somewhere in the order 
of minutes or even a few hours. What I'm doing is transferring items from CRIS, 
which doesn't know which items DSpace already has, so I have to prune the 
records that are already in DSpace. This happens once a day (at night) via cron, 
so I can live with that speed. It's probably just the little computer scientist 
in me that had hoped for the most efficient solution.

Thanks for the numbers and testing!

Ilja
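
The pruning step described above amounts to a set-membership filter once the identifiers already present in DSpace have been collected. A sketch under stated assumptions: the record layout and the "cris_id" key are hypothetical, since the thread does not say which metadata field links a DSpace item to its CRIS record.

```python
from typing import Dict, Iterable, List, Set


def prune_existing(cris_records: Iterable[Dict[str, str]],
                   known_ids: Set[str],
                   id_key: str = "cris_id") -> List[Dict[str, str]]:
    """Keep only CRIS records whose identifier is not already stored in DSpace."""
    return [rec for rec in cris_records if rec[id_key] not in known_ids]


if __name__ == "__main__":
    # Identifiers harvested from DSpace (collections and workflow alike).
    in_dspace = {"A-1", "A-3"}
    incoming = [{"cris_id": "A-1"}, {"cris_id": "A-2"}, {"cris_id": "A-3"}]
    print(prune_existing(incoming, in_dspace))
```

Holding the known identifiers in a set keeps each lookup constant-time, so the nightly run scales with the number of incoming records rather than with the product of both lists.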


Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-01 Thread Monika Mevenkamp
I just ran a test and timed the execution:

script  4681  items-> 26.334u  1.829s 0:35.43 79.4%   0+0k  0+ 36io   0pf+0w
script 64065  items-> 77.505u 16.817s 6:07.68 25.6%   0+0k  1+365io   0pf+0w
jruby+gem+start dspace -> 12.047u  0.525s 0:06.75 186.0%  0+0k 52+ 38io 393pf+0w
dspace database test   ->  6.616u  0.348s 0:03.44 202.0%  0+0k  2+ 15io   0pf+0w

Comparing a regular database test with a comparable JRuby script that loads 
the dspace gem and connects to the DSpace instance (which involves more or less 
the same actions as testing the database) shows that JRuby costs an extra 5.4 sec 
of user time and 0.2 sec of system time. 

The second script run processes about 13 times as many items as the first, but 
the real elapsed time (6 min versus 35 sec) is only about 10 times as long. Just 
starting up the Ruby interpreter, loading the gem and starting the DSpace kernel 
takes almost 7 sec, which explains most of that 'imbalance'.
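
The arithmetic behind that comparison can be rechecked from the elapsed times in the listing above. This sketch only redoes the division; subtracting the full startup time from both runs is a simplifying assumption.

```python
def elapsed_seconds(clock: str) -> float:
    """Convert an elapsed time such as '6:07.68' (minutes:seconds) to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


small_items, small_elapsed = 4681, elapsed_seconds("0:35.43")
large_items, large_elapsed = 64065, elapsed_seconds("6:07.68")
startup = elapsed_seconds("0:06.75")  # jruby + gem + DSpace kernel start

item_ratio = large_items / small_items
raw_ratio = large_elapsed / small_elapsed
# Remove the fixed startup cost from both runs before comparing.
adjusted_ratio = (large_elapsed - startup) / (small_elapsed - startup)

if __name__ == "__main__":
    print(f"items: {item_ratio:.1f}x")
    print(f"elapsed: {raw_ratio:.1f}x")
    print(f"elapsed minus startup: {adjusted_ratio:.1f}x")
```

The three ratios come out at roughly 13.7, 10.4 and 12.6, so once the fixed startup cost is taken out, the elapsed time grows almost in proportion to the number of items.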

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven




Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-01 Thread Monika Mevenkamp
Does speed matter? Is this something you'll have to do a lot, or is it one 
of those one-off scripts?

If you run this on the command line or from cron, it may not be so important. 
Especially with a cron job you may not care that much, as long as you can 
start it at midnight and it gets done by 7am. 

Calling the JRuby script from the UI, i.e. calling it from Java, is possible, 
but I have not actually done that yet. 

I don't believe that calling Java via JRuby adds much overhead.

A bigger issue, as I see it, is that DSpace.findByMetadataValue returns an array 
of matching DSpaceObjects. If speed matters, this needs to be changed to return 
an iterator, which shouldn't be too hard. 
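
The difference between returning an array and returning an iterator can be illustrated independently of DSpace. The sketch below is not the gem's code: it uses Python's sqlite3 module and a made-up `metadata` table (not the DSpace schema) purely to show rows being handed out one at a time instead of being collected first.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_matches(conn: sqlite3.Connection, field: str) -> Iterator[Tuple[int, str]]:
    """Yield (item_id, value) rows one at a time instead of building a full list."""
    cursor = conn.execute(
        "SELECT item_id, text_value FROM metadata WHERE field = ? ORDER BY item_id",
        (field,),
    )
    for row in cursor:  # rows are pulled from the cursor as they are consumed
        yield row[0], row[1]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table for illustration only.
    conn.execute("CREATE TABLE metadata (item_id INTEGER, field TEXT, text_value TEXT)")
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(969, "pu.workflow.state", "approved"),
         (903, "pu.workflow.state", "emailed"),
         (1, "dc.title", "ignored")],
    )
    for item_id, value in iter_matches(conn, "pu.workflow.state"):
        print(item_id, value)
```

With this shape the caller can start printing results immediately and memory use stays flat, whatever the number of matches.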

Why not just try it and see? Since the script only reads data and does not 
change anything, there is no danger of disturbing your instance. Plus, you can 
run this anywhere, as long as you have access to the database. 

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven




Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-01 Thread Ilja Sidoroff
Thanks! That script would indeed do what I need, but I'm a bit concerned about 
the scalability, since it will have to do one request per item, and if I have 
thousands of items, that might get a bit heavy. Or would it? I really don't 
know how long, for instance, 10,000 item/id/metadata requests would take.

Ilja


From: Monika Mevenkamp 
Sent: Thursday, September 1, 2016 6:30:33 PM
To: Ilja Sidoroff
Cc: DSpace Tech
Subject: Re: [dspace-tech] Querying items by metadata item via SOLR and REST

Hi Ilja

I have a script that given a metadata field, e.g. pu.workflow.state, produces a 
tab separated list so:

field              id   handle        value
pu.workflow.state  969  9/fk4w099v32  approved
pu.workflow.state  903  null          emailed
pu.workflow.state  753  null          emailed
pu.workflow.state  752  null          emailed
pu.workflow.state  902  null          orphaned


The script is written in JRuby and based on my dspace-jruby gem; see the 
script here: <https://github.com/akinom/dspace-cli/blob/master/metadata/list_values.rb>.
The gem as well as the script are available from GitHub: the jrdspace gem 
<https://github.com/akinom/dspace-jruby> and cli-dspace 
<https://github.com/akinom/dspace-cli>, which has a bunch of other scripts.

The script is quite small; its ‘action’ is in the doit method:

def doit(metadata_field)
  # Header row
  puts ['field', 'id', 'handle', 'value'].join("\t")
  # All items that have a value for the given metadata field
  dsos = DSpace.findByMetadataValue(metadata_field, nil, DConstants::ITEM)
  dsos.each do |dso|
    # Collect every value the item has for this field
    vals = dso.getMetadataByMetadataString(metadata_field).collect { |v| v.value }
    # Items without a handle (e.g. not yet archived) are printed as "null"
    puts [metadata_field, dso.getID, dso.getHandle.nil? ? "null" : dso.getHandle, vals].join("\t")
  end
end
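
The row layout produced by doit can be tried out without a DSpace installation. The following is a minimal sketch, not part of the script itself: FakeItem, format_row and parse_row are hypothetical stand-ins, and the second item is invented to show a multi-valued field. It illustrates that Ruby's Array#join flattens the nested vals array, so each additional value becomes one more tab-separated column.

```ruby
# Stand-in for a DSpace item; the real script gets these from the DSpace API.
FakeItem = Struct.new(:id, :handle, :values)

# Same row layout as doit: field, id, handle (or "null"), then the values.
# Array#join flattens nested arrays, so each value becomes its own column.
def format_row(field, item)
  [field, item.id, item.handle.nil? ? 'null' : item.handle, item.values].join("\t")
end

# Split a row back into its parts; everything after the handle is a value.
def parse_row(line)
  field, id, handle, *values = line.chomp.split("\t")
  { field: field, id: Integer(id), handle: (handle == 'null' ? nil : handle), values: values }
end

items = [
  FakeItem.new(969, '9/fk4w099v32', ['approved']),
  FakeItem.new(903, nil, ['emailed', 'orphaned']) # hypothetical multi-valued item
]

items.each do |item|
  row = format_row('pu.workflow.state', item)
  puts row
  p parse_row(row)
end
```

Any consumer of the list should therefore treat everything after the third column as values rather than expect exactly four columns.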

If you want to try this out, there are instructions on GitHub. If you want to 
work in Java, look at the implementation of the DSpace.findByMetadataValue 
method, which contains the SQL statement: 
<https://github.com/akinom/dspace-jruby/blob/master/lib/dspace/dspace.rb#L150-L171>

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven




-- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


Re: [dspace-tech] Querying items by metadata item via SOLR and REST

2016-09-01 Thread Monika Mevenkamp
Hi Ilja 

I have a script that, given a metadata field, e.g. pu.workflow.state, produces 
a tab-separated list like this:

field               id    handle          value
pu.workflow.state   969   9/fk4w099v32    approved
pu.workflow.state   903   null            emailed
pu.workflow.state   753   null            emailed
pu.workflow.state   752   null            emailed
pu.workflow.state   902   null            orphaned


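The script is written in JRuby and based on my dspace-jruby gem; see the 
script here: <https://github.com/akinom/dspace-cli/blob/master/metadata/list_values.rb>.
The gem as well as the script are available from GitHub: the jrdspace gem 
<https://github.com/akinom/dspace-jruby> and cli-dspace 
<https://github.com/akinom/dspace-cli>, which has a bunch of other scripts.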

The script is quite small; its ‘action’ is in the doit method:

def doit(metadata_field)
  # Header row
  puts ['field', 'id', 'handle', 'value'].join("\t")
  # All items that have a value for the given metadata field
  dsos = DSpace.findByMetadataValue(metadata_field, nil, DConstants::ITEM)
  dsos.each do |dso|
    # Collect every value the item has for this field
    vals = dso.getMetadataByMetadataString(metadata_field).collect { |v| v.value }
    # Items without a handle (e.g. not yet archived) are printed as "null"
    puts [metadata_field, dso.getID, dso.getHandle.nil? ? "null" : dso.getHandle, vals].join("\t")
  end
end

If you want to try this out, there are instructions on GitHub. If you want to 
work in Java, look at the implementation of the DSpace.findByMetadataValue 
method, which contains the SQL statement: 
<https://github.com/akinom/dspace-jruby/blob/master/lib/dspace/dspace.rb#L150-L171>

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven



> On Sep 1, 2016, at 6:43 AM, Ilja Sidoroff wrote:
> 
> Hello,
> 
> I am using DSpace 5.5.
> 
> Am I correct that SOLR queries return only items that are in
> *collections* and not in the *workflow*? At least my search attempts
> indicate that.
> 
> In the REST API it seems that GET /items likewise returns only
> items that are in collections. With POST
> /items/find-by-metadata-field, however, I seem to get all items in
> DSpace, both those in collections and those in the workflow?
> 
> What I need is a list of *all items* (both in the workflow and the
> collections) that have a certain metadata field set, together with
> *the value of that field*. I don't see any other way of doing that
> except by a direct SQL query to the database. I have one for 5.x, but
> I'm not happy with it, since I will need to update it for 6.x etc. Is
> there any other way of doing this?
> 
> Also, it seems that
> 
> dspace import -d -m mapfile ...
> 
> does not delete items currently in the workflow? Is this intentional or a bug?
> 
> regards,
> 
> Ilja Sidoroff
> University of Eastern Finland
