I just ran a test and timed execution time

script 4681 items       ->  26.334u  1.829s 0:35.43  79.4% 0+0k  0+ 36io   0pf+0w
script 64065 items      ->  77.505u 16.817s 6:07.68  25.6% 0+0k  1+365io   0pf+0w
jruby+gem+start dspace  ->  12.047u  0.525s 0:06.75 186.0% 0+0k 52+ 38io 393pf+0w
dspace database test    ->   6.616u  0.348s 0:03.44 202.0% 0+0k  2+ 15io   0pf+0w
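As a rough cross-check, the timings above can be compared with a few lines of Ruby. This is only a sketch and not part of the original test: the names RUNS, STARTUP_ELAPSED, ratio and ms_per_item are made up for illustration, and it assumes that the "jruby+gem+start dspace" run is a fair estimate of the fixed startup cost that both script runs pay once.

```ruby
# Figures copied from the `time` output above; elapsed times are in seconds.
RUNS = {
  small: { items: 4681,  elapsed: 35.43 },
  large: { items: 64065, elapsed: 6 * 60 + 7.68 }
}.freeze

# Assumption: the "jruby+gem+start dspace" run approximates the fixed
# startup cost that is included once in each script run.
STARTUP_ELAPSED = 6.75

# Ratio of the large run to the small run for a key (:items or :elapsed),
# optionally subtracting a fixed amount from both values first.
def ratio(key, minus: 0.0)
  (RUNS[:large][key] - minus) / (RUNS[:small][key] - minus).to_f
end

# Milliseconds of elapsed time per item after removing the assumed startup cost.
def ms_per_item(run)
  (RUNS[run][:elapsed] - STARTUP_ELAPSED) / RUNS[run][:items] * 1000.0
end

puts format('items ratio:               %.1f', ratio(:items))
puts format('elapsed ratio:             %.1f', ratio(:elapsed))
puts format('elapsed ratio w/o startup: %.1f', ratio(:elapsed, minus: STARTUP_ELAPSED))
puts format('ms per item (small run):   %.2f', ms_per_item(:small))
puts format('ms per item (large run):   %.2f', ms_per_item(:large))
```

With the startup cost taken out, the elapsed-time ratio moves closer to the item ratio, and the per-item cost of the two runs ends up in the same range.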
Comparing a regular database test with a comparable JRuby script that loads the dspace gem and connects to the DSpace instance (which involves more or less the same actions as testing the database) shows that the JRuby route costs roughly an extra 5.4 sec of user time and 0.2 sec of system time.

The second script example processes about 13 times as many items as the first, but the real elapsed time, 6 min versus 35 sec, is more like 10 times as long. Just starting up the Ruby interpreter, loading the gem, and starting the DSpace kernel takes almost 7 sec, which explains most of that 'imbalance'.

Monika

--
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven

> On Sep 1, 2016, at 12:05 PM, Monika Mevenkamp <mome...@gmail.com> wrote:
>
> Does speed matter? Is this something you'll have to do a lot, or is it one
> of those one-off scripts?
>
> If you run this on the command line or from cron, it may not be so important.
> Especially with a cron job you may not care that much, as long as you can
> start it at midnight and it gets done by 7am.
>
> Calling the JRuby script from the UI, i.e. calling it from Java, is possible, but
> I have not actually done that yet.
>
> I don't believe that calling Java via JRuby adds much performance overhead.
>
> A bigger issue, as I see it, is that DSpace.findByMetadataValue returns an array
> of matching DSpaceObjects. If speed matters, this needs to be changed to
> return an iterator, which shouldn't be too hard.
>
> Why not just try it and see? Since the script only accesses data and does not
> change anything, there is no danger of disturbing your instance. Plus you can
> run this anywhere, as long as you have access to the database.
>
> Monika
>
>> On Sep 1, 2016, at 11:48 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>>
>> Thanks!
>> That script would indeed do what I need, but I'm a bit concerned
>> about the scalability, since it will have to do one request per item, and
>> if I have thousands of items, that might get a bit heavy. Or would it? I
>> really don't know how long, for instance, 10,000 item/id/metadata
>> requests would take.
>>
>> Ilja
>>
>> ________________________________________
>> From: Monika Mevenkamp <mome...@gmail.com>
>> Sent: Thursday, September 1, 2016 6:30:33 PM
>> To: Ilja Sidoroff
>> Cc: DSpace Tech
>> Subject: Re: [dspace-tech] Querying items by metadata item via SOLR and REST
>>
>> Hi Ilja,
>>
>> I have a script that, given a metadata field, e.g. pu.workflow.state,
>> produces a tab-separated list like so:
>>
>> field              id   handle            value
>> pu.workflow.state  969  99999/fk4w099v32  approved
>> pu.workflow.state  903  null              emailed
>> pu.workflow.state  753  null              emailed
>> pu.workflow.state  752  null              emailed
>> pu.workflow.state  902  null              orphaned
>>
>> The script is written in JRuby and based on my dspace-jruby gem; see the script
>> here <https://github.com/akinom/dspace-cli/blob/master/metadata/list_values.rb>.
>> The gem as well as the script are available from GitHub: jrdspace
>> gem <https://github.com/akinom/dspace-jruby> and
>> cli-dspace <https://github.com/akinom/dspace-cli>, which has a bunch of
>> other scripts.
>>
>> The script is quite small; its 'action' is in the doit method:
>>
>>   def doit(metadata_field)
>>     # header row
>>     puts ['field', 'id', 'handle', 'value'].join("\t")
>>     dsos = DSpace.findByMetadataValue(metadata_field, nil, DConstants::ITEM)
>>     # one tab-separated row per matching item
>>     dsos.each do |dso|
>>       vals = dso.getMetadataByMetadataString(metadata_field).collect { |v| v.value }
>>       puts [metadata_field, dso.getID, dso.getHandle.nil? ? "null" : dso.getHandle, vals].join("\t")
>>     end
>>   end
>>
>> If you want to try this out, there are instructions on GitHub. If you want
>> to work in Java, look at the implementation of the
>> DSpace.findByMetadataValue method. It has the SQL statement.
>> See HERE <https://github.com/akinom/dspace-jruby/blob/master/lib/dspace/dspace.rb#L150-L171>.
>>
>> Monika
>>
>> On Sep 1, 2016, at 6:43 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>>
>> Hello,
>>
>> I am using DSpace 5.5.
>>
>> Am I correct that SOLR queries return only items that are in
>> *collections* and not in the *workflow*? At least my search attempts
>> indicate that.
>>
>> In the REST API, it likewise seems that GET /items returns only
>> results that are in the collections. With POST
>> /items/find-by-metadata-field, however, I can get all items in DSpace, both
>> those in the collections and those in the workflow?
>>
>> What I need is a list of *all items* (both in the workflow and in the
>> collections) that have a certain metadata field set, together with *the value of
>> that field*. I don't see any other way of doing that except by a direct SQL
>> query to the database. I have one for 5.x, but I'm not happy with it,
>> since I need to update it for 6.x etc. Is there any other way of
>> doing this?
>>
>> Also, it seems that
>>
>>   dspace import -d -m mapfile ...
>>
>> does not delete items currently in the workflow. Is this intentional or a
>> bug?
>>
>> regards,
>>
>> Ilja Sidoroff
>> University of Eastern Finland
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "DSpace Technical Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to dspace-tech+unsubscr...@googlegroups.com.
>> To post to this group, send email to dspace-tech@googlegroups.com.
>> Visit this group at https://groups.google.com/group/dspace-tech.
>> For more options, visit https://groups.google.com/d/optout.