[
https://jira.duraspace.org/browse/DS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=23096#comment-23096
]
Richard Rodgers commented on DS-1073:
-------------------------------------
I think Mark's observation above (catch and process items on ingest rather than
traverse the whole repository every time) is indeed a more scalable approach for
most operations of this type. Although I can offer no immediate help, I do hope
to release a rewrite of the mediafilter functionality as a set of curation tasks
that will enable exactly this sort of processing: per-item hooks that run
automatically on ingest, command-line batch runs, and direct use in the admin UI
for 'one-offs'. If you are interested, I'd be happy to share this work with you
as it progresses, although you will need a curation-aware (i.e. 1.7+) version of
DSpace.
> The maximum flag on filter-media is useless if results are returned in the
> same order every time
> ------------------------------------------------------------------------------------------------
>
> Key: DS-1073
> URL: https://jira.duraspace.org/browse/DS-1073
> Project: DSpace
> Issue Type: Bug
> Components: DSpace API
> Affects Versions: 1.8.0
> Reporter: Samuel Ottenhoff
> Attachments: DS-1073.patch
>
>
> Scenario: an institution has a million PDFs on one server and needs to run
> filter-media every night, but only wants to process 10k PDFs per night.
> There is a "-m" flag to set a maximum, but the results are returned in the
> same order every time, which prevents new items from being picked up.
> Possible solutions:
> 1) Return items sorted by most recently updated?
> 2) Return the items in random order instead of the same ones every time?
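The two possible solutions listed in the issue can be sketched as follows. This is a minimal illustration under stated assumptions, not DSpace code: `Item` is a hypothetical stand-in record (not `org.dspace.content.Item`), and `firstN`, `recentFirst` and `randomSample` are hypothetical helper names. `firstN` mimics the behaviour described in the report, where the `-m` cap is applied to a list that comes back in the same order on every run.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class BatchSelection {
    // Hypothetical stand-in for a repository item; NOT the real DSpace Item class.
    record Item(int id, Instant lastModified) {}

    // Behaviour described in the issue: take the first `max` items of a list whose
    // order never changes, so every nightly run selects the same items.
    static List<Item> firstN(List<Item> items, int max) {
        return new ArrayList<>(items.subList(0, Math.min(max, items.size())));
    }

    // Solution 1 (assumed interpretation): most recently modified items first,
    // so newly ingested or updated items always fall inside the nightly cap.
    static List<Item> recentFirst(List<Item> items, int max) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing(Item::lastModified).reversed());
        return firstN(sorted, max);
    }

    // Solution 2: a random sample of size `max`; the input list is not modified.
    static List<Item> randomSample(List<Item> items, int max, Random rng) {
        List<Item> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, rng);
        return firstN(shuffled, max);
    }
}
```

Ordering newest-first guarantees that recently ingested items land inside the nightly cap, whereas a random sample only makes it likely that every item is eventually visited. Neither removes the cost of traversing the whole repository, which is the motivation for the ingest-time processing suggested in the comment above.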
--
This message is automatically generated by JIRA.
_______________________________________________
Dspace-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-devel