[ 
https://jira.duraspace.org/browse/DS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=23090#comment-23090
 ] 

Tim Donohue commented on DS-1073:
---------------------------------

It seems to me there could be a few ways to resolve this.

One simple option would be to go through items in *reverse* order (most 
recently modified first). It'd essentially be the results of this query:

SELECT * FROM item WHERE in_archive='1' ORDER BY last_modified DESC

This query would give the same results as the Item.findAll(context) method, but 
they'd be in reverse order.

If things were returned in reverse order, then the Items that still need 
Filtering should be *first* in the list (or at least near the beginning -- it's 
possible already filtered items would be recently modified as well).
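
As a rough illustration of the reverse-order idea (not a patch), here is a 
small sketch using Python's sqlite3 module against an in-memory table. The 
table layout is a simplified assumption for demonstration only (the item_id 
column name and the text timestamps are assumptions), and the LIMIT clause 
only stands in for the "-m" maximum; this is not how filter-media itself 
applies the maximum.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory stand-in for the item table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "item_id INTEGER PRIMARY KEY, in_archive TEXT, last_modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?)",
        [
            (1, "1", "2012-01-01 00:00:00"),
            (2, "1", "2012-03-01 00:00:00"),
            (3, "0", "2012-04-01 00:00:00"),  # not archived, never returned
            (4, "1", "2012-02-01 00:00:00"),
        ],
    )
    return conn


def newest_first(conn, limit):
    """Return archived item ids, most recently modified first, capped at limit."""
    rows = conn.execute(
        "SELECT item_id FROM item WHERE in_archive = '1' "
        "ORDER BY last_modified DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(newest_first(demo_db(), 2))  # -> [2, 4]
```

With a cap of 2, the two most recently modified archived items come back 
first, which is the behaviour a nightly run with a maximum would want.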

Alternatively, you could get a little "smarter" and provide an option to only 
return items whose "last_modified" is *after* a given date/time.  That way 
if you ran 'filtermedia' daily, you could run it only across items that were 
updated in the last 24 hours. If you ran it weekly, you could run it across 
items updated in the last 7 days. Again, it'd be similar to the Item.findAll() 
method, but in this case we'd be limiting results to things that have a 
last_modified date after a specified date & time.
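
The date-limited variant could be sketched along the same lines. Again this 
is only an illustration in Python with sqlite3: the schema, the item_id 
column name, the text timestamp format, and the modified_since helper are all 
assumptions made up for the example, not existing DSpace code.

```python
import sqlite3
from datetime import datetime, timedelta


def build_demo():
    """Build a tiny in-memory stand-in for the item table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "item_id INTEGER PRIMARY KEY, in_archive TEXT, last_modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?)",
        [
            (1, "1", "2012-03-02 06:00:00"),  # modified within the last day
            (2, "1", "2012-02-28 00:00:00"),  # modified within the last week
            (3, "1", "2012-01-01 00:00:00"),  # older than a week
            (4, "0", "2012-03-02 09:00:00"),  # recent but not archived
        ],
    )
    return conn


def modified_since(conn, cutoff):
    """Return archived item ids with last_modified strictly after cutoff."""
    rows = conn.execute(
        "SELECT item_id FROM item "
        "WHERE in_archive = '1' AND last_modified > ? "
        "ORDER BY last_modified DESC",
        # ISO-style text timestamps compare correctly as plain strings.
        (cutoff.strftime("%Y-%m-%d %H:%M:%S"),),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    now = datetime(2012, 3, 2, 12, 0, 0)  # fixed "now" so the demo is repeatable
    conn = build_demo()
    print(modified_since(conn, now - timedelta(days=1)))  # daily run  -> [1]
    print(modified_since(conn, now - timedelta(days=7)))  # weekly run -> [1, 2]
```

The cutoff would be chosen to match how often the job runs, so a daily run 
passes "now minus 24 hours" and a weekly run passes "now minus 7 days".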

I haven't written any sort of code around either of these ideas (and likely 
won't have a chance to do so myself).  But, they both sound like they could 
help resolve this specific issue (the latter, "smarter" version, may be an even 
better resolution than the former).

                
> The maximum flag on filter-media is useless if results are returned in the 
> same order every time
> ------------------------------------------------------------------------------------------------
>
>                 Key: DS-1073
>                 URL: https://jira.duraspace.org/browse/DS-1073
>             Project: DSpace
>          Issue Type: Bug
>          Components: DSpace API
>    Affects Versions: 1.8.0
>            Reporter: Samuel Ottenhoff
>         Attachments: DS-1073.patch
>
>
> Scenario: institution has a million PDFs on one server and needs to run 
> filter-media every night. Institution only wants to run on 10k PDFs per night.
> There is a "-m" flag to set a maximum. But the results are returned in the 
> same order every time, preventing new items from being picked up.
> Possible solutions:
>  1) Return items sorted by recently updated?
>   2) Return a random sort of elements instead of the same ones every time?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel
