OAI requests with resumption token start at the wrong offset if non-public 
items are excluded and there are withdrawn items
---------------------------------------------------------------------------------------------------------------------------

                 Key: DS-866
                 URL: https://jira.duraspace.org/browse/DS-866
             Project: DSpace
          Issue Type: Bug
          Components: DSpace API
    Affects Versions: 1.7.1, 1.7.0, 1.6.2, 1.6.1
            Reporter: Andrea Schweer
            Priority: Major
         Attachments: oai-offset.patch

In some circumstances, items are not disseminated via OAI.

OAI responds to ListRecords requests with batches of up to 100 records (or 
another number set via the oai.didl.maxresponse property). Requests can include 
a resumption token to specify which batch is required. The resumption token is 
then translated into an offset into the database results.
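
To illustrate the batching described above, here is a minimal sketch. It is not 
the DSpace implementation: the class and method names are hypothetical, and the 
resumption token is assumed to be nothing more than a decimal offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not DSpace code: the token is modelled as a plain decimal offset.
class ResumptionPagingSketch {
    // Translate a resumption token into an offset; no token means the first batch.
    static int offsetFor(String token) {
        return (token == null || token.isEmpty()) ? 0 : Integer.parseInt(token);
    }

    // Return up to maxRecords rows, starting at the offset encoded in the token.
    static List<String> batch(List<String> rows, String token, int maxRecords) {
        int from = Math.min(offsetFor(token), rows.size());
        int to = Math.min(from + maxRecords, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    // Token for the following batch, or null once the result set is exhausted.
    static String nextToken(List<String> rows, String token, int maxRecords) {
        int next = offsetFor(token) + maxRecords;
        return next < rows.size() ? Integer.toString(next) : null;
    }
}
```

A harvester would call batch() with a null token first, then keep passing the 
value returned by nextToken() until it is null.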

OAI ListRecords responses also contain withdrawn items (marked as "deleted"). 

If harvest.includerestricted.oai = false is set in dspace.cfg, only publicly 
readable items are included in the response, and the offset into the database 
results is recalculated to skip over restricted items. A withdrawn item is not 
publicly readable, so the offset recalculation code skips over it as well. It 
should not, because the code that actually adds items to the response does not 
skip withdrawn items: they are still included in the OAI response. As a 
result, the next batch starts n items later in the database results than it 
should, where n is the number of withdrawn items before the start of the batch.
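
The mismatch can be reproduced with a small model. This is a sketch under 
assumptions, not the DSpace code and not the attached patch: Item, startIndex 
and harvest are hypothetical names, and withdrawn items are modelled as never 
publicly readable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behaviour described above; none of these names are DSpace APIs.
class OaiOffsetSketch {
    static final class Item {
        final String id;
        final boolean withdrawn;
        final boolean publiclyReadable; // assumed false for withdrawn items
        Item(String id, boolean withdrawn, boolean publiclyReadable) {
            this.id = id;
            this.withdrawn = withdrawn;
            this.publiclyReadable = publiclyReadable;
        }
    }

    // Items that end up in the response: public ones, plus withdrawn ones as "deleted".
    static boolean inResponse(Item it) {
        return it.withdrawn || it.publiclyReadable;
    }

    // Map "records already delivered" to an index into the database rows.
    // countWithdrawn = false mimics the reported behaviour; true matches inResponse().
    static int startIndex(List<Item> rows, int delivered, boolean countWithdrawn) {
        int seen = 0;
        int i = 0;
        while (i < rows.size() && seen < delivered) {
            Item it = rows.get(i);
            if (it.publiclyReadable || (countWithdrawn && it.withdrawn)) {
                seen++;
            }
            i++;
        }
        return i;
    }

    // Simulate a full harvest via consecutive batches of at most batchSize records.
    static List<String> harvest(List<Item> rows, int batchSize, boolean countWithdrawn) {
        List<String> out = new ArrayList<>();
        int delivered = 0;
        while (true) {
            int i = startIndex(rows, delivered, countWithdrawn);
            int added = 0;
            for (; i < rows.size() && added < batchSize; i++) {
                if (inResponse(rows.get(i))) {
                    out.add(rows.get(i).id);
                    added++;
                }
            }
            if (added == 0) {
                return out;
            }
            delivered += added;
        }
    }

    // Six rows: B is withdrawn, E is restricted, the rest are public.
    static List<Item> sampleRows() {
        return Arrays.asList(
            new Item("A", false, true), new Item("B", true, false),
            new Item("C", false, true), new Item("D", false, true),
            new Item("E", false, false), new Item("F", false, true));
    }

    public static void main(String[] args) {
        System.out.println(harvest(sampleRows(), 2, false)); // [A, B, D, F]
        System.out.println(harvest(sampleRows(), 2, true));  // [A, B, C, D, F]
    }
}
```

With one withdrawn item (B) in the first batch, the second batch starts at D 
instead of C, so C is never delivered. Counting withdrawn items in the offset 
recalculation keeps it consistent with what the response contains.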

This means that a full OAI harvest via consecutive ListRecords requests will 
miss as many items as there are withdrawn items.

I first found this in the 1.6.1 code and it is still present in 1.7.1. I didn't 
check whether this affects earlier versions too. A patch against 1.7.1 is 
attached.
