OAI requests with resumption token start at the wrong offset if non-public
items are included and there are withdrawn items
---------------------------------------------------------------------------------------------------------------------------
Key: DS-866
URL: https://jira.duraspace.org/browse/DS-866
Project: DSpace
Issue Type: Bug
Components: DSpace API
Affects Versions: 1.7.1, 1.7.0, 1.6.2, 1.6.1
Reporter: Andrea Schweer
Priority: Major
Attachments: oai-offset.patch

In some circumstances, items are not disseminated via OAI.

OAI responds to ListRecords requests with batches of up to 100 records (or
another number set via the oai.response.max-records property). A request can
include a resumption token to specify which batch is required. The resumption
token is then translated into an offset into the results returned by the
database.
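
To illustrate the batching described above, here is a minimal Java sketch of
how a resumption token might map to an offset. It is not the DSpace code: the
class OaiBatchSketch, its methods, and the token format (just the offset as a
decimal string) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DSpace implementation.
class OaiBatchSketch {
    // Batch size; 100 is the default mentioned in the report.
    static final int MAX_RECORDS = 100;

    // Assumed token format: the offset as a decimal string; null means "first batch".
    static int offsetFromToken(String token) {
        return token == null ? 0 : Integer.parseInt(token);
    }

    // Token for the following batch, or null when the list is exhausted.
    static String nextToken(int offset, int returned, int total) {
        int next = offset + returned;
        return next < total ? Integer.toString(next) : null;
    }

    // Returns the batch of rows that the given token points at.
    static <T> List<T> batch(List<T> rows, String token) {
        int start = Math.min(offsetFromToken(token), rows.size());
        int end = Math.min(rows.size(), start + MAX_RECORDS);
        return new ArrayList<>(rows.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(i);
        }
        String token = null;
        do {
            int offset = offsetFromToken(token);
            List<Integer> b = batch(rows, token);
            System.out.println("offset " + offset + ": " + b.size() + " records");
            token = nextToken(offset, b.size(), rows.size());
        } while (token != null);
    }
}
```

With 250 rows, this walks through three batches of 100, 100 and 50 records.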

OAI ListRecords responses contain withdrawn items (marked as "deleted").

If harvest.includerestricted.oai = false is set in dspace.cfg, only publicly
readable items are included in the response, and the offset into the database
results is recalculated to skip over restricted items. A withdrawn item in the
database results is also skipped by the offset recalculation code, because it
is not publicly readable. It should not be skipped, since it is still included
in the OAI response: the code that actually adds items to the response does
not skip withdrawn items. As a result, the next batch starts n items later in
the database results than it should, where n is the number of withdrawn items
before the start of the batch. A full OAI harvest via consecutive ListRecords
requests therefore misses as many items as there are withdrawn items.
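
The mismatch can be reproduced with a small simulation. The Java sketch below
is a simplified model of the behaviour described above, not the actual DSpace
harvesting code or the attached patch: the class OaiOffsetSketch, the Row type,
and the sample data are all hypothetical. The "buggy" mode counts only publicly
readable rows when translating the offset, while the fill step also emits
withdrawn rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the offset bug; names and data are illustrative only.
class OaiOffsetSketch {
    static final class Row {
        final int id;
        final boolean withdrawn;
        final boolean publicRead;

        Row(int id, boolean withdrawn, boolean publicRead) {
            this.id = id;
            this.withdrawn = withdrawn;
            this.publicRead = publicRead;
        }
    }

    // With restricted items excluded, the response still lists withdrawn rows as "deleted".
    static boolean inResponse(Row r) {
        return r.withdrawn || r.publicRead;
    }

    static List<Integer> batch(List<Row> rows, int offset, int limit, boolean buggy) {
        int pos = 0;
        int counted = 0;
        // Step 1: translate the offset into a position in the database results.
        while (pos < rows.size() && counted < offset) {
            Row r = rows.get(pos++);
            // Buggy mode ignores withdrawn rows here, although step 2 emits them.
            boolean counts = buggy ? r.publicRead : inResponse(r);
            if (counts) {
                counted++;
            }
        }
        // Step 2: fill the batch; withdrawn rows are included in both modes.
        List<Integer> out = new ArrayList<>();
        while (pos < rows.size() && out.size() < limit) {
            Row r = rows.get(pos++);
            if (inResponse(r)) {
                out.add(r.id);
            }
        }
        return out;
    }

    // Full harvest: request consecutive batches until an empty one comes back.
    static List<Integer> harvest(List<Row> rows, int limit, boolean buggy) {
        List<Integer> all = new ArrayList<>();
        int offset = 0;
        List<Integer> b;
        while (!(b = batch(rows, offset, limit, buggy)).isEmpty()) {
            all.addAll(b);
            offset += b.size();
        }
        return all;
    }

    // Sample data: item 2 is withdrawn, item 4 is restricted, the rest are public.
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row(1, false, true), new Row(2, true, false), new Row(3, false, true),
            new Row(4, false, false), new Row(5, false, true), new Row(6, false, true),
            new Row(7, false, true));
    }

    public static void main(String[] args) {
        System.out.println(harvest(sampleRows(), 2, false)); // [1, 2, 3, 5, 6, 7]
        System.out.println(harvest(sampleRows(), 2, true));  // [1, 2, 5, 6, 7]
    }
}
```

In this model, one withdrawn item (2) before the second batch pushes that
batch one row too far, so item 3 is never harvested.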

I first found this in the 1.6.1 code, and it is still present in 1.7.1. I have
not checked whether earlier versions are affected. A patch against 1.7.1 is
attached.
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel