I am trying to harvest articles from our DSpace instance using the
oai-harvest tool [1].
I run this command `oai-harvest -s col_123456789_2250
https://dspace.example.org/oai/request?` where col_123456789_2250 is the
set for articles. The tool creates an XML file for each record, but after
a fixed number of records (always the same) it crashes with an HTTP 500
error:
```
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7694.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7717.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7662.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7712.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7691.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F6578.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7726.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7827.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1050.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F96.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7835.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1066.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F234.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1040.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F162.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F209.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F6602.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F224.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F207.oai_dc.xml
DEBUGWriting to file
/home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1139.oai_dc.xml
ERRORHTTP Error 500:
Traceback (most recent call last):
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py",
line 339, in main
**kwargs
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py",
line 173, in harvest
**kwargs):
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py",
line 125, in _listRecords
for record in client.listRecords(**kwargs):
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py",
line 365, in ResumptionListGenerator
result, token = nextBatch(token)
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py",
line 194, in nextBatch
resumptionToken=token)
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py",
line 286, in makeRequestErrorHandling
xml = self.makeRequest(**kw)
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py",
line 346, in makeRequest
return retrieveFromUrlWaiting(request)
File
"/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py",
line 373, in retrieveFromUrlWaiting
f = urllib2.urlopen(request)
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in
http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500:
```
The number of XML files produced is always the same (1194):
```
$ ls *.xml | wc -l
1194
```
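Since the traceback ends in `nextBatch(token)`, the 500 appears to come from one specific resumption-token request rather than from the first page. Below is a minimal sketch for finding that request outside of oai-harvest, assuming the endpoint follows the standard OAI-PMH `ListRecords` / `resumptionToken` flow. The names `http_get` and `find_failing_token` are hypothetical helpers made up for this illustration, not part of oai-harvest or pyoai, and the base URL is written without the trailing `?`.

```python
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def find_failing_token(base_url, set_spec, metadata_prefix="oai_dc", fetch=http_get):
    """Walk the ListRecords pages of one set.

    Returns (records_seen, failing_url). failing_url is None when every
    page was fetched without an HTTP error.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    records_seen = 0
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        try:
            body = fetch(url)
        except urllib.error.HTTPError as err:
            # This is the request to replay by hand (browser, curl) while
            # watching the server logs.
            print("HTTP", err.code, "for", url)
            return records_seen, url
        root = ET.fromstring(body)
        records_seen += len(root.findall(f"{OAI_NS}ListRecords/{OAI_NS}record"))
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return records_seen, None  # last page reached
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


if __name__ == "__main__":
    seen, failing = find_failing_token(
        "https://dspace.example.org/oai/request", "col_123456789_2250"
    )
    print("records seen before failure:", seen)
    print("failing request:", failing)
```

If the failing URL is always the same, opening it directly should reproduce the 500 and make it easier to match against the entries in the DSpace logs.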
In dspace.log.2019-10-07 I see this:
```
2019-10-07 09:26:23,338 INFO org.dspace.usage.LoggerUsageEventListener @
anonymous:session_id=35781E7FCB2ABC37D6366C1CB7332D31:ip_addr=169.48.66.89::
2019-10-07 09:26:23,338 INFO
org.dspace.usage.LoggerUsageEventListener @
anonymous:session_id=35781E7FCB2ABC37D6366C1CB7332D31:ip_addr=169.48.66.89::
2019-10-07 09:26:23,880 WARN
org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI
2.0 :: DSpace } Not able to retrieve the dspace