[dspace-tech] Limit on OAI harvested data?

2019-10-17 Thread Michel Santana
We had a very similar problem. The cause was a bug in one of the libraries 
used by the OAI server: it crashes the server when the dataset contains an 
item with a non-standard Unicode character in its metadata.
The solution is to change that library (xalan 1.7.1) to the previous version 
(xalan 1.7.0) in the pom.xml file.
I have opened a pull request to solve this:
https://github.com/4Science/DSpace/pull/101
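
To look for candidate items before changing the library, a metadata value 
can be scanned for unusual characters. The sketch below is only an 
illustration: the message above does not say which characters trigger the 
crash, so it assumes "non-standard" means characters that XML 1.0 does not 
allow, and the real trigger may be different. The function names 
`is_xml10_char` and `find_suspect_chars` are hypothetical and not part of 
DSpace or xalan.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD      # skips the surrogate range
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def find_suspect_chars(value: str):
    """Return (index, code point) pairs for characters XML 1.0 forbids.

    Assumption: "non-standard" is read here as "not a legal XML 1.0
    character"; the thread itself does not define it.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        if not is_xml10_char(ch)
    ]
```

Calling `find_suspect_chars()` on each metadata value of an item gives the 
positions of any such characters; an empty list means the value passes this 
particular check, not that it is guaranteed to be harmless.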

Take a look and try. Good luck! 

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/2bdd8320-15f1-468f-a56a-d4c97d135439%40googlegroups.com.


[dspace-tech] Limit on OAI harvested data?

2019-10-15 Thread Theodotos Andreou
I am trying to harvest articles from our DSpace instance using the 
oai-harvest tool [1].

I run this command `oai-harvest -s col_123456789_2250 
https://dspace.example.org/oai/request?`, where col_123456789_2250 is the 
set for articles. The tool creates an XML file for each record, but after a 
fixed number of records (always the same) it fails with an HTTP 500 error:

```
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7694.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7717.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7662.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7712.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7691.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F6578.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7726.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7827.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1050.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F96.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F7835.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1066.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F234.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1040.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F162.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F209.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F6602.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F224.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F207.oai_dc.xml
DEBUG Writing to file /home/user/arena/oai-harvest/oai:dspace.example.org:10488%2F1139.oai_dc.xml
ERROR HTTP Error 500: 
Traceback (most recent call last):
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py", line 339, in main
    **kwargs
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py", line 173, in harvest
    **kwargs):
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaiharvest/harvest.py", line 125, in _listRecords
    for record in client.listRecords(**kwargs):
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py", line 365, in ResumptionListGenerator
    result, token = nextBatch(token)
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py", line 194, in nextBatch
    resumptionToken=token)
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py", line 286, in makeRequestErrorHandling
    xml = self.makeRequest(**kw)
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py", line 346, in makeRequest
    return retrieveFromUrlWaiting(request)
  File "/home/user/arena/oai-harvest/venv/lib/python3.6/site-packages/oaipmh/client.py", line 373, in retrieveFromUrlWaiting
    f = urllib2.urlopen(request)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500:
```

The number of XML files produced is always the same (1194):

```
$ ls *.xml | wc -l
1194
```
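
Since the harvest always stops at the same point, it may help to find out 
which ListRecords request is the one that fails. The following is a rough 
sketch using only the Python standard library, not the way oai-harvest or 
pyoai work internally. It follows the OAI-PMH resumption tokens one batch at 
a time and reports the URL of the first request that returns an HTTP error. 
The names `fetch`, `find_failing_batch` and `fetch_fn` are made up for this 
example.

```python
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url: str) -> bytes:
    """Download one OAI-PMH response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def find_failing_batch(base_url, set_spec, prefix="oai_dc", fetch_fn=fetch):
    """Follow resumption tokens until a request fails.

    Returns (records_seen, failing_url). failing_url is None if the whole
    set was listed without an HTTP error.
    """
    params = {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    seen = 0
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        try:
            root = ET.fromstring(fetch_fn(url))
        except urllib.error.HTTPError:
            # This is the batch the server cannot produce.
            return seen, url
        seen += len(root.findall(f".//{OAI_NS}record"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            # No token, or an empty one, marks the end of the list.
            return seen, None
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

With the values from the command above, this would be called as 
`find_failing_batch("https://dspace.example.org/oai/request", "col_123456789_2250")`, 
with the base URL given without the trailing `?`. Opening the returned URL 
in a browser while watching dspace.log should narrow the problem down to a 
single batch of records.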

In dspace.log.2019-10-07 I see this:

```
2019-10-07 09:26:23,338 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=35781E7FCB2ABC37D6366C1CB7332D31:ip_addr=169.48.66.89::
2019-10-072019-10-07 09:26:23,338 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=35781E7FCB2ABC37D6366C1CB7332D31:ip_addr=169.48.66.89::
2019-10-07 09:26:23,880 WARN  
org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 
2.0 :: DSpace } Not able to retrieve the dspace