[ https://issues.apache.org/jira/browse/CONNECTORS-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972830#comment-13972830 ]
Karl Wright commented on CONNECTORS-927:
----------------------------------------
Online, as David says, I find the following:
>>>>>>
This appears to be a bug in the latest version of java
link to bug: https://bugs.openjdk.java.net/browse/JDK-8028111
Temp Solution: (may be security risks involved, I just know this worked for me)
Use the jaxp.properties File
The jaxp.properties file is a plain configuration file. It is located at
${java.home}/lib/jaxp.properties where java.home is the JRE install directory,
e.g., [path to installation directory]/jdk7/jre.
A limit can be set by adding the following line to the jaxp.properties file:
jdk.xml.maxGeneralEntitySizeLimit=0
Note that the property name is the same as that of the system property and has
the prefix jdk.xml. Setting it to 0 gives it no limit.
When the property is set via the file, all invocations of the JDK and JRE will
observe the limit.
http://docs.oracle.com/javase/tutorial/jaxp/limits/using.html
<<<<<<
So in theory, setting the system property jdk.xml.maxGeneralEntitySizeLimit to
0 should work. If zero turns out not to be accepted, then at least something
very large, e.g. Integer.MAX_VALUE.
I'll propose a Solr connector patch with this baked in, and let's see if it
works on JDK 1.7 for those who need it.
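A minimal sketch of what "baked in" might look like, not the actual patch. The
class and method names (EntityLimitSketch, relaxGeneralEntityLimit, textOf) are
hypothetical; only the property name comes from the Oracle text quoted above.
It assumes the property has to be in place before the StAX factory is created,
and it leaves any value an operator has already passed with -D untouched.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of ManifoldCF or SolrJ.
final class EntityLimitSketch {
    // Property name taken from the Oracle JAXP limits text quoted above.
    static final String LIMIT_PROPERTY = "jdk.xml.maxGeneralEntitySizeLimit";

    // Sets the limit to "0" (no limit) unless a value is already present,
    // and returns whatever value is in effect afterwards.
    static String relaxGeneralEntityLimit() {
        if (System.getProperty(LIMIT_PROPERTY) == null) {
            System.setProperty(LIMIT_PROPERTY, "0");
        }
        return System.getProperty(LIMIT_PROPERTY);
    }

    // Parses a document with StAX and returns its concatenated character
    // data. The factory is created only after the property has been set.
    static String textOf(String xml) throws XMLStreamException {
        relaxGeneralEntityLimit();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(LIMIT_PROPERTY + "=" + relaxGeneralEntityLimit());
        System.out.println(textOf("<response><str>ok</str></response>"));
    }
}
```

Setting the property in code affects the whole JVM, the same as passing
-Djdk.xml.maxGeneralEntitySizeLimit=0 on the command line, so the security
caveat from the quoted text applies here too.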
> Message: JAXP00010001: The parser has encountered more than "64000" entity
> expansions in this document; this is the limit imposed by the JDK.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-927
> URL: https://issues.apache.org/jira/browse/CONNECTORS-927
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.5.1
> Reporter: David Morana
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
>
> FYI:
> Initially, large (>500MB) zip files in a Livelink repository would halt the
> crawl.
> Eventually these errors would happen on files of any size (see below).
> Oracle originally had a workaround (set entityExpansionLimit=0), but it
> didn’t work.
> This is a known bug in JDK 5, 6, and 7.
> We upgraded to JDK 8, and that seems to have fixed the issue.
> You can read about it here: https://bugs.openjdk.java.net/browse/JDK-8028111
> And here:
> http://stackoverflow.com/questions/20482331/whats-causing-these-parseerror-exceptions-when-reading-off-an-aws-sqs-queue-in
> {code}
> 2014-04-12 07:39:20,730 [Worker thread '21'] WARN org.apache.manifoldcf.ingest- Solr exception during indexing https://[redacted]/cs/llisapi.dll?func=ll&objID=2547652&objAction=download (500): parsing error
> org.apache.solr.common.SolrException: parsing error
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:325)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:949)
> Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
> Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
>     at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
>     at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:155)
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:99)
>     ... 4 more
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)