The problem isn't really with DTD validation (validation is disabled by default). The underlying problem is that the DTD can't be found, as your stack trace below indicates. The underlying parser will try to retrieve the DTD regardless of whether validation is enabled, because the DTD can declare things such as entities that the document may depend on.
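To illustrate the point that turning validation off is not enough, here is a minimal sketch using the JDK's standard JAXP DOM parser. It is plain JAXP, not the spark-xml-utils API, and the class name NonValidatingParse and the sample document are hypothetical. It relies on the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which the JDK's built-in parser honours; other parsers may not support it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; plain JAXP, not part of spark-xml-utils.
public class NonValidatingParse {

    // Builds a DocumentBuilder that neither validates nor loads the
    // external DTD named in the DOCTYPE declaration.
    static DocumentBuilder newNoDtdBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Validation is already off by default; set it explicitly for clarity.
        factory.setValidating(false);
        // Xerces-specific feature: without it, even a non-validating
        // parser still tries to open the external DTD file.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder();
    }

    // Parses the XML string and returns the name of its root element.
    static String rootName(String xml) throws Exception {
        Document doc = newNoDtdBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at a DTD file that does not exist.
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE article SYSTEM \"missing.dtd\">"
            + "<article><title>Hello</title></article>";
        System.out.println(rootName(xml)); // prints "article"
    }
}
```

The trade-off is the one described above: if the document relies on entities declared only in that DTD, skipping the DTD means those entities cannot be expanded.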
I will explore providing access to some of the underlying 'processor' configurations. For example, you could provide your own EntityResolver class that either completely ignores the DOCTYPE declaration (by returning an empty 'dummy' DTD) or finds 'local' versions of the DTD (on the workers or in S3, cached locally for performance). I will post an update when the code has been adjusted.

Darin.

----- Original Message -----
From: Shivalik <shivalik.malho...@outlook.com>
To: user@spark.apache.org
Sent: Tuesday, December 1, 2015 8:15 AM
Subject: Turning off DTD Validation using XML Utils package - Spark

Hi Team,

I've been using the XML Utils library (http://spark-packages.org/package/elsevierlabs-os/spark-xml-utils) to parse XML using XPath in a Spark job. One problem I am facing is with DTDs. My XML file has a DOCTYPE tag in it. I want to turn off DTD validation in this library, since I don't have access to the DTD file. Has anyone faced this problem before? Please help.

The exception I am getting is below:

stage 0.0 (TID 0, localhost): com.elsevier.spark_xml_utils.xpath.XPathException: I/O error reported by XML parser processing null: <path>/filename.dtd (No such file or directory)
    at com.elsevier.spark_xml_utils.xpath.XPathProcessor.evaluate(XPathProcessor.java:301)
    at com.elsevier.spark_xml_utils.xpath.XPathProcessor.evaluateString(XPathProcessor.java:219)
    at com.thomsonreuters.xmlutils.XMLParser.lambda$0(XMLParser.java:31)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Turning-off-DTD-Validation-using-XML-Utils-package-Spark-tp25534.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
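As a rough illustration of the EntityResolver idea in the reply above, here is a minimal sketch against the standard SAX `org.xml.sax.EntityResolver` interface and a JAXP DocumentBuilder. It is not how spark-xml-utils wires things up (the reply says that hook is still being explored); the class name LocalDtdResolver and its file-name-to-local-copy map are hypothetical. It first looks for a local copy of the DTD and otherwise returns an empty 'dummy' DTD.

```java
import java.io.File;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver; plain SAX/JAXP, not part of spark-xml-utils.
public class LocalDtdResolver implements EntityResolver {

    // Hypothetical mapping from DTD file names (e.g. "filename.dtd")
    // to copies stored on the local disk of each worker.
    private final Map<String, File> localCopies;

    public LocalDtdResolver(Map<String, File> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null) {
            // Keep only the last path segment of the requested system id.
            String name = systemId.substring(systemId.lastIndexOf('/') + 1);
            File local = localCopies.get(name);
            if (local != null && local.isFile()) {
                // Serve the local copy so entities it declares still work.
                return new InputSource(local.toURI().toString());
            }
        }
        // No local copy: give the parser an empty 'dummy' DTD.
        return new InputSource(new StringReader(""));
    }

    // Parses the XML with the given resolver and returns the root element name.
    static String rootName(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE article SYSTEM \"filename.dtd\"><article/>";
        // Empty map: every DTD request falls back to the empty dummy.
        LocalDtdResolver resolver =
            new LocalDtdResolver(Collections.<String, File>emptyMap());
        System.out.println(rootName(xml, resolver)); // prints "article"
    }
}
```

Returning an empty DTD is the simplest option but loses any entity declarations; serving a local copy keeps them, at the cost of distributing the DTD files to every worker.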
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org