[ https://issues.apache.org/jira/browse/TIKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-740.
--------------------------------
    Resolution: Won't Fix

> SAX parser used for HTML
> ------------------------
>
>                 Key: TIKA-740
>                 URL: https://issues.apache.org/jira/browse/TIKA-740
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Erik Hetzner
>         Attachments: a221657.html
>
>
> {noformat}
> egh@gales[510] 1 :~/d/software/tika-trunk
> $ java -jar tika-app/target/tika-app-1.0-SNAPSHOT.jar -v http://www.almasry-alyoum.com/article2.aspx?ArticleID=221657 > /dev/null
> Exception in thread "main" org.apache.tika.exception.TikaException: XML parse error
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:126)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:367)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:97)
> Caused by: org.xml.sax.SAXParseException: The element type "td" must be terminated by the matching end-tag "</td>".
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1749)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:65)
>     ... 6 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)