Github user HansBrende commented on a diff in the pull request:

    https://github.com/apache/any23/pull/59#discussion_r163680710

    --- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java ---
    @@ -105,7 +109,24 @@ public void run(
                 parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES);
                 //ByteBuffer seems to represent incorrect content. Need to make sure it is the content
                 //of the <script> node and not anything else!
    -            parser.parse(in, extractionContext.getDocumentIRI().stringValue());
    +            RDFFormat format = parser.getRDFFormat();
    +            String iri = extractionContext.getDocumentIRI().stringValue();
    +
    +            if (format.hasFileExtension("xhtml")) {
    --- End diff --

    All of the RDF formats defined in the RDFFormat class which have file extension "html" also have file extension "xhtml". (And there is exactly one of them: the RDFA format--which is the one that is giving us problems.)
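
    To illustrate the reasoning in the comment above, here is a minimal, self-contained sketch. It does not use the real RDFFormat class: the ExtensionCheck class, its EXTENSIONS table, and the extension lists in it are hypothetical stand-ins chosen for illustration. It only shows why testing for "xhtml" is enough when every format that lists "html" also lists "xhtml".

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a registry of RDF formats and their file extensions.
// The entries below are assumptions for illustration, not read from RDFFormat.
final class ExtensionCheck {

    static final Map<String, List<String>> EXTENSIONS = Map.of(
            "RDFA", List.of("xhtml", "html"),
            "RDFXML", List.of("rdf", "rdfs", "owl", "xml"),
            "TURTLE", List.of("ttl"),
            "NTRIPLES", List.of("nt"),
            "JSONLD", List.of("jsonld"));

    // Case-insensitive lookup of a file extension for a named format.
    static boolean hasFileExtension(String format, String extension) {
        return EXTENSIONS.getOrDefault(format, List.of()).stream()
                .anyMatch(extension::equalsIgnoreCase);
    }

    // True when every format that lists "html" also lists "xhtml",
    // i.e. a check for "xhtml" alone cannot miss an "html" format.
    static boolean htmlImpliesXhtml() {
        return EXTENSIONS.keySet().stream()
                .filter(f -> hasFileExtension(f, "html"))
                .allMatch(f -> hasFileExtension(f, "xhtml"));
    }

    public static void main(String[] args) {
        System.out.println("html implies xhtml: " + htmlImpliesXhtml());
    }
}
```

    Against the real library, the same property could be checked by looping over the RDFFormat constants and calling hasFileExtension on each, which is the method the diff already uses.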
---