GitHub user HansBrende commented on a diff in the pull request:

    https://github.com/apache/any23/pull/59#discussion_r163680710
  
    --- Diff: core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java ---
    @@ -105,7 +109,24 @@ public void run(
                 parser.getParserConfig().addNonFatalError(BasicParserSettings.NORMALIZE_DATATYPE_VALUES);
                 //ByteBuffer seems to represent incorrect content. Need to make sure it is the content
                 //of the <script> node and not anything else!
    -            parser.parse(in, extractionContext.getDocumentIRI().stringValue());
    +            RDFFormat format = parser.getRDFFormat();
    +            String iri = extractionContext.getDocumentIRI().stringValue();
    +
    +            if (format.hasFileExtension("xhtml")) {
    --- End diff --
    
    Every RDF format defined in the RDFFormat class that has the file
    extension "html" also has the file extension "xhtml". (There is exactly
    one such format: RDFa, which is the one giving us problems.) So checking
    for "xhtml" alone is enough to catch it.
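
To make that reasoning concrete, here is a minimal, self-contained sketch of the check. It does not use RDF4J itself: `Format` is a hypothetical stand-in for `RDFFormat`, and the extension lists in `FORMATS` are an illustrative sample written from memory, not read from the library, so they should be verified against the RDF4J version in use.

```java
import java.util.List;

public class HtmlExtensionCheck {

    // Hypothetical stand-in for RDFFormat: a name plus its file extensions.
    static final class Format {
        final String name;
        final List<String> extensions;

        Format(String name, List<String> extensions) {
            this.name = name;
            this.extensions = extensions;
        }

        // Case-insensitive extension lookup.
        boolean hasFileExtension(String ext) {
            return extensions.stream().anyMatch(e -> e.equalsIgnoreCase(ext));
        }
    }

    // Illustrative sample only (assumption): extension lists are not read from RDF4J.
    static final List<Format> FORMATS = List.of(
            new Format("RDF/XML", List.of("rdf", "rdfs", "owl", "xml")),
            new Format("Turtle", List.of("ttl")),
            new Format("N-Triples", List.of("nt")),
            new Format("RDFa", List.of("xhtml", "html")));

    // True if every format that accepts "html" also accepts "xhtml".
    static boolean htmlImpliesXhtml(List<Format> formats) {
        return formats.stream()
                .filter(f -> f.hasFileExtension("html"))
                .allMatch(f -> f.hasFileExtension("xhtml"));
    }

    // Number of formats that accept the given extension.
    static long countWithExtension(List<Format> formats, String ext) {
        return formats.stream().filter(f -> f.hasFileExtension(ext)).count();
    }

    public static void main(String[] args) {
        System.out.println("html implies xhtml: " + htmlImpliesXhtml(FORMATS));
        System.out.println("formats with html: " + countWithExtension(FORMATS, "html"));
    }
}
```

If the same property holds for the real `RDFFormat` constants, then `format.hasFileExtension("xhtml")` in the diff selects the same formats as a check on "html" would.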

