Github user HansBrende commented on a diff in the pull request:

    https://github.com/apache/any23/pull/71#discussion_r178926732

    --- Diff: core/src/main/java/org/apache/any23/extractor/html/HTMLDocument.java ---
    @@ -375,15 +376,16 @@ public String getDefaultLanguage() {
         private java.net.URI getBaseIRI() throws ExtractionException {
             if (baseIRI == null) {
    +            String uri = (document instanceof Document ? (Document)document : document.getOwnerDocument()).getDocumentURI();
                 try {
    -                if (document.getBaseURI() == null) {
    -                    log.warn("document.getBaseURI() is null, this should not happen");
    +                if (uri == null) {
    +                    log.warn("document.getBaseURI() is null, this should not happen", new Exception());
    --- End diff --

    @lewismc Oops, yeah, I added that exception in there just to get the stacktrace for debugging purposes, but forgot to take it out again!
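
For readers unfamiliar with the trick mentioned in the comment: constructing an exception without throwing it records the call stack at that point, and SLF4J-style loggers print that stack when a `Throwable` is passed as the last argument to `warn`. The snippet below is only a minimal sketch of the idea using the JDK alone; the class `StackTraceDemo` and the helper `captureStackTrace` are hypothetical names for illustration and are not part of Any23.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class, not part of Any23.
final class StackTraceDemo {

    // Hypothetical helper: renders the call stack at the point of the call as text.
    static String captureStackTrace(String message) {
        // The exception is created but never thrown; it only records the current stack.
        Exception marker = new Exception(message);
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        marker.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Prints the exception header followed by one "at ..." line per stack frame.
        System.out.print(captureStackTrace("base URI is null, this should not happen"));
    }
}
```

This is handy while debugging, but every such log entry carries a full stack trace, which is why the extra argument is normally removed again before merging.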
---