[ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133880#comment-13133880 ]
Pablo Queixalos commented on TIKA-760:
--------------------------------------

Concerning the HSLFExtractor, this is already fixed in trunk. The result of getAuthor() is checked for null before calling xhtml.characters( comment.getAuthor() );

> NPE XHTMLContentHandler in characters Method
> --------------------------------------------
>
>                 Key: TIKA-760
>                 URL: https://issues.apache.org/jira/browse/TIKA-760
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>         Environment: JDK 1.6, Linux
>            Reporter: Torsten Krah
>
> The method:
> public void characters(String characters) throws SAXException {
>     characters(characters.toCharArray(), 0, characters.length());
> }
> does not check for null values.
> Many call sites check for null before calling this method. However,
> other call sites, e.g. in HSLFExtractor, pass values that are not checked:
> xhtml.characters( comment.getAuthor() );
> which may be null.
> The simplest fix would be to check for null in the handler and, if the
> value is null, either treat the call as a no-op or insert the Unicode
> replacement character (U+FFFD) to let the user decide.
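For illustration, the handler-side fix proposed in the issue might look roughly like the sketch below. This is not the code committed to Tika: NullSafeTextHandler and getText() are hypothetical names, and the class extends the JDK's org.xml.sax.helpers.DefaultHandler as a stand-in for XHTMLContentHandler so that it compiles without Tika on the classpath. It takes the no-op option; appending U+FFFD instead would be the other variant mentioned by the reporter.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for Tika's XHTMLContentHandler. Only the
// String convenience overload of characters() is modelled here.
class NullSafeTextHandler extends DefaultHandler {

    // Collects the character data so the effect of each call is visible.
    private final StringBuilder text = new StringBuilder();

    // Null-tolerant overload: a null (or empty) argument is a no-op
    // instead of triggering a NullPointerException on toCharArray().
    public void characters(String characters) throws SAXException {
        if (characters == null || characters.isEmpty()) {
            return;
        }
        characters(characters.toCharArray(), 0, characters.length());
    }

    // Standard SAX callback; here it simply buffers the text.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);
    }

    // Hypothetical accessor, added only for this example.
    public String getText() {
        return text.toString();
    }
}
```

With a guard like this in the handler, a call such as xhtml.characters( comment.getAuthor() ) would no longer throw when the author is null, whether or not the caller performs its own check.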