[ https://issues.apache.org/jira/browse/PIG-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dale Jin updated PIG-2110:
--------------------------

    Attachment: SearchTermExtractor.diff

> NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-2110
>                 URL: https://issues.apache.org/jira/browse/PIG-2110
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Michael Brauwerman
>         Attachments: SearchTermExtractor.diff
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When processing a large log file, I get an exception in SearchTermExtractor.exec.
> I don't yet have a specific log line that reproduces it, but I assume the error
> occurs when the input URL is null, or maybe just has no query string.
> I think a fix would be to add a guard after creating queryString:
>   String queryString = urlObject.getQuery();
>   if (queryString == null) { return null; }
> Stack Trace:
> <code>
> Caused by: java.io.IOException: Caught exception processing input row
>         at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:195)
>         at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:64)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> Caused by: java.lang.NullPointerException
>         at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
>         at java.util.regex.Matcher.reset(Matcher.java:291)
>         at java.util.regex.Matcher.reset(Matcher.java:311)
>         at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:170)
> </code>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
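
The guard proposed in the issue description can be sketched in isolation as follows. This is a minimal illustration, not the piggybank source and not the contents of the attached SearchTermExtractor.diff: the class name QueryGuardSketch, the method extractTerm, and the TERM pattern are all hypothetical. Only the urlObject.getQuery() call and the null check come from the report. java.net.URL.getQuery() returns null when a URL has no query part, and resetting a Matcher with null text throws the NullPointerException seen in the stack trace.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the UDF's extraction logic (assumption, not piggybank code).
class QueryGuardSketch {
    // Hypothetical pattern: captures the value of a "q" parameter in a query string.
    private static final Pattern TERM = Pattern.compile("(?:^|&)q=([^&]*)");

    // Returns the search term, or null when the input cannot yield one.
    static String extractTerm(String url) {
        if (url == null) {
            return null; // null input row
        }
        URL urlObject;
        try {
            urlObject = new URL(url);
        } catch (MalformedURLException e) {
            return null; // not a parseable URL
        }
        String queryString = urlObject.getQuery();
        // Guard from the issue description: no query string means nothing to match.
        if (queryString == null) {
            return null;
        }
        Matcher m = TERM.matcher(queryString);
        return m.find() ? m.group(1) : null;
    }
}
```

With this guard in place, a URL without a query string yields null rather than an exception, which matches the behavior the reporter suggests for exec.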