[ 
https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832568#comment-16832568
 ] 

Sebastian Nagel commented on NUTCH-2585:
----------------------------------------

PR including fix is open: [#452|https://github.com/apache/nutch/pull/452]

I've decided to move the unsafe code block into a synchronized method. Because
TrieStringMatcher allows matching and adding strings to be interleaved, the
lazy conversion of nodes has to stay. The impact on matching performance should
be negligible because the synchronized method is only called on demand, when a
node has not yet been prepared for matching.
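
For illustration only, a minimal sketch of the pattern described above. This is
not the code from the PR: the class, field, and method names are hypothetical,
and it assumes the NPE stems from several threads racing on the lazy conversion
of a node's children. The conversion sits in a synchronized method that the
lookup enters only while the node is not yet prepared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a trie node; not the actual Nutch class.
class LazyTrieNodeSketch {

  static class Node implements Comparable<Node> {
    final char ch;
    // Children in insertion order; only touched inside synchronized methods.
    private final List<Node> pending = new ArrayList<>();
    // Sorted snapshot used for matching; null means "not prepared yet".
    private volatile Node[] sorted;

    Node(char ch) {
      this.ch = ch;
    }

    // Adding a child invalidates the snapshot so it is rebuilt on demand.
    synchronized Node addChild(char c) {
      for (Node n : pending) {
        if (n.ch == c) {
          return n;
        }
      }
      Node n = new Node(c);
      pending.add(n);
      sorted = null;
      return n;
    }

    // The formerly unsafe block: only one thread at a time may convert.
    private synchronized Node[] prepare() {
      if (sorted == null) {
        Node[] arr = pending.toArray(new Node[0]);
        Arrays.sort(arr);
        sorted = arr;
      }
      return sorted;
    }

    // Fast path: no locking once the node has been prepared.
    Node getChild(char c) {
      Node[] arr = sorted;
      if (arr == null) {
        arr = prepare();
      }
      int lo = 0;
      int hi = arr.length - 1;
      while (lo <= hi) { // binary search over the sorted snapshot
        int mid = (lo + hi) >>> 1;
        if (arr[mid].ch == c) {
          return arr[mid];
        }
        if (arr[mid].ch < c) {
          lo = mid + 1;
        } else {
          hi = mid - 1;
        }
      }
      return null;
    }

    @Override
    public int compareTo(Node other) {
      return Character.compare(ch, other.ch);
    }
  }

  public static void main(String[] args) {
    Node root = new Node('\0');
    root.addChild('b');
    root.addChild('a');
    System.out.println(root.getChild('a') != null);
    root.addChild('c'); // adding after matching triggers a new conversion
    System.out.println(root.getChild('c') != null);
  }
}
```

In this sketch the lookup works on a local copy of the array reference, so a
concurrent addChild() cannot pull the array away in the middle of a search.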

> NPE in TrieStringMatcher
> ------------------------
>
>                 Key: NUTCH-2585
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2585
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.14
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.16
>
>
> Stumbled on this one just now:
> {code}
> 2018-05-25 14:29:31,844 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 42 fetch of http://www.ndcmediagroep.nl/wp-content/uploads/2017/03/Leaflet-Noflik-Wenje.pdf failed with: java.lang.NullPointerException
>       at org.apache.nutch.util.TrieStringMatcher$TrieNode.getChild(TrieStringMatcher.java:107)
>       at org.apache.nutch.util.SuffixStringMatcher.shortestMatch(SuffixStringMatcher.java:74)
>       at org.apache.nutch.urlfilter.suffix.SuffixURLFilter.filter(SuffixURLFilter.java:164)
>       at org.apache.nutch.net.URLFilters.filter(URLFilters.java:43)
>       at org.apache.nutch.fetcher.FetcherThread.handleRedirect(FetcherThread.java:487)
>       at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:404)
> {code}
> Edit (added on 1 May 2019): I got a slightly different stack trace, this time
> using PrefixURLFilter:
> {code}
> 2019-05-01 08:50:07,282 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 38 fetch of https://kanaalstreek.nl/fzh/2018/06/04/vijf-maal-goud-voor-pegasus-op-nk failed with: java.lang.NullPointerException
>       at org.apache.nutch.util.TrieStringMatcher$TrieNode.getChild(TrieStringMatcher.java:107)
>       at org.apache.nutch.util.PrefixStringMatcher.shortestMatch(PrefixStringMatcher.java:79)
>       at org.apache.nutch.urlfilter.prefix.PrefixURLFilter.filter(PrefixURLFilter.java:73)
>       at org.apache.nutch.net.URLFilters.filter(URLFilters.java:43)
>       at org.apache.nutch.fetcher.FetcherThread.handleRedirect(FetcherThread.java:487)
>       at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:404)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
