[
https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051007#comment-18051007
]
Markus Schuch commented on CONNECTORS-1720:
-------------------------------------------
Raised PR https://github.com/apache/manifoldcf/pull/172
> Fatal Error due to NPE in RepriorizationTracker
> -----------------------------------------------
>
> Key: CONNECTORS-1720
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1720
> Project: ManifoldCF
> Issue Type: Bug
> Affects Versions: ManifoldCF 2.22
> Reporter: Maunel Bach
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.23
>
>
> A couple of times a week, our ManifoldCF instance logged a stack trace like
> this:
> {code:bash}
> 2022-07-06T14:04:39,297 FATAL [Set priority thread] org.apache.manifoldcf.crawlerthreads: Error tossed: null
> java.lang.NullPointerException: null
> at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker$PreloadKey.hashCode(ReprioritizationTracker.java:460) ~[mcf-pull-agent.jar:?]
> at java.util.HashMap.hash(HashMap.java:339) ~[?:?]
> at java.util.HashMap.get(HashMap.java:552) ~[?:?]
> at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.addPreloadRequest(ReprioritizationTracker.java:234) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.PriorityCalculator.makePreloadRequest(PriorityCalculator.java:123) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1247) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]{code}
> We tracked it down to a {{hashCode()}} implementation that is not null-safe,
> in the private {{PreloadKey}} class (the class named in the stack trace),
> which consumes the first entry of the array {{binNames}}. The specifics of
> the array are an implementation detail of a connector base class. The array
> may contain {{null}} values, at least for the {{JiraRepositoryConnector}}
> and, more critically, for the {{WebcrawlerConnector}}.
> As a solution, we fixed the {{hashCode()}} implementation in our own code
> base, which did the trick. We should consider fixing this bug in the Apache
> repository as well. Of course, this fix does not improve how the
> {{binName}} itself is extracted and used.
> We stumbled over this bug because the jobs on our ManifoldCF instance got
> stuck. We cannot be sure whether that larger incident and the fatal error
> here are correlated, because this was not the only fatal bug we fixed to get
> ManifoldCF up and running again. However, the fact that the error occurs
> during reprioritization hints at a connection.
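
For illustration, a null-safe key of the kind described in the issue could look roughly like the sketch below. This is not the code from the PR or from ManifoldCF: the class name PreloadKeySketch, the nested Key class, and the field names connectionName and binName are assumptions made only for this example. It shows that a HashMap lookup with a null bin name no longer throws once hashCode() and equals() tolerate null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the key class named in the stack trace.
// Class and field names are assumptions, not ManifoldCF's actual code.
public class PreloadKeySketch {

  static final class Key {
    private final String connectionName;
    private final String binName; // may be null, per the issue description

    Key(String connectionName, String binName) {
      this.connectionName = connectionName;
      this.binName = binName;
    }

    @Override
    public int hashCode() {
      // Objects.hash treats a null element as hash 0 instead of dereferencing it.
      return Objects.hash(connectionName, binName);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Key)) return false;
      Key other = (Key) o;
      // Objects.equals is null-safe on both arguments.
      return Objects.equals(connectionName, other.connectionName)
          && Objects.equals(binName, other.binName);
    }
  }

  public static void main(String[] args) {
    Map<Key, Integer> requests = new HashMap<>();
    requests.put(new Key("jira", null), 1);
    // Looking up a key with a null bin name completes without an NPE.
    Integer hit = requests.get(new Key("jira", null));
    System.out.println(hit);
  }
}
```

As the issue description notes, a change like this only makes the key tolerate null; it does not explain or prevent the null bin name coming from the connector.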
--
This message was sent by Atlassian Jira
(v8.20.10#820010)