Maunel Bach created CONNECTORS-1720:
---------------------------------------
Summary: Fatal Error due to NPE in ReprioritizationTracker
Key: CONNECTORS-1720
URL: https://issues.apache.org/jira/browse/CONNECTORS-1720
Project: ManifoldCF
Issue Type: Bug
Affects Versions: ManifoldCF 2.22
Reporter: Maunel Bach
A couple of times a week, our ManifoldCF instance logged a stack trace like this:
{code:java}
2022-07-06T14:04:39,297 FATAL [Set priority thread] org.apache.manifoldcf.crawlerthreads: Error tossed: null
java.lang.NullPointerException: null
    at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker$PreloadKey.hashCode(ReprioritizationTracker.java:460) ~[mcf-pull-agent.jar:?]
    at java.util.HashMap.hash(HashMap.java:339) ~[?:?]
    at java.util.HashMap.get(HashMap.java:552) ~[?:?]
    at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.addPreloadRequest(ReprioritizationTracker.java:234) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.PriorityCalculator.makePreloadRequest(PriorityCalculator.java:123) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1247) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]
{code}
We tracked it down to a {{hashCode()}} implementation in the private
{{PreloadRequest}} class that is not null-safe: it consumes the first entry of
the {{binNames}} array without checking it for {{null}}. The contents of that
array are an implementation detail of a connector base class, and the array can
potentially contain {{null}} values, at least for the {{JiraRepositoryConnector}}
and, more critically, for the {{WebcrawlerConnector}}.
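The failure mode can be reproduced in isolation. The following is a minimal sketch, not ManifoldCF's code: {{NpeRepro}}, {{UnsafeKey}}, its {{binName}}/{{connectionName}} fields and the {{"web"}} connection name are all hypothetical. Because {{HashMap.get}} hashes the lookup key before comparing entries, a key whose {{hashCode()}} dereferences a {{null}} field throws exactly at the {{HashMap.hash}} frame seen in the stack trace above.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class NpeRepro {
  // Hypothetical stand-in for a preload key; NOT ManifoldCF's actual class.
  static final class UnsafeKey {
    final String binName;        // may be null, e.g. if taken from binNames[0]
    final String connectionName;

    UnsafeKey(String binName, String connectionName) {
      this.binName = binName;
      this.connectionName = connectionName;
    }

    @Override
    public int hashCode() {
      // Not null-safe: binName.hashCode() throws NullPointerException when binName is null.
      return binName.hashCode() + connectionName.hashCode();
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof UnsafeKey)) return false;
      UnsafeKey other = (UnsafeKey) o;
      return Objects.equals(binName, other.binName)
          && Objects.equals(connectionName, other.connectionName);
    }
  }

  // Returns true if looking up a key built from binNames[0] throws an NPE.
  static boolean lookupThrows(String[] binNames) {
    Map<UnsafeKey, Integer> preloads = new HashMap<>();
    // Pre-populate so get() has a table to probe and therefore hashes the lookup key.
    preloads.put(new UnsafeKey("other.example", "web"), 1);
    try {
      preloads.get(new UnsafeKey(binNames[0], "web")); // HashMap.get -> hash -> hashCode
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(lookupThrows(new String[] {"example.com"})); // false
    System.out.println(lookupThrows(new String[] {null}));          // true
  }
}
{code}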
As a solution, we fixed the {{hashCode()}} implementation in our own code base,
which did the trick. We should consider fixing this bug in the Apache repository
as well. Of course, the fix does not improve how the {{binName}} itself is
extracted and used.
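For illustration, a null-safe key might look like the following sketch. It is not the patch applied in our code base (that patch is not included in this issue); {{SafeKeyDemo}}, {{SafeKey}} and its two fields are hypothetical. {{java.util.Objects.hash}} lets each {{null}} element contribute 0 instead of throwing, and {{Objects.equals}} treats two {{null}} values as equal, so {{equals()}} stays consistent with {{hashCode()}}.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SafeKeyDemo {
  // Hypothetical null-safe key; field names are assumptions, not ManifoldCF's code.
  static final class SafeKey {
    final String binName;        // may legitimately be null for some connectors
    final String connectionName;

    SafeKey(String binName, String connectionName) {
      this.binName = binName;
      this.connectionName = connectionName;
    }

    @Override
    public int hashCode() {
      // Objects.hash tolerates null elements (each null contributes 0).
      return Objects.hash(binName, connectionName);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof SafeKey)) return false;
      SafeKey other = (SafeKey) o;
      // Null-safe comparison keeps equals() consistent with hashCode().
      return Objects.equals(binName, other.binName)
          && Objects.equals(connectionName, other.connectionName);
    }
  }

  public static void main(String[] args) {
    Map<SafeKey, Integer> preloads = new HashMap<>();
    preloads.put(new SafeKey("example.com", "web"), 1);
    preloads.put(new SafeKey(null, "web"), 2);
    // A lookup with a null bin name no longer throws and finds the matching entry.
    System.out.println(preloads.get(new SafeKey(null, "web"))); // 2
  }
}
{code}

Making {{equals()}} null-safe as well matters: once hashing succeeds, {{HashMap}} calls {{equals()}} on keys that land in the same bucket, so fixing only {{hashCode()}} could move the NPE rather than remove it.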
We stumbled over this bug because the jobs on our ManifoldCF instance got
stuck. However, we cannot be sure whether this larger incident was caused by the
fatal error described here, because it was not the only fatal bug we had to fix
to get ManifoldCF up and running again. Still, the fact that the error occurs
during reprioritization hints at a connection.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)