[jira] [Resolved] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker
[ https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1720.
-------------------------------------
    Resolution: Fixed

r1902717

> Fatal Error due to NPE in RepriorizationTracker
> -----------------------------------------------
>
>                 Key: CONNECTORS-1720
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1720
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 2.22
>            Reporter: Maunel Bach
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.23
>
>
> A couple of times a week our ManifoldCF instance logged a stack trace like this:
> {code:bash}
> 2022-07-06T14:04:39,297 FATAL [Set priority thread] org.apache.manifoldcf.crawlerthreads: Error tossed: null
> java.lang.NullPointerException: null
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker$PreloadKey.hashCode(ReprioritizationTracker.java:460) ~[mcf-pull-agent.jar:?]
>     at java.util.HashMap.hash(HashMap.java:339) ~[?:?]
>     at java.util.HashMap.get(HashMap.java:552) ~[?:?]
>     at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.addPreloadRequest(ReprioritizationTracker.java:234) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.system.PriorityCalculator.makePreloadRequest(PriorityCalculator.java:123) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1247) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]
> {code}
> We tracked it down to a {{hashCode()}} implementation in the private {{PreloadKey}} class that is not null-safe and consumes the first entry of the {{binNames}} array. The contents of the array are an implementation detail of a connector base class. The array can potentially contain {{null}} values, at least for the {{JiraRepositoryConnector}} and, more critically, for the {{WebcrawlerConnector}}.
> As a workaround we fixed the {{hashCode()}} implementation in our own code base, which did the trick. We should consider fixing this bug in the Apache repository as well. Of course, that fix does not improve how the {{binName}} itself is extracted and used.
> We stumbled over this bug because the jobs on our ManifoldCF instance got stuck. We cannot be sure whether that larger incident was related to this fatal error, because it was not the only fatal bug we fixed to get ManifoldCF up and running again, but the "reprioritization" context hints at a connection.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
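A minimal sketch of the kind of null-safe {{hashCode()}} the reporter describes, assuming a key made of two String fields. The class name NullSafePreloadKey and the fields connectorClass and binName are hypothetical and not taken from the ManifoldCF source; the point is only that java.util.Objects.hash and Objects.equals tolerate null, so a null bin name no longer throws inside HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a preload key; field names are assumptions. */
final class NullSafePreloadKey {
  final String connectorClass;
  final String binName;

  NullSafePreloadKey(String connectorClass, String binName) {
    this.connectorClass = connectorClass;
    this.binName = binName;
  }

  @Override
  public int hashCode() {
    // Objects.hash accepts null elements, so a null binName cannot cause an NPE here.
    return Objects.hash(connectorClass, binName);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof NullSafePreloadKey)) return false;
    NullSafePreloadKey other = (NullSafePreloadKey) o;
    // Objects.equals treats two nulls as equal, keeping equals consistent with hashCode.
    return Objects.equals(connectorClass, other.connectorClass)
        && Objects.equals(binName, other.binName);
  }
}

public class PreloadKeyDemo {
  public static void main(String[] args) {
    Map<NullSafePreloadKey, Integer> preload = new HashMap<>();
    // A null bin name in the key is the case that used to fail in HashMap.hash().
    preload.put(new NullSafePreloadKey("org.example.SomeConnector", null), 1);
    System.out.println(preload.get(new NullSafePreloadKey("org.example.SomeConnector", null)));
  }
}
```

This only stops the crash; as Karl notes in the following comment, a null bin name still indicates that the connector produced something unexpected.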
[jira] [Commented] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker
[ https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566747#comment-17566747 ]

Karl Wright commented on CONNECTORS-1720:
-----------------------------------------

[~mil7], ok, now it is clearer what the failure mode is. The binNames[] array comes from the connector. The connector is supposed to ensure that the bin names make sense in the context of throttling. A binName that is null or empty is not expected, because it basically means the connector is doing something strange. The WebConnector, which you referenced, uses the domain name as a bin name, so any URL that doesn't have a domain name isn't in fact a valid URL. I'd be very interested to see how that could happen given this.
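To illustrate Karl's point that a host-based bin name should never be missing, here is a hypothetical helper (HostBinNames.binNameFor is an assumption, not WebcrawlerConnector code) that derives a bin name from java.net.URI.getHost(). It rejects URLs without a host up front, so the problem would surface at the connector rather than as an NPE in the reprioritization tracker. Note that getHost() returns null for opaque URIs such as mailto: links, which is one way a naive host-based derivation could yield null.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostBinNames {
  /** Hypothetical: returns the URL's host as the bin name, or throws if there is none. */
  static String binNameFor(String url) {
    final URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Unparseable URL: " + url, e);
    }
    // getHost() is null for opaque URIs (e.g. "mailto:...") and for
    // authorities that java.net.URI cannot parse as a server-based host.
    String host = uri.getHost();
    if (host == null || host.isEmpty()) {
      throw new IllegalArgumentException("URL has no host, so no bin name: " + url);
    }
    return host;
  }

  public static void main(String[] args) {
    System.out.println(binNameFor("https://example.org/page")); // prints example.org
    try {
      binNameFor("mailto:someone@example.org");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast here is a design choice for the sketch; a real connector might instead skip or log such documents.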
[jira] [Reopened] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker
[ https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maunel Bach reopened CONNECTORS-1720:
-------------------------------------

Hi there,

unfortunately my description was incomplete. Fixing the fatal error gives way to a "proper" exception:

{code:bash}
2022-07-12T14:00:34,044 ERROR [Set priority thread] org.apache.manifoldcf.crawlerthreads: Set priority thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (07004): (conn=278783) Parameter at position 2 is not set
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (07004): (conn=278783) Parameter at position 2 is not set
    at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:918) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValues(BinManager.java:152) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValuesInTransaction(BinManager.java:205) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.preloadBinValues(ReprioritizationTracker.java:254) ~[classes/:?]
    at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1256) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]
Caused by: java.sql.SQLTransientConnectionException: (conn=278783) Parameter at position 2 is not set
    at [...]
{code}

In the end we were forced to change every assignment of binNames in {{PriorityCalculator.java}} to:

{code:java}
// Substitute an empty bin name for a null entry
String binName = binNames[i] == null ? "" : binNames[i];
{code}

Sorry for the hassle.
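The reporter's workaround repeats the null check at every assignment in {{PriorityCalculator.java}}. A minimal sketch of the same substitution applied once per array, so that the preload map key and the SQL parameter bound in BinManager (presumably the unset "position 2") receive the same non-null value. BinNameNormalizer and normalize are hypothetical names, and this is not taken from the r1902717 fix.

```java
import java.util.Arrays;

public class BinNameNormalizer {
  /** Hypothetical helper: returns a copy of binNames with every null replaced by "". */
  static String[] normalize(String[] binNames) {
    String[] result = new String[binNames.length];
    for (int i = 0; i < binNames.length; i++) {
      // Same substitution as the reporter's one-liner: null becomes the empty bin.
      result[i] = binNames[i] == null ? "" : binNames[i];
    }
    return result;
  }

  public static void main(String[] args) {
    String[] raw = {"example.org", null};
    System.out.println(Arrays.toString(normalize(raw))); // prints [example.org, ]
  }
}
```

Karl's comment treats empty bin names as unexpected as well, so this fallback only keeps the Set priority thread alive; the connector that emits null remains the root cause.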