[jira] [Resolved] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker

2022-07-14 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1720.
-
Resolution: Fixed

r1902717


> Fatal Error due to NPE in RepriorizationTracker
> ---
>
>              Key: CONNECTORS-1720
>              URL: https://issues.apache.org/jira/browse/CONNECTORS-1720
>          Project: ManifoldCF
>       Issue Type: Bug
> Affects Versions: ManifoldCF 2.22
>         Reporter: Maunel Bach
>         Assignee: Karl Wright
>         Priority: Major
>          Fix For: ManifoldCF 2.23
>
>
> A couple of times a week our ManifoldCF instance logs a stack trace like 
> this:
> {code:bash}
> 2022-07-06T14:04:39,297 FATAL [Set priority thread] org.apache.manifoldcf.crawlerthreads: Error tossed: null
> java.lang.NullPointerException: null
>   at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker$PreloadKey.hashCode(ReprioritizationTracker.java:460) ~[mcf-pull-agent.jar:?]
>   at java.util.HashMap.hash(HashMap.java:339) ~[?:?]
>   at java.util.HashMap.get(HashMap.java:552) ~[?:?]
>   at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.addPreloadRequest(ReprioritizationTracker.java:234) ~[mcf-pull-agent.jar:?]
>   at org.apache.manifoldcf.crawler.system.PriorityCalculator.makePreloadRequest(PriorityCalculator.java:123) ~[mcf-pull-agent.jar:?]
>   at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1247) ~[mcf-pull-agent.jar:?]
>   at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]
> {code}
> We tracked it down to a {{hashCode()}} implementation in the private 
> {{PreloadKey}} class that is not null-safe: it consumes the first entry of 
> the array {{binNames}}. The contents of that array are an implementation 
> detail of a connector base class. The array might contain {{null}} values, 
> at least for the {{JiraRepositoryConnector}} and, more critically, for the 
> {{WebcrawlerConnector}}.
> As a workaround we fixed the {{hashCode()}} implementation in our own code 
> base, which did the trick. We should consider fixing this bug in the Apache 
> repository as well. Of course, that fix does not improve the extraction and 
> usage of the {{binName}} itself.
> We stumbled over this bug because the jobs on our ManifoldCF instance got 
> stuck. We cannot be sure that the larger incident was caused by this fatal 
> error, because it was not the only fatal bug we fixed to get ManifoldCF up 
> and running again, but the "reprioritization" context suggests a connection.
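
The description above attributes the NPE to a {{hashCode()}} that dereferences a possibly-null bin name while the key is looked up in a HashMap. The following is a minimal sketch of a null-safe key, not the actual ManifoldCF code: the class name BinKey, its two fields, and the demo class are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a map key that wraps a bin name.
// This is NOT the ManifoldCF PreloadKey class; names and fields are assumed.
final class BinKey {
  private final String connectorClass;
  private final String binName; // may be null if the connector supplies none

  BinKey(String connectorClass, String binName) {
    this.connectorClass = connectorClass;
    this.binName = binName;
  }

  @Override
  public int hashCode() {
    // Objects.hash tolerates null components, unlike binName.hashCode().
    return Objects.hash(connectorClass, binName);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof BinKey)) return false;
    BinKey other = (BinKey) o;
    // Objects.equals is null-safe on both arguments.
    return Objects.equals(connectorClass, other.connectorClass)
        && Objects.equals(binName, other.binName);
  }
}

class BinKeyDemo {
  public static void main(String[] args) {
    Map<BinKey, Double> preload = new HashMap<>();
    // A null bin name no longer throws during HashMap.put or HashMap.get.
    preload.put(new BinKey("web", null), 1.0);
    System.out.println(preload.get(new BinKey("web", null)));
  }
}
```

As the reporter notes, a guard of this kind only keeps the map lookup from failing; it does not address why a null bin name reaches the tracker in the first place.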



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker

2022-07-14 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566747#comment-17566747
 ] 

Karl Wright commented on CONNECTORS-1720:
-

[~mil7], ok, now it is clearer what the failure mode is.

The binNames[] array comes from the connector.  The connector is supposed to 
ensure that the bin names make sense in the context of throttling.  A binName 
that is null or empty is not expected because it basically means the connector 
is doing something strange.  The WebConnector, which you referenced, uses the 
domain name as a bin name, so any URL that doesn't have a domain name isn't in 
fact a valid URL.  I'd be very interested to see how that could happen given 
this.
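
To make the "domain name as bin name" idea concrete, here is a small sketch using only java.net.URI from the JDK. It is not taken from the web connector: the helper name binNameFor and the sample URLs are assumptions, and the thread does not establish whether the connector derives its bin names this way or whether any of these inputs occurred in practice.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class HostBinSketch {
  // Hypothetical helper: derive a throttling bin name from a URL's host.
  // Returns null when java.net.URI reports no host or the string does not parse.
  static String binNameFor(String url) {
    try {
      String host = new URI(url).getHost();
      return host == null ? null : host.toLowerCase(Locale.ROOT);
    } catch (URISyntaxException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String[] samples = {
      "https://example.org/a",   // hierarchical URL with a host
      "mailto:user@example.org", // opaque URI, so getHost() is null
      "/relative/path"           // relative reference, so getHost() is null
    };
    for (String u : samples) {
      System.out.println(u + " -> " + binNameFor(u));
    }
  }
}
```

In this sketch a missing host surfaces as a null bin name, which is the kind of value the comment above says a connector is not expected to hand over.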




[jira] [Reopened] (CONNECTORS-1720) Fatal Error due to NPE in RepriorizationTracker

2022-07-14 Thread Maunel Bach (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maunel Bach reopened CONNECTORS-1720:
-

Hi there,
unfortunately my description was incomplete. Fixing the fatal error 
only gives way to a "proper" exception:
{code:bash}
2022-07-12T14:00:34,044 ERROR [Set priority thread] org.apache.manifoldcf.crawlerthreads: Set priority thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (07004): (conn=278783) Parameter at position 2 is not set
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (07004): (conn=278783) Parameter at position 2 is not set
  at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:918) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221) ~[mcf-core.jar:?]
  at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValues(BinManager.java:152) ~[mcf-pull-agent.jar:?]
  at org.apache.manifoldcf.crawler.bins.BinManager.getIncrementBinValuesInTransaction(BinManager.java:205) ~[mcf-pull-agent.jar:?]
  at org.apache.manifoldcf.crawler.reprioritizationtracker.ReprioritizationTracker.preloadBinValues(ReprioritizationTracker.java:254) ~[classes/:?]
  at org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1256) ~[mcf-pull-agent.jar:?]
  at org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) ~[mcf-pull-agent.jar:?]
Caused by: java.sql.SQLTransientConnectionException: (conn=278783) Parameter at position 2 is not set
  at
[...]
{code}

In the end we were forced to change every assignment from {{binNames}} in 
{{PriorityCalculator.java}} to
{code:java}
String binName = binNames[i]==null? "" : binNames[i];
{code}

Sorry for the hassle.
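
The one-line guard above can also be applied once to the whole array rather than at every assignment. The following is only a sketch of that idea: the class BinNameNormalizer and its normalize method are hypothetical and do not exist in ManifoldCF, and mapping null to the empty string simply mirrors the workaround described in this message.

```java
import java.util.Arrays;

class BinNameNormalizer {
  // Hypothetical helper: returns a copy of binNames in which every null
  // entry is replaced by the empty string; the input array is not modified.
  static String[] normalize(String[] binNames) {
    String[] result = new String[binNames.length];
    for (int i = 0; i < binNames.length; i++) {
      result[i] = binNames[i] == null ? "" : binNames[i];
    }
    return result;
  }

  public static void main(String[] args) {
    String[] fromConnector = {"example.org", null};
    System.out.println(Arrays.toString(normalize(fromConnector)));
  }
}
```

Normalizing in one place keeps downstream code, such as map keys and SQL parameters, from each needing its own null check, at the cost of treating a missing bin name and an empty one as the same bin.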
