[jira] [Commented] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2019-05-13 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838490#comment-16838490
 ] 

roel goovaerts commented on CONNECTORS-1519:


Hi [~kwri...@metacarta.com],

No problem, my question was purely informational. Based on your answer, I can 
check whether our organisation has the resources to contribute this fix.

Thanks

> CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y
> ---
>
> Key: CONNECTORS-1519
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1519
> Project: ManifoldCF
> Issue Type: Bug
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Steph van Schalkwyk
> Assignee: Steph van Schalkwyk
> Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> Investigating CLIENTPROTOCOLEXCEPTION when using 2.10 with ES 6.x.y
> More information to follow.
> Fails when using security, i.e. http://user:password@elasticsearch:9200.
> Remedy:
>  # Disable x-pack security.
>  # Use http://elasticsearch:9200.
>
> |07-27-2018 17:53:19.010|Indexation (ES)|file:/var/manifoldcf/corpus/14.html|CLIENTPROTOCOLEXCEPTION|38053|23|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2019-05-13 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838365#comment-16838365
 ] 

roel goovaerts commented on CONNECTORS-1519:


[~kwri...@metacarta.com]

Since this bug is fairly important to us, I was wondering whether there is an 
estimated release date for 2.14?

Cheers,

Roel



[jira] [Commented] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2019-04-29 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829132#comment-16829132
 ] 

roel goovaerts commented on CONNECTORS-1519:


Hi, I think I'm experiencing this bug:

- We are running ES 6.5 and ManifoldCF 2.12.
- Before we set up authorization in ES, crawling the site in question went fine.
- Recently we restricted the default 'anonymous' user, which we had been using, 
and added a manifold user with the 'superuser' role.
- After entering this user in the username and password fields on the output 
connector's 'Server' tab, we get a "serviceinterruption" in the process phase 
and a "clientprotocolexception" in the indexation phase.
- When we keep user authentication enabled in Elasticsearch but remove that 
user from the output connector configuration, we get an HTTP error carrying the 
actual Elasticsearch message "security_exception". That is expected, and it 
suggests to me that the issue lies in passing the username and password to the 
output connector.

Could I have an update on what the state of this ticket is?
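
One way to check the credentials independently of ManifoldCF is to send them to Elasticsearch in an explicit Authorization header rather than embedding them in the URL. The following is only a minimal sketch under assumptions: the host name, user name and password are hypothetical placeholders, and it says nothing about how the output connector itself builds its requests.

```python
import base64
import urllib.request


def build_es_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying HTTP Basic credentials."""
    # HTTP Basic auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical values, for illustration only.
request = build_es_request("http://elasticsearch:9200/", "manifold", "secret")
print(request.get_header("Authorization"))

# To actually send the request against a live cluster:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

If this request succeeds while the connector still fails, that would point at the connector's handling of the credentials rather than at the Elasticsearch security configuration.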

> CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y
> ---
>
> Key: CONNECTORS-1519
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1519
> Project: ManifoldCF
> Issue Type: Bug
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Steph van Schalkwyk
> Assignee: Steph van Schalkwyk
> Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Investigating CLIENTPROTOCOLEXCEPTION when using 2.10 with ES 6.x.y
> More information to follow.
> Fails when using security, i.e. http://user:password@elasticsearch:9200.
> Remedy:
>  # Disable x-pack security.
>  # Use http://elasticsearch:9200.
>
> |07-27-2018 17:53:19.010|Indexation (ES)|file:/var/manifoldcf/corpus/14.html|CLIENTPROTOCOLEXCEPTION|38053|23|





[jira] [Created] (CONNECTORS-1601) selector/regexp for which http error responses should/shouldn't be deleted

2019-04-24 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1601:
--

 Summary: selector/regexp for which http error responses 
should/shouldn't be deleted
 Key: CONNECTORS-1601
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1601
 Project: ManifoldCF
  Issue Type: New Feature
  Components: Web connector
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


Would it be feasible to support a feature that lets the user control whether a 
document is deleted in response to specific HTTP error codes?

In our use case we would be fine with deleting documents on a 404 response, but 
we would need to keep them on 5xx and 401 responses. As far as I was able to 
see in the UI and the source code, there is currently no setting that makes 
this distinction.
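
To make the request concrete, the behaviour we have in mind could be sketched roughly as follows. This is purely illustrative: the pattern names and the "keep"/"delete"/"default" actions are hypothetical and do not correspond to any existing ManifoldCF setting or API.

```python
import re

# Hypothetical selectors, matched against the status code written as text.
KEEP_PATTERN = re.compile(r"401|5\d\d")
DELETE_PATTERN = re.compile(r"404")


def action_for_status(status: int) -> str:
    """Return the action a crawler would take for a given HTTP status code."""
    code = str(status)
    if KEEP_PATTERN.fullmatch(code):
        return "keep"
    if DELETE_PATTERN.fullmatch(code):
        return "delete"
    # Anything not covered by a selector falls back to the existing behaviour.
    return "default"


for status in (404, 401, 503, 200):
    print(status, action_for_status(status))
```

Checking the "keep" selector first means a code matched by both patterns is never deleted, which seems the safer failure mode for our use case.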





[jira] [Commented] (CONNECTORS-1599) response code 401 still gets deleted with the setting "keep unreachable documents"

2019-04-18 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820875#comment-16820875
 ] 

roel goovaerts commented on CONNECTORS-1599:


Hi [~kwri...@metacarta.com]

Is there a resource that documents how ManifoldCF responds to the different 
HTTP status codes?

Regards,
Roel

> response code 401 still gets deleted with the setting "keep unreachable 
> documents"
> --
>
> Key: CONNECTORS-1599
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1599
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 2.12
> Reporter: roel goovaerts
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Even with the "Hop count mode" set to "keep unreachable documents, for now" 
> or "keep unreachable documents, forever", ManifoldCF deletes documents for 
> which it receives a 401 response code.
> The documentation does not mention such a distinction. Is there some 
> information or configuration that I'm missing? Is there a reason why a 401 
> always leads to deletion?
> Ideally, for our use case, we would want to remove all documents that return 
> 404, but keep everything that fails because the server is not responding or 
> the crawler is unauthenticated.
> Is there a way to configure this in a more granular fashion?
> Regards,
> roel





[jira] [Commented] (CONNECTORS-1599) response code 401 still gets deleted with the setting "keep unreachable documents"

2019-04-11 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815471#comment-16815471
 ] 

roel goovaerts commented on CONNECTORS-1599:


Fair enough, so the hop count mode applies purely to documents that are 
"unreachable" in terms of link paths recorded in the intrinsiclink table.
Is there a resource that documents how ManifoldCF responds to the different 
HTTP status codes?
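
To spell out my understanding of "unreachable" in this sense: a document is unreachable when no chain of links leads to it from a seed, regardless of which HTTP status it returns. A minimal sketch of that idea follows. It is not ManifoldCF's implementation, and the dictionary standing in for the intrinsiclink table plus all document names are hypothetical.

```python
from collections import deque


def hop_counts(links: dict, seeds: list) -> dict:
    """Breadth-first search: minimum number of link hops from any seed."""
    counts = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for target in links.get(current, ()):
            if target not in counts:
                counts[target] = counts[current] + 1
                queue.append(target)
    return counts


# Hypothetical link graph: parent document -> documents it links to.
links = {"seed": ["a", "b"], "a": ["c"], "orphan": ["c"]}
all_documents = {"seed", "a", "b", "c", "orphan"}

counts = hop_counts(links, ["seed"])
unreachable = all_documents - counts.keys()
print(counts)
print(unreachable)
```

Here "orphan" is unreachable because nothing links to it, which is a property of the link graph and independent of any response code.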



[jira] [Created] (CONNECTORS-1599) response code 401 still gets deleted with the setting "keep unreachable documents"

2019-04-11 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1599:
--

 Summary: response code 401 still gets deleted with the setting 
"keep unreachable documents"
 Key: CONNECTORS-1599
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1599
 Project: ManifoldCF
  Issue Type: Bug
  Components: Web connector
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


Even with the "Hop count mode" set to "keep unreachable documents, for now" or 
"keep unreachable documents, forever", ManifoldCF deletes documents for which 
it receives a 401 response code.

The documentation does not mention such a distinction. Is there some 
information or configuration that I'm missing? Is there a reason why a 401 
always leads to deletion?

Ideally, for our use case, we would want to remove all documents that return 
404, but keep everything that fails because the server is not responding or the 
crawler is unauthenticated.

Is there a way to configure this in a more granular fashion?

Regards,
roel





[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-10 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814313#comment-16814313
 ] 

roel goovaerts commented on CONNECTORS-1592:


The hop count mode was left at this setting because of our requirements. 
But I think I follow you now; I had interpreted it as disabling the whole 
'tab'.
If I understand correctly, by "disabling hop count filtering" you mean 
setting it to "keep unreachable documents, forever"?

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
> Issue Type: Bug
> Affects Versions: ManifoldCF 2.12
> Reporter: Subasini Rath
> Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt, 
> SELECT_blocked_queries.txt, postgresql.conf, properties.xml
>
>
> Hi Karl,
> I am also facing the issue mentioned above (similar to Connector-880).
> I am using the ManifoldCF 2.12 binary distribution with the Solr output 
> connector and a Web repository connection. ManifoldCF is using the default 
> configuration throughout.
> When I run the jobs manually, they run fine. The same jobs are scheduled to 
> run every day.
> I get the exceptions below, and the job hangs or goes into a waiting state.
> Could you please help me resolve this?
> I am getting the error below:
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 
> 0: 'S'
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
> (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
> query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
>  WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running 
> query (2496641 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
> isDistinctSelect=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
> PUBLIC.JOBS.ID not nullable
>  WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running 
> query (2435908 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
> [index=SYS_IDX_13329
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: 

[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-10 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814287#comment-16814287
 ] 

roel goovaerts commented on CONNECTORS-1592:


Hi Karl,
 
I have not yet seen any "very long-running" queries. Looking at the logs 
(a batch of long-running queries was logged about an hour ago), there is no 
extreme maximum for the time spent on a query: the longest took 223673 ms and 
the shortest 172416 ms, with the others distributed between these extremes. 
From this I conclude that this is not really the issue.
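
For reference, the shortest and longest durations can be pulled out of the log with a small script along these lines. It is a sketch that assumes each warning sits on a single line in the log file; the two sample lines are taken from the issue description, not from our own logs.

```python
import re

# Matches the duration in warnings such as "Found a long-running query (2706114 ms)".
DURATION = re.compile(r"Found a long-running query \((\d+) ms\)")


def query_durations(log_text: str) -> list:
    """Return every long-running query duration (in ms) found in the log text."""
    return [int(match.group(1)) for match in DURATION.finditer(log_text)]


sample_log = (
    "WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query "
    "(2706114 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,"
    "t0.errortext FROM jobs t0 ORDER BY description ASC]\n"
    "WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query "
    "(2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]\n"
)

durations = query_durations(sample_log)
print(min(durations), max(durations))
```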
 
I of course understand that committing to a conference call is not 
straightforward; thanks for considering it.

Just one more question about what you said on hop count filtering: in the 
"Hop Filters" tab we have nothing configured except that "hop count mode" is 
set to "delete unreachable", which I had taken to be the default. Is it correct 
that this is the default, and is there something else we could do to disable 
hop count filtering?

We will continue to look for other possible external influences.
It is possible that the postgres settings were automatically reverted to the 
defaults (which would mean autovacuum is on), so we are looking into this now.
Thanks again for the info and the quick replies.
 
Regards,
Roel


[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-09 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813384#comment-16813384
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/9/19 1:17 PM:


Hi Karl,

We've had some time to analyze and debug in detail. First of all we ran the 
databasemaintenance script when manifold was shut down; after a restart and 2 
hours of crawling it started to log long-running queries again.

While monitoring, we noticed that queries are frequently followed by locks; 
normally these locks are resolved quickly (as you would expect). But every once 
in a while the locks start stacking up until postgres is using 100% of CPU 
while being shown as idle, and manifold is idle as well. After a while things 
pick up again and manifold starts logging long-running queries.

SELECT_blocked_queries.txt (attached to this ticket) contains an incomplete 
list of queries that are blocked by another process. This list was captured 
while we were monitoring one of these inactive periods.

When looking into the tables created in postgres we saw that jobID is a primary 
key of the jobs table, and this is a foreign key for the intrinsiclink-table 
and jobqueue-table. There are 21 entries in the job-table and 1400+ entries in 
the jobQueue-table.

From our analysis we have two hypotheses:
 - One 'root' query doesn't get committed; this keeps a lock on the jobs, 
intrinsiclink or jobqueue table and cascades into the bulk of locked queries. 
The main question here is how one query could get stuck: can a query be waiting 
for something from manifold before it is committed?
 - There is a locking conflict arising from the jobID being a foreign key 
constraint for both the jobqueue and intrinsiclink tables. From debugging we 
have the impression that postgres locks the whole intrinsiclink table for a 
query that specifies one particular jobID.

Your input on this issue would be appreciated. 
Based on the "performance tuning" and "building ManifoldCF" resources we have 
verified that our properties are within the correct database limits. The only 
thing we were wondering, concerning the formula 'manifoldcf_db_pool_size * 
number_of_manifoldcf_processes <= maximum_postgresql_database_handles - 2', is 
whether manifoldcf_db_pool_size is the same as postgres.max_connections?
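
For clarity, this is how we currently read the formula, written as a small check. It is only a sketch: the numbers are made up, and treating the pool size and the PostgreSQL handle limit as two separate inputs is our assumption, which is exactly what the question above is about.

```python
def pool_fits(manifoldcf_db_pool_size: int,
              number_of_manifoldcf_processes: int,
              maximum_postgresql_database_handles: int) -> bool:
    """Evaluate the inequality quoted from the documentation."""
    needed = manifoldcf_db_pool_size * number_of_manifoldcf_processes
    # Two handles are held back on the PostgreSQL side, per the formula.
    return needed <= maximum_postgresql_database_handles - 2


# Hypothetical values, for illustration only.
print(pool_fits(50, 1, 100))
print(pool_fits(50, 2, 100))
```

With these made-up numbers, one process with a pool of 50 fits under a limit of 100 handles, while two such processes (100 handles needed, 98 available) do not.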

(I have also attached properties.xml and postgresql.conf, should they be 
needed.)

Some additional questions:
 - Could the multi-process functionality of 
org.apache.manifoldcf.usejettyparentclassloader help with this issue?
 - I have read that disabling swap can be good for intensive DB interactions; 
do you have experience with disabling swap improving manifold?
 - Would it be possible to set up a conference call with someone from the 
manifold team?

Many thanks for your time.



[jira] [Updated] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-09 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1592:
---
Attachment: (was: properties.xml)

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
> Issue Type: Bug
> Affects Versions: ManifoldCF 2.12
> Reporter: Subasini Rath
> Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt, 
> SELECT_blocked_queries.txt, postgresql.conf
>
>
> Hi Karl,
> I am also facing the issue mentioned above (similar to Connector-880).
> I am using the ManifoldCF 2.12 binary distribution with the Solr output 
> connector and a Web repository connection. ManifoldCF is using the default 
> configuration throughout.
> When I run the jobs manually, they run fine. The same jobs are scheduled to 
> run every day.
> I get the exceptions below, and the job hangs or goes into a waiting state.
> Could you please help me resolve this?
> I am getting the error below:
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 
> 0: 'S'
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
> (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
> query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
>  WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running 
> query (2496641 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
> isDistinctSelect=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
> PUBLIC.JOBS.ID not nullable
>  WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running 
> query (2435908 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
> [index=SYS_IDX_13329
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX PRED
>  WARN 

[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-09 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813384#comment-16813384
 ] 

roel goovaerts commented on CONNECTORS-1592:


Hi Karl, 

We've had some time to analyze and debug in detail. First of all we ran the 
databasemaintenance script when manifold was shut down; after a restart and 2 
hours of crawling it started to log long-running queries again.

While monitoring, we noticed that queries are frequently followed by locks; 
normally these locks are resolved quickly (as you would expect). But every once 
in a while the locks start stacking up until postgres is using 100% of CPU 
while being shown as idle, and manifold is idle as well. After a while things 
pick up again and manifold starts logging long-running queries.

SELECT_blocked_queries.txt contains an incomplete list of queries that are 
blocked by another process. This list was captured while we were monitoring 
one of these inactive periods.

When looking into the tables created in postgres we saw that jobID is a primary 
key of the jobs table, and this is a foreign key for the intrinsiclink-table 
and jobqueue-table. There are 21 entries in the job-table and 1400+ entries in 
the jobQueue-table.

From our analysis we have some hypotheses:
- one 'root' query doesn't get committed; this keeps a lock on the job-, 
intrinsiclink- or jobQueue-table and cascades into the bulk of locked queries. 
The main question here is how one query could get stuck: can a query be waiting 
for something from manifold before it is committed?
- there is a locking conflict that arises from the jobID being a foreign key 
constraint for both the jobQueue and intrinsiclinks. From debugging we have the 
impression that postgres locks the whole intrinsiclink-table in a query which 
is specified to have one specific jobId.
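
To test the first hypothesis, the blocked queries could be grouped under the 
session that ultimately blocks them. The Python sketch below is only an 
illustration under assumptions: it takes (blocked_pid, blocking_pid) pairs as 
input, which might for example be collected from PostgreSQL via 
pg_blocking_pids() (available from 9.6), and the pids in the example are made 
up, not taken from SELECT_blocked_queries.txt.

```python
from collections import defaultdict


def root_blockers(waits):
    """Group waiting sessions under the session at the head of each chain.

    waits: iterable of (blocked_pid, blocking_pid) pairs.
    Returns {root_pid: set of pids waiting on it directly or indirectly},
    where a root is a session that blocks others but is not blocked itself.
    """
    blocked_by = defaultdict(set)  # blocking pid -> pids it blocks directly
    blocked = set()                # every pid that waits on something
    for blocked_pid, blocking_pid in waits:
        blocked_by[blocking_pid].add(blocked_pid)
        blocked.add(blocked_pid)

    result = {}
    for root in [pid for pid in blocked_by if pid not in blocked]:
        seen = set()
        stack = [root]
        while stack:
            # Walk the wait-for chain downwards from the current session.
            for child in blocked_by.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        result[root] = seen
    return result


if __name__ == "__main__":
    # Hypothetical pids: 100 blocks 101 and 102, 101 in turn blocks 103.
    example = [(101, 100), (102, 100), (103, 101), (201, 200)]
    for root, waiting in root_blockers(example).items():
        print(root, sorted(waiting))
```

A single root with a large set of waiters would point towards one uncommitted 
transaction; many small independent chains would fit that picture less well. 
Pure cycles have no root and are not reported by this sketch.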

Your input on this issue would be appreciated. 
Based on the "performance tuning" and "building ManifoldCF" resources we have 
verified that our properties are within the correct database limits. The only 
thing we were wondering, concerning the formula 'manifoldcf_db_pool_size * 
number_of_manifoldcf_processes <= maximum_postgresql_database_handles - 2', is 
whether manifoldcf_db_pool_size is postgres.max_connections?
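
As a worked example of the formula quoted above, here is a small Python sketch 
that checks the inequality for a given set of numbers. The parameter names are 
illustrative, the numbers are hypothetical (not read from the attached 
postgresql.conf or properties.xml), and which configuration setting each term 
maps to is exactly the open question, so the sketch does not assume an answer.

```python
def pool_size_fits(db_pool_size, process_count, max_db_handles):
    """True when db_pool_size * process_count <= max_db_handles - 2."""
    return db_pool_size * process_count <= max_db_handles - 2


def largest_pool_size(process_count, max_db_handles):
    """Largest per-process pool size that still satisfies the inequality."""
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    return max((max_db_handles - 2) // process_count, 0)


if __name__ == "__main__":
    # Hypothetical values: 2 processes sharing 100 database handles.
    print(pool_size_fits(50, 2, 100))  # 50 * 2 = 100 is more than 98
    print(pool_size_fits(49, 2, 100))  # 49 * 2 = 98 fits exactly
    print(largest_pool_size(2, 100))
```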

 

Some additional questions:
- could the multi-process functionality of 
org.apache.manifoldcf.usejettyparentclassloader be used to improve this issue?
- I have read that disabling swap can be good for intensive db-interactions; do 
you have experience with disabling swap improving manifold?
- is there a possibility that we could set up a conference call with someone 
from the manifold team?

Many thanks for your time.

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt, 
> SELECT_blocked_queries.txt, postgresql.conf, properties.xml
>
>
> Hi Karl,
>    I am also facing the above mentioned issue. (Similar to Connector-880)
> I am using manifold2.12 binary version. I am using Solr output connector and 
> Web repository connection. Manifold is using all default configuration.
> When I am running the jobs manually, it runs fine. Same jobs have been 
> scheduled to run everyday.
> I am getting below exceptions and the job gets hanged/ going to waiting stage.
> Could you please help me in resolving the same.
> I am getting the below error -
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs 

[jira] [Updated] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-09 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1592:
---
Attachment: (was: 0904debug)

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt, 
> SELECT_blocked_queries.txt, postgresql.conf, properties.xml
>
>
> Hi Karl,
>    I am also facing the above mentioned issue. (Similar to Connector-880)
> I am using manifold2.12 binary version. I am using Solr output connector and 
> Web repository connection. Manifold is using all default configuration.
> When I am running the jobs manually, it runs fine. Same jobs have been 
> scheduled to run everyday.
> I am getting below exceptions and the job gets hanged/ going to waiting stage.
> Could you please help me in resolving the same.
> I am getting the below error -
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 
> 0: 'S'
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
> (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
> query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
>  WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running 
> query (2496641 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
> isDistinctSelect=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
> PUBLIC.JOBS.ID not nullable
>  WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running 
> query (2435908 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
> [index=SYS_IDX_13329
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX 

[jira] [Updated] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-09 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1592:
---
Attachment: postgresql.conf
SELECT_blocked_queries.txt
properties.xml
0904debug

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: 0904debug, LongRunningWithPlan_thread39.txt, 
> SELECT_blocked_queries.txt, postgresql.conf, properties.xml
>
>

[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-04 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809654#comment-16809654
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/4/19 9:29 AM:


Hi Karl, 
 We are analyzing the plans of the long-running queries and still have some 
questions/uncertainties.
 If you would be so kind, I have attached a log of one thread with the 
description and plan of one long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?

[^LongRunningWithPlan_thread39.txt]

 Regards


was (Author: goovaertsr):
Hi Karl, 
 We are analyzing the plans of of the long-running queries and still have some 
questions/uncertainties.
 If you would be so kind, I have attached  log of one thread with the 
description and plan of on long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?

[^LongRunningWithPlan_thread39.txt]

 Regards

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt
>
>

[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-04 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809654#comment-16809654
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/4/19 9:29 AM:


Hi Karl, 
 We are analyzing the plans of the long-running queries and still have some 
questions/uncertainties.
 If you would be so kind, I have attached a log of one thread with the 
description and plan of one long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?

[^LongRunningWithPlan_thread39.txt]

 Regards


was (Author: goovaertsr):
Hi Karl, 
 We are analyzing the plans of of the long-running queries and still have some 
questions/uncertainties.
 If you would be so kind, I have attached a truncated log of one thread with 
the description and plan of on long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?
 
[^LongRunningWithPlan_thread39.txt]

 Regards

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt
>
>

[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-04 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809654#comment-16809654
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/4/19 9:17 AM:


Hi Karl, 
 We are analyzing the plans of the long-running queries and still have some 
questions/uncertainties.
 If you would be so kind, I have attached a truncated log of one thread with 
the description and plan of one long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?
 
[^LongRunningWithPlan_thread39.txt]

 Regards


was (Author: goovaertsr):
Hi Karl, 
We are analyzing the plans of of the long-running queries and still have some 
questions/uncertainties.
If you would be so kind, I have attached a truncated log of one thread with the 
description and plan of on long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?
[^LongRunningWithPlan_thread39.txt]

 

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt
>
>

[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-04 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809654#comment-16809654
 ] 

roel goovaerts commented on CONNECTORS-1592:


Hi Karl, 
We are analyzing the plans of the long-running queries and still have some 
questions/uncertainties.
If you would be so kind, I have attached a truncated log of one thread with the 
description and plan of one long-running query.

The main questions at this time are:
 * is this indeed a bad plan?
 * is this query influenced by the optimization of the db? (or how up to date 
the db is, as it is referenced to at one point in the documentation)
 * would the query be different if the db had been optimized/vacuumed?
[^LongRunningWithPlan_thread39.txt]

 

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt
>
>

[jira] [Updated] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-04 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1592:
---
Attachment: LongRunningWithPlan_thread39.txt

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
> Attachments: LongRunningWithPlan_thread39.txt
>
>
> Hi Karl,
> I am also facing the above-mentioned issue (similar to CONNECTORS-880).
> I am using the ManifoldCF 2.12 binary distribution with the Solr output 
> connector and a Web repository connection. ManifoldCF uses the default 
> configuration throughout.
> When I run the jobs manually, they run fine. The same jobs are 
> scheduled to run every day.
> When scheduled, I get the exceptions below and the job hangs or goes into a 
> waiting state.
> Could you please help me resolve this?
> The error is shown below:
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 
> 0: 'S'
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
> (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
> query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
>  WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running 
> query (2496641 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
> isDistinctSelect=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
> PUBLIC.JOBS.ID not nullable
>  WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running 
> query (2435908 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
> [index=SYS_IDX_13329
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX PRED
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) 

[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-03 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808658#comment-16808658
 ] 

roel goovaerts commented on CONNECTORS-1592:


Thanks for the quick reply, I will get back to you if this information is not 
enough to fix it.

> Found long running query in manifold scheduled job
> --
>
> Key: CONNECTORS-1592
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1592
> Project: ManifoldCF
>  Issue Type: Bug
>Affects Versions: ManifoldCF 2.12
>Reporter: Subasini Rath
>Priority: Major
>
> Hi Karl,
> I am also facing the above-mentioned issue (similar to CONNECTORS-880).
> I am using the ManifoldCF 2.12 binary distribution with the Solr output 
> connector and a Web repository connection. ManifoldCF uses the default 
> configuration throughout.
> When I run the jobs manually, they run fine. The same jobs are 
> scheduled to run every day.
> When scheduled, I get the exceptions below and the job hangs or goes into a 
> waiting state.
> Could you please help me resolve this?
> The error is shown below:
> Scenario-1
> WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
> (2706114 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
> long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
> (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
>  WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 
> 'e'
>  WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running 
> query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE 
> needpriority=? LIMIT 1000]
>  WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
>  WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
>  WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
> long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
>  WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 
> 0: 'S'
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
> (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
>  WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
>  WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
> query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
>  WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
>  WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running 
> query (2496641 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
> isDistinctSelect=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
> PUBLIC.JOBS.ID not nullable
>  WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running 
> query (2435908 ms): [SELECT 
> t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
> t0 ORDER BY description ASC]
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
>  WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
>  WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
> [index=SYS_IDX_13329
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
>  WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX PRED
>  WARN 

[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-03 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808643#comment-16808643
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/3/19 11:52 AM:
-

Hi,

I am also experiencing the same (or a similar) issue; from time to time we 
notice that ManifoldCF gets stuck and shows no activity at all (no scheduled 
crawling takes place). 
 As far as I know this could be related to 'dead tuple bloat' and could be 
resolved by 'vacuum full', but the databasemaintenance script already runs 
daily.

An example log from such a moment:

logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread '94') - Found 
a long-running query (132676 ms): [UPDATE hopcount SET distance=?,deathmark=? 
WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND 
t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE 
t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread '15') - Found 
a long-running query (131477 ms): [UPDATE hopcount SET distance=?,deathmark=? 
WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND 
t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE 
t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '23') - Found 
a long-running query (133229 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '8') - Found a 
long-running query (133217 ms): [SELECT parentidhash FROM intrinsiclink WHERE 
jobid=? AND (parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '36') - Found 
a long-running query (133212 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '29') - Found 
a long-running query (133168 ms): [SELECT parentidhash FROM intrinsiclink WHERE 
jobid=? AND (parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=?) AND linktype=? AND 
childidhash=? FOR UPDATE]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 (Worker thread '55') - Found 
a long-running query (132950 ms): [UPDATE hopcount SET distance=?,deathmark=? 
WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND 
t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE 
t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 (Worker thread '31') - Found 
a long-running query (133216 ms): [SELECT parentidhash FROM intrinsiclink WHERE 
jobid=? AND (parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE]
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,994 (Worker thread '79') - Found 
a long-running query (133228 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?]
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,005 (Worker thread '88') - Found 
a long-running query (133234 ms): [UPDATE intrinsiclink SET 
processid=NULL,isnew=? WHERE jobid=? AND childidhash=? AND isnew IN (?,?)]
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,036 (Worker thread '45') - Found 
a long-running query (133329 ms): [SELECT id,status,checktime FROM jobqueue 
WHERE dochash=? AND jobid=? FOR UPDATE]
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,036 (Worker thread '60') - Found 
a long-running query (133264 ms): [SELECT id,status,checktime FROM jobqueue 
WHERE dochash=? AND jobid=? FOR UPDATE]
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,037 (Worker thread '38') - Found 
a long-running query (133468 ms): [SELECT id,status,checktime FROM jobqueue 
WHERE dochash=? AND jobid=? FOR UPDATE]
logs/manifoldcf.log: WARN 
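
For triage, warnings like the ones above can be reduced to thread, duration, and statement type. The following is only a minimal Java sketch, assuming the log text has the same layout as this excerpt (including the hard line wraps of the mail archive). The class LongRunningQueries and its members are hypothetical names for illustration and are not part of ManifoldCF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ManifoldCF): extracts long-running query warnings.
final class LongRunningQueries {
    // Captures the thread name, the duration in ms, and the first SQL keyword.
    private static final Pattern ENTRY = Pattern.compile(
        "\\(([^)]+)\\) - Found a long-running query \\((\\d+) ms\\): \\[(\\w+)");

    static final class Entry {
        final String thread;
        final long millis;
        final String verb;

        Entry(String thread, long millis, String verb) {
            this.thread = thread;
            this.millis = millis;
            this.verb = verb;
        }

        @Override
        public String toString() {
            return thread + ": " + verb + " took " + millis + " ms";
        }
    }

    private LongRunningQueries() {}

    static List<Entry> parse(String log) {
        // The archive hard-wraps each warning, so collapse whitespace before matching.
        String flat = log.replaceAll("\\s+", " ");
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(flat);
        while (m.find()) {
            entries.add(new Entry(m.group(1), Long.parseLong(m.group(2)), m.group(3)));
        }
        return entries;
    }

    public static void main(String[] args) {
        // One warning copied from the excerpt above, including its line wraps.
        String sample =
            "logs/manifoldcf.log: WARN 2019-04-02T18:29:18,036 (Worker thread '45') - Found \n"
            + "a long-running query (133329 ms): [SELECT id,status,checktime FROM jobqueue \n"
            + "WHERE dochash=? AND jobid=? FOR UPDATE]\n";
        for (Entry e : parse(sample)) {
            System.out.println(e);
        }
    }
}
```

Sorting the resulting entries by duration, or counting them per table, would be a natural next step; in this excerpt every statement touches hopcount, intrinsiclink, or jobqueue.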

[jira] [Comment Edited] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-03 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808643#comment-16808643
 ] 

roel goovaerts edited comment on CONNECTORS-1592 at 4/3/19 11:50 AM:
-

Hi,

I am also experiencing the same (or a similar) issue; from time to time we 
notice that ManifoldCF gets stuck and shows no activity at all (no scheduled 
crawling takes place). 
 As far as I know this could be related to 'dead tuple bloat' and could be 
resolved by 'vacuum full', but the databasemaintenance script already runs 
daily.

An example log from such a moment:
{code:java}
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread '94') - Found 
a long-running query (132676 ms): [UPDATE hopcount SET distance=?,deathmark=? 
WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND 
t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE 
t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread 
'15') - Found a long-running query (131477 ms): [UPDATE hopcount SET 
distance=?,deathmark=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE 
t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 
WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread 
'23') - Found a long-running query (133229 ms): [UPDATE intrinsiclink SET 
processid=?,isnew=? WHERE jobid=? AND parentidhash=? AND linktype=? AND 
childidhash=?] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread 
'8') - Found a long-running query (133217 ms): [SELECT parentidhash FROM 
intrinsiclink WHERE jobid=? AND (parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '36') - Found 
a long-running query (133212 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '29') - Found 
a long-running query (133168 ms): [SELECT parentidhash FROM intrinsiclink WHERE 
jobid=? AND (parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=?) AND linktype=? AND 
childidhash=? FOR UPDATE] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 
(Worker thread '55') - Found a long-running query (132950 ms): [UPDATE hopcount 
SET distance=?,deathmark=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 
WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink 
t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 (Worker thread 
'31') - Found a long-running query (133216 ms): [SELECT parentidhash FROM 
intrinsiclink WHERE jobid=? AND (parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,994 (Worker thread '79') - Found 
a long-running query (133228 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,005 (Worker thread '88') - Found 
a long-running query (133234 ms): [UPDATE intrinsiclink SET 
processid=NULL,isnew=? WHERE jobid=? AND childidhash=? AND isnew IN (?,?)] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,036 (Worker thread '45') - Found 
a long-running query (133329 ms): [SELECT id,status,checktime FROM jobqueue 
WHERE dochash=? AND jobid=? FOR UPDATE] logs/manifoldcf.log: WARN 
2019-04-02T18:29:18,036 (Worker thread '60') - Found a long-running query 
(133264 ms): [SELECT id,status,checktime FROM jobqueue WHERE dochash=? AND 
jobid=? FOR UPDATE] logs/manifoldcf.log: WARN 2019-04-02T18:29:18,037 (Worker 
thread '38') - Found a long-running query (133468 ms): [SELECT 
id,status,checktime FROM jobqueue WHERE dochash=? AND jobid=? FOR UPDATE] 

[jira] [Commented] (CONNECTORS-1592) Found long running query in manifold scheduled job

2019-04-03 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808643#comment-16808643
 ] 

roel goovaerts commented on CONNECTORS-1592:


Hi,

I am also experiencing the same (or a similar) issue; from time to time we 
notice that ManifoldCF gets stuck and shows no activity at all (no scheduled 
crawling takes place). 
As far as I know this could be related to 'dead tuple bloat' and could be 
resolved by 'vacuum full', but the databasemaintenance script already runs 
daily.

An example log from such a moment:


{noformat}
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread '94') - Found 
a long-running query (132676 ms): [UPDATE hopcount SET distance=?,deathmark=? 
WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND 
t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE 
t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,988 (Worker thread 
'15') - Found a long-running query (131477 ms): [UPDATE hopcount SET 
distance=?,deathmark=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE 
t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 
WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread 
'23') - Found a long-running query (133229 ms): [UPDATE intrinsiclink SET 
processid=?,isnew=? WHERE jobid=? AND parentidhash=? AND linktype=? AND 
childidhash=?] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread 
'8') - Found a long-running query (133217 ms): [SELECT parentidhash FROM 
intrinsiclink WHERE jobid=? AND (parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '36') - Found 
a long-running query (133212 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,989 (Worker thread '29') - Found 
a long-running query (133168 ms): [SELECT parentidhash FROM intrinsiclink WHERE 
jobid=? AND (parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=?) AND linktype=? AND 
childidhash=? FOR UPDATE] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 
(Worker thread '55') - Found a long-running query (132950 ms): [UPDATE hopcount 
SET distance=?,deathmark=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 
WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink 
t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND 
t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND 
t1.isnew=?))] logs/manifoldcf.log: WARN 2019-04-02T18:29:17,993 (Worker thread 
'31') - Found a long-running query (133216 ms): [SELECT parentidhash FROM 
intrinsiclink WHERE jobid=? AND (parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=? OR parentidhash=? OR parentidhash=? OR 
parentidhash=? OR parentidhash=?) AND linktype=? AND childidhash=? FOR UPDATE] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:17,994 (Worker thread '79') - Found 
a long-running query (133228 ms): [UPDATE intrinsiclink SET processid=?,isnew=? 
WHERE jobid=? AND parentidhash=? AND linktype=? AND childidhash=?] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,005 (Worker thread '88') - Found 
a long-running query (133234 ms): [UPDATE intrinsiclink SET 
processid=NULL,isnew=? WHERE jobid=? AND childidhash=? AND isnew IN (?,?)] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,036 (Worker thread '45') - Found 
a long-running query (133329 ms): [SELECT id,status,checktime FROM jobqueue 
WHERE dochash=? AND jobid=? FOR UPDATE] logs/manifoldcf.log: WARN 
2019-04-02T18:29:18,036 (Worker thread '60') - Found a long-running query 
(133264 ms): [SELECT id,status,checktime FROM jobqueue WHERE dochash=? AND 
jobid=? FOR UPDATE] logs/manifoldcf.log: WARN 2019-04-02T18:29:18,037 (Worker 
thread '38') - Found a long-running query (133468 ms): [SELECT 
id,status,checktime FROM jobqueue WHERE dochash=? AND jobid=? FOR UPDATE] 
logs/manifoldcf.log: WARN 2019-04-02T18:29:18,037 (Worker 

[jira] [Comment Edited] (CONNECTORS-1595) cross-site request forgery vulnerability

2019-03-28 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803825#comment-16803825
 ] 

roel goovaerts edited comment on CONNECTORS-1595 at 3/28/19 11:02 AM:
--

Thank you for your quick reply.

The points you raise were also mentioned by us in the conversations around these 
issues. The UI is indeed only used as a back-office application.

It was, however, my responsibility to report these issues to check if there was 
something that could be done.

Thanks for your time,

Roel


was (Author: goovaertsr):
Thank you for your quick reply.

The point you rise were also mentioned by us in the conversations around these 
issues. The ui is indeed only used as a back-office application.

It was, however, my responsibility to report these issues to check if there was 
something that could be done.

Thanks for your time,

Roel

> cross-site request forgery vulnerability
> 
>
> Key: CONNECTORS-1595
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1595
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> Below is the full analysis and description resulting from the penetration 
> test.
> *Summary*
> The application is vulnerable to Cross-Site Request Forgery (CSRF).
> A cross-site request forgery attack uses the following scenario:
> 1. An attacker creates a web page that includes an image or a form pointing 
> to the attacked application. The image source would actually be a URL with 
> parameters pointing to the application page that performs some action. In 
> the case of a form, the form action would point to the action page in the 
> target application, and the form is submitted automatically by JavaScript 
> when the page is viewed.
> 2. The attacker tricks the victim into browsing to this page. The attacker 
> may get the victim to click a link, or embed the attacking HTML code into 
> some page the victim views, for example in a bulletin board or chat.
> 3. When the victim views the attacker's page, the victim's browser sends a 
> request prepared by the attacker to the attacked application. If the victim 
> is logged in to the target application, the browser will possess all 
> necessary session tokens, so the request will appear authorized to the 
> application and succeed.
> A cross-site request forgery attack relies on the fact that the victim's 
> browser possesses the necessary authentication tokens to perform some 
> actions in the target application.
> *Impact*
> A remote, unauthenticated attacker who can trick an authenticated user into 
> clicking a crafted link or opening a malicious web page can force the victim 
> to unknowingly perform various actions within the application.
> Given that the whole application is not protected against CSRF, any action 
> that an administrator can take on Apache ManifoldCF could be performed 
> unknowingly if they fall for a CSRF attack.
> *Affected Systems*
>  * [https://els-manifold-uat.bc:8475/mcf-crawler-ui/]
> *Description*
> It appears that the application does not implement any CSRF protection. 
> Consider the following example. An attacker tricks a logged-in application 
> user into visiting a page containing the following code:
> {code:java}
> 
> 
> 
> history.pushState('', '', '/')
> https://els-manifold-uat.bc:8475/mcf-crawler-ui/execute.jsp;
> method="POST" enctype="multipart/form-data">
> 
> 
> 
> 
> 
> 
>  value="orgapachemanifoldcfcrawlerconnectorswebcrawlerWebcr
> awlerConnector" />
> 
> 
> 
>  value="ferdiklompcraftworkznl" />
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  value="httpsintrauatwebbc" />
>  />
> 
>  value="validation" />
>  value=""
> />
>  value="Continue" />
>  value="username" />
>  value="id996812" />
>  value="" />
>  value="Continue" />
>  value="password" />
>  value="Th1sIs4cl1X" />
>  value="" />
>  value="Continue" />
>  value="loginformtype" />
>  value="pwd" />
>  value="" />
>  value="3" />
> 
> 
>  value="httpsintrauatwebbc" />
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> {code}
> When the victim's browser parses the page and tries to load images, it 
> causes the victim to execute any action of the attacker's choosing on 
> ManifoldCF.
> *Recommendations*
> The usual approach to preventing CSRF attacks is to add a new parameter with 
> an unpredictable value, commonly referred to as a CSRF token, to each form 
> or link that performs some action in the application. The parameter value 
> should have enough entropy that it cannot be predicted by an attacker, and 
> should be unique to the current user session. When the user submits the form 
> or clicks the link, the server-side code checks the parameter value. If it 
> is valid, the request is accepted, 
> 
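
The token scheme described in the recommendation can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class CsrfTokens and its methods are hypothetical and are not part of ManifoldCF, and storing the token in the user session and reading it back from the submitted form is assumed to be handled by the surrounding web framework. The token is drawn from SecureRandom so it cannot be predicted, and the comparison uses MessageDigest.isEqual to avoid leaking information through timing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper (not part of ManifoldCF): issues and checks per-session CSRF tokens.
final class CsrfTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    private CsrfTokens() {}

    // Generates a token with 256 bits of entropy, encoded so it is safe in a form field or URL.
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Compares the token kept in the session with the one submitted with the request.
    static boolean isValid(String sessionToken, String submittedToken) {
        if (sessionToken == null || submittedToken == null) {
            return false; // a missing token is never accepted
        }
        return MessageDigest.isEqual(
            sessionToken.getBytes(StandardCharsets.UTF_8),
            submittedToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String sessionToken = newToken();  // would be stored in the user's session
        String forged = newToken();        // an attacker can only guess a value

        System.out.println("matching token accepted: " + isValid(sessionToken, sessionToken));
        System.out.println("forged token accepted:   " + isValid(sessionToken, forged));
        System.out.println("missing token accepted:  " + isValid(sessionToken, null));
    }
}
```

In such a scheme the token would be generated once per session, written into every state-changing form as a hidden field, and checked before the action page performs any change.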

[jira] [Commented] (CONNECTORS-1595) cross-site request forgery vulnerability

2019-03-28 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803825#comment-16803825
 ] 

roel goovaerts commented on CONNECTORS-1595:


Thank you for your quick reply.

The points you raise were also mentioned by us in the conversations around these 
issues. The UI is indeed only used as a back-office application.

It was, however, my responsibility to report these issues to check if there was 
something that could be done.

Thanks for your time,

Roel

> cross-site request forgery vulnerability
> 
>
> Key: CONNECTORS-1595
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1595
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> Below is the full analysis and description resulting from the penetration 
> test.
> *Summary*
> The application is vulnerable to Cross-Site Request Forgery (CSRF).
> A cross-site request forgery attack uses the following scenario:
> 1. An attacker creates a web page that includes an image or a form pointing 
> to the attacked application.
> The image source would actually be a URL with parameters pointing to the 
> application page that
> performs some action. In case of a form, the form action would point to the 
> action page in the target
> application, and the form is submitted automatically by JavaScript when the 
> page is viewed.
> 2. The attacker tricks the victim into browsing to this page. The attacker 
> may get the victim to click a
> link, or embed the attacking HTML code into some page the victim views, for 
> example in a bulletin
> board or chat.
> 3. When the victim views the attacker's page, his browser sends a request 
> prepared by the attacker to
> the attacked application. If the victim is logged in to the target 
> application, his browser will possess
> all necessary session tokens, so the request will appear as authorized to the 
> application and
> succeed.
> A cross-site request forgery attack uses the fact that the victim's browser 
> possesses the necessary
> authentication tokens to perform some actions in the target application.
> *Impact*
> A remote, unauthenticated attacker who can trick an authenticated user into 
> clicking a link crafted by the
> attacker or opening a malicious web page can force the victim to unknowingly 
> perform various actions within
> the application.
> Given that the whole application is not protected against CSRF, any action 
> that an administrator can take on
> Apache ManifoldCF could be unknowingly performed if they fall for a CSRF attack.
> *Affected Systems*
>  * [https://els-manifold-uat.bc:8475/mcf-crawler-ui/]
> *Description*
> It appears that the application does not implement any CSRF protection. 
> Consider the following example. An
> attacker tricks a logged in application user to visit a page containing the 
> following code:
> {code:java}
> <script>history.pushState('', '', '/')</script>
> <form action="https://els-manifold-uat.bc:8475/mcf-crawler-ui/execute.jsp" method="POST" enctype="multipart/form-data">
> <input … value="orgapachemanifoldcfcrawlerconnectorswebcrawlerWebcrawlerConnector" />
> <input … value="ferdiklompcraftworkznl" />
> <input … value="httpsintrauatwebbc" />
> <input … />
> <input … value="validation" />
> <input … value="" />
> <input … value="Continue" />
> <input … value="username" />
> <input … value="id996812" />
> <input … value="" />
> <input … value="Continue" />
> <input … value="password" />
> <input … value="Th1sIs4cl1X" />
> <input … value="" />
> <input … value="Continue" />
> <input … value="loginformtype" />
> <input … value="pwd" />
> <input … value="" />
> <input … value="3" />
> <input … value="httpsintrauatwebbc" />
> </form>
> {code}
> When the victim's browser parses the page and tries to load images, it will 
> cause them to execute any action
> of the attacker's choosing on Manifold.
> *Recommendations*
> The usual approach to preventing CSRF attacks is to add a new parameter with 
> an unpredictable value to
> each form or link that performs some action in the application, commonly 
> referred to as a CSRF-Token. The
> parameter value should have enough entropy so that it cannot be predicted by 
> an attacker and should be
> unique to the current user session. When the user submits the form or clicks 
> the link, the server side code
> checks the parameter value. If it is valid, the request is accepted, 
> otherwise it is denied. The attacker has no
> way of knowing the value of the unpredictable parameter, so he cannot 
> construct a form or link that will
> submit a valid request.
> *References*
>  * OWASP - Cross-Site Request Forgery - 
> [https://www.owasp.org/index.php/Cross-Site_Request_Forgery]





[jira] [Updated] (CONNECTORS-1595) cross-site request forgery vulnerability

2019-03-28 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1595:
---
Description: 
Below is the full analysis and description resulting from the penetration 
test.

*Summary*
The application is vulnerable to Cross-Site Request Forgery (CSRF).
A cross-site request forgery attack uses the following scenario:
1. An attacker creates a web page that includes an image or a form pointing to 
the attacked application.
The image source would actually be a URL with parameters pointing to the 
application page that
performs some action. In case of a form, the form action would point to the 
action page in the target
application, and the form is submitted automatically by JavaScript when the 
page is viewed.
2. The attacker tricks the victim into browsing to this page. The attacker may 
get the victim to click a
link, or embed the attacking HTML code into some page the victim views, for 
example in a bulletin
board or chat.
3. When the victim views the attacker's page, his browser sends a request 
prepared by the attacker to
the attacked application. If the victim is logged in to the target application, 
his browser will possess
all necessary session tokens, so the request will appear as authorized to the 
application and
succeed.
A cross-site request forgery attack uses the fact that the victim's browser 
possesses the necessary
authentication tokens to perform some actions in the target application.

*Impact*
A remote, unauthenticated attacker who can trick an authenticated user into 
clicking a link crafted by the
attacker or opening a malicious web page can force the victim to unknowingly 
perform various actions within
the application.
Given that the whole application is not protected against CSRF, any action that 
an administrator can take on
Apache ManifoldCF could be unknowingly performed if they fall for a CSRF attack.

*Affected Systems*
 * [https://els-manifold-uat.bc:8475/mcf-crawler-ui/]

*Description*
It appears that the application does not implement any CSRF protection. 
Consider the following example. An
attacker tricks a logged in application user to visit a page containing the 
following code:
{code:java}
<script>history.pushState('', '', '/')</script>
<form action="https://els-manifold-uat.bc:8475/mcf-crawler-ui/execute.jsp" method="POST" enctype="multipart/form-data">
…
</form>
{code}
When the victim's browser parses the page and tries to load images, it will 
cause them to execute any action
of the attacker's choosing on Manifold.

*Recommendations*
The usual approach to preventing CSRF attacks is to add a new parameter with an 
unpredictable value to
each form or link that performs some action in the application, commonly 
referred to as a CSRF-Token. The
parameter value should have enough entropy so that it cannot be predicted by an 
attacker and should be
unique to the current user session. When the user submits the form or clicks 
the link, the server side code
checks the parameter value. If it is valid, the request is accepted, otherwise 
it is denied. The attacker has no
way of knowing the value of the unpredictable parameter, so he cannot construct 
a form or link that will
submit a valid request.
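
As an illustration of the token approach described above, here is a minimal Java sketch. It is not ManifoldCF code: the class name CsrfTokenStore, the in-memory map standing in for the servlet session, and the 32-byte token length are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-session CSRF token store; not part of ManifoldCF. */
final class CsrfTokenStore {
    private static final int TOKEN_BYTES = 32; // assumed length: 256 bits of entropy

    private final SecureRandom random = new SecureRandom();
    // Stand-in for the servlet session: session id -> expected token.
    private final Map<String, String> tokensBySession = new ConcurrentHashMap<>();

    /** Creates a fresh unpredictable token and remembers it for the session. */
    String issue(String sessionId) {
        byte[] raw = new byte[TOKEN_BYTES];
        random.nextBytes(raw);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        tokensBySession.put(sessionId, token);
        return token;
    }

    /** Returns true only if the submitted value matches the session's token. */
    boolean isValid(String sessionId, String submitted) {
        String expected = tokensBySession.get(sessionId);
        if (expected == null || submitted == null) {
            return false;
        }
        // Constant-time comparison, so timing does not reveal partial matches.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                submitted.getBytes(StandardCharsets.UTF_8));
    }
}
```

In such a design, each form would carry the value returned by issue in a hidden field, and the server-side handler would refuse any state-changing request for which isValid returns false.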

*References*
 * OWASP - Cross-Site Request Forgery - 
[https://www.owasp.org/index.php/Cross-Site_Request_Forgery]

  was:It appears that manifoldcf does not implement any CSRF protection.


> cross-site request forgery vulnerability
> 
>
> Key: CONNECTORS-1595
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1595
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>

[jira] [Commented] (CONNECTORS-1597) reflected cross-site scripting vulnerability

2019-03-28 Thread roel goovaerts (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803757#comment-16803757
 ] 

roel goovaerts commented on CONNECTORS-1597:


Hi Karl, I have updated the description to include the full report/analysis.

> reflected cross-site scripting vulnerability
> 
>
> Key: CONNECTORS-1597
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1597
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> This is the full report of a penetration test, performed at a client where we 
> deployed a system which uses manifold:
> *Summary*
> A reflected cross-site scripting vulnerability was discovered in the 
> application.
> Reflected cross-site scripting occurs when a web application displays data 
> submitted by the user that
> contains HTML markup and scripting code without properly escaping it. An 
> attacker will create a link to the
> vulnerable page that will display JavaScript code crafted by the attacker. The 
> attacker will then trick an
> authenticated application user into clicking or following this crafted link. 
> When the user's browser parses the
> generated page, it will execute the code crafted by the attacker. If the user 
> was logged in to the application
> when he followed the link, the attacker's code could perform any action in 
> the application that the user can
> perform.
> *Impact*
> Reflected cross-site scripting can be used by attackers to compromise the 
> session of an authenticated user.
> By persuading the victim to click on a specially crafted link, the attacker 
> can execute his own JavaScript
> payload in the browser context of the victim. In this specific case, an 
> attacker could hijack the victim's session
> given that the session token is not flagged as HttpOnly as demonstrated in 
> [G190204T1F4][MANIFOLD]
> Insecure Cookie Configuration.
> Additional attacks exist where an attacker can deceive end users of the 
> application by redirecting them to
> replica sites or trick them into downloading trojans or other malware. The 
> attacker can also use a so-called
> browser exploitation framework. In this scenario the attacker injects 
> JavaScript code that communicates to
> the attack framework running on the attacker's computer. When the victim user 
> executes the JavaScript code
> the attacker can control the victim's browser. Publicly available frameworks 
> exist (BeEF -
> [http://www.bindshell.net/tools/beef], Backframe 
> -[http://www.gnucitizen.org/projects/backframe/], XSS Proxy -
> [http://xss-proxy.sourceforge.net/]).
> *Affected Systems*
>  * [https://els-manifold-uat.bc:8475/mcf-crawler-ui/] [name of an arbitrarily 
> supplied URL parameter]
> *Description*
> A case where the application includes user input into the generated HTML 
> pages without properly escaping
> the user supplied data was discovered in the application. The HTTP requests 
> and responses shown below
> demonstrate the problem.
> {code:java}
> GET /mcf-crawler-ui/?smafi"><script>alert(1)</script>non7x=1 HTTP/1.1
> Host: els-manifold-uat.bc:8475
> Accept-Encoding: gzip, deflate
> Accept: */*
> Accept-Language: en
> User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; 
> Trident/5.0)
> Connection: close
> Cookie: JSESSIONID=ov3qae9biucxdat0xiin5s18
> {code}
> {code:java}
> HTTP/1.1 200 OK
> Server: nginx/1.12.2
> Date: Mon, 18 Feb 2019 13:07:02 GMT
> Content-Type: text/html;charset=utf-8
> Content-Length: 2576
> Connection: close
> Pragma: No-cache
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Cache-Control: no-cache
> max-age: Thu, 01 Jan 1970 00:00:00 GMT
> 
> <html xmlns="http://www.w3.org/1999/xhtml">
> <link … type="text/css"/>
> Apache ManifoldCF™ Login
> Sign in to start your session
> <form … method="POST">
> …smafi"><script>alert(1)</script>non7x=1">
> --snip--
> {code}
> *Recommendations*
> We recommend that the application enforces proper validation on user input. 
> In most situations where user-controllable
> data is copied into application responses, cross-site scripting attacks can 
> be prevented using two
> layers of defenses:
>  * Input should be validated as strictly as possible on arrival, given the 
> kind of content which it is
> expected to contain. For example, personal names should consist of 
> alphabetical and a small range
> of typographical characters, and be relatively short; a year of birth should 
> consist of exactly four
> numerals; 

[jira] [Updated] (CONNECTORS-1597) reflected cross-site scripting vulnerability

2019-03-28 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1597:
---
Description: 
This is the full report of a penetration test, performed at a client where we 
deployed a system which uses manifold:

*Summary*
A reflected cross-site scripting vulnerability was discovered in the 
application.
Reflected cross-site scripting occurs when a web application displays data 
submitted by the user that
contains HTML markup and scripting code without properly escaping it. An 
attacker will create a link to the
vulnerable page that will display JavaScript code crafted by the attacker. The 
attacker will then trick an
authenticated application user into clicking or following this crafted link. 
When the user's browser parses the
generated page, it will execute the code crafted by the attacker. If the user 
was logged in to the application
when he followed the link, the attacker's code could perform any action in the 
application that the user can
perform.

*Impact*
Reflected cross-site scripting can be used by attackers to compromise the 
session of an authenticated user.
By persuading the victim to click on a specially crafted link, the attacker can 
execute his own JavaScript
payload in the browser context of the victim. In this specific case, an 
attacker could hijack the victim's session
given that the session token is not flagged as HttpOnly as demonstrated in 
[G190204T1F4][MANIFOLD]
Insecure Cookie Configuration.
Additional attacks exist where an attacker can deceive end users of the 
application by redirecting them to
replica sites or trick them into downloading trojans or other malware. The 
attacker can also use a so-called
browser exploitation framework. In this scenario the attacker injects 
JavaScript code that communicates to
the attack framework running on the attacker's computer. When the victim user 
executes the JavaScript code
the attacker can control the victim's browser. Publicly available frameworks 
exist (BeEF -
[http://www.bindshell.net/tools/beef], Backframe 
-[http://www.gnucitizen.org/projects/backframe/], XSS Proxy -
[http://xss-proxy.sourceforge.net/]).

*Affected Systems*
 * [https://els-manifold-uat.bc:8475/mcf-crawler-ui/] [name of an arbitrarily 
supplied URL parameter]

*Description*
A case where the application includes user input into the generated HTML pages 
without properly escaping
the user supplied data was discovered in the application. The HTTP requests and 
responses shown below
demonstrate the problem.
{code:java}
GET /mcf-crawler-ui/?smafi"><script>alert(1)</script>non7x=1 HTTP/1.1
Host: els-manifold-uat.bc:8475
Accept-Encoding: gzip, deflate
Accept: */*
Accept-Language: en
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; 
Trident/5.0)
Connection: close
Cookie: JSESSIONID=ov3qae9biucxdat0xiin5s18
{code}
{code:java}
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Mon, 18 Feb 2019 13:07:02 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 2576
Connection: close
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache
max-age: Thu, 01 Jan 1970 00:00:00 GMT

<html xmlns="http://www.w3.org/1999/xhtml">
<link … type="text/css"/>
Apache ManifoldCF™ Login
Sign in to start your session
<form … method="POST">
…smafi"><script>alert(1)</script>non7x=1">
--snip--
{code}
*Recommendations*
We recommend that the application enforces proper validation on user input. In 
most situations where user-controllable
data is copied into application responses, cross-site scripting attacks can be 
prevented using two
layers of defenses:
 * Input should be validated as strictly as possible on arrival, given the kind 
of content which it is
expected to contain. For example, personal names should consist of alphabetical 
and a small range
of typographical characters, and be relatively short; a year of birth should 
consist of exactly four
numerals; email addresses should match a well-defined regular expression. Input 
which fails the
validation should be rejected, not sanitized.
 * User input should be HTML-encoded at any point where it is copied into 
application responses. All
HTML metacharacters, including < > " ' and =, should be replaced with the 
corresponding HTML
entities (&lt; &gt; etc.).
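
The two layers above can be sketched in Java roughly as follows. This is a hedged illustration only, not code from ManifoldCF: the class name XssDefenses, the year-of-birth validator, and the choice of numeric entities for the quote and equals characters are assumptions made for the example.

```java
/** Hypothetical helpers illustrating the two layers; not ManifoldCF code. */
final class XssDefenses {
    private XssDefenses() {}

    /** Layer 1: strict validation on arrival. Callers reject input that fails. */
    static boolean isValidYearOfBirth(String input) {
        // Exactly four ASCII digits, as in the year-of-birth example above.
        return input != null && input.matches("[0-9]{4}");
    }

    /** Layer 2: HTML-encode metacharacters before copying input into a response. */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                case '=':  out.append("&#61;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With encoding of this kind applied at the point of output, a reflected parameter name containing markup would be rendered as inert text instead of being parsed as HTML by the browser.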

*References*
 * OWASP – Cross-site scripting - 
[https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)]

  was:As a result from a pen test, a reflected cross-site scripting 
vulnerability was discovered


> reflected cross-site scripting vulnerability
> 
>
> Key: CONNECTORS-1597
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1597
> Project: ManifoldCF

[jira] [Created] (CONNECTORS-1597) reflected cross-site scripting vulnerability

2019-03-27 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1597:
--

 Summary: reflected cross-site scripting vulnerability
 Key: CONNECTORS-1597
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1597
 Project: ManifoldCF
  Issue Type: Improvement
  Components: API
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


As a result of a pen test, a reflected cross-site scripting vulnerability was 
discovered.





[jira] [Updated] (CONNECTORS-1594) insecure cookie configuration vulnerability

2019-03-27 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1594:
---
Summary: insecure cookie configuration vulnerability  (was: insecure cookie 
configuration)

> insecure cookie configuration vulnerability
> ---
>
> Key: CONNECTORS-1594
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1594
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> The application session cookie "JSESSIONID" does not have the Secure and HttpOnly 
> flags set.
> The application uses an HTTP cookie as session identifier. The Set-Cookie 
> instruction sent by the application to the browser does not specifically 
> instruct the browser to only use the cookie on secure communication channels 
> (HTTPS). As the instruction is missing, browsers will fall back to their 
> default setting, generally meaning that the cookie will be used on both 
> secure and insecure communication channels.





[jira] [Updated] (CONNECTORS-1596) brute-force vulnerability

2019-03-27 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1596:
---
Summary: brute-force vulnerability  (was: brute-force protection)

> brute-force vulnerability
> -
>
> Key: CONNECTORS-1596
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1596
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> As a result of a pen test, it appears there is no functionality to counter 
> brute-force attacks for logging in.





[jira] [Updated] (CONNECTORS-1595) cross-site request forgery vulnerability

2019-03-27 Thread roel goovaerts (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roel goovaerts updated CONNECTORS-1595:
---
Summary: cross-site request forgery vulnerability  (was: cross-site request 
forgery)

> cross-site request forgery vulnerability
> 
>
> Key: CONNECTORS-1595
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1595
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: roel goovaerts
>Priority: Minor
>
> It appears that manifoldcf does not implement any CSRF protection.





[jira] [Created] (CONNECTORS-1596) brute-force protection

2019-03-27 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1596:
--

 Summary: brute-force protection
 Key: CONNECTORS-1596
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1596
 Project: ManifoldCF
  Issue Type: Improvement
  Components: API
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


As a result of a pen test, it appears there is no functionality to counter 
brute-force attacks for logging in.
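
For context, one common countermeasure is a temporary lockout after repeated failed logins. The Java sketch below shows the idea under stated assumptions: the class LoginAttemptLimiter, its thresholds, and the injected clock are hypothetical and do not describe anything ManifoldCF currently provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical lockout counter keyed by user name; not part of ManifoldCF. */
final class LoginAttemptLimiter {
    private final int maxFailures;
    private final long lockoutMillis;
    private final LongSupplier clock; // injected so the lockout can be tested
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LoginAttemptLimiter(int maxFailures, long lockoutMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.lockoutMillis = lockoutMillis;
        this.clock = clock;
    }

    /** True while the user is locked out; clears state once the lockout expires. */
    synchronized boolean isLocked(String user) {
        Long until = lockedUntil.get(user);
        if (until == null) {
            return false;
        }
        if (clock.getAsLong() >= until) {
            lockedUntil.remove(user);
            failures.remove(user);
            return false;
        }
        return true;
    }

    /** Counts a failed login and starts a lockout once the threshold is reached. */
    synchronized void recordFailure(String user) {
        int count = failures.merge(user, 1, Integer::sum);
        if (count >= maxFailures) {
            lockedUntil.put(user, clock.getAsLong() + lockoutMillis);
        }
    }

    /** Resets the counters after a successful login. */
    synchronized void recordSuccess(String user) {
        failures.remove(user);
        lockedUntil.remove(user);
    }
}
```

A login handler would check isLocked before verifying credentials and call recordFailure or recordSuccess afterwards. Keying by user name alone is a simplification; a real deployment might also rate-limit by source address.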





[jira] [Created] (CONNECTORS-1595) cross-site request forgery

2019-03-27 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1595:
--

 Summary: cross-site request forgery
 Key: CONNECTORS-1595
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1595
 Project: ManifoldCF
  Issue Type: Improvement
  Components: API
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


It appears that manifoldcf does not implement any CSRF protection.





[jira] [Created] (CONNECTORS-1594) insecure cookie configuration

2019-03-27 Thread roel goovaerts (JIRA)
roel goovaerts created CONNECTORS-1594:
--

 Summary: insecure cookie configuration
 Key: CONNECTORS-1594
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1594
 Project: ManifoldCF
  Issue Type: Improvement
  Components: API
Affects Versions: ManifoldCF 2.12
Reporter: roel goovaerts


The application session cookie "JSESSIONID" does not have the Secure and HttpOnly 
flags set.

The application uses an HTTP cookie as session identifier. The Set-Cookie 
instruction sent by the application to the browser does not specifically 
instruct the browser to only use the cookie on secure communication channels 
(HTTPS). As the instruction is missing, browsers will fall back to their 
default setting, generally meaning that the cookie will be used on both secure 
and insecure communication channels.
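
To make the expected header concrete, the following Java sketch builds and checks a Set-Cookie value carrying both flags. It is an illustration under assumptions, not ManifoldCF code: the class SessionCookieFlags, its method names, and the Path=/ attribute are made up for the example.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helpers for the Set-Cookie header; not part of ManifoldCF. */
final class SessionCookieFlags {
    private SessionCookieFlags() {}

    /** Builds a Set-Cookie value that carries both flags (Path=/ is assumed). */
    static String hardenedSetCookie(String name, String value) {
        return name + "=" + value + "; Path=/; Secure; HttpOnly";
    }

    /** Checks whether a Set-Cookie value carries both Secure and HttpOnly. */
    static boolean hasSecureAndHttpOnly(String setCookieValue) {
        Set<String> attributes = Arrays.stream(setCookieValue.split(";"))
                .skip(1) // the first segment is name=value, not an attribute
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return attributes.contains("secure") && attributes.contains("httponly");
    }
}
```

In a Servlet 3.0 or later container the flags are usually set declaratively instead, through the cookie-config element (http-only and secure) inside session-config in web.xml; whether that fits a given ManifoldCF deployment would need to be checked.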


