[ 
https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722207#comment-17722207
 ] 

Mingchun Zhao edited comment on CONNECTORS-1746 at 5/12/23 3:25 PM:
--------------------------------------------------------------------

Hello,

Here is a patch that adds options controlling when PostgreSQL's "ANALYZE" 
command is run. It adds the two properties described below.

1. "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, the table specified by the property 
"org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed at the 
start of the job. Defaults to false ("ANALYZE" is not run at the start).

2. "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, the table specified by the 
property "org.apache.manifoldcf.db.postgres.analyze.<tablename>" is analyzed 
only when the number of events processed per second drops below the threshold. 
Defaults to 1 (one event processed per second).
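
To make the intended behavior of the two properties concrete, here is a minimal sketch in Java. It is not the code in the attached patch: the class name AnalyzePolicySketch, its method names, and the use of java.util.Properties are all hypothetical, chosen only for illustration. Treating a non-positive threshold as "rate check disabled" is also an assumption. Only the two property names and their defaults come from the description above.

```java
import java.util.Properties;

// Hypothetical helper, NOT the attached patch: it only illustrates how the
// two properties described above could be interpreted.
final class AnalyzePolicySketch {
    static final String ANALYZE_AT_START =
        "org.apache.manifoldcf.db.postgres.analyzeatstart";
    static final String ANALYZE_RATE_THRESHOLD =
        "org.apache.manifoldcf.db.postgres.analyzeratethreshold";

    private final boolean analyzeAtStart;
    private final int rateThreshold;

    AnalyzePolicySketch(Properties props) {
        // Defaults follow the description: false and 1 event per second.
        analyzeAtStart =
            Boolean.parseBoolean(props.getProperty(ANALYZE_AT_START, "false"));
        rateThreshold =
            Integer.parseInt(props.getProperty(ANALYZE_RATE_THRESHOLD, "1"));
    }

    // True when "ANALYZE" should be run once at the start of a job.
    boolean shouldAnalyzeAtJobStart() {
        return analyzeAtStart;
    }

    // True when the observed rate (events per second) is below the threshold.
    // Assumption: a non-positive threshold or an empty time window means the
    // rate check never triggers.
    boolean shouldAnalyzeForRate(long eventCount, long elapsedMillis) {
        if (rateThreshold <= 0 || elapsedMillis <= 0) {
            return false;
        }
        double eventsPerSecond = eventCount * 1000.0 / elapsedMillis;
        return eventsPerSecond < rateThreshold;
    }
}
```

With no properties set, this sketch never analyzes at job start and reports that "ANALYZE" is due only when fewer than one event per second was processed in the measured window.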

I tested the attached patch and confirmed that the "ANALYZE" command was 
executed correctly in both of the situations above. In particular, when MCF's 
throughput (events per second) dropped because of a bad PostgreSQL query 
plan, "ANALYZE" was executed and MCF's performance recovered.

[^DBInterfacePostgreSQL.java.patch]


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling 
> become extremely slow.
> --------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1746
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>         Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the 
> database. 
>            Reporter: Mingchun Zhao
>            Priority: Major
>         Attachments: DBInterfacePostgreSQL.java.patch
>
>
> Sometimes crawling does not process any documents for a while, and nothing 
> is logged about long-running queries. Performance can be restored 
> by running the 'ANALYZE' command manually, which suggests that a bad query 
> plan caused the problem.
> Therefore, in addition to the existing configuration parameter 
> 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', 'ANALYZE' should 
> also be executed in the following situations:
> 1. When the number of records in the table exceeds the number required for 
> creating an execution plan after the job starts.
> 2. When crawling performance slows down, for example when the document 
> processing rate drops below a specified threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
