[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingchun Zhao updated CONNECTORS-1746:
--------------------------------------
    Description:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.

Therefore, in addition to the current configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it is considered necessary to execute 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

  was:
Sometimes, the crawling does not process any documents for a while and there is nothing logged about long-running queries. The performance can be restored by firing the 'ANALYZE' command manually. It seems that a bad query plan caused this performance problem.

Therefore, in addition to the current configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it is considered necessary to execute 'ANALYZE' even in the following situations.
1. When the number of records in the table exceeds the number required for creating an execution plan after the job starts.
2. When the crawling performance slows down. For example, if the processing rate of documents drops below a specified threshold.

So, how about adding two parameters to handle the timing of 'ANALYZE' execution as below?
1. 'org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumrowcount'
Specify how many records should be inserted before carrying out an 'ANALYZE' on the specified table for the first time. Defaults to 100.
2. 'org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumprocessrate'
Specify the minimum number of documents processed per minute.
If the processing rate of documents drops below this threshold, 'ANALYZE' will be executed. Defaults to 1.


> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1746
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>         Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
>            Reporter: Mingchun Zhao
>            Priority: Major
>
> Sometimes, the crawling does not process any documents for a while and there
> is nothing logged about long-running queries. The performance can be restored
> by firing the 'ANALYZE' command manually. It seems that a bad query plan
> caused this performance problem.
> Therefore, in addition to the current configuration parameter
> 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', it is considered
> necessary to execute 'ANALYZE' even in the following situations.
> 1. When the number of records in the table exceeds the number required for
> creating an execution plan after the job starts.
> 2. When the crawling performance slows down. For example, if the processing
> rate of documents drops below a specified threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
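
The two trigger conditions described in this issue (a row-count threshold after the job starts, and a minimum processing rate) can be illustrated with a small sketch. This is not ManifoldCF code. The class AnalyzeTrigger, its fields, and the function analyze_statement are hypothetical names chosen for illustration, and the default values 100 and 1 are taken from the earlier revision of the description. It only shows one way the decision could be made, assuming the caller tracks inserted rows and processed documents itself.

```python
import re
from dataclasses import dataclass


@dataclass
class AnalyzeTrigger:
    """Hypothetical helper that decides when ANALYZE might be worth running."""

    minimum_row_count: int = 100        # assumed default, from the earlier description
    minimum_process_rate: float = 1.0   # documents per minute, assumed default
    rows_inserted: int = 0              # rows inserted since the job started
    first_analyze_done: bool = False    # condition 1 fires only once per job

    def record_inserts(self, count: int) -> None:
        """Add newly inserted rows to the running total."""
        self.rows_inserted += count

    def should_analyze(self, docs_processed: int, elapsed_minutes: float) -> bool:
        """Return True if either condition from the issue holds."""
        # Condition 1: the table has grown to the row threshold for the first time.
        if not self.first_analyze_done and self.rows_inserted >= self.minimum_row_count:
            self.first_analyze_done = True
            return True
        # Condition 2: the processing rate has dropped below the threshold.
        if elapsed_minutes > 0:
            rate = docs_processed / elapsed_minutes
            return rate < self.minimum_process_rate
        return False


def analyze_statement(table_name: str) -> str:
    """Build the SQL text for ANALYZE on one table, rejecting unsafe names."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"ANALYZE {table_name}"


if __name__ == "__main__":
    trigger = AnalyzeTrigger()
    trigger.record_inserts(100)
    if trigger.should_analyze(docs_processed=50, elapsed_minutes=1.0):
        # "jobqueue" is used here only as an example table name.
        print(analyze_statement("jobqueue"))
```

A real implementation would probably also need a cool-down period for the second condition, so that a crawl that stays slow for other reasons does not trigger 'ANALYZE' on every check.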