[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingchun Zhao updated CONNECTORS-1746:
--------------------------------------
    Summary: Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.  (was: Adding execution conditions of PostgreSQL's ANALYZE command to avoid crawling become extremely slow.)

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling
> become extremely slow.
> ---------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1746
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>         Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the
> database.
>            Reporter: Mingchun Zhao
>            Priority: Major
>
> Sometimes crawling stops processing documents for a while, and nothing is
> logged about long-running queries. Performance can be restored by running
> the 'ANALYZE' command manually, which suggests that a bad query plan caused
> the slowdown.
> Therefore, in addition to the existing configuration parameter
> org.apache.manifoldcf.db.postgres.analyze.<tablename>, I think 'ANALYZE'
> should also be executed in the following situations:
> 1. When, after the job starts, the number of records in the table exceeds
> the number required to create a good query plan.
> 2. When crawling performance degrades, for example when the document
> processing rate drops below a specified threshold.
> How about adding two parameters to control when 'ANALYZE' is executed, as
> below?
> 1. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumrowcount`
> Specifies how many records should accumulate in the specified table before
> 'ANALYZE' is carried out on it for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumprocessrate`
> Specifies the minimum number of documents processed in the last minute. If
> the actual processing rate falls below this value, 'ANALYZE' will be carried
> out. Defaults to 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
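
The two proposed conditions could be sketched roughly as follows. This is only an illustration of the decision logic, not ManifoldCF code: the class `AnalyzePolicy`, its method `shouldAnalyze`, and the way the properties are passed in as a `Map` are all hypothetical. Only the two parameter names and their defaults (100 and 1) come from the proposal above. Treating "accumulated" as "greater than or equal to" is also an assumption.

```java
import java.util.Map;

/** Hypothetical helper illustrating the proposal; not part of ManifoldCF. */
final class AnalyzePolicy {
    static final String PREFIX = "org.apache.manifoldcf.db.postgres.analyze.";
    // Defaults taken from the proposal text.
    static final long DEFAULT_MINIMUM_ROW_COUNT = 100L;
    static final double DEFAULT_MINIMUM_PROCESS_RATE = 1.0;

    private final long minimumRowCount;
    private final double minimumProcessRate;

    /** Reads the two proposed parameters for one table, falling back to the defaults. */
    AnalyzePolicy(Map<String, String> properties, String tableName) {
        String base = PREFIX + tableName;
        String rows = properties.get(base + ".minimumrowcount");
        String rate = properties.get(base + ".minimumprocessrate");
        this.minimumRowCount =
            rows == null ? DEFAULT_MINIMUM_ROW_COUNT : Long.parseLong(rows.trim());
        this.minimumProcessRate =
            rate == null ? DEFAULT_MINIMUM_PROCESS_RATE : Double.parseDouble(rate.trim());
    }

    /**
     * @param rowCount                current number of rows in the table
     * @param analyzedSinceJobStart   whether ANALYZE has already run since the job started
     * @param docsProcessedLastMinute documents processed during the last minute
     * @return true if either proposed condition calls for an ANALYZE
     */
    boolean shouldAnalyze(long rowCount, boolean analyzedSinceJobStart,
                          double docsProcessedLastMinute) {
        // Condition 1: enough rows have accumulated for a first ANALYZE (assumed >=).
        boolean firstAnalyzeDue = !analyzedSinceJobStart && rowCount >= minimumRowCount;
        // Condition 2: the processing rate has fallen below the threshold.
        boolean crawlTooSlow = docsProcessedLastMinute < minimumProcessRate;
        return firstAnalyzeDue || crawlTooSlow;
    }
}
```

For example, with no properties set, `new AnalyzePolicy(Map.of(), "jobqueue").shouldAnalyze(100, false, 10.0)` returns `true` because the default row threshold of 100 has been reached. A real implementation would presumably evaluate the rate condition only while a job is actively crawling and limit how often 'ANALYZE' is re-run, since an idle or finished job would otherwise trigger it every time the rate is checked.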