[ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingchun Zhao updated CONNECTORS-1746:
--------------------------------------
    Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.  (was: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
)

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1746
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>         Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.
>            Reporter: Mingchun Zhao
>            Priority: Major
>
> Sometimes the crawl stops processing documents for a while, and nothing is
> logged about long-running queries. Performance can be restored by running
> the 'ANALYZE' command manually, which suggests that a bad query plan caused
> the problem.
> Therefore, in addition to the current configuration parameter
> org.apache.manifoldcf.db.postgres.analyze.<tablename>, it seems necessary to
> execute 'ANALYZE' in the following situations as well:
> 1. When, after the job starts, the number of records in the table exceeds
> the number required for creating an execution plan.
> 2. When crawling performance slows down, for example when the document
> processing rate drops below a specified threshold.
> So, how about adding two parameters to control the timing of 'ANALYZE'
> execution, as below?
> 1. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumrowcount`
> Specifies how many records must be inserted before 'ANALYZE' is carried out
> on the specified table for the first time. Defaults to 100.
> 2. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumprocessrate`
> Specifies the minimum number of documents processed in the last minute. If
> the actual processing rate falls below this, 'ANALYZE' will be executed.
> Defaults to 1.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
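
The two proposed triggers could be sketched roughly as follows. This is only an illustration of the decision logic described in the issue, not ManifoldCF code: the class `AnalyzePolicy`, the function `should_analyze`, and all argument names are hypothetical, and the defaults simply mirror the values suggested above (100 and 1).

```python
from dataclasses import dataclass


@dataclass
class AnalyzePolicy:
    """Hypothetical holder for the two proposed per-table thresholds."""
    minimum_row_count: int = 100    # proposed default for ...minimumrowcount
    minimum_process_rate: int = 1   # proposed default for ...minimumprocessrate


def should_analyze(policy, rows_inserted, docs_last_minute, analyzed_once):
    """Return True when either proposed trigger would fire.

    rows_inserted    -- rows inserted into the table since the job started
    docs_last_minute -- documents processed during the last minute
    analyzed_once    -- whether the first ANALYZE has already run for this table
    """
    # Trigger 1 (assumed reading): run the first ANALYZE once enough rows
    # exist for the planner to build a useful execution plan.
    if not analyzed_once and rows_inserted >= policy.minimum_row_count:
        return True
    # Trigger 2: the processing rate has fallen below the configured minimum.
    return docs_last_minute < policy.minimum_process_rate


policy = AnalyzePolicy()
print(should_analyze(policy, rows_inserted=100, docs_last_minute=50, analyzed_once=False))
print(should_analyze(policy, rows_inserted=5000, docs_last_minute=0, analyzed_once=True))
```

Both calls print `True`: the first because the row threshold has just been reached, the second because the crawl has stalled. A real implementation would presumably also need to limit how often the second trigger can fire, so that a stalled or idle job does not re-run 'ANALYZE' on every check; the issue text does not specify this.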