Re: Too many slow queries caused by MCF running MySQL 5.5
Hi Karl. I could build the source ok but the following code is missing from connectors.xml. Does this mean I built it incorrectly or this is on purpose? Do I have to just add the code to enable the Windows share connection? repositoryconnector name=Windows shares class=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance problem. Please try it out, and let me know if you still get slow queries in the log. You should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT
Re: Too many slow queries caused by MCF running MySQL 5.5
Sorry, My bad. jcifs.jar was missing. Probably this is the cause. Sorry Shigeki 2012/12/11 Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp Hi Karl. I could build the source ok but the following code is missing from connectors.xml. Does this mean I built it incorrectly or this is on purpose? Do I have to just add the code to enable the Windows share connection? repositoryconnector name=Windows shares class=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance problem. Please try it out, and let me know if you still get slow queries in the log. You should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the
RE: Too many slow queries caused by MCF running MySQL 5.5
You just need to run ant make-deps too before building. Karl Sent from my Windows Phone -- From: Shigeki Kobayashi Sent: 12/11/2012 3:58 AM To: user@manifoldcf.apache.org Subject: Re: Too many slow queries caused by MCF running MySQL 5.5 Hi Karl. I could build the source ok but the following code is missing from connectors.xml. Does this mean I built it incorrectly or this is on purpose? Do I have to just add the code to enable the Windows share connection? repositoryconnector name=Windows shares class=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance problem. Please try it out, and let me know if you still get slow queries in the log. You should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10
Re: Too many slow queries caused by MCF running MySQL 5.5
Hi Karl. I downloaded the new MCF dev and crawled files again. There are still the same slow queries. The follwoing is the result of explain against the slow query: mysql explain SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 FORCE INDEX (i1355220194891) WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1355221199007 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-+---+-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-+---+-+ | 1 | PRIMARY| t0| ALL| NULL | NULL | NULL| NULL| 85293 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1355220194897 | I1355220194897 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1355220194890,I1355220194893,I1355220194892 | I1355220194890 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1355220194900 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-+---+-+ 5 rows in set (0.00 sec) Full table scanning was happening still. Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com You just need to run ant make-deps too before building. Karl Sent from my Windows Phone -- From: Shigeki Kobayashi Sent: 12/11/2012 3:58 AM To: user@manifoldcf.apache.org Subject: Re: Too many slow queries caused by MCF running MySQL 5.5 Hi Karl. I could build the source ok but the following code is missing from connectors.xml. Does this mean I built it incorrectly or this is on purpose? Do I have to just add the code to enable the Windows share connection? repositoryconnector name=Windows shares class=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance problem. Please try it out, and let me know if you still get slow queries in the log. You should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl
Re: Too many slow queries caused by MCF running MySQL 5.5
Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL I tried to crawl 10 million files using Windows share connection and index them into Solr. As MCF reached over 1 million files, the crawling speed started getting slower. So I checked slow queries and found out that too many slow queries occurred, especially the following kinds: # Time: 121204 16:25:40 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 7.240532 Lock_time: 0.000204 Rows_sent: 1200 Rows_examined: 611091 SET timestamp=1354605940; SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; # Time: 121204 16:25:44 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 3.064339 Lock_time: 0.84 Rows_sent: 1 Rows_examined: 406359 SET timestamp=1354605944; SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN ('P','G') AND checkaction='R' AND checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid) ORDER BY docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1; --- I wonder if the queries appropriately use index of the table. As a result of EXPLAIN against the slow query, there was filesort. There seems to be some conditions that MySQL does not use index depending on ORDER BY: - Executing ORDER BY against multiple keys - When keys selected from records are different from keys used by ORDER BY Since filesort was happening, fully scanning records should be having MCF slower. Do you think this could happen even in PostgreSQL or HSQLDB? Do you think queries could be modified to use index appropriately? Regards, Shigeki
Re: Too many slow queries caused by MCF running MySQL 5.5
Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL I tried to crawl 10 million files using Windows share connection and index them into Solr. As MCF reached over 1 million files, the crawling speed started getting slower. So I checked slow queries and found out that too many slow queries occurred, especially the following kinds: # Time: 121204 16:25:40 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 7.240532 Lock_time: 0.000204 Rows_sent: 1200 Rows_examined: 611091 SET timestamp=1354605940; SELECT t0.id ,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue
Re: Too many slow queries caused by MCF running MySQL 5.5
Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL I tried to crawl 10 million files using Windows share connection and index them into Solr. As MCF reached over 1 million files, the crawling speed started getting slower. So I checked slow queries and found out that too many slow queries occurred, especially the following kinds: # Time: 121204
Re: Too many slow queries caused by MCF running MySQL 5.5
Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL I tried to crawl 10 million files using Windows
Re: Too many slow queries caused by MCF running MySQL 5.5
Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200 Thanks! Karl On Mon, Dec 10, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL.
Re: Too many slow queries caused by MCF running MySQL 5.5
Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance problem. Please try it out, and let me know if you still get slow queries in the log. You should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com wrote: Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy...@gmail.com wrote: Ok, that is unfortunate. I will do some further MySQL research here. There is a FORCE INDEX MySQL construct that may help, e.g. SELECT ... FROM ... FORCE INDEX (key1_key2_key3) WHERE ... which we can also try. In this case that would be: FORCE INDEX (docpriority,status,checkaction,checktime) or FORCE INDEX (docpriority_status_checkaction_checktime) - unclear what the right syntax actually is. Maybe you can try an explain with that in the query? FWIW, PostgreSQL should always use the index for this situation. Karl On Mon, Dec 10, 2012 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I did EXPLAIN as following: mysql explain SELECT - t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset - FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') - AND t0.checkaction='R' AND - t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE - t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT - EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status - IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT - 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND - t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status - ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; +++---++--++-+-++-+ | id | select_type| table | type | possible_keys | key| key_len | ref | rows | Extra | +++---++--++-+-++-+ | 1 | PRIMARY| t0| range | I1354241297073,I1354241297072,I1354241297071 | I1354241297071 | 25 | NULL| 151494 | Using where; Using filesort | | 4 | DEPENDENT SUBQUERY | t3| ref| I1354241297077 | I1354241297077 | 8 | manifoldcf.t0.id| 1 | | | 4 | DEPENDENT SUBQUERY | t4| eq_ref | PRIMARY | PRIMARY| 767 | manifoldcf.t3.eventname | 1 | Using index | | 3 | DEPENDENT SUBQUERY | t2| ref| I1354241297070,I1354241297073,I1354241297072 | I1354241297070 | 122 | manifoldcf.t0.dochash | 1 | Using where | | 2 | DEPENDENT SUBQUERY | t1| eq_ref | PRIMARY,I1354241297080 | PRIMARY| 8 | manifoldcf.t0.jobid | 1 | Using where | +++---++--++-+-++-+ As you see Using filesort, I do not think it uses the index. By the way, which database do you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html ? Does it use the index? SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.docpriority = 0 AND t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
Re: Too many slow queries caused by MCF running MySQL 5.5
Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF1.1dev on Nov, 29th, and ran it using MySQL I tried to crawl 10 million files using Windows share connection and index them into Solr. As MCF reached over 1 million files, the crawling speed started getting slower. So I checked slow queries and found out that too many slow queries occurred, especially the following kinds: # Time: 121204 16:25:40 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 7.240532 Lock_time: 0.000204 Rows_sent: 1200 Rows_examined: 611091 SET timestamp=1354605940; SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; # Time: 121204 16:25:44 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 3.064339 Lock_time: 0.84 Rows_sent: 1 Rows_examined: 406359 SET timestamp=1354605944; SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN ('P','G') AND checkaction='R' AND checktime=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid) ORDER BY docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1; --- I wonder if the queries appropriately use index of the table. As a result of EXPLAIN against the slow query, there was filesort. There seems to be some conditions that MySQL does not use index depending on ORDER BY: - Executing ORDER BY against multiple keys - When keys selected from records are different from keys used by ORDER BY Since filesort was happening, fully scanning records should be having MCF slower. Do you think this could happen even in PostgreSQL or HSQLDB? Do you think queries could be modified to use index appropriately? Regards, Shigeki