bq. Yes, the use case is that of a few jobs (some of which are inactive) with a large number of documents in a 'P' or 'G' status.
Large numbers of documents with a 'P' or 'G' status should not be a concern. Large numbers of documents whose jobs are inactive *may* be, because the index it's supposed to use does not know anything about the joined jobs table. So if you have a million jobqueue records that belong to inactive jobs which all look just like active job jobqueue records, I can understand why the query planner would choose a different plan. Is this in fact the case? How many documents in states 'P' or 'G' do you have which belong to inactive jobs? My concern is that if we really don't understand why Postgres is making a mess of the plan, any "fix" we make will be at best temporary. For example, can you explain why replacing an implicit join with an explicit join would make any difference in the plan? According to the documentation, I'm pretty sure that a revised query with a join would *never* do an index read of the kind that MCF needs for this to truly scale, even if it is fast for your current situation. Karl On Fri, Oct 31, 2014 at 10:56 AM, Aeham Abushwashi < [email protected]> wrote: > Hi Karl, > > Yes, the use case is that of a few jobs (some of which are inactive) with a > large number of documents in a 'P' or 'G' status. > > Updating the jobqueue table stats with ANALYZE doesn't help I'm afraid. The > sub-query still uses the less-optimal index and generates a large number of > redundant rows that the subsequent Nested Loop Semi Join spends all of its > time culling. > > There's some good news though. Changing query #1 to use an explicit JOIN > between jobqueue (t0) and jobs (t1) has made it significantly faster. Fom > 12 secs all the way down to 1 sec. > > This is the original query and its execution plan: > > EXPLAIN ANALYZE SELECT > t0.id > ,t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset > FROM jobqueue t0 WHERE (t0.status='P' OR t0.status='G') AND > t0.checkaction='R' AND t0.checktime<= 1414683318802 AND EXISTS(SELECT 'x' > FROM jobs t1 WHERE (t1.status='A' OR t1.status='a') AND t1.id=t0.jobid > AND > t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE > t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND > t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events > t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority > ASC LIMIT 280; > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..95017.41 rows=280 width=177) (actual > time=12395.701..12395.701 rows=0 loops=1) > -> Nested Loop Anti Join (cost=0.00..2641483.78 rows=7784 width=177) > (actual time=12395.699..12395.699 rows=0 loops=1) > Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text = > (t0.dochash)::text)) > -> Nested Loop Anti Join (cost=0.00..2641310.77 rows=7784 > width=177) (actual time=12395.697..12395.697 rows=0 loops=1) > -> Nested Loop Semi Join (cost=0.00..2634519.48 rows=7784 > width=177) (actual time=12395.696..12395.696 rows=0 loops=1) > -> Index Scan using i1392985450172 on jobqueue t0 > (cost=0.00..2609016.45 rows=89520 width=177) (actual time=668.127..3970.495 > rows=3346768 loops=1) > Index Cond: ((checkaction = 'R'::bpchar) AND > (checktime <= 1414683318802::bigint)) > Filter: ((status = 'P'::bpchar) OR (status = > 'G'::bpchar)) > -> Index Scan using jobs_pkey on jobs t1 > (cost=0.00..0.28 rows=1 width=8) (actual time=0.002..0.002 rows=0 > loops=3346768) > Index Cond: (id = t0.jobid) > Filter: ((priority = 5) AND ((status = > 'A'::bpchar) OR (status = 'a'::bpchar))) > Rows Removed by Filter: 1 > -> Nested Loop (cost=0.00..0.86 rows=1 width=8) (never > executed) > -> Index Scan using i1392985450178 on prereqevents > t3 (cost=0.00..0.27 rows=1 width=524) (never executed) > Index Cond: (t0.id = owner) > -> Index Only Scan using events_pkey on events t4 > (cost=0.00..0.58 rows=1 width=516) (never executed) > Index Cond: (name = (t3.eventname)::text) > Heap Fetches: 0 > -> Materialize (cost=0.00..36.79 rows=1 width=49) (never > executed) > -> Index Scan using i1392985450173 on jobqueue t2 > (cost=0.00..36.78 rows=1 width=49) (never executed) > Index Cond: (status = ANY ('{A,F,a,f,D,d}'::bpchar[])) > Total runtime: 12395.884 ms > > > > and this is the modified query and its execution plan: > > EXPLAIN ANALYZE SELECT > t0.id > ,t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset > FROM jobqueue t0 JOIN jobs t1 ON t1.id = t0.jobid WHERE (t0.status='P' > OR > t0.status='G') AND t0.checkaction='R' AND t0.checktime<= 1414683318802 AND > (t1.status = 'A' OR t1.status='a') AND t1.priority=5 AND NOT EXISTS(SELECT > 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN > ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' > FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname= > t4.name) > ORDER BY t0.docpriority ASC LIMIT 280; > > QUERY PLAN > > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > --------------------------------- > Limit (cost=122440.72..122441.42 rows=280 width=177) (actual > time=1010.722..1010.722 rows=0 loops=1) > -> Sort (cost=122440.72..122451.13 rows=4164 width=177) (actual > time=1010.719..1010.719 rows=0 loops=1) > Sort Key: t0.docpriority > Sort Method: quicksort Memory: 25kB > -> Nested Loop Anti Join (cost=45766.95..122250.65 rows=4164 > width=177) (actual time=1010.702..1010.702 rows=0 loops=1) > Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text > = (t0.dochash)::text)) > -> Hash Anti Join (cost=45766.95..122140.99 rows=4164 > width=177) (actual time=1010.701..1010.701 rows=0 loops=1) > Hash Cond: (t0.id = t3.owner) > -> Nested Loop (cost=45739.17..122060.64 rows=4164 > width=177) (actual time=1010.699..1010.699 rows=0 loops=1) > -> Bitmap Heap Scan on jobs t1 > (cost=8.52..15.79 rows=2 width=8) (actual time=0.028..0.031 rows=1 loops=1) > Recheck Cond: ((status = 'A'::bpchar) OR > (status = 'a'::bpchar)) > Filter: (priority = 5) > -> BitmapOr (cost=8.52..8.52 rows=2 > width=0) (actual time=0.021..0.021 rows=0 loops=1) > -> Bitmap Index Scan on > i1392985450182 (cost=0.00..4.26 rows=1 width=0) (actual time=0.017..0.017 > rows=1 loops=1) > Index Cond: (status = > 'A'::bpchar) > -> Bitmap Index Scan on > i1392985450182 (cost=0.00..4.26 rows=1 width=0) (actual time=0.003..0.003 > rows=0 loops=1) > Index Cond: (status = > 'a'::bpchar) > -> Bitmap Heap Scan on jobqueue t0 > (cost=45730.65..60983.50 rows=3892 width=177) (actual > time=1010.661..1010.661 rows=0 loops=1) > Recheck Cond: ((((status = 'P'::bpchar) > AND (checkaction = 'R'::bpchar) AND (checktime <= 1414683318802::bigint)) > OR ((status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime > <= 1414683318802 > ::bigint))) AND (jobid = t1.id)) > -> BitmapAnd (cost=45730.65..45730.65 > rows=3982 width=0) (actual time=1009.818..1009.818 rows=0 loops=1) > -> BitmapOr (cost=3129.52..3129.52 > rows=91575 width=0) (actual time=934.891..934.891 rows=0 loops=1) > -> Bitmap Index Scan on > i1392985450174 (cost=0.00..1806.76 rows=53658 width=0) (actual > time=513.541..513.541 rows=1946394 loops=1) > Index Cond: ((status = > 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= > 1414683318802::bigint)) > -> Bitmap Index Scan on > i1392985450174 (cost=0.00..1278.00 rows=37917 width=0) (actual > time=421.346..421.346 rows=1400375 loops=1) > Index Cond: ((status = > 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= > 1414683318802::bigint)) > -> Bitmap Index Scan on > i1392985450177 (cost=0.00..42599.91 rows=1612579 width=0) (actual > time=5.887..5.887 rows=43401 loops=1) > Index Cond: (jobid = t1.id) > -> Hash (cost=26.15..26.15 rows=130 width=8) (never > executed) > -> Hash Join (cost=12.93..26.15 rows=130 > width=8) (never executed) > Hash Cond: ((t3.eventname)::text = ( > t4.name > )::text) > -> Seq Scan on prereqevents t3 > (cost=0.00..11.40 rows=140 width=524) (never executed) > -> Hash (cost=11.30..11.30 rows=130 > width=516) (never executed) > -> Seq Scan on events t4 > (cost=0.00..11.30 rows=130 width=516) (never executed) > -> Materialize (cost=0.00..36.79 rows=1 width=49) (never > executed) > -> Index Scan using i1392985450173 on jobqueue t2 > (cost=0.00..36.78 rows=1 width=49) (never executed) > Index Cond: (status = ANY > ('{A,F,a,f,D,d}'::bpchar[])) > Total runtime: 1010.861 ms >
