Hi Aeham,

I opened CONNECTORS-1090 to track any changes that get made for this problem.
I've attached a patch which should change query generation for the stuffer
query to remove the remaining IN clause. I don't generally expect that to
change the plan, but given that we don't fully understand the planner's
decision process, it may have an effect.

Karl

On Thu, Oct 30, 2014 at 9:04 PM, Karl Wright <[email protected]> wrote:

> Hi Aeham,
>
> The stuffer query is designed to make use of the following index, and
> indeed to read rows in sorted order from that index:
>
> IndexDescription docpriorityIndex = new IndexDescription(false,new
> String[]{docPriorityField,statusField,checkActionField,checkTimeField});
>
> All that the joins do is effectively filter the results based on job
> status, and on whether a given document is actually present in, and being
> worked on by, another job. The second case is rare, so any plan that
> emphasizes the jobqueue interaction is plainly wrong. I could see the
> first join being meaningful in some cases, but only if you had lots of
> inactive jobs which each had lots of documents. Please let me know if
> that is the situation here.
>
> It is a mystery why PostgreSQL does not always use this index. In my
> experience, the main cause is PostgreSQL's cost optimizer making huge
> distinctions between there being zero of something vs. one of something.
> You can tell readily enough whether this is what is happening by getting
> a plan for the stuffer query; if it is NOT reading from the index I
> mentioned, run ANALYZE jobqueue and then get the plan again. If the
> second plan is fine, then this previously noted PostgreSQL problem is to
> blame.
>
> I'm afraid that your notion of breaking up the query and doing it in
> Java is extremely problematic, because then we would be forced to read
> huge result sets into memory. If you can demonstrate that an ANALYZE
> needs to be done more frequently, on the other hand, there is a
> configuration parameter for that. Please let me know.
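[Editor's note: Karl's diagnostic above can be run directly against the ManifoldCF database. A minimal sketch, assuming psql access; the literal values stand in for the prepared-statement parameters, the query is a simplified form of the real stuffer query, and the autovacuum tuning at the end is a generic PostgreSQL knob, not something Karl names here:]

```sql
-- Get a plan for (a simplified form of) the stuffer query. If the plan does
-- not read the jobqueue docpriority index in sorted order, refresh statistics
-- and plan again; if the second plan is fine, stale statistics were to blame.
EXPLAIN
SELECT t0.id, t0.docpriority, t0.jobid, t0.dochash, t0.docid
FROM jobqueue t0
WHERE (t0.status='P' OR t0.status='G')
  AND t0.checkaction='R'
  AND t0.checktime <= 1414683318802
ORDER BY t0.docpriority ASC
LIMIT 280;

ANALYZE jobqueue;

-- ...then run the EXPLAIN again and compare the two plans.

-- PostgreSQL can also be told to analyze this table more aggressively on its
-- own (a per-table storage parameter, independent of any ManifoldCF setting):
ALTER TABLE jobqueue SET (autovacuum_analyze_scale_factor = 0.02);
```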
>
> Karl
>
> On Thu, Oct 30, 2014 at 8:28 PM, Aeham Abushwashi <
> [email protected]> wrote:
>
>> Hi Karl,
>>
>> I do have that change. Paul Boichat, who was involved in the conversation
>> leading up to CONNECTORS-1027, applied the fix and tested it in our
>> environment.
>>
>> I see your point that query #1 still has the IN (...) clause, indicating
>> that it hasn't benefited from the CONNECTORS-1027 fix. However, in my
>> execution plan that part of the query is never executed anyway - see
>> below for details. In addition, query #2 is clean, yet it hits the
>> database hard.
>>
>> ======================
>>
>> Execution Plan for Query #1:
>>
>> EXPLAIN ANALYZE SELECT t0.id,
>> t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>> FROM jobqueue t0 WHERE (t0.status='P' OR t0.status='G') AND
>> t0.checkaction='R' AND t0.checktime<=1414683318802 AND EXISTS(SELECT 'x'
>> FROM jobs t1 WHERE (t1.status='A' OR t1.status='a') AND t1.id=t0.jobid
>> AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>> t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
>> t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events
>> t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority
>> ASC LIMIT 280;
>>
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Limit  (cost=0.00..85910.03 rows=280 width=178) (actual time=15152.258..15152.258 rows=0 loops=1)
>>   ->  Nested Loop Anti Join  (cost=0.00..3187261.91 rows=10388 width=178) (actual time=15152.255..15152.255 rows=0 loops=1)
>>         Join Filter: ((t2.jobid <> t0.jobid) AND ((t2.dochash)::text = (t0.dochash)::text))
>>         ->  Nested Loop Anti Join  (cost=0.00..3186114.69 rows=10388 width=178) (actual time=15152.253..15152.253 rows=0 loops=1)
>>               ->  Nested Loop Semi Join  (cost=0.00..3177053.22 rows=10388 width=178) (actual time=15152.252..15152.252 rows=0 loops=1)
>>                     ->  Index Scan using i1392985450172 on jobqueue t0  (cost=0.00..3141756.78 rows=124662 width=178) (actual time=686.159..4046.926 rows=3346768 loops=1)
>>                           Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 1414683318802::bigint))
>>                           Filter: ((status = 'P'::bpchar) OR (status = 'G'::bpchar))
>>                     ->  Index Scan using jobs_pkey on jobs t1  (cost=0.00..0.28 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=3346768)
>>                           Index Cond: (id = t0.jobid)
>>                           Filter: ((priority = 5) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar)))
>>                           Rows Removed by Filter: 1
>>               ->  Nested Loop  (cost=0.00..0.86 rows=1 width=8) (never executed)
>>                     ->  Index Scan using i1392985450178 on prereqevents t3  (cost=0.00..0.27 rows=1 width=524) (never executed)
>>                           Index Cond: (t0.id = owner)
>>                     ->  Index Only Scan using events_pkey on events t4  (cost=0.00..0.58 rows=1 width=516) (never executed)
>>                           Index Cond: (name = (t3.eventname)::text)
>>                           Heap Fetches: 0
>>         ->  Materialize  (cost=0.00..56.48 rows=6 width=49) (never executed)
>>               ->  Index Scan using i1392985450173 on jobqueue t2  (cost=0.00..56.45 rows=6 width=49) (never executed)
>>                     Index Cond: (status = ANY ('{A,F,a,f,D,d}'::bpchar[]))
>> Total runtime: 15152.487 ms
>>
>>
>> Execution Plan for Query #2:
>>
>> EXPLAIN ANALYZE SELECT docpriority,jobid,dochash,docid FROM jobqueue t0
>> WHERE (status='P' OR status='G') AND checkaction='R' AND
>> checktime<=1414685283327 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE
>> (t1.status='A' OR t1.status='a') AND t1.id=t0.jobid) ORDER BY
>> docpriority ASC LIMIT 1;
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Limit  (cost=0.00..304.78 rows=1 width=144) (actual time=20620.458..20620.458 rows=0 loops=1)
>>   ->  Nested Loop Semi Join  (cost=0.00..3029548.13 rows=9940 width=144) (actual time=20620.456..20620.456 rows=0 loops=1)
>>         ->  Index Scan using i1392985450172 on jobqueue t0  (cost=0.00..2996093.69 rows=119282 width=144) (actual time=44.118..5665.201 rows=3346768 loops=1)
>>               Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 1414685283327::bigint))
>>               Filter: ((status = 'P'::bpchar) OR (status = 'G'::bpchar))
>>         ->  Index Scan using jobs_pkey on jobs t1  (cost=0.00..0.28 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=3346768)
>>               Index Cond: (id = t0.jobid)
>>               Filter: ((status = 'A'::bpchar) OR (status = 'a'::bpchar))
>>               Rows Removed by Filter: 1
>> Total runtime: 20620.642 ms
>>
>> ======================
>>
>>
>> Best regards,
>> Aeham
>>
>> On 30 October 2014 23:45, Karl Wright <[email protected]> wrote:
>>
>> > The ticket, in case you are interested, is CONNECTORS-1027.
>> >
>> > Karl
>> >
>> >
>> > On Thu, Oct 30, 2014 at 7:42 PM, Karl Wright <[email protected]> wrote:
>> >
>> > > Hi Aeham,
>> > >
>> > > The problem is that you are not using MCF 1.7.1, which has a fix for
>> > > the stuffer thread query that makes it run quickly on PostgreSQL 9.1
>> > > and above.
>> > >
>> > > Thanks,
>> > > Karl
>> > >
>> > >
>> > > On Thu, Oct 30, 2014 at 7:30 PM, Aeham Abushwashi <
>> > > [email protected]> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> I have a multi-node zookeeper-based test environment with ~29M
>> > >> documents (~36M jobqueue records and ~29M ingeststatus records) and
>> > >> jobs in varying states of being (running vs complete vs stopped vs
>> > >> paused). Some jobs are continuous and a few are not.
>> > >>
>> > >> During a quiet time - when no documents are actively being ingested -
>> > >> I see very high CPU utilisation on the PostgreSQL database server.
>> > >> This is primarily a result of two queries executed by the stuffer
>> > >> thread, namely the main query in JobManager#fetchAndProcessDocuments
>> > >> and the doc priority query in JobManager#getNextDocuments.
>> > >>
>> > >> ==========================
>> > >>
>> > >> Query 1:
>> > >> SELECT t0.id,
>> > >> t0.docpriority,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>> > >> FROM jobqueue t0 WHERE (t0.status=? OR t0.status=?) AND
>> > >> t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs
>> > >> t1 WHERE (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND
>> > >> t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>> > >> t2.dochash=t0.dochash AND t2.status IN (?,?,?,?,?,?) AND
>> > >> t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents
>> > >> t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name)
>> > >> ORDER BY t0.docpriority ASC LIMIT 280
>> > >>
>> > >> Query 1 example params:
>> > >> [P, G, R, 1414683318802, A, a, 5, A, F, a, f, D, d]
>> > >>
>> > >> ++++++++
>> > >>
>> > >> Query 2:
>> > >> SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE
>> > >> (status=? OR status=?) AND checkaction=? AND checktime<=? AND
>> > >> EXISTS(SELECT 'x' FROM jobs t1 WHERE (t1.status=? OR t1.status=?)
>> > >> AND t1.id=t0.jobid) ORDER BY docpriority ASC LIMIT 1
>> > >>
>> > >> Query 2 example params:
>> > >> [P, G, R, 1414685283327, A, a]
>> > >>
>> > >> ==========================
>> > >>
>> > >> There is one continuous job in a running state, which has ~50K
>> > >> documents in a pending-purgatory ('G') state but not due for
>> > >> processing for another couple of days. There are a few other jobs
>> > >> which are stopped and have a few million documents in a
>> > >> pending-purgatory state.
>> > >> I don't expect Manifold to find any documents eligible for processing
>> > >> in this case, and it doesn't, but it takes time and (more seriously) a
>> > >> big chunk of PostgreSQL CPU to find the answer: one full CPU core for
>> > >> ~15 secs per MCF instance. This wouldn't be a big deal if the queries
>> > >> were executed infrequently. Unfortunately, they are run in a loop with
>> > >> a 2-second pause between iterations. I'd argue that it would be
>> > >> relatively easy to run into this issue with a bunch of large
>> > >> continuous jobs scheduled to run at varying times of the day.
>> > >>
>> > >> With my limited knowledge of the inner workings of the stuffer
>> > >> thread, it's not obvious to me why query 2 needs to run IF query 1
>> > >> returns no hits at all. If it's not needed, then short-circuiting it
>> > >> in this case would help.
>> > >>
>> > >> Increasing the sleep time between iterations of the stuffer thread,
>> > >> in response to there being little or no work for a number of
>> > >> consecutive iterations, could be another easy win.
>> > >>
>> > >> A more profound change would be to optimise the joins between the
>> > >> jobs table and the jobqueue table. One way to do that would be to
>> > >> move some of the logic to the Java code: first identify active jobs,
>> > >> then query the jobqueue table just for those. I understand this would
>> > >> require merging and sorting priorities in the Java code, which is
>> > >> less than ideal. I continue to analyse the execution plans of the two
>> > >> queries and to experiment with slight variants to test performance.
>> > >> I'll report back any important findings.
>> > >>
>> > >> Any thoughts or advice are welcome.
>> > >> Also, if you have experience running large continuous crawls, I'd be
>> > >> interested to hear from you, particularly about performance and
>> > >> especially the performance of SQL queries.
>> > >>
>> > >> Best regards,
>> > >> Aeham
>> > >>
>> > >
>> > >
>> >
>> >
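[Editor's note: Aeham's "easy win" of lengthening the sleep after consecutive empty iterations could be sketched roughly as below. This is an illustrative sketch only, not ManifoldCF's actual stuffer thread; the class name, constants, and field are invented for illustration. The 2000 ms base matches the 2-second pause described in the thread.]

```java
// Sketch of an adaptive back-off for a stuffer-style polling loop.
// When a pass stuffs no documents, the pause doubles per consecutive
// empty pass up to a cap; any successful pass resets it to the base.
public class BackoffSleep
{
  private static final long BASE_SLEEP_MS = 2000L;   // the current fixed pause
  private static final long MAX_SLEEP_MS = 60000L;   // cap on the back-off

  private int emptyIterations = 0;

  /** Record the outcome of one stuffing pass and return the next pause in ms. */
  public long nextSleep(int documentsStuffed)
  {
    if (documentsStuffed > 0)
    {
      // Work was found: return to the normal short polling interval.
      emptyIterations = 0;
      return BASE_SLEEP_MS;
    }
    // No work found: exponential back-off, bounded by MAX_SLEEP_MS.
    emptyIterations++;
    long sleep = BASE_SLEEP_MS << Math.min(emptyIterations, 5);
    return Math.min(sleep, MAX_SLEEP_MS);
  }
}
```

Under this sketch, an idle deployment would settle at one expensive query pass per minute instead of one every two seconds, while a busy one keeps the current responsiveness.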
