Unfortunately, I've found that this change triggers another bug. Please hold off until I have a fix for that one as well.
Karl

On Sun, Nov 2, 2014 at 9:34 AM, Karl Wright <[email protected]> wrote:

> Hi Aeham,
>
> The more I look at this, the more I'm convinced that there must be a bug
> elsewhere that permits jobqueue records with non-null document priorities
> to hang around.
>
> If you look at this JobManager method:
>
> public void finishJobStops(long timestamp, ArrayList modifiedJobs)
>   throws ManifoldCFException
>
> ... which is used whenever a job is stopped for any reason, you will note
> that the logic in question is *supposed* to clear the document priority
> field for all documents belonging to that job. If that works as it is
> supposed to, then there should be *no* documents belonging to inactive or
> paused jobs that still carry a real document priority in the docpriority
> index.
>
> So the only way that this index can be slow is if, perchance, PostgreSQL
> puts "null" values for a field FIRST in the index. But:
>
> >>>>>>
> /** Clear all document priorities for a job */
> public void clearDocPriorities(Long jobID)
>   throws ManifoldCFException
> {
>   // Clear the priority-set flag and push the priority to the sentinel.
>   HashMap map = new HashMap();
>   map.put(prioritySetField,null);
>   map.put(docPriorityField,nullDocPriority);
>   ArrayList list = new ArrayList();
>   // Restrict the update to the documents of the given job.
>   String query = buildConjunctionClause(list,new ClauseDescription[]{
>     new UnitaryClause(jobIDField,jobID)});
>   performUpdate(map,"WHERE "+query,list,null);
>   noteModifications(0,1,0);
> }
> <<<<<<
>
> ... where "nullDocPriority" is declared as:
>
> >>>>>>
> public static final double noDocPriorityValue = 1e9;
> public static final Double nullDocPriority = new Double(noDocPriorityValue + 1.0);
> <<<<<<
>
> ... which is a value picked so that it always sorts last. (It was
> necessary to pick a specific value because MySQL did not put nulls at the
> end, but rather at the beginning.)
>
> So now I am very curious as to what is *really* at the front of this
> index -- so please execute the query I sent and let me know.
>
> It occurs to me that there is another way for the query to be slow,
> however -- which is if there are insufficient matching rows to meet the
> LIMIT size. In that case, the entire index would be scanned. But this can
> be readily addressed by simply adding an appropriate WHERE condition:
>
> >>>>>>
> new UnitaryClause("t0."+jobQueue.docPriorityField,"<",nullDocPriority),
> <<<<<<
>
> If the results of the query I asked you to run look fine, then we may
> want to try this next. I've attached a second patch to CONNECTORS-1090.
> Please apply both patches to see if that fixes your query execution time.
>
>
> Thanks,
> Karl
>
>
>
> On Sat, Nov 1, 2014 at 8:09 AM, Karl Wright <[email protected]> wrote:
>
>> Hi Aeham,
>>
>> To figure out what is going on, it would probably be sufficient to
>> execute the following queries and send me the results:
>>
>> EXPLAIN SELECT
>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>> FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577
>> ORDER BY t0.docpriority LIMIT 20;
>>
>> SELECT
>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>> FROM jobqueue t0 WHERE t0.checkaction='R' AND t0.checktime<=2414771559577
>> ORDER BY t0.docpriority LIMIT 20;
>>
>> These should both read directly from the index in question; we can see
>> immediately what kinds of documents the scan has to skip to get to the
>> active ones.
>>
>> Thanks!
>> Karl
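>>
>> P.S. A rough aggregate view of the same information might be quicker to
>> eyeball. This is an untested sketch -- it assumes the nullDocPriority
>> sentinel of 1000000001.0 described above -- and it counts, by status and
>> checkaction, every row that still carries a real document priority:
>>
>> >>>>>>
>> SELECT t0.status, t0.checkaction, COUNT(*) AS docs
>> FROM jobqueue t0
>> WHERE t0.docpriority < 1000000001
>> GROUP BY t0.status, t0.checkaction
>> ORDER BY docs DESC;
>> <<<<<<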
>>
>> On Fri, Oct 31, 2014 at 2:42 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Aeham,
>>>
>>> I've looked at this at some level of detail now, and have some
>>> conclusions.
>>>
>>> (1) The first query you provided is in fact using the proper index for
>>> ordering; it's just not obvious from the plan. The way you can tell is
>>> by whether there is an explicit sort step in the plan; here there is
>>> none. The primary jobqueue index scan is happening within an anti-join
>>> and a semi-join; you can tell this by cutting out all the EXISTS
>>> clauses, getting a reduced plan that way, and comparing. For instance,
>>> here's an explain of the simple query:
>>>
>>> >>>>>>
>>> dbname=# explain SELECT
>>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>>> FROM jobqueue t0 ORDER BY t0.docpriority LIMIT 240;
>>>                                            QUERY PLAN
>>> -------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.29..38.20 rows=240 width=161)
>>>    ->  Index Scan using i1413214448001 on jobqueue t0  (cost=0.29..2312.39 rows=14634 width=161)
>>> (2 rows)
>>> <<<<<<
>>>
>>> Applying filtering criteria:
>>>
>>> >>>>>>
>>> dbname=# explain SELECT
>>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>>> FROM jobqueue t0 WHERE (t0.status='P' OR t0.status='G') AND
>>> t0.checkaction='R' AND t0.checktime<=2414771559577 ORDER BY t0.docpriority
>>> LIMIT 240;
>>>                                            QUERY PLAN
>>> ------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.29..68.98 rows=240 width=161)
>>>    ->  Index Scan using i1413214448001 on jobqueue t0  (cost=0.29..2389.31 rows=8347 width=161)
>>>          Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 2414771559577::bigint))
>>>          Filter: ((status = 'P'::bpchar) OR (status = 'G'::bpchar))
>>> (4 rows)
>>> <<<<<<
>>>
>>> Now, this is a bit strange, because the status ends up in a filter, so
>>> the index isn't helping there, and yet the index includes the status:
>>>
>>> "i1413214448001" btree (docpriority, status, checkaction, checktime)
>>>
>>> But maybe this is a statistical optimization? I don't know.
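>>>
>>> (One cheap experiment here -- untested, and I don't know whether the
>>> planner will actually take it -- would be to rewrite the OR as an IN
>>> list, which PostgreSQL normalizes to "= ANY(...)" and might then accept
>>> as an index condition rather than a filter:
>>>
>>> >>>>>>
>>> EXPLAIN SELECT
>>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>>> FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R'
>>> AND t0.checktime<=2414771559577 ORDER BY t0.docpriority LIMIT 240;
>>> <<<<<<
>>>
>>> If status then shows up under "Index Cond", the index is doing that
>>> work; if not, nothing is lost.)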
>>>
>>> Anyhow, as soon as a NOT EXISTS clause is added, you see an anti-join:
>>>
>>> >>>>>>
>>> dbname=# explain SELECT
>>> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
>>> FROM jobqueue t0 WHERE (t0.status='P' OR t0.status='G') AND
>>> t0.checkaction='R' AND t0.checktime<=2414771559577 AND NOT EXISTS(SELECT
>>> 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>>> t3.eventname=t4.name) ORDER BY t0.docpriority LIMIT 240;
>>>                                                 QUERY PLAN
>>> ----------------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.57..231.40 rows=240 width=161)
>>>    ->  Nested Loop Anti Join  (cost=0.57..7957.24 rows=8273 width=161)
>>>          ->  Index Scan using i1413214448001 on jobqueue t0  (cost=0.29..2389.31 rows=8347 width=161)
>>>                Index Cond: ((checkaction = 'R'::bpchar) AND (checktime <= 2414771559577::bigint))
>>>                Filter: ((status = 'P'::bpchar) OR (status = 'G'::bpchar))
>>>          ->  Nested Loop  (cost=0.29..0.66 rows=1 width=8)
>>>                ->  Index Scan using i1413214448007 on prereqevents t3  (cost=0.14..0.17 rows=1 width=524)
>>>                      Index Cond: (t0.id = owner)
>>>                ->  Index Only Scan using events_pkey on events t4  (cost=0.14..0.48 rows=1 width=516)
>>>                      Index Cond: (name = (t3.eventname)::text)
>>> (10 rows)
>>> <<<<<<
>>>
>>> The index scan is still there and is apparently providing the ordering,
>>> which is still good; it's just inside an anti-join.
>>>
>>> So you can take this further and add the two remaining EXISTS/NOT EXISTS
>>> clauses, and then your first plan makes sense. But your proposed *second*
>>> plan is clearly *not* using the docpriority index for sorting, since it
>>> sorts by itself later on:
>>>
>>> >>>>>>
>>> Limit  (cost=122440.72..122441.42 rows=280 width=177) (actual time=1010.722..1010.722 rows=0 loops=1)
>>>   ->  Sort  (cost=122440.72..122451.13 rows=4164 width=177) (actual time=1010.719..1010.719 rows=0 loops=1)
>>>         Sort Key: t0.docpriority
>>>         Sort Method: quicksort  Memory: 25kB
>>> <<<<<<
>>>
>>> So what we need to understand now is why the first query is slow. It's
>>> reading from the correct index, but is nevertheless taking a long time,
>>> possibly because there are a lot of records with status 'P' or 'G' that
>>> belong to jobs that are not in state 'a' or 'A', and the scan has to go
>>> through these before it assembles 240 results. Can you confirm this
>>> picture? (A rough count query for checking this is sketched at the very
>>> bottom of this thread.)
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Fri, Oct 31, 2014 at 1:09 PM, Aeham Abushwashi <
>>> [email protected]> wrote:
>>>
>>>> That's great news, thanks Karl! For my part, I've been digging deeper
>>>> into the execution plans of the two queries, and I've made a similar
>>>> observation about the ORDER BY skewing the original query's plan
>>>> significantly. However, I have yet to figure out why the same ORDER BY
>>>> clause doesn't have a similar impact on the modified query that uses a
>>>> JOIN. I'll look into this a bit more over the weekend.
>>>>
>>>> Best regards,
>>>> Aeham
>>>>
>>>
>>>
>>
>
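P.S. Regarding confirming the picture above: a rough count of how the
matching documents distribute across job states would settle it. This is an
untested sketch; it assumes the jobs table is named "jobs", is keyed by
"id", has its own "status" column, and that jobqueue.jobid references it:

>>>>>>
SELECT t1.status AS job_status, COUNT(*) AS docs
FROM jobqueue t0
  JOIN jobs t1 ON t1.id = t0.jobid
WHERE (t0.status='P' OR t0.status='G')
  AND t0.checkaction='R' AND t0.checktime<=2414771559577
GROUP BY t1.status
ORDER BY docs DESC;
<<<<<<

If most of the count falls under job states other than 'a' or 'A', that
would confirm that the index scan is wading through documents of inactive
jobs before it can assemble 240 results.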
