Apologies for not including this in the original email.  The other
index, job_u_createtime_2cy0wgyqpani8,
is on pc_job(CreateTime, Retired, Subtype, ID).  The optimizer chooses
Nested Loop when choosing that index, vs Hash Join when choosing the index
in the first plan that I posted.  It seems like the choice of the Hash Join
in the 1st plan that I posted is collateral damage from the seemingly
unnecessary need to do the sort.

Here's the plan without forcing the index:

Limit  (cost=1.00..52692.73 rows=10 width=20) (actual
time=55219.289..87704.704 rows=10 loops=1)
  Buffers: shared hit=9579294 read=328583
  I/O Timings: read=1157740.299
  ->  Nested Loop  (cost=1.00..2007555.82 rows=381 width=20) (actual
time=55219.288..87704.695 rows=10 loops=1)
        Buffers: shared hit=9579294 read=328583
        I/O Timings: read=1157740.299
        ->  Index Scan using job_u_createtime_2cy0wgyqpani8 on pc_job groot
 (cost=0.56..1800117.94 rows=153696 width=28) (actual
time=102.075..79470.670 rows=5650 loops=1)
                Index Cond: ((groot.retired = 0) AND (groot.subtype = 7))
                Filter: (groot.closedate IS NULL)
                Rows Removed by Filter: 14994857
              Buffers: shared hit=9563981 read=321566
              I/O Timings: read=1149579.949
        ->  Index Scan using pc_policy_pk on pc_policy policy_0
 (cost=0.43..1.35 rows=1 width=8) (actual time=1.456..1.456 rows=0
loops=5650)
                Index Cond: (policy_0.id = groot.policyid)
                Filter: ((policy_0.retired = 0) AND
(policy_0.producercodeofserviceid = ANY
('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])))
                Rows Removed by Filter: 1
              Buffers: shared hit=15313 read=7017
              I/O Timings: read=8160.350
Planning time: 2.209 ms
Execution time: 87705.116 ms

Thanks,
Jerry

On Wed, Jan 17, 2024 at 6:39 AM Jerry Brenner <jbren...@guidewire.com>
wrote:

> We are on 13.9.
> I'm wondering why a sort is required for this query, as the index should
> be providing the required ordering to satisfy the ORDER BY clause.  Does it
> have to do with the IS NULL predicate on the leading key column in the
> index?
>
> There's an index, job_u_closedate_g9cdc6ghupib, on pc_job(CloseDate,
> Retired, Subtype, CreateTime, ID).  All columns have ASC sort order and
> NULLs LAST.
>
>    - pc_job is the probe table in a hash join
>    - There are IS NULL and equality predicates on the 3 leading columns
>    in the index and the last 2 key columns (CreateTime, ID) are the ordering
>    columns in the query
>    - So, the Index Scan of  job_u_closedate_g9cdc6ghupib is returning the
>    rows in the sorted order
>    - NOTE: The sort is cheap, but I'm investigating this because
>    "CloseDate IS NULL" is very selective and without forcing the index the
>    optimizer is choosing a different sort avert index that does not include
>    CloseDate and hence a lot of time is spent filtering out rows on that
>    predicate against the heap.
>
> Here's the query
>
> SELECT /* ISNULL:pc_job.CloseDate:, KeyTable:pc_job; */ gRoot.ID col0,
> gRoot.Subtype col1, gRoot.CreateTime col2
> FROM pc_job gRoot INNER JOIN pc_policy policy_0
>      ON policy_0.ID = gRoot.PolicyID
> WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULL
>       AND gRoot.Retired = 0
>       AND policy_0.ProducerCodeOfServiceID IN
>
>  (248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719)
>    AND policy_0.Retired = 0
> ORDER BY col2 ASC, col0 ASC LIMIT 10
>
> Here's the query plan:
>
> Limit  (cost=107826.77..107826.79 rows=10 width=20) (actual 
> time=13149.872..13149.877 rows=10 loops=1)
>   Buffers: shared hit=2756 read=40121
>   I/O Timings: read=105917.908
>   ->  Sort  (cost=107826.77..107827.72 rows=381 width=20) (actual 
> time=13149.871..13149.874 rows=10 loops=1)
>           Sort Key: groot.createtime, groot.id
>           Sort Method: top-N heapsort Memory: 25kB
>         Buffers: shared hit=2756 read=40121
>         I/O Timings: read=105917.908
>         ->  Hash Join  (cost=15632.51..107818.53 rows=381 width=20) (actual 
> time=578.511..13149.658 rows=144 loops=1)
>               Buffers: shared hit=2750 read=40121
>               I/O Timings: read=105917.908
>               ->  Index Scan using job_u_closedate_g9cdc6ghupib on pc_job 
> groot  (cost=0.56..91783.14 rows=153696 width=28) (actual 
> time=3.864..12562.568 rows=75558 loops=1)
>                       Index Cond: ((groot.closedate IS NULL) AND 
> (groot.retired = 0) AND (groot.subtype = 7))
>                     Buffers: shared hit=2721 read=27934
>                     I/O Timings: read=58781.220
>               ->  Hash  (cost=15427.92..15427.92 rows=16322 width=8) (actual 
> time=543.298..543.299 rows=13016 loops=1)
>                     Buffers: shared hit=29 read=12187
>                     I/O Timings: read=47136.688
>                     ->  Index Scan using policy_n_producerco_3e8i0ojsyckhx on 
> pc_policy policy_0  (cost=0.43..15427.92 rows=16322 width=8) (actual 
> time=6.149..540.501 rows=13016 loops=1)
>                             Index Cond: ((policy_0.producercodeofserviceid = 
> ANY 
> ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[]))
>  AND (policy_0.retired = 0))
>                           Buffers: shared hit=29 read=12187
>                           I/O Timings: read=47136.688
> Planning time: 0.538 ms
> Execution time: 13150.301 ms
>
> Thanks,
>
> Jerry
>
>
>

Reply via email to