Re: Slow query and wrong row estimates for CTE

2021-02-16 Thread Michael Lewis
>
>Sort Method: external merge  Disk: 30760kB
>Worker 0:  Sort Method: external merge  Disk: 30760kB
>Worker 1:  Sort Method: external merge  Disk: 30760kB
>

If you can increase work_mem, even setting it temporarily higher for the
session or transaction, that may dramatically change the plan. The advice
Justin gave, particularly about row estimates, would be wise to pursue.
I'd wonder how selective the condition score_name =
'student_performance_index' is at filtering out many of the 9.3 million
tuples in that table, and whether an index with that as the leading column,
or just an index on that column, would be helpful. You'd need to look at
pg_stats for the table to see how many distinct values there are, and
whether student_performance_index is relatively high or low (or not
present) in the MCVs list.
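A quick way to check both of those (a sketch; the table name "scores" is a
placeholder, substitute the real table):

```sql
-- Raise work_mem for this session only; pick a value your RAM allows.
SET work_mem = '100MB';

-- Inspect the distinct-value estimate and the MCV list for score_name.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'scores'
  AND attname   = 'score_name';
```

If 'student_performance_index' shows up with a large frequency in
most_common_freqs, the planner will rightly prefer a sequential scan over
an index on that column alone.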

I am not sure if your query does what you want it to do as I admit I didn't
follow your explanation of the desired behavior. My hunch is that you want
to make use of a window function and get rid of one of the CTEs.
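For illustration only, since the original query isn't quoted here: the
usual pattern that replaces a "max in one CTE, join back in another" shape
is a single ranked subquery (all table and column names below are
placeholders):

```sql
-- Hypothetical: pick each student's best row without a second CTE.
SELECT student_id, assignment_id, score
FROM (
    SELECT student_id, assignment_id, score,
           row_number() OVER (PARTITION BY student_id
                              ORDER BY score DESC) AS rn
    FROM scores
    WHERE score_name = 'student_performance_index'
) ranked
WHERE rn = 1;
```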


Re: Slow query and wrong row estimates for CTE

2021-02-16 Thread Dane Foster
On Mon, Feb 15, 2021 at 5:32 PM Justin Pryzby  wrote:

> ...
>
> Without looking closely, an index might help: student_id,assignment_id
> That'd avoid the sort, and maybe change the shape of the whole plan.
>
I tried that prior to posting on the forum and it didn't make a difference. 🙁

I'll try your other suggestions later today or tomorrow. I will keep you
posted.


Thanks,

Dane


Re: Slow query and wrong row estimates for CTE

2021-02-16 Thread Dane Foster
On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis  wrote:

>>Sort Method: external merge  Disk: 30760kB
>>Worker 0:  Sort Method: external merge  Disk: 30760kB
>>Worker 1:  Sort Method: external merge  Disk: 30760kB
>>
>
> If you can increase work_mem, even setting it temporarily higher for the
> session or transaction, that may dramatically change the plan.
>
I will try increasing work_mem for the session later today.

> The advice given by Justin particularly about row estimates would be wise
> to pursue.
>


> I'd wonder how selective that condition of score_name =
> 'student_performance_index' is in filtering out many of the 9.3 million
> tuples in that table and if an index with that as the leading column, or
> just an index on that column would be helpful.
>
There are 1,206,355 rows where score_name = 'student_performance_index'.

> You'd need to look at pg_stats for the table and see how many distinct
> values, and if student_performance_index is relatively high or low (or not
> present) in the MCVs list.
>
I will look into that.


> I am not sure if your query does what you want it to do as I admit I
> didn't follow your explanation of the desired behavior. My hunch is that
> you want to make use of a window function and get rid of one of the CTEs.
>
If you could tell me which part(s) are unclear, I would appreciate it so
that I can write a better comment.

Thank you sooo much for all the feedback. It is *greatly* appreciated!
Sincerely,

Dane


Re: Query performance issue

2021-02-16 Thread Michael Lewis
What indexes exist on those tables? How many rows do you expect to get back
in total? Is the last_contacted_anychannel_dttm clause restrictive, or does
it include most of the prospect table? (Check pg_stats for the histogram if
you don't know.)
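To check that yourself (a sketch; "prospect" and the column name come from
the query below, the rest is illustrative):

```sql
-- Histogram bounds show how the column's values are distributed;
-- compare your cutoff date against these bucket boundaries to see
-- what fraction of the table the clause keeps.
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = 'prospect'
  AND attname   = 'last_contacted_anychannel_dttm';
```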

    and (a."shared_paddr_with_customer_ind" = 'N')
    and (a."profane_wrd_ind" = 'N')
    and (a."tmo_ofnsv_name_ind" = 'N')
    and (a."has_individual_address" = 'Y')
    and (a."has_last_name" = 'Y')
    and (a."has_first_name" = 'Y'))

Are these conditions expected to throw out very few rows, or most of the
table?

If you change both joins to EXISTS clauses, do you get the same plan when
you run explain?
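For illustration (the real tables aren't shown, so the names here are
placeholders), the join-to-EXISTS rewrite looks like:

```sql
-- Instead of joining a secondary table and filtering the result,
-- test for existence; the planner can then use a semi-join and
-- stop probing after the first match per outer row.
SELECT a.*
FROM prospect a
WHERE EXISTS (
    SELECT 1
    FROM contact c
    WHERE c.prospect_id = a.id
)
  AND a."has_first_name" = 'Y';
```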


Re: Slow query and wrong row estimates for CTE

2021-02-16 Thread Dane Foster
Short conclusion:
Switching from CTEs to temporary tables and analyzing reduced the runtime
from 15 minutes to about 1.5 minutes.
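The rewrite followed this general shape (a sketch; the temp-table name and
query body stand in for the real CTE definitions):

```sql
-- Materialize each former CTE as a temporary table, then give the
-- planner real statistics before running the main query.
CREATE TEMPORARY TABLE spi_scores AS
    SELECT student_id, assignment_id, score
    FROM scores
    WHERE score_name = 'student_performance_index';

ANALYZE spi_scores;  -- accurate row estimates for the final join
```

Unlike a CTE, an analyzed temp table gives the planner true row counts and
column statistics for the intermediate result, which is often what fixes
the bad estimates.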


Longer conclusion:

@Justin Pryzby 

   - I experimented w/ materializing the CTEs and it helped at the margins
   but did not significantly contribute to a reduction in runtime.
   - No clustering was required because once I switched to temporary tables
   the new plan no longer used the for_upsert index.

@Michael Lewis 

   - Increasing work_mem to 100MB (up from 4MB) helped at the margins
   (i.e., an improvement of a few hundred milliseconds) but did not
   represent a significant reduction in runtime.
   - It wasn't obvious to me which window function would be appropriate for
   the problem I was trying to solve, so I didn't experiment w/ that
   approach.
   - The selectivity of score_name = 'student_performance_index' was not
   enough for the planner to choose an index over a full table scan.

Finally, thank you both for helping me bring this poorly performing query
to heel. Your insights were helpful and greatly appreciated.

Sincerely,

Dane

