RE: select query does not pick up the right index

Abadie Lana Fri, 04 Jan 2019 01:00:02 -0800

-----Original Message-----
From: [email protected] 
<[email protected]> On Behalf Of 
Abadie Lana
Sent: 04 January 2019 09:18
To: Justin Pryzby <[email protected]>
Cc: David Rowley <[email protected]>; 
[email protected]
Subject: [Possible Spoof] RE: select query does not pick up the right index


Warning: This message was sent by 
[email protected] supposedly on behalf 
of Abadie Lana <[email protected]>. Please contact

-----Original Message-----
From: Justin Pryzby <[email protected]>
Sent: 04 January 2019 00:48
To: Abadie Lana <[email protected]>
Cc: David Rowley <[email protected]>; 
[email protected]
Subject: Re: select query does not pick up the right index

On Thu, Jan 03, 2019 at 12:57:27PM +0000, Abadie Lana wrote:
> Main parameters : effective_cache_size : 4GB, shared_buffers 4GB, 
> work_mem 4MB

I doubt it will help much, but you should consider increasing work_mem, unless 
you have many expensive queries running at once.

Could you also send the rest of the pg_statistic for that table ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND 
tablename='...' ORDER BY 1 DESC; 

Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm 
surprised to see sample_{ctrl,util,buil} quoted twice

css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) 
frac_MCV, tablename, attname, null_frac, n_distinct, 
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist 
FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 
1 DESC;
 frac_mcv |     tablename     |  attname   | null_frac | n_distinct | n_mcv | 
n_hist
----------+-------------------+------------+-----------+------------+-------+--------
        1 | sample_buil_year  | channel_id |         0 |         16 |    16 |
  0.98249 | sample_ctrl       | channel_id |         0 |         26 |    17 |   
   9
 0.982333 | sample_ctrl_month | channel_id |         0 |         34 |    17 |   
  17
 0.981533 | sample_ctrl       | channel_id |         0 |         28 |    18 |   
  10
   0.9371 | sample_ctrl_year  | channel_id |         0 |         38 |    16 |   
  22
 0.928767 | sample_buil_month | channel_id |         0 |        940 |    54 |   
 101
  0.92535 | sample            | channel_id |         0 |       2144 |   167 |   
1001
 0.907501 | sample_buil       | channel_id |         0 |        565 |    43 |   
 101
   0.8876 | sample_util_year  | channel_id |         0 |        501 |    45 |   
 101
    0.815 | sample_util       | channel_id |         0 |        557 |    82 |   
 101
 0.807667 | sample_buil       | channel_id |         0 |        164 |    31 |   
 101
 0.806267 | sample_util       | channel_id |         0 |        732 |   100 |   
 101
 0.803766 | sample_util_month | channel_id |         0 |        731 |   100 |   
 101
(13 rows)

Ah...sample_ctrl_year and sample_buil_year have n_distinct -1? Unlike 
sample_util_year. Could that explain the wrong choice? 

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='smpl_time' 
AND tablename like 'sample%' ORDER BY 1 DESC;
  frac_mcv  |     tablename     |  attname  | null_frac | n_distinct  | n_mcv | 
n_hist
------------+-------------------+-----------+-----------+-------------+-------+--------
            | sample_ctrl_month | smpl_time |         0 |          -1 |       | 
   101
            | sample_ctrl_year  | smpl_time |         0 |          -1 |       | 
   101
            | sample_ctrl       | smpl_time |         0 |          -1 |       | 
   101
            | sample_ctrl       | smpl_time |         0 |          -1 |       | 
   101
            | sample_buil_year  | smpl_time |         0 |          -1 |       | 
   101
  0.0154667 | sample_buil_month | smpl_time |         0 | 1.03857e+06 |   100 | 
   101
  0.0154523 | sample_buil       | smpl_time |         0 |      854250 |   100 | 
   101
     0.0115 | sample_util       | smpl_time |         0 |      405269 |   100 | 
   101
  0.0112333 | sample_util       | smpl_time |         0 |      537030 |   100 | 
   101
  0.0106667 | sample_util_month | smpl_time |         0 |      539001 |   100 | 
   101
 0.00946667 | sample_buil       | smpl_time |         0 |   -0.328554 |   100 | 
   101
 0.00852342 | sample            | smpl_time |         0 |  1.5125e+07 |  1000 | 
  1001
 0.00780001 | sample_util_year  | smpl_time |         0 | 1.73199e+06 |   100 | 
   101
(13 rows)

Based on your feedback...i rerun analyse directly on the two table 
sample_ctrl_year and sample_buil_year
The new values are
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE 
attname='channel_id' AND tablename like 'sample%' ORDER BY 1 DESC;
 frac_mcv |     tablename     |  attname   | null_frac | n_distinct | n_mcv | 
n_hist
----------+-------------------+------------+-----------+------------+-------+--------
  0.99987 | sample_buil_year  | channel_id |         0 |         76 |    16 |   
  60
 0.999632 | sample_ctrl_year  | channel_id |         0 |        132 |    31 |   
 101
 0.999628 | sample_ctrl_month | channel_id |         0 |         84 |    23 |   
  61
 0.999627 | sample_ctrl       | channel_id |         0 |        132 |    31 |   
 101
 0.999599 | sample_ctrl       | channel_id |         0 |         42 |    22 |   
  20
 0.998074 | sample_buil       | channel_id |         0 |        493 |   122 |   
 371
 0.997693 | sample_util       | channel_id |         0 |       1379 |   509 |   
 870
 0.991841 | sample_buil       | channel_id |         0 |       9867 |   107 |   
9740
 0.991567 | sample_util_month | channel_id |         0 |       5716 |   504 |   
5209
 0.990369 | sample_util_year  | channel_id |         0 |       4946 |   255 |   
4689
 0.990062 | sample_util       | channel_id |         0 |       5804 |   641 |   
5160
 0.972386 | sample_buil_month | channel_id |         0 |      19946 |   148 |  
10001
 0.967391 | sample            | channel_id |         0 |       7597 |   409 |   
7178
(13 rows)


Now when running the query again, only for sample_buil_year table the wrong 
index is picked up...
explain (analyze, buffers) select 
'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
 from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE 
name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
                                                                                
      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------
 Limit  (cost=13.40..30.01 rows=5 width=112) (actual time=13554.536..13554.570 
rows=3 loops=1)
   Buffers: shared hit=26626389 read=17
   InitPlan 1 (returns $0)
     ->  Index Scan using unique_chname on channel  (cost=0.41..8.43 rows=1 
width=8) (actual time=26.858..26.860 rows=1 loop
s=1)
           Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text)
           Buffers: shared hit=2 read=2
   ->  Result  (cost=4.96..5131208.65 rows=1544048 width=112) (actual 
time=13554.534..13554.567 rows=3 loops=1)
         Buffers: shared hit=26626389 read=17
         ->  Merge Append  (cost=4.96..5115768.17 rows=1544048 width=80) 
(actual time=13554.531..13554.562 rows=3 loops=1)
               Sort Key: c.smpl_time DESC
               Buffers: shared hit=26626389 read=17
               ->  Index Scan Backward using smpl_time_qa_idx on sample c  
(cost=0.12..8.14 rows=1 width=326) (actual time=0
.005..0.005 rows=0 loops=1)
                     Filter: (channel_id = $0)
                     Buffers: shared hit=1
               ->  Index Scan Backward using sample_time_b_idx on sample_buil 
c_1  (cost=0.42..7775.26 rows=2096 width=320)
(actual time=38.931..38.932 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=3 read=4
               ->  Index Scan Backward using sample_time_c_idx on sample_ctrl 
c_2  (cost=0.42..77785.57 rows=22441 width=320
) (actual time=0.010..0.010 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=3
               ->  Index Scan Backward using sample_time_u_idx on sample_util 
c_3  (cost=0.43..14922.72 rows=3830 width=320)
 (actual time=8.939..8.939 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=1 read=2
               ->  Index Scan Backward using sample_time_bm_idx on 
sample_buil_month c_4  (cost=0.56..2967.10 rows=740 width
=74) (actual time=260.282..260.311 rows=3 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=3 read=5
               ->  Index Scan Backward using smpl_time_bx2_idx on 
sample_buil_year c_5  (cost=0.56..2023054.76 rows=665761 w
idth=75) (actual time=13216.589..13216.589 rows=0 loops=1)
                     Filter: (channel_id = $0)
                     Rows Removed by Filter: 50597834
                     Buffers: shared hit=26626368
               ->  Index Scan Backward using sample_time_cm_idx on 
sample_ctrl_month c_6  (cost=0.56..759241.36 rows=217585
width=75) (actual time=0.019..0.019 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=4
               ->  Index Scan Backward using sample_time_cy_idx on 
sample_ctrl_year c_7  (cost=0.57..2097812.02 rows=602872
width=76) (actual time=0.007..0.007 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=4
               ->  Index Scan Backward using sample_time_um_idx on 
sample_util_month c_8  (cost=0.57..48401.65 rows=12418 wi
dth=75) (actual time=18.999..19.000 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=1 read=3
               ->  Index Scan Backward using sample_time_uy_idx on 
sample_util_year c_9  (cost=0.57..54293.22 rows=16304 wid
th=74) (actual time=10.739..10.739 rows=0 loops=1)
                     Index Cond: (channel_id = $0)
                     Buffers: shared hit=1 read=3
 Planning time: 0.741 ms
 Execution time: 13554.666 ms
(44 rows)
Looking more closely to the sample_buil_year table
select count(distinct channel_id),count(*) from sample_buil_year;
 count |  count
-------+----------
   100 | 50597834
(1 row)

Now, the channel name I gave has no entries in sample_buil_year...(and when I 
run the query directly against sample_buil_year the right index is picked 
up).... So maybe something related with the partitioning?

select 
'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
 from sample_buil_year c WHERE c.channel_id = (SELECT channel_id FROM channel 
WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
 ?column? | smpl_time | nanosecs | float_val | num_val | str_val | datatype | 
array_val
----------+-----------+----------+-----------+---------+---------+----------+-----------
(0 rows)

css_archive_3_0_0=# explain analyze select 
'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
 from sample_buil_year c WHERE c.channel_id = (SELECT channel_id FROM channel 
WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
                                                                            
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
---------------------------------------
 Limit  (cost=9.00..21.31 rows=5 width=107) (actual time=0.055..0.055 rows=0 
loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using unique_chname on channel  (cost=0.41..8.43 rows=1 
width=8) (actual time=0.038..0.040 rows=1 loops=
1)
           Index Cond: ((name)::text = 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW'::text)
   ->  Index Scan Backward using sample_time_by_idx on sample_buil_year c  
(cost=0.56..1639944.37 rows=665761 width=107) (ac
tual time=0.054..0.054 rows=0 loops=1)
         Index Cond: (channel_id = $0)
 Planning time: 0.178 ms
 Execution time: 0.088 ms
(8 rows)

RE: select query does not pick up the right index

Reply via email to