Pavel
Tx for the tip. But given that if the I64_01 column is removed from the select
list, the index is used I guess the cast is not likely to be the cause.
Like so
explain
select d.short_id,mv.timestamp --,mv.I64_01
from device d, device_configuration dc, measurement_value mv,
pg_timezone_names tzn
where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and
dc.timezone=tzn.name and
mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
==>
Nested Loop (cost=1.13..6217004.08 rows=60425437 width=12)
-> Nested Loop (cost=0.56..21334.84 rows=2186 width=20)
Join Filter: (dc.timezone = pg_timezone_names.name)
-> Nested Loop (cost=0.56..7497.34 rows=615 width=18)
-> Index Scan using device_short_id_key on device d
(cost=0.28..2423.90 rows=683 width=20)
-> Index Scan using device_configuration_device_latest_idx on
device_configuration dc (cost=0.28..7.42 rows=1 width=30)
Index Cond: ((device_id = d.id) AND (latest = true))
Filter: latest
-> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000
width=48)
-> Index Only Scan using measurement_values_pkey on measurement_value mv
(cost=0.57..2399.33 rows=43492 width=12)
Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06
00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND
("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone -
pg_timezone_names.utc_offset)))
Peter
> On 10 Nov 2020, at 08:25, Pavel Stehule <[email protected]> wrote:
>
>
>
> út 10. 11. 2020 v 8:18 odesílatel Peter Coppens <[email protected]
> <mailto:[email protected]>> napsal:
> Michael
>
> Many thanks for spending your time on this. Your alternative does not help
> unfortunately (see execution plan)
>
> Still a sequential scan on the complete table. I have tried many alternatives
> and somehow whenever I add a column that is not in the index (I64_01) the
> optimizer decides not to use the index. If I remove that column, the index is
> used. I guess it estimates that the extra indirection from index pages to the
> row pages is more costly than scanning the 168M records. Pretty sure it’s
> not, but I cannot explain it to the stubborn thing :)
>
> Btw, thanks for the >= tip (I was aware of it)
>
> Wkr,
>
> Peter
>
>
> Hash Join (cost=683.93..7270857.46 rows=458127 width=20)
> Hash Cond: (mv_inner.device_id = d.short_id)
> Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp
> without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp"
> < ('2020-11-07 00:00:00'::timestamp without time zone -
> pg_timezone_names.utc_offset)))
> -> Seq Scan on measurement_value mv_inner (cost=0.00..7166797.33
> rows=1287989 width=1006)
> Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time
> zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
>
> when you see cast in filter, then you should check type equality in
> constraints. With some exception Postgres uses indexes only when filtered
> value has same type like column type.
>
> Maybe there is inconsistency between timestamp (with time zone), and
> timestamp without time zone
>
> Regards
>
> Pavel
>
>
> -> Hash (cost=656.61..656.61 rows=2186 width=20)
> -> Hash Join (cost=77.87..656.61 rows=2186 width=20)
> Hash Cond: (dc.timezone = pg_timezone_names.name
> <http://pg_timezone_names.name/>)
> -> Hash Join (cost=55.37..533.83 rows=615 width=18)
> Hash Cond: (dc.device_id = d.id <http://d.id/>)
> -> Seq Scan on device_configuration dc
> (cost=0.00..470.01 rows=615 width=30)
> Filter: latest
> -> Hash (cost=46.83..46.83 rows=683 width=20)
> -> Seq Scan on device d (cost=0.00..46.83
> rows=683 width=20)
> -> Hash (cost=10.00..10.00 rows=1000 width=48)
> -> Function Scan on pg_timezone_names (cost=0.00..10.00
> rows=1000 width=48)
>
>
>
>> On 10 Nov 2020, at 01:15, Michael Lewis <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <[email protected]
>> <mailto:[email protected]>> wrote:
>> Adding the tzn.utc_offset results in the fact that the execution plan no
>> longer considers to use the index on the measurement_value table. Is there
>> any way the SQL can be rewritten so that the index is used? Or any other
>> solution so that the query with the timezone offset returns in a comparable
>> time?
>>
>> I am not aware of a best practice to handle this. Your where condition on
>> mv.timestamp now depends on several joins to do a filtering that used to be
>> a static range that can be scanned into the index as a first node in the
>> plan. I have sometimes used a sub-query on a broader condition that allows
>> the use of the index, and then fully reducing the set later. Something like
>> this-
>>
>> select d.short_id,mv.timestamp,mv.I64_01
>> from device d, device_configuration dc, (
>> select mv.*
>> from measurement_value AS mv_inner
>> where mv.timestamp > '2020-11-06'::timestamp - interval '1 day' and
>> mv.timestamp < '2020-11-07'::timestamp + interval '1 day'
>> offset 0 /* to prevent in-lining the join to the outside set */
>> ) mv, pg_timezone_names tzn
>> where mv.device_id=d.short_id and dc.device_id = d.id <http://d.id/> and
>> dc.latest=true and dc.timezone=tzn.name <http://tzn.name/> and
>> mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
>> mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
>>
>> By the way, it seems a little odd to be exclusive on both the begin and end.
>> I'd usually expect timestamp >= start_date and timestamp < end_date +
>> interval '1 day' to fully capture a 24 hour period. Right now, you are
>> excluding any data that happens to have a timestamp value with .000000
>> seconds (midnight exactly).
>