Michael
Many thanks for spending your time on this. Your alternative does not help
unfortunately (see execution plan)
Still a sequential scan on the complete table. I have tried many alternatives
and somehow whenever I add a column that is not in the index (I64_01) the
optimizer decides not to use the index. If I remove that column, the index is
used. I guess it estimates that the extra indirection from index pages to the
row pages is more costly than scanning the 168M records. Pretty sure it’s not,
but I cannot explain it to the stubborn thing :)
Btw, thanks for the >= tip (I was aware of it)
Wkr,
Peter
Hash Join (cost=683.93..7270857.46 rows=458127 width=20)
Hash Cond: (mv_inner.device_id = d.short_id)
Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp
without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" <
('2020-11-07 00:00:00'::timestamp without time zone -
pg_timezone_names.utc_offset)))
-> Seq Scan on measurement_value mv_inner (cost=0.00..7166797.33
rows=1287989 width=1006)
Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time
zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
-> Hash (cost=656.61..656.61 rows=2186 width=20)
-> Hash Join (cost=77.87..656.61 rows=2186 width=20)
Hash Cond: (dc.timezone = pg_timezone_names.name)
-> Hash Join (cost=55.37..533.83 rows=615 width=18)
Hash Cond: (dc.device_id = d.id)
-> Seq Scan on device_configuration dc (cost=0.00..470.01
rows=615 width=30)
Filter: latest
-> Hash (cost=46.83..46.83 rows=683 width=20)
-> Seq Scan on device d (cost=0.00..46.83 rows=683
width=20)
-> Hash (cost=10.00..10.00 rows=1000 width=48)
-> Function Scan on pg_timezone_names (cost=0.00..10.00
rows=1000 width=48)
> On 10 Nov 2020, at 01:15, Michael Lewis <[email protected]> wrote:
>
> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <[email protected]
> <mailto:[email protected]>> wrote:
> Adding the tzn.utc_offset results in the fact that the execution plan no
> longer considers to use the index on the measurement_value table. Is there
> any way the SQL can be rewritten so that the index is used? Or any other
> solution so that the query with the timezone offset returns in a comparable
> time?
>
> I am not aware of a best practice to handle this. Your where condition on
> mv.timestamp now depends on several joins to do a filtering that used to be a
> static range that can be scanned into the index as a first node in the plan.
> I have sometimes used a sub-query on a broader condition that allows the use
> of the index, and then fully reducing the set later. Something like this-
>
> select d.short_id,mv.timestamp,mv.I64_01
> from device d, device_configuration dc, (
> select mv.*
> from measurement_value AS mv_inner
> where mv.timestamp > '2020-11-06'::timestamp - interval '1 day' and
> mv.timestamp < '2020-11-07'::timestamp + interval '1 day'
> offset 0 /* to prevent in-lining the join to the outside set */
> ) mv, pg_timezone_names tzn
> where mv.device_id=d.short_id and dc.device_id = d.id <http://d.id/> and
> dc.latest=true and dc.timezone=tzn.name <http://tzn.name/> and
> mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
> mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
>
> By the way, it seems a little odd to be exclusive on both the begin and end.
> I'd usually expect timestamp >= start_date and timestamp < end_date +
> interval '1 day' to fully capture a 24 hour period. Right now, you are
> excluding any data that happens to have a timestamp value with .000000
> seconds (midnight exactly).