Hi

I have two main problems and that is slow updates and joins, but when I build 
up the table met_vaer_wisline.nora_bc25_observation with more than 4 billion we 
are able to insert about 85.000 rows pr sekund so thats ok.


The problems start when I need to update or joins with other tables using this 
table.


In this example I have two tables one with 4 billion rows and another with 
50000 rows and then I try to do a standard simple join between this two tables 
and this takes 397391  ms. with this SQL (the query plan is added is further 
down)

SELECT o.*

FROM

met_vaer_wisline.nora_bc25_observation o,

met_vaer_wisline.new_data n

WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch

but if I use this SQL  it takes 25727 ms (the query plan is added is further 
down).

SELECT

o.*

FROM

(

SELECT o.*

FROM

met_vaer_wisline.nora_bc25_observation o

WHERE

EXISTS (SELECT 1 FROM  (SELECT distinct epoch FROM  met_vaer_wisline.new_data) 
AS n WHERE n.epoch = o.epoch )

AND

EXISTS (SELECT 1 FROM  (SELECT distinct id_point FROM  
met_vaer_wisline.new_data) AS n WHERE n.id_point = o.point_uid_ref )

) AS o,

met_vaer_wisline.new_data n

WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch


The columns are indexed and I did run vacuum analyze on both tables before I 
tested. work_mem is 200MB but I also tested with much more work_mem but that 
does not change the execution time.

The CPU goes to 100% when the query is running and there is no IOWait while the 
SQL is running.


Why is the second SQL 15 times faster ?

Is this normal or have I done something wrong here ?


I have tested clustering around a index but that did not help.


Is the only way to fix slow updates and joins to use partitioning ?

https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html



Here are the SQL and more info


EXPLAIN  analyze

SELECT o.*

FROM

met_vaer_wisline.nora_bc25_observation o,

met_vaer_wisline.new_data n

WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch



-[ RECORD 1 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN | Merge Join  (cost=0.87..34374722.51 rows=52579 width=16) (actual 
time=0.127..397379.844 rows=50000 loops=1)

-[ RECORD 2 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   Merge Cond: (n.id_point = o.point_uid_ref)

-[ RECORD 3 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   Join Filter: (o.epoch = n.epoch)

-[ RECORD 4 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   Rows Removed by Join Filter: 2179150000

-[ RECORD 5 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   ->  Index Scan using idx_met_vaer_wisline_new_data_id_point on 
new_data n  (cost=0.29..23802.89 rows=50000 width=8) (actual time=0.024..16.736 
rows=50000 loops=1)

-[ RECORD 6 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   ->  Index Scan using 
idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref on 
nora_bc25_observation o  (cost=0.58..2927642364.25 rows=4263866624 width=16) 
(actual time=0.016..210486.136 rows=2179200001 loops=1)

-[ RECORD 7 
]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN | Total runtime: 397383.663 ms


Time: 397391.388 ms



EXPLAIN  analyze

SELECT

o.*

FROM

(

SELECT o.*

FROM

met_vaer_wisline.nora_bc25_observation o

WHERE

EXISTS (SELECT 1 FROM  (SELECT distinct epoch FROM  met_vaer_wisline.new_data) 
AS n WHERE n.epoch = o.epoch )

AND

EXISTS (SELECT 1 FROM  (SELECT distinct id_point FROM  
met_vaer_wisline.new_data) AS n WHERE n.id_point = o.point_uid_ref )

) AS o,

met_vaer_wisline.new_data n

WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch


-[ RECORD 1 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN | Hash Semi Join  (cost=1019.70..1039762.81 rows=54862 width=16) 
(actual time=359.284..25717.838 rows=50096 loops=1)

-[ RECORD 2 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   Hash Cond: (o.point_uid_ref = new_data_1.id_point)

-[ RECORD 3 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   ->  Nested Loop  (cost=0.87..972602.28 rows=24964326 width=16) 
(actual time=0.287..24412.088 rows=24262088 loops=1)

-[ RECORD 4 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |         ->  Unique  (cost=0.29..1014.29 rows=248 width=4) (actual 
time=0.117..6.849 rows=248 loops=1)

-[ RECORD 5 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |               ->  Index Only Scan using 
idx_met_vaer_wisline_new_data_epoch on new_data  (cost=0.29..889.29 rows=50000 
width=4) (actual time=0.115..4.521 rows=50000 loops=1)

-[ RECORD 6 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |                     Heap Fetches: 0

-[ RECORD 7 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |         ->  Index Scan using 
idx_met_vaer_wisline_nora_bc25_observation_epoch on nora_bc25_observation o  
(cost=0.58..2911.05 rows=100663 width=16) (actual time=0.014..89.512 rows=97831 
loops=248)

-[ RECORD 8 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |               Index Cond: (epoch = new_data.epoch)

-[ RECORD 9 
]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |   ->  Hash  (cost=1016.31..1016.31 rows=202 width=4) (actual 
time=16.636..16.636 rows=202 loops=1)

-[ RECORD 10 
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |         Buckets: 1024  Batches: 1  Memory Usage: 8kB

-[ RECORD 11 
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |         ->  Unique  (cost=0.29..1014.29 rows=202 width=4) (actual 
time=0.046..16.544 rows=202 loops=1)

-[ RECORD 12 
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |               ->  Index Only Scan using 
idx_met_vaer_wisline_new_data_id_point on new_data new_data_1  
(cost=0.29..889.29 rows=50000 width=4) (actual time=0.046..11.315 rows=50000 
loops=1)

-[ RECORD 13 
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN |                     Heap Fetches: 0

-[ RECORD 14 
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN | Total runtime: 25719.120 ms


Time: 25727.097 ms



select version();

                                                   version

--------------------------------------------------------------------------------------------------------------

 PostgreSQL 9.3.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 
20140911 (Red Hat 4.8.3-9), 64-bit

(1 row)


 \d met_vaer_wisline.nora_bc25_observation;

Table "met_vaer_wisline.nora_bc25_observation"

       Column       |  Type   | Modifiers

--------------------+---------+-----------

 point_uid_ref      | integer | not null

 epoch              | integer | not null

 windspeed_10m      | real    |

 air_temperature_2m | real    |

Indexes:

    "idx_met_vaer_wisline_nora_bc25_observation_epoch" btree (epoch)

    "idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref" btree 
(point_uid_ref)



\d met_vaer_wisline.new_data ;

     Unlogged table "met_vaer_wisline.new_data"

       Column       |       Type        | Modifiers

--------------------+-------------------+-----------

 windspeed_10m      | real              |

 air_temperature_2m | real              |

 lon                | character varying | not null

 lat                | character varying | not null

 epoch              | integer           |

 epoch_as_numeric   | numeric           | not null

 rest               | character varying |

 id_point           | integer           |

Indexes:

    "idx_met_vaer_wisline_new_data_epoch" btree (epoch)

    "idx_met_vaer_wisline_new_data_id_point" btree (id_point)


vacuum analyze met_vaer_wisline.nora_bc25_observation;


vacuum analyze met_vaer_wisline.new_data;


SELECT count(*) from met_vaer_wisline.new_data;

 count

-------

 50000

(1 row)


SELECT count(*) from met_vaer_wisline.nora_bc25_observation ;

   count

------------

  4263866304


Thanks .


Lars


Reply via email to