2014-09-01 8:54 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:

> Thank you Pavel.
>
> The cost of unpacking hstore compared to non-hstore can be calculated by:
>
> Seq scan on hstore table + hash join with seg1 table:
>
> Hstore:     416.741 + 34619.879 =~ 35 seconds
> Non-hstore: 8858.594 + 26477.652 =~ 35 seconds
>
> The subsequent hash-aggregate and sort operations work on the unpacked
> hstore rows, which have the same row count as the non-hstore table;
> however, the timing of those operations is where the big difference
> shows up.
> I don’t quite get why…

These values can be misleading -- the timing instrumentation in EXPLAIN
ANALYZE adds a relatively large overhead, and that overhead differs between
plan node types. Try comparing the complete runtime with
EXPLAIN (ANALYZE, TIMING OFF).

> Thanks,
> Suya
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Monday, September 01, 2014 4:22 PM
> *To:* Huang, Suya
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] query performance with hstore vs. non-hstore
>
> Hi
>
> In this use case hstore should not help -- there is a relatively high
> overhead related to unpacking hstore, so the classic schema is better.
>
> Hstore is not meant to replace a well-normalized schema -- it is a
> replacement for semi-normalized structures such as EAV.
>
> Hstore can profit from TOAST (compression, less per-row system overhead),
> but this advantage only starts from a certain data length. You should see
> the benefit in the table size: when the table with hstore is smaller than
> the one without, hstore is paying off. A last benefit of hstore is indexes
> over (key, value) tuples -- but you don't use them.
>
> Regards
>
> Pavel
>
> 2014-09-01 8:10 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:
>
> Hi,
>
> I'm tweaking the table layout to get better query performance. One table
> doesn't use hstore but expands all metrics of cha_type into different
> rows. The other table keeps the metrics in an hstore column
> (cha_type->metric), so it has fewer records than the first one.
>
> I was expecting the query on the second table to perform better than on
> the first one, but that's not the case at all. I'm wondering if there's
> something wrong with my execution plan? With the hstore table, the
> optimizer has a totally wrong row-count estimate at the hash-aggregate
> stage, and the query spends 34 seconds on the hash join, 25 seconds on
> the hash aggregate and 10 seconds on the sort. With the non-hstore table,
> it spends 17 seconds on the hash join, 18 seconds on the hash aggregate
> and 2 seconds on the sort.
> Can someone help me explain why this is happening? And is there a way to
> fine-tune the query?
>
> Table structure
>
> dev=# \d+ weekly_non_hstore
>                               Table "test.weekly_non_hstore"
>   Column   |          Type          | Modifiers | Storage  | Stats target | Description
> -----------+------------------------+-----------+----------+--------------+-------------
>  date      | date                   |           | plain    |              |
>  ref_id    | character varying(256) |           | extended |              |
>  cha_typel | text                   |           | extended |              |
>  visits    | double precision       |           | plain    |              |
>  pages     | double precision       |           | plain    |              |
>  duration  | double precision       |           | plain    |              |
> Has OIDs: no
> Tablespace: "tbs_data"
>
> dev=# \d+ weekly_hstore
>                                 Table "test.weekly_hstore"
>   Column  |          Type          | Modifiers | Storage  | Stats target | Description
> ----------+------------------------+-----------+----------+--------------+-------------
>  date     | date                   |           | plain    |              |
>  ref_id   | character varying(256) |           | extended |              |
>  visits   | hstore                 |           | extended |              |
>  pages    | hstore                 |           | extended |              |
>  duration | hstore                 |           | extended |              |
> Has OIDs: no
> Tablespace: "tbs_data"
>
> dev=# select count(*) from weekly_non_hstore;
>   count
> ----------
>  71818882
> (1 row)
>
> dev=# select count(*) from weekly_hstore;
>   count
> ---------
>  1292314
> (1 row)
>
> Query
>
> dev=# explain analyze select cha_type, sum(visits) from weekly_non_hstore a
> join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits) desc;
>                                                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=3674073.37..3674431.16 rows=143115 width=27) (actual time=47520.637..47969.658 rows=3639539 loops=1)
>    Sort Key: (sum(a.visits))
>    Sort Method: quicksort  Memory: 391723kB
>    ->  HashAggregate  (cost=3660386.70..3661817.85 rows=143115 width=27) (actual time=43655.637..44989.202 rows=3639539 loops=1)
>          ->  Hash Join  (cost=12029.58..3301286.54 rows=71820032 width=27) (actual time=209.789..26477.652 rows=36962761 loops=1)
>                Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>                ->  Seq Scan on weekly_non_hstore a  (cost=0.00..1852856.32 rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)
>                ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=209.189..209.189 rows=371759 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 28951kB
>                      ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)
>  Total runtime: 48172.405 ms
> (11 rows)
>
> Time: 48173.569 ms
>
> dev=# explain analyze select cha_type, sum(visits) from (select
> (each(visits)).key as cha_type, (each(visits)).value::numeric as visits
> from weekly_hstore a join seg1 b on a.ref_id=b.ref_id) foo group by
> cha_type order by sum(visits) desc;
>                                                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=7599039.89..7599040.39 rows=200 width=64) (actual time=70424.561..70986.202 rows=3639539 loops=1)
>    Sort Key: (sum((((each(a.visits)).value)::numeric)))
>    Sort Method: quicksort  Memory: 394779kB
>    ->  HashAggregate  (cost=7599030.24..7599032.24 rows=200 width=64) (actual time=59267.120..60502.647 rows=3639539 loops=1)
>          ->  Hash Join  (cost=12029.58..2022645.24 rows=371759000 width=184) (actual time=186.140..34619.879 rows=36962761 loops=1)
>                Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>                ->  Seq Scan on weekly_hstore a  (cost=0.00..133321.14 rows=1292314 width=230) (actual time=0.107..416.741 rows=1292314 loops=1)
>                ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=185.742..185.742 rows=371759 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 28951kB
>                      ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.016..62.123 rows=371759 loops=1)
>  Total runtime: 71177.675 ms
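
[Editor's note] Pavel's EXPLAIN (ANALYZE, TIMING OFF) suggestion can be applied directly to either query from the thread; a minimal sketch, using the table and column names shown above:

```sql
-- Skip per-node timestamp collection, which skews per-node numbers
-- differently for different plan shapes; only total runtime and row
-- counts are reported, making the two plans easier to compare fairly.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT cha_type, sum(visits)
FROM weekly_non_hstore a
JOIN seg1 b ON a.ref_id = b.ref_id
GROUP BY cha_type
ORDER BY sum(visits) DESC;
```

Running both queries this way and comparing "Total runtime" removes most of the instrumentation overhead Pavel refers to.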
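
[Editor's note] One detail worth checking that the thread does not mention: in the hstore query, `(each(visits)).key` and `(each(visits)).value` reference the set-returning function twice in the select list, and PostgreSQL of this era evaluates `each()` once per such reference. A hypothetical rewrite (not from the thread; assumes PostgreSQL 9.3+ for LATERAL) unpacks each hstore only once:

```sql
-- Unpack the hstore a single time via LATERAL instead of calling
-- each() twice in the select list.
SELECT kv.key AS cha_type, sum(kv.value::numeric) AS visits
FROM weekly_hstore a
JOIN seg1 b ON a.ref_id = b.ref_id
CROSS JOIN LATERAL each(a.visits) AS kv(key, value)
GROUP BY kv.key
ORDER BY sum(kv.value::numeric) DESC;
```

Whether this helps here would have to be confirmed with EXPLAIN ANALYZE; it addresses only the unpacking cost, not the misestimated row count at the HashAggregate.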
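
[Editor's note] For completeness, the "(key, value) indexes" benefit Pavel mentions refers to GIN/GiST support in the hstore module; a sketch (index name is illustrative, and note this would not help the aggregate query in this thread):

```sql
-- GIN index over the hstore column; accelerates containment and
-- key-existence lookups (@>, ?, ?|, ?&), not full unpacking via each().
CREATE INDEX weekly_hstore_visits_idx
    ON weekly_hstore USING gin (visits);

-- Example lookup this index can serve ('some_cha_type' is a placeholder):
-- SELECT ref_id FROM weekly_hstore WHERE visits ? 'some_cha_type';
```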