2014-09-01 8:54 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:

> Thank you Pavel.
>
> The cost of unpacking hstore compared to non-hstore can be calculated by:
>
> Seq scan on hstore table + hash join with seg1 table:
>
> Hstore:     416.741 + 34619.879 =~ 35 seconds
> Non-hstore: 8858.594 + 26477.652 =~ 35 seconds
>
> The subsequent hash-aggregate and sort operations work on the unpacked
> hstore rows, which have the same row count as the non-hstore table;
> however, the timing of those operations is where the big difference
> shows up.
> I don’t quite get why…

These values can be misleading -- the timing instrumentation in EXPLAIN
ANALYZE adds a relatively large overhead, and that overhead differs between
plan node types. Try comparing the complete runtime with
EXPLAIN (ANALYZE, TIMING OFF).

> Thanks,
> Suya
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Monday, September 01, 2014 4:22 PM
> *To:* Huang, Suya
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] query performance with hstore vs. non-hstore
>
> Hi
>
> In this use case hstore should not help -- there is a relatively high
> overhead related to unpacking hstore, so the classic schema is better.
>
> Hstore is not meant to replace a well-normalized schema -- it is a
> replacement for semi-normalized structures such as EAV.
>
> Hstore can profit from TOAST (compression, less per-row system overhead),
> but this advantage only starts from a certain data length. You should see
> the benefit in the table size: when the table with hstore is smaller than
> the one without, hstore is paying off. A last benefit of hstore is indexes
> over (key, value) tuples -- but you don't use them.
>
> Regards
>
> Pavel
>
> 2014-09-01 8:10 GMT+02:00 Huang, Suya <suya.hu...@au.experian.com>:
>
> Hi,
>
> I'm tweaking the table layout to get better query performance. One table
> doesn't use hstore but expands all metrics of cha_type into different
> rows. The other table keeps the metrics in an hstore column
> (cha_type->metric), so it has fewer records than the first one.
>
> I was expecting the query on the second table to perform better than on
> the first one, but that's not the case at all. I'm wondering if there's
> something wrong with my execution plan? With the hstore table, the
> optimizer has a totally wrong row-count estimate at the hash-aggregate
> stage, and the query spends 34 seconds on the hash join, 25 seconds on
> the hash aggregate and 10 seconds on the sort. With the non-hstore table,
> it spends 17 seconds on the hash join, 18 seconds on the hash aggregate
> and 2 seconds on the sort.
> Can someone help me explain why this is happening? And is there a way to
> fine-tune the query?
>
> Table structure
>
> dev=# \d+ weekly_non_hstore
>                               Table "test.weekly_non_hstore"
>   Column   |          Type          | Modifiers | Storage  | Stats target | Description
> -----------+------------------------+-----------+----------+--------------+-------------
>  date      | date                   |           | plain    |              |
>  ref_id    | character varying(256) |           | extended |              |
>  cha_typel | text                   |           | extended |              |
>  visits    | double precision       |           | plain    |              |
>  pages     | double precision       |           | plain    |              |
>  duration  | double precision       |           | plain    |              |
> Has OIDs: no
> Tablespace: "tbs_data"
>
> dev=# \d+ weekly_hstore
>                                 Table "test.weekly_hstore"
>   Column  |          Type          | Modifiers | Storage  | Stats target | Description
> ----------+------------------------+-----------+----------+--------------+-------------
>  date     | date                   |           | plain    |              |
>  ref_id   | character varying(256) |           | extended |              |
>  visits   | hstore                 |           | extended |              |
>  pages    | hstore                 |           | extended |              |
>  duration | hstore                 |           | extended |              |
> Has OIDs: no
> Tablespace: "tbs_data"
>
> dev=# select count(*) from weekly_non_hstore;
>   count
> ----------
>  71818882
> (1 row)
>
> dev=# select count(*) from weekly_hstore;
>   count
> ---------
>  1292314
> (1 row)
>
> Query
>
> dev=# explain analyze select cha_type, sum(visits) from weekly_non_hstore a
> join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits) desc;
>                                                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=3674073.37..3674431.16 rows=143115 width=27) (actual time=47520.637..47969.658 rows=3639539 loops=1)
>    Sort Key: (sum(a.visits))
>    Sort Method: quicksort  Memory: 391723kB
>    ->  HashAggregate  (cost=3660386.70..3661817.85 rows=143115 width=27) (actual time=43655.637..44989.202 rows=3639539 loops=1)
>          ->  Hash Join  (cost=12029.58..3301286.54 rows=71820032 width=27) (actual time=209.789..26477.652 rows=36962761 loops=1)
>                Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>                ->  Seq Scan on weekly_non_hstore a  (cost=0.00..1852856.32 rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)
>                ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=209.189..209.189 rows=371759 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 28951kB
>                      ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)
>  Total runtime: 48172.405 ms
> (11 rows)
>
> Time: 48173.569 ms
>
> dev=# explain analyze select cha_type, sum(visits) from (select
> (each(visits)).key as cha_type, (each(visits)).value::numeric as visits
> from weekly_hstore a join seg1 b on a.ref_id=b.ref_id) foo group by
> cha_type order by sum(visits) desc;
>                                                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=7599039.89..7599040.39 rows=200 width=64) (actual time=70424.561..70986.202 rows=3639539 loops=1)
>    Sort Key: (sum((((each(a.visits)).value)::numeric)))
>    Sort Method: quicksort  Memory: 394779kB
>    ->  HashAggregate  (cost=7599030.24..7599032.24 rows=200 width=64) (actual time=59267.120..60502.647 rows=3639539 loops=1)
>          ->  Hash Join  (cost=12029.58..2022645.24 rows=371759000 width=184) (actual time=186.140..34619.879 rows=36962761 loops=1)
>                Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>                ->  Seq Scan on weekly_hstore a  (cost=0.00..133321.14 rows=1292314 width=230) (actual time=0.107..416.741 rows=1292314 loops=1)
>                ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=185.742..185.742 rows=371759 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 28951kB
>                      ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual time=0.016..62.123 rows=371759 loops=1)
>  Total runtime: 71177.675 ms
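
[Editor's note] Pavel's EXPLAIN (ANALYZE, TIMING OFF) suggestion can be applied directly to either query from the thread; a minimal sketch, using the table and column names shown above:

```sql
-- Skip per-node timestamp collection, which skews per-node numbers
-- differently for different plan shapes; only total runtime and row
-- counts are reported, making the two plans easier to compare fairly.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT cha_type, sum(visits)
FROM weekly_non_hstore a
JOIN seg1 b ON a.ref_id = b.ref_id
GROUP BY cha_type
ORDER BY sum(visits) DESC;
```

Running both queries this way and comparing "Total runtime" removes most of the instrumentation overhead Pavel refers to.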
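
[Editor's note] One detail worth checking that the thread does not mention: in the hstore query, `(each(visits)).key` and `(each(visits)).value` reference the set-returning function twice in the select list, and PostgreSQL of this era evaluates `each()` once per such reference. A hypothetical rewrite (not from the thread; assumes PostgreSQL 9.3+ for LATERAL) unpacks each hstore only once:

```sql
-- Unpack the hstore a single time via LATERAL instead of calling
-- each() twice in the select list.
SELECT kv.key AS cha_type, sum(kv.value::numeric) AS visits
FROM weekly_hstore a
JOIN seg1 b ON a.ref_id = b.ref_id
CROSS JOIN LATERAL each(a.visits) AS kv(key, value)
GROUP BY kv.key
ORDER BY sum(kv.value::numeric) DESC;
```

Whether this helps here would have to be confirmed with EXPLAIN ANALYZE; it addresses only the unpacking cost, not the misestimated row count at the HashAggregate.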
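
[Editor's note] For completeness, the "(key, value) indexes" benefit Pavel mentions refers to GIN/GiST support in the hstore module; a sketch (index name is illustrative, and note this would not help the aggregate query in this thread):

```sql
-- GIN index over the hstore column; accelerates containment and
-- key-existence lookups (@>, ?, ?|, ?&), not full unpacking via each().
CREATE INDEX weekly_hstore_visits_idx
    ON weekly_hstore USING gin (visits);

-- Example lookup this index can serve ('some_cha_type' is a placeholder):
-- SELECT ref_id FROM weekly_hstore WHERE visits ? 'some_cha_type';
```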