Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-21 Thread Man

Thanks for reply, sir.

On 11/21/2016 1:39 AM, Tom Lane wrote:

Man  writes:

Additional information.
In 9.6 the second table (lesser tuple) was choosen (the same testdata).
There are something (cost estimation?) different  in previous versions.

I'd bet on different statistics in the two installations (either you
forgot to ANALYZE, or the random sample came up quite a bit different).
And I'm a little suspicious that these tests weren't all done with the
same work_mem setting.


I dumped the two tables in pg9.4 and restored to pg9.6, sir.
I also set default_statistics_target to 1000 and ANALYZE d two tables in 
both installations.

And so that were result.

Anyway i know that order can not change by tuning parameters because it 
depend on storing data, thanks.



regards, tom lane


Thanks and best regards,


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-20 Thread Tom Lane
Man  writes:
> Additional information.
> In 9.6 the second table (lesser tuple) was choosen (the same testdata).
> There are something (cost estimation?) different  in previous versions.

I'd bet on different statistics in the two installations (either you
forgot to ANALYZE, or the random sample came up quite a bit different).
And I'm a little suspicious that these tests weren't all done with the
same work_mem setting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-20 Thread Man

Thanks for response, sir.

On 11/20/2016 1:18 AM, Tom Lane wrote:

Man Trieu  writes:

As in the example below, i think the plan which hash table is created on
testtbl2 (the fewer tuples) should be choosen.

The planner usually prefers to hash on the table that has a flatter
MCV histogram, since a hash table with many key collisions will be
inefficient.  You might find it illuminating to read the comments around
estimate_hash_bucketsize().


Thanks, I will read it.

Additional information.
In 9.6 the second table (lesser tuple) was choosen (the same testdata).
There are something (cost estimation?) different  in previous versions.

--- In 9.6.1 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2 
using(c1,c2,c3);

 QUERY PLAN

-
 Hash Join  (cost=6935.57..60389.58 rows=1 width=60) (actual 
time=80.214..1165.762 rows=142857 loops=1)
   Hash Cond: ((testtbl1.c1 = testtbl2.c1) AND (testtbl1.c2 = 
testtbl2.c2) AND (testtbl1.c3 = testtbl2.c3))
   ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=100 
width=56) (actual time=0.038..226.324 rows=100 loops=1)
   ->  Hash  (cost=3039.57..3039.57 rows=142857 width=56) (actual 
time=79.632..79.632 rows=142857 loops=1)

 Buckets: 65536  Batches: 4  Memory Usage: 3658kB
 ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 
width=56) (actual time=0.028..20.646 rows=142857 loops=1)

 Planning time: 0.252 ms
 Execution time: 1174.588 ms
(8 rows)
--

--- In 9.4.10 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2 
using(c1,c2,c3);

   QUERY PLAN

-
 Hash Join  (cost=48542.00..67353.86 rows=1 width=60) (actual 
time=880.580..1277.611 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = 
testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) 
(actual time=0.016..24.421 rows=142857 loops=1)
   ->  Hash  (cost=21276.00..21276.00 rows=100 width=56) (actual 
time=878.296..878.296 rows=100 loops=1)

 Buckets: 8192  Batches: 32  Memory Usage: 2839kB
 ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=100 
width=56) (actual time=0.025..258.193 rows=100 loops=1)

 Planning time: 2.683 ms
 Execution time: 1285.868 ms
(8 rows)
--


In general, given a hashtable that fits in memory and light bucket
loading, a hash join is more or less O(M) + O(N); it doesn't matter
so much whether the larger table is on the inside.  It does matter if
the table gets big enough to force batching of the join, but that's
not happening in your example (at least not the first one; it's unclear
to me why it did happen in the second one).  The key thing that will
drive the choice, then, is avoiding a skewed bucket distribution that
causes lots of comparisons for common values.

regards, tom lane


Thanks and best regards,


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-19 Thread Tom Lane
Man Trieu  writes:
> As in the example below, i think the plan which hash table is created on
> testtbl2 (the fewer tuples) should be choosen.

The planner usually prefers to hash on the table that has a flatter
MCV histogram, since a hash table with many key collisions will be
inefficient.  You might find it illuminating to read the comments around
estimate_hash_bucketsize().

In general, given a hashtable that fits in memory and light bucket
loading, a hash join is more or less O(M) + O(N); it doesn't matter
so much whether the larger table is on the inside.  It does matter if
the table gets big enough to force batching of the join, but that's
not happening in your example (at least not the first one; it's unclear
to me why it did happen in the second one).  The key thing that will
drive the choice, then, is avoiding a skewed bucket distribution that
causes lots of comparisons for common values.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers