Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2011-01-01 Thread Simon Riggs
On Thu, 2010-12-30 at 10:45 -0500, Tom Lane wrote:

 Comments?

Thanks for working on this. I love the reuse of tuple flags; I can't
help feeling that opens up doors, just not sure how yet...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason.  The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right.  So we could
greatly reduce the scope of the dreaded FULL JOIN is only supported
with merge-joinable join conditions error.  (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData.  The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations.  (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.)  But we only need one bit, so what about commandeering
an infomask bit in the tuple itself?  For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2.  We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore.  But
that seems like a kluge that could wait until we really need the flag
space.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I had an epiphany about this topic, or actually two of them.

 1. Whether or not you think there's a significant performance reason
 to support hash right joins, there's a functionality reason.  The
 infrastructure for right join could just as easily do full joins.
 And AFAICS, a hash full join would only require one hashable join
 clause --- the other FULL JOIN ON conditions could be anything at all.
 This is unlike the situation for merge join, where all the JOIN ON
 conditions have to be mergeable or it doesn't work right.  So we could
 greatly reduce the scope of the dreaded FULL JOIN is only supported
 with merge-joinable join conditions error.  (Well, okay, it's not
 *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat.  It might be a lot faster in some cases, too.

 2. The obvious way to implement this would involve adding an extra bool
 field to struct HashJoinTupleData.  The difficulty with that, and the
 reason I'd been resistant to the whole idea, is that it'd eat up a full
 word per hashtable entry because of alignment considerations.  (On
 64-bit machines it'd be free because of alignment considerations, but
 that's cold comfort when 32-bit machines are the ones pressed for
 address space.)  But we only need one bit, so what about commandeering
 an infomask bit in the tuple itself?  For the initial implementation
 I'd be inclined to take one of the free bits in t_infomask2.  We could
 actually get away with overlaying the flag bit with one of the tuple
 visibility bits, since it will only be used in tuples that are in the
 in-memory hash table, which don't need visibility info anymore.  But
 that seems like a kluge that could wait until we really need the flag
 space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ... But we only need one bit, so what about commandeering
 an infomask bit in the tuple itself?  For the initial implementation
 I'd be inclined to take one of the free bits in t_infomask2.  We could
 actually get away with overlaying the flag bit with one of the tuple
 visibility bits, since it will only be used in tuples that are in the
 in-memory hash table, which don't need visibility info anymore.  But
 that seems like a kluge that could wait until we really need the flag
 space.

 I think that's a reasonable approach, although I might be inclined to
 do the overlay sooner rather than later if it doesn't add too much
 complexity.

Well, there's no complexity involved, it's just which bit do we define
the macro as.  Any complexity is conceptual.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Jie Li
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I had an epiphany about this topic, or actually two of them.
 
  1. Whether or not you think there's a significant performance reason
  to support hash right joins, there's a functionality reason.  The
  infrastructure for right join could just as easily do full joins.
  And AFAICS, a hash full join would only require one hashable join
  clause --- the other FULL JOIN ON conditions could be anything at all.
  This is unlike the situation for merge join, where all the JOIN ON
  conditions have to be mergeable or it doesn't work right.  So we could
  greatly reduce the scope of the dreaded FULL JOIN is only supported
  with merge-joinable join conditions error.  (Well, okay, it's not
  *that* dreaded, but people complain about it occasionally.)

 Yeah, that would be neat.  It might be a lot faster in some cases, too.


Yeah, PostgreSQL should have this great feature.

Actually Oracle 10g already has the right hash join,
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

 And Oracle 11g has the full hash join.
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

Haven't checked whether other DBMS have this feature.

Thanks,
Li Jie