On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23j...@gmail.com> wrote:

> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table
> using (id);
>                                  QUERY PLAN
>
>
> ----------------------------------------------------------------------------
>  Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
>    Hash Cond: (small_table.id = big_table.id)
>    ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
>    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
>          ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table?
> It can avoid multiple batches thus save significant I/O cost, isn't it?
>
> We can perform this query in two phases:
> 1) inner join, using the small table to build hash table.
> 2) check whether each tuple in the hash table has matches before, which can
> be done with another flag bit
>
> The only compromise is the output order, due to the two separate phases.
> Not sure whether the SQL standard requires it.
>
>
SQL standard does not require the result to be in any particular order
unless an ORDER BY is used.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Reply via email to