Re: Two Tables Join (One Big table and other 1gb size table)

2015-10-13 Thread Gopal Vijayaraghavan

> I tried doing stream table, but ran for long time like 3 hrs : Looks
>like only 1 reducer is working on it
...
> on (trim(p.pid)=trim(c.p_id) and p.source='XYZ');

If that's devolving into a cross-product, the planner may be failing to
push the trim() down to the TableScan.
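
One way to sidestep that, as a rough sketch only: compute the trimmed keys
inside subqueries so the join condition is a plain column equality, and
apply the source filter before the join. The aliases pid_trim and p_id_trim
are made up for illustration; the table and column names are taken from your
query, and I have not run this against your data.

```sql
-- Sketch: normalize the join keys before the join so the ON clause
-- is a plain equi-join on columns rather than on function calls.
-- pid_trim and p_id_trim are hypothetical aliases, not existing columns.
-- Note: c.* will also carry the extra p_id_trim column into the new table.
create table ev_claim_claimline_pat_test as
select c.*, p.col1, p.col2, p.col3
from (
  select t1.*, trim(t1.p_id) as p_id_trim
  from Tab1 t1
) c
inner join (
  select trim(t2.pid) as pid_trim, t2.col1, t2.col2, t2.col3
  from Tab2 t2
  where t2.source = 'XYZ'   -- partition filter applied before the join
) p
on (c.p_id_trim = p.pid_trim);
```

For an inner join, moving the p.source filter from the ON clause into the
subquery should not change the result.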

Are you using Hive 0.13? If you're on a version newer than 1.0.0, can you
check whether the query prints a warning about cross-products?
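
A quick way to check without waiting for the job, assuming the same table
and column names as in your query, is to run EXPLAIN on just the select:

```sql
-- Sketch: print the query plan without launching the job.
explain
select c.*, p.col1, p.col2, p.col3
from Tab2 p inner join Tab1 c
on (trim(p.pid) = trim(c.p_id) and p.source = 'XYZ');
```

In the output, look at the join operator's keys. If they are empty rather
than the trim() expressions, the join is most likely being planned as a
cross-product. The exact plan layout differs between versions, so treat
this as a rough guide.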

Cheers,
Gopal




Two Tables Join (One Big table and other 1gb size table)

2015-10-13 Thread Kartik Eyan
Hi,
  I am trying to do an inner join on two tables, but it runs for a long time.

Tab1 - 100GB
Tab2 - 2GB (partitioned on source)

I tried using the streamtable hint, but the query ran for a long time
(about 3 hrs); it looks like only 1 reducer is doing the work.
I also tried a map join with increased memory, but it failed.

Please find the sample query below:


set hive.ignore.mapjoin.hint=false;
SET mapred.reduce.tasks=320;

create table ev_claim_claimline_pat_test as
select /*+ streamtable(c) */ c.*, p.col1, p.col2, p.col3
from Tab2 p inner join Tab1 c
on (trim(p.pid)=trim(c.p_id) and p.source='XYZ');


Can someone help me?


Thanks,

Karthik. B