Hi Edward,

You said that there may be a "chain" of many map/reduce jobs. Is this realized by the class "Chain" (org.apache.hadoop.mapred.lib)?

I think it would save jobs if, within the chain, the output of one map/reduce job could be the input of many other jobs; that would be more effective. This means the "chain" would have many branches, with many jobs sharing the same input, so the structure of the chain would look like a tree. If not, if it is just a simple linear chain, then I think there would be no saving.

So, what's your opinion?

Regards,
Zhou
_____
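[Editor's note: Zhou's branching-chain idea, where one upstream job's output is shared as the input of several downstream jobs, can be sketched as a tiny DAG of plain functions. This is only an illustration of the reuse he describes; the stage names are hypothetical and this is not the Hadoop Chain API.]

```python
# A "tree" of jobs: stage1's output is computed once and shared by
# two downstream branches. A simple linear chain per branch would
# instead re-run stage1 for each branch.

def stage1(records):
    # shared upstream job, e.g. a join or filter over the raw input
    return [r * 2 for r in records]

def branch_count(shared):
    # first downstream job consuming the shared output
    return len(shared)

def branch_sum(shared):
    # second downstream job consuming the same shared output
    return sum(shared)

raw = [1, 2, 3]
shared = stage1(raw)          # evaluated exactly once
print(branch_count(shared))   # 3
print(branch_sum(shared))     # 12
```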
From: Edward Capriolo [mailto:[email protected]]
Sent: Tuesday, June 22, 2010 11:32 PM
To: [email protected]
Cc: [email protected]
Subject: Re: hive Multi Table/File Inserts questions

On Tue, Jun 22, 2010 at 2:55 AM, Zhou Shuaifeng <[email protected]> wrote:

Hi, when I use Multi Table/File Inserts commands, some may be no more effective than running the single-table insert commands separately. For example:

    from pokes
    insert overwrite table pokes_count
    select bar, count(foo) group by bar
    insert overwrite table pokes_sum
    select bar, sum(foo) group by bar;

Executing this needs 2 map/reduce jobs, which is no fewer than running the two commands separately:

    insert overwrite table pokes_count select bar, count(foo) from pokes group by bar;
    insert overwrite table pokes_sum select bar, sum(foo) from pokes group by bar;

And the time taken is the same. But the first form seems to scan the table 'pokes' only once, so why are 2 map/reduce jobs still needed? And why can't the time taken be less? Is there any way to make it more effective?

Thanks a lot,
Zhou

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

Zhou,

In the case of simple selects and a few tables you are not going to see the full benefit. Imagine some complex query like this:

    from (
      from (
        select (table1 join table2 where x=6) t1
      ) x
      join table3 on x.col1 = t3.col1
    ) y

This could theoretically be a chain of thousands of map/reduce jobs. Then you would save jobs and time by evaluating the shared input only once. Also, you are only testing with 2 output tables. What happens with 10 or 20? Just curious.
Regards, Edward
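[Editor's note: the single-scan behaviour Zhou asks about can be sketched outside Hive. One pass over the rows can feed both aggregations at once, which is what the multi-insert form above expresses. A minimal Python sketch, using toy data shaped like the `pokes` example; this is an illustration of the idea, not Hive's actual execution plan:]

```python
from collections import defaultdict

# Toy stand-in for the 'pokes' table: rows of (foo, bar).
pokes = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "b")]

def multi_insert_single_scan(rows):
    """One scan over the input feeds both aggregations, mirroring
    'FROM pokes INSERT ... count(foo) ... INSERT ... sum(foo) ...'."""
    pokes_count = defaultdict(int)  # bar -> count(foo)
    pokes_sum = defaultdict(int)    # bar -> sum(foo)
    for foo, bar in rows:           # the table is read exactly once
        pokes_count[bar] += 1
        pokes_sum[bar] += foo
    return dict(pokes_count), dict(pokes_sum)

counts, sums = multi_insert_single_scan(pokes)
print(counts)  # {'a': 2, 'b': 3}
print(sums)    # {'a': 3, 'b': 12}
```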
