[jira] [Commented] (HIVE-7277) how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?

wangmeng (JIRA) Mon, 23 Jun 2014 23:31:18 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041761#comment-14041761
 ]


wangmeng commented on HIVE-7277:
--------------------------------

Well, the MR API cannot accommodate this logical plan when generating the
physical plan.

--

Best Regards
HomePage: http://wangmeng.us/
Name:     Wang Meng - Data Structures and Algorithms, Java, JVM, Linux, Shell,
          Distributed Systems, Hadoop, Hive, Performance Optimization and Debugging,
          Spark/Shark
Major:    Software Engineering
Degree:   Master
E-mail:   [email protected]   [email protected]
GitHub:   https://github.com/sjtufighter

> how to decide reduce numbers according to the input size of reduce stage
> rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: wangmeng
>             Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers only from "input size
> of map / hive.exec.reducers.bytes.per.reducer" (default 1 GB).
> But the output size of the map stage may differ greatly from the original
> input size, so I think this strategy for deciding the number of reducers
> may be improper.
> Is there any feature that can decide the number of reducers based only on
> the output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins once some map tasks have
> finished, rather than waiting until the whole map stage has finished, so I
> think it is also improper to decide the number of reducers only after the
> whole map stage has finished.
> As someone pointed out, we could estimate the total number of reducers from
> the output size of the earliest map tasks to finish. However, Hive now uses
> filter pushdown (WHERE), which can cause large differences in output between
> individual map tasks, so this estimation is also improper.
> Thanks.
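To make the trade-off in the issue concrete, here is a minimal Java sketch. It is a hypothetical illustration, not Hive's actual code. It assumes the rule quoted above takes the form reducers = ceil(bytes / bytesPerReducer), clamped to at least 1 and at most a cap standing in for hive.exec.reducers.max (assumed here to be 999). The class name ReducerEstimate, both method names, and all of the numbers are invented for illustration. The sketch compares the input-based estimate with one based on the real map output, then shows how extrapolating from the earliest-finished map tasks can go wrong when a pushed-down filter makes per-task output uneven.

```java
import java.util.List;

// Hypothetical sketch; names and numbers are illustrative, not Hive's code.
public class ReducerEstimate {

    // Assumed form of the rule described in the issue:
    // ceil(bytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimateReducers(long bytes, long bytesPerReducer, int maxReducers) {
        long r = (bytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        r = Math.max(1L, r);
        return (int) Math.min((long) maxReducers, r);
    }

    // Hypothetical alternative from the issue: extrapolate total map output
    // from the average output of the map tasks that have finished so far.
    static long extrapolateMapOutput(List<Long> finishedOutputBytes, int totalMapTasks) {
        if (finishedOutputBytes.isEmpty()) {
            throw new IllegalArgumentException("no finished map tasks");
        }
        long sum = 0;
        for (long b : finishedOutputBytes) {
            sum += b;
        }
        double avg = (double) sum / finishedOutputBytes.size();
        return (long) Math.ceil(avg * totalMapTasks);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long bytesPerReducer = gb; // 1 GB, the default stated in the issue
        int maxReducers = 999;     // assumed cap

        // 100 GB of map input yields 100 reducers under the input-size rule...
        long mapInput = 100 * gb;
        System.out.println("input-based:  " + estimateReducers(mapInput, bytesPerReducer, maxReducers));

        // ...even if a selective WHERE filter leaves only 2 GB of map output.
        long actualMapOutput = 2 * gb;
        System.out.println("output-based: " + estimateReducers(actualMapOutput, bytesPerReducer, maxReducers));

        // Skew: the first 2 of 100 tasks each emit 0.5 GB, while the other 98
        // emit only 1 GB in total, so extrapolating from early tasks overshoots.
        List<Long> early = List.of(gb / 2, gb / 2);
        long guess = extrapolateMapOutput(early, 100);
        System.out.println("extrapolated: " + estimateReducers(guess, bytesPerReducer, maxReducers));
    }
}
```

With these assumed numbers, the input-based rule asks for 100 reducers, the real map output would justify 2, and extrapolating from the two early tasks asks for 50. This is the gap between the map-input estimate and the map-output estimate that the reporter describes. It also shows why sampling early tasks is unreliable when filtering makes per-task output uneven.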



--
This message was sent by Atlassian JIRA
(v6.2#6252)