[ 
https://issues.apache.org/jira/browse/HIVE-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245801#comment-13245801
 ] 

Ashutosh Chauhan commented on HIVE-2869:
----------------------------------------

@Shrijeet, take a look at 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute for how to 
contribute. Essentially, you need to submit a patch. 
                
> Merging small files throws RuntimeException when hive.mergejob.maponly=false
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-2869
>                 URL: https://issues.apache.org/jira/browse/HIVE-2869
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>         Environment: CentOS release 5.5 (Final)
>            Reporter: Shrijeet Paliwal
>         Attachments: data_to_reproduce.tar.gz
>
>
> Hive Version: Hive 0.8 (last commit SHA  
> b581a6192b8d4c544092679d05f45b2e50d42b45 ) 
> Hadoop version: cdh3u0
> I am trying to use Hive's small-file merge feature and have set all the 
> necessary parameters.
> I have disabled CombineHiveInputFormat because my input is compressed 
> text. 
> {noformat}
> hive> set mapred.min.split.size.per.node=1000000000;
> hive> set mapred.min.split.size.per.rack=1000000000;
> hive> set mapred.max.split.size=1000000000;
> hive> set hive.merge.size.per.task=1000000000;
> hive> set hive.merge.smallfiles.avgsize=1000000000;
> hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> hive> set hive.merge.mapfiles=true;
> hive> set hive.merge.mapredfiles=true;
> hive> set hive.mergejob.maponly=false;
> {noformat}
> The plan decides to launch two MR jobs, but after the first job succeeds I 
> get this runtime error: 
> "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce 
> operator specified"
> *How to reproduce :* 
> * Create tables as follows: 
> {code}
> --create input table
> create table tmp_notmerged (
>   id                int,
>   name              string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
> --create output table
> create table tmp_merged (
>   id                int
> )
> STORED AS TEXTFILE;
> {code}
> * Load data into tmp_notmerged (the files are attached to this JIRA)
> * Set the knobs and run the Hive query 
> {code}
> set hive.merge.mapfiles=true;
> set hive.mergejob.maponly=false;
> insert overwrite table tmp_merged select id from tmp_notmerged;
> {code}
> * You should see error "java.lang.RuntimeException: Plan invalid, Reason: 
> Reducers == 0 but reduce operator specified"
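> The load step above might look like the following. This is only a minimal 
> sketch: it assumes the attachment was unpacked to a local directory, and 
> /tmp/data_to_reproduce is a hypothetical path, not one taken from the 
> attachment.
> {code}
> -- hypothetical path: wherever data_to_reproduce.tar.gz was extracted
> LOAD DATA LOCAL INPATH '/tmp/data_to_reproduce' OVERWRITE INTO TABLE tmp_notmerged;
> -- sanity check that rows are visible before running the merge query
> select count(1) from tmp_notmerged;
> {code}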
> *Proposed fix :*
> Patch is here : https://gist.github.com/2025303

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
