Pradeep Kamath created HIVE-3733:
------------------------------------

             Summary: Improve Hive's logic for conditional merge
                 Key: HIVE-3733
                 URL: https://issues.apache.org/jira/browse/HIVE-3733
             Project: Hive
          Issue Type: Improvement
            Reporter: Pradeep Kamath


If hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to
false, then when Hive encounters a FileSinkOperator while generating map-reduce
tasks, it looks at the entire job to see whether it has a reducer; if it does,
Hive does not merge. Instead, it should check whether the FileSinkOperator is a
child of the reducer. That way, outputs generated in the mapper are merged and
outputs generated in the reducer are not, which is the intended effect of
setting those configs.
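
The difference between the two checks can be sketched in a few lines of Java.
This is only an illustration, not Hive's planner code: the Op class, the
isDescendant helper, and both shouldMerge methods are hypothetical names, and
the way the two flags are combined beyond the single setting described above
(mapfiles=true, mapredfiles=false) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalMergeSketch {

    /** Hypothetical stand-in for a plan operator; NOT Hive's Operator class. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        /** Adds a child and returns the child so chains can be built top-down. */
        Op addChild(Op child) { children.add(child); return child; }
    }

    /** True if target is root itself or reachable from root via child links. */
    static boolean isDescendant(Op root, Op target) {
        if (root == null) return false;
        if (root == target) return true;
        for (Op child : root.children) {
            if (isDescendant(child, target)) return true;
        }
        return false;
    }

    /** Job-level check as described in the issue: any reducer in the job decides. */
    static boolean shouldMergeJobLevel(boolean mergeMapFiles, boolean mergeMapredFiles,
                                       Op reducerRoot) {
        boolean jobHasReducer = reducerRoot != null;
        return jobHasReducer ? mergeMapredFiles : mergeMapFiles;
    }

    /** Proposed check: decide per sink, based on where the sink actually runs. */
    static boolean shouldMergePerSink(boolean mergeMapFiles, boolean mergeMapredFiles,
                                      Op reducerRoot, Op fileSink) {
        boolean sinkInReducer = isDescendant(reducerRoot, fileSink);
        return sinkInReducer ? mergeMapredFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        // Map side (assumed shape): scan -> select -> sink for the SELECT * insert.
        Op scan = new Op("TableScan");
        Op mapSink = scan.addChild(new Op("Select")).addChild(new Op("FileSink(map)"));

        // Reduce side (assumed shape): group by -> sink for the COUNT(*) insert.
        Op reducer = new Op("GroupBy");
        Op reduceSink = reducer.addChild(new Op("FileSink(reduce)"));

        boolean mapFiles = true;      // hive.merge.mapfiles
        boolean mapredFiles = false;  // hive.merge.mapredfiles

        for (Op sink : new Op[] { mapSink, reduceSink }) {
            System.out.println(sink.name
                + " jobLevel=" + shouldMergeJobLevel(mapFiles, mapredFiles, reducer)
                + " perSink=" + shouldMergePerSink(mapFiles, mapredFiles, reducer, sink));
        }
    }
}
```

With the settings above, the job-level check refuses to merge either sink
because the job has a reducer, while the per-sink check merges only the sink
that sits on the map side.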

Simple repro:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=false;
EXPLAIN
FROM <input_table>
INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(*) group by key
INSERT OVERWRITE TABLE <output_table2> SELECT *;

The output should contain a Conditional Operator, Mapred Stages, and Move tasks.
