-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------
(Updated Nov. 7, 2014, 9:16 p.m.)
Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
Changes
-------
1. removed whitespace characters
2. handle operators which have multiple children
3. update stats config info for all cloned FileSinkOperators
Bugs: Hive-8756
https://issues.apache.org/jira/browse/Hive-8756
Repository: hive-git
Description
-------
numRows and rawDataSize are not collected by the Spark stats. This happens
because the stats config is not set on the FileSinkOperator in the ReduceWork.
In GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new
FileSinkOperator is generated and set on the reduce work. However, during
processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the
original FileSinkOperator, not on the new FileSinkOperator that is used in the
ReduceWork.
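To illustrate the problem described above, here is a minimal, self-contained
sketch. It does not use Hive's classes: Sink, flagOriginalOnly, and
flagAllClones are hypothetical names, and the ordering (clone first, flag set
afterwards) is an assumption made for the illustration. It only shows why a
flag set on the original object is not visible on an independent clone, and
how copying the flag to every tracked clone would address that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
class CloneStatsSketch {

    // Models a file sink operator carrying a "collect stats" flag.
    static class Sink {
        final String name;
        boolean collectStats;

        Sink(String name) { this.name = name; }

        // Models cloning the operator tree: the copy is an independent object.
        Sink copy() {
            Sink c = new Sink(name + "-clone");
            c.collectStats = collectStats;
            return c;
        }
    }

    // Assumed ordering: the clone is taken first, then the flag is set on the
    // original only, so the clone that actually runs never sees it.
    static Sink flagOriginalOnly(Sink original) {
        Sink clone = original.copy();
        original.collectStats = true;
        return clone;
    }

    // Sketch of the fix idea: remember every clone of the original and copy
    // the flag to each of them after it has been set.
    static List<Sink> flagAllClones(Sink original, int cloneCount) {
        List<Sink> clones = new ArrayList<>();
        for (int i = 0; i < cloneCount; i++) {
            clones.add(original.copy());
        }
        original.collectStats = true;
        for (Sink clone : clones) {
            clone.collectStats = original.collectStats;
        }
        return clones;
    }

    public static void main(String[] args) {
        Sink stale = flagOriginalOnly(new Sink("fs1"));
        System.out.println(stale.name + " collectStats=" + stale.collectStats);

        for (Sink s : flagAllClones(new Sink("fs2"), 2)) {
            System.out.println(s.name + " collectStats=" + s.collectStats);
        }
    }
}
```

Tracking a list of clones per original, rather than a single clone, mirrors
change 3 above, where the stats config is updated for all cloned
FileSinkOperators.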
Diffs (updated)
-----
itests/src/test/resources/testconfiguration.properties 79a0132
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7
ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5
ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 4946815
ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d
ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27719/diff/
Testing
-------
Thanks,
Na Yang