-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------
(Updated Nov. 7, 2014, 9:16 p.m.)


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Changes
-------

1. Removed whitespace characters.
2. Handle operators that have multiple children.
3. Update the stats config info for all cloned FileSinkOperators.


Bugs: Hive-8756
    https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
-------

numRows and rawDataSize are not collected by the Spark stats. The cause is that the stats config is not set on the FileSinkOperator in the ReduceWork. In GenSparkUtils.removeUnionOperators, the operator tree gets cloned, and a new FileSinkOperator is generated and set on the reduce work. However, during processFileSink, GenMapRedUtils.addStatsTask sets the collectStats tag on the original FileSinkOperator, not on the new FileSinkOperator that is used in the ReduceWork.


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 79a0132
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 4946815
  ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/27719/diff/


Testing
-------


Thanks,

Na Yang
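
For illustration only, the problem in the Description can be sketched in plain Java. This is not the code from the diff: SketchOperator, SketchFileSink, StatsCloneSketch, and the collectStats field are hypothetical stand-ins for Hive's operator classes, and the sketch assumes the tree is cloned before the flag is set. It shows a flag set on the original file sinks not reaching the clones, and then a traversal that visits every child so that all cloned sinks get updated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator node; not Hive's real Operator class.
class SketchOperator {
    final String name;
    final List<SketchOperator> children = new ArrayList<>();

    SketchOperator(String name) { this.name = name; }

    // Creates a childless copy of this node; subclasses copy their own fields.
    SketchOperator newInstance() { return new SketchOperator(name); }

    // Deep-copies the subtree rooted at this operator.
    SketchOperator cloneTree() {
        SketchOperator copy = newInstance();
        for (SketchOperator child : children) {
            copy.children.add(child.cloneTree());
        }
        return copy;
    }
}

// Hypothetical stand-in for a file sink that carries a stats flag.
class SketchFileSink extends SketchOperator {
    boolean collectStats;

    SketchFileSink(String name) { super(name); }

    @Override
    SketchOperator newInstance() {
        SketchFileSink copy = new SketchFileSink(name);
        copy.collectStats = collectStats; // the clone sees the flag as of clone time
        return copy;
    }
}

class StatsCloneSketch {
    // Collects every file sink in the tree, following all children of each operator.
    static List<SketchFileSink> findFileSinks(SketchOperator root) {
        List<SketchFileSink> sinks = new ArrayList<>();
        collect(root, sinks);
        return sinks;
    }

    private static void collect(SketchOperator op, List<SketchFileSink> out) {
        if (op instanceof SketchFileSink) {
            out.add((SketchFileSink) op);
        }
        for (SketchOperator child : op.children) {
            collect(child, out);
        }
    }

    // Builds a root with two branches, each ending in a file sink.
    static SketchOperator buildTree() {
        SketchOperator root = new SketchOperator("root");
        for (int i = 1; i <= 2; i++) {
            SketchOperator branch = new SketchOperator("branch-" + i);
            branch.children.add(new SketchFileSink("sink-" + i));
            root.children.add(branch);
        }
        return root;
    }

    public static void main(String[] args) {
        SketchOperator original = buildTree();
        SketchOperator cloned = original.cloneTree();

        // The flag is set on the original sinks only, so the clones miss it.
        for (SketchFileSink sink : findFileSinks(original)) {
            sink.collectStats = true;
        }
        System.out.println("clone before fix: " + findFileSinks(cloned).get(0).collectStats);

        // Sketch of the fix: propagate the flag to every cloned sink.
        for (SketchFileSink sink : findFileSinks(cloned)) {
            sink.collectStats = true;
        }
        System.out.println("clone after fix: " + findFileSinks(cloned).get(0).collectStats);
    }
}
```

Walking all children, rather than following a single child, corresponds to item 2 under Changes, and updating every sink found in the cloned tree corresponds to item 3.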