-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------
Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: Hive-8756
    https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
-------

numRows and rawDataSize are not collected by the Spark stats. The cause is that the stats config is not set on the FileSinkOperator in the ReduceWork. In GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new FileSinkOperator is generated and set on the reduce work. However, during processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the original FileSinkOperator, not on the new FileSinkOperator that the ReduceWork actually uses.


Diffs
-----

  itests/src/test/resources/testconfiguration.properties 79a0132
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/27719/diff/


Testing
-------


Thanks,

Na Yang
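

Illustration
-------

The failure mode in the description can be sketched in a few lines of plain Java. This is only a toy model: Sink, Work, cloneInto, and both enableStats methods are hypothetical stand-ins, not Hive classes or methods. The original-to-clone map is one possible remedy assumed for illustration, not necessarily what the patch does.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class CloneStatsSketch {

    /** Hypothetical stand-in for a file sink operator (not Hive's FileSinkOperator). */
    static final class Sink {
        boolean collectStats; // stand-in for the stats flag

        /** A clone only carries the flag value it had at clone time. */
        Sink copy() {
            Sink s = new Sink();
            s.collectStats = collectStats;
            return s;
        }
    }

    /** Hypothetical stand-in for the work unit that actually runs. */
    static final class Work {
        Sink sink;
    }

    /** Clone the sink into the work and remember which clone replaced which original. */
    static void cloneInto(Work work, Sink original, Map<Sink, Sink> clones) {
        Sink copy = original.copy();
        clones.put(original, copy);
        work.sink = copy;
    }

    /** Reported behavior: only the original is flagged, so the running clone never sees it. */
    static void enableStatsOnOriginalOnly(Sink original) {
        original.collectStats = true;
    }

    /** Assumed remedy: flag the original and, if one was recorded, its clone as well. */
    static void enableStatsOnOriginalAndClone(Sink original, Map<Sink, Sink> clones) {
        original.collectStats = true;
        Sink copy = clones.get(original);
        if (copy != null) {
            copy.collectStats = true;
        }
    }

    public static void main(String[] args) {
        Map<Sink, Sink> clones = new IdentityHashMap<>();
        Work work = new Work();
        Sink original = new Sink();

        cloneInto(work, original, clones);

        enableStatsOnOriginalOnly(original);
        System.out.println("clone flagged (original only): " + work.sink.collectStats);

        enableStatsOnOriginalAndClone(original, clones);
        System.out.println("clone flagged (original and clone): " + work.sink.collectStats);
    }
}
```

The map is keyed by identity because the original and its clone are distinct objects that may otherwise compare equal.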