-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/
-----------------------------------------------------------

(Updated Feb. 2, 2015, 8:47 p.m.)


Review request for pig, liyun zhang and Praveen R.


Changes
-------

Corresponding to each POStore, there could be multiple Spark jobs. For example, StreamingConverter adds the RDD action count(), which launches a separate job.

Updated the patch to address this.


Bugs: PIG-4393
    https://issues.apache.org/jira/browse/PIG-4393


Repository: pig-git


Description
-------

PIG-4393 : Add stats and error reporting for Spark

After Pig submits a job to a Spark cluster, we need to report job progress, Spark-specific stats and any error logs back to the user.

(1) It reports basic success/failure status for each Spark job.
(2) It logs Spark-specific stats to the log file. Essentially, it registers a job metrics listener with the Spark context, collects Spark task-level metrics and aggregates them.
(3) It also refactors the code to correctly populate PigStats, which is used by most unit tests. This should fix a number of unit tests.

TODO items for a follow-up patch:
 - Add #records to OutputStats for each job.
 - Though StatsReportListener prints Spark job progress in the logs, we probably also need to implement PigProgressNotificationListener for Spark.


Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java db152b5
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b15994d
  src/org/apache/pig/tools/pigstats/SparkStats.java fd45dd4
  src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION
  src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION

Diff: https://reviews.apache.org/r/30262/diff/


Testing (updated)
-------

Tested with unit tests:

Compared to the last Jenkins unit test run for the branch (baseline), two unit tests, TestToolsPigServer and TestStoreInstances, are fixed.
Baseline: https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/

Example of Spark Job metrics that appear in logs:

2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - Spark Job [0] Metrics
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - EexcutorDeserializeTime : 74
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ExecutorRunTime : 538
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSize : 2535
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - JvmGCTime : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSerializationTime : 1
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - MemoryBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - DiskBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBlocksFetched : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - LocalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - TotalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - FetchWaitTime : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBytesRead : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleBytesWritten : 918
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleWriteTime : 67000

Thanks,

Mohit Sabharwal
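
For readers following item (2) of the description, here is a rough sketch of how task-level metrics might be summed into per-job totals like the ones logged above. It is not the code in the patch: the class and method names (JobMetricsAggregator, onTaskEnd, getJobMetrics) are hypothetical, and it deliberately uses plain Java maps instead of Spark's listener and metrics classes so that it runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names are not taken from the patch.
public class JobMetricsAggregator {

    // jobId -> (metric name -> sum over all tasks seen for that job)
    private final Map<Integer, Map<String, Long>> totalsByJob = new LinkedHashMap<>();

    // Assumed to be called once per finished task with that task's metrics.
    // Synchronized because listener callbacks and readers may run on different threads.
    public synchronized void onTaskEnd(int jobId, Map<String, Long> taskMetrics) {
        Map<String, Long> totals =
                totalsByJob.computeIfAbsent(jobId, id -> new LinkedHashMap<>());
        for (Map.Entry<String, Long> metric : taskMetrics.entrySet()) {
            // Add this task's value to the running total for the job.
            totals.merge(metric.getKey(), metric.getValue(), Long::sum);
        }
    }

    // Returns a copy of the aggregated metrics for a job (empty if the job is unknown).
    public synchronized Map<String, Long> getJobMetrics(int jobId) {
        Map<String, Long> totals = totalsByJob.get(jobId);
        return totals == null ? new LinkedHashMap<>() : new LinkedHashMap<>(totals);
    }
}
```

Because one POStore can correspond to several Spark jobs, totals in this sketch are keyed by job id; combining them per store would be a separate step.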