> On Feb. 4, 2015, 8:18 a.m., Praveen R wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, line 86
> > <https://reviews.apache.org/r/30262/diff/4/?file=844075#file844075line86>
> >
> > Any reason for using JavaSparkContext instead of SparkContext?

We were using both a SparkContext and a JavaSparkContext in SparkLauncher, and I'd like to use just one. JavaSparkContext is a Java-friendly version of SparkContext that works with Java collections instead of Scala ones, so it makes more sense for us.

- Mohit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/#review70932
-----------------------------------------------------------

On Feb. 2, 2015, 8:47 p.m., Mohit Sabharwal wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30262/
> -----------------------------------------------------------
>
> (Updated Feb. 2, 2015, 8:47 p.m.)
>
>
> Review request for pig, liyun zhang and Praveen R.
>
>
> Bugs: PIG-4393
>     https://issues.apache.org/jira/browse/PIG-4393
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> PIG-4393 : Add stats and error reporting for Spark
>
> After Pig submits a job to a Spark cluster, we need to report job progress,
> Spark-specific stats, and any error logs back to the user.
>
> (1) It reports basic success/failure status for each Spark job.
> (2) It logs Spark-specific stats to the log file. Essentially, it
> registers a job metrics listener with the Spark context, collects Spark
> task-level metrics, and aggregates them.
> (3) It also refactors code to correctly populate PigStats, which is used by
> most unit tests. This should fix a bunch of unit tests.
>
> TODO items in a follow-up patch:
> - Add #records to OutputStats for each job.
> - Though StatsReportListener prints Spark job progress in the logs, we
> probably also need to implement PigProgressNotificationListener for Spark.
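
For readers following item (2) in the description, here is a minimal sketch of the "collect task-level metrics and aggregate per job" idea. It is not the code in the patch: TaskMetricsAggregator, onTaskMetric and metricsForJob are hypothetical names, there is no Spark dependency, and the task values in main are made up for illustration. In the real change the callbacks would presumably come from a listener registered with the Spark context.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a job metrics listener. It only illustrates
// summing task-level metrics per job; it does not use any Spark API.
class TaskMetricsAggregator {
    // jobId -> (metric name -> running sum over all tasks seen for that job)
    private final Map<Integer, Map<String, Long>> totalsByJob = new LinkedHashMap<>();

    // Called once per finished task and metric. Synchronized because listener
    // callbacks may arrive on a different thread than the one reading results.
    public synchronized void onTaskMetric(int jobId, String metric, long value) {
        totalsByJob
            .computeIfAbsent(jobId, id -> new LinkedHashMap<>())
            .merge(metric, value, Long::sum);
    }

    // Returns a copy of the aggregated metrics for one job (empty if unknown).
    public synchronized Map<String, Long> metricsForJob(int jobId) {
        Map<String, Long> totals = totalsByJob.get(jobId);
        if (totals == null) {
            return Collections.<String, Long>emptyMap();
        }
        return new LinkedHashMap<>(totals);
    }

    public static void main(String[] args) {
        TaskMetricsAggregator agg = new TaskMetricsAggregator();
        // Two made-up tasks belonging to job 0.
        agg.onTaskMetric(0, "ExecutorRunTime", 300L);
        agg.onTaskMetric(0, "ExecutorRunTime", 238L);
        agg.onTaskMetric(0, "LocalBlocksFetched", 2L);
        agg.metricsForJob(0).forEach((name, value) ->
            System.out.println("Spark Job [0] " + name + " : " + value));
    }
}
```

Keeping the totals keyed by job id means the per-job summary can be logged once the job finishes, without holding on to individual task records.
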
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java db152b5
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b15994d
>   src/org/apache/pig/tools/pigstats/SparkStats.java fd45dd4
>   src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION
>   src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION
>   src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/30262/diff/
>
>
> Testing
> -------
>
> Tested with unit tests:
> Compared to last Jenkins unit test run for the branch (baseline), two unit
> tests TestToolsPigServer and TestStoreInstances are fixed.
> Baseline:
> https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/
>
>
> Example of Spark Job metrics that appear in logs:
>
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - Spark Job [0] Metrics
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - EexcutorDeserializeTime : 74
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ExecutorRunTime : 538
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSize : 2535
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - JvmGCTime : 0
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSerializationTime : 1
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - MemoryBytesSpilled : 0
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - DiskBytesSpilled : 0
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBlocksFetched : 0
> 2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - LocalBlocksFetched : 2
> 2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - TotalBlocksFetched : 2
> 2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - FetchWaitTime : 0
> 2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBytesRead : 0
> 2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleBytesWritten : 918
> 2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleWriteTime : 67000
>
>
> Thanks,
>
> Mohit Sabharwal
>
>