[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060633#comment-16060633 ]

Nandor Kollar edited comment on PIG-5157 at 6/23/17 12:50 PM:

I'll update on RB soon, ran with my latest patch:
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
...
[junit] Tests run: 67, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 64.505 sec
[delete] Deleting directory /var/folders/0n/97lfzsrs3dj1nlgfghj3221wgp/T/pig_junit_tmp1871324592

BUILD SUCCESSFUL
Total time: 3 minutes 8 seconds
{code}
[~kellyzly] could you please execute the tests with my latest patch (PIG-5157_11.patch)?

> Upgrade to Spark 2.0
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.18.0
> Attachments: PIG-5157.patch
>
> Upgrade to Spark 2.0 (or latest)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060574#comment-16060574 ]

liyunzhang_intel edited comment on PIG-5157 at 6/23/17 8:29 AM:

[~nkollar]: looks good, but I ran into some problems when testing in local and yarn-client mode. Give me more time to verify whether the problem is caused by the configuration or by something else. Thanks!

After applying this patch, I ran the unit test:
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
{code}
The result:
{noformat}
Tests run: 67, Failures: 1, Errors: 5, Skipped: 4, Time elapsed: 138.459 sec
{noformat}
I will investigate the reason in my env, but can you verify it in your env?
[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047531#comment-16047531 ]

liyunzhang_intel edited comment on PIG-5157 at 6/13/17 7:36 AM:

[~nkollar]: I made some comments on Review Board. Can you update the patch against the latest code? Latest code:
{noformat}
* 5c55102 - (origin/trunk, origin/HEAD) PIG-4700: Enable progress reporting for Tasks in Tez (satishsaley via rohini) (7 days ago)
{noformat}
When I download the patch from Review Board and apply it as follows:
{code}
patch -p0
{code}
[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030799#comment-16030799 ]

liyunzhang_intel edited comment on PIG-5157 at 5/31/17 7:54 AM:

[~nkollar]:
bq. in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?

The reason for the modification is that [~rohini] suggested that a lot of memory is used if we update metric info in onTaskEnd() (suppose there are thousands of tasks). In org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener of spark21, we should use code like the following. Note: this is not fully tested, so I cannot guarantee it is right.
{code}
public void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
    // If we update taskMetrics in onTaskEnd(), it consumes a lot of memory.
    int stageId = stageCompleted.stageInfo().stageId();
    int stageAttemptId = stageCompleted.stageInfo().attemptId();
    String stageIdentifier = stageId + "_" + stageAttemptId;
    Integer jobId = stageIdToJobId.get(stageId);
    if (jobId == null) {
        LOG.warn("Cannot find job id for stage[" + stageId + "].");
    } else {
        Map<String, List<TaskMetrics>> jobMetrics = allJobMetrics.get(jobId);
        if (jobMetrics == null) {
            jobMetrics = Maps.newHashMap();
            allJobMetrics.put(jobId, jobMetrics);
        }
        List<TaskMetrics> stageMetrics = jobMetrics.get(stageIdentifier);
        if (stageMetrics == null) {
            stageMetrics = Lists.newLinkedList();
            jobMetrics.put(stageIdentifier, stageMetrics);
        }
        stageMetrics.add(stageCompleted.stageInfo().taskMetrics());
    }
}

public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) {
}
{code}
bq. I removed JobLogger, do we need it? It seems that a property called 'spark.eventLog.enabled' is the proper replacement for this class, should we use it instead? It looks like JobLogger became deprecated and was removed from Spark 2.

It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in spark2.
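The memory argument above can be illustrated without any Spark dependency. The following is only a minimal sketch, not the Pig or Spark code: {{StageMetricsStore}}, its method signatures and the generic metrics type {{M}} are hypothetical stand-ins for the listener and for {{TaskMetrics}}. It records one entry per stage attempt when the stage completes and leaves the per-task callback empty, so the number of retained objects grows with the number of stages rather than the number of tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a metrics listener; M plays the role of a metrics object. */
public class StageMetricsStore<M> {
    private final Map<Integer, Integer> stageIdToJobId = new HashMap<>();
    private final Map<Integer, Map<String, List<M>>> allJobMetrics = new HashMap<>();

    /** Remembers which job each stage belongs to. */
    public synchronized void onJobStart(int jobId, int... stageIds) {
        for (int stageId : stageIds) {
            stageIdToJobId.put(stageId, jobId);
        }
    }

    /** Stores one metrics entry per stage attempt; returns false if the stage is unknown. */
    public synchronized boolean onStageCompleted(int stageId, int attemptId, M stageMetrics) {
        Integer jobId = stageIdToJobId.get(stageId);
        if (jobId == null) {
            return false;
        }
        String stageIdentifier = stageId + "_" + attemptId;
        allJobMetrics
                .computeIfAbsent(jobId, k -> new HashMap<>())
                .computeIfAbsent(stageIdentifier, k -> new ArrayList<>())
                .add(stageMetrics);
        return true;
    }

    /** Intentionally empty: nothing is retained per task. */
    public synchronized void onTaskEnd(int stageId, M taskMetrics) {
    }

    /** Counts the metrics entries retained for a job. */
    public synchronized int entryCount(int jobId) {
        Map<String, List<M>> jobMetrics = allJobMetrics.get(jobId);
        if (jobMetrics == null) {
            return 0;
        }
        int count = 0;
        for (List<M> entries : jobMetrics.values()) {
            count += entries.size();
        }
        return count;
    }
}
```

With this shape, a stage that runs a thousand tasks still contributes a single entry once it completes.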
[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028477#comment-16028477 ]

Nandor Kollar edited comment on PIG-5157 at 5/29/17 3:52 PM:

[~kellyzly] could you please have a look at my patch? There are two questionable changes:
- in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?
- I removed JobLogger, do we need it? It seems that a property called 'spark.eventLog.enabled' is the proper replacement for this class, should we use it instead? It looks like JobLogger became deprecated and was removed from Spark 2.

was (Author: nkollar):
[~kellyzly] could you please have a look at my patch? There are two questionable changes:
- in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?
- I didn't find a proper replacement for JobLogger, hence it is removed. What was it used for? It looks like it became deprecated and was removed from Spark.

> Upgrade to Spark 2.0
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.17.0
> Attachments: PIG-5157.patch
>
> Upgrade to Spark 2.0 (or latest)
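For the JobLogger question, a rough sketch of what relying on 'spark.eventLog.enabled' could look like follows. The property names {{spark.eventLog.enabled}} and {{spark.eventLog.dir}} are Spark configuration keys; everything else is an assumption made for illustration: the {{EventLogSettings}} class, the example directory, and the idea that entries whose key starts with "spark." are the ones handed over to the Spark configuration. The thread does not confirm how the patch wires this up.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper; not part of Pig or Spark. */
public class EventLogSettings {

    /** Builds the properties that switch on Spark event logging for the given directory. */
    public static Properties eventLogProperties(String eventLogDir) {
        Properties props = new Properties();
        props.setProperty("spark.eventLog.enabled", "true");
        props.setProperty("spark.eventLog.dir", eventLogDir);
        return props;
    }

    /** Selects the entries whose key starts with "spark." (assumed forwarding rule). */
    public static Map<String, String> sparkEntries(Properties all) {
        Map<String, String> result = new TreeMap<>();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith("spark.")) {
                result.put(name, all.getProperty(name));
            }
        }
        return result;
    }
}
```

The directory passed to {{eventLogProperties}} (for example a local path such as /tmp/spark-events) is a placeholder and has to exist and be writable in a real setup.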