[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0

2017-06-23 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060633#comment-16060633
 ] 

Nandor Kollar edited comment on PIG-5157 at 6/23/17 12:50 PM:
--

I'll update the review on RB soon. Results with my latest patch:
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
...
[junit] Tests run: 67, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 64.505 sec
   [delete] Deleting directory /var/folders/0n/97lfzsrs3dj1nlgfghj3221wgp/T/pig_junit_tmp1871324592

BUILD SUCCESSFUL
Total time: 3 minutes 8 seconds
{code}

[~kellyzly] could you please execute the tests with my latest patch 
(PIG-5157_11.patch)?



> Upgrade to Spark 2.0
> 
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0

2017-06-23 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060574#comment-16060574
 ] 

liyunzhang_intel edited comment on PIG-5157 at 6/23/17 8:29 AM:


[~nkollar]: Looks good, but I ran into some problems when testing in local and
yarn-client mode. Give me more time to verify whether the problem is caused by the
configuration or by something else. Thanks!
After applying this patch, I ran the unit test:
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
{code}
The result:
{noformat}
Tests run: 67, Failures: 1, Errors: 5, Skipped: 4, Time elapsed: 138.459 sec
{noformat}

I will investigate the cause in my environment, but could you verify it in yours?
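To compare the TestGrunt runs reported in this thread, it can help to pull the counters out of the Ant JUnit summary line. The sketch below is a hypothetical helper (`JunitSummary` is not part of Pig's build); it only reads the four counters printed in the logs and counts failures plus errors as broken tests, leaving skipped tests aside.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses the counters of an Ant JUnit summary line such as
// "Tests run: 67, Failures: 1, Errors: 5, Skipped: 4, Time elapsed: 138.459 sec".
final class JunitSummary {
    private static final Pattern SUMMARY = Pattern.compile(
            "Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+), Skipped: (\\d+)");

    final int run;
    final int failures;
    final int errors;
    final int skipped;

    private JunitSummary(int run, int failures, int errors, int skipped) {
        this.run = run;
        this.failures = failures;
        this.errors = errors;
        this.skipped = skipped;
    }

    // find() rather than matches(), so prefixes like "[junit]" are tolerated.
    static JunitSummary parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a JUnit summary line: " + line);
        }
        return new JunitSummary(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    // Tests that did not pass: assertion failures plus unexpected exceptions.
    int broken() {
        return failures + errors;
    }

    public static void main(String[] args) {
        JunitSummary a = parse("[junit] Tests run: 67, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 64.505 sec");
        JunitSummary b = parse("Tests run: 67, Failures: 1, Errors: 5, Skipped: 4, Time elapsed: 138.459 sec");
        System.out.println("first run:  " + a.broken() + " broken of " + a.run);
        System.out.println("second run: " + b.broken() + " broken of " + b.run);
    }
}
```

Applied to the two summary lines from this thread, it reports 0 broken tests for the first run and 6 for the second, out of 67.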





[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0

2017-06-13 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047531#comment-16047531
 ] 

liyunzhang_intel edited comment on PIG-5157 at 6/13/17 7:36 AM:


[~nkollar]: I made some comments on Review Board.
Can you update the patch against the latest code? The latest commit is:
{noformat}
* 5c55102 - (origin/trunk, origin/HEAD) PIG-4700: Enable progress reporting for Tasks in Tez (satishsaley via rohini) (7 days ago)
{noformat}
When I download the patch from Review Board and apply it like this:
{code}
 patch -p0
{code}





[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0

2017-05-31 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030799#comment-16030799
 ] 

liyunzhang_intel edited comment on PIG-5157 at 5/31/17 7:54 AM:


[~nkollar]:
bq. in JobMetricsListener.java there's a huge code section commented out 
(uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we 
enable that?
The reason for modifying it is that [~rohini] pointed out that a lot of memory is
used if we update the metric info in onTaskEnd() (suppose there are thousands of tasks).
In org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener for
spark21, we should use code like the following.
Note: this is not fully tested, so I cannot guarantee it is correct.
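{code}
  // Fields used below (declared elsewhere in JobMetricsListener):
  //   Map<Integer, Integer> stageIdToJobId
  //   Map<Integer, Map<String, List<TaskMetrics>>> allJobMetrics
  @Override
  public synchronized void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
    int stageId = stageCompleted.stageInfo().stageId();
    int stageAttemptId = stageCompleted.stageInfo().attemptId();
    String stageIdentifier = stageId + "_" + stageAttemptId;
    Integer jobId = stageIdToJobId.get(stageId);
    if (jobId == null) {
      LOG.warn("Cannot find job id for stage[" + stageId + "].");
    } else {
      Map<String, List<TaskMetrics>> jobMetrics = allJobMetrics.get(jobId);
      if (jobMetrics == null) {
        jobMetrics = Maps.newHashMap();
        allJobMetrics.put(jobId, jobMetrics);
      }
      List<TaskMetrics> stageMetrics = jobMetrics.get(stageIdentifier);
      if (stageMetrics == null) {
        stageMetrics = Lists.newLinkedList();
        jobMetrics.put(stageIdentifier, stageMetrics);
      }
      // Keep one metrics entry per completed stage attempt instead of one per task,
      // so memory grows with the number of stages rather than the number of tasks.
      stageMetrics.add(stageCompleted.stageInfo().taskMetrics());
    }
  }

  @Override
  public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) {
    // Intentionally empty: metrics are collected per stage in onStageCompleted().
  }
{code}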
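To make the memory argument concrete, the sketch below mimics the bookkeeping of the listener above with plain Java collections. It does not use Spark: `FakeTaskMetrics` is a hypothetical stand-in for Spark's `TaskMetrics`, the callbacks take plain ints instead of Spark events, and the stage and task counts are made-up inputs. It assumes, as the code above does, that `StageInfo.taskMetrics()` already holds the metrics aggregated over the stage's tasks, so one entry per stage attempt (keyed `stageId_attemptId`) is enough.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StageMetricsSketch {
    // Hypothetical stand-in for org.apache.spark.executor.TaskMetrics.
    static final class FakeTaskMetrics {
        final long recordsRead;
        FakeTaskMetrics(long recordsRead) { this.recordsRead = recordsRead; }
    }

    // jobId -> ("stageId_attemptId" -> metrics entries), same shape as allJobMetrics.
    final Map<Integer, Map<String, List<FakeTaskMetrics>>> allJobMetrics = new HashMap<>();

    private List<FakeTaskMetrics> entriesFor(int jobId, int stageId, int attemptId) {
        String stageIdentifier = stageId + "_" + attemptId;
        return allJobMetrics
                .computeIfAbsent(jobId, k -> new HashMap<>())
                .computeIfAbsent(stageIdentifier, k -> new ArrayList<>());
    }

    // Per-task strategy: one entry for every finished task.
    void onTaskEnd(int jobId, int stageId, int attemptId, FakeTaskMetrics m) {
        entriesFor(jobId, stageId, attemptId).add(m);
    }

    // Per-stage strategy: one aggregated entry per completed stage attempt.
    void onStageCompleted(int jobId, int stageId, int attemptId, FakeTaskMetrics aggregated) {
        entriesFor(jobId, stageId, attemptId).add(aggregated);
    }

    // Total number of metrics objects retained across all jobs and stages.
    int entryCount() {
        int n = 0;
        for (Map<String, List<FakeTaskMetrics>> stages : allJobMetrics.values()) {
            for (List<FakeTaskMetrics> entries : stages.values()) {
                n += entries.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int stages = 3, tasksPerStage = 1000;
        StageMetricsSketch perTask = new StageMetricsSketch();
        StageMetricsSketch perStage = new StageMetricsSketch();
        for (int stage = 0; stage < stages; stage++) {
            long total = 0;
            for (int task = 0; task < tasksPerStage; task++) {
                perTask.onTaskEnd(0, stage, 0, new FakeTaskMetrics(10));
                total += 10;
            }
            perStage.onStageCompleted(0, stage, 0, new FakeTaskMetrics(total));
        }
        System.out.println("per-task entries:  " + perTask.entryCount());  // 3000
        System.out.println("per-stage entries: " + perStage.entryCount()); // 3
    }
}
```

The retained entries grow with stages times tasks in the per-task variant but only with stages in the per-stage variant, which is the trade-off behind moving the work out of onTaskEnd().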
bq. I removed JobLogger, do we need it? It seems that a property called 
'spark.eventLog.enabled' is the proper replacement for this class, should we 
use it instead? It looks like JobLogger became deprecated and was removed from 
Spark 2.
It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in Spark 2.
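If JobLogger is dropped in favor of Spark's event log, the Spark settings involved are `spark.eventLog.enabled` and `spark.eventLog.dir`. A minimal sketch, assuming the Spark launcher copies every `spark.`-prefixed entry from the Pig properties into the SparkConf; `sparkSettings` is a hypothetical helper, not Pig's code, and `/tmp/spark-events` is only an example location.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class EventLogSettings {
    // Hypothetical helper: collects every "spark."-prefixed property so it can be
    // handed to the SparkConf (for example with SparkConf.set(key, value) per entry).
    static Map<String, String> sparkSettings(Properties pigProperties) {
        Map<String, String> settings = new TreeMap<>();
        for (String key : pigProperties.stringPropertyNames()) {
            if (key.startsWith("spark.")) {
                settings.put(key, pigProperties.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("exectype", "spark");                        // Pig setting, not forwarded
        props.setProperty("spark.eventLog.enabled", "true");           // candidate JobLogger replacement
        props.setProperty("spark.eventLog.dir", "/tmp/spark-events");  // example location only
        sparkSettings(props).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```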







[jira] [Comment Edited] (PIG-5157) Upgrade to Spark 2.0

2017-05-29 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028477#comment-16028477
 ] 

Nandor Kollar edited comment on PIG-5157 at 5/29/17 3:52 PM:
-

[~kellyzly] could you please have a look at my patch? There are two 
questionable changes:
- in JobMetricsListener.java there's a huge code section commented out 
(uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we 
enable that?
- I removed JobLogger, do we need it? It seems that a property called 
'spark.eventLog.enabled' is the proper replacement for this class, should we 
use it instead? It looks like JobLogger became deprecated and was removed from 
Spark 2.


was (Author: nkollar):
[~kellyzly] could you please have a look at my patch? There are two 
questionable changes:
- in JobMetricsListener.java there's a huge code section commented out 
(uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we 
enable that?
- I didn't find a proper replacement for JobLogger, hence it is removed. What 
was it used for? It looks like it became deprecated and was removed from Spark.

> Upgrade to Spark 2.0
> 
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: 0.17.0
>
> Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)


