[ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030799#comment-16030799
 ] 

liyunzhang_intel commented on PIG-5157:
---------------------------------------

[~nkollar]:
bq. in JobMetricsListener.java there's a huge code section commented out 
(uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we 
enable that?
The reason for the change is that [~rohini] pointed out that a lot of memory
is used if we update the metric info in onTaskEnd() (suppose there are
thousands of tasks). In
org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener for
Spark 2.1, we should use code like the following.
Note: this is not fully tested, so I cannot guarantee that it is correct.
{code}
    public void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
        // If we update taskMetrics in onTaskEnd(), it consumes a lot of memory,
        // so the metrics are recorded once per stage when the stage completes.
        int stageId = stageCompleted.stageInfo().stageId();
        int stageAttemptId = stageCompleted.stageInfo().attemptId();
        String stageIdentifier = stageId + "_" + stageAttemptId;
        Integer jobId = stageIdToJobId.get(stageId);
        if (jobId == null) {
            LOG.warn("Cannot find job id for stage[" + stageId + "].");
        } else {
            Map<String, List<TaskMetrics>> jobMetrics = allJobMetrics.get(jobId);
            if (jobMetrics == null) {
                jobMetrics = Maps.newHashMap();
                allJobMetrics.put(jobId, jobMetrics);
            }
            List<TaskMetrics> stageMetrics = jobMetrics.get(stageIdentifier);
            if (stageMetrics == null) {
                stageMetrics = Lists.newLinkedList();
                jobMetrics.put(stageIdentifier, stageMetrics);
            }

            stageMetrics.add(stageCompleted.stageInfo().taskMetrics());
        }
    }

    public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) {
        // Intentionally empty: metrics are collected in onStageCompleted().
    }
{code}
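To illustrate the memory argument, here is a minimal, self-contained sketch
that does not depend on Spark. The class StageMetricsSketch and its Metrics
type are hypothetical stand-ins (Metrics models a single counter, not Spark's
TaskMetrics); only the jobId -> stageIdentifier -> list layout mirrors the
listener. It compares how many objects are retained when one entry is stored
per task versus one entry per completed stage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StageMetricsSketch {
    // Hypothetical stand-in for Spark's TaskMetrics; only one counter is modelled.
    static final class Metrics {
        final long bytesRead;
        Metrics(long bytesRead) { this.bytesRead = bytesRead; }
    }

    // jobId -> "stageId_attemptId" -> recorded metrics, same layout as the listener.
    private final Map<Integer, Map<String, List<Metrics>>> allJobMetrics = new HashMap<>();

    // Records one metrics object under the given job and stage attempt.
    void record(int jobId, int stageId, int attemptId, Metrics m) {
        String stageIdentifier = stageId + "_" + attemptId;
        allJobMetrics
            .computeIfAbsent(jobId, k -> new HashMap<>())
            .computeIfAbsent(stageIdentifier, k -> new ArrayList<>())
            .add(m);
    }

    // Counts how many metrics objects are currently retained across all jobs.
    int retained() {
        int n = 0;
        for (Map<String, List<Metrics>> job : allJobMetrics.values()) {
            for (List<Metrics> stage : job.values()) {
                n += stage.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int tasks = 1000;

        // Per-task recording: one retained object for every finished task.
        StageMetricsSketch perTask = new StageMetricsSketch();
        for (int i = 0; i < tasks; i++) {
            perTask.record(0, 0, 0, new Metrics(10));
        }

        // Per-stage recording: a single aggregated object for the whole stage.
        StageMetricsSketch perStage = new StageMetricsSketch();
        perStage.record(0, 0, 0, new Metrics(10L * tasks));

        System.out.println(perTask.retained() + " vs " + perStage.retained());
    }
}
```

With a thousand tasks in one stage, the per-task variant keeps a thousand
objects alive while the per-stage variant keeps one, which is the saving the
proposed onStageCompleted() change is aiming for.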
bq. I removed JobLogger, do we need it? It seems that a property called 
'spark.eventLog.enabled' is the proper replacement for this class, should we 
use it instead? It looks like JobLogger became deprecated and was removed from 
Spark 2.
It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in
Spark 2 instead.
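
As a rough sketch of what enabling event logging could look like, the snippet
below only builds the relevant properties with plain java.util.Properties.
{{spark.eventLog.enabled}} and {{spark.eventLog.dir}} are existing Spark
settings; the class name EventLogConfigSketch, the helper method, and the
directory value are hypothetical, and how Pig would forward these properties
to Spark is an assumption that is not shown here.

```java
import java.util.Properties;

public class EventLogConfigSketch {
    // Builds the Spark properties that turn on event logging.
    static Properties eventLogProperties(String eventLogDir) {
        Properties props = new Properties();
        props.setProperty("spark.eventLog.enabled", "true");
        // Directory under which Spark writes event logs; the value is supplied by the caller.
        props.setProperty("spark.eventLog.dir", eventLogDir);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical location; use whatever directory the cluster provides.
        Properties props = eventLogProperties("hdfs:///tmp/spark-events");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```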


> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>         Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
