[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-22 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686194#comment-15686194
 ] 

Adam Szita commented on PIG-5052:
-

Thanks for the review and for committing, [~kellyzly] and [~xuefuz].
I'll mark this as resolved then.

> Initialize MRConfiguration.JOB_ID in spark mode correctly
> -
>
> Key: PIG-5052
> URL: https://issues.apache.org/jira/browse/PIG-5052
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5052.2.patch, PIG-5052.3-incrementalToPatch1.patch, 
> PIG-5052.3.patch, PIG-5052.patch
>
>
> Currently, we initialize MRConfiguration.JOB_ID in SparkUtil#newJobConf.
> We just set the value to a random string:
> {code}
> jobConf.set(MRConfiguration.JOB_ID, UUID.randomUUID().toString());
> {code}
> We need to find a Spark API to initialize it correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685739#comment-15685739
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~xuefuz]: Thanks for committing; please close this JIRA.
[~szita]: Thanks for the patch.



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685729#comment-15685729
 ] 

Xuefu Zhang commented on PIG-5052:
--

 PIG-5052.3-incrementalToPatch1.patch is committed. Shall we close this ticket?



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685542#comment-15685542
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~szita]: PIG-5052.3-incrementalToPatch1.patch looks good to me, +1.
[~xuefuz]: Please check in PIG-5052.3-incrementalToPatch1.patch. How to use it:
1. Check out the latest code (currently 815a0f2).
2. Apply the patch with patch -p1.


[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-21 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683043#comment-15683043
 ] 

Adam Szita commented on PIG-5052:
-

Attached new patches with adjusted code style:
[^PIG-5052.3.patch] - can be used if we revert the first patch submitted 
([^PIG-5052.patch])
[^PIG-5052.3-incrementalToPatch1.patch] - is the incremental diff to 
[^PIG-5052.patch] in case we don't want to revert 

My suggestion is to use [^PIG-5052.3.patch]



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682488#comment-15682488
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~kexianda], [~mohitsabharwal] and [~xuefuz]: Currently 
[jenkins|https://builds.apache.org/job/Pig-spark/lastUnsuccessfulBuild/] fails. 
After I applied the patch from [~szita], all unit tests pass in my local Jenkins.

> Initialize MRConfiguration.JOB_ID in spark mode correctly
> -
>
> Key: PIG-5052
> URL: https://issues.apache.org/jira/browse/PIG-5052
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5052.2.patch, PIG-5052.patch


[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-17 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676015#comment-15676015
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~szita]: +1. But please regenerate the patch so that it follows the latest code style.



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-07 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645119#comment-15645119
 ] 

Adam Szita commented on PIG-5052:
-

You can try the following:
{code}
./pig -x spark_local

A = LOAD '../test/org/apache/pig/test/data/passwd' using PigStorage();
dump A
dump A
{code}

The second dump will hang for me. The reason is that jobs 0 and 1 are returned 
(because of using the same job group id) in JobGraphBuilder#225:
{code}
sparkContext.statusTracker().getJobIdsForGroup(jobGroupID)
{code}

..but JobMetricsListener will only have job 1 here in finishedJobIds:
{code}
public synchronized boolean waitForJobToEnd(int jobId) throws InterruptedException {
    if (finishedJobIds.contains(jobId)) {
        finishedJobIds.remove(jobId);
        return true;
    }

    wait();
    return false;
}
{code}

So job 0 will never show up as finished after the second dump, yet we still wait for it.
On top of this, I think using a different job group ID for each job is the 
cleaner approach.
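
The hang described above can be reproduced outside Pig with a small simulation. This is only a minimal sketch, not the Pig code: FinishedJobTracker is a hypothetical stand-in for JobMetricsListener, the job IDs 0 and 1 are taken from the example above, and a timeout is added to the wait so that the demo terminates instead of hanging.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for JobMetricsListener; it only tracks finished job IDs.
class FinishedJobTracker {
    private final Set<Integer> finishedJobIds = new HashSet<>();

    synchronized void markFinished(int jobId) {
        finishedJobIds.add(jobId);
        notifyAll();
    }

    // Same shape as waitForJobToEnd, plus a timeout (must be > 0) so the demo ends.
    // InterruptedException is handled here to keep the callers short.
    synchronized boolean waitForJobToEnd(int jobId, long timeoutMillis) {
        if (finishedJobIds.remove(jobId)) {
            return true;
        }
        try {
            wait(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }
}

class SharedJobGroupHangDemo {
    public static void main(String[] args) {
        FinishedJobTracker tracker = new FinishedJobTracker();

        // First dump: job 0 finishes and the waiter consumes its ID.
        tracker.markFinished(0);
        tracker.waitForJobToEnd(0, 100);

        // Second dump: only job 1 is reported as finished.
        tracker.markFinished(1);

        // With a shared job group ID, the lookup returns both jobs.
        for (int jobId : Arrays.asList(0, 1)) {
            boolean done = tracker.waitForJobToEnd(jobId, 100);
            System.out.println("job " + jobId
                    + (done ? " finished" : " never reported (would hang)"));
        }
    }
}
```

In this sketch the loop reports job 0 as never finishing, because its ID was already consumed during the first dump; without the timeout the wait would block indefinitely.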



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-07 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644944#comment-15644944
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~szita]: Sorry for the late reply.
{quote}
This can be seen by just repeating the same Pig query (e.g. load, foreach, 
dump, dump): the second job will hang in SparkStatsUtil#waitForJobAddStats.
The reason is that JobGraphBuilder#getJobIDs will return all jobs associated with 
the same groupID, in the case above 0 and 1. Then it will wait for job 0 to 
finish, but that is no longer in sparkContext; it was the previous job.
{quote}

I don't quite understand. I ran the following script, which repeats the load, 
and it completed successfully.
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput1.txt' as (id,name);
store A into './duplicate.out.A';
store B into './duplicate.out.B';
explain A;
{code}

Can you give a script to show the failure you mentioned above?



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-04 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636461#comment-15636461
 ] 

Adam Szita commented on PIG-5052:
-

[~kellyzly] I found that it is indeed problematic to have the same ID set as 
jobGroupId across multiple Pig on Spark jobs, and this commit has actually 
introduced a bug because of this.
This can be seen by just repeating the same Pig query (e.g. load, foreach, 
dump, dump): the second job will hang in SparkStatsUtil#waitForJobAddStats.
The reason is that JobGraphBuilder#getJobIDs will return all jobs associated with 
the same groupID, in the case above 0 and 1. Then it will wait for job 0 to 
finish, but that is no longer in sparkContext; it was the previous job.

So I think we should do something like in [^PIG-5052.2.patch]: we can combine 
the appId provided by sparkContext with a random UUID.
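
The idea of combining the appId with a random UUID can be sketched roughly as follows. This is an illustration under assumptions, not the content of the attached patch: the helper name newJobGroupId, the "-" separator, and the sample application ID are all made up, and the appId is passed in as a plain string rather than read from a Spark context.

```java
import java.util.UUID;

class JobGroupIdSketch {
    // Hypothetical helper: appId would come from the Spark context in real code.
    // The random UUID suffix makes the ID differ for every job.
    static String newJobGroupId(String appId) {
        return appId + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String appId = "local-1478250000000"; // made-up application ID

        String first = newJobGroupId(appId);
        String second = newJobGroupId(appId);

        System.out.println(first);
        System.out.println(second);
        System.out.println("distinct: " + !first.equals(second));
    }
}
```

Keeping the appId as a prefix still ties each ID to its Spark application, while the suffix keeps a lookup by job group from returning jobs of an earlier query.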



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-03 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635000#comment-15635000
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~xuefuz]: thanks for your commit in . Please also commit PIG-5051.

> Initialize MRConfiguration.JOB_ID in spark mode correctly
> -
>
> Key: PIG-5052
> URL: https://issues.apache.org/jira/browse/PIG-5052
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-5052.patch


[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-02 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631391#comment-15631391
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~szita]: Thanks for the review.
{quote}
I think sparkContext.getConf().getAppId() will return the same value for the 
same spark context. That means that (since we're not creating a new spark 
context every time we run a job) more jobs will get the same ID. Would 
that still be fine for our use cases (e.g. org.apache.pig.builtin.RANDOM#exec)?
{quote}
Currently in Pig on Spark, in most cases one physical plan is converted to 
one Spark job, except in the multiquery case, for example:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = foreach A generate id,name,RANDOM();
C = foreach A generate name,n,RANDOM();
store B into './multiQ.1.out';
store C into './multiQ.2.out';
explain B;
{code}
{code}
Spark node scope-36
Split - scope-42
|   |
|   B: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQ.1.out:org.apache.pig.builtin.PigStorage) - scope-26
|   |
|   |---B: New For Each(false,false,false)[bag] - scope-25
|   |   |
|   |   Project[bytearray][0] - scope-20
|   |   |
|   |   Project[bytearray][1] - scope-22
|   |   |
|   |   POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-24
|   |
|   C: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQ.2.out:org.apache.pig.builtin.PigStorage) - scope-35
|   |
|   |---C: New For Each(false,false,false)[bag] - scope-34
|   |   |
|   |   Project[bytearray][1] - scope-29
|   |   |
|   |   Project[bytearray][2] - scope-31
|   |   |
|   |   POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-33
|
|---A: New For Each(false,false,false)[bag] - scope-16
|   |
|   Project[bytearray][0] - scope-10
|   |
|   Project[bytearray][1] - scope-12
|   |
|   Project[bytearray][2] - scope-14
|
|---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/SkewedJoinInput1.txt:org.apache.pig.builtin.PigStorage) - scope-9
{code}
This multiquery case generates two Spark jobs, but they share the same 
application ID. What you pointed out is a good catch, but I think it will 
*not* influence the output of RANDOM#exec: in the multiquery case the MR job ID 
corresponds more closely to the Spark application ID, because the script above 
generates only one MR job.





[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-02 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630611#comment-15630611
 ] 

Adam Szita commented on PIG-5052:
-

Just one remark: I think sparkContext.getConf().getAppId() will return the same 
value for the same spark context. That means that (since we're not creating a 
new spark context every time we run a job) more jobs will get the same ID. 
Would that still be fine for our use cases (e.g. 
org.apache.pig.builtin.RANDOM#exec)?
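
To illustrate why a shared ID could matter for a UDF like RANDOM, here is a minimal sketch. It assumes, purely for illustration, that the random generator is seeded from the job ID plus a per-task index; the actual seeding in org.apache.pig.builtin.RANDOM#exec should be checked in the source. The names seedFor and firstValue and the sample IDs are hypothetical.

```java
import java.util.Objects;
import java.util.Random;

class SeedFromJobIdSketch {
    // Assumption: the seed is derived from the job ID and a per-task index.
    static long seedFor(String jobId, int taskIndex) {
        return Objects.hash(jobId, taskIndex);
    }

    // First value a generator seeded this way would produce.
    static double firstValue(String jobId, int taskIndex) {
        return new Random(seedFor(jobId, taskIndex)).nextDouble();
    }

    public static void main(String[] args) {
        String sharedId = "app-1"; // made-up ID shared by two jobs

        // Two jobs with the same ID and task index get the same seed and value.
        System.out.println(firstValue(sharedId, 0) == firstValue(sharedId, 0));

        // A different job ID leads to a different seed.
        System.out.println(seedFor("app-1", 0) != seedFor("app-2", 0));
    }
}
```

Under this assumption, two jobs that share an ID would draw identical sequences for the same task index, which is the kind of effect the question above is asking about.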



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627744#comment-15627744
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~xuefuz]: please check in PIG-5052.patch



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-11-01 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627352#comment-15627352
 ] 

Xianda Ke commented on PIG-5052:


LGTM
+1 (non-binding)



[jira] [Commented] (PIG-5052) Initialize MRConfiguration.JOB_ID in spark mode correctly

2016-10-31 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624409#comment-15624409
 ] 

liyunzhang_intel commented on PIG-5052:
---

[~szita]: Thanks for your useful code.
[~kexianda]: Please help review PIG-5052.patch. Modification:
1. Get the jobGroupId of the Spark application through the Spark API and store the value in 
the job configuration. We use the value in org.apache.pig.builtin.RANDOM#exec.
