[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269241#comment-16269241
 ] 

Xuefu Zhang commented on PIG-5316:
--

[~nkollar], sorry for the late reply.
{quote}
Though MRConfiguration is not intended for public use in Pig, should Hive use 
MRConfiguration#TASK_ID instead of referring to the taskId as a string?
{quote}
Your concern is valid. However, Hive has a lot of legacy MR1 code (more than 
just mapred.task.id), and cleaning it up will probably take a lot of effort. 
Until that happens, yes, the risk will remain.

> Initialize mapred.task.id property for PoS jobs
> ---
>
> Key: PIG-5316
> URL: https://issues.apache.org/jira/browse/PIG-5316
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Adam Szita
>Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch, PIG-5316_2.patch
>
>
> Some downstream systems (e.g. HCatalog) may require the {{mapred.task.id}} 
> property to be present. It is currently not set when Pig on Spark (PoS) jobs 
> are started. Let's initialise it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268913#comment-16268913
 ] 

Adam Szita commented on PIG-5316:
-

Good catch Nandor, fix committed!



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268832#comment-16268832
 ] 

Nandor Kollar commented on PIG-5316:


It looks like we can't create a TaskAttemptID with the default constructor; we 
should use the constructor that takes parameters to avoid NPEs on Hadoop 2.x. 
Using HadoopShims#getNewTaskAttemptID should solve this. I attached 
PIG-5316_2.patch.



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-28 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268704#comment-16268704
 ] 

Adam Szita commented on PIG-5316:
-

Committed to trunk, thanks Nandor and Xuefu!



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-23 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263989#comment-16263989
 ] 

Adam Szita commented on PIG-5316:
-

[~nkollar] +1 on the patch. Unless [~xuefuz] objects, I'll commit it tomorrow.



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-21 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260931#comment-16260931
 ] 

Nandor Kollar commented on PIG-5316:


[~xuefuz] I attached a patch; however, I'm a bit worried that Hive relies on 
{{mapred.task.id}}. This is a "hidden" dependency: if the name of this property 
changes (it is a deprecated Hadoop property, so it is going to change), then 
Pig-HCat interoperability will break. Though MRConfiguration is not intended 
for public use in Pig, should Hive use MRConfiguration#TASK_ID instead of 
referring to the taskId as a string?



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259301#comment-16259301
 ] 

Xuefu Zhang commented on PIG-5316:
--

[~nkollar], while it might be just a placeholder, it's used to create scratch 
or staging directories. I think we should follow the customary format of a 
task id. You might want to check the Pig code where this is set. As another 
reference, you can see how Hive on Spark sets it in 
{{HivePairFlatMapFunction.java}}.
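
To illustrate what "the format of a task id" could look like, here is a minimal sketch in plain Java. It is not the code from the attached patch and does not use the Hadoop or Pig APIs. It assumes the usual Hadoop task attempt ID layout ({{attempt_<jtIdentifier>_<jobId>_<m|r>_<taskId>_<attemptNumber>}}, with the job id zero-padded to 4 digits and the task id to 6). The class name, the helper name and the sample values are hypothetical, and a {{java.util.Properties}} object stands in for the job configuration.

```java
import java.util.Locale;
import java.util.Properties;

public class TaskIdSketch {
    // Legacy Hadoop property name discussed in this issue.
    static final String TASK_ID_KEY = "mapred.task.id";

    /**
     * Builds a string shaped like a Hadoop task attempt ID.
     * Hypothetical helper: it only mimics the string layout.
     */
    static String taskAttemptId(String jtIdentifier, int jobId,
                                boolean isMap, int taskId, int attempt) {
        return String.format(Locale.ROOT, "attempt_%s_%04d_%s_%06d_%d",
                jtIdentifier, jobId, isMap ? "m" : "r", taskId, attempt);
    }

    public static void main(String[] args) {
        // Properties is a stand-in for the real job configuration object.
        Properties conf = new Properties();
        // Sample values are made up for illustration only.
        conf.setProperty(TASK_ID_KEY, taskAttemptId("201711280000", 1, true, 1, 0));
        // prints attempt_201711280000_0001_m_000001_0
        System.out.println(conf.getProperty(TASK_ID_KEY));
    }
}
```

A well-formed value like this matters because, as noted above, downstream code may derive scratch or staging directory names from it.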



[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs

2017-11-20 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259259#comment-16259259
 ] 

Nandor Kollar commented on PIG-5316:


[~xuefuz] what should the value of {{mapred.task.id}} be? Spark doesn't use 
this property, so is it enough to use {{new TaskAttemptID()}} as just a 
placeholder value?
