[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269241#comment-16269241 ]

Xuefu Zhang commented on PIG-5316:
----------------------------------

[~nkollar], sorry for the late reply.
{quote}
Though MRConfiguration is not intended for public use in Pig, should Hive use MRConfiguration#TASK_ID instead of referring to the taskId as a string?
{quote}
Your concern is valid. However, Hive has a lot of legacy MR1 code (more than just mapred.task.id), and cleaning it up would probably take a lot of effort. Until that happens, yes, the risk is going to be there.

> Initialize mapred.task.id property for PoS jobs
> -----------------------------------------------
>
>         Key: PIG-5316
>         URL: https://issues.apache.org/jira/browse/PIG-5316
>     Project: Pig
>  Issue Type: Improvement
>  Components: spark
>    Reporter: Adam Szita
>    Assignee: Nandor Kollar
>     Fix For: 0.18.0
>
> Attachments: PIG-5316_1.patch, PIG-5316_2.patch
>
>
> Some downstream systems may require the presence of {{mapred.task.id}}
> property (e.g. HCatalog). This is currently not set when Pig On Spark jobs
> are started. Let's initialise it.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268913#comment-16268913 ]

Adam Szita commented on PIG-5316:
---------------------------------

Good catch Nandor, fix committed!
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268832#comment-16268832 ]

Nandor Kollar commented on PIG-5316:
------------------------------------

Looks like we can't create a TaskAttemptID with the default constructor; we should use the one that takes parameters, to avoid NPEs on Hadoop 2.x. Using HadoopShims#getNewTaskAttemptID should solve this. I attached PIG-5316_2.patch.
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268704#comment-16268704 ]

Adam Szita commented on PIG-5316:
---------------------------------

Committed to trunk, thanks Nandor and Xuefu!
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263989#comment-16263989 ]

Adam Szita commented on PIG-5316:
---------------------------------

[~nkollar] +1 on the patch. Unless [~xuefuz] has objections, I'll commit tomorrow.
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260931#comment-16260931 ]

Nandor Kollar commented on PIG-5316:
------------------------------------

[~xuefuz] I attached a patch; however, I'm a bit worried that Hive relies on {{mapred.task.id}}. This is a "hidden" dependency: if the name of this property changes (it is a deprecated Hadoop property, so it is going to change), Pig-HCat interoperability will be broken. Though MRConfiguration is not intended for public use in Pig, should Hive use MRConfiguration#TASK_ID instead of referring to the taskId as a string?
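
To illustrate the "hidden" dependency concern raised in this comment, here is a minimal JDK-only sketch. It is not Pig or Hive code: the class and method names are hypothetical, a plain {{Map}} stands in for a Hadoop Configuration (so Hadoop's own handling of deprecated keys is not modelled), and {{mapreduce.task.attempt.id}} is assumed to be the newer name of the property.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pig or Hive.
final class HiddenDependencySketch {
    static final String OLD_KEY = "mapred.task.id";
    // Assumed newer name of the same property.
    static final String NEW_KEY = "mapreduce.task.attempt.id";

    // Producer side: stores the attempt id under whichever key it currently uses.
    static Map<String, String> produce(String key, String attemptId) {
        Map<String, String> conf = new HashMap<>();
        conf.put(key, attemptId);
        return conf;
    }

    // Consumer that hard-codes the property name as a string literal.
    static String readWithLiteral(Map<String, String> conf) {
        return conf.get("mapred.task.id");
    }

    // Consumer that looks the value up through a key shared with the producer.
    static String readWithSharedKey(Map<String, String> conf, String sharedKey) {
        return conf.get(sharedKey);
    }

    public static void main(String[] args) {
        // The producer has moved to the new key; the literal lookup no longer finds it.
        Map<String, String> conf = produce(NEW_KEY, "attempt_201711280000_0001_m_000000_0");
        System.out.println("literal lookup:    " + readWithLiteral(conf));
        System.out.println("shared-key lookup: " + readWithSharedKey(conf, NEW_KEY));
    }
}
```

In this sketch the literal lookup returns {{null}} once the producer switches keys, while the shared-key lookup still finds the value. Note that Java inlines {{static final String}} compile-time constants into the referencing class, so a consumer compiled against a shared constant would still need to be recompiled to pick up a changed value.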
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259301#comment-16259301 ]

Xuefu Zhang commented on PIG-5316:
----------------------------------

[~nkollar], while it might be just a placeholder, it's used to create scratch or staging directories. I think we should follow the customary format of a task id. You might want to check the Pig code where this is set. As another reference, you can find how Hive on Spark sets it in {{HivePairFlatMapFunction.java}}.
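
As a rough illustration of what a task id in the customary format looks like, here is a minimal JDK-only sketch. It assumes the string layout {{attempt_<jtIdentifier>_<4-digit job>_<m|r>_<6-digit task>_<attempt>}}, which is my understanding of how Hadoop renders a task attempt id. The class and method names are hypothetical, and {{java.util.Properties}} stands in for a Hadoop Configuration; real code would build the id through Hadoop's TaskAttemptID class rather than by formatting strings.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of Pig, Hive or Hadoop.
final class TaskIdSketch {
    static final String TASK_ID_KEY = "mapred.task.id";

    // Builds a string in the assumed layout: attempt_<jt>_<job>_<m|r>_<task>_<attempt>.
    static String formatAttemptId(String jtIdentifier, int jobId, boolean isMap,
                                  int taskId, int attempt) {
        return String.format(Locale.ROOT, "attempt_%s_%04d_%s_%06d_%d",
                jtIdentifier, jobId, isMap ? "m" : "r", taskId, attempt);
    }

    // Sets the property only if nothing has set it already.
    static void initTaskId(Properties conf, String attemptId) {
        if (conf.getProperty(TASK_ID_KEY) == null) {
            conf.setProperty(TASK_ID_KEY, attemptId);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        initTaskId(conf, formatAttemptId("201711280000", 1, true, 0, 0));
        // prints attempt_201711280000_0001_m_000000_0
        System.out.println(conf.getProperty(TASK_ID_KEY));
    }
}
```

A value of this shape can be used as a directory name component, which matters if a downstream system derives scratch or staging paths from it.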
[jira] [Commented] (PIG-5316) Initialize mapred.task.id property for PoS jobs
[ https://issues.apache.org/jira/browse/PIG-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259259#comment-16259259 ]

Nandor Kollar commented on PIG-5316:
------------------------------------

[~xuefuz] what should the value of {{mapred.task.id}} be? Spark doesn't use it. Is it enough to use {{new TaskAttemptID()}}, so that it would be just a placeholder value?