[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6251:
--
Fix Version/s: 2.8.0

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch, 
> MAPREDUCE-6251.8.patch
>
>
> The JobClient is used to get job status information for running and completed 
> jobs.  The final state and history of a job are communicated from the 
> application master to the job history server via a distributed file system: 
> the history is uploaded by the application master to the DFS and then 
> scanned/loaded by the job history server.  While HDFS has strong consistency 
> guarantees, not all Hadoop DFSs do.  When used with a distributed file 
> system that lacks this guarantee, there will be cases where the history 
> server does not see an uploaded file, resulting in the dreaded "no such job" 
> and a null value for the RunningJob in the client.
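
To make the failure mode above concrete, here is a minimal sketch of a store whose uploads become visible to readers only after a lag. Everything in it is an assumption for illustration: the class name, the lag value, and the file path are made up, and a logical clock is passed in so the example stays deterministic. It is not Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of a not-immediately-consistent file store (not Hadoop code). */
final class EventuallyConsistentStoreSketch {
    // Maps each uploaded path to the logical time at which readers start to see it.
    private final Map<String, Long> visibleAt = new HashMap<>();
    private final long lagMs;

    EventuallyConsistentStoreSketch(long lagMs) {
        this.lagMs = lagMs;
    }

    /** Records an upload made at time nowMs; readers see it only after the lag. */
    void upload(String path, long nowMs) {
        visibleAt.put(path, nowMs + lagMs);
    }

    /** Returns true if a reader scanning the store at time nowMs would see the path. */
    boolean isVisible(String path, long nowMs) {
        Long t = visibleAt.get(path);
        return t != null && nowMs >= t;
    }

    public static void main(String[] args) {
        EventuallyConsistentStoreSketch store = new EventuallyConsistentStoreSketch(500L);
        String path = "history/job_0001.jhist"; // hypothetical path

        // The application master uploads the history file at t=0.
        store.upload(path, 0L);

        // A history server scan at t=100 misses the file ("no such job").
        System.out.println("scan at t=100 sees file: " + store.isVisible(path, 100L));

        // A later scan at t=600 finds it.
        System.out.println("scan at t=600 sees file: " + store.isVisible(path, 600L));
    }
}
```

A client that asks only once, during the lag window, gets the "no such job" result even though the upload succeeded.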



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6251:
---
   Resolution: Fixed
Fix Version/s: 2.7.1
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and 2.7. Thanks Craig!

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch, 
> MAPREDUCE-6251.8.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-08 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.8.patch

Reattaching, as I don't see it getting built.

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch, 
> MAPREDUCE-6251.8.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-08 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Status: Open  (was: Patch Available)

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-08 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Status: Patch Available  (was: Open)

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-07 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.8.patch

...add line failed to stage for last patch.

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch, MAPREDUCE-6251.8.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-07 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.7.patch

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch, MAPREDUCE-6251.7.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.6.patch

With mapred-default.xml entries

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch, 
> MAPREDUCE-6251.6.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6251:

Labels: BB2015-05-TBR  (was: )

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-05 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Status: Patch Available  (was: Open)

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-05 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.4.patch

Updated with recommended move to MRJobConfig

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6251:
---
Target Version/s: 2.8.0
          Status: Open  (was: Patch Available)

Okay, reviewing again after your responses:
 - Configuration usually goes into MRJobConfig, unless you explicitly don't 
want it publicly visible to end users.
 - Once you move them to MRJobConfig, the naming convention also changes. Names 
there follow the MR_CLIENT_* and DEFAULT_* patterns.
 - Document them in mapred-default.xml? State when they are needed and how 
they should be used in contrast to the lower-level retries.
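
As a rough illustration of the naming convention mentioned above, the sketch below shows what key/default constant pairs in the MR_CLIENT_* and DEFAULT_* style might look like. The constant names, key strings, and default values are assumptions (this thread does not state the names that were committed), and a plain Map stands in for the Hadoop Configuration object so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical constants shaped like the MR_CLIENT_* / DEFAULT_* convention. */
final class ClientRetryConfigSketch {
    // Assumed key names and defaults, for illustration only.
    static final String MR_CLIENT_JOB_MAX_RETRIES = "example.client.job.max-retries";
    static final int DEFAULT_MR_CLIENT_JOB_MAX_RETRIES = 0;

    static final String MR_CLIENT_JOB_RETRY_INTERVAL = "example.client.job.retry-interval";
    static final long DEFAULT_MR_CLIENT_JOB_RETRY_INTERVAL = 2000L;

    /** Reads the retry count, falling back to the DEFAULT_* value when unset. */
    static int maxRetries(Map<String, String> conf) {
        String v = conf.get(MR_CLIENT_JOB_MAX_RETRIES);
        return v == null ? DEFAULT_MR_CLIENT_JOB_MAX_RETRIES : Integer.parseInt(v.trim());
    }

    /** Reads the retry interval in milliseconds, falling back to the DEFAULT_* value. */
    static long retryIntervalMs(Map<String, String> conf) {
        String v = conf.get(MR_CLIENT_JOB_RETRY_INTERVAL);
        return v == null ? DEFAULT_MR_CLIENT_JOB_RETRY_INTERVAL : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MR_CLIENT_JOB_MAX_RETRIES, "3"); // override one setting, leave the other unset
        System.out.println("max retries = " + maxRetries(conf));
        System.out.println("retry interval (ms) = " + retryIntervalMs(conf));
    }
}
```

Pairing every key with a DEFAULT_* constant keeps the fallback value in one place, which is also the value that a mapred-default.xml entry would document.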

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-04-27 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.3.patch

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-04-24 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.2.patch

The checkstyle result is not meaningful; see [HADOOP-11869].  Attached a fix 
for the trailing whitespace (also a questionable concern, but easily fixed).

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, 
> MAPREDUCE-6251.2.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-04-24 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.1.patch

added unit test

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-02-10 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Affects Version/s: 2.6.0  (was: 2.2.0)

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-02-10 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Component/s: mrv2

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.2.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-02-10 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Attachment: MAPREDUCE-6251.0.patch

Attached is a patch which places the retry where it is effective in capturing 
the state, and which provides a configurable retry count/interval that should 
address this issue for most reasonable "eventual consistency" timeframes.  
Without changing the overall handoff mechanism so that it is no longer based 
on the DFS, this is the best type of fix I believe we can achieve.  Moving to 
synchronous calls to report history to the job history server is another 
option we should consider, but that is a more significant change that we will 
want to look at down the road.  In the meantime, this should work around the 
issue in most cases.
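
For readers without the patch at hand, a minimal sketch of a count/interval retry around a lookup that may return null follows. It is not the code from the attached patch: the class and method names are hypothetical, and a Supplier stands in for the client's job lookup against the history server.

```java
import java.util.function.Supplier;

/** Illustrative retry helper; not the code from the attached patch. */
final class JobLookupRetrySketch {

    /**
     * Calls lookup until it returns non-null or the retries are exhausted.
     * maxRetries = 0 means a single attempt with no retry.
     */
    static <T> T getWithRetries(Supplier<T> lookup, int maxRetries, long retryIntervalMs) {
        T result = lookup.get();
        for (int attempt = 0; result == null && attempt < maxRetries; attempt++) {
            try {
                Thread.sleep(retryIntervalMs); // give the file system time to converge
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                break;
            }
            result = lookup.get();
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated history server: the job becomes visible on the third lookup.
        int[] calls = {0};
        Supplier<String> lookup = () -> ++calls[0] >= 3 ? "job_0001: SUCCEEDED" : null;

        String status = getWithRetries(lookup, 5, 10L);
        System.out.println(status + " after " + calls[0] + " lookups");
    }
}
```

With a retry count of 0 the helper behaves like the original single lookup, so callers on a strongly consistent file system such as HDFS pay no extra cost.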

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.2.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch


[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases

2015-02-10 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated MAPREDUCE-6251:
---
Status: Patch Available  (was: Open)

> JobClient needs additional retries at a higher level to address 
> not-immediately-consistent dfs corner cases
> ---
>
> Key: MAPREDUCE-6251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mrv2
> Affects Versions: 2.2.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6251.0.patch