[jira] [Updated] (TEZ-2839) Tez UI: Use another kind of bar to represent dag killed/failed

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2839:

Attachment: 2015-09-17_1359.png

> Tez UI: Use another kind of bar to represent dag killed/failed
> --
>
> Key: TEZ-2839
> URL: https://issues.apache.org/jira/browse/TEZ-2839
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1359.png
>
>
> Currently tez-ui uses a blue animated bar to indicate the progress of a dag.  
> It would be better to use another kind of bar (a red one, without animation?) 
> when the dag has failed or been killed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-2839) Tez UI: Use another kind of bar to represent dag killed/failed

2015-09-16 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-2839:
---

 Summary: Tez UI: Use another kind of bar to represent dag 
killed/failed
 Key: TEZ-2839
 URL: https://issues.apache.org/jira/browse/TEZ-2839
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jeff Zhang
Priority: Minor


Currently tez-ui uses a blue animated bar to indicate the progress of a dag.  It 
would be better to use another kind of bar (a red one, without animation?) when 
the dag has failed or been killed. 





[jira] [Updated] (TEZ-2838) Tez UI: Finished Time is not updated in real-time

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2838:

Summary: Tez UI: Finished Time is not updated in real-time  (was: Tez UI: 
Finish Time and Duration is not available on DAG Details)

> Tez UI: Finished Time is not updated in real-time
> -
>
> Key: TEZ-2838
> URL: https://issues.apache.org/jira/browse/TEZ-2838
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1338.png
>
>
> I have to refresh the page to see the finished time and duration. 





[jira] [Updated] (TEZ-2838) Tez UI: Finished Time is not updated in real-time

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2838:

Description: 
I have to refresh the page to see the finished time and duration. 
Same for DAG/Vertex/Task/TaskAttempt

  was:I have to refresh the page to see the finished time and duration. 


> Tez UI: Finished Time is not updated in real-time
> -
>
> Key: TEZ-2838
> URL: https://issues.apache.org/jira/browse/TEZ-2838
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1338.png
>
>
> I have to refresh the page to see the finished time and duration. 
> Same for DAG/Vertex/Task/TaskAttempt





[jira] [Updated] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2838:

Priority: Minor  (was: Major)

> Tez UI: Finish Time and Duration is not available on DAG Details
> 
>
> Key: TEZ-2838
> URL: https://issues.apache.org/jira/browse/TEZ-2838
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1338.png
>
>
> I have to refresh the page to see the finished time and duration. 





[jira] [Created] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details

2015-09-16 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-2838:
---

 Summary: Tez UI: Finish Time and Duration is not available on DAG 
Details
 Key: TEZ-2838
 URL: https://issues.apache.org/jira/browse/TEZ-2838
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.8.1
Reporter: Jeff Zhang
 Attachments: 2015-09-17_1338.png

I have to refresh the page to see the finished time and duration. 





[jira] [Updated] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2838:

Attachment: 2015-09-17_1338.png

> Tez UI: Finish Time and Duration is not available on DAG Details
> 
>
> Key: TEZ-2838
> URL: https://issues.apache.org/jira/browse/TEZ-2838
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
> Attachments: 2015-09-17_1338.png
>
>
> I have to refresh the page to see the finished time and duration. 





[jira] [Updated] (TEZ-2837) TEZ UI: First Task Start Time is not available

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2837:

Issue Type: Sub-task  (was: Improvement)
Parent: TEZ-2760

> TEZ UI: First Task Start Time is not available
> --
>
> Key: TEZ-2837
> URL: https://issues.apache.org/jira/browse/TEZ-2837
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1326.png
>
>






[jira] [Updated] (TEZ-2837) TEZ UI: First Task Start Time is not available

2015-09-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2837:

Attachment: 2015-09-17_1326.png

> TEZ UI: First Task Start Time is not available
> --
>
> Key: TEZ-2837
> URL: https://issues.apache.org/jira/browse/TEZ-2837
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.8.1
>Reporter: Jeff Zhang
>Priority: Minor
> Attachments: 2015-09-17_1326.png
>
>






[jira] [Created] (TEZ-2837) TEZ UI: First Task Start Time is not available

2015-09-16 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-2837:
---

 Summary: TEZ UI: First Task Start Time is not available
 Key: TEZ-2837
 URL: https://issues.apache.org/jira/browse/TEZ-2837
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.8.1
Reporter: Jeff Zhang
Priority: Minor








[jira] [Commented] (TEZ-814) Improve heuristic for determining a task has failed outputs

2015-09-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791544#comment-14791544
 ] 

TezQA commented on TEZ-814:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12756404/TEZ-814.1.patch
  against master revision 1a065b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1146//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1146//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//console

This message is automatically generated.

> Improve heuristic for determining a task has failed outputs
> ---
>
> Key: TEZ-814
> URL: https://issues.apache.org/jira/browse/TEZ-814
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.7.1
>
> Attachments: TEZ-814.1.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports, e.g. when this is the last consumer and the 
> source is lost, or when some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.





Failed: TEZ-814 PreCommit Build #1146

2015-09-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-814
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1146/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3456 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12756404/TEZ-814.1.patch
  against master revision 1a065b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1146//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1146//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
fe9df782fb7defa2d684dffef4d0e3e5d14ffe91 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1144
Archived 53 artifacts
Archive block size is 32768
Received 6 blocks and 3102498 bytes
Compression is 6.0%
Took 0.88 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-814) Improve heuristic for determining a task has failed outputs

2015-09-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-814:
---
Fix Version/s: 0.7.1

> Improve heuristic for determining a task has failed outputs
> ---
>
> Key: TEZ-814
> URL: https://issues.apache.org/jira/browse/TEZ-814
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.7.1
>
> Attachments: TEZ-814.1.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports, e.g. when this is the last consumer and the 
> source is lost, or when some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.





[jira] [Commented] (TEZ-814) Improve heuristic for determining a task has failed outputs

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791456#comment-14791456
 ] 

Bikas Saha commented on TEZ-814:


Heuristics are mainly designed to prevent an inadvertent flurry of re-runs due to 
intermittent network issues. So we have the fraction and unique-failures-reported 
heuristics to verify that multiple readers are reporting the same failure.

Regardless of these current and future heuristics, we need to ensure that jobs do 
not hang indefinitely due to non-convergent heuristics. So this patch adds a 
time-based deadline. If a consumer attempt reports a read error for a timespan 
exceeding a threshold (default 300s), then the producer attempt will be re-run.

[~rajesh.balamohan] [~hitesh] Please review
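
For readers following the thread, the combination of a fraction check with a time-based deadline might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in TEZ-814.1.patch: the class OutputFailureHeuristic, the method onReadError, and the constructor parameters are hypothetical names. The 25% fraction and the 300s threshold are the values mentioned in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a fraction check plus a time-based deadline. */
final class OutputFailureHeuristic {
  private final int totalConsumers;
  private final double failureFraction;   // assumed example: 0.25
  private final long maxReadErrorTimeMs;  // assumed example: 300_000 (300s)
  // First time (ms) at which each consumer reported a read error.
  private final Map<Integer, Long> firstErrorMs = new HashMap<>();

  OutputFailureHeuristic(int totalConsumers, double failureFraction,
                         long maxReadErrorTimeMs) {
    this.totalConsumers = totalConsumers;
    this.failureFraction = failureFraction;
    this.maxReadErrorTimeMs = maxReadErrorTimeMs;
  }

  /** Records a read error; returns true if the producer should be re-run. */
  boolean onReadError(int consumerId, long nowMs) {
    firstErrorMs.putIfAbsent(consumerId, nowMs);
    // Existing style of check: enough distinct consumers report the failure.
    boolean enoughReporters =
        firstErrorMs.size() >= failureFraction * totalConsumers;
    // Deadline: this consumer has been reporting errors for too long.
    boolean deadlineExceeded =
        nowMs - firstErrorMs.get(consumerId) > maxReadErrorTimeMs;
    return enoughReporters || deadlineExceeded;
  }
}
```

With 100 consumers, a single consumer never reaches the 25% fraction on its own, so in this sketch only the deadline lets its repeated reports trigger a re-run.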

> Improve heuristic for determining a task has failed outputs
> ---
>
> Key: TEZ-814
> URL: https://issues.apache.org/jira/browse/TEZ-814
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
> Attachments: TEZ-814.1.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports, e.g. when this is the last consumer and the 
> source is lost, or when some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.





[jira] [Assigned] (TEZ-814) Improve heuristic for determining a task has failed outputs

2015-09-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha reassigned TEZ-814:
--

Assignee: Bikas Saha

> Improve heuristic for determining a task has failed outputs
> ---
>
> Key: TEZ-814
> URL: https://issues.apache.org/jira/browse/TEZ-814
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-814.1.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports, e.g. when this is the last consumer and the 
> source is lost, or when some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.





[jira] [Updated] (TEZ-814) Improve heuristic for determining a task has failed outputs

2015-09-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-814:
---
Attachment: TEZ-814.1.patch

> Improve heuristic for determining a task has failed outputs
> ---
>
> Key: TEZ-814
> URL: https://issues.apache.org/jira/browse/TEZ-814
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
> Attachments: TEZ-814.1.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports, e.g. when this is the last consumer and the 
> source is lost, or when some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.





[jira] [Commented] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads

2015-09-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791435#comment-14791435
 ] 

TezQA commented on TEZ-2836:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12756368/TEZ-2836.1.txt
  against master revision 1a065b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1145//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1145//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//console

This message is automatically generated.

> Avoid setting framework/system counters for tasks running in threads
> 
>
> Key: TEZ-2836
> URL: https://issues.apache.org/jira/browse/TEZ-2836
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2836.1.txt
>
>
> Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed 
> incorrectly in case of LocalMode, Uber, TestService and others where tasks 
> may execute in threads. (The values end up being a combination of what's 
> running in the process - which could be other tasks or the AM).
> It's better not to set them for now, instead of reporting incorrect values.





Failed: TEZ-2836 PreCommit Build #1145

2015-09-16 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2836
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1145/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3463 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12756368/TEZ-2836.1.txt
  against master revision 1a065b9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1145//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1145//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
b62c2cda1c24e7403d040eca8a7c84c190e368f5 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1144
Archived 53 artifacts
Archive block size is 32768
Received 10 blocks and 2937915 bytes
Compression is 10.0%
Took 4.2 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Resolved] (TEZ-2830) Backport TEZ-2774 to branch-0.7

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-2830.
-
   Resolution: Fixed
Fix Version/s: 0.7.1

> Backport TEZ-2774 to branch-0.7
> ---
>
> Key: TEZ-2830
> URL: https://issues.apache.org/jira/browse/TEZ-2830
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.7.1
>
> Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt
>
>






[jira] [Updated] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2836:

Attachment: TEZ-2836.1.txt

[~rajesh.balamohan], [~hitesh] - please review. This disables the final 
updateCounters for local and uber mode, and in the test service.

> Avoid setting framework/system counters for tasks running in threads
> 
>
> Key: TEZ-2836
> URL: https://issues.apache.org/jira/browse/TEZ-2836
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2836.1.txt
>
>
> Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed 
> incorrectly in case of LocalMode, Uber, TestService and others where tasks 
> may execute in threads. (The values end up being a combination of what's 
> running in the process - which could be other tasks or the AM).
> It's better not to set them for now, instead of reporting incorrect values.





[jira] [Created] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads

2015-09-16 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-2836:
---

 Summary: Avoid setting framework/system counters for tasks running 
in threads
 Key: TEZ-2836
 URL: https://issues.apache.org/jira/browse/TEZ-2836
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed 
incorrectly in case of LocalMode, Uber, TestService and others where tasks may 
execute in threads. (The values end up being a combination of what's running in 
the process - which could be other tasks or the AM).
It's better not to set them for now, instead of reporting incorrect values.
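
To illustrate why such values mix together when tasks share a process, here is a small sketch. It assumes GC time is read from the JVM's GarbageCollectorMXBeans; whether Tez computes GC_TIME exactly this way is not stated in this issue. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical illustration: JVM-level GC time is per process, not per task. */
final class ProcessWideCounters {
  /** Sums the collection time reported by every collector in this JVM. */
  static long totalGcTimeMs() {
    long total = 0;
    for (GarbageCollectorMXBean gc
        : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime(); // -1 when the collector reports no time
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }
}
```

Any task thread calling totalGcTimeMs() reads the same JVM-wide total, so a delta taken around one task also includes collections caused by other tasks or the AM running in the same process.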





[jira] [Commented] (TEZ-2830) Backport TEZ-2774 to branch-0.7

2015-09-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791209#comment-14791209
 ] 

Siddharth Seth commented on TEZ-2830:
-

That's not relevant to branch-0.7, only for threaded execution of tasks. Thanks 
for taking a look. Committing.

> Backport TEZ-2774 to branch-0.7
> ---
>
> Key: TEZ-2830
> URL: https://issues.apache.org/jira/browse/TEZ-2830
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt
>
>






[jira] [Created] (TEZ-2835) [Timeline ACLs] Session-level entities should not be tied to the dag's domain

2015-09-16 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2835:


 Summary: [Timeline ACLs] Session-level entities should not be tied 
to the dag's domain
 Key: TEZ-2835
 URL: https://issues.apache.org/jira/browse/TEZ-2835
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


Container events may be created either at session start or in a different dag. 
Updates to the container entities, if done in a different dag, will have ACL 
issues if a common domain ACL for Timeline is not used. 





[jira] [Updated] (TEZ-2826) save the status for completed dags in a session

2015-09-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2826:
-
Description: 
Currently we store the list of completed dags. If we also store the state of 
each dag, it would help tez-ui, which queries a completed dag, to show the 
up-to-date status for completed dags in a session.
\cc [~zjffdu]

  was:
Currently we store the list of completed dags. If we also store the state of 
each dag, it would help tez-ui, which queries a completed dag, to show the 
up-to-date status for completed dags in a session.
\cc [~jzhang]


> save the status for completed dags in a session
> ---
>
> Key: TEZ-2826
> URL: https://issues.apache.org/jira/browse/TEZ-2826
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Hitesh Shah
>
> Currently we store the list of completed dags. If we also store the state of 
> each dag, it would help tez-ui, which queries a completed dag, to show the 
> up-to-date status for completed dags in a session.
> \cc [~zjffdu]





[jira] [Commented] (TEZ-2830) Backport TEZ-2774 to branch-0.7

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791099#comment-14791099
 ] 

Bikas Saha commented on TEZ-2830:
-

LGTM. Found one missing item; perhaps it's not relevant to 0.7.
{code}
-LOG.debug("ThreadId : " + id + ", name=" + threadInfo.getThreadName());
+if (LOG.isDebugEnabled()) {
+  LOG.debug("ThreadId : " + id + ", name=" + threadInfo.getThreadName());
+}{code}

> Backport TEZ-2774 to branch-0.7
> ---
>
> Key: TEZ-2830
> URL: https://issues.apache.org/jira/browse/TEZ-2830
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt
>
>






[jira] [Updated] (TEZ-2830) Backport TEZ-2774 to branch-0.7

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2830:

Attachment: TEZ-2830.2.txt

Updated with the addendum to 2774

> Backport TEZ-2774 to branch-0.7
> ---
>
> Key: TEZ-2830
> URL: https://issues.apache.org/jira/browse/TEZ-2830
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt
>
>






[jira] [Commented] (TEZ-2774) Reduce logging in the AM, and parts of the runtime

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791022#comment-14791022
 ] 

Bikas Saha commented on TEZ-2774:
-

Thanks!
commit 1a065b9d87d84645363d0c65ae021a6a514169a8
Author: Bikas Saha 
Date:   Wed Sep 16 12:50:38 2015 -0700

TEZ-2774. addendum to add a preemption periodic log


> Reduce logging in the AM, and parts of the runtime
> --
>
> Key: TEZ-2774
> URL: https://issues.apache.org/jira/browse/TEZ-2774
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.8.1
>
> Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, 
> TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch
>
>






[jira] [Commented] (TEZ-2774) Reduce logging in the AM, and parts of the runtime

2015-09-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790974#comment-14790974
 ] 

Siddharth Seth commented on TEZ-2774:
-

Looks fine.

> Reduce logging in the AM, and parts of the runtime
> --
>
> Key: TEZ-2774
> URL: https://issues.apache.org/jira/browse/TEZ-2774
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.8.1
>
> Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, 
> TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch
>
>






[jira] [Updated] (TEZ-2774) Reduce logging in the AM, and parts of the runtime

2015-09-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2774:

Attachment: TEZ-2774.addendum.patch

Attaching an addendum patch that periodically logs in preemption-related code. 
The log wasn't removed in this jira, but TEZ-2834 showed that the absence of this 
log is bad. Adding a periodicity to that logging would help.
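
As a rough illustration of what periodic logging can look like, here is a minimal sketch. It is not taken from TEZ-2774.addendum.patch; the class PeriodicLogGate, its method shouldLog, and the interval value are hypothetical.

```java
/** Hypothetical helper that lets a hot code path log at most once per interval. */
final class PeriodicLogGate {
  private final long intervalMs;
  private long lastLogMs;
  private boolean loggedOnce = false;

  PeriodicLogGate(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Returns true if the caller should emit its log line now. */
  synchronized boolean shouldLog(long nowMs) {
    if (!loggedOnce || nowMs - lastLogMs >= intervalMs) {
      loggedOnce = true;
      lastLogMs = nowMs;
      return true;
    }
    return false;
  }
}
```

A caller on a frequently executed path would check something like gate.shouldLog(System.currentTimeMillis()) before logging, so the message still appears regularly without flooding the AM log.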

> Reduce logging in the AM, and parts of the runtime
> --
>
> Key: TEZ-2774
> URL: https://issues.apache.org/jira/browse/TEZ-2774
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.8.1
>
> Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, 
> TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch
>
>






[jira] [Comment Edited] (TEZ-2774) Reduce logging in the AM, and parts of the runtime

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790965#comment-14790965
 ] 

Bikas Saha edited comment on TEZ-2774 at 9/16/15 7:09 PM:
--

Attaching an addendum patch that periodically logs in preemption-related code. 
The log wasn't removed in this jira, but TEZ-2834 showed that the absence of this 
log is bad. Adding a periodicity to that logging would help. [~sseth] Could you 
take a quick look at the addendum?


was (Author: bikassaha):
Attaching an addendum patch that periodically logs in preemption-related code. 
The log wasn't removed in this jira, but TEZ-2834 showed that the absence of this 
log is bad. Adding a periodicity to that logging would help.

> Reduce logging in the AM, and parts of the runtime
> --
>
> Key: TEZ-2774
> URL: https://issues.apache.org/jira/browse/TEZ-2774
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.8.1
>
> Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, 
> TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch
>
>






[jira] [Updated] (TEZ-2830) Backport TEZ-2774 to branch-0.7

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2830:

Attachment: TEZ-2830.1.txt

[~bikassaha] - could you please scan through the backport for sanity.

> Backport TEZ-2774 to branch-0.7
> ---
>
> Key: TEZ-2830
> URL: https://issues.apache.org/jira/browse/TEZ-2830
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2830.1.txt
>
>






[jira] [Assigned] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha reassigned TEZ-2834:
---

Assignee: Bikas Saha

> tez app hangs at large scale (~30TB)
> 
>
> Key: TEZ-2834
> URL: https://issues.apache.org/jira/browse/TEZ-2834
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Rajesh Balamohan
>Assignee: Bikas Saha
> Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, 
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
> slightly late.  But before "Reducer 9" can get scheduled, slots are taken up 
> by "Map 1", which is not preempted for running "Reducer 9".
> This is with 0.7.1 codebase.





[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2834:

Assignee: Bikas Saha  (was: Siddharth Seth)






[jira] [Assigned] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-2834:
---

Assignee: Siddharth Seth

> tez app hangs at large scale (~30TB)
> 
>
> Key: TEZ-2834
> URL: https://issues.apache.org/jira/browse/TEZ-2834
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Rajesh Balamohan
>Assignee: Siddharth Seth
> Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, 
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
> slightly late.  But before "Reducer 9" can get scheduled, slots are taken up 
> by "Map 1", which is not preempted for running "Reducer 9".
> This is with 0.7.1 codebase.





[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2834:

Assignee: (was: Bikas Saha)

> tez app hangs at large scale (~30TB)
> 
>
> Key: TEZ-2834
> URL: https://issues.apache.org/jira/browse/TEZ-2834
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Rajesh Balamohan
> Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, 
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
> slightly late.  But before "Reducer 9" can get scheduled, slots are taken up 
> by "Map 1", which is not preempted for running "Reducer 9".
> This is with 0.7.1 codebase.





[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790823#comment-14790823
 ] 

Bikas Saha commented on TEZ-2834:
-

Was the cluster fully occupied when this was happening? My speculation is that 
the headroom reported by the RM was enough to cover this one task, so we were 
not preempting anything, yet we were also not getting containers allocated to us.
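
The speculated condition can be written down as a simplified model. This is 
only a sketch of the reasoning above, not the Tez scheduler code: the function 
names and the megabyte values in the usage note are hypothetical, and it 
assumes preemption is triggered only when a pending request does not fit in 
the headroom reported by the RM.

```python
def should_preempt(reported_headroom_mb, pending_request_mb):
    """Assumed rule: preempt only when the request exceeds the headroom."""
    return pending_request_mb > reported_headroom_mb


def is_stuck(reported_headroom_mb, actually_free_mb, pending_request_mb):
    """True when the task neither triggers preemption nor gets a container."""
    no_preemption = not should_preempt(reported_headroom_mb, pending_request_mb)
    no_allocation = pending_request_mb > actually_free_mb
    return no_preemption and no_allocation
```

For example, `is_stuck(4096, 0, 2048)` is true: the reported headroom covers 
the task, so nothing is preempted, but no capacity is actually free, so 
nothing is allocated either.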






[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790819#comment-14790819
 ] 

Bikas Saha commented on TEZ-2834:
-

The preemption code logs are all at debug level, so this issue cannot be 
debugged with the attached logs.






[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790783#comment-14790783
 ] 

Gopal V commented on TEZ-2834:
--

[~bikassaha]: YARN-4149? That was fixed last night; it's not deployed on the 
cluster yet.






[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790778#comment-14790778
 ] 

Bikas Saha commented on TEZ-2834:
-

If this cluster has the latest YARN, then the AM logs can be downloaded 
separately using the new enhancements to the yarn logs command.






[jira] [Commented] (TEZ-2833) Dont create extra directory during ATS file download

2015-09-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790762#comment-14790762
 ] 

Bikas Saha commented on TEZ-2833:
-

Couldn't understand the scenario :) The file names are already different. So 
not sure how having the extra folder helps.

> Dont create extra directory during ATS file download
> 
>
> Key: TEZ-2833
> URL: https://issues.apache.org/jira/browse/TEZ-2833
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Rajesh Balamohan
>
> The file name already has the dag id as a unique identifier. Placing it 
> inside another directory with the dag id seems unnecessary and can throw off 
> a user expecting the zip file in the user specified download dir.
> /cc [~rajesh.balamohan]





[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-09-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2732:
-
Affects Version/s: 0.5.0
   0.6.0
   0.7.0
   0.8.0-alpha

> DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
> ---
>
> Key: TEZ-2732
> URL: https://issues.apache.org/jira/browse/TEZ-2732
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 0.7.1
>
> Attachments: TEZ-2732.1.patch, TEZ-2732.branch-0.6-and-0.5.patch, 
> TEZ-2732.branch-0.7.patch
>
>
> {noformat}
>   kvbuffer.length = 2146435072 (2047 MB)
>   Corner case: bufIndex=2026133899, kvbidx=523629312.
>   distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485
>   newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would 
> overflow)
> {noformat}
> It would be good to restrict the max allowed sort buffer to 1800 MB instead 
> of 2047 MB. 
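
The arithmetic in the corner case above can be checked with a short sketch. 
It emulates Java's 32-bit `int` wraparound in Python; the helper `to_java_int` 
and the variable names are hypothetical, and because the first argument of 
`max(..)` is elided in the report, only the `min(...)` term is used as a lower 
bound on the amount added to `bufIndex`.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def to_java_int(value):
    """Wrap an integer the way Java's 32-bit int arithmetic would."""
    return (value + 2**31) % 2**32 - 2**31


kvbuffer_length = 2047 * 1024 * 1024   # 2146435072 bytes (2047 MB)
buf_index = 2026133899
kvbidx = 523629312

# distkvi = mod - i + j, with the values from the report
distkvi = kvbuffer_length - buf_index + kvbidx

# Lower bound on the step: max(.., x) is at least x
step = min(distkvi // 2, 271128624)

exact_sum = buf_index + step           # what the sum should be
wrapped_sum = to_java_int(exact_sum)   # what a Java int would hold

print(distkvi, step, exact_sum, wrapped_sum)
```

The exact sum is 2297262523, which is larger than `Integer.MAX_VALUE` 
(2147483647), so an `int` would wrap to -1997704773, a negative value that 
cannot be used as a buffer position.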





[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers

2015-09-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2732:
-
Affects Version/s: (was: 0.8.0-alpha)






[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2834:
--
Description: 
Will attach the DAG.

Repro for reference: TPC-DS q_70 @ 30 TB scale.

"Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
slightly late.  But before "Reducer 9" can get scheduled, slots are taken up by 
"Map 1", which is not preempted for running "Reducer 9".

This is with 0.7.1 codebase.

  was:
Will attach the DAG.

Repro for reference: TPC-DS q_70 @ 30 TB scale.

"Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
slightly late.  But before "Reducer 9" can get scheduled, slots are taken up by 
"Map 1", which is not preempted for running "Reducer 9".







[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2834:
--
Attachment: application_1442254312093_0095.1.log.gz
application_1442254312093_0095.2.log.gz
DAG_view.png
hive_view.png

Attaching the DAG view, hive_view, and app logs for reference. The app logs 
have been split into two files and uploaded separately, as they are huge.

{noformat}
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: 
Creating 2 tasks for vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: 
Directly initializing vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
...
2015-09-15 09:43:25,493 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: 
attempt_1442254312093_0095_1_05_00_0 TaskAttempt Transitioned from NEW to 
START_WAIT due to event TA_SCHEDULE
2015-09-15 09:43:25,493 INFO [TaskSchedulerEventHandlerThread] 
rm.YarnTaskSchedulerService: Allocation request for task: 
attempt_1442254312093_0095_1_05_00_0 with request: Capability[]Priority[11] host: null rack: null
{noformat}

Reducer 9 does not transition any further after "NEW to START_WAIT due to event 
TA_SCHEDULE".

> tez app hangs at large scale (~30TB)
> 
>
> Key: TEZ-2834
> URL: https://issues.apache.org/jira/browse/TEZ-2834
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Rajesh Balamohan
> Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, 
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
> slightly late.  But before "Reducer 9" can get scheduled, slots are taken up 
> by "Map 1", which is not preempted for running "Reducer 9".





[jira] [Created] (TEZ-2834) tez app hangs at large scale (~30TB)

2015-09-16 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-2834:
-

 Summary: tez app hangs at large scale (~30TB)
 Key: TEZ-2834
 URL: https://issues.apache.org/jira/browse/TEZ-2834
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Rajesh Balamohan


Will attach the DAG.

Repro for reference: TPC-DS q_70 @ 30 TB scale.

"Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched 
slightly late.  But before "Reducer 9" can get scheduled, slots are taken up by 
"Map 1", which is not preempted for running "Reducer 9".


