[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314343#comment-16314343
 ] 

TezQA commented on TEZ-3880:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12904899/TEZ-3880.01.patch
  against master revision d777f45.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.tests.TestExternalTezServices

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2709//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//console

This message is automatically generated.

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.01.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: TEZ-3880 PreCommit Build #2709

2018-01-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3880
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2709/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 331.20 KB...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-ext-service-tests
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12904899/TEZ-3880.01.patch
  against master revision d777f45.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.tests.TestExternalTezServices

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2709//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
b5764015be033d03560b175609758b8f39a02f94 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[Fast Archiver] Compressed 3.51 MB of artifacts by 30.3% relative to #2708
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.tez.tests.TestExternalTezServices.testErrorPropagation

Error Message:
expected:<1> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.tez.tests.TestExternalTezServices.runExceptionSimulation(TestExternalTezServices.java:203)
at 
org.apache.tez.tests.TestExternalTezServices.testErrorPropagation(TestExternalTezServices.java:187)

[jira] [Updated] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3880:
--
Attachment: TEZ-3880.01.patch

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.01.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Updated] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3880:
--
Attachment: (was: TEZ-3880.01.patch)

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.01.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Updated] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3880:
--
Attachment: TEZ-3880.01.patch

Removed the TODOs, and added a test

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.01.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Commented] (TEZ-3877) Delete unordered spill files once merge is done

2018-01-05 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314186#comment-16314186
 ] 

Rohini Palaniswamy commented on TEZ-3877:
-

+1

> Delete unordered spill files once merge is done
> ---
>
> Key: TEZ-3877
> URL: https://issues.apache.org/jira/browse/TEZ-3877
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Jason Lowe
> Attachments: TEZ-3877.001.patch
>
>
>   I see that spill files are not deleted right after merge completes. We 
> should do that as it takes up a lot of space and we can't afford that wastage 
> when Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me 
> they are only cleaned up after application completes as they are written in 
> app directory and not container directory. That also has to be done so that 
> they are cleaned up by node manager during task failures or container crashes.





[jira] [Commented] (TEZ-3877) Delete unordered spill files once merge is done

2018-01-05 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314116#comment-16314116
 ] 

TezQA commented on TEZ-3877:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12904866/TEZ-3877.001.patch
  against master revision d777f45.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2708//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2708//console

This message is automatically generated.

> Delete unordered spill files once merge is done
> ---
>
> Key: TEZ-3877
> URL: https://issues.apache.org/jira/browse/TEZ-3877
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Jason Lowe
> Attachments: TEZ-3877.001.patch
>
>
>   I see that spill files are not deleted right after merge completes. We 
> should do that as it takes up a lot of space and we can't afford that wastage 
> when Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me 
> they are only cleaned up after application completes as they are written in 
> app directory and not container directory. That also has to be done so that 
> they are cleaned up by node manager during task failures or container crashes.





Success: TEZ-3877 PreCommit Build #2708

2018-01-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3877
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2708/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 339.61 KB...]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 53:16 min
[INFO] Finished at: 2018-01-05T23:20:24Z
[INFO] Final Memory: 93M/1412M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12904866/TEZ-3877.001.patch
  against master revision d777f45.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2708//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2708//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d8520269c451c2f541fff5dfc3fe8ae16e810a9f logged out


==
==
Finished build.
==
==


Archiving artifacts
[Fast Archiver] Compressed 3.52 MB of artifacts by 24.0% relative to #2706
[description-setter] Description set: TEZ-3877
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314066#comment-16314066
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

I don't see it used anywhere in the codebase, so I'm assuming it's unused. I 
can remove the TODOs.


> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314059#comment-16314059
 ] 

Gunther Hagleitner commented on TEZ-3880:
-

There's a comment in TaskAttemptTerminationCause that references LLAP. I 
think that shouldn't be committed. I also don't know why this patch is calling 
into question whether INTERRUPTED_BY_SYSTEM is used or not. Can you add a test 
for the new behavior?

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Updated] (TEZ-3877) Delete unordered spill files once merge is done

2018-01-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated TEZ-3877:

Attachment: TEZ-3877.001.patch

Attaching a patch that cleans up the intermediate spills in the unordered 
writer after the merge is complete or encounters an error.
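
For readers following the thread, the cleanup described above (delete the
intermediate spills whether the merge finishes or fails) can be illustrated
with a try/finally. This is only a minimal sketch under stated assumptions: it
is not the code in TEZ-3877.001.patch, and the class name, method names, and
file names are all hypothetical. It uses plain java.nio.file rather than the
Tez or Hadoop file system APIs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Tez codebase.
final class SpillCleanupSketch {

  // Concatenates the spill files into `merged`, then deletes every spill
  // whether the merge succeeded or threw.
  static void mergeAndCleanup(List<Path> spills, Path merged) throws IOException {
    try (OutputStream out = Files.newOutputStream(merged)) {
      for (Path spill : spills) {
        Files.copy(spill, out);
      }
    } finally {
      for (Path spill : spills) {
        Files.deleteIfExists(spill);
      }
    }
  }

  // Self-check: writes three spills, merges them (optionally forcing a
  // failure by adding a spill path that does not exist), and returns how
  // many spill files are still on disk afterwards.
  static int demoRemainingSpills(boolean failMerge) {
    try {
      Path dir = Files.createTempDirectory("spill-sketch");
      List<Path> spills = new ArrayList<>();
      for (int i = 0; i < 3; i++) {
        Path spill = dir.resolve("spill_" + i + ".out");
        Files.write(spill, ("part" + i).getBytes(StandardCharsets.UTF_8));
        spills.add(spill);
      }
      if (failMerge) {
        spills.add(dir.resolve("missing.out"));
      }
      try {
        mergeAndCleanup(spills, dir.resolve("merged.out"));
      } catch (IOException e) {
        if (!failMerge) {
          throw e;
        }
      }
      int remaining = 0;
      for (Path spill : spills) {
        if (Files.exists(spill)) {
          remaining++;
        }
      }
      return remaining;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling SpillCleanupSketch.demoRemainingSpills(false) and
SpillCleanupSketch.demoRemainingSpills(true) exercises the success path and the
error path respectively. One caveat of this simple form: an exception thrown
while deleting in the finally block would mask the original merge error.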


> Delete unordered spill files once merge is done
> ---
>
> Key: TEZ-3877
> URL: https://issues.apache.org/jira/browse/TEZ-3877
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Jason Lowe
> Attachments: TEZ-3877.001.patch
>
>
>   I see that spill files are not deleted right after merge completes. We 
> should do that as it takes up a lot of space and we can't afford that wastage 
> when Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me 
> they are only cleaned up after application completes as they are written in 
> app directory and not container directory. That also has to be done so that 
> they are cleaned up by node manager during task failures or container crashes.





[jira] [Commented] (TEZ-3884) Hadoop3-beta1 fixes for Tez tests

2018-01-05 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313887#comment-16313887
 ] 

Gopal V commented on TEZ-3884:
--

This is a placeholder for -Phadoop3, so that the build against Hadoop3 has its 
own profile instead of using -Phadoop28.

> Hadoop3-beta1 fixes for Tez tests
> -
>
> Key: TEZ-3884
> URL: https://issues.apache.org/jira/browse/TEZ-3884
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Gopal V
>
> {code}
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30]
>  cannot find symbol
> [ERROR] symbol:   class DistributedFileSystem
> [ERROR] location: package org.apache.hadoop.hdfs
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50]
>  cannot find symbol
> [ERROR] symbol:   class DistributedFileSystem
> [ERROR] location: class org.apache.tez.client.TestTezClientUtils
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42]
>  cannot access org.apache.hadoop.hdfs.DistributedFileSystem
> [ERROR] class file for org.apache.hadoop.hdfs.DistributedFileSystem not found
> [ERROR] -> [Help 1]
> [ERROR] 
> {code}





[jira] [Updated] (TEZ-3884) Hadoop3-beta1 fixes for Tez tests

2018-01-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3884:
-
Priority: Minor  (was: Major)

> Hadoop3-beta1 fixes for Tez tests
> -
>
> Key: TEZ-3884
> URL: https://issues.apache.org/jira/browse/TEZ-3884
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Gopal V
>Priority: Minor
>
> {code}
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30]
>  cannot find symbol
> [ERROR] symbol:   class DistributedFileSystem
> [ERROR] location: package org.apache.hadoop.hdfs
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50]
>  cannot find symbol
> [ERROR] symbol:   class DistributedFileSystem
> [ERROR] location: class org.apache.tez.client.TestTezClientUtils
> [ERROR] 
> /grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42]
>  cannot access org.apache.hadoop.hdfs.DistributedFileSystem
> [ERROR] class file for org.apache.hadoop.hdfs.DistributedFileSystem not found
> [ERROR] -> [Help 1]
> [ERROR] 
> {code}





[jira] [Created] (TEZ-3884) Hadoop3-beta1 fixes for Tez tests

2018-01-05 Thread Gopal V (JIRA)
Gopal V created TEZ-3884:


 Summary: Hadoop3-beta1 fixes for Tez tests
 Key: TEZ-3884
 URL: https://issues.apache.org/jira/browse/TEZ-3884
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.9.1
Reporter: Gopal V


{code}
[ERROR] 
/grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[48,30]
 cannot find symbol
[ERROR] symbol:   class DistributedFileSystem
[ERROR] location: package org.apache.hadoop.hdfs
[ERROR] 
/grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/client/TestTezClientUtils.java:[680,50]
 cannot find symbol
[ERROR] symbol:   class DistributedFileSystem
[ERROR] location: class org.apache.tez.client.TestTezClientUtils
[ERROR] 
/grid/5/dev/gopalv/llap-autobuild/tez/tez-api/src/test/java/org/apache/tez/common/TestTezCommonUtils.java:[62,42]
 cannot access org.apache.hadoop.hdfs.DistributedFileSystem
[ERROR] class file for org.apache.hadoop.hdfs.DistributedFileSystem not found
[ERROR] -> [Help 1]
[ERROR] 
{code}





[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313832#comment-16313832
 ] 

Eric Wohlstadter commented on TEZ-3880:
---

[~sershe]

OK, the important thing is that for non-LLAP tasks, the old behavior is 
preserved.
So if SERVICE_BUSY is an LLAP-specific termination reason, then this LGTM.
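
The behavior under discussion (rejections reported as SERVICE_BUSY are left out
of the killed count, every other cause is counted as before) can be sketched as
follows. This is a hedged illustration, not the patch: the enum is a simplified
stand-in for TaskAttemptTerminationCause, OTHER is a placeholder value, and the
class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch; the real Tez types and counters are not shown here.
final class KilledCountSketch {

  // Simplified stand-in for the termination causes named in this thread.
  // OTHER is a placeholder for every cause not discussed here.
  enum TerminationCause { SERVICE_BUSY, INTERRUPTED_BY_SYSTEM, OTHER }

  // Counts killed attempts for progress display, skipping attempts that were
  // only rejected because the service was busy.
  static int killedForProgress(List<TerminationCause> killedAttempts) {
    int killed = 0;
    for (TerminationCause cause : killedAttempts) {
      if (cause != TerminationCause.SERVICE_BUSY) {
        killed++;
      }
    }
    return killed;
  }
}
```

With this shape, a vertex whose attempts never end with SERVICE_BUSY reports
the same killed count as before, which is the property asked for above.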

> do not count rejected tasks as killed in vertex progress
> 
>
> Key: TEZ-3880
> URL: https://issues.apache.org/jira/browse/TEZ-3880
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. 
> We could change the LLAP queue to use a sort of pull model (would also allow 
> for better duplicate scheduling), but for now we should fix the UI.





[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.

2018-01-05 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313799#comment-16313799
 ] 

Rohini Palaniswamy commented on TEZ-160:


Recently noticed that about 5% of Pig jobs launched from Oozie in a cluster 
had an application status of KILLED even though the DAG succeeded and the Pig 
scripts completed successfully. This was because Pig calls TezClient.stop() on 
shutdown. If the application has not stopped within 10 seconds, it calls 
frameworkClient.killApplication(sessionAppId), which kills the AM. Because of 
the 5-second sleep after shutdown is issued, whether an application finished 
as SUCCEEDED or KILLED depended on whether the shutdown completed within the 
next 5 seconds. 

Can we skip this sleep if it is a user-initiated shutdown, or at least lower it 
to 1 or 2 seconds? In the case of Pig it is a Tez session and the Pig client is 
calling shutdown. I think we can skip it in general if it is a Tez session. The 
only time a session will go down automatically is if the session timeout 
expires. Adding another 5 seconds in that case is also wasteful.
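
The timing argument above can be put into a small model. This is a sketch
under simplifying assumptions, not Tez code: it treats the outcome as purely a
comparison between the client's kill timeout and the AM's shutdown work plus
its fixed sleep, and the class, method, and parameter names are hypothetical.
The boundary case (exiting exactly at the timeout) is treated as SUCCEEDED
arbitrarily.

```java
// Hypothetical model of the race described above; not taken from Tez.
final class ShutdownRaceSketch {

  enum FinalStatus { SUCCEEDED, KILLED }

  // The AM exits after its own shutdown work plus the fixed sleep. If that
  // happens after the client-side timeout, the client kills the application.
  static FinalStatus finalStatus(long amShutdownWorkMillis,
                                 long amSleepMillis,
                                 long clientKillTimeoutMillis) {
    long amExitMillis = amShutdownWorkMillis + amSleepMillis;
    return amExitMillis <= clientKillTimeoutMillis
        ? FinalStatus.SUCCEEDED
        : FinalStatus.KILLED;
  }
}
```

With the numbers from this comment (10-second client timeout, 5-second sleep),
6 seconds of shutdown work ends as KILLED, while the same 6 seconds with the
sleep removed ends as SUCCEEDED.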

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>  Labels: TEZ-0.2.0
> Attachments: test.timeouts.txt
>
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.





[jira] [Assigned] (TEZ-3877) Delete unordered spill files once merge is done

2018-01-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned TEZ-3877:
---

Assignee: Jason Lowe
 Summary: Delete unordered spill files once merge is done  (was: Delete 
spill files once merge is done)

Offline, Rohini pointed me to the UnorderedKVWriter, and indeed the intermediate 
spill files are *not* being deleted after being merged like they are for the 
ordered case.  Updated the JIRA summary accordingly.

> Delete unordered spill files once merge is done
> ---
>
> Key: TEZ-3877
> URL: https://issues.apache.org/jira/browse/TEZ-3877
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Jason Lowe
>
>   I see that spill files are not deleted right after merge completes. We 
> should do that as it takes up a lot of space and we can't afford that wastage 
> when Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me 
> they are only cleaned up after application completes as they are written in 
> app directory and not container directory. That also has to be done so that 
> they are cleaned up by node manager during task failures or container crashes.


