[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321209#comment-16321209
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

[~hagleitn] can you please commit it? I'm not a Tez committer :)

> do not count rejected tasks as killed in vertex progress
> --------------------------------------------------------
>
>                 Key: TEZ-3880
>                 URL: https://issues.apache.org/jira/browse/TEZ-3880
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: TEZ-3880.01.patch, TEZ-3880.02.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed 
> tasks in the command-line query UI (CLI and beeline). This shouldn't really 
> happen; killed tasks in the container case mean something else, and this 
> scenario doesn't exist because the AM doesn't continuously try to queue tasks. We 
> could change the LLAP queue to use a sort of pull model (which would also allow for 
> better duplicate scheduling), but for now we should fix the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-10 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320472#comment-16320472
 ] 

Gunther Hagleitner commented on TEZ-3880:
-

Looks good to me now: +1



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319389#comment-16319389
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

[~hagleitn] can you take a look at the updated patch? thanks



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316923#comment-16316923
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

[~sseth] perfect timing ;) Fixed the test.
The follow-up JIRA is supposed to address that. Instead of (or in addition to) 
classifying tasks as killed or failed, I'd like to have tasks grouped by error 
type. Phase 4 ;)
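
For illustration only, the grouping by error type mentioned above could be sketched as below. This is a minimal sketch, not the follow-up patch: the class name AttemptsByCauseSketch, the nested Cause enum, and the groupByCause method are hypothetical stand-ins and are not the real Tez classes. Only the names SERVICE_BUSY and INTERRUPTED_BY_SYSTEM come from this thread.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts terminated task attempts per termination cause
// instead of lumping them into "killed" and "failed".
class AttemptsByCauseSketch {

    // Stand-in enum for illustration; not the real Tez termination cause type.
    enum Cause { SERVICE_BUSY, INTERRUPTED_BY_SYSTEM, APPLICATION_ERROR }

    // Returns how many attempts ended with each cause.
    static Map<Cause, Integer> groupByCause(List<Cause> terminatedAttempts) {
        Map<Cause, Integer> counts = new EnumMap<>(Cause.class);
        for (Cause cause : terminatedAttempts) {
            counts.merge(cause, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Cause> attempts = Arrays.asList(
                Cause.SERVICE_BUSY, Cause.SERVICE_BUSY, Cause.INTERRUPTED_BY_SYSTEM);
        // Prints one line per cause that occurred at least once.
        groupByCause(attempts).forEach(
                (cause, n) -> System.out.println(cause + ": " + n));
    }
}
```

A UI could then show one counter per cause, so rejections and genuine kills no longer share a single number.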



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316909#comment-16316909
 ] 

Siddharth Seth commented on TEZ-3880:
-

Is the test failure related? Otherwise, the patch looks good to me.
One thing that is likely not being handled is the case where the executors 
accept work, and then reject/preempt them before execution - that is more like 
a rejection than a preemption.



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314343#comment-16314343
 ] 

TezQA commented on TEZ-3880:


-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12904899/TEZ-3880.01.patch
  against master revision d777f45.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test file.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  There were no new javadoc warning messages.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
   org.apache.tez.tests.TestExternalTezServices

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//console

This message is automatically generated.



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314066#comment-16314066
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

I don't see it used anywhere in the codebase, so I'm assuming it's unused. I 
can remove the TODO-s.




[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314059#comment-16314059
 ] 

Gunther Hagleitner commented on TEZ-3880:
-

There's a comment in TaskAttemptTerminationCause that references LLAP. I 
think that shouldn't be committed. I also don't know why this patch is calling 
into question whether INTERRUPTED_BY_SYSTEM is used or not. Can you add a test 
for the new behavior?



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-05 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313832#comment-16313832
 ] 

Eric Wohlstadter commented on TEZ-3880:
---

[~sershe]

Ok, the important thing is that for non-LLAP tasks, the old behavior is 
preserved.
So if SERVICE_BUSY is an LLAP specific termination reason, then this lgtm.



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312260#comment-16312260
 ] 

Sergey Shelukhin commented on TEZ-3880:
---

When the AM tries to schedule a task on LLAP and there's no capacity, it treats 
the task attempt as killed with a SERVICE_BUSY error.
This is not really a killed task; it is an artifact of fitting LLAP into a 
model based on how the RM gives out containers, whereas LLAP works differently 
(similarly, queueing in LLAP is not accounted for in the current Tez model 
because YARN handles it differently through the RM).
On a full cluster, this affects the killed task attempt counter in the UI.
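
To make the counting issue concrete, here is a minimal sketch of the idea, not the attached patch. It assumes the progress counter sees a list of termination causes for attempts that the AM recorded as killed. The class VertexProgressSketch, its nested Cause enum, and both methods are hypothetical and are not the real Tez classes. Only the name SERVICE_BUSY and its meaning (LLAP had no capacity) come from this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: keeps rejected (SERVICE_BUSY) attempts out of the
// "killed" number shown in vertex progress.
class VertexProgressSketch {

    // Stand-in enum for illustration; not the real Tez termination cause type.
    enum Cause { SERVICE_BUSY, INTERRUPTED_BY_SYSTEM, INTERRUPTED_BY_USER }

    // Counts attempts that should be displayed as killed: every killed
    // attempt except those rejected because the service was busy.
    static int countKilled(List<Cause> killedAttempts) {
        int killed = 0;
        for (Cause cause : killedAttempts) {
            if (cause != Cause.SERVICE_BUSY) {
                killed++;
            }
        }
        return killed;
    }

    // Counts attempts that were only rejected for lack of capacity.
    static int countRejected(List<Cause> killedAttempts) {
        int rejected = 0;
        for (Cause cause : killedAttempts) {
            if (cause == Cause.SERVICE_BUSY) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<Cause> attempts = Arrays.asList(
                Cause.SERVICE_BUSY, Cause.SERVICE_BUSY, Cause.INTERRUPTED_BY_SYSTEM);
        System.out.println("killed=" + countKilled(attempts)
                + " rejected=" + countRejected(attempts));
    }
}
```

Because the filter keys on a single cause, attempts killed for any other reason are still counted as before, which matches the concern raised elsewhere in this thread about preserving the old behavior for non-LLAP tasks.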



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-04 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312224#comment-16312224
 ] 

Eric Wohlstadter commented on TEZ-3880:
---

[~sershe]

I'm not sure what you mean by ... rejected tasks because the cluster is full, 
or ... that the AM won't continuously queue tasks. 
e.g. if a task requires 600 containers and only 400 are immediately available, 
the AM generally won't just kill the whole task. Eventually resources for 200 
more containers will become free and the task will complete.  

Obviously it's normal for a cluster to be at capacity and jobs to still 
complete successfully. So I probably just don't understand your use-case. 



[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress

2018-01-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312199#comment-16312199
 ] 

TezQA commented on TEZ-3880:


-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12904680/TEZ-3880.patch
  against master revision 4c378b4.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  There were no new javadoc warning messages.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
   org.apache.tez.tests.TestExternalTezServices

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2707//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2707//console

This message is automatically generated.
