[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321209#comment-16321209 ]

Sergey Shelukhin commented on TEZ-3880:
---------------------------------------

[~hagleitn] can you please commit it? I'm not a Tez committer :)

> do not count rejected tasks as killed in vertex progress
> --------------------------------------------------------
>
>                 Key: TEZ-3880
>                 URL: https://issues.apache.org/jira/browse/TEZ-3880
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: TEZ-3880.01.patch, TEZ-3880.02.patch, TEZ-3880.patch
>
>
> Tasks rejected from LLAP because the cluster is full are shown as killed
> tasks in the command-line query UI (CLI and beeline). This shouldn't really
> happen; killed tasks in the container case mean something else, and this
> scenario doesn't exist because the AM doesn't continuously try to queue tasks.
> We could change the LLAP queue to use a sort of pull model (which would also
> allow for better duplicate scheduling), but for now we should fix the UI.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320472#comment-16320472 ]

Gunther Hagleitner commented on TEZ-3880:
-----------------------------------------

Looks good to me now: +1
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319389#comment-16319389 ]

Sergey Shelukhin commented on TEZ-3880:
---------------------------------------

[~hagleitn] can you take a look at the updated patch? thanks
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316923#comment-16316923 ]

Sergey Shelukhin commented on TEZ-3880:
---------------------------------------

[~sseth] perfect timing ;) Fixed the test.
The follow-up JIRA is supposed to address that. Instead of (or in addition to) classifying tasks as killed and failed, I'd like to have them grouped by error type. Phase 4 ;)
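As a rough illustration of the grouping idea mentioned for the follow-up, the sketch below tallies terminated attempts per error type instead of using a single killed/failed split. Everything in it is hypothetical: the ErrorType enum and the ErrorTypeGrouping class are not Tez APIs, and only SERVICE_BUSY and INTERRUPTED_BY_SYSTEM are names that appear in this thread.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical error types; not the real Tez termination-cause enum.
enum ErrorType { SERVICE_BUSY, INTERRUPTED_BY_SYSTEM, OTHER }

final class ErrorTypeGrouping {
    private ErrorTypeGrouping() {}

    /** Counts terminated attempts per error type; types with no attempts are absent from the map. */
    static Map<ErrorType, Integer> countByErrorType(List<ErrorType> terminated) {
        Map<ErrorType, Integer> counts = new EnumMap<>(ErrorType.class);
        for (ErrorType type : terminated) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }
}
```

A display built on such a map could then show one counter per error type, so capacity rejections would no longer be mixed in with other terminations.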
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316909#comment-16316909 ]

Siddharth Seth commented on TEZ-3880:
-------------------------------------

Is the test failure related? Otherwise, the patch looks good to me.
One thing that is likely not being handled is the case where the executors accept work and then reject/preempt it before execution - that is more like a rejection than a preemption.
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314343#comment-16314343 ]

TezQA commented on TEZ-3880:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12904899/TEZ-3880.01.patch
  against master revision d777f45.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in :
                   org.apache.tez.tests.TestExternalTezServices

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2709//console

This message is automatically generated.
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314066#comment-16314066 ]

Sergey Shelukhin commented on TEZ-3880:
---------------------------------------

I don't see it used anywhere in the codebase, so I'm assuming it's unused. I can remove the TODOs.
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314059#comment-16314059 ]

Gunther Hagleitner commented on TEZ-3880:
-----------------------------------------

There's a comment in TaskAttemptTerminationCause that references LLAP. I think that shouldn't be committed. I also don't know why this patch is calling into question whether INTERRUPTED_BY_SYSTEM is used or not.
Can you add a test for the new behavior?
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313832#comment-16313832 ]

Eric Wohlstadter commented on TEZ-3880:
---------------------------------------

[~sershe]
OK, the important thing is that the old behavior is preserved for non-LLAP tasks. So if SERVICE_BUSY is an LLAP-specific termination reason, then this lgtm.
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312260#comment-16312260 ]

Sergey Shelukhin commented on TEZ-3880:
---------------------------------------

When the AM tries to schedule on LLAP and there's no capacity, it treats the task attempt as killed with a SERVICE_BUSY error. This is not really a killed task, just an artifact of fitting LLAP, which works differently, into a model based on how the RM gives out containers (similarly, queueing in LLAP is not accounted for in the current Tez model because YARN handles it differently, through the RM).
On a full cluster, this affects the killed task attempt counter in the UI.
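The accounting change discussed here can be sketched in a few lines of Java. This is only an illustration, not the attached patch: the enum and classes below are hypothetical stand-ins rather than Tez types, and only the names SERVICE_BUSY and INTERRUPTED_BY_SYSTEM come from this thread. It assumes each killed attempt carries a termination cause and that the progress display simply leaves SERVICE_BUSY attempts out of the killed count.

```java
import java.util.List;

// Hypothetical stand-in for the termination causes; only SERVICE_BUSY and
// INTERRUPTED_BY_SYSTEM are names mentioned in this thread.
enum TerminationCause { SERVICE_BUSY, INTERRUPTED_BY_SYSTEM, OTHER }

// Hypothetical record of one attempt that the AM marked as killed.
final class KilledAttempt {
    final String id;
    final TerminationCause cause;

    KilledAttempt(String id, TerminationCause cause) {
        this.id = id;
        this.cause = cause;
    }
}

final class VertexProgressSketch {
    private VertexProgressSketch() {}

    /** Attempts that were rejected because the service had no capacity. */
    static long countRejected(List<KilledAttempt> killed) {
        return killed.stream()
                .filter(a -> a.cause == TerminationCause.SERVICE_BUSY)
                .count();
    }

    /** Attempts that should still be reported as killed in the progress output. */
    static long countKilledForProgress(List<KilledAttempt> killed) {
        return killed.size() - countRejected(killed);
    }
}
```

With two SERVICE_BUSY attempts and one INTERRUPTED_BY_SYSTEM attempt, countRejected returns 2 and countKilledForProgress returns 1. Attempts with any other cause are counted exactly as before.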
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312224#comment-16312224 ]

Eric Wohlstadter commented on TEZ-3880:
---------------------------------------

[~sershe]
I'm not sure what you mean by ... rejected tasks because the cluster is full, or ... that the AM won't continuously queue tasks.
For example, if a task requires 600 containers and only 400 are immediately available, the AM generally won't just kill the whole task. Eventually resources for 200 more containers will become free and the task will complete.
Obviously it's normal for a cluster to be at capacity and for jobs to still complete successfully. So I probably just don't understand your use case.
[jira] [Commented] (TEZ-3880) do not count rejected tasks as killed in vertex progress
[ https://issues.apache.org/jira/browse/TEZ-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312199#comment-16312199 ]

TezQA commented on TEZ-3880:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12904680/TEZ-3880.patch
  against master revision 4c378b4.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in :
                   org.apache.tez.tests.TestExternalTezServices

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2707//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2707//console

This message is automatically generated.