[jira] [Created] (AURORA-1773) Asynchronously call webhook endpoint not to block EventBus
Dmitriy Shirchenko created AURORA-1773:

    Summary: Asynchronously call webhook endpoint not to block EventBus
    Key: AURORA-1773
    URL: https://issues.apache.org/jira/browse/AURORA-1773
    Project: Aurora
    Issue Type: Sub-task
    Reporter: Dmitriy Shirchenko
    Assignee: Dmitriy Shirchenko

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AURORA-1772) Do not resend TaskStateChange events every time a scheduler starts
Dmitriy Shirchenko created AURORA-1772:

    Summary: Do not resend TaskStateChange events every time a scheduler starts
    Key: AURORA-1772
    URL: https://issues.apache.org/jira/browse/AURORA-1772
    Project: Aurora
    Issue Type: Sub-task
    Reporter: Dmitriy Shirchenko
    Assignee: Dmitriy Shirchenko
[jira] [Assigned] (AURORA-1770) Shiro realms default module should provide a human readable name
[ https://issues.apache.org/jira/browse/AURORA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zameer Manji reassigned AURORA-1770:

    Assignee: Zameer Manji

> Shiro realms default module should provide a human readable name
>
> Key: AURORA-1770
> URL: https://issues.apache.org/jira/browse/AURORA-1770
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Stephan Erb
> Assignee: Zameer Manji
> Priority: Trivial
> Labels: newbie
>
> The help output of the scheduler contains an entry similar to the following:
> {code}
> -shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
> {code}
> We should implement a proper {{toString}} method to print out the actual module name rather than an ID.
[jira] [Updated] (AURORA-1225) Modify executor state transition logic to rely on health checks (if enabled)
[ https://issues.apache.org/jira/browse/AURORA-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Huang updated AURORA-1225:

    Sprint: Twitter Aurora Q2'16 Sprint 21

> Modify executor state transition logic to rely on health checks (if enabled)
>
> Key: AURORA-1225
> URL: https://issues.apache.org/jira/browse/AURORA-1225
> Project: Aurora
> Issue Type: Task
> Components: Executor
> Reporter: Maxim Khutornenko
> Assignee: Kai Huang
>
> Executor needs to start executing user content in STARTING and transition to RUNNING when a successful required number of health checks is reached.
[jira] [Updated] (AURORA-1771) Consider scheduling multiple tasks per scheduling round
[ https://issues.apache.org/jira/browse/AURORA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Khutornenko updated AURORA-1771:

    Sprint: Twitter Aurora Q2'16 Sprint 21

> Consider scheduling multiple tasks per scheduling round
>
> Key: AURORA-1771
> URL: https://issues.apache.org/jira/browse/AURORA-1771
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Maxim Khutornenko
> Assignee: Maxim Khutornenko
>
> This is a placeholder for the scheduling loop perf optimization approach described in https://reviews.apache.org/r/51759/.
[jira] [Created] (AURORA-1771) Consider scheduling multiple tasks per scheduling round
Maxim Khutornenko created AURORA-1771:

    Summary: Consider scheduling multiple tasks per scheduling round
    Key: AURORA-1771
    URL: https://issues.apache.org/jira/browse/AURORA-1771
    Project: Aurora
    Issue Type: Task
    Components: Scheduler
    Reporter: Maxim Khutornenko
    Assignee: Maxim Khutornenko

This is a placeholder for the scheduling loop perf optimization approach described in https://reviews.apache.org/r/51759/.
[jira] [Created] (AURORA-1770) Shiro realms default module should provide a human readable name
Stephan Erb created AURORA-1770:

    Summary: Shiro realms default module should provide a human readable name
    Key: AURORA-1770
    URL: https://issues.apache.org/jira/browse/AURORA-1770
    Project: Aurora
    Issue Type: Task
    Components: Scheduler
    Reporter: Stephan Erb
    Priority: Trivial

The help output of the scheduler contains an entry similar to the following:
{code}
-shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
{code}
We should implement a proper {{toString}} method to print out the actual module name rather than an ID.
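The `MoreModules$1@158a8276` text in the help output is what `Object.toString()` prints for an anonymous class: the class name followed by an identity hash. One way the requested fix might look is sketched below. This is an illustration only, not Aurora's or Guice's actual code: the `Module` interface, the `lazilyInstantiated` helper, and `IniRealmModule` are hypothetical stand-ins, chosen so that the example compiles with the JDK alone.

```java
public class NamedModuleSketch {
    /** Hypothetical stand-in for a Guice-style module; not the real interface. */
    interface Module {
        void configure();
    }

    /**
     * Hypothetical helper: wraps a module class in an anonymous Module whose
     * toString() reports the wrapped class name instead of "Outer$1@hash".
     */
    static Module lazilyInstantiated(final Class<? extends Module> moduleClass) {
        return new Module() {
            @Override
            public void configure() {
                try {
                    // Instantiate the real module only when it is configured.
                    moduleClass.getDeclaredConstructor().newInstance().configure();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }

            @Override
            public String toString() {
                // Human-readable name for flag help output.
                return moduleClass.getName();
            }
        };
    }

    /** Hypothetical example module. */
    static class IniRealmModule implements Module {
        @Override
        public void configure() {
            // No-op in this sketch.
        }
    }

    public static void main(String[] args) {
        Module module = lazilyInstantiated(IniRealmModule.class);
        // Prints the wrapped class name rather than an identity hash.
        System.out.println(module);
    }
}
```

With an override like this, a default value rendered via `toString()` would show the name of the wrapped module class.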
[jira] [Commented] (AURORA-1769) Enabling webhook is synchronous and could cause longer leader reelection cycle
[ https://issues.apache.org/jira/browse/AURORA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484846#comment-15484846 ]

Zameer Manji commented on AURORA-1769:

Agreed that a solution to this problem involves two components:
* Not sending the {{TaskStateChange}} on scheduler restart (that's very surprising to me)
* Sending data asynchronously so as not to block in the event bus callback.

I don't think this has to be a blocker for 0.16.0 either, but I just wanted to surface it in case [~joshua.cohen] agreed.

> Enabling webhook is synchronous and could cause longer leader reelection cycle
>
> Key: AURORA-1769
> URL: https://issues.apache.org/jira/browse/AURORA-1769
> Project: Aurora
> Issue Type: Bug
> Reporter: Dmitriy Shirchenko
> Assignee: Dmitriy Shirchenko
>
> We had an issue where on scheduler leader reelection EventBus was full of TaskStateChange events and caused scheduler to not be able to post DriverRegistered() message which caused Aurora scheduler to not register within 1 minute.
[jira] [Commented] (AURORA-1769) Enabling webhook is synchronous and could cause longer leader reelection cycle
[ https://issues.apache.org/jira/browse/AURORA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484533#comment-15484533 ]

Maxim Khutornenko commented on AURORA-1769:

My suggestion was targeting the restart issue where events should be suppressed regardless: you don't want to resend {{TaskStateChange}} events for all tasks every time a scheduler restarts.

As for the general perf issue, blocking {{EventBus}} threads was one of the concerns raised in the original https://reviews.apache.org/r/47440/ RB. We concluded back then that using aggressive connection timeouts _was_ appropriate to mitigate possible event queue saturation. If you feel that is no longer the case, please follow up with an async proposal. You'll likely need something akin to the [BatchWorker|https://reviews.apache.org/r/51759/] sending thread working off of its own queue.

In any case, given that this feature is optional and off by default, I feel blocking the release until it's improved is not justified.

> Enabling webhook is synchronous and could cause longer leader reelection cycle
>
> Key: AURORA-1769
> URL: https://issues.apache.org/jira/browse/AURORA-1769
> Project: Aurora
> Issue Type: Bug
> Reporter: Dmitriy Shirchenko
> Assignee: Dmitriy Shirchenko
>
> We had an issue where on scheduler leader reelection EventBus was full of TaskStateChange events and caused scheduler to not be able to post DriverRegistered() message which caused Aurora scheduler to not register within 1 minute.
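The "sending thread working off of its own queue" idea from the comment above could look roughly like the following. This is a minimal sketch under stated assumptions, not Aurora's webhook code and not the BatchWorker from the linked review: the class name `AsyncWebhookSketch`, the `onTaskStateChange` callback, the `String` event payload, and the `Consumer<String>` endpoint are all hypothetical. The point it illustrates is that the event bus callback only enqueues, while a separate thread performs the slow delivery.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: hands events to a dedicated sender thread via a bounded queue. */
public class AsyncWebhookSketch {
    private final BlockingQueue<String> queue;
    private final Thread sender;
    private volatile boolean running = true;

    public AsyncWebhookSketch(int capacity, Consumer<String> endpoint) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.sender = new Thread(() -> drain(endpoint), "webhook-sender");
        this.sender.setDaemon(true);
        this.sender.start();
    }

    /**
     * Hypothetical event bus callback. It never blocks: if the queue is
     * full the event is dropped and false is returned.
     */
    public boolean onTaskStateChange(String event) {
        return queue.offer(event);
    }

    /** Runs on the sender thread; delivers queued events until closed and drained. */
    private void drain(Consumer<String> endpoint) {
        try {
            while (running || !queue.isEmpty()) {
                String event = queue.poll(50, TimeUnit.MILLISECONDS);
                if (event == null) {
                    continue;
                }
                try {
                    endpoint.accept(event); // the slow call happens off the event bus thread
                } catch (RuntimeException e) {
                    System.err.println("webhook delivery failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting work for delivery and waits for the remaining events to be sent. */
    public void close() {
        running = false;
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncWebhookSketch hook =
                new AsyncWebhookSketch(1000, event -> System.out.println("POST " + event));
        hook.onTaskStateChange("task-1: PENDING -> ASSIGNED");
        hook.close();
    }
}
```

A bounded queue with `offer` trades delivery guarantees for scheduler liveness: under saturation events are dropped rather than stalling the caller. Whether dropping is acceptable for webhook consumers is a design question the thread leaves open.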
[jira] [Commented] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware
[ https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484248#comment-15484248 ]

Joshua Cohen commented on AURORA-1768:

It may be enough to just enter the executor's namespace. The task's filesystem mount is visible there. If we want to enter the process namespace we can get the pid from the checkpoint, but it's more complicated in that we need to somehow figure out *which* process's namespaces to enter.

> Command `aurora task ssh` is not namespace and taskfs aware
>
> Key: AURORA-1768
> URL: https://issues.apache.org/jira/browse/AURORA-1768
> Project: Aurora
> Issue Type: Story
> Components: Thermos
> Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in production environments, we should make sure commands executed via `aurora ssh` have been isolated in the same way as the tasks itself. This implies that we have to use the same container filesystem and enter the same namespaces.
[jira] [Comment Edited] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware
[ https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483814#comment-15483814 ]

Stephan Erb edited comment on AURORA-1768 at 9/12/16 11:08 AM:

That's tough. The PID of the executor is available in {{/var/lib/mesos/meta/slaves/latest/frameworks/\*/executors/\*/runs/latest/pids/forked.pid}} but as the executor runs outside of the mount namespace this might not work.

was (Author: stephanerb):
That's tough. The PID of the executor is available in {{/var/lib/mesos/meta/slaves/latest/frameworks/*/executors/*/runs/latest/pids/forked.pid}} but as the executor runs outside of the mount namespace this might not work.

> Command `aurora task ssh` is not namespace and taskfs aware
>
> Key: AURORA-1768
> URL: https://issues.apache.org/jira/browse/AURORA-1768
> Project: Aurora
> Issue Type: Story
> Components: Thermos
> Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in production environments, we should make sure commands executed via `aurora ssh` have been isolated in the same way as the tasks itself. This implies that we have to use the same container filesystem and enter the same namespaces.
[jira] [Commented] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware
[ https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483814#comment-15483814 ]

Stephan Erb commented on AURORA-1768:

That's tough. The PID of the executor is available in {{/var/lib/mesos/meta/slaves/latest/frameworks/*/executors/*/runs/latest/pids/forked.pid}} but as the executor runs outside of the mount namespace this might not work.

> Command `aurora task ssh` is not namespace and taskfs aware
>
> Key: AURORA-1768
> URL: https://issues.apache.org/jira/browse/AURORA-1768
> Project: Aurora
> Issue Type: Story
> Components: Thermos
> Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in production environments, we should make sure commands executed via `aurora ssh` have been isolated in the same way as the tasks itself. This implies that we have to use the same container filesystem and enter the same namespaces.