[jira] [Created] (AURORA-1773) Asynchronously call webhook endpoint not to block EventBus

2016-09-12 Thread Dmitriy Shirchenko (JIRA)
Dmitriy Shirchenko created AURORA-1773:
--

     Summary: Asynchronously call webhook endpoint not to block EventBus
         Key: AURORA-1773
         URL: https://issues.apache.org/jira/browse/AURORA-1773
     Project: Aurora
  Issue Type: Sub-task
    Reporter: Dmitriy Shirchenko
    Assignee: Dmitriy Shirchenko

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AURORA-1772) Do not resend TaskStateChange events every time a scheduler starts

2016-09-12 Thread Dmitriy Shirchenko (JIRA)
Dmitriy Shirchenko created AURORA-1772:
--

     Summary: Do not resend TaskStateChange events every time a scheduler starts
         Key: AURORA-1772
         URL: https://issues.apache.org/jira/browse/AURORA-1772
     Project: Aurora
  Issue Type: Sub-task
    Reporter: Dmitriy Shirchenko
    Assignee: Dmitriy Shirchenko



[jira] [Assigned] (AURORA-1770) Shiro realms default module should provide a human readable name

2016-09-12 Thread Zameer Manji (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zameer Manji reassigned AURORA-1770:


Assignee: Zameer Manji

> Shiro realms default module should provide a human readable name
> 
>
>         Key: AURORA-1770
>         URL: https://issues.apache.org/jira/browse/AURORA-1770
>     Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>    Reporter: Stephan Erb
>    Assignee: Zameer Manji
>    Priority: Trivial
>      Labels: newbie
>
> The help output of the scheduler contains an entry similar to the following:
> {code}
> -shiro_realm_modules (default 
> [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
> {code}
> We should implement a proper {{toString}} method to print out the actual 
> module name rather than an ID.





[jira] [Updated] (AURORA-1225) Modify executor state transition logic to rely on health checks (if enabled)

2016-09-12 Thread Kai Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Huang updated AURORA-1225:
--
Sprint: Twitter Aurora Q2'16 Sprint 21

> Modify executor state transition logic to rely on health checks (if enabled)
> 
>
>         Key: AURORA-1225
>         URL: https://issues.apache.org/jira/browse/AURORA-1225
>     Project: Aurora
>  Issue Type: Task
>  Components: Executor
>    Reporter: Maxim Khutornenko
>    Assignee: Kai Huang
>
> Executor needs to start executing user content in STARTING and transition to 
> RUNNING when the required number of successful health checks is reached.





[jira] [Updated] (AURORA-1771) Consider scheduling multiple tasks per scheduling round

2016-09-12 Thread Maxim Khutornenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Khutornenko updated AURORA-1771:
--
Sprint: Twitter Aurora Q2'16 Sprint 21

> Consider scheduling multiple tasks per scheduling round
> ---
>
>         Key: AURORA-1771
>         URL: https://issues.apache.org/jira/browse/AURORA-1771
>     Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>    Reporter: Maxim Khutornenko
>    Assignee: Maxim Khutornenko
>
> This is a placeholder for the scheduling loop perf optimization approach 
> described in https://reviews.apache.org/r/51759/. 





[jira] [Created] (AURORA-1771) Consider scheduling multiple tasks per scheduling round

2016-09-12 Thread Maxim Khutornenko (JIRA)
Maxim Khutornenko created AURORA-1771:
-

     Summary: Consider scheduling multiple tasks per scheduling round
         Key: AURORA-1771
         URL: https://issues.apache.org/jira/browse/AURORA-1771
     Project: Aurora
  Issue Type: Task
  Components: Scheduler
    Reporter: Maxim Khutornenko
    Assignee: Maxim Khutornenko


This is a placeholder for the scheduling loop perf optimization approach 
described in https://reviews.apache.org/r/51759/. 





[jira] [Created] (AURORA-1770) Shiro realms default module should provide a human readable name

2016-09-12 Thread Stephan Erb (JIRA)
Stephan Erb created AURORA-1770:
---

     Summary: Shiro realms default module should provide a human readable name
         Key: AURORA-1770
         URL: https://issues.apache.org/jira/browse/AURORA-1770
     Project: Aurora
  Issue Type: Task
  Components: Scheduler
    Reporter: Stephan Erb
    Priority: Trivial


The help output of the scheduler contains an entry similar to the following:

{code}
-shiro_realm_modules (default 
[org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
{code}

We should implement a proper {{toString}} method to print out the actual module 
name rather than an ID.
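
A minimal sketch of the idea in plain Java, not the actual Aurora change. {{RealmModule}} is a hypothetical stand-in for the real module type, and the name passed to {{namedModule}} is made up for illustration. An anonymous class inherits {{Object.toString}}, which produces the class name followed by {{@}} and a hash code; overriding {{toString}} yields a stable, readable name instead.

```java
import java.util.Collections;

// Hypothetical stand-in for the module interface; Aurora itself uses Guice modules.
interface RealmModule {
  void configure();
}

final class ModuleNames {
  private ModuleNames() {
  }

  // Anonymous class without toString: prints as "<class name>@<hash code in hex>".
  static RealmModule anonymousModule() {
    return new RealmModule() {
      @Override
      public void configure() {
        // No bindings are needed for this illustration.
      }
    };
  }

  // Same kind of module, but toString reports a human-readable name.
  static RealmModule namedModule(final String name) {
    return new RealmModule() {
      @Override
      public void configure() {
        // No bindings are needed for this illustration.
      }

      @Override
      public String toString() {
        return name;
      }
    };
  }

  public static void main(String[] args) {
    // Collections render their elements via toString().
    System.out.println(Collections.singletonList(anonymousModule()));
    System.out.println(Collections.singletonList(namedModule("ExampleRealmModule")));
  }
}
```

The first line printed still carries the hash-based identifier, while the second shows only the supplied name.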





[jira] [Commented] (AURORA-1769) Enabling webhook is synchronous and could cause longer leader reelection cycle

2016-09-12 Thread Zameer Manji (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484846#comment-15484846
 ] 

Zameer Manji commented on AURORA-1769:
--

Agreed that a solution to this problem involves two components:
* Not sending the {{TaskStateChange}} on scheduler restart (that's very 
surprising to me)
* Sending data asynchronously so as not to block in the event bus callback.

I don't think this has to be a blocker for 0.16.0 either, but I just wanted to 
surface it in case [~joshua.cohen] agreed.

> Enabling webhook is synchronous and could cause longer leader reelection cycle
> --
>
>         Key: AURORA-1769
>         URL: https://issues.apache.org/jira/browse/AURORA-1769
>     Project: Aurora
>  Issue Type: Bug
>    Reporter: Dmitriy Shirchenko
>    Assignee: Dmitriy Shirchenko
>
> We had an issue where, on scheduler leader reelection, the EventBus was full 
> of TaskStateChange events. This prevented the scheduler from posting the 
> DriverRegistered() message, so the Aurora scheduler did not register within 
> 1 minute.





[jira] [Commented] (AURORA-1769) Enabling webhook is synchronous and could cause longer leader reelection cycle

2016-09-12 Thread Maxim Khutornenko (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484533#comment-15484533
 ] 

Maxim Khutornenko commented on AURORA-1769:
---

My suggestion was targeting the restart issue where events should be suppressed 
regardless: you don't want to resend {{TaskStateChange}} events for all tasks 
every time a scheduler restarts.
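
One way such suppression could look, as a sketch only. {{StateChange}} and {{WebhookFilter}} are hypothetical types written for this illustration and are not Aurora's classes. The assumption is that an event replayed from storage on startup carries no previous state, while a genuine transition does, so the webhook can forward only the latter.

```java
import java.util.Optional;

// Hypothetical event type for illustration; not Aurora's actual TaskStateChange class.
final class StateChange {
  private final String taskId;
  private final String newState;
  private final Optional<String> oldState;

  private StateChange(String taskId, String newState, Optional<String> oldState) {
    this.taskId = taskId;
    this.newState = newState;
    this.oldState = oldState;
  }

  // A real transition observed while the scheduler is running.
  static StateChange transition(String taskId, String oldState, String newState) {
    return new StateChange(taskId, newState, Optional.of(oldState));
  }

  // A task loaded from storage on startup: there is no previous state to report.
  static StateChange restored(String taskId, String state) {
    return new StateChange(taskId, state, Optional.empty());
  }

  boolean isTransition() {
    return oldState.isPresent();
  }

  String getTaskId() {
    return taskId;
  }

  String getNewState() {
    return newState;
  }
}

final class WebhookFilter {
  private WebhookFilter() {
  }

  // Forward only genuine transitions; skip events replayed on restart.
  static boolean shouldSend(StateChange event) {
    return event.isTransition();
  }
}
```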

As for the general perf issue, blocking {{EventBus}} threads was one of the 
concerns raised in the original https://reviews.apache.org/r/47440/ RB. We 
concluded back then that using aggressive connection timeouts _was_ appropriate 
to mitigate possible event queue saturation. If you feel that is no longer the 
case, please follow up with an async proposal. You'll likely need something 
akin to the [BatchWorker|https://reviews.apache.org/r/51759/] sending thread 
working off of its own queue. In any case, given this feature is optional and 
off by default I feel blocking the release until it's improved is not justified.
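
One possible shape for such a sending thread, sketched with JDK classes only. {{AsyncWebhookSender}}, its {{capacity}} parameter, and the {{Consumer<String>}} that would perform the actual HTTP call are hypothetical names, and this is not the BatchWorker from the linked review. The event bus callback would call {{enqueue}}, which never blocks; if the queue is full the payload is rejected and the caller is told so.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sender: a single worker thread drains its own bounded queue.
final class AsyncWebhookSender implements AutoCloseable {
  private final BlockingQueue<String> queue;
  private final Thread worker;

  AsyncWebhookSender(int capacity, final Consumer<String> httpSender) {
    final BlockingQueue<String> pending = new ArrayBlockingQueue<>(capacity);
    this.queue = pending;
    this.worker = new Thread(() -> {
      try {
        while (true) {
          String payload = pending.take(); // Blocks only this worker thread.
          try {
            httpSender.accept(payload);
          } catch (RuntimeException e) {
            // A failed delivery must not kill the worker; a real sender would log or retry.
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // close() was called; leave the loop.
      }
    }, "webhook-sender");
    this.worker.setDaemon(true);
    this.worker.start();
  }

  // Called from the event bus callback: returns immediately, false if the queue is full.
  boolean enqueue(String payload) {
    return queue.offer(payload);
  }

  @Override
  public void close() {
    worker.interrupt();
  }
}
```

A bounded queue is used here so that a slow endpoint leads to rejected payloads rather than unbounded memory growth; whether dropping events is acceptable is a design decision this sketch does not settle.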

> Enabling webhook is synchronous and could cause longer leader reelection cycle
> --
>
>         Key: AURORA-1769
>         URL: https://issues.apache.org/jira/browse/AURORA-1769
>     Project: Aurora
>  Issue Type: Bug
>    Reporter: Dmitriy Shirchenko
>    Assignee: Dmitriy Shirchenko
>
> We had an issue where, on scheduler leader reelection, the EventBus was full 
> of TaskStateChange events. This prevented the scheduler from posting the 
> DriverRegistered() message, so the Aurora scheduler did not register within 
> 1 minute.





[jira] [Commented] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware

2016-09-12 Thread Joshua Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484248#comment-15484248
 ] 

Joshua Cohen commented on AURORA-1768:
--

It may be enough to just enter the executor's namespace. The task's filesystem 
mount is visible there. If we want to enter the process namespace we can get 
the pid from the checkpoint, but it's more complicated in that we need to 
somehow figure out *which* process's namespaces to enter.

> Command `aurora task ssh` is not namespace and taskfs aware 
> 
>
>         Key: AURORA-1768
>         URL: https://issues.apache.org/jira/browse/AURORA-1768
>     Project: Aurora
>  Issue Type: Story
>  Components: Thermos
>    Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in 
> production environments, we should make sure commands executed via `aurora 
> ssh` are isolated in the same way as the tasks themselves. This implies 
> that we have to use the same container filesystem and enter the same 
> namespaces.





[jira] [Comment Edited] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware

2016-09-12 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483814#comment-15483814
 ] 

Stephan Erb edited comment on AURORA-1768 at 9/12/16 11:08 AM:
---

That's tough. The PID of the executor is available in 
{{/var/lib/mesos/meta/slaves/latest/frameworks/\*/executors/\*/runs/latest/pids/forked.pid}}
 but as the executor runs outside of the mount namespace this might not work.


was (Author: stephanerb):
That's tough. The PID of the executor is available in 
{{/var/lib/mesos/meta/slaves/latest/frameworks/*/executors/*/runs/latest/pids/forked.pid}}
 but as the executor runs outside of the mount namespace this might not work.

> Command `aurora task ssh` is not namespace and taskfs aware 
> 
>
>         Key: AURORA-1768
>         URL: https://issues.apache.org/jira/browse/AURORA-1768
>     Project: Aurora
>  Issue Type: Story
>  Components: Thermos
>    Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in 
> production environments, we should make sure commands executed via `aurora 
> ssh` are isolated in the same way as the tasks themselves. This implies 
> that we have to use the same container filesystem and enter the same 
> namespaces.





[jira] [Commented] (AURORA-1768) Command `aurora task ssh` is not namespace and taskfs aware

2016-09-12 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483814#comment-15483814
 ] 

Stephan Erb commented on AURORA-1768:
-

That's tough. The PID of the executor is available in 
{{/var/lib/mesos/meta/slaves/latest/frameworks/*/executors/*/runs/latest/pids/forked.pid}}
 but as the executor runs outside of the mount namespace this might not work.

> Command `aurora task ssh` is not namespace and taskfs aware 
> 
>
>         Key: AURORA-1768
>         URL: https://issues.apache.org/jira/browse/AURORA-1768
>     Project: Aurora
>  Issue Type: Story
>  Components: Thermos
>    Reporter: Stephan Erb
>
> In order to guarantee isolation among tasks and to simplify debugging in 
> production environments, we should make sure commands executed via `aurora 
> ssh` are isolated in the same way as the tasks themselves. This implies 
> that we have to use the same container filesystem and enter the same 
> namespaces.


