Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206216
---


Ship it!




Master (efe8656) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 18, 2018, 8:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 8:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham


> On July 18, 2018, 2:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > 
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a misshap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.
> 
> Santhosh Kumar Shanmugham wrote:
> Hmm. Then we can argue that failure to create sandbox or fork the process 
> etc also should be treated as TASK_LOST? At Twitter this is really not going 
> to help us, since we have platform wrapper that cause TASK_FAILED and it is 
> already hard to differentiate user configuration failures against platform 
> dependency failures.
> 
> I wanted to keep this consistent with the rest. If TASK_LOST makes more 
> sense for you I can update it.
> 
> Stephan Erb wrote:
> You make a good point. Let's keep it at FAILED. If really necessary, I 
> could always come back later with a more complete proposal.

I think the differentiation of user vs platform failure needs a whole lot of 
clean up in the executor codebase. We have been putting this work off for 
sometime but we are starting to realize that we need this data to make better 
decisions.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---


On July 18, 2018, 1:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 1:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206214
---


Ship it!




Ship It!

- Stephan Erb


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb


> On July 18, 2018, 11:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > 
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a misshap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.
> 
> Santhosh Kumar Shanmugham wrote:
> Hmm. Then we can argue that failure to create sandbox or fork the process 
> etc also should be treated as TASK_LOST? At Twitter this is really not going 
> to help us, since we have platform wrapper that cause TASK_FAILED and it is 
> already hard to differentiate user configuration failures against platform 
> dependency failures.
> 
> I wanted to keep this consistent with the rest. If TASK_LOST makes more 
> sense for you I can update it.

You make a good point. Let's keep it at FAILED. If really necessary, I could 
always come back later with a more complete proposal.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham


> On July 18, 2018, 2:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > 
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a misshap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.

Hmm. Then we can argue that failure to create sandbox or fork the process etc 
also should be treated as TASK_LOST? At Twitter this is really not going to 
help us, since we have platform wrapper that cause TASK_FAILED and it is 
already hard to differentiate user configuration failures against platform 
dependency failures.

I wanted to keep this consistent with the rest. If TASK_LOST makes more sense 
for you I can update it.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---


On July 18, 2018, 1:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 1:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---




src/main/python/apache/aurora/executor/aurora_executor.py
Lines 159 (patched)


Should we use TASK_LOST here instead? Most users interpret TASK_FAILED as 
their responsibility whereas TASK_LOST is more of a misshap of 
Aurora/Mesos/Thermos. I would think an unknown exception in the runner is part 
of the latter category.


- Stephan Erb


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206206
---


Ship it!




Ship It!

- Jordan Ly


On July 18, 2018, 8:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 8:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206205
---


Ship it!




Ship It!

- David McLaughlin


On July 18, 2018, 8:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 8:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/
---

Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
Stephan Erb.


Repository: aurora


Description
---

If the ThermoTaskRunner encounters an Exception when trying to
fork the process, it bubbles this up to the Executor which does
not handle execptions other than TaskError. This leads to the
executor leaving the task in STARTING state and we end up with
tasks that get stranded in this state.

Fix it so that any unknown expection that is thrown when starting
a runner leads to task failure and get marked as FAILED.


Diffs
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
8a9958fffc2312686dccc7daf6d216631d4c956e 
  src/test/python/apache/aurora/executor/test_thermos_executor.py 
f6ae1be5d56bfd845bd09db67efa92091136 


Diff: https://reviews.apache.org/r/67967/diff/1/


Testing
---

./gradlew test
./pants test src/test/python/apache::


Thanks,

Santhosh Kumar Shanmugham