Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Ben Mahler


> On Dec. 10, 2015, 2:35 a.m., Artem Harutyunyan wrote:
> > src/health-check/main.cpp, line 120
> >
> > Do we need to create a JIRA to eventually get rid of the hack?

Good idea. I filed MESOS-4111 and will reference it in a TODO. I will also add
a reference next to the sleep in the command executor.


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109667
---


On Dec. 10, 2015, 2:01 a.m., Ben Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41178/
> ---
> 
> (Updated Dec. 10, 2015, 2:01 a.m.)
> 
> 
> Review request for mesos, Artem Harutyunyan and Timothy Chen.
> 
> 
> Bugs: MESOS-1613 and MESOS-4106
> https://issues.apache.org/jira/browse/MESOS-1613
> https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Much like in the command executor, we need to sleep after we send
> the final message in the health checker. Otherwise, we may exit
> before libprocess is able to finish sending the message over the
> local network.
> 
> This led to the following issues:
> https://issues.apache.org/jira/browse/MESOS-1613
> https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Diffs
> ---
> 
>   src/health-check/main.cpp 83ee38cd853325b3adc7cb6bc2d1d67b343037f5 
>   src/tests/health_check_tests.cpp b1454b085b36bb7c4d8ef012c764cd8466b4fb02 
> 
> Diff: https://reviews.apache.org/r/41178/diff/
> 
> 
> Testing
> ---
> 
> Running the `HealthCheckTest.DISABLED_ConsecutiveFailures` test in repetition 
> on a machine loaded with many `openssl speed` commands in the background 
> reproduces the flakiness. After this patch it is no longer flaky in this 
> setup.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
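
The workaround described in the request above (sleep after sending the final message so the asynchronous transport has time to flush it before the process exits) can be sketched roughly as follows. This is not the code from the patch: `AsyncSender` and `sendFinalMessageAndLinger` are hypothetical names, and the background thread only stands in for libprocess delivering a message after `send()` has already returned.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an asynchronous transport such as libprocess:
// send() returns immediately and delivery happens later on another thread.
// For simplicity, send() may be called only once per instance.
struct AsyncSender {
  std::atomic<bool> delivered{false};
  std::thread worker;

  void send(std::chrono::milliseconds latency) {
    worker = std::thread([this, latency]() {
      std::this_thread::sleep_for(latency);  // Simulated delivery delay.
      delivered = true;
    });
  }

  ~AsyncSender() {
    if (worker.joinable()) {
      worker.join();
    }
  }
};

// Sends the final message, then lingers before the caller exits.
// Illustrative TODO (the thread says MESOS-4111 tracks removing the hack):
// replace the sleep with a way to wait until the message is flushed.
bool sendFinalMessageAndLinger(
    AsyncSender& sender,
    std::chrono::milliseconds latency,
    std::chrono::milliseconds linger) {
  assert(linger.count() >= 0);
  sender.send(latency);
  std::this_thread::sleep_for(linger);  // The "hack": give delivery time.
  return sender.delivered.load();       // True if delivery finished in time.
}
```

A caller would invoke `sendFinalMessageAndLinger` immediately before exiting. The sleep only makes a dropped message less likely; it cannot guarantee delivery, which is presumably why the reviewers call it a hack.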



Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Artem Harutyunyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109668
---

Ship It!

- Artem Harutyunyan





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Artem Harutyunyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109667
---



src/health-check/main.cpp (line 120)


Do we need to create a JIRA to eventually get rid of the hack?


- Artem Harutyunyan





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Ben Mahler


> On Dec. 10, 2015, 2:10 a.m., Neil Conway wrote:
> > src/tests/health_check_tests.cpp, line 633
> > 
> >
> > Comment needs updating.

Thanks for catching this!


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109664
---





Re: Review Request 41178: Fixed a message dropping bug in the health checker.

2015-12-09 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109664
---



src/tests/health_check_tests.cpp (line 633)


Comment needs updating.


- Neil Conway

