> On Feb. 17, 2015, 5:58 a.m., Ben Mahler wrote:
> > 3rdparty/libprocess/src/process.cpp, line 2118
> > <https://reviews.apache.org/r/30587/diff/1/?file=846771#file846771line2118>
> >
> >     Hm.. it looks like `make check` with `GLOG_v=1` generates a ton of 
> > noise from these messages, is this expected?
> >     
> >     If this is to be expected, do we want to bump up the level?

The metrics ones seem to come from overall libprocess shutdown, after the 
tests have completed. It looks like we shut down metrics __before__ we stop 
sending messages to it (probably not overly harmful, but not strictly 
correct). The fact that code is still running after all the tests have 
"completed" is worrying to me... (Relevant log chunks below.)

The waiter ones are generated by some of the libprocess tests, but they don't 
show up in the test that actually causes the code to fire (again worrying...). 
A WaitWaiter process is spawned when you do a process::wait() with a duration. 
That WaitWaiter sets its own timeout function via delay(). If the wait 
completes before the timeout, wait() returns, the WaitWaiter destructs, and 
the delay it set is never cleaned up. The delayed message then fires 
eventually, once other tests have advanced the clock far enough.

I believe both of these are programmer errors: failing to clean up properly.

We could fix this case by case (sample patch for the WaitWaiter):
```
diff --git a/3rdparty/libprocess/src/process.cpp b/3rdparty/libprocess/src/process.cpp
index d7fcba90bbe990236a45a76b8f39285a5097861c..f60f21c7b197b30f812fb923616be8693ee5365c 100644
--- a/3rdparty/libprocess/src/process.cpp
+++ b/3rdparty/libprocess/src/process.cpp
@@ -2915,7 +2915,7 @@ public:
   {
     VLOG(3) << "Running waiter process for " << pid;
     link(pid);
-    delay(duration, self(), &WaitWaiter::timeout);
+    timer = delay(duration, self(), &WaitWaiter::timeout);
   }
 
 private:
@@ -2923,6 +2923,7 @@ private:
   {
     VLOG(3) << "Waiter process waited for " << pid;
     *waited = true;
+    Clock::cancel(timer.get());
     terminate(self());
   }
 
@@ -2937,6 +2938,7 @@ private:
   const UPID pid;
   const Duration duration;
   bool* const waited;
+  Option<Timer> timer;
 };

```

What I think would be better, though, is to remove all of a Process's / 
ProcessBase's outstanding timers when it is destroyed / de-registered. That 
prevents future programmer mistakes and also eliminates a possible cause of 
subtle bugs inside Mesos processes: spawn a process, set a delayed message 
for it, kill that process, then start a new one with the same name. The new 
process will receive the message once the timeout is reached. Reusing process 
names happens fairly frequently in the tests.

The relevant log bits (generated with GLOG_v=3):

**Metrics log chunk:**
```
[==========] 1 test from 1 test case ran. (103 ms total)
[  PASSED  ] 1 test.

  YOU HAVE 3 DISABLED TESTS

I0227 12:29:24.962259  6723 clock.cpp:295] Clock paused at 0xeb0400
I0227 12:29:24.963186  6732 clock.cpp:135] Handling timers up to 2015-02-27 
20:29:30.962255872+00:00
I0227 12:29:24.963199  6732 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:29:30.959864832+00:00
I0227 12:29:24.963220  6732 clock.cpp:180] Clock has settled
I0227 12:29:24.963242  6725 process.cpp:2161] Resuming 
reaper(1)@127.0.0.1:53591 at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.963263  6725 clock.cpp:243] Created a timer for 
reaper(1)@127.0.0.1:53591 in 100ms in the future (2015-02-27 
20:29:31.062255872+00:00)
I0227 12:29:24.972337  6723 clock.cpp:429] Clock is settled
I0227 12:29:24.972363  6723 process.cpp:2466] Donating thread to 
[email protected]:53591 while waiting
I0227 12:29:24.972375  6723 process.cpp:2161] Resuming [email protected]:53591 
at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972388  6723 process.cpp:2268] Cleaning up [email protected]:53591
I0227 12:29:24.972437  6723 process.cpp:2466] Donating thread to 
__limiter__(1)@127.0.0.1:53591 while waiting
I0227 12:29:24.972446  6723 process.cpp:2161] Resuming 
__limiter__(1)@127.0.0.1:53591 at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972457  6723 process.cpp:2268] Cleaning up 
__limiter__(1)@127.0.0.1:53591
I0227 12:29:24.972487  6726 process.cpp:2161] Resuming 
[email protected]:53591 at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972506  6726 process.cpp:2268] Cleaning up 
[email protected]:53591
I0227 12:29:24.972527  6723 process.cpp:2466] Donating thread to 
[email protected]:53591 while waiting
I0227 12:29:24.972535  6723 process.cpp:2161] Resuming [email protected]:53591 at 
2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972545  6723 process.cpp:2268] Cleaning up [email protected]:53591
I0227 12:29:24.972571  6731 process.cpp:2161] Resuming [email protected]:53591 
at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972590  6731 process.cpp:2268] Cleaning up 
[email protected]:53591
I0227 12:29:24.972611  6723 process.cpp:2466] Donating thread to 
[email protected]:53591 while waiting
I0227 12:29:24.972620  6723 process.cpp:2161] Resuming [email protected]:53591 
at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972630  6723 process.cpp:2268] Cleaning up 
[email protected]:53591
I0227 12:29:24.972651  6726 process.cpp:2161] Resuming [email protected]:53591 
at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972662  6726 process.cpp:2268] Cleaning up 
[email protected]:53591
I0227 12:29:24.972677  6729 process.cpp:2161] Resuming 
reaper(1)@127.0.0.1:53591 at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972688  6729 process.cpp:2268] Cleaning up 
reaper(1)@127.0.0.1:53591
I0227 12:29:24.972705  6723 process.cpp:2466] Donating thread to 
[email protected]:53591 while waiting
I0227 12:29:24.972714  6723 process.cpp:2161] Resuming [email protected]:53591 
at 2015-02-27 20:29:30.962255872+00:00
I0227 12:29:24.972730  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972743  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972755  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972767  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972779  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972790  6723 process.cpp:2118] Dropped / Lost event for PID: 
[email protected]:53591
I0227 12:29:24.972800  6723 process.cpp:2268] Cleaning up [email protected]:53591
make[5]: Leaving directory '/home/cody/projects/mesos/build/3rdparty/libprocess'
```

**__waiter__ log chunks:**
```
[----------] 7 tests from HTTP
[ RUN      ] HTTP.auth
I0227 12:40:42.212824  8256 process.cpp:2161] Resuming [email protected]:44132 at 
2015-02-27 20:40:42.212829184+00:00
I0227 12:40:42.212836  8257 process.cpp:2161] Resuming (1)@127.0.0.1:44132 at 
2015-02-27 20:40:42.212840192+00:00
I0227 12:40:42.212839  8253 process.cpp:2151] Spawned process 
(1)@127.0.0.1:44132
I0227 12:40:42.212970  8253 process.cpp:2151] Spawned process 
__latch__(1)@127.0.0.1:44132
I0227 12:40:42.212975  8261 process.cpp:2161] Resuming [email protected]:44132 
at 2015-02-27 20:40:42.212979200+00:00
I0227 12:40:42.212995  8253 process.cpp:2151] Spawned process 
__waiter__(1)@127.0.0.1:44132
I0227 12:40:42.212976  8260 process.cpp:2161] Resuming 
__latch__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.212996096+00:00
I0227 12:40:42.212996  8256 process.cpp:2161] Resuming 
__waiter__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.212999168+00:00
I0227 12:40:42.213027  8256 process.cpp:2916] Running waiter process for 
__latch__(1)@127.0.0.1:44132
I0227 12:40:42.213052  8256 clock.cpp:243] Created a timer for 
__waiter__(1)@127.0.0.1:44132 in 15secs in the future (2015-02-27 
20:40:57.213051136+00:00)
I0227 12:40:42.213198  8254 process.cpp:2161] Resuming (1)@127.0.0.1:44132 at 
2015-02-27 20:40:42.213201920+00:00
I0227 12:40:42.213217  8254 process.cpp:2771] Handling HTTP event for process 
'(1)' with path: '/(1)/auth'
I0227 12:40:42.213254  8254 process.cpp:2151] Spawned process 
__http__(1)@127.0.0.1:44132
I0227 12:40:42.213258  8258 process.cpp:2161] Resuming [email protected]:44132 
at 2015-02-27 20:40:42.213270784+00:00
I0227 12:40:42.213290  8259 process.cpp:2161] Resuming 
__http__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.213294080+00:00
I0227 12:40:42.213408  8257 process.cpp:2161] Resuming 
__http__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.213412096+00:00
I0227 12:40:42.213420  8257 process.cpp:2268] Cleaning up 
__http__(1)@127.0.0.1:44132
I0227 12:40:42.213443  8255 process.cpp:2161] Resuming [email protected]:44132 
at 2015-02-27 20:40:42.213448960+00:00
I0227 12:40:42.213508  8257 process.cpp:2161] Resuming 
__latch__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.213510912+00:00
I0227 12:40:42.213518  8257 process.cpp:2268] Cleaning up 
__latch__(1)@127.0.0.1:44132
I0227 12:40:42.213531  8258 process.cpp:2161] Resuming 
__waiter__(1)@127.0.0.1:44132 at 2015-02-27 20:40:42.213534976+00:00
I0227 12:40:42.213533  8261 process.cpp:2161] Resuming [email protected]:44132 
at 2015-02-27 20:40:42.213537024+00:00
I0227 12:40:42.213546  8258 process.cpp:2924] Waiter process waited for 
__latch__(1)@127.0.0.1:44132
I0227 12:40:42.213553  8258 process.cpp:2268] Cleaning up 
__waiter__(1)@127.0.0.1:44132

...
[       OK ] HTTP.auth (2 ms)

...

[ RUN      ] Metrics.SnapshotTimeout

...
I0227 12:40:42.484825  8253 clock.cpp:429] Clock is settled
I0227 12:40:42.484838  8253 clock.cpp:345] Clock advanced (1secs) to 0x2599420
I0227 12:40:42.484966  8262 clock.cpp:135] Handling timers up to 2015-02-27 
20:40:57.446742048+00:00
I0227 12:40:42.484983  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.213051136+00:00
I0227 12:40:42.484994  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.213692928+00:00
I0227 12:40:42.485003  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.214238976+00:00
I0227 12:40:42.485012  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.214799872+00:00
I0227 12:40:42.485019  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.214999808+00:00
I0227 12:40:42.485028  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.215301888+00:00
I0227 12:40:42.485035  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.215611904+00:00
I0227 12:40:42.485046  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.216196096+00:00
I0227 12:40:42.485055  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.216718080+00:00
I0227 12:40:42.485062  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.217353984+00:00
I0227 12:40:42.485070  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.217850880+00:00
I0227 12:40:42.485077  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.219537920+00:00
I0227 12:40:42.485088  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.221196800+00:00
I0227 12:40:42.485096  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.228478976+00:00
I0227 12:40:42.485103  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.334412064+00:00
I0227 12:40:42.485110  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.334772000+00:00
I0227 12:40:42.485118  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.334937120+00:00
I0227 12:40:42.485126  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.335040032+00:00
I0227 12:40:42.485133  8262 clock.cpp:142] Have timeout(s) at 2015-02-27 
20:40:57.335128864+00:00
I0227 12:40:42.485159  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(1)@127.0.0.1:44132
I0227 12:40:42.485170  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(2)@127.0.0.1:44132
I0227 12:40:42.485179  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(3)@127.0.0.1:44132
I0227 12:40:42.485188  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(4)@127.0.0.1:44132
I0227 12:40:42.485198  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(5)@127.0.0.1:44132
I0227 12:40:42.485205  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(6)@127.0.0.1:44132
I0227 12:40:42.485214  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(7)@127.0.0.1:44132
I0227 12:40:42.485224  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(8)@127.0.0.1:44132
I0227 12:40:42.485231  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(9)@127.0.0.1:44132
I0227 12:40:42.485241  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(10)@127.0.0.1:44132
I0227 12:40:42.485250  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(11)@127.0.0.1:44132
I0227 12:40:42.485260  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(12)@127.0.0.1:44132
I0227 12:40:42.485268  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(13)@127.0.0.1:44132
I0227 12:40:42.485278  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(14)@127.0.0.1:44132
I0227 12:40:42.485286  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(15)@127.0.0.1:44132
I0227 12:40:42.485297  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(16)@127.0.0.1:44132
I0227 12:40:42.485306  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(17)@127.0.0.1:44132
I0227 12:40:42.485314  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(18)@127.0.0.1:44132
I0227 12:40:42.485324  8262 process.cpp:2118] Dropped / Lost event for PID: 
__waiter__(19)@127.0.0.1:44132
I0227 12:40:42.485333  8262 clock.cpp:180] Clock has settled
```


- Cody


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30587/#review72676
-----------------------------------------------------------


On Feb. 4, 2015, 12:06 a.m., Cody Maloney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30587/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 12:06 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-1806
>     https://issues.apache.org/jira/browse/MESOS-1806
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Log when we drop a message since we can't find the local recipient.
> Include details of what was lost in a send failure.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp 
> 67b6b3b9c13d95fa1a24b48a12c5c831c7f249bf 
> 
> Diff: https://reviews.apache.org/r/30587/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Cody Maloney
> 
>
