> On March 25, 2014, 1:59 a.m., Ben Mahler wrote:
> > 3rdparty/libprocess/include/process/metrics/gauge.hpp, line 24
> > <https://reviews.apache.org/r/18718/diff/11/?file=534292#file534292line24>
> >
> >     We really should not be blocking implicitly when we call Gauge::get.
> >     
> >     I see two paths forward here:
> >     
> >     1. Metric::get returns a Future<double> and in any code you write that 
> > reads metrics, it will need to collect() the values from the metrics.
> >     
> >     2. Alternatively, split the logic for fetching data and reading fetched 
> > data. Ignore naming for a second, something like 'Future<double> 
> > Metric::fetch' for fetching and caching the data asynchronously, and 
> > 'double Metric::fetched' for reading the cached fetched data synchronously.
> >     
> >     I only mention 2 because it may be a bit too cumbersome to do reading 
> > if we go with 1, because you will need to build up a bit of structure 
> > (mapping of context,name -> Future<double>) in your code to pass to the 
> > continuation function so that it may read the data easily.
> >     
> >     With 2 however, you could simply collect all the fetch() calls and have 
> > the continuation read through the fetched() data.
> >     
> >     But, I don't think the code would be too complex for 1 so I would 
> > recommend going that route so that we can keep the API as clean as possible 
> > here.

Let me give a little more motivation as to why we don't block like this, since 
the pitfalls are not spelled out clearly anywhere.

When you block a 'Process', the corresponding libprocess worker thread will 
block.
There are a limited number of worker threads, say N, where N ~ number of 
logical cores on the machine.
Now, if you have M > N Processes running, and N of these end up in a blocked 
state, then libprocess is deadlocked.

In the case of Gauge, this is especially dangerous since the blocking is 
implicitly done so it's not clear to the caller that the call to get() is a 
blocking operation.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18718/#review38400
-----------------------------------------------------------


On March 24, 2014, 7:51 p.m., Dominic Hamon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18718/
> -----------------------------------------------------------
> 
> (Updated March 24, 2014, 7:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Bugs: MESOS-1036
>     https://issues.apache.org/jira/browse/MESOS-1036
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/Makefile.am 3c6219eb6e76306463b3710ab7e50ec8b75d3d76 
>   3rdparty/libprocess/include/process/metrics/counter.hpp PRE-CREATION 
>   3rdparty/libprocess/include/process/metrics/gauge.hpp PRE-CREATION 
>   3rdparty/libprocess/include/process/metrics/metric.hpp PRE-CREATION 
>   3rdparty/libprocess/src/metrics/metric.cpp PRE-CREATION 
>   3rdparty/libprocess/src/metrics/metrics.hpp PRE-CREATION 
>   3rdparty/libprocess/src/metrics/metrics.cpp PRE-CREATION 
>   3rdparty/libprocess/src/process.cpp 
> 6c6acc03ad78779e8f25b1a4584069fd377f647b 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18718/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests. make check.
> 
> 
> Thanks,
> 
> Dominic Hamon
> 
>

Reply via email to