Ian Downes created MESOS-1199:
---------------------------------

             Summary: Subprocess is "slow" -> gated by process::reap poll interval
                 Key: MESOS-1199
                 URL: https://issues.apache.org/jira/browse/MESOS-1199
             Project: Mesos
          Issue Type: Improvement
    Affects Versions: 0.18.0
            Reporter: Ian Downes


Subprocess uses process::reap to wait on the subprocess pid and set the exit 
status. However, process::reap polls at a one-second interval, resulting in a 
delay of up to one interval before the status future is set.

This means if you need to wait for the subprocess to complete you get hit with 
E(delay) = 0.5 seconds, independent of the execution time. For example, the 
MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the executor 
during launch. At Twitter we fetch a local file, i.e., a very fast operation, 
but the launch is blocked until the mesos-fetcher pid is reaped -> adding 0 to 
1 seconds for every launch!
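
The expected delay can be checked with a small model. This is a rough sketch in Python, not libprocess code. It assumes the reaper polls at fixed ticks 0, 1, 2, ... seconds and that the subprocess is equally likely to exit at any point within a tick. The names `reap_delay`, `mean_delay` and `POLL_INTERVAL` are made up for illustration and are not Mesos identifiers.

```python
import math

POLL_INTERVAL = 1.0  # seconds; the one-second reap interval described above


def reap_delay(exit_time, interval=POLL_INTERVAL):
    """Time between the process exiting and the next poll tick noticing it."""
    next_tick = math.ceil(exit_time / interval) * interval
    return next_tick - exit_time


def mean_delay(samples=1000, interval=POLL_INTERVAL):
    """Average the delay over exit times spread evenly across one interval."""
    exits = [(i + 0.5) * interval / samples for i in range(samples)]
    return sum(reap_delay(t, interval) for t in exits) / samples
```

Under these assumptions `mean_delay()` comes out at half the poll interval, regardless of how long the subprocess itself ran.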

The problem is even worse with a chain of short Subprocesses because after the 
first Subprocess completes you'll be synchronized with the reap interval and 
you'll see nearly the full interval before notification, i.e., 10 Subprocesses 
each of << 1 second duration will take ~10 seconds!
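
The chaining effect can be modelled the same way. This is a hedged sketch in Python, not Mesos code. It assumes poll ticks at 0, 1, 2, ... seconds, that the chain starts on a tick, and that each subprocess starts as soon as the previous exit is observed. `chain_wall_time` is a hypothetical helper name.

```python
import math


def chain_wall_time(durations, interval=1.0):
    """Wall-clock time to run subprocesses back to back when each exit is
    only noticed at the next poll tick (ticks at 0, interval, 2*interval, ...)."""
    now = 0.0
    for d in durations:
        exit_time = now + d
        # The next subprocess can start only once a poll tick observes the exit.
        now = math.ceil(exit_time / interval) * interval
    return now


# Ten subprocesses of 10 ms each: 0.1 s of real work in total.
total = chain_wall_time([0.01] * 10)
```

In this model every subprocess starts right after a tick, so each one pays almost the full interval and the chain's wall time is dominated by the number of subprocesses rather than by their run time.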

This has become particularly apparent in some new tests I'm working on where 
test durations are now greatly extended with each taking several seconds.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
