[jira] [Commented] (MESOS-1199) Subprocess is "slow" -> gated by process::reap poll interval

Craig Hansen-Sturm (JIRA) Mon, 04 Aug 2014 11:08:46 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085001#comment-14085001
 ]


Craig Hansen-Sturm commented on MESOS-1199:
-------------------------------------------

Hi Benjamin,
I think developing an adaptive algorithm is a fine idea; however, what you are 
suggesting safely bounds the system load at the cost of increasing 
notification latency, i.e., you are suggesting we make a classic interrupt vs. 
latency tradeoff.
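Benjamin's exact proposal isn't quoted in this thread, so here is a minimal sketch that assumes "adaptive" means exponential backoff of the poll interval. It is a pure simulation in integer milliseconds, not Mesos code. The function poll_until_exit and its bounds min_ms, max_ms, and factor are hypothetical names chosen for illustration.

```python
def poll_until_exit(exit_ms, min_ms=10, max_ms=1000, factor=2):
    """Simulate reaping a child that exits exit_ms after launch (hypothetical model).

    The poll interval starts at min_ms and is multiplied by `factor` after
    every poll that finds the child still running, capped at max_ms.
    Setting min_ms == max_ms gives a plain fixed-interval poller.
    Returns (notification_delay_ms, number_of_polls).
    """
    now = 0
    interval = min_ms
    polls = 0
    while True:
        now += interval
        polls += 1
        if now >= exit_ms:  # this poll observes the exited child
            return now - exit_ms, polls
        interval = min(interval * factor, max_ms)  # back off, but never past max_ms


for label, lo, hi in [("fixed 1 s", 1000, 1000),
                      ("fixed 10 ms", 10, 10),
                      ("adaptive 10 ms..1 s", 10, 1000)]:
    short = poll_until_exit(5, lo, hi)     # a fast child, e.g. fetching a local file
    long_ = poll_until_exit(5000, lo, hi)  # a child that runs for 5 s
    print(f"{label:>20}: 5 ms child -> {short}, 5 s child -> {long_}")
```

With these parameters a 5 ms child is noticed after 995 ms (fixed 1 s) or 5 ms (fixed 10 ms and adaptive). A 5 s child costs 5 polls (fixed 1 s), 500 polls (fixed 10 ms), or 11 polls with a 270 ms delay (adaptive). Backoff bounds the number of polls, but a long-running child can wait up to max_ms before it is noticed. That is the load vs. latency tradeoff described above.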



> Subprocess is "slow" -> gated by process::reap poll interval
> ------------------------------------------------------------
>
>                 Key: MESOS-1199
>                 URL: https://issues.apache.org/jira/browse/MESOS-1199
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.0
>            Reporter: Ian Downes
>            Assignee: Craig Hansen-Sturm
>         Attachments: wiatpid.pdf
>
>
> Subprocess uses process::reap to wait on the subprocess pid and set the exit 
> status. However, process::reap polls with a one second interval resulting in 
> a delay up to the interval duration before the status future is set.
> This means that if you need to wait for the subprocess to complete, you get 
> hit with E(delay) = 0.5 seconds, independent of the execution time. For example, 
> the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the 
> executor during launch. At Twitter we fetch a local file, i.e., a very fast 
> operation, but the launch is blocked until the mesos-fetcher pid is reaped -> 
> adding 0 to 1 seconds for every launch!
> The problem is even worse with a chain of short Subprocesses because after 
> the first Subprocess completes you'll be synchronized with the reap interval 
> and you'll see nearly the full interval before notification, i.e., 10 
> Subprocesses each of << 1 second duration will take ~10 seconds!
> This has become particularly apparent in some new tests I'm working on, where 
> test durations are now greatly extended, with each test taking several seconds.
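The arithmetic in the issue can be checked with a small simulation. This is a sketch under simplifying assumptions, not Mesos code: the poller is assumed to tick at fixed multiples of a 1-second interval starting at time 0, and times are integer milliseconds. The names next_tick, reap_delay, and chain_ms are hypothetical.

```python
POLL_MS = 1000  # reap poll interval from the issue: one second


def next_tick(t_ms, interval_ms=POLL_MS):
    """First poll tick at or after t_ms; ticks fall at 0, interval, 2*interval, ..."""
    return -(-t_ms // interval_ms) * interval_ms  # integer ceiling division


def reap_delay(exit_ms, interval_ms=POLL_MS):
    """How long an exited child waits before a fixed-interval poller notices it."""
    return next_tick(exit_ms, interval_ms) - exit_ms


def chain_ms(n, run_ms, interval_ms=POLL_MS):
    """Wall time for n subprocesses run back to back, each started only after
    the previous one has been reaped."""
    t = 0
    for _ in range(n):
        t = next_tick(t + run_ms, interval_ms)
    return t


# Exits spread evenly over one interval: the mean delay is about half the interval.
mean_delay = sum(reap_delay(ms) for ms in range(1, POLL_MS + 1)) / POLL_MS
# Ten 10 ms subprocesses: after the first reap every launch starts on a tick,
# so each one waits nearly the full interval.
total = chain_ms(10, 10)
print(mean_delay, total)
```

On this 1 ms grid the mean delay comes out at 499.5 ms, matching E(delay) of about 0.5 seconds. The chain of ten 10 ms subprocesses takes 10,000 ms, because each one finishes 10 ms after a tick and then waits 990 ms for the next one.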



--
This message was sent by Atlassian JIRA
(v6.2#6252)
