[ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085001#comment-14085001 ]
Craig Hansen-Sturm commented on MESOS-1199:
-------------------------------------------

Hi Benjamin,

I think developing an adaptive algorithm is a fine idea; however, what you are suggesting safely bounds the system load at the cost of increasing notification latency, i.e., you are suggesting we make a classic interrupt vs. latency tradeoff.

> Subprocess is "slow" -> gated by process::reap poll interval
> ------------------------------------------------------------
>
>                 Key: MESOS-1199
>                 URL: https://issues.apache.org/jira/browse/MESOS-1199
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.0
>            Reporter: Ian Downes
>            Assignee: Craig Hansen-Sturm
>         Attachments: wiatpid.pdf
>
>
> Subprocess uses process::reap to wait on the subprocess pid and set the exit
> status. However, process::reap polls with a one-second interval, resulting in
> a delay of up to the interval duration before the status future is set.
> This means that if you need to wait for the subprocess to complete, you get hit
> with E(delay) = 0.5 seconds, independent of the execution time. For example,
> the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the
> executor during launch. At Twitter we fetch a local file, i.e., a very fast
> operation, but the launch is blocked until the mesos-fetcher pid is reaped ->
> adding 0 to 1 seconds for every launch!
> The problem is even worse with a chain of short Subprocesses because after
> the first Subprocess completes you'll be synchronized with the reap interval
> and you'll see nearly the full interval before notification, i.e., 10
> Subprocesses each of << 1 second duration will take ~10 seconds!
> This has become particularly apparent in some new tests I'm working on, where
> test durations are now greatly extended, with each taking several seconds.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
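
The delay arithmetic in the issue description can be checked with a small simulation. This is only a sketch of the timing model, not Mesos code: it assumes a poller that ticks at exact multiples of the interval and reports an exit at the first tick at or after it. The names `notify_time`, `chain_duration`, and `mean_delay` are hypothetical helpers introduced for illustration.

```python
import math


def notify_time(finish, interval=1.0):
    """First poll tick (at 0, interval, 2*interval, ...) at or after `finish`.

    Assumption: an exit that lands exactly on a tick is observed by that tick.
    """
    return math.ceil(finish / interval) * interval


def chain_duration(durations, interval=1.0):
    """Total wall time for subprocesses run back to back.

    Each subprocess starts only once the previous exit has been noticed,
    so every start after the first is aligned with a poll tick.
    """
    t = 0.0
    for d in durations:
        t = notify_time(t + d, interval)
    return t


def mean_delay(samples=1000, interval=1.0):
    """Average notification delay for exits spread evenly over one interval."""
    total = 0.0
    for k in range(samples):
        finish = (k + 0.5) / samples * interval
        total += notify_time(finish, interval) - finish
    return total / samples


if __name__ == "__main__":
    print(mean_delay())                  # close to half the poll interval
    print(chain_duration([0.01] * 10))   # ten 10 ms subprocesses in a chain
```

Under these assumptions the average delay for a single exit comes out at about half the interval, and a chain of ten 10 ms subprocesses takes ten full intervals, which matches the figures quoted in the issue.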