> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 1135
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line1135>
> >
> >     If we do the above and if we have MESOS-995, can you just leverage 
> > subprocess here?
> >

Almost, yes. Let me quote from the header:
  // TODO(tillt): Remove and use process::subprocess once that has
  // become flexible enough to support all features needed:
  // - in-child-context commands (chdir) after fork and before exec.
  // - direct execution without the invocation of /bin/sh.
  // - external ownership of pipe file descriptors.

Additionally, we are missing a setsid within the child-context to prevent a 
slave suicide once the external containerizer process gets terminated.

For the reasoning:
chdir() is clear and would be covered by MESOS-995, which is great.

Direct execution is needed as we currently are depending on proper pipe close 
escalation. If /bin/sh is used to invoke and when its child process is closing 
a pipe, the shell has to close its end as well. In some tests I did, I found 
out that this is not always happening, depending on the shell in use - 
specifically dash on ubuntu appears to not do that in certain configurations.

External ownership of the pipe handles is needed as the current way of 
signaling a complete transmission of those protobufs is done via detection of 
an EOF. Subprocess assumes to have full control and does assume the pipes to be 
open until the process terminates.


Now circling back to your earlier comment on prepending the protobufs by their 
size to signal a complete transmission. That might ease the process as we could 
remove the need for "direct execution" and "external pipe ownership". So the 
only remaining reason to not use subprocess then would be the missing setsid().


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/flags.hpp, line 207
> > <https://reviews.apache.org/r/17567/diff/15/?file=531802#file531802line207>
> >
> >     s/task/task, when using external containerizer/ ?

Yes, that is better.


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 163
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line163>
> >
> >     Why VLOG?

At some point I refactored the log-levels to reduce things to a minimum (on 
INFO-level) - may have gone too far :).


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/examples/python/test_containerizer.py, lines 54-56
> > <https://reviews.apache.org/r/17567/diff/15/?file=531798#file531798line54>
> >
> >     What happens if the slave dies here before it gets a chance to 
> > checkpoint the pid? Does the executor stay up forever?

Is that still a problem if we can leverage the new logic which can clean up 
even though the executor is not in place?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/examples/python/test_containerizer.py, lines 54-56
> > <https://reviews.apache.org/r/17567/diff/15/?file=531798#file531798line54>
> >
> >     So the launch waits for the executor to exit? That seems wrong. Can it 
> > just fork the process and pass the pid back in the return protobuf?

That way, we would have to do reap two processes, the external containerizer 
script as well as the launched process (executor). Totally doable for sure, but 
I am a bit unsure how this would improve the results. Are you mostly aiming for 
the comment below?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.hpp, lines 119-120
> > <https://reviews.apache.org/r/17567/diff/15/?file=531800#file531800line119>
> >
> >     Why would the *executor* assume anything?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, lines 1125-1126
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line1125>
> >
> >     Instead of closing the file descriptor, how about just writing the 
> > length of the protobuf first and then serialized protobuf. That way the 
> > read end could know the exact size of data to read.
> >     
> >     This is how we do serialization and deserialization of protobufs to 
> > files. See protobuf::write() and protobuf::read().
> >     
> >     
> >

I like this approach. Will play with this a bit to see if there is something we 
are missing.


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, lines 284-287
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line284>
> >
> >     What is the reason for having defaults in 2 places, the slave and the 
> > external containerizer?

If no default image was supplied to the slave (via flags) and no image is 
supplied with the command, then something else has to be used. We have no 
control over that "something" and hence this log-output. Does that make sense?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 270
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line270>
> >
> >     only set HADOOP_HOME if the flag is non-empty.


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.hpp, lines 158-171
> > <https://reviews.apache.org/r/17567/diff/15/?file=531800#file531800line158>
> >
> >     Can all this be stuck into the Container/Running struct above?

That is possible but I felt this would be bit of a stretch as both are 
semantically different.

Sandbox is capturing information of a process to be launched, hence it is 
populated before the actual launch. Running/Container is capturing a 
information of a successfully launched process.

When putting both into a single structure, I would need to introduce additional 
state checks.

For example, this:
if (!containers.contains(containerId)) {
    return Failure("Container '" + containerId.value() + "' not running");
}

Needed to be changed towards something like this:
if (!containers.contains(containerId) || containers[containerId].pid == 0) {
    return Failure("Container '" + containerId.value() + "' not running");
}

Would you prefer that?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 331
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line331>
> >
> >     so launch() exec()s the executor? if not this seems to be the pid of 
> > the external containerizer script and not the executor.

Yes, it launches the external containerizer script which in turn takes care of 
launching an executor/command.


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 648
> > <https://reviews.apache.org/r/17567/diff/15/?file=531801#file531801line648>
> >
> >     Looks like this is the pid of the containerizer script and not the 
> > container?

Yes, see above.


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > include/mesos/mesos.proto, line 508
> > <https://reviews.apache.org/r/17567/diff/15/?file=531795#file531795line508>
> >
> >     does this need to be a protobuf?

My idea was to adhere to our general communication scheme by using a protobuf - 
even though this one currently only contains a string.

Would you prefer using a 0 terminated string without wrapping protobuf instead?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/external_containerizer.hpp, lines 142-144
> > <https://reviews.apache.org/r/17567/diff/15/?file=531800#file531800line142>
> >
> >     Why tuple instead of a struct?

I am using this for the await on tuple overload of process. Enhancing the 
comment to point that out, is that ok?


> On March 31, 2014, 8:03 p.m., Vinod Kone wrote:
> > src/examples/python/test_containerizer.py, line 28
> > <https://reviews.apache.org/r/17567/diff/15/?file=531798#file531798line28>
> >
> >     indent by 4. we are trying to follow PEP8.

We should also fix that in all other python scripts.


- Till


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17567/#review38504
-----------------------------------------------------------


On April 2, 2014, 2:02 a.m., Till Toenshoff wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17567/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 2:02 a.m.)
> 
> 
> Review request for mesos, Ian Downes, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-816
>     https://issues.apache.org/jira/browse/MESOS-816
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This patch adds the so-called external containerizer. This
> containerizer delegates all containerizer calls directly to 
> an external containerizer program (which can be specified on 
> start-up). Few calls have internal fall-back implementations 
> such as wait(), destroy() and usage().
> 
> The protocol for the interactions with the external program
> is as follows:
> 
> COMMAND (ADDITIONAL-PARAMETERS) < INPUT-PROTO > RESULT-PROTO
> 
> launch (ContainerID, --mesos-executor, <path>) < TaskInfo > ExternalStatus
> update (ContainerID) < ResourceArray > ExternalStatus
> usage (ContainerID) > ResourceStatistics
> wait (ContainerID) > ExternalTermination
> destroy (ContainerID) > ExternalStatus
> 
> When protocol buffers need to be provided, the Mesos side of
> the external containerizer implementation will serialize the 
> protos on stdin and vice-versa for reading protos on stdout as 
> drafted in the above scheme.
> 
> 
> NOTE: This is currently work in progress to cover all comments.
> 
> 
> Diffs
> -----
> 
>   configure.ac 5404dc2 
>   include/mesos/mesos.proto 37f8a7f 
>   src/Makefile.am 0775a0d 
>   src/examples/python/test-containerizer.in PRE-CREATION 
>   src/examples/python/test_containerizer.py PRE-CREATION 
>   src/slave/containerizer/containerizer.cpp 6de091e 
>   src/slave/containerizer/external_containerizer.hpp PRE-CREATION 
>   src/slave/containerizer/external_containerizer.cpp PRE-CREATION 
>   src/slave/flags.hpp c9a627b 
>   src/tests/external_containerizer_test.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/17567/diff/
> 
> 
> Testing
> -------
> 
> make check and functional testing.
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>

Reply via email to