Re: Bootstrapping Beam's Job Server

Maximilian Michels Thu, 23 Aug 2018 04:55:02 -0700

Big +1. Process-based execution should be simple to reason about for users. The implementation should not be too involved. The user has to ensure the environment is suitable for process-based execution.


There are some minor features that we should support:

- Activating a virtual environment for Python / Adding pre-installed libraries to the classpath


- Staging libraries, similarly to the boot code for Docker
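
To illustrate the virtual environment point, here is a minimal Python sketch of what "activating" a virtualenv could amount to when the harness is launched as a child process. It is only an assumption about one possible approach, not an existing Beam implementation; the helper name `venv_environ` and the example paths are hypothetical.

```python
import os


def venv_environ(venv_dir, base_env=None):
    """Return a copy of base_env adjusted as if venv_dir had been activated.

    Hypothetical helper: it mirrors what a virtualenv "activate" script does
    (set VIRTUAL_ENV, put the venv's bin directory first on PATH, and unset
    PYTHONHOME) without touching the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(venv_dir, "Scripts" if os.name == "nt" else "bin")
    env["VIRTUAL_ENV"] = venv_dir
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    return env
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.Popen` when starting the harness process, so that the child picks up the interpreter and libraries the developer already installed.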


On 22.08.18 07:49, Henning Rohde wrote:

Agree with Luke. Perhaps something simple, prescriptive yet flexible, such as a custom command line (defined in the environment proto) rooted at the base of the provided artifacts and either passed the same arguments or defined in the container contract or made available through substitution. That way, all the restrictions/assumptions of the execution environment become implicit and runner/deployment dependent.
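
A rough Python sketch of the "custom command line with substitution" idea, under stated assumptions: the template syntax (`$name` placeholders), the function name `build_harness_command`, and the flag names in the usage note are all hypothetical and are not taken from the environment proto or the container contract.

```python
import os
import shlex
from string import Template


def build_harness_command(command_template, artifact_dir, substitutions):
    """Expand a command-line template and root it at the artifact directory.

    Hypothetical helper: each token of the template may contain $name
    placeholders that are filled in from `substitutions`. A relative
    executable is resolved against the base of the staged artifacts.
    """
    args = [
        Template(token).substitute(substitutions)
        for token in shlex.split(command_template)
    ]
    if not os.path.isabs(args[0]):
        args[0] = os.path.join(artifact_dir, args[0])
    return args
```

For example, a template such as `boot --id=$worker_id --control_endpoint=$control_endpoint` (placeholder flags) would expand to an argument list that a runner could hand to `subprocess.Popen`, with everything else about the environment left to the deployment.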

On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik wrote:


    I believe supporting a simple Process environment makes sense. It
    would be best if we didn't make the Process route solve all the
    problems that Docker solves for us. In my opinion we should limit
    the Process route to assume that the execution environment:
    * has all dependencies and libraries installed
    * is of a compatible machine architecture
    * doesn't require special networking rules to be set up

    Any other suggestions for reasonable limits on a Process environment?

    On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía wrote:

        It is also worth mentioning that, apart from the testing/development
        use case, there is also the case of supporting people running on
        Hadoop distributions. There are two extra reasons to want a
        process-based version: (1) some Hadoop distributions run on machines
        with really old kernels where Docker support is limited or
        nonexistent (yes, some of those run on kernel 2.6!) and (2) ops
        people may be reluctant to take on the additional operational
        overhead of enabling Docker in their clusters.
        On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels wrote:
         >
         > Thanks Henning and Thomas. It looks like
         >
         > a) we want to keep the Job Server Docker container and rely on
         > spinning up "sibling" SDK harness containers via the Docker socket.
         > This should require few changes to the Runner code.
         >
         > b) have the InProcess SDK harness as an alternative way of running
         > user code. This can be done independently of a).
         >
         > Thomas, let's sync today on the InProcess SDK harness. I've created a
         > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
         >
         > Cheers,
         > Max
         >
         > On 21.08.18 00:35, Thomas Weise wrote:
         > > The original objective was to make test/development easier (which I
         > > think is super important for user experience with portable runner).
         > >
         > > From first-hand experience I can confirm that dealing with Flink
         > > clusters and Docker containers for local setup is a significant
         > > hurdle for Python developers.
         > >
         > > To simplify using Flink in embedded mode, the (direct) process-based
         > > SDK harness would be a good option, especially when it can be linked
         > > to the same virtualenv that developers have already set up,
         > > eliminating extra packaging/deployment steps.
         > >
         > > Max, I would be interested to sync up on what your thoughts are
         > > regarding that option since you mention you also started to work on
         > > it (see previous discussion [1], not sure if there is a JIRA for it
         > > yet). Internally we are planning to use a direct SDK harness process
         > > instead of Docker containers. For our specific needs it will work
         > > equally well for development and production, including future plans
         > > to deploy Flink TMs via Kubernetes.
         > >
         > > Thanks,
         > > Thomas
         > >
         > > [1] https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
         > >
         > >
         > >
         > >
         > >
         > >
         > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels wrote:
         > >
         > >     Thanks for your suggestions. Please see below.
         > >
         > >      > Option 3) would be to map in the docker binary and socket to
         > >      > allow the containerized Flink job server to start "sibling"
         > >      > containers on the host.
         > >
         > >     Do you mean packaging Docker inside the Job Server container and
         > >     mounting /var/run/docker.sock from the host inside the container?
         > >     That looks like a bit of a hack but for testing it could be fine.
         > >
         > >      > notably, if the runner supports auto-scaling or similar
         > >      > non-trivial configurations, that would be difficult to manage
         > >      > from the SDK side.
         > >
         > >     You're right, it would be unfortunate if the SDK would have to
         > >     deal with spinning up SDK harness/backend containers. For
         > >     non-trivial configurations it would probably require an extended
         > >     protocol.
         > >
         > >      > Option 4) We are also thinking about adding process based
         > >      > SDKHarness. This will avoid docker in docker scenario.
         > >
         > >     Actually, I had started implementing a process-based SDK harness
         > >     but figured it might be impractical because it doubles the
         > >     execution path for UDF code and potentially doesn't work with
         > >     custom dependencies.
         > >
         > >      > Process based SDKHarness also has other applications and might
         > >      > be desirable in some of the production use cases.
         > >
         > >     True. Some users might want something more lightweight.
         > >
         >
         > --
         > Max


--
Max
