A beam cluster with the spark runner would include a spark cluster, plus what's needed for portability, plus the beam sdk.
On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <[email protected]> wrote:

> On May 5, 2018 at 08:43, "Reuven Lax" <[email protected]> wrote:
>
> I don't believe we enforce docker anywhere. In fact if someone wanted to
> run an all-windows beam cluster, they would probably not use docker for
> their runner (docker runs on Windows, but not efficiently).
>
>
> Or doesn't run sometimes - a colleague hit that yesterday :(.
>
> What is a "beam cluster" - as opposed to a spark or flink cluster? How
> would it work on windows servers?
>
>
> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <[email protected]>
> wrote:
>
>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <[email protected]>:
>>
>>> What docker really buys is a package format and runtime environment that
>>> is language and operating system agnostic. The docker packaging and
>>> runtime format is the de facto standard for portable applications such as
>>> this, and there is a group trying to turn it into an actual standard.
>>>
>>> I would agree with you that dockerd has become bloated but there are
>>> projects that solve that. There is no longer lock-in to dockerd, there
>>> are package format compatible docker replacements that eliminate the
>>> performance issues and overhead associated with docker. CRI-O (
>>> https://github.com/kubernetes-incubator/cri-o) is a really cool RedHat
>>> project which is a minimalist replacement for docker. I was recently
>>> working at a startup where I migrated our "data mover" appliance from
>>> Docker to CRI-O. Our application was able to get direct access to the
>>> ethernet driver and block devices which enabled a huge performance boost
>>> but we were also able to run containers produced by docker without
>>> modification.
>>>
>>> You mention that docker is "detail of one runner+vendor corrupting all
>>> the project and adding complexity and work to everyone". It sounds like
>>> you have a specific example you'd like to share?
>>> Is there a runner that is unable to move to portability because of
>>> docker?
>>>
>>
>> The IBM one for instance, some custom ones like a hazelcast based one,
>> etc... More generally any runner developed outside beam itself - even if
>> we take a snapshot today, most of beam's own runners have the same
>> pitfall.
>>
>> Note: I never said docker was a bad technology or so. Let me try to
>> clarify.
>>
>> The main issue is that you enforce docker usage, which is still trendy.
>> It is like Scala, which was promising to kill Java; check what it does
>> today... It starts to be tooled but it also has a big impact on the
>> deployment side, and for a good number of beam users who deploy it
>> outside the cloud it is an issue.
>> Keep in mind beam is embeddable by design, it is not a runner environment,
>> and the docker choice imposes an environment which is inconsistent with
>> beam's design itself, and this is where this choice blocks.
>>
>>>
>>> Andrew
>>>
>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <[email protected]> wrote:
>>>
>>>> Romain,
>>>>
>>>> Docker, unlike selinux, solves a great number of tangible problems for
>>>> us with IMO a relatively small tax. It does not have to be the only way.
>>>> Some of the concerns you bring up along with possibilities were also
>>>> discussed here: https://s.apache.org/beam-fn-api-container-contract. I
>>>> encourage you to take a look.
>>>>
>>>> Thanks,
>>>> Henning
>>>>
>>>>
>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <
>>>> [email protected]> wrote:
>>>>
>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <[email protected]> wrote:
>>>>>
>>>>> I disagree with the characterization of docker and the implications
>>>>> made towards portability. Graal looks like a neat project (and I
>>>>> never thought I would live to see the phrase "Practical Partial
>>>>> Evaluation" ..), but it doesn't address the needs of portability.
>>>>> In addition to Luke's examples, Go and most other languages don't
>>>>> work on it either. Docker containers also address packaging, OS
>>>>> dependencies, conflicting versions and distribution aspects in
>>>>> addition to truly universal language support.
>>>>>
>>>>>
>>>>> This is wrong: docker also has its conflicts and is not universal (it
>>>>> fails easily on windows and mac, as host or not, and cloud vendors put
>>>>> layers limiting or corrupting it). It is also an imposed infra
>>>>> constraint and a vendor lock-in, not welcome in beam IMHO.
>>>>>
>>>>> This is my main concern. All the work done looks like an
>>>>> implementation detail of one runner+vendor corrupting the whole project
>>>>> and adding complexity and work for everyone instead of keeping it
>>>>> localised (technically it is possible).
>>>>>
>>>>> Would you accept it if I forced you to use selinux? Using docker is the
>>>>> same kind of constraint.
>>>>>
>>>>>
>>>>> That said, it's entirely fine for some runners to use Jython, Graal,
>>>>> etc to provide a specialized offering similar to the direct runners, but
>>>>> it would be disjoint from portability IMO.
>>>>>
>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <[email protected]> wrote:
>>>>>>
>>>>>> I did take a look at Graal a while back when thinking about how
>>>>>> execution environments could be defined, my concerns were related to it
>>>>>> not supporting all of the features of a language.
>>>>>> For example, it's typical for Python to load and call native libraries
>>>>>> and Graal can only execute C/C++ code that has been compiled to LLVM.
>>>>>> Also, a good amount of people interested in using ML libraries will
>>>>>> want access to GPUs to improve performance which I believe that Graal
>>>>>> can't support.
>>>>>>
>>>>>> It can be a very useful way to run simple lambda functions written in
>>>>>> some language directly without needing to use a docker environment but
>>>>>> you could probably use something even lighter weight than Graal that is
>>>>>> language specific like Jython.
>>>>>>
>>>>>>
>>>>>> Right, the jsr223 impl works very well but you can also get a perf
>>>>>> boost using native (like the v8 java binding for js for instance). It
>>>>>> is way more efficient than docker most of the time and not code
>>>>>> intrusive at all in runners, so likely easier to adopt and maintain.
>>>>>> That said, all of it is doable behind jsr223, so maybe not a big deal
>>>>>> in terms of api. We just need to ensure the portability work stays
>>>>>> clean and actually portable and doesn't impact runners as the pocs
>>>>>> done until today did.
>>>>>>
>>>>>> Works for me.
>>>>>>
>>>>>>
>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi guys
>>>>>>>
>>>>>>> For some time there have been efforts to add language portable
>>>>>>> support to beam, but I can't really find a case where it "works"
>>>>>>> while being based on docker, except for some vendor specific infra.
>>>>>>>
>>>>>>> Current solution:
>>>>>>>
>>>>>>> 1. Is runner intrusive (which is bad for beam and prevents adoption
>>>>>>> by big data vendors)
>>>>>>> 2. Is based on docker (which assumes a runtime environment, is very
>>>>>>> ops/infra intrusive and is quite often likely too $$ for what it
>>>>>>> brings)
>>>>>>>
>>>>>>> Did anyone have a look at graal, which seems a way to make the feature
>>>>>>> doable in a lighter manner and optimized compared to default jsr223
>>>>>>> impls?
