Re: Back from Berlin Beam Summit 2019

Gyewon Lee Mon, 01 Jul 2019 18:51:59 -0700

Good job, Wonwook! I look forward to hearing further news from you.


Best,
Gyewon

On Tue, Jul 2, 2019 at 3:17 AM, Byung-Gon Chun <[email protected]> wrote:

> Thanks for sharing the news!
>
> Sounds very exciting. I’d be interested in following up with the companies
> that may use Nemo, too.
>
> Cheers,
> Gon
>
> Sent from my iPad
>
> > On 1 Jul 2019, at 5:40 PM, 송원욱 <[email protected]> wrote:
> >
> > Hi!
> >
> > I got back from Beam Summit Europe 2019, which took place last week in
> > Berlin, and I had lots of interesting conversations and feedback from the
> > people I met there. I thought I would share some of it with the dev list.
> > By the way, you can check out the talk on YouTube
> > <https://youtu.be/DKxYE8YWF_o>!
> >
> > First of all, a lot of people were *very* interested in Apache Nemo, and
> > a lot of people from the Beam community were very excited to hear about a
> > new runner with primary support for their language! One reason for their
> > interest is that Beam does not actually get involved in the runtime
> > layer, where the actual scheduling, communication, and distributed
> > computation happen, so they were interested in the optimizations that can
> > be done in such layers.
> >
> > Second, with all the support from the TFX team as well as the Beam SQL
> > team, supporting the *portability layer* of Beam would bring loads of new
> > possibilities for Nemo, since it supports applications written in any of
> > Java, Python, and Go (and more in the future!). The portability layer is
> > getting more and more mature, and I think it's about time for Nemo to
> > support it as well: not many runners support it so far, so it would give
> > Nemo a head start.
> >
> > Another thing I noticed is that a lot of people are still much more
> > interested in *batch* processing than in stream processing. From the
> > people I talked to, I learned that they found stream processing quite
> > pricey and not worth the price they were paying (for example, Spotify
> > runs all of their data processing workloads as batch). I guess Nemo could
> > be a good candidate for batch processing: Spark often suffers from
> > problems such as large-scale shuffle and data skew if it is not given
> > machines with enough memory, whereas Nemo is able to provide
> > optimizations for such problems. People were also interested in whether
> > Nemo supports Kubernetes, which is a topic that we should definitely look
> > into.
> >
> > I also got many questions from engineers at *Seznam.cz* and *shopify.com
> > <http://shopify.com>*, which run their own datacenters to process their
> > data (I think). They have been facing exactly the problems described
> > above (large-scale shuffle, data skew, frequent data reloading for
> > broadcast data, utilizing transient resources, etc.), and they asked
> > about running their data processing workloads on the large amounts of
> > data they handle every day (up to 40 TB/day). I should definitely follow
> > up with them to see how they are doing and whether they are trying Nemo
> > in production, to provide help if needed, and to see Nemo's performance
> > with real workloads.
> >
> > Lastly, I have been talking with Pablo (from Beam) about my trip to
> > *Seattle* and Renton, Washington next week for the USENIX ATC '19
> > conference, and we chatted about organizing a lunch and maybe a small
> > talk with the Googlers there as well! I've also heard that Davor is based
> > in Seattle, so I think it would be a great opportunity for us to meet in
> > person. 😀 The date would probably be the *15th of July*, so please keep
> > the date in mind if you are interested!
> >
> > Cheers,
> > Wonook
>
