Thanks for sharing the news!

Sounds very exciting. I’d be interested in following up with the companies that 
may use Nemo, too.

Cheers,
Gon

> On 1 Jul 2019, at 5:40 PM, 송원욱 <won...@apache.org> wrote:
> 
> Hi!
> 
> I got back from Beam Summit Europe 2019, which took place last week in
> Berlin, and I had lots of interesting conversations with, and feedback
> from, the people I met there. I thought I would share some of it with the
> dev list. By the way, you can check out the talk on YouTube
> <https://youtu.be/DKxYE8YWF_o>!
> 
> First of all, a lot of people were *very* interested in Apache Nemo, and a
> lot of people from the Beam community were very excited to hear about a new
> runner with primary support for their language! One reason for their
> interest was that, since Beam does not get involved in the runtime layer,
> where the actual scheduling, communication, and distributed computation
> happen, they were interested in the optimizations that can be done in such
> layers.
> 
> Second, with all the support from the TFX team, as well as the Beam SQL
> team, supporting the *portability* *layer* of Beam would bring loads of new
> possibilities for Nemo. The layer supports applications written in Java,
> Python, or Go (and more languages in the future!). It is getting more and
> more mature, and I think it's about time for Nemo to support it as well:
> not many runners support it so far, so it would give Nemo a head start.
> 
> Another thing that I've noticed is that a lot of people are still very much
> interested in *batch* processing rather than stream processing. From the
> people that I've talked to, I've learned that they found stream processing
> to be quite pricey and not worth the price that they were paying (for
> example, Spotify runs all of their data processing workloads as batch). I
> guess Nemo could be a good candidate for running batch processing: Spark
> often suffers from problems such as large-scale shuffle and data skew if it
> is not provided with machines with enough memory, whereas Nemo is able to
> provide optimizations for such problems. I've also found that people were
> interested in whether Nemo supports Kubernetes, which is a topic that we
> should definitely look into.
> 
> I've also had many questions from engineers at *Seznam.cz* and
> *shopify.com <http://shopify.com>*, which run their own datacenters to
> process their data (I think). They have been facing exactly the same
> problems as mentioned above (large-scale shuffle, data skew, frequent data
> reloading for broadcast data, utilizing transient resources, etc.), and had
> questions about running their data processing workloads on the large
> amounts of data that they handle every day (up to 40 TB/day). I should
> definitely follow up with them to see how they are doing and whether they
> are trying to use Nemo in production, to provide help if needed, and to
> see Nemo's performance with real workloads.
> 
> Lastly, I have been talking with Pablo (from Beam) about the trip to
> *Seattle* and Renton, Washington next week for the USENIX ATC '19
> conference, and we chatted about organizing a lunch and maybe a small
> talk with the Googlers there as well! I've also heard that Davor is based
> in Seattle, so I think it would be a great opportunity for us to meet in
> person. 😀 The date would probably be the *15th of July*, so please keep
> the date in mind if you are interested!
> 
> Cheers,
> Wonook
