Thanks for sharing the news! Sounds very exciting. I'd be interested in following up with the companies that may use Nemo, too.
Cheers,
Gon

Sent from my iPad

> On 1 Jul 2019, at 5:40 PM, Wonook <won...@apache.org> wrote:
>
> Hi!
>
> I got back from the Beam Summit Europe 2019 that happened last week in
> Berlin, and I had lots of interesting conversations and feedback from the
> people that I met there. I thought I would share some of it with the
> dev list. By the way, you can check out the talk on YouTube
> <https://youtu.be/DKxYE8YWF_o>!
>
> First of all, a lot of people were *very* interested in Apache Nemo, and a
> lot of people from the Beam community were very excited to hear about a new
> runner with primary support for their language! One reason for their
> interest was that, since Beam does not get involved in the runtime layer,
> where the actual scheduling, communication, and distributed computation
> happen, they were interested in the optimizations that can be done in such
> layers.
>
> Second, with all the support from the TFX team as well as the Beam SQL
> team, supporting the *portability layer* of Beam would bring loads of new
> possibilities for Nemo. The portability layer supports applications
> written in any language among Java, Python, and Go (and more in the
> future!). It is getting more and more mature, and I think it's about time
> to support it in Nemo as well, as not a lot of runners support it so far
> and it would give Nemo a head start.
>
> Another thing that I noticed is that a lot of people are still very much
> interested in *batch* processing rather than stream processing. From the
> people that I talked to, I learned that they found stream processing to be
> quite pricey and not worth the price that they were paying (for example,
> Spotify runs all of their data processing workloads as batch). I guess
> Nemo could be a good candidate to run batch processing, as Spark often
> suffers from problems such as large-scale shuffle and data skew if not
> provided with machines with enough memory, whereas Nemo is able to provide
> optimizations for such problems. I also found that people were interested
> in whether Nemo supports Kubernetes, which is a topic that we should
> definitely look into.
>
> I also had many questions from engineers from *Seznam.cz* and
> *shopify.com* <http://shopify.com>, where they run their own datacenters
> to process their data (I think). They have been facing exactly the same
> problems as described above (large-scale shuffle, data skew, frequent data
> reloading for broadcast data, utilizing transient resources, etc.), and
> had questions about running their data processing workloads on the large
> amounts of data that they face every day (up to 40TB/day). I should
> definitely follow up with them to see how they are doing and whether they
> are trying to use Nemo in production, to provide help if needed, and to
> see Nemo's performance with real workloads.
>
> Lastly, I have been talking with Pablo (from Beam) about the trip to
> *Seattle* and Renton, Washington next week for the USENIX ATC '19
> conference, and we had a chat about organizing a lunch and maybe a small
> talk with the Googlers there as well! I've also heard that Davor is
> based in Seattle, so I have been thinking that it would be a great
> opportunity for us to meet in person. The date would probably be the
> *15th of July*, so please keep the date in mind if you would be interested!
>
> Cheers,
> Wonook