Hi! I just got back from Beam Summit Europe 2019, which happened last week in Berlin, and I had a lot of interesting conversations and feedback from the people I met there. I thought I would share some of it with the dev list. By the way, you can check out the talk on YouTube: <https://youtu.be/DKxYE8YWF_o>!
First of all, a lot of people were *very* interested in Apache Nemo, and a lot of people from the Beam community were excited to hear about a new runner with primary support for their language! One reason for their interest is that Beam itself does not get involved in the runtime layer, where the actual scheduling, communication, and distributed computation happen, so they were curious about the optimizations that can be done there. Second, with all the support from the TFX team as well as the Beam SQL team, supporting the *portability layer* of Beam would bring loads of new possibilities for Nemo, since it enables applications written in any of Java, Python, and Go (and more languages in the future!). The portability layer is getting more and more mature, and I think it's about time for Nemo to support it as well; not many runners support it so far, so it would give Nemo a head start.

Another thing I noticed is that a lot of people are still much more interested in *batch* processing than stream processing. From the people I talked to, I learned that many found stream processing quite pricey and not worth the cost they were paying (for example, Spotify runs all of their data processing workloads as batch). I think Nemo could be a good candidate for batch processing, as Spark often suffers from problems such as large-scale shuffle and data skew when not provided with machines with enough memory, whereas Nemo is able to provide optimizations for exactly those problems. People were also interested in whether Nemo supports Kubernetes, which is a topic we should definitely look into.

I also had many questions from engineers at *Seznam.cz* and *shopify.com*, who (I believe) run their own datacenters to process their data.
They have been facing exactly the same problems as illustrated above (large-scale shuffle, data skew, frequent data reloading for broadcast data, utilizing transient resources, etc.), and had questions about running their data processing workloads on the large amounts of data they handle every day (up to 40TB/day). I should definitely follow up with them to see how they are doing and whether they are trying to use Nemo in production, to provide help if needed and to see Nemo's performance on real workloads.

Lastly, I have been talking with Pablo (from Beam) about the trip to *Seattle* and Renton, Washington next week for the USENIX ATC '19 conference, and we chatted about organizing a lunch and maybe a small talk with the Googlers there as well! I've also heard that Davor is based in Seattle, so I've been thinking it would be a great opportunity for us to meet in person. The date would probably be the *15th of July*, so please keep it in mind if you're interested!

Cheers,
Wonook