Good job, Wonwook! I look forward to hearing further news from you.
Best,
Gyewon

On Tue, 2 Jul 2019 at 3:17 AM, Byung-Gon Chun <bgc...@gmail.com> wrote:

> Thanks for sharing the news!
>
> Sounds very exciting. I'd be interested in following up with the companies
> that may use Nemo, too.
>
> Cheers,
> Gon
>
> Sent from my iPad
>
> > On 1 Jul 2019, at 5:40 PM, 송원욱 <won...@apache.org> wrote:
> >
> > Hi!
> >
> > I got back from the Beam Summit Europe 2019 that happened last week in
> > Berlin, and I had lots of interesting conversations and feedback from
> > the people that I met there. I thought I would share some of it with the
> > dev list. By the way, you can check out the talk on YouTube
> > <https://youtu.be/DKxYE8YWF_o>!
> >
> > First of all, a lot of people were *very* interested in Apache Nemo, and
> > a lot of people from the Beam community were very excited to hear about
> > a new runner with primary support for their language! One reason for
> > their interest was that, since Beam does not get involved in the runtime
> > layer, where the actual scheduling, communication, and distributed
> > computation happen, they were interested in the optimizations that can
> > be done in such layers.
> >
> > Second, with all the support from the TFX team as well as the Beam SQL
> > team, supporting the *portability layer* of Beam, which supports
> > applications written in any of Java, Python, and Go (and more in the
> > future!), would bring loads of new possibilities for Nemo. The
> > portability layer is getting more and more mature, and I think it's
> > about time to support it in Nemo as well; not many runners support it so
> > far, so it would give Nemo a head start.
> >
> > Another thing that I've noticed is that a lot of people are still very
> > much interested in *batch* processing rather than stream processing.
> > From the people that I've talked to, I've learned that people found
> > stream processing to be quite pricey and that they haven't found it
> > worth the price they were paying (for example, Spotify runs all of
> > their data processing workloads as batch). I guess Nemo could be a good
> > candidate for running batch processing, as Spark often suffers from
> > problems such as large-scale shuffle and data skew if not provided with
> > machines with enough memory, whereas Nemo is able to provide
> > optimizations for such problems. I've also found that people were
> > interested in whether Nemo supports Kubernetes, which is a topic that we
> > should definitely look into.
> >
> > I've also had many questions from engineers at *Seznam.cz* and
> > *shopify.com <http://shopify.com>*, who run their own datacenters to
> > process their data (I think). They have been facing exactly the same
> > problems as described above (large-scale shuffle, data skew, frequent
> > data reloading for broadcast data, utilizing transient resources, etc.),
> > and have had questions about running their data processing workloads on
> > the large amounts of data that they face every day (up to 40 TB/day). I
> > should definitely follow up with them to see how they are doing and
> > whether they are trying to use Nemo in production, to provide help if
> > needed, and to see Nemo's performance with real workloads.
> >
> > Lastly, I have been talking with Pablo (from Beam) about the trip to
> > *Seattle* and Renton, Washington next week for the USENIX ATC '19
> > conference, and we had a chat about organizing a lunch and maybe a small
> > talk with the Googlers there as well! I've also heard that Davor is
> > based in Seattle, so I think it would be a great opportunity for us to
> > meet in person. 😀 The date would probably be the *15th of July*, so
> > please keep the date in mind if you would be interested!
> >
> > Cheers,
> > Wonook