Hi all,

This is Xun Liu. I contribute to the Submarine project, which runs deep learning workloads together with big data workloads on Hadoop clusters.
Several integrations between Submarine and other projects, such as Apache Zeppelin, TonY, and Azkaban, are either finished or in progress. As a next step, Submarine will integrate with more projects like Apache Arrow, Redis, and MLflow. It should also handle end-to-end machine learning use cases such as model serving, notebook management, and advanced training optimizations (for example, automatic parameter tuning and memory cache optimizations for training on large datasets), and run on other platforms such as Kubernetes or natively in the cloud.

LinkedIn also wants to donate the TonY project to Apache, so that we can put Submarine and TonY together in the same codebase (page #30: https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-tony-tensorflow-on-yarn-and-beyond#30 ). This expands the scope of the original Submarine project in exciting new ways.

Toward that end, would it make sense to create a separate Submarine project at Apache? This could speed up the adoption of Submarine and allow it to grow into a full-blown machine learning platform. There will be lots of technical details to work out, but do you have any initial thoughts on this?

Best Regards,
Xun Liu