Hi Holden,

The note on Flink & Spark support sounds reasonable to me. I am optimistic about getting Flink + TFX + Kubeflow working fairly soon, but I agree that we don't want to over-promise.
I'm not so sure about the status of Dataflow here; perhaps someone else can comment on that.

Looking forward to the book :)

Kyle

On Fri, Apr 17, 2020 at 1:14 PM Holden Karau <[email protected]> wrote:

> Hi Apache Beam Developers,
>
> I'm working on a book about Kubeflow, which naturally has a section on
> TFX. I want to set users' expectations correctly, so I wanted to know what
> y'all thought of this NOTE we were thinking of including in the early
> release:
>
> Apache Beam’s Python support outside of Google Cloud's Dataflow is
> relatively new. TFX is a Python tool, so scaling it depends on Apache
> Beam's Python support. You can scale your job by using the non-portable
> Dataflow component, but this requires changing your pipeline code and isn't
> supported by Kubeflow's current TFX components. As Apache Beam's support
> for Apache Flink & Spark improves, support may be added for scaling the TFX
> components in a portable manner.
>
> Does this sound reasonable to folks? I don't want to over-promise, but I
> also don't want to scare people away given all of the progress that is
> being made in supporting the open-source runners with language portability.
>
> Cheers,
>
> Holden :)
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
