On Fri, Apr 17, 2020 at 2:56 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
> On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Hi Holden!
>>
>> I agree with Kyle that it makes sense to have some caveat about Flink and
>> Spark, though at this point they're not /that/ new (at least not Flink).
>>
> True, maybe "early-stage" would be better wording? The TFX PyBeam Flink
> support isn't yet mature enough (although there is interest in integrating
> it in Kubeflow I believe, it hasn't happened yet).
>

I might just say "not as mature." Most of the work being done now is fit-n-finish. There are also some extra flags that need to be passed to work around bugs in Flink itself encountered when running TFX jobs. (There's the separate question of using Kubernetes to deploy/manage the Flink cluster itself, but the mode where Flink workers invoke Docker to start up the Python binaries is pretty stable at this point.)

>> I am curious what extra support Kubeflow is "missing" (or, conversely,
>> what extra support it has for Dataflow that goes beyond just specifying a
>> different runner) to the point that these runners are declared
>> "unsupported." Or is it literally a matter of not providing user support?
>>
> So the Kubeflow TFX components (in
> https://github.com/kubeflow/pipelines/tree/master/components) are limited
> to local mode.
>

So in that sense it's not less supported than Dataflow?

>
>> On Fri, Apr 17, 2020 at 12:27 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> Hi Holden,
>>>
>>> The note on Flink & Spark support sounds reasonable to me. I am
>>> optimistic about getting Flink + TFX + Kubeflow working fairly soon, but I
>>> agree that we don't want to over-promise.
>>>
>>> I'm not so sure about the status of Dataflow here, perhaps someone else
>>> can comment on that.
>>>
>>> Looking forward to the book :)
>>>
>>> Kyle
>>>
>>> On Fri, Apr 17, 2020 at 1:14 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Hi Apache Beam Developers,
>>>>
>>>> I'm working on a book about Kubeflow, which naturally has a section on
>>>> TFX. I want to set users' expectations correctly, so I wanted to know what
>>>> y'all thought of this NOTE we were thinking of including in the early
>>>> release:
>>>>
>>>> Apache Beam’s Python support outside of Google Cloud's Dataflow is
>>>> relatively new. TFX is a Python tool, so scaling it depends on Apache
>>>> Beam's Python support. You can scale your job by using the non-portable
>>>> Dataflow component, but this requires changing your pipeline code and isn't
>>>> supported by Kubeflow's current TFX components. As Apache Beam's support
>>>> for Apache Flink & Spark improves, support may be added for scaling the TFX
>>>> components in a portable manner.
>>>>
>>>> Does this sound reasonable to folks? I don't want to over-promise, but I
>>>> also don't want to scare people away given all of the progress that is
>>>> being made in supporting the open-source runners with language portability.
>>>>
>>>> Cheers,
>>>>
>>>> Holden :)
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau