On Fri, Apr 17, 2020 at 2:56 PM Holden Karau <hol...@pigscanfly.ca> wrote:

>
> On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Hi Holden!
>>
>> I agree with Kyle that it makes sense to have some caveat about Flink and
>> Spark, though at this point they're not /that/ new (at least not Flink).
>>
> True, maybe "early-stage" would be better wording? The TFX PyBeam Flink
> support isn't yet mature enough (although I believe there is interest in
> integrating it into Kubeflow, that hasn't happened yet).
>

I might just say "not as mature." Most of the work being done now is
fit-and-finish. There are also some extra flags that need to be passed to
work around bugs in Flink itself encountered when running TFX jobs. (There's
the separate question of using Kubernetes to deploy/manage the Flink cluster
itself, but the mode where Flink workers invoke Docker to start up the
Python binaries is pretty stable at this point.)
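For anyone who wants to try the Docker-based mode described above, a minimal sketch of the pipeline arguments might look like this. It assumes a Flink cluster reachable at a placeholder address such as `localhost:8081` and Docker available on every Flink worker. It deliberately leaves out the extra workaround flags for the Flink bugs, since those aren't spelled out in this thread, and the helper name is hypothetical.

```python
from typing import List


def flink_docker_args(flink_master: str) -> List[str]:
    """Build Beam pipeline args for running a Python pipeline on Flink.

    Assumptions (illustrative only): `flink_master` is the host:port of an
    existing Flink cluster, and each Flink worker can run Docker so it can
    start the Python SDK harness container.
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        # Ask the runner to start the Python SDK harness inside Docker.
        "--environment_type=DOCKER",
    ]
```

The resulting list could be passed as `beam_pipeline_args` when constructing a TFX pipeline, or directly to `apache_beam.options.pipeline_options.PipelineOptions`.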


>> I am curious what extra support Kubeflow is "missing" (or, conversely,
>> what extra support it has for Dataflow that goes beyond just specifying a
>> different runner) to the point that these runners are declared
>> "unsupported." Or is it literally a matter of not providing user support?
>>
> So the Kubeflow TFX components (in
> https://github.com/kubeflow/pipelines/tree/master/components) are limited
> to local mode.
>

So in that sense it's not less supported than Dataflow?
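To make the "just specifying a different runner" point concrete: in Beam the runner is normally chosen through pipeline options rather than in the pipeline code itself. The hypothetical helper below sketches that idea. The option names are real Beam flags, but the per-runner "expected" sets are an illustrative simplification, not Beam's own validation (Beam supplies defaults for several of them).

```python
from typing import Dict, List

# Hypothetical, simplified map of options each runner usually needs when
# targeting a real cluster. Beam itself provides defaults for some of these.
EXPECTED_OPTIONS: Dict[str, List[str]] = {
    "DirectRunner": [],
    "DataflowRunner": ["project", "region", "temp_location"],
    "FlinkRunner": ["flink_master"],
    "SparkRunner": ["spark_master_url"],
}


def runner_args(runner: str, **opts: str) -> List[str]:
    """Build Beam pipeline args for `runner`; the pipeline code stays the same."""
    if runner not in EXPECTED_OPTIONS:
        raise ValueError(f"unknown runner: {runner}")
    missing = [name for name in EXPECTED_OPTIONS[runner] if name not in opts]
    if missing:
        raise ValueError(f"{runner} is missing: {', '.join(missing)}")
    # Sort for deterministic output; argument order does not matter to Beam.
    return [f"--runner={runner}"] + [f"--{k}={v}" for k, v in sorted(opts.items())]
```

For example, `runner_args("FlinkRunner", flink_master="localhost:8081")` returns `['--runner=FlinkRunner', '--flink_master=localhost:8081']`, while the same pipeline could be pointed at Dataflow by changing only these arguments.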


>
>> On Fri, Apr 17, 2020 at 12:27 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> Hi Holden,
>>>
>>> The note on Flink & Spark support sounds reasonable to me. I am
>>> optimistic about getting Flink + TFX + Kubeflow working fairly soon, but I
>>> agree that we don't want to over-promise.
>>>
>>> I'm not so sure about the status of Dataflow here, perhaps someone else
>>> can comment on that.
>>>
>>> Looking forward to the book :)
>>>
>>> Kyle
>>>
>>> On Fri, Apr 17, 2020 at 1:14 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Hi Apache Beam Developers,
>>>>
>>>> I'm working on a book about Kubeflow, which naturally has a section on
>>>> TFX. I want to set users' expectations correctly, so I wanted to know what
>>>> y'all thought of this NOTE we were thinking of including in the early
>>>> release:
>>>>
>>>> Apache Beam's Python support outside of Google Cloud's Dataflow is
>>>> relatively new. TFX is a Python tool, so scaling it depends on Apache
>>>> Beam's Python support. You can scale your job by using the non-portable
>>>> Dataflow component, but this requires changing your pipeline code and isn't
>>>> supported by Kubeflow's current TFX components. As Apache Beam's support
>>>> for Apache Flink & Spark improves, support may be added for scaling the TFX
>>>> components in a portable manner.
>>>>
>>>> Does this sound reasonable to folks? I don't want to over-promise but I
>>>> also don't want to scare people away given all of the progress that is
>>>> being made in supporting the open-source runners with language portability.
>>>>
>>>> Cheers,
>>>>
>>>> Holden :)
>>>>
>>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
