Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Aljoscha Krettek Tue, 21 May 2019 04:31:29 -0700

We discussed this in private and came to the conclusion that we should (for 
now) have the dependency on flink-table-api-xxx-bridge because we need access 
to the collect() method, which is not yet available in the Table API. Once that 
is available, the code can be refactored, but for now we want to unblock work on
this new module.
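To illustrate why a per-iteration collect() matters, here is a minimal, Flink-free sketch of the driver-side loop of an iterative algorithm: each step produces a batch result, the client materializes it and then decides whether to run another step. Checking a convergence criterion on the client is assumed here as one plausible reason for collecting every iteration. All types (BatchResult, LocalResult, Outcome) are hypothetical stand-ins, not Flink APIs. In Flink 1.8 the collect step would instead mean converting the Table with BatchTableEnvironment#toDataSet and calling DataSet#collect(). The step function (halving every value) is a toy placeholder for a real ALS sweep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class CollectPerIteration {

    // Hypothetical stand-in for a batch result that can be materialized on the
    // client, i.e. the role DataSet#collect() plays in Flink 1.8. Not a Flink type.
    interface BatchResult<T> {
        List<T> collect();
    }

    // In-memory implementation so the sketch runs without a cluster.
    static final class LocalResult<T> implements BatchResult<T> {
        private final List<T> rows;
        LocalResult(List<T> rows) { this.rows = new ArrayList<>(rows); }
        @Override public List<T> collect() { return new ArrayList<>(rows); }
    }

    // Final values plus the number of iterations that were run.
    static final class Outcome {
        final List<Double> values;
        final int iterations;
        Outcome(List<Double> values, int iterations) {
            this.values = values;
            this.iterations = iterations;
        }
    }

    // Driver loop: run one step, collect its output on the client, and decide
    // there whether to stop. In Flink 1.8 this per-iteration collect is what
    // needs the batch bridge (Table -> DataSet -> collect()).
    static Outcome iterate(List<Double> initial,
                           Function<List<Double>, BatchResult<Double>> step,
                           double tolerance, int maxIterations) {
        List<Double> current = initial;
        for (int i = 1; i <= maxIterations; i++) {
            List<Double> next = step.apply(current).collect();
            // Largest absolute change between two consecutive iterations.
            double maxChange = 0.0;
            for (int k = 0; k < next.size(); k++) {
                maxChange = Math.max(maxChange, Math.abs(next.get(k) - current.get(k)));
            }
            current = next;
            if (maxChange < tolerance) {
                return new Outcome(current, i);
            }
        }
        return new Outcome(current, maxIterations);
    }

    // Toy step standing in for one ALS sweep: halve every value.
    static BatchResult<Double> halve(List<Double> in) {
        List<Double> out = new ArrayList<>();
        for (double v : in) {
            out.add(v / 2.0);
        }
        return new LocalResult<>(out);
    }
}
```

For example, iterate(Arrays.asList(8.0, -4.0), CollectPerIteration::halve, 0.01, 100) stops after 10 iterations, the first one in which the largest change drops below 0.01. Without a way to materialize each step's result, the driver has nothing to base that decision on.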


We also agreed that we don’t need a direct dependency on flink-table-planner.

I hope I summarised our discussion correctly.

> On 17. May 2019, at 12:20, Gen Luo <[email protected]> wrote:
> 
> Thanks for your reply.
> 
> For the first question, it's not strictly necessary. But I prefer not to
> have a TableEnvironment argument in Estimator.fit() or
> Transformer.transform(): it is not part of the machine learning concept, and
> it may make our API less clean than those of other systems. I would like to
> find another way to achieve this than introducing flink-table-planner. If
> that is impossible or strongly opposed, I can concede and add the
> argument.
> 
> Other than that, the "flink-table-api-xxx-bridge" modules are still needed. A
> very common case is that an algorithm needs to guarantee that it is running
> in a BatchTableEnvironment, which makes it possible to collect the result of
> each iteration. ALS is a typical example. As of Flink 1.8, this can only be
> achieved by converting the Table to a DataSet and then calling
> DataSet.collect(), which is available in flink-table-api-xxx-bridge. Besides,
> registering UDAGGs (user-defined aggregate functions) also depends on it.
> 
> In conclusion, "planner" can be removed from the dependencies, but
> introducing the "bridge" modules is unavoidable. Whether and how to acquire
> the TableEnvironment from a Table is open for discussion.
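The API shape Gen prefers, where fit() and transform() take only a Table and any environment is recovered from the table itself, can be sketched as follows. Env, Tbl, Transformer, Estimator, MeanCenterer and MeanModel are simplified, hypothetical stand-ins, not Flink's TableEnvironment or Table and not the actual FLIP-39 interfaces. Whether a real Flink Table can hand out its TableEnvironment without a dependency on flink-table-planner is exactly the open question in this thread. The mean-centering estimator is only a toy example.

```java
import java.util.ArrayList;
import java.util.List;

final class PipelineApiSketch {

    // Hypothetical stand-in for a table environment (not Flink's TableEnvironment).
    static final class Env {
        final boolean batch;
        Env(boolean batch) { this.batch = batch; }
    }

    // Hypothetical stand-in for a Table that keeps a reference to the environment
    // it was created in, so operators can recover it instead of taking it as an argument.
    static final class Tbl {
        final Env env;
        final List<Double> column;
        Tbl(Env env, List<Double> column) {
            this.env = env;
            this.column = new ArrayList<>(column);
        }
    }

    // Only data goes in; the rejected alternative would be transform(Env env, Tbl input).
    interface Transformer {
        Tbl transform(Tbl input);
    }

    interface Estimator<M extends Transformer> {
        M fit(Tbl input);
    }

    // Model produced by MeanCenterer: subtracts the learned mean from every value.
    static final class MeanModel implements Transformer {
        final double mean;
        MeanModel(double mean) { this.mean = mean; }

        @Override
        public Tbl transform(Tbl input) {
            List<Double> out = new ArrayList<>();
            for (double v : input.column) {
                out.add(v - mean);
            }
            // The output stays in the environment of the input table.
            return new Tbl(input.env, out);
        }
    }

    // Toy estimator that learns the column mean.
    static final class MeanCenterer implements Estimator<MeanModel> {
        @Override
        public MeanModel fit(Tbl input) {
            // The environment is recovered from the table, e.g. to insist on batch mode.
            if (!input.env.batch) {
                throw new IllegalArgumentException("fit() requires a batch environment");
            }
            double sum = 0.0;
            for (double v : input.column) {
                sum += v;
            }
            return new MeanModel(input.column.isEmpty() ? 0.0 : sum / input.column.size());
        }
    }
}
```

With this shape a pipeline stage only ever sees data, and a requirement such as "must run in batch mode" is checked inside fit() rather than relying on the caller to pass the right environment.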
