Hi Matyas,

Thanks for the proposal. I have a few suggestions:
1. I'm wondering whether we could extend the SQL API to change how Python
models are loaded. For example, we could allow users to write:

```
CREATE MODEL my_pytorch_model
WITH ( 'type' = 'pytorch' )
LANGUAGE PYTHON;
```

In this case, we wouldn't rely on Java SPI to load the Python model
provider. However, I'm not sure whether Python has a mechanism similar to
SPI that avoids hardcoding class paths (sketch 1 at the end of this thread
shows one possibility).

2. Beam already supports TensorFlow, ONNX, and many built-in models. Can we
reuse Beam's utilities to build Flink prediction functions [1]? (See
sketch 2 below.)

3. It would be better if we introduced a PredictRuntimeContext to help
users download required weight files (sketch 3 below).

4. In ML, users typically perform inference on batches of data, so
per-record evaluation may not be necessary. How about introducing an API
similar to [2] (sketch 4 below)?

Best,
Shengkai

[1] https://beam.apache.org/documentation/ml/about-ml/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation

Swapna Marru <[email protected]> wrote on Tue, Oct 14, 2025, 11:53:

> Thanks Matyas.
>
> Hao,
>
> The proposal is to provide a generic framework. The interfaces
> PythonPredictRuntimeProvider / PythonPredictFunction / PredictFunction
> (in Python) are defined to provide a base for that framework.
>
> generic-python is one of the implementations, registered similarly to
> openai in the original FLIP. It is not, however, a concrete end-to-end
> implementation. It can be used:
> 1. As a reference implementation for other complete, end-to-end concrete
> model provider implementations.
> 2. For simple Python model implementations, out of the box, to avoid a
> boilerplate Java provider implementation.
>
> I will also open a PR with the current implementation changes, so it's
> clearer for further discussion.
>
> -Thanks,
> M.Swapna
>
> On Mon, Oct 13, 2025 at 5:04 PM Őrhidi Mátyás <[email protected]>
> wrote:
>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-552+Support+ML_PREDICT+for+Python+based+model+providers
> >
> > On Mon, Oct 13, 2025 at 4:10 PM Őrhidi Mátyás <[email protected]>
> > wrote:
> >
> > > Swapna, I can help you create a FLIP page.
> > >
> > > On Mon, Oct 13, 2025 at 3:58 PM Hao Li <[email protected]> wrote:
> > >
> > > > Hi Swapna,
> > > >
> > > > Thanks for the proposal. Can you put it in a FLIP and start a
> > > > discussion thread for it?
> > > >
> > > > From an initial look, I'm a bit confused whether this is a
> > > > concrete implementation for "generic-python" or a generic
> > > > framework for handling Python predict functions, because
> > > > everything seems concrete, like
> > > > `GenericPythonModelProviderFactory` and
> > > > `GenericPythonModelProvider`, except the final Python predict
> > > > function.
> > > >
> > > > Also, if `GenericPythonModelProviderFactory` is predefined, do
> > > > you predefine its required and optional options? Will it be
> > > > inflexible if predefined?
> > > >
> > > > Thanks,
> > > > Hao
> > > >
> > > > On Mon, Oct 13, 2025 at 10:04 AM Swapna Marru <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Shengkai,
> > > > >
> > > > > I documented the initial proposal here:
> > > > >
> > > > > https://docs.google.com/document/d/1YzBxLUPvluaZIvR0S3ktc5Be1FF4bNeTsXB9ILfgyWY/edit?usp=sharing
> > > > >
> > > > > Please review and let me know your thoughts.
> > > > >
> > > > > -Thanks,
> > > > > Swapna
> > > > >
> > > > > On Tue, Sep 23, 2025 at 10:39 PM Shengkai Fang <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I see your point, and I agree that your proposal is feasible.
> > > > > > However, there is one limitation to consider: the current
> > > > > > loading mechanism first discovers all available factories on
> > > > > > the classpath and then filters them based on the
> > > > > > user-specified identifiers.
> > > > > >
> > > > > > In most practical scenarios, we would likely have only one
> > > > > > generic factory (e.g., a GenericPythonModelFactory) present
> > > > > > on the classpath. This means the framework would be able to
> > > > > > load either PyTorch or TensorFlow models (whichever is
> > > > > > defined within that single generic implementation), but not
> > > > > > both simultaneously unless additional mechanisms are
> > > > > > introduced.
> > > > > >
> > > > > > This doesn't block the proposal, but it's something worth
> > > > > > noting as we design the extensibility model. We may want to
> > > > > > explore ways to support multiple user-defined providers more
> > > > > > seamlessly in the future.
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
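
Sketch 1, on point 1: Python's packaging entry points (importlib.metadata,
stdlib) play roughly the same role as Java SPI: a package declares a named
implementation under a group, and the runtime discovers it by name without
hardcoded class paths. A minimal sketch, assuming a hypothetical group name
"flink.ml.model_providers" that is not part of the FLIP:

```
# SPI-style discovery via entry points (Python 3.10+ API shown).
# The group name "flink.ml.model_providers" is hypothetical.
from importlib.metadata import entry_points

def discover_providers(group="flink.ml.model_providers"):
    """Return {provider name: provider class} for every installed package
    that registered an implementation under the given entry-point group."""
    return {ep.name: ep.load() for ep in entry_points(group=group)}

# A provider package would register itself declaratively in pyproject.toml:
#
#   [project.entry-points."flink.ml.model_providers"]
#   pytorch = "my_pkg.providers:PyTorchPredictFunction"
#
# so 'type' = 'pytorch' in CREATE MODEL could be resolved by name alone.
```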
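
Sketch 2, on point 2: Beam's ModelHandler (apache_beam.ml.inference.base)
already encapsulates model loading and batched inference for PyTorch,
TensorFlow, ONNX, and others, and its methods can be driven outside a Beam
pipeline. A rough sketch of wrapping one inside a Flink predict function;
the wrapper class and its open()/predict() methods are illustrative only:

```
# ModelHandler, load_model() and run_inference() are real Beam APIs;
# the wrapper itself is a placeholder for discussion.
from typing import Any, List, Sequence

from apache_beam.ml.inference.base import ModelHandler

class BeamBackedPredictFunction:
    def __init__(self, handler: ModelHandler):
        self._handler = handler
        self._model = None

    def open(self):
        # Beam handlers know how to materialize their underlying model.
        self._model = self._handler.load_model()

    def predict(self, batch: Sequence[Any]) -> List[Any]:
        # Delegate batched inference to the Beam handler.
        return list(self._handler.run_inference(batch, self._model))
```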
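
Sketch 3, on point 3: a PredictRuntimeContext could be handed to the
predict function at open() time so user code gets model options and
weight-file downloads for free. Every name below is a discussion
placeholder, not part of the FLIP:

```
# Hypothetical interface; all names are placeholders.
from abc import ABC, abstractmethod

class PredictRuntimeContext(ABC):
    @abstractmethod
    def get_model_option(self, key: str) -> str:
        """Read an option declared on the model, e.g. a weights URI."""

    @abstractmethod
    def download_file(self, remote_uri: str) -> str:
        """Fetch a remote artifact (e.g. model weights) to the worker's
        local disk and return the local path."""
```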
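
Sketch 4, on point 4: in the spirit of FLIP-491's BundledAggregateFunction,
the Python side could expose a batch-level hook instead of per-record
evaluation. A possible shape, with all names hypothetical:

```
# Hypothetical batched predict interface; names are placeholders.
from abc import ABC, abstractmethod
from typing import Any, List

Row = List[Any]  # stand-in for Flink's Row type

class BatchedPredictFunction(ABC):
    @abstractmethod
    def predict_batch(self, rows: List[Row]) -> List[Row]:
        """Run inference on a whole bundle of rows at once, letting the
        model amortize per-invocation overhead (GPU transfer, RPC, ...)
        instead of being called record by record."""
```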
