Hi Hao,

Sure, I can change that.

>> Maybe change `getPythonPredictFunction` to `createPythonPredictFunction` to align with `createPredictFunction` in `PredictRuntimeProvider`.

Yes, this can be enforced. It can fail if it is not an implementation of PredictFunction. This can be done as part of getPythonFunction in java_gateway.py (where the underlying Python function is instantiated); see the sketch below.

>> Is it possible to enforce that the returned Java `PythonFunction` from `createPythonFunction` implements the Python `PredictFunction`?
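To make that concrete, here is a minimal sketch of the kind of check I have in mind. This is not the actual java_gateway.py code: the `PredictFunction` base class below only stands in for the Python interface proposed in this FLIP, and the helper name and signature are assumptions for illustration.

```
# Sketch only: fail fast when the configured Python class does not implement
# the proposed PredictFunction interface. Names and signatures are illustrative.
import importlib
from abc import ABC, abstractmethod


class PredictFunction(ABC):
    """Stand-in for the Python PredictFunction interface proposed in the FLIP."""

    def set_model_config(self, model_config: dict) -> None:
        self._model_config = model_config

    @abstractmethod
    def predict(self, row):
        """Run inference on a single row (batching to be discussed separately)."""


def instantiate_predict_function(fully_qualified_name: str, model_config: dict) -> PredictFunction:
    """Load `module.ClassName`, verify it implements PredictFunction, and instantiate it."""
    module_name, _, class_name = fully_qualified_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    # Enforcement point: reject anything that is not a PredictFunction, so a
    # misconfigured model fails at function creation time rather than mid-job.
    if not (isinstance(cls, type) and issubclass(cls, PredictFunction)):
        raise TypeError(
            f"'{fully_qualified_name}' must implement PredictFunction, got {cls!r}"
        )

    instance = cls()  # assumes a no-arg constructor, as a UDF-style function would have
    instance.set_model_config(model_config)
    return instance
```

Failing at instantiation time means a wrongly configured model surfaces when the predict function is created, not when the first row reaches the operator.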
On Tue, Oct 14, 2025 at 4:09 PM Hao Li <[email protected]> wrote:

> Hi Swapna,
>
> Some other suggestions and questions:
>
> 1. Maybe change `getPythonPredictFunction` to `createPythonPredictFunction`
> to align with `createPredictFunction` in `PredictRuntimeProvider`.
> 2. Is it possible to enforce that the returned Java `PythonFunction` from
> `createPythonFunction` implements the Python `PredictFunction`?
>
> Hao
>
> On Tue, Oct 14, 2025 at 2:52 PM Swapna Marru <[email protected]> wrote:
>
> > Thanks Matyas and Shengkai.
> >
> > > 1. I'm wondering whether we could extend the SQL API to change how Python
> > > models are loaded. For example, we could allow users to write:
> > >
> > > ```
> > > CREATE MODEL my_pytorch_model
> > > WITH (
> > >   'type' = 'pytorch'
> > > ) LANGUAGE PYTHON;
> > > ```
> >
> > I kind of thought about this, but one initial concern I had with this model
> > is: will a model provider be completely implemented in Python itself? When
> > we refer to PyTorch, HuggingFace, or ONNX model providers, for example, do
> > we need different behavior or optimizations related to
> > PredictRuntimeContext / model config building, batching, or resource
> > scheduling decisions, which need to be done in the Java Flink entry points?
> >
> > > 2. Beam already supports TensorFlow, ONNX, and many built-in models. Can
> > > we reuse Beam's utilities to build Flink prediction functions[1]?
> >
> > Thanks, I will take a look at this to understand how it works and whether
> > we can learn from that design.
> >
> > > 3. It would be better if we introduced a PredictRuntimeContext to help
> > > users download required weight files.
> >
> > Currently I have this as model config (set_model_config), which will be
> > passed to the PredictFunction in Python, but PredictRuntimeContext seems
> > more suitable. I will look into passing a PredictRuntimeContext in open,
> > similar to RuntimeContext for UDFs.
> >
> > > 4. In ML, users typically perform inference on batches of data.
> > > Therefore, per-record evaluation may not be necessary. How about we just
> > > introduce an API like[2]?
> >
> > Yes, I completely agree on this. I was first aiming at agreement on model
> > creation and the provider API, and will then look at this in more detail.
> >
> > On Tue, Oct 14, 2025 at 10:39 AM Őrhidi Mátyás <[email protected]> wrote:
> >
> > > Hey Shengkai,
> > >
> > > Thank you for your observations. This proposal is mostly driven by
> > > Swapna, but I can also share my thoughts here; please find them inline.
> > >
> > > Cheers,
> > > Matyas
> > >
> > > On Tue, Oct 14, 2025 at 3:02 AM Shengkai Fang <[email protected]> wrote:
> > >
> > > > Hi, Matyas.
> > > >
> > > > Thanks for the proposal. I have some suggestions about it.
> > > >
> > > > 1. I'm wondering whether we could extend the SQL API to change how
> > > > Python models are loaded. For example, we could allow users to write:
> > > >
> > > > ```
> > > > CREATE MODEL my_pytorch_model
> > > > WITH (
> > > >   'type' = 'pytorch'
> > > > ) LANGUAGE PYTHON;
> > > > ```
> > > >
> > > > In this case, we wouldn't rely on Java SPI to load the Python model
> > > > provider.
> > > > However, I'm not sure whether Python has a similar mechanism to SPI
> > > > that avoids hardcoding class paths.
> > >
> > > This is an interesting idea; however, we are proposing the provider model
> > > because it aligns with Flink's existing Java-based architecture for
> > > discovering plugins. A Java entry point is required to launch the Python
> > > code, and this is the standard way to do it.
> > >
> > > > 2. Beam already supports TensorFlow, ONNX, and many built-in models.
> > > > Can we reuse Beam's utilities to build Flink prediction functions[1]?
> > >
> > > We can certainly learn from Beam's design, but directly reusing it would
> > > add a very heavy dependency and be difficult to integrate cleanly into
> > > Flink's native processing model.
> > >
> > > > 3. It would be better if we introduced a PredictRuntimeContext to help
> > > > users download required weight files.
> > >
> > > This is actually a great idea and essential for usability. Just to double
> > > check on your suggestion: your proposal is to have an explicit
> > > PredictRuntimeContext for dynamic model file downloading?
> > >
> > > > 4. In ML, users typically perform inference on batches of data.
> > > > Therefore, per-record evaluation may not be necessary. How about we
> > > > just introduce an API like[2]?
> > >
> > > I agree completely. The row-by-row API is just a starting point, and we
> > > should aim to prioritize support for efficient batch inference to ensure
> > > good performance for real-world models.
> > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > [1] https://beam.apache.org/documentation/ml/about-ml/
> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation
> > > >
> > > > Swapna Marru <[email protected]> wrote on Tue, Oct 14, 2025 at 11:53:
> > > >
> > > > > Thanks Matyas.
> > > > >
> > > > > Hao,
> > > > >
> > > > > The proposal is to provide a generic framework. The interfaces
> > > > > PythonPredictRuntimeProvider / PythonPredictFunction /
> > > > > PredictFunction (in Python) are defined to provide a base for that
> > > > > framework.
> > > > >
> > > > > generic-python is one of the implementations, registered similarly to
> > > > > openai in the original FLIP. It is, though, not a concrete end-to-end
> > > > > implementation. It can be used:
> > > > > 1. As a reference implementation for other complete, end-to-end,
> > > > > concrete model provider implementations.
> > > > > 2. Out of the box for simple Python model implementations, to avoid a
> > > > > boilerplate Java provider implementation.
> > > > >
> > > > > I will also open a PR with the current implementation changes, so it
> > > > > is clearer for further discussion.
> > > > >
> > > > > -Thanks,
> > > > > M.Swapna
> > > > >
> > > > > On Mon, Oct 13, 2025 at 5:04 PM Őrhidi Mátyás <[email protected]> wrote:
> > > > >
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-552+Support+ML_PREDICT+for+Python+based+model+providers
> > > > > >
> > > > > > On Mon, Oct 13, 2025 at 4:10 PM Őrhidi Mátyás <[email protected]> wrote:
> > > > > >
> > > > > > > Swapna, I can help you to create a FLIP page.
> > > > > > > On Mon, Oct 13, 2025 at 3:58 PM Hao Li <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Swapna,
> > > > > > > >
> > > > > > > > Thanks for the proposal. Can you put it in a FLIP and start a
> > > > > > > > discussion thread for it?
> > > > > > > >
> > > > > > > > From an initial look, I'm a bit confused whether this is a
> > > > > > > > concrete implementation for "generic-python" or a generic
> > > > > > > > framework to handle Python predict functions, because everything
> > > > > > > > seems concrete, like `GenericPythonModelProviderFactory` and
> > > > > > > > `GenericPythonModelProvider`, except the final Python predict
> > > > > > > > function.
> > > > > > > >
> > > > > > > > Also, if `GenericPythonModelProviderFactory` is predefined, do
> > > > > > > > you predefine the required and optional options for it? Will it
> > > > > > > > be inflexible if predefined?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Hao
> > > > > > > >
> > > > > > > > On Mon, Oct 13, 2025 at 10:04 AM Swapna Marru <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi ShengKai,
> > > > > > > > >
> > > > > > > > > Documented the initial proposal here:
> > > > > > > > > https://docs.google.com/document/d/1YzBxLUPvluaZIvR0S3ktc5Be1FF4bNeTsXB9ILfgyWY/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > Please review and let me know your thoughts.
> > > > > > > > >
> > > > > > > > > -Thanks,
> > > > > > > > > Swapna
> > > > > > > > >
> > > > > > > > > On Tue, Sep 23, 2025 at 10:39 PM Shengkai Fang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > I see your point, and I agree that your proposal is feasible.
> > > > > > > > > > However, there is one limitation to consider: the current
> > > > > > > > > > loading mechanism first discovers all available factories on
> > > > > > > > > > the classpath and then filters them based on the
> > > > > > > > > > user-specified identifiers.
> > > > > > > > > >
> > > > > > > > > > In most practical scenarios, we would likely have only one
> > > > > > > > > > generic factory (e.g., a GenericPythonModelFactory) present
> > > > > > > > > > on the classpath. This means the framework would be able to
> > > > > > > > > > load either PyTorch or TensorFlow models—whichever is defined
> > > > > > > > > > within that single generic implementation—but not both
> > > > > > > > > > simultaneously unless additional mechanisms are introduced.
> > > > > > > > > >
> > > > > > > > > > This doesn't block the proposal, but it's something worth
> > > > > > > > > > noting as we design the extensibility model. We may want to
> > > > > > > > > > explore ways to support multiple user-defined providers more
> > > > > > > > > > seamlessly in the future.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shengkai
