Hi Hao,

Sure, I can change that.

>> Maybe change `getPythonPredictFunction` to `createPythonPredictFunction` to align with `createPredictFunction` in `PredictRuntimeProvider`.

Yes, this can be enforced. It can fail if it is not an implementation of PredictFunction. This can be done as part of getPythonFunction in java_gateway.py (where the underlying Python function is instantiated); see the sketch below.

>> Is it possible to enforce that the returned Java `PythonFunction` from `createPythonFunction` implements the Python `PredictFunction`?
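To make that concrete, here is a minimal sketch of the kind of check I have in mind. This is not the actual java_gateway.py code: the `PredictFunction` base class below only stands in for the Python interface proposed in this FLIP, and the helper name and signature are assumptions for illustration.

```
# Sketch only: fail fast when the configured Python class does not implement
# the proposed PredictFunction interface. Names and signatures are illustrative.
import importlib
from abc import ABC, abstractmethod


class PredictFunction(ABC):
    """Stand-in for the Python PredictFunction interface proposed in the FLIP."""

    def set_model_config(self, model_config: dict) -> None:
        self._model_config = model_config

    @abstractmethod
    def predict(self, row):
        """Run inference on a single row (batching to be discussed separately)."""


def instantiate_predict_function(fully_qualified_name: str, model_config: dict) -> PredictFunction:
    """Load `module.ClassName`, verify it implements PredictFunction, and instantiate it."""
    module_name, _, class_name = fully_qualified_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    # Enforcement point: reject anything that is not a PredictFunction, so a
    # misconfigured model fails at function creation time rather than mid-job.
    if not (isinstance(cls, type) and issubclass(cls, PredictFunction)):
        raise TypeError(
            f"'{fully_qualified_name}' must implement PredictFunction, got {cls!r}"
        )

    instance = cls()  # assumes a no-arg constructor, as a UDF-style function would have
    instance.set_model_config(model_config)
    return instance
```

Failing at instantiation time means a wrongly configured model surfaces when the predict function is created, not when the first row reaches the operator.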
On Tue, Oct 14, 2025 at 4:09 PM Hao Li <[email protected]> wrote:

> Hi Swapna,
>
> Some other suggestions and questions:
>
> 1. Maybe change `getPythonPredictFunction` to `createPythonPredictFunction`
> to align with `createPredictFunction` in `PredictRuntimeProvider`.
> 2. Is it possible to enforce that the returned Java `PythonFunction` from
> `createPythonFunction` implements the Python `PredictFunction`?
>
> Hao
>
> On Tue, Oct 14, 2025 at 2:52 PM Swapna Marru <[email protected]> wrote:
>
> > Thanks Matyas and Shengkai.
> >
> > > 1. I'm wondering whether we could extend the SQL API to change how Python
> > > models are loaded. For example, we could allow users to write:
> > >
> > > ```
> > > CREATE MODEL my_pytorch_model
> > > WITH (
> > >   'type' = 'pytorch'
> > > ) LANGUAGE PYTHON;
> > > ```
> >
> > I kind of thought about this, but one initial concern I had with this model
> > is: will a model provider be completely implemented in Python itself? When
> > we refer to PyTorch, HuggingFace, or ONNX model providers, for example, do
> > we need different behavior or optimizations related to
> > PredictRuntimeContext / model config building, batching, or resource
> > scheduling decisions, which need to be done in the Java Flink entry points?
> >
> > > 2. Beam already supports TensorFlow, ONNX, and many built-in models. Can
> > > we reuse Beam's utilities to build Flink prediction functions[1]?
> >
> > Thanks, I will take a look at this to understand how it works and whether
> > we can learn from that design.
> >
> > > 3. It would be better if we introduced a PredictRuntimeContext to help
> > > users download required weight files.
> >
> > Currently I have this as model config (set_model_config), which will be
> > passed to the PredictFunction in Python, but PredictRuntimeContext seems
> > more suitable. I will look into passing a PredictRuntimeContext in open,
> > similar to RuntimeContext for UDFs.
> >
> > > 4. In ML, users typically perform inference on batches of data.
> > > Therefore, per-record evaluation may not be necessary. How about we just
> > > introduce an API like[2]?
> >
> > Yes, I completely agree on this. I was first aiming at agreement on model
> > creation and the provider API, and will then look at this in more detail.
> >
> > On Tue, Oct 14, 2025 at 10:39 AM Őrhidi Mátyás <[email protected]> wrote:
> >
> > > Hey Shengkai,
> > >
> > > Thank you for your observations. This proposal is mostly driven by
> > > Swapna, but I can also share my thoughts here; please find them inline.
> > >
> > > Cheers,
> > > Matyas
> > >
> > > On Tue, Oct 14, 2025 at 3:02 AM Shengkai Fang <[email protected]> wrote:
> > >
> > > > Hi, Matyas.
> > > >
> > > > Thanks for the proposal. I have some suggestions about it.
> > > >
> > > > 1. I'm wondering whether we could extend the SQL API to change how
> > > > Python models are loaded. For example, we could allow users to write:
> > > >
> > > > ```
> > > > CREATE MODEL my_pytorch_model
> > > > WITH (
> > > >   'type' = 'pytorch'
> > > > ) LANGUAGE PYTHON;
> > > > ```
> > > >
> > > > In this case, we wouldn't rely on Java SPI to load the Python model
> > > > provider.
> > > > However, I'm not sure whether Python has a similar mechanism to SPI
> > > > that avoids hardcoding class paths.
> > >
> > > This is an interesting idea; however, we are proposing the provider model
> > > because it aligns with Flink's existing Java-based architecture for
> > > discovering plugins. A Java entry point is required to launch the Python
> > > code, and this is the standard way to do it.
> > >
> > > > 2. Beam already supports TensorFlow, ONNX, and many built-in models.
> > > > Can we reuse Beam's utilities to build Flink prediction functions[1]?
> > >
> > > We can certainly learn from Beam's design, but directly reusing it would
> > > add a very heavy dependency and be difficult to integrate cleanly into
> > > Flink's native processing model.
> > >
> > > > 3. It would be better if we introduced a PredictRuntimeContext to help
> > > > users download required weight files.
> > >
> > > This is actually a great idea and essential for usability. Just to double
> > > check on your suggestion: your proposal is to have an explicit
> > > PredictRuntimeContext for dynamic model file downloading?
> > >
> > > > 4. In ML, users typically perform inference on batches of data.
> > > > Therefore, per-record evaluation may not be necessary. How about we
> > > > just introduce an API like[2]?
> > >
> > > I agree completely. The row-by-row API is just a starting point, and we
> > > should aim to prioritize support for efficient batch inference to ensure
> > > good performance for real-world models.
> > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > [1] https://beam.apache.org/documentation/ml/about-ml/
> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation
> > > >
> > > > Swapna Marru <[email protected]> wrote on Tue, Oct 14, 2025 at 11:53:
> > > >
> > > > > Thanks Matyas.
> > > > >
> > > > > Hao,
> > > > >
> > > > > The proposal is to provide a generic framework. The interfaces
> > > > > PythonPredictRuntimeProvider / PythonPredictFunction /
> > > > > PredictFunction (in Python) are defined to provide a base for that
> > > > > framework.
> > > > >
> > > > > generic-python is one of the implementations, registered similarly to
> > > > > openai in the original FLIP. It is, though, not a concrete end-to-end
> > > > > implementation. It can be used:
> > > > > 1. As a reference implementation for other complete, end-to-end,
> > > > > concrete model provider implementations.
> > > > > 2. Out of the box for simple Python model implementations, to avoid a
> > > > > boilerplate Java provider implementation.
> > > > >
> > > > > I will also open a PR with the current implementation changes, so it
> > > > > is clearer for further discussion.
> > > > >
> > > > > -Thanks,
> > > > > M.Swapna
> > > > >
> > > > > On Mon, Oct 13, 2025 at 5:04 PM Őrhidi Mátyás <[email protected]> wrote:
> > > > >
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-552+Support+ML_PREDICT+for+Python+based+model+providers
> > > > > >
> > > > > > On Mon, Oct 13, 2025 at 4:10 PM Őrhidi Mátyás <[email protected]> wrote:
> > > > > >
> > > > > > > Swapna, I can help you to create a FLIP page.
> > > > > > > On Mon, Oct 13, 2025 at 3:58 PM Hao Li <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Swapna,
> > > > > > > >
> > > > > > > > Thanks for the proposal. Can you put it in a FLIP and start a
> > > > > > > > discussion thread for it?
> > > > > > > >
> > > > > > > > From an initial look, I'm a bit confused whether this is a
> > > > > > > > concrete implementation for "generic-python" or a generic
> > > > > > > > framework to handle Python predict functions, because everything
> > > > > > > > seems concrete, like `GenericPythonModelProviderFactory` and
> > > > > > > > `GenericPythonModelProvider`, except the final Python predict
> > > > > > > > function.
> > > > > > > >
> > > > > > > > Also, if `GenericPythonModelProviderFactory` is predefined, do
> > > > > > > > you predefine the required and optional options for it? Will it
> > > > > > > > be inflexible if predefined?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Hao
> > > > > > > >
> > > > > > > > On Mon, Oct 13, 2025 at 10:04 AM Swapna Marru <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi ShengKai,
> > > > > > > > >
> > > > > > > > > Documented the initial proposal here:
> > > > > > > > > https://docs.google.com/document/d/1YzBxLUPvluaZIvR0S3ktc5Be1FF4bNeTsXB9ILfgyWY/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > Please review and let me know your thoughts.
> > > > > > > > >
> > > > > > > > > -Thanks,
> > > > > > > > > Swapna
> > > > > > > > >
> > > > > > > > > On Tue, Sep 23, 2025 at 10:39 PM Shengkai Fang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > I see your point, and I agree that your proposal is feasible.
> > > > > > > > > > However, there is one limitation to consider: the current
> > > > > > > > > > loading mechanism first discovers all available factories on
> > > > > > > > > > the classpath and then filters them based on the
> > > > > > > > > > user-specified identifiers.
> > > > > > > > > >
> > > > > > > > > > In most practical scenarios, we would likely have only one
> > > > > > > > > > generic factory (e.g., a GenericPythonModelFactory) present
> > > > > > > > > > on the classpath. This means the framework would be able to
> > > > > > > > > > load either PyTorch or TensorFlow models—whichever is defined
> > > > > > > > > > within that single generic implementation—but not both
> > > > > > > > > > simultaneously unless additional mechanisms are introduced.
> > > > > > > > > >
> > > > > > > > > > This doesn't block the proposal, but it's something worth
> > > > > > > > > > noting as we design the extensibility model. We may want to
> > > > > > > > > > explore ways to support multiple user-defined providers more
> > > > > > > > > > seamlessly in the future.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shengkai
