Hi Jark,

Thanks for the good questions!

1. The polymorphic table function (PTF) I was referring to takes a table as
input and outputs a table. So the syntax would be something like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported in Flink yet. Until it is, one
option for the predict function is to use a table function, which can
output multiple columns:
```
SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('model', col1, col2))
```
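
To illustrate the semantics of that lateral join, here is a rough sketch in
plain Python (not Flink code). The scoring logic, the `label`/`score` output
columns and the `lateral_join` helper are all made up for illustration; the
point is only that the function is called once per input row and each row it
emits is appended to that input row as extra columns.

```python
def ml_predict(model, col1, col2):
    """Toy table function: yields zero or more output rows per call.

    The scoring rule is a placeholder, not a real model invocation.
    """
    total = col1 + col2
    yield {"label": "pos" if total > 0 else "neg", "score": abs(total)}


def lateral_join(rows, table_fn):
    """Call table_fn once per input row and append its output columns."""
    for row in rows:
        for out in table_fn(row):
            yield {**row, **out}


my_table = [{"col1": 1, "col2": 2}, {"col1": -3, "col2": 1}]
result = list(
    lateral_join(my_table, lambda r: ml_predict("model", r["col1"], r["col2"]))
)
```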

2. Good question. Type inference is hard for the `ML_PREDICT` function
because it takes the model name as a string, so the input/output types are
not visible in the function signature. I can think of three ways of doing
type inference for it.
   1). Treat `ML_PREDICT` as a special function: when it's encountered
during SQL parsing or planning, look up the model named by the first
argument in the catalog, then infer the function's input/output types
from it.
   2). Define a `model` keyword and use it in the predict function to
indicate that the argument refers to a model, e.g.
`ML_PREDICT(model 'my_model', col1, col2)`.
   3). Create a special kind of table function, maybe called
`ModelFunction`, which resolves the model's input/output types through
special handling during parsing or planning.
Option 1) is hacky, 2) isn't supported in Flink for functions, and 3) might
be a good option.
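
To make the catalog lookup in option 1) concrete, here is a minimal sketch in
plain Python (not the Flink planner API). `ModelSchema`, `CATALOG` and
`infer_predict_types` are hypothetical names, and the column names and types
are invented. It only shows the idea: the model name has to be a literal that
is known at planning time, and the output row type comes from the model's
registered schema rather than from the function signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSchema:
    """Input and output columns of a registered model (hypothetical)."""
    inputs: tuple   # ((column name, type name), ...)
    outputs: tuple  # ((column name, type name), ...)


# Stand-in for the catalog; a real planner would query the catalog instead.
CATALOG = {
    "my_model": ModelSchema(
        inputs=(("col1", "INT"), ("col2", "STRING")),
        outputs=(("label", "STRING"), ("score", "DOUBLE")),
    ),
}


def infer_predict_types(model_name, arg_types):
    """Resolve the output row type of ML_PREDICT at planning time."""
    schema = CATALOG.get(model_name)
    if schema is None:
        raise ValueError(f"model '{model_name}' not found in catalog")
    expected = tuple(type_name for _, type_name in schema.inputs)
    if tuple(arg_types) != expected:
        raise TypeError(
            f"expected argument types {expected}, got {tuple(arg_types)}"
        )
    return schema.outputs
```

For example, `infer_predict_types("my_model", ["INT", "STRING"])` returns the
two output columns registered for the model, while an unknown model name or
mismatched argument types fail before any data is read.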

3. I sketched the `ML_PREDICT` function for inference, but it has the
limitations mentioned in 1 and 2. So maybe we don't need to introduce
built-in model functions until polymorphic table functions are supported
and we can properly deal with type inference. After that, defining a
user-defined model function should also be straightforward.

4. For model types, do you mean 'remote', 'import', 'native' models or
other things?

5. We could support popular providers such as 'azureml', 'vertexai',
'googleai' as long as we support the `ML_PREDICT` function. Users should be
able to implement 3rd-party providers if they can implement a function
handling the input/output for the provider.
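
As a rough illustration of what "a function handling the input/output for
the provider" could look like, here is a sketch in plain Python. None of
these names exist in Flink or in the FLIP: `ModelProvider`,
`register_provider`, `get_provider` and `ToyJsonProvider` are hypothetical,
and the JSON request/response shapes are invented rather than taken from any
real provider.

```python
import json
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    """Hypothetical extension point: one implementation per provider."""

    @abstractmethod
    def encode_request(self, row: dict) -> str:
        """Turn one input row into the provider's request body."""

    @abstractmethod
    def decode_response(self, body: str) -> dict:
        """Turn the provider's response body into output columns."""


_PROVIDERS = {}


def register_provider(name, provider):
    _PROVIDERS[name] = provider


def get_provider(name):
    try:
        return _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider '{name}'") from None


class ToyJsonProvider(ModelProvider):
    """Toy third-party provider with a made-up JSON wire format."""

    def encode_request(self, row):
        return json.dumps({"instances": [row]}, sort_keys=True)

    def decode_response(self, body):
        return {"prediction": json.loads(body)["predictions"][0]}


register_provider("toy", ToyJsonProvider())
```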

I think there are still dependencies and hacks to sort out before the model
functions can be built in. Maybe we can split that out as a follow-up, if
we want them built in, and focus on the model syntax for this FLIP?

Thanks,
Hao

On Tue, Mar 12, 2024 at 10:33 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Minge, Chris, Hao,
>
> Thanks for proposing this interesting idea. I think this is a nice step
> towards
> the AI world for Apache Flink. I don't know much about AI/ML, so I may have
> some stupid questions.
>
> 1. Could you tell more about why polymorphism table function (PTF) doesn't
> work and do we have plan to use PTF as model functions?
>
> 2. What kind of object does the model map to in SQL? A relation or a data
> type?
> It looks like a data type because we use it as a parameter of the table
> function.
> If it is a data type, how does it cooperate with type inference[1]?
>
> 3. What built-in model functions will we support? How to define a
> user-defined model function?
>
> 4. What built-in model types will we support? How to define a user-defined
> model type?
>
> 5. Regarding the remote model, what providers will we support? Can users
> implement
> 3rd-party providers except OpenAI?
>
> Best,
> Jark
>
> [1]:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference
>
>
>
>
> On Wed, 13 Mar 2024 at 05:55, Hao Li <h...@confluent.io.invalid> wrote:
>
> > Hi, Dev
> >
> >
> > Mingge, Chris and I would like to start a discussion about FLIP-437:
> > Support ML Models in Flink SQL.
> >
> > This FLIP is proposing to support machine learning models in Flink SQL
> > syntax so that users can CRUD models with Flink SQL and use models on
> Flink
> > to do prediction with Flink data. The FLIP also proposes new model
> entities
> > and changes to catalog interface to support model CRUD operations in
> > catalog.
> >
> > For more details, see FLIP-437 [1]. Looking forward to your feedback.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> >
> > Thanks,
> > Minge, Chris & Hao
> >
>
