Hi all,

I'd like to start a discussion about introducing a few convenience operations
in the Table API to make it easier to use.

Currently, some tasks are hard to express in the Table API, e.g. deduplication
and top-N. Others become tedious when a table has hundreds of columns, e.g.
null data handling.

I propose introducing a few operations in the Table API with the following
goals:
- Let Table API users easily leverage the powerful features already available
in SQL, e.g. deduplication and top-N.
- Provide some convenience operations, e.g. a series of operations for null
data handling (which becomes a problem when there are hundreds of columns),
and for data sampling and splitting (a very common use case in ML, which
usually needs to split a table into multiple tables for training and
validation).
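
To make the operations named above concrete, here is a minimal plain-Python
sketch of their semantics. It does not use the Table API or the API proposed
in FLIP-155. The function names top_n and fill_nulls and the sample rows are
hypothetical and exist only for illustration.

```python
from collections import defaultdict


def top_n(rows, key, order_by, n=1):
    """Keep the n rows with the largest `order_by` value for each `key`.

    With n=1 this is deduplication that keeps the latest row per key.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for group in groups.values():
        # sorted() is stable, so rows with equal values keep their input order.
        ranked = sorted(group, key=lambda r: r[order_by], reverse=True)
        result.extend(ranked[:n])
    return result


def fill_nulls(rows, default):
    """Replace None in every column without naming the columns one by one."""
    return [
        {name: (default if value is None else value) for name, value in row.items()}
        for row in rows
    ]


# Hypothetical sample data: two rows for user "a", one for user "b".
rows = [
    {"user": "a", "ts": 1, "score": None},
    {"user": "a", "ts": 3, "score": 7},
    {"user": "b", "ts": 2, "score": None},
]

latest = top_n(rows, key="user", order_by="ts", n=1)
cleaned = fill_nulls(latest, default=0)
```

The point of fill_nulls is that it applies to all columns at once, which is
the property that matters when a table has hundreds of columns.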

Please refer to FLIP-155 [1] for more details.

Looking forward to your feedback!

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-155%3A+Introduce+a+few+convenient+operations+in+Table+API
