Hi everyone,

I'm sending this email to make sure everyone is on the same page about slowly deprecating the DataSet API.

There have been a few thoughts mentioned in presentations, offline discussions, and JIRA issues. However, I have observed that there are still some concerns or different opinions on what steps are necessary to implement this change.

Let me summarize some of the steps and assumptions so that we can discuss them:

Step 1: Introduce a batch mode for Table API (FLIP-32)
[DONE in 1.9]

Step 2: Introduce a batch mode for DataStream API (FLIP-134)
[DONE in 1.12]

Step 3: Soft deprecate DataSet API (FLIP-131)
[DONE in 1.12]

We updated the documentation recently to make this deprecation even more visible. There is a dedicated `(Legacy)` label right next to the menu item now.

For now, we won't mark concrete classes of the API with a @Deprecated annotation, to avoid extensive warnings in logs.

Step 4: Drop the legacy SQL connectors and formats (FLINK-14437)
[DONE in 1.14]

We dropped code for ORC, Parquet, and HBase formats that were only used by DataSet API users. The removed classes had no documentation and were not annotated with one of our API stability annotations.

The old functionality should be available through the new sources and sinks for Table API and DataStream API. If not, we should bring them into a shape where they can serve as a full replacement.

DataSet users are encouraged to either migrate to the newer APIs or stay on Flink 1.13. Those who need a newer Flink version can alternatively copy only the format's code into it. We aim to keep the core interfaces (i.e. InputFormat and OutputFormat) stable until the next major version.

We will maintain the dropped connectors in 1.13 and allow important contributions to them, so 1.13 can be considered a kind of DataSet API LTS release.

Step 5: Drop the legacy SQL planner (FLINK-14437)
[DONE in 1.14]

This included dropping support of DataSet API with SQL.

Step 6: Connect both Table and DataStream API in batch mode (FLINK-20897)
[PLANNED in 1.14]

Step 7: Reach feature parity of Table API/DataStream API with DataSet API
[PLANNED for 1.14++]

We need to identify blockers when migrating from DataSet API to Table API/DataStream API. Here we need to establish a good feedback pipeline to include DataSet users in the roadmap planning.

Step 8: Drop the Gelly library

No concrete plan yet. At the latest, this would happen in the next major Flink version, i.e. Flink 2.0.

Step 9: Drop DataSet API

Planned for the next major Flink version aka Flink 2.0.


Please let me know if this matches your thoughts. We can also convert this into a blog post or mention it in the next release notes.

Regards,
Timo
