If we want to publicize this plan more, shouldn't we have a rough
timeline for when 2.0 is on the table?
On 6/23/2021 2:44 PM, Stephan Ewen wrote:
Thanks for writing this up, this also reflects my understanding.
I think a blog post would be nice, ideally with an explicit call for
feedback so we learn about user concerns.
A blog post has a lot more reach than an ML thread.
Best,
Stephan
On Wed, Jun 23, 2021 at 12:23 PM Timo Walther <twal...@apache.org> wrote:
Hi everyone,
I'm sending this email to make sure everyone is on the same page about
slowly deprecating the DataSet API.
There have been a few thoughts mentioned in presentations, offline
discussions, and JIRA issues. However, I have observed that there are
still some concerns or different opinions on what steps are necessary to
implement this change.
Let me summarize some of the steps and assumptions and let's have a
discussion about it:
Step 1: Introduce a batch mode for Table API (FLIP-32)
[DONE in 1.9]
Step 2: Introduce a batch mode for DataStream API (FLIP-134)
[DONE in 1.12]
Step 3: Soft deprecate DataSet API (FLIP-131)
[DONE in 1.12]
We updated the documentation recently to make this deprecation even more
visible. There is a dedicated `(Legacy)` label right next to the menu
item now.
For now, we won't mark concrete classes of the API with a @Deprecated
annotation, to avoid extensive warnings in logs.
Step 4: Drop the legacy SQL connectors and formats (FLINK-14437)
[DONE in 1.14]
We dropped code for ORC, Parquet, and HBase formats that were only used
by DataSet API users. The removed classes had no documentation and were
not annotated with one of our API stability annotations.
The old functionality should be available through the new sources and
sinks for Table API and DataStream API. If it is not, we should improve
them until they are a full replacement.
DataSet users are encouraged to either migrate to the newer APIs or
stay on Flink 1.13. Users who remain on the DataSet API can also copy
just the format's code into a project on a newer Flink version. We aim
to keep the core interfaces (i.e. InputFormat and OutputFormat) stable
until the next major version.
We will maintain the dropped connectors in 1.13 and allow important
contributions to them there, so 1.13 can be considered a kind of
DataSet API LTS release.
Step 5: Drop the legacy SQL planner (FLINK-14437)
[DONE in 1.14]
This included dropping support of DataSet API with SQL.
Step 6: Connect both Table and DataStream API in batch mode (FLINK-20897)
[PLANNED in 1.14]
Step 7: Reach feature parity of Table API/DataStream API with DataSet API
[PLANNED for 1.14++]
We need to identify blockers when migrating from DataSet API to Table
API/DataStream API. Here we need to establish a good feedback pipeline
to include DataSet users in the roadmap planning.
Step 8: Drop the Gelly library
No concrete plan yet. At the latest, this would happen in the next major
Flink version, aka Flink 2.0.
Step 9: Drop DataSet API
Planned for the next major Flink version, aka Flink 2.0.
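For readers who have not yet tried the DataStream batch mode from
Step 2, here is a minimal sketch of how a job is switched to batch
execution. It assumes the `execution.runtime-mode` option as I remember
it from the Flink documentation on execution modes, so please check it
against the docs for your version:

```yaml
# flink-conf.yaml (sketch): run bounded DataStream programs with batch
# semantics instead of the default streaming semantics.
# Accepted values: STREAMING (default), BATCH, AUTOMATIC.
execution.runtime-mode: BATCH
```

The same option can be passed per job on the command line as
`-Dexecution.runtime-mode=BATCH`, or set in code with
`env.setRuntimeMode(RuntimeExecutionMode.BATCH)`. Setting it at
submission time lets the same program run in either mode.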
Please let me know if this matches your thoughts. We can also convert
this into a blog post or mention it in the next release notes.
Regards,
Timo