Thanks for writing this up; this also reflects my understanding. I think a blog post would be nice, ideally with an explicit call for feedback so we learn about user concerns. A blog post has a lot more reach than a mailing list thread.
Best,
Stephan

On Wed, Jun 23, 2021 at 12:23 PM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> I'm sending this email to make sure everyone is on the same page about
> slowly deprecating the DataSet API.
>
> There have been a few thoughts mentioned in presentations, offline
> discussions, and JIRA issues. However, I have observed that there are
> still some concerns or different opinions on what steps are necessary to
> implement this change.
>
> Let me summarize some of the steps and assumptions and let's have a
> discussion about it:
>
> Step 1: Introduce a batch mode for Table API (FLIP-32)
> [DONE in 1.9]
>
> Step 2: Introduce a batch mode for DataStream API (FLIP-134)
> [DONE in 1.12]
>
> Step 3: Soft deprecate DataSet API (FLIP-131)
> [DONE in 1.12]
>
> We updated the documentation recently to make this deprecation even more
> visible. There is a dedicated `(Legacy)` label right next to the menu
> item now.
>
> We won't deprecate concrete classes of the API with a @Deprecated
> annotation to avoid extensive warnings in logs until then.
>
> Step 4: Drop the legacy SQL connectors and formats (FLINK-14437)
> [DONE in 1.14]
>
> We dropped code for ORC, Parquet, and HBase formats that were only used
> by DataSet API users. The removed classes had no documentation and were
> not annotated with one of our API stability annotations.
>
> The old functionality should be available through the new sources and
> sinks for Table API and DataStream API. If not, we should bring them
> into a shape that they can be a full replacement.
>
> DataSet users are encouraged to either upgrade the API or use Flink
> 1.13. Users can either just stay at Flink 1.13 or copy only the format's
> code to a newer Flink version. We aim to keep the core interfaces (i.e.
> InputFormat and OutputFormat) stable until the next major version.
>
> We will maintain/allow important contributions to dropped connectors in
> 1.13.
> So 1.13 could be considered a kind of DataSet API LTS release.
>
> Step 5: Drop the legacy SQL planner (FLINK-14437)
> [DONE in 1.14]
>
> This included dropping support of DataSet API with SQL.
>
> Step 6: Connect both Table and DataStream API in batch mode (FLINK-20897)
> [PLANNED in 1.14]
>
> Step 7: Reach feature parity of Table API/DataStream API with DataSet API
> [PLANNED for 1.14++]
>
> We need to identify blockers when migrating from DataSet API to Table
> API/DataStream API. Here we need to establish a good feedback pipeline
> to include DataSet users in the roadmap planning.
>
> Step 8: Drop the Gelly library
>
> No concrete plan yet. At the latest, this would happen in the next major
> Flink version, aka Flink 2.0.
>
> Step 9: Drop DataSet API
>
> Planned for the next major Flink version, aka Flink 2.0.
>
>
> Please let me know if this matches your thoughts. We can also convert
> this into a blog post or mention it in the next release notes.
>
> Regards,
> Timo