[GitHub] [flink-web] infoverload commented on a diff in pull request #526: Announcement blogpost for the 1.15 release

GitBox Tue, 12 Apr 2022 01:26:42 -0700


infoverload commented on code in PR #526:
URL: https://github.com/apache/flink-web/pull/526#discussion_r848133827



##########
_posts/2022-04-11-1.15-announcement.md:
##########
@@ -0,0 +1,375 @@
+---
+layout: post
+title:  "Announcing the Release of Apache Flink 1.15"
+subtitle: ""
+date: 2022-04-11T08:00:00.000Z
+categories: news
+authors:
+- yungao:
+  name: "Yun Gao"
+  twitter: "YunGao16"
+- joemoe:
+  name: "Joe Moser"
+  twitter: "JoemoeAT"
+
+---
+
+Thanks to our well-organized, kind, and open community, Apache Flink continues 
+[to grow](https://www.apache.org/foundation/docs/FY2021AnnualReport.pdf) as a 
+technology. We are and remain one of the most active projects in
+the Apache community. With release 1.15, we are proud to announce a number of 
+exciting changes.
+
+One of the main concepts that makes Apache Flink stand out is the unification of
+batch (aka bounded data) and streaming (aka unbounded data) processing. A lot of
+effort went into this in the last releases but we are only getting started there.
+Apache Flink is not only growing when it comes to contributions and users, it is
+also growing out of the original use cases and personas. Like the whole industry,
+it is moving more towards business/analytics use cases that are implemented as 
+low-/no-code. The feature that represents the most within the Flink space is 
+Flink SQL. That’s why its popularity continues to grow. 
+
+Apache Flink is considered an essential building block in data architectures. It
+is included with other technologies to drive all sorts of use cases. New ideas pop
+up, existing technologies establish themselves as standards for solving some aspects
+of a problem. In order to be successful, it is important that the experience of
+integrating with Apache Flink is as seamless and easy as possible. 
+
+In the 1.15 release the Apache Flink community made significant progress across all
+these areas. Still those are not the only things that made it into 1.15. The
+contributors improved the experience of operating Apache Flink by making it much
+easier and more transparent to handle checkpoints and savepoints and their ownership,
+making auto scaling more seamless and complete, by removing side effects of use cases
+in which different data sources produce varying amounts of data, and - finally - the
+ability to upgrade SQL jobs without losing the state. By continuing on supporting
+checkpoints after tasks finished and adding window table valued functions in batch
+mode, the experience of unified stream and batch processing was once more improved
+making hybrid use cases way easier. In the SQL space, not only the first step in
+version upgrades have been added but also JSON functions to make it easier to import
+and export structured data in SQL. Both will allow users to better rely on Flink SQL
+for production use cases in the long term. To establish Apache Flink as part of the
+data processing ecosystem we improved the cloud interoperability and added more sink
+connectors and formats. And yes we enabled a Scala-free runtime 
+([the hype is real](https://flink.apache.org/2022/02/22/scala-free.html)).
+
+
+## Operating Apache Flink with joy
+
+Even jobs that have been built and tuned by the best engineering teams still need to
+be operated. Looking at the lifecycle of Flink based projects most of them are built
+to stay, putting long-term burdens on the people operating them. The many deployment
+patterns, APIs, tuneable configs, and use cases covered by Apache Flink come at the
+high cost of support.
+
+
+### Clarifying checkpoint and savepoint semantics
+
+An essential cornerstone of Flink’s fault tolerance strategy is based on 
+[checkpoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpoints/)
+[and](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpoints_vs_savepoints/)
+[savepoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/).
+The intention of savepoints has always been to put transitions,
+backups, and upgrades of Apache Flink jobs in the control of users, checkpoints, on
+the other hand, are intended to be fully controlled by Flink and guarantee fault
+tolerance through fast recovery, fail over, etc. Both concepts are quite similar and
+the underlying implementation also shares the same ideas and some aspects. Still,
+both concepts have grown apart by following specific feature requests and sometimes
+neglecting the overarching idea and strategy. It became apparent that this should be
+aligned and harmonized better. It has been leading to situations in which users have
+been relying on checkpoints to stop and restart jobs whereas savepoints would have
+been the right way to go. Also savepoints are fairly slower as they don’t include
+some of the features that made taking checkpoints so fast. In some cases like
+resuming from a retained checkpoint in which the checkpoint is somehow considered as
+a savepoint but it is unclear to the user when they can actually clean it up. To sum
+it up: users have been confused.
+
+With [FLIP-193 (Snapshots ownership)](https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership)
+the community aims to make the ownership the only difference between savepoint and
+checkpoint. In the 1.15 release the community has fixed some of those shortcomings
+by supporting
+[native and incremental savepoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#savepoint-format).
+Savepoints always used to use the
+canonical format which made them slower. Also writing full savepoints for sure takes
+longer than doing it in an incremental way. With 1.15 if users use the native format
+to take savepoints as well as the RocksDB state backend, savepoints will be
+automatically taken in an incremental manner. The documentation has also been
+clarified to provide a better overview and understanding of the differences between
+checkpoints and savepoints. The semantics for
+[resuming from savepoint/retained checkpoint](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#resuming-from-savepoints)
+have also been clarified introducing the CLAIM and NO_CLAIM mode. With
+the CLAIM mode Flink takes over ownership of an existing snapshot, with NO_CLAIM it
+creates its own copy and leaves the existing one up to the user. Please note that
+NO_CLAIM mode is the new default behavior. The old semantic of resuming from
+savepoint/retained checkpoint is still accessible but has to be manually selected by
+choosing LEGACY mode.
+
+
+### Elastic scaling: Adaptive scheduler/reactive mode
+
+Driven by the increasing number of cloud services built on top of Apache Flink, the
+project becomes increasingly cloud native. As part of this development, elastic
+scaling grows in importance. This release improves metrics for the reactive mode
+(Job scope), adds an exception history for the adaptive scheduler, and speeds up
+down-scaling by 10x.
+
+To achieve that, dealing with metrics has been improved making all the metrics in
+the Job scope work correctly when reactive mode is enabled
+([yes, only limitations have been removed from the documentation](https://github.com/apache/flink/pull/17766/files)).
+The TaskManager now has a dedicated
+shutdown code path, where it actively deregisters itself from the cluster instead
+of relying on heartbeats, giving the JobManager a clear signal for downscaling.
+
+
+### Watermark alignment across sources
+
+Having sources that are increasing the watermarks at a different pace could lead to
+problems with downstream operators. Some operators might need to buffer excessive
+amounts of data which could lead to huge operator states. For sources based on the
+new source interface,
+[watermark alignment](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_)
+can be activated. Users can define
+alignment groups for which consuming from sources which are too far ahead from others
+are paused. The ideal case for aligned watermarks is when there are two or more
+sources that produce watermarks at a different speed and when the source has the same
+parallelism as splits/shards/partitions.

Review Comment:
   ```suggestion
   ### Watermark alignment across data sources
   
   Having data sources that increase watermarks at different paces could lead to
   problems with downstream operators. For example, some operators might need to buffer excessive
   amounts of data, which could lead to huge operator states. This is why we introduced watermark alignment
   in this release.
   
   For sources based on the new source interface,
   [watermark alignment](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_)
   can be activated. Users can define
   alignment groups to pause consuming from sources that are too far ahead of the others. The ideal scenario for aligned watermarks is when there are two or more
   sources that produce watermarks at different speeds and when the source has the same
   parallelism as splits/shards/partitions.
   ```
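
To make the pausing rule in the second paragraph concrete, here is a toy model of an alignment group. It is only a sketch of the idea, not Flink's implementation: `SourceState`, `sources_to_pause`, and `max_drift_ms` are hypothetical names, and it assumes a source is paused once its watermark runs more than the allowed drift ahead of the slowest source in the group.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Hypothetical record of one source in an alignment group."""
    name: str
    watermark_ms: int  # latest watermark reported by this source


def sources_to_pause(group, max_drift_ms):
    """Return the names of sources that are too far ahead of the slowest one.

    Assumption: "too far ahead" means the watermark exceeds the group's
    minimum watermark by more than max_drift_ms.
    """
    if not group:
        return []
    min_watermark = min(s.watermark_ms for s in group)
    limit = min_watermark + max_drift_ms
    return [s.name for s in group if s.watermark_ms > limit]


# Three sources advancing at different paces (made-up values).
group = [
    SourceState("orders", 10_000),
    SourceState("payments", 45_000),
    SourceState("clicks", 25_000),
]

# With 20 s of allowed drift, only "payments" is past the limit of 30 000 ms.
print(sources_to_pause(group, max_drift_ms=20_000))
```

In this model a paused source resumes as soon as the slowest source catches up enough to bring it back within the limit, which is what keeps downstream buffering bounded.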



