[GitHub] [flink-web] dianfu commented on a diff in pull request #526: Announcement blogpost for the 1.15 release

GitBox Wed, 20 Apr 2022 23:22:49 -0700


dianfu commented on code in PR #526:
URL: https://github.com/apache/flink-web/pull/526#discussion_r854819441



##########
_posts/2022-04-11-1.15-announcement.md:
##########
@@ -0,0 +1,431 @@
+---
+layout: post
+title:  "Announcing the Release of Apache Flink 1.15"
+subtitle: ""
+date: 2022-04-11T08:00:00.000Z
+categories: news
+authors:
+- yungao:
+  name: "Yun Gao"
+  twitter: "YunGao16"
+- joemoe:
+  name: "Joe Moser"
+  twitter: "JoemoeAT"
+
+---
+
+Thanks to our well-organized and open community, Apache Flink continues 
+[to grow](https://www.apache.org/foundation/docs/FY2021AnnualReport.pdf) as a 
+technology and remain one of the most active projects in
+the Apache community. With the release of Flink 1.15, we are proud to announce 
a number of 
+exciting changes.
+
+One of the main concepts that makes Apache Flink stand out is the unification 
of 
+batch (aka bounded) and stream (aka unbounded) data processing. A lot of 
+effort went into this unification in the previous releases but you can expect 
more efforts in this direction. 
+Apache Flink is not only growing when it comes to contributions and users, but
+also out of the original use cases. We are seeing a trend towards more 
business/analytics 
+use cases implemented in low-/no-code. Flink SQL is the feature in the Flink 
ecosystem 
+that enables such uses cases and this is why its popularity continues to grow. 
 
+
+Apache Flink is an essential building block in data pipelines/architectures 
and 
+is used with many other technologies in order to drive all sorts of use cases. 
While new ideas/products
+may appear in this domain, existing technologies continue to establish 
themselves as standards for solving 
+mission-critical problems. Knowing that we have such a wide reach and play a 
role in the success of many 
+projects, it is important that the experience of 
+integrating with Apache Flink is as seamless and easy as possible. 
+
+In the 1.15 release the Apache Flink community made significant progress 
across all 
+these areas. Still those are not the only things that made it into 1.15. The 
+contributors improved the experience of operating Apache Flink by making it 
much 
+easier and more transparent to handle checkpoints and savepoints and their 
ownership, 
+making auto scaling more seamless and complete, by removing side effects of 
use cases 
+in which different data sources produce varying amounts of data, and - finally 
- the 
+ability to upgrade SQL jobs without losing the state. By continuing on 
supporting 
+checkpoints after tasks finished and adding window table valued functions in 
batch 
+mode, the experience of unified stream and batch processing was once more 
improved 
+making hybrid use cases way easier. In the SQL space, not only the first step 
in 
+version upgrades have been added but also JSON functions to make it easier to 
import 
+and export structured data in SQL. Both will allow users to better rely on 
Flink SQL 
+for production use cases in the long term. To establish Apache Flink as part 
of the 
+data processing ecosystem we improved the cloud interoperability and added 
more sink 
+connectors and formats. And yes we enabled a Scala-free runtime 
+([the hype is real](https://flink.apache.org/2022/02/22/scala-free.html)).
+
+
+## Operating Apache Flink with ease
+
+Even Flink jobs that have been built and tuned by the best engineering teams 
still need to 
+be operated, usually on a long-term basis. The many deployment 
+patterns, APIs, tuneable configs, and use cases covered by Apache Flink mean 
that operation
+support is vital and can be burdensome.
+
+In this release, we listened to user feedback and now operating Flink is made 
much 
+easier. It is now more transparent in terms of handling checkpoints and 
savepoints and their ownership, 
+which makes auto-scaling more seamless and complete (by removing side effects 
of use cases 
+where different data sources produce varying amounts of data) and enables the  
+ability to upgrade SQL jobs without losing the state. 
+
+
+### Clarification of checkpoint and savepoint semantics
+
+An essential cornerstone of Flink’s fault tolerance strategy is based on 
+[checkpoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpoints/)
 and 
+[savepoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/)
 (see [the 
comparison](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpoints_vs_savepoints/).
 
+The purpose of savepoints has always been to put transitions, 
+backups, and upgrades of Flink jobs in the control of users. Checkpoints, on 
+the other hand, are intended to be fully controlled by Flink and guarantee 
fault 
+tolerance through fast recovery, failover, etc. Both concepts are quite 
similar and 
+the underlying implementation also shares aspects of the same ideas. 
+
+However, both concepts grew apart by following specific feature requests and 
sometimes 
+neglecting the overarching idea and strategy. Based on user feedback, it 
became apparent that this should be 
+aligned and harmonized better and, above all, to make more clear! 
+
+There have been situations in which users relied on checkpoints to 
stop/restart jobs when savepoints would have 
+been the right way to go. It was also not clear that savepoints are slower 
since they don’t include 
+some of the features that make taking checkpoints so fast. In some cases like 
+resuming from a retained checkpoint in which the checkpoint is somehow 
considered a 
+a savepoint, it is unclear to the user when they can actually clean it up. 
+
+With [FLIP-193 (Snapshots 
ownership)](https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership)
 
+the community aims to make the ownership the only difference between savepoint 
and 
+checkpoint. In the 1.15 release the community has fixed some of those 
shortcomings 
+by supporting 
+[native and incremental 
savepoints](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#savepoint-format).
 
+Savepoints always used to use the 
+canonical format which made them slower. Also writing full savepoints for sure 
takes 
+longer than doing it in an incremental way. With 1.15 if users use the native 
format 
+to take savepoints as well as the RocksDB state backend, savepoints will be 
+automatically taken in an incremental manner. The documentation has also been 
+clarified to provide a better overview and understanding of the differences 
between 
+checkpoints and savepoints. The semantics for 
+[resuming from savepoint/retained 
checkpoint](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#resuming-from-savepoints)
 
+have also been clarified introducing the CLAIM and NO_CLAIM mode. With 
+the CLAIM mode Flink takes over ownership of an existing snapshot, with 
NO_CLAIM it 
+creates its own copy and leaves the existing one up to the user. Please note 
that 
+NO_CLAIM mode is the new default behavior. The old semantic of resuming from 
+savepoint/retained checkpoint is still accessible but has to be manually 
selected by 
+choosing LEGACY mode.
+
+
+### Elastic scaling with reactive mode and the adaptive scheduler
+
+Driven by the increasing number of cloud services built on top of Apache 
Flink, the 
+project is increasingly cloud native which makes elastic 
+scaling more and more important. 
+
+This release improves metrics for the [reactive 
mode](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#reactive-mode),
 which is a job-scope mode where the JobManager will try to use all TaskManager 
resources available. To do this, we made all the metrics in 
+the Job scope work correctly when reactive mode is enabled 
+([yes, only limitations have been removed from the 
documentation](https://github.com/apache/flink/pull/17766/files)). 
+
+We also added an exception history for the [adaptive 
scheduler](https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler),
 which is a new scheduler that first declares the required resources and waits 
for them before deciding on the parallelism with which to execute a job. 
+
+Furthermore, downscaling is sped up by 10x. The TaskManager now has a 
dedicated 
+shutdown code path, where it actively deregisters itself from the cluster 
instead 
+of relying on heartbeats, giving the JobManager a clear signal for downscaling.
+
+
+### Adaptive batch scheduler
+
+In 1.15, we introduced a new scheduler to Apache Flink: the 
+[Adaptive Batch 
Scheduler](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/#adaptive-batch-scheduler).
 
+The new scheduler can automatically decide parallelisms of job vertices for 
batch jobs, 
+according to the size of data volume each vertex needs to process.
+
+Major benefits of this scheduler includes:
+
+1. Ease-of-use: Batch job users can be relieved from parallelism tuning.
+2. Adaptive: Automatically tuned parallelisms can better fit consumed datasets 
which 
+   have a varying volume size every day.
+3. Fine-grained: Parallelism of each job vertex will be tuned individually. 
This allows 
+   vertices of SQL batch jobs to be automatically assigned different proper 
parallelisms.
+
+
+### Watermark alignment across data sources
+
+Having data sources that increase watermarks at different paces could lead to 
+problems with downstream operators. For example, some operators might need to 
buffer excessive 
+amounts of data which could lead to huge operator states. This is why we 
introduced watermark alignment
+in this release.
+
+For sources based on the new source interface, 
+[watermark 
alignment](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_)
+can be activated. Users can define 
+alignment groups to pause consuming from sources which are too far ahead from 
others. The ideal scenario for aligned watermarks is when there are two or more 
+sources that produce watermarks at a different speed and when the source has 
the same 
+parallelism as splits/shards/partitions.
+
+
+### SQL version upgrades
+
+The execution plan of SQL queries and its resulting topology is based on 
optimization 
+rules and a cost model. This means that even minimal changes could introduce a 
completely 
+different topology. This dynamism makes guaranteeing snapshot compatibility 
+very challenging across different Flink versions. In the efforts of 1.15, the 
community focused 
+on keeping the same query (via the same topology) up and running even after 
upgrades. 
+
+At the core of SQL upgrades are JSON plans 
+([please note that we only have documentation in our JavaDocs for now and are 
still working on updating the 
documentation](https://nightlies.apache.org/flink/flink-docs-release-1.15/api/java/org/apache/flink/table/api/CompiledPlan.html)),
 which are JSON functions that make it easier to import and export structured 
data in SQL. This has been introduced for 
+internal use already in previous releases and will now be exposed externally. 
Both the Table API 
+and SQL will provide a way to compile and execute a plan which guarantees the 
same 
+topology throughout different versions. This feature will be released as an 
experimental MVP. 
+Users who want to give it a try already can create a JSON plan that can then 
be used 
+to restore a Flink job based on the old operator structure. The full feature 
can be expected 
+in Flink 1.16.
+
+Reliable upgrades makes Flink SQL more dependable for production use cases in 
the long term. 
+
+
+### Changelog state backend
+
+In Flink 1.15, we introduced the MVP feature of the
+[changelog state 
backend](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/state_backends/#enabling-changelog),
 
+which aims at 
+making checkpoint intervals shorter and more predictable with the following 
advantages:
+
+1. Shorter end-to-end latency: end-to-end latency mostly depends on the 
checkpointing 
+   mechanism, especially for transactional sinks. Transactional sinks commit 
on 
+   checkpoints, so faster checkpoints mean more frequent commits.
+2. More predictable checkpoint intervals: currently checkpointing time largely 
depends 
+   on the size of the artifacts that need to be persisted on the checkpoint 
storage. 
+   By keeping the size consistently small, checkpointing time becomes more 
predictable.
+3. Less work on recovery: with more frequently checkpoints are taken, less 
data need 
+   to be re-processed after each recovery.
+
+The changelog state backend helps achieve the above by continuously 
+persisting state changes on non-volatile storage while performing state 
materialization 
+in the background.
+
+
+### Repeatable cleanup
+
+In previous releases of Flink, cleaning up job-related artifacts was done only 
once which might have resulted in abandoned artifacts in case of an error. In 
this version, Flink will try to run the cleanup again to avoid leaving 
artifacts behind. This retry mechanism runs until it was successful, by 
default. Users can change this behavior by configuring the [repeatable cleanup 
options](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#retryable-cleanup).
 Disabling the retry strategy will lead to Flink behaving like in previous 
releases.
+
+There is still work in progress around cleaning up checkpoints, which is 
covered by 
+[FLINK-26606](https://issues.apache.org/jira/browse/FLINK-26606).
+
+
+### OpenAPI
+
+Flink is now providing the REST API specifications following the 
+[OpenAPI](https://www.openapis.org) standard.
+This allows the REST API to be used with standard tools that are implementing 
the
+OpenAPI standard.
+You can find the specifications 
[here](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobmanager).
+
+
+### Improvements to application mode
+
+When running Flink in [application 
mode](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/overview/),
 it can now be guaranteed that jobs will take a savepoint after they are 
completed if they have been configured to do so
+([see 
execution.shutdown-on-application-finish](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#execution-shutdown-on-application-finish)).
 
+
+The recovery and clean up of jobs running in application mode has also been 
improved. The local 
+state can be persisted in the working directory, which makes recovering 
+from local storage easier.
+
+
+## Unification of stream and batch processing - more progress
+
+In the latest release, we picked up new efforts and continued some previous 
ones in the goal of unifying stream and batch processing.
+
+
+### Final checkpoints
+
+In Flink 1.14, final checkpoints were added as a feature that had to be 
enabled manually. 
+Since the last release, we listened to user feedback and decided to enable it 
by default. For more 
+information and how to disable this feature, please refer to the 
+[documentation](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/fault-tolerance/checkpointing/#checkpointing-with-parts-of-the-graph-finished).
+This change in configuration can prolong the shutting down sequence of bounded 
+streaming jobs because jobs have to wait for a final checkpoint before being 
allowed to 
+finish.
+
+
+### Window table-valued functions
+
+[Window table-valued 
functions](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/window-tvf/)
 
+have only been available for unbounded data streams. 
+With this release they will also be usable in BATCH mode. While working on 
this, 
+change window table-valued functions have also been improved in general by 
implementing 
+a dedicated operator which no longer requires those window functions to be 
used with 
+aggregators.
+
+
+## Flink SQL
+
+Community metrics indicate that Flink SQL is widely used and becomes more 
popular every day. The community made several improvements but we’d like to go 
into two in more detail.
+
+
+### CAST/Type system enhancements
+
+Data appears in all sorts and shapes but is often not in the type that you 
need 
+it to be, which is why 
+[casting](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/types/#casting)
 
+is one of the most common operations in SQL. In Flink 
+1.15, the default behavior of a failing CAST has changed from returning a null 
to 
+returning an error, which makes it more compliant with the SQL standard. The 
old 
+casting behavior can still be used by calling the newly introduced TRY_CAST 
function
+or restored via a configuration flag.
+
+In addition, many bugs have been fixed and improvements made to the casting 
+functionality, to ensure correct results. 
+
+
+### JSON functions
+
+JSON is one of the most popular data formats and SQL users increasingly need 
to build 
+and read these data structures.  Multiple 
+[JSON](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/functions/systemfunctions/#json-functions)
+functions have been added to Flink SQL 
+according to the SQL 2016 standard. It allows users to inspect, create, and 
modify JSON 
+strings using the Flink SQL dialect.
+
+
+## Community enablement
+
+Enabling people to build streaming data pipelines to solve their use cases is 
our goal. 
+The community is well aware that a 
+technology like Apache Flink is never used on its own and will always be part 
of a 
+bigger architecture. Thus, it is important that Flink operates well in the 
cloud, 
+connects seamlessly to other systems, and continues to support programming 
languages like Java and Python. 
+
+
+### Cloud interoperability
+
+There are users operating Flink deployments in cloud infrastructures from 
various 
+cloud providers. There are also services that offer to manage Flink 
deployments for 
+users on their platform. 
+
+In Flink 1.15, a recoverable writer for Google Cloud Storage has 
+been added. We also organized the connectors in the Flink ecosystem and put 
some focus 
+on connectors for the AWS ecosystem (i.e. 
+[KDS](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kinesis/),
 
+[Firehose](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/firehose/)).
+
+
+### The Elasticsearch sink
+
+There was significant work on Flink’s overall connector ecosystem, but we want 
to highlight the 
+[Elasticsearch 
sink](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/elasticsearch/)
 because it was implemented with 
+the new connector interfaces, which offers asynchronous functionality coupled 
with end-to-end semantics. 
+This sink will act as a template to follow in the future.
+
+
+### A Scala-free Flink
+
+A detailed 
+[blog post](https://flink.apache.org/2022/02/22/scala-free.html)  
+already explains the ins and outs of why Scala users can now use the Flink 
+Java API with any Scala version (including Scala 3). 
+
+In the end, removing Scala is just part of a larger effort of cleaning up and 
updating
+various technologies from the Flink ecosystem. 
+
+Starting in Flink 1.14, we removed the Mesos integration, isolated Akka, 
deprecated the
+DataSet Java API, and hid the Table API behind an abstraction. There’s already 
a lot of traction in the community towards these endeavors. 
+
+
+## PyFlink
+
+Before Flink 1.15, Python user-defined functions were executed in separate 
Python 
+processes which caused additional serialization/deserialization 
andcommunication overhead. 

Review Comment:
   ```suggestion
   processes which caused additional serialization/deserialization and 
communication overhead. 
   ```
   nit: missing an empty space



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-web] dianfu commented on a diff in pull request #526: Announcement blogpost for the 1.15 release

Reply via email to