gaoyunhaii commented on a change in pull request #352:
URL: https://github.com/apache/flink-web/pull/352#discussion_r450170041
##########
File path: _posts/2020-07-06-release-1.11.0.md
##########
@@ -0,0 +1,309 @@
+---
+layout: post
+title: "Apache Flink 1.11.0 Release Announcement"
+date: 2020-07-06T08:00:00.000Z
+categories: news
+authors:
+- morsapaes:
+ name: "Marta Paes"
+ twitter: "morsapaes"
+
+excerpt: The Apache Flink community is proud to announce the release of Flink
1.11.0! More than 200 contributors worked on over 1.3k issues to bring
significant improvements to usability as well as new features to Flink users
across the whole API stack. We're particularly excited about unaligned
checkpoints to cope with high backpressure scenarios, a new source API that
simplifies and unifies the implementation of (custom) sources, and support for
Change Data Capture (CDC) and other common use cases in the Table API/SQL. Read
on for all major new features and improvements, important changes to be aware
of and what to expect moving forward!
+---
+
+The Apache Flink community is proud to announce the release of Flink 1.11.0!
More than 200 contributors worked on over 1.3k issues to bring significant
improvements to usability as well as new features to Flink users across the
whole API stack. Some highlights that we're particularly excited about are:
+
+* The core engine is introducing **unaligned checkpoints**, a major change to
Flink's fault tolerance mechanism that improves checkpointing performance under
heavy backpressure.
+
+* A **new Source API** that simplifies the implementation of (custom) sources
by unifying batch and streaming execution, as well as offloading internals such
as event-time handling, watermark generation or idleness detection to Flink.
+
+* Flink SQL is introducing **Support for Change Data Capture (CDC)** to easily
consume and interpret database changelogs from tools like Debezium. The renewed
**FileSystem Connector** also expands the set of use cases and formats
supported in the Table API/SQL, enabling scenarios like streaming data directly
from Kafka to Hive.
+
+* Multiple performance optimizations to PyFlink, including support for
**vectorized User-defined Functions (Pandas UDFs)**. This improves
interoperability with libraries like Pandas and NumPy, making Flink more
powerful for data science and ML workloads.
+
+Read on for all major new features and improvements, important changes to be
aware of and what to expect moving forward!
+
+{% toc %}
+
+The binary distribution and source artifacts are now available on the updated
[Downloads page]({{ site.baseurl }}/downloads.html) of the Flink website, and
the most recent distribution of PyFlink is available on
[PyPI](https://pypi.org/project/apache-flink/). Please review the [release
notes]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/release-notes/flink-1.11.html) carefully, and check
the complete [release
changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364&styleName=Html&projectId=12315522)
and [updated documentation]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/) for more details.
+
+We encourage you to download the release and share your feedback with the
community through the [Flink mailing
lists](https://flink.apache.org/community.html#mailing-lists) or
[JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## New Features and Improvements
+
+### Unaligned Checkpoints (Beta)
+
+Triggering a checkpoint in Flink will cause a [checkpoint barrier]({{
site.DOCS_BASE_URL
}}flink-docs-release-1.11/internals/stream_checkpointing.html#barriers) to flow
from the sources of your topology all the way towards the sinks. For operators
that receive more than one input stream, the barriers flowing through each
channel need to be aligned before the operator can snapshot its state and
forward the checkpoint barrier — typically, this alignment will take just a few
milliseconds to complete, but it can become a bottleneck in backpressured
pipelines as:
+
+ * Checkpoint barriers will flow much slower through backpressured channels,
effectively blocking the remaining channels and their upstream operators during
checkpointing;
+
+ * Slow checkpoint barrier propagation leads to longer checkpointing times and
can, in the worst case, result in little to no progress in the application.
+
+To improve the performance of checkpointing under backpressure scenarios, the
community is rolling out the first iteration of unaligned checkpoints
([FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints))
with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1),
this approach doesn’t wait for barrier alignment across input channels, instead
allowing barriers to overtake in-flight records (i.e., data stored in buffers)
and forwarding them downstream before the synchronous part of the checkpoint
takes place (Fig. 2).
+
+<div style="line-height:60%;">
+ <br>
+</div>
+
+<div class="row">
+ <div class="col-lg-6">
+ <div class="text-center">
+ <figure>
+ <img src="{{ site.baseurl
}}/img/blog/2020-07-06-release-1.11.0/image1.gif" width="600px" alt="Aligned
Checkpoints"/>
+ <br/><br/>
+ <figcaption><i><b>Fig.1:</b> Aligned
Checkpoints</i></figcaption>
+ </figure>
+ </div>
+ </div>
+ <div class="col-lg-6">
+ <div class="text-center">
+ <figure>
+ <img src="{{ site.baseurl
}}/img/blog/2020-07-06-release-1.11.0/image2.png" width="600px" alt="Unaligned
Checkpoints"/>
+ <br/><br/>
+ <figcaption><i><b>Fig.2:</b> Unaligned
Checkpoints</i></figcaption>
+ </figure>
+ </div>
+ </div>
+</div>
+
+<div style="line-height:150%;">
+ <br>
+</div>
+
+Because in-flight records have to be persisted as part of the snapshot,
unaligned checkpoints will lead to increased checkpoint sizes. On the upside,
**checkpointing times are heavily reduced**, so users will see more progress
(even in unstable environments) as more up-to-date checkpoints will lighten the
recovery process. You can learn more about the current limitations of unaligned
checkpoints in the [documentation]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/ops/state/checkpoints.html#unaligned-checkpoints),
and track the improvement work planned for this feature in
[FLINK-14551](https://issues.apache.org/jira/browse/FLINK-14551).
+
+As with any beta feature, we appreciate early feedback that you might want to
share with the community after giving unaligned checkpoints a try!
+
+<span class="label label-info">Info</span> To enable this feature, you need to
configure the [``enableUnalignedCheckpoints``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html)
option in your [checkpoint config]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing).
Please note that unaligned checkpoints can only be enabled if
``checkpointingMode`` is set to ``CheckpointingMode.EXACTLY_ONCE``.
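+
+For example, a minimal sketch of enabling the feature in a DataStream job could look like the following (the 30-second checkpoint interval and the toy pipeline are placeholders only):
+
+```java
+import org.apache.flink.streaming.api.CheckpointingMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class UnalignedCheckpointsExample {
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+        // Unaligned checkpoints require exactly-once checkpointing mode.
+        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);
+        env.getCheckpointConfig().enableUnalignedCheckpoints();
+
+        // Placeholder pipeline; replace with your actual sources, operators and sinks.
+        env.fromElements(1, 2, 3).print();
+
+        env.execute("unaligned-checkpoints-example");
+    }
+}
+```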
+
+### Unified Watermark Generators
+
+So far, watermark generation (prev. also called _assignment_) relied on two
different interfaces: [``AssignerWithPunctuatedWatermarks``]({{
site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPunctuatedWatermarks.html)
and [``AssignerWithPeriodicWatermarks``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPeriodicWatermarks.html),
which were closely intertwined with timestamp extraction. This made it
difficult to implement long-requested features like support for idleness
detection, besides leading to code duplication and maintenance burden. With
[FLIP-126](https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners),
the legacy watermark assigners are unified into a single interface: the
[``WatermarkGenerator``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkGenerator.html),
which is also detached from the [``TimestampAssigner``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/TimestampAssigner.html).
+
+This gives users more control over watermark emission and simplifies the
implementation of new connectors that need to support watermark assignment and
timestamp extraction at the source (see _[New Data Source
API](#new-data-source-api-beta)_). Multiple [strategies for watermarking]({{
site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/event_timestamps_watermarks.html#introduction-to-watermark-strategies)
are available out-of-the-box as convenience methods in Flink 1.11 (e.g.
[``forBoundedOutOfOrderness``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forBoundedOutOfOrderness-java.time.Duration-),
[``forMonotonousTimestamps``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forMonotonousTimestamps--)),
though you can also choose to customize your own.
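+
+As a quick illustration, the sketch below assigns timestamps and bounded-out-of-orderness watermarks of five seconds to a toy stream of ``(key, epoch-millis)`` records:
+
+```java
+import java.time.Duration;
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// Toy stream of (key, epoch-millis) records standing in for real events.
+DataStream<Tuple2<String, Long>> events =
+        env.fromElements(Tuple2.of("a", 1_593_993_600_000L), Tuple2.of("b", 1_593_993_605_000L));
+
+// Tolerate up to 5 seconds of out-of-orderness; the event timestamp is field f1.
+DataStream<Tuple2<String, Long>> withWatermarks =
+        events.assignTimestampsAndWatermarks(
+                WatermarkStrategy
+                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
+                        .withTimestampAssigner((event, previousTimestamp) -> event.f1));
+```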
+
+**Support for Watermark Idleness Detection**
+
+The [``WatermarkStrategy.withIdleness()``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#withIdleness-java.time.Duration-)
method allows you to mark a stream as idle if no events arrive within a
configured time (i.e. a timeout duration), which in turn allows for proper
handling of event time skew and prevents idle partitions from holding back the event
time progress of the entire application. Users can already benefit from
**per-partition idleness detection** in the Kafka connector, which has been
adapted to use the new interfaces
([FLINK-17669](https://issues.apache.org/jira/browse/FLINK-17669)).
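+
+A hedged sketch of what this could look like with the Kafka consumer is shown below (the topic name and properties are placeholders, and the example assumes the consumer's ``assignTimestampsAndWatermarks(WatermarkStrategy)`` overload added alongside the new interfaces):
+
+```java
+import java.time.Duration;
+import java.util.Properties;
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
+
+Properties props = new Properties();
+props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
+
+FlinkKafkaConsumer<String> consumer =
+        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
+
+// Watermarks are generated per Kafka partition; a partition that receives no
+// records for one minute is marked idle and no longer holds back event time.
+consumer.assignTimestampsAndWatermarks(
+        WatermarkStrategy
+                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
+                .withIdleness(Duration.ofMinutes(1)));
+```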
+
+<span class="label label-info">Note</span>
[FLIP-126](https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners)
introduces no breaking changes, but we recommend that users give preference to
the new ``WatermarkGenerator`` interface moving forward, in preparation for the
deprecation of the legacy watermark assigners in future releases.
+
+### New Data Source API (Beta)
+
+Up to this point, writing a production-grade source connector for Flink was a
non-trivial task that required users to be somewhat familiar with Flink
internals and account for implementation details like event time assignment,
watermark generation or idleness detection in their code. Flink 1.11 introduces
a new Data Source API
([FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface))
to overcome these limitations, as well as the need to rewrite separate code
for batch and streaming execution.
+
+<center>
+ <figure>
+ <img src="{{ site.baseurl
}}/img/blog/2020-07-06-release-1.11.0/image3.png" width="600px" alt="Data
Source API"/>
+ </figure>
+</center>
+
+<div style="line-height:150%;">
+ <br>
+</div>
+
+Separating the work of split discovery and the actual reading of the consumed
data (i.e. the [_splits_]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/stream/sources.html#data-source-concepts)) into
two different components, the ``SplitEnumerator`` and the ``SourceReader``,
allows mixing and matching different enumeration strategies and split readers.
+
+As an example, the existing Kafka connector has multiple strategies for
partition discovery that are intermingled with the rest of the code. With the
new interfaces in place, it would only need a single reader implementation and
there could be several split enumerators for the different partition discovery
strategies.
+
+**Batch and Streaming Unification**
+
+Source connectors implemented using the Data Source API will be able to work
both as a bounded (_batch_) and unbounded (_streaming_) source. The difference
between both cases is minimal: for bounded input, the ``SplitEnumerator`` will
generate a fixed set of splits and each split is finite; for unbounded input,
either the splits are not finite or the ``SplitEnumerator`` keeps generating
new splits.
+
+**Implicit Watermark and Event Time Handling**
+
+The ``TimestampAssigner`` and ``WatermarkGenerator`` run transparently as part
of the ``SourceReader`` component, so users also don’t have to implement any
timestamp extraction or watermark generation code.
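+
+To make the wiring concrete, here is a sketch that uses a hypothetical ``MyCustomSource`` (an implementation of the new ``Source`` interface) and a hypothetical ``MyRecord`` type, since the bundled connectors have not been ported yet (see the note below):
+
+```java
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// Hypothetical FLIP-27 source; whether it is bounded or unbounded is decided by
+// the source itself, not by the pipeline that uses it.
+MyCustomSource source = new MyCustomSource(/* configuration */);
+
+// Timestamps and watermarks are handled inside the SourceReader, so only the
+// WatermarkStrategy needs to be passed in here.
+DataStream<MyRecord> stream =
+        env.fromSource(source, WatermarkStrategy.forMonotonousTimestamps(), "my-custom-source");
+```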
+
+<span class="label label-info">Note</span> The existing source connectors have
not yet been reimplemented using the Data Source API — this is planned for
upcoming releases. If you’re looking to implement a new source, please refer to
the [Data Source documentation]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/stream/sources.html#data-sources) and [the tips
on source development]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/stream/sources.html#the-data-source-api).
+
+### Application Mode Deployments
+
+Prior to Flink 1.11, jobs in a Flink application could either be submitted to
a long-running [Flink Session Cluster]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/concepts/flink-architecture.html#flink-session-cluster)
(_session mode_) or a dedicated [Flink Job Cluster]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/concepts/flink-architecture.html#flink-job-cluster)
(_job mode_). For both these modes, the ``main()`` method of user programs runs
on the _client_ side. This presents some challenges: on one hand, if the client
is part of a large installation, it can easily become a bottleneck for
``JobGraph`` generation; and on the other, it’s not a good fit for
containerized environments like Docker or Kubernetes.
+
+From this release on, Flink gets an additional deployment mode: [Application
Mode]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/ops/deployment/#application-mode)
([FLIP-85](https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode)),
in which the ``main()`` method runs on the cluster rather than on the client. The
job submission becomes a one-step process: you package your application logic
and dependencies into an executable job JAR and the cluster entrypoint
([``ApplicationClusterEntryPoint``]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/api/java/org/apache/flink/client/deployment/application/ApplicationClusterEntryPoint.html))
is responsible for calling the ``main()`` method to extract the ``JobGraph``.
+
+In Flink 1.11, the community also added support for _application mode_ on
Kubernetes ([FLINK-10934](https://issues.apache.org/jira/browse/FLINK-10934)).
+
+### Other Improvements
+
+**Unified Memory Configuration for JobManagers
([FLIP-116](https://jira.apache.org/jira/browse/FLINK-16614))**
+
+Following the work started in Flink 1.10 to improve memory management and
configuration, this release introduces a new memory model that aligns the
[JobManagers’ configuration options]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/ops/memory/mem_setup_master.html) and terminology
with that introduced in
[FLIP-49](https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors)
for TaskManagers. This affects all deployment types: standalone, YARN, Mesos
and the new active Kubernetes integration.
+
+<span class="label label-danger">Attention</span> Reusing a previous Flink
configuration without any adjustments can result in differently computed memory
parameters for the JVM and, as a result, performance changes or even failures.
Make sure to check the [migration
guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration)
if you’re planning to update to the latest version.
+
+**Improvements to the Flink WebUI
([FLIP-75](https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal))**
+
+In Flink 1.11, the community kicked off a series of improvements to the Flink
WebUI. The first to roll out are better TaskManager and JobManager log display
([FLIP-103](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427143)),
as well as a new thread dump utility
([FLINK-14816](https://issues.apache.org/jira/browse/FLINK-14816)). Some
additional work planned for upcoming releases includes better backpressure
detection, more flexible and configurable exception display and support for
displaying the history of subtask failure attempts.
+
+**Docker Image Unification
([FLIP-111](https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification))**
+
+With this release, all Docker-related resources have been consolidated into
[apache/flink-docker](https://github.com/apache/flink-docker) and the entry
point script has been extended to allow users to run the default Docker image
in different modes without the need to create a custom image. The [updated
documentation](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image)
describes in detail how to use and customize the official Flink Docker image
for different environments and deployment modes.
+
+<hr>
+
+### Table API/SQL: Support for Change Data Capture (CDC)
+
+Change Data Capture (CDC) has become a popular pattern to capture committed
changes from a database and propagate those changes to downstream consumers,
for example to keep multiple datastores in sync and avoid common pitfalls such
as [dual writes](https://thorben-janssen.com/dual-writes/). Being able to
easily ingest and interpret these changelogs into the Table API/SQL has been a
highly demanded feature in the Flink community — and it’s now possible with
Flink 1.11.
+
+To extend the scope of the Table API/SQL to use cases like CDC, Flink 1.11
introduces new table source and sink interfaces with **changelog mode** (see
_[New TableSource and TableSink
Interfaces](#other-improvements-to-the-table-apisql)_) and support for the
[Debezium](https://debezium.io/) and [Canal](https://github.com/alibaba/canal)
formats
([FLIP-105](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289)).
This means that dynamic table sources are no longer limited to append-only
operations and can ingest these external changelogs (``INSERT`` events),
interpret them into change operations (``INSERT``, ``UPDATE``, ``DELETE``
events) and emit them downstream with the change type.
+
+<center>
+ <figure>
+ <img src="{{ site.baseurl
}}/img/blog/2020-07-06-release-1.11.0/image4.png" width="500px" alt="CDC"/>
+ </figure>
+</center>
+
+<div style="line-height:150%;">
+ <br>
+</div>
+
+Users have to specify either ``'format' = 'debezium-json'`` or
``'format' = 'canal-json'`` in their ``CREATE TABLE`` statement to consume
changelogs using SQL DDL.
+
+```sql
+CREATE TABLE my_table (
+  ...
+) WITH (
+  'connector'='...', -- e.g. 'kafka'
+  'format'='debezium-json',
+  'debezium-json.schema-include'='true', -- default: false (Debezium can be configured to include or exclude the message schema)
+  'debezium-json.ignore-parse-errors'='true' -- default: false
+);
+```
+
+Flink 1.11 only supports Kafka as a changelog source and JSON-encoded changelogs
out-of-the-box, with Avro (Debezium) and Protobuf (Canal) planned for
future releases. There are also plans to support MySQL binlogs and Kafka
compacted topics as sources, as well as to extend changelog support to batch
execution.
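+
+Putting the snippet above into context, the following sketch declares a Debezium-backed table from Java via ``TableEnvironment.executeSql`` (topic, broker address, schema and option values are placeholders, and the option keys assume the 1.11 Kafka SQL connector):
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+TableEnvironment tableEnv = TableEnvironment.create(
+        EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
+
+// Topic, broker address and columns are placeholders for illustration only.
+tableEnv.executeSql(
+        "CREATE TABLE orders (" +
+        "  order_id BIGINT," +
+        "  customer STRING," +
+        "  amount DECIMAL(10, 2)" +
+        ") WITH (" +
+        "  'connector' = 'kafka'," +
+        "  'topic' = 'orders-changelog'," +
+        "  'properties.bootstrap.servers' = 'localhost:9092'," +
+        "  'scan.startup.mode' = 'earliest-offset'," +
+        "  'format' = 'debezium-json'" +
+        ")");
+
+// The table can now be queried like any other dynamic table, with INSERT,
+// UPDATE and DELETE events from the changelog interpreted accordingly.
+```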
+
+<span class="label label-danger">Attention</span> There is a known issue
([FLINK-18461](https://issues.apache.org/jira/browse/FLINK-18461)) that
prevents changelog sources from being used to write to upsert sinks (e.g.
MySQL, HBase, Elasticsearch). This will be fixed in the next patch release
(1.11.1).
+
+### Table API/SQL: JDBC Catalog Interface and Postgres Catalog
+
+Flink 1.11 introduces a generic JDBC catalog interface
([FLIP-93](https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog))
that enables users of the Table API/SQL to **derive table schemas
automatically** from connections to relational databases over [JDBC]({{
site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/table/connect.html#jdbc-connector). This
eliminates the previous need for manual schema definition and type conversion,
and also allows schema errors to be checked at compile time instead of at runtime.
+
+The first implementation, rolling out with the new release, is the [Postgres
catalog]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/table/catalogs.html#postgrescatalog).
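+
+As a hedged sketch of how this could be used (all credentials and connection details are placeholders, and the example assumes the ``JdbcCatalog`` constructor documented for 1.11, taking the catalog name, default database, username, password and base JDBC URL):
+
+```java
+import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+TableEnvironment tableEnv = TableEnvironment.create(
+        EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
+
+// All connection values below are placeholders.
+JdbcCatalog catalog = new JdbcCatalog(
+        "mypg", "postgres", "username", "password", "jdbc:postgresql://localhost:5432/");
+
+tableEnv.registerCatalog("mypg", catalog);
+tableEnv.useCatalog("mypg");
+
+// Postgres tables are now available with their schemas derived automatically,
+// e.g.: tableEnv.executeSql("SELECT * FROM my_schema.my_table");
+```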
+
+### Table API/SQL: FileSystem Connector with Support for Avro, ORC and Parquet
+
+To improve the user experience for end-to-end streaming ETL use cases, the
Flink community worked on a new FileSystem Connector for the Table API/SQL
([FLIP-115](https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table)).
The implementation is based on Flink’s [FileSystem abstraction]({{
site.DOCS_BASE_URL }}flink-docs-release-1.11/ops/filesystems/index.html) and
reuses [StreamingFileSink]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/connectors/streamfile_sink.html) to ensure the
same set of capabilities and consistent behaviour with the DataStream API.
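+
+As an illustration, a partitioned Parquet table could be declared as in the sketch below (path, schema and partition column are placeholders, and the option keys assume the 1.11 filesystem table connector):
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+TableEnvironment tableEnv = TableEnvironment.create(
+        EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
+
+// Path and schema are placeholders for illustration only.
+tableEnv.executeSql(
+        "CREATE TABLE parquet_sink (" +
+        "  user_id BIGINT," +
+        "  behavior STRING," +
+        "  dt STRING" +
+        ") PARTITIONED BY (dt) WITH (" +
+        "  'connector' = 'filesystem'," +
+        "  'path' = 'file:///tmp/parquet-sink'," +
+        "  'format' = 'parquet'" +
+        ")");
+
+// A continuous INSERT INTO parquet_sink SELECT ... can then stream data into
+// partitioned Parquet files, e.g. from a Kafka-backed source table.
+```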
+
+StreamingFileSink has also been extended to support a wider range of
[bulk-encoded formats]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.11/dev/connectors/streamfile_sink.html#bulk-encoded-formats),
including Avro
([FLINK-11395](https://issues.apache.org/jira/browse/FLINK-11395)), (Proto)
Parquet ([FLINK-11427](https://issues.apache.org/jira/browse/FLINK-11427)) and
ORC ([FLINK-10114](https://issues.apache.org/jira/browse/FLINK-10114)).
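+
+On the DataStream side, writing Avro files through the new bulk writer support could look like the sketch below (``MyEvent`` is a hypothetical POJO mapped via Avro reflection, the output path is a placeholder, and the factory assumes the ``AvroWriters`` class added in FLINK-11395):
+
+```java
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.avro.AvroWriters;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
+
+// `events` is an existing DataStream<MyEvent>; MyEvent is a hypothetical POJO.
+StreamingFileSink<MyEvent> sink =
+        StreamingFileSink
+                .forBulkFormat(new Path("file:///tmp/avro-out"),
+                               AvroWriters.forReflectRecord(MyEvent.class))
+                .build();
+
+events.addSink(sink);
+```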
Review comment:
OK, thanks @morsapaes !