XComp commented on a change in pull request #436:
URL: https://github.com/apache/flink-web/pull/436#discussion_r622363151



##########
File path: _posts/2021-04-22-release-1.13.0.md
##########
@@ -0,0 +1,483 @@
+---
+layout: post 
+title:  "Apache Flink 1.13.0 Release Announcement"
+date: 2021-04-22T08:00:00.000Z 
+categories: news 
+authors:
+- stephan:
+  name: "Stephan Ewen"
+  twitter: "StephanEwen"
+- dwysakowicz:
+  name: "Dawid Wysakowicz"
+  twitter: "dwysakowicz"
+
+excerpt: The Apache Flink community is excited to announce the release of Flink 1.13.0! More than 200 contributors worked on over 1,000 threads to bring significant improvements to usability and observability as well as new features that improve the elasticity of Flink’s Application-style deployments.
+---
+
+
+The Apache Flink community is excited to announce the release of Flink 1.13.0! 
More than 200
+contributors worked on over 1k threads to bring significant improvements to 
usability and
+observability as well as new features that improve elasticity of Flink’s 
Application-style
+deployments.
+
+This release brings us a big step forward in one of our major efforts: Making 
Stream Processing
+Applications as natural and as simple to manage as any other application. The 
new reactive scaling
+mode means that scaling streaming applications in and out now works like in 
any other application,
+by just changing the number of parallel processes.
+
+We also added a series of improvements that help users better understand the performance of
+applications. When the streams don't flow as fast as you’d hope, these can help you understand
+why: load and backpressure visualization to identify bottlenecks, CPU flame graphs to identify hot
+code paths in your application, and state access latencies to see how the state backends are
+keeping up.
+
+This blog post describes all major new features and improvements, important 
changes to be aware of
+and what to expect moving forward.
+
+{% toc %}
+
+We encourage you to [download the 
release](https://flink.apache.org/downloads.html) and share your
+feedback with the community through
+the [Flink mailing 
lists](https://flink.apache.org/community.html#mailing-lists)
+or [JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## Notable Features and Improvements
+
+### Reactive mode
+
+The Reactive Mode is the latest piece in Flink's initiative for making Stream 
Processing
+Applications as natural and as simple to manage as any other application.
+
+Flink has a dual nature when it comes to resource management and deployments: You can deploy
+clusters onto resource managers like Kubernetes or YARN in such a way that Flink actively manages
+the resources, allocating and releasing workers as needed. That is especially useful for jobs and
+applications that rapidly change their required resources, like batch applications and ad-hoc SQL
+queries. The application parallelism rules; the number of workers follows. We call this active
+scaling.
+
+For long-running streaming applications, it is often a nicer model to just deploy them like any
+other long-running application: The application doesn't really need to know that it runs on K8s,
+EKS, YARN, etc. and doesn't try to acquire a specific number of workers; instead, it just uses the
+number of workers that is given to it. The number of workers rules; the application parallelism
+adjusts to that. We call that reactive scaling.
+
+The Application Deployment Mode started this effort, making deployments application-like (avoiding
+two separate deployment steps: (1) starting a cluster and (2) submitting the application). The
+reactive scheduler completes this: you no longer have to use extra tools (scripts or a K8s
+operator) to keep the number of workers and the application parallelism settings in sync.
+
+You can now put an autoscaler around Flink applications just like around other typical
+applications, as long as you are mindful when configuring the autoscaler that stateful
+applications still spend effort moving state around when scaling.
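
As a rough sketch of what this can look like in practice with a standalone deployment (the
`scheduler-mode` option and the example job class name are assumptions used to illustrate the
idea, not a full guide):

```
# flink-conf.yaml: opt in to the reactive scheduler
scheduler-mode: reactive

# start the job as a standalone application cluster
./bin/standalone-job.sh start --job-classname com.example.MyStreamingJob

# adding (or removing) TaskManager processes rescales the running job
./bin/taskmanager.sh start
```

An external autoscaler then only needs to adjust the number of TaskManager processes.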
+
+
+### Bottleneck detection, Backpressure and Idleness Monitoring
+
+One of the most important metrics to investigate when a job does not consume records as fast as
+you would expect is the backpressure ratio. It lets you track down bottlenecks in your pipelines.
+The previous mechanism had two limitations: it was heavy, because it worked by repeatedly taking
+stack trace samples of your running tasks, and it made it difficult to find out which vertex was
+the source of backpressure. In Flink 1.13, we reworked the mechanism to include new metrics for
+the time tasks spend being backpressured, along with a reworked graphical representation of the
+job (including the percentage of time particular vertices are backpressured).
+
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/bottleneck.png" style="width: 900px"/>
+</figure>
+
+### Support for CPU flame graphs in Web UI
+
+It is desirable to provide better visibility into the distribution of CPU resources while
+executing user code. One of the most visually effective means to do that is the Flame Graph. It
+makes it easy to answer questions like: Which methods are currently consuming CPU resources? How
+does the consumption of one method compare to the others? Which series of calls on the stack led
+to executing a particular method? Flame Graphs are constructed by sampling stack traces a number
+of times. Every method call is represented by a bar, where the length of the bar is proportional
+to the number of times it is present in the samples. To prevent unintended impacts on production
+environments, Flame Graphs are currently available as an opt-in feature that needs to be enabled
+in the configuration. Once enabled, they are accessible via a new component in the UI at the level
+of the selected operator:
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/7.png" style="display: block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
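
As a minimal sketch of the opt-in (assuming the configuration option that guards this feature is
named `rest.flamegraph.enabled`), enabling it in `flink-conf.yaml` looks like this:

```
# flink-conf.yaml: opt in to the experimental flame graph feature
rest.flamegraph.enabled: true
```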
+
+### Access Latency Metrics for State
+
+State interactions are a crucial part of the majority of data pipelines. Especially when using
+RocksDB, they can be rather I/O intensive and therefore play an important role in the overall
+performance of the pipeline. It is thus important to be able to get insights into what is going
+on under the hood. To provide more insights, we exposed latency tracking metrics.
+
+The metrics are disabled by default, but you can enable them using the
+`state.backend.rocksdb.latency-track-enabled` option.
+
+### Unified binary savepoint format
+
+All available state backends now produce a common, unified binary format for their savepoints.
+This means that savepoints are mutually interchangeable: you are no longer locked into the state
+backend you chose when starting your application for the first time. This makes it easier to, for
+example, start with the heap-based backend and switch to RocksDB later, if the JVM heap becomes
+too full (which you usually notice when GC times start to go up too much).
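
For illustration, a minimal sketch of restoring such a savepoint with a different backend
(assuming the 1.13 `EmbeddedRocksDBStateBackend` class; the savepoint path itself is passed at
submission time, e.g. `bin/flink run -s <savepoint-path> ...`):

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestoreWithRocksDB {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Savepoints now share a single binary format, so a savepoint taken with the
        // heap-based state backend can be restored into a job that uses RocksDB.
        env.setStateBackend(new EmbeddedRocksDBStateBackend());

        // placeholder pipeline standing in for the real, stateful job
        env.fromElements(1, 2, 3).print();

        env.execute("restored-with-rocksdb");
    }
}
```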
+
+### Support user-specified pod templates for Active Kubernetes Deployments
+
+The native Kubernetes deployment received an important update: it now supports custom pod
+templates. Flink now allows users to define the JobManager and TaskManager pods via template
+files, which makes it possible to use advanced features that are not exposed through Flink’s
+Kubernetes config options directly.
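
As a hedged sketch (the `kubernetes.pod-template-file` option is assumed here, and the cluster id,
template file, and jar path are placeholders), the template can be passed when deploying in
application mode:

```
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-flink-app \
    -Dkubernetes.pod-template-file=./pod-template.yaml \
    local:///opt/flink/usrlib/my-job.jar
```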
+
+### Major Observability Improvements
+
+What runs on Flink are often critical workloads with SLAs, so it is important to have the right
+tools to understand what is happening inside the applications.
+
+If your application does not progress as expected, or the latency is higher or the throughput
+lower than you would expect, these features help you figure out what is going on.
+
+### Unaligned Checkpoints - Production Ready
+
+Unaligned checkpoints now support adaptive triggering with timeouts: an unaligned checkpoint is
+only performed in case of backpressure, and an aligned checkpoint is used otherwise. Thus,
+checkpoint times become reliable in all situations while keeping the extra state to a minimum.
+
+Further, you can now rescale from unaligned checkpoints to provide more 
resources in case of sudden
+spikes (together with reactive mode). This feature makes it easier to catch up 
and reduce
+backpressure in a timely fashion.
+
+Flink 1.13 brings together all features that the community initially 
envisioned for unaligned
+checkpoints. Together with all bugfixes that happened in 1.12, we generally 
encourage the use of
+unaligned checkpoints for all applications with potential backpressure.
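
As a minimal sketch of opting in programmatically (the comment about the timeout option is an
assumption; check the release documentation for the exact configuration key):

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(10_000);  // checkpoint every 10 seconds

        CheckpointConfig checkpoints = env.getCheckpointConfig();
        checkpoints.enableUnalignedCheckpoints();
        // Adaptive triggering is driven by an alignment timeout (an execution.checkpointing.*
        // option): barriers first try to align and only switch to the unaligned protocol once
        // backpressure delays them past the timeout.

        env.fromElements("a", "b", "c").print();  // placeholder pipeline
        env.execute("unaligned-checkpoints-demo");
    }
}
```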
+
+
+### Machine Learning Library moving to a separate Repository
+
+Flink’s ML development has been slowed down by the tight coupling of the ML efforts with the core
+system.
+
+ML will now follow the same approach as Stateful Functions and be developed as 
a separate library
+with independent release cycles, but still under the umbrella of the Flink 
project.
+
+---
+
+## Table API/SQL
+
+### Table-valued functions for Windows
+
+Defining time windows is one of the most frequent operations in streaming SQL queries. We updated
+the Flink SQL window syntax and semantics to be closer to the SQL standard and in sync with other
+vendors in the industry, expressing windows as a kind of table-valued function.
+
+TUMBLE and HOP windows have been ported to the new syntax. SESSION windows will follow in a
+subsequent release. A new CUMULATE window function assigns windows with an expanding step size
+until the maximum window size is reached:
+
+```sql
+SELECT window_time, window_start, window_end, SUM(price) AS total_price
+  FROM TABLE(CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
+  GROUP BY window_start, window_end, window_time;
+```
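
For comparison, here is a sketch of the same kind of aggregation using the new TUMBLE
table-valued function (assuming the same `Bid` table with a `bidtime` time attribute as above):

```sql
SELECT window_start, window_end, SUM(price) AS total_price
  FROM TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
```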
+
+By referencing window start and window end of the table-valued window 
functions, it is now possible
+to compute Top N windowed aggregations next to regular windowed aggregations 
and windowed joins:
+
+```sql
+SELECT window_time, ...
+  FROM (
+    SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY total_price DESC)
+      as rank
+    FROM t
+  ) WHERE rank <= 100;
+```
+
+### Improved interoperability between DataStream and Table API/SQL 
+
+The Table API is becoming equally important as the DataStream API, which is why good
+interoperability between the two APIs is crucial (FLIP-136): The core Row class has received a
+major update with new toString/hashCode/equals methods. Fields can be accessed not only by
+position but also by name, with support for sparse representations. The new methods
+`StreamTableEnvironment.toDataStream/fromDataStream` can model a DataStream as a regular table
+source or sink, with a smooth type system and event-time integration.
+
+```java
+Table table = tableEnv.fromDataStream(
+    dataStream,
+    Schema.newBuilder()
+        .columnByMetadata("rowtime", "TIMESTAMP(3)")
+        .watermark("rowtime", "SOURCE_WATERMARK()")
+        .build());
+
+DataStream<Row> dataStream = tableEnv.toDataStream(table)
+    .keyBy(r -> r.getField("user"))
+    .window(...);
+```
+
+### Improved SQL Client 
+
+During the past releases, the SQL Client's feature set could not keep up with 
the evolution of the
+TableEnvironment in the Table API. In this release, the SQL Client has been 
updated for feature
+parity with the programmatic APIs.
+
+*Easier Configuration and Code Sharing*
+
+The support of YAML files to configure the SQL Client will be discontinued. 
Instead, the client
+accepts one or more initialization scripts to configure an empty session. 
Additionally, a user can
+declare a main SQL script for executing multiple statements in one file.
+
+```
+./sql-client.sh -i init1.sql init2.sql -f sqljob.sql
+```
+
+*More Commands and Support for Statement Sets*
+
+More commands have been added, such as STATEMENT SET for multi-statement execution and improved
+SET/RESET commands with new config options. The following example shows how a full SQL script
+could be used as a command line tool to submit Flink jobs in a unified way for both batch and
+streaming use cases:
+
+```sql
+-- set up a catalog
+CREATE CATALOG hive_catalog WITH ('type' = 'hive');
+USE CATALOG hive_catalog;
+
+-- or use temporary objects
+CREATE TEMPORARY TABLE clicks (
+  user_id BIGINT,
+  page_id BIGINT,
+  viewtime TIMESTAMP
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'clicks',
+  'properties.bootstrap.servers' = '...',
+  'format' = 'avro'
+);
+
+-- set the execution mode for jobs
+SET execution.runtime-mode=streaming;
+
+-- set the sync/async mode for INSERT INTOs
+SET table.dml-sync=false;
+
+-- set the job's parallelism
+SET parallelism.default=10;
+
+-- set the job name
+SET pipeline.name = my_flink_job;
+
+-- restore state from the specific savepoint path
+SET execution.savepoint.path=/tmp/flink-savepoints/savepoint-bb0dab;
+
+BEGIN STATEMENT SET;
+
+INSERT INTO pageview_pv_sink
+SELECT page_id, count(1) FROM clicks GROUP BY page_id;
+
+INSERT INTO pageview_uv_sink
+SELECT page_id, count(distinct user_id) FROM clicks GROUP BY page_id;
+
+END;
+```
+
+---
+
+## PyFlink
+
+### State access in DataStream API 
+
+Stateful stream processing has always been a major differentiator of Apache Flink compared to
+other data processing systems. While many operations in a
+dataflow simply look at one individual event at a time (for example an event 
parser), some
+operations remember information across multiple events (for example window 
operators). These
+operations are called stateful.
+
+The 1.12.0 release reintroduced a rearchitected Python DataStream API, but it is the access to
+state added in this release that fully opens up the potential of Apache Flink to Python users.
+
+```python
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import DataStream
+from pyflink.datastream.functions import FlatMapFunction, RuntimeContext
+from pyflink.datastream.state import ValueStateDescriptor
+
+
+class CountWindowAverage(FlatMapFunction):
+    def __init__(self, window_size):
+        self.window_size = window_size
+
+    def open(self, runtime_context: RuntimeContext):
+        descriptor = ValueStateDescriptor("average", Types.TUPLE([Types.LONG(), Types.LONG()]))
+        self.sum = runtime_context.get_state(descriptor)
+
+    def flat_map(self, value):
+        current_sum = self.sum.value()
+        if current_sum is None:
+            current_sum = (0, 0)
+        # update the count
+        current_sum = (current_sum[0] + 1, current_sum[1] + value[1])
+        # if the count reaches window_size, emit the average and clear the state
+        if current_sum[0] >= self.window_size:
+            self.sum.clear()
+            yield value[0], current_sum[1] // current_sum[0]
+        else:
+            self.sum.update(current_sum)
+
+
+ds = ...  # type: DataStream
+ds.key_by(lambda row: row[0]) \
+  .flat_map(CountWindowAverage(5))
+```
+
+### Window support in DataStream API
+
+Windows are at the heart of processing infinite streams. Windows split the 
stream into “buckets” of finite size,
+over which we can apply computations. In this release, support for user-defined windows was added
+to the PyFlink DataStream API.
+
+### Row-based operations in Table API 
+
+Every Flink release brings the Python Table API closer to full feature parity with the Java API.
+In this release, support for row-based operations was added. Row-based operations take a row as
+input and return a row as output.
+
+The following is an example of using the map operation in the Python Table API:
+
+```python
+from pyflink.common import Row
+from pyflink.table import DataTypes, Table
+from pyflink.table.udf import udf
+
+
+@udf(result_type=DataTypes.ROW(
+    [DataTypes.FIELD("c1", DataTypes.BIGINT()),
+     DataTypes.FIELD("c2", DataTypes.STRING())]))
+def my_map(r: Row) -> Row:
+    return Row(r[0] + 1, r[1])
+
+
+table = ...  # type: Table
+mapped_result = table.map(my_map)
+```
+
+In addition to map, row-based operations such as flat_map, aggregate, and flat_aggregate are also
+supported.
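
For example, here is a sketch of flat_map with a user-defined table function (the exact shape of
the `@udtf` decorator and its `result_types` argument shown here is our assumption of the intended
usage):

```python
from pyflink.common import Row
from pyflink.table import DataTypes, Table
from pyflink.table.udf import udtf


# split a comma-separated string column into one output row per token
@udtf(result_types=[DataTypes.BIGINT(), DataTypes.STRING()])
def split(r: Row):
    for s in r[1].split(','):
        yield r[0], s


table = ...  # type: Table
flat_mapped = table.flat_map(split)
```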
+
+## Other improvements
+
+**Exception histories in the Web UI**
+
+From now on, the Flink Web UI will present up to n last exceptions that caused 
a job to fail apart
+from the last one.

Review comment:
    ```suggestion
    From now on, the Flink Web UI will present up to n last exceptions that caused a job to fail.
    ```
   Just a slight adjustment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

