leonardBang commented on code in PR #618:
URL: https://github.com/apache/flink-web/pull/618#discussion_r1140137456


##########
docs/content/posts/2023-03-09-release-1.17.0.md:
##########
@@ -0,0 +1,485 @@
+---
+authors:
+- LeonardXu:
+  name: "Leonard Xu"
+  twitter: Leonardxbj
+date: "2023-03-09T08:00:00Z" #FIXME: Change to the actual release date, also the date in the filename, and the directory name of linked images
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.17
+aliases:
+- /news/2023/03/09/release-1.17.0.html #FIXME: Change to the actual release date
+---
+
+The Apache Flink PMC is pleased to announce Apache Flink release 1.17.0. Apache
+Flink is the leading stream processing standard, and the concept of unified
+stream and batch data processing is being successfully adopted in more and more
+companies. Thanks to our excellent community and contributors, Apache Flink
+continues to grow as a technology and remains one of the most active projects
+in the Apache Software Foundation. Flink 1.17 had 173 contributors
+enthusiastically participating and saw the completion of 7 FLIPs and 600+
+issues, bringing many exciting new features and improvements to the community.
+
+
+# Towards Streaming Warehouses
+
+In order to achieve greater efficiency in the realm of [streaming
+warehouse](https://www.alibabacloud.com/blog/more-than-computing-a-new-era-led-by-the-warehouse-architecture-of-apache-flink_598821),
+Flink 1.17 contains substantial improvements to both the performance of batch
+processing and the semantics of streaming processing. These improvements
+represent a significant stride towards the creation of a more efficient and
+streamlined data warehouse, capable of processing large quantities of data in
+real-time.
+
+For batch processing, this release includes several new features and
+improvements:
+
+* **Streaming Warehouse API:**
+  [FLIP-282](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061)
+  introduces the new Delete and Update API in Flink SQL, which works only in
+  batch mode. External storage systems like Flink Table Store can implement
+  row-level updates via this new API. The ALTER TABLE syntax is enhanced by
+  including the ability to ADD/MODIFY/DROP columns, primary keys, and
+  watermarks, making it easier for users to maintain their table schema.
+* **Batch Execution Improvements:** Execution of batch workloads has been
+  significantly improved in Flink 1.17 in terms of performance, stability and
+  usability. Performance-wise, a 26% TPC-DS improvement on the 10T dataset is
+  achieved with strategy and operator optimizations, such as new join
+  reordering and adaptive local hash aggregation, Hive aggregate function
+  improvements, and hybrid shuffle mode enhancements. Stability-wise,
+  speculative execution now supports all operators, and the Adaptive Batch
+  Scheduler is more robust against data skew. Usability-wise, the tuning
+  effort required for batch workloads has been reduced. The Adaptive Batch
+  Scheduler is now the default scheduler in batch mode, hybrid shuffle is
+  compatible with speculative execution and the Adaptive Batch Scheduler, and
+  various configurations have been simplified.
+* **SQL Client/Gateway:** Apache Flink 1.17 introduces the "gateway mode" for
+  SQL Client, allowing users to submit SQL queries to a SQL Gateway for
+  enhanced functionality. Users can use SQL statements to manage job
+  lifecycles, including displaying job information and stopping running jobs.
+  This provides a powerful tool for managing Flink jobs.
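+
+To make the bullet points above more concrete, the new syntax could be
+exercised roughly as follows; the table `orders`, its columns, and the job id
+placeholder are hypothetical examples, not taken from the release notes:
+
+```sql
+-- FLIP-282: row-level Delete and Update (batch mode only), assuming a
+-- connector such as Flink Table Store that implements the new API.
+DELETE FROM orders WHERE order_status = 'CANCELLED';
+UPDATE orders SET order_status = 'ARCHIVED'
+WHERE order_date < DATE '2023-01-01';
+
+-- Enhanced ALTER TABLE: ADD/MODIFY/DROP columns, primary keys, and watermarks.
+ALTER TABLE orders ADD category STRING;
+ALTER TABLE orders ADD WATERMARK FOR order_time
+  AS order_time - INTERVAL '5' SECOND;
+ALTER TABLE orders DROP category;
+
+-- SQL Client in gateway mode: managing job lifecycles with SQL statements.
+SHOW JOBS;
+STOP JOB '<job_id>' WITH SAVEPOINT;
+```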
+
+For stream processing, the following features and improvements are realized:
+
+* **Streaming SQL Semantics:** Non-deterministic operations may produce
+  incorrect results or exceptions, which is a challenging topic in streaming
+  SQL. Incorrect optimization plans and functional issues have been fixed, and
+  the experimental
+  [PLAN_ADVICE](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/explain/#explaindetails)
+  feature is introduced to inform SQL users of potential correctness risks and
+  optimization suggestions.
+* **Checkpoint Improvements:** The generic incremental checkpoint improvements
+  enhance the speed and stability of the checkpoint procedure, and the
+  unaligned checkpoint has improved stability under backpressure and is
+  production-ready in Flink 1.17. Users can manually trigger checkpoints with
+  self-defined checkpoint types while a job is running, via the newly
+  introduced REST interface for triggering checkpoints.
+* **Watermark Alignment Enhancement:** Efficient watermark processing directly
+  affects the execution efficiency of event time applications. In Flink 1.17,
+  [FLIP-217](https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits)
+  introduces an improvement to watermark alignment by aligning data emission
+  across splits within a source operator. This improvement results in more
+  efficient coordination of watermark progress in the source, which in turn
+  mitigates excessive buffering by downstream operators and enhances the
+  overall efficiency of streaming job execution.
+* **StateBackend Upgrade:** The upgrade of
+  [FRocksDB](https://github.com/ververica/frocksdb) to version
+  6.20.3-ververica-2.0 brings improvements to RocksDBStateBackend, such as
+  sharing memory between slots, and adds support for Apple Silicon chipsets
+  like the Mac M1.
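+
+As a minimal sketch of the experimental PLAN_ADVICE feature mentioned above:
+adding the PLAN_ADVICE detail to an EXPLAIN statement prints advice alongside
+the plan. The `orders` table here is a hypothetical example:
+
+```sql
+-- EXPLAIN with the PLAN_ADVICE detail reports potential correctness risks
+-- and optimization suggestions for the given query.
+EXPLAIN PLAN_ADVICE
+SELECT order_id, SUM(amount) AS total
+FROM orders
+GROUP BY order_id;
+```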
+
+
+# Batch Processing
+
+As a unified stream and batch data processing engine, Flink stands out
+particularly in the field of stream processing. In order to improve its batch
+processing capabilities, the community contributors put in a lot of effort to
+improve Flink's batch performance and ecosystem in version 1.17. This makes it
+easier for users to build a streaming warehouse based on Flink.
+
+
+## Speculative Execution
+
+Speculative execution for sinks is now supported. Previously, speculative
+execution was not enabled for sinks to avoid instability or incorrect results.
+In Flink 1.17, the context of sinks is improved so that sinks, including [new
+sinks](https://github.com/apache/flink/blob/master//flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java)
+and [OutputFormat
+sinks](https://github.com/apache/flink/blob/master//flink-core/src/main/java/org/apache/flink/api/common/io/OutputFormat.java),
+are aware of the number of attempts. With the number of attempts, sinks are
+able to isolate the data produced by different attempts of the same subtask,
+even if the attempts are running at the same time. The _FinalizeOnMaster_
+interface is also improved so that OutputFormat sinks can see which attempts
+have finished and then properly commit the written data. Once a sink works
+well with concurrent attempts, it can implement the decorative interface
+[SupportsConcurrentExecutionAttempts](https://github.com/apache/flink/blob/master//flink-core/src/main/java/org/apache/flink/api/common/SupportsConcurrentExecutionAttempts.java)
+so that speculative execution is allowed to be performed on it. Several
+built-in sinks have been enabled for speculative execution, including
+DiscardingSink, PrintSinkFunction, PrintSink, FileSink,
+FileSystemOutputFormat and HiveTableSink.
+
+The slow task detection is improved for speculative execution. Previously, it
+only considered the execution time of tasks when deciding which tasks are
+slow. It now also takes the input data volume of tasks into account: tasks
+that have a longer execution time but consume more data may not be considered
+slow. This improvement helps to eliminate the negative impact of data skew on

Review Comment:
   The scope of `Any` may go beyond the facts.


