davidradl commented on code in PR #26266:
URL: https://github.com/apache/flink/pull/26266#discussion_r1985412983
##########
docs/content/release-notes/flink-2.0.md:
##########
@@ -0,0 +1,1666 @@
+---
+title: "Release Notes - Flink 2.0"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Release notes - Flink 2.0
+
+These release notes discuss important aspects, such as configuration, behavior or dependencies,
+that changed between Flink 1.20 and Flink 2.0. Please read these notes carefully if you are
+planning to upgrade your Flink version to 2.0.
+
+## New Features & Behavior Changes
+
+### State & Checkpoints
+
+#### Disaggregated State Storage and Management
+
+##### [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
+
+The past decade has witnessed a dramatic shift in Flink's deployment mode, workload patterns, and hardware improvements. We've moved from the map-reduce era where workers are computation-storage tightly coupled nodes to a cloud-native world where containerized deployments on Kubernetes become standard. To enable Flink's Cloud-Native future, we introduce Disaggregated State Storage and Management that uses remote storage as primary storage in Flink 2.0.
+
+This new architecture solves the following challenges brought in the cloud-native era for Flink.
+1. Local Disk Constraints in containerization
+2. Spiky Resource Usage caused by compaction in the current state model
+3. Fast Rescaling for jobs with large states (hundreds of Terabytes)
+4. Light and Fast Checkpoint in a native way
+
+While extending the state store to interact with remote DFS seems like a straightforward solution, it is insufficient due to Flink's existing blocking execution model. To overcome this limitation, Flink 2.0 introduces an asynchronous execution model alongside a disaggregated state backend, as well as newly designed SQL operators performing asynchronous state access in parallel.
+
+#### Native file copy support
+
+##### [FLINK-35886](https://issues.apache.org/jira/browse/FLINK-35886)
+
+Users can now configure Flink to use s5cmd to speed up downloading files from S3 during the recovery process, when using RocksDB, by a factor of 2.
+
+#### Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler
+
+##### [FLINK-35549](https://issues.apache.org/jira/browse/FLINK-35549)
+
+This enables the user to synchronize checkpointing and rescaling in the AdaptiveScheduler. New configuration parameters were introduced for the maximum trigger delay and the number of acceptable failed checkpoints before triggering a rescale to make this behavior configurable. These parameters were updated in [FLINK-36015](https://issues.apache.org/jira/browse/FLINK-36015).

Review Comment:
I suggest that the sentence `This enables the user to synchronize checkpointing and rescaling in the AdaptiveScheduler.` be extended with `so that ...` to bring out the value. What is the value of doing these things together: is it efficiency, or a more robust way of doing it?

I also suggest that "maximum trigger delay and the number of acceptable failed checkpoints before triggering a rescale to make this behaviour configurable." link to the corresponding config options. Again, it would be good to show the value these options bring.

-- 
This is an automated message from the Apache Git Service.
