leonardBang commented on code in PR #800:
URL: https://github.com/apache/flink-web/pull/800#discussion_r2234502885


##########
docs/data/release_archive.yml:
##########
@@ -1,5 +1,9 @@
 release_archive:
   flink:
+    -
+      version_short: "2.0"

Review Comment:
   ```suggestion
         version_short: "2.1"
   ```



##########
docs/content/posts/2025-07-31-release-2.1.0.md:
##########
@@ -0,0 +1,528 @@
+---
+authors:
+  - reswqa:
+    name: "Ron Liu"
+    twitter: "Ron999"
+
+date: "2025-07-31T08:00:00Z"
+subtitle: ""
+title: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive Upgrades
+aliases:
+  - /news/2025/07/31/release-2.1.0.html
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.1.0. This marks a significant milestone
+in the evolution of the real-time data processing engine into a unified Data + AI platform. This release brings
+together 116 global contributors, implements 16 FLIPs (Flink Improvement Proposals), and resolves over 220 issues,
+with a strong focus on deepening the integration of real-time AI and intelligent stream processing:
+
+1. **Breakthroughs in Real-Time AI**:
+   - Introduces AI Model DDL, enabling flexible management of AI models through Flink SQL and the Table API.
+
+   - Extends the `ML_PREDICT` Table-Valued Function (TVF), empowering real-time invocation of AI models within Flink SQL,
+     laying the foundation for building end-to-end real-time AI workflows.
+
+2. **Enhanced Real-Time Data Processing**:
+   - Process Table Functions (PTFs) open up the Flink SQL engine for more event-driven applications,
+     giving access to Flink's managed state, event-time and timer services, and the underlying table changelogs.
+
+   - Adds the `VARIANT` data type for efficient handling of semi-structured data like JSON. Combined with the `PARSE_JSON` function
+     and lakehouse formats (e.g., Apache Paimon), it enables analysis of data with dynamic schemas.
+
+   - Significantly optimizes streaming joins by introducing the `DeltaJoin` and `MultiJoin` strategies,
+     eliminating state bottlenecks and improving resource utilization and job stability.
+
+Flink 2.1.0 seamlessly integrates real-time data processing with AI models, empowering enterprises to advance from
+real-time analytics to real-time intelligent decision-making, meeting the evolving demands of modern data applications.
+We extend our gratitude to all contributors for their invaluable support!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Model DDLs
+In Flink 2.0, we introduced dedicated syntax for AI models, enabling users to define models
+as easily as creating catalog objects and to invoke them like standard functions or table functions in SQL statements.
+In Flink 2.1, we have added Table API support for model DDLs, enabling users to define and manage AI models programmatically
+via the Table API in both Java and Python. This provides a flexible, code-driven alternative to SQL for model management and
+integration within Flink applications.
+
+Example: 
+- Defining a Model via Flink SQL
+```sql
+CREATE MODEL my_model
+INPUT (f0 STRING)
+OUTPUT (label STRING)
+WITH (
+  'task' = 'classification',
+  'type' = 'remote',
+  'provider' = 'openai',
+  'openai.endpoint' = 'remote',
+  'openai.api_key' = 'abcdefg'
+);
+```
+
+- Defining a Model via Table API (Java)
+```java
+tEnv.createModel(
+    "MyModel", 
+    ModelDescriptor.forProvider("OPENAI")
+      .inputSchema(Schema.newBuilder()
+        .column("f0", DataTypes.STRING())
+        .build())
+      .outputSchema(Schema.newBuilder()
+        .column("label", DataTypes.STRING())
+        .build())
+      .option("task", "classification")
+      .option("type", "remote")
+      .option("provider", "openai")
+      .option("openai.endpoint", "remote")
+      .option("openai.api_key", "abcdefg")
+      .build(),
+    true);
+```
+
+**More Information**
+* [FLINK-37548](https://issues.apache.org/jira/browse/FLINK-37548)
+* [FLIP-437](https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL)
+* [FLIP-507](https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API)
+
+## Real-Time AI Function
+
+Building on the AI model DDL, Flink 2.1 extends the `ML_PREDICT` table-valued function (TVF) to perform real-time model inference in SQL queries, seamlessly applying machine learning models to data streams.
+The implementation supports both Flink's built-in model providers (OpenAI) and interfaces for users to define custom model providers, accelerating Flink's evolution from a real-time
+data processing engine to a unified real-time AI platform. Looking ahead, we plan to introduce more AI functions such as `ML_EVALUATE` and `VECTOR_SEARCH` to unlock an end-to-end experience
+for real-time data processing, model training, and inference.
+
+Take the following SQL statements as an example:
+```sql
+-- Declare an AI model
+CREATE MODEL `my_model`
+INPUT (text STRING)
+OUTPUT (response STRING)
+WITH (
+  'provider' = 'openai',
+  'endpoint' = 'https://api.openai.com/v1/llm/v1/chat',
+  'api-key' = 'abcdefg',
+  'system-prompt' = 'translate to Chinese',
+  'model' = 'gpt-4o'
+);
+
+-- Basic usage
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(text)
+);
+
+-- With configuration options
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(text),
+  MAP['async', 'true', 'timeout', '100s']
+);
+
+-- Using named parameters
+SELECT * FROM ML_PREDICT(
+  INPUT => TABLE input_table,
+  MODEL => MODEL my_model,
+  ARGS => DESCRIPTOR(text),
+  CONFIG => MAP['async', 'true']
+);
+```
+
+**More Information**
+* [FLINK-34992](https://issues.apache.org/jira/browse/FLINK-34992)
+* [FLINK-37777](https://issues.apache.org/jira/browse/FLINK-37777)
+* [FLIP-437](https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL)
+* [FLIP-525](https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design)
+* [Model Inference](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/sql/queries/model-inference/)
+
+## Process Table Functions (PTFs)
+
+Apache Flink now includes support for Process Table Functions (PTFs), the most powerful function kind for Flink SQL and the Table API.
+
+Conceptually, a PTF is a superset of all other user-defined functions, mapping zero, one, or multiple tables to zero, one, or multiple rows.
+They enable implementing user-defined operators that can be as feature-rich as built-in operations. PTFs have access to Flink's managed state,
+event-time, timer services, and table changelogs.
+
+PTFs enable the following tasks:
+- Apply transformations on each row of a table.
+- Logically partition the table into distinct sets and apply transformations per set.
+- Store seen events for repeated access.
+- Continue the processing at a later point in time, enabling waiting, synchronization, or timeouts.
+- Buffer and aggregate events using complex state machines or rule-based conditional logic.
+
+This moves Flink SQL significantly closer to the DataStream API, leveraging the robustness and familiarity of the existing SQL ecosystem.
+Detailed information on PTF syntax and semantics can be found here: [Process Table Functions](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/ptfs/).
+
+Take the following code as an example:
+```java
+// Declare a ProcessTableFunction for memorizing your customers
+public static class GreetingWithMemory extends ProcessTableFunction<String> {
+    public static class CountState {
+        public long counter = 0L;
+    }
+
+    public void eval(@StateHint CountState state, @ArgumentHint(SET_SEMANTIC_TABLE) Row input) {
+        state.counter++;
+        collect("Hello " + input.getFieldAs("name") + ", your " + state.counter + " time?");
+    }
+}
+
+TableEnvironment env = TableEnvironment.create(...);
+
+// Call the PTF in Table API
+env.fromValues("Bob", "Alice", "Bob")
+   .as("name")
+   .partitionBy($("name"))
+   .process(GreetingWithMemory.class)
+   .execute()
+   .print();
+
+// Call the PTF in SQL
+env.executeSql("SELECT * FROM GreetingWithMemory(TABLE Names PARTITION BY 
name)")
+   .print();
+```
+
+**More Information**
+* [FLIP-440](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093)
+
+## Variant Type
+
+VARIANT is a new data type for semi-structured data (e.g., JSON). It supports storing any
+semi-structured data, including ARRAY, MAP (with STRING keys), and scalar types, while preserving
+field type information in a JSON-like structure. Unlike ROW and STRUCTURED types, VARIANT provides
+superior flexibility for handling deeply nested and evolving schemas.
+
+Users can use `PARSE_JSON` or `TRY_PARSE_JSON` to convert JSON-formatted VARCHAR data to VARIANT. In
+addition, table formats like Apache Paimon now support the VARIANT type, enabling
+users to efficiently process semi-structured data in the lakehouse using Flink SQL.
+
+Take the following SQL statements as an example:
+```sql
+CREATE TABLE t1 (
+  id INTEGER,
+  v STRING -- a json string
+) WITH (
+  'connector' = 'mysql-cdc',
+  ...
+);
+ 
+CREATE TABLE t2 (
+  id INTEGER,
+  v VARIANT
+) WITH (
+  'connector' = 'paimon',
+  ...
+);
+ 
+-- write to t2 with VARIANT type
+INSERT INTO t2 SELECT id, PARSE_JSON(v) FROM t1;
+```
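+
+`TRY_PARSE_JSON` behaves like `PARSE_JSON` but returns NULL for malformed input instead of failing the query. A minimal sketch reusing `t1` and `t2` from above:
+```sql
+-- Rows whose payload is not valid JSON yield NULL instead of an error
+INSERT INTO t2 SELECT id, TRY_PARSE_JSON(v) FROM t1;
+```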
+
+**More Information**
+* [FLINK-37922](https://issues.apache.org/jira/browse/FLINK-37922)
+* [FLIP-521](https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing)
+* [Variant](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/types/#other-data-types)
+
+## Structured Type Enhancements
+
+In Flink 2.1, user-defined objects can be declared as structured types directly in `CREATE TABLE` DDL
+statements, resolving critical type equivalence issues and significantly improving API usability.
+
+Take the following SQL statements as an example:
+```sql
+CREATE TABLE MyTable (
+    uid BIGINT,
+    user STRUCTURED<'com.example.User', name STRING, age INT NOT NULL>
+);
+
+-- Casts a row type into a structured type
+INSERT INTO MyTable SELECT 1, CAST(('Bob', 42) AS STRUCTURED<'com.example.User', name STRING, age INT>);
+```
+
+**More Information**
+* [FLINK-37861](https://issues.apache.org/jira/browse/FLINK-37861)
+* [FLIP-520](https://cwiki.apache.org/confluence/display/FLINK/FLIP-520%3A+Simplify+StructuredType+handling)
+* [STRUCTURED](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/types/#user-defined-data-types)
+
+## Delta Join
+
+Flink 2.1 introduces a new DeltaJoin operator for stream processing jobs, along with optimizations for simple
+streaming join pipelines. Compared to a traditional streaming join, a delta join requires significantly
+less state, effectively mitigating issues related to large state, including resource bottlenecks,
+slow checkpointing, and lengthy job recovery times. This feature is enabled by default.
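+
+As a rough illustration (the tables are hypothetical, and whether the rewrite applies depends on the prerequisites described in FLIP-486, such as sources that support index lookup), a simple two-way streaming join like the following is a candidate for the optimization:
+```sql
+-- With delta join enabled (the default), the optimizer may replace this
+-- regular streaming join with the state-light DeltaJoin operator.
+SELECT o.order_id, o.amount, u.name
+FROM orders AS o
+JOIN users AS u ON o.user_id = u.user_id;
+```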
+
+**More Information**
+* [FLINK-37836](https://issues.apache.org/jira/browse/FLINK-37836)
+* [Delta Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin)
+
+## Multiple Regular Joins
+
+Streaming Flink jobs with multiple cascaded streaming joins often experience operational
+instability and performance degradation due to large state sizes. This release introduces a
+multi-join operator (`StreamingMultiJoinOperator`) that drastically reduces state size
+by eliminating intermediate results. The operator achieves this by processing joins across all input
+streams simultaneously within a single operator instance, storing only raw input records instead of
+propagated join output.
+
+This "zero intermediate state" approach primarily targets state reduction, offering substantial
+benefits in resource consumption and operational stability. The feature is available for
+pipelines with multiple INNER/LEFT joins that share at least one common join key; enable it with
+`SET 'table.optimizer.multi-join.enabled' = 'true'`, as shown in the sketch below.
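+
+A minimal sketch (table names are hypothetical):
+```sql
+SET 'table.optimizer.multi-join.enabled' = 'true';
+
+-- Three streams joined on the same key are planned as a single
+-- multi-join operator instead of two cascaded binary joins.
+SELECT o.order_id, p.status, s.carrier
+FROM orders AS o
+JOIN payments AS p ON o.order_id = p.order_id
+LEFT JOIN shipments AS s ON o.order_id = s.order_id;
+```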
+
+**Benchmark**: We conducted a benchmark comparing the multi-join operator with the default binary joins; see the
+[MultiJoin Benchmark](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/tuning/#multiple-regular-joins) for details.
+
+
+**More Information**
+* [FLINK-37859](https://issues.apache.org/jira/browse/FLINK-37859)
+* [MultiJoin](https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator)
+
+## Async Lookup Join Enhancements
+
+In previous versions of the async lookup join, even if users set `table.exec.async-lookup.output-mode` to `ALLOW_UNORDERED`,
+the engine would still forcibly fall back to ordered mode when processing update streams to ensure correctness.
+Starting from Flink 2.1, the engine allows parallel processing of unrelated update records while still ensuring correctness,
+thereby achieving higher throughput when handling changelog streams.
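+
+A minimal sketch (table names are hypothetical; `dim` is assumed to be a lookup source with an async-capable connector, and `proc_time` a processing-time attribute on `orders`):
+```sql
+SET 'table.exec.async-lookup.output-mode' = 'ALLOW_UNORDERED';
+
+-- Unrelated update records can now be processed in parallel.
+SELECT o.order_id, d.category
+FROM orders AS o
+JOIN dim FOR SYSTEM_TIME AS OF o.proc_time AS d
+  ON o.product_id = d.product_id;
+```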
+
+**More Information**
+* [FLINK-37874](https://issues.apache.org/jira/browse/FLINK-37874)
+* [Async Lookup Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-519%3A++Introduce+async+lookup+key+ordered+mode)
+
+## Sink Reuse
+
+Within a single Flink job, when multiple `INSERT INTO` statements update identical columns of a target table
+(support for different columns is planned for the next release), the planner now optimizes
+the execution plan and merges the sink nodes to achieve reuse, as shown below. This is a great usability improvement
+for users of partial-update features with data lake storage like Apache Paimon.
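+
+A minimal sketch using a statement set (table names are hypothetical); both statements write the same columns of the same target, so the planner can merge their sink nodes:
+```sql
+EXECUTE STATEMENT SET
+BEGIN
+  INSERT INTO target_table SELECT id, amount FROM source_a;
+  INSERT INTO target_table SELECT id, amount FROM source_b;
+END;
+```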
+
+**More Information**
+* [FLINK-37227](https://issues.apache.org/jira/browse/FLINK-37227)
+* [Sink Reuse](https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner)
+
+## Support Smile Format for Compiled Plan Serialization
+
+In Flink 2.1, we added support for the Smile binary format for compiled plans, providing a memory-efficient
+alternative to JSON for serialization and deserialization. JSON remains the default; to use the
+Smile format, call the `CompiledPlan#asSmileBytes` and `PlanReference#fromSmileBytes` methods.
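+
+A minimal sketch (assuming `source` and `sink` tables are already registered):
+```java
+TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+
+// Compile a plan and serialize it using the Smile binary format.
+CompiledPlan plan = env.compilePlanSql("INSERT INTO sink SELECT * FROM source");
+byte[] smileBytes = plan.asSmileBytes();
+
+// Restore the plan from the Smile bytes and execute it.
+env.loadPlan(PlanReference.fromSmileBytes(smileBytes)).execute();
+```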
+
+**More Information**
+* [FLINK-37341](https://issues.apache.org/jira/browse/FLINK-37341)
+* [Smile Format](https://cwiki.apache.org/confluence/display/FLINK/FLIP-508%3A+Add+support+for+Smile+format+for+Compiled+plans)
+* [Smile Format Specification](https://github.com/FasterXML/smile-format-specification/blob/master/smile-specification.md)
+
+# Runtime
+
+## Add Pluggable Batching for Async Sink
+
+In Flink 2.1, we introduced a pluggable batching mechanism for the async sink that allows users to define custom
+batching write strategies tailored to specific requirements.
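+
+Conceptually, a custom strategy decides which buffered requests form the next batch. The sketch below is purely illustrative and does not show the actual FLIP-509 interface; all names are hypothetical:
+```java
+// Hypothetical illustration of a pluggable batching strategy; the real
+// FLIP-509 interface differs in names and signatures.
+public class SizeCappedBatcher<RequestEntryT> {
+
+    private final int maxBatchSize;
+
+    public SizeCappedBatcher(int maxBatchSize) {
+        this.maxBatchSize = maxBatchSize;
+    }
+
+    /** Drains up to maxBatchSize buffered requests into the next batch. */
+    public java.util.List<RequestEntryT> createNextBatch(java.util.List<RequestEntryT> buffered) {
+        int end = Math.min(maxBatchSize, buffered.size());
+        java.util.List<RequestEntryT> batch = new java.util.ArrayList<>(buffered.subList(0, end));
+        buffered.subList(0, end).clear();
+        return batch;
+    }
+}
+```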
+
+**More Information**
+* [FLINK-37298](https://issues.apache.org/jira/browse/FLINK-37298)
+* [Pluggable Batching for Async Sink](https://cwiki.apache.org/confluence/display/FLINK/FLIP-509+Add+pluggable+Batching+for+Async+Sink)
+
+## Split-level Watermark Metrics
+
+In Flink 2.1, we added split-level watermark metrics, covering watermark progress and per-split state gauges,
+to enhance watermark observability:
+
+- `currentWatermark`: the last watermark this split has received.
+- `activeTimeMsPerSecond`: the time this split is active per second.
+- `pausedTimeMsPerSecond`: the time this split is paused due to watermark alignment per second.
+- `idleTimeMsPerSecond`: the time this split is marked idle by idleness detection per second.
+- `accumulatedActiveTimeMs`: accumulated time this split was active since registration.
+- `accumulatedPausedTimeMs`: accumulated time this split was paused since registration.
+- `accumulatedIdleTimeMs`: accumulated time this split was idle since registration.
+
+**More Information**
+* [FLINK-37410](https://issues.apache.org/jira/browse/FLINK-37410)
+* [Watermark Metrics](https://cwiki.apache.org/confluence/display/FLINK/FLIP-513%3A+Split-level+Watermark+Metrics)
+
+# Connectors
+
+## Introduce SQL Connector for Keyed State
+
+In Flink 2.1, we introduced a new connector for keyed state. This connector allows
+users to query keyed state directly from a checkpoint or savepoint using Flink SQL, making it easier
+to inspect, debug, and validate the state of Flink jobs without custom tooling. This feature is
+especially useful for analyzing long-running jobs and validating state migrations.
+
+With a simple DDL, you can expose a ValueState as a table and query the snapshot with Flink SQL:
+```sql
+CREATE TABLE keyed_state (
+    k INTEGER,
+    user_id STRING,
+    balance DOUBLE
+) WITH (
+    'connector' = 'savepoint',
+    'path' = 'file:///savepoint/path&',
+    'uid' = 'my-operator-id'
+);
+
+-- Query the keyed state
+SELECT * FROM keyed_state;
+```
+
+**More Information**
+* [FLINK-36929](https://issues.apache.org/jira/browse/FLINK-36929)
+* [Savepoint Connector](https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+savepoint+data)
+
+# Other Improvements
+
+## PyFlink
+In PyFlink 2.1, we added support for Python 3.12 and removed support for Python 3.8.
+
+**More Information**
+* [FLINK-37823](https://issues.apache.org/jira/browse/FLINK-37823)
+* [FLINK-37776](https://issues.apache.org/jira/browse/FLINK-37776)
+
+
+## Upgrade flink-shaded version to 20.0
+
+Bumped the flink-shaded version to 20.0 to support the Smile format.
+
+**More Information**
+* [FLINK-37376](https://issues.apache.org/jira/browse/FLINK-37376)
+
+## Upgrade Parquet version to 1.15.3
+
+Bumped the Parquet version to 1.15.3 to resolve the parquet-avro module
+vulnerability reported in [CVE-2025-30065](https://nvd.nist.gov/vuln/detail/CVE-2025-30065).
+
+**More Information**
+* [FLINK-37760](https://issues.apache.org/jira/browse/FLINK-37760)
+
+# Upgrade Notes
+
+The Flink community tries to ensure that upgrades are as seamless as possible.
+However, certain changes may require users to make adjustments to certain parts
+of the program when upgrading to version 2.1. Please refer to the
+[release notes](https://nightlies.apache.org/flink/flink-docs-release-2.1/release-notes/flink-2.1/)
+for a comprehensive list of adjustments to make and issues to check during the
+upgrading process.
+
+# List of Contributors
+
+The Apache Flink community would like to express gratitude to all the
+contributors who made this release possible:
+
+Ahmed Hamdy,
+Alan Sheinberg,
+Aleksandr Iushmanov,
+Aleksandr Savonin,
+AlexYinHan,
+Ammu Parvathy,
+Anupam Aggarwal,
+Ao Li,
+Arvid Heise,
+Au-Miner,
+Benchao Li,
+Bonnie Varghese,
+Chris,
+David Moravek,
+David Radley,
+David Wang,
+Dawid Wysakowicz,
+Dian Fu,
+Efrat Levitan,
+Feng Jin,
+Ferenc Csaky,
+Francesco Di Chiara,
+Gabor Somogyi,
+Gunnar Morling,
+Gustavo de Morais,
+Hangxiang Yu,
+Hao Li,

Review Comment:
   That's a lot of lines; could you collect these names onto a single line?


