lincoln-lil commented on code in PR #721:
URL: https://github.com/apache/flink-web/pull/721#discussion_r1519964324


##########
docs/content/posts/2024-03-xx-release-1.19.0.md:
##########
@@ -0,0 +1,456 @@
+---
+authors:
+- LincolnLee:
+  name: "Lincoln Lee"
+  twitter: lincoln_86xy
+
+date: "2024-03-xxT22:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.19
+aliases:
+- /news/2024/03/xx/release-1.19.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new features. Overall, 162
+people contributed to this release, completing 33 FLIPs and 600+ issues. Thank you!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Custom Parallelism for Table/SQL Sources
+
+Now in Flink 1.19, you can set a custom parallelism for performance tuning via the `scan.parallelism`
+option. The first available connector is DataGen (the Kafka connector is on the way). Here is an
+example using the SQL Client:
+
+```sql
+-- set parallelism within the DDL
+CREATE TABLE Orders (
+    order_number BIGINT,
+    price        DECIMAL(32,2),
+    buyer        ROW<first_name STRING, last_name STRING>,
+    order_time   TIMESTAMP(3)
+) WITH (
+    'connector' = 'datagen',
+    'scan.parallelism' = '4'
+);
+
+-- or set parallelism via dynamic table option
+SELECT * FROM Orders /*+ OPTIONS('scan.parallelism'='4') */;
+```
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sourcessinks/#scan-table-source)
+* [FLIP-367: Support Setting Parallelism for Table/SQL Sources](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150)
+
+
+## Configurable SQL Gateway Java Options
+
+A new option, `env.java.opts.sql-gateway`, allows you to specify Java options for the SQL Gateway,
+so you can fine-tune the memory settings, garbage collection behavior, and other relevant JVM
+parameters.
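+
+For example, a minimal sketch of setting this option in the new `config.yaml` (the nesting follows
+the standard-YAML convention described later in this post; the option values are illustrative
+assumptions, not recommendations):
+```yaml
+# Hypothetical excerpt from conf/config.yaml: JVM options applied only to the SQL Gateway process
+env:
+  java:
+    opts:
+      sql-gateway: -Xmx2g -XX:+UseG1GC
+```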
+
+**More Information**
+* [FLINK-33203](https://issues.apache.org/jira/browse/FLINK-33203)
+
+
+## Configure Different State TTLs using SQL Hint
+
+Starting from Flink 1.18, Table API and SQL users can set state time-to-live 
(TTL) individually for
+stateful operators via the SQL compiled plan. In Flink 1.19, users have a more 
flexible way to
+specify custom TTL values for regular joins and group aggregations directly within their queries
+by [utilizing the STATE_TTL hint](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/queries/hints/#state-ttl-hints).
+This improvement means that you no longer need to alter your compiled plan to 
set specific TTLs for
+these frequently used operators. With the introduction of `STATE_TTL` hints, 
you can streamline your workflow and
+dynamically adjust the TTL based on your operational requirements.
+
+Here is an example:
+```sql
+-- set state ttl for join
+SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */ *
+FROM Orders LEFT OUTER JOIN Customers
+    ON Orders.o_custkey = Customers.c_custkey;
+
+-- set state ttl for aggregation
+SELECT /*+ STATE_TTL('o' = '1d') */ o_orderkey, SUM(o_totalprice) AS revenue
+FROM Orders AS o
+GROUP BY o_orderkey;
+```
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/queries/hints/#state-ttl-hints)
+* [FLIP-373: Support Configuring Different State TTLs using SQL Hint](https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint)
+
+
+## Named Parameters for Functions and Procedures
+
+Named parameters can now be used when calling a function or stored procedure. With named parameters,
+users no longer need to specify parameters strictly by position; they can simply specify each
+parameter's name and its corresponding value. At the same time, if optional parameters are not
+specified, they will default to null.
+
+Here's an example of defining a function with one mandatory parameter and two 
optional parameters using named parameters:
+```java
+public static class NamedArgumentsTableFunction extends TableFunction<Object> {
+
+    @FunctionHint(
+            output = @DataTypeHint("STRING"),
+            arguments = {
+                    @ArgumentHint(name = "in1", isOptional = false, type = @DataTypeHint("STRING")),
+                    @ArgumentHint(name = "in2", isOptional = true, type = @DataTypeHint("STRING")),
+                    @ArgumentHint(name = "in3", isOptional = true, type = @DataTypeHint("STRING"))})
+    public void eval(String arg1, String arg2, String arg3) {
+        collect(arg1 + ", " + arg2 + ", " + arg3);
+    }
+}
+```
+When calling the function in SQL, parameters can be specified by name, for example:
+```sql
+SELECT * FROM TABLE(myFunction(in1 => 'v1', in3 => 'v3', in2 => 'v2'))
+```
+The optional parameters can also be omitted:
+```sql
+SELECT * FROM TABLE(myFunction(in1 => 'v1'))
+```
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/#named-parameters)
+* [FLIP-387: Support named parameters for functions and call procedures](https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures)
+
+## Window TVF Aggregation Features
+
+* Supports SESSION Window TVF in Streaming Mode
+  Now users can use the SESSION window TVF in streaming mode. A simple example is as follows:
+```sql
+-- session window with partition keys
+SELECT * FROM TABLE(
+   SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES));
+
+-- apply aggregation on the session windowed table with partition keys
+SELECT window_start, window_end, item, SUM(price) AS total_price
+FROM TABLE(
+    SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES))
+GROUP BY item, window_start, window_end;
+```
+* Supports Changelog Inputs for Window TVF Aggregation
+  Window aggregation operators (generated based on the window TVF function) can now handle changelog
+  streams (e.g., CDC data sources) without issue. Users are encouraged to migrate from the legacy
+  window aggregation to the new syntax for more complete feature support.
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/queries/window-tvf/#session)
+
+
+## Tuning: MiniBatch Optimization for Regular Joins
+
+Record amplification is a pain point when performing cascading joins in Flink. Now in Flink 1.19,
+the new mini-batch optimization can be used for regular joins to reduce intermediate results in such
+cascading join scenarios.
+
+<div style="text-align: center;">
+<img src="/img/blog/2024-03-xx-release-1.19.0/minibatch_join.png" 
style="width:90%;margin:15px">
+</div>
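+
+Mini-batch processing for regular joins reuses the existing mini-batch options. A minimal sketch of
+enabling it from the Table API (the option values are illustrative; tune them to your workload):
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+
+// Enable mini-batch optimization; buffer input records for up to 5s or 5000 rows per batch.
+tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.enabled", "true");
+tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.allow-latency", "5s");
+tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.size", "5000");
+```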
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/tuning/#minibatch-regular-joins)
+* [FLIP-415: Introduce a new join operator to support minibatch](https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch)
+
+# Runtime & Coordination Improvements
+
+## Dynamic Source Parallelism Inference for Batch Jobs
+
+In Flink 1.19, we have introduced support for dynamic source parallelism inference for batch jobs,
+which allows source connectors to dynamically infer the parallelism based on the actual amount of
+data to consume. This is a significant improvement over previous versions, which only assigned a
+fixed default parallelism to source vertices.
+Source connectors need to implement the inference interface to enable dynamic parallelism inference.
+Currently, the FileSource connector has already been developed with this functionality in place.
+Additionally, the configuration `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
+is now used as the upper bound for source parallelism inference, and it no longer defaults to 1.
+Instead, if it is not set, the upper bound of allowed parallelism set via
+`execution.batch.adaptive.auto-parallelism.max-parallelism` will be used. If that configuration is
+also not set, the default parallelism set via `parallelism.default` or `StreamExecutionEnvironment#setParallelism()`
+will be used instead.
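+
+For example, a sketch of capping the inferred source parallelism in the configuration file (the
+value 128 is an illustrative assumption):
+```yaml
+# Hypothetical excerpt from conf/config.yaml: upper bound for dynamic source parallelism inference
+execution:
+  batch:
+    adaptive:
+      auto-parallelism:
+        default-source-parallelism: 128
+```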
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/elastic_scaling/#enable-dynamic-parallelism-inference-support-for-sources)
+* [FLIP-379: Support dynamic source parallelism inference for batch jobs](https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs)
+
+## Standard YAML for FLINK Configuration
+
+Starting with Flink 1.19, Flink has officially introduced full support for the standard YAML 1.2
+syntax. The default configuration file has been changed to `config.yaml` and placed in the `conf/`
+directory. Users should directly modify this file to configure Flink.
+If users want to keep using the legacy configuration file `flink-conf.yaml`, they just need to copy
+this file into the `conf/` directory. Once the legacy configuration file `flink-conf.yaml` is
+detected, Flink will prioritize using it as the configuration file. In the upcoming Flink 2.0,
+however, the `flink-conf.yaml` configuration file will no longer work.
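+
+For illustration, a small sketch of what a nested, standard-YAML `config.yaml` may look like
+(the option values here are assumptions for the example):
+```yaml
+# Hypothetical excerpt from conf/config.yaml using standard YAML 1.2 syntax:
+# dotted option keys such as taskmanager.numberOfTaskSlots become nested mappings.
+jobmanager:
+  memory:
+    process:
+      size: 1600m
+taskmanager:
+  numberOfTaskSlots: 2
+parallelism:
+  default: 1
+```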
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/config/#flink-configuration-file)
+* [FLIP-366: Support standard YAML for FLINK configuration](https://cwiki.apache.org/confluence/display/FLINK/FLIP-366%3A+Support+standard+YAML+for+FLINK+configuration?src=contextnavpagetreemode)
+
+## Profiling JobManager/TaskManager on Flink Web
+
+In Flink 1.19, we support triggering profiling at the JobManager/TaskManager level, allowing users to
+create a profiling instance with arbitrary intervals and event modes (supported by
+[async-profiler](https://github.com/async-profiler/async-profiler)).
+Users can easily submit profiles and export results in the Flink Web UI.
+
+For example:
+- First, identify the candidate JobManager/TaskManager with a performance bottleneck and switch to
+the corresponding JobManager/TaskManager page (profiler tab).
+- Then simply click on the `Create Profiling Instance` button to submit a profiling instance with a
+specified period and mode. (The description of each profiling mode is displayed when hovering over
+the corresponding mode.)
+- Once the profiling instance is complete, download the interactive HTML file by clicking on the
+link.
+
+<div style="text-align: center;">
+<img src="/img/blog/2024-03-xx-release-1.19.0/profiling.png" 
style="width:90%;margin:15px">
+</div>
+
+Profiling result:
+<div style="text-align: center;">
+<img src="/img/blog/2024-03-xx-release-1.19.0/profiling-res.png" 
style="width:90%;margin:15px">
+</div>
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/debugging/profiler/)
+* [FLIP-375: Built-in cross-platform powerful java profiler](https://cwiki.apache.org/confluence/x/64lEE)
+
+## New Config Options for Administrator JVM Options
+
+A set of administrator JVM options is now available; these are prepended to the user-set extra JVM
+options, enabling platform-wide JVM tuning.
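+
+For example, a sketch of a platform-wide default (assuming the `env.java.default-opts.*` option
+names introduced by FLIP-397; the values are illustrative):
+```yaml
+# Hypothetical excerpt from conf/config.yaml: administrator JVM options,
+# prepended to any user-set env.java.opts.* options.
+env:
+  java:
+    default-opts:
+      all: -XX:+UseG1GC -XX:MaxGCPauseMillis=200
+```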
+
+**More Information**
+* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/config/#jvm-and-logging-options)
+* [FLIP-397: Add config options for administrator JVM options](https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options?src=jira)
+
+# Checkpoints Improvements
+
+## Using Larger Checkpointing Interval When Source is Processing Backlog
+
+`ProcessingBacklog` is introduced to indicate whether a record should be processed with low latency
+or high throughput. `ProcessingBacklog` can be set by source operators and can be used to change the
+checkpoint interval of a job during runtime.
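+
+For example, a sketch of combining a short regular checkpoint interval with a larger one during
+backlog processing (assuming the `execution.checkpointing.interval-during-backlog` option from
+FLIP-309; the durations are illustrative):
+```yaml
+# Hypothetical excerpt from conf/config.yaml: checkpoint every 30s normally,
+# but only every 10min while sources report a ProcessingBacklog.
+execution:
+  checkpointing:
+    interval: 30s
+    interval-during-backlog: 10min
+```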
+
+**More Information**
+* [FLINK-32514](https://issues.apache.org/jira/browse/FLINK-32514)
+* [FLIP-309: Support using larger checkpointing interval when source is processing backlog](https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog)
+
+## CheckpointsCleaner Cleans Individual Checkpoint States in Parallel
+
+Now when disposing of no-longer-needed checkpoints, every state handle/state file will be disposed
+in parallel by the ioExecutor, vastly improving the disposal speed of a single checkpoint (for
+large checkpoints the disposal time can be improved from 10 minutes to < 1 minute). The old
+behavior can be restored by setting `state.checkpoint.cleaner.parallel-mode` to false.
+
+**More Information**
+* [FLINK-33090](https://issues.apache.org/jira/browse/FLINK-33090)
+
+## Trigger Checkpoints Through the Command Line Client
+
+The command line interface supports triggering a checkpoint manually. Usage:
+```shell
+./bin/flink checkpoint $JOB_ID [-full]
+```
+Specifying the `-full` option triggers a full checkpoint. Otherwise, an incremental checkpoint is
+triggered if the job is configured to take incremental checkpoints periodically.
+
+**More Information**
+* [FLINK-6755](https://issues.apache.org/jira/browse/FLINK-6755)
+
+
+# Important Deprecations
+
+In preparation for the release of Flink 2.0 next year, the community has decided to officially
+deprecate multiple APIs that have been approaching end of life for some time.
+
+* Flink's [`org.apache.flink.api.common.time.Time`](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/time/Time.java) is now officially deprecated and will be dropped in Flink 2.0.
+Please migrate to Java's own `Duration` class. Methods supporting the `Duration` class that replace the deprecated `Time`-based methods have been introduced (see the migration sketch after this list).
+* [`org.apache.flink.runtime.jobgraph.RestoreMode#LEGACY`](https://github.com/apache/flink/blob/release-1.19/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/RestoreMode.java#L40) is deprecated. Please use [`RestoreMode#CLAIM`](https://github.com/apache/flink/blob/release-1.19/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/RestoreMode.java#L31) or [`RestoreMode#NO_CLAIM`](https://github.com/apache/flink/blob/release-1.19/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/RestoreMode.java#L34) mode instead to get clear state file ownership when restoring.
+* The old method of resolving schema compatibility has been deprecated. Please migrate to the new method following [Migrating from deprecated `TypeSerializerSnapshot#resolveSchemaCompatibility(TypeSerializer newSerializer)` before Flink 1.19](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/serialization/custom_serialization/#migrating-from-deprecated-typeserializersnapshotresolveschemacompatibilityt).
+* Configuring serialization behavior through hard code is deprecated, e.g.,
+[`ExecutionConfig#enableForceKryo()`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java#L643). Please use the options
+`pipeline.serialization-config`, `pipeline.force-avro`, `pipeline.force-kryo`, and `pipeline.generic-types` instead. Registration of instance-level serializers is deprecated; use class-level serializers instead.
+* We have deprecated all `setXxx` and `getXxx` methods except [`getString(String key, String defaultValue)`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/configuration/Configuration.java#L176)
+and [`setString(String key, String value)`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/configuration/Configuration.java#L220), such as `setInteger`, `setLong`, `getInteger`, and `getLong`.
+Users and developers are recommended to use the `get` and `set` methods with `ConfigOption` instead of a string key.
+* The non-`ConfigOption` objects in the `StreamExecutionEnvironment`, `CheckpointConfig`, and `ExecutionConfig`, and their corresponding getter/setter interfaces, are now deprecated. These objects and methods are planned to be removed in Flink 2.0. The deprecated interfaces include the getter and setter methods of `RestartStrategy`, `CheckpointStorage`, and `StateBackend`.
+* [`org.apache.flink.api.common.functions.RuntimeContext#getExecutionConfig`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/api/common/functions/RuntimeContext.java#L191) is now officially deprecated and will be dropped in Flink 2.0. Please use [`getGlobalJobParameters()`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/api/common/functions/RuntimeContext.java#L208) or [`isObjectReuseEnabled()`](https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/api/common/functions/RuntimeContext.java#L216) instead.
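+
+As a quick sketch of the `Time`-to-`Duration` migration mentioned above (the timeout value is an
+illustrative assumption):
+```java
+// Before: Flink's own Time class (deprecated, to be dropped in Flink 2.0)
+// org.apache.flink.api.common.time.Time timeout = org.apache.flink.api.common.time.Time.minutes(5);
+
+// After: Java's standard Duration class, accepted by the new Duration-based method variants
+java.time.Duration timeout = java.time.Duration.ofMinutes(5);
+```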

Review Comment:
   @JunRuiLee I've changed it to a list similar to the one in the FLIP (a little clearer for display), what do you think?


