This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

The following commit(s) were added to refs/heads/master by this push:
     new 21712c932cd [FLINK-28857][docs] Add Document for DataStream Cache API
21712c932cd is described below

commit 21712c932cdbcc6be26f46a201412646c56f9650
Author: sxnan <suxuanna...@gmail.com>
AuthorDate: Mon Aug 8 15:21:14 2022 +0800

    [FLINK-28857][docs] Add Document for DataStream Cache API

    This closes #20491.
---
 .../docs/dev/datastream/operators/overview.md | 36 ++++++++++++++++++++
 .../docs/dev/datastream/operators/overview.md | 38 ++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/docs/content.zh/docs/dev/datastream/operators/overview.md b/docs/content.zh/docs/dev/datastream/operators/overview.md
index e0da3f5d911..d16f89e2923 100644
--- a/docs/content.zh/docs/dev/datastream/operators/overview.md
+++ b/docs/content.zh/docs/dev/datastream/operators/overview.md
@@ -572,6 +572,42 @@ Python 中尚不支持此特性。
 {{< /tab >}}
 {{< /tabs>}}

+### Cache
+#### DataStream → CachedDataStream
+
+把算子的结果缓存起来。目前只支持批执行模式下运行的作业。算子的结果在算子第一次执行的时候会被缓存起来,之后的
+作业中会复用该算子缓存的结果。如果算子的结果丢失了,它会被原来的算子重新计算并缓存。
+
+{{< tabs cache >}}
+{{< tab "Java" >}}
+```java
+DataStream<Integer> dataStream = //...
+CachedDataStream<Integer> cachedDataStream = dataStream.cache();
+cachedDataStream.print(); // Do anything with the cachedDataStream
+...
+env.execute(); // Execute and create cache.
+
+cachedDataStream.print(); // Consume cached result.
+env.execute();
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val dataStream : DataStream[Int] = //...
+val cachedDataStream = dataStream.cache()
+cachedDataStream.print() // Do anything with the cachedDataStream
+...
+env.execute() // Execute and create cache.
+
+cachedDataStream.print() // Consume cached result.
+env.execute()
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+Python 中尚不支持此特性。
+{{< /tab >}}
+{{< /tabs>}}
+
 ## 物理分区

 Flink 也提供以下方法让用户根据需要在数据转换完成后对数据分区进行更细粒度的配置。
diff --git a/docs/content/docs/dev/datastream/operators/overview.md b/docs/content/docs/dev/datastream/operators/overview.md
index 81e183aa1b7..a396a5af923 100644
--- a/docs/content/docs/dev/datastream/operators/overview.md
+++ b/docs/content/docs/dev/datastream/operators/overview.md
@@ -575,6 +575,44 @@ This feature is not yet supported in Python
 {{< /tab >}}
 {{< /tabs>}}

+### Cache
+#### DataStream → CachedDataStream
+
+Cache the intermediate result of the transformation. Currently, only jobs that run with batch
+execution mode are supported. The cache intermediate result is generated lazily at the first time
+the intermediate result is computed so that the result can be reused by later jobs. If the cache is
+lost, it will be recomputed using the original transformations.
+
+{{< tabs cache >}}
+{{< tab "Java" >}}
+```java
+DataStream<Integer> dataStream = //...
+CachedDataStream<Integer> cachedDataStream = dataStream.cache();
+cachedDataStream.print(); // Do anything with the cachedDataStream
+...
+env.execute(); // Execute and create cache.
+
+cachedDataStream.print(); // Consume cached result.
+env.execute();
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val dataStream : DataStream[Int] = //...
+val cachedDataStream = dataStream.cache()
+cachedDataStream.print() // Do anything with the cachedDataStream
+...
+env.execute() // Execute and create cache.
+
+cachedDataStream.print() // Consume cached result.
+env.execute()
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+This feature is not yet supported in Python
+{{< /tab >}}
+{{< /tabs>}}
+
 ## Physical Partitioning

 Flink also gives low-level control (if desired) on the exact stream partitioning after a transformation, via the following functions.
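
The added documentation describes three behaviours of the cache: it is created lazily the first time the intermediate result is computed, later jobs reuse it, and a lost cache is recomputed from the original transformations. The plain-Java sketch below is a toy model of those semantics only. It does not use Flink, and `LazyCacheSketch`, `get`, and `simulateLoss` are hypothetical names chosen for illustration, not part of the DataStream API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Toy model of a lazily created, recomputable cache. This is NOT Flink code. */
final class LazyCacheSketch<T> {
    // Stands in for the original transformations that produce the result.
    private final Supplier<T> transformation;
    // null means "not computed yet" or "cache lost"; the supplier must not return null.
    private T cached;

    LazyCacheSketch(Supplier<T> transformation) {
        this.transformation = transformation;
    }

    /** Returns the cached result, running the transformation first if no cache exists. */
    T get() {
        if (cached == null) {
            cached = transformation.get();
        }
        return cached;
    }

    /** Hypothetical helper that simulates losing the cached result. */
    void simulateLoss() {
        cached = null;
    }

    public static void main(String[] args) {
        AtomicInteger computations = new AtomicInteger();
        LazyCacheSketch<List<Integer>> cache = new LazyCacheSketch<>(() -> {
            computations.incrementAndGet(); // count how often the transformation runs
            return List.of(1, 2, 3);
        });

        cache.get(); // first "job": computes the result and creates the cache
        cache.get(); // second "job": reuses the cached result
        System.out.println("computations after two jobs: " + computations.get());

        cache.simulateLoss();
        cache.get(); // cache was lost, so the transformation runs again
        System.out.println("computations after loss: " + computations.get());
    }
}
```

In this model the counter only increases when no cached value is available, which mirrors the documented rule that the original transformations run again only on first use or after the cache is lost.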