[flink] branch master updated: [FLINK-28857][docs] Add Document for DataStream Cache API

gaoyunhaii Tue, 09 Aug 2022 08:37:08 -0700

This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git



The following commit(s) were added to refs/heads/master by this push:
     new 21712c932cd [FLINK-28857][docs] Add Document for DataStream Cache API
21712c932cd is described below

commit 21712c932cdbcc6be26f46a201412646c56f9650
Author: sxnan <suxuanna...@gmail.com>
AuthorDate: Mon Aug 8 15:21:14 2022 +0800

    [FLINK-28857][docs] Add Document for DataStream Cache API
    
    This closes #20491.
---
 .../docs/dev/datastream/operators/overview.md      | 36 ++++++++++++++++++++
 .../docs/dev/datastream/operators/overview.md      | 38 ++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/docs/content.zh/docs/dev/datastream/operators/overview.md 
b/docs/content.zh/docs/dev/datastream/operators/overview.md
index e0da3f5d911..d16f89e2923 100644
--- a/docs/content.zh/docs/dev/datastream/operators/overview.md
+++ b/docs/content.zh/docs/dev/datastream/operators/overview.md
@@ -572,6 +572,42 @@ Python 中尚不支持此特性。
 {{< /tab >}}
 {{< /tabs>}}
 
+### Cache
+#### DataStream &rarr; CachedDataStream
+
+把算子的结果缓存起来。目前只支持批执行模式下运行的作业。算子的结果在算子第一次执行的时候会被缓存起来，之后的
+作业中会复用该算子缓存的结果。如果算子的结果丢失了，它会被原来的算子重新计算并缓存。
+
+{{< tabs cache >}}
+{{< tab "Java" >}}
+```java
+DataStream<Integer> dataStream = //...
+CachedDataStream<Integer> cachedDataStream = dataStream.cache();
+cachedDataStream.print(); // Do anything with the cachedDataStream
+...
+env.execute(); // Execute and create cache.
+        
+cachedDataStream.print(); // Consume cached result.
+env.execute();
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val dataStream : DataStream[Int] = //...
+val cachedDataStream = dataStream.cache()
+cachedDataStream.print() // Do anything with the cachedDataStream
+...
+env.execute() // Execute and create cache.
+
+cachedDataStream.print() // Consume cached result.
+env.execute()
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+Python 中尚不支持此特性。
+{{< /tab >}}
+{{< /tabs>}}
+
 ## 物理分区
 
 Flink 也提供以下方法让用户根据需要在数据转换完成后对数据分区进行更细粒度的配置。
diff --git a/docs/content/docs/dev/datastream/operators/overview.md 
b/docs/content/docs/dev/datastream/operators/overview.md
index 81e183aa1b7..a396a5af923 100644
--- a/docs/content/docs/dev/datastream/operators/overview.md
+++ b/docs/content/docs/dev/datastream/operators/overview.md
@@ -575,6 +575,44 @@ This feature is not yet supported in Python
 {{< /tab >}}
 {{< /tabs>}}
 
+### Cache
+#### DataStream &rarr; CachedDataStream
+
+Cache the intermediate result of the transformation. Currently, only jobs that 
run with batch 
+execution mode are supported. The cache intermediate result is generated 
lazily at the first time 
+the intermediate result is computed so that the result can be reused by later 
jobs. If the cache is 
+lost, it will be recomputed using the original transformations.
+
+{{< tabs cache >}}
+{{< tab "Java" >}}
+```java
+DataStream<Integer> dataStream = //...
+CachedDataStream<Integer> cachedDataStream = dataStream.cache();
+cachedDataStream.print(); // Do anything with the cachedDataStream
+...
+env.execute(); // Execute and create cache.
+        
+cachedDataStream.print(); // Consume cached result.
+env.execute();
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val dataStream : DataStream[Int] = //...
+val cachedDataStream = dataStream.cache()
+cachedDataStream.print() // Do anything with the cachedDataStream
+...
+env.execute() // Execute and create cache.
+
+cachedDataStream.print() // Consume cached result.
+env.execute()
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+This feature is not yet supported in Python
+{{< /tab >}}
+{{< /tabs>}}
+
 ## Physical Partitioning
 
 Flink also gives low-level control (if desired) on the exact stream 
partitioning after a transformation, via the following functions.

[flink] branch master updated: [FLINK-28857][docs] Add Document for DataStream Cache API

Reply via email to