[GitHub] [flink] HuangXingBo commented on a change in pull request #15769: [FLINK-18952][python][docs] Add "10 minutes to DataStream API" documentation

GitBox Tue, 27 Apr 2021 02:03:04 -0700


HuangXingBo commented on a change in pull request #15769:
URL: https://github.com/apache/flink/pull/15769#discussion_r620998140




##########
File path: docs/content.zh/docs/dev/python/datastream/intro_to_datastream_api.md
##########
@@ -0,0 +1,374 @@
+---
+title: "Python DataStream API 简介"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Intro to the Python DataStream API
+
+DataStream programs in Flink are regular programs that implement transformations on data streams
+(e.g., filtering, updating state, defining windows, aggregating). The data streams are initially
+created from various sources (e.g., message queues, socket streams, files). Results are returned via
+sinks, which may for example write the data to files, or to standard output (for example the command
+line terminal).
+
+The Python DataStream API is the Python version of the DataStream API; it allows Python users to write
+Python DataStream API jobs.
+
+Common Structure of Python DataStream API Programs
+--------------------------------------------
+
+The following code example shows the common structure of Python DataStream API programs.
+
+```python
+from pyflink.common import WatermarkStrategy, Row
+from pyflink.common.serialization import Encoder
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import FileSink, OutputFileConfig, NumberSequenceSource
+from pyflink.datastream.functions import RuntimeContext, MapFunction
+from pyflink.datastream.state import ValueStateDescriptor
+
+
+class MyMapFunction(MapFunction):
+
+    def open(self, runtime_context: RuntimeContext):
+        state_desc = ValueStateDescriptor('cnt', Types.LONG())
+        self.cnt_state = runtime_context.get_state(state_desc)
+
+    def map(self, value):
+        cnt = self.cnt_state.value()
+        if cnt is None or cnt < 2:
+            self.cnt_state.update(1 if cnt is None else cnt + 1)
+            return value[0], value[1] + 1
+        else:
+            return value[0], value[1]
+
+
+def state_access_demo():
+    # 1. create a StreamExecutionEnvironment
+    env = StreamExecutionEnvironment.get_execution_environment()
+
+    # 2. create source DataStream
+    seq_num_source = NumberSequenceSource(1, 10000)
+    ds = env.from_source(
+        source=seq_num_source,
+        watermark_strategy=WatermarkStrategy.for_monotonous_timestamps(),
+        source_name='seq_num_source',
+        type_info=Types.LONG())
+
+    # 3. define the execution logic
+    ds = ds.map(lambda a: Row(a % 4, 1), output_type=Types.ROW([Types.LONG(), Types.LONG()])) \
+           .key_by(lambda a: a[0]) \
+           .map(MyMapFunction(), output_type=Types.ROW([Types.LONG(), Types.LONG()]))
+
+    # 4. create sink and emit result to sink
+    output_path = '/opt/output/'
+    file_sink = FileSink \
+        .for_row_format(output_path, Encoder.simple_string_encoder()) \
+        .with_output_file_config(OutputFileConfig.builder().with_part_prefix('pre').with_part_suffix('suf').build()) \
+        .build()
+    ds.sink_to(file_sink)
+
+    # 5. execute the job
+    env.execute('state_access_demo')
+
+
+if __name__ == '__main__':
+    state_access_demo()
+```
+
+{{< top >}}
+
+Create a StreamExecutionEnvironment
+---------------------------
+
+The `StreamExecutionEnvironment` is a central concept of the DataStream API program.
+The following code example shows how to create a `StreamExecutionEnvironment`:
+
+```python
+from pyflink.datastream import StreamExecutionEnvironment
+
+env = StreamExecutionEnvironment.get_execution_environment()
+```
+
+{{< top >}}
+
+Create a DataStream
+---------------
+
+The DataStream API gets its name from the special `DataStream` class that is
+used to represent a collection of data in a Flink program. You can think of
+a `DataStream` as an immutable collection of data that can contain duplicates. This data
+can be either finite or unbounded; the API that you use to work on it is the
+same.
+
+A `DataStream` is similar to a regular Python `Collection` in terms of usage but
+is quite different in some key ways. A `DataStream` is immutable, meaning that once it
+is created you cannot add or remove elements. You also cannot simply inspect
+the elements inside; you can only work on them using the `DataStream` API
+operations, which are also called transformations.
+
+You can create an initial `DataStream` by adding a source in a Flink program.
+Then you can derive new streams from this and combine them by using API methods
+such as `map`, `filter`, and so on.
+
+### Create from a list object
+
+You can create a `DataStream` from a list object:
+
+```python
+ds = env.from_collection(
+    collection=[(1, 'aaa|bb'), (2, 'bb|a'), (3, 'aaa|a')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))
+```
+
+The parameter `type_info` is optional; if it is not specified, the output type of the returned `DataStream`
+will be `Types.PICKLED_BYTE_ARRAY()`.
+
+### Create using DataStream connectors
+
+You can also create a `DataStream` using DataStream connectors with the `add_source` method, as follows:
+
+```python
+deserialization_schema = JsonRowDeserializationSchema.builder() \
+    .type_info(type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
+
+kafka_consumer = FlinkKafkaConsumer(
+    topics='test_source_topic',
+    deserialization_schema=deserialization_schema,
+    properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group'})
+
+ds = env.add_source(kafka_consumer)

Review comment:
       ```suggestion
   from pyflink.common.serialization import JsonRowDeserializationSchema
   from pyflink.common.typeinfo import Types
   from pyflink.datastream import StreamExecutionEnvironment
   from pyflink.datastream.connectors import FlinkKafkaConsumer
   
   env = StreamExecutionEnvironment.get_execution_environment()
   
   deserialization_schema = JsonRowDeserializationSchema.builder() \
       .type_info(type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
   
   kafka_consumer = FlinkKafkaConsumer(
       topics='test_source_topic',
       deserialization_schema=deserialization_schema,
       properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group'})
   
   ds = env.add_source(kafka_consumer)
   ```

##########
File path: docs/content.zh/docs/dev/python/datastream/intro_to_datastream_api.md
##########
@@ -0,0 +1,374 @@
+### Create from a list object
+
+You can create a `DataStream` from a list object:
+
+```python
+ds = env.from_collection(
+    collection=[(1, 'aaa|bb'), (2, 'bb|a'), (3, 'aaa|a')],
+    type_info=Types.ROW([Types.INT(), Types.STRING()]))

Review comment:
       ```suggestion
   from pyflink.common.typeinfo import Types
   from pyflink.datastream import StreamExecutionEnvironment
   
   env = StreamExecutionEnvironment.get_execution_environment()
   
   ds = env.from_collection(
       collection=[(1, 'aaa|bb'), (2, 'bb|a'), (3, 'aaa|a')],
       type_info=Types.ROW([Types.INT(), Types.STRING()]))
   ```

##########
File path: docs/content.zh/docs/dev/python/datastream/intro_to_datastream_api.md
##########
@@ -0,0 +1,374 @@
+### Create using DataStream connectors
+
+You can also create a `DataStream` using DataStream connectors with the `add_source` method, as follows:
+
+```python
+deserialization_schema = JsonRowDeserializationSchema.builder() \
+    .type_info(type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
+
+kafka_consumer = FlinkKafkaConsumer(
+    topics='test_source_topic',
+    deserialization_schema=deserialization_schema,
+    properties={'bootstrap.servers': 'localhost:9092', 'group.id': 'test_group'})
+
+ds = env.add_source(kafka_consumer)

Review comment:
       This code snippet requires the `flink-kafka` connector jar. It would be
       better to explain, or link to, how to add jar dependencies.
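
       One possible shape for that explanation, as a minimal sketch rather than
       the documented approach: PyFlink's `StreamExecutionEnvironment.add_jars`
       takes one or more `file://` URLs, so a small helper can turn local paths
       into URLs before registering them. The helper names `jar_urls` and
       `register_connector_jars`, and the jar path in the usage comment, are
       hypothetical and only illustrate the idea.

```python
from pathlib import Path


def jar_urls(jar_paths):
    """Convert local jar paths into absolute file:// URLs."""
    return [Path(p).resolve().as_uri() for p in jar_paths]


def register_connector_jars(env, jar_paths):
    """Register jars on an object exposing add_jars(*urls).

    `env` is assumed to be a pyflink StreamExecutionEnvironment, but any
    object with a compatible add_jars method works.
    """
    urls = jar_urls(jar_paths)
    env.add_jars(*urls)
    return urls


# With PyFlink installed, usage would look roughly like this
# (the jar path is a placeholder, not a real location):
#
#   from pyflink.datastream import StreamExecutionEnvironment
#   env = StreamExecutionEnvironment.get_execution_environment()
#   register_connector_jars(env, ['/path/to/flink-sql-connector-kafka.jar'])
```

       The jars need to be registered before the Kafka consumer is constructed,
       since the Python wrapper loads the Java connector classes at that point.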

##########
File path: docs/content.zh/docs/dev/python/datastream/intro_to_datastream_api.md
##########
@@ -0,0 +1,374 @@
+<span class="label label-info">Note</span> Currently, only `FlinkKafkaConsumer` can be
+used as a DataStream source connector with the `add_source` method.
+
+<span class="label label-info">Note</span> A `DataStream` created using `add_source` can only
+be executed in `streaming` execution mode.
+
+You can also call the `from_source` method to create a `DataStream` using unified DataStream
+source connectors:
+
+```python
+seq_num_source = NumberSequenceSource(1, 1000)
+ds = env.from_source(
+    source=seq_num_source,
+    watermark_strategy=WatermarkStrategy.for_monotonous_timestamps(),
+    source_name='seq_num_source',
+    type_info=Types.LONG())

Review comment:
       ```suggestion
   from pyflink.common.typeinfo import Types
   from pyflink.common.watermark_strategy import WatermarkStrategy
   from pyflink.datastream import StreamExecutionEnvironment
   from pyflink.datastream.connectors import NumberSequenceSource
   
   env = StreamExecutionEnvironment.get_execution_environment()
   
   seq_num_source = NumberSequenceSource(1, 1000)
   ds = env.from_source(
       source=seq_num_source,
       watermark_strategy=WatermarkStrategy.for_monotonous_timestamps(),
       source_name='seq_num_source',
       type_info=Types.LONG())
   ```



