[GitHub] [beam] sirenbyte commented on a diff in pull request #25301: [Tour of Beam] Learning content for "IO Connectors" module

via GitHub Thu, 30 Mar 2023 18:50:50 -0700


sirenbyte commented on code in PR #25301:
URL: https://github.com/apache/beam/pull/25301#discussion_r1153936917



##########
learning/tour-of-beam/learning-content/IO/kafka-io/kafka-read/description.md:
##########
@@ -0,0 +1,72 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Reading from Kafka using KafkaIO
+
+`KafkaIO` is the part of the Apache Beam SDK that reads data from Apache Kafka
and writes data to it. With it, a Beam pipeline can consume records from one
Kafka topic, process them, and write the results to another Kafka topic, so
Beam pipelines integrate easily with a Kafka-based data architecture.
+
+To read from Kafka topics in Apache Beam, use the `ReadFromKafka` transform to
create a `PCollection` of Kafka messages. The transform's parameters control
which topics are read and how the records are returned.
+
+When the `ReadFromKafka` transform is executed, it creates a `PCollection` of
Kafka messages, where each message is represented as a tuple containing the
key and the value. If the `with_metadata` flag is set to `True`, the Kafka
metadata fields are included in each element as well.
+
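As a rough sketch of the read step described above, the Python snippet below wires `ReadFromKafka` into a pipeline. The broker address `localhost:9092`, the topic name `my-topic`, and the helper names `build_consumer_config` and `build_read_pipeline` are placeholders invented for illustration and are not part of Beam. Running the pipeline assumes a reachable Kafka broker and a Java runtime, because the Python `ReadFromKafka` is a cross-language transform backed by the Java `KafkaIO`.

```python
def build_consumer_config(bootstrap_servers, offset_reset="earliest"):
    """Return Kafka consumer properties as the plain dict ReadFromKafka expects."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "auto.offset.reset": offset_reset,
    }


def build_read_pipeline(pipeline_args=None,
                        bootstrap_servers="localhost:9092",  # placeholder broker
                        topic="my-topic",                    # placeholder topic
                        with_metadata=False):
    """Build (but do not run) a pipeline that reads from one Kafka topic."""
    # Imported here so the config helper above stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    pipeline = beam.Pipeline(options=PipelineOptions(pipeline_args))
    messages = (
        pipeline
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config=build_consumer_config(bootstrap_servers),
            topics=[topic],
            with_metadata=with_metadata,
        )
    )
    return pipeline, messages
```

Calling `build_read_pipeline()` returns the pipeline and the `PCollection` of messages; nothing is read until `pipeline.run()` is invoked. With the default deserializers, keys and values arrive as raw bytes.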
+Developers can then apply other Apache Beam transforms to the Kafka messages,
for example to filter them, aggregate them, or join them with other data
sources. Once the pipeline is defined, it can be executed on a distributed
processing engine, such as **Apache Flink**, **Apache Spark**, or
**Google Cloud Dataflow**, to process the Kafka messages in parallel and at
scale.
+
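The downstream processing mentioned above might look like the following sketch, which decodes, filters, and counts messages per key. It assumes `with_metadata` is left at `False`, so each element is a `(key, value)` pair of bytes. The function names `decode_record`, `has_payload`, and `count_messages_per_key` and the 60-second window are illustrative choices, not anything prescribed by Beam or by this module.

```python
def decode_record(record):
    """Turn a (key, value) pair of raw bytes into a pair of strings."""
    key, value = record
    # Kafka keys are optional, so tolerate a missing key.
    decoded_key = key.decode("utf-8") if key is not None else ""
    return decoded_key, value.decode("utf-8")


def has_payload(record):
    """Keep only records whose value is non-empty after stripping whitespace."""
    _, value = record
    return bool(value.strip())


def count_messages_per_key(messages, window_seconds=60):
    """Decode, filter, and count Kafka messages per key in fixed windows."""
    # Imported here so the two helpers above can be used without Beam installed.
    import apache_beam as beam

    return (
        messages
        | "Decode" >> beam.Map(decode_record)
        | "DropEmpty" >> beam.Filter(has_payload)
        # An unbounded source needs windowing before a grouping aggregation.
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(window_seconds))
        | "CountPerKey" >> beam.combiners.Count.PerKey()
    )
```

The execution engine is chosen through pipeline options rather than in the transform code, for example by passing `--runner=FlinkRunner` when the pipeline is created.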

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
