shuiqiangchen commented on a change in pull request #16:
URL: https://github.com/apache/flink-playgrounds/pull/16#discussion_r501192510



##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,140 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink pipeline for data analytics, covering the following steps, sketched in code below:
+
+* Reading data from a Kafka source;
+* Creating data using a [UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
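+
+Putting these steps together, a PyFlink Table API job for such a pipeline could look roughly like the sketch below, written against the PyFlink 1.11 API that this walkthrough targets. This is only an illustration, not necessarily the job shipped with the playground; the province names, consumer group id, and Elasticsearch index name are made up for the example.
+
+```python
+from pyflink.table import DataTypes, EnvironmentSettings, StreamTableEnvironment
+from pyflink.table.udf import udf
+
+# Hypothetical id -> name mapping, used only for illustration.
+provinces = ("Beijing", "Shanghai", "Hangzhou", "Shenzhen", "Jiangxi", "Chongqing", "Xizang")
+
+
+@udf(input_types=[DataTypes.INT()], result_type=DataTypes.STRING())
+def province_id_to_name(province_id):
+    return provinces[province_id] if 0 <= province_id < len(provinces) else "Unknown"
+
+
+t_env = StreamTableEnvironment.create(
+    environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())
+t_env.register_function("province_id_to_name", province_id_to_name)
+
+# Source table over the payment_msg Kafka topic (JSON records).
+t_env.execute_sql("""
+    CREATE TABLE payment_msg (
+        createTime VARCHAR,
+        orderId BIGINT,
+        payAmount DOUBLE,
+        payPlatform INT,
+        provinceId INT
+    ) WITH (
+        'connector' = 'kafka',
+        'topic' = 'payment_msg',
+        'properties.bootstrap.servers' = 'kafka:9092',
+        'properties.group.id' = 'payment-demo',
+        'scan.startup.mode' = 'latest-offset',
+        'format' = 'json'
+    )
+""")
+
+# Sink table backed by Elasticsearch; the index name is illustrative.
+t_env.execute_sql("""
+    CREATE TABLE es_sink (
+        province VARCHAR,
+        pay_amount DOUBLE,
+        PRIMARY KEY (province) NOT ENFORCED
+    ) WITH (
+        'connector' = 'elasticsearch-7',
+        'hosts' = 'http://elasticsearch:9200',
+        'index' = 'platform_pay_amount',
+        'format' = 'json'
+    )
+""")
+
+# Map province ids to names with the UDF, aggregate per province,
+# and continuously write the running totals to Elasticsearch.
+t_env.from_path("payment_msg") \
+    .select("province_id_to_name(provinceId) as province, payAmount") \
+    .group_by("province") \
+    .select("province, payAmount.sum as pay_amount") \
+    .execute_insert("es_sink")
+```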
+
+The environment is based on Docker Compose, so the only requirement is that you have [Docker](https://docs.docker.com/get-docker/)
+installed on your machine.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. A simple data generator [generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The amount being paid with this transaction.
+* `payPlatform`: The platform used to create this payment: pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
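+Purely as an illustration of what such a producer does, records of this shape could be written with `kafka-python`. This is a rough approximation, not the provided generator, and the bootstrap address and field encodings are assumptions:
+
+```python
+import json
+import random
+import time
+from datetime import datetime
+
+from kafka import KafkaProducer  # pip install kafka-python
+
+producer = KafkaProducer(
+    bootstrap_servers="localhost:9092",  # assumption: adjust to how Kafka is exposed to your host
+    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
+
+order_id = int(time.time())
+while True:
+    # Build one payment record matching the structure described above.
+    record = {
+        "createTime": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+        "orderId": order_id,
+        "payAmount": round(random.uniform(1, 100000), 2),
+        "payPlatform": random.randint(0, 1),  # assumption: 0 = pc, 1 = mobile
+        "provinceId": random.randint(0, 6),
+    }
+    producer.send("payment_msg", value=record)
+    order_id += 1
+    time.sleep(0.5)
+```
+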
+You can use the following command to read data from the Kafka topic and check whether it is being generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
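+If you would rather inspect the topic from Python instead of the console consumer, a minimal `kafka-python` sketch could look like the following. It assumes the `kafka-python` package is installed on your host and that Kafka is reachable at `localhost:9092`, which depends on your Compose port mapping:
+```python
+import json
+
+from kafka import KafkaConsumer  # pip install kafka-python
+
+# Read JSON records from the payment_msg topic, starting from the earliest offset.
+consumer = KafkaConsumer(
+    "payment_msg",
+    bootstrap_servers="localhost:9092",  # assumption: adjust to your Compose port mapping
+    auto_offset_reset="earliest",
+    value_deserializer=lambda m: json.loads(m.decode("utf-8")))
+
+for message in consumer:
+    print(message.value)
+```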
+You can also create a new topic by executing the following command:
+```shell script
+$ docker-compose exec kafka kafka-topics.sh --bootstrap-server kafka:9092 --create --topic <YOUR-TOPIC-NAME> --partitions 8 --replication-factor 1
+```

Review comment:
       I will move this part to a later section, after `docker-compose up` and before starting the PyFlink job.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

