sijie closed pull request #2976: add debezium source documentation
URL: https://github.com/apache/pulsar/pull/2976
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/site2/docs/io-cdc.md b/site2/docs/io-cdc.md
new file mode 100644
index 0000000000..9f978bb37c
--- /dev/null
+++ b/site2/docs/io-cdc.md
@@ -0,0 +1,146 @@
+---
+id: io-cdc
+title: CDC Source Connector
+sidebar_label: CDC Source Connector
+---
+
+## Source
+
+The CDC Source connector captures change logs from existing databases such as MySQL, MongoDB, and PostgreSQL and writes them into Pulsar.
+
+The CDC Source connector is built on top of [Debezium](https://debezium.io/). It stores all data in the Pulsar cluster in a persistent, replicated, and partitioned way.
+This CDC Source connector has been tested with MySQL. For more information about how it works, see the [Debezium MySQL connector documentation](https://debezium.io/docs/connectors/mysql/).
+For how Debezium itself works, refer to the [Debezium tutorial](https://debezium.io/docs/tutorial/). It is recommended that you go through this tutorial first.
+
+### Source Configuration Options
+
+The configuration mostly consists of Debezium task settings. In addition, you must provide the service URL of the Pulsar cluster and the names of the topics used to store offsets and database history.
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `task.class` | `true` | `null` | A source task class implemented in Debezium. |
+| `database.hostname` | `true` | `null` | The address of the database server. |
+| `database.port` | `true` | `null` | The port number of the database server. |
+| `database.user` | `true` | `null` | The name of the database user that has the required privileges. |
+| `database.password` | `true` | `null` | The password for the database user that has the required privileges. |
+| `database.server.id` | `true` | `null` | The connector’s identifier, which must be unique within the database cluster. It is similar to the database’s `server-id` configuration property. |
+| `database.server.name` | `true` | `null` | The logical name of the database server/cluster, which forms a namespace and is used in all the names of the Kafka topics to which the connector writes, the Kafka Connect schema names, and the namespaces of the corresponding Avro schema when the Avro Connector is used. |
+| `database.whitelist` | `false` | `null` | A list of all databases hosted by this server that this connector will monitor. This is optional, and there are other properties for listing the databases and tables to include or exclude from monitoring. |
+| `key.converter` | `true` | `null` | The converter provided by Kafka Connect to convert the record key. |
+| `value.converter` | `true` | `null` | The converter provided by Kafka Connect to convert the record value. |
+| `database.history` | `true` | `null` | The name of the database history class. |
+| `database.history.pulsar.topic` | `true` | `null` | The name of the database history topic where the connector will write and recover DDL statements. This topic is for internal use only and should not be used by consumers. |
+| `database.history.pulsar.service.url` | `true` | `null` | The Pulsar cluster service URL for the history topic. |
+| `pulsar.service.url` | `true` | `null` | The Pulsar cluster service URL. |
+| `offset.storage.topic` | `true` | `null` | The topic that records the last committed offsets that the connector successfully completed. |
+
+### Configuration Example
+
+Here is a JSON configuration example:
+
+```json
+{
+    "tenant": "public",
+    "namespace": "default",
+    "name": "debezium-kafka-source",
+    "className": "org.apache.pulsar.io.kafka.connect.KafkaConnectSource",
+    "topicName": "kafka-connect-topic",
+    "configs":
+    {
+        "task.class": "io.debezium.connector.mysql.MySqlConnectorTask",
+        "database.hostname": "localhost",
+        "database.port": "3306",
+        "database.user": "debezium",
+        "database.password": "dbz",
+        "database.server.id": "184054",
+        "database.server.name": "dbserver1",
+        "database.whitelist": "inventory",
+        "database.history": "org.apache.pulsar.io.debezium.PulsarDatabaseHistory",
+        "database.history.pulsar.topic": "history-topic",
+        "database.history.pulsar.service.url": "pulsar://127.0.0.1:6650",
+        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
+        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
+        "pulsar.service.url": "pulsar://127.0.0.1:6650",
+        "offset.storage.topic": "offset-topic"
+    },
+    "archive": "connectors/pulsar-io-kafka-connect-adaptor-2.3.0-SNAPSHOT.nar"
+}
+```
+
+You can also find a YAML example in this [file](https://github.com/apache/pulsar/blob/master/pulsar-io/kafka-connect-adaptor/src/main/resources/debezium-mysql-source-config.yaml), which has content similar to the following:
+
+```yaml
+tenant: "public"
+namespace: "default"
+name: "debezium-kafka-source"
+topicName: "kafka-connect-topic"
+archive: "connectors/pulsar-io-kafka-connect-adaptor-2.3.0-SNAPSHOT.nar"
+
+##autoAck: true
+parallelism: 1
+
+configs:
+  ## sourceTask
+  task.class: "io.debezium.connector.mysql.MySqlConnectorTask"
+
+  ## config for mysql, docker image: debezium/example-mysql:0.8
+  database.hostname: "localhost"
+  database.port: "3306"
+  database.user: "debezium"
+  database.password: "dbz"
+  database.server.id: "184054"
+  database.server.name: "dbserver1"
+  database.whitelist: "inventory"
+
+  database.history: "org.apache.pulsar.io.debezium.PulsarDatabaseHistory"
+  database.history.pulsar.topic: "history-topic"
+  database.history.pulsar.service.url: "pulsar://127.0.0.1:6650"
+  ## KEY_CONVERTER_CLASS_CONFIG, VALUE_CONVERTER_CLASS_CONFIG
+  key.converter: "org.apache.kafka.connect.json.JsonConverter"
+  value.converter: "org.apache.kafka.connect.json.JsonConverter"
+  ## PULSAR_SERVICE_URL_CONFIG
+  pulsar.service.url: "pulsar://127.0.0.1:6650"
+  ## OFFSET_STORAGE_TOPIC_CONFIG
+  offset.storage.topic: "offset-topic"
+```
+
+### Usage example
+
+Here is a simple example that stores MySQL change data using the example configuration above.
+
+- Start a MySQL server with an example database, from which Debezium can capture changes.
+```bash
+ docker run -it --rm --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw debezium/example-mysql:0.8
+```
+
+- Start a Pulsar service locally in standalone mode.
+```bash
+ bin/pulsar standalone
+```
+
+- Start the Pulsar Debezium connector in local run mode, using the YAML configuration file above. Make sure that the nar file is available at the configured path `connectors/pulsar-io-kafka-connect-adaptor-2.3.0-SNAPSHOT.nar`.
+```bash
+ bin/pulsar-admin source localrun --sourceConfigFile debezium-mysql-source-config.yaml
+```
+
+- Subscribe to the topic for the table `inventory.products`.
+```bash
+ bin/pulsar-client consume -s "sub-products" public/default/dbserver1.inventory.products -n 0
+```
+
+- Start a MySQL command-line client in a Docker container, and use it to make changes to the `products` table in the MySQL server.
+```bash
+docker run -it --rm --name mysqlterm --link mysql mysql:5.7 sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
+```
+
+This command opens the MySQL command-line client. In it, use the commands below to change the names of 2 items in the `products` table:
+
+```
+mysql> use inventory;
+mysql> show tables;
+mysql> SELECT * FROM  products ;
+mysql> UPDATE products SET name='1111111111' WHERE id=101;
+mysql> UPDATE products SET name='1111111111' WHERE id=107;
+```
+
+- In the terminal tab where you subscribed to the topic, you can see that the 2 changes have been written to the products topic.
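
The usage walk-through in `io-cdc.md` ends with two `UPDATE` events arriving on the products topic. As a rough illustration of how a consumer might read such an event, here is a minimal Python sketch. It assumes the change event follows the Debezium envelope layout (`before`, `after`, `op`), optionally wrapped in a `payload` field by `JsonConverter`. The sample event is hand-written for illustration and is not captured connector output; `summarize_change` and the `old-name` value are hypothetical.

```python
import json

# Hand-written sample event for illustration only (NOT captured output).
# "old-name" is a placeholder for whatever the row held before the UPDATE.
SAMPLE_EVENT = json.dumps({
    "payload": {
        "before": {"id": 101, "name": "old-name"},
        "after": {"id": 101, "name": "1111111111"},
        "op": "u",
    }
})

# Operation codes used in the Debezium change event envelope.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read"}


def summarize_change(raw):
    """Return the operation name and the columns whose values changed."""
    event = json.loads(raw)
    # Assumption: when schemas are enabled the envelope sits under "payload";
    # otherwise the envelope is the top-level object.
    payload = event.get("payload", event)
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    changed = {
        column: (before.get(column), after.get(column))
        for column in sorted(set(before) | set(after))
        if before.get(column) != after.get(column)
    }
    return {"op": OP_NAMES.get(payload.get("op"), "unknown"), "changed": changed}


if __name__ == "__main__":
    print(summarize_change(SAMPLE_EVENT))
```

A real consumer would receive the raw message bytes from the Pulsar topic and pass the decoded string to a function like this one; the exact fields present depend on the converter settings.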
diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 5a7699898a..92a19dd70e 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -16,3 +16,4 @@ Pulsar Functions cluster.
 - [Kinesis Sink Connector](io-kinesis.md#sink)
 - [RabbitMQ Source Connector](io-rabbitmq.md#source)
 - [Twitter Firehose Source Connector](io-twitter.md)
+- [CDC Source Connector based on Debezium](io-cdc.md)
diff --git a/site2/docs/io-overview.md b/site2/docs/io-overview.md
index 058422dac6..8590e28e7b 100644
--- a/site2/docs/io-overview.md
+++ b/site2/docs/io-overview.md
@@ -37,3 +37,4 @@ The following connectors are currently available for Pulsar:
 |[Kinesis 
sink](https://aws.amazon.com/kinesis/)|[`org.apache.pulsar.io.kinesis.KinesisSink`](https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisSink.java)|[Documentation](io-kinesis.md#sink)|
 |[RabbitMQ 
source](https://www.rabbitmq.com)|[`org.apache.pulsar.io.rabbitmq.RabbitMQSource`](https://github.com/apache/pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)|[Documentation](io-rabbitmq.md#sink)|
 |[Twitter Firehose 
source](https://developer.twitter.com/en/docs)|[`org.apache.pulsar.io.twitter.TwitterFireHose`](https://github.com/apache/pulsar/blob/master/pulsar-io/twitter/src/main/java/org/apache/pulsar/io/twitter/TwitterFireHose.java)|[Documentation](io-twitter.md#source)|
+|[CDC Source Connector based on 
Debezium](https://debezium.io/)|[`org.apache.pulsar.io.kafka.connect.KafkaConnectSource`](https://github.com/apache/pulsar/blob/master/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/KafkaConnectSource.java)|[Documentation](io-cdc.md)|
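
The configuration table in `io-cdc.md` above marks most options as required. A minimal sketch of a pre-flight check that reports missing required keys before the connector is started could look like the following. The key list is copied from that table; the `missing_required` helper is hypothetical and is not part of Pulsar or Debezium.

```python
# Required option names, copied from the "Source Configuration Options" table.
REQUIRED_KEYS = [
    "task.class",
    "database.hostname",
    "database.port",
    "database.user",
    "database.password",
    "database.server.id",
    "database.server.name",
    "key.converter",
    "value.converter",
    "database.history",
    "database.history.pulsar.topic",
    "database.history.pulsar.service.url",
    "pulsar.service.url",
    "offset.storage.topic",
]

# The "configs" section of the example configuration from the documentation.
EXAMPLE_CONFIGS = {
    "task.class": "io.debezium.connector.mysql.MySqlConnectorTask",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history": "org.apache.pulsar.io.debezium.PulsarDatabaseHistory",
    "database.history.pulsar.topic": "history-topic",
    "database.history.pulsar.service.url": "pulsar://127.0.0.1:6650",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "pulsar.service.url": "pulsar://127.0.0.1:6650",
    "offset.storage.topic": "offset-topic",
}


def missing_required(configs):
    """Return the required keys that are absent or empty, in table order."""
    return [key for key in REQUIRED_KEYS if configs.get(key) in (None, "")]


if __name__ == "__main__":
    missing = missing_required(EXAMPLE_CONFIGS)
    print("missing keys:", missing if missing else "none")
```

The check only looks at key presence; it does not verify that the class names, ports, or service URLs are valid.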


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
