This is an automated email from the ASF dual-hosted git repository.

zregvart pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/camel-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 7bc7238  CDC blog post minor update
7bc7238 is described below

commit 7bc7238c1b9cdfc786c552b94cd17a58cd4358f3
Author: Federico Valeri <fvaleri@localhost>
AuthorDate: Tue May 26 21:07:17 2020 +0200

    CDC blog post minor update
---
 content/blog/CdcWithCamelAndDebezium/index.md | 58 ++++++++++++---------------
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/content/blog/CdcWithCamelAndDebezium/index.md b/content/blog/CdcWithCamelAndDebezium/index.md
index 9914bbc..fa5cbb2 100644
--- a/content/blog/CdcWithCamelAndDebezium/index.md
+++ b/content/blog/CdcWithCamelAndDebezium/index.md
@@ -8,7 +8,7 @@ preview: "CDC approaches based on Camel and Debezium."
 ---
 
 Change Data Capture (CDC) is a well-established software design pattern for a system that monitors and captures
-the changes in data, so that other software can respond to those changes.
+data changes, so that other software can respond to those events.
 Using a CDC engine like [Debezium](https://debezium.io) along with [Camel](https://camel.apache.org)
 integration framework, we can easily build data pipelines to bridge traditional data stores and new cloud-native event-driven
@@ -51,41 +51,34 @@ represents the number of microseconds since Unix Epoch as the server time at whi
 Prerequisites: Postgres 11, OpenJDK 1.8 and Maven 3.5+.
 
-[GET CODE HERE](https://github.com/fvaleri/cdc)
+[GET CODE HERE](https://github.com/fvaleri/cdc/tree/blog)
 
 ### External systems setup
 
-First of all, you need to start Postgres (the procedure depends on your specific OS).
-
-Then, there is a simple script to create and initialize the database.
-This script can also be used to query the table and produce a stream of changes.
-```sh
-./run.sh --database
-./run.sh --query
-./run.sh --stream
-```
-
-Enable Postgres internal transaction log access (required by Debezium).
+Enable transaction log access and start Postgres.
 ```sh
 # postgresql.conf: configure replication slot
 wal_level = logical
 max_wal_senders = 1
 max_replication_slots = 1
 # pg_hba.conf: allow localhost replication to debezium user
-local replication cdcadmin trust
-host replication cdcadmin 127.0.0.1/32 trust
-host replication cdcadmin ::1/128 trust
-# add replication permission to user and enable previous values
-psql cdcdb
-ALTER ROLE cdcadmin WITH REPLICATION;
-ALTER TABLE cdc.customers REPLICA IDENTITY FULL;
-# restart Postgres
+local cdcdb cdcadmin trust
+host cdcdb cdcadmin 127.0.0.1/32 trust
+host cdcdb cdcadmin ::1/128 trust
+```
+
+There is a simple script to create and initialize the database.
+This script can also be used to query the table and produce a stream of changes.
+```sh
+./run.sh --database
+./run.sh --query
+./run.sh --stream
 ```
 
-Start Artemis broker and open the [web console](http://localhost:8161/console) to check messages.
+Then, start the Artemis broker and open the [web console](http://localhost:8161/console) (login: admin/admin).
 ```sh
 ./run.sh --artemis
-# check status
+# status check
 ps -ef | grep "[A]rtemis" | wc -l
 ```
@@ -106,7 +99,7 @@ We need a Kafka cluster up and running (3 ZooKeeper + 3 Kafka). This step also d
 (debezium-connector-postgres, camel-sjms2-kafka-connector) and dependencies.
 ```sh
 ./run.sh --kafka
-# check status
+# status check
 ps -ef | grep "[Q]uorumPeerMain" | wc -l
 ps -ef | grep "[K]afka" | wc -l
 ```
@@ -115,7 +108,7 @@ Now we can start our 3-nodes KafkaConnect cluster in distributed mode (workers t
 values automatically discover each other and form a cluster).
 ```sh
 ./run.sh --connect
-# check status
+# status check
 ps -ef | grep "[C]onnectDistributed" | wc -l
 tail -n100 /tmp/kafka/logs/connect.log
 /tmp/kafka/bin/kafka-topics.sh --zookeeper localhost:2180 --list
@@ -180,11 +173,12 @@ One thing to be aware of is that Debezium offers better performance because of t
 but there is no standard for it, so a change to the database implementation may require a rewrite of the corresponding
 plugin. This also means that every data source has its own procedure to enable access to its internal log.
 
-KafkaConnect single message transformations (SMTs) can be chained (sort of Unix pipeline) and extended with custom implementations,
-but they are actually designed for simple modifications. Long chains of SMTs are hard to maintain and reason about. Moreover, remember
-that tranformations are syncronous and applied at each message, so you can really slowdown the streaming pipeline with heavy processing
-or external service calls.
+The connector configuration allows you to transform the message payload using single message transformations (SMTs), which can be
+chained (like a Unix pipeline) and extended with custom implementations. However, they are designed for simple modifications,
+and long chains of SMTs are hard to maintain and reason about. Moreover, remember that transformations are synchronous and
+applied to each message, so heavy processing or external service calls can really slow down the streaming pipeline.
 
-In cases where you need to do heavy processing, split, enrich, aggregate records or call external services, you should use a stream
-processing layer between Connectors such as Kafka Streams or Camel. Just remember that Kafka Streams creates internal topics and you
-are forced to put transformed data back into some Kafka topic (data duplication), while this is just an option using Camel.
+In cases where you need to do heavy processing, split, enrich, aggregate records or call external services, you should use a
+stream processing layer between Connectors, such as Kafka Streams or plain Camel. Just remember that Kafka Streams creates
+internal topics and you are forced to put transformed data back into some Kafka topic (data duplication), while this is just
+an option when using Camel.
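As a rough illustration of the SMT chaining discussed in the updated paragraphs above (this is not the configuration shipped with the demo repository: the connector name, host/port, table list, and target topic below are assumptions for the sake of the example), a Debezium Postgres source connector with two chained transformations could be registered through the Kafka Connect REST API like this:

```sh
# Illustrative sketch only: register a Debezium Postgres connector with two chained SMTs
# (unwrap the change event, then rename the target topic) via the Connect REST API.
# Host, port, connector name, table list and topic name are assumptions, not the demo's actual config.
curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "cdc-source-example",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "cdcadmin",
    "database.dbname": "cdcdb",
    "database.server.name": "cdcdb",
    "plugin.name": "pgoutput",
    "table.whitelist": "cdc.customers",
    "transforms": "unwrap,route",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "(.*)",
    "transforms.route.replacement": "customers-events"
  }
}'
```

Both transformations in the `transforms` list run synchronously on every change event before it reaches the broker, which is exactly why long or expensive SMT chains can throttle the whole pipeline.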