[jira] [Commented] (HUDI-2348) Publish a blog on schema evolution with KafkaAvroCustomDeserializer

ASF GitHub Bot (Jira) Fri, 27 Aug 2021 11:46:22 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405975#comment-17405975
 ]


ASF GitHub Bot commented on HUDI-2348:
--------------------------------------

pratyakshsharma commented on a change in pull request #3485:
URL: https://github.com/apache/hudi/pull/3485#discussion_r697650485



##########
File path: website/blog/2021-08-16-kafka-custom-deserializer.md
##########
@@ -0,0 +1,47 @@
+---
+title: "Schema evolution with DeltaStreamer using KafkaSource"
+excerpt: "Evolve schema used in Kafkasource of DeltaStreamer to keep data up to date with business"
+author: sbernauer
+category: blog
+---
+
+The schema used for data exchange between services can change change rapidly with new business requirements.
+Apache Hudi is often used in combination with kafka as a event stream where all events are transmitted according to an record schema.
+In our case a Confluent schema registry is used to maintain the schema and as schema evolves, newer versions are updated in the schema registry.
+<!--truncate-->
+
+## What do we want to achieve?
+We have multiple instances of DeltaStreamer running, consuming many topics with different schemas ingesting to multiple Hudi tables. Deltastreamer is a utility in Hudi to assist in ingesting data from multiple sources like DFS, kafka, etc into Hudi. If interested, you can read more about DeltaStreamer tool [here](https://hudi.apache.org/docs/writing_data#deltastreamer)
+Ideally every Topic should be able to evolve the schema to match new business requirements. Consumers start producing data with a new schema version and the DeltaStreamer picks up the new schema and ingests the data with the new schema. For this to work, we run our DeltaStreamer instances with the latest schema version available from the Schema Registry to ensure that we always use the freshest schema with all attributes.

Review comment:
       nit: every Topic -> every topic.
   
   Also, I did not follow this part: "Consumers start producing data". Should it be "Producers start producing data"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


> Publish a blog on schema evolution with KafkaAvroCustomDeserializer
> -------------------------------------------------------------------
>
>                 Key: HUDI-2348
>                 URL: https://issues.apache.org/jira/browse/HUDI-2348
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> Publish a blog on schema evolution with KafkaAvroCustomDeserializer



