Hello Zelin,

  I pinged you on this PR: https://github.com/apache/paimon/pull/2070.
Checking in to see which release of Paimon you plan to target for this item:
"support debezium avro format with schema registry".

Thanks,
Umesh

On Mon, Jan 22, 2024 at 8:55 AM umesh dangat <[email protected]> wrote:

> Thanks for the update Zelin.
>
>> Currently, the intermediate records from Kafka source are string type.
>> But for debezium-avro, the intermediate records are avro objects.
> This is indeed the case for nested Avro records containing arrays, maps,
> other Avro records, etc. There is already a TODO comment here
> <https://github.com/apache/incubator-paimon/blob/master/paimon-flink/paimon-flink-cdc/src/main/java/org/apache/paimon/flink/sink/cdc/CdcRecordUtils.java#L102>
> that mentions we need to either extend TypeUtils to handle such types or
> change the CdcRecord.fields map so its values are not limited to String.
> My branch in [2] took the former approach. Of course, I also needed to
> change DebeziumAvroParser to handle such types (rather than converting
> them to String).
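>
> For concreteness, here is a rough sketch of the kind of conversion I
> mean. The class and method names are illustrative only, not the actual
> code in [2]; it just shows recursively unwrapping Avro values instead
> of stringifying them:
>
> import java.util.ArrayList;
> import java.util.LinkedHashMap;
> import java.util.List;
> import java.util.Map;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericRecord;
>
> public final class AvroValueSketch {
>
>     // Recursively unwrap an Avro value (record, array, or map) into
>     // plain Java objects, so downstream code keeps the structure
>     // instead of receiving a pre-stringified field.
>     public static Object toJavaValue(Object avroValue) {
>         if (avroValue instanceof GenericRecord) {
>             GenericRecord record = (GenericRecord) avroValue;
>             Map<String, Object> converted = new LinkedHashMap<>();
>             for (Schema.Field field : record.getSchema().getFields()) {
>                 converted.put(field.name(), toJavaValue(record.get(field.name())));
>             }
>             return converted;
>         }
>         if (avroValue instanceof List) {
>             List<Object> converted = new ArrayList<>();
>             for (Object element : (List<?>) avroValue) {
>                 converted.add(toJavaValue(element));
>             }
>             return converted;
>         }
>         if (avroValue instanceof Map) {
>             Map<String, Object> converted = new LinkedHashMap<>();
>             for (Map.Entry<?, ?> entry : ((Map<?, ?>) avroValue).entrySet()) {
>                 // Avro map keys arrive as Utf8; normalize them to String.
>                 converted.put(entry.getKey().toString(), toJavaValue(entry.getValue()));
>             }
>             return converted;
>         }
>         // Primitives (and Utf8 strings) pass through unchanged.
>         return avroValue;
>     }
> }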
>
>> I will continue on Debezium-avro format in 0.8.0.
> Thanks for working on this. I am fine with debezium-avro becoming
> available in 0.8. One thing that would be nice: if you can rebase branch
> [1] on master, I can continue working off it in the meanwhile, since the
> current branch [2] is based on [1] and has diverged quite a bit from
> master.
>
> Thanks,
> Umesh
>
>
>
>
> On Sun, Jan 21, 2024 at 8:43 PM yu zelin <[email protected]> wrote:
>
>> Hi Umesh,
>>
>> Recently I’ve been working on supporting the Confluent debezium-avro
>> format in Kafka CDC based on [1]. But the Paimon community is planning
>> to cut the 0.7.0 release branch on Jan. 25th, and I think there is not
>> enough time for me to complete the job before that deadline, for a few
>> reasons:
>>
>> 1. I have to modify the current CDC framework. Currently, the
>> intermediate records from the Kafka source are string type, but for
>> debezium-avro the intermediate records are Avro objects, so we have to
>> adjust the framework (a rough sketch of one possible shape follows this
>> list). That needs some time.
>>
>> 2. I noticed that you want to support some complex types in [2], which
>> required some changes to TypeUtils. Since this util is used by many
>> features, we should run some tests to see whether those changes are
>> compatible with other features. If we implement a simple version in this
>> release that doesn’t support those complex types, the release would not
>> meet your needs anyway. So I suggest that you continue to use the jar
>> you built yourself.
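>>
>> For point 1, one hypothetical shape is sketched below. This is not the
>> real CdcRecord class (which lives in org.apache.paimon.flink.sink.cdc
>> and differs in detail); it only illustrates relaxing the field-value
>> type from String to Object so Avro objects can flow through:
>>
>> import java.util.Map;
>>
>> // Sketch only: a CDC record whose field values are Objects, so they
>> // can carry Avro GenericRecords, Lists, and Maps, not just Strings.
>> public class GenericCdcRecordSketch {
>>
>>     public enum Kind { INSERT, UPDATE, DELETE }
>>
>>     private final Kind kind;
>>     private final Map<String, Object> fields;
>>
>>     public GenericCdcRecordSketch(Kind kind, Map<String, Object> fields) {
>>         this.kind = kind;
>>         this.fields = fields;
>>     }
>>
>>     public Kind kind() {
>>         return kind;
>>     }
>>
>>     public Map<String, Object> fields() {
>>         return fields;
>>     }
>> }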
>>
>> Recently I’m also working on preparing the 0.7.0 release. I will
>> continue on the Debezium-avro format in 0.8.0. If you have any problems
>> with [1], feel free to discuss them with us on the mailing list.
>>
>> Best,
>> Zelin Yu
>>
>> [1] https://github.com/apache/incubator-paimon/pull/2070
>> [2] https://github.com/harveyyue/incubator-paimon/pull/1
>>
>> On Jan 10, 2024, at 01:21, umesh dangat <[email protected]> wrote:
>>
>> Hello,
>>
>> I am a software engineer at Yelp Inc and lead the data infrastructure
>> group there. We have a complex real-time streaming ecosystem comprising
>> Flink, Kafka, and our custom schema registry service. I am evaluating
>> Apache Paimon as a potential replacement for many of our data
>> pipelines, involving streaming reads, joins, and aggregations, to help
>> rein in our growing operational complexity and cost. Paimon also seems
>> to solve the schema evolution problem better than the Flink SQL client
>> (which we use currently).
>>
>> One issue with integrating Paimon into our ecosystem is that it does
>> not support the debezium-avro format. However, Jingsong Li pointed me
>> to this <https://github.com/apache/incubator-paimon/pull/2070> branch,
>> which does seem to add support for the debezium-avro format using the
>> Confluent schema registry. This would allow us to ingest our data from
>> Kafka into Paimon and then evaluate it.
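>>
>> To make the integration concrete, below is roughly how a
>> Confluent-framed Debezium Avro message is decoded. This is standard
>> Confluent client usage (KafkaAvroDeserializer), not Paimon code, and
>> the registry URL is a placeholder for our internal service:
>>
>> import java.util.Collections;
>> import org.apache.avro.generic.GenericRecord;
>> import io.confluent.kafka.serializers.KafkaAvroDeserializer;
>>
>> public class DebeziumAvroDecodeSketch {
>>
>>     // Decode the value bytes of a Kafka record written by Debezium
>>     // with the Confluent Avro converter. The deserializer reads the
>>     // magic byte and schema id, fetches the writer schema from the
>>     // registry, and returns the Debezium envelope record
>>     // (before/after/op/source).
>>     public static GenericRecord decode(String topic, byte[] valueBytes) {
>>         KafkaAvroDeserializer deserializer = new KafkaAvroDeserializer();
>>         deserializer.configure(
>>                 Collections.singletonMap(
>>                         "schema.registry.url", "http://localhost:8081"),
>>                 false); // false = configure as a value deserializer
>>         return (GenericRecord) deserializer.deserialize(topic, valueBytes);
>>     }
>> }
>>
>> The "after" field of the returned envelope is what would ultimately map
>> onto a Paimon row.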
>>
>> I wanted to know whether there are plans to merge this branch to master
>> soon. I can help with reviewing, since I plan to consume data written
>> in this format for some of our production workflows.
>>
>> Thanks,
>> Umesh
>>
>>
>>
