Thanks for the update, Zelin.

Currently, the intermediate records from the Kafka source are string typed,
but for debezium-avro the intermediate records are Avro objects. This is
indeed the case for nested Avro records containing arrays, maps, nested
records, etc. There is already a TODO comment here
<https://github.com/apache/incubator-paimon/blob/master/paimon-flink/paimon-flink-cdc/src/main/java/org/apache/paimon/flink/sink/cdc/CdcRecordUtils.java#L102>
that mentions we need to either extend TypeUtils to handle such types or
change the CdcRecord.fields Map to not have String values. My branch in [2]
took the former approach. Of course, I also needed to change the
DebeziumAvroParser to handle such types (rather than convert them to
String).
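
To make this concrete, here is a minimal, hypothetical sketch of that
direction (none of these names are actual Paimon APIs; toCdcFieldValue and
castComplexFromString are illustrative only). The parser side encodes
complex Avro values as JSON strings so they still fit the String-valued
CdcRecord.fields map, and a TypeUtils-style cast rebuilds them later:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.avro.generic.GenericContainer;
import java.util.List;
import java.util.Map;

public final class ComplexAvroFieldSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    private ComplexAvroFieldSketch() {}

    // Parser side: encode complex Avro values (records, arrays, maps) as
    // JSON-style strings instead of flattening them lossily.
    static String toCdcFieldValue(Object avroValue) {
        if (avroValue == null) {
            return null;
        }
        if (avroValue instanceof GenericContainer
                || avroValue instanceof Map
                || avroValue instanceof List) {
            // Avro's generic toString() renders a JSON-like form; a real
            // implementation would use a proper JSON encoder.
            return avroValue.toString();
        }
        // Primitives (numbers, booleans, strings) keep their natural form.
        return String.valueOf(avroValue);
    }

    // Cast side: rebuild the complex value from its JSON string, e.g.
    // "[1,2,3]" for ARRAY<INT> or "{\"k\":\"v\"}" for MAP<STRING,STRING>.
    static JsonNode castComplexFromString(String jsonValue) throws Exception {
        return MAPPER.readTree(jsonValue);
    }
}

The other option in the TODO (changing the value type of CdcRecord.fields)
would avoid the round trip through strings but, as Zelin notes below, means
adjusting the CDC framework itself.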

> I will continue on Debezium-avro format in 0.8.0
Thanks for working on this. I am fine with debezium-avro being
available in 0.8. One thing that would be nice is if you could rebase branch
[1] on master; then I can continue working off it in the meanwhile, as the
current branch [2] is based on [1] and has diverged quite a bit from master.

Thanks,
Umesh




On Sun, Jan 21, 2024 at 8:43 PM yu zelin <[email protected]> wrote:

> Hi Umesh,
>
> Recently I’m working on supporting the Confluent debezium-avro format
> in Kafka CDC based on [1]. But the Paimon community is planning
> to cut the 0.7.0 release branch on Jan. 25th, and I think there is not
> enough time for me to complete the job before the deadline, for a few reasons:
>
> 1. I have to modify the current CDC framework. Currently, the intermediate
> records from the Kafka source are string typed, but for debezium-avro the
> intermediate records are Avro objects, so we have to adjust the framework.
> It needs some time.
>
> 2. I noticed that you want to support some complex types in [2], which
> required some changes to TypeUtils. Since this util is used by many
> features, we should do some tests to see if the changes are compatible
> with other features. I think if we implement a simple version in this
> release which doesn’t support those complex types, it will not meet
> your needs. So I suggest that you continue to use the jar you built
> yourself.
>
> Recently I’m also working on preparing the 0.7.0 release. I will continue
> on the Debezium-avro format in 0.8.0. If you have any problems with [1],
> you are welcome to discuss them with us on the mailing list.
>
> Best,
> Zelin Yu
>
> [1] https://github.com/apache/incubator-paimon/pull/2070
> [2] https://github.com/harveyyue/incubator-paimon/pull/1
>
> On Jan 10, 2024, at 01:21, umesh dangat <[email protected]> wrote:
>
> Hello,
>
> I am a software engineer at Yelp Inc and lead the data infrastructure
> group there. We have a complex real-time streaming ecosystem comprising
> Flink, Kafka, and our custom schema registry service. I am trying to
> evaluate Apache Paimon as a potential replacement for a lot of our data
> pipelines involving streaming reads, joins, and aggregations, to help
> minimize our growing operational complexity and cost. Paimon also seems
> to solve the schema evolution problem better than the Flink SQL client
> (which we use currently).
>
> One issue with integrating Paimon into our ecosystem seems to be that it
> does not support the debezium-avro format, although Jingsong Li pointed
> me to this
> <https://github.com/apache/incubator-paimon/pull/2070> branch that does
> seem to add support for the debezium-avro format using the Confluent
> schema registry. This would allow us to ingest our data from Kafka into
> Paimon and then evaluate it.
>
> I wanted to know if there are plans to merge this branch to master
> soonish. I can help with reviewing, since I plan to consume data written
> using this format for some of our production workflows.
>
> Thanks,
> Umesh
>
