Hi Himanshu,

Thanks for the email.  Currently, Flink + Iceberg supports writing CDC
events into an Apache Iceberg table via the Flink DataStream API, and
Spark/Presto/Hive can read those events in batch jobs.
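
For reference, here is a minimal sketch of that write path with the
DataStream API (just an illustration, not our exact code: the table
path, the "id" key column and the CDC source below are placeholders):

    import java.util.Collections;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.data.RowData;
    import org.apache.iceberg.flink.TableLoader;
    import org.apache.iceberg.flink.sink.FlinkSink;

    public class CdcToIceberg {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // the sink commits to iceberg on each checkpoint

        // Placeholder: in practice this is a debezium / flink-cdc source emitting
        // RowData whose RowKind marks inserts, updates and deletes.
        DataStream<RowData> cdcStream = buildCdcSource(env);

        TableLoader tableLoader =
            TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/cdc_table");

        FlinkSink.forRowData(cdcStream)
            .tableLoader(tableLoader)
            .equalityFieldColumns(Collections.singletonList("id")) // key columns for deletes/updates
            .append();

        env.execute("mysql-cdc-to-iceberg");
      }

      private static DataStream<RowData> buildCdcSource(StreamExecutionEnvironment env) {
        throw new UnsupportedOperationException("wire up the real CDC source here");
      }
    }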

But there are still some issues that we have not finished yet:

1.  Exposing iceberg format v2 to end users.  The row-level delete
feature is actually built on format v2, and there are still some blockers
that we need to fix (please see the document
https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit);
the iceberg team will need some resources to resolve them.
2.  As we know, CDC events depend on iceberg primary key identification
(so that a mysql_cdc SQL table can be defined with a PRIMARY KEY clause;
see the first sketch after this list).  Jack Ye has published a PR for
this (https://github.com/apache/iceberg/pull/2354), which I will review today.
3.  The CDC writers will inevitably produce many small files as the
periodic checkpoints go on, so for a real production environment we must
provide the ability to rewrite small files into larger ones (a compaction
action; see the second sketch after this list).  There are a few PRs that
need review:
       a.  https://github.com/apache/iceberg/pull/2303/files
       b.  https://github.com/apache/iceberg/pull/2294
       c.  https://github.com/apache/iceberg/pull/2216
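
For point 2, this is roughly the shape of the DDL that the primary-key
work enables (a sketch only: the connector options are placeholders, and
the PRIMARY KEY ... NOT ENFORCED clause is the part that matters):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class MysqlCdcDdl {
      public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // The declared primary key is what lets the planner key the
        // changelog stream, and what the iceberg sink would map to
        // equality fields on the table.
        tEnv.executeSql(
            "CREATE TABLE mysql_cdc_orders (" +
            "  id BIGINT," +
            "  status STRING," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost'," +
            "  'port' = '3306'," +
            "  'username' = 'user'," +
            "  'password' = 'pass'," +
            "  'database-name' = 'db'," +
            "  'table-name' = 'orders'" +
            ")");
      }
    }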
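
For point 3, a rough sketch of the kind of rewrite those PRs extend,
based on the flink rewrite action that already exists (the table path
and target size are just examples):

    import org.apache.iceberg.Table;
    import org.apache.iceberg.flink.TableLoader;
    import org.apache.iceberg.flink.actions.Actions;
    import org.apache.iceberg.flink.actions.RewriteDataFilesActionResult;

    public class CompactSmallFiles {
      public static void main(String[] args) {
        TableLoader loader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/cdc_table");
        loader.open();
        Table table = loader.loadTable();

        // Coalesce the small checkpoint-sized files into ~128 MB files;
        // the rewrite replaces them in a single commit.
        RewriteDataFilesActionResult result = Actions.forTable(table)
            .rewriteDataFiles()
            .targetSizeInBytes(128L * 1024 * 1024)
            .execute();

        System.out.printf("rewrote %d small files into %d larger files%n",
            result.deletedDataFiles().size(), result.addedDataFiles().size());
      }
    }

Running such a rewrite as a periodic batch job after the streaming
writer has committed is the usual way to keep the file count in check.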

I think it's better to resolve all those issues before we put production
data into iceberg (syncing mysql binlogs via debezium).  I saw the last
sync notes saying that the next release, 0.12.0, would ideally go out at
the end of this month (
https://lists.apache.org/x/thread.html/rdb7d1ab221295adec33cf93dcbcac2b9b7b80708b2efd903b7105511@%3Cdev.iceberg.apache.org%3E),
but I think that deadline is too tight.  In my mind, if release 0.12.0
won't expose format v2 to end users, then what are the core features that
we want to release?  If the features we plan to release are not major
ones, then how about releasing 0.11.2 instead?

According to my understanding of the needs of community users, the vast
majority of iceberg users have high expectations for format v2. I think we
may need to raise the v2 exposure to a higher priority so that our users
can run their full PoC tests earlier.



On Wed, Mar 24, 2021 at 3:49 AM Himanshu Rathore
<himanshu.rath...@zomato.com.invalid> wrote:

> We are planning to use Flink + Iceberg for syncing mysql binlogs via
> debezium, and it seems some of the things we need are dependent on the next release.
>
