zhangpengbigdata opened a new issue, #6153:
URL: https://github.com/apache/iceberg/issues/6153
### Query engine
Iceberg 1.0.0, Flink 1.13
### Question
Hi all,
I found duplicate records after repeatedly exporting records from a CDC
stream into a partitioned Iceberg table. Could you please help me?
Versions: Iceberg 1.0.0, Flink 1.13.
The SQL looks like this:
```
CREATE CATALOG iceberg WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  'uri' = 'thrift://xxxx:9083',
  'clients' = '5',
  'property-version' = '1',
  'warehouse' = 's3a://xxxx/xxxx/'
);

CREATE TABLE test_cdc (
  `username` VARCHAR(100),
  `id` INT,
  start_time STRING,
  PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'xxxxxx',
  'port' = '3306',
  'username' = 'xxx',
  'password' = 'xxx',
  'database-name' = 'xxx',
  'table-name' = 'xxx'
);

CREATE TABLE IF NOT EXISTS iceberg.test_db.test_cdc (
  `username` STRING,
  `id` INT,
  start_time STRING,
  `dt` STRING,
  PRIMARY KEY (`dt`, `id`) NOT ENFORCED
) PARTITIONED BY (dt)
WITH (
  'write.format.default' = 'parquet',
  'format-version' = '2',
  'write.upsert.enabled' = 'true',
  'location' = 's3a://xxxxxx/xxx/xxx'
);

INSERT INTO iceberg.test_db.test_cdc
SELECT
  `username`,
  `id`,
  start_time,
  FROM_UNIXTIME(CAST(start_time AS BIGINT) / 1000, 'yyyyMMdd') AS dt
FROM test_cdc;
```
After executing the Flink SQL above, the result of the following query is:
select count(distinct id), count(id), dt from iceberg.test_db.test_cdc group by dt order by dt
<img width="579" alt="clipboard"
src="https://user-images.githubusercontent.com/116717900/200744200-61199ef0-81c7-4e56-bca9-3e0b1e5aeebe.png">
After rerunning the Flink SQL, the result of the same query is:
select count(distinct id), count(id), dt from iceberg.test_db.test_cdc group by dt order by dt
<img width="590" alt="clipboard2"
src="https://user-images.githubusercontent.com/116717900/200744267-7973fcd1-73e9-4796-85fa-2d80092e3214.png">
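To narrow down which rows are duplicated, a query along these lines could be run in batch mode against the same table. It is only a diagnostic sketch that reuses the table and column names from the DDL above; the alias `copies` is made up for illustration, and it does not by itself explain why the duplicates appear.

```sql
-- Diagnostic sketch: list every (dt, id) pair that is stored more than once.
-- With 'write.upsert.enabled' = 'true' and PRIMARY KEY (dt, id),
-- the expectation is that this query returns no rows.
SELECT
  dt,
  id,
  COUNT(*) AS copies  -- hypothetical alias, number of rows sharing this key
FROM iceberg.test_db.test_cdc
GROUP BY dt, id
HAVING COUNT(*) > 1
ORDER BY dt, id;
```

Comparing the output before and after the rerun would show whether the extra rows are exact repeats of existing keys or new keys in other partitions.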