[ https://issues.apache.org/jira/browse/FLINK-25330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463512#comment-17463512 ]
Jing Ge commented on FLINK-25330:
---------------------------------

Hi [~Ibson],

yes, this is the weird part of HBase, and it depends on how you get/scan the data. The real deletion happens during major compaction.

Could you describe your business scenario in a little more detail to help me understand it? E.g. in the real case, will you get the cell right after deleting it, i.e. right after a tombstone marker has been assigned to the last version of a column?

Would it solve your problem if
- the connector options sink.buffer-flush.xxx were set smaller, to make sure a flush has been triggered before the cell is read again?
- versions => 1 were used in combination with KEEP_DELETED_CELLS => true?

I know it looks a little bit weird; I just want to check whether it works for your case. Many thanks.

> Flink SQL doesn't retract all versions of Hbase data
> ----------------------------------------------------
>
>                 Key: FLINK-25330
>                 URL: https://issues.apache.org/jira/browse/FLINK-25330
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase
>            Reporter: Bruce Wong
>            Assignee: Jing Ge
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: Flink-SQL-Test.zip, bundle_data.zip, image-2021-12-15-20-05-18-236.png, test_res.png, test_res_1.png
>
>
> h2. Background
> When we use CDC to synchronize MySQL data to HBase, we find that HBase
> deletes only the last version of the specified rowkey when the MySQL data
> is deleted. The data of the old versions still exists, so you end up using
> the wrong data. I think it's a bug in the HBase connector.
> The following figure shows the HBase data changes before and after the
> MySQL data is deleted.
> !image-2021-12-15-20-05-18-236.png|width=910,height=669!
>
> h2.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
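
To illustrate the first suggestion in the comment (smaller sink.buffer-flush.xxx settings), a minimal Flink SQL sketch of an HBase sink table might look like the following. The option keys sink.buffer-flush.max-rows and sink.buffer-flush.interval are buffer options of the Flink HBase SQL connector; the table name, column family, ZooKeeper address, connector version and the concrete values are assumptions chosen for illustration and do not come from the issue.

```sql
-- Hypothetical sink table: all names and values below are illustrative assumptions.
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<name STRING>,                      -- one column family "cf" with a single qualifier
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',                -- assumed connector version
  'table-name' = 'test_table',              -- assumed HBase table name
  'zookeeper.quorum' = 'localhost:2181',    -- assumed ZooKeeper address
  'sink.buffer-flush.max-rows' = '1',       -- flush after every buffered row
  'sink.buffer-flush.interval' = '1s'       -- and at least once per second
);
```

Smaller thresholds like these shorten the delay between a change and its arrival in HBase, at the cost of more frequent, smaller writes.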