[ https://issues.apache.org/jira/browse/PHOENIX-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved PHOENIX-7653.
-----------------------------------
    Resolution: Fixed

> New CDC Event for TTL expired rows
> ----------------------------------
>
>                 Key: PHOENIX-7653
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7653
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.3.0
>
>
> The purpose of this Jira is to extend the Change Data Capture (CDC)
> capabilities to generate CDC events when rows expire due to Time-To-Live
> (TTL) settings (literal or conditional) during major compaction. The
> implementation ensures that applications consuming CDC streams are notified
> when data is automatically removed from tables, providing additional
> visibility into system-initiated deletions.
> The proposed new event_type: *ttl_delete*
> Example of a TTL-expired CDC event, assuming the row had two columns c1 and
> c2 with values "v1" and "v2" respectively:
> {code:json}
> {
>   "event_type": "ttl_delete",
>   "pre_image": {
>     "c1": "v1",
>     "c2": "v2"
>   },
>   "post_image": {}
> }
> {code}
>
> *High-level design steps:*
> * Identify the event that causes the row expiration: conditional_ttl, or
> maxlookback/TTL expired rows.
> * Capture the complete row image at expiration and insert it directly into
> the CDC index. If the expired row's pre-image is not provided upfront, the
> CDC index cannot scan it later, because the data table row no longer exists
> once major compaction has expired it. CompactionScanner needs to send the
> exact CDC JSON structure as encoded bytes, which the scanner can later
> return directly to the client when requested.
> * CDCGlobalIndexRegionScanner needs to check for the existence of the
> special CF:CQ; if it is found, its value can be returned directly as the
> value of the "CDC JSON" column.
> * For a single CF, CompactionScanner needs to mutate the CDC index directly
> only once.
> * For multiple CFs, CompactionScanner might perform multiple mutations on
> the CDC index. Therefore, it should use checkAndMutate so that the mutation
> is applied only if the row does not already exist. If the row has already
> been inserted and the compaction of another CF tries to put more recent row
> values, it can update the existing pre-image.
> * To distinguish CDC index rows that would otherwise share the same
> PHOENIX_ROW_TIMESTAMP() value while multiple CF compactions are taking
> place, CompactionScanner needs to use compactionTime as the timestamp value
> in the CDC index rowkey, updating the rowkey before performing the mutation.
> * Introduce retries for failed HTable mutations.
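A minimal sketch of the *ttl_delete* payload and the scanner-side short-circuit from the design steps above, written in plain Java with no HBase or Phoenix dependencies. The class name, `buildTtlDeleteJson`, `cdcJsonFor`, and the `CF:TTL_CDC_JSON` key are hypothetical stand-ins (the issue does not name the special CF:CQ), and every column value is rendered as a JSON string, which glosses over Phoenix data types.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TtlDeleteEventSketch {

    // Hypothetical key standing in for the "special CF:CQ"; the real
    // family/qualifier names are not given in the issue.
    public static final String TTL_JSON_CELL = "CF:TTL_CDC_JSON";

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds the ttl_delete event (full pre-image, empty post-image) and
    // encodes it as UTF-8 bytes so it can be stored in the CDC index as-is.
    public static byte[] buildTtlDeleteJson(Map<String, String> preImage) {
        StringBuilder sb = new StringBuilder("{\"event_type\":\"ttl_delete\",\"pre_image\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : preImage.entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        sb.append("},\"post_image\":{}}");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Scanner-side short-circuit: if the index row carries precomputed JSON,
    // return it unchanged; otherwise fall back to the normal computation.
    public static byte[] cdcJsonFor(Map<String, byte[]> indexRowCells,
                                    Supplier<byte[]> computeNormally) {
        byte[] precomputed = indexRowCells.get(TTL_JSON_CELL);
        return precomputed != null ? precomputed : computeNormally.get();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("c1", "v1");
        row.put("c2", "v2");
        System.out.println(new String(buildTtlDeleteJson(row), StandardCharsets.UTF_8));
    }
}
```

Storing already-encoded bytes is what lets the scanner answer without reading the data table row, which no longer exists after major compaction.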
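The compactionTime step in the design list, where the rowkey is updated before the mutation, can be illustrated at the byte level. This sketch assumes a simplified, hypothetical CDC index rowkey consisting of an 8-byte big-endian timestamp followed by the data row key; the actual Phoenix CDC index rowkey layout is not described in this issue and likely has more components. `CdcIndexRowKeySketch` and its methods are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CdcIndexRowKeySketch {

    // Hypothetical simplified layout: [8-byte big-endian timestamp][data row key].
    public static byte[] indexRowKey(long timestamp, byte[] dataRowKey) {
        return ByteBuffer.allocate(Long.BYTES + dataRowKey.length)
                .putLong(timestamp)
                .put(dataRowKey)
                .array();
    }

    // Returns a copy of the index row key with its timestamp component replaced
    // by compactionTime; the data row key suffix is left untouched.
    public static byte[] withCompactionTime(byte[] indexRowKey, long compactionTime) {
        byte[] copy = Arrays.copyOf(indexRowKey, indexRowKey.length);
        ByteBuffer.wrap(copy).putLong(0, compactionTime);
        return copy;
    }

    // Reads the leading timestamp component back out of an index row key.
    public static long timestampOf(byte[] indexRowKey) {
        return ByteBuffer.wrap(indexRowKey).getLong(0);
    }

    public static void main(String[] args) {
        byte[] data = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] original = indexRowKey(1_000L, data);
        byte[] rewritten = withCompactionTime(original, 5_000L);
        System.out.println(timestampOf(original) + " -> " + timestampOf(rewritten));
    }
}
```

Copying before patching leaves the original key intact, so any caller still holding it is unaffected.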
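For tables with multiple CFs, the insert-if-absent, otherwise-update-the-pre-image behavior described in the design steps above can be modelled in memory. This sketch substitutes `ConcurrentHashMap.compute` (atomic per key) for HBase's checkAndMutate, so it shows the semantics as read from this issue, not how Phoenix actually issues the mutation. `MultiCfPreImageSketch`, `writePreImage`, and the `"ts1|row1"` key format are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class MultiCfPreImageSketch {

    // In-memory stand-in for the CDC index: index row key -> pre-image columns.
    private final ConcurrentHashMap<String, Map<String, String>> cdcIndex =
            new ConcurrentHashMap<>();

    // The first CF compaction creates the index row; later CF compactions merge
    // their column values into the existing pre-image instead of replacing it.
    // Returns true if this call created the index row.
    public boolean writePreImage(String indexRowKey, Map<String, String> cfColumns) {
        boolean[] created = {false};
        cdcIndex.compute(indexRowKey, (key, existing) -> {
            Map<String, String> merged =
                    existing == null ? new TreeMap<>() : new TreeMap<>(existing);
            if (existing == null) {
                created[0] = true;
            }
            merged.putAll(cfColumns); // later values for the same column win
            return merged;
        });
        return created[0];
    }

    public Map<String, String> preImage(String indexRowKey) {
        return cdcIndex.get(indexRowKey);
    }

    public static void main(String[] args) {
        MultiCfPreImageSketch index = new MultiCfPreImageSketch();
        // Compaction of the first CF inserts the row with its columns.
        index.writePreImage("ts1|row1", Map.of("c1", "v1"));
        // Compaction of a second CF finds the row and merges its columns in.
        index.writePreImage("ts1|row1", Map.of("c2", "v2"));
        System.out.println(index.preImage("ts1|row1")); // {c1=v1, c2=v2}
    }
}
```

Against a real table, a failed "row does not exist" check would need a follow-up read, merge, and conditional write to get the same effect.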
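The last design step only says to introduce retries for failed HTable mutations, without a policy. One common choice, assumed here rather than taken from the issue, is a bounded number of attempts with exponential backoff. `MutationRetrySketch`, `Mutation`, `maxAttempts`, and `initialBackoffMs` are hypothetical names, and `IOException` stands in for the failure type.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class MutationRetrySketch {

    // A mutation that may fail with an IOException.
    @FunctionalInterface
    public interface Mutation {
        void apply() throws IOException;
    }

    // Runs the mutation up to maxAttempts times in total, doubling the sleep
    // between attempts; rethrows the last IOException if every attempt fails.
    public static void mutateWithRetries(Mutation mutation, int maxAttempts,
                                         long initialBackoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                mutation.apply();
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice, then succeeds on the third attempt.
        mutateWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("transient failure");
            }
        }, 5, 10);
        System.out.println("succeeded after " + calls.get() + " attempts");
    }
}
```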