[ 
https://issues.apache.org/jira/browse/PHOENIX-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved PHOENIX-7653.
-----------------------------------
    Resolution: Fixed

> New CDC Event for TTL expired rows
> ----------------------------------
>
>                 Key: PHOENIX-7653
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7653
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.3.0
>
>
> The purpose of this Jira is to extend the Change Data Capture (CDC)
> capabilities to generate CDC events when rows expire due to Time-To-Live
> (TTL) settings (literal or conditional) during major compaction. The
> implementation ensures that applications consuming CDC streams are notified
> when data is automatically removed from tables, providing additional
> visibility into system-initiated deletions.
> The proposed new event_type: *ttl_delete*
> Example of a TTL-expired CDC event, assuming the row had two columns c1 and c2
> with values "v1" and "v2" respectively:
> {code:java}
> {
>   "event_type": "ttl_delete",
>   "pre_image": {
>     "c1": "v1",
>     "c2": "v2"
>   },
>   "post_image": {}
> } {code}
>  
> *High-level design steps:*
>  * Identify the event that causes row expiration: conditional TTL, or
> max lookback/TTL expiry.
>  * Capture the complete image of the expired row. The image needs to be
> inserted directly into the CDC index: if the expired row's pre-image is not
> provided upfront, the CDC index cannot scan it later, because the data table
> row no longer exists once major compaction has expired it. CompactionScanner
> needs to write the exact CDC JSON structure as encoded bytes, which the scanner
> can later return directly to the client when requested.
>  * CDCGlobalIndexRegionScanner needs to check for the existence of the
> special CF:CQ; if it is present, its value can be returned directly as the
> "CDC JSON" column.
>  * For a single CF, CompactionScanner needs to write to the CDC index
> directly, and only once.
>  * For multiple CFs, CompactionScanner might perform multiple mutations to the
> CDC index. Therefore, it should use checkAndMutate so that the row is inserted
> when it does not yet exist. If the row has already been inserted and another
> CF's compaction tries to put more recent row values, the existing pre-image
> can be updated instead.
>  * To distinguish CDC index rows that share the same PHOENIX_ROW_TIMESTAMP()
> value while multiple CF compactions are in progress, CompactionScanner needs
> to use compactionTime as the timestamp value in the CDC index rowkey, updating
> the rowkey before performing the mutation.
>  * Add retries to handle HTable mutation failures.
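A minimal Java sketch of the *ttl_delete* payload and of the special CF:CQ lookup described in the design steps. It assumes the expired row's pre-image is available as a column-to-string map and that the pre-serialized event is stored under a column named 0:_CDC_TTL_EVENT. That column name, the class, and its methods are all hypothetical; this is not Phoenix's CompactionScanner or CDCGlobalIndexRegionScanner code, and it only handles string values.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Phoenix code.
final class TtlDeleteEventSketch {

    // Hypothetical name for the "special CF:CQ" holding the pre-serialized event.
    static final String TTL_EVENT_COLUMN = "0:_CDC_TTL_EVENT";

    // Write side: serialize the expired row's pre-image into the ttl_delete
    // event JSON and return it as UTF-8 bytes. Only string values are handled.
    static byte[] encodeTtlDeleteEvent(Map<String, String> preImage) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"event_type\":\"ttl_delete\",\"pre_image\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : preImage.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            appendJsonString(sb, e.getKey());
            sb.append(':');
            appendJsonString(sb, e.getValue());
        }
        sb.append("},\"post_image\":{}}");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static void appendJsonString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    // Read side: if the CDC index row carries the pre-serialized event, return it
    // as-is; an empty result means the caller builds the CDC JSON the usual way.
    static Optional<byte[]> precomputedCdcJson(Map<String, byte[]> cdcIndexRowCells) {
        return Optional.ofNullable(cdcIndexRowCells.get(TTL_EVENT_COLUMN));
    }

    public static void main(String[] args) {
        Map<String, String> preImage = new LinkedHashMap<>();
        preImage.put("c1", "v1");
        preImage.put("c2", "v2");
        // {"event_type":"ttl_delete","pre_image":{"c1":"v1","c2":"v2"},"post_image":{}}
        System.out.println(new String(encodeTtlDeleteEvent(preImage), StandardCharsets.UTF_8));
    }
}
{code}

For the example row (c1 = "v1", c2 = "v2"), this produces the same event as the *ttl_delete* example in compact form. A real implementation would more likely rely on a JSON library and Phoenix's own type encoding for non-string columns.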

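For the multi-CF case, the intended semantics (insert the CDC index row if it is absent, otherwise fold another CF's columns into the existing pre-image) can be modeled with an in-memory map. This is a hypothetical stand-in: ConcurrentHashMap.merge, which is atomic per key, plays the role that checkAndMutate would play against the actual HBase table. The sketch also assumes each incoming value is the more recent one for its column.

{code:java}
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the CDC index: row key -> pre-image
// (column -> value). Not Phoenix or HBase code.
final class MultiCfPreImageSketch {

    private final ConcurrentHashMap<String, Map<String, String>> cdcIndex =
            new ConcurrentHashMap<>();

    // Called once per CF compaction that expires the row. The first call
    // inserts the row; later calls merge their CF's columns into the existing
    // pre-image, with incoming values replacing older ones for the same column.
    void putExpiredRow(String cdcRowKey, Map<String, String> cfColumns) {
        cdcIndex.merge(cdcRowKey, new TreeMap<>(cfColumns), (existing, incoming) -> {
            Map<String, String> merged = new TreeMap<>(existing);
            merged.putAll(incoming);
            return merged;
        });
    }

    Map<String, String> preImage(String cdcRowKey) {
        return cdcIndex.get(cdcRowKey);
    }

    public static void main(String[] args) {
        MultiCfPreImageSketch index = new MultiCfPreImageSketch();
        index.putExpiredRow("row1", Map.of("c1", "v1")); // compaction of the first CF
        index.putExpiredRow("row1", Map.of("c2", "v2")); // compaction of the second CF
        System.out.println(index.preImage("row1")); // {c1=v1, c2=v2}
    }
}
{code}

Whichever CF compacts first creates the row, so the order of the per-CF compactions does not change the final pre-image.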

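The rowkey update can be illustrated under a simplified, hypothetical key layout: an 8-byte big-endian timestamp followed by the data table row key. Phoenix's actual CDC index rowkey encoding may differ (extra prefix bytes, a different timestamp encoding); the sketch only shows replacing the timestamp portion with compactionTime before the mutation is issued. The class and method names are hypothetical.

{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, simplified CDC index row key: [8-byte timestamp][data row key].
// Big-endian encoding keeps non-negative timestamps in numeric byte order.
final class CdcRowKeySketch {

    static final int TS_LENGTH = Long.BYTES;

    static byte[] buildRowKey(long timestamp, byte[] dataRowKey) {
        return ByteBuffer.allocate(TS_LENGTH + dataRowKey.length)
                .putLong(timestamp)
                .put(dataRowKey)
                .array();
    }

    // Returns a copy of the row key with its timestamp replaced by compactionTime;
    // the data row key suffix is left unchanged.
    static byte[] withCompactionTime(byte[] cdcRowKey, long compactionTime) {
        if (cdcRowKey.length < TS_LENGTH) {
            throw new IllegalArgumentException("row key shorter than timestamp prefix");
        }
        byte[] copy = Arrays.copyOf(cdcRowKey, cdcRowKey.length);
        ByteBuffer.wrap(copy).putLong(0, compactionTime);
        return copy;
    }

    static long timestampOf(byte[] cdcRowKey) {
        return ByteBuffer.wrap(cdcRowKey).getLong(0);
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey(1000L, new byte[] {1, 2, 3});
        byte[] updated = withCompactionTime(key, 2000L);
        System.out.println(timestampOf(key) + " -> " + timestampOf(updated)); // 1000 -> 2000
    }
}
{code}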

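The retry step could look like the generic helper below, where a Callable stands in for the HTable mutation. The class name, the attempt limit, and the exponential backoff are illustrative assumptions, not values or APIs taken from Phoenix.

{code:java}
import java.util.concurrent.Callable;

// Hypothetical retry helper; op stands in for an HTable mutation.
final class RetrySketch {

    // Runs op up to maxAttempts times, doubling the sleep between attempts.
    // The last failure is rethrown once all attempts are exhausted.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(backoff);
                backoff *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new IllegalStateException("transient failure");
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
{code}

Bounding the attempts matters here because the mutation runs inside a compaction; an unbounded retry loop could stall the compaction indefinitely.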
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
