Viraj Jasani created PHOENIX-7513:
-------------------------------------

Summary: Clean-up CDC partition metadata for closed partitions
Key: PHOENIX-7513
URL: https://issues.apache.org/jira/browse/PHOENIX-7513
Project: Phoenix
Issue Type: Sub-task
Reporter: Viraj Jasani
Phoenix CDC partitions fall into two categories:
# Open partitions: Any partition whose corresponding data table region is currently active is considered an open partition. The data table region continues to serve read/write requests until it is split into two daughter regions or merged with other parent regions into a single region.
# Closed partitions: Any partition whose corresponding data table region is no longer alive is considered a closed partition. Such a region has been split or merged into new region(s) and is either ready to be archived or already archived. It can no longer serve any read/write requests.

Once parent region(s) are split or merged into child region(s), metadata for the closed partitions should stay in SYSTEM.CDC_STREAM for at least a predetermined Stream metadata TTL duration (say, 24 hr by default). After this duration, the records should be cleaned up. The cleanup can be performed in either of two ways:

Either, use a background Task that cleans up partitions that have been closed, i.e. rows with a non-null PARTITION_END_TIME and a PHOENIX_ROW_TIMESTAMP() value less than current time - TTL (24 hr).

Or, use Conditional TTL with a condition like:
{code:java}
TTL_EXPRESSION = CASE WHEN PHOENIX_ROW_TIMESTAMP() < (CURRENT_TIME() - 24 hr) AND PARTITION_END_TIME IS NOT NULL THEN 0 ELSE <FOREVER> END
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
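For illustration, the selection rule for the background Task option (closed partition AND row timestamp older than current time - TTL) can be sketched in Java as below. This is a minimal, hypothetical sketch, not Phoenix code: PartitionRow, isEligibleForCleanup, selectForCleanup and DEFAULT_STREAM_METADATA_TTL are made-up names standing in for a SYSTEM.CDC_STREAM row and the cleanup logic, and it assumes timestamps are epoch milliseconds. A real Task would scan SYSTEM.CDC_STREAM and issue deletes instead of returning a list.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the closed-partition cleanup predicate.
 * None of these names are Phoenix APIs; they only model the rule in this ticket.
 */
final class PartitionCleanupSketch {

    /**
     * Illustrative stand-in for a SYSTEM.CDC_STREAM row (epoch millis assumed).
     * partitionEndTime is null while the partition is still open.
     */
    record PartitionRow(String partitionId, Long partitionEndTime, long rowTimestamp) {
        boolean isClosed() {
            return partitionEndTime != null;
        }
    }

    /** Assumed default Stream metadata TTL of 24 hours, as suggested in the ticket. */
    static final Duration DEFAULT_STREAM_METADATA_TTL = Duration.ofHours(24);

    /**
     * Eligible only if the partition is closed (non-null PARTITION_END_TIME) AND the
     * row timestamp is strictly less than (now - ttl). Open partitions are never eligible.
     */
    static boolean isEligibleForCleanup(PartitionRow row, long nowMillis, Duration ttl) {
        return row.isClosed() && row.rowTimestamp() < nowMillis - ttl.toMillis();
    }

    /** Collects the ids a single pass of a background cleanup task would delete. */
    static List<String> selectForCleanup(List<PartitionRow> rows, long nowMillis, Duration ttl) {
        List<String> expired = new ArrayList<>();
        for (PartitionRow row : rows) {
            if (isEligibleForCleanup(row, nowMillis, ttl)) {
                expired.add(row.partitionId());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long now = 1_000_000_000L;
        long hour = Duration.ofHours(1).toMillis();
        List<PartitionRow> rows = List.of(
            new PartitionRow("open-old", null, now - 48 * hour),                 // open: kept
            new PartitionRow("closed-recent", now - hour, now - hour),           // closed < 24h ago: kept
            new PartitionRow("closed-old", now - 30 * hour, now - 30 * hour));   // closed > 24h ago: removed
        System.out.println(selectForCleanup(rows, now, DEFAULT_STREAM_METADATA_TTL));
    }
}
```

Running main prints [closed-old]: the open partition is kept no matter how old its row is, and the partition closed an hour ago survives until the TTL elapses. The strict less-than comparison mirrors the "less than current time - TTL" wording above, and the same two-part condition is what the Conditional TTL expression encodes.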