[
https://issues.apache.org/jira/browse/PHOENIX-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696256#comment-17696256
]
ASF GitHub Bot commented on PHOENIX-6888:
-----------------------------------------
kadirozde commented on code in PR #1569:
URL: https://github.com/apache/phoenix/pull/1569#discussion_r1124851443
##########
phoenix-core/src/main/java/org/apache/phoenix/filter/PagedFilter.java:
##########
@@ -28,16 +28,21 @@
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.DoNotRetryIOException;
+import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;
import org.apache.hadoop.io.Writable;
import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ScanUtil;
/**
- * This filter overrides the behavior of delegate so that we do not scan more rows than pageSizeInRows .
+ * This is a top level Phoenix filter which injected to a scan at the server side. If the scan has
+ * already a filter then PagedFilter wraps it. There two functions this filter implements: paging
+ * and TTL. The paging function makes sure that the scan does not take more than pageSizeInMs.
+ * The TTL function is for masking expired rows.
Review Comment:
I could not separate them. Both functions need to be handled in the top-level
filter. If we skip an expired row in a filter below or above the PagedFilter,
we cannot ensure that paging happens in a timely manner. I am open to changing
the name of the class if you have a good suggestion.
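
To make the point about timeliness concrete, here is a minimal sketch in plain Java. It is not the PagedFilter code from this PR and does not use the HBase Filter API; the class PagedTtlSketch, its Row and Page types, and the injected clock are all hypothetical names chosen for illustration. The only idea it tries to show is that the page time budget is checked for every row examined, including expired rows that are skipped, which is only possible when TTL masking and paging live in the same loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Simplified model only (assumption): not the Phoenix PagedFilter or the HBase Filter API. */
public class PagedTtlSketch {
    /** A row reduced to its key and the timestamp of its latest write. */
    public static final class Row {
        public final String key;
        public final long timestampMs;
        public Row(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    /** Result of one page: the keys returned and the index to resume from. */
    public static final class Page {
        public final List<String> keys;
        public final int nextIndex;
        public Page(List<String> keys, int nextIndex) {
            this.keys = keys;
            this.nextIndex = nextIndex;
        }
    }

    /** Scans from start until the rows run out or the page time budget is used up. */
    public static Page scanPage(List<Row> rows, int start, long ttlMs,
                                long pageSizeInMs, LongSupplier clock) {
        long pageStart = clock.getAsLong();
        List<String> keys = new ArrayList<>();
        int i = start;
        while (i < rows.size()) {
            long now = clock.getAsLong();
            // The budget is checked before every row, so time spent skipping
            // expired rows still counts toward the page.
            if (now - pageStart >= pageSizeInMs) {
                break;
            }
            Row row = rows.get(i++);
            if (now - row.timestampMs > ttlMs) {
                continue; // mask the expired row
            }
            keys.add(row.key);
        }
        return new Page(keys, i);
    }

    public static void main(String[] args) {
        long[] fakeTime = {1000}; // fake clock that advances 1 ms per reading
        LongSupplier clock = () -> fakeTime[0]++;
        List<Row> rows = List.of(new Row("a", 0), new Row("b", 0), new Row("c", 0),
                new Row("d", 990), new Row("e", 990));
        int next = 0;
        while (next < rows.size()) {
            Page page = scanPage(rows, next, 100, 3, clock);
            System.out.println("page " + page.keys + ", resume at " + page.nextIndex);
            next = page.nextIndex;
        }
    }
}
```

In this model, a page that meets only expired rows still returns once the budget is spent, with an empty result and a resume position. If the expired rows were dropped by a separate layer that the paging loop cannot observe, the budget check would never run while those rows were being skipped.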
> Fixing TTL and Max Lookback Issues for Phoenix Tables
> -----------------------------------------------------
>
> Key: PHOENIX-6888
> URL: https://issues.apache.org/jira/browse/PHOENIX-6888
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.1.3
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
>
> In HBase, the unit of data is a cell and data retention rules are executed at
> the cell level. These rules are defined at the column family level. Phoenix
> leverages the data retention features of HBase and exposes them to its users
> to provide its TTL feature at the table level. However, these rules (since
> they are defined at the cell level instead of the row level) result in
> partial row retention that in turn creates data integrity issues at the
> Phoenix level.
> Similarly, Phoenix’s max lookback feature leverages HBase deleted data
> retention capabilities to preserve deleted cells within a configurable max
> lookback. This requires two data retention windows, max lookback and TTL. One
> end of these windows is the current time and the other end is a moment in the past
> (i.e., current time minus the window size). Typically, the max lookback
> window is shorter than the TTL window. In the max lookback window, we would
> like to preserve the complete history of mutations regardless of how many
> cell versions these mutations generated. In the remaining TTL window outside
> the max lookback, we would like to apply the data retention rules defined
> above. However, HBase provides only one data retention window. Thus, the max
> lookback window had to be extended to become the TTL window, so the max
> lookback feature unintentionally retains deleted data for the maximum of the
> max lookback and TTL periods.
> This Jira is to fix both of these issues.
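
The two issues in the description above can be illustrated with a small sketch in plain Java. It uses no HBase or Phoenix classes; RetentionSketch, classify, and applyCellLevelTtl are hypothetical names, and the code assumes the typical case from the description where the max lookback window is shorter than the TTL window. It only models the problem, not the fix made in the PR.

```java
import java.util.Map;
import java.util.TreeMap;

/** Simplified model only (assumption): it does not use HBase or Phoenix classes. */
public class RetentionSketch {
    /** Where a timestamp falls relative to the two windows. */
    public enum Window { MAX_LOOKBACK, TTL_ONLY, EXPIRED }

    /**
     * Classifies a timestamp against two windows that both end at nowMs.
     * Assumes maxLookbackMs <= ttlMs, the typical case in the description.
     */
    public static Window classify(long timestampMs, long nowMs,
                                  long maxLookbackMs, long ttlMs) {
        long age = nowMs - timestampMs;
        if (age <= maxLookbackMs) {
            return Window.MAX_LOOKBACK; // keep the complete mutation history
        }
        if (age <= ttlMs) {
            return Window.TTL_ONLY; // apply the regular retention rules
        }
        return Window.EXPIRED;
    }

    /** Cell-level TTL: each column is kept or dropped by its own timestamp. */
    public static Map<String, Long> applyCellLevelTtl(Map<String, Long> cellTimestamps,
                                                      long nowMs, long ttlMs) {
        Map<String, Long> kept = new TreeMap<>();
        for (Map.Entry<String, Long> cell : cellTimestamps.entrySet()) {
            if (nowMs - cell.getValue() <= ttlMs) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One logical row whose two columns were last written at different times.
        Map<String, Long> row = new TreeMap<>();
        row.put("colA", 850L);
        row.put("colB", 950L);
        // With now = 1000 and TTL = 100, colA has aged out but colB has not,
        // so a reader would see a partial row.
        System.out.println(applyCellLevelTtl(row, 1000, 100));
        System.out.println(classify(995, 1000, 10, 100));
    }
}
```

The partial row comes from applying the TTL per cell: the columns of one row age out independently. The classify method shows the two windows that the description asks for, which a single HBase retention window cannot express.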