haridsv commented on code in PR #2379:
URL: https://github.com/apache/phoenix/pull/2379#discussion_r3073000599
##########
phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/PhoenixSyncTableMapper.java:
##########
@@ -374,13 +388,14 @@ private ChunkScannerContext createChunkScanner(Connection conn, byte[] startKey,
       byte[] continuedDigestState, boolean isStartKeyInclusive, boolean isEndKeyInclusive,
       boolean isTargetScan) throws IOException, SQLException {
     // Not using try-with-resources since ChunkScannerContext owns the table lifecycle
-    Table hTable =
-        conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(physicalTableName);
+    PhoenixConnection phoenixConn = conn.unwrap(PhoenixConnection.class);
+    Table hTable = phoenixConn.getQueryServices().getTable(physicalTableName);
     Scan scan =
         createChunkScan(startKey, endKey, isStartKeyInclusive, isEndKeyInclusive, isTargetScan);
     scan.setAttribute(BaseScannerRegionObserverConstants.SYNC_TABLE_CHUNK_FORMATION, TRUE_BYTES);
     scan.setAttribute(BaseScannerRegionObserverConstants.SKIP_REGION_BOUNDARY_CHECK, TRUE_BYTES);
     scan.setAttribute(BaseScannerRegionObserverConstants.UNGROUPED_AGG, TRUE_BYTES);
+    ScanUtil.setScanAttributesForPhoenixTTL(scan, pTable, phoenixConn);
Review Comment:
Shouldn't expired rows get skipped even in non-strict mode? Perhaps we
should override the table attribute?
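As a sketch of the precedence such an override might give (entirely hypothetical: the attribute name `FORCE_STRICT_TTL` and the per-scan-wins-over-table-default semantics are assumptions, modeled with a plain map rather than an HBase `Scan`):

```java
import java.util.HashMap;
import java.util.Map;

public class StrictTtlOverrideSketch {
    // Models the proposed precedence: a per-scan override, when present,
    // wins over the table-level IS_STRICT_TTL default.
    static boolean effectiveStrictTtl(Map<String, byte[]> scanAttributes,
                                      boolean tableIsStrictTtl) {
        // Hypothetical attribute name; not a real Phoenix constant.
        byte[] override = scanAttributes.get("FORCE_STRICT_TTL");
        return override != null || tableIsStrictTtl;
    }

    public static void main(String[] args) {
        Map<String, byte[]> attrs = new HashMap<>();
        // Non-strict table, no override: expired rows would not be skipped.
        System.out.println(effectiveStrictTtl(attrs, false)); // false
        // The sync scan sets the override, so expired rows are skipped regardless.
        attrs.put("FORCE_STRICT_TTL", new byte[] { 1 });
        System.out.println(effectiveStrictTtl(attrs, false)); // true
    }
}
```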
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/PhoenixSyncTableToolIT.java:
##########
@@ -179,7 +204,41 @@ public void testSyncTableWithDeletedRowsOnTarget() throws Exception {
     Job job = runSyncTool(uniqueTableName);
     SyncCountersResult counters = getSyncCounters(job);
-    validateSyncCounters(counters, 10, 10, 7, 3);
+    validateSyncCounters(counters, 10, 7, 7, 3);
+    validateMapperCounters(counters, 1, 3);
+  }
+
+  @Test
+  public void testSyncTableWithConditionalTTLExpiredRows() throws Exception {
+    String ddl = "CREATE TABLE IF NOT EXISTS %s (" + "ID INTEGER NOT NULL PRIMARY KEY, "
+        + "NAME VARCHAR(50), NAME_VALUE BIGINT, UPDATED_DATE TIMESTAMP, " + "EXPIRED BOOLEAN"
+        + ") REPLICATION_SCOPE=%d, UPDATE_CACHE_FREQUENCY=0, TTL='EXPIRED = TRUE' "
+        + "SPLIT ON (3, 5, 7)";
+    executeTableCreation(sourceConnection, String.format(ddl, uniqueTableName, 1));
+    executeTableCreation(targetConnection, String.format(ddl, uniqueTableName, 0));
+
+    // Insert 10 rows on source: rows 1-3 marked as expired, rows 4-10 as live
+    insertTestDataWithExpiredFlag(sourceConnection, uniqueTableName, 1, 3, true);
+    insertTestDataWithExpiredFlag(sourceConnection, uniqueTableName, 4, 10, false);
+
+    // Only non-expired rows (4-10) are visible via Phoenix queries
+    waitForReplication(targetConnection, uniqueTableName, 7);
+
+    int sourceCount = getRowCount(sourceConnection, uniqueTableName);
+    int targetCount = getRowCount(targetConnection, uniqueTableName);
Review Comment:
This doesn't prove the expired rows were actually skipped, since the validation queries would have skipped them too. We could perhaps alter the attribute before running the verification queries, though I'm not sure.
Alternatively, if we always run in strict mode as I suggested above, we can create the table with `IS_STRICT_TTL=false` and we don't need to do anything extra here to verify.
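For the second option, the only change to the IT would be appending the property to the table options. A minimal sketch of the adjusted DDL (the helper name `buildDdl` and the exact placement of `IS_STRICT_TTL=false` in the options list are assumptions; the property name comes from this comment):

```java
public class SyncTableTtlDdl {
    // Same DDL as in the test, with IS_STRICT_TTL=false added so that
    // Phoenix queries still return conditionally-expired rows and the
    // row-count verification needs no extra scan-attribute tweaking.
    static String buildDdl(String tableName, int replicationScope) {
        String ddl = "CREATE TABLE IF NOT EXISTS %s ("
            + "ID INTEGER NOT NULL PRIMARY KEY, "
            + "NAME VARCHAR(50), NAME_VALUE BIGINT, UPDATED_DATE TIMESTAMP, EXPIRED BOOLEAN"
            + ") REPLICATION_SCOPE=%d, UPDATE_CACHE_FREQUENCY=0, "
            + "TTL='EXPIRED = TRUE', IS_STRICT_TTL=false "
            + "SPLIT ON (3, 5, 7)";
        return String.format(ddl, tableName, replicationScope);
    }

    public static void main(String[] args) {
        System.out.println(buildDdl("SYNC_TTL_TEST", 1));
    }
}
```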
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]