haridsv commented on code in PR #2379:
URL: https://github.com/apache/phoenix/pull/2379#discussion_r3073000599
##########
phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/PhoenixSyncTableMapper.java:
##########
@@ -374,13 +388,14 @@ private ChunkScannerContext createChunkScanner(Connection conn, byte[] startKey,
       byte[] continuedDigestState, boolean isStartKeyInclusive, boolean isEndKeyInclusive,
       boolean isTargetScan) throws IOException, SQLException {
     // Not using try-with-resources since ChunkScannerContext owns the table lifecycle
-    Table hTable =
-        conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(physicalTableName);
+    PhoenixConnection phoenixConn = conn.unwrap(PhoenixConnection.class);
+    Table hTable = phoenixConn.getQueryServices().getTable(physicalTableName);
     Scan scan =
         createChunkScan(startKey, endKey, isStartKeyInclusive, isEndKeyInclusive, isTargetScan);
     scan.setAttribute(BaseScannerRegionObserverConstants.SYNC_TABLE_CHUNK_FORMATION, TRUE_BYTES);
     scan.setAttribute(BaseScannerRegionObserverConstants.SKIP_REGION_BOUNDARY_CHECK, TRUE_BYTES);
     scan.setAttribute(BaseScannerRegionObserverConstants.UNGROUPED_AGG, TRUE_BYTES);
+    ScanUtil.setScanAttributesForPhoenixTTL(scan, pTable, phoenixConn);
Review Comment:
Shouldn't expired rows get skipped even in non-strict mode? Perhaps we
should override the table attribute?
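As a sketch of the precedence such an override might give (entirely hypothetical: the attribute name `FORCE_STRICT_TTL` and the per-scan-wins-over-table-default semantics are assumptions, modeled with a plain map rather than an HBase `Scan`):

```java
import java.util.HashMap;
import java.util.Map;

public class StrictTtlOverrideSketch {
    // Models the proposed precedence: a per-scan override, when present,
    // wins over the table-level IS_STRICT_TTL default.
    static boolean effectiveStrictTtl(Map<String, byte[]> scanAttributes,
                                      boolean tableIsStrictTtl) {
        // Hypothetical attribute name; not a real Phoenix constant.
        byte[] override = scanAttributes.get("FORCE_STRICT_TTL");
        return override != null || tableIsStrictTtl;
    }

    public static void main(String[] args) {
        Map<String, byte[]> attrs = new HashMap<>();
        // Non-strict table, no override: expired rows would not be skipped.
        System.out.println(effectiveStrictTtl(attrs, false)); // false
        // The sync scan sets the override, so expired rows are skipped regardless.
        attrs.put("FORCE_STRICT_TTL", new byte[] { 1 });
        System.out.println(effectiveStrictTtl(attrs, false)); // true
    }
}
```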
##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/PhoenixSyncTableToolIT.java:
##########
@@ -179,7 +204,41 @@ public void testSyncTableWithDeletedRowsOnTarget() throws Exception {
     Job job = runSyncTool(uniqueTableName);
     SyncCountersResult counters = getSyncCounters(job);
-    validateSyncCounters(counters, 10, 10, 7, 3);
+    validateSyncCounters(counters, 10, 7, 7, 3);
+    validateMapperCounters(counters, 1, 3);
+  }
+
+  @Test
+  public void testSyncTableWithConditionalTTLExpiredRows() throws Exception {
+    String ddl = "CREATE TABLE IF NOT EXISTS %s (" + "ID INTEGER NOT NULL PRIMARY KEY, "
+        + "NAME VARCHAR(50), NAME_VALUE BIGINT, UPDATED_DATE TIMESTAMP, " + "EXPIRED BOOLEAN"
+        + ") REPLICATION_SCOPE=%d, UPDATE_CACHE_FREQUENCY=0, TTL='EXPIRED = TRUE' "
+        + "SPLIT ON (3, 5, 7)";
+    executeTableCreation(sourceConnection, String.format(ddl, uniqueTableName, 1));
+    executeTableCreation(targetConnection, String.format(ddl, uniqueTableName, 0));
+
+    // Insert 10 rows on source: rows 1-3 marked as expired, rows 4-10 as live
+    insertTestDataWithExpiredFlag(sourceConnection, uniqueTableName, 1, 3, true);
+    insertTestDataWithExpiredFlag(sourceConnection, uniqueTableName, 4, 10, false);
+
+    // Only non-expired rows (4-10) are visible via Phoenix queries
+    waitForReplication(targetConnection, uniqueTableName, 7);
+
+    int sourceCount = getRowCount(sourceConnection, uniqueTableName);
+    int targetCount = getRowCount(targetConnection, uniqueTableName);
Review Comment:
This doesn't prove the expired rows were actually skipped, since the validation queries would have skipped them too. We could perhaps alter the attribute before running the verification queries, though I'm not sure.
Alternatively, if we always run in strict mode as I suggested above, we can create the table with `IS_STRICT_TTL=false` and we don't need to do anything extra here to verify.
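For the second option, the only change to the IT would be appending the property to the table options. A minimal sketch of the adjusted DDL (the helper name `buildDdl` and the exact placement of `IS_STRICT_TTL=false` in the options list are assumptions; the property name comes from this comment):

```java
public class SyncTableTtlDdl {
    // Same DDL as in the test, with IS_STRICT_TTL=false added so that
    // Phoenix queries still return conditionally-expired rows and the
    // row-count verification needs no extra scan-attribute tweaking.
    static String buildDdl(String tableName, int replicationScope) {
        String ddl = "CREATE TABLE IF NOT EXISTS %s ("
            + "ID INTEGER NOT NULL PRIMARY KEY, "
            + "NAME VARCHAR(50), NAME_VALUE BIGINT, UPDATED_DATE TIMESTAMP, EXPIRED BOOLEAN"
            + ") REPLICATION_SCOPE=%d, UPDATE_CACHE_FREQUENCY=0, "
            + "TTL='EXPIRED = TRUE', IS_STRICT_TTL=false "
            + "SPLIT ON (3, 5, 7)";
        return String.format(ddl, tableName, replicationScope);
    }

    public static void main(String[] args) {
        System.out.println(buildDdl("SYNC_TTL_TEST", 1));
    }
}
```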
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]