[ https://issues.apache.org/jira/browse/PHOENIX-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807903#comment-17807903 ]
ASF GitHub Bot commented on PHOENIX-7106:
-----------------------------------------

virajjasani commented on code in PR #1736:
URL: https://github.com/apache/phoenix/pull/1736#discussion_r1456518028


##########
phoenix-core-client/src/main/java/org/apache/phoenix/iterate/OffsetResultIterator.java:
##########

@@ -32,32 +34,49 @@
  */
 public class OffsetResultIterator extends DelegateResultIterator {
     private int rowCount;
-    private int offset;
+    private final int offset;
+    private Tuple lastScannedTuple;
     private long pageSizeMs = Long.MAX_VALUE;
+    private boolean isIncompatibleClient = false;

     public OffsetResultIterator(ResultIterator delegate, Integer offset) {
         super(delegate);
         this.offset = offset == null ? -1 : offset;
+        this.lastScannedTuple = null;
     }

-    public OffsetResultIterator(ResultIterator delegate, Integer offset, long pageSizeMs) {
+    public OffsetResultIterator(ResultIterator delegate, Integer offset, long pageSizeMs,
+            boolean isIncompatibleClient) {
         this(delegate, offset);
         this.pageSizeMs = pageSizeMs;
+        this.isIncompatibleClient = isIncompatibleClient;
     }

+    @Override
     public Tuple next() throws SQLException {
+        long startTime = EnvironmentEdgeManager.currentTimeMillis();
         while (rowCount < offset) {
             Tuple tuple = super.next();
-            if (tuple == null) { return null; }
+            if (tuple == null) {
+                return null;
+            }
             if (tuple.size() == 0 || isDummy(tuple)) {
                 // while rowCount < offset absorb the dummy and call next on the underlying scanner
                 continue;
             }
             rowCount++;
-            // no page timeout check at this level because we cannot correctly resume
-            // scans for OFFSET queries until the offset is reached
+            lastScannedTuple = tuple;
+            if (!isIncompatibleClient) {
+                if (EnvironmentEdgeManager.currentTimeMillis() - startTime >= pageSizeMs) {

Review Comment:
   I am not able to reproduce any issue with this. OffsetScanner will return a valid tuple with cell CF:CQ `f_offset:c_offset`, whose cell value will be used by SerialIterators to set the offset on the next scan:
   ```
   while (index < scans.size()) {
       Scan currentScan = scans.get(index++);
       if (remainingOffset != null) { // <======= set offset value
           currentScan.setAttribute(BaseScannerRegionObserverConstants.SCAN_OFFSET,
               PInteger.INSTANCE.toBytes(remainingOffset));
       }
       ScanMetricsHolder scanMetricsHolder = ScanMetricsHolder.getInstance(readMetrics, tableName,
           currentScan, context.getConnection().getLogLevel());
       TableResultIterator itr = new TableResultIterator(mutationState, currentScan,
           scanMetricsHolder, renewLeaseThreshold, plan, scanGrouper, caches, maxQueryEndTime);
       PeekingResultIterator peekingItr =
           iteratorFactory.newIterator(context, itr, currentScan, tableName, plan);
       Tuple tuple;
       if ((tuple = peekingItr.peek()) == null) {
           peekingItr.close();
           continue;
       // <======== retrieve offset value from server
       } else if ((remainingOffset = QueryUtil.getRemainingOffset(tuple)) != null) {
           peekingItr.next();
           peekingItr.close();
           continue;
       }
       context.getConnection().addIteratorForLeaseRenewal(itr);
       return peekingItr;
   }
   ```
   Hence, depending on the cell returned by the server, the next scan's offset attribute will be set and the new scanner will pick up that value. Basically, I will have to remove the page timeout logic from the offset iterator because it is tricky to manage returning a dummy, and if we do that, we will still ensure that the server returns a valid offset value to the client and that the client checks whether more serial iterations are needed based on how many rows were scanned. The logic remains the same even if the region moves after returning a valid row. The problem happens only when the region moves after a dummy is returned, so we will remove the page timeout for the offset scanner.
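To make the offset handshake concrete, here is a minimal, hypothetical sketch of the two halves of that exchange: the server emits the remaining offset as a single cell with CF:CQ `f_offset:c_offset`, and the client decodes it before setting the offset attribute on the next serial scan. The class and method names are invented for illustration, and the value encoding is simplified to HBase `Bytes`; the actual Phoenix code uses its own constants and `PInteger` serialization, as shown in the snippet above.

```java
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellBuilderFactory;
import org.apache.hadoop.hbase.CellBuilderType;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper, not part of Phoenix: illustrates the "remaining offset" handshake only.
public final class RemainingOffsetCellSketch {

    // CF:CQ names taken from the review comment above.
    private static final byte[] OFFSET_FAMILY = Bytes.toBytes("f_offset");
    private static final byte[] OFFSET_QUALIFIER = Bytes.toBytes("c_offset");

    /**
     * Server side: build the single-cell row that reports how many rows of the OFFSET are still
     * to be skipped. The rowkey must stay within the scan boundaries so the HBase client does not
     * misinterpret the result.
     */
    static List<Cell> encodeRemainingOffset(byte[] rowKeyInScanRange, int remainingOffset) {
        Cell cell = CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY)
                .setRow(rowKeyInScanRange)
                .setFamily(OFFSET_FAMILY)
                .setQualifier(OFFSET_QUALIFIER)
                .setTimestamp(System.currentTimeMillis())
                .setType(Cell.Type.Put)
                .setValue(Bytes.toBytes(remainingOffset)) // Phoenix itself uses PInteger encoding
                .build();
        return Collections.singletonList(cell);
    }

    /**
     * Client side: if the peeked cell is the offset marker, return the remaining offset so the
     * caller can set it as the offset attribute on the next serial scan; otherwise return null.
     */
    static Integer decodeRemainingOffset(Cell cell) {
        boolean isOffsetCell = Bytes.equals(
                cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
                OFFSET_FAMILY, 0, OFFSET_FAMILY.length)
            && Bytes.equals(
                cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
                OFFSET_QUALIFIER, 0, OFFSET_QUALIFIER.length);
        if (isOffsetCell) {
            return Bytes.toInt(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
        }
        return null;
    }
}
```

Under a scheme like this, the client can only resume correctly once a real (non-dummy) cell carrying the offset has been returned, which is why the review concludes that the page timeout cannot be honored while the offset is still being consumed.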
> Data Integrity issues due to invalid rowkeys returned by various coprocessors
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7106
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7106
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.2.0, 5.1.4
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Blocker
>             Fix For: 5.2.0, 5.1.4
>
>
> The HBase scanner interface expects the server to scan cells from the HFile or block cache and return consistent data, i.e. the rowkeys of the returned cells should stay within the scan boundaries. When a region moves and the scanner needs to be reset, or when the current row is too large and the server returns a partial row, the subsequent scanner#next is supposed to return the remaining cells. When this happens, the cell rowkeys returned by the server, i.e. by any coprocessors, are expected to be within the scan boundary range so that the server can reliably perform its validation and return the remaining cells as expected.
> The Phoenix client initiates serial or parallel scans from the aggregators based on the region boundaries, and the scan boundaries are sometimes adjusted based on optimizer-provided key ranges to include tenant boundaries, salt boundaries, etc. After the client opens the scanner and performs the scan operation, some of the coprocs return an invalid rowkey in the following cases:
> # Grouped aggregate queries
> # Some ungrouped aggregate queries
> # Offset queries
> # Dummy cells returned with empty rowkey
> # Update statistics queries
> # Uncovered index queries
> # Ordered results at server side
> # ORDER BY DESC on rowkey
> # Global index read-repair
> # Paging region scanner with HBase scanner reopen
> # ORDER BY on non-pk column(s) with/without paging
> # GROUP BY on non-pk column(s) with/without paging
> Since many of these cases return reserved rowkeys, they are likely not going to match the scan or region boundaries. This has the potential to cause data integrity issues in certain scenarios, as explained above. An empty rowkey returned by the server can be treated by the HBase client as the end of the region scan.
> With the paging feature enabled, if the page size is kept low, there is a higher chance of scanners returning dummy cells, resulting in an increased number of RPC calls in exchange for better latency and avoidance of timeouts. We should return only valid rowkeys within the scan range for all the cases where we perform the above-mentioned operations, like complex aggregate or offset queries, etc.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)