bbeaudreault commented on code in PR #5713: URL: https://github.com/apache/hbase/pull/5713#discussion_r1518989975
########## hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java:
##########
@@ -158,24 +193,60 @@ public void addMutation(final Mutation mutation) {
    * Update estimate quota(read/write size/capacityUnits) which will be consumed
    * @param numWrites the number of write requests
    * @param numReads  the number of read requests
-   * @param numScans  the number of scan requests
    */
-  protected void updateEstimateConsumeQuota(int numWrites, int numReads, int numScans) {
+  protected void updateEstimateConsumeBatchQuota(int numWrites, int numReads) {
     writeConsumed = estimateConsume(OperationType.MUTATE, numWrites, 100);
     if (useResultSizeBytes) {
       readConsumed = estimateConsume(OperationType.GET, numReads, 100);
-      readConsumed += estimateConsume(OperationType.SCAN, numScans, 1000);
     } else {
       // assume 1 block required for reads. this is probably a low estimate, which is okay
       readConsumed = numReads > 0 ? blockSizeBytes : 0;
-      readConsumed += numScans > 0 ? blockSizeBytes : 0;
     }
     writeCapacityUnitConsumed = calculateWriteCapacityUnit(writeConsumed);
     readCapacityUnitConsumed = calculateReadCapacityUnit(readConsumed);
   }

+  /**
+   * Update estimate quota(read/write size/capacityUnits) which will be consumed
+   * @param scanRequest          the scan to be executed
+   * @param maxScannerResultSize the maximum bytes to be returned by the scanner
+   * @param maxBlockBytesScanned the maximum bytes scanned in a single RPC call by the scanner
+   */
+  protected void updateEstimateConsumeScanQuota(ClientProtos.ScanRequest scanRequest,
+    long maxScannerResultSize, long maxBlockBytesScanned) {
+    if (useResultSizeBytes) {
+      readConsumed = estimateConsume(OperationType.GET, 1, 1000);
+    } else {
+      /*
+       * Estimating scan workload is more complicated, and if we severely underestimate workloads
+       * then throttled clients will exhaust retries too quickly, and could saturate the RPC layer.
+       * We have access to the ScanRequest's nextCallSeq number, the maxScannerResultSize, and the
+       * maxBlockBytesScanned by every relevant Scanner#next call. With these inputs we can make a
+       * more informed estimate about the scan's workload.
+       */
+      long estimate;
+      if (scanRequest.getNextCallSeq() == 0) {
+        // start scanners with an optimistic 1 block IO estimate
+        // it is better to underestimate a large scan in the beginning
+        // than to overestimate, and block, a small scan
+        estimate = blockSizeBytes;
+      } else {
+        // scanner result sizes will be limited by quota availability, regardless of
+        // maxScannerResultSize. This means that we cannot safely assume that a long-running
+        // scan with a small maxBlockBytesScanned would not prefer to pull down
+        // a larger payload. So we should estimate with the assumption that long-running scans
+        // are appropriately configured to approach their maxScannerResultSize per RPC call
+        estimate =

Review Comment:
I was describing what I thought the current implementation does. It seems like every new scan will increase the estimate until we hit the server-configured max. But I'd like to stop increasing once the actual scanned amount stops exceeding the previous estimate. Imagine the following 4 next calls:

Estimate = 64k; Scanned = 64k
Estimate = 128k; Scanned = 128k
Estimate = 256k; Scanned = 128k
Estimate = 384k; Scanned = 128k

So the actually scanned amount has leveled out, but the estimate keeps increasing. Ideally it'd be like this instead:

Estimate = 64k; Scanned = 64k
Estimate = 128k; Scanned = 128k
Estimate = 256k; Scanned = 128k
Estimate = **128k**; Scanned = 128k

Or something else that doesn't keep increasing. Of course the impact here is not huge yet, but for larger values it seems to matter a bit more. Unless I'm missing something.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
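One way to get the "something else that doesn't keep increasing" behavior the reviewer describes is to derive the next estimate from what the scan actually consumed on the previous call, rather than growing the previous estimate unconditionally. The sketch below is a hypothetical illustration of that policy, not the actual `DefaultOperationQuota` code; the class and method names (`QuotaEstimateSketch`, `nextEstimate`) and the specific "2x last scanned, floored at block size, capped at maxScannerResultSize" rule are assumptions:

```java
// Hypothetical sketch: budget the next Scanner#next call at twice what the scan
// actually consumed last time, floored at one block and capped at the
// server-configured maxScannerResultSize. Because the estimate is driven by the
// observed workload, it levels out (here at 256k) instead of drifting upward
// (64k -> 128k -> 256k -> 384k) when the scanned amount has stopped growing.
public class QuotaEstimateSketch {

  /**
   * @param lastBlockBytesScanned bytes actually scanned by the previous next() call
   * @param blockSizeBytes        HFile block size, the floor for any estimate
   * @param maxScannerResultSize  server-configured ceiling for the estimate
   */
  static long nextEstimate(long lastBlockBytesScanned, long blockSizeBytes,
      long maxScannerResultSize) {
    // 2x headroom lets a growing scan keep doubling, but a leveled-out scan
    // stops inflating its own budget
    long headroom = 2 * lastBlockBytesScanned;
    return Math.min(maxScannerResultSize, Math.max(blockSizeBytes, headroom));
  }

  public static void main(String[] args) {
    long block = 64 * 1024;      // assumed 64k block size
    long cap = 1024 * 1024;      // assumed 1m maxScannerResultSize
    long estimate = block;       // optimistic 1-block estimate for the first call
    long[] scanned = { 64 * 1024, 128 * 1024, 128 * 1024, 128 * 1024 };
    for (long s : scanned) {
      System.out.println("Estimate = " + estimate + "; Scanned = " + s);
      estimate = nextEstimate(s, block, cap);
    }
  }
}
```

This settles at 256k (2x the leveled-out 128k workload) rather than the reviewer's exact 128k ideal, but it satisfies the stated goal: the estimate stops increasing once the scanned amount stops increasing, while still leaving headroom for a scan that is genuinely growing.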