[ https://issues.apache.org/jira/browse/HBASE-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169381#comment-15169381 ]
Phil Yang commented on HBASE-15325:
-----------------------------------

[~tedyu] Thank you for reviewing.

{quote}
What if allResultSkipped is true and the original condition, doneWithRegion(remainingResultSize, countdown, serverHasMoreResults) && (!partialResults.isEmpty() || possiblyNextScanner(countdown, values == null)), is false?
{quote}

There is an assumption that if the cache is still empty after loadCache(), the scan is done. Before this patch, whenever we evaluate this do-while condition, there are two possible states. Either this region is done, and we need to continue the do-while loop and scan the next region until we have enough results to satisfy the caching setting or maxScannerResultSize; or this region is not done, in which case we must have loaded some data into the cache, so we can exit the loop.

However, in the second state, if we skip all cells because they have been seen before, the cache is still empty. We cannot exit the loop; we must scan this region one more time. That is why we need the allResultSkipped flag.

{quote}
Why is the above assignment needed? rs and r reference the same Result, right?
{quote}

My fault. I thought we should use the original size of the Result as estimatedHeapSizeOfResult. However, loadCache() should only finish when the cache's size is larger than maxScannerResultSize, when we have enough results for caching, or when the scan is done. So if we skip some cells because we have already seen them, we should set estimatedHeapSizeOfResult to the size of the filtered Result.
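To make the allResultSkipped reasoning concrete, here is a minimal, self-contained Java sketch under simplifying assumptions. A single row is modeled as a list of cell strings, one scan RPC is modeled by a hypothetical fetch() that returns a batch of cells, and after the region moves the reopened scanner is assumed to restart at the beginning of the current row, so the client must drop the cells it already returned. The names fetch, loadCacheAfterReopen, seenInRow and useFlag are illustrative only; they are not part of the HBase client API or of the attached patch. The sketch only shows why an empty cache after skipping duplicates must not be read as the end of the scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the client-side loop discussed above.
// None of these methods are HBase APIs; cells are plain "row:qualifier" strings.
public class PartialScanSketch {

    // Stand-in for one scan RPC: returns up to `batch` cells of the row starting at `from`.
    static List<String> fetch(List<String> rowCells, int from, int batch) {
        int to = Math.min(from + batch, rowCells.size());
        return new ArrayList<>(rowCells.subList(from, to));
    }

    // Assumption: after a region move, the new scanner restarts at the beginning of the
    // current row, so the client drops the first `seenInRow` cells it already returned.
    // With useFlag == false the loop stops after one RPC, mimicking the assumption that
    // an empty cache means the scan is done.
    static List<String> loadCacheAfterReopen(List<String> rowCells, int seenInRow, int batch,
                                             boolean useFlag) {
        List<String> cache = new ArrayList<>();
        int serverPos = 0;  // position of the reopened server-side scanner within the row
        int skipped = 0;    // duplicates dropped so far
        boolean allResultSkipped;
        do {
            List<String> values = fetch(rowCells, serverPos, batch);
            serverPos += values.size();
            List<String> filtered = new ArrayList<>();
            for (String cell : values) {
                if (skipped < seenInRow) {
                    skipped++;          // already returned before the region moved
                } else {
                    filtered.add(cell);
                }
            }
            // Every cell of this RPC was a duplicate, so the cache gained nothing.
            allResultSkipped = !values.isEmpty() && filtered.isEmpty();
            cache.addAll(filtered);
        } while (useFlag && allResultSkipped && serverPos < rowCells.size());
        return cache;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("row1:a", "row1:b", "row1:c", "row1:d", "row1:e");
        // The client had already seen cells a and b (batch size 2) when the region moved.
        System.out.println(loadCacheAfterReopen(row, 2, 2, false)); // prints []
        System.out.println(loadCacheAfterReopen(row, 2, 2, true));  // prints [row1:c, row1:d]
    }
}
```

Without the flag, the first RPC after the reopen returns only duplicates and the caller sees an empty cache, which it would treat as the end of the scan; with the flag, the loop scans once more and recovers cells c and d. The sketch ignores region boundaries, caching and maxScannerResultSize; any size accounting in such a loop would count the filtered cells rather than the raw RPC result, in line with the second answer above.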
> ResultScanner allowing partial result will miss the rest of the row if the region is moved between two rpc requests
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-15325
> URL: https://issues.apache.org/jira/browse/HBASE-15325
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Attachments: 15325-test.txt, HBASE-15325-v1.txt
>
>
> HBASE-11544 allows a scan RPC to return only part of a row, to reduce memory usage for a single RPC request. The client can call setAllowPartialResults or setBatch to receive several cells of a row at a time instead of the whole row.
> However, the scanner's state is kept on the server, and that state is needed to fetch the next part of a row after a partial result. If the region is moved to another RS, the client gets a NotServingRegionException and opens a new scanner on the new RS, which is treated as a new scan starting after the end of the current row. As a result, the remaining cells of the row containing the last returned result are missed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)