[ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368375#comment-14368375 ]
Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------

bq. The client ultimately requests the server return a batch of size 'hbase.client.scanner.max.result.size' and then believe that the server returned less data than that limit.

Exactly correct. The client looks at the Results returned from the server and, from its point of view, sees that neither the maxResultSize nor the caching limit has been reached. The only explanation it can come up with for why the server would return these Results is that it must have exhausted the region (otherwise it has no reason to stop accumulating Results). But the server stopped because, from its PoV, the size limit was reached. There is a miscommunication.

bq. I still don't completely understand what is causing the difference on the server-side in the first place (over 0.98)

Ya, it's a little cryptic because the exact same function is used to calculate the size server side and client side. I would recommend adding some logging that lets you see the estimatedHeapSize of a cell server side versus client side and see where they differ. My guess would be that somehow the Cell on the client side returns a slightly lower heap size estimation than the SAME Cell on the server (I don't believe it's related to the NextState size bubbling up, since NextState is only in branch-1+ and the issue is in branch-1.0+). Maybe the Cells/Results are serialized in such a way that these calculations are slightly different? Somehow the server's size calculation is larger than the client's size calculation.

However, even once we understand why the server's size calculation differs from the client's, it may not help (of course, we can only know once the issue has been identified). Like you said, the underlying problem is that the client shouldn't be performing a size calculation at all; it should instead be told by the server why the Results were returned.
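
To make the miscommunication concrete, here is a minimal, self-contained sketch. It is not the actual HBase client or server code: the class, the constants, the per-row sizes, and the {{moreResultsInRegion}} flag are all hypothetical names chosen for illustration. It assumes only what is described above: the server stops when its own size estimate reaches the limit, the client's estimate for the same rows is slightly lower, and the client treats "neither limit reached" as "region exhausted".

```java
import java.util.ArrayList;
import java.util.List;

public class ScanBoundarySketch {
    static final int REGIONS = 3;
    static final int ROWS_PER_REGION = 100;
    static final int TOTAL_ROWS = REGIONS * ROWS_PER_REGION;
    static final long MAX_RESULT_SIZE = 1000; // stand-in for hbase.client.scanner.max.result.size
    static final int CACHING = 1000;          // row-count limit, never reached in this sketch
    static final long SERVER_ROW_SIZE = 100;  // hypothetical server-side size estimate per row
    static final long CLIENT_ROW_SIZE = 90;   // hypothetical, slightly lower client-side estimate

    /** One response: the rows returned plus the server's own view of the region (hypothetical flag). */
    static final class Batch {
        final List<Integer> rows = new ArrayList<>();
        boolean moreResultsInRegion;
    }

    /** Server side: accumulate rows until the server's size estimate hits the limit or the region ends. */
    static Batch next(int startRow) {
        Batch batch = new Batch();
        long size = 0;
        int row = startRow;
        while (row < ROWS_PER_REGION && size < MAX_RESULT_SIZE) {
            batch.rows.add(row++);
            size += SERVER_ROW_SIZE;
        }
        batch.moreResultsInRegion = row < ROWS_PER_REGION;
        return batch;
    }

    /** Client side: scan every region, deciding after each batch whether to move to the next region. */
    static int rowsSeen(boolean trustServerFlag) {
        int seen = 0;
        for (int region = 0; region < REGIONS; region++) {
            int nextRow = 0;
            while (true) {
                Batch batch = next(nextRow);
                seen += batch.rows.size();
                nextRow += batch.rows.size();
                boolean regionDone;
                if (trustServerFlag) {
                    // The server states why it stopped; no guessing on the client.
                    regionDone = !batch.moreResultsInRegion;
                } else {
                    // The client recomputes the size and guesses: neither limit reached
                    // must mean the region is exhausted.
                    long clientSize = batch.rows.size() * CLIENT_ROW_SIZE;
                    regionDone = clientSize < MAX_RESULT_SIZE && batch.rows.size() < CACHING;
                }
                if (regionDone) {
                    break;
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("client guesses:      " + rowsSeen(false) + " of " + TOTAL_ROWS);
        System.out.println("server tells client: " + rowsSeen(true) + " of " + TOTAL_ROWS);
    }
}
```

With the guessing heuristic, the client leaves each region after the first batch, because its own estimate for that batch falls just under the limit; with the explicit flag it reads every row. The exact numbers depend entirely on the made-up sizes above and say nothing about the real size difference.
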
As long as there is a possibility for the server and client to disagree on why the Results were returned, it is possible to incorrectly jump between regions. Fixing the size calculation may be sufficient for resolving this issue, but going forward I think your idea of passing information back to the client in the ScanResult will be the best way to go.

bq. Ultimately, the underlying problem is likely best addressed from the stance that a scanner shouldn't be performing special logic based on the size of the batch of data returned from a server

Agreed.

bq. The server already maintains a nice enum of the reason which it returns a batch of results to a client via NextState$State

Just a note: NextState was introduced with HBASE-11544, which has only been backported to branch-1+ at this point. Since this issue appears in branch-1.0+, returning the NextState$State enum would require backporting that feature further.

bq. I'm currently of the opinion that it's ideal to pass this information back to the client via the ScanResult

I agree that we need to communicate to the client the reason these Results were returned, rather than having it look at the Result[] and make an "educated" guess.

bq. 0.98 clients running against 1.x could see this problem, although I have not tested that to confirm it happens.

I suspect you're correct.

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0, returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)