[ 
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368375#comment-14368375
 ] 

Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------

bq. The client ultimately requests the server return a batch of size 
'hbase.client.scanner.max.result.size' and then believe that the server 
returned less data than that limit.

Exactly correct. The client looks at the Results returned from the server and 
from its point of view it sees that neither the maxResultSize nor the caching 
limit has been reached. The only explanation it can come up with as to why the 
server would return these Results is that it must have exhausted the region 
(otherwise it has no reason to stop accumulating Results). But the server 
stopped because from its PoV the size limit was reached. There is a 
miscommunication.
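
To make the miscommunication concrete, here is a minimal, self-contained 
sketch. It is not the actual HBase client or server code: the class, the 
method names, and the per-row sizes are all hypothetical, and the only 
assumption carried over from the discussion is that the server's size estimate 
for a row comes out slightly larger than the client's.

```java
// Hypothetical simulation of the disagreement; not HBase code.
class ScanLimitMismatch {

    // Server side: keep accumulating rows until the region runs out,
    // the caching (row count) limit is hit, or the size limit is hit.
    static int serverBatchSize(int rowsLeftInRegion, long serverRowSize,
                               long maxResultSize, int caching) {
        long accumulated = 0;
        int returned = 0;
        while (returned < rowsLeftInRegion
                && returned < caching
                && accumulated < maxResultSize) {
            accumulated += serverRowSize;
            returned++;
        }
        return returned;
    }

    // Client side: re-estimate the batch size locally. If neither limit
    // appears to have been reached, conclude the region is exhausted.
    static boolean clientAssumesRegionExhausted(int returned, long clientRowSize,
                                                long maxResultSize, int caching) {
        long clientTotal = returned * clientRowSize;
        return clientTotal < maxResultSize && returned < caching;
    }

    public static void main(String[] args) {
        // Made-up numbers: 50 rows remain, the server estimates 110 bytes
        // per row, the client estimates 96 bytes for the same row.
        int returned = serverBatchSize(50, 110, 1000, 100);
        boolean skip = clientAssumesRegionExhausted(returned, 96, 1000, 100);
        System.out.println("rows returned: " + returned);
        System.out.println("client moves to next region: " + skip);
    }
}
```

With these made-up numbers the server stops on its size limit while rows 
remain in the region, yet the client's own estimate stays under the limit, so 
it would move on to the next region and skip the remaining rows.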

bq. I still don't completely understand what is causing the difference on the 
server-side in the first place (over 0.98)

Ya, it's a little cryptic because the exact same function is used to calculate 
the size server side and client side. I would recommend adding some logs that 
allow you to see the estimatedHeapSize of a cell server side versus client 
side and see where they differ. My guess would be that somehow the Cell on the 
client side returns a slightly lower heap size estimation than the SAME Cell on 
the server (I don't believe it's related to the NextState size bubbling up 
since NextState is only in branch-1+ and the issue is branch-1.0+). Maybe the 
Cells/Results are serialized in such a way that these calculations are slightly 
different? Somehow the server's size calculation is larger than the client's 
size calculation.

However, even when we do understand why the server's size calculation is 
different from the client's it may not help (of course we can only know once 
the issue has been identified). Like you said, the underlying problem is that 
the client shouldn't even be performing a size calculation; it should instead 
be told by the server why the Results were returned. As long as there is a 
possibility for the server and client to disagree on why the Results were 
returned, it is possible to incorrectly jump between regions. Fixing the size 
calculation may be sufficient for resolving this issue, but going forward I 
think your idea of passing information back to the client in the ScanResult 
will be the best way to go.
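
As a rough illustration of that idea, the sketch below has the server attach 
an explicit reason to each batch so the client never has to recompute sizes. 
Everything here is hypothetical: the StopReason enum, the Batch class, and the 
method names are invented for the example and are not the NextState$State enum 
or the actual ScanResult/ScanResponse fields.

```java
// Hypothetical sketch of a server-reported stop reason; not HBase code.
class ScanResponseSketch {

    // Invented reasons a server might stop filling a batch.
    enum StopReason { SIZE_LIMIT_REACHED, CACHING_LIMIT_REACHED, REGION_EXHAUSTED }

    // Invented response shape: the row count plus the server's reason.
    static final class Batch {
        final int rowCount;
        final StopReason reason;

        Batch(int rowCount, StopReason reason) {
            this.rowCount = rowCount;
            this.reason = reason;
        }
    }

    // Server side: record why accumulation stopped instead of leaving
    // the client to infer it from the size of the returned batch.
    static Batch serverNext(int rowsLeftInRegion, long serverRowSize,
                            long maxResultSize, int caching) {
        long accumulated = 0;
        int returned = 0;
        while (returned < rowsLeftInRegion) {
            if (returned >= caching) {
                return new Batch(returned, StopReason.CACHING_LIMIT_REACHED);
            }
            if (accumulated >= maxResultSize) {
                return new Batch(returned, StopReason.SIZE_LIMIT_REACHED);
            }
            accumulated += serverRowSize;
            returned++;
        }
        return new Batch(returned, StopReason.REGION_EXHAUSTED);
    }

    // Client side: no size arithmetic, just trust the reported reason.
    static boolean clientShouldMoveToNextRegion(Batch batch) {
        return batch.reason == StopReason.REGION_EXHAUSTED;
    }
}
```

Because the client only inspects the reported reason, a difference between 
the two sides' size estimates can no longer cause it to leave a region early.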

bq. Ultimately, the underlying problem is likely best addressed from the stance 
that a scanner shouldn't be performing special logic based on the size of the 
batch of data returned from a server

Agreed

bq. The server already maintains a nice enum of the reason which it returns a 
batch of results to a client via NextState$State

Just a note: NextState was introduced with HBASE-11544 which has only been 
backported to branch-1+ at this point. Since this issue appears in branch-1.0+, 
returning the NextState$State enum would require backporting that feature 
further. 

bq. I'm currently of the opinion that it's ideal to pass this information back 
to the client via the ScanResult 

I agree that somehow we need to communicate the reasoning behind why these 
Results were returned to the client rather than looking at the Result[] and 
making an "educated" guess.

bq. 0.98 clients running against 1.x could see this problem, although I have 
not tested that to confirm it happens.

I suspect you're correct.


> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), 
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of 
> the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was 
> running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
>  for the curious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
