[jira] [Comment Edited] (HBASE-15484) Correct the semantic of batch and partial

Phil Yang (JIRA) Fri, 11 Nov 2016 00:44:38 -0800

    [ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656513#comment-15656513 ]


Phil Yang edited comment on HBASE-15484 at 11/11/16 8:43 AM:
-------------------------------------------------------------

For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time 
limit is more direct for users than setCaching, because users usually call 
setCaching only because they want to limit size/time. Also, caching now 
defaults to Integer.MAX_VALUE.
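To make this concrete, here is a simplified, hypothetical model (not HBase code) of how one scan RPC response might be filled. Rows are added until either the caching row count or a byte budget is reached. It assumes the byte limit is soft, so the row that crosses it is still included. The class and method names are made up for illustration. With caching left at Integer.MAX_VALUE, only the size limit decides how much comes back in one call.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of how a server fills one scan RPC response.
 * Rows are represented only by their sizes in bytes; this is not HBase code.
 */
final class ScanResponseModel {
    private ScanResponseModel() {}

    /**
     * Returns how many rows go into one response.
     *
     * @param rowSizes      size in bytes of each remaining row, in scan order
     * @param caching       max rows per RPC (Integer.MAX_VALUE means "no row cap")
     * @param maxResultSize byte budget per RPC (assumed to be a soft limit)
     */
    static int rowsInOneResponse(List<Long> rowSizes, int caching, long maxResultSize) {
        long accumulated = 0;
        int rows = 0;
        for (long size : rowSizes) {
            // Stop once either the row cap or the byte budget has been reached.
            if (rows >= caching || accumulated >= maxResultSize) {
                break;
            }
            accumulated += size; // the row that crosses the budget is still included
            rows++;
        }
        return rows;
    }
}
```

A time limit would work the same way, with a deadline check next to the size check.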

Paging at the cell level is a plausible use case. It is different from the 
"limit" Duo mentioned: a limit means we can stop and close the scanner, whereas 
batch means we should pause and wait for the next call. Since we have size/time 
limits on the server side, a large row will not cause an OOM on the server even 
if users do not call setBatch. If users really need setBatch to limit the 
maximum number of cells returned in one Result, I think we can keep the setBatch 
interface but make it client-only logic. The server would only consider the 
size/time limits, and if it returns more than batch cells, could we cache the 
rest of them on the client? With this change, we can reduce the number of RPC 
requests without the risk of OOM or timeouts.
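A minimal sketch of the client-only setBatch idea. It assumes the server returns whatever its size/time limits allow, and the client slices the cached cells into Results of at most `batch` cells, never mixing two rows in one Result. `Cell` and `ClientSideBatcher` are hypothetical stand-ins, not the real HBase classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for an HBase cell: only the row key and qualifier. */
final class Cell {
    final String row;
    final String qualifier;

    Cell(String row, String qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }
}

/**
 * Hypothetical client-side batching: the server sends as many cells as its
 * size/time limits allow; the client caches them and hands out at most
 * `batch` cells of a single row per next() call.
 */
final class ClientSideBatcher {
    private final int batch;
    private final Deque<Cell> cache = new ArrayDeque<>();

    ClientSideBatcher(int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        this.batch = batch;
    }

    /** Called when an RPC response arrives; keeps every cell it carries. */
    void onServerResponse(List<Cell> cells) {
        cache.addAll(cells);
    }

    /** True if the next call can be served without another RPC. */
    boolean hasCached() {
        return !cache.isEmpty();
    }

    /**
     * Returns up to `batch` cached cells, all from the same row.
     * An empty list means the cache is drained and a new RPC is needed.
     */
    List<Cell> next() {
        List<Cell> result = new ArrayList<>();
        if (cache.isEmpty()) {
            return result;
        }
        String row = cache.peekFirst().row;
        while (result.size() < batch && !cache.isEmpty()
                && cache.peekFirst().row.equals(row)) {
            result.add(cache.pollFirst());
        }
        return result;
    }
}
```

Because the remainder of a large response stays in the client-side cache, several next() calls can be served from one RPC. That is where the RPC savings would come from.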

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in 
HBASE-16973 :) Thanks.


was (Author: yangzhe1991):
For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time 
limit is more direct for users than setCaching, because users usually call 
setCaching only because they want to limit size/time. Also, caching now 
defaults to Integer.MAX_VALUE.

Paging at the cell level is a plausible use case. It is different from the 
"limit" Duo mentioned: a limit means we can stop and close the scanner, whereas 
batch means we should pause and wait for the next call. Since we have size/time 
limits on the server side, a large row will not cause an OOM on the server even 
if users do not call setBatch. If users really need setBatch to limit the 
maximum number of cells returned in one Result, I think we can keep the setBatch 
interface but make it client-only logic. The server would only consider the 
size/time limits, and if it returns more than batch cells, could we cache them 
on the client? With this change, we can reduce the number of RPC requests 
without the risk of OOM or timeouts.

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in 
HBASE-16973 :) Thanks.

> Correct the semantic of batch and partial
> -----------------------------------------
>
>                 Key: HBASE-15484
>                 URL: https://issues.apache.org/jira/browse/HBASE-15484
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, 
> HBASE-15484-v3.patch, HBASE-15484-v4.patch
>
>
> Follow-up to HBASE-15325: as discussed, setBatch and setAllowPartialResults 
> should not have the same meaning. We should not treat setBatch as 
> setAllowPartialResults.
> isPartial should also be defined precisely.
> (Treat getBatch() as Integer.MAX_VALUE if setBatch is not called.) If 
> result.rawCells().length < scan.getBatch() and the result is not the last part 
> of its row, isPartial == true; otherwise isPartial == false. So if the user does 
> not call setAllowPartialResults(true), isPartial should always be false.
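The proposed rule can be written as a tiny hypothetical helper (not HBase API). It takes the cell count of a Result (what result.rawCells().length would give), the effective batch, and whether the Result carries the last cells of its row. It assumes a non-positive batch value means setBatch was never called, which the rule treats as Integer.MAX_VALUE.

```java
/**
 * Hypothetical encoding of the isPartial rule proposed in HBASE-15484.
 * Method names are illustrative only, not HBase API.
 */
final class PartialRule {
    private PartialRule() {}

    /** Treats a non-positive batch (assumed to mean "setBatch not called") as Integer.MAX_VALUE. */
    static int effectiveBatch(int rawBatch) {
        return rawBatch > 0 ? rawBatch : Integer.MAX_VALUE;
    }

    /**
     * @param cellCount     number of cells in the Result
     * @param batch         effective batch size
     * @param lastPartOfRow true if the Result carries the final cells of its row
     * @return true only if the Result has fewer cells than batch
     *         and more cells of the same row are still to come
     */
    static boolean isPartial(int cellCount, int batch, boolean lastPartOfRow) {
        return cellCount < batch && !lastPartOfRow;
    }
}
```

Without setAllowPartialResults(true), every Result is either a full batch or the last part of its row, so isPartial comes out false, as the description requires.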



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
