[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656513#comment-15656513 ]
Phil Yang edited comment on HBASE-15484 at 11/11/16 8:43 AM:
-------------------------------------------------------------

For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time limit is more direct for users than setCache, because users usually set a limit in order to bound size or time, and by default we now set cache to max_value.

Paging at the cell level is a possible scenario. It is different from the "limit" that Duo mentions: limit means we can stop and close the scanner, whereas batch means we should pause and wait for the next call. Since we have size/time limits on the server side, a large row will not cause an OOM on the server even if users don't call setBatch. If users do need setBatch to limit the maximum number of cells returned in one Result, I think we can keep the setBatch interface but make it client-only logic. On the server we would only consider the size/time limits, and if the server returns more than batch cells, we could cache the rest of them on the client? With this change, we can decrease the number of RPC requests without risking OOM or timeouts.

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in HBASE-16973 :)

Thanks.


was (Author: yangzhe1991):
For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time limit is more direct for users than setCache, because users usually set a limit in order to bound size or time, and by default we now set cache to max_value.

Paging at the cell level is a possible scenario. It is different from the "limit" that Duo mentions: limit means we can stop and close the scanner, whereas batch means we should pause and wait for the next call. Since we have size/time limits on the server side, a large row will not cause an OOM on the server even if users don't call setBatch. If users do need setBatch to limit the maximum number of cells returned in one Result, I think we can keep the setBatch interface but make it client-only logic. On the server we would only consider the size/time limits, and if the server returns more than batch cells, we could cache them on the client? With this change, we can decrease the number of RPC requests without risking OOM or timeouts.

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in HBASE-16973 :)

Thanks.

> Correct the semantic of batch and partial
> -----------------------------------------
>
>                 Key: HBASE-15484
>                 URL: https://issues.apache.org/jira/browse/HBASE-15484
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch,
> HBASE-15484-v3.patch, HBASE-15484-v4.patch
>
>
> Follow-up to HBASE-15325. As discussed, the meaning of setBatch and
> setAllowPartialResults should not be the same. We should not regard setBatch as
> setAllowPartialResults.
> And isPartial should be defined accurately.
> (Considering getBatch==MaxInt if we don't setBatch.) If
> result.rawcells.length<scan.getBatch && result is not the last part of this
> row, isPartial==true, otherwise isPartial == false. So if users don't call
> setAllowPartialResults(true), isPartial should always be false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
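
To illustrate the client-only setBatch idea from the comment above, here is a minimal sketch in plain Java. It is not HBase code: the class ClientSideBatcher and its methods offer and next are hypothetical names, and the sketch assumes the server bounds each response only by size/time and may therefore return more than batch cells of a row at once. The client then splits the response into chunks of at most batch cells and serves the remaining chunks from memory without another RPC. For simplicity it does not merge a short trailing chunk with cells from a later response.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical client-side batcher (illustration only, not part of the HBase API).
 * Assumption: one call to offer() carries the cells of one server response,
 * which the server limited by size/time rather than by cell count.
 */
final class ClientSideBatcher<C> {
    private final int batch;
    // Chunks already received from the server but not yet handed to the caller.
    private final Deque<List<C>> pending = new ArrayDeque<>();

    ClientSideBatcher(int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        this.batch = batch;
    }

    /** Splits one server response into chunks of at most {@code batch} cells and queues them. */
    void offer(List<C> cellsFromServer) {
        for (int i = 0; i < cellsFromServer.size(); i += batch) {
            int end = Math.min(i + batch, cellsFromServer.size());
            pending.add(new ArrayList<>(cellsFromServer.subList(i, end)));
        }
    }

    /** Returns the next cached chunk, or null when the caller would need a new RPC. */
    List<C> next() {
        return pending.poll();
    }

    public static void main(String[] args) {
        ClientSideBatcher<String> batcher = new ClientSideBatcher<>(2);
        // One server response with five cells, although batch is 2.
        batcher.offer(Arrays.asList("c1", "c2", "c3", "c4", "c5"));
        List<String> chunk;
        while ((chunk = batcher.next()) != null) {
            System.out.println(chunk); // prints [c1, c2], then [c3, c4], then [c5]
        }
    }
}
```

In this sketch a single response of five cells is served to the caller as three chunks, so only one round trip is needed instead of three.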