[jira] [Commented] (HBASE-15576) Support stateless scanning and scanning cursor

Phil Yang (JIRA) Sat, 02 Apr 2016 10:57:43 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222983#comment-15222983
 ]


Phil Yang commented on HBASE-15576:
-----------------------------------

{quote}
They are there when we flush so that any ongoing scanners that span memory and 
hfiles will do the right thing. 
{quote}
I see, thanks for the explanation. So we can scan with a specific mvcc, as 
long as it is not too old, right?

I noticed that https://hbase.apache.org/acid-semantics.html says "Any row 
returned by the scan will be a consistent view" but "A consistent view is not 
guaranteed intra-row scanning". If the scanner has an mvcc, why can we not 
guarantee this when the user calls setBatch? Thanks.

> Support stateless scanning and scanning cursor
> ----------------------------------------------
>
>                 Key: HBASE-15576
>                 URL: https://issues.apache.org/jira/browse/HBASE-15576
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>
> Since 1.1.0 was released, scanning has had the partial and heartbeat 
> protocols to avoid responses that are too large or that time out. Even so, 
> ResultScanner.next() may still block for longer than the timeout setting 
> before it returns a Result if the row is very large, the filter is sparse, 
> or there are too many delete markers in the files.
> However, in some scenarios we don't want it to block for too long. For 
> example, consider a web service that handles requests from mobile devices 
> whose network is unstable, so the timeout between the mobile device and the 
> web service cannot be long (e.g. only 5 seconds). This service scans rows 
> from HBase and returns them to the mobile devices. In this scenario, the 
> simplest approach is to make the web service stateless. The mobile app sends 
> several requests one after another until it has enough data, just like 
> paging through a list. Each request carries a start position that depends 
> on the last result from the web service. Because the service is stateless, 
> different requests can be sent to different web service servers.
> Therefore, the stateless web service needs a cursor from HBase that tells it 
> how far the RegionScanner has scanned when the HBase client receives an 
> empty heartbeat. The service returns the cursor to the mobile device even 
> though the response has no data. The next request can then start at the 
> cursor position; without the cursor we would have to scan again from the 
> last returned result and might time out forever. And of course, even if the 
> heartbeat message is not empty, we can still use the cursor to avoid 
> re-scanning rows/cells that have already been skipped.
> Obviously, we give up consistency for scanning, because in this mode even 
> the HBase client is stateless, but that is acceptable in this scenario. 
> Maybe we can keep the mvcc in the cursor so that we can get a consistent 
> view?
> HBASE-13099 had some discussion of this, but it has made no further progress 
> so far.
> API:
> In Scan we need a new method, setStateless, to make the scan stateless, and 
> another timeout setting for stateless scanning. In this mode 
> ResultScanner.next() will not block for longer than this timeout. next() 
> still returns Results as usual, but the last Result (or the only Result, if 
> we receive an empty heartbeat) carries a special flag that marks it as a 
> cursor. The cursor Result has only one Cell. Users can scan like this:
> {code}
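> Result r;
> // Parenthesize the assignment so r holds the Result, not a boolean.
> while ((r = scanner.next()) != null && !r.isCursor()) {
>     // process r just like before
> }
> if (r != null) {
>     // the scan has not finished; r is a cursor (isCursor() is the proposed flag)
> } else {
>     // the scan has finished
> }
> scanner.close();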
> {code}
> Implementation:
> There are two options for supporting stateless scanning:
> 1. Only one rpc, like a small scan, with no support for batch/partials and 
>    a row-level cursor. This is simple to implement.
> 2. Big scans with several rpc requests, with support for batch/partials and 
>    a cell-level cursor. This is a little more complex, because we need to 
>    seek on the server side and must make sure the total time of the rpc 
>    requests does not exceed the timeout setting.
> Or we can do it in two phases and support one-shot first?
> Any thoughts? Thanks.
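
The paging pattern described in the issue can be sketched without HBase at 
all. The block below is only an illustration under stated assumptions, not 
the proposed implementation: StatelessScanSketch, Page and scanPage are 
hypothetical names, a TreeMap stands in for a region, and a budget of 
examined rows stands in for the timeout so that the behaviour is 
deterministic. It shows the row-level cursor case: a request that matches 
nothing still returns a cursor, so the next request resumes after it instead 
of re-scanning the skipped rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in HBase.
class StatelessScanSketch {

    /** One page of a stateless scan: matching row keys plus an optional cursor. */
    static final class Page {
        final List<String> rows = new ArrayList<>();
        String cursor; // last row key examined; null once the scan reached the end
    }

    /**
     * Scans rows strictly after afterRow (or from the start if it is null),
     * examining at most budget rows. The budget is an assumption that stands
     * in for the per-request timeout.
     */
    static Page scanPage(NavigableMap<String, String> table, String afterRow,
                         int budget, Predicate<String> filter) {
        if (budget < 1) {
            throw new IllegalArgumentException("budget must be at least 1");
        }
        Page page = new Page();
        NavigableMap<String, String> tail =
            afterRow == null ? table : table.tailMap(afterRow, false);
        int examined = 0;
        for (Map.Entry<String, String> e : tail.entrySet()) {
            if (examined == budget) {
                return page; // budget used up; page.cursor says where to resume
            }
            examined++;
            page.cursor = e.getKey();
            if (filter.test(e.getValue())) {
                page.rows.add(e.getKey());
            }
        }
        page.cursor = null; // no rows left: the scan is finished
        return page;
    }

    public static void main(String[] args) {
        // Sparse filter: only the last of nine rows matches.
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 9; i++) {
            table.put("row" + i, i == 9 ? "hit" : "miss");
        }
        String cursor = null;
        int requests = 0;
        List<String> found = new ArrayList<>();
        do {
            // Each iteration models one independent request to a stateless service.
            Page page = scanPage(table, cursor, 4, v -> v.equals("hit"));
            found.addAll(page.rows);
            cursor = page.cursor;
            requests++;
        } while (cursor != null);
        System.out.println(requests + " requests, found " + found);
    }
}
```

Only the cursor travels between requests, so each call to scanPage could be 
served by a different web service server. A cell-level cursor that supports 
batch/partials would need more state than a single row key.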



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
