[ https://issues.apache.org/jira/browse/HBASE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phil Yang updated HBASE-15576:
------------------------------
    Attachment: HBASE-15576.v06.patch

fix javadoc issue

> Scanning cursor to prevent blocking long time on ResultScanner.next()
> ---------------------------------------------------------------------
>
>                 Key: HBASE-15576
>                 URL: https://issues.apache.org/jira/browse/HBASE-15576
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-15576.v01.patch, HBASE-15576.v02.patch,
> HBASE-15576.v03.patch, HBASE-15576.v03.patch, HBASE-15576.v04.patch,
> HBASE-15576.v04.patch, HBASE-15576.v05.patch, HBASE-15576.v06.patch
>
>
> Since the 1.1.0 release, scanning has supported the partial-result and
> heartbeat protocols, which keep the server from returning too much data in
> one response and keep RPCs from timing out. However, ResultScanner.next() may
> still block for longer than the timeout settings before returning a Result if
> the row is very large, the filter is sparse, or the store files contain too
> many delete markers.
> In some scenarios, though, next() must not block for too long. For example,
> consider a web service that handles requests from mobile devices. Their
> network is unstable, so the timeout between the mobile device and the web
> service cannot be long (e.g. only 5 seconds). The service scans rows from
> HBase and returns them to the mobile devices. In this scenario, the simplest
> approach is to make the web service stateless: the mobile app sends several
> requests one by one until it has enough data, just like paging through a
> list. Each request carries a start position derived from the last result the
> web service returned. Because the service is stateless, different requests
> can be handled by different web service servers.
> Therefore, the stateless web service needs a cursor from HBase telling it how
> far the RegionScanner has scanned when the HBase client receives an empty
> heartbeat. The service returns the cursor to the mobile device even though
> the response has no data.
> In the next request we can start at the cursor position. Without the cursor,
> we would have to scan from the last returned result again, and the request
> might time out forever. And of course, even if the heartbeat message is not
> empty, we can still use the cursor to avoid re-scanning rows/cells that have
> already been skipped.
> Obviously, we give up scan consistency, because even the HBase client becomes
> stateless, but that is acceptable in this scenario. Maybe we can keep the
> mvcc in the cursor so that we can get a consistent view?
> HBASE-13099 had some discussion, but it has made no further progress so far.
> API:
> In Scan we need a new method, setNeedCursorResult(true), to get the cursor
> row key when there is an RPC response but the client cannot return any
> Result. In this mode, ResultScanner.next() will not block longer than the
> timeout setting.
> {code}
> Result r;
> while ((r = scanner.next()) != null) {
>   if (r.isCursor()) {
>     // Scanning has not ended; this is a cursor. Save its row key and close
>     // the scanner if you want, or just continue the loop to call next().
>   } else {
>     // Handle the Result just like before.
>   }
> }
> // Scanning has ended.
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
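The stateless paging flow and the cursor-aware loop described in the issue can be simulated without a cluster. What follows is a minimal, self-contained Java sketch under stated assumptions. It does not use the HBase client, and ScanResult, MockScanner, Page, handleRequest and pageThrough are hypothetical names. A per-call budget of rows examined stands in for the heartbeat/timeout limit. The cursor row is assumed to be already scanned, so the next request resumes strictly after it. The loop in handleRequest has the same shape as the proposed API: a cursor result ends the current response instead of blocking, and the client carries the returned position into its next request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simulation only: no HBase classes are used; all names here are hypothetical.
public class CursorPagingSketch {

    /** Hypothetical stand-in for a scan Result: either a real row or a cursor. */
    static final class ScanResult {
        final String row;
        final boolean cursor;
        ScanResult(String row, boolean cursor) { this.row = row; this.cursor = cursor; }
        boolean isCursor() { return cursor; }
    }

    /** Hypothetical scanner that examines at most `budget` rows per next() call. */
    static final class MockScanner {
        private final List<String> rows;
        private final Predicate<String> filter;
        private final int budget;
        private int pos;

        MockScanner(List<String> rows, int startPos, Predicate<String> filter, int budget) {
            this.rows = rows; this.pos = startPos; this.filter = filter; this.budget = budget;
        }

        /** Returns a matching row, a cursor if the budget ran out first, or null at the end. */
        ScanResult next() {
            String lastScanned = null;
            for (int examined = 0; examined < budget && pos < rows.size(); examined++) {
                String row = rows.get(pos++);
                if (filter.test(row)) {
                    return new ScanResult(row, false);
                }
                lastScanned = row; // skipped by the filter
            }
            return lastScanned == null ? null : new ScanResult(lastScanned, true);
        }
    }

    /** One response of the stateless service: data plus the row to resume after. */
    static final class Page {
        final List<String> rows;
        final String resumeAfter; // null once the scan has finished
        Page(List<String> rows, String resumeAfter) { this.rows = rows; this.resumeAfter = resumeAfter; }
    }

    /** Stateless handler: every request opens a fresh scanner (no server-side session). */
    static Page handleRequest(List<String> table, String resumeAfter, Predicate<String> filter,
                              int budget, int maxRows) {
        int start = 0; // exclusive resume: skip every row <= resumeAfter
        while (resumeAfter != null && start < table.size()
                && table.get(start).compareTo(resumeAfter) <= 0) {
            start++;
        }
        MockScanner scanner = new MockScanner(table, start, filter, budget);
        List<String> data = new ArrayList<>();
        ScanResult r;
        while ((r = scanner.next()) != null) {
            if (r.isCursor()) {
                // Budget spent without a match: hand back the cursor instead of blocking.
                return new Page(data, r.row);
            }
            data.add(r.row);
            if (data.size() >= maxRows) {
                return new Page(data, r.row);
            }
        }
        return new Page(data, null); // scanning has ended
    }

    /** Client loop: keep requesting pages, carrying the resume position, until done. */
    static List<Page> pageThrough(List<String> table, Predicate<String> filter, int budget, int maxRows) {
        List<Page> pages = new ArrayList<>();
        String resumeAfter = null;
        do {
            Page page = handleRequest(table, resumeAfter, filter, budget, maxRows);
            pages.add(page);
            resumeAfter = page.resumeAfter;
        } while (resumeAfter != null);
        return pages;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 20; i++) table.add(String.format("row%02d", i));
        // Sparse filter: only every 7th row matches.
        Predicate<String> sparse = row -> Integer.parseInt(row.substring(3)) % 7 == 0;
        List<Page> pages = pageThrough(table, sparse, 3, 10);
        List<String> collected = new ArrayList<>();
        for (Page p : pages) collected.addAll(p.rows);
        System.out.println(collected + " in " + pages.size() + " requests");
    }
}
```

With a 20-row table where only every 7th row matches and a budget of 3 rows per call, several responses carry only a cursor and no data. Every request still returns promptly, and no skipped row is scanned twice, because each request resumes after the cursor rather than after the last returned data row.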