[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367166#comment-14367166 ]
Eshcar Hillel commented on HBASE-13071:
---------------------------------------

Hi everyone,
What would be the next thing to do to get this patch in (now that all the lights are green ;) )?
Thanks,
Eshcar

> Hbase Streaming Scan Feature
> ----------------------------
>
>                 Key: HBASE-13071
>                 URL: https://issues.apache.org/jira/browse/HBASE-13071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
> HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
> HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
> HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
> HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
> HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
> HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
> HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.eshcar.png,
> hits.eshcar.png, network.png
>
>
> A scan operation iterates over all rows of a table or a subrange of the
> table. The synchronous way in which the data is served at the client side
> limits the speed at which the application traverses the data: it increases
> the overall processing time, and may cause high variance in the time the
> application waits for the next piece of data.
> The scanner next() method at the client side invokes an RPC to the
> regionserver and then stores the results in a cache. The application can
> specify how many rows will be transmitted per RPC; by default this is set to
> 100 rows.
> The cache can be considered as a producer-consumer queue, where the HBase
> client pushes the data to the queue and the application consumes it.
> Currently this queue is synchronous, i.e., blocking. More specifically, when
> the application has consumed all the data from the cache (so the cache is
> empty), the HBase client retrieves additional data from the server and
> re-fills the cache with new data. During this time the application is blocked.
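To make the producer-consumer picture above concrete, here is a minimal Java sketch of a client-side cache that is refilled by a background thread, so the application blocks only when the cache is actually empty. It is an illustration under stated assumptions, not the code in the attached patches: PrefetchingScanner, BatchSource and fetch() are hypothetical names, rows are plain strings, and fetch() stands in for the scan RPC to the regionserver.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration; none of these names come from HBase or the patches.
class PrefetchingScanner implements AutoCloseable {

    /** Stand-in for one scan RPC: up to `limit` rows, or an empty list when the scan is exhausted. */
    interface BatchSource {
        List<String> fetch(int limit);
    }

    // Bounded cache shared by producer and consumer; Optional.empty() marks end of scan.
    private final BlockingQueue<Optional<String>> cache;
    private final Thread prefetcher;
    private volatile RuntimeException failure;
    private boolean done;

    PrefetchingScanner(BatchSource source, int rowsPerRpc, int cacheCapacity) {
        cache = new LinkedBlockingQueue<>(cacheCapacity);
        prefetcher = new Thread(() -> {
            try {
                try {
                    List<String> batch;
                    while (!(batch = source.fetch(rowsPerRpc)).isEmpty()) {
                        for (String row : batch) {
                            cache.put(Optional.of(row)); // producer waits only while the cache is full
                        }
                    }
                } catch (RuntimeException e) {
                    failure = e; // rethrown to the application from next()
                }
                cache.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called; stop quietly
            }
        }, "scan-prefetcher");
        prefetcher.setDaemon(true);
        prefetcher.start();
    }

    /** Returns the next row, or null when the scan is finished. Waits only if the cache is empty. */
    String next() {
        if (done) {
            return null;
        }
        try {
            Optional<String> item = cache.take();
            if (item.isPresent()) {
                return item.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for rows", e);
        }
        done = true;
        if (failure != null) {
            throw failure;
        }
        return null;
    }

    @Override
    public void close() {
        prefetcher.interrupt();
    }
}
```

In this sketch the application calls next() in a loop while the prefetcher keeps issuing fetches in the background; the capacity bound keeps client memory in check when the application is slower than the server.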
> Under the assumption that the application processing time can be balanced by
> the time it takes to retrieve the data, an asynchronous approach can reduce
> the time the application is waiting for data.
> We attach a design document.
> We also have a patch that is based on a private branch, and some evaluation
> results of this code.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)