[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366767#comment-14366767 ]
Eshcar Hillel commented on HBASE-13071:
---------------------------------------

Yes, it's all about setting the delays, but I don't want to change them to make the results look better. They are there just to make the point.

From: Edward Bortnikov (JIRA) <j...@apache.org>
To: esh...@yahoo-inc.com
Sent: Monday, March 16, 2015 7:52 AM
Subject: [jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362777#comment-14362777 ]

Edward Bortnikov commented on HBASE-13071:
------------------------------------------

Eshcar,
Do you have an idea why there are still steps in the async graph? This probably means that our delays are not long enough.
Eddie

On Monday, March 16, 2015 1:14 AM, Eshcar Hillel (JIRA) <j...@apache.org> wrote:

[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
----------------------------------
    Attachment: HBASE-13071_trunk_10.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

> Hbase Streaming Scan Feature
> ----------------------------
>
>                 Key: HBASE-13071
>                 URL: https://issues.apache.org/jira/browse/HBASE-13071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
> HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
> HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
> HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
> HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
> HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
> HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
> HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.eshcar.png,
> hits.eshcar.png, network.png
>
>
> A scan operation iterates over all rows of a table or a subrange of the
> table. The synchronous way in which the data is served at the client side
> limits the speed at which the application traverses the data: it increases
> the overall processing time, and may cause great variance in the time the
> application waits for the next piece of data.
> The scanner next() method at the client side invokes an RPC to the
> regionserver and then stores the results in a cache. The application can
> specify how many rows will be transmitted per RPC; by default this is set to
> 100 rows.
> The cache can be considered as a producer-consumer queue, where the HBase
> client pushes the data to the queue and the application consumes it.
> Currently this queue is synchronous, i.e., blocking. More specifically, when
> the application has consumed all the data from the cache, so that the cache
> is empty, the HBase client retrieves additional data from the server and
> refills the cache with new data. During this time the application is blocked.
> Under the assumption that the application processing time can be balanced by
> the time it takes to retrieve the data, an asynchronous approach can reduce
> the time the application is waiting for data.
> We attach a design document.
> We also have a patch that is based on a private branch, and some evaluation
> results of this code.
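
The producer-consumer queue described in the issue summary can be illustrated with a minimal Java sketch. This is not the attached patch and not the HBase client API: PrefetchingScanner, BatchSource, rowsPerRpc and capacity are hypothetical names chosen for the example, and BatchSource.fetch() merely stands in for the RPC to the regionserver. A background thread keeps refilling a bounded queue, so the application blocks only when the queue is actually empty rather than during every refill.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of an asynchronous scanner cache; rows must be non-null. */
final class PrefetchingScanner<T> implements Iterator<T>, AutoCloseable {

    /** Stand-in for one RPC: returns up to `limit` rows, or an empty list when exhausted. */
    interface BatchSource<R> {
        List<R> fetch(int limit) throws Exception;
    }

    private static final Object END = new Object(); // marks the end of the scan
    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile Throwable failure;
    private Object nextItem; // row handed out by the next call to next(), or END

    PrefetchingScanner(BatchSource<T> source, int rowsPerRpc, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        producer = new Thread(() -> {
            try {
                while (true) {
                    List<T> batch = source.fetch(rowsPerRpc);
                    if (batch.isEmpty()) break;
                    for (T row : batch) queue.put(row); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                return; // scanner was closed; nobody is waiting for more rows
            } catch (Exception e) {
                failure = e; // reported to the consumer once it reaches END
            }
            try {
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "scanner-prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (nextItem == null) {
            try {
                nextItem = queue.take(); // waits only if nothing has been prefetched yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
        }
        if (nextItem == END && failure != null) {
            throw new IllegalStateException("fetch failed", failure);
        }
        return nextItem != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T row = (T) nextItem;
        nextItem = null;
        return row;
    }

    @Override
    public void close() {
        producer.interrupt(); // stop prefetching
        nextItem = END;       // later hasNext() calls return false without blocking
    }

    public static void main(String[] args) {
        // Fake source: 250 integers served in batches, in place of real RPCs.
        int[] cursor = {0};
        BatchSource<Integer> source = limit -> {
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < limit && cursor[0] < 250) batch.add(cursor[0]++);
            return batch;
        };
        try (PrefetchingScanner<Integer> scanner = new PrefetchingScanner<>(source, 100, 200)) {
            long sum = 0;
            while (scanner.hasNext()) sum += scanner.next();
            System.out.println(sum);
        }
    }
}
```

In this sketch the queue capacity bounds how far the producer can run ahead of the application, which trades client memory for fewer stalls. As the issue summary notes, the benefit depends on the application's processing time being comparable to the time needed to retrieve the data.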