[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

stack (JIRA) Tue, 03 Mar 2015 08:58:23 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345329#comment-14345329
 ]


stack commented on HBASE-13071:
-------------------------------

[~eshcar] Any reason why your formatting is unorthodox (compared to the rest of 
the code base)?  There is some guidance on formatting here, if that'll help: 
http://hbase.apache.org/book.html#_ides

Please add class comments describing what the class does.

isPrefetchRunning is the name of the method you would call to find the value of 
the boolean prefetchRunning; data members shouldn't have an 'is' prefix (JavaBean 
idiom).
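
To illustrate the naming point, here is a minimal sketch. The class name is 
made up for the example and is not taken from the patch; only the 
prefetchRunning / isPrefetchRunning pairing comes from the comment above.

```java
// Hypothetical holder class, for illustration only; not the patch's code.
class PrefetchState {
  // The data member has no "is" prefix.
  private volatile boolean prefetchRunning;

  // The accessor for a boolean property carries the "is" prefix.
  boolean isPrefetchRunning() {
    return prefetchRunning;
  }

  void setPrefetchRunning(boolean prefetchRunning) {
    this.prefetchRunning = prefetchRunning;
  }
}
```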

Do we have to add a new executor pool? Could we take one in on construction, at 
least optionally (with perhaps the default being that we pass in the table's 
executor)?  This could be done in a followup patch.  In general we create too 
many threads in the client and have been trying to go on a diet (but you know 
how diets go)... In fact you already take in a pool on construction. Can you 
exploit this passed-in pool rather than make one of your own?
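
A rough sketch of what "take a pool in on construction" could look like. All 
names here are hypothetical and this is not the HBase client API; the only 
assumption is that the caller already holds a java.util.concurrent 
ExecutorService it is willing to share.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class; a sketch of reusing a passed-in pool, not HBase code.
class PooledPrefetcher implements AutoCloseable {
  private final ExecutorService pool;
  private final boolean ownsPool;

  // Preferred path: reuse a pool the caller already has (e.g. the table's).
  PooledPrefetcher(ExecutorService sharedPool) {
    this.pool = sharedPool;
    this.ownsPool = false;
  }

  // Fallback: create a private single-thread pool only when none is supplied.
  PooledPrefetcher() {
    this.pool = Executors.newSingleThreadExecutor();
    this.ownsPool = true;
  }

  <T> Future<T> submitPrefetch(Callable<T> fetch) {
    return pool.submit(fetch);
  }

  @Override
  public void close() {
    // Shut down only a pool we created; a shared pool belongs to its owner.
    if (ownsPool) {
      pool.shutdown();
    }
  }
}
```

The ownsPool flag is the design point: closing the scanner must not shut down 
an executor that other client code is still using.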

On close, if a prefetch is outstanding, do we let it continue rather than 
interrupt it?
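
If interrupting turns out to be the right answer, one possible shape is 
sketched below. The class and method names are invented for the example; the 
only real API used is Future.cancel(true), which interrupts the worker thread 
when the task is already running.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical class; shows cancelling an in-flight prefetch on close().
class CancellingScanner implements AutoCloseable {
  private final ExecutorService pool;
  private Future<?> outstandingPrefetch; // null when nothing is in flight

  CancellingScanner(ExecutorService pool) {
    this.pool = pool;
  }

  synchronized void startPrefetch(Runnable fetch) {
    outstandingPrefetch = pool.submit(fetch);
  }

  @Override
  public synchronized void close() {
    if (outstandingPrefetch != null) {
      // true = interrupt the worker if the task has already started.
      outstandingPrefetch.cancel(true);
      outstandingPrefetch = null;
    }
  }
}
```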

We already have AbstractClientScanner. Rather than make ClientScanner also 
abstract, could we not push what ClientScanner has down into ACS? Or add a 
'cache' or 'prefetch' interface that subclasses of ACS could implement?

Your formatting is a little irregular (smile).

IMO this should be ON by default.

I'm trying to get you some pretty pictures to show speedup.  Will be back.

Thanks for the patch.



> Hbase Streaming Scan Feature
> ----------------------------
>
>                 Key: HBASE-13071
>                 URL: https://issues.apache.org/jira/browse/HBASE-13071
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.98.11
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-13071_98_1.patch, HBASE-13071_trunk_1.patch, 
> HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
> HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf
>
>
> A scan operation iterates over all rows of a table or a subrange of the 
> table. The synchronous nature in which the data is served at the client side 
> limits the speed at which the application traverses the data: it increases the 
> overall processing time, and may cause a great variance in the times the 
> application waits for the next piece of data.
> The scanner next() method at the client side invokes an RPC to the 
> regionserver and then stores the results in a cache. The application can 
> specify how many rows will be transmitted per RPC; by default this is set to 
> 100 rows. 
> The cache can be considered as a producer-consumer queue, where the hbase 
> client pushes the data to the queue and the application consumes it. 
> Currently this queue is synchronous, i.e., blocking. More specifically, when 
> the application has consumed all the data from the cache --- so the cache is 
> empty --- the hbase client retrieves additional data from the server and 
> re-fills the cache with new data. During this time the application is blocked.
> Under the assumption that the application processing time can be balanced by 
> the time it takes to retrieve the data, an asynchronous approach can reduce 
> the time the application is waiting for data.
> We attach a design document.
> We also have a patch that is based on a private branch, and some evaluation 
> results of this code.
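
For readers without the design document to hand, the asynchronous 
producer-consumer idea in the description can be sketched roughly as follows. 
This is not the patch's code: StreamingCache, the batch iterator standing in 
for RPCs to the regionserver, and the capacity parameter are all assumptions 
made for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch: a background task refills a bounded queue while the
// application consumes from it, instead of fetching only once the cache is empty.
class StreamingCache<T> {
  private static final Object END = new Object(); // sentinel: no more rows
  private final BlockingQueue<Object> queue;

  // 'batches' stands in for successive RPC results; rows must be non-null.
  StreamingCache(Iterator<List<T>> batches, int capacity, ExecutorService pool) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    pool.submit(() -> {
      try {
        while (batches.hasNext()) {
          for (T row : batches.next()) {
            queue.put(row); // blocks while the cache is full
          }
        }
        queue.put(END);
      } catch (InterruptedException e) {
        // Sketch limitation: an interrupted producer never publishes END.
        Thread.currentThread().interrupt();
      }
    });
  }

  // Returns the next row, or null once the scan is exhausted.
  @SuppressWarnings("unchecked")
  T next() throws InterruptedException {
    Object item = queue.take(); // blocks only if the producer has fallen behind
    if (item == END) {
      queue.put(END); // keep the sentinel so later calls also return null
      return null;
    }
    return (T) item;
  }
}
```

The application blocks in next() only when it consumes faster than the 
producer can fetch, which is the balance assumption stated in the description.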



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
