[ https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501375#comment-13501375 ]

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

Yeah, I don't like it either. We somehow need to expose more internals to
coprocessors in a clean way.

* KeyValue is already needed for RegionScanner, since it extends
InternalScanner.
* start/closeRegionOperation should be available to coprocessors anyway (I 
think). Otherwise it is hard to implement these types of things in coprocessors.
* I mainly do not like nextInternal on the interface. Is there a better way to 
expose the inner workings of RegionScannerImpl to avoid expensive setup at each 
iteration?

Another option is to keep the RegionScanner interface as is, and just make
these methods public in RegionScannerImpl. A coprocessor can then cast the
RegionScanner to RegionScannerImpl and access the stuff it needs.
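A minimal, self-contained Java sketch of the cast option follows. `Scanner`, `FastScanner`, `WrappingScanner`, `drain` and the `setupCalls` counter are hypothetical stand-ins, not HBase classes: `Scanner` plays the role of the unchanged RegionScanner interface, and `FastScanner.nextInternal` plays the role of the method that would be made public only on RegionScannerImpl. The expensive per-call setup is assumed to be representable as a single counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of the "cast to the implementation class" option.
 * None of these types are HBase classes; they only mirror the shape of
 * the discussion.
 */
public class CastSketch {

    /** Stand-in for the unchanged public scanner interface. */
    interface Scanner {
        boolean next(List<String> out);
    }

    /** Stand-in for the implementation class with one extra public method. */
    static class FastScanner implements Scanner {
        private final Iterator<String> rows;
        int setupCalls = 0; // how often the expensive per-call setup ran

        FastScanner(List<String> data) {
            this.rows = data.iterator();
        }

        @Override
        public boolean next(List<String> out) {
            setupCalls++; // models region operation, request count, read point
            return nextInternal(out);
        }

        /** Public on the implementation only; not part of Scanner. */
        public boolean nextInternal(List<String> out) {
            if (!rows.hasNext()) {
                return false;
            }
            out.add(rows.next());
            return rows.hasNext(); // true while more rows remain
        }
    }

    /** A wrapper, e.g. installed by another coprocessor hook, hides the implementation class. */
    static class WrappingScanner implements Scanner {
        private final Scanner delegate;

        WrappingScanner(Scanner delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(List<String> out) {
            return delegate.next(out);
        }
    }

    /** Coprocessor-side loop: take the fast path if the cast is possible, else fall back. */
    static List<String> drain(Scanner scanner) {
        List<String> out = new ArrayList<String>();
        boolean more = scanner.next(out); // first call goes through the regular next()
        if (scanner instanceof FastScanner) {
            FastScanner impl = (FastScanner) scanner;
            while (more) {
                more = impl.nextInternal(out);
            }
        } else {
            while (more) {
                more = scanner.next(out); // slow path: full setup on every call
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("r1", "r2", "r3");
        FastScanner direct = new FastScanner(data);
        System.out.println(drain(direct) + " setup=" + direct.setupCalls);
        FastScanner inner = new FastScanner(data);
        System.out.println(drain(new WrappingScanner(inner)) + " setup=" + inner.setupCalls);
    }
}
```

Running `main` should print `[r1, r2, r3] setup=1` for the direct scanner and `[r1, r2, r3] setup=3` for the wrapped one. The second case shows the weak spot of the cast: once another hook wraps the scanner, the object is no longer the implementation class, so the `instanceof` check fails and the loop silently falls back to the expensive path.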

                
> RegionScannerImpl.next() is inefficient.
> ----------------------------------------
>
>                 Key: HBASE-7180
>                 URL: https://issues.apache.org/jira/browse/HBASE-7180
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 7180-0.94-SKETCH.txt, 7180-0.94-v1.txt
>
>
> We just came across a special scenario.
> For our Phoenix project (SQL runtime for HBase), we push a lot of work into 
> HBase via coprocessors. One method is to wrap RegionScanner in coprocessor 
> hooks and then do processing in the hook to avoid returning a lot of data to 
> the client unnecessarily.
> In this specific case this is pretty bad. Since the wrapped RegionScanner's
> next() does not "know" that it is called this way, it still does all of this
> on each invocation:
> # Starts a RegionOperation
> # Increments the request count
> # Sets the current read point in a thread-local (because, in general, each call
> could come from a different thread)
> # Finally calls next() on its StoreScanner(s)
> # Ends the RegionOperation
> When this is done in a tight loop millions of times (as is the case for us) 
> it starts to become significant.
> Not sure what to do about this, really. Opening this issue for discussion.
> One way is to extend the RegionScanner with an "internal" next() method of 
> sorts, so that all this overhead can be avoided. The coprocessor could call 
> the regular next() methods once and then just call the cheaper internal 
> version.
> Are there better/cleaner ways?
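The five per-call steps and the proposed workaround can be modeled with the simplified Java sketch below. It is not HBase code: `SketchScanner`, `nextInternal`, `scanWithNext`, `scanWithInternal` and the counters are hypothetical, and the region operation is reduced to a counter. As an assumption drawn from the point above that start/closeRegionOperation should be available to coprocessors, the internal loop is bracketed by one region operation held by the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Simplified model of the per-call overhead described in HBASE-7180.
 * Hypothetical: SketchScanner is not HBase's RegionScannerImpl, and all
 * method and field names are illustrative only.
 */
public class ScannerOverheadSketch {

    static final class SketchScanner {
        // Models the thread-local read point (step 3).
        static final ThreadLocal<Long> READ_POINT = new ThreadLocal<Long>();

        private final Iterator<String> rows; // stands in for the StoreScanner(s)
        private final long readPoint;
        int regionOpsStarted = 0;            // start/closeRegionOperation pairs
        long requestCount = 0;               // models the region request counter

        SketchScanner(List<String> data, long readPoint) {
            this.rows = data.iterator();
            this.readPoint = readPoint;
        }

        /** Regular entry point: repeats steps 1-5 on every call. */
        boolean next(List<String> out) {
            startRegionOperation();          // 1. start a region operation
            try {
                requestCount++;              // 2. increment the request count
                READ_POINT.set(readPoint);   // 3. publish the read point for this thread
                return nextInternal(out);    // 4. advance the underlying scanner
            } finally {
                closeRegionOperation();      // 5. end the region operation
            }
        }

        /** Hypothetical cheap path: step 4 only; the caller must have done the setup. */
        boolean nextInternal(List<String> out) {
            Long rp = READ_POINT.get();
            if (rp == null || rp.longValue() != readPoint) {
                throw new IllegalStateException("read point not set for this thread; call next() first");
            }
            if (!rows.hasNext()) {
                return false;
            }
            out.add(rows.next());
            return rows.hasNext();           // true while more rows remain
        }

        void startRegionOperation() { regionOpsStarted++; }
        void closeRegionOperation() { /* no-op in this model */ }
    }

    /** Current behaviour: every row goes through next(). */
    static List<String> scanWithNext(SketchScanner s) {
        List<String> out = new ArrayList<String>();
        while (s.next(out)) {
            // keep calling the full next()
        }
        return out;
    }

    /** Proposal: one regular next(), then the cheaper internal calls. */
    static List<String> scanWithInternal(SketchScanner s) {
        List<String> out = new ArrayList<String>();
        s.startRegionOperation();            // caller holds one operation for the whole loop
        try {
            boolean more = s.next(out);      // sets up the read point once
            while (more) {
                more = s.nextInternal(out);
            }
        } finally {
            s.closeRegionOperation();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            data.add("row" + i);
        }
        SketchScanner a = new SketchScanner(data, 42L);
        System.out.println(scanWithNext(a).size() + " rows, setup ran " + a.regionOpsStarted + " times");
        SketchScanner b = new SketchScanner(data, 42L);
        System.out.println(scanWithInternal(b).size() + " rows, setup ran " + b.regionOpsStarted + " times");
    }
}
```

With 1,000 rows, `main` should print `1000 rows, setup ran 1000 times` for the plain next() loop and `1000 rows, setup ran 2 times` for the internal loop (the caller's outer region operation plus the single regular next()). The guard in `nextInternal` shows the catch of this design: the cheap path is only correct if the caller has already done the setup on the current thread.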

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
