[ 
https://issues.apache.org/jira/browse/HBASE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508476#comment-13508476
 ] 

Lars Hofhansl commented on HBASE-7180:
--------------------------------------

How about another approach:
# Introduce a RawRegionScanner interface, which extends RegionScanner.
# RawRegionScanner declares all the additional methods we need.
# Add a getRawScanner() method to the RegionScanner interface.
# RegionScannerImpl would then implement RawRegionScanner.

To the coprocessor framework we'd still hand a RegionScanner, but now the 
coprocessor can get the raw scanner via getRawScanner(). The 
RegionScannerImpl's implementation of getRawScanner() just returns "this".
Is that better? Or does anybody have a cleaner idea?

closeRegionOperation and startRegionOperation would still need to be public, so 
that coprocessors can start/stop region operations.
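
A rough sketch of how this could look, for discussion only. The interfaces 
below are simplified stand-ins rather than the real HBase types: the method 
name nextRaw(), the Region class with its counters, and the use of String 
rows are all assumptions made up for illustration. Only RegionScanner, 
RawRegionScanner, RegionScannerImpl, getRawScanner(), startRegionOperation() 
and closeRegionOperation() come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for RegionScanner (the real interface has more methods).
interface RegionScanner {
    // Regular path: pays the per-call region bookkeeping.
    boolean next(List<String> results);

    // Proposed accessor for the cheaper scanner.
    RawRegionScanner getRawScanner();
}

// Proposed extension. nextRaw is a hypothetical name for the "additional methods".
interface RawRegionScanner extends RegionScanner {
    // Advances the scan without starting/ending a region operation.
    boolean nextRaw(List<String> results);
}

// Hypothetical stand-in for the region; the counters only make the overhead visible.
class Region {
    int operationsStarted = 0;
    int operationsClosed = 0;
    int requestCount = 0;

    public void startRegionOperation() { operationsStarted++; }
    public void closeRegionOperation() { operationsClosed++; }
}

class RegionScannerImpl implements RawRegionScanner {
    private final Region region;
    private final List<String> rows;
    private int pos = 0;

    RegionScannerImpl(Region region, List<String> rows) {
        this.region = region;
        this.rows = rows;
    }

    @Override
    public boolean next(List<String> results) {
        region.startRegionOperation();
        try {
            region.requestCount++;
            return nextRaw(results);
        } finally {
            region.closeRegionOperation();
        }
    }

    @Override
    public boolean nextRaw(List<String> results) {
        if (pos < rows.size()) {
            results.add(rows.get(pos++));
        }
        return pos < rows.size(); // true while more rows remain
    }

    @Override
    public RawRegionScanner getRawScanner() {
        return this; // as proposed: the implementation just returns itself
    }
}

// What a coprocessor hook might do with the RegionScanner it is handed.
class CoprocessorScan {
    static List<String> drain(Region region, RegionScanner scanner) {
        RawRegionScanner raw = scanner.getRawScanner();
        List<String> out = new ArrayList<>();
        region.startRegionOperation(); // once for the whole loop
        try {
            boolean more;
            do {
                more = raw.nextRaw(out);
            } while (more);
        } finally {
            region.closeRegionOperation();
        }
        return out;
    }
}
```

The point of the sketch is that the coprocessor brackets its tight loop with a 
single start/close pair instead of paying for one on every row, which is why 
those two methods would have to be public.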

                
> RegionScannerImpl.next() is inefficient.
> ----------------------------------------
>
>                 Key: HBASE-7180
>                 URL: https://issues.apache.org/jira/browse/HBASE-7180
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 7180-0.94-SKETCH.txt, 7180-0.94-v1.txt
>
>
> We just came across a special scenario.
> For our Phoenix project (SQL runtime for HBase), we push a lot of work into 
> HBase via coprocessors. One method is to wrap RegionScanner in coprocessor 
> hooks and then do processing in the hook to avoid returning a lot of data to 
> the client unnecessarily.
> In this specific case this is pretty bad. Since the wrapped RegionScanner's 
> next() does not "know" that it is called this way, it still does all of this 
> on each invocation:
> # Starts a RegionOperation
> # Increments the request count
> # Sets the current read point on a thread local (because generally each call 
> could come from a different thread)
> # Finally does the next on its StoreScanner(s)
> # Ends the RegionOperation
> When this is done in a tight loop millions of times (as is the case for us) 
> it starts to become significant.
> Not sure what to do about this, really. Opening this issue for discussion.
> One way is to extend the RegionScanner with an "internal" next() method of 
> sorts, so that all this overhead can be avoided. The coprocessor could call 
> the regular next() methods once and then just call the cheaper internal 
> version.
> Are there better/cleaner ways?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
