[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001133#comment-15001133
 ] 

Zhan Zhang commented on HBASE-14795:
------------------------------------

We can consolidate the two approaches by changing how the RDD is constructed 
from buildScan. This keeps the code change small and lets us reuse the existing 
logic, e.g., logical operations, filters, etc.

Details:
After all non-overlapping scans are constructed (already done currently), we 
aggregate the scans per region server based on their start and end keys (a scan 
may need to be split into multiple scans if it spans multiple regions).  Then 
we construct an RDD in which each partition consists of the scans falling into 
that region/partition. This way, we can consolidate the two approaches while 
reusing the existing logic, changing only how the RDD is constructed.
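
The aggregation step described above can be sketched roughly as follows. This 
is only an illustration in plain Python, not the actual patch: it does not use 
the HBase or Spark APIs, and the names Region, ScanRange, split_scan and 
group_scans_by_server are hypothetical. It assumes start keys are inclusive, 
end keys are exclusive, and an empty key means "unbounded", as in HBase.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical stand-in for a region and the server hosting it."""
    server: str
    start: bytes  # inclusive; b"" means start of table
    end: bytes    # exclusive; b"" means end of table


class ScanRange(NamedTuple):
    """Hypothetical stand-in for one non-overlapping scan range."""
    start: bytes  # inclusive; b"" means start of table
    end: bytes    # exclusive; b"" means unbounded


def _min_end(a: bytes, b: bytes) -> bytes:
    # An empty end key means "unbounded", so the other bound wins.
    if a == b"":
        return b
    if b == b"":
        return a
    return min(a, b)


def split_scan(scan: ScanRange, regions: List[Region]) -> List[Tuple[Region, ScanRange]]:
    """Split one scan into pieces, one per region it overlaps."""
    pieces = []
    for region in regions:
        lo = max(scan.start, region.start)
        hi = _min_end(scan.end, region.end)
        # Keep the piece only if the intersection is non-empty.
        if hi == b"" or lo < hi:
            pieces.append((region, ScanRange(lo, hi)))
    return pieces


def group_scans_by_server(scans: List[ScanRange],
                          regions: List[Region]) -> Dict[str, List[ScanRange]]:
    """Aggregate the split scans per region server.

    Each dictionary entry corresponds to what would become one RDD partition.
    """
    partitions = defaultdict(list)
    for scan in scans:
        for region, piece in split_scan(scan, regions):
            partitions[region.server].append(piece)
    return dict(partitions)
```

For example, with regions ["", "g") and ["p", "") on one server and ["g", "p") 
on another, a scan over ["e", "k") is split at "g": the piece ["e", "g") goes 
to the first server's partition and ["g", "k") to the second's.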

> Provide an alternative spark-hbase SQL implementations for Scan
> ---------------------------------------------------------------
>
>                 Key: HBASE-14795
>                 URL: https://issues.apache.org/jira/browse/HBASE-14795
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanning the scan start and 
> end ranges, whereas this implementation can be more targeted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
