[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001133#comment-15001133 ]
Zhan Zhang commented on HBASE-14795:
------------------------------------

We can consolidate the two approaches by changing how the RDD is constructed from buildScan. This keeps the code change small and reuses the existing logic, e.g., logical operations, filters, etc.

Details:
After all non-overlapping scans are constructed (already done currently), we aggregate the scans per region server based on their start and end keys (some scans may need to be split into multiple scans if they span multiple regions). Then we construct an RDD in which each partition consists of the scans falling into that region/partition. In this way, we consolidate the two approaches while reusing the existing logic, except for the way the RDD is constructed.

> Provide an alternative spark-hbase SQL implementations for Scan
> ---------------------------------------------------------------
>
>                 Key: HBASE-14795
>                 URL: https://issues.apache.org/jira/browse/HBASE-14795
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>
> This is a sub-jira of HBASE-14789. This jira is to focus on the replacement
> of TableInputFormat for a more custom scan implementation that will make the
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single
> query, TableInputFormat will scan the outer range of the scan start and
> end range, whereas this implementation can be more pointed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
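
For illustration only, the per-region grouping described in the comment above might look roughly like the following sketch. It is not the HBase or Spark API: `Region`, `ScanRange`, `clip`, and `group_scans_by_region` are hypothetical names, row keys are plain strings instead of byte arrays, and an empty string stands for an unbounded table start or end. It also assumes one partition per region, which is only one reading of "region/partition".

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Region(NamedTuple):
    start: str   # inclusive start key; "" = start of table (assumption)
    end: str     # exclusive end key; "" = end of table (assumption)
    server: str  # hosting region server (hypothetical field)


class ScanRange(NamedTuple):
    start: str   # inclusive start row; "" = start of table
    stop: str    # exclusive stop row; "" = end of table


def clip(scan: ScanRange, region: Region) -> Optional[ScanRange]:
    """Return the part of `scan` that lies inside `region`, or None if disjoint."""
    # "" sorts before every other string, so max() handles the open lower bound.
    lo = max(scan.start, region.start)
    # For the upper bound, "" means +infinity and needs explicit handling.
    if scan.stop == "":
        hi = region.end
    elif region.end == "":
        hi = scan.stop
    else:
        hi = min(scan.stop, region.end)
    if hi != "" and lo >= hi:
        return None
    return ScanRange(lo, hi)


def group_scans_by_region(scans, regions):
    """Split each scan at region boundaries and collect the pieces per region."""
    partitions = defaultdict(list)
    for scan in scans:
        for region in regions:
            piece = clip(scan, region)
            if piece is not None:
                partitions[region].append(piece)
    return dict(partitions)


# Three regions on two servers, and three non-overlapping scans.
regions = [Region("", "g", "rs1"), Region("g", "p", "rs2"), Region("p", "", "rs1")]
scans = [ScanRange("a", "c"), ScanRange("e", "k"), ScanRange("r", "t")]
parts = group_scans_by_region(scans, regions)
for region, pieces in parts.items():
    print(region, pieces)
```

Here the scan from "e" to "k" crosses the region boundary at "g", so it is split into two pieces that land in different partitions; each partition would then back one RDD partition that runs only its own scans.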