[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001133#comment-15001133 ]

Zhan Zhang commented on HBASE-14795:

We can consolidate the two approaches by changing the way the RDD is constructed from buildScan. This keeps the code change small and reuses the existing logic, e.g., logical operations, filters, etc.

Details: After all non-overlapping scans are constructed (already done currently), we aggregate the scans per region server based on their start and end (some scans may need to be split into multiple scans if they span multiple regions). Then we construct an RDD in which each partition consists of the multiple scans falling into that region/partition. In this way, we can consolidate the two approaches while reusing the existing logic, except for the way the RDD is constructed.

> Provide an alternative spark-hbase SQL implementations for Scan
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
> Issue Type: Improvement
> Reporter: Ted Malaska
> Assignee: Zhan Zhang
> Priority: Minor
>
> This is a sub-jira of HBASE-14789. This jira is to focus on the replacement
> of TableInputFormat for a more custom scan implementation that will make the
> following use case more effective.
> Use case:
> In the case where you have multiple scan ranges on a single table within a
> single query, TableInputFormat will scan the outer range of the scan start and
> end range, whereas this implementation can be more pointed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
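The grouping step described in the comment above can be sketched roughly as follows. This is only an illustration in plain Python, not the hbase-spark code: the names `clip` and `group_scans_by_region` are hypothetical, scans and regions are modelled as `(start, stop)` byte-string pairs with an inclusive start and exclusive stop, and an empty stop key is assumed to mean "end of table".

```python
def clip(scan, region):
    """Return the part of `scan` that lies inside `region`, or None if they do not overlap.

    Both arguments are (start, stop) pairs of bytes. An empty stop key is
    treated as unbounded (assumption made for this sketch).
    """
    s_start, s_stop = scan
    r_start, r_end = region
    start = max(s_start, r_start)
    if s_stop == b"":
        stop = r_end          # scan is unbounded, so the region end decides
    elif r_end == b"":
        stop = s_stop         # region is unbounded, so the scan stop decides
    else:
        stop = min(s_stop, r_end)
    if stop != b"" and start >= stop:
        return None           # empty intersection
    return (start, stop)


def group_scans_by_region(scans, regions):
    """Split each scan at region boundaries and collect the pieces per region.

    Returns a list of (region, [scan pieces]) pairs, one per region that
    receives at least one piece. Each pair stands in for one RDD partition.
    """
    partitions = []
    for region in regions:
        pieces = [p for p in (clip(s, region) for s in scans) if p is not None]
        if pieces:
            partitions.append((region, pieces))
    return partitions


if __name__ == "__main__":
    # Hypothetical table with three regions and three non-overlapping scans.
    regions = [(b"", b"d"), (b"d", b"m"), (b"m", b"")]
    scans = [(b"b", b"f"), (b"p", b"q"), (b"r", b"t")]
    for region, pieces in group_scans_by_region(scans, regions):
        print(region, pieces)
```

In this example the scan `(b"b", b"f")` crosses the boundary at `b"d"` and is split into two pieces, while the last region receives two separate scans in a single partition.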
[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001136#comment-15001136 ]

Ted Malaska commented on HBASE-14795:

I would like that. Thanks Zhan.
[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999621#comment-14999621 ]

Ted Malaska commented on HBASE-14795:

There is no real negative to this proposed approach other than having a second implementation of table scan. Too bad the existing TableInputFormat cannot be updated to handle this, because then this would all be in one place.

As for implementation, there is no reason this can't just be invoked straight from line 330 of DefaultSource, or it could be an alternate implementation in hbaseRDD that takes multiple scan objects.

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L330
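To make the use case from the issue description concrete, here is a small plain-Python model of the difference between scanning the outer range and scanning only the requested ranges. It is a toy illustration under stated assumptions, not HBase behavior: `outer_range` and `rows_read` are hypothetical helpers, row keys are single bytes, and every scan has a bounded, exclusive stop key.

```python
def outer_range(scans):
    """Single range from the smallest start key to the largest stop key."""
    return (min(start for start, _ in scans), max(stop for _, stop in scans))


def rows_read(rows, ranges):
    """Rows touched when reading every [start, stop) range in `ranges`."""
    return [r for r in rows if any(start <= r < stop for start, stop in ranges)]


if __name__ == "__main__":
    # Ten hypothetical row keys: b"a" through b"j".
    rows = [bytes([c]) for c in range(ord("a"), ord("k"))]
    # Two disjoint scan ranges from one query.
    scans = [(b"a", b"c"), (b"h", b"j")]

    pointed = rows_read(rows, scans)               # only the requested ranges
    outer = rows_read(rows, [outer_range(scans)])  # everything in between too
    print(len(pointed), len(outer))
```

The wider the gap between the requested ranges, the more rows an outer-range scan reads only to discard them, which is the cost the pointed implementation is meant to avoid.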