[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-11 Thread Zhan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001133#comment-15001133
 ] 

Zhan Zhang commented on HBASE-14795:


We can consolidate the two approaches by changing how the RDD is constructed 
from buildScan. This way, we can keep the code change small and reuse the 
existing logic, e.g., logical operations, filters, etc.

Details:
After all non-overlapping scans are constructed (already done currently), we 
aggregate the scans per region server based on their start and end keys (a scan 
may need to be split into multiple scans if it spans multiple regions). Then we 
construct an RDD in which each partition consists of the scans that fall into 
that region/partition. In this way, we can consolidate the two approaches and 
reuse the existing logic, changing only the way the RDD is constructed.
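
The split-and-aggregate step described above can be sketched roughly as 
follows. This is only an illustration in plain Python, not the hbase-spark 
code: Region, ScanRange, split_by_region and group_scans are hypothetical 
names, the region boundaries are passed in by hand rather than looked up from 
HBase, and the grouping key is assumed to be the region server name (grouping 
per region would work the same way). An empty key stands for an unbounded 
start or end, as in HBase.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    start: bytes  # inclusive; b"" means the beginning of the table
    end: bytes    # exclusive; b"" means the end of the table
    server: str   # hypothetical region server name


class ScanRange(NamedTuple):
    start: bytes  # inclusive; b"" means unbounded
    stop: bytes   # exclusive; b"" means unbounded


def _min_end(a: bytes, b: bytes) -> bytes:
    # An empty end key is unbounded, so it is larger than any real key.
    if a == b"":
        return b
    if b == b"":
        return a
    return min(a, b)


def split_by_region(scan: ScanRange,
                    regions: List[Region]) -> List[Tuple[Region, ScanRange]]:
    """Clip one scan against every region it overlaps."""
    pieces = []
    for region in regions:
        # b"" sorts before every other key, so max() handles open starts.
        start = max(scan.start, region.start)
        end = _min_end(scan.stop, region.end)
        if end == b"" or start < end:
            pieces.append((region, ScanRange(start, end)))
    return pieces


def group_scans(scans: List[ScanRange],
                regions: List[Region]) -> Dict[str, List[ScanRange]]:
    """Group the clipped scans by region server, one list per partition."""
    partitions = defaultdict(list)
    for scan in scans:
        for region, piece in split_by_region(scan, regions):
            partitions[region.server].append(piece)
    return dict(partitions)
```

With regions [-inf, g) on rs1, [g, p) on rs2 and [p, +inf) on rs1, a scan 
over [e, m) is split into [e, g) for rs1 and [g, m) for rs2. Each entry of 
the resulting map would then back one partition of the RDD.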

> Provide an alternative spark-hbase SQL implementations for Scan
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
> Issue Type: Improvement
> Reporter: Ted Malaska
> Assignee: Zhan Zhang
> Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat with a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case where you have multiple scan ranges on a single table within a 
> single query, TableInputFormat will scan the outer range between the scan 
> start and end, whereas this implementation can be more targeted.
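
To make the use case concrete, here is a toy model of the difference between 
scanning the outer range and scanning each range separately. It is only a 
sketch: the table is an in-memory sorted list of row keys, and rows_in_range 
and outer_range are hypothetical helpers, not HBase or TableInputFormat APIs.

```python
from bisect import bisect_left
from typing import List, Tuple


def rows_in_range(sorted_keys: List[str], start: str, stop: str) -> List[str]:
    """Row keys in [start, stop), as a range scan over sorted keys returns."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def outer_range(ranges: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Smallest single range that covers every requested range."""
    return (min(s for s, _ in ranges), max(e for _, e in ranges))


keys = [f"row{i:03d}" for i in range(100)]
ranges = [("row005", "row010"), ("row090", "row095")]

# One scan over the outer range reads everything between the two ranges too.
touched_outer = rows_in_range(keys, *outer_range(ranges))

# One scan per range reads only the rows that were asked for.
touched_pointed = [k for s, e in ranges for k in rows_in_range(keys, s, e)]

print(len(touched_outer), len(touched_pointed))  # prints "90 10"
```

In this toy table the outer-range scan reads 90 rows to return the 10 that 
the two ranges actually cover.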



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001136#comment-15001136
 ] 

Ted Malaska commented on HBASE-14795:
-

I would like that.  Thanks Zhan.



[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999621#comment-14999621
 ] 

Ted Malaska commented on HBASE-14795:
-

There is no real negative to this proposed approach other than having a second 
implementation of table scan.  Too bad the existing TableInputFormat cannot be 
updated to handle this, because then it would all be in one place.

As for implementation, there is no reason this can't just be invoked straight 
from line 330 of DefaultSource, or it could be an alternate implementation in 
hbaseRDD that takes multiple scan objects.

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L330

