Re: pyspark hbase range scan

2015-04-02 Thread gen tang
Hi,

This might be helpful:
https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala
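
For reference, here is a minimal sketch of how such converters are typically wired into newAPIHadoopRDD from PySpark. The converter class names below follow the ones bundled with the Spark examples and are only assumptions here (the classes in the linked file may be named differently), and the jar containing them is assumed to be on the classpath, e.g. via spark-submit --jars.

# Converter class names are illustrative (from the Spark examples bundle);
# substitute the classes from the linked repository if they differ.
keyConv = ("org.apache.spark.examples.pythonconverters."
           "ImmutableBytesWritableToStringConverter")
valueConv = ("org.apache.spark.examples.pythonconverters."
             "HBaseResultToStringConverter")

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)  # conf dict as in the quoted message below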

Cheers
Gen

On Thu, Apr 2, 2015 at 1:50 AM, Eric Kimbrel  wrote:

> I am attempting to read an HBase table in PySpark with a range scan.
>
> conf = {
>     "hbase.zookeeper.quorum": host,
>     "hbase.mapreduce.inputtable": table,
>     "hbase.mapreduce.scan": scan
> }
> hbase_rdd = sc.newAPIHadoopRDD(
>     "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
>     "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
>     "org.apache.hadoop.hbase.client.Result",
>     keyConverter=keyConv,
>     valueConverter=valueConv,
>     conf=conf)
>
> If I jump over to Scala or Java and generate a base64-encoded protobuf Scan
> object and convert it to a string, I can use that value for
> "hbase.mapreduce.scan" and everything works: the RDD correctly performs
> the range scan and I am happy. The problem is that I cannot find any
> reasonable way to generate that range-scan string in Python. The Scala
> code required is:
>
> import org.apache.hadoop.hbase.util.Base64
> import org.apache.hadoop.hbase.protobuf.ProtobufUtil
> import org.apache.hadoop.hbase.client.Scan
>
> val scan = new Scan()
> scan.setStartRow("test_domain\0email".getBytes)
> scan.setStopRow("test_domain\0email~".getBytes)
>
> // Serialize the Scan to the base64-encoded protobuf string that
> // "hbase.mapreduce.scan" expects.
> def scanToString(scan: Scan): String =
>   Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())
>
> scanToString(scan)
>
>
> Is there another way to perform an HBase range scan from PySpark, or is
> that functionality something that might be supported in the future?
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-hbase-range-scan-tp22348.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
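
One way to produce the "hbase.mapreduce.scan" value without leaving Python is to drive the same HBase classes through the SparkContext's Py4J gateway. The following is only a sketch, not something from the original thread: it assumes the HBase client jars are on the driver classpath, and it relies on sc._jvm, which is an internal PySpark handle rather than a public API.

jvm = sc._jvm  # internal gateway into the driver JVM

# Build the Scan on the JVM side, mirroring the Scala snippet above.
scan = jvm.org.apache.hadoop.hbase.client.Scan()
scan.setStartRow(bytearray("test_domain\x00email", "utf-8"))
scan.setStopRow(bytearray("test_domain\x00email~", "utf-8"))

# Serialize it into the base64-encoded protobuf string TableInputFormat reads.
proto = jvm.org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(scan)
scan_str = jvm.org.apache.hadoop.hbase.util.Base64.encodeBytes(proto.toByteArray())

conf["hbase.mapreduce.scan"] = scan_str  # conf dict from the quoted code above

Py4J converts a Python bytearray to a Java byte[], so the setStartRow/setStopRow calls receive the same arguments as in the Scala version.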


Re: pyspark hbase range scan

2015-04-01 Thread Ted Yu
Have you looked at http://happybase.readthedocs.org/en/latest/ ?
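
For a plain range scan outside Spark, a minimal happybase sketch is below; the Thrift host and table name are placeholders, and happybase talks to HBase through the Thrift gateway, so a Thrift server must be running.

import happybase

connection = happybase.Connection("thrift-host")  # placeholder hostname
table = connection.table("my_table")              # placeholder table name

# Range scan: rows from row_start (inclusive) up to row_stop (exclusive).
for row_key, columns in table.scan(row_start=b"test_domain\x00email",
                                   row_stop=b"test_domain\x00email~"):
    print(row_key, columns)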

Cheers



> On Apr 1, 2015, at 4:50 PM, Eric Kimbrel  
> wrote:
> 
> I am attempting to read an HBase table in PySpark with a range scan.
> [...]

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org