Re: pyspark hbase range scan

2015-04-02 Thread gen tang
Hi,

This might be helpful:
https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala
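
Roughly, those converters get passed to newAPIHadoopRDD by their fully
qualified class names. A quick, untested sketch (the converter class names
below are illustrative, so check pythonConverters.scala for the actual
package and object names; the compiled jar also has to be on Spark's
classpath, e.g. via --jars):

keyConv = "examples.pythonConverters.ImmutableBytesWritableToStringConverter"   # illustrative
valueConv = "examples.pythonConverters.HBaseResultToStringConverter"            # illustrative

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": table,
    # Depending on the HBase version, TableInputFormat can also build the Scan
    # from plain-text properties instead of a serialized Scan object:
    "hbase.mapreduce.scan.row.start": "test_domain\x00email",
    "hbase.mapreduce.scan.row.stop": "test_domain\x00email~",
}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)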

Cheers
Gen

On Thu, Apr 2, 2015 at 1:50 AM, Eric Kimbrel eric.kimb...@soteradefense.com
 wrote:

 I am attempting to read an HBase table in PySpark with a range scan.





pyspark hbase range scan

2015-04-01 Thread Eric Kimbrel
I am attempting to read an HBase table in PySpark with a range scan.

conf = {
    "hbase.zookeeper.quorum": host,           # ZooKeeper quorum for the HBase cluster
    "hbase.mapreduce.inputtable": table,      # HBase table to read
    "hbase.mapreduce.scan": scan              # Base64-encoded protobuf Scan (see below)
}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,        # fully qualified converter class name (string)
    valueConverter=valueConv,    # fully qualified converter class name (string)
    conf=conf)

If I jump over to Scala or Java and generate a Base64-encoded protobuf Scan
object and convert it to a string, I can use that value for
hbase.mapreduce.scan and everything works: the RDD correctly performs
the range scan and I am happy. The problem is that I cannot find any
reasonable way to generate that range-scan string in Python. The Scala
code required is:

import org.apache.hadoop.hbase.util.Base64
import org.apache.hadoop.hbase.protobuf.ProtobufUtil
import org.apache.hadoop.hbase.client.{Delete, HBaseAdmin, HTable, Put,
  Result => HBaseResult, Scan}

val scan = new Scan()
scan.setStartRow("test_domain\0email".getBytes)
scan.setStopRow("test_domain\0email~".getBytes)

// Serialize the Scan into the Base64-encoded protobuf string that
// hbase.mapreduce.scan expects.
def scanToString(scan: Scan): String =
  Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())

scanToString(scan)
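
In principle the same calls can be driven from PySpark through the py4j
gateway via sc._jvm. A rough, untested sketch that relies on PySpark
internals and on the HBase jars being visible to the driver JVM:

# Rough, untested sketch: build the Scan on the JVM side through py4j.
# sc._jvm is a PySpark internal; the HBase classes must be on the driver classpath.
jhbase = sc._jvm.org.apache.hadoop.hbase
jscan = jhbase.client.Scan()
jscan.setStartRow(bytearray(b"test_domain\x00email"))    # py4j maps bytearray to byte[]
jscan.setStopRow(bytearray(b"test_domain\x00email~"))
scan = jhbase.util.Base64.encodeBytes(
    jhbase.protobuf.ProtobufUtil.toScan(jscan).toByteArray())
# 'scan' can then be passed as the hbase.mapreduce.scan value in conf above.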


Is there another way to perform an HBase range scan from PySpark, or is that
functionality something that might be supported in the future?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-hbase-range-scan-tp22348.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: pyspark hbase range scan

2015-04-01 Thread Ted Yu
Have you looked at http://happybase.readthedocs.org/en/latest/ ?
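
happybase goes through the HBase Thrift gateway, so it would be a plain
client-side scan rather than a distributed Spark job. A minimal sketch,
assuming an HBase Thrift server is running and reachable:

import happybase

connection = happybase.Connection(host)        # host of the HBase Thrift server
hb_table = connection.table(table)             # same table name as in the Spark conf

# Range scan over the same key interval as the Scala example.
for row_key, columns in hb_table.scan(row_start=b"test_domain\x00email",
                                      row_stop=b"test_domain\x00email~"):
    print(row_key, columns)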

Cheers



 On Apr 1, 2015, at 4:50 PM, Eric Kimbrel eric.kimb...@soteradefense.com 
 wrote:
 
 I am attempting to read an HBase table in PySpark with a range scan.
 
