Re: pyspark hbase range scan
Hi,

Maybe this might be helpful:
https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala

Cheers
Gen

On Thu, Apr 2, 2015 at 1:50 AM, Eric Kimbrel wrote:
> I am attempting to read an HBase table in pyspark with a range scan.
>
>     conf = {
>         "hbase.zookeeper.quorum": host,
>         "hbase.mapreduce.inputtable": table,
>         "hbase.mapreduce.scan": scan
>     }
>     hbase_rdd = sc.newAPIHadoopRDD(
>         "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
>         "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
>         "org.apache.hadoop.hbase.client.Result",
>         keyConverter=keyConv,
>         valueConverter=valueConv,
>         conf=conf)
>
> If I jump over to Scala or Java and generate a base64-encoded protobuf
> Scan object and convert it to a string, I can use that value for
> "hbase.mapreduce.scan" and everything works: the RDD correctly performs
> the range scan and I am happy. The problem is that I cannot find any
> reasonable way to generate that range-scan string in Python. The Scala
> code required is:
>
>     import org.apache.hadoop.hbase.util.Base64
>     import org.apache.hadoop.hbase.protobuf.ProtobufUtil
>     import org.apache.hadoop.hbase.client.{Delete, HBaseAdmin, HTable, Put,
>       Result => HBaseResult, Scan}
>
>     val scan = new Scan()
>     scan.setStartRow("test_domain\0email".getBytes)
>     scan.setStopRow("test_domain\0email~".getBytes)
>
>     def scanToString(scan: Scan): String =
>       Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())
>
>     scanToString(scan)
>
> Is there another way to perform an HBase range scan from pyspark, or is
> that functionality something that might be supported in the future?
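[Editor's note: since the scan-to-string conversion only needs JVM calls, one possible workaround is to build the Scan through Py4J from the PySpark driver. This is an untested sketch, assuming the HBase client jars (which provide Scan, ProtobufUtil, and Base64) are on the driver classpath; `host` and `table` are the same placeholders as in the original post.]

    # Hedged sketch: build the base64-encoded protobuf scan string via Py4J.
    # Assumes HBase client jars are on the PySpark driver's classpath.
    jvm = sc._jvm

    scan = jvm.org.apache.hadoop.hbase.client.Scan()
    # Py4J converts a Python bytearray to a Java byte[].
    scan.setStartRow(bytearray(b"test_domain\x00email"))
    scan.setStopRow(bytearray(b"test_domain\x00email~"))

    proto = jvm.org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(scan)
    scan_str = jvm.org.apache.hadoop.hbase.util.Base64.encodeBytes(proto.toByteArray())

    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
        "hbase.mapreduce.scan": scan_str,
    }

The resulting scan_str plays the same role as the output of scanToString in the quoted Scala snippet and can be passed straight into the conf dict for sc.newAPIHadoopRDD.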
Re: pyspark hbase range scan
Have you looked at http://happybase.readthedocs.org/en/latest/ ?

Cheers

> On Apr 1, 2015, at 4:50 PM, Eric Kimbrel wrote:
>
> [snip -- original question quoted in full above]
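[Editor's note: happybase performs a client-side scan over the HBase Thrift gateway rather than producing a Spark RDD, so it sidesteps the protobuf-string problem entirely. A minimal sketch of the range scan from the original post, assuming a Thrift server is running and `host` and the table name are placeholders:]

    # Minimal happybase range-scan sketch (client-side, via HBase Thrift).
    import happybase

    connection = happybase.Connection(host)
    table = connection.table('my_table')

    # Scan rows with keys in [row_start, row_stop).
    for key, data in table.scan(row_start=b'test_domain\x00email',
                                row_stop=b'test_domain\x00email~'):
        print(key, data)

    connection.close()

Note that this pulls rows through a single client connection; for a distributed scan across Spark executors, the protobuf scan string with TableInputFormat discussed above still applies.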