I am attempting to read an HBase table in PySpark with a range scan.

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": table,
    "hbase.mapreduce.scan": scan  # base64-encoded protobuf Scan string (see below)
}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)

If I jump over to Scala or Java, generate a base64-encoded protobuf Scan
object, and convert it to a string, I can use that value for
"hbase.mapreduce.scan" and everything works: the RDD correctly performs the
range scan and I am happy. The problem is that I cannot find any reasonable
way to generate that scan string in Python. The Scala code required is:

import org.apache.hadoop.hbase.util.Base64
import org.apache.hadoop.hbase.protobuf.ProtobufUtil
import org.apache.hadoop.hbase.client.{Delete, HBaseAdmin, HTable, Put, Result => HBaseResult, Scan}

val scan = new Scan()
scan.setStartRow("test_domain\0email".getBytes)
scan.setStopRow("test_domain\0email~".getBytes)

def scanToString(scan: Scan): String =
  Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())

scanToString(scan)
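
One workaround I have been wondering about is building the same Scan through
Py4J via sc._jvm; a rough, untested sketch (it assumes the HBase client jars
are on the driver classpath, and sc._jvm is an internal attribute):

# Build the Scan on the JVM side and serialize it the same way the Scala
# snippet above does. Class names are for HBase 0.98 / 1.x.
jvm = sc._jvm
jscan = jvm.org.apache.hadoop.hbase.client.Scan()
jscan.setStartRow(bytearray(b"test_domain\0email"))   # bytearray maps to byte[]
jscan.setStopRow(bytearray(b"test_domain\0email~"))
proto = jvm.org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(jscan)
scan = jvm.org.apache.hadoop.hbase.util.Base64.encodeBytes(proto.toByteArray())

That feels fragile, though, so I would prefer something supported.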


Is there another way to perform an HBase range scan from PySpark, or is that
functionality something that might be supported in the future?
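
For what it is worth, TableInputFormat also seems to read plain-string
start/stop row properties, so maybe something like the following could stand
in for the serialized Scan (untested, and I am not sure a key containing "\0"
survives being passed through the configuration as a string):

# Hypothetical alternative: use TableInputFormat's string scan properties
# instead of a serialized Scan object.
conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": table,
    "hbase.mapreduce.scan.row.start": "test_domain\0email",
    "hbase.mapreduce.scan.row.stop": "test_domain\0email~"
}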



