I am attempting to read an HBase table in pyspark with a range scan.

    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
        "hbase.mapreduce.scan": scan,
    }
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf)

(keyConv and valueConv here are the usual example converters, e.g.
"org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
and "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter".)
If I jump over to Scala or Java, generate a base64-encoded protobuf Scan object, and convert it to a string, I can use that value for "hbase.mapreduce.scan" and everything works: the RDD correctly performs the range scan and I am happy. The problem is that I cannot find any reasonable way to generate that range-scan string in Python. The Scala code required is:

    import org.apache.hadoop.hbase.util.Base64
    import org.apache.hadoop.hbase.protobuf.ProtobufUtil
    import org.apache.hadoop.hbase.client.{Delete, HBaseAdmin, HTable, Put, Result => HBaseResult, Scan}

    val scan = new Scan()
    scan.setStartRow("test_domain\0email".getBytes)
    scan.setStopRow("test_domain\0email~".getBytes)

    def scanToString(scan: Scan): String = {
      Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())
    }

    scanToString(scan)

Is there another way to perform an HBase range scan from pyspark, or is that functionality something that might be supported in the future?
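The closest I have gotten from the Python side is to drive the same HBase classes through the py4j gateway that pyspark already exposes, though it leans on pyspark internals. A rough sketch of what I mean, assuming the HBase client jars are on the driver classpath (e.g. via --driver-class-path) and noting that sc._jvm is an internal handle that may change between versions:

    # Build the Scan on the JVM side through pyspark's py4j gateway.
    # Assumes the HBase client jars are on the driver classpath;
    # sc._jvm is internal pyspark API.
    jvm = sc._jvm
    scan = jvm.org.apache.hadoop.hbase.client.Scan()
    # py4j converts a Python bytearray to a Java byte[].
    scan.setStartRow(bytearray(b"test_domain\x00email"))
    scan.setStopRow(bytearray(b"test_domain\x00email~"))
    proto = jvm.org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(scan)
    scan_str = jvm.org.apache.hadoop.hbase.util.Base64.encodeBytes(proto.toByteArray())
    conf["hbase.mapreduce.scan"] = scan_str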
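I also wondered whether the start/stop rows could be passed as plain strings and skip the protobuf step entirely, since TableInputFormat defines SCAN_ROW_START / SCAN_ROW_STOP configuration keys. A sketch, assuming those keys are honored by the HBase version on the cluster (and that the \x00 byte survives the Hadoop Configuration round trip):

    # Pass the range as plain string properties instead of a serialized Scan.
    # Assumes TableInputFormat's hbase.mapreduce.scan.row.start/.stop keys
    # are supported by the HBase version in use.
    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
        "hbase.mapreduce.scan.row.start": "test_domain\x00email",
        "hbase.mapreduce.scan.row.stop": "test_domain\x00email~",
    }
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf)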