Re: pyspark hbase range scan
Hi,

Maybe this might be helpful:
https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala

Cheers
Gen

On Thu, Apr 2, 2015 at 1:50 AM, Eric Kimbrel <eric.kimb...@soteradefense.com> wrote:
> I am attempting to read an hbase table in pyspark with a range scan. [...]
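In practice, converters like the ones in that file are passed to newAPIHadoopRDD by fully-qualified class name, with the jar shipped via --jars. A minimal sketch; the class names below are the Spark 1.x examples converters, so check the linked file for the exact names it defines:

    # Sketch: wiring HBase key/value converters into newAPIHadoopRDD.
    # The converter class names are assumptions (Spark 1.x examples);
    # the linked pythonConverters.scala defines its own equivalents.
    keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf)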
pyspark hbase range scan
I am attempting to read an hbase table in pyspark with a range scan.

    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
        "hbase.mapreduce.scan": scan
    }
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf)

If I jump over to Scala or Java and generate a base64-encoded protobuf Scan object and convert it to a string, I can use that value for hbase.mapreduce.scan and everything works: the RDD will correctly perform the range scan and I am happy. The problem is that I cannot find any reasonable way to generate that range-scan string in Python. The Scala code required is:

    import org.apache.hadoop.hbase.util.Base64
    import org.apache.hadoop.hbase.protobuf.ProtobufUtil
    import org.apache.hadoop.hbase.client.{Delete, HBaseAdmin, HTable, Put, Result => HBaseResult, Scan}

    val scan = new Scan()
    scan.setStartRow("test_domain\0email".getBytes)
    scan.setStopRow("test_domain\0email~".getBytes)

    def scanToString(scan: Scan): String = {
      Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray())
    }

    scanToString(scan)

Is there another way to perform an hbase range scan from pyspark, or is that functionality something that might be supported in the future?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-hbase-range-scan-tp22348.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
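One possible workaround, sketched here and untested: the HBase classes used above are already on the driver classpath, so the same three Scala calls can be driven from Python through PySpark's Py4J gateway. Note that sc._jvm is an internal handle rather than a public API, and Py4J passes a Python bytearray through as a Java byte[]:

    # Untested sketch: build the base64 protobuf scan string in Python
    # by calling the HBase classes through PySpark's Py4J gateway.
    # sc._jvm is internal/unsupported; HBase jars must be on the driver.
    jvm = sc._jvm
    jscan = jvm.org.apache.hadoop.hbase.client.Scan()
    jscan.setStartRow(bytearray("test_domain\x00email", "utf-8"))
    jscan.setStopRow(bytearray("test_domain\x00email~", "utf-8"))
    proto = jvm.org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(jscan)
    scan = jvm.org.apache.hadoop.hbase.util.Base64.encodeBytes(proto.toByteArray())
    # 'scan' now holds the string to pass as hbase.mapreduce.scan in conf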
Re: pyspark hbase range scan
Have you looked at http://happybase.readthedocs.org/en/latest/ ?

Cheers

On Apr 1, 2015, at 4:50 PM, Eric Kimbrel <eric.kimb...@soteradefense.com> wrote:
> I am attempting to read an hbase table in pyspark with a range scan. [...]
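For comparison, the same range scan is only a few lines in happybase, though it goes through the HBase Thrift server and streams rows back to a single Python client rather than producing a distributed RDD. A minimal sketch, with host and table standing in for real values:

    import happybase

    # Minimal sketch: the range scan via the HBase Thrift gateway.
    # This bypasses Spark entirely; rows arrive at one client process.
    connection = happybase.Connection(host)   # HBase Thrift server host
    t = connection.table(table)
    for row_key, data in t.scan(row_start=b"test_domain\x00email",
                                row_stop=b"test_domain\x00email~"):
        print(row_key, data)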