Hello there, I am wondering how to get the column family names and column qualifier names when using PySpark to read an HBase table with multiple column families.
I have an HBase table as follows:

    hbase(main):007:0> scan 'data1'
    ROW                COLUMN+CELL
     row1              column=f1:, timestamp=1411078148186, value=value1
     row1              column=f2:, timestamp=1415732470877, value=value7
     row2              column=f2:, timestamp=1411078160265, value=value2

When I run the examples/hbase_inputformat.py code:

    conf2 = {"hbase.zookeeper.quorum": "localhost",
             "hbase.mapreduce.inputtable": 'data1'}
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
        conf=conf2)
    output = hbase_rdd.collect()
    for (k, v) in output:
        print(k, v)

I only see:

    (u'row1', u'value1')
    (u'row2', u'value2')

What I really want is (row_id, column family:column qualifier, value) tuples. Any comments? Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
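The family and qualifier are dropped because the value converter serialises only the cell value, not the full Cell. One workaround is a converter that emits one JSON string per cell and a small parse step on the Python side. The sketch below is only illustrative: the `cell_to_tuple` helper and the JSON key names (`row`, `columnFamily`, `qualifier`, `value`) are assumptions about what such a converter would emit, not the output of the stock `HBaseResultToStringConverter`.

```python
import json

def cell_to_tuple(cell_json):
    # Parse one per-cell JSON string into (row, 'family:qualifier', value).
    # The key names here are an assumption about the converter's output.
    cell = json.loads(cell_json)
    return (cell["row"],
            "%s:%s" % (cell["columnFamily"], cell["qualifier"]),
            cell["value"])

# Sample strings standing in for what the RDD values could look like
# if each cell were serialised as its own JSON object:
sample = [
    '{"row": "row1", "columnFamily": "f1", "qualifier": "", "value": "value1"}',
    '{"row": "row1", "columnFamily": "f2", "qualifier": "", "value": "value7"}',
]
triples = [cell_to_tuple(s) for s in sample]
print(triples)
```

On a real RDD you would apply the same parse with a transformation rather than a list comprehension, e.g. `hbase_rdd.mapValues(cell_to_tuple)` (or a `flatMap` if the converter packs several cells into one string) before collecting.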