Hello there, I am wondering how to get the column family names and column qualifier names when using PySpark to read an HBase table with multiple column families.
I have an HBase table as follows:

    hbase(main):007:0> scan 'data1'
    ROW                COLUMN+CELL
     row1              column=f1:, timestamp=1411078148186, value=value1
     row1              column=f2:, timestamp=1415732470877, value=value7
     row2              column=f2:, timestamp=1411078160265, value=value2

When I run the examples/hbase_inputformat.py code:

    conf2 = {"hbase.zookeeper.quorum": "localhost",
             "hbase.mapreduce.inputtable": 'data1'}
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
        conf=conf2)
    output = hbase_rdd.collect()
    for (k, v) in output:
        print(k, v)

I only see:

    (u'row1', u'value1')
    (u'row2', u'value2')

What I really want is (row_id, column family:column qualifier, value) tuples. Any comments? Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
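The family and qualifier are dropped because the value converter serialises only the cell value, not the full Cell. One workaround is a converter that emits one JSON string per cell and a small parse step on the Python side. The sketch below is only illustrative: the `cell_to_tuple` helper and the JSON key names (`row`, `columnFamily`, `qualifier`, `value`) are assumptions about what such a converter would emit, not the output of the stock `HBaseResultToStringConverter`.

```python
import json

def cell_to_tuple(cell_json):
    # Parse one per-cell JSON string into (row, 'family:qualifier', value).
    # The key names here are an assumption about the converter's output.
    cell = json.loads(cell_json)
    return (cell["row"],
            "%s:%s" % (cell["columnFamily"], cell["qualifier"]),
            cell["value"])

# Sample strings standing in for what the RDD values could look like
# if each cell were serialised as its own JSON object:
sample = [
    '{"row": "row1", "columnFamily": "f1", "qualifier": "", "value": "value1"}',
    '{"row": "row1", "columnFamily": "f2", "qualifier": "", "value": "value7"}',
]
triples = [cell_to_tuple(s) for s in sample]
print(triples)
```

On a real RDD you would apply the same parse with a transformation rather than a list comprehension, e.g. `hbase_rdd.mapValues(cell_to_tuple)` (or a `flatMap` if the converter packs several cells into one string) before collecting.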