Looking at HBaseResultToStringConverter:

  override def convert(obj: Any): String = {
    val result = obj.asInstanceOf[Result]
    Bytes.toStringBinary(result.value())
  }
Here is the code for Result.value():

  public byte [] value() {
    if (isEmpty()) {
      return null;
    }
    return CellUtil.cloneValue(cells[0]);
  }
This explains why you only got one value per row: the converter returns only the first Cell's value.

In the thread you mentioned, see the code posted by freedafeng, which
iterates over the Cells in the Result.
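As a rough sketch of that approach, a converter along the following lines would emit every column in the Result rather than just the first Cell. This is illustrative only (assuming the HBase 0.98 client API and Spark's python Converter trait), not the exact code from that thread:

```scala
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

// Walks all Cells in the Result instead of calling Result.value(),
// which only returns the value of cells[0].
class HBaseResultToStringConverter extends Converter[Any, String] {
  override def convert(obj: Any): String = {
    val result = obj.asInstanceOf[Result]
    result.rawCells().map { cell =>
      "%s:%s=%s".format(
        Bytes.toStringBinary(CellUtil.cloneFamily(cell)),
        Bytes.toStringBinary(CellUtil.cloneQualifier(cell)),
        Bytes.toStringBinary(CellUtil.cloneValue(cell)))
    }.mkString(",")
  }
}
```

With something like this plugged in via --conf / the valueConverter argument of newAPIHadoopRDD, each row key would map to a string containing all of its family:qualifier=value pairs.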

Cheers

On Wed, Nov 12, 2014 at 1:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> To my knowledge, Spark 1.1 comes with HBase 0.94
> To utilize HBase 0.98, you will need:
> https://issues.apache.org/jira/browse/SPARK-1297
>
> You can apply the patch and build Spark yourself.
>
> Cheers
>
> On Wed, Nov 12, 2014 at 12:57 PM, Alan Prando <a...@scanboo.com.br> wrote:
>
>> Hi Ted! Thanks for answering...
>>
>> Maybe I didn't make myself clear... What I need is to read a table from
>> HBase using Python in Spark.
>> I'm using HBase 0.98 and Spark 1.1
>>
>> My code is as following:
>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py
>> My problem is that, when I have two (or more) qualifiers for a row key,
>> this example returns just one qualifier.
>>
>> In fact, I've already found a similar question (
>> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-td18613.html#a18650),
>> but I haven't been able to find the solution yet.
>>
>> Do you have any idea?
>>
>>
>> 2014-11-12 18:26 GMT-02:00 Ted Yu <yuzhih...@gmail.com>:
>>
>>> Can you give us a bit more detail:
>>>
>>> - the HBase release you're using
>>> - whether you can reproduce this using the hbase shell
>>>
>>> I did the following using hbase shell against 0.98.4:
>>>
>>> hbase(main):001:0> create 'test', 'f1'
>>> 0 row(s) in 2.9140 seconds
>>>
>>> => Hbase::Table - test
>>> hbase(main):002:0> put 'test', 'row1', 'f1:1', 'value1'
>>> 0 row(s) in 0.1040 seconds
>>>
>>> hbase(main):003:0> put 'test', 'row1', 'f1:2', 'value2'
>>> 0 row(s) in 0.0080 seconds
>>>
>>> hbase(main):004:0> scan 'test'
>>> ROW                                      COLUMN+CELL
>>>  row1                                    column=f1:1,
>>> timestamp=1415823887048, value=value1
>>>  row1                                    column=f1:2,
>>> timestamp=1415823893857, value=value2
>>>
>>> Cheers
>>>
>>> On Wed, Nov 12, 2014 at 11:32 AM, Alan Prando <a...@scanboo.com.br>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to read an HBase table using this example from GitHub (
>>>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py),
>>>> however I have two qualifiers in a column family.
>>>>
>>>> Ex.:
>>>>
>>>> ROW                 COLUMN+CELL
>>>>  row1               column=f1:1, timestamp=1401883411986, value=value1
>>>>  row1               column=f1:2, timestamp=1401883415212, value=value2
>>>>  row2               column=f1:1, timestamp=1401883417858, value=value3
>>>>  row3               column=f1:1, timestamp=1401883420805, value=value4
>>>> When I run the code in hbase_inputformat.py, the following loop prints
>>>> row1 just once:
>>>>
>>>> output = hbase_rdd.collect()
>>>> for (k, v) in output:
>>>>     print (k, v)
>>>> Am I doing anything wrong?
>>>>
>>>> Thanks in advance.
>>>>
>>>
>>>
>>
>
