One way for me to address the problem (as a hack) is to append the
timestamp to the row ID and strip it off again in the "map" phase of the
MapReduce job. So instead of versions I will just have many, many rows,
for example:

row1__t1
row1__t2
row1__t3
row2__t1
row2__t3
etc.


This seems to be the way that the data will be ordered according to
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#physical
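
A minimal sketch of the hack in plain Java, in case it helps. It does not
call any HBase API. The class name VersionedRowKey and its methods are
made up for illustration, and the "__" separator is simply the one used in
the example rows above. It also assumes timestamps are non-negative longs
and zero-pads them, since row keys sort as strings and an unpadded "20"
would sort before "3".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of HBase): packs a version timestamp
// into the row key and unpacks it again in the map phase.
final class VersionedRowKey {
    // Separator borrowed from the row1__t1 example; an assumption.
    static final String SEP = "__";

    private VersionedRowKey() {}

    // Zero-pad to 19 digits (the width of Long.MAX_VALUE) so that
    // string order of the keys matches numeric order of the timestamps.
    static String compose(String row, long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("negative timestamp");
        }
        return row + SEP + String.format("%019d", timestamp);
    }

    // Split on the LAST separator so a row ID containing "__" survives.
    static String rowOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("no separator in key: " + key);
        }
        return key.substring(0, i);
    }

    static long timestampOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("no separator in key: " + key);
        }
        return Long.parseLong(key.substring(i + SEP.length()));
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(compose("row2", 3L));
        keys.add(compose("row1", 20L));
        keys.add(compose("row1", 3L));
        // Stand-in for the sorted order in which rows are stored.
        Collections.sort(keys);
        for (String k : keys) {
            // What the map phase would do: recover the real row ID.
            System.out.println(rowOf(k) + " @ " + timestampOf(k));
        }
    }
}
```

With keys built this way, all "versions" of a row end up next to each
other in ascending timestamp order, and the map phase only needs rowOf()
to get the original row ID back.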

Just thought I'd share this idea with others who may run into the problem.
If you like the idea, please add it to the wiki.

Thanks
-Yair




-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 21, 2008 1:09 PM
To: [email protected]
Subject: Re: How to get ALL row versions to a map reduce job?

There is no such facility at the moment.   See 
https://issues.apache.org/jira/browse/HBASE-52 and HBASE-33 for
discussion.
St.Ack


Yair Even-Zohar wrote:
> I would like to get all the versions of a row for a map-reduce task.
> Given the details below, I'm afraid I was just looking too hard and
> there's a simpler solution.
>
>  
>
> Here is what I found out:
>
> 1) Looking at TableInputFormat  at the internal class
> TableInputFormat.TableRecordReader  there is a call to
> m_table.obtainScanner(m_cols, startRow);
>
> 2)  I thought that changing the HTable.ClientScanner will do the work.
> Specifically changing the next() method.
>
>      However, in Hbase 1.2 the code is as follows:
>
>             do {
>                 values = server.next(scannerId);
>             } while (values != null && values.size() == 0 && nextScanner());
>
>  
>
> 3) Next, I looked at the HRegionServer, which seems to be
> dependent on the HRegion class.
>
> 4) Finally, would changing the internal HRegion.HScanner() be the
> solution to this daunting problem, or is there something easier?
>
>  
>
> Comments and suggestions are very welcome
>
> -Yair
>
>
>   
