One way for me to address the problem (as a hack) is to append the timestamp to the row id and strip it off again in the "map" phase of the map-reduce job. So instead of versions I will just have many, many rows, for example:
row1__t1 row1__t2 row1__t3 row2__t1 row2__t3 etc.

This seems to be the way the data will be ordered, according to http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#physical

I just thought I would share this idea with others who may run into the problem. If you like the idea, please add it to the wiki.

Thanks
-Yair

-----Original Message-----
From: stack [mailto:[EMAIL PROTECTED]
Sent: Monday, July 21, 2008 1:09 PM
To: [email protected]
Subject: Re: How to get ALL row versions to a map reduce job?

There is no such facility at the moment. See https://issues.apache.org/jira/browse/HBASE-52 and HBASE-33 for discussion.

St.Ack

Yair Even-Zohar wrote:
> I would like to get all the versions of a row for a map-reduce task.
> Given the details below, I'm afraid I was just looking too hard and
> there's a simpler solution.
>
> Here is what I found out:
>
> 1) Looking at TableInputFormat at the internal class
> TableInputFormat.TableRecordReader there is a call to
> m_table.obtainScanner(m_cols, startRow);
>
> 2) I thought that changing the HTable.ClientScanner will do the work.
> Specifically changing the next() method.
>
> However, in Hbase 1.2 the code is as follows:
>
>     do {
>       values = server.next(scannerId);
>     } while (values != null && values.size() == 0 && nextScanner());
>
> 3) Next thing, I was looking at the HRegionServer which seems to be
> dependent on the HRegion class.
>
> 4) Finally, is changing the internal HRegion.HScanner() the
> solution to this daunting problem, or is there something easier?
>
> Comments and suggestions are very welcome
>
> -Yair
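
For anyone who wants to try the row-id-plus-timestamp hack described at the top of this message, here is a rough sketch of the key handling in plain Java. It does not touch any HBase API. The class name VersionedRowKey, its method names, and the zero-padding of the timestamp are my own assumptions for illustration; only the "__" separator comes from the example rows above. The padding is there so that keys of one row sort in timestamp order when compared as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: builds and parses "row__timestamp" keys.
class VersionedRowKey {
    static final String SEP = "__";

    // Append a zero-padded timestamp so string order matches numeric order.
    static String compose(String row, long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return row + SEP + String.format("%019d", timestamp);
    }

    // Recover the original row id (what the map phase would do first).
    static String rowOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("no timestamp suffix in key: " + key);
        }
        return key.substring(0, i);
    }

    // Recover the timestamp that was appended to the row id.
    static long timestampOf(String key) {
        int i = key.lastIndexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("no timestamp suffix in key: " + key);
        }
        return Long.parseLong(key.substring(i + SEP.length()));
    }

    // Group composite keys back into row id -> list of timestamps ("versions").
    static Map<String, List<Long>> groupVersions(Iterable<String> keys) {
        Map<String, List<Long>> byRow = new TreeMap<>();
        for (String key : keys) {
            byRow.computeIfAbsent(rowOf(key), r -> new ArrayList<>()).add(timestampOf(key));
        }
        return byRow;
    }
}
```

In a real job, the map function would call something like rowOf() on each incoming row key and emit the stripped row id, so that the reduce phase sees all "versions" of a row together. Row ids that themselves end in "__" followed by digits would be ambiguous under this scheme, so it is only a sketch.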
