Hi Stack, 

On Jan 25, 2011, Stack wrote:

>> 2. mapreduce.HFileInputFormat
>> 
>> MR library to read data directly from HFiles. (Roughly 2.5 times faster than 
>> TableInputFormat in my tests)
>> 
>> Current status: Completed a proof-of-concept prototype and measured 
>> performance.
>> 
> 
> What about the in-memory edits?  Or are you thinking of reading the WALs too?

My prototype doesn't read in-memory edits, so you have to flush the table 
before running your MR job.
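
To illustrate why the flush is needed, here is a minimal toy model. It is not 
HBase code: ToyRegion and its methods are hypothetical names made up for this 
sketch. A reader that looks only at flushed files cannot see an edit until a 
flush has written it out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: these classes are hypothetical and are not HBase code.
class ToyRegion {
    // Edits that were written but not yet flushed (stands in for the in-memory store).
    private final TreeMap<String, String> memory = new TreeMap<>();
    // Immutable sorted files produced by flushes (stand in for HFiles).
    private final List<TreeMap<String, String>> files = new ArrayList<>();

    void put(String row, String value) {
        memory.put(row, value);
    }

    // Flushing turns the current in-memory edits into a new immutable file.
    void flush() {
        if (!memory.isEmpty()) {
            files.add(new TreeMap<>(memory));
            memory.clear();
        }
    }

    // A reader that bypasses the in-memory store and looks only at the files.
    TreeMap<String, String> readFilesOnly() {
        TreeMap<String, String> result = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            result.putAll(file); // newer files override older ones
        }
        return result;
    }
}

class FlushDemo {
    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "a");
        region.flush();
        region.put("row2", "b"); // still in memory only

        System.out.println(region.readFilesOnly().keySet()); // row2 is missing
        region.flush();
        System.out.println(region.readFilesOnly().keySet()); // row2 is now visible
    }
}
```

On a real cluster the flush step could be done from the HBase shell with 
something like `flush 'mytable'`, where mytable is a placeholder table name.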

To read in-memory edits, I would create a special scanner in the RS that reads 
KeyValues only from the MemStore. I would also add an observer to the RS to 
watch for region flush events.

Also, my prototype doesn't handle region compactions, so the MR job will fail 
if the compaction threads delete old HFiles after a minor or major compaction. 
I need to find a solution for this too.
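
The failure mode can be reproduced with a toy model. Again, this is not HBase 
code: ToyStore and its methods are hypothetical, and plain text files in a temp 
directory stand in for HFiles. The job records its input files up front, a 
compaction replaces them with one merged file, and the job's later reads fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: plain text files in a temp directory stand in for HFiles.
class ToyStore {
    // Creates a store directory holding one file per argument.
    static Path create(String... fileContents) {
        try {
            Path dir = Files.createTempDirectory("toy-store");
            for (int i = 0; i < fileContents.length; i++) {
                Files.write(dir.resolve("file-" + i), Arrays.asList(fileContents[i]));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What a job would do at planning time: remember the files that exist right now.
    static List<Path> listFiles(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a compaction: write one merged file, then delete the old ones.
    static void compact(Path dir) {
        try {
            List<Path> old = listFiles(dir);
            List<String> merged = new ArrayList<>();
            for (Path p : old) {
                merged.addAll(Files.readAllLines(p));
            }
            Files.write(Files.createTempFile(dir, "compacted-", ""), merged);
            for (Path p : old) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every planned file; returns false if one has disappeared in the meantime.
    static boolean canReadAll(List<Path> planned) {
        try {
            for (Path p : planned) {
                Files.readAllLines(p);
            }
            return true;
        } catch (NoSuchFileException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class CompactionDemo {
    public static void main(String[] args) {
        Path dir = ToyStore.create("row1=a", "row2=b");
        List<Path> planned = ToyStore.listFiles(dir);
        System.out.println(ToyStore.canReadAll(planned)); // true: nothing has changed yet
        ToyStore.compact(dir);
        System.out.println(ToyStore.canReadAll(planned)); // false: the planned files are gone
    }
}
```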


- Tatsuya

--
Tatsuya Kawano (Mr.)
Tokyo, Japan
