Re: Items to contribute (plan)

Tatsuya Kawano Tue, 25 Jan 2011 19:35:54 -0800

Hi Yifeng, 

> #4. Writing Japanese books and documents
> I am glad if I can work on this one with you.



Thanks for your offer. Let me explain a bit more about them. 


>> -- Currently I'm authoring a book chapter about HBase for a Japanese NOSQL 
>> book

This one is a commercial book from a Japanese publisher, so I'll do this by 
myself.


>> -- I'll translate The Apache HBase Book to Japanese


This one comes with HBase, and I'm looking for some people (like you) to work 
with.

http://hbase.apache.org/book.html

I created a Jira entry to track this task: 
https://issues.apache.org/jira/browse/HBASE-3391


Are you working at Rakuten in Tokyo? Maybe we can meet at the next Hadoop 
Source Code Reading at Rakuten Tower. Do you know this event? 

Thanks, 
Tatsuya

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Jan 25, 2011, at 11:03 AM, Yifeng Jiang wrote:

> #4. Writing Japanese books and documents
> I am glad if I can work on this one with you.
> 
> 
> On 01/23/2011 10:18 AM, Tatsuya Kawano wrote:
>> Hi,
>> 
>> I wanted to let you know that I'm planning to contribute the following items 
>> to the HBase community. These are my spare time projects and I'll only be 
>> able to spend about 7 hours a week on them, so the progress will be very 
>> slow. I'd like some feedback from you guys to prioritize them. Also, if 
>> someone/team wants to work on them (with me or alone), I'll be happy to 
>> provide more details.
>> 
>> 
>> 1. RADOS integration
>> 
>> Run HBase not only on HDFS but also on the RADOS distributed object store 
>> (the lower layer of Ceph), so that the following options will become 
>> available to HBase users:
>> 
>> -- No SPOF (RADOS has no name node(s), only ZK-like monitors and data 
>> nodes)
>> -- Instant backup of HBase tables (RADOS provides copy-on-write snapshot per 
>> object pool)
>> -- Extra durability option on WAL (RADOS can do both synchronous and 
>> asynchronous disk flush. HDFS doesn't have the former option)
>> 
>> Note:
>> RADOS object = HFile, WAL
>> object pool = group of HFiles or WAL
>> 
>> Current status: Design phase
>> 
>> 
>> 2. mapreduce.HFileInputFormat
>> 
>> MR library to read data directly from HFiles. (Roughly 2.5 times faster than 
>> TableInputFormat in my tests)
>> 
>> Current status: Completed a proof-of-concept prototype and measured 
>> performance.
>> 
>> 
>> 3. Enhance Get/Scan performance of RS
>> 
>> Add a hash code and a couple of flags to HFile at flush time and change 
>> the scanner implementation so that:
>> 
>> -- Get/Scan operations will get faster. (fewer key comparisons for 
>> reconstructing a row: O(h * c) -> O(h).  [h = number of HFiles for the row, 
>> c = number of columns in an HFile])
>> -- The size of HFiles will become a bit smaller. (The flags will eliminate 
>> duplicate bytes in keys (row, column family and qualifier) from HFiles.)
>> 
>> Current status: Completed a proof-of-concept prototype and measured 
>> performance.
>> 
>> Details:
>> https://github.com/tatsuya6502/hbase-mr-pof/
>> (I meant "poc" not "pof"...)
>> 
>> 
>> 4. Writing Japanese books and documents
>> 
>> -- Currently I'm authoring a book chapter about HBase for a Japanese NOSQL 
>> book
>> -- I'll translate The Apache HBase Book to Japanese
>> 
>> 
>> Thank you,
>> 
>> 
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>> 
>> http://twitter.com/#!/tatsuya6502
>> 
>> 
>> 
> 
> 
> -- 
> Yifeng Jiang
>
