IO pipeline optimizations

2011-07-19 Thread Shrinivas Joshi
This blog post on the YDN website,
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
has a detailed discussion of the different steps involved in Hadoop IO
operations and the opportunities for optimization. Could someone please
comment on the current state of these potential optimizations? Are some of
these expected to be addressed in the next-gen MR release?

Thanks,
-Shrinivas


Re: IO pipeline optimizations

2011-07-19 Thread Todd Lipcon
Hi Shrinivas,

There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.

There have also been a lot of improvements around random read access, for
example HDFS-941, which speeds up random reads by more than 2x.

I'm planning on writing a blog post in the next couple of weeks about some
of this work.

-Todd

On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi jshrini...@gmail.com wrote:

 This blog post on the YDN website,
 http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
 has a detailed discussion of the different steps involved in Hadoop IO
 operations and the opportunities for optimization. Could someone please
 comment on the current state of these potential optimizations? Are some of
 these expected to be addressed in the next-gen MR release?

 Thanks,
 -Shrinivas




-- 
Todd Lipcon
Software Engineer, Cloudera