Hi Shrinivas,
There has been some recent work on optimizing checksums; see HDFS-2080 for
example. This will help both the write and read code paths, though we've
focused more on reads.
There have also been a lot of improvements around random read access - for
example, HDFS-941, which improves random read performance by more than 2x.
I'm planning on writing a blog post in the next couple of weeks about some
of this work.
-Todd
On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi jshrini...@gmail.com wrote:
This blog post on the YDN website
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
has a detailed discussion of the different steps involved in Hadoop IO
operations and opportunities for optimization. Could someone please comment
on the current state of these potential optimizations? Are some of these
expected to be addressed in the next-gen MR release?
Thanks,
-Shrinivas
--
Todd Lipcon
Software Engineer, Cloudera