Hi Shrinivas,

There has been some work going on recently around optimizing checksums. See HDFS-2080 for example. This will help both the write and read code, though we've focused more on read.
There have also been a lot of improvements around random read access - for example HDFS-941, which improves random read by more than 2x. I'm planning on writing a blog post in the next couple of weeks about some of this work.

-Todd

On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi <jshrini...@gmail.com> wrote:

> This blog post on YDN website
> http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
> has detailed discussion on different steps involved in Hadoop IO operations
> and opportunities for optimizations. Could someone please comment on current
> state of these potential optimizations? Are some of these expected to be
> addressed in "next gen MR" release?
>
> Thanks,
> -Shrinivas

--
Todd Lipcon
Software Engineer, Cloudera