On Jan 30, 2012, at 12:02 PM, Jesse Yates wrote:

> The large blocks issue is going away soon/already with append support in
> HDFS. You are still going to be hurt if you have other things IOing on the
> node as you still need to spin disk, but it won't be as terrible as it could
> be.
>
> The big problem is that writing replicas in HDFS is done in a
> pipeline, rather than in parallel. There is a ticket to change this
> (HDFS-1783), but no movement on it since last summer.
ugh - why would they change this? Pipelining maximizes bandwidth usage. It'd
be cool if the log stream could be configured to return after it's written to
one, two, or more nodes though.

> Just my two cents, but sticking with the current logging style makes the
> most sense, though maybe making it a really distinct interface so we can swap
> out for an HDFS implementation when it's ready and people prefer.
>
> - Jesse Yates
>
> Sent from my iPhone.
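To make the two ideas above concrete, here is a minimal sketch (all names are hypothetical, not an existing HBase or HDFS API) of a distinct log interface whose sync takes a configurable replica-ack count, so an HDFS-backed implementation could be swapped in later:

    // Hypothetical sketch only. It illustrates (1) hiding the log behind a
    // distinct interface so a different (e.g. HDFS-backed) implementation can
    // be plugged in, and (2) letting callers choose how many replica acks a
    // sync waits for before returning.
    import java.io.IOException;

    /** Minimal pluggable write-ahead-log abstraction (names are made up). */
    interface WriteAheadLog {
        /** Append an edit; returns once the edit is buffered locally. */
        void append(byte[] edit) throws IOException;

        /**
         * Block until at least minReplicaAcks replicas have acknowledged the
         * buffered edits. minReplicaAcks = 1 trades durability for latency;
         * higher values wait on more of the (pipelined) replicas.
         */
        void sync(int minReplicaAcks) throws IOException;

        void close() throws IOException;
    }

    /** Caller that syncs after every append with a configurable ack count. */
    class LogWriter {
        private final WriteAheadLog wal;
        private final int minReplicaAcks;

        LogWriter(WriteAheadLog wal, int minReplicaAcks) {
            this.wal = wal;
            this.minReplicaAcks = minReplicaAcks;
        }

        void write(byte[] edit) throws IOException {
            wal.append(edit);
            // Return as soon as enough replicas have the edit.
            wal.sync(minReplicaAcks);
        }
    }

With something like this, the current logging style stays the default implementation, and an HDFS-based one could be dropped in behind the same interface once it's ready.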
