There's an interview with one of the GFS engineers on ACM Queue that might
be of interest to you. It's related to GFS, but I think the underlying
issues are the same in HDFS. There is a lot of discussion on dealing with
large numbers of files. Here's the link:
http://queue.acm.org/detail.cfm?id=1594206

- Jaideep

On Mon, Aug 10, 2009 at 7:57 AM, Budianto Lie <popo6...@gmail.com> wrote:
> Thanks
>
> On 8/9/09, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> Have a look at
>> http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/
>>
>> J-D
>>
>> On Sun, Aug 9, 2009 at 8:41 AM, Budianto Lie <popo6...@gmail.com> wrote:
>>> Hello,
>>> As we know, the HDFS block size is big (64 MB).
>>> If I have a large number of files but the average file size is small
>>> (less than 50 KB), and they are stored in HDFS, what is the
>>> performance compared with storing big files?
>>>
>>> Thanks,
>>> Budianto

--
- JDD
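
For anyone hitting this in practice: the usual remedy described in the Cloudera
post linked above is to pack many small files into a larger container such as a
SequenceFile (or a HAR archive), so the NameNode tracks one large file instead
of thousands of tiny ones. Below is a rough sketch of that idea; the class name,
the output path, and the choice of the 0.20-era SequenceFile.createWriter()
overload are my own assumptions for illustration, not anything prescribed in
this thread.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilePacker {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical output path; one SequenceFile holds all the small files.
            Path out = new Path("/user/example/packed.seq");

            // Each local file passed on the command line becomes one
            // (filename, contents) record in the SequenceFile.
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, Text.class, BytesWritable.class,
                    SequenceFile.CompressionType.BLOCK);
            try {
                for (String name : args) {
                    File f = new File(name);
                    byte[] buf = new byte[(int) f.length()];
                    FileInputStream in = new FileInputStream(f);
                    try {
                        IOUtils.readFully(in, buf, 0, buf.length);
                    } finally {
                        in.close();
                    }
                    writer.append(new Text(f.getName()), new BytesWritable(buf));
                }
            } finally {
                writer.close();
            }
        }
    }

Block compression keeps the container compact, and a MapReduce job can read the
records back later with SequenceFileInputFormat, which sidesteps the per-file
NameNode memory and seek overhead the original question was asking about.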