Hi there, I'm working through a concept at the moment and was attempting to write lots of data to a few files, as opposed to writing lots of data to lots of little files. What are the thoughts on this?
When I try to implement outputStream = hdfs.append(path); there doesn't seem to be any locking mechanism in place, or there is one and it doesn't hold up well enough for many writes per second. I have also read that the property "dfs.support.append" is not meant for production use. Still, if millions of little files are as good as (or better than, or no different from) a few massive files, then I suppose append isn't something I really need.

I do see a lot of stack traces with messages like:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /foo/bar/aaa.bbb.ccc.ddd.xxx for DFSClient_-1821265528 on client 127.0.0.1 because current leaseholder is trying to recreate file.

I hope this makes sense; I'm still a little bit confused. Thanks in advance.

-sd
--
Sasha Dolgy
sasha.do...@gmail.com
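P.S. In case it helps, below is roughly the call pattern I'm attempting, stripped down. The path, the record contents and the class name are just placeholders, and error handling is left out.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // append has to be switched on; this is the property mentioned above
        conf.setBoolean("dfs.support.append", true);

        FileSystem hdfs = FileSystem.get(conf);
        Path path = new Path("/foo/bar/events.log"); // placeholder path

        // append to the file if it already exists, otherwise create it
        FSDataOutputStream outputStream =
                hdfs.exists(path) ? hdfs.append(path) : hdfs.create(path);

        outputStream.write("one small record\n".getBytes("UTF-8"));
        outputStream.flush();
        outputStream.close(); // closing releases the lease on the file

        hdfs.close();
    }
}

My suspicion is that the AlreadyBeingCreatedException shows up when a second client tries to open the same file while an earlier writer still holds the lease, which would match the "many writes per second" situation I described.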