Hi there, I'm working through a concept at the moment and was attempting to write lots of data to a few files, as opposed to writing lots of data to lots of little files. What are the thoughts on this?
When I try to implement outputStream = hdfs.append(path); there doesn't seem to be any locking mechanism in place, or there is one and it doesn't hold up well enough for many writes per second. I have also read that the property "dfs.support.append" is not meant for production use. Still, if millions of little files are as good as (or better than, or no different from) a few massive files, then I suppose append isn't something I really need.

I do see a lot of stack traces with messages like:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /foo/bar/aaa.bbb.ccc.ddd.xxx for DFSClient_-1821265528 on client 127.0.0.1 because current leaseholder is trying to recreate file.

I hope this makes sense; I'm still a little bit confused. Thanks in advance.

-sd
--
Sasha Dolgy
sasha.do...@gmail.com
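P.S. In case it helps, below is roughly the call pattern I'm attempting, stripped down. The path, the record contents and the class name are just placeholders, and error handling is left out.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // append has to be switched on; this is the property mentioned above
        conf.setBoolean("dfs.support.append", true);

        FileSystem hdfs = FileSystem.get(conf);
        Path path = new Path("/foo/bar/events.log"); // placeholder path

        // append to the file if it already exists, otherwise create it
        FSDataOutputStream outputStream =
                hdfs.exists(path) ? hdfs.append(path) : hdfs.create(path);

        outputStream.write("one small record\n".getBytes("UTF-8"));
        outputStream.flush();
        outputStream.close(); // closing releases the lease on the file

        hdfs.close();
    }
}

My suspicion is that the AlreadyBeingCreatedException shows up when a second client tries to open the same file while an earlier writer still holds the lease, which would match the "many writes per second" situation I described.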