Hi Lavanya,

On 5/14/2010 10:51 AM, Lavanya Ramakrishnan wrote:
> Hello,
>
>   I am running org.apache.hadoop.fs.TestDFSIO to benchmark our HDFS
> installation and had a couple of questions regarding the same.
>
> a) If I run the benchmark back to back in the same directory, I start seeing
> strange errors such as NotReplicatedYetException or
> AlreadyBeingCreatedException (failed to create file  .... on client 5,
> because this file is already being created by DFSClient_.... on ...).  It
> seems like there might be some kind of race condition between the
> replication from a previous run and subsequent runs. Is there any way to
> avoid this?

Yes, this looks like a race with the previous run: blocks written by the
earlier run may still be replicating, or its output files may still be open.
You can either wait a bit, or run TestDFSIO -clean before the second run.
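Concretely, the cleanup step might look like the following (the jar name is
an assumption for a 0.20-era install; match it to your Hadoop version):

```shell
# Remove the /benchmarks/TestDFSIO data left by the previous run
# before starting a new one. The jar file name below is an
# assumption; adjust it to your installed Hadoop version.
hadoop jar hadoop-0.20.2-test.jar TestDFSIO -clean
```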

> b) I have been testing with concurrent writers and see a significant drop in
> throughput. I get about 60 MB/s for 1 writer and about 8 MB/s for 50
> concurrent writers. Is this the known scalability limits for HDFS. Is there
> any way to configure this to perform better?

It depends on the size and configuration of your cluster.
In general, for consistent results with DFSIO it is better to configure 1 or 2
map tasks per node, and to specify as many files for DFSIO as you have map
slots. The idea is that all maps finish in a single wave, so no map has to
wait for a slot; that is when you should see optimal throughput.
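As a sketch, the file count can be derived from the cluster size. The node
count and slots-per-node values below are made-up examples, and the jar name
is an assumption for a 0.20-era install:

```shell
# Hypothetical cluster: 10 nodes, 2 map slots per node.
NODES=10
MAP_SLOTS_PER_NODE=2

# One file per map slot, so all maps run in a single wave.
NR_FILES=$((NODES * MAP_SLOTS_PER_NODE))
echo "$NR_FILES"

# The benchmark itself would then be invoked roughly like this
# (adjust the jar name to your Hadoop version):
#   hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write \
#       -nrFiles $NR_FILES -fileSize 1000
#   hadoop jar hadoop-0.20.2-test.jar TestDFSIO -read \
#       -nrFiles $NR_FILES -fileSize 1000
```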

Thanks,
--Konstantin
