This is mostly disk-bound on the NameNode. I think it ends up being one fsync per file. If you have multiple directories, you could start multiple commands in parallel; because of the way the NameNode syncs, having multiple clients helps.
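
A rough sketch of the parallel approach, assuming the tree splits into several subdirectories of similar size. The function name, the JOBS value, and the subdirectory names in the usage comment are made up for illustration, and xargs -P is a GNU/BSD extension rather than POSIX. Whether it actually speeds things up on a given cluster would need to be measured.

```shell
#!/bin/sh
# Illustrative defaults; adjust for your installation.
HADOOP=${HADOOP:-/home/hadoop/hadoop-0.18.1/bin/hadoop}
OWNER=${OWNER:-frank:frank}
JOBS=${JOBS:-4}   # number of concurrent clients (assumed value, tune it)

# Read one HDFS directory path per line on stdin and run a recursive
# chown on each, keeping up to $JOBS commands running at a time.
parallel_chown() {
    xargs -P "$JOBS" -I{} "$HADOOP" dfs -chown -R "$OWNER" {}
}

# Usage (hypothetical subdirectory names):
#   printf '%s\n' /home/frank/proj100/a /home/frank/proj100/b | parallel_chown
```

Note that this only covers the listed subdirectories; the parent directory itself, and any files directly under it, would still need a separate chown.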

Raghu.

Frank Singleton wrote:

Hi,

I ran a test of recursive chown on a Fedora 9 box here (2x quad-core, 16 GB RAM).
It took about 12.5 minutes to complete for roughly 45,000 files (approx. 60 files/sec).

I executed the command on the namenode itself.

Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
Q2. Are there any dfs/jvm parameters I should look at to see if I can improve this?

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real    12m38.631s
user    1m54.662s
sys     0m33.124s

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
         220        45891         3965996260 hdfs://namenode:9000/home/frank/proj100

real    0m1.579s
user    0m0.686s
sys     0m0.129s
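
For reference, the rate can be checked from the numbers above. This is a small sketch, assuming the -count columns are directory count, file count, and content size in bytes; the helper name rate is made up for illustration.

```shell
# Print entries per second to one decimal place.
# $1 = number of entries, $2 = elapsed wall-clock seconds
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

# 45891 files in 12m38.631s (12*60 + 38.631 = 758.631 seconds)
rate 45891 758.631   # prints 60.5
```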


cheers / frank
