Re: is 12 minutes ok for dfs chown -R on 45000 files ?
On 10/2/08 11:33 PM, "Frank Singleton" <[EMAIL PROTECTED]> wrote:

> Just to clarify, this is for when the chown will modify all files' owner
> attributes, eg: toggle all from frank:frank to hadoop:hadoop (see below)

When we converted from 0.15 to 0.16, we chown'ed all of our files. The local
dev team wrote the code in https://issues.apache.org/jira/browse/HADOOP-3052 ,
but it wasn't committed as a standard feature because they viewed it as a
one-off. :(

Needless to say, running a large chown as a MR job should be significantly
faster.
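Short of a full MR job, the same idea of spreading the chown across several client processes can be sketched from the shell. This is only a sketch: the paths are hypothetical, and `echo` stands in for the real hadoop client so the example runs without a cluster.

```shell
# Sketch: fan a bulk chown out over parallel client workers.
# In real use each worker would run something like:
#   /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown hadoop:hadoop <path>
# 'echo' stands in for the hadoop client here.

# Hypothetical list of paths to re-own (one per line):
printf '%s\n' /home/frank/proj100/a /home/frank/proj100/b \
              /home/frank/proj100/c /home/frank/proj100/d > /tmp/paths.txt

# -n 1: one path per invocation; -P 4: keep four clients in flight at once
xargs -n 1 -P 4 echo hadoop dfs -chown hadoop:hadoop < /tmp/paths.txt

rm -f /tmp/paths.txt
```

Each client then issues its own namenode operations, which is the part that parallelizes.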
Re: is 12 minutes ok for dfs chown -R on 45000 files ?
Frank Singleton wrote:
> Hi,
>
> Did a test on recursive chown on a Fedora 9 box here (2x quad core, 16G RAM).
> Took about 12.5 minutes to complete for 45000 files (approx 60 files/sec).
>
> This was the namenode that I executed the command on.
>
> Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
> Q2. Are there any dfs/jvm parameters I should look at to see if I can
> improve this?
>
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100
>
> real 12m38.631s
> user  1m54.662s
> sys   0m33.124s
>
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
> 22045891 3965996260 hdfs://namenode:9000/home/frank/proj100
>
> real 0m1.579s
> user 0m0.686s
> sys  0m0.129s
>
> cheers / frank

Just to clarify, this is for when the chown will modify all files' owner
attributes, eg: toggle all from frank:frank to hadoop:hadoop (see below).

For chown -R from frank:frank to frank:frank, the result is only 5 or 6
seconds: at that point all files under /home/frank/proj100 are already
frank:frank, and the command executes in about 6 seconds.
[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real 0m5.624s
user 0m6.744s
sys  0m0.402s

# now lets change all to hadoop:hadoop
[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R hadoop:hadoop /home/frank/proj100

real 12m43.732s
user  0m53.781s
sys   0m10.655s

# now toggle back to frank:frank
[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real 12m40.700s
user  0m45.757s
sys   0m8.173s

# now frank:frank to frank:frank
[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real 0m5.648s
user 0m6.734s
sys  0m0.593s

[EMAIL PROTECTED] ~]$

cheers / frank
Re: is 12 minutes ok for dfs chown -R on 45000 files ?
This is mostly disk-bound on the NameNode. I think this ends up being one
fsync for each file. If you have multiple directories, you could start
multiple commands in parallel. Because of the way the NameNode syncs,
having multiple clients helps.

Raghu.

Frank Singleton wrote:
> Hi,
>
> Did a test on recursive chown on a Fedora 9 box here (2x quad core, 16G RAM).
> Took about 12.5 minutes to complete for 45000 files (approx 60 files/sec).
>
> This was the namenode that I executed the command on.
>
> Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
> Q2. Are there any dfs/jvm parameters I should look at to see if I can
> improve this?
>
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100
>
> real 12m38.631s
> user  1m54.662s
> sys   0m33.124s
>
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
> 22045891 3965996260 hdfs://namenode:9000/home/frank/proj100
>
> real 0m1.579s
> user 0m0.686s
> sys  0m0.129s
>
> cheers / frank
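The one-client-per-directory suggestion above can be sketched as a shell loop. The subdirectory names are hypothetical, and a stand-in function replaces the real hadoop binary so the sketch runs without a cluster; swap in the commented-out definition for real use.

```shell
# One chown client per top-level subdirectory, run concurrently.
# Real definition would be:
#   run_chown() { /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R "$1" "$2"; }
run_chown() { echo "chown -R $1 $2"; }   # stand-in for the hadoop client

for dir in sub1 sub2 sub3; do            # hypothetical subdirectories
  run_chown hadoop:hadoop "/home/frank/proj100/$dir" &
done
wait                                     # block until every client finishes
```

Since each client holds its own connection to the NameNode, the per-file fsyncs from different clients can batch together rather than strictly serializing.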
is 12 minutes ok for dfs chown -R on 45000 files ?
Hi,

Did a test on recursive chown on a Fedora 9 box here (2x quad core, 16G RAM).
Took about 12.5 minutes to complete for 45000 files (approx 60 files/sec).

This was the namenode that I executed the command on.

Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
Q2. Are there any dfs/jvm parameters I should look at to see if I can
improve this?

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real 12m38.631s
user  1m54.662s
sys   0m33.124s

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
22045891 3965996260 hdfs://namenode:9000/home/frank/proj100

real 0m1.579s
user 0m0.686s
sys  0m0.129s

cheers / frank