Re: is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-06 Thread Allen Wittenauer



On 10/2/08 11:33 PM, "Frank Singleton" <[EMAIL PROTECTED]> wrote:

> Just to clarify, this is for when the chown will modify all files owner
> attributes
> 
> eg: toggle all from frank:frank to hadoop:hadoop (see below)

When we converted from 0.15 to 0.16, we chown'ed all of our files.  The
local dev team wrote the code in
https://issues.apache.org/jira/browse/HADOOP-3052 , but it wasn't committed
as a standard feature because they viewed it as a one-off. :(

Needless to say, running a large chown as an MR job should be
significantly faster.



Re: is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-02 Thread Frank Singleton

Frank Singleton wrote:
> Hi,
> 
> Did a test on recursive chown on a fedora 9 box here (2xquad core,16Gram)
> Took about 12.5 minutes to complete for 45000 files. (hmm approx 60 files/sec)
> 
> This was the namenode that I executed the command on
> 
> Q1. Is this rate (60 files/sec) typical of what other folks are seeing ?
> Q2. Are there any dfs/jvm parameters I should look at to see if I can improve 
> this
> 
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100
> 
> real  12m38.631s
> user  1m54.662s
> sys   0m33.124s
> 
> time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
>  22045891 3965996260 hdfs://namenode:9000/home/frank/proj100
> 
> real  0m1.579s
> user  0m0.686s
> sys   0m0.129s
> 
> 
> cheers / frank

Just to clarify, this timing is for the case where the chown actually modifies
the owner attributes of every file,

e.g. toggling all from frank:frank to hadoop:hadoop (see below).

For chown -R from frank:frank to frank:frank, the result is only 5 or 6
seconds.


At this point, all files under /home/frank/proj100 are frank:frank, and the
command executes in 6 seconds or so.

[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real    0m5.624s
user    0m6.744s
sys     0m0.402s

# now let's change all to hadoop:hadoop

[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R hadoop:hadoop /home/frank/proj100

real    12m43.732s
user    0m53.781s
sys     0m10.655s


# now toggle back to frank:frank

[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real    12m40.700s
user    0m45.757s
sys     0m8.173s

# now frank:frank to frank:frank

[EMAIL PROTECTED] ~]$ time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real    0m5.648s
user    0m6.734s
sys     0m0.593s
[EMAIL PROTECTED] ~]$
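
To put the runs above side by side, here is a rough per-file calculation. It is only a sketch: the helper names per_file_ms and files_per_sec are made up for illustration, and 45000 is the approximate file count quoted in the original post, not an exact figure.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for back-of-envelope numbers.

# per_file_ms SECONDS FILES: average wall-clock milliseconds per file
per_file_ms() {
    awk -v secs="$1" -v files="$2" 'BEGIN { printf "%.2f\n", secs * 1000 / files }'
}

# files_per_sec SECONDS FILES: average files processed per second
files_per_sec() {
    awk -v secs="$1" -v files="$2" 'BEGIN { printf "%.1f\n", files / secs }'
}

# 12m43.732s = 763.732 s: the run that actually changed the owner
per_file_ms 763.732 45000     # prints 16.97
files_per_sec 763.732 45000   # prints 58.9

# 0m5.624s: the run where the owner was already frank:frank
per_file_ms 5.624 45000       # prints 0.12
```

Both runs walk the same tree, so the gap (about 17 ms per file versus about 0.1 ms) suggests the time goes into the per-file owner update rather than into the traversal itself.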


cheers / frank



Re: is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-02 Thread Raghu Angadi


This is mostly disk-bound on the NameNode. I think this ends up being one
fsync for each file. If you have multiple directories, you could start
multiple commands in parallel. Because of the way the NameNode syncs, having
multiple clients helps.
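
A minimal sketch of the parallel approach, assuming the tree splits into several subdirectories. The function name parallel_chown and the HADOOP variable are hypothetical; the default path is the binary used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: start one recursive chown per directory, all in
# parallel, then wait for every one of them to finish.
# HADOOP can be overridden (e.g. HADOOP=echo) for a dry run.
HADOOP="${HADOOP:-/home/hadoop/hadoop-0.18.1/bin/hadoop}"

parallel_chown() {
    owner="$1"
    shift
    for dir in "$@"; do
        # each background job is a separate client talking to the NameNode
        $HADOOP dfs -chown -R "$owner" "$dir" &
    done
    wait
}
```

It would be called with the new owner followed by the directories to process, for example parallel_chown hadoop:hadoop followed by each subdirectory of /home/frank/proj100. The parent directory itself and any files directly under it are not covered by the per-subdirectory jobs and would still need their own chown afterwards. How much this helps will depend on the cluster; no speedup was measured in this thread.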


Raghu.





is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-02 Thread Frank Singleton

Hi,

Did a test of recursive chown on a Fedora 9 box here (2 x quad core, 16 GB RAM).
It took about 12.5 minutes to complete for 45000 files (hmm, approx 60 files/sec).

This was the namenode that I executed the command on.

Q1. Is this rate (60 files/sec) typical of what other folks are seeing?
Q2. Are there any dfs/jvm parameters I should look at to see if I can improve
this?

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -chown -R frank:frank /home/frank/proj100

real    12m38.631s
user    1m54.662s
sys     0m33.124s

time /home/hadoop/hadoop-0.18.1/bin/hadoop dfs -count /home/frank/proj100
 22045891 3965996260 hdfs://namenode:9000/home/frank/proj100

real    0m1.579s
user    0m0.686s
sys     0m0.129s


cheers / frank