Re: Trying to write to HDFS from mapreduce.
I think your conf is set incorrectly and your job was run locally, which is why the output ended up on the local file system. Also, have you called jobconf.setNumReduceTasks(0)? Try running some example jobs to test your setup. Nicholas Sze - Original Message > From: Erik Holstad <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Thursday, July 24, 2008 3:17:40 PM > Subject: Trying to write to HDFS from mapreduce. > > Hi! > I'm writing a mapreduce job where I want the output from the mapper to go > straight > to HDFS without passing through the reduce method. Have been told that I can do: > c.setOutputFormat(TextOutputFormat.class); also added > Path path = new Path("user"); > FileOutputFormat.setOutputPath(c, path); > > But I still ended up with the result in the local filesystem instead. > > Regards Erik
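For reference, a minimal map-only job along the lines suggested above might look like the sketch below. It is written against the 0.17-era org.apache.hadoop.mapred API; the namenode/jobtracker addresses and the input/output paths are placeholders, not values from the original thread. The key points are setNumReduceTasks(0) and making sure fs.default.name points at HDFS rather than the local file system, since otherwise the output lands on the local disk exactly as described.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MapOnlyToHdfs {
  public static void main(String[] args) throws Exception {
    JobConf c = new JobConf(MapOnlyToHdfs.class);
    // If fs.default.name resolves to the local file system, the whole job
    // (including its output) stays local -- the symptom described above.
    c.set("fs.default.name", "hdfs://namenode.example.com:9000");   // placeholder
    c.set("mapred.job.tracker", "jobtracker.example.com:9001");     // placeholder

    c.setNumReduceTasks(0);                  // map-only: mapper output becomes the job output
    c.setMapperClass(IdentityMapper.class);  // or your own mapper
    c.setInputFormat(TextInputFormat.class);
    c.setOutputFormat(TextOutputFormat.class);
    c.setOutputKeyClass(LongWritable.class);
    c.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(c, new Path("/user/erik/input"));
    FileOutputFormat.setOutputPath(c, new Path("/user/erik/output")); // absolute HDFS path

    JobClient.runJob(c);
  }
}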
Re: File permissions issue
Hi Joman, The temp directory we are talking about here is the temp directory in the local file system (i.e. Unix in your case). There is a config property hadoop.tmp.dir (see hadoop-default.xml) which specifies the path of the temp directory. Before you start the cluster, you should set this property and chmod the temp directory so that all users have permission to create files under it. Hope it helps. Nicholas Sze - Original Message > From: Joman Chu <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Wednesday, July 9, 2008 4:15:39 AM > Subject: Re: File permissions issue > > So we can fix this issue by putting all three users in a common group? We did > that after we encountered the issue, but we still got the errors. Note that > we > had not restarted hadoop, so the permissions were still as described earlier. > Should we have restarted Hadoop after the grouping? > > On Wed, July 9, 2008 2:05 am, heyongqiang said: > > because in your permission set, the other role can not write the temp > > directory. and user3 is not in the same group with user2. > > > > heyongqiang 2008-07-09 > > > > From: Joman Chu Sent: 2008-07-09 13:06:51 To: > > core-user@hadoop.apache.org Cc: Subject: File permissions issue > > > > Hello, > > > > On a cluster where I run Hadoop, it seems that the temp directory created > > by Hadoop (in our case, /tmp/hadoop/) gets its permissions set to > > "drwxrwxr-x" owned by the first person that runs a job after the Hadoop > > services are started. This causes file permissions problems as we try to > > run jobs. > > > > For example, user1:user1 starts Hadoop using ./start-all.sh. Then > > user2:user2 runs a Hadoop job. Temp directories (/tmp/hadoop/) are now > > created in all nodes in the cluster owned by user2 with permissions > > "drwxrwxr-x". Now user3:user3 tries to run a job and gets the following > > exception: > > > > java.io.IOException: Permission denied at > > java.io.UnixFileSystem.createFileExclusively(Native Method) at > > java.io.File.checkAndCreate(File.java:1704) at > > java.io.File.createTempFile(File.java:1793) at > > org.apache.hadoop.util.RunJar.main(RunJar.java:115) at > > org.apache.hadoop.mapred.JobShell.run(JobShell.java:194) at > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at > > org.apache.hadoop.mapred.JobShell.main(JobShell.java:220) > > > > Why does this happen and how can we fix this? Our current stopgap > > measure is to run a job as the user that started Hadoop. That is, in our > > example, after user1 starts Hadoop, user1 runs a job. Everything seems to > > work fine then. > > > > Thanks, Joman Chu > > > -- > Joman Chu > AIM: ARcanUSNUMquam > IRC: irc.liquid-silver.net
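As a small illustration of where that property comes from, the snippet below just prints the local temp path Hadoop resolves for whoever runs it, so an administrator can pre-create that directory and open up its permissions before several users share the cluster. It assumes the stock hadoop-default.xml of that era, whose default is /tmp/hadoop-${user.name}; a fixed path such as /tmp/hadoop with no per-user suffix is what produces the clash described in this thread.

import org.apache.hadoop.conf.Configuration;

public class ShowTmpDir {
  public static void main(String[] args) {
    // Loads hadoop-default.xml and hadoop-site.xml from the classpath.
    Configuration conf = new Configuration();
    // This is the local directory that must be writable by every user who
    // runs jobs (e.g. create it and chmod 1777 it before starting the cluster).
    System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
  }
}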
Re: Task failing, cause FileSystem close?
Hi Christophe, This exception happens when you access the FileSystem after calling FileSystem.close(). From the error message below, a FileSystem input stream was accessed after FileSystem.close(). I guess the FileSystem was closed manually (and too early). In most cases, you don't have to call FileSystem.close() since it will be closed automatically. Nicholas - Original Message > From: Christophe Taton <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Sent: Tuesday, June 17, 2008 4:18:45 AM > Subject: Task failing, cause FileSystem close? > > Hi all, > > I am experiencing (through my students) the following error on a 28 > nodes cluster running Hadoop 0.16.4. > Some jobs fail with many map tasks aborting with this error message: > > 2008-06-17 12:25:01,512 WARN org.apache.hadoop.mapred.TaskTracker: > Error running child > java.io.IOException: Filesystem closed > at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:166) > at org.apache.hadoop.dfs.DFSClient.access$500(DFSClient.java:58) > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.close(DFSClient.java:1103) > at java.io.FilterInputStream.close(FilterInputStream.java:155) > at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1541) > at > org.apache.hadoop.mapred.SequenceFileRecordReader.close(SequenceFileRecordReader.java:125) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:155) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:212) > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084) > > Any clue why this would happen? > > Thanks in advance, > Christophe
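A sketch of the failure mode being described, with hypothetical paths: FileSystem.get() normally hands back a shared (cached) instance within the JVM, so closing it "manually" also closes it for every other reader, such as the SequenceFileRecordReader in the stack trace above. The fix is simply not to call close() on it and let the framework clean up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SharedFileSystemPitfall {
  public static void readTwice(Configuration conf) throws Exception {
    Path p = new Path("/user/demo/part-00000");  // hypothetical path
    FileSystem fs = FileSystem.get(conf);        // typically a shared, cached instance
    FSDataInputStream in = fs.open(p);
    in.close();
    fs.close();                                  // DON'T: this closes the shared instance

    FileSystem again = FileSystem.get(conf);     // may return the same (now closed) object
    again.open(p);                               // -> java.io.IOException: Filesystem closed
  }
}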
Re: client connect as different username?
This information can be found in http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html Nicholas - Original Message > From: Chris Collins <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Wednesday, June 11, 2008 9:31:18 PM > Subject: Re: client connect as different username? > > Thanks Doug, should this be added to the permissions doc or to the > faq? See you in Sonoma. > > C > On Jun 11, 2008, at 9:15 PM, Doug Cutting wrote: > > > Chris Collins wrote: > >> You are referring to creating a directory in hdfs? Because if I am > >> user chris and the hdfs only has user foo, then I cant create a > >> directory because I dont have perms, infact I cant even connect. > > > > Today, users and groups are declared by the client. The namenode > > only records and checks against user and group names provided by the > > client. So if someone named "foo" writes a file, then that file is > > owned by someone named "foo" and anyone named "foo" is the owner of > > that file. No "foo" account need exist on the namenode. > > > > The one (important) exception is the "superuser". Whatever user > > name starts the namenode is the superuser for that filesystem. And > > if "/" is not world writable, a new filesystem will not contain a > > home directory (or anywhere else) writable by other users. So, in a > > multiuser Hadoop installation, the superuser needs to create home > > directories and project directories for other users and set their > > protections accordingly before other users can do anything. Perhaps > > this is what you've run into? > > > > Doug
Re: client connect as different username?
The best way is to use the sudo command to run the hadoop client as the target user. Does that work for you? Nicholas - Original Message > From: Bob Remeika <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Wednesday, June 11, 2008 12:56:14 PM > Subject: client connect as different username? > > Apologies if this is an RTM response, but I looked and wasn't able to find > anything concrete. Is it possible to connect to HDFS via the HDFS client > under a different username than I am currently logged in as? > > Here is our situation, I am user bobr on the client machine. I need to add > something to the HDFS cluster as the user "companyuser". Is this possible > with the current set of APIs or do I have to upload and "chown"? > > Thanks, > Bob
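To expand on why sudo is the practical answer: in this era of Hadoop the client simply presents whatever operating-system account it runs under (see the permissions guide referenced earlier in this digest), so running the client as the target account, e.g. via sudo -u companyuser, is the straightforward route; otherwise the superuser has to chown the files afterwards. The hedged snippet below only shows how to see which identity a client would use; the home directory it reports is /user/<username>, i.e. the name the namenode will record as the owner of anything this client writes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class WhoAmIOnHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Prints something like hdfs://namenode:9000/user/bobr -- the owner name
    // under which this client's writes will be recorded.
    System.out.println("HDFS home directory: " + fs.getHomeDirectory());
  }
}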
Re: JAVA_HOME Cygwin problem (solution doesn't work)
The following works for me: set JAVA_HOME=/cygdrive/c/Progra~1/Java/jdk1.5.0_14 Note that this is the Cygwin-style path (/cygdrive/c/...) and uses the 8.3 short name Progra~1, so it contains no spaces; you can also export JAVA_HOME this way in conf/hadoop-env.sh. Nicholas - Original Message From: vatsan <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Friday, May 23, 2008 5:41:05 PM Subject: JAVA_HOME Cygwin problem (solution doesn't work) I have installed hadoop on cygwin, I am running windows XP. My Java directory is C:\Program Files\Java\jre1.6.0_06 I am not able to run hadoop as it complains of "no such file or directory error". I did some searching and found out someone had proposed a solution of doing SET JAVA_HOME=C:\Program Files\Java\jre1.6.0_06 in the Cygwin.bat file, but that doesn't work for me. Neither does using the absolute path name "\cygwin\c\Program Files\Java" OR using \cygwin\c\"Program Files"\Java Can someone guide me here? (I understand that the problem is because of the path convention conflicts in windows and Cygwin, I found some stuff on fixes for the path issues that spoke of using cygpath.exe as a fix ... for example while running a java program on cygwin, but could not find anything that addressed my problem.) -- View this message in context: http://www.nabble.com/JAVA_HOME-Cygwin-problem-%28solution-doesn%27t-work%29-tp17443172p17443172.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop Permission Problem
Hi Senthil, drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore This one is your local directory. I think you might have mixed up the local and hdfs directories. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 1:11:01 PM Subject: RE: Hadoop Permission Problem Hi Nicholas, You are right, the permission problem is with datastore, that's what I mentioned in the previous mails. But I gave the 777 permission. Here is the datastore permission in the master. drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore I am not seeing any datastore in the slave machines. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 3:29 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, Let me explain the error message " Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x". It says that the current user "test" is trying to WRITE to the inode "datastore" with owner hadoop:supergroup and permission 755. So the problem is in the directory "datastore". Could you check it? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 12:23:52 PM Subject: RE: Hadoop Permission Problem Hi Nicholas, Here I tried as user test after I got the error (is the exception comes from slave machine?) Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x /usr/local/hadoop/bin/hadoop fs -ls Found 1 items /user/test/myapps 26742008-05-07 17:55rw-r--r-- test supergroup -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 3:13 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, I cannot see why it does not work. Could you try again, do a fs -ls right after you see the error message? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 11:49:49 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, No, I am running map/red jobs over HDFS file. That permission is for datastore (hadoop.tmp.dir) Here is the HDFS /usr/local/hadoop/bin/hadoop dfs -ls / Found 2 items /user 2008-05-07 17:55rwxrwxrwx hadoop supergroup /usr 2008-05-07 17:18rwxr-xr-x hadoop supergroup [EMAIL PROTECTED] .ssh]$ /usr/local/hadoop/bin/hadoop dfs -ls /user Found 2 items /user/hadoop 2008-05-08 16:36rwxr-xr-x hadoop supergroup /user/test 2008-05-07 17:55rwxrwxrwx test supergroup Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 2:40 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test >From the output format, the directories above seem not HDFS directories. Are >you running map/red jobs over local file system (e.g. Linux)? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 6:36:27 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, That's what I was wondering. Here is the datastore directory permission in the master machine. 
drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore This datastore directory present only in the master right not on the slaves right? I couldn't find. After I changed the permission for datastore I restarted dfs and mapred. But still it complains about the permission. Even I changed all the directories in datastore to 777 drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test What are the places I need to change the permissions so that UserB can submit the job using the jobtracker and tasktracker started by UserA. Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, May 08, 2008 8:32 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, In the error message, it says that the permission for "datastore" is 755. Are you sure that you have changed it to 777? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Thursday, May 8, 2008 11:57:46 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, Thanks it helped. I gave permissi
Re: Hadoop Permission Problem
Hi Senthil, Let me explain the error message " Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x". It says that the current user "test" is trying to WRITE to the inode "datastore" with owner hadoop:supergroup and permission 755. So the problem is in the directory "datastore". Could you check it? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 12:23:52 PM Subject: RE: Hadoop Permission Problem Hi Nicholas, Here I tried as user test after I got the error (is the exception comes from slave machine?) Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x /usr/local/hadoop/bin/hadoop fs -ls Found 1 items /user/test/myapps 26742008-05-07 17:55rw-r--r-- test supergroup -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 3:13 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, I cannot see why it does not work. Could you try again, do a fs -ls right after you see the error message? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 11:49:49 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, No, I am running map/red jobs over HDFS file. That permission is for datastore (hadoop.tmp.dir) Here is the HDFS /usr/local/hadoop/bin/hadoop dfs -ls / Found 2 items /user 2008-05-07 17:55rwxrwxrwx hadoop supergroup /usr 2008-05-07 17:18rwxr-xr-x hadoop supergroup [EMAIL PROTECTED] .ssh]$ /usr/local/hadoop/bin/hadoop dfs -ls /user Found 2 items /user/hadoop 2008-05-08 16:36rwxr-xr-x hadoop supergroup /user/test 2008-05-07 17:55rwxrwxrwx test supergroup Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 2:40 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test >From the output format, the directories above seem not HDFS directories. Are >you running map/red jobs over local file system (e.g. Linux)? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 6:36:27 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, That's what I was wondering. Here is the datastore directory permission in the master machine. drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore This datastore directory present only in the master right not on the slaves right? I couldn't find. After I changed the permission for datastore I restarted dfs and mapred. But still it complains about the permission. Even I changed all the directories in datastore to 777 drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test What are the places I need to change the permissions so that UserB can submit the job using the jobtracker and tasktracker started by UserA. Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, May 08, 2008 8:32 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, In the error message, it says that the permission for "datastore" is 755. Are you sure that you have changed it to 777? 
Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Thursday, May 8, 2008 11:57:46 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, Thanks it helped. I gave permission 777 for /user So now user "Test" can perform HDFS operations. And also I gave permission 777 for /usr/local/hadoop/datastore on the master. When user "Test" tries to submit the MapReduce job, getting this error Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x Where else I need to give permission so that user "Test" can submit jobs using jobtracker and Datanode started by user "hadoop". Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 5:49 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, Since the path "myapps" is relative, copyFromLocal will copy the file to the home directory, i.e. /user/
Re: Hadoop Permission Problem
Hi Senthil, I cannot see why it does not work. Could you try again, do a fs -ls right after you see the error message? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 11:49:49 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, No, I am running map/red jobs over HDFS file. That permission is for datastore (hadoop.tmp.dir) Here is the HDFS /usr/local/hadoop/bin/hadoop dfs -ls / Found 2 items /user 2008-05-07 17:55rwxrwxrwx hadoop supergroup /usr 2008-05-07 17:18rwxr-xr-x hadoop supergroup [EMAIL PROTECTED] .ssh]$ /usr/local/hadoop/bin/hadoop dfs -ls /user Found 2 items /user/hadoop 2008-05-08 16:36rwxr-xr-x hadoop supergroup /user/test 2008-05-07 17:55rwxrwxrwx test supergroup Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 09, 2008 2:40 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test >From the output format, the directories above seem not HDFS directories. Are >you running map/red jobs over local file system (e.g. Linux)? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 6:36:27 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, That's what I was wondering. Here is the datastore directory permission in the master machine. drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore This datastore directory present only in the master right not on the slaves right? I couldn't find. After I changed the permission for datastore I restarted dfs and mapred. But still it complains about the permission. Even I changed all the directories in datastore to 777 drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test What are the places I need to change the permissions so that UserB can submit the job using the jobtracker and tasktracker started by UserA. Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, May 08, 2008 8:32 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, In the error message, it says that the permission for "datastore" is 755. Are you sure that you have changed it to 777? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Thursday, May 8, 2008 11:57:46 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, Thanks it helped. I gave permission 777 for /user So now user "Test" can perform HDFS operations. And also I gave permission 777 for /usr/local/hadoop/datastore on the master. When user "Test" tries to submit the MapReduce job, getting this error Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x Where else I need to give permission so that user "Test" can submit jobs using jobtracker and Datanode started by user "hadoop". Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 5:49 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, Since the path "myapps" is relative, copyFromLocal will copy the file to the home directory, i.e. /user/Test/myapps in your case. 
If /user/Test doesn't not exist, it will first try to create it. You got AccessControlException because the permission of /user is 755. Hope this helps. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, May 7, 2008 2:36:22 PM Subject: Hadoop Permission Problem Hi, My datanode and jobtracker are started by user "hadoop". And user "Test" needs to submit the job. So if the user "Test" copies file to HDFS, there is a permission error. /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=Test, access=WRITE, inode="user":hadoop:supergroup:rwxr-xr-x Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs. Where to configure or what default configuration needs to be changed. Thanks, Senthil
Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]
Hi Rick, > the hbase master must be run on the same machine as the hadoop hdfs (what > part of it?) if one wants to use the hdfs permissions system or that right > now we must run without permissions? Hdfs and hbase (and all clients) should run under the same administrative domain, but not necessarily on the same machine. The stack trace is good enough. HMaster does DistributedFileSystem.setSafeMode(...), which requires superuser privilege. Nicholas - Original Message From: Rick Hangartner <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Friday, May 9, 2008 11:51:55 AM Subject: Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop] Hi Nicholas, I was the original poster of this question. Thanks for your response. (And thanks for elevating attention to this Stack). Am I missing something or is one implication of how hdfs determines privileges from the Linux filesystem that the hbase master must be run on the same machine as the hadoop hdfs (what part of it?) if one wants to use the hdfs permissions system or that right now we must run without permissions? Here's most of the full Java trace for the exception that might be helpful in determining why superuser privilege is required to run HMaster. Unfortunately log4j appears to have chopped off the last 6 entries. (This is from the hbase log). Thanks for the help. 2008-05-08 10:13:28,670 ERROR org.apache.hadoop.hbase.HMaster: Can not start master java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:494) at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3312) at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3346) Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Superuser privilege is required at org.apache.hadoop.dfs.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:4020) at org.apache.hadoop.dfs.FSNamesystem.setSafeMode(FSNamesystem.java:3794) at org.apache.hadoop.dfs.NameNode.setSafeMode(NameNode.java:473) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901) at org.apache.hadoop.ipc.Client.call(Client.java:512) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198) at org.apache.hadoop.dfs.$Proxy0.setSafeMode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.setSafeMode(Unknown Source) at org.apache.hadoop.dfs.DFSClient.setSafeMode(DFSClient.java:486) at org.apache.hadoop
.dfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:257) at org.apache.hadoop.hbase.HMaster.<init>(HMaster.java:893) at org.apache.hadoop.hbase.HMaster.<init>(HMaster.java:859) ... 6 more On May 9, 2008, at 11:34 AM, [EMAIL PROTECTED] wrote: > Hi Stack, > >> One question this raises is if the "hbase:hbase" user and group are >> being derived from the Linux file system user and group, or if they >> are the hdfs user and group? > HDFS currently does not manage user and group information. User and > group in HDFS are being derived from the underlying OS (Linux in > your case) user and group. > >> Otherwise, how can we indicate that "hbase" user is in the hdfs >> group "supergroup"? > In Hadoop conf, the property dfs.permissions.supergroup specifies > the super-user group and the default value is "supergroup". > Administrator should set this property to a dedicated group in the > underlying OS for HDFS superuser. For example, you could create a > group "hdfs-superuser" in Linux, set dfs.permissions.supergroup to > "hdfs-superuser" and add "hdfs-superuser" to hbase's group list. > Then, "hbase" becomes a HDFS superuser. > > I don't know why superuser privilege is required to run HMaster. I > might be able to tell if a comp
Re: Hadoop Permission Problem
Hi Senthil, drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test >From the output format, the directories above seem not HDFS directories. Are >you running map/red jobs over local file system (e.g. Linux)? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 6:36:27 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, That's what I was wondering. Here is the datastore directory permission in the master machine. drwxrwxrwx5 hadoop hadoop 4096 May 7 18:02 datastore This datastore directory present only in the master right not on the slaves right? I couldn't find. After I changed the permission for datastore I restarted dfs and mapred. But still it complains about the permission. Even I changed all the directories in datastore to 777 drwxrwxrwx4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx2 test test 4096 May 9 09:29 hadoop-test What are the places I need to change the permissions so that UserB can submit the job using the jobtracker and tasktracker started by UserA. Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, May 08, 2008 8:32 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, In the error message, it says that the permission for "datastore" is 755. Are you sure that you have changed it to 777? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Thursday, May 8, 2008 11:57:46 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, Thanks it helped. I gave permission 777 for /user So now user "Test" can perform HDFS operations. And also I gave permission 777 for /usr/local/hadoop/datastore on the master. When user "Test" tries to submit the MapReduce job, getting this error Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x Where else I need to give permission so that user "Test" can submit jobs using jobtracker and Datanode started by user "hadoop". Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 5:49 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, Since the path "myapps" is relative, copyFromLocal will copy the file to the home directory, i.e. /user/Test/myapps in your case. If /user/Test doesn't not exist, it will first try to create it. You got AccessControlException because the permission of /user is 755. Hope this helps. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, May 7, 2008 2:36:22 PM Subject: Hadoop Permission Problem Hi, My datanode and jobtracker are started by user "hadoop". And user "Test" needs to submit the job. So if the user "Test" copies file to HDFS, there is a permission error. /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=Test, access=WRITE, inode="user":hadoop:supergroup:rwxr-xr-x Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs. Where to configure or what default configuration needs to be changed. Thanks, Senthil
Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]
Hi Stack, > One question this raises is if the "hbase:hbase" user and group are being > derived from the Linux file system user and group, or if they are the hdfs > user and group? HDFS currently does not manage user and group information. User and group in HDFS are being derived from the underlying OS (Linux in your case) user and group. > Otherwise, how can we indicate that "hbase" user is in the hdfs group > "supergroup"? In Hadoop conf, the property dfs.permissions.supergroup specifies the super-user group and the default value is "supergroup". Administrator should set this property to a dedicated group in the underlying OS for HDFS superuser. For example, you could create a group "hdfs-superuser" in Linux, set dfs.permissions.supergroup to "hdfs-superuser" and add "hdfs-superuser" to hbase's group list. Then, "hbase" becomes a HDFS superuser. I don't know why superuser privilege is required to run HMaster. I might be able to tell if a complete stack track is given. Nicholas - Original Message From: stack <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Thursday, May 8, 2008 8:44:42 PM Subject: Hadoop Permissions Question -> [Fwd: Hbase on hadoop] Can someone familiar with permissions offer an opinion on the below? Thanks, St.Ack Hi, We have an issue with hbase on hadoop and file system permissions we hope someone already knows the answer to. Our apologies if we missed that this issue has already been addressed on this list. We are running hbase-0.1.2 on top of hadoop-0.16.3, starting the hbase daemon from an "hbase" user account and the hadoop daemon and have observed this "feature". We are running hbase in a separate "hadoop" user account and hadoop in it's own "hadoop" user account on a single machine. When we try to start up hbase, we see this error message in the log: 2008-05-06 12:09:02,845 ERROR org.apache.hadoop.hbase.HMaster: Can not start master java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun .reflect .NativeConstructorAccessorImpl .newInstance(NativeConstructorAccessorImpl.java:39) at sun .reflect .DelegatingConstructorAccessorImpl .newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:494) at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3329) at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3363) Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Superuser privilege is required ... (etc) If we run hbase in the hadoop user account we don't have any problems. We think we've narrowed the issue down a bit from the debug logs. The method "FSNameSystem.checkPermission()" method is throwing the exception because the "PermissionChecker()" constructor is returning that the hbase user is not a superuser or in the same supergroup as hadoop. 
private void checkSuperuserPrivilege() throws AccessControlException { if (isPermissionEnabled) { PermissionChecker pc = new PermissionChecker( fsOwner.getUserName(), supergroup); if (!pc.isSuper) { throw new AccessControlException("Superuser privilege is required"); } } } If we look at at the "PermissionChecker()" constructor we see that it is comparing the hdfs owner name (which should be "hadoop") and the hdfs file system owner's group ("supergroup") to the current user and groups, which the log seems to indicate the user is "hbase" and the groups for user "hbase" only include "hbase" : PermissionChecker(String fsOwner, String supergroup ) throws AccessControlException{ UserGroupInformation ugi = UserGroupInformation.getCurrentUGI(); if (LOG.isDebugEnabled()) { LOG.debug("ugi=" + ugi); } if (ugi != null) { user = ugi.getUserName(); groups.addAll(Arrays.asList(ugi.getGroupNames())); isSuper = user.equals(fsOwner) || groups.contains(supergroup); } else { throw new AccessControlException("ugi = null"); } } The current user and group is derived from the thread information: private static final ThreadLocal currentUGI = new ThreadLocal(); /** @return the [EMAIL PROTECTED] UserGroupInformation} for the current thread */ public static UserGroupInformation getCurrentUGI() { return currentUGI.get(); } which we're hoping might be enough to illuminate the problem. One question this raises is if the "hbase:hbase" user and group are being derived from the Linux file system user and group, or if they are the hdfs user and group? Otherwise, how can we indicate that "hbase" user is in the hdfs group "supergroup"? Is there a parameter in a hadoop configuration file? Apparently setting the groups of the web server to include "supergroup" didn't have any effect, although perhaps t
Re: Hadoop Permission Problem
Hi Senthil, In the error message, it says that the permission for "datastore" is 755. Are you sure that you have changed it to 777? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Thursday, May 8, 2008 11:57:46 AM Subject: RE: Hadoop Permission Problem Hi Nicholas, Thanks it helped. I gave permission 777 for /user So now user "Test" can perform HDFS operations. And also I gave permission 777 for /usr/local/hadoop/datastore on the master. When user "Test" tries to submit the MapReduce job, getting this error Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x Where else I need to give permission so that user "Test" can submit jobs using jobtracker and Datanode started by user "hadoop". Thanks, Senthil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 5:49 PM To: core-user@hadoop.apache.org Subject: Re: Hadoop Permission Problem Hi Senthil, Since the path "myapps" is relative, copyFromLocal will copy the file to the home directory, i.e. /user/Test/myapps in your case. If /user/Test doesn't not exist, it will first try to create it. You got AccessControlException because the permission of /user is 755. Hope this helps. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, May 7, 2008 2:36:22 PM Subject: Hadoop Permission Problem Hi, My datanode and jobtracker are started by user "hadoop". And user "Test" needs to submit the job. So if the user "Test" copies file to HDFS, there is a permission error. /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=Test, access=WRITE, inode="user":hadoop:supergroup:rwxr-xr-x Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs. Where to configure or what default configuration needs to be changed. Thanks, Senthil
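One hedged reading of this thread, consistent with Nicholas's later observation that the local and HDFS directories may have been mixed up: the inode "datastore" in the error comes from the namenode, because hadoop.tmp.dir is also used as the parent of mapred.system.dir inside HDFS, so the chmod has to be done through the HDFS client by the account that started the cluster, not on the Linux directory. A minimal sketch (the path is the one from this thread, but treat it as illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class OpenUpDatastore {
  public static void main(String[] args) throws Exception {
    // Run as the HDFS superuser, i.e. the account that started the namenode.
    FileSystem fs = FileSystem.get(new Configuration());
    Path datastore = new Path("/usr/local/hadoop/datastore");
    // Equivalent to: bin/hadoop fs -chmod 777 /usr/local/hadoop/datastore
    fs.setPermission(datastore, new FsPermission((short) 0777));
  }
}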
Re: Hadoop Permission Problem
Hi Senthil, Since the path "myapps" is relative, copyFromLocal will copy the file to the home directory, i.e. /user/Test/myapps in your case. If /user/Test does not exist, it will first try to create it. You got AccessControlException because the permission of /user is 755. Hope this helps. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, May 7, 2008 2:36:22 PM Subject: Hadoop Permission Problem Hi, My datanode and jobtracker are started by user "hadoop". And user "Test" needs to submit the job. So if the user "Test" copies file to HDFS, there is a permission error. /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=Test, access=WRITE, inode="user":hadoop:supergroup:rwxr-xr-x Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs. Where to configure or what default configuration needs to be changed. Thanks, Senthil
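Rather than opening /user up to 777, the usual pattern (as Doug Cutting notes elsewhere in this digest) is for the superuser to create a home directory per user and hand it over. A minimal sketch, run as the account that started the namenode; the user name and group below are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class MakeHomeDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path home = new Path("/user/Test");
    fs.mkdirs(home);                                     // create the home directory
    fs.setOwner(home, "Test", "supergroup");             // hand it to the user
    fs.setPermission(home, new FsPermission((short) 0755));
  }
}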
Re: distcp fails when copying from s3 to hdfs
Your distcp command looks correct. distcp may have created some log files (e.g. inside /_distcp_logs_5vzva5 from your previous email.) Could you check the logs, see whether there are error messages? If you could send me the distcp output and the logs, I may be able to find out the problem. (remember to remove the id:secret :) Nicholas - Original Message From: Siddhartha Reddy <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Friday, April 4, 2008 12:23:17 PM Subject: Re: distcp fails when copying from s3 to hdfs I am sorry, that was a mistype in my mail. The second command was (please note the / at the end): bin/hadoop fs -fs s3://id:[EMAIL PROTECTED] -ls / I guess you are right, Nicholas. The s3://id:[EMAIL PROTECTED]/file.txtindeed does not seem to be there. But the earlier distcp command to copy the file to S3 finished without errors. Once again, the command I am using to copy the file to S3 is: bin/hadoop distcp file.txt s3://id:[EMAIL PROTECTED]/file.txt Am I doing anything wrong here? Thanks, Siddhartha On Fri, Apr 4, 2008 at 11:38 PM, <[EMAIL PROTECTED]> wrote: > >To check that the file actually exists on S3, I tried the following > commands: > > > >bin/hadoop fs -fs s3://id:[EMAIL PROTECTED] -ls > >bin/hadoop fs -fs s3://id:[EMAIL PROTECTED] -ls > > > >The first returned nothing, while the second returned the following: > > > >Found 1 items > >/_distcp_logs_5vzva5 1969-12-31 19:00rwxrwxrwx > > > Are the first and the second commands the same? (why they return > different results?) It seems that distcp is right: > s3://id:[EMAIL PROTECTED]/file.txt indeed does not exist from the output > of your the second command. > > Nicholas > > -- http://sids.in "If you are not having fun, you are not doing it right."
Re: distcp fails when copying from s3 to hdfs
>To check that the file actually exists on S3, I tried the following commands: > >bin/hadoop fs -fs s3://id:[EMAIL PROTECTED] -ls >bin/hadoop fs -fs s3://id:[EMAIL PROTECTED] -ls > >The first returned nothing, while the second returned the following: > >Found 1 items >/_distcp_logs_5vzva5 1969-12-31 19:00 rwxrwxrwx Are the first and the second commands the same? (Why do they return different results?) It seems that distcp is right: s3://id:[EMAIL PROTECTED]/file.txt indeed does not exist, judging from the output of your second command. Nicholas
Re: distcp fails :Input source not found
distcp supports multiple sources (like Unix cp) and, if the specified source is a directory, it copies the entire directory. So you could either do distcp src1 src2 ... src100 dst or first copy all srcs to srcdir and then distcp srcdir dstdir. I have no experience with S3 and EC2, so I am not sure it will work there. Nicholas - Original Message From: Prasan Ary <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Thursday, April 3, 2008 10:06:35 AM Subject: Re: distcp fails :Input source not found I found it was a slight oversight on my part. I was copying the files into S3 using Firefox EC2 UI, and then trying to access those files on S3 using hadoop. The S3 filesystem provided by hadoop doesn't work with standard files. When I used hadoop to upload the files into S3 instead of Firefox EC2 UI, things sorted out. But then I had a hard time copying a whole folder from S3 onto EC2 cluster. The following article suggests that "distcp" can be used to copy folder from S3 bucket onto EC2 hdfs : http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 However, when I try it on 0.15.3, it doesn't allow a folder copy. I have 100+ files in my S3 bucket, and I had to run "distcp" on each one of them to get them on HDFS on EC2 . Not a nice experience! Can anyone suggest more elegant way that we can transfer 100s of files from S3 to HDFS on EC2 without having to iterate through each file? [EMAIL PROTECTED] wrote: It might be a bug. Could you try the following? bin/hadoop fs -ls s3://ID:[EMAIL PROTECTED]/InputFileFormat.xml Nicholas - Original Message From: Prasan Ary To: core-user@hadoop.apache.org Sent: Wednesday, April 2, 2008 7:41:50 AM Subject: Re: distcp fails :Input source not found Anybody ? Any thoughts why this might be happening? Here is what is happening directly from the ec2 screen. The ID and Secret Key are the only things changed. I'm running hadoop 15.3 from the public ami. I launched a 2 machine cluster using the ec2 scripts in the src/contrib/ec2/bin . . .
The file I try and copy is 9KB (I noticed previous discussion on empty files and files that are > 10MB) > First I make sure that we can copy the file from s3 [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -copyToLocal s3://ID:[EMAIL PROTECTED]/InputFileFormat.xml /usr/InputFileFormat.xml > Now I see that the file is copied to the ec2 master (where I'm logged in) [EMAIL PROTECTED] hadoop-0.15.3]# dir /usr/Input* /usr/InputFileFormat.xml > Next I make sure I can access the HDFS and that the input directory is there [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -ls / Found 2 items /input 2008-04-01 15:45 /mnt 2008-04-01 15:42 [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -ls /input/ Found 0 items > I make sure hadoop is running just fine by running an example [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop jar hadoop-0.15.3-examples.jar pi 10 1000 Number of Maps = 10 Samples per Map = 1000 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Starting Job 08/04/01 17:38:14 INFO mapred.FileInputFormat: Total input paths to process : 10 08/04/01 17:38:14 INFO mapred.JobClient: Running job: job_200804011542_0001 08/04/01 17:38:15 INFO mapred.JobClient: map 0% reduce 0% 08/04/01 17:38:22 INFO mapred.JobClient: map 20% reduce 0% 08/04/01 17:38:24 INFO mapred.JobClient: map 30% reduce 0% 08/04/01 17:38:25 INFO mapred.JobClient: map 40% reduce 0% 08/04/01 17:38:27 INFO mapred.JobClient: map 50% reduce 0% 08/04/01 17:38:28 INFO mapred.JobClient: map 60% reduce 0% 08/04/01 17:38:31 INFO mapred.JobClient: map 80% reduce 0% 08/04/01 17:38:33 INFO mapred.JobClient: map 90% reduce 0% 08/04/01 17:38:34 INFO mapred.JobClient: map 100% reduce 0% 08/04/01 17:38:43 INFO mapred.JobClient: map 100% reduce 20% 08/04/01 17:38:44 INFO mapred.JobClient: map 100% reduce 100% 08/04/01 17:38:45 INFO mapred.JobClient: Job complete: job_200804011542_0001 08/04/01 17:38:45 INFO mapred.JobClient: Counters: 9 08/04/01 17:38:45 INFO mapred.JobClient: Job Counters 08/04/01 17:38:45 INFO mapred.JobClient: Launched map tasks=10 08/04/01 17:38:45 INFO mapred.JobClient: Launched reduce tasks=1 08/04/01 17:38:45 INFO mapred.JobClient: Data-local map tasks=10 08/04/01 17:38:45 INFO mapred.JobClient: Map-Reduce Framework 08/04/01 17:38:45 INFO mapred.JobClient: Map input records=10 08/04/01 17:38:45 INFO mapred.JobClient: Map output records=20 08/04/01 17:38:45 INFO mapred.JobClient: Map input bytes=240 08/04/01 17:38:45 INFO mapred.JobClient: Map output bytes=320 08/04/01 17:38:45 INFO mapred.JobClient: Reduce input groups=2 08/04/01 17:38:45 INFO mapred.JobClient: Reduce input records=20 Job Finished in 31.028 seconds Estimated value of PI is 3.1556 > Finally, I try and copy the file over [EMAIL PROTECTED
Re: distcp fails :Input source not found
It might be a bug. Could you try the following? bin/hadoop fs -ls s3://ID:[EMAIL PROTECTED]/InputFileFormat.xml Nicholas - Original Message From: Prasan Ary <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Wednesday, April 2, 2008 7:41:50 AM Subject: Re: distcp fails :Input source not found Anybody ? Any thoughts why this might be happening? Here is what is happening directly from the ec2 screen. The ID and Secret Key are the only things changed. I'm running hadoop 15.3 from the public ami. I launched a 2 machine cluster using the ec2 scripts in the src/contrib/ec2/bin . . . The file I try and copy is 9KB (I noticed previous discussion on empty files and files that are > 10MB) > First I make sure that we can copy the file from s3 [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -copyToLocal s3://ID:[EMAIL PROTECTED]/InputFileFormat.xml /usr/InputFileFormat.xml > Now I see that the file is copied to the ec2 master (where I'm logged in) [EMAIL PROTECTED] hadoop-0.15.3]# dir /usr/Input* /usr/InputFileFormat.xml > Next I make sure I can access the HDFS and that the input directory is there [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -ls / Found 2 items /input 2008-04-01 15:45 /mnt 2008-04-01 15:42 [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop fs -ls /input/ Found 0 items > I make sure hadoop is running just fine by running an example [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop jar hadoop-0.15.3-examples.jar pi 10 1000 Number of Maps = 10 Samples per Map = 1000 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Starting Job 08/04/01 17:38:14 INFO mapred.FileInputFormat: Total input paths to process : 10 08/04/01 17:38:14 INFO mapred.JobClient: Running job: job_200804011542_0001 08/04/01 17:38:15 INFO mapred.JobClient: map 0% reduce 0% 08/04/01 17:38:22 INFO mapred.JobClient: map 20% reduce 0% 08/04/01 17:38:24 INFO mapred.JobClient: map 30% reduce 0% 08/04/01 17:38:25 INFO mapred.JobClient: map 40% reduce 0% 08/04/01 17:38:27 INFO mapred.JobClient: map 50% reduce 0% 08/04/01 17:38:28 INFO mapred.JobClient: map 60% reduce 0% 08/04/01 17:38:31 INFO mapred.JobClient: map 80% reduce 0% 08/04/01 17:38:33 INFO mapred.JobClient: map 90% reduce 0% 08/04/01 17:38:34 INFO mapred.JobClient: map 100% reduce 0% 08/04/01 17:38:43 INFO mapred.JobClient: map 100% reduce 20% 08/04/01 17:38:44 INFO mapred.JobClient: map 100% reduce 100% 08/04/01 17:38:45 INFO mapred.JobClient: Job complete: job_200804011542_0001 08/04/01 17:38:45 INFO mapred.JobClient: Counters: 9 08/04/01 17:38:45 INFO mapred.JobClient: Job Counters 08/04/01 17:38:45 INFO mapred.JobClient: Launched map tasks=10 08/04/01 17:38:45 INFO mapred.JobClient: Launched reduce tasks=1 08/04/01 17:38:45 INFO mapred.JobClient: Data-local map tasks=10 08/04/01 17:38:45 INFO mapred.JobClient: Map-Reduce Framework 08/04/01 17:38:45 INFO mapred.JobClient: Map input records=10 08/04/01 17:38:45 INFO mapred.JobClient: Map output records=20 08/04/01 17:38:45 INFO mapred.JobClient: Map input bytes=240 08/04/01 17:38:45 INFO mapred.JobClient: Map output bytes=320 08/04/01 17:38:45 INFO mapred.JobClient: Reduce input groups=2 08/04/01 17:38:45 INFO mapred.JobClient: Reduce input records=20 Job Finished in 31.028 seconds Estimated value of PI is 3.1556 > Finally, I try and copy the file over [EMAIL PROTECTED] hadoop-0.15.3]# bin/hadoop distcp s3://ID:[EMAIL 
PROTECTED]/InputFileFormat.xml /input/InputFileFormat.xml With failures, global counters are inaccurate; consider running with -i Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source s3://ID:[EMAIL PROTECTED]/InputFileFormat.xml does not exist. at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:470) at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
Re: distcp fails :Input source not found
> That was a typo in my email. I do have s3:// in my command when it fails. Not sure what's wrong. Your command looks right to me. Would you mind showing me the exact error message you see? Nicholas
Re: distcp fails :Input source not found
> bin/hadoop distcp s3//:@/fileone.txt /somefolder_on_hdfs/fileone.txt : Fails - Input source doesnt exist. Should "s3//..." be "s3://..."? Nicholas
Re: [some bugs] Re: file permission problem
Hi Stefan, > any magic we can do with hadoop.dfs.umask? > dfs.umask is similar to Unix umask. > Or is there any other off switch for the file security? > If dfs.permissions is set to false, then the security will be turned off. For the two questions above, see http://hadoop.apache.org/core/docs/r0.16.1/hdfs_permissions_guide.html for more details. > I definitely can reproduce the problem Johannes describes ... > I guess you are using the nightly builds which have the bug. Please try the 0.16.1 release or current trunk. > Beside of that I had some interesting observations. > If I have permissions to write to a folder A I can delete folder A and > file B that is inside of folder A even if I do have no permissions for B. > This is also true for POSIX or Unix, on which Hadoop permissions are based. > Also I noticed following in my dfs > [EMAIL PROTECTED] hadoop]$ bin/hadoop fs -ls /user/joa23/myApp-1205474968598 > Found 1 items > /user/joa23/myApp-1205474968598/VOICE_CALL 2008-03-13 16:00 > rwxr-xr-x hadoop supergroup > [EMAIL PROTECTED] hadoop]$ bin/hadoop fs -ls > /user/joa23/myApp-1205474968598/VOICE_CALL > Found 1 items > /user/joa23/myApp-1205474968598/VOICE_CALL/part-027311 > 2008-03-13 16:00 rw-r--r-- joa23 supergroup > > Do I miss something or was I able to write as user joa23 into a > folder owned by hadoop where I should have no permissions. :-O. > Should I open some jira issues? > Suppose joa23 is not a superuser. Then, no. The output above only shows that a file owned by joa23 exists in a directory owned by hadoop. This can definitely be done by a sequence of commands with chmod/chown. Suppose joa23 is not a superuser. If joa23 can create a file, say by "hadoop fs -put ...", under hadoop's directory with rwxr-xr-x, then it is a bug. But I don't think we can do this. Hope this helps. Nicholas
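For illustration, the two knobs mentioned above can be sketched as below. Treat it as a hedged example: the property names are the ones in the 0.16 permissions guide, the values are arbitrary, and in practice they belong in hadoop-site.xml (dfs.permissions is honoured by the namenode, so setting it on a client does not disable checking there; dfs.umask is applied on the client side when files and directories are created).

import org.apache.hadoop.conf.Configuration;

public class PermissionSwitches {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.umask", "022");               // like a Unix umask for new files/directories
    conf.setBoolean("dfs.permissions", false);  // the "off switch": disables permission checking
    System.out.println("dfs.umask = " + conf.get("dfs.umask"));
    System.out.println("dfs.permissions = " + conf.get("dfs.permissions"));
  }
}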
Re: [some bugs] Re: file permission problem
Hi, Let me clarify which versions have this problem.
0.16.0 release, 0.16.1 release, current trunk: no problem
Nightly builds between 0.16.0 and 0.16.1 before HADOOP-2391 or after HADOOP-2915: no problem
Nightly builds between 0.16.0 and 0.16.1 after HADOOP-2391 and before HADOOP-2915: bug exists
Similarly, code checked out from trunk before HADOOP-2391 or after HADOOP-2915: no problem
Code checked out from trunk after HADOOP-2391 and before HADOOP-2915: bug exists
Sorry for the confusion. Nicholas - Original Message From: Stefan Groschupf <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Saturday, March 15, 2008 8:02:07 PM Subject: Re: [some bugs] Re: file permission problem Great - it is even already fixed in 16.1! Thanks for the hint! Stefan On Mar 14, 2008, at 2:49 PM, Andy Li wrote: > I think this is the same problem related to this mail thread. > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg02759.html > > A JIRA has been filed, please see HADOOP-2915.
Re: file permission problem
Hi Johannes, > i'm using the 0.16.0 distribution. I assume you mean the 0.16.0 release (http://hadoop.apache.org/core/releases.html) without any additional patch. I just have tried it but cannot reproduce the problem you described. I did the following: 1) start a cluster with "tsz" 2) run a job with "nicholas" The output directory and files are owned by "nicholas". Am I doing the same thing you did? Could you try again? Nicholas > - Original Message > From: Johannes Zillmann <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Wednesday, March 12, 2008 5:47:27 PM > Subject: file permission problem > > Hi, > > i have a question regarding the file permissions. > I have a kind of workflow where i submit a job from my laptop to a > remote hadoop cluster. > After the job finished i do some file operations on the generated output. > The "cluster-user" is different to the "laptop-user". As output i > specify a directory inside the users home. This output directory, > created through the map-reduce job has "cluster-user" permissions, so > this does not allow me to move or delete the output folder with my > "laptop-user". > > So it looks as follow: > /user/jz/ rwxrwxrwx jzsupergroup > /user/jz/output rwxr-xr-xhadoopsupergroup > > I tried different things to achieve what i want (moving/deleting the > output folder): > - jobConf.setUser("hadoop") on the client side > - System.setProperty("user.name","hadoop") before jobConf instantiation > on the client side > - add user.name node in the hadoop-site.xml on the client side > - setPermision(777) on the home folder on the client side (does not work > recursiv) > - setPermision(777) on the output folder on the client side (permission > denied) > - create the output folder before running the job (Output directory > already exists exception) > > None of the things i tried worked. Is there a way to achieve what i want ? > Any ideas appreciated! > > cheers > Johannes > > > -- ~~~ 101tec GmbH Halle (Saale), Saxony-Anhalt, Germany http://www.101tec.com
Re: file permission problem
Hi Johannes, Which version of hadoop are you using? There is a known bug in some nightly builds. Nicholas - Original Message From: Johannes Zillmann <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Wednesday, March 12, 2008 5:47:27 PM Subject: file permission problem Hi, i have a question regarding the file permissions. I have a kind of workflow where i submit a job from my laptop to a remote hadoop cluster. After the job finished i do some file operations on the generated output. The "cluster-user" is different to the "laptop-user". As output i specify a directory inside the users home. This output directory, created through the map-reduce job has "cluster-user" permissions, so this does not allow me to move or delete the output folder with my "laptop-user". So it looks as follow: /user/jz/ rwxrwxrwx jzsupergroup /user/jz/output rwxr-xr-xhadoopsupergroup I tried different things to achieve what i want (moving/deleting the output folder): - jobConf.setUser("hadoop") on the client side - System.setProperty("user.name","hadoop") before jobConf instantiation on the client side - add user.name node in the hadoop-site.xml on the client side - setPermision(777) on the home folder on the client side (does not work recursiv) - setPermision(777) on the output folder on the client side (permission denied) - create the output folder before running the job (Output directory already exists exception) None of the things i tried worked. Is there a way to achieve what i want ? Any ideas appreciated! cheers Johannes -- ~~~ 101tec GmbH Halle (Saale), Saxony-Anhalt, Germany http://www.101tec.com
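On the workaround list quoted above, the "(does not work recursiv)" note is expected: FileSystem.setPermission applies to a single path only, so a recursive chmod has to walk the tree itself. Below is a hedged sketch against the API of this era (some method names may be spelled slightly differently in other releases); note it still only succeeds when run by the path's owner or the superuser, which is exactly the restriction the poster is hitting.

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ChmodRecursive {
  // Apply the same permission to a path and everything underneath it.
  public static void chmodR(FileSystem fs, Path p, FsPermission perm) throws IOException {
    fs.setPermission(p, perm);
    FileStatus status = fs.getFileStatus(p);
    if (status.isDir()) {
      FileStatus[] children = fs.listStatus(p);
      if (children != null) {
        for (FileStatus child : children) {
          chmodR(fs, child.getPath(), perm);
        }
      }
    }
  }
}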