Permission issues with HDFS NFS gateway
Hi,

I am running a single-node Hadoop cluster and the HDFS NFS gateway on the same node. The HDFS processes run as user "hadoop" and the NFS gateway runs as user "nfsserver". I mounted the NFS export on the same machine as the root user. Whenever I try to write to the NFS mount as root I get a "permission denied" error. I also constantly see this message in the terminal where I mounted the NFS export:

INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332

HDFS namenode/datanode NFS config:

core-site.xml:
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>*</value>
</property>

hdfs-site.xml:
<property>
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
<property>
  <name>nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>

HDFS NFS gateway config:

core-site.xml:
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>* rw</value>
</property>

hdfs-site.xml:
<property>
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
<property>
  <name>nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>

When I do I/O on the NFS mount as the "hadoop" user I don't face any permission issue. Why am I getting a permission denied error when performing writes as the root user?

Thanks
Saurabh
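A possible explanation, offered as an assumption rather than something confirmed in this thread: the NFS gateway applies ordinary HDFS permission checks as whatever HDFS user the client's uid maps to, so writes made as local root are checked as HDFS user "root", which has no write access to paths owned by hadoop:supergroup; the "Can't map group supergroup" message is only the id-mapper noting that no local group of that name exists on the gateway host. Below is a minimal Java sketch that the HDFS superuser ("hadoop") could run to inspect ownership of the export root and pre-create a directory that root can write to; the NameNode URI and the /user/root path are hypothetical.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckExportPermissions {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; connect as the HDFS superuser "hadoop".
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf, "hadoop");

        // Show who owns the directory exported through the NFS gateway.
        FileStatus root = fs.getFileStatus(new Path("/"));
        System.out.println(root.getOwner() + ":" + root.getGroup() + " " + root.getPermission());

        // Give the NFS client's root user a directory it is allowed to write to.
        Path rootHome = new Path("/user/root");
        fs.mkdirs(rootHome);
        fs.setOwner(rootHome, "root", "root");

        fs.close();
    }
}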
Re: Synchronization among Mappers in map-reduce task
Please advise.

Hi Wangda,

I am not sure that making overwrite=false will solve the problem. As per the Javadoc, with overwrite=false it throws an exception if the file already exists, so all the remaining mappers will get that exception. Also, I am very new to ZK and have only a very basic knowledge of it, so I am not sure whether it can solve the problem and, if so, how. I am still going through the available material on ZK. Can you please point me to some source or link on ZK that can help me solve this problem?

Best
Saurabh

On Tue, Aug 12, 2014 at 3:08 AM, Wangda Tan wheele...@gmail.com wrote:

Hi Saurabh,

It's an interesting topic.

"So, here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished? I read that the mapper tasks don't interact with each other."

A simple way to do this is using the HDFS namespace: create the file using public FSDataOutputStream create(Path f, boolean overwrite) with overwrite=false. Only one mapper can successfully create the file. After the write completes, that mapper creates a flag file like "completed" in the same folder. The other mappers can wait for the "completed" file to be created.

"Is there any way to have synchronization between two independent map-reduce jobs?"

I think ZK can do some complex synchronization here, like mutexes, master election, etc.

Hope this helps,
Wangda Tan

On Tue, Aug 12, 2014 at 10:43 AM, saurabh jain sauravma...@gmail.com wrote:

Hi Folks,

I have been writing a map-reduce application where I have an input file containing records, and every field in a record is separated by some delimiter. In addition, the user provides a list of columns that he wants to look up in a master properties file (stored in HDFS). If such a column (let's call it a key) is present in the master properties file, the code gets the corresponding value and updates the key with that value in the record; if the key is not present in the master properties file, the code creates a new value for the key, writes it to the properties file, and also updates the record. I have written this application and tested it, and everything has worked fine so far.

*e.g.:*
*I/P Record:* This | is | the | test | record
*Columns:* 2,4 (that means the code will look up only the fields *is* and *test* in the master properties file.)

Here, I have a question.

*Q 1:* When my input file is huge and is split across multiple mappers, I was getting the exception below, where all the other mapper tasks were failing. *Also, when I started the job, my master properties file was initially empty.* In my code I have a check: if this file (master properties) doesn't exist, I create a new empty file before submitting the job itself. E.g.: if I have 4 splits of data, then 3 map tasks fail. But after this, all the failed map tasks restart and the job eventually succeeds.

So, *here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished?* I read that the mapper tasks don't interact with each other. Also, what will happen in the scenario where I start multiple parallel map-reduce jobs and all of them work on the same properties file? *Is there any way to have synchronization between two independent map-reduce jobs?* I have also read that ZooKeeper can be used in such scenarios. Is that correct?
Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException: IOException - failed while appending data to the file - Failed to create file [/user/cloudera/lob/master/bank.properties] for [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client [10.X.X.17], because this file is already being created by [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on [10.X.X.17]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server
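For reference, a minimal sketch of the approach Wangda describes above, using the HDFS namespace itself as the lock: every mapper attempts an exclusive create of the shared file, only the winner writes it, and the winner then drops a "completed" marker that the losers poll for. The marker path, the poll interval, and the placeholder payload are illustrative assumptions, not from the thread. Note that this only works if the file is not pre-created before the job is submitted, because the exclusive create is the lock.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateLockSketch {

    // Shared properties file (path taken from the exception) and a hypothetical marker file.
    private static final Path MASTER =
            new Path("/user/cloudera/lob/master/bank.properties");
    private static final Path COMPLETED =
            new Path("/user/cloudera/lob/master/bank.properties.completed");

    public static void updateMaster(Configuration conf) throws IOException, InterruptedException {
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out;
        try {
            // overwrite=false: exactly one caller wins the exclusive create.
            out = fs.create(MASTER, false);
        } catch (IOException alreadyBeingCreated) {
            // Lost the race: wait for the winner to finish instead of failing the task.
            while (!fs.exists(COMPLETED)) {
                Thread.sleep(1000);
            }
            return;
        }
        try {
            out.writeUTF("key=value"); // placeholder for the real properties payload
        } finally {
            out.close();
        }
        fs.create(COMPLETED, false).close(); // publish the "done" marker
    }
}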
Synchronization among Mappers in map-reduce task
Hi Folks,

I have been writing a map-reduce application where I have an input file containing records, and every field in a record is separated by some delimiter. In addition, the user provides a list of columns that he wants to look up in a master properties file (stored in HDFS). If such a column (let's call it a key) is present in the master properties file, the code gets the corresponding value and updates the key with that value in the record; if the key is not present in the master properties file, the code creates a new value for the key, writes it to the properties file, and also updates the record. I have written this application and tested it, and everything has worked fine so far.

*e.g.:*
*I/P Record:* This | is | the | test | record
*Columns:* 2,4 (that means the code will look up only the fields *is* and *test* in the master properties file.)

Here, I have a question.

*Q 1:* When my input file is huge and is split across multiple mappers, I was getting the exception below, where all the other mapper tasks were failing. *Also, when I started the job, my master properties file was initially empty.* In my code I have a check: if this file (master properties) doesn't exist, I create a new empty file before submitting the job itself. E.g.: if I have 4 splits of data, then 3 map tasks fail. But after this, all the failed map tasks restart and the job eventually succeeds.

So, *here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished?* I read that the mapper tasks don't interact with each other. Also, what will happen in the scenario where I start multiple parallel map-reduce jobs and all of them work on the same properties file? *Is there any way to have synchronization between two independent map-reduce jobs?* I have also read that ZooKeeper can be used in such scenarios. Is that correct?
Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException: IOException - failed while appending data to the file - Failed to create file [/user/cloudera/lob/master/bank.properties] for [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client [10.X.X.17], because this file is already being created by [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on [10.X.X.17]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
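On the ZooKeeper question: yes, ZooKeeper is commonly used for exactly this kind of coordination, including between independent jobs. As one possible illustration (not something used in this thread), Apache Curator's InterProcessMutex recipe provides a distributed lock that mappers from any job can acquire before touching the shared properties file; the ensemble address and lock path below are assumptions.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkMutexSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper ensemble address.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Mappers from any job acquire the same lock znode before updating the shared file.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/bank-properties");
        if (lock.acquire(60, TimeUnit.SECONDS)) {
            try {
                // read/update the master properties file here
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}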
RE: Problem accessing HDFS from a remote machine
Thanks for all the help. Changing the fs.default.name value from localhost to the IP in all the conf files and making a configuration change in /etc/conf did the job.

Thanks
Saurabh

From: Rishi Yadav [mailto:ri...@infoobjects.com]
Sent: 09 April 2013 10:11
To: user@hadoop.apache.org
Subject: Re: Problem accessing HDFS from a remote machine

Have you checked the firewall on the namenode? If you are running Ubuntu and the namenode port is 8020, the command is:

ufw allow 8020

Thanks and Regards,
Rishi Yadav
InfoObjects Inc || http://www.infoobjects.com/ (Big Data Solutions)

On Mon, Apr 8, 2013 at 6:57 PM, Azuryy Yu azury...@gmail.com wrote:

Can you use the jps command on your localhost to see if there is a NameNode process running?

On Tue, Apr 9, 2013 at 2:27 AM, Bjorn Jonsson bjorn...@gmail.com wrote:

Yes, the namenode port is not open for your cluster. I had this problem too. First, log into your namenode and do netstat -nap to see what ports are listening. You can do service --status-all to see if the namenode service is running. Basically, you need Hadoop to bind to the correct IP (an external one, or at least one reachable from your remote machine), so listening on 127.0.0.1, localhost, or some IP on a private network will not be sufficient. Check your /etc/hosts file and /etc/hadoop/conf/*-site.xml files to configure the correct IPs/ports. I'm no expert, so my understanding might be limited/wrong... but I hope this helps :)

Best,
B

On Mon, Apr 8, 2013 at 7:29 AM, Saurabh Jain saurabh_j...@symantec.com wrote:

Hi All,

I have set up a single-node cluster (release hadoop-1.0.4). Following is the configuration used:

core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

masters:
localhost

slaves:
localhost

I am able to successfully format the Namenode and perform file system operations by running the CLIs on the Namenode, but I receive the following error when I try to access HDFS from a remote machine:

$ bin/hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
13/04/08 07:13:56 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 0 time(s).
13/04/08 07:13:57 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 1 time(s).
13/04/08 07:13:58 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 2 time(s).
13/04/08 07:13:59 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 3 time(s).
13/04/08 07:14:00 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 4 time(s).
13/04/08 07:14:01 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 5 time(s).
13/04/08 07:14:02 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 6 time(s).
13/04/08 07:14:03 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 7 time(s).
13/04/08 07:14:04 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 8 time(s).
13/04/08 07:14:05 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to 10.209.10.206/10.209.10.206:54310 failed on connection exception: java.net.ConnectException: Connection refused

Here 10.209.10.206 is the IP of the server hosting the Namenode, and it is also the configured value of fs.default.name in the core-site.xml file on the remote machine. Executing 'bin/hadoop fs -fs hdfs://10.209.10.206:54310 -ls /' also results in the same output.

Also, I am writing a C application using libhdfs to communicate with HDFS. How do we provide credentials while connecting to HDFS?

Thanks
Saurabh
Problem accessing HDFS from a remote machine
Hi All,

I have set up a single-node cluster (release hadoop-1.0.4). Following is the configuration used:

core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

masters:
localhost

slaves:
localhost

I am able to successfully format the Namenode and perform file system operations by running the CLIs on the Namenode, but I receive the following error when I try to access HDFS from a remote machine:

$ bin/hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
13/04/08 07:13:56 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 0 time(s).
13/04/08 07:13:57 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 1 time(s).
13/04/08 07:13:58 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 2 time(s).
13/04/08 07:13:59 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 3 time(s).
13/04/08 07:14:00 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 4 time(s).
13/04/08 07:14:01 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 5 time(s).
13/04/08 07:14:02 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 6 time(s).
13/04/08 07:14:03 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 7 time(s).
13/04/08 07:14:04 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 8 time(s).
13/04/08 07:14:05 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to 10.209.10.206/10.209.10.206:54310 failed on connection exception: java.net.ConnectException: Connection refused

Here 10.209.10.206 is the IP of the server hosting the Namenode, and it is also the configured value of fs.default.name in the core-site.xml file on the remote machine. Executing 'bin/hadoop fs -fs hdfs://10.209.10.206:54310 -ls /' also results in the same output.

Also, I am writing a C application using libhdfs to communicate with HDFS. How do we provide credentials while connecting to HDFS?

Thanks
Saurabh
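A small client-side sketch tying the two points together: as the reply above notes, the fix was to make fs.default.name point at an address the remote machine can actually reach rather than localhost, so the remote client must address the NameNode by that IP; and with Hadoop's default simple authentication there is no password to supply, the client just states a username (in Java via the three-argument FileSystem.get overload; libhdfs exposes hdfsConnectAsUser for C clients). The IP and port are taken from this thread; the username "hadoop" is an assumption.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsLs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Address the NameNode by its reachable IP (not localhost) and name the
        // user to act as; simple auth has no password, only a username.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://10.209.10.206:54310"), conf, "hadoop");
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}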