Permission issues with HDFS NFS gateway

2014-12-15 Thread Saurabh Jain
Hi,

I am running a single node hadoop cluster and also the HDFS NFS gateway on the 
same node. HDFS processes are running under the context of user “hadoop” and 
NFS gateway is running under the context of user “nfsserver”. I mounted the NFS 
export on the same machine as the root user. Now whenever I try to write to the NFS 
mount as the root user, I get a “permission denied” error. I also constantly see 
this error message on the terminal where I mounted the NFS export -
INFO security.ShellBasedIdMapping: Can't map group supergroup. Use its string 
hashcode:-1710818332

HDFS namenode/datanode NFS config -

core-site.xml -
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>*</value>
</property>

hdfs-site.xml -
<property>
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
<property>
  <name>nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>

HDFS NFS gateway config -
core-site.xml -
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>* rw</value>
</property>

hdfs-site.xml -
<property>
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
<property>
  <name>nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>

When I do IO on the NFS mount as the “hadoop” user I don’t face any permission 
issue. Why am I getting a permission denied error when trying to perform writes 
as the root user?

Thanks
Saurabh


Re: Synchronization among Mappers in map-reduce task - Please advise

2014-08-12 Thread saurabh jain
Hi Wangda ,

I am not sure that making overwrite=false will solve the problem. As per the
Javadoc, with overwrite=false the call will throw an exception if the file
already exists, so all the remaining mappers will hit that exception.

Also, I am very new to ZK and have only very basic knowledge of it; I am not
sure whether it can solve the problem and, if so, how. I am still going through
the available resources on ZK.

Can you please point me to a resource or link on ZK that can help me in
solving the problem?

Best
Saurabh

On Tue, Aug 12, 2014 at 3:08 AM, Wangda Tan wheele...@gmail.com wrote:

 Hi Saurabh,
 It's an interesting topic,

  So, here is the question: is it possible to make sure that when one of
 the mapper tasks is writing to a file, the others wait until the first
 one is finished? I read that mapper tasks don't interact with
 each other

 A simple way to do this is to use the HDFS namespace:
 Create the file using public FSDataOutputStream create(Path f, boolean
 overwrite) with overwrite=false. Only one mapper can successfully create the file.

 After the write is completed, that mapper creates a flag file such as completed
 in the same folder. The other mappers can wait for the completed file to
 appear.
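 A minimal sketch of that pattern (hedged: the _completed flag name below is
 hypothetical and not from this thread; it assumes each mapper catches the
 failure from create() and then polls for the marker) might look like:

   // Hedged sketch only: the _completed flag name is hypothetical.
   import java.io.IOException;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class SingleWriterSketch {
     public static void main(String[] args) throws Exception {
       FileSystem fs = FileSystem.get(new Configuration());
       Path props = new Path("/user/cloudera/lob/master/bank.properties");
       Path done  = new Path("/user/cloudera/lob/master/_completed");
       try {
         // overwrite=false: exactly one caller wins; the rest get an IOException.
         fs.create(props, false).close();
         // ... the winning mapper writes the properties here ...
         fs.create(done, false).close();   // flag file signals completion
       } catch (IOException alreadyBeingCreated) {
         // The other mappers wait for the winner's flag file instead of writing.
         while (!fs.exists(done)) {
           Thread.sleep(1000);
         }
       }
     }
   }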

  Is there any way to have synchronization between two independent map
 reduce jobs?
 I think ZK can handle more complex synchronization here, like mutexes, master
 election, etc.
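 For the cross-job case, a hedged sketch of a ZK mutex (this uses Apache
 Curator, which is not mentioned in this thread, so treat the library choice,
 quorum address, and lock path as assumptions) could look like:

   // Hedged sketch: Curator, the quorum address, and the lock path are assumptions.
   import org.apache.curator.framework.CuratorFramework;
   import org.apache.curator.framework.CuratorFrameworkFactory;
   import org.apache.curator.framework.recipes.locks.InterProcessMutex;
   import org.apache.curator.retry.ExponentialBackoffRetry;

   public class ZkMutexSketch {
     public static void main(String[] args) throws Exception {
       CuratorFramework client = CuratorFrameworkFactory.newClient(
           "zkhost:2181", new ExponentialBackoffRetry(1000, 3)); // hypothetical quorum
       client.start();
       InterProcessMutex lock = new InterProcessMutex(client, "/locks/master-properties");
       lock.acquire();                 // blocks until no other job holds the lock
       try {
         // ... read/update the shared properties file in HDFS here ...
       } finally {
         lock.release();
         client.close();
       }
     }
   }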

 Hope this helps,

 Wangda Tan




 On Tue, Aug 12, 2014 at 10:43 AM, saurabh jain sauravma...@gmail.com
 wrote:

  Hi Folks,

  I have been writing a map-reduce application whose input file contains
  records, with every field in a record separated by some delimiter.

  In addition, the user provides a list of columns to look up in a master
  properties file (stored in HDFS). If a column value (let's call it a key) is
  present in the master properties file, the code gets the corresponding value
  and replaces the key with it; if the key is not present in the master
  properties file, the code creates a new value for the key, writes it to the
  properties file, and also updates the record.
 
  I have written and tested this application, and everything has worked fine
  till now.

  *e.g.:* *I/P Record:* This | is | the | test | record

  *Columns:* 2,4 (that means the code will look up only the fields *is* and
  *test* in the master properties file.)

  Here I have a question.
 
  *Q 1:* When my input file is huge and is split across multiple mappers, I
  get the exception mentioned below and all the other mapper tasks fail. *Also,
  when I initially start the job, my master properties file is empty.* In my
  code I have a check that creates a new empty file, before submitting the job
  itself, if this (master properties) file doesn't exist.

  e.g.: If I have 4 splits of data, then 3 map tasks fail. But after this all
  the failed map tasks restart and the job eventually succeeds.
 
  So, *here is the question: is it possible to make sure that when one of
  the mapper tasks is writing to a file, the others wait until the first
  one is finished?* I read that mapper tasks don't interact with
  each other.

  Also, what will happen in the scenario where I start multiple parallel
  map-reduce jobs, all of them working on the same properties file? *Is
  there any way to have synchronization between two independent map-reduce
  jobs*?

  I have also read that ZooKeeper can be used in such scenarios. Is that
  correct?
 
 
  Error:
 com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException:
 IOException - failed while appending data to the file -Failed to create
 file [/user/cloudera/lob/master/bank.properties] for
 [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client
 [10.X.X.17], because this file is already being created by
  [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on
 [10.X.X.17]
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
  at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
  at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
  at
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server

Synchronization among Mappers in map-reduce task

2014-08-11 Thread saurabh jain
Hi Folks ,

I have been writing a map-reduce application whose input file contains
records, with every field in a record separated by some delimiter.

In addition, the user provides a list of columns to look up in a master
properties file (stored in HDFS). If a column value (let's call it a key) is
present in the master properties file, the code gets the corresponding value
and replaces the key with it; if the key is not present in the master
properties file, the code creates a new value for the key, writes it to the
properties file, and also updates the record.

I have written and tested this application, and everything has worked fine
till now.

*e.g.:* *I/P Record:* This | is | the | test | record

*Columns:* 2,4 (that means the code will look up only the fields *is* and
*test* in the master properties file.)

Here I have a question.

*Q 1:* When my input file is huge and is split across multiple mappers, I get
the exception mentioned below and all the other mapper tasks fail. *Also, when
I initially start the job, my master properties file is empty.* In my code I
have a check that creates a new empty file, before submitting the job itself,
if this (master properties) file doesn't exist.

e.g.: If I have 4 splits of data, then 3 map tasks fail. But after this all the
failed map tasks restart and the job eventually succeeds.

So, *here is the question: is it possible to make sure that when one of the
mapper tasks is writing to a file, the others wait until the first one is
finished?* I read that mapper tasks don't interact with each other.

Also, what will happen in the scenario where I start multiple parallel
map-reduce jobs, all of them working on the same properties file? *Is there
any way to have synchronization between two independent map-reduce jobs*?

I have also read that ZooKeeper can be used in such scenarios. Is that correct?


Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException:
IOException - failed while appending data to the file -Failed to
create file [/user/cloudera/lob/master/bank.properties] for
[DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on
client [10.X.X.17], because this file is already being created by
[DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on
[10.X.X.17]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)


RE: Problem accessing HDFS from a remote machine

2013-04-09 Thread Saurabh Jain
Thanks for all the help.

Changing the fs.default.name value from localhost to the IP in all the conf
files, and making a configuration change in the /etc/conf, did the job.
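For reference, a sketch of the core-site.xml change described above (assuming
the namenode address from the original post below; adjust the host and port as
needed):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.209.10.206:54310</value>
  </property>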

Thanks
Saurabh

From: Rishi Yadav [mailto:ri...@infoobjects.com]
Sent: 09 April 2013 10:11
To: user@hadoop.apache.org
Subject: Re: Problem accessing HDFS from a remote machine

Have you checked the firewall on the namenode?

If you are running Ubuntu and the namenode port is 8020, the command is
- ufw allow 8020


Thanks and Regards,

Rishi Yadav

InfoObjects Inc || http://www.infoobjects.com/ (Big Data Solutions)

On Mon, Apr 8, 2013 at 6:57 PM, Azuryy Yu azury...@gmail.com wrote:
Can you use the command jps on your localhost to see if there is a NameNode
process running?

On Tue, Apr 9, 2013 at 2:27 AM, Bjorn Jonsson bjorn...@gmail.com wrote:
Yes, the namenode port is not open for your cluster. I had this problem too.
First, log into your namenode and do netstat -nap to see what ports are
listening. You can do service --status-all to see if the namenode service is
running. Basically you need Hadoop to bind to the correct IP (an external one,
or at least one reachable from your remote machine), so listening on 127.0.0.1,
localhost, or some IP on a private network will not be sufficient. Check your
/etc/hosts file and /etc/hadoop/conf/*-site.xml files to configure the correct
IP/ports.

I'm no expert, so my understanding might be limited/wrong... but I hope this
helps :)

Best,
B

On Mon, Apr 8, 2013 at 7:29 AM, Saurabh Jain saurabh_j...@symantec.com wrote:
Hi All,

I have set up a single node cluster (release hadoop-1.0.4). Following is the
configuration used -

core-site.xml :-

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

masters:-
localhost

slaves:-
localhost

I am able to successfully format the Namenode and perform file system
operations by running the CLIs on the Namenode.

But I receive the following error when I try to access HDFS from a remote
machine -

$ bin/hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.

13/04/08 07:13:56 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 0 time(s).
13/04/08 07:13:57 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 1 time(s).
13/04/08 07:13:58 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 2 time(s).
13/04/08 07:13:59 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 3 time(s).
13/04/08 07:14:00 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 4 time(s).
13/04/08 07:14:01 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 5 time(s).
13/04/08 07:14:02 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 6 time(s).
13/04/08 07:14:03 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 7 time(s).
13/04/08 07:14:04 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 8 time(s).
13/04/08 07:14:05 INFO ipc.Client: Retrying connect to server: 10.209.10.206/10.209.10.206:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to 10.209.10.206/10.209.10.206:54310 failed on connection exception: java.net.ConnectException: Connection refused

Where 10.209.10.206 is the IP of the server hosting the Namenode and it is also
the configured value for fs.default.name in the core-site.xml file on the
remote machine.

Executing 'bin/hadoop fs -fs hdfs://10.209.10.206:54310 -ls /' also results in
the same output.

Also, I am writing a C application using libhdfs to communicate with HDFS. How 
do we provide credentials while connecting to HDFS?

Thanks
Saurabh







Problem accessing HDFS from a remote machine

2013-04-08 Thread Saurabh Jain
Hi All,

I have set up a single node cluster (release hadoop-1.0.4). Following is the
configuration used -

core-site.xml :-

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

masters:-
localhost

slaves:-
localhost

I am able to successfully format the Namenode and perform file system
operations by running the CLIs on the Namenode.

But I receive the following error when I try to access HDFS from a remote
machine -

$ bin/hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.

13/04/08 07:13:56 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 0 time(s).
13/04/08 07:13:57 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 1 time(s).
13/04/08 07:13:58 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 2 time(s).
13/04/08 07:13:59 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 3 time(s).
13/04/08 07:14:00 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 4 time(s).
13/04/08 07:14:01 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 5 time(s).
13/04/08 07:14:02 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 6 time(s).
13/04/08 07:14:03 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 7 time(s).
13/04/08 07:14:04 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 8 time(s).
13/04/08 07:14:05 INFO ipc.Client: Retrying connect to server: 
10.209.10.206/10.209.10.206:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to 
10.209.10.206/10.209.10.206:54310 failed on connection exception: 
java.net.ConnectException: Connection refused

Where 10.209.10.206 is the IP of the server hosting the Namenode and it is also
the configured value for fs.default.name in the core-site.xml file on the
remote machine.

Executing 'bin/hadoop fs -fs hdfs://10.209.10.206:54310 -ls /' also results in
the same output.

Also, I am writing a C application using libhdfs to communicate with HDFS. How 
do we provide credentials while connecting to HDFS?

Thanks
Saurabh