RE: Permission Denied

2015-03-01 Thread david marion
It looks like / is owned by hadoop:supergroup and the perms are 755. You could 
precreate /accumulo and chown it appropriately, or set the perms on / to 775. 
Init is trying to create /accumulo in HDFS as the accumulo user, and your perms 
don't allow it.
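
For example, something along these lines (run as the hadoop user, since it owns /; 
the names are just taken from your setup, so adjust as needed):

  # as the HDFS superuser (hadoop in your case):
  hdfs dfs -mkdir /accumulo
  hdfs dfs -chown accumulo /accumulo

or, if you would rather loosen the perms on / instead:

  hdfs dfs -chmod 775 /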

Do you have instance.volumes set in accumulo-site.xml?
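
If not, the entry would look roughly like this (the URI just mirrors the 
fs.defaultFS value from your message; adjust to your setup):

  <property>
    <name>instance.volumes</name>
    <!-- example value only -->
    <value>hdfs://localhost:9000/accumulo</value>
  </property>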

Original message
From: David Patterson patt...@gmail.com
Date: 03/01/2015 3:36 PM (GMT-05:00)
To: user@hadoop.apache.org
Subject: Permission Denied
I'm trying to create an Accumulo/Hadoop/Zookeeper configuration on a single
(Ubuntu) machine, with Hadoop 2.6.0, Zookeeper 3.4.6 and Accumulo 1.6.1.

I've got 3 userids for these components that are in the same group and no
other users are in that group.

I have zookeeper running, and hadoop as well.

Hadoop's core-site.xml file has the hadoop.tmp.dir set to
/app/hadoop/tmp. The /app/hadoop/tmp directory is owned by the hadoop user
and has permissions that allow other members of the group to write
(drwxrwxr-x).

When I try to initialize Accumulo, with bin/accumulo init, I get FATAL:
Failed to initialize filesystem.
org.apache.hadoop.security.AccessControlException: Permission denied:
user=accumulo, access=WRITE, inode=/:hadoop:supergroup:drwxr-xr-x

So, my main question is which directory do I need to give group-write
permission so the accumulo user can write as needed so it can initialize?

The second problem is that the Accumulo init reports
[Configuration.deprecation] INFO : fs.default.name is deprecated. Instead
use fs.defaultFS. However, the hadoop core-site.xml file contains:
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>

Is there somewhere else that this value (fs.default.name) is specified?
Could it be due to Accumulo having a default value and not getting the
override from hadoop because of the problem listed above?

Thanks

Dave Patterson
patt...@gmail.com


RE: missing data blocks after active name node crashes

2015-02-10 Thread david marion
I believe there was an issue fixed in 2.5 or 2.6 where the standby NN would not 
process block reports from the DNs when it was dealing with the checkpoint 
process. The missing blocks will get reported eventually.

Original message
From: Chen Song chen.song...@gmail.com
Date: 02/10/2015 2:44 PM (GMT-05:00)
To: user@hadoop.apache.org, Ravi Prakash ravi...@ymail.com
Subject: Re: missing data blocks after active name node crashes
Thanks for the reply, Ravi.

In my case, I consistently see missing blocks every time the active name node
crashes. The active name node crashes because of timeouts on the journal nodes.

Could this be a specific case which could lead to missing blocks?

Chen

On Tue, Feb 10, 2015 at 2:20 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Chen!

 From my understanding, every operation on the Namenode is logged (and
 flushed) to disk / QJM / shared storage. This includes the addBlock
 operation. So when a client requests to write a new block, the metadata is
 logged by the active NN, so even if it crashes later on, the new active NN
 would still see the creation of the block.

 HTH
 Ravi


   On Tuesday, February 10, 2015 9:38 AM, Chen Song chen.song...@gmail.com
 wrote:


 When the active name node crashes, it seems there is always a chance that
 the data blocks in flight will be missing.
 My understanding is that when the active name node crashes, the metadata
 of data blocks in transition which exist in active name node memory is not
 successfully captured by journal nodes and thus not available on standby
 name node when it is promoted to active by zkfc.
 Is my understanding correct? Any way to mitigate this problem or race
 condition?

 --
 Chen Song






--
Chen Song


Client usage with multiple clusters

2014-04-17 Thread david marion
 I'm having an issue in client code where there are multiple clusters with HA 
namenodes involved. Example setup using Hadoop 2.3.0:

Cluster A with the following properties defined in core, hdfs, etc:

dfs.nameservices=clusterA
dfs.ha.namenodes.clusterA=nn1,nn2
dfs.namenode.rpc-address.clusterA.nn1=
dfs.namenode.http-address.clusterA.nn1=
dfs.namenode.rpc-address.clusterA.nn2=
dfs.namenode.http-address.clusterA.nn2=
dfs.client.failover.proxy.provider.clusterA=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

Cluster B has similar properties defined in its core-site.xml, hdfs-site.xml, 
etc.

Now, I want to be able to distcp from clusterA to clusterB. Regardless of which 
cluster I am executing this from, neither has all of the information. Looking 
at DFSClient and DataNode:

  - if I put both clusterA and clusterB into dfs.nameservices, then the 
datanodes will try to federate the blocks from both nameservices.
  - if I don't put both clusterA and clusterB into dfs.nameservices, then the 
client won't know how to resolve both namenodes for the nameservices in the 
distcp command.

 I'm wondering if I am missing a property or something that will allow me to 
define both nameservices on both clusters and have the datanodes for the cluster 
*not* try to federate. Looking at DataNode, it appears that it tries to 
connect to all namenodes defined, and the first one that sets the clusterid 
wins. It seems that there should be a dfs.datanode.clusterid property that the 
datanode uses. This seems to line up with the 'namenode -format -clusterid 
cluster' command when you have multiple nameservices. Am I missing something 
in the configuration that will allow me to do what I want? To get distcp to 
work I had to create a 3rd set of configuration files just for the client to use.
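
For reference, that client-only config amounts to roughly the following sketch
(addresses elided; I'm assuming clusterB also names its namenodes nn1,nn2):

  <property>
    <name>dfs.nameservices</name>
    <value>clusterA,clusterB</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.clusterA</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.clusterB</name>
    <value>nn1,nn2</value>
  </property>
  <!-- plus the dfs.namenode.rpc-address.* and http-address.* entries for both clusters -->
  <property>
    <name>dfs.client.failover.proxy.provider.clusterA</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.clusterB</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>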
  

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread david marion
Hi Roger,

  I wrote the HDFS provider for Commons VFS. I went back and looked at the 
source and tests, and I don't see anything wrong with what you are doing. I did 
develop it against Hadoop 1.1.2 at the time, so there might be an issue that is 
not accounted for with Hadoop 2. It was also not tested with security turned 
on. Are you using security?

Dave

 From: roger.whitc...@actian.com
 To: user@hadoop.apache.org
 Subject: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS 
 access?
 Date: Fri, 11 Apr 2014 20:20:06 +
 
 Hi,
 I'm fairly new to Hadoop, but not to Apache, and I'm having a newbie kind of 
 issue browsing HDFS files.  I have written an Apache Commons VFS (Virtual 
 File System) browser for the Apache Pivot GUI framework (I'm the PMC Chair 
 for Pivot: full disclosure).  And now I'm trying to get this browser to work 
 with HDFS to do HDFS browsing from our application.  I'm running into a 
 problem, which seems sort of basic, so I thought I'd ask here...
 
 So, I downloaded Hadoop 2.3.0 from one of the mirrors, and was able to track 
 down sort of the minimum set of .jars necessary to at least (try to) connect 
 using Commons VFS 2.1:
 commons-collections-3.2.1.jar
 commons-configuration-1.6.jar
 commons-lang-2.6.jar
 commons-vfs2-2.1-SNAPSHOT.jar
 guava-11.0.2.jar
 hadoop-auth-2.3.0.jar
 hadoop-common-2.3.0.jar
 log4j-1.2.17.jar
 slf4j-api-1.7.5.jar
 slf4j-log4j12-1.7.5.jar
 
 What's happening now is that I instantiated the HdfsProvider this way:
     private static DefaultFileSystemManager manager = null;
 
     static
     {
         manager = new DefaultFileSystemManager();
         try {
             manager.setFilesCache(new DefaultFilesCache());
             manager.addProvider("hdfs", new HdfsFileProvider());
             manager.setFileContentInfoFactory(new FileContentInfoFilenameFactory());
             manager.setFilesCache(new SoftRefFilesCache());
             manager.setReplicator(new DefaultFileReplicator());
             manager.setCacheStrategy(CacheStrategy.ON_RESOLVE);
             manager.init();
         }
         catch (final FileSystemException e) {
             throw new RuntimeException(Intl.getString("object#manager.setupError"), e);
         }
     }
 
 Then, I try to browse into an HDFS system this way:
     String url = String.format("hdfs://%1$s:%2$d/%3$s", "hadoop-master", 50070, hdfsPath);
     return manager.resolveFile(url);
 
 Note: the client is running on Windows 7 (but could be any system that runs 
 Java), and the target has been one of several Hadoop clusters on Ubuntu VMs 
 (basically the same thing happens no matter which Hadoop installation I try 
 to hit).  So I'm guessing the problem is in my client configuration.
 
 This attempt to basically just connect to HDFS results in a bunch of error 
 messages in the log file, which looks like it is trying to do user validation 
 on the local machine instead of against the Hadoop (remote) cluster.
 Apr 11,2014 18:27:38.640 GMT T[AWT-EventQueue-0](26) DEBUG FileObjectManager: 
 Trying to resolve file reference 'hdfs://hadoop-master:50070/'
 Apr 11,2014 18:27:38.953 GMT T[AWT-EventQueue-0](26)  INFO 
 org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is 
 deprecated. Instead, use fs.defaultFS
 Apr 11,2014 18:27:39.078 GMT T[AWT-EventQueue-0](26) DEBUG 
 MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
 org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
 annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
 value=[Rate of successful kerberos logins and latency (milliseconds)], 
 about=, type=DEFAULT, always=false, sampleName=Ops)
 Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG 
 MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
 org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
 annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
 value=[Rate of failed kerberos logins and latency (milliseconds)], about=, 
 type=DEFAULT, always=false, sampleName=Ops)
 Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG 
 MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
 org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
 annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
 value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
 Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG MetricsSystemImpl: 
 UgiMetrics, User and group related metrics
 Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG Groups:  Creating 
 new Groups object
 Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: 
 Trying to load the custom-built native-hadoop library...
 Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: 
 Failed to load native-hadoop with error: 

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread david marion
Also, make sure that the jars on the classpath actually contain the HDFS file 
system. I'm looking at:

No FileSystem for scheme: hdfs

which is an indicator for this condition.
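
A quick way to verify that, outside of VFS entirely, is a few lines of plain Java.
This is just a sketch; the URI is simply the one from your message:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class HdfsSchemeCheck {
      public static void main(String[] args) throws Exception {
          // Fails fast with "No FileSystem for scheme: hdfs" if the jar providing
          // org.apache.hadoop.hdfs.DistributedFileSystem (hadoop-hdfs-2.3.0.jar) is
          // not on the classpath. If the scheme does resolve, any later failure
          // (unreachable host, wrong port) is a separate issue.
          FileSystem fs = FileSystem.get(
                  URI.create("hdfs://hadoop-master:50070/"), new Configuration());
          System.out.println("hdfs scheme resolved to " + fs.getClass().getName());
      }
  }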

Dave

RE: HA NN Failover question

2014-03-18 Thread david marion
I think I found the issue. The ZKFC on the standby NN server tried, and failed, 
to connect to the standby NN when I shut down the network on the active NN 
server. I'm getting an exception from the HealthMonitor in the ZKFC log:

WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to 
monitor health of NameNode at host/ip:port.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server 
host/ip:port. Already tried 0 time(s); retry policy is  (the default)

Is it significant that it thinks the address is host/ip, instead of just the 
host or the ip?

From: azury...@gmail.com
Subject: Re: HA NN Failover question
Date: Sat, 15 Mar 2014 11:35:20 +0800
To: user@hadoop.apache.org

I suppose NN2 is standby; please check that ZKFC2 is alive before stopping the network on nn1.

Sent from my iPhone5s
On March 15, 2014, at 10:53, dlmarion dlmar...@hotmail.com wrote:

Apache Hadoop 2.3.0

Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone





 Original message 
From: Azuryy
Date: 03/14/2014 10:45 PM (GMT-05:00)
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Which Hadoop version did you use?

Sent from my iPhone5s

On March 15, 2014, at 9:29, dlmarion dlmar...@hotmail.com wrote:

Server 1: NN1 and ZKFC1
Server 2: NN2 and ZKFC2
Server 3: Journal1 and ZK1
Server 4: Journal2 and ZK2
Server 5: Journal3 and ZK3
Server 6+: Datanode
 
All in the same rack. I would expect the ZKFC from the active name node server 
to lose its lock and the other ZKFC to tell the standby namenode that it should 
become active (I’m assuming that’s how it works).
 
- Dave
 


From: Juan Carlos [mailto:juc...@gmail.com]
Sent: Friday, March 14, 2014 9:12 PM
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Hi Dave,

How many zookeeper servers do you have, and where are they?

Juan Carlos Fernández Rodríguez

On 15/03/2014, at 01:21, dlmarion dlmar...@hotmail.com wrote:

I was doing some testing with HA NN today. I set up two NN with active failover 
(ZKFC) using sshfence. I tested that it's working on both NN by doing ‘kill -9 
pid’ on the active NN. When I did this on the active node, the standby would 
become the active and everything seemed to work. Next, I logged onto the 
active NN and did a ‘service network stop’ to simulate a NIC/network failure. 
The standby did not become the active in this scenario. In fact, it remained in 
standby mode and complained in the log that it could not communicate with (what 
was) the active NN. I was unable to find anything relevant via searches in 
Google and Jira. Does anyone have experience successfully testing this? I’m 
hoping that it is just a configuration problem.
 
FWIW, when the network was restarted on the active NN, it failed over almost 
immediately.
 
Thanks,
 
Dave








  

RE: HA NN Failover question

2014-03-18 Thread david marion
Found this: 
http://grokbase.com/t/cloudera/cdh-user/12anhyr8ht/cdh4-failover-controllers

Then configured dfs.ha.fencing.methods to contain both sshfence and 
shell(/bin/true). Note that the docs for core-default.xml say that the value is 
a list. I tried a comma with no luck. Had to look in the src to find it's 
separated by a newline. Adding shell(/bin/true) allowed it to work successfully.
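
For anyone hitting this later, the property ends up looking roughly like this
(the two methods separated by a newline inside the value):

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>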
