MRV1 / MRV2 interoperability question

2014-04-11 Thread David Rosenstrauch
I'm in the process of migrating over our Hadoop setup from MRv1 to MRv2 
and have a question about interoperability.


We run our Hadoop clusters in the cloud (AWS) in a transient fashion: start up 
clusters when needed, push all output from HDFS to S3, and shut the clusters 
down when done.  We have configurations for starting up and running 
different-sized clusters simultaneously to handle different work streams, etc.  
All works fine.


I have a client machine (the job controller) at the center of the 
process, which runs the scripts to launch and shut down the Hadoop 
clusters, and uses the installed Hadoop client to submit jobs to 
the clusters.  Again, all works fine.



I want to start migrating our setup from MRv1 to MRv2.  But a) I 
don't necessarily need/want to migrate all of our cluster 
configurations at once, and b) I need to do some testing on the MRv2 
cluster configurations/scripts before I go live with them.  So I'd like 
to be able to launch some clusters as MRv2 and some as MRv1.  But given 
my setup with the central job controller machine, I'm scratching my 
head about how to accomplish this.


MRv2 uses a ResourceManager daemon to accept job submissions, vs. MRv1's 
JobTracker (and the ResourceManager listens on a different port).  If I leave 
the version of the Hadoop client on the job controller machine at MRv1, I'm 
thinking it won't be able to submit jobs to an MRv2 cluster.  Similarly, 
if I upgrade the client to MRv2, I'd think it wouldn't be able to 
submit jobs to MRv1 clusters.
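
For reference, my rough understanding of the client-side difference (property 
names are from memory, and the hosts/ports are just examples, so treat this as 
a sketch rather than exact config):

    # MRv1 client (mapred-site.xml): point at the cluster's JobTracker
    mapred.job.tracker = jobtracker-host:8021

    # MRv2 client: switch the framework and point at the ResourceManager
    mapreduce.framework.name = yarn                    (mapred-site.xml)
    yarn.resourcemanager.address = rm-host:8032        (yarn-site.xml)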



So my question is:  is there any (easy) way for a single machine to be 
able to submit jobs to both types of clusters?  (E.g., run both the MRv1 
and MRv2 client packages?)
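
One thing I've been considering (just a sketch; the paths and versions below 
are made up) is keeping both client installs side by side on the job 
controller and choosing one per invocation via the environment:

    # hypothetical layout with one client install per MR version
    export HADOOP_HOME=/opt/hadoop-1.2.1           # or /opt/hadoop-2.3.0 for MRv2
    export HADOOP_CONF_DIR=/etc/hadoop/conf.mrv1   # or conf.mrv2, pointing at the matching cluster
    $HADOOP_HOME/bin/hadoop jar my-job.jar com.example.MyJob ...

But I don't know whether that's a sane approach, hence the question.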



One obvious workaround here would be to start up a 2nd job controller 
machine, and use that for all the MRv2 testing, before I cut over the 
main one to MRv2.  But that's not an ideal solution for a number of 
reasons.  (Cost, time involved in setting up a duplicate environment, 
difficulties involved in splitting production work between 2 machines 
while we transition, etc.)



Any suggestions here greatly appreciated!

Thanks,

DR


RE: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on a larger jobs

2014-04-11 Thread Phan, Truong Q
I could not find the attempt_1395628276810_0062_m_000149_0 attempt's logs in 
the HDFS /tmp directory.
Where can I find these log files?

Thanks and Regards,
Truong Phan


P  +61 2 8576 5771
M  +61 4 1463 7424
E  troung.p...@team.telstra.com
W  www.telstra.com


-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, 10 April 2014 4:32 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run 
on a larger jobs

It appears to me that whatever chunk of the input CSV files your map task 
000149 gets, the program is unable to process it and throws an error and exits.

Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log to see 
if there's any stdout/stderr printed that may help. The syslog in the attempt's 
task log will also carry a "Processing split ..." message that may help you see 
which file, and what offset+length within that file, was being processed.
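
For example, if YARN log aggregation is enabled on your cluster, something 
along these lines should pull all the task logs for that job (the application 
ID is derived from the attempt ID):

    yarn logs -applicationId application_1395628276810_0062 | less

If aggregation is not enabled, the attempt's logs should still be on the local 
disk of the node that ran it, under the directories configured in 
yarn.nodemanager.log-dirs.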

On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q troung.p...@team.telstra.com 
wrote:
 Hi



 My Hadoop 2.2.0-cdh5.0.0-beta-1 cluster fails to run a larger MapReduce
 Streaming job.

 I have no issue running a MapReduce Streaming job whose input is a single
 CSV file of around 400 MB.

 However, it fails when I try to run the job with 11 input data files of
 around 400 MB each.

 The job failed with the following error.



 I would appreciate any hints or suggestions to fix this issue.



 +

 2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)



 2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)



 2014-04-10 10:28:10,499 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
         at 

Re: InputFormat and InputSplit - Network location name contains /:

2014-04-11 Thread Patcharee Thongtra

Hi Harsh,

Many thanks! I got rid of the problem by updating the InputSplit's 
getLocations() to return hosts.


Patcharee

On 04/11/2014 06:16 AM, Harsh J wrote:

Do not use the InputSplit's getLocations() API to supply your file
path; it is not intended for such things, if that's what you've done in
your current InputFormat implementation.

If you're looking to store a single file path, use the FileSplit
class, or, if your case is not as simple as that, use it as a base reference to
build your Path-based InputSplit derivative. Its sources are at
https://github.com/apache/hadoop-common/blob/release-2.4.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileSplit.java.
Look at the Writable method overrides in particular to understand how
to use custom fields.
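
A bare-bones sketch of such a split (the class and field names below are only
placeholders, not from your code) might look like this:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class NetcdfSplit extends InputSplit implements Writable {
        private Path file;        // custom field: the file this split covers
        private long length;      // custom field: bytes covered by this split
        private String[] hosts;   // data-local hostnames, NOT the file path

        public NetcdfSplit() {}   // no-arg constructor needed for deserialization

        public NetcdfSplit(Path file, long length, String[] hosts) {
            this.file = file;
            this.length = length;
            this.hosts = hosts;
        }

        public Path getFile() { return file; }

        @Override
        public long getLength() { return length; }

        @Override
        public String[] getLocations() { return hosts; }  // hostnames only, no '/'

        @Override
        public void write(DataOutput out) throws IOException {
            Text.writeString(out, file.toString());
            out.writeLong(length);
            out.writeInt(hosts.length);
            for (String h : hosts) Text.writeString(out, h);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            file = new Path(Text.readString(in));
            length = in.readLong();
            hosts = new String[in.readInt()];
            for (int i = 0; i < hosts.length; i++) hosts[i] = Text.readString(in);
        }
    }

The key point is that getLocations() returns only hostnames for locality; the
actual Path travels through the Writable fields.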

On Thu, Apr 10, 2014 at 9:54 PM, Patcharee Thongtra
patcharee.thong...@uni.no wrote:

Hi,

I wrote a custom InputFormat and InputSplit to handle a NetCDF file, and I use
them with a custom Pig load function. When I submitted a job by running a Pig
script, I got the error below. From the error log, the network location name
is hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 (my
input file), which contains '/', and Hadoop does not allow that.

It could be something missing in my custom InputFormat and InputSplit. Any
ideas? Any help is appreciated,

Patcharee


2014-04-10 17:09:01,854 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP

2014-04-10 17:09:01,918 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1387474594811_0071Job Transitioned from SETUP to RUNNING

2014-04-10 17:09:01,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 to /default-rack

2014-04-10 17:09:01,984 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Network location name contains /: hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02
        at org.apache.hadoop.net.NodeBase.set(NodeBase.java:87)
        at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:65)
        at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:111)
        at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.<init>(TaskAttemptImpl.java:548)
        at org.apache.hadoop.mapred.MapTaskAttemptImpl.<init>(MapTaskAttemptImpl.java:47)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl.createAttempt(MapTaskImpl.java:62)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAttempt(TaskImpl.java:594)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAndScheduleAttempt(TaskImpl.java:581)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.access$1300(TaskImpl.java:100)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:871)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:866)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:632)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:99)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1237)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1231)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
        at java.lang.Thread.run(Thread.java:662)
2014-04-10 17:09:01,986 INFO [AsyncDispatcher event handler] org.apache.hadoop.







Number of map task

2014-04-11 Thread Patcharee Thongtra

Hi,

I wrote a custom InputFormat. When I ran a Pig script whose load function 
uses this InputFormat, the number of InputSplits was > 1, but there was only 
1 map task handling these splits.


Doesn't the number of map tasks correspond to the number of splits?

I think the job would finish quicker if there were more map tasks?

Patcharee


Re: how can i archive old data in HDFS?

2014-04-11 Thread Peyman Mohajerian
There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html
But I'm not sure whether it compresses the data or not.
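
Usage is roughly along these lines (the archive name and paths below are just
examples), and the result can then be read back through the har:// filesystem:

    hadoop archive -archiveName old-data.har -p /user/olddata dir1 dir2 /user/archives
    hadoop fs -ls har:///user/archives/old-data.har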


On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi s...@gopivotal.com wrote:

 AFAIK, no tools now.

 Regards,
 *Stanley Shi,*



 On Fri, Apr 11, 2014 at 9:09 AM, ch huang justlo...@gmail.com wrote:

 hi, mailing list:
  How can I archive old data in HDFS? I have a lot of old data; the data
 will not be used, but it takes a lot of space to store it. I want to
 archive and zip the old data. Can HDFS do this operation?





Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread Roger Whitcomb
Hi,
I'm fairly new to Hadoop, but not to Apache, and I'm having a newbie kind of 
issue browsing HDFS files.  I have written an Apache Commons VFS (Virtual File 
System) browser for the Apache Pivot GUI framework (I'm the PMC Chair for 
Pivot: full disclosure), and now I'm trying to get this browser to work with 
HDFS so we can browse HDFS from our application.  I'm running into a problem 
which seems sort of basic, so I thought I'd ask here...

So, I downloaded Hadoop 2.3.0 from one of the mirrors, and was able to track 
down sort of the minimum set of .jars necessary to at least (try to) connect 
using Commons VFS 2.1:
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-lang-2.6.jar
commons-vfs2-2.1-SNAPSHOT.jar
guava-11.0.2.jar
hadoop-auth-2.3.0.jar
hadoop-common-2.3.0.jar
log4j-1.2.17.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar

What's happening now is that I instantiated the HdfsProvider this way:

    private static DefaultFileSystemManager manager = null;

    static
    {
        manager = new DefaultFileSystemManager();
        try {
            manager.setFilesCache(new DefaultFilesCache());
            manager.addProvider("hdfs", new HdfsFileProvider());
            manager.setFileContentInfoFactory(new FileContentInfoFilenameFactory());
            manager.setFilesCache(new SoftRefFilesCache());
            manager.setReplicator(new DefaultFileReplicator());
            manager.setCacheStrategy(CacheStrategy.ON_RESOLVE);
            manager.init();
        }
        catch (final FileSystemException e) {
            throw new RuntimeException(Intl.getString("object#manager.setupError"), e);
        }
    }

Then, I try to browse into an HDFS system this way:

    String url = String.format("hdfs://%1$s:%2$d/%3$s", "hadoop-master", 50070, hdfsPath);
    return manager.resolveFile(url);

Note: the client is running on Windows 7 (but could be any system that runs 
Java), and the target has been one of several Hadoop clusters on Ubuntu VMs 
(basically the same thing happens no matter which Hadoop installation I try to 
hit).  So I'm guessing the problem is in my client configuration.

This attempt to basically just connect to HDFS results in a bunch of error 
messages in the log file, which looks like it is trying to do user validation 
on the local machine instead of against the Hadoop (remote) cluster.
Apr 11,2014 18:27:38.640 GMT T[AWT-EventQueue-0](26) DEBUG FileObjectManager: 
Trying to resolve file reference 'hdfs://hadoop-master:50070/'
Apr 11,2014 18:27:38.953 GMT T[AWT-EventQueue-0](26)  INFO 
org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is 
deprecated. Instead, use fs.defaultFS
Apr 11,2014 18:27:39.078 GMT T[AWT-EventQueue-0](26) DEBUG 
MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
value=[Rate of successful kerberos logins and latency (milliseconds)], about=, 
type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG 
MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
value=[Rate of failed kerberos logins and latency (milliseconds)], about=, 
type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG 
MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, 
value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG MetricsSystemImpl: 
UgiMetrics, User and group related metrics
Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG Groups:  Creating 
new Groups object
Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: 
Trying to load the custom-built native-hadoop library...
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: 
Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no 
hadoop in java.library.path
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: 
java.library.path= bunch of stuff
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26)  WARN NativeCodeLoader: 
Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
Apr 11,2014 18:27:39.375 GMT T[AWT-EventQueue-0](26) DEBUG 
JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
Apr 11,2014 18:27:39.375 GMT T[AWT-EventQueue-0](26) DEBUG 
JniBasedUnixGroupsMappingWithFallback: Group 

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread david marion
Hi Roger,

  I wrote the HDFS provider for Commons VFS. I went back and looked at the 
source and tests, and I don't see anything wrong with what you are doing. I did 
develop it against Hadoop 1.1.2 at the time, so there might be an issue that is 
not accounted for with Hadoop 2. It was also not tested with security turned 
on. Are you using security?

Dave

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread david marion
Also, make sure that the jars on the classpath actually contain the HDFS file 
system. I'm looking at:

No FileSystem for scheme: hdfs

which is an indicator for this condition.
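
One quick way to check (just a sketch against the Hadoop 2.x API; adjust as 
needed) is to ask FileSystem which class would handle the hdfs scheme. It 
should fail with the same "No FileSystem for scheme: hdfs" message when the 
implementation (DistributedFileSystem, which ships in the hadoop-hdfs jar) is 
not on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HdfsSchemeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Resolves the FileSystem implementation registered for "hdfs";
            // throws IOException("No FileSystem for scheme: hdfs") if none is found.
            Class<? extends FileSystem> cls = FileSystem.getFileSystemClass("hdfs", conf);
            System.out.println("hdfs:// is handled by " + cls.getName());
        }
    }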

Dave


RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread Roger Whitcomb
Hi Dave,

Thanks for the responses.  I guess I have a small question then: what 
exact class(es) would it be looking for that it can't find?  I have all the 
.jar files I mentioned below on the classpath, and it is loading and executing 
code in the org.apache.hadoop.fs.FileSystem class (according to the stack 
trace below), so I would guess there are implementing classes it needs; what 
.jar file would they be in?


Thanks,

~Roger




RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-11 Thread dlmarion
If memory serves me, it's in the hadoop-hdfs.jar file.
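
e.g. (jar name taken from the 2.3.0 download, adjust to your version):

    jar tf hadoop-hdfs-2.3.0.jar | grep DistributedFileSystem

should show org/apache/hadoop/hdfs/DistributedFileSystem.class, which is the
implementation behind the hdfs scheme.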


Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone


Resetting dead datanodes list

2014-04-11 Thread Ashwin Shankar
Hi,
Hadoop-1's namenode UI displays dead datanodes
even if those instances are terminated and are no longer part of the cluster.
Is there a way to reset the dead datanode list without bouncing the namenode?

This would help with my script (which would run nightly), which parses the
HTML page, terminates dead datanodes, and resizes the cluster.
-- 
Thanks,
Ashwin