MRv1 / MRv2 interoperability question
I'm in the process of migrating our Hadoop setup from MRv1 to MRv2 and have a question about interoperability.

We run our Hadoop clusters in the cloud (AWS) in a transient fashion: start up clusters when needed, push all output from HDFS to S3, and shut the clusters down when done. We have configurations for running different-sized clusters simultaneously to handle different work streams, etc. All works fine. I have a client machine (the job controller) at the center of the process, which runs the scripts to launch and shut down the Hadoop clusters, and uses the installed Hadoop client to submit jobs to the clusters. Again, all works fine.

I want to start migrating our setup from MRv1 to MRv2. But a) I don't necessarily need/want to migrate all of our cluster configurations at once, and b) I need to do some testing on the MRv2 cluster configurations/scripts before I go live with them. So I'd like to be able to launch some clusters as MRv2 and some as MRv1. Given my setup with the central job controller machine, though, I'm scratching my head about how to accomplish this. MRv2 uses a ResourceManager daemon to accept jobs, vs. MRv1's JobTracker (with the ResourceManager listening on a different port). If I leave the Hadoop client on the job controller machine at MRv1, I'm thinking it won't be able to submit jobs to an MRv2 cluster. Similarly, if I upgrade the client to MRv2, I'd think it wouldn't be able to submit jobs to MRv1 clusters.

So my question is: is there any (easy) way for a single machine to submit jobs to both types of clusters? (E.g., run both the MRv1 and MRv2 client packages?)

One obvious workaround would be to start up a second job controller machine and use that for all the MRv2 testing before I cut the main one over to MRv2. But that's not an ideal solution for a number of reasons (cost, time involved in setting up a duplicate environment, difficulty splitting production work between two machines while we transition, etc.).

Any suggestions here greatly appreciated!

Thanks, DR
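[Editor's hedged sketch, not from the thread: one common approach is keeping both client installs side by side on the job controller and selecting the client, and the cluster configuration, per invocation via environment variables. All paths and names below are hypothetical.]

    # Two self-contained client installs on the job controller (hypothetical paths):
    #   /opt/hadoop-mrv1   - MRv1 client (JobTracker-based)
    #   /opt/hadoop-mrv2   - MRv2/YARN client (ResourceManager-based)

    # Submit to an MRv1 cluster:
    export HADOOP_HOME=/opt/hadoop-mrv1
    export HADOOP_CONF_DIR=/opt/conf/mrv1-cluster   # mapred.job.tracker points at the JobTracker
    "$HADOOP_HOME"/bin/hadoop jar my-job.jar com.example.MyJob ...

    # Submit to an MRv2 cluster:
    export HADOOP_HOME=/opt/hadoop-mrv2
    export HADOOP_CONF_DIR=/opt/conf/mrv2-cluster   # yarn.resourcemanager.address points at the RM
    "$HADOOP_HOME"/bin/hadoop jar my-job.jar com.example.MyJob ...

Depending on which MapReduce APIs the jobs use, the job jars themselves may also need rebuilding against the 2.x artifacts, since the two lines are not binary compatible for every API.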
RE: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - failed to run on larger jobs
I could not find the attempt_1395628276810_0062_m_000149_0 attempt logs in the HDFS /tmp directory. Where can I find these log files?

Thanks and Regards,
Truong Phan

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, 10 April 2014 4:32 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - failed to run on larger jobs

It appears to me that whatever chunk of the input CSV files your map task 000149 gets, the program is unable to process it, throws an error, and exits. Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log to see if there is any stdout/stderr output that may help. The syslog in the attempt's task log will also carry a "Processing split ..." message that tells you which file, and what offset+length within that file, was being processed.

On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q <troung.p...@team.telstra.com> wrote:

Hi,

My Hadoop 2.2.0-cdh5.0.0-beta-1 cluster fails to run larger MapReduce Streaming jobs. I have no issue running a streaming job over a single ~400 MB CSV input file, but the job fails when I run it over 11 input files of ~400 MB each. It fails with the following error. I'd appreciate any hints or suggestions for fixing this issue.

2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 (same stack trace as above)
2014-04-10 10:28:10,499 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 (same stack trace as above)
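[Editor's hedged pointer on finding the attempt logs, assuming the cluster has YARN log aggregation enabled (yarn.log-aggregation-enable=true): completed-application task logs are not under /tmp in HDFS but are fetched by application ID, which can be derived from the attempt ID in the error above.]

    # Application ID derived from attempt_1395628276810_0062_m_000149_0:
    yarn logs -applicationId application_1395628276810_0062

    # Without log aggregation, the stdout/stderr/syslog files stay on the
    # individual node that ran the attempt, under yarn.nodemanager.log-dirs.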
Re: InputFormat and InputSplit - Network location name contains /:
Hi Harsh,

Many thanks! I got rid of the problem by updating the InputSplit's getLocations() to return hosts.

Patcharee

On 04/11/2014 06:16 AM, Harsh J wrote:

Do not use the InputSplit's getLocations() API to supply your file path; it is not intended for such things, if that's what you've done in your current InputFormat implementation. If you're looking to store a single file path, use the FileSplit class, or if your case is not as simple as that, use it as a base reference to build your Path-based InputSplit derivative. Its sources are at https://github.com/apache/hadoop-common/blob/release-2.4.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileSplit.java. Look at the Writable method overrides in particular to understand how to carry custom fields.

On Thu, Apr 10, 2014 at 9:54 PM, Patcharee Thongtra <patcharee.thong...@uni.no> wrote:

Hi,

I wrote a custom InputFormat and InputSplit to handle NetCDF files, used with a custom Pig load function. When I submitted a job by running a Pig script, I got the error below. From the error log, the network location name is hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 (my input file), which contains '/', which Hadoop does not allow. It could be that something is missing in my custom InputFormat and InputSplit. Any ideas? Any help is appreciated.

Patcharee

2014-04-10 17:09:01,854 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-04-10 17:09:01,918 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1387474594811_0071 Job Transitioned from SETUP to RUNNING
2014-04-10 17:09:01,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 to /default-rack
2014-04-10 17:09:01,984 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Network location name contains /: hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02
    at org.apache.hadoop.net.NodeBase.set(NodeBase.java:87)
    at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:65)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:111)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.<init>(TaskAttemptImpl.java:548)
    at org.apache.hadoop.mapred.MapTaskAttemptImpl.<init>(MapTaskAttemptImpl.java:47)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl.createAttempt(MapTaskImpl.java:62)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAttempt(TaskImpl.java:594)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAndScheduleAttempt(TaskImpl.java:581)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.access$1300(TaskImpl.java:100)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:871)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:866)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:632)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:99)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1237)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1231)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
    at java.lang.Thread.run(Thread.java:662)
2014-04-10 17:09:01,986 INFO [AsyncDispatcher event handler] org.apache.hadoop.
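[Editor's sketch of the fix Patcharee describes, for readers hitting the same error: an InputSplit that carries the file path as its own serialized field and returns only host names from getLocations(). Class and field names are hypothetical, not from the thread; the FileSplit source linked above is the canonical reference.]

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class NetcdfSplit extends InputSplit implements Writable {
        private String path;     // stored as a custom field, NOT via getLocations()
        private String[] hosts;  // block hosts, e.g. from FileSystem#getFileBlockLocations

        public NetcdfSplit() { }  // no-arg constructor required for deserialization

        public NetcdfSplit(String path, String[] hosts) {
            this.path = path;
            this.hosts = hosts;
        }

        public String getPath() { return path; }

        @Override
        public long getLength() { return 0; }  // return the real byte length in practice

        @Override
        public String[] getLocations() {
            return hosts;  // host names only; never paths or anything containing '/'
        }

        @Override
        public void write(DataOutput out) throws IOException {
            Text.writeString(out, path);  // serialize the custom field ourselves
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            path = Text.readString(in);
            hosts = new String[0];  // like FileSplit, locations are not serialized;
                                    // they are consumed at job-submission time
        }
    }

Pig load functions use the org.apache.hadoop.mapreduce API, which is why this sketch extends the new-API InputSplit.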
Number of map tasks
Hi,

I wrote a custom InputFormat. When I ran a Pig script load function using this InputFormat, the number of InputSplits was greater than 1, but there was only one map task handling all of these splits. Doesn't the number of map tasks correspond to the number of splits? I'd think the job would finish quicker with more map tasks.

Patcharee
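[Editor's hedged note: in plain MapReduce the number of map tasks does equal the number of input splits, but Pig combines small splits into a single map by default, which would explain several splits being handled by one map task. A sketch of the relevant script settings; verify the property names against your Pig version's documentation.]

    -- Disable Pig's split combining so each InputSplit gets its own map task:
    SET pig.noSplitCombination true;

    -- Or keep combining but cap how much data one combined split may hold (bytes):
    SET pig.maxCombinedSplitSize 134217728;  -- 128 MB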
Re: How can I archive old data in HDFS?
There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html, but I'm not sure whether it compresses the data or not.

On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi <s...@gopivotal.com> wrote:

AFAIK, no tools now.

Regards,
Stanley Shi

On Fri, Apr 11, 2014 at 9:09 AM, ch huang <justlo...@gmail.com> wrote:

Hi, maillist:

How can I archive old data in HDFS? I have a lot of old data that will not be used, but it takes a lot of space to store. I want to archive and zip the old data. Can HDFS do this operation?
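[Editor's hedged sketch of the HAR workflow from the linked page; archive and path names are made up. To my knowledge a HAR packs many small files into a few larger ones, easing NameNode pressure, but does not compress them, so compress the files separately if you also want them zipped.]

    # Pack /data/old into old-2013.har under /data/archives (runs a MapReduce job):
    hadoop archive -archiveName old-2013.har -p /data old /data/archives

    # Browse the archive contents through the har:// filesystem:
    hadoop fs -ls har:///data/archives/old-2013.har

    # Once verified, reclaim the space taken by the originals:
    hadoop fs -rmr /data/old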
Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Hi,

I'm fairly new to Hadoop, but not to Apache, and I'm having a newbie kind of issue browsing HDFS files. I have written an Apache Commons VFS (Virtual File System) browser for the Apache Pivot GUI framework (I'm the PMC Chair for Pivot: full disclosure). Now I'm trying to get this browser to work with HDFS, to do HDFS browsing from our application. I'm running into a problem that seems pretty basic, so I thought I'd ask here.

I downloaded Hadoop 2.3.0 from one of the mirrors and was able to track down roughly the minimum set of .jars necessary to at least (try to) connect using Commons VFS 2.1:

commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-lang-2.6.jar
commons-vfs2-2.1-SNAPSHOT.jar
guava-11.0.2.jar
hadoop-auth-2.3.0.jar
hadoop-common-2.3.0.jar
log4j-1.2.17.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar

I instantiate the HdfsProvider this way:

    private static DefaultFileSystemManager manager = null;
    static {
        manager = new DefaultFileSystemManager();
        try {
            manager.setFilesCache(new DefaultFilesCache());
            manager.addProvider("hdfs", new HdfsFileProvider());
            manager.setFileContentInfoFactory(new FileContentInfoFilenameFactory());
            manager.setFilesCache(new SoftRefFilesCache());
            manager.setReplicator(new DefaultFileReplicator());
            manager.setCacheStrategy(CacheStrategy.ON_RESOLVE);
            manager.init();
        } catch (final FileSystemException e) {
            throw new RuntimeException(Intl.getString("object#manager.setupError"), e);
        }
    }

Then I try to browse into an HDFS system this way:

    String url = String.format("hdfs://%1$s:%2$d/%3$s", "hadoop-master", 50070, hdfsPath);
    return manager.resolveFile(url);

Note: the client is running on Windows 7 (but could be any system that runs Java), and the target has been one of several Hadoop clusters on Ubuntu VMs (basically the same thing happens no matter which Hadoop installation I try to hit). So I'm guessing the problem is in my client configuration.

This attempt to basically just connect to HDFS results in a bunch of error messages in the log file, which look like it is trying to do user validation on the local machine instead of against the Hadoop (remote) cluster:

Apr 11,2014 18:27:38.640 GMT T[AWT-EventQueue-0](26) DEBUG FileObjectManager: Trying to resolve file reference 'hdfs://hadoop-master:50070/'
Apr 11,2014 18:27:38.953 GMT T[AWT-EventQueue-0](26) INFO org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Apr 11,2014 18:27:39.078 GMT T[AWT-EventQueue-0](26) DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
Apr 11,2014 18:27:39.094 GMT T[AWT-EventQueue-0](26) DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG Groups: Creating new Groups object
Apr 11,2014 18:27:39.344 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) DEBUG NativeCodeLoader: java.library.path= bunch of stuff
Apr 11,2014 18:27:39.360 GMT T[AWT-EventQueue-0](26) WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Apr 11,2014 18:27:39.375 GMT T[AWT-EventQueue-0](26) DEBUG JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
Apr 11,2014 18:27:39.375 GMT T[AWT-EventQueue-0](26) DEBUG JniBasedUnixGroupsMappingWithFallback: Group
RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Hi Roger,

I wrote the HDFS provider for Commons VFS. I went back and looked at the source and tests, and I don't see anything wrong with what you are doing. I did develop it against Hadoop 1.1.2 at the time, so there might be an issue that is not accounted for with Hadoop 2. It was also not tested with security turned on. Are you using security?

Dave

From: roger.whitc...@actian.com
To: user@hadoop.apache.org
Subject: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Date: Fri, 11 Apr 2014 20:20:06 +0000
[...]
RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Also, make sure that the jars on the classpath actually contain the HDFS file system. I'm looking at 'No FileSystem for scheme: hdfs', which is an indicator of this condition.

Dave

From: dlmar...@hotmail.com
To: user@hadoop.apache.org
Subject: RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Date: Fri, 11 Apr 2014 23:48:48 +0000
[...]
RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
Hi Dave,

Thanks for the responses. I guess I have a small question then: what exact class(es) would it be looking for that it can't find? I have all the .jar files I mentioned earlier on the classpath, and it is loading and executing code in the org.apache.hadoop.fs.FileSystem class (according to the stack trace), so there must be implementing classes somewhere. What .jar file would they be in?

Thanks,
~Roger

From: david marion <dlmar...@hotmail.com>
Sent: Friday, April 11, 2014 4:55 PM
To: user@hadoop.apache.org
Subject: RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
[...]
RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
If memory serves me, it's in the hadoop-hdfs.jar file.

Original message
From: Roger Whitcomb <roger.whitc...@actian.com>
Date: 04/11/2014 8:37 PM (GMT-05:00)
To: user@hadoop.apache.org
Subject: RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?
[...]
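[Editor's hedged sanity check to go with Dave's answer; the class name below is hypothetical. The hdfs scheme is implemented by org.apache.hadoop.hdfs.DistributedFileSystem, which ships in hadoop-hdfs, not hadoop-common. Separately, note that 50070 is conventionally the NameNode web UI port, while hdfs:// URIs need the RPC port from fs.defaultFS, commonly 8020 or 9000.]

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HdfsSchemeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Optional: pin the implementation explicitly; this can help when
            // merged jars clobber the META-INF/services FileSystem registrations.
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            // Fails with "No FileSystem for scheme: hdfs" if hadoop-hdfs is absent:
            FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop-master:8020/"), conf);
            System.out.println("Connected to " + fs.getUri());
            fs.close();
        }
    }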
Resetting dead datanodes list
Hi,

Hadoop 1's NameNode UI displays dead datanodes even after those instances are terminated and are no longer part of the cluster. Is there a way to reset the dead-datanode list without bouncing the NameNode? This would help my nightly script, which parses the HTML page, terminates dead datanodes, and resizes the cluster.

--
Thanks,
Ashwin
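[Editor's hedged sketch, assuming a Hadoop 1 cluster whose hdfs-site.xml points dfs.hosts at an includes file; hostnames and paths below are examples. Removing the terminated hosts from the includes file and refreshing the node lists should make the NameNode drop them without a restart, and the plain-text dfsadmin report is friendlier to a nightly script than scraping the HTML page.]

    # Drop a terminated host from the includes file, then refresh (no restart):
    grep -v 'ip-10-0-0-42.ec2.internal' /etc/hadoop/conf/dfs.hosts > /tmp/dfs.hosts.new
    mv /tmp/dfs.hosts.new /etc/hadoop/conf/dfs.hosts
    hadoop dfsadmin -refreshNodes

    # For the nightly script, parse the plain-text report instead of the UI:
    hadoop dfsadmin -report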