Setting HDFS directory time programmatically
Hi - Is it possible to set the access time of an HDFS directory programmatically? I’m using 0.20.204.0. I need to do this in unit tests, where my clean-up program removes files/dirs whose access time is too far in the past.

I can setTimes on the test files without any problem, but not on the directories. The directories created automatically when I create the test files have a date (per getAccessTime) of 1969/12/31 16:00, and I can’t control that date, which makes my unit testing impossible. By the way, setTimes doesn’t allow setting the date on dirs, yet getAccessTime is happy to return a date for them, which is inconsistent, IMHO. Finally, on our production systems, I’m seeing appropriate dates for both files and directories.

Any insight appreciated. Thanks! Frank
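For reference, a minimal sketch of the mismatch, assuming a FileSystem obtained from a test cluster (the paths here are hypothetical). In 0.20.x, setTimes reportedly applies only to files, while getAccessTime on a directory returns 0, which formats as 1969/12/31 16:00 in a US Pacific timezone, i.e. the Unix epoch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // In a real test, take fs from cluster.getFileSystem() instead.
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/test/data.txt"); // hypothetical test file
    Path dir = new Path("/test");           // its parent directory

    long old = System.currentTimeMillis() - 48L * 3600 * 1000; // two days ago
    fs.setTimes(file, old, old); // fine on a file: (path, mtime, atime)

    FileStatus dirStatus = fs.getFileStatus(dir);
    System.out.println(dirStatus.getAccessTime()); // 0 == the 1969/12/31 16:00 date
    // fs.setTimes(dir, old, old); // rejected on directories in 0.20.x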
Question about accessing another HDFS
Hi - We have two namenodes set up at our company, say:

    hdfs://A.mycompany.com
    hdfs://B.mycompany.com

From the command line, I can do:

    hadoop fs -ls hdfs://A.mycompany.com//some-dir

and

    hadoop fs -ls hdfs://B.mycompany.com//some-other-dir

I’m now trying to do the same from a Java program that uses the HDFS API. No luck there. I get an exception: “Wrong FS”. Any idea what I’m missing in my Java program?

Thanks, Frank
Re: Question about accessing another HDFS
Can you show your code here? What URL protocol are you using?

-- Jay Vyas MMSB/UCHC

I guess I’m being very naïve (and relatively new to HDFS). I can’t show too much code, but basically, I’d like to do:

    Path myPath = new Path("hdfs://A.mycompany.com//some-dir");

where Path is a Hadoop fs Path. I think I can take it from there, if that worked... Did you mean that I need to address the namenode with an http:// address?

Thanks! Frank

On Thu, Dec 8, 2011 at 5:47 PM, Tom Melendez t...@supertom.com wrote:
> I'm hoping there is a better answer, but I'm thinking you could load
> another configuration file (with B.company in it) using Configuration,
> grab a FileSystem obj with that and then go forward. Seems like some
> unnecessary overhead though.
>
> Thanks, Tom
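For what it’s worth, a sketch of the usual fix, assuming the “Wrong FS” comes from handing an hdfs://A... path to a FileSystem bound to the default fs.default.name: ask the Path itself for its FileSystem (or pass an explicit URI to FileSystem.get), so the handle is bound to cluster A regardless of the default configuration:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    Path myPath = new Path("hdfs://A.mycompany.com/some-dir");
    FileSystem fsA = myPath.getFileSystem(conf); // bound to cluster A
    // equivalently: FileSystem.get(URI.create("hdfs://A.mycompany.com"), conf)
    for (FileStatus s : fsA.listStatus(myPath)) {
        System.out.println(s.getPath());
    }

The same program can hold a second FileSystem for B.mycompany.com at the same time; the two handles are independent.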
Hadoop startup error
Hi - I’m seeing the following exception while trying to start MiniDFSCluster in some environment. The stack trace is:

    Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
    java.lang.NullPointerException
        at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:413)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:274)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:122)
        at com.yahoo.ads.ngdstone.tpbdm.BDMTestCase.oneTimeSetUp(BDMTestCase.java:69)

My code is:

    System.setProperty("hadoop.log.dir", "/tmp");
    System.setProperty("dfs.permissions.supergroup", "su");
    conf = new Configuration();
    dfsCluster = new MiniDFSCluster(conf, 1, true, null); // <== line 69 in BDMTestCase.java
    fs = dfsCluster.getFileSystem();
    ...

Line 413 in MiniDFSCluster is:

    String ipAddr = dn.getSelfAddr().getAddress().getHostAddress();

So I’m guessing it has something to do with the network setup? Just above that, I see in MiniDFSCluster:

    conf.set("dfs.datanode.address", "127.0.0.1:0");
    conf.set("dfs.datanode.http.address", "127.0.0.1:0");
    conf.set("dfs.datanode.ipc.address", "127.0.0.1:0");

Do I need to have those ports available/enabled in my environment? Is that the problem here?

Thanks! Frank
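One hedged guess, since dn.getSelfAddr() is what comes back null on that line: the datanode never fully came up because the test host’s loopback/hostname resolution is broken, which is a common cause of MiniDFSCluster failing in restricted environments. A quick pre-flight check to run in the test before starting the cluster; if either lookup throws, the resolver or /etc/hosts is the problem rather than the ports themselves:

    import java.net.InetAddress;

    // Both of these should succeed on a machine where MiniDFSCluster can start.
    System.out.println(InetAddress.getByName("localhost"));
    System.out.println(InetAddress.getLocalHost().getCanonicalHostName());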
Question about superuser and permissions
Hi - I’m writing unit tests that programmatically start a name node and populate HDFS directories, but I want to simulate the situation where I don’t have read access to some HDFS directory (which happens on the real grid I eventually deploy to). I’ve tried to chown and chmod, but it seems to have no effect, and my unit tests happily read the directory I don’t want them to be able to read.

Looking at the permissions documentation, it seems that because my unit test program started the name node, it is automatically the superuser. I tried setting dfs.permissions.supergroup to some other group, but that didn’t work either. Is there any way I could have the unit test think it’s not the user who started the name node?

Thanks, Frank
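One approach that should work on the 0.20.20x security branch is to run the reading code under a fabricated user, so the namenode’s permission checks actually apply. A sketch, assuming the createUserForTesting/doAs APIs; the "nobody"/"nogroup" names and the path are made up:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    final Configuration conf = new Configuration();
    UserGroupInformation ugi =
        UserGroupInformation.createUserForTesting("nobody", new String[] {"nogroup"});
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            // FileSystem.get inside doAs yields a handle owned by "nobody",
            // so reading a 700 directory owned by the superuser should fail.
            FileSystem fs = FileSystem.get(conf);
            fs.open(new Path("/restricted/secret.txt")); // hypothetical path
            return null;
        }
    });

Also make sure dfs.permissions is not set to false in the test Configuration, or the checks are skipped entirely.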
Debugging mapper
Hi - I’m using IntelliJ and the WordCount example in Hadoop (which uses MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint directly in the map function of the mapper? I’ve tried, but so far the debugger does not stop at the breakpoint. Thanks! Frank
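A hedged note on why this can happen: with MiniMRCluster the map tasks run in forked child JVMs, so a breakpoint set in the test’s own JVM is never hit. Forcing the local runner keeps the whole job in-process, where IDE breakpoints do work (property names as in 0.20.x):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.set("mapred.job.tracker", "local"); // run map/reduce tasks in this JVM
    conf.set("fs.default.name", "file:///"); // local filesystem instead of HDFS

Alternatively, one can set mapred.child.java.opts to a -agentlib:jdwp=... string and attach IntelliJ’s remote debugger to the child JVM.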
Error with logging in (my) unit tests
Hi - I’m working with Maven inside IntelliJ, using Hadoop 0.20.203.0, and I get the following error message when trying to run my own unit tests, which use MiniDFSCluster. I copied how to use MiniDFSCluster from the Hadoop unit tests. The message:

    log4j:ERROR Could not instantiate class [org.apache.hadoop.log.metrics.EventCounter].
    java.lang.ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:169)
        at org.apache.log4j.helpers.Loader.loadClass(Loader.java:198)
        at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:326)
        at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:123)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:752)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:914)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:139)

Thanks! Frank
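One hedged reading of this trace: log4j is parsing a log4j.properties (likely one bundled in a Hadoop jar on the test classpath) that names org.apache.hadoop.log.metrics.EventCounter, a class that doesn’t exist under that package in the 0.20.203.0 jar, so the jars and the properties file are mismatched versions. One way out is to put your own minimal log4j.properties first on the test classpath, for example:

    # minimal test log4j.properties; deliberately does not reference EventCounter
    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n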
Turn off all Hadoop logs?
Is it possible to turn off all the Hadoop logs simultaneously? In my unit tests, I don’t want to see the myriad “INFO” logs spewed out by various Hadoop components. I’m using:

    ((Log4JLogger) DataNode.LOG).getLogger().setLevel(Level.OFF);
    ((Log4JLogger) LeaseManager.LOG).getLogger().setLevel(Level.OFF);
    ((Log4JLogger) FSNamesystem.LOG).getLogger().setLevel(Level.OFF);
    ((Log4JLogger) DFSClient.LOG).getLogger().setLevel(Level.OFF);
    ((Log4JLogger) Storage.LOG).getLogger().setLevel(Level.OFF);

But I’m still missing some loggers...

Frank
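A sketch of a blanket switch, using plain log4j 1.x calls: silence the root logger and every logger that already exists, instead of enumerating Hadoop’s classes one by one. Run this before the cluster starts:

    import java.util.Enumeration;
    import org.apache.log4j.Level;
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;

    Logger.getRootLogger().setLevel(Level.OFF);
    Enumeration<?> loggers = LogManager.getCurrentLoggers();
    while (loggers.hasMoreElements()) {
        ((Logger) loggers.nextElement()).setLevel(Level.OFF); // every live logger
    }

Loggers created later inherit the root level, so OFF on the root catches most of the stragglers.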
Hadoop in process?
Hi - Is there a way I can start HDFS (the namenode) from a Java main and run unit tests against it? I need to integrate my Java/HDFS program into unit tests, and the unit test machine might not have Hadoop installed. I’m currently running the unit tests by hand with hadoop jar ...

My unit tests create a bunch of (small) files in HDFS and manipulate them. I use the fs API for that. I don’t have map/reduce jobs (yet!).

Thanks! Frank
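Yes: MiniDFSCluster (the class used in Hadoop’s own tests, and in the startup question earlier in this thread) runs a namenode and datanodes inside the test JVM, so the test machine only needs the Hadoop jars on the classpath, not an installed cluster. A minimal sketch, assuming the 0.20.x constructor (conf, numDataNodes, format, racks); the file path is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
    try {
        FileSystem fs = cluster.getFileSystem();
        Path p = new Path("/test/hello.txt"); // hypothetical test file
        fs.create(p).close();
        System.out.println(fs.exists(p));     // true
    } finally {
        cluster.shutdown();                   // tear the in-process HDFS down
    }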