Resending the query with a different subject. Was: FileSystem.listStatus() doesn't return list of files in HDFS directory

I have a single-node Hadoop cluster. The Hadoop version:

[patn...@ac4-dev-ims-211]~/dev/hadoop/hadoop-0.19.1% hadoop version
Hadoop 0.19.1
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 745977
Compiled by ndaley on Fri Feb 20 00:16:34 UTC 2009

Following is my hadoop-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

I have created some directories under my account, and they show up correctly using the "hadoop fs" shell command:

[patn...@ac4-dev-ims-211]~/dev/hadoop/hadoop-0.19.1% hadoop fs -ls /user/patnala/tmp/allocation
Found 2 items
drwxr-xr-x   - patnala supergroup   0 2009-04-20 21:58 /user/patnala/tmp/allocation/1
drwxr-xr-x   - patnala supergroup   0 2009-04-20 21:58 /user/patnala/tmp/allocation/2

I am trying to retrieve the same information in Java through the org.apache.hadoop.fs package. Following is my code:

Configuration conf = new Configuration();
conf.addResource(new Path(hadoopConfigFile1)); // hadoopConfigFile1 is hadoop-default.xml
conf.addResource(new Path(hadoopConfigFile1)); // hadoopConfigFile1 is hadoop-site.xml
FileSystem fs = FileSystem.get(conf);
FileStatus[] listFiles = fs.listStatus(path);
logger.debug("Obtained directory contents for ap store url, size - " + listFiles.length);

The output is below:

DEBUG [main] (Configuration.java:176) - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:176)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:164)
    at com.yahoo.morocco.systems.optimization.client.planstore.GridPlanConsumer.main(GridPlanConsumer.java:94)
DEBUG [main] (GridPlanConsumer.java:38) - Allocating GridPlanConsumer with config and allocation plan root URL - /user/patnala/tmp/allocation
DEBUG [main] (UnixUserGroupInformation.java:276) - Unix Login: patnala,dev,yahoodev,devmorocco,morocco-ims-dev
INFO [main] (GridPlanConsumer.java:101) - Successfully allocationed plan consumer object
DEBUG [main] (GridPlanConsumer.java:54) - Obtained directory contents for ap store url, size - 0

The "size - 0" means it couldn't retrieve any files or directories under /user/patnala/tmp/allocation/. I tried other paths as well, and it seems to be treating the path as a local directory.
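One thing worth checking: in the snippet above, both addResource() calls pass hadoopConfigFile1, even though the second comment says hadoop-site.xml. If that is also true in the real code, hadoop-site.xml is never loaded and fs.default.name stays at its built-in default of file:///, which would explain why paths get resolved against the local filesystem. A quick way to verify is to print the resolved value after loading the resources. A minimal sketch, assuming the config files live under the usual conf/ directory (adjust the paths to your installation):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed locations; point these at the files your program actually loads.
        conf.addResource(new Path("/home/patnala/dev/hadoop/hadoop-0.19.1/conf/hadoop-default.xml"));
        conf.addResource(new Path("/home/patnala/dev/hadoop/hadoop-0.19.1/conf/hadoop-site.xml"));

        // If hadoop-site.xml was picked up, this should print hdfs://localhost:9000.
        // If it prints file:///, the site file was never loaded, and FileSystem.get(conf)
        // will silently hand back a LocalFileSystem.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
    }
}
```

If the printed value is file:///, fixing the second addResource() call to load hadoop-site.xml should make FileSystem.get(conf) return an HDFS client.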
If I try specifying the complete HDFS path, I get a different exception:

INFO [main] (GridPlanConsumer.java:101) - Successfully allocationed plan consumer object
java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52)
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:280)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:723)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:748)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:457)
    at com.yahoo.morocco.systems.optimization.client.planstore.GridPlanConsumer.getPlans(GridPlanConsumer.java:52)
    at com.yahoo.morocco.systems.optimization.client.planstore.GridPlanConsumer.main(GridPlanConsumer.java:103)
ERROR [main] (GridPlanConsumer.java:106) - Program exited with exception: Wrong FS: hdfs://localhost:9000/, expected: file:///

My question is: how do I traverse directories and list files stored in HDFS in Java?

Thanks,
Praveen.
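The "Wrong FS: hdfs://localhost:9000/, expected: file:///" message confirms that FileSystem.get(conf) returned a local filesystem client, which then rejected the hdfs:// path. One way to sidestep the config-loading question entirely is the FileSystem.get(URI, Configuration) overload, which binds explicitly to the namenode regardless of what fs.default.name resolved to. A minimal sketch, assuming a running namenode at localhost:9000 (the URI and directory path are taken from the setup above):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bind explicitly to HDFS instead of relying on fs.default.name.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        FileStatus[] statuses = fs.listStatus(new Path("/user/patnala/tmp/allocation"));
        for (FileStatus status : statuses) {
            // With the directory layout shown earlier, this should list the /1 and /2
            // subdirectories that 'hadoop fs -ls' reports.
            System.out.println((status.isDir() ? "d " : "- ") + status.getPath());
        }
        fs.close();
    }
}
```

For subdirectories, calling listStatus() recursively on each FileStatus whose isDir() is true walks the whole tree.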