Avery,

They aren't 'API changes'. HDFS just has a new set of APIs in hadoop-0.23 (aka the FileContext APIs). Both the old (FileSystem) APIs and the new ones are supported in hadoop-0.23.
We have used the new HDFS APIs in YARN in some places.

hth,
Arun

On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:

> Thank you for the response, that's what I thought as well =). I spent the
> day trying to port the required 0.23 APIs to 0.20 HDFS. There have been a
> lot of API changes!
>
> Avery
>
> On 12/5/11 9:14 PM, Mahadev Konar wrote:
>> Avery,
>> Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
>> wrong, but looking at the HDFS APIs it doesn't look like it would
>> be a lot of work to get it working with the 0.20 APIs. We had been
>> using the FileContext APIs initially but have transitioned back to the
>> old APIs.
>>
>> Hope that helps.
>>
>> mahadev
>>
>> On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching<ach...@apache.org> wrote:
>>> Hi,
>>>
>>> I've been playing with 0.23.0, really nice stuff! I was able to set up a
>>> small test cluster (40 nodes) and launch the example jobs. I was also able
>>> to recompile old Hadoop programs with the new jars and start up those
>>> programs as well. My question is the following:
>>>
>>> We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
>>> This appears to be a bit of work. Launching the jobs gives me the
>>> following error:
>>>
>>> 2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
>>> 2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
>>> 2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
>>> 2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
>>> 2011-12-05 15:48:05,133 INFO mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
>>> Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>>>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
>>>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
>>>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
>>>     at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
>>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
>>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
>>>     at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
>>>     at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>>>     at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
>>>
>>> After doing a little digging, it appears that YarnClientProtocolProvider
>>> creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
>>> not available in older versions of HDFS.
>>>
>>> What versions of HDFS are currently supported, and what HDFS versions are
>>> planned for support? It would be great to be able to run YARN on legacy
>>> HDFS installations.
>>>
>>> Thanks,
>>>
>>> Avery
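
For anyone hitting the same ClassNotFoundException, a quick way to see which of the two HDFS API families is on the client classpath is to probe for the classes by name. The sketch below is only an illustration under a few assumptions: HdfsApiProbe and isPresent are hypothetical names, not part of Hadoop, and the only Hadoop-specific pieces are the three class names (org.apache.hadoop.fs.Hdfs comes from the trace above; FileContext and FileSystem are the entry points of the new and old APIs). It uses plain reflection, so it compiles and runs without any Hadoop jars, in which case every probe simply reports "missing".

```java
// Hypothetical helper, not part of Hadoop: reports whether the classes behind
// the old (FileSystem) and new (FileContext) HDFS APIs can be loaded.
class HdfsApiProbe {

    // Returns true if the named class is loadable from this class's loader.
    // The class is not initialized, so no static initializers are run.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, HdfsApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.hadoop.fs.FileSystem",  // old API entry point
            "org.apache.hadoop.fs.FileContext", // new API entry point
            "org.apache.hadoop.fs.Hdfs"         // class named in the exception above
        };
        for (String name : candidates) {
            System.out.println(name + ": " + (isPresent(name) ? "present" : "missing"));
        }
    }
}
```

Run it with the same classpath that the failing job uses. If org.apache.hadoop.fs.Hdfs shows up as missing, the client is likely picking up HDFS jars that predate the FileContext APIs.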