> Johan
> The project is a great idea.
> I can help mentor this project.
> I am cc'ing Alan Bateman the spec lead for JSR 203.
Thanks for showing interest. I, too, am very excited about this idea!

> What timeframe did you have in mind?

The general answer is that I need to stick with the dates set by Google, which run from the end of May until September. But I do have a time plan if the project is accepted by Google. In broad strokes, I would like to approach the problem by first creating a test suite like the existing one for HDFS, but for the NIO.2 API, and then move towards getting all tests to run. This would be ideal in my view. A rough estimate for this would be some two months. But this is just my idea of how to proceed.

Since the deadline for organizations to apply is tomorrow, we'd better move fast. I have attached a draft of my application. I will provide a résumé as well but need time to brush it up. Comments on the application are very welcome.

Here are the dates for this year's GSoC: http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline

> Alan B will be doing a presentation of JSR203 at Javaone this June. It would
> be great if we could have some sort of
> a prototype to validate JSR203 APIs against Hadoop by that time.

We could definitely work together and isolate a set of components that would be big enough for a JavaOne demo. I might also have some extra time to spare before the GSoC begins.

>
> sanjay
>
> On Mar 9, 2009, at 9:20 PM, Doug Cutting wrote:
>
> Do you want to help mentor this guy in a GSoC project?
>
> Doug
>
> -------- Original Message --------
> Subject: GSOC proposal: Implementing the FileSystem SPI of JSR203 for HDFS
> Date: Tue, 10 Mar 2009 02:46:00 +0100
> From: Johan Liesén <[email protected]>
> Reply-To: [email protected], [email protected]
> To: [email protected]
>
> There has been some discussion (HADOOP-4952, HADOOP-3518) on how the
> HDFS file system can be exposed via the SPI in the new (new) NIO.2
> specification, JSR 203.
>
> I'd like to pitch the idea of doing that as a Google Summer of Code
> (now accepting applications) project. What do you think about this?
> Are there any developers that would want to mentor this effort?
>
> I am interested in doing this project (as a student) if the community
> thinks that it is a good idea.
>
> Some links: http://code.google.com/soc/,
> https://issues.apache.org/jira/browse/HADOOP-3518,
> https://issues.apache.org/jira/browse/HADOOP-4952,
> http://jcp.org/en/jsr/detail?id=203 and a presentation of JSR203 on
> the tube: http://www.youtube.com/watch?v=yNRS1ssLPdQ.
STUDENT APPLICATION -- GOOGLE SUMMER OF CODE 2009

IMPLEMENTING THE FILE SYSTEM SPI DEFINED IN JSR 203 FOR HDFS

Mentor organisation: Apache (Hadoop)
Student: Johan Liesén, Sweden (GMT +1)

Background
----------

The new NIO proposal for Java defined in JSR 203 provides new APIs and SPIs for interacting with file systems. This project sets out to implement JSR 203's SPI for HDFS, thus making the experience of working with HDFS and file systems in general more coherent. The project will be beneficial for developers wanting to switch from an existing storage environment (supported by the SPI) to HDFS and will

As previous discussions on this subject [1, 2] have mentioned, there isn't a one-to-one mapping between the SPI and the current file system interface used in HDFS today. The goal of this project is to provide an alternative entry point when interacting with HDFS code-wise. This should match the JSR 203 API as closely as possible--some modifications might be necessary--and be usable.

About the student
-----------------

I'm Johan, a student in Sweden at Chalmers (http://www.chalmers.se) where I study for an MSc (graduating this fall) in CS. I got my hands dirty with a distributed file system when doing an internship at Google last summer. After that I had to find an open source alternative: Hadoop was the obvious choice. I have experience setting up and using Hadoop via the streaming library and Python and I am eager to dive into the distributed file system sea.

I'm also a big fan of programming languages, both in theory and in practice, with experience in a wide range of them, from Haskell to JavaScript. However, Java is the language I'm most comfortable with, so applying to Hadoop was an easy choice. All of my projects are open source and can be found on http://www.itstud.chalmers.se/~liesen and http://github.com/liesen.

During the time span of this project I hope to get a deep understanding of how HDFS is designed and how it actually works, and, of course, make a good contribution to the open source world. I have no other obligations during the summer except for practicing my skateboarding skills.

If there are any questions, please don't hesitate to talk to me. I'm on IRC (liesen) and mail.

Road map
--------

This is the schedule I have in mind. It's of course modifiable. (Estimates in parentheses.)

* (1w) Set up the initial project layout and a local HDFS environment
* (1w) Determine the semantics of the operations defined in JSR 203 and how they map to the current HDFS API (will probably be ongoing)
* (2w) Transfer the specification to the HDFS test suite
* (3w) Adapt a new test suite by having the current one use the new SPI
* (4w) Implement a wrapper layer and expose the current file system via the new SPI
* (2w) Have the test suite pass
* (1w) Create a proof of concept application--the HDFS command line tool--that uses the new functionality

Total: 13 weeks

When done, there should be code that works the same way as the current HDFS implementation and conforms to the JSR 203 specification. In addition, the test suite has to pass and the example program(s) should work without fuss.

I'm open to suggestions and ideas: just contact me.
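
To make the goal a bit more concrete, here is a minimal sketch of the kind of client code the SPI is meant to enable. It assumes the java.nio.file package in which JSR 203 later shipped (Java 7), and it runs against the default local provider so that it works without a cluster. The class name Nio2Sketch and the helper roundTrip are made up for illustration, and the hdfs:// URI mentioned in the comment is hypothetical: no HDFS provider is assumed to exist.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: a hypothetical client written purely against the JSR 203 API.
final class Nio2Sketch {

    /** Writes text to dir/name and reads it back, using only java.nio.file calls. */
    public static String roundTrip(Path dir, String name, String text) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The default provider handles the "file" scheme. With an HDFS provider
        // installed (an assumption, not something that exists here), the file
        // system would instead be looked up from a URI such as
        // hdfs://namenode:8020/ and the rest of the code would stay the same.
        FileSystem fs = FileSystems.getDefault();
        System.out.println("scheme: " + fs.provider().getScheme());

        Path dir = Files.createTempDirectory("nio2-sketch");
        System.out.println(roundTrip(dir, "hello.txt", "hello"));
    }
}
```

The point of the sketch is that roundTrip never names a concrete file system: it only sees a Path, so the provider behind that Path decides where the bytes end up.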
