Re: [Fwd: GSOC proposal: Implementing the FileSystem SPI of JSR203 for HDFS]

Johan Liesén Thu, 12 Mar 2009 15:52:23 -0700

> Johan
>    The project is a great  idea.
>     I can help mentor this project.
>     I am cc'ing Alan Bateman the spec lead for JSR 203.

Thanks for showing interest. I, too, am very excited about this idea!

> What timeframe did you have in mind?

The general answer is that I need to stick with the dates set by
Google, which run from the end of May until September. But I do have a time
plan if the project is accepted by Google. In broad strokes, I would
like to approach the problem by first creating a test suite like the
existing one for HDFS, but for the NIO.2 API, and then move towards
getting all tests to run. This would be ideal in my view. A rough
estimate for this would be some two months. But this is just my idea
of how to proceed.

Since the deadline for organizations to apply is tomorrow, we'd better
move fast. I have attached a draft of my application. I will provide a
résumé as well but need time to brush it up. Comments on the
application are very welcome.

Here are the dates for this year's GSoC:
http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline

> Alan B will be doing a presentation of JSR203 at Javaone this June. It would
> be great if we could have some sort of
> a prototype to validate JSR203 APIs against Hadoop by that time.

We could definitely work together and isolate a set of components that
would be big enough for a JavaOne demo. I might also have some extra
time to spare before the GSOC begins.

>
> sanjay
>
> On Mar 9, 2009, at 9:20 PM, Doug Cutting wrote:
>
> Do you want to help mentor this guy in a GSoC project?
>
> Doug
>
> -------- Original Message --------
> Subject: GSOC proposal: Implementing the FileSystem SPI of JSR203 for HDFS
> Date: Tue, 10 Mar 2009 02:46:00 +0100
> From: Johan Liesén <[email protected]>
> Reply-To: [email protected],   [email protected]
> To: [email protected]
>
> There has been some discussion (HADOOP-4952, HADOOP-3518) on how the
> HDFS file system can be exposed via the SPI in the new (new) NIO.2
> specification, JSR 203.
>
> I'd like to pitch the idea of doing that as a Google Summer of Code
> (now accepting applications) project. What do you think about this?
> Are there any developers that would want to mentor this effort?
>
> I am interested in doing this project (as a student) if the community
> thinks that it is a good idea.
>
>
> Some links: http://code.google.com/soc/,
> https://issues.apache.org/jira/browse/HADOOP-3518,
> https://issues.apache.org/jira/browse/HADOOP-4952,
> http://jcp.org/en/jsr/detail?id=203 and a presentation of JSR203 on
> the tube: http://www.youtube.com/watch?v=yNRS1ssLPdQ.
>
>

STUDENT APPLICATION -- GOOGLE SUMMER OF CODE 2009


IMPLEMENTING THE FILE SYSTEM SPI DEFINED IN JSR 203 FOR HDFS

Mentor organisation: Apache (Hadoop)
Student: Johan Liesén, Sweden (GMT +1)


Background
----------

The new NIO proposal for Java defined in JSR 203 provides new APIs and SPIs for 
interacting with file systems. This project sets out to implement JSR 203's SPI 
for HDFS, thus making the experience of working with HDFS and file systems in 
general more coherent. The project will be beneficial for developers wanting to 
switch from an existing storage environment (supported by the SPI) to HDFS and 
will 

As previous discussions on this subject [1, 2] have noted, there isn't a 
one-to-one mapping between the SPI and the file system interface used in HDFS 
today. The goal of this project is to provide an alternative, usable entry 
point for interacting with HDFS from code. It should match the JSR 203 API as 
closely as possible, although some modifications might be necessary.
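
To illustrate the "alternative entry point": in the java.nio.file API as it 
later shipped in Java 7 (the 2009 draft may differ), client code reaches a 
file system through a provider registered for a URI scheme. A minimal sketch 
follows, assuming an HDFS provider would register the scheme "hdfs"; no such 
provider is defined here, and ProviderLookup and findProvider are hypothetical 
names used only for illustration.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.Optional;

class ProviderLookup {
    // Returns the installed provider registered for the given URI scheme, if any.
    static Optional<FileSystemProvider> findProvider(String scheme) {
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
            if (p.getScheme().equalsIgnoreCase(scheme)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The default "file" provider ships with the JDK.
        System.out.println("file: " + findProvider("file").isPresent());
        // Assumption: an HDFS provider would use the scheme "hdfs". It is only
        // found if such a provider is installed on the class path.
        System.out.println("hdfs: " + findProvider("hdfs").isPresent());
    }
}
```

Once a provider for a scheme is installed, existing NIO.2 client code could 
address HDFS by URI without using the HDFS-specific interface directly.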


About the student
-----------------

I'm Johan, a student in Sweden at Chalmers (http://www.chalmers.se) where I 
study for an MSc (graduating this fall) in CS. I got my hands dirty with a 
distributed file system when doing an internship at Google last summer. After 
that I had to find an open source alternative: Hadoop was the obvious choice. I 
have experience setting up and using Hadoop via the streaming library and 
Python and I am eager to dive into the distributed file system sea.

I'm also a big fan of programming languages, both in theory and in practice, 
with experience in a wide range of them; from Haskell to JavaScript. However, 
Java is the language I'm most comfortable with, thus applying to Hadoop was an 
easy choice.

All of my projects are open source and can be found on 
http://www.itstud.chalmers.se/~liesen and http://github.com/liesen.

During the time span of this project I hope to get a deep understanding of how 
HDFS is designed and how it actually works, and, of course, make a good 
contribution to the open source world. I have no other obligations during the 
summer except for practicing my skateboarding skills.

If there are any questions, please don't hesitate to talk to me. I'm on IRC 
(liesen) and mail.


Road map
--------

This is the schedule I have in mind. It can of course be modified. (Estimates 
in parentheses.)

* (1w) Set up the initial project layout and a local HDFS environment
* (1w) Determine the semantics of the operations defined in JSR 203 and how 
       they map to the current HDFS API (will probably be ongoing)
* (2w) Transfer the specification to the HDFS test suite
* (3w) Create a new test suite by adapting the current one to use the new SPI
* (4w) Implement a wrapper layer and expose the current file system via the new
       SPI
* (2w) Have the test suite pass
* (1w) Create a proof of concept application--the HDFS command line tool--
       that uses the new functionality

Total: 14 weeks

When done, there should be code that works the same way as the current HDFS 
implementation and conforms to the JSR 203 specification. In addition, the 
test suite has to pass and the example program(s) should work without fuss.

I'm open to suggestions and ideas: just contact me.
