Hi,

We (Jingxin Feng, Xing Lin, and I) have been working on a FileSystem
implementation that allows Hadoop to use an NFSv3 storage server as a
filesystem. It leverages code from the hadoop-nfs project for all the
request/response handling. We would like your help to add it to the
Hadoop tools (similar to hadoop-aws and hadoop-azure).

In more detail, the Hadoop NFS Connector allows Apache Hadoop (2.2+) and
Apache Spark (1.2+) to use an NFSv3 storage server as a storage endpoint.
The NFS Connector can run in two modes: (1) secondary filesystem, where
Hadoop/Spark runs with HDFS as its primary storage and uses NFS as a
second storage endpoint, and (2) primary filesystem, where Hadoop/Spark
runs entirely on an NFSv3 storage server.

The code is written so that existing applications do not have to change.
All one has to do is copy the connector jar into the lib/ directory of
Hadoop/Spark, then modify core-site.xml to provide the necessary details.
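For illustration, the core-site.xml changes would look something like the
sketch below. The property names and class names here are assumptions
following Hadoop's usual fs.<scheme>.impl convention, not the connector's
confirmed configuration; please see the repository for the actual names:

```xml
<configuration>
  <!-- Hypothetical: register the connector for the nfs:// scheme,
       following the fs.<scheme>.impl convention -->
  <property>
    <name>fs.nfs.impl</name>
    <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
  </property>

  <!-- Hypothetical: only needed in primary-filesystem mode,
       where NFS replaces HDFS as the default filesystem -->
  <property>
    <name>fs.defaultFS</name>
    <value>nfs://nfsserver:2049</value>
  </property>
</configuration>
```

In secondary-filesystem mode, fs.defaultFS would stay pointed at HDFS and
NFS paths would be referenced explicitly, e.g.
hadoop fs -ls nfs://nfsserver:2049/ (again, assuming an nfs:// scheme).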

The current version can be seen at:
https://github.com/NetApp/NetApp-Hadoop-NFS-Connector

It is my first time contributing to the Hadoop codebase. It would be great
if someone on the Hadoop team can guide us through this process. I'm
willing to make the necessary changes to integrate the code. What are the
next steps? Should I create a JIRA entry?

Thanks,

Gokul
