[ https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676012#action_12676012 ]

Sanjay Radia commented on HADOOP-4952:
--------------------------------------

I have taken an initial look at the new NIO2 coming in jdk7 (JSR 203 - 
http://openjdk.java.net/projects/nio/).

NIO2 has a notion of multiple file system providers (as Hadoop does) and also has a 
URI-based file system.
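A quick way to see the provider model with the plain JDK (no Hadoop dependency; an "hdfs" provider is hypothetical here and would only appear if a Hadoop JSR 203 provider were on the classpath):

```java
import java.nio.file.FileSystems;
import java.nio.file.spi.FileSystemProvider;

public class ListProviders {
    public static void main(String[] args) {
        // JSR 203 discovers FileSystemProvider implementations via ServiceLoader;
        // a Hadoop provider registered for scheme "hdfs" would show up here.
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
            System.out.println(p.getScheme());
        }
        // The default provider always serves the platform file system.
        System.out.println(FileSystems.getDefault().provider().getScheme()); // "file"
    }
}
```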

Technical considerations for adopting JSR 203:
# Does our file naming fit with that of NIO2?
# Is the NIO2 FileSystem provider interface sufficient for expressing Hadoop? 
(HADOOP-3518 proposes this.)
# Others?

There are non-technical issues such as the timing of NIO2 and jdk7; Alan Bateman, 
the spec lead for JSR 203, has indicated that jdk7 will come out in 2010 and that 
Google is considering backporting NIO2 to jdk6 under a different package name.

In NIO2, the default file system (i.e. slash-relative names like /foo/bar) is the 
JVM's file system.
wd-relative file names are also resolved relative to the JVM's file system.
This makes a lot of sense in Java: Java needs to be consistent with the older 
java.io file APIs; i.e. a slash-relative or wd-relative name means the same thing 
whether you use the NIO2 APIs or the other java.io APIs.
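This JVM-relative resolution can be seen with the stock JDK APIs (local file system only; the paths here are illustrative):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class JvmRelativeNames {
    public static void main(String[] args) {
        // Slash-relative: resolved against the root of the JVM's default file system.
        Path abs = Paths.get("/foo/bar");
        // wd-relative: resolved against the JVM process's working directory.
        Path rel = Paths.get("foo/bar");
        System.out.println(abs.isAbsolute());                         // true
        System.out.println(rel.isAbsolute());                         // false
        System.out.println(rel.toAbsolutePath().endsWith("foo/bar")); // true
    }
}
```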

However, Hadoop has a slightly different notion of the default file system. Hadoop 
file names are resolved as follows:
# Fully qualified URIs are resolved against the specified file system: 
hdfs://nn_adr:port/foo/bar
# file:/// URIs are resolved against the local file system
# Slash-relative names are resolved against the default Hadoop file system. 
Typically in a Hadoop cluster the default file system is the file system of 
the cluster. Applications by default inherit this config; a user can set his 
default file system to any file system.
# wd-relative names are resolved against the Hadoop working directory
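The four cases above can be sketched in plain Java (a simplified model only: the method name resolve, the port 8020, and the directory /user/alice are made up for illustration; the real logic lives in org.apache.hadoop.fs.Path and FileSystem):

```java
import java.net.URI;

public class HadoopNaming {
    // Sketch of Hadoop's four naming cases against a default fs and working dir.
    static URI resolve(String name, URI defaultFs, String workingDir) {
        URI u = URI.create(name);
        if (u.getScheme() != null) {
            return u; // cases 1 and 2: fully qualified hdfs:// or file:/// URIs
        }
        if (name.startsWith("/")) {
            return defaultFs.resolve(name); // case 3: default Hadoop fs
        }
        return defaultFs.resolve(workingDir + "/" + name); // case 4: Hadoop wd
    }

    public static void main(String[] args) {
        URI dfs = URI.create("hdfs://nn_adr:8020/"); // assumed port, for illustration
        System.out.println(resolve("hdfs://nn_adr:8020/foo/bar", dfs, "/user/alice"));
        System.out.println(resolve("file:///tmp/x", dfs, "/user/alice"));
        System.out.println(resolve("/foo/bar", dfs, "/user/alice"));
        System.out.println(resolve("foo/bar", dfs, "/user/alice"));
    }
}
```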



Pathnames of type (3) or (4) are not consistent with NIO2 and the rest of Java I/O.

Consider alternate name conventions like hfs:/foo/bar and hfs:foo/bar for the 
Hadoop default root and Hadoop working directory: IMHO, these would be too 
inconvenient. E.g. at the shell, instead of typing "ls foo/bar" one would have 
to type "ls hfs:foo/bar" to list files relative to the working directory.

I don't see how we can escape this dual notion of root and wd.
In code, the distinction using the JSR 203 APIs will be roughly:
{code}
  Path p1 = Paths.get("/foo/bar");         // relative to the JVM's default fs
  Path p2 = HDFS_provider.get("/foo/bar"); // relative to Hadoop's default fs
  Path q1 = Paths.get("foo/bar");          // relative to the JVM's default fs wd
  Path q2 = HDFS_provider.get("foo/bar");  // relative to Hadoop's wd
{code}

The other thing to evaluate is the file system provider interface. Please 
continue that discussion on HADOOP-3518. If the JSR 203 provider interfaces are 
not sufficient, then we need to work with the JSR 203 expert group to influence 
its APIs.

Also, with respect to timing, we will have to move forward with our own APIs for 
Hadoop 1.0, since JSR 203 will not be finalized by the time Hadoop 1.0 ships.

> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Files.java
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, 
> etc)
> This Jira proposes that we provide a simpler interface for the application 
> writer and leave the FileSystem interface for the implementer of a 
> filesystem.
> - The FileSystem interface has a confusing set of methods for the application 
> writer
> - We could make it easier to take advantage of the URI file naming
> ** Current approach is to get FileSystem instance by supplying the URI and 
> then access that name space. It is consistent for the FileSystem instance to 
> not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the 
> src or target can be generalized to any URI, including a local one.
> ** The proposed scheme (below) simplifies this.
> - The client side config can be simplified. 
> ** New config() by default uses the default config. Since this is the common 
> usage pattern, one should not need to always pass the config as a parameter 
> when accessing the file system.  
> ** It does not handle multiple file systems too well. Today a site.xml is 
> derived from a single Hadoop cluster. This does not make sense for multiple 
> Hadoop clusters which may have different defaults.
> ** Further one should need very little to configure the client side:
> *** Default files system.
> *** Block size 
> *** Replication factor
> *** Scheme to class mapping
> ** It should be possible to take block size and replication factor defaults 
> from the target file system, rather than the client side config. I am not 
> suggesting we don't allow setting client side defaults, but most clients do 
> not care and would find it simpler to take the defaults for their systems 
> from the target file system. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
