[ 
https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HADOOP-4952:
---------------------------------

    Attachment: FilesContext1.patch

The attached FilesContext.java is a class for the proposed new interface for 
the application writer. (I have used the name FilesContext instead of the 
original class name Files, since the class/object is really a context in which 
files are named and created; further, it avoids confusion with the Java class 
File.) The FileSystem class will remain as the interface for file system 
implementations (after some cleanup); we will probably rename it
to AbstractFileSystem or VirtualFileSystem (since it is very similar to the 
Unix VFS).
The FilesContext class is a thin layer that mostly delegates to FileSystem.

There are still a few methods that remain to be implemented (such as copy, 
move, glob).

Special things to note when reviewing the new interface
* See the examples on how to use FilesContext in the javadoc.
* Have added the notion of using server-side defaults via SERVER_DEFAULT
   (essentially its value is -1 and is passed across to the NN, which uses its 
defaults
   for replication factor, block size, etc.)
* Unlike FileSystem, FilesContext provides only 2 methods for create:
## FSDataOutputStream create(Path f, FsPermission permission,
                                     EnumSet<CreateFlag> createFlag)
## FSDataOutputStream create(Path f,
                                     FsPermission permission,
                                     EnumSet<CreateFlag> createFlag,
                                     int bufferSize,
                                     short replication,
                                     long blockSize,
                                     Progressable progress)
** Note that the first one uses the defaults as expected.
In the 2nd one you can still use defaults for individual parameters; e.g.
  create("foo", FSPermissions.default(), EnumSet.of(CREATE), 4096, 
SERVER_DEFAULT, CONFIG_DEFAULT, null)
In this case the buffer size is 4096, the server-side default is used for the 
replication factor, and the
config default for the block size.

* Note that the utility functions are in an inner class called Util.
The reason I have not put them in an independent class is that one needs the
FilesContext for these functions (i.e. to know what / and the wd are, and also 
the replication factor etc. for new files created by the utility methods such 
as copy()).
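To make the mixed-defaults create() example above concrete, here is a minimal, self-contained sketch of how a sentinel such as SERVER_DEFAULT (-1, per the patch) could be resolved. The resolve() helper, the CONFIG_DEFAULT value of -2, and the sample numbers are assumptions for illustration, not code from FilesContext1.patch.

```java
// Hypothetical sketch: resolving sentinel values for create() parameters.
// SERVER_DEFAULT is -1 per the patch description; CONFIG_DEFAULT's value
// and the resolve logic are assumed here for illustration only.
public class DefaultsSketch {
    static final long SERVER_DEFAULT = -1; // substitute the NN's default
    static final long CONFIG_DEFAULT = -2; // substitute the client config default (assumed value)

    // Pick the effective value for one create() parameter.
    static long resolve(long requested, long configValue, long serverValue) {
        if (requested == SERVER_DEFAULT) return serverValue;
        if (requested == CONFIG_DEFAULT) return configValue;
        return requested; // caller supplied an explicit value
    }

    public static void main(String[] args) {
        // Mirrors create("foo", ..., 4096, SERVER_DEFAULT, CONFIG_DEFAULT, null):
        long bufferSize  = resolve(4096, 4096, 8192);                     // explicit 4096
        long replication = resolve(SERVER_DEFAULT, 2, 3);                 // NN default: 3
        long blockSize   = resolve(CONFIG_DEFAULT, 64L << 20, 128L << 20); // config default
        System.out.println(bufferSize + " " + replication + " " + blockSize);
        // prints: 4096 3 67108864
    }
}
```

The point of the sentinel scheme is that a caller of the long-form create() can still defer any individual parameter to either the server or the client config without a combinatorial explosion of overloads.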
   

> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Files.java, Files.java, FilesContext1.patch
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, 
> etc.)
> This Jira proposes that we provide a simpler interface for the application 
> writer and leave the FileSystem interface for the implementer of a 
> filesystem.
> - The FileSystem interface has a confusing set of methods for the application 
> writer.
> - We could make it easier to take advantage of URI file naming.
> ** The current approach is to get a FileSystem instance by supplying the URI and 
> then access that name space. It is consistent for the FileSystem instance to 
> not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the 
> src or target can be any URI, including a local one.
> ** The proposed scheme (below) simplifies this.
> - The client side config can be simplified.
> ** The new config() by default uses the default config. Since this is the common 
> usage pattern, one should not need to always pass the config as a parameter 
> when accessing the file system.
> ** It does not handle multiple file systems too well. Today a site.xml is 
> derived from a single Hadoop cluster. This does not make sense for multiple 
> Hadoop clusters, which may have different defaults.
> ** Further one should need very little to configure the client side:
> *** Default files system.
> *** Block size 
> *** Replication factor
> *** Scheme to class mapping
> ** It should be possible to take block size and replication factor defaults 
> from the target file system, rather than the client side config. I am not 
> suggesting we don't allow setting client side defaults, but most clients do 
> not care and would find it simpler to take the defaults 
> from the target file system.
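The copyFile generalization proposed in the quoted description (any-URI source and target, replacing the special copyFromLocalFile) can be sketched with a scheme-to-filesystem map. Store and MemStore below are hypothetical stand-ins for file system implementations, not Hadoop classes; only the URI-dispatch idea is from the proposal.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of copyFile(srcURI, dstURI): the URI scheme selects the
// file system on each side, so "local vs. HDFS" stops being a special case.
public class CopySketch {
    // Stand-in for a file system implementation (assumed, not Hadoop's API).
    interface Store {
        byte[] read(String path);
        void write(String path, byte[] data);
    }

    static class MemStore implements Store {
        private final Map<String, byte[]> files = new HashMap<>();
        public byte[] read(String path) { return files.get(path); }
        public void write(String path, byte[] data) { files.put(path, data); }
    }

    private final Map<String, Store> byScheme = new HashMap<>();

    void register(String scheme, Store store) { byScheme.put(scheme, store); }

    // Copy between any two registered schemes, including the local one.
    void copyFile(URI src, URI dst) {
        Store s = byScheme.get(src.getScheme());
        Store d = byScheme.get(dst.getScheme());
        d.write(dst.getPath(), s.read(src.getPath()));
    }

    public static void main(String[] args) {
        CopySketch ctx = new CopySketch();
        Store local = new MemStore(), hdfs = new MemStore();
        ctx.register("file", local);
        ctx.register("hdfs", hdfs);
        local.write("/tmp/a.txt", "hello".getBytes());
        ctx.copyFile(URI.create("file:///tmp/a.txt"), URI.create("hdfs://nn/user/a.txt"));
        System.out.println(new String(hdfs.read("/user/a.txt"))); // prints: hello
    }
}
```

A single copyFile over URIs also composes with the server-side defaults idea: new files created on the target side can pick up that file system's block size and replication factor.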

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
