[ https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749344#action_12749344 ]
Sanjay Radia commented on HADOOP-4952:
--------------------------------------

The proposed FileContext API, in addition to being a more convenient interface for application writers, will also simplify the FileSystem layer, freeing that layer from the notions of a default filesystem (i.e. /) and working directory (wd), and from having to deal with default values of FS variables such as default block size (BS), replication factor (RF), umask, etc.

The original design was to allow options 1 and 2 below for config values. A few folks felt that we needed to retain the notion of client-side defaults (option 3 below):
# Specify the desired values as parameters in methods such as create. [as before]
# Use server-side (SS) defaults - different filesystems and their deployments can have their own defaults; this also frees the admin from distributing the default config to all client nodes. [new]
# Use client-side defaults derived from the config. [as before]

The uploaded patch attempts to give all 3 options. After further thinking, I am proposing that we support only options 1 and 2, not 3. Reasons:
* Simpler config.
* Most folks are going to be happy with SS defaults. The admin has set the defaults based on various considerations, including cluster size. Further, in a federated environment each cluster may have different defaults, which a single client-side config cannot accommodate.
* It turns out that supporting client-side defaults (option 3) complicates the implementation and forces the FileSystem layer to deal with default values for BS, RF, etc. The primary reason is that the original design of FileSystem allows *each distinct FileSystem class* to have its own client-side defaults for BS and RF!
** If you look at my patch, you will realize that it is incorrect. When resolving a fully qualified URI, it needs to look up the defaults for that specific filesystem; the patch incorrectly uses the defaults for the default filesystem.
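The resolution rule at stake can be modeled in a few lines. This is an illustrative sketch (Python pseudocode, not Hadoop code; all names and values are made up for the example): a fully qualified URI must be resolved against the defaults of the filesystem it names, and only scheme-less paths may fall back to the client's default filesystem.

```python
# Illustrative model: each filesystem deployment carries its own
# server-side defaults for block size (BS) and replication factor (RF).
SERVER_DEFAULTS = {
    "hdfs://clusterA": {"block_size": 64 * 1024 * 1024, "replication": 3},
    "hdfs://clusterB": {"block_size": 128 * 1024 * 1024, "replication": 2},
}

DEFAULT_FS = "hdfs://clusterA"  # the client's default filesystem ("/")

def defaults_for(path):
    """Correct rule: a fully qualified URI uses the defaults of the
    filesystem it names; only paths without a scheme fall back to the
    default filesystem."""
    for fs in SERVER_DEFAULTS:
        if path.startswith(fs + "/"):
            return SERVER_DEFAULTS[fs]
    return SERVER_DEFAULTS[DEFAULT_FS]  # relative / slash-rooted path

def defaults_for_buggy(path):
    """The mistake described above: the default filesystem's defaults
    are applied even to a URI naming a foreign filesystem."""
    return SERVER_DEFAULTS[DEFAULT_FS]
```

With this model, `defaults_for("hdfs://clusterB/tmp/x")` yields clusterB's replication factor of 2, while the buggy variant silently returns clusterA's value of 3 for the same path.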
It turns out the patch for symbolic links (HDFS-245) makes the same mistake when it follows symbolic links to foreign file systems.
** Yes, this is fixable, but it would require that the FileSystem layer retain the notion of defaults for BS and RF. It is much simpler to let FileContext be the only layer that deals with the context for /, the wd, and the defaults for BS, RF, etc.
* I believe the notion of client-side defaults is a bad idea. SS defaults are sufficient; furthermore, they handle multiple deployments of the same filesystem, which client-side defaults do not.

> Improved file system interface for the application writer.
> -----------------------------------------------------------
>
> Key: HADOOP-4952
> URL: https://issues.apache.org/jira/browse/HADOOP-4952
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.21.0
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: FileContext3.patch, FileContext5.patch, Files.java,
> Files.java, FilesContext1.patch, FilesContext2.patch
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, etc.)
> This Jira proposes that we provide a simpler interface for the application writer and leave the FileSystem interface for the implementer of a filesystem.
> - The FileSystem interface has a confusing set of methods for the application writer.
> - We could make it easier to take advantage of URI file naming.
> ** The current approach is to get a FileSystem instance by supplying the URI and then access that namespace. It is consistent for the FileSystem instance to not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the src or target can be any URI, including a local one.
> ** The proposed scheme (below) simplifies this.
> - The client-side config can be simplified.
> ** The new config() by default uses the default config. Since this is the common usage pattern, one should not need to always pass the config as a parameter when accessing the file system.
> ** It does not handle multiple file systems well. Today a site.xml is derived from a single Hadoop cluster; this does not make sense for multiple Hadoop clusters, which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system.
> *** Block size.
> *** Replication factor.
> *** Scheme-to-class mapping.
> ** It should be possible to take block size and replication factor defaults from the target file system rather than the client-side config. I am not suggesting we disallow setting client-side defaults, but most clients do not care and would find it simpler to take the defaults for their systems from the target file system.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.