[ 
https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660844#action_12660844
 ] 

Doug Cutting commented on HADOOP-4952:
--------------------------------------

It might be nice to make it simpler to open a file given a URI and assuming the 
default configuration, but there are hazards to that as well.  Application code 
often becomes library code, and changing code that does not explicitly pass a 
configuration to start passing a configuration breaks compatibility.  The 
Hadoop API convention is that everything must be explicitly passed a 
configuration, so that we do not rely on static state.  This is just a cost of 
doing business, and it's dangerous to remove it.

So I agree that static utility methods would be nice, since the FileSystem 
implementation can be automatically inferred from the configuration and the 
URI, but I think the Configuration should still be explicitly passed to these 
methods.  If we don't think that applications should ever need to call the 
non-static FileSystem methods, then placing the static methods in a different 
class makes sense, since it consolidates the documentation that users need to 
be aware of.  Does that make sense to you, Tom?
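
To make the idea concrete, here is a minimal, self-contained sketch of a static 
helper that infers the implementation from the URI scheme while still taking 
the configuration as an explicit argument.  It is not Hadoop code: FsResolver 
and implementationFor are hypothetical names, java.util.Properties stands in 
for Configuration, and the keys "fs.default.name" and "fs.<scheme>.impl" are 
assumed here, modelled on Hadoop's configuration naming.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical stand-in for a class of static helpers; not part of Hadoop.
final class FsResolver {
    private FsResolver() {}

    /**
     * Returns the implementation class name for the given URI, using only
     * the configuration passed in by the caller (no static or global state).
     */
    static String implementationFor(Properties conf, URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            // No scheme on the URI: fall back to the configured default file system.
            String defaultFs = conf.getProperty("fs.default.name");
            if (defaultFs == null) {
                throw new IllegalArgumentException("No scheme and no default file system");
            }
            scheme = URI.create(defaultFs).getScheme();
            if (scheme == null) {
                throw new IllegalArgumentException("Default file system has no scheme: " + defaultFs);
            }
        }
        // Scheme-to-class mapping, looked up in the caller's configuration.
        String impl = conf.getProperty("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No file system for scheme: " + scheme);
        }
        return impl;
    }
}
```

Because the configuration is a parameter from the start, code that begins life 
in an application can move into a library without changing its signature.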

Also, create() has too many parameters and is fragile.  We should add a 
FileCreation class that encapsulates these.  We should consider adding such 
classes for any methods that take more than a Configuration and a Path.
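
As a rough illustration of such a parameter object, the sketch below bundles 
the creation options behind chained setters.  Only the FileCreation name comes 
from the comment above; the chosen fields (overwrite, buffer size, replication, 
block size), the method names, and the default values are assumptions made for 
this example, not an agreed design.

```java
// Hypothetical parameter object; field set and defaults are illustrative only.
final class FileCreation {
    private boolean overwrite = false;
    private int bufferSize = 4096;
    private short replication = 3;
    private long blockSize = 64L * 1024 * 1024;

    FileCreation overwrite(boolean value) {
        this.overwrite = value;
        return this;
    }

    FileCreation bufferSize(int bytes) {
        if (bytes <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.bufferSize = bytes;
        return this;
    }

    FileCreation replication(short factor) {
        if (factor <= 0) throw new IllegalArgumentException("replication must be positive");
        this.replication = factor;
        return this;
    }

    FileCreation blockSize(long bytes) {
        if (bytes <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = bytes;
        return this;
    }

    boolean isOverwrite() { return overwrite; }
    int getBufferSize() { return bufferSize; }
    short getReplication() { return replication; }
    long getBlockSize() { return blockSize; }
}
```

A caller would then write something like 
new FileCreation().overwrite(true).replication((short) 2) and pass the result 
to a create method alongside the Configuration and the Path, so adding an 
option later does not require another overload.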


> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Files.java
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, 
> etc.)
> This Jira proposes that we provide a simpler interface for the application 
> writer and leave the FileSystem interface for the implementer of a 
> file system.
> - The FileSystem interface has a confusing set of methods for the application 
> writer
> - We could make it easier to take advantage of the URI file naming
> ** The current approach is to get a FileSystem instance by supplying the URI 
> and then access that namespace. It is consistent for the FileSystem instance 
> to not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized to a copyFile where the 
> src or target can be any URI, including a local one.
> ** The proposed scheme (below) simplifies this.
> - The client-side config can be simplified.
> ** New config() by default uses the default config. Since this is the common 
> usage pattern, one should not need to always pass the config as a parameter 
> when accessing the file system.
> ** The current config does not handle multiple file systems too well. Today a 
> site.xml is derived from a single Hadoop cluster. This does not make sense 
> for multiple Hadoop clusters, which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system
> *** Block size
> *** Replication factor
> *** Scheme-to-class mapping
> ** It should be possible to take block size and replication factor defaults 
> from the target file system, rather than the client-side config.  I am not 
> suggesting we don't allow setting client-side defaults, but most clients do 
> not care and would find it simpler to take the defaults for their systems 
> from the target file system.
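
The last point in the description, taking defaults from the target file system 
unless the client sets its own, might be sketched as follows.  This is a 
stand-alone illustration under stated assumptions: TargetDefaults, 
CreateSettings, and resolve are hypothetical names, and how a client would 
actually obtain the target's defaults is left out.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

// Hypothetical holder for the defaults advertised by the target file system.
final class TargetDefaults {
    final long blockSize;
    final int replication;

    TargetDefaults(long blockSize, int replication) {
        this.blockSize = blockSize;
        this.replication = replication;
    }
}

// Hypothetical result of merging client-side settings with target defaults.
final class CreateSettings {
    final long blockSize;
    final int replication;

    private CreateSettings(long blockSize, int replication) {
        this.blockSize = blockSize;
        this.replication = replication;
    }

    /** A client-side value wins when present; otherwise the target's default applies. */
    static CreateSettings resolve(OptionalLong clientBlockSize,
                                  OptionalInt clientReplication,
                                  TargetDefaults target) {
        return new CreateSettings(
                clientBlockSize.orElse(target.blockSize),
                clientReplication.orElse(target.replication));
    }
}
```

With this shape, a client talking to two clusters with different defaults gets 
the right values for each without carrying a per-cluster site.xml.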
