[jira] Commented: (HADOOP-4952) Improved files system interface for the application writer.

Sanjay Radia (JIRA) Tue, 06 Jan 2009 17:10:07 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661397#action_12661397
 ]


Sanjay Radia commented on HADOOP-4952:
--------------------------------------

Doug> Application code often becomes library code, and changing code that does 
not explicitly pass a configuration to start passing a configuration breaks 
compatibility.

I guess I am missing this library argument and am having a hard time buying the 
argument "the rest of the Hadoop APIs do this and so we need to do this here". 
Most systems do not have such a config parameter in their APIs, yet they support 
both application code and library code. What is so different about Hadoop and 
its libraries? Some examples of the libraries you have in mind would help.
In most file systems, the config state is provided by the underlying OS; it 
includes the root and the working dir. Libraries need to set these if the 
default is not good enough.

Most file system interfaces are dead simple. I think Hadoop's API can be made 
equally simple and at the same time allow complex libraries to be built 
relatively easily. I thought exportConfig/importConfig facilitates that.

Your use-case is where a piece of application code suddenly turns into library 
code. 
The application code would have used the default config; when it turns into a 
library, the line of code in your proposal that says "config = new config()" 
would have to change to "config = CallersConfigArg". 
In my proposal the library writer will have to add a line 
"Files.importConfig(configArg)". 
Either way one line of code has to change: setting the default config or 
passing in a different config argument.
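
To make the comparison concrete, here is a minimal sketch of the two styles. 
Everything in it is a hypothetical stand-in: SketchConfig is not Hadoop's 
Configuration class, FilesSketch is not the attached Files.java, and the 
process-wide static "current config" is only an assumption about how 
importConfig/exportConfig might behave.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-style configuration object.
final class SketchConfig {
    private final Map<String, String> values = new HashMap<>();

    SketchConfig set(String key, String value) {
        values.put(key, value);
        return this;
    }

    String get(String key, String fallback) {
        return values.getOrDefault(key, fallback);
    }
}

// Hypothetical stand-in for the proposed Files class: it holds a current
// config that starts as the default and can be swapped by a library.
final class FilesSketch {
    private static SketchConfig current = new SketchConfig();

    static void importConfig(SketchConfig config) { current = config; }

    static SketchConfig exportConfig() { return current; }

    static String defaultFileSystem() {
        return current.get("fs.default.name", "file:///");
    }
}

// Style 1: the config is an explicit argument to the library.
final class ExplicitConfigLibrary {
    static String defaultFileSystem(SketchConfig callersConfig) {
        // As application code this line was: SketchConfig config = new SketchConfig();
        SketchConfig config = callersConfig;
        return config.get("fs.default.name", "file:///");
    }
}

// Style 2: the library imports the caller's config once, then uses the
// config-free calls.
final class ImportConfigLibrary {
    static String defaultFileSystem(SketchConfig configArg) {
        FilesSketch.importConfig(configArg); // the one added line
        return FilesSketch.defaultFileSystem();
    }
}

final class ConfigStylesDemo {
    public static void main(String[] args) {
        SketchConfig callers = new SketchConfig().set("fs.default.name", "hdfs://cluster-a/");
        System.out.println(ExplicitConfigLibrary.defaultFileSystem(callers));
        System.out.println(ImportConfigLibrary.defaultFileSystem(callers));
    }
}
```

In this sketch both libraries see the caller's setting after a one-line 
change. Note that the static importConfig here replaces the config for every 
caller in the process; whether the real proposal scopes it per thread or per 
process is not stated in this comment.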

I think there is more to the underlying environment than just the config; 
hence libraries have to be more careful. Owen, Arun and I discussed the Task 
Tracker and whether or not it can do any work on behalf of the job (either 
some pre-work, such as loading the cache, or the task itself). The task tracker 
has to take the credentials and the class path from the job submitter's 
environment. For example, when the task tracker acts on behalf of the job 
submitter, it has to do the equivalent of JAAS's callerSubject.doAs().
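
As a rough illustration of the JAAS call mentioned above, the sketch below 
runs a piece of work under a submitter's Subject using the standard static 
method javax.security.auth.Subject.doAs(Subject, PrivilegedAction). The 
classes UserPrincipalSketch and TaskTrackerSketch and the user name are 
hypothetical, and this is not how the Task Tracker is actually implemented. 
Reading the Subject back inside the action is left out because the API for 
that differs across JDK versions, and newer JDKs deprecate doAs itself.

```java
import java.security.Principal;
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

// Hypothetical principal carrying just a user name.
final class UserPrincipalSketch implements Principal {
    private final String name;

    UserPrincipalSketch(String name) { this.name = name; }

    @Override
    public String getName() { return name; }
}

// Hypothetical stand-in for a daemon that performs work for a job submitter.
final class TaskTrackerSketch {
    // Builds a Subject that represents the submitting user.
    static Subject subjectFor(String user) {
        Subject subject = new Subject();
        subject.getPrincipals().add(new UserPrincipalSketch(user));
        return subject;
    }

    // Runs the work with the submitter's Subject associated with the call,
    // rather than under the daemon's own identity.
    static <T> T runOnBehalfOf(Subject submitter, PrivilegedAction<T> work) {
        return Subject.doAs(submitter, work);
    }
}

final class DoAsDemo {
    public static void main(String[] args) {
        Subject submitter = TaskTrackerSketch.subjectFor("alice");
        String result = TaskTrackerSketch.runOnBehalfOf(submitter, () -> "cache loaded");
        System.out.println(result);
    }
}
```

The config is only one of the things the caller's environment supplies here; 
the credentials travel with the Subject, and the class path would need 
similar handling that this sketch does not attempt.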


> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Files.java
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, 
> etc)
> This Jira proposes that we provide a simpler interface for the application 
> writer and leave the FileSystem interface for the implementer of a 
> filesystem.
> - The FileSystem interface has a confusing set of methods for the application 
> writer
> - We could make it easier to take advantage of the URI file naming
> ** Current approach is to get FileSystem instance by supplying the URI and 
> then access that name space. It is consistent for the FileSystem instance to 
> not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the 
> src or target can be generalized to any URI, including the local one.
> ** The proposed scheme (below) simplifies this.
> - The client side config can be simplified. 
> ** New config() by default uses the default config. Since this is the common 
> usage pattern, one should not need to always pass the config as a parameter 
> when accessing the file system.
> ** It does not handle multiple file systems too well. Today a site.xml is 
> derived from a single Hadoop cluster. This does not make sense for multiple 
> Hadoop clusters which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system
> *** Block size
> *** Replication factor
> *** Scheme to class mapping
> ** It should be possible to take block size and replication factor defaults 
> from the target file system, rather than the client side config. I am not 
> suggesting we don't allow setting client side defaults, but most clients do 
> not care and would find it simpler to take the defaults for their systems 
> from the target file system.
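
The generalized copyFile and the scheme mapping described in the issue text 
above can be sketched as follows. This is only an illustration under stated 
assumptions: MemoryFsSketch and UriFilesSketch are hypothetical names, the 
"file systems" are in-memory maps, the registry maps a scheme to an instance 
rather than to a class, and the URI authority is ignored.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "file system": maps a path to its contents.
final class MemoryFsSketch {
    private final Map<String, String> files = new HashMap<>();

    void write(String path, String data) { files.put(path, data); }

    String read(String path) {
        String data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return data;
    }
}

// Hypothetical application-facing facade: every call takes a full URI and
// the facade picks the file system from the URI scheme.
final class UriFilesSketch {
    private final Map<String, MemoryFsSketch> byScheme = new HashMap<>();

    void register(String scheme, MemoryFsSketch fs) { byScheme.put(scheme, fs); }

    private MemoryFsSketch resolve(URI uri) {
        MemoryFsSketch fs = byScheme.get(uri.getScheme());
        if (fs == null) {
            throw new IllegalArgumentException("no file system for scheme: " + uri.getScheme());
        }
        return fs;
    }

    String read(URI uri) { return resolve(uri).read(uri.getPath()); }

    // One copy method for any pair of schemes, local or remote.
    void copyFile(URI src, URI dst) {
        resolve(dst).write(dst.getPath(), resolve(src).read(src.getPath()));
    }
}

final class CopyFileDemo {
    public static void main(String[] args) {
        MemoryFsSketch local = new MemoryFsSketch();
        local.write("/tmp/in.txt", "hello");

        UriFilesSketch files = new UriFilesSketch();
        files.register("file", local);
        files.register("hdfs", new MemoryFsSketch());

        files.copyFile(URI.create("file:///tmp/in.txt"), URI.create("hdfs://nn1/data/out.txt"));
        System.out.println(files.read(URI.create("hdfs://nn1/data/out.txt")));
    }
}
```

With this shape the caller never asks for a FileSystem instance first, so a 
copy from the local file system needs no special-case method.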

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
