[ https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjay Radia updated HADOOP-4952:
---------------------------------
Attachment: FilesContext1.patch

The attached FilesContext.java is a class for the proposed new interface for the application writer. (I have used the name FilesContext instead of the original class name Files, since the class/object is really a context in which files are named and created; further, it avoids confusion with the Java class File.) The FileSystem class will remain as the interface for those implementing file systems (after some cleanup); we will probably rename it to AbstractFileSystem or VirtualFileSystem (since it is very similar to the Unix VFS). The FilesContext class is a thin layer that mostly delegates to FileSystem. A few methods remain to be implemented (such as copy, move, glob).

Special things to note when reviewing the new interface:
* See the examples in the javadoc on how to use FilesContext.
* I have added the notion of server-side defaults using SERVER_DEFAULT (essentially its value is -1, and it is passed across to the NN, which uses its own defaults for the replication factor, block size, etc.).
* Unlike FileSystem, FilesContext provides only 2 methods for create:
## FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> createFlag)
## FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> createFlag, int bufferSize, short replication, long blockSize, Progressable progress)
** Note that the first one uses the defaults as expected. With the second, one can still use defaults for individual parameters; e.g. create("foo", FsPermission.getDefault(), EnumSet.of(CREATE), 4096, SERVER_DEFAULT, CONFIG_DEFAULT, null). In this case the buffer size is 4096, the server-side default is used for the replication factor, and the config default for the block size.
* Note that the utility functions are in an inner class called Util.
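As an illustration of the SERVER_DEFAULT behavior described above, here is a minimal, self-contained sketch of the sentinel resolution; FilesContext is a proposed API, so the class name, the NN default values, and the resolve methods below are hypothetical stand-ins rather than code from the patch.

```java
// Minimal sketch (hypothetical names): how a -1 sentinel such as
// SERVER_DEFAULT lets the server (NN) substitute its own defaults.
public class ServerDefaultSketch {
    // Sentinel meaning "let the target file system choose".
    static final short SERVER_DEFAULT_REPLICATION = -1;
    static final long SERVER_DEFAULT_BLOCK_SIZE = -1L;

    // Defaults as the NameNode might hold them (illustrative values).
    static final short NN_REPLICATION = 3;
    static final long NN_BLOCK_SIZE = 64L * 1024 * 1024;

    // The server side resolves the sentinel to its own configured default;
    // any explicit value from the client is passed through unchanged.
    static short resolveReplication(short requested) {
        return requested == SERVER_DEFAULT_REPLICATION ? NN_REPLICATION : requested;
    }

    static long resolveBlockSize(long requested) {
        return requested == SERVER_DEFAULT_BLOCK_SIZE ? NN_BLOCK_SIZE : requested;
    }

    public static void main(String[] args) {
        // Caller passes SERVER_DEFAULT for replication, an explicit block size.
        System.out.println(resolveReplication(SERVER_DEFAULT_REPLICATION)); // 3
        System.out.println(resolveBlockSize(128L * 1024 * 1024));           // 134217728
    }
}
```

The point of the sentinel is that the client never needs to know the server's defaults; an explicit value always wins, and -1 defers the choice to the target file system.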
The reason I have not put the utility functions in an independent class is that one needs the FilesContext for these functions (i.e., to know what / and the working directory are, and also the replication factor etc. for new files created by utility methods such as copy()).

> Improved file system interface for the application writer.
> -----------------------------------------------------------
>
> Key: HADOOP-4952
> URL: https://issues.apache.org/jira/browse/HADOOP-4952
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.21.0
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: Files.java, Files.java, FilesContext1.patch
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, etc.)
> This Jira proposes that we provide a simpler interface for the application writer and leave the FileSystem interface for the implementer of a file system.
> - The FileSystem interface has a confusing set of methods for the application writer.
> - We could make it easier to take advantage of URI file naming:
> ** The current approach is to get a FileSystem instance by supplying the URI and then access that namespace. It is consistent for the FileSystem instance to not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the src or target can be any URI, including the local one.
> ** The proposed scheme (below) simplifies this.
> - The client side config can be simplified:
> ** The new config() by default uses the default config. Since this is the common usage pattern, one should not need to always pass the config as a parameter when accessing the file system.
> ** It does not handle multiple file systems too well. Today a site.xml is derived from a single Hadoop cluster.
> This does not make sense for multiple Hadoop clusters, which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system
> *** Block size
> *** Replication factor
> *** Scheme-to-class mapping
> ** It should be possible to take block size and replication factor defaults from the target file system, rather than the client-side config. I am not suggesting we don't allow setting client-side defaults, but most clients do not care and would find it simpler to take the defaults for their systems from the target file system.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
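The copyFile generalization quoted above relies on each endpoint carrying its scheme in the URI, so that src and target can each resolve to a different implementing file system. A small sketch of that per-endpoint scheme dispatch, assuming nothing beyond java.net.URI (the class and method names are illustrative, not part of the patch):

```java
import java.net.URI;

// Hypothetical sketch: a generalized copyFile(src, dst) would select the
// implementing file system for each endpoint from its URI scheme, falling
// back to the default file system when no scheme is given.
public class UriDispatchSketch {
    static String schemeOf(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null ? "default" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("hdfs://nn1:8020/user/sanjay/foo")); // hdfs
        System.out.println(schemeOf("file:///tmp/bar"));                 // file
        System.out.println(schemeOf("/path/on/default/fs"));             // default
    }
}
```

With src and target dispatched independently, copyFromLocalFile becomes just copyFile with a file: URI on one side, and the same call handles cluster-to-cluster copies.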