Oliver Caballero Alvarez created HADOOP-19199:
-------------------------------------------------

             Summary: Include FileStatus when opening a file from FileSystem
                 Key: HADOOP-19199
                 URL: https://issues.apache.org/jira/browse/HADOOP-19199
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
    Affects Versions: 3.4.0
            Reporter: Oliver Caballero Alvarez


The FileSystem abstract class gives callers no way to use a FileStatus they 
already hold when opening that file. As a result, implementations of the open 
method that need that information have to request the FileStatus of the same 
file again, which is an unnecessary request.

A clear example is HadoopInputFile in parquet-hadoop 1.14.0, the latest release 
at the time of writing:

https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java

Although HadoopInputFile has to fetch the file's FileStatus when it is created, 
it passes only the path when opening the file, because that is all the 
FileSystem API accepts. The FileSystem implementation is then likely, in its 
open method, to check again that the file exists or what its attributes are, 
repeating the same request to collect the FileStatus.
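
To make the cost concrete, here is a small self-contained Java model rather 
than real Hadoop or Parquet code. SimpleFileStatus, CountingStore and 
InputFileSketch are hypothetical names: CountingStore.open(String) plays the 
role of FileSystem.open(Path) in an implementation that re-checks the file, and 
InputFileSketch loosely follows the wrapper pattern described above. Assuming 
each getFileStatus call is one remote request, opening a single file costs two 
lookups even though the wrapper already had the answer after the first.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Hadoop's FileStatus.
final class SimpleFileStatus {
    final String path;
    final long length;

    SimpleFileStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }
}

// Hypothetical store that counts metadata lookups; in an object store each
// lookup would typically be a separate remote request.
class CountingStore {
    private final Map<String, byte[]> files = new HashMap<>();
    int statusCalls = 0;

    void put(String path, byte[] data) {
        files.put(path, data);
    }

    SimpleFileStatus getFileStatus(String path) throws IOException {
        statusCalls++;
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new SimpleFileStatus(path, data.length);
    }

    // Mirrors the shape of open(Path): only a path arrives, so the store
    // looks the file up again before returning a stream.
    InputStream open(String path) throws IOException {
        SimpleFileStatus status = getFileStatus(path);
        return new ByteArrayInputStream(files.get(status.path));
    }
}

// Hypothetical wrapper following the pattern described above: it already
// holds the status, but can only pass the path on to open().
class InputFileSketch {
    private final CountingStore store;
    private final SimpleFileStatus status;

    InputFileSketch(CountingStore store, String path) throws IOException {
        this.store = store;
        this.status = store.getFileStatus(path); // lookup #1
    }

    InputStream newStream() throws IOException {
        return store.open(status.path); // lookup #2, inside open()
    }

    // Creates one wrapper, opens it once, and reports how many lookups ran.
    static int lookupsForOneOpen() {
        try {
            CountingStore store = new CountingStore();
            store.put("/data/part-0", new byte[] {1, 2, 3});
            InputFileSketch file = new InputFileSketch(store, "/data/part-0");
            try (InputStream in = file.newStream()) {
                in.read();
            }
            return store.statusCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model InputFileSketch.lookupsForOneOpen() returns 2: one lookup in the 
constructor and a second one hidden inside open().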

This could be solved simply by taking the current FileSystem class from the 
3.4.0 release:

[https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java]

and adding the following method:

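  // Proposed overload: open a file from a FileStatus the caller already has.
  // By default it delegates to open(Path, int) with the same buffer-size
  // setting that open(Path) uses; implementations may override it.
  public FSDataInputStream open(FileStatus f) throws IOException {
    return this.open(f.getPath(),
        this.getConf().getInt("io.file.buffer.size", 4096));
  }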

Because the default implementation simply delegates to open(Path, int), this 
change is backward compatible with all existing FileSystem implementations, 
while any implementation could override it to reuse the FileStatus when that 
information is already known.
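
A minimal sketch of how the proposed overload would behave, again using a 
hypothetical in-memory model instead of the real FileSystem class. StatusInfo, 
ModelFileSystem, LegacyFileSystem, StatusAwareFileSystem and OpenByStatusDemo 
are illustrative names, and the bufferSize field stands in for the 
io.file.buffer.size configuration lookup. The default open(StatusInfo) 
delegates to the path-based method, so an implementation that never overrides 
it keeps its current behavior, while one that overrides it can skip the second 
lookup.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for FileStatus.
final class StatusInfo {
    final String path;
    final long length;

    StatusInfo(String path, long length) {
        this.path = path;
        this.length = length;
    }
}

// Hypothetical, simplified model of the FileSystem base class.
abstract class ModelFileSystem {
    // Stands in for getConf().getInt("io.file.buffer.size", 4096).
    protected int bufferSize = 4096;
    final Map<String, byte[]> files = new HashMap<>();
    int statusCalls = 0; // counts metadata lookups

    void put(String path, byte[] data) {
        files.put(path, data);
    }

    StatusInfo getFileStatus(String path) throws IOException {
        statusCalls++;
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new StatusInfo(path, data.length);
    }

    abstract InputStream open(String path, int bufferSize) throws IOException;

    // Proposed overload: the default just delegates to the path-based open,
    // so subclasses that never override it behave exactly as before.
    InputStream open(StatusInfo status) throws IOException {
        return open(status.path, bufferSize);
    }
}

// An existing implementation that knows nothing about the new overload;
// it looks the file up again before returning a stream.
class LegacyFileSystem extends ModelFileSystem {
    @Override
    InputStream open(String path, int bufferSize) throws IOException {
        StatusInfo status = getFileStatus(path);
        return new BufferedInputStream(
                new ByteArrayInputStream(files.get(status.path)), bufferSize);
    }
}

// An implementation that opts in: it uses the caller's status directly and
// skips the second lookup.
class StatusAwareFileSystem extends LegacyFileSystem {
    @Override
    InputStream open(StatusInfo status) throws IOException {
        byte[] data = files.get(status.path);
        if (data == null) {
            throw new FileNotFoundException(status.path);
        }
        return new BufferedInputStream(new ByteArrayInputStream(data), bufferSize);
    }
}

class OpenByStatusDemo {
    // Fetches a status once, opens the file through the overload, and
    // returns how many metadata lookups were made in total.
    static int lookupsWhenOpening(ModelFileSystem fs) {
        try {
            fs.put("/data/part-0", new byte[] {1, 2, 3});
            StatusInfo status = fs.getFileStatus("/data/part-0");
            try (InputStream in = fs.open(status)) {
                in.read();
            }
            return fs.statusCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model OpenByStatusDemo.lookupsWhenOpening(new LegacyFileSystem()) 
returns 2, while the same call with new StatusAwareFileSystem() returns 1. A 
trade-off of overriding is that the implementation relies on the caller's 
status, which may be stale by the time the file is opened.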

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
