yuqingguan created HDFS-15132:
---------------------------------

             Summary: WebHdfsFileSystem frequently reopening the target file causes the slow downloading speed
                 Key: HDFS-15132
                 URL: https://issues.apache.org/jira/browse/HDFS-15132
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.1.1
            Reporter: yuqingguan


We are using WebHDFS to access files.
After the Hive jar was upgraded from 1.2.1 to 3.1.3, we found that downloading
the same ORC file took longer than before.

The problem is that, inside FSInputStream#read(long, byte[], int, int),
FSInputStream#seek triggers WebHdfsFileSystem#closeInputStream, which closes the
input stream and also sets the RunnerState to SEEK. As a result, reading the
whole file requires opening it many times.
With Hive 1, by contrast, this FSInputStream#read overload was not called, so
the input stream was not closed and the RunnerState stayed OPEN once the file
had been opened.
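To make the reported mechanism concrete, here is a minimal, self-contained Java simulation. It is not Hadoop code: SimulatedWebHdfsStream, sequentialOpens and positionalOpens are hypothetical names, and the simulated stream simply assumes what the report describes, namely that seeking to a different offset drops the connection (standing in for closeInputStream and the SEEK state) and that the next read has to reopen it. The positional read is modeled on the seek/read/seek-back shape of FSInputStream's default read(long, byte[], int, int), simplified (no argument validation or EOFException handling).

```java
/**
 * Hypothetical simulation (not Hadoop code) of the reopen pattern in HDFS-15132.
 * The stream "opens" a connection lazily on read and "closes" it whenever
 * seek() moves to a different offset.
 */
public class ReopenDemo {

    static class SimulatedWebHdfsStream {
        private final byte[] data;
        private long pos = 0;
        private boolean open = false; // stands in for RunnerState.OPEN
        int opens = 0;                 // number of simulated (re)opens

        SimulatedWebHdfsStream(byte[] data) { this.data = data; }

        long getPos() { return pos; }

        // Assumption: moving to a new offset drops the connection,
        // standing in for closeInputStream(RunnerState.SEEK).
        void seek(long newPos) {
            if (newPos != pos) {
                open = false;
                pos = newPos;
            }
        }

        // Stateful read: reopens only if the connection was dropped.
        int read(byte[] buf, int off, int len) {
            if (pos >= data.length) return -1;
            if (!open) { open = true; opens++; }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }

        // Positional read: seek to position, read, then seek back to the old offset.
        synchronized int read(long position, byte[] buf, int off, int len) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buf, off, len);
            } finally {
                seek(oldPos); // moving back drops the connection again
            }
        }
    }

    // Reads the whole file with the stateful read; returns the number of opens.
    static int sequentialOpens(byte[] file, int chunk) {
        SimulatedWebHdfsStream in = new SimulatedWebHdfsStream(file);
        byte[] buf = new byte[chunk];
        while (in.read(buf, 0, chunk) != -1) { }
        return in.opens;
    }

    // Reads the whole file with positional reads; returns the number of opens.
    static int positionalOpens(byte[] file, int chunk) {
        SimulatedWebHdfsStream in = new SimulatedWebHdfsStream(file);
        byte[] buf = new byte[chunk];
        long offset = 0;
        int n;
        while ((n = in.read(offset, buf, 0, chunk)) != -1) {
            offset += n;
        }
        return in.opens;
    }

    public static void main(String[] args) {
        byte[] file = new byte[1024];
        System.out.println("sequential opens: " + sequentialOpens(file, 128));
        System.out.println("positional opens: " + positionalOpens(file, 128));
    }
}
```

Reading a 1024-byte buffer in 128-byte chunks, main prints `sequential opens: 1` and `positional opens: 8`: each positional read seeks away from the saved offset and then back, so every call pays for a fresh open, which matches the slowdown described in the report.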

I also found a related issue: https://issues.apache.org/jira/browse/HDFS-8797



--
This message was sent by Atlassian Jira
(v8.3.4#803005)