yuqingguan created HDFS-15132:
---------------------------------

             Summary: WebHdfsFileSystem frequently reopening the target file causes the slow downloading speed
                 Key: HDFS-15132
                 URL: https://issues.apache.org/jira/browse/HDFS-15132
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.1.1
            Reporter: yuqingguan

We access files through WebHDFS. After the Hive jar was upgraded from 1.2.1 to 3.1.3, we found that downloading the same ORC file took more time than before.

The problem is in the FSInputStream#read(long, byte[], int, int) method: it calls FSInputStream#seek, which triggers WebHdfsFileSystem#closeInputStream. That call closes the input stream and sets the RunnerState to SEEK. As a result, reading the whole file requires opening it many times.

With Hive 1, this FSInputStream#read method was not called, so the input stream was not closed and the RunnerState stayed OPEN once the file had been opened.

I also found a related issue: https://issues.apache.org/jira/browse/HDFS-8797
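
To make the effect easier to see, here is a minimal, self-contained sketch of the pattern described above. It is not the actual Hadoop code: ToyRemoteStream, ReopenSketch, and the opens counter are hypothetical names used only for illustration, and the toy seek simply assumes that any change of position drops the connection. It compares plain sequential reads against positioned reads done in the seek / read / seek-back style.

```java
public class ReopenSketch {

    /** Toy stand-in for a remote input stream. Hypothetical; not WebHdfsFileSystem. */
    static class ToyRemoteStream {
        private final byte[] data;
        private long pos = 0;
        private boolean open = false;
        int opens = 0; // how many times the "connection" was (re)opened

        ToyRemoteStream(byte[] data) {
            this.data = data;
        }

        long getPos() {
            return pos;
        }

        // Assumption modelled from the report: moving the position closes the stream.
        void seek(long newPos) {
            if (newPos != pos) {
                pos = newPos;
                open = false;
            }
        }

        // Sequential read; reopens the connection first if it is closed.
        int read(byte[] buf, int off, int len) {
            if (pos >= data.length) {
                return -1;
            }
            if (!open) {
                open = true;
                opens++;
            }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }

        // Positioned read: seek to the target, read, then seek back to the old position.
        int read(long position, byte[] buf, int off, int len) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buf, off, len);
            } finally {
                seek(oldPos);
            }
        }
    }

    public static int opensWithSequentialReads(int fileSize, int chunk) {
        ToyRemoteStream in = new ToyRemoteStream(new byte[fileSize]);
        byte[] buf = new byte[chunk];
        while (in.read(buf, 0, chunk) > 0) {
            // keep reading until end of file
        }
        return in.opens;
    }

    public static int opensWithPositionedReads(int fileSize, int chunk) {
        ToyRemoteStream in = new ToyRemoteStream(new byte[fileSize]);
        byte[] buf = new byte[chunk];
        for (long p = 0; p < fileSize; p += chunk) {
            in.read(p, buf, 0, chunk);
        }
        return in.opens;
    }

    public static void main(String[] args) {
        System.out.println("sequential opens: " + opensWithSequentialReads(1024, 128)); // 1
        System.out.println("positioned opens: " + opensWithPositionedReads(1024, 128)); // 8
    }
}
```

In this toy model a 1024-byte file read in 128-byte chunks is opened once with sequential reads but eight times with positioned reads, one open per call. Against a real WebHDFS endpoint each reopen would presumably be a new HTTP request, which would be consistent with the slowdown we see.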