shahrs87 edited a comment on issue #509: HBASE-22877 WebHDFS based export snapshot will fail if hfile is in archive directory
URL: https://github.com/apache/hbase/pull/509#issuecomment-523518436

> But when it comes to the httpfs java client WebHdfsFileSystem

I am confused whether you are talking about WebHdfsFileSystem or HttpFSFileSystem. These two are different classes; refer to org.apache.hadoop.fs.http.client.HttpFSFileSystem for the latter. The rest of this comment assumes you mean WebHdfsFileSystem, since you mentioned it multiple times.

> I have debugged and dived into the implementation of WebHdfsFileSystem, and I found: when we call WebHdfsFileSystem.open(), it in fact does nothing except prepare an InputStream to the remote httpfs server; it doesn't establish any connection to the httpfs.

This is not correct. When you call WebHdfsFileSystem#open(), it does create an HTTP connection to the namenode and gets the list of datanodes where the blocks for that file reside. When you call read on the input stream, it goes directly to a datanode. Follow the code path below:

```
WebHdfsFileSystem#open
  --> WebHdfsInputStream(f, bufferSize)
  --> ReadRunner(path, buffersize)
  --> ReadRunner#getRedirectedUrl
  --> AbstractRunner#run()
  --> WebHdfsFileSystem#runWithRetry
  --> ReadRunner#connect
  --> AbstractRunner#connect(URL url)
  --> AbstractRunner#connect(final HttpOpParam.Op op, final URL url)
```

The last call makes the actual connection to the namenode, and if the file doesn't exist, WebHdfsFileSystem (the client) will receive a FileNotFoundException wrapped in a RemoteException. My checked-out Hadoop branch is branch-2.8. Hope this helps.
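As a minimal client-side sketch of the behavior described above (the host, port, and path are placeholders, not from this thread, and running it requires the Hadoop client jars and a reachable namenode): the FileNotFoundException for a missing file surfaces from open()/first read, after the namenode's RemoteException is unwrapped on the client.

```java
import java.io.FileNotFoundException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsOpenDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "namenode.example.com:50070" is a placeholder namenode address.
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://namenode.example.com:50070"), conf);
    try (FSDataInputStream in = fs.open(new Path("/no/such/file"))) {
      // If open() succeeded, read() goes directly to a datanode
      // via the redirected URL obtained from the namenode.
      in.read();
    } catch (FileNotFoundException e) {
      // The namenode's RemoteException is unwrapped into a
      // FileNotFoundException on the WebHdfsFileSystem client.
      System.out.println("File does not exist: " + e.getMessage());
    } finally {
      fs.close();
    }
  }
}
```

Note that this is the generic org.apache.hadoop.fs.FileSystem API; the webhdfs:// scheme is what selects WebHdfsFileSystem under the hood.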