[GitHub] [hbase] shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot will fail if hfile is in archive directory

2019-08-20 Thread GitBox
shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot 
will fail if hfile is in archive directory
URL: https://github.com/apache/hbase/pull/509#issuecomment-523121559
 
 
   > in this case, no exception will be thrown even when the file didn't exist 
when we are calling
   @VicoWu Could you elaborate on why you think it doesn't throw a
FileNotFoundException? In the stack trace you pasted, it did throw
FileNotFoundException, but wrapped in a RemoteException. You just need to
unwrap the RemoteException to see the underlying exception. Maybe I am missing
something; please correct me if I am wrong.
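   Here is a self-contained sketch of the unwrapping step. SimpleRemoteException
and openRemote are hypothetical stand-ins so it runs without Hadoop on the
classpath. Like Hadoop's RemoteException, the stand-in carries only the
server-side class name and message, not the original exception object. With
Hadoop itself, the corresponding call is
org.apache.hadoop.ipc.RemoteException#unwrapRemoteException(FileNotFoundException.class).

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class RemoteExceptionUnwrapSketch {

    /**
     * Hypothetical stand-in for org.apache.hadoop.ipc.RemoteException: it keeps
     * only the server-side exception's class name and message.
     */
    static class SimpleRemoteException extends IOException {
        private final String className;

        SimpleRemoteException(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() {
            return className;
        }

        /**
         * Re-creates the server-side exception if its class is one of
         * lookupTypes; otherwise returns this wrapper unchanged.
         */
        @SafeVarargs
        final IOException unwrap(Class<? extends IOException>... lookupTypes) {
            for (Class<? extends IOException> type : lookupTypes) {
                if (!type.getName().equals(className)) {
                    continue;
                }
                try {
                    IOException ex = type.getConstructor(String.class).newInstance(getMessage());
                    ex.initCause(this);  // keep the wrapper as the cause for debugging
                    return ex;
                } catch (ReflectiveOperationException e) {
                    return this;  // no usable (String) constructor: keep the wrapper
                }
            }
            return this;
        }
    }

    /** Hypothetical client call failing the way the pasted stack trace did. */
    static void openRemote(String path) throws IOException {
        throw new SimpleRemoteException(FileNotFoundException.class.getName(),
                "File does not exist: " + path);
    }

    public static void main(String[] args) {
        try {
            openRemote("/hbase/data/default/t1/region/cf/hfile");
        } catch (SimpleRemoteException re) {
            // Unwrapping recovers the FileNotFoundException hidden inside
            IOException cause = re.unwrap(FileNotFoundException.class);
            System.out.println(cause.getClass().getSimpleName() + ": " + cause.getMessage());
        } catch (IOException e) {
            System.out.println("other I/O error: " + e);
        }
    }
}
```

   The wrapper is returned unchanged when its class name matches none of the
requested types, so callers can list only the exceptions they know how to
handle.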
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hbase] shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot will fail if hfile is in archive directory

2019-08-21 Thread GitBox
shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot 
will fail if hfile is in archive directory
URL: https://github.com/apache/hbase/pull/509#issuecomment-523518436
 
 
   > But when it comes to the httpfs java client WebHdfsFileSystem
   I am not sure whether you mean WebHdfsFileSystem or HttpFSFileSystem.
These two are different; see org.apache.hadoop.fs.http.client.HttpFSFileSystem
for the latter. The rest of this comment assumes you mean WebHdfsFileSystem,
since you mentioned it multiple times.
   >  I have debug and dive into to implements of WebHdfsFileSystem and I 
found: when we call WebHdfsFileSystem.open(), in fact, it does nothing except 
for preparing an InputStream to the remote httpfs server, but it didn't 
establish any connection to the httpfs.
   This is not correct. WebHdfsFileSystem#open() does call the namenode and
gets back a redirect to a datanode that holds the file's blocks. When you then
read from the input stream, the client goes directly to that datanode.
   Follow the code path below:
   WebHdfsFileSystem#open
   --> WebHdfsInputStream(f, bufferSize)
   --> ReadRunner(path, bufferSize)
   --> ReadRunner#getRedirectedUrl
   --> AbstractRunner#run()
   --> WebHdfsFileSystem#runWithRetry
   --> ReadRunner#connect
   --> AbstractRunner#connect(URL url)
   --> AbstractRunner#connect(final HttpOpParam.Op op, final URL url)
   The last call makes the actual connection to the namenode. If the file
doesn't exist, WebHdfsFileSystem (the client) receives a FileNotFoundException
wrapped in a RemoteException.
   I traced this on Hadoop branch-2.8.
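   The disagreement is about when a missing file is reported. The toy model
below is not WebHdfsFileSystem's code: EagerOpenSketch and the in-memory map
standing in for the namenode are hypothetical. It contrasts an eager open(),
which resolves the path before returning (the behaviour described in this
comment), with a lazy open that defers the lookup to the first read().

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

public class EagerOpenSketch {
    // Hypothetical "namenode" view of the namespace: path -> file bytes
    private final Map<String, byte[]> namespace;

    EagerOpenSketch(Map<String, byte[]> namespace) {
        this.namespace = namespace;
    }

    /** Eager: resolves the path before returning, so a missing file fails here. */
    InputStream open(String path) throws IOException {
        byte[] data = namespace.get(path);
        if (data == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return new ByteArrayInputStream(data);  // stands in for the datanode read
    }

    /** Lazy: returns immediately and defers the lookup to the first read(). */
    InputStream openLazily(String path) {
        return new InputStream() {
            private InputStream delegate;  // resolved on first read

            @Override
            public int read() throws IOException {
                if (delegate == null) {
                    delegate = open(path);  // a missing file fails here instead
                }
                return delegate.read();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        EagerOpenSketch fs = new EagerOpenSketch(Map.of("/hbase/data/t1/cf/hfile1", new byte[] {42}));
        String missing = "/hbase/data/t1/cf/hfile2";  // e.g. already moved to the archive directory

        try {
            fs.open(missing);
        } catch (FileNotFoundException e) {
            System.out.println("eager: failed in open(): " + e.getMessage());
        }

        InputStream lazy = fs.openLazily(missing);  // no error yet
        try {
            lazy.read();
        } catch (FileNotFoundException e) {
            System.out.println("lazy: failed in read(): " + e.getMessage());
        }
    }
}
```

   In the eager version a missing file is reported by open() itself; in the
lazy version open() succeeds and the same FileNotFoundException only appears
at the first read().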
   Hope this helps.
   




[GitHub] [hbase] shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot will fail if hfile is in archive directory

2019-08-26 Thread GitBox
shahrs87 commented on issue #509: HBASE-22877 WebHDFS based export snapshot 
will fail if hfile is in archive directory
URL: https://github.com/apache/hbase/pull/509#issuecomment-525039584
 
 
   >  But in hadoop 2.x, this problem do exists;
   
   This bug *doesn't* exist in branch-2.6.0 either.
   Here is the git tree for branch-2.6.0:
https://github.com/apache/hadoop/tree/branch-2.6.0
   Refer to:
   WebHdfsFileSystem#open
   --> OffsetUrlInputStream(UnresolvedUrlOpener o, OffsetUrlOpener r)
   --> ByteRangeInputStream(URLOpener o, URLOpener r)
   --> ByteRangeInputStream#getInputStream()
   --> ByteRangeInputStream#openInputStream()
   --> UnresolvedUrlOpener#connect()
   --> AbstractRunner#run()
   --> AbstractRunner#runWithRetry()
   runWithRetry() sends the request that NamenodeWebHdfsMethods handles on the
server side. For a missing file the server throws FileNotFoundException, and
the client receives it wrapped in a RemoteException.
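   The same exchange can be observed at the HTTP level. Per the WebHDFS REST
API, GET /webhdfs/v1/<PATH>?op=OPEN sent to the namenode is answered with a 307
redirect to a datanode when the file exists, and with an error status (404 for
FileNotFoundException) plus a JSON RemoteException body when it does not. The
JDK-only sketch below makes that visible by not following the redirect.
WebHdfsOpenProbe and probeOpen are hypothetical names; the sketch does no
authentication (no user.name, SPNEGO or delegation tokens) and does not
URL-encode the path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class WebHdfsOpenProbe {

    /** What the namenode answered to op=OPEN. */
    static final class ProbeResult {
        final int status;
        final String location;   // datanode URL when status is 307, else null
        final String errorBody;  // JSON RemoteException body on errors, else null

        ProbeResult(int status, String location, String errorBody) {
            this.status = status;
            this.location = location;
            this.errorBody = errorBody;
        }
    }

    /**
     * Sends GET {namenodeBase}/webhdfs/v1{path}?op=OPEN without following the
     * redirect, so the namenode's own response is visible.
     */
    static ProbeResult probeOpen(String namenodeBase, String path) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(namenodeBase + "/webhdfs/v1" + path + "?op=OPEN").toURL().openConnection();
        conn.setInstanceFollowRedirects(false);  // stop at the namenode's answer
        try {
            int status = conn.getResponseCode();
            if (status == 307) {  // Temporary Redirect to a datanode
                return new ProbeResult(status, conn.getHeaderField("Location"), null);
            }
            // For 4xx/5xx the body is on the error stream; it may be null
            try (InputStream err = conn.getErrorStream()) {
                String body = (err == null) ? null : new String(err.readAllBytes(), StandardCharsets.UTF_8);
                return new ProbeResult(status, null, body);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: WebHdfsOpenProbe <namenode-http-base-url> <absolute-path>");
            return;
        }
        ProbeResult r = probeOpen(args[0], args[1]);
        if (r.location != null) {
            System.out.println("307 -> " + r.location);
        } else {
            System.out.println(r.status + " " + r.errorBody);
        }
    }
}
```

   For example: java WebHdfsOpenProbe http://<namenode-host>:50070 <path>
(50070 is the Hadoop 2.x default namenode HTTP port). For a missing path this
should print the error status followed by the RemoteException JSON.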
   One suggestion: if possible, enable trace logging on the client side. The
debug- and trace-level logs show the communication between client and server
and should help you understand WebHdfsFileSystem better.
   @VicoWu Hope this helps.
   

