[ 
https://issues.apache.org/jira/browse/HDFS-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729933#comment-14729933
 ] 

Chris Douglas commented on HDFS-9020:
-------------------------------------

Sorry, I didn't mean to imply an endorsement.

I share your reservations about caching clients. The cache wasn't intended to 
address leaks, but to associate separate {{POST}} requests into a session. The 
goal is to keep the overhead at the NN from being significantly worse than if 
the client had instantiated a DFSClient instance. After being redirected, a 
client using {{WebHdfsFileSystem}} shouldn't create more (or longer-lived) 
clients than the existing code, provided the stream is properly closed.

bq. If a stream is orphaned, the NN should eventually recover the lease but the 
cached client will keep the lease alive. So now you must have some additional 
mechanism for timing out open streams and closing them.
Good point. If the WebHDFS client were to send a zero-length append (i.e., 
{{POST}} w/ the session cookie) on the stream to keep the cached DFSClient 
alive, we could use the absence of those requests to time out clients that 
disappear without closing the stream. Combined with the shutdown hook, is that 
sufficient to catch most of the cases we'd also cover in DFSClient?
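
A minimal sketch of the timeout idea above, assuming each session is identified 
by the cookie value and that every {{POST}} carrying the cookie (including a 
zero-length append) counts as a keepalive. The class, its methods, and the 
injected clock are hypothetical and are not part of WebHDFS or DFSClient; 
closing the cached client for an expired session is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch: tracks the last keepalive time per write session. */
class SessionTimeoutTracker {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;
    private final LongSupplier clock; // injected so the timeout can be tested

    SessionTimeoutTracker(long idleTimeoutMillis, LongSupplier clock) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.clock = clock;
    }

    /** Record any POST carrying the session cookie, including zero-length appends. */
    void touch(String sessionId) {
        lastSeenMillis.put(sessionId, clock.getAsLong());
    }

    /** Forget a session whose stream was closed normally. */
    void close(String sessionId) {
        lastSeenMillis.remove(sessionId);
    }

    /** Remove and return sessions that have been idle longer than the timeout. */
    List<String> expireIdle() {
        long now = clock.getAsLong();
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            // The two-argument remove succeeds only if no keepalive raced with us.
            if (now - e.getValue() > idleTimeoutMillis
                    && lastSeenMillis.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

A periodic task on the DN could call {{expireIdle()}} and close the cached 
client for each returned session, so an orphaned stream stops renewing its 
lease and the NN can recover it.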

bq. If the intention is to abort the cached dfsclient on another node, the 
client won't know the lease is gone until it tries to add or complete a block - 
but an idle stream isn't going to do that.
Yes, that's the intent. If the client gets an error, it may retry and be 
redirected to another DN. Unfortunately, the old DFSClient won't see any new 
writes unless the old WebHDFS client is redirected back to it. Is there a way to 
compel the client to verify its lease on an idle stream? The existing {{POST}} 
isn't idempotent, and it'd be a significant change to make it so. We could try 
to resync at the WebHDFS client, if we could guarantee that the old DFSClient 
were closed and it had flushed the stream, but this should probably be a 
separate issue.

> Support hflush/hsync in WebHDFS
> -------------------------------
>
>                 Key: HDFS-9020
>                 URL: https://issues.apache.org/jira/browse/HDFS-9020
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Chris Douglas
>         Attachments: HDFS-9020-alt.txt
>
>
> In the current implementation, hflush/hsync have no effect on WebHDFS 
> streams, particularly w.r.t. visibility to other clients. This proposes to 
> extend the protocol and implementation to enable this functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
