[ https://issues.apache.org/jira/browse/HADOOP-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171543#comment-16171543 ]
Steve Loughran commented on HADOOP-14872:
-----------------------------------------

# This is one of those APIs which {{StreamCapabilities.hasCapability()}} should be useful to query; as it is, it's a method of limited value (HBase logs a lot too when a method isn't supported).
# And it'd have been nice if the topic of making this a public API was actually accompanied by some discussion and updates to the FS spec (HADOOP-12805).

I think we should look at why {{FSDataInputStream#unbuffer}} throws an error if it's not supported. Why should it be treated as anything other than a hint to "stop using so many resources"? It's not a disaster if the client doesn't implement it. Here, then, is my proposal:

h3. branch 2.6+

# All our input streams (s3a, wasb) implement CanUnbuffer, but unbuffer() is a no-op and hasCapability("unbuffer") -> false. That way: nothing breaks.
# {{FSDataInputStream#unbuffer}} downgrades to a no-op if the method isn't supported.

h3. branch-2+

# The input stream chain implements the same probes for StreamCapabilities.hasCapability as FSDataOutputStream does.
# Those few streams which support CanUnbuffer declare this in {{hasCapability("unbuffer")}}.
# All our input streams (s3a, wasb) implement CanUnbuffer and StreamCapabilities, but unbuffer() is a no-op and hasCapability("unbuffer") -> false. That way: nothing breaks, and if you really want unbuffering, you get to ask.
# Unless the streams really can implement unbuffering, and choose to do so, in which case they do so and hasCapability("unbuffer") -> true.
# FS spec documentation gets updated to describe the interface as a no-op by default; if you implement hasCapability, you declare your support.
# Tests to match the spec. That is: you call unbuffer and expect it to not fail, irrespective of stream capability.
# CanUnbuffer javadoc says "check capabilities". Even better, change it to extend StreamCapabilities to mandate that the API goes with it (it's tagged as @Evolving, after all).
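The proposal above could look roughly like the following sketch. The interfaces here are minimal stand-ins for Hadoop's {{CanUnbuffer}} and {{StreamCapabilities}}, and {{WrappedStream}} is a hypothetical wrapper, not a real Hadoop class; the point is only to show unbuffer() degrading to a no-op while hasCapability() tells the truth:

```java
// Simplified stand-ins for Hadoop's interfaces (assumptions, not the real classes).
interface CanUnbuffer {
  void unbuffer();
}

interface StreamCapabilities {
  boolean hasCapability(String capability);
}

// Hypothetical wrapper showing the proposed behaviour: unbuffer() is a hint.
// If the inner stream can't unbuffer, the call is a silent no-op, and
// hasCapability("unbuffer") lets callers who care probe for real support.
class WrappedStream implements CanUnbuffer, StreamCapabilities {
  private final Object in;  // the wrapped inner stream

  WrappedStream(Object in) {
    this.in = in;
  }

  @Override
  public boolean hasCapability(String capability) {
    // Only report "unbuffer" if the inner stream can actually do it.
    return "unbuffer".equals(capability) && in instanceof CanUnbuffer;
  }

  @Override
  public void unbuffer() {
    if (in instanceof CanUnbuffer) {
      ((CanUnbuffer) in).unbuffer();  // real unbuffering
    }
    // else: no-op -- nothing breaks, no UnsupportedOperationException
  }
}
```

With this shape, a caller that doesn't check capabilities still works; a caller that wants guaranteed unbuffering asks first.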
This is a bigger change, but it addresses a more fundamental issue: HDFS added a new API that client apps expect to be implemented, but which no other filesystems do. If we treat the API as a best-effort attempt to reduce client-side resource use, a no-op is a legit action.

> CryptoInputStream should implement unbuffer
> -------------------------------------------
>
>                 Key: HADOOP-14872
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14872
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.6.4
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>         Attachments: HADOOP-14872.001.patch, HADOOP-14872.002.patch, HADOOP-14872.003.patch
>
> Discovered in IMPALA-5909.
> Opening an encrypted HDFS file returns a chain of wrapped input streams:
> {noformat}
> HdfsDataInputStream
>   CryptoInputStream
>     DFSInputStream
> {noformat}
> If an application such as Impala or HBase calls HdfsDataInputStream#unbuffer,
> FSDataInputStream#unbuffer will be called:
> {code:java}
> try {
>   ((CanUnbuffer)in).unbuffer();
> } catch (ClassCastException e) {
>   throw new UnsupportedOperationException("this stream does not " +
>       "support unbuffering.");
> }
> {code}
> If the {{in}} class does not implement CanUnbuffer, UOE will be thrown. If
> the application is not careful, tons of UOEs will show up in logs.
> In comparison, opening a non-encrypted HDFS file returns this chain:
> {noformat}
> HdfsDataInputStream
>   DFSInputStream
> {noformat}
> DFSInputStream implements CanUnbuffer.
> It is good for CryptoInputStream to implement CanUnbuffer for 3 reasons:
> * Release buffer, cache, or any other resource when instructed
> * Able to call its wrapped DFSInputStream unbuffer
> * Avoid the UOE described above. Applications may not handle the UOE very
> well.
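The breakage in the quoted issue can be sketched as follows. {{InnerStream}}, {{CryptoWrapper}}, and {{UnbufferUtil}} are hypothetical stand-ins (not Hadoop's DFSInputStream/CryptoInputStream), showing both today's cast-and-throw logic and the fix the issue proposes, a wrapper that implements CanUnbuffer and delegates:

```java
// Minimal stand-in for Hadoop's CanUnbuffer.
interface CanUnbuffer {
  void unbuffer();
}

// Stand-in for the innermost stream (DFSInputStream in the real chain),
// which genuinely supports unbuffering.
class InnerStream implements CanUnbuffer {
  boolean unbuffered = false;

  @Override
  public void unbuffer() {
    unbuffered = true;  // the real stream would release buffers here
  }
}

// The fix the issue proposes: the crypto wrapper implements CanUnbuffer
// and simply forwards the call to the stream it wraps.
class CryptoWrapper implements CanUnbuffer {
  private final InnerStream in;

  CryptoWrapper(InnerStream in) {
    this.in = in;
  }

  @Override
  public void unbuffer() {
    in.unbuffer();  // delegate down the chain
  }
}

// Today's FSDataInputStream#unbuffer logic, as quoted in the issue:
// a failed cast is turned into an UnsupportedOperationException.
class UnbufferUtil {
  static void callUnbuffer(Object in) {
    try {
      ((CanUnbuffer) in).unbuffer();
    } catch (ClassCastException e) {
      throw new UnsupportedOperationException(
          "this stream does not support unbuffering.");
    }
  }
}
```

A wrapper that skips implementing CanUnbuffer hits the UOE path even when its inner stream supports unbuffering, which is exactly the encrypted-file case the issue describes.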
--
This message was sent by Atlassian JIRA (v6.4.14#64029)