[ 
https://issues.apache.org/jira/browse/HADOOP-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208272#comment-16208272
 ] 

Xiao Chen commented on HADOOP-14872:
------------------------------------

Thanks for the continued work on this, John. My belated review, mostly cosmetic:
- We should verify the exception, and log it through the standard loggers 
instead of {{System.err}}.
{code:title=CryptoStreamsTestBase:testUnbuffer}
    // Test buffered read again after unbuffer
    try {
      // Must seek to the beginning first
      ((Seekable) in).seek(0);
      // Can not read again if seek fails
      readCheck(in);
    } catch (Exception e) {
      System.err.println(e);
    }
{code}
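A minimal sketch of what I mean, outside the Hadoop test harness (plain JDK logging stands in for the project's loggers, and {{seekAndRead}} is a hypothetical stand-in for the seek + {{readCheck}} step):

```java
import java.io.IOException;
import java.util.logging.Logger;

public class UnbufferReadCheck {
  private static final Logger LOG =
      Logger.getLogger(UnbufferReadCheck.class.getName());

  // Hypothetical stand-in for the test's seek + readCheck; throws on failure.
  static void seekAndRead(boolean failSeek) throws IOException {
    if (failSeek) {
      throw new IOException("seek failed after unbuffer");
    }
  }

  public static void main(String[] args) {
    try {
      seekAndRead(false);
      System.out.println("read after unbuffer succeeded");
    } catch (IOException e) {
      // Log through the standard logger and fail the test loudly,
      // instead of swallowing the exception via System.err.
      LOG.severe("read after unbuffer failed: " + e);
      throw new AssertionError("read after unbuffer should succeed", e);
    }
  }
}
```

The point is that a failed seek or read should fail the test, not just print.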
- Suggest including {{Locale.ENGLISH}} in {{capability.toLowerCase()}}; 
Steve's example above does this too. Hadoop's {{StringUtils#toLowerCase}} 
makes this a little easier to write, but conflicts with the Apache Commons 
StringUtils in some classes. :(
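To illustrate why the explicit locale matters (capability string is illustrative): under a Turkish default locale, {{'I'}} lowercases to dotless U+0131, so locale-sensitive lowercasing can silently break string-key comparisons.

```java
import java.util.Locale;

public class LocaleLowerDemo {
  public static void main(String[] args) {
    String capability = "UNBUFFER";  // illustrative capability name
    // In the Turkish locale, 'I' lowercases to dotless U+0131, so a
    // comparison against an ASCII-lowercase key would silently fail.
    System.out.println("IN".toLowerCase(new Locale("tr")));     // "\u0131n", not "in"
    System.out.println(capability.toLowerCase(Locale.ENGLISH)); // always "unbuffer"
  }
}
```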
- From existing code: the javadoc for {{hsync}} says 'flush out the data in 
client's buffer and the disk device'; I feel {{Syncable}}'s javadoc is more 
accurate: 'flush out the data in client's user buffer all the way to the disk 
device (but the disk may have it in its cache).' We should also update the 
documentation about this.
- In {{CryptoInputStream}}, rename the method: 
s/cleanDescriptorPool/clearDecryptorPool/g.
- Maybe we should provide a 1-sentence explanation of readahead, dropbehind, 
and unbuffer, similar to what hsync and hflush have.
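For reference, the delegation pattern this issue is about can be sketched as follows (illustrative names only; the real {{CanUnbuffer}} lives in {{org.apache.hadoop.fs}}, and {{CryptoInputStream}} wraps a {{DFSInputStream}}):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Illustrative stand-in for org.apache.hadoop.fs.CanUnbuffer.
interface CanUnbuffer {
  void unbuffer();
}

// Inner stream that records whether unbuffer() reached it.
class UnbufferableInner extends ByteArrayInputStream implements CanUnbuffer {
  boolean unbuffered;

  UnbufferableInner(byte[] buf) { super(buf); }

  @Override
  public void unbuffer() { unbuffered = true; }
}

public class WrappingStream extends FilterInputStream implements CanUnbuffer {
  public WrappingStream(InputStream in) { super(in); }

  @Override
  public void unbuffer() {
    // 1. Release this stream's own resources (e.g. pooled decryptors).
    // 2. Delegate to the wrapped stream if it also supports unbuffering;
    //    otherwise do nothing rather than throw UOE.
    if (in instanceof CanUnbuffer) {
      ((CanUnbuffer) in).unbuffer();
    }
  }

  public static void main(String[] args) {
    UnbufferableInner inner = new UnbufferableInner(new byte[0]);
    new WrappingStream(inner).unbuffer();
    System.out.println("inner unbuffered: " + inner.unbuffered);
  }
}
```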

> CryptoInputStream should implement unbuffer
> -------------------------------------------
>
>                 Key: HADOOP-14872
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14872
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.6.4
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>         Attachments: HADOOP-14872.001.patch, HADOOP-14872.002.patch, 
> HADOOP-14872.003.patch, HADOOP-14872.004.patch, HADOOP-14872.005.patch, 
> HADOOP-14872.006.patch, HADOOP-14872.007.patch, HADOOP-14872.008.patch, 
> HADOOP-14872.009.patch, HADOOP-14872.010.patch, HADOOP-14872.011.patch, 
> HADOOP-14872.012.patch
>
>
> Discovered in IMPALA-5909.
> Opening an encrypted HDFS file returns a chain of wrapped input streams:
> {noformat}
> HdfsDataInputStream
>   CryptoInputStream
>     DFSInputStream
> {noformat}
> If an application such as Impala or HBase calls HdfsDataInputStream#unbuffer, 
> FSDataInputStream#unbuffer will be called:
> {code:java}
>     try {
>       ((CanUnbuffer)in).unbuffer();
>     } catch (ClassCastException e) {
>       throw new UnsupportedOperationException("this stream does not " +
>           "support unbuffering.");
>     }
> {code}
> If the {{in}} class does not implement CanUnbuffer, an 
> UnsupportedOperationException (UOE) will be thrown. If the application is not 
> careful, tons of UOEs will show up in the logs.
> In comparison, opening a non-encrypted HDFS file returns this chain:
> {noformat}
> HdfsDataInputStream
>   DFSInputStream
> {noformat}
> DFSInputStream implements CanUnbuffer.
> It is good for CryptoInputStream to implement CanUnbuffer, for three reasons:
> * It can release buffers, caches, or any other resources when instructed
> * It can call unbuffer on its wrapped DFSInputStream
> * It avoids the UOE described above, which applications may not handle very 
> well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
