[ https://issues.apache.org/jira/browse/ACCUMULO-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568980#comment-13568980 ]

Keith Turner commented on ACCUMULO-980:
---------------------------------------

Some comments on proposal V1.

It seems like the IV would be transparent to RFile; it would just be encryption 
header information associated with a block, much like each gzip block probably 
has some header.  From RFile's perspective, it just needs to be able to read 
and write blocks of data.  When the encryption codec is not used, there is no 
per-block IV.  Does this sound correct?

Taking this a step further, should encryption be pushed into BCFile?  Currently 
RFile has no concept of compression; it just reads and writes blocks of data to 
BCFile.  BCFile handles compression and stores compression metadata, like what 
codec to use for reading.  Even RFile's own root meta block is stored as a 
regular BCFile meta block and compressed like everything else.  It seems like 
modifying BCFile rather than RFile may be easier.  I have already modified 
BCFile to support multi-level indexes in 1.4.  BCFile was copied because it was 
package private, but it had not been modified for a long time.
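
Roughly what I am picturing, as a minimal sketch (the class and method names 
here are hypothetical, and AES/CBC is just an assumed cipher, not a decision):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.security.GeneralSecurityException;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class EncryptingBlockStream {

      private static final SecureRandom random = new SecureRandom();

      // Wraps the raw block stream.  A fresh IV is written in the clear ahead
      // of the ciphertext, so the reader can strip it back off and initialize
      // its own Cipher; RFile/BCFile just see opaque block bytes.
      public static OutputStream wrap(OutputStream rawBlock, SecretKeySpec key)
          throws IOException, GeneralSecurityException {
        byte[] iv = new byte[16];
        random.nextBytes(iv);
        rawBlock.write(iv); // per-block header: just the IV in this sketch
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(rawBlock, cipher);
      }
    }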

Why is another interface needed?  Why not use 
org.apache.hadoop.io.compress.CompressionCodec?  I am not saying we should or 
should not do this, but I would like to hear your thoughts since you have 
looked into this.  I see some things in the design doc that I suspect 
influenced this decision, like needing to set a Key and IV.  While thinking 
about this, I remembered that the BigTable paper mentioned running two 
compression codecs in series.
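
For concreteness, a sketch of what chaining two codecs in series behind the 
existing interface could look like; the encrypting codec itself is 
hypothetical, and how it would obtain its Key and IV is exactly the open 
question:

    import java.io.IOException;
    import java.io.OutputStream;
    import org.apache.hadoop.io.compress.CompressionCodec;

    public class ChainedCodec {

      // Bytes written to the returned stream pass through "first"
      // (e.g. a gzip codec), and its output is fed to "second" (e.g. a
      // hypothetical encrypting codec) before reaching the underlying sink.
      public static OutputStream createOutputStream(OutputStream sink,
          CompressionCodec first, CompressionCodec second) throws IOException {
        return first.createOutputStream(second.createOutputStream(sink));
      }
    }

CompressionCodec itself gives no hook for key material, so an encrypting codec 
would presumably have to implement Configurable and pull its key from the 
Configuration, or obtain it some other way.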

In the past we have not supported rolling upgrade from 1.x to 1.(x+1), so we 
would only need to consider this if 1.6 supported it.  Changes in the file 
format would be a small part of a larger effort to support rolling upgrade.   
Releases to date have always been able to read files produced by any previous 
version, so Accumulo 1.4 can read RFiles produced by any previous version of 
Accumulo.

Is there any concern with storing unencrypted blocks in memory?  The code 
currently caches uncompressed blocks (but still serialized with RFile encoding) 
in memory.  Would this be a concern if these cached blocks were swapped out?  
Would we want to keep blocks encrypted in the cache and decrypt only as 
needed?
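
As a rough sketch of that last option (names hypothetical), a cache entry 
could hold only ciphertext and pay a decrypt on every hit:

    import java.security.GeneralSecurityException;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    // Cache entry that keeps only the block ciphertext resident, trading CPU
    // on each read for never holding long-lived plaintext in the cache.
    public class EncryptedCacheEntry {
      private final byte[] ciphertext; // what actually sits in the block cache
      private final byte[] iv;
      private final SecretKeySpec key;

      public EncryptedCacheEntry(byte[] ciphertext, byte[] iv, SecretKeySpec key) {
        this.ciphertext = ciphertext;
        this.iv = iv;
        this.key = key;
      }

      // Decrypts on every call; the caller discards the plaintext when done.
      public byte[] getBlock() throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(ciphertext);
      }
    }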

                
> support pluggable codecs for RFile
> ----------------------------------
>
>                 Key: ACCUMULO-980
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-980
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>             Fix For: 1.6.0
>
>         Attachments: RFile-Changes-Proposal-V1.pdf
>
>
> As part of the encryption at rest story, RFile should support pluggable 
> modules where it currently has hardcoded options for compression codecs. This 
> is a natural place to add encryption capabilities, as the cost of encryption 
> would likely not be significantly different from the cost of compression, and 
> the block-level integration should maintain the same seek and scan 
> performance. Given the many implementation options for both encryption and 
> compression, it makes sense to have a plugin structure here.
