[ https://issues.apache.org/jira/browse/ACCUMULO-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568980#comment-13568980 ]
Keith Turner commented on ACCUMULO-980:
---------------------------------------

Some comments on proposal V1.

It seems like the IV would be transparent to RFile; it would just be encryption header information associated with a block, just as each gzip block probably has some header. From RFile's perspective, it just needs to be able to read and write blocks of data. When the encryption codec is not used, there is no per-block IV. Does this sound correct?

Taking this a step further, should encryption be pushed into BCFile? Currently RFile has no concept of compression; it just reads and writes blocks of data to BCFile. BCFile handles compression and stores compression metadata, such as which codec to use for reading. Even RFile's own root meta block is stored as a regular BCFile meta block and compressed like everything else. It seems like modifying BCFile rather than RFile may be easier. I have already modified BCFile to support multi-level indexes in 1.4. BCFile was copied because it was package private, but it had not been modified for a long time.

Why is another interface needed? Why not use org.apache.hadoop.io.compress.CompressionCodec? I am not saying we should or should not do this, but I would like to hear your thoughts since you have looked into it. I see some things in the design doc that I suspect influenced this decision, like needing to set the key and IV. While thinking about this I remembered that the BigTable paper mentioned using two compression codecs in series. (Sketches of an encrypting codec behind this interface, and of chaining two codecs, follow the issue details below.)

In the past we have not supported rolling upgrades from 1.x to 1.(x+1). We would only need to consider this if 1.6 supported it; changes in the file format would be a small part of a larger effort to support rolling upgrades. Releases to date have always been able to read files produced by any previous version, so Accumulo 1.4 can read RFiles produced by any previous version of Accumulo.

Is there any concern with storing unencrypted blocks in memory? The code currently caches uncompressed blocks (but still serialized with the RFile encoding) in memory. Would this be a concern if these cached blocks were swapped out? Would we want to keep blocks encrypted in the cache and decrypt only as needed? (A sketch of such a cache also follows below.)

> support pluggable codecs for RFile
> ----------------------------------
>
>                 Key: ACCUMULO-980
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-980
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>             Fix For: 1.6.0
>
>         Attachments: RFile-Changes-Proposal-V1.pdf
>
>
> As part of the encryption at rest story, RFile should support pluggable modules where it currently has hardcoded options for compression codecs. This is a natural place to add encryption capabilities, as the cost of encryption would likely not be significantly different from the cost of compression, and the block-level integration should maintain the same seek and scan performance. Given the many implementation options for both encryption and compression, it makes sense to have a plugin structure here.
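
To make the CompressionCodec question above concrete: below is a rough sketch of what an encrypting codec behind org.apache.hadoop.io.compress.CompressionCodec might look like, with a fresh per-block IV written as a small header that RFile and BCFile never interpret. The class name, the AES/CTR mode choice, and the key handling are illustrative assumptions rather than anything from the proposal, and a real codec would also need to supply Compressor/Decompressor implementations so it could participate in Hadoop's CodecPool.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.Decompressor;

/**
 * Hypothetical encrypting codec (not part of the proposal). A fresh random
 * IV is generated per block and written as a small header in front of the
 * ciphertext, so the IV is just opaque block bytes to RFile/BCFile.
 */
public class AesCryptoCodec implements CompressionCodec {

  private static final int IV_LENGTH = 16; // AES block size
  private final SecretKey key; // obtained out of band, e.g. from a keystore
  private final SecureRandom random = new SecureRandom();

  public AesCryptoCodec(SecretKey key) {
    this.key = key;
  }

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out) throws IOException {
    try {
      byte[] iv = new byte[IV_LENGTH];
      random.nextBytes(iv);
      out.write(iv); // per-block header, analogous to a gzip block header
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
      final CipherOutputStream cos = new CipherOutputStream(out, cipher);
      return new CompressionOutputStream(out) {
        @Override public void write(int b) throws IOException { cos.write(b); }
        @Override public void write(byte[] b, int off, int len) throws IOException { cos.write(b, off, len); }
        @Override public void finish() throws IOException { cos.flush(); } // CTR is a stream mode; no padding to emit
        @Override public void resetState() { } // no compressor state to reset
        @Override public void close() throws IOException { cos.close(); }
      };
    } catch (GeneralSecurityException e) {
      throw new IOException(e);
    }
  }

  @Override
  public CompressionInputStream createInputStream(InputStream in) throws IOException {
    try {
      byte[] iv = new byte[IV_LENGTH];
      new DataInputStream(in).readFully(iv); // read back the per-block IV header
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
      final CipherInputStream cis = new CipherInputStream(in, cipher);
      return new CompressionInputStream(in) {
        @Override public int read() throws IOException { return cis.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException { return cis.read(b, off, len); }
        @Override public void resetState() { }
      };
    } catch (GeneralSecurityException e) {
      throw new IOException(e);
    }
  }

  // This sketch does not pool Compressor/Decompressor objects; a production
  // codec would provide them so it can participate in Hadoop's CodecPool.
  @Override public CompressionOutputStream createOutputStream(OutputStream out, Compressor c) throws IOException { return createOutputStream(out); }
  @Override public CompressionInputStream createInputStream(InputStream in, Decompressor d) throws IOException { return createInputStream(in); }
  @Override public Compressor createCompressor() { return null; }
  @Override public Decompressor createDecompressor() { return null; }
  @Override public Class<? extends Compressor> getCompressorType() { return null; }
  @Override public Class<? extends Decompressor> getDecompressorType() { return null; }
  @Override public String getDefaultExtension() { return ".aes"; }
}
{code}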
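
Chaining two codecs in series, as the BigTable paper mentions, falls out naturally from the stream-oriented interface: each codec simply wraps the stream produced by the next. The helper below is a hypothetical sketch that reuses the AesCryptoCodec above; note that compression must run before encryption, since ciphertext is effectively incompressible.

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import javax.crypto.SecretKey;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class ChainedCodecs {
  /**
   * Wraps a raw block stream so that writes are compressed first and the
   * compressed bytes are then encrypted. Encrypting first would leave the
   * compressor with incompressible ciphertext.
   */
  public static OutputStream wrap(OutputStream rawBlock, CompressionCodec compression,
      CompressionCodec encryption) throws IOException {
    return compression.createOutputStream(encryption.createOutputStream(rawBlock));
  }

  // Example: gzip-then-AES, reusing the AesCryptoCodec sketch above.
  static OutputStream example(OutputStream rawBlock, SecretKey key) throws IOException {
    CompressionCodec gzip = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
    return wrap(rawBlock, gzip, new AesCryptoCodec(key));
  }
}
{code}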
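
On the caching question, one option would be to cache each block exactly as it came off disk and decrypt on every access, so plaintext copies stay short-lived and are less likely to be swapped out; the cost is one decryption per cache hit. A minimal sketch, with illustrative names (EncryptedBlockCache is not an existing Accumulo class):

{code:java}
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/**
 * Hypothetical cache that keeps blocks encrypted and decrypts on each
 * access, trading CPU for never holding long-lived plaintext in memory.
 */
public class EncryptedBlockCache {
  private static class Entry {
    final byte[] iv;
    final byte[] ciphertext;
    Entry(byte[] iv, byte[] ciphertext) { this.iv = iv; this.ciphertext = ciphertext; }
  }

  private final Map<String,Entry> cache = new ConcurrentHashMap<String,Entry>();
  private final SecretKey key;

  public EncryptedBlockCache(SecretKey key) {
    this.key = key;
  }

  /** Store the block exactly as it came off disk: IV plus ciphertext. */
  public void put(String blockName, byte[] iv, byte[] ciphertext) {
    cache.put(blockName, new Entry(iv.clone(), ciphertext.clone()));
  }

  /** Decrypt on every read; the plaintext copy is short-lived. */
  public byte[] get(String blockName) throws GeneralSecurityException {
    Entry e = cache.get(blockName);
    if (e == null)
      return null;
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(e.iv));
    return cipher.doFinal(e.ciphertext);
  }
}
{code}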