> Does anybody have a good diagram showing the architecture of HDFS
encryption?

Related: Can we collect the diagrams and design docs from the various
implementation JIRAs and put them up on the Accumulo website? Every time
I've needed to reference them, it's been a giant pain to go find them.
Maybe brush up the contents where design and implementation differ.
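
On the CBC seek question in Adam's mail quoted below, here is a minimal
sketch, not a statement about how RFile or HDFS actually do it. In CBC
mode, decrypting ciphertext block i needs only the key and ciphertext
block i-1 (or the file IV for block 0), so a reader can start decrypting
at any 16-byte block boundary without reading from the beginning. It is
encryption, not decryption, that has to run sequentially. The class name
CbcSeekSketch, the helper names, and the all-zero key and IV are made up
for illustration and are not Accumulo or Hadoop APIs.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

// Hypothetical demo class; not part of Accumulo or Hadoop.
final class CbcSeekSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Encrypt the whole buffer with AES-CBC.
    // The plaintext length must be a multiple of 16 (no padding is applied).
    static byte[] encryptAll(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(plain);
    }

    // Decrypt blocks [firstBlock, firstBlock + count) without touching earlier data.
    // The previous ciphertext block (or the file IV for block 0) acts as the IV.
    static byte[] decryptFrom(byte[] key, byte[] iv, byte[] cipherText,
                              int firstBlock, int count) throws Exception {
        byte[] chainIv = firstBlock == 0
                ? iv
                : Arrays.copyOfRange(cipherText, (firstBlock - 1) * BLOCK, firstBlock * BLOCK);
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(chainIv));
        return c.doFinal(cipherText, firstBlock * BLOCK, count * BLOCK);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero demo key; never do this for real data
        byte[] iv = new byte[16];  // all-zero demo IV; same caveat
        byte[] plain = new byte[8 * BLOCK];
        for (int i = 0; i < plain.length; i++) {
            plain[i] = (byte) i;
        }
        byte[] ct = encryptAll(key, iv, plain);

        // "Seek" to block 5 and decrypt the last three blocks only.
        byte[] tail = decryptFrom(key, iv, ct, 5, 3);
        System.out.println(Arrays.equals(tail, Arrays.copyOfRange(plain, 5 * BLOCK, 8 * BLOCK)));
    }
}
```

As I understand the Hadoop documentation, HDFS transparent encryption
uses AES in CTR mode, which is also seekable, so the open question is
probably more about block alignment and extra reads than about having to
decrypt from the start of the file. Worth measuring either way.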

On Thu, Nov 5, 2015 at 12:21 PM, Adam Fuchs <afu...@apache.org> wrote:

> On Thu, Nov 5, 2015 at 12:17 PM, Christopher <ctubb...@apache.org> wrote:
>
> > My main concern using HDFS encryption vs. built-in Accumulo implementation
> > is possibly performance with respect to seeks. If we encrypt our indexed
> > blocks independently (as we do now), I suspect our seeks would be more
> > performant than relying on HDFS encryption, whose encrypted blocks may not
> > fall on our index boundaries. If this is a small difference, it might still
> > be worth it for convenience and simpler maintenance, but I suspect the
> > difference will be somewhat substantial.
> >
>
> Very good point, Chris. This is especially important if we allow users to
> pick their own encryption algorithms. As I understand it, cipher block
> chaining (CBC) is important to keep most crypto algorithms secure, and it
> has a big effect on where you need to start decrypting. There are ways of
> doing CBC that let you seek pretty close to any point in a file and decrypt
> from there, and there are other ways that require you to start from the
> beginning. The current RFile implementation ensures that you can start
> decrypting at the beginning of an RFile block, which matches where we start
> decompressing and where we currently seek in HDFS. The performance
> difference is likely to be much more pronounced for certain crypto
> settings.
>
> Does anybody have a good diagram showing the architecture of HDFS
> encryption?
>
>
>
> >
> > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.el...@gmail.com> wrote:
> >
> > > +1 I think this is the right step. My hunch is that, given the common
> > > data access patterns we have in Accumulo (versus HBase), per-colfam
> > > encryption isn't quite as common a design pattern as it is for HBase
> > > (please tell me I'm wrong if anyone disagrees -- this is mostly a gut
> > > reaction). I think our users would likely benefit more from a
> > > per-namespace/table encryption control like you suggest.
> > >
> > > Implementing RFile encryption at the HDFS level (e.g. tying a specific
> > > zone/key to a table) is probably straightforward. Changing the
> > > TServer's WAL use would likely be trickier to get right (a tserver would
> > > have multiple WALs, one for each unique zone/key from the Tablets it
> > > happens to host). Maybe worrying about that is getting ahead of things --
> > > just thought about it and figured I'd mention it :)
> > >
> > > William Slacum wrote:
> > > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > > feature. It might be easier to add something like per-namespace and/or
> > > > per-table encryption, then define common access patterns for
> > > > applications that want to use multiple keys for encryption.
> > > >
> > > >
> > > >
> > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afu...@apache.org> wrote:
> > > >
> > > >> Bill,
> > > >>
> > > >> Do you envision one of the following as the driver behind finer-grained
> > > >> encryption?
> > > >>
> > > >> 1. We would only encrypt certain columns in order to get better
> > > >> performance;
> > > >>
> > > >> 2. We would use different keys on different columns in order to revoke
> > > >> access to a column via the key store;
> > > >>
> > > >> 3. We would only give a tablet server access to a subset of columns at
> > > >> any given time in order to protect something, and figure out what to do
> > > >> for compactions, etc.;
> > > >>
> > > >> 4. Something entirely different...
> > > >>
> > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > > >> effort.
> > > >>
> > > >> Adam
> > > >> On Nov 4, 2015 7:38 PM, "William Slacum"<wsla...@gmail.com>  wrote:
> > > >>
> > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > >>> environments, and I think it maps pretty well to the document
> > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > >>> families and files. The built in RFile encryption scheme seems better
> > > >>> suited to this.
> > > >>>
> > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good
> > > >>> test harness for just writing an RFile, opening a reader to it, and just
> > > >>> poking around? I was looking at the constructors and they didn't seem
> > > >>> straightforward enough for me to comprehend them within a few seconds.
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <ke...@deenlo.com> wrote:
> > > >>>
> > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <ke...@deenlo.com> wrote:
> > > >>>>
> > > >>>>>
> > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wsla...@gmail.com> wrote:
> > > >>>>>> Is "the code being 'at rest'" you making a funny about active
> > > >>>>>> development? Making sure I haven't lost my ability to get jokes :)
> > > >>>>>>
> > > >>>>>> I see two reasons why the code would be inactive: the feature is good
> > > >>>>>> enough as is or it's not interesting enough to attract attention.
> > > >>>>>> Considering it's not public API, there are no discussions to bring it
> > > >>>>>> into the public API, and there's no effort to document how to use it,
> > > >>>>>> my intuition tells me that there isn't enough interest in it from a
> > > >>>>>> project perspective.
> > > >>>>>>
> > > >>>>>> From a user perspective, I've been getting asked about it when I work
> > > >>>>>> with Accumulo users. My recommendation, exclusively, is to use HDFS
> > > >>>>>> encryption because I can go to Hadoop's website and find documentation
> > > >>>>>> on it. When I go to find documentation on Accumulo's offerings, any
> > > >>>>>> usability information comes from vendor SlideShares. Most mentions of
> > > >>>>>> the feature on official Apache Accumulo channels echo Christopher's
> > > >>>>>> sentiments on the feature being experimental and not being officially
> > > >>>>>> recommended for use.
> > > >>>>>>
> > > >>>>>> I wouldn't want to rip out the feature first and then figure things
> > > >>>>>> out later. Sean already alluded to it, but a roadmap should contain
> > > >>>>>> something (tool or documentation) to help users migrate if we go down
> > > >>>>>> that route.
> > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's
> > > >>>>>> answer?
> > > >>>>>> If we went down the route of using HDFS encryption zones, can we offer
> > > >>>>>> the same features? At the very least, we'd be offering the same
> > > >>>>>> database-level
> > > >>>>> Where does the decryption happen with DFS, is it in the DFS client? If
> > > >>>>> so, using HDFS level encryption seems to offer the same
> > > >>>>> functionality???
> > > >>>>> Has anyone written a tool that takes an
> > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
> > > >>>>> unexpected gotchas w/ this.
> > > >>>>>
> > > >>>> I was discussing my questions w/ Christopher today and he mentioned an
> > > >>>> experiment that I thought was interesting. What is the random seek
> > > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > >>>>
> > > >>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> encryption scheme. I don't know the details of "more advanced key
> > > >>>>>> stores", but it seems like we could potentially take any custom
> > > >>>>>> implementation and map it to a KeyProvider [1]. I could also envision
> > > >>>>>> table level encryption being implementable via zones, but probably not
> > > >>>>>> down to the column family level.
> > > >>>>>>
> > > >>>>>> [1]
> > > >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > >>>>>>
> > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afu...@apache.org> wrote:
> > > >>>>>>> Responses inline.
> > > >>>>>>>
> > > >>>>>>> Adam
> > > >>>>>>>
> > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubb...@apache.org> wrote:
> > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does
> > > >>>>>>>> is provide partial encryption-at-rest protection (unless you're
> > > >>>>>>>> running without walogs, and have good integration with some external
> > > >>>>>>>> secure key management facility, and then it's probably fine).
> > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery
> > > >>>>>>> file. That is a project we should take on, but it does not imply that
> > > >>>>>>> the existing features are not valuable. With HDFS encryption options
> > > >>>>>>> this would now be a much easier project to take on. Also, the users I
> > > >>>>>>> know that use encryption at rest do so with a more secure key store
> > > >>>>>>> than the default.
> > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't
> > > >>>>>>>> necessarily realize its current shortcomings, or its lack of upstream
> > > >>>>>>>> maintenance support (which it has not been receiving). It may be the
> > > >>>>>>>> case that these users have support from an intermediary, and do
> > > >>>>>>>> understand the shortcomings... I don't know, but it's a concern.
> > > >>>>>>> Anybody that creates a secure system has to analyze the security of
> > > >>>>>>> the system as a whole. Accumulo's encryption at rest is one part of
> > > >>>>>>> the solution. Taking away the tool without providing an alternative
> > > >>>>>>> does nothing to improve the security of systems built on Accumulo.
> > > >>>>>>>
> > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> > > >>>>>>>> incomplete one, which hasn't really been touched in two years, and has
> > > >>>>>>>> been explicitly excluded by the community from being public API
> > > >>>>>>>> because of its incompleteness. Age doesn't determine public API
> > > >>>>>>>> status. The community does.
> > > >>>>>>>
> > > >>>>>>> People are using it, so we have to consider the implications of
> > > >>>>>>> whatever changes we make and weigh them against the benefits. I
> > > >>>>>>> believe the last bug fix was done this year, so I would argue it is
> > > >>>>>>> being maintained. Changes to our encryption at rest implementation
> > > >>>>>>> will have consequences for those users. There had better be a clear
> > > >>>>>>> benefit if we break their systems.
> > > >>>>>>>
> > > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By
> > > >>>>>>>> whom? Is it published?
> > > >>>>>>> Yes, there have been several talks at meetups and conferences that
> > > >>>>>>> discuss the security and performance of the current solution.
> > > >>>>>>>
> > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afu...@apache.org> wrote:
> > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at
> > > >>>>>>>>> rest:
> > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being
> > > >>>>>>>>> "at rest" isn't necessarily a problem
> > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > >>>>>>>>> effectively in operations
> > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for
> > > >>>>>>>>> over two years with established plugin interfaces, and therefore it
> > > >>>>>>>>> should be considered part of the public API
> > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> > > >>>>>>>>> performance or security
> > > >>>>>>>>>
> > > >>>>>>>>> The given option #2 would at least require an analysis of
> > > >>>>>>>>> alternatives, and we would have to decide what to do about backwards
> > > >>>>>>>>> compatibility for users using custom key stores and encryption
> > > >>>>>>>>> strategies that may or may not be supported by upstream alternatives.
> > > >>>>>>>>>
> > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take
> > > >>>>>>>>> up projects to improve Accumulo's encryption. I think we're already
> > > >>>>>>>>> going down this path, but without having identified resources to do
> > > >>>>>>>>> the improvements. Any volunteers?
> > > >>>>>>>>>
> > > >>>>>>>>> Adam
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wsla...@gmail.com> wrote:
> > > >>>>>>>>>> So I've been looking into options for providing encryption at rest,
> > > >>>>>>>>>> and it seems like what Accumulo has is abandonware from a project
> > > >>>>>>>>>> perspective. There is no official documentation on how to perform
> > > >>>>>>>>>> encryption at rest, and the best information on its status comes from
> > > >>>>>>>>>> year (or greater) old ticket comments about how the feature is still
> > > >>>>>>>>>> experimental. Recently there was a talk that described using HDFS
> > > >>>>>>>>>> encryption zones as an alternative.
> > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
> > > >>>>>>>>>> marketed capabilities
> > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
> > > >>>>>>>>>> comments or presentations
> > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in
> > > >>>>>>>>>> HDFS encryption
> > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond
> > > >>>>>>>>>> what HDFS provides
> > > >>>>>>>>>>
> > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > > >>>>>>>>>> Personally, I see two options:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront
> > > >>>>>>>>>> and start providing feature parity with HBase
> > > >>>>>>>>>>
> > > >>>>>>>>>> or
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> > > >>>>>>>>>> offerings
> > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > >>>>>>>>>>
> > > >>>>>
> > > >
> > >
> >
>
