SGTM
William Slacum wrote:
Just to moonwalk back a bit, I see a few things happening concurrently now.
First is trying to get a consensus on where we want to go with the
encryption at rest story in Accumulo.
I see us having established that what we have is scoped down to working for
WALs and RFiles, and if you happen to have written it, you are satisfied.
However, as a project, we haven't pulled it into the public API and haven't
provided documentation, so if you haven't written it, the process of
finding out how to configure and use the feature is indirect.
There is some consensus about moving to using HDFS encryption to achieve
the same features, but we want to test and see if the performance is
comparable between it and Accumulo's RFile encryption capability. There may
be caveats based on how you encrypt the data. We want to explore this
space. Mike would like a Jira ticket to outline this.
For adding features to Accumulo, we could potentially add encryption at the
column level. The open question is the level of effort: dynamic locality
groups make this a harder task in Accumulo than in products with a 1:1
mapping between locality groups and column families (as well as an extra
mapping to files).
Did I miss anything?
On Thu, Nov 5, 2015 at 1:27 PM, Adam Fuchs<[email protected]> wrote:
Camps two and three are the same camp, really. If we can identify a clear
roadmap (eventually via the right set of tickets), then it comes down to
whether people have energy and inclination to do the work. I don't think
the roadmap ends here.
Adam
On Thu, Nov 5, 2015 at 1:18 PM, Christopher<[email protected]> wrote:
Perhaps. I had interpreted some of Adam's comments ("The only thing that
doesn't get encrypted is a temporary WAL recovery file. That is a project
we should take on..."), as favoring improvements to the current state of
things. As that has also been the focus of previous conversations about the
state of Accumulo's encryption-at-rest, I assumed that third camp also
existed. Perhaps I was wrong.
On Thu, Nov 5, 2015 at 1:11 PM Mike Drob<[email protected]> wrote:
I think you have misidentified the two camps. There is a camp that believes
we should phase out the code in favour of the HDFS encryption, and a camp
that believes the code is sufficiently mature. I don't think there is a
group that is interested in improving the state of things.
On Thu, Nov 5, 2015 at 12:02 PM, Christopher<[email protected]> wrote:
JIRAs are fine, but I thought this thread was mostly addressing the fact
that there doesn't seem to be a sustained interest in actually working on
any of the JIRAs addressing that area of code. Am I wrong? Is there
willingness from anybody to expend effort on this code? Even if not, we can
still make JIRAs, but they'll probably just be ignored. So, the question
for me is: which JIRAs should we make? Are we going to pursue phasing out
the code, or pursue improving it? Those call for very different JIRA text.
On Thu, Nov 5, 2015 at 12:22 PM Mike Drob<[email protected]> wrote:
Can we file some JIRAs to build out a suite to test this and run the
necessary tests?
On Thu, Nov 5, 2015 at 11:17 AM, Christopher<[email protected]> wrote:
My main concern with using HDFS encryption vs. the built-in Accumulo
implementation is performance with respect to seeks. If we encrypt our
indexed blocks independently (as we do now), I suspect our seeks would be
more performant than relying on HDFS encryption, whose encrypted blocks may
not fall on our index boundaries. If this is a small difference, it might
still be worth it for convenience and simpler maintenance, but I suspect
the difference will be somewhat substantial.
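To make the boundary concern concrete, here is a rough toy model in Python. It assumes a hypothetical layer that can only decrypt whole fixed-size chunks, and counts how many bytes have to be decrypted to read one index block that does not line up with those chunks. The chunk size and the function names are illustrative assumptions, not anything taken from HDFS or Accumulo, and whether HDFS encryption actually behaves this way is exactly what a test would need to establish.

```python
def aligned_span(offset: int, length: int, chunk: int) -> int:
    """Bytes decrypted to read [offset, offset + length) when decryption
    can only start and stop on multiples of `chunk` (hypothetical model)."""
    if length <= 0:
        return 0
    start = (offset // chunk) * chunk              # round start down
    end = -(-(offset + length) // chunk) * chunk   # round end up
    return end - start


def overhead(offset: int, length: int, chunk: int) -> float:
    """Ratio of bytes decrypted to bytes actually wanted."""
    return aligned_span(offset, length, chunk) / length


if __name__ == "__main__":
    # A block that starts and ends on chunk boundaries costs nothing extra.
    print(overhead(offset=64, length=64, chunk=64))
    # A misaligned block of the same size pulls in a neighbouring chunk.
    print(overhead(offset=100, length=64, chunk=64))
```

Independently encrypted index blocks correspond to the aligned case by construction. The misaligned case is the one worth measuring.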
On Thu, Nov 5, 2015 at 12:11 PM Josh Elser<[email protected]> wrote:
+1 I think this is the right step. My hunch is that, given the common data
access patterns that we have in Accumulo (compared to HBase), per-colfam
encryption isn't quite as common a design pattern as it is for HBase
(please tell me I'm wrong if anyone disagrees -- this is mostly a gut
reaction). I think our users would likely benefit more from a
per-namespace/table encryption control like you suggest.
Implementing RFile encryption at HDFS level (e.g. tie a specific zone/key
for a table) is probably straightforward. Changing the TServer's WAL use
would likely be trickier to get right (a tserver would have multiple WALs,
one for each unique zone/key from the Tablets it happens to host). Maybe
worrying about that is getting ahead of things -- just thought about it and
figured I'd mention it :)
William Slacum wrote:
Yup, #2. I also don't know if it's worth the effort for that specific
feature. It might be easier to add something like per-namespace and/or
per-table encryption, then define common access patterns for applications
that want to use multiple keys for encryption.
On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs<[email protected]> wrote:
Bill,
Do you envision one of the following as the driver behind finer-grained
encryption?:
1. We would only encrypt certain columns in order to get better performance;
2. We would use different keys on different columns in order to revoke
access to a column via the key store;
3. We would only give a tablet server access to a subset of columns at any
given time in order to protect something, and figure out what to do for
compactions, etc.;
4. Something entirely different...
Seems like thing #2 might have merit, but I'm not sure it's worth the
effort.
Adam
On Nov 4, 2015 7:38 PM, "William Slacum"<[email protected]> wrote:
@Adam, column family level encryption can be useful for multi-tenant
environments, and I think it maps pretty well to the document
partitioning/sharding/wikisearch style tables. Things are trickier in
Accumulo than in HBase since there isn't a 1:1 mapping between column
families and files. The built in RFile encryption scheme seems better
suited to this.
@Christopher & Keith, it's something we can evaluate. Is there a good test
harness for just writing an RFile, opening a reader to it, and just poking
around? I was looking at the constructors and they didn't seem
straightforward enough for me to comprehend them within a few seconds.
On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner<[email protected]> wrote:
On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner<[email protected]> wrote:
On Mon, Nov 2, 2015 at 12:27 PM, William Slacum<[email protected]> wrote:
Is "the code being 'at rest'" you making a funny about active development?
Making sure I haven't lost my ability to get jokes :)
I see two reasons why the code would be inactive: the feature is good
enough as is or it's not interesting enough to attract attention.
Considering it's not public API, there are no discussions to bring it into
the public API, and there's no effort to document how to use it, my
intuition tells me that there isn't enough interest in it from a project
perspective.
From a user perspective, I've been getting asked about it when I work with
Accumulo users. My recommendation, exclusively, is to use HDFS encryption
because I can go to Hadoop's website and find documentation on it. When I
go to find documentation on Accumulo's offerings, any usability information
comes from vendor SlideShares. Most mentions of the feature on official
Apache Accumulo channels echo Christopher's sentiments on the feature being
experimental and not being officially recommended for use.
I wouldn't want to rip out the feature first and then figure things out
later. Sean already alluded to it, but a roadmap should contain something
(tool or documentation) to help users migrate if we go down that route.
What I'm trying to figure out is, when the question of "How do I do
encryption at rest in Accumulo?" comes up, what is our community's answer?
If we went down the route of using HDFS encryption zones, can we offer the
same features? At the very least, we'd be offering the same database-level
Where does the decryption happen with DFS, is it in the DFS client? If so,
using HDFS level encryption seems to offer the same functionality???
Has anyone written a tool that takes an
Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
unexpected gotchas w/ this.
I was discussing my questions w/ Christopher today and he mentioned an
experiment that I thought was interesting. What is the random seek
performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
Accumulo-unencrypted-HDFS-encrypted-RFile?
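A bare-bones version of that experiment might look like the Python sketch below. It only times random seek-and-read calls against a plain local file, so it is a stand-in for the shape of the measurement rather than a real RFile or HDFS benchmark: the function name, read size, and read count are all assumptions, and a real comparison would go through the Accumulo RFile reader and the HDFS client for each of the two file variants. OS page caching will also flatter the numbers unless the files are much larger than memory or caches are dropped between runs.

```python
import os
import random
import time


def time_random_seeks(path, reads=1000, read_size=4096, seed=0):
    """Return mean seconds per random seek+read of `read_size` bytes."""
    size = os.path.getsize(path)
    if size < read_size:
        raise ValueError("file is smaller than one read")
    # Fixed seed so both file variants see the same offsets.
    rng = random.Random(seed)
    offsets = [rng.randrange(0, size - read_size + 1) for _ in range(reads)]
    # buffering=0 avoids Python-level read-ahead hiding the seek cost.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(read_size)
        elapsed = time.perf_counter() - start
    return elapsed / reads
```

Running it once per variant with the same seed, for example `time_random_seeks("variant_a.rf")` and `time_random_seeks("variant_b.rf")` (hypothetical file names), gives two per-read averages that can be compared directly.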
encryption scheme. I don't know the details of "more advanced key stores",
but it seems like we could potentially take any custom implementation and
map it to a KeyProvider [1]. I could also envision table level encryption
being implementable via zones, but probably not down to the column family
level.
[1]
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs<[email protected]> wrote:
Responses inline.
Adam
On Nov 1, 2015 9:58 AM, "Christopher"<[email protected]> wrote:
1. I'm not sure I'd call an incomplete solution 'great'. What it does is
provide partial encryption-at-rest protection (unless you're running
without walogs, and have good integration with some external secure key
management facility, and then it's probably fine).
The only thing that doesn't get encrypted is a temporary WAL recovery file.
That is a project we should take on, but it does not imply that the
existing features are not valuable. With HDFS encryption options this would
now be a much easier project to take on. Also, the users I know that use
encryption at rest do so with a more secure key store than the default.
2. I'm concerned that anybody using Accumulo's E-A-R doesn't necessarily
realize its current shortcomings, or its lack of upstream maintenance
support (which it has not been receiving). It may be the case that these
users have support from an intermediary, and do understand the
shortcomings... I don't know, but it's a concern.
Anybody that creates a secure system has to analyze the security of the
system as a whole. Accumulo's encryption at rest is one part of the
solution. Taking away the tool without providing an alternative does
nothing to improve the security of systems built on Accumulo.
3. Correction: it has been an explicitly experimental feature and an
incomplete one, which hasn't really been touched in two years, and has been
explicitly excluded by the community from being public API because of its
incompleteness. Age doesn't determine public API status. The community
does.
People are using it, so we have to consider the implications of whatever
changes we make and weigh against the benefits. I believe the last bug fix
was done this year, so I would argue it is being maintained. Changes to our
encryption at rest implementation will have consequences for those users.
There had better be a clear benefit if we break their systems.
4. Has Accumulo's been evaluated for security and performance? By whom? Is
it published?
Yes, there have been several talks at meetups and conferences that discuss
the security and performance of the current solution.
On Sun, Nov 1, 2015, 08:55 Adam Fuchs<[email protected]> wrote:
There's another way to look at the state of Accumulo's encryption at rest:
1. Encryption at rest works great for what it does, and the code being "at
rest" isn't necessarily a problem
2. Several organizations are using Accumulo's encryption at rest
effectively in operations
3. Encryption at rest has been a supported configuration option for over
two years with established plugin interfaces, and therefore it should be
considered part of the public API
4. Upstream alternatives (to my knowledge) have not been analyzed for
performance or security
The given option #2 would at least require an analysis of alternatives, and
we would have to decide what to do about backwards compatibility for users
using custom key stores and encryption strategies that may or may not be
supported by upstream alternatives.
As far as option #1 goes, I can get behind encouraging people to take up
projects to improve Accumulo's encryption. I think we're already going down
this path, but without having identified resources to do the improvements.
Any volunteers?
Adam
On Fri, Oct 30, 2015 at 4:22 PM, William Slacum<[email protected]> wrote:
So I've been looking into options for providing encryption at rest, and it
seems like what Accumulo has is abandonware from a project perspective.
There is no official documentation on how to perform encryption at rest,
and the best information on its status comes from ticket comments, a year
or more old, about how the feature is still experimental. Recently there
was a talk that described using HDFS encryption zones as an alternative.
From my perspective, this is what I see as the current situation:
1- Encryption at rest in Accumulo isn't actively being worked on
2- Encryption at rest in Accumulo isn't part of the public API or marketed
capabilities
3- Documentation for what does exist is scattered throughout Jira comments
or presentations
4- A viable alternative exists that appears to have feature parity in HDFS
encryption
5- HBase has finer grained encryption capabilities that extend beyond what
HDFS provides
Moving forward, what's the consensus for supporting this feature?
Personally, I see two options:
1- Start going down a path to bring the feature into the forefront and
start providing feature parity with HBase
or
2- Remove the feature and place emphasis on upstream encryption offerings
Any input is welcomed & appreciated!