On 11/14/17 3:04 PM, Mike Drob wrote:
> I don't think the second part of my email ever got addressed.
> I see "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
> and claims that it will be implemented in the client, both of which make me
> uncomfortable. Security Later is a general bad practice, and it is very
> rarely correct to rely on client-side security for anything.
> Is there another issue that covers security? Do we rely completely on
> HDFS security here for more than just the DistCP? What kind of testing has
> been done with security, do we have assurances that the backups aren't
> accidentally exposing tables to the world?

"Security" as you phrase it is pretty open-ended, no? The current security
model is based around the filesystem permissions and the enforcement of
an HBase superuser to execute the necessary service operations behind
the BackupAdmin "facade" (e.g. WAL roll procedure execution, snapshot
creation, snapshot restore, update hbase:backup are the HBase client
actions actually being performed). That's the state of what it is right
now and, yes, it does rely on the filesystems that backups are sent to
(e.g. HDFS, S3, Isilon, WASB) being properly secured. We certainly don't
want to be testing correctness of those systems in HBase.

I can see adding a small section to the documentation update I've
already been hacking on that covers the issue "We can't help you secure
where you put the data". Given how many instances of "globally readable
S3 bucket" I've seen recently, this strikes me as prudent.

The final issue then is about the backup containing other tables' data
-- somehow a backup would reference data from a table other than the one
the admin intended to access. For full backups, this is out of scope
(the full backup is relying on Snapshots -- we shouldn't be testing
correctness of Snapshots via B&R). For incremental backups, specifically
when we're filtering WALs, this is a concern. Thankfully, it's an
analogous problem to "correctness". We have unit test coverage in this
area already, and we should get good coverage in the upcoming
integration test.
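
To make the incremental case concrete, here is a minimal sketch in
Python of the property those tests need to pin down. It is not the
actual B&R code: WalEntry and filter_wal_entries are hypothetical names,
and a WAL edit is reduced to a (table, row, value) tuple purely for
illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class WalEntry(NamedTuple):
    """Hypothetical stand-in for one WAL edit; not an HBase class."""
    table: str
    row: bytes
    value: bytes


def filter_wal_entries(
    entries: Iterable[WalEntry], backup_tables: Set[str]
) -> Iterator[WalEntry]:
    """Yield only the edits that belong to a table in the backup set."""
    for entry in entries:
        if entry.table in backup_tables:
            yield entry


# A shared WAL holds edits for several tables; only "orders" is backed up.
wal = [
    WalEntry("orders", b"r1", b"a"),
    WalEntry("secrets", b"r1", b"b"),
    WalEntry("orders", b"r2", b"c"),
]
kept = list(filter_wal_entries(wal, {"orders"}))

# The property under test: nothing from outside the backup set survives.
assert all(entry.table == "orders" for entry in kept)
```

The check worth asserting is the negative one: no entry from a table
outside the backup set makes it into the output.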

Does that help paint a better picture, Mike? Have I missed or glossed
over any points?