Bringing this thread up again, because I don't really know where else to ask...
The current backup & restore solution snapshots the backup metadata table
and will restore via snapshot if something goes wrong. (Or is this still in
a patch? It's unclear to me whether this has been committed, since there is
a ton of code to dig through.) AFAICT this is the major reason that we do
not support concurrent backup or restore operations. (Are there others? I
couldn't find any.)

The fault tolerance that we're working on now will need to be gutted and
completely rewritten for the future improvements. I get that this is all
internal, and as long as we make it seamless for operators we have wide
latitude to make our own changes. But an important question is: just
because we can, should we? I'm concerned that we're writing code that we
know will be thrown away and replaced, yet we will have to continue to
support it for as long as 2.0 is an active branch.

Mike

On Wed, Nov 15, 2017 at 3:05 PM, Josh Elser <[email protected]> wrote:

> On 11/14/17 4:54 PM, Mike Drob wrote:
>
>>> I can see a small section on the documentation update I've already been
>>> hacking on to include details on the issue "We can't help you secure
>>> where you put the data". Given how many instances of "globally readable
>>> S3 bucket" I've seen recently, this strikes me as prudent.
>>
>> I would prefer this to be a giant, hard-to-miss, red-letters, all-caps
>> warning, not a small section. I do think it is our responsibility to
>> tell users how to configure the backup/restore process for communicating
>> with secure systems. Or, at a minimum, to document how we pass arbitrary
>> configuration options that can then be used to communicate with said
>> systems.
>
> :D
>
>> For example, if we support writing backups to S3, then we should have a
>> way to specify an auth string and maybe even some of the custom headers
>> like x-amz-acl. We don't have to explicitly enumerate best practices,
>> but if the only option is to write to a globally open bucket, then I
>> don't think we should advertise writing to S3 as an available option.
>>
>> Similarly, if we tell people that they can send backups to HDFS, then we
>> should give them the hooks to correctly interface with a kerberized HDFS.
>>
>> Maybe this is already in the proposed patch; I haven't gone looking yet.
>
> Nope. I actually meant to include this in the patch I re-rolled today but
> forgot. Let me update once more.
>
> Thanks again, Mike. Good questions/feedback!
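[Editor's note, not part of the original thread: as an illustration of the
"pass arbitrary configuration options" idea discussed above, here is a
sketch of what an operator-supplied configuration could look like if the
backup destination is reached through a Hadoop FileSystem. The property
names are standard Hadoop/S3A settings; whether the backup tool actually
passes such a fragment through is an assumption, not something the thread
confirms.]

```
<!-- Hypothetical pass-through configuration for a backup destination. -->
<configuration>
  <!-- S3A credentials, so backups need not land in an open bucket. -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>AKIA...</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
  </property>
  <!-- Canned ACL for objects S3A creates; the S3A analogue of sending
       an x-amz-acl header. "Private" avoids globally readable backups. -->
  <property>
    <name>fs.s3a.acl.default</name>
    <value>Private</value>
  </property>
  <!-- For a kerberized HDFS target, core Hadoop security must be on. -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```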
