Jeff, that's exactly what the table export feature does today. The thing to
consider here is whether we should automatically do that periodically.
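
For the narrower version you describe below (just the split points, not the
full export metadata), the existing client API would already be enough for a
small external tool. A rough sketch only; the class and method names are made
up for illustration:

import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class SplitsFile {

  // Dump a table's current split points to a local file, one per line.
  // Assumes the split points are printable text; binary splits would need
  // base64 encoding to round-trip safely.
  static void save(Connector conn, String table, String file) throws Exception {
    try (PrintWriter out = new PrintWriter(file, "UTF-8")) {
      for (Text split : conn.tableOperations().listSplits(table)) {
        out.println(split);
      }
    }
  }

  // Re-apply saved split points to a freshly created table before moving the
  // recovered RFiles back in.
  static void restore(Connector conn, String table, String file) throws Exception {
    TreeSet<Text> splits = new TreeSet<>();
    for (String line : Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)) {
      splits.add(new Text(line));
    }
    conn.tableOperations().addSplits(table, splits);
  }
}

(If memory serves, the shell's getsplits and addsplits commands can do much
the same thing interactively.)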

To be clear, I'm not convinced myself that it's a good idea to
automatically do it.
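
Similarly, anyone who wants the periodic behavior today can already get it
from outside the Master with a small scheduled job driving exportTable()
through the public API. Another rough sketch; the instance name, ZooKeepers,
credentials, export path, and hourly interval are all placeholders:

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class PeriodicTableExporter {

  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeepers, and credentials.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("backup", new PasswordToken("secret"));

    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
      try {
        for (String table : conn.tableOperations().list()) {
          if (table.startsWith("accumulo.")) {
            continue; // skip system tables
          }
          String dir = "/accumulo-export/" + table + "/" + System.currentTimeMillis();
          // exportTable requires the table to be offline, which is the main
          // operational cost of running this on a schedule.
          conn.tableOperations().offline(table, true);
          conn.tableOperations().exportTable(table, dir);
          conn.tableOperations().online(table, true);
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 1, TimeUnit.HOURS);
  }
}

The need to take each table offline for the export is probably the biggest
argument against just running this blindly on a timer.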

On Fri, Jul 14, 2017 at 2:45 PM Jeff Kubina <[email protected]> wrote:

> Wouldn't it be better to have a utility method that reads all the splits
> from the table's rfiles and outputs them to a file? We could then use the
> file to recreate the table with the pre-existing splits.
>
> --
> Jeff Kubina
> 410-988-4436
>
>
> On Fri, Jul 14, 2017 at 2:26 PM, Sean Busbey <[email protected]> wrote:
>
> > This could also be useful for botched upgrades
> > (should we change stuff in meta again).
> >
> > Don't we already default the block replication for the meta tables to
> > something very high? Aren't the exported-to-HDFS files just as subject to
> > block corruption, or more so if they use default replication?
> >
> > I think if we automate something like this, to Mike's point about set &
> > pray, we'd have to also build in automated periodic checks on whether the
> > stored information is useful, so that operators can be alerted.
> >
> > Can we sketch what testing looks like?
> >
> > Christopher, can you get some estimates on what kind of volume we're
> > talking about here? Seems like it'd be small.
> >
> > On Fri, Jul 14, 2017 at 1:07 PM, Christopher <[email protected]> wrote:
> > > The problem is corrupt HDFS blocks that affect the metadata tables. I
> > > don't know that this window is all that narrow. I've seen corrupt blocks
> > > far more often than HDFS outages, some due to HDFS bugs, some due to
> > > hardware failures and too few replicas, etc. We know how to recover
> > > corrupt blocks in user tables (accepting data loss) by essentially
> > > replacing a corrupt file with an empty one. But we don't really have a
> > > good way to recover when the corrupt blocks occur in the metadata
> > > tables. That's what this would address.
> > >
> > > On Fri, Jul 14, 2017 at 1:47 PM Mike Drob <[email protected]> wrote:
> > >
> > >> What's the risk that we are trying to address?
> > >>
> > >> Storing data locally won't help in case of a namenode failure. If you
> > >> have a failure that's severe enough to actually kill blocks but not so
> > >> severe that HDFS itself goes down, that's a pretty narrow window.
> > >>
> > >> How do you test that your backups are good? That you haven't lost any
> > >> data there? Or is it set and forget (and pray)?
> > >>
> > >> This seems like something that is not worthwhile to automate, because
> > >> everybody is going to have such different needs. Write a blog post,
> > >> then push people onto existing backup/disaster-recovery solutions,
> > >> including off-site storage, etc. If they're not already convinced that
> > >> they need this, then their data likely isn't that valuable to begin
> > >> with. And if this same problem happens multiple times to the same
> > >> user... I don't think a periodic table export will help them.
> > >>
> > >> Mike
> > >>
> > >> On Fri, Jul 14, 2017 at 12:29 PM, Christopher <[email protected]> wrote:
> > >>
> > >> > I saw a user running a very old version of Accumulo run into a
> > >> > pretty severe failure, where they lost an HDFS block containing part
> > >> > of their root tablet. This, of course, will cause a ton of problems.
> > >> > Without the root tablet, you can't recover the metadata table, and
> > >> > without that, you can't recover your user tables.
> > >> >
> > >> > Now, you can recover the RFiles, of course... but without knowing
> > >> > the split points, you can run into all sorts of problems trying to
> > >> > restore an Accumulo instance from just these RFiles.
> > >> >
> > >> > We have an export table feature which creates a snapshot of the
> > >> > split points for a table, allowing a user to relatively easily
> > >> > recover from a serious failure, provided the RFiles are available.
> > >> > However, that requires a user to manually run it on occasion, which
> > >> > of course does not happen by default.
> > >> >
> > >> > I'm interested to know what people think about possibly doing
> > >> > something like this internally on a regular basis. Maybe hourly by
> > >> > default, performed by the Master for all user tables, and saved to a
> > >> > file in /accumulo on HDFS?
> > >> >
> > >> > The closest thing I can think of to this, which has saved me more
> > >> > than once, is the way Chrome and Firefox back up open tabs and
> > >> > bookmarks regularly, to restore after a crash.
> > >> >
> > >> > Users could already be doing this on their own, so it's not really
> > >> > necessary to bake it in... but as we all probably know... people are
> > >> > really bad at customizing away from defaults.
> > >> >
> > >> > What are some of the issues and trade-offs of incorporating this as
> > >> > a default feature? What are some of the issues we'd have to address
> > >> > with it? What would its configuration look like? Should it be on by
> > >> > default?
> > >> >
> > >> > Perhaps a simple blog post describing a custom user service running
> > >> > alongside Accumulo which periodically runs "export table" would
> > >> > suffice? (This is what I'm leaning towards, but the idea of making it
> > >> > the default is compelling, given the number of times I've seen users
> > >> > struggle to plan for or respond to catastrophic failures, especially
> > >> > at the storage layer.)
> > >> >
> > >>
> >
> >
> >
> > --
> > busbey
> >
>
