Hi Rahul,

Have you identified what takes those 30 minutes? Is the table balanced
correctly across the servers? From the logs, are you able to identify what
takes that much time?

JM

2015-07-10 18:46 GMT-04:00 rahul gidwani <[email protected]>:

> Hi Matteo,
>
> We do SKIP_FLUSH.  We have 1200+ regionservers with a single table with 60k
> regions and 4 column families.  It takes around 30 minutes to snapshot this
> table using manifests compared to just seconds doing this with hdfs.
> Cloning this table takes considerably longer.
>
> For cases where someone would want to run Map/Reduce over snapshots this
> could be much faster as we could take an hdfs snapshot and bypass the
> clone.
>
> rahul
>
>
> On Thu, Jul 9, 2015 at 12:20 PM, Matteo Bertozzi <[email protected]>
> wrote:
>
> > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <[email protected]>
> >  wrote:
> >
> > > Even with manifests (Snapshot V2) for our larger tables it can take hours
> > > to Snapshot and Clone a table.
> > >
> >
> > At snapshot time, the only thing that can take hours is "flush".
> > If you don't need that (which is what you get with hdfs snapshots), you can
> > specify SKIP_FLUSH => true
> >
> >
> > Matteo
> >
> >
> > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <[email protected]>
> > wrote:
> >
> > > HBase snapshots are a very useful feature, but they were implemented back
> > > before there was the ability to snapshot via HDFS.
> > >
> > > Newer versions of Hadoop support HDFS snapshots.  I was wondering if the
> > > community would be interested in something like a Snapshot V3 where we use
> > > HDFS to take these snapshots.
> > >
> > > Even with manifests (Snapshot V2) for our larger tables it can take hours
> > > to Snapshot and Clone a table.
> > >
> > > Would this feature be of use to anyone?
> > >
> > > thanks
> > > rahul
> > >
> >
>
