Yeah, something along those lines, but I doubt the problem is on the RS
side or in the communication between the master and the RSs.

In theory the problem may be the verification step, where the master
checks the snapshot. I was just trying to figure out where the time is
being spent, and "30 minutes to snapshot" does not sound right to me,
because the snapshot phase where each RS takes a manifest should not take
that long.
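
For reference, a rough back-of-the-envelope check of the numbers quoted in
the thread below. It is only a sketch: the inputs are the figures mentioned
in this thread (60k regions, 1200 RSs, 4 column families, ~30 minutes), the
variable names are made up for illustration, and it assumes files are
created at a uniform rate, which a loaded NN would not guarantee.

```python
# Figures quoted in this thread (approximate).
regions = 60_000
region_servers = 1_200
column_families = 4
snapshot_seconds = 30 * 60

# One manifest per region, spread evenly across the region servers.
manifests_per_rs = regions / region_servers            # 50.0

# Actual layout: one manifest per region.
per_region_rate_per_sec = regions / snapshot_seconds   # ~33.3 files/s

# Hypothetical layout: one file per region per column family.
files_per_family_layout = regions * column_families    # 240000
per_family_rate_per_sec = files_per_family_layout / snapshot_seconds  # ~133.3 files/s

print(f"manifests per RS:          {manifests_per_rs:.0f}")
print(f"per-region layout, files/s: {per_region_rate_per_sec:.1f}")
print(f"per-family layout, files/s: {per_family_rate_per_sec:.1f}")
```

Either way, the per-RS share is only about 50 small files, which is why the
manifest phase alone does not look like it can explain 30 minutes.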

Matteo


On Fri, Jul 10, 2015 at 6:04 PM, Vladimir Rodionov <[email protected]>
wrote:

> Matteo, there should be some explanation for a 30 min SKIP_FLUSH snapshot.
> I think it should be somewhere in NN/HDFS. This is a huge cluster and the
> NN load is extreme; it probably does not scale well with # of DNs and # of
> files per directory. I presume that NN performance on file operations
> degrades when the # of DNs and/or directory sizes increase.
>
> -Vlad
>
> On Fri, Jul 10, 2015 at 5:29 PM, Matteo Bertozzi <[email protected]>
> wrote:
>
> > Manifest per region, not per family.
> > We couldn't send them back to the master/table, to keep compatibility.
> > 60k regions on 1200 RSs is ~50 manifests per RS; that alone should not
> > take 30 sec.
> >
> >
> > On Fri, Jul 10, 2015 at 5:21 PM, Vladimir Rodionov <[email protected]>
> > wrote:
> >
> > > OK, even with 1 manifest file per region (per column family?) - 60K x 4
> > > = 240,000 new files.
> > > Over 30 minutes that is 8000 per minute, ~133 per second. That is
> > > probably the NN limit.
> > >
> > > Anyway, the root cause is the same as with reference files during
> > > region split:
> > >
> > > HDFS does not do well on file create/open/close/delete.
> > >
> > > -Vlad
> > >
> > > On Fri, Jul 10, 2015 at 5:09 PM, Matteo Bertozzi <[email protected]>
> > > wrote:
> > >
> > > > @Vladimir there is no HFileLink creation on snapshot. We create 1
> > > > manifest per region.
> > > >
> > > > Matteo
> > > >
> > > >
> > > > On Fri, Jul 10, 2015 at 5:06 PM, Vladimir Rodionov <[email protected]>
> > > > wrote:
> > > >
> > > > > Not being very familiar with the snapshot code, I can only
> > > > > speculate on where most of the time is spent ...
> > > > >
> > > > > In creating 60K x 4 x K (K is the average # of store files per
> > > > > region) small HFileLinks? This can be a very large # of files.
> > > > >
> > > > > -Vlad
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 10, 2015 at 4:57 PM, Matteo Bertozzi <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > The total time taken by a snapshot should be bounded by the
> > > > > > slowest machine.
> > > > > > We send a notification to each RS, and each RS executes the
> > > > > > snapshot operation for each region.
> > > > > > Can you track down what is slow in your case?
> > > > > >
> > > > > > Clone has to create a reference for each file, and that is a
> > > > > > master operation. These calls may all go away if we change the
> > > > > > layout in a proper way instead of doing what is proposed in
> > > > > > HBASE-13991.
> > > > > > Most of the time should be spent in the enableTable phase of the
> > > > > > clone.
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 4:36 PM, Jean-Marc Spaggiari <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Hi Rahul,
> > > > > > >
> > > > > > > Have you identified what takes those 30 minutes? Is the table
> > > > > > > balanced correctly across the servers? From the logs, are you
> > > > > > > able to identify what takes that much time?
> > > > > > >
> > > > > > > JM
> > > > > > >
> > > > > > > 2015-07-10 18:46 GMT-04:00 rahul gidwani <[email protected]>:
> > > > > > >
> > > > > > > > Hi Matteo,
> > > > > > > >
> > > > > > > > We do SKIP_FLUSH.  We have 1200+ regionservers with a single
> > > > > > > > table with 60k regions and 4 column families.  It takes
> > > > > > > > around 30 minutes to snapshot this table using manifests
> > > > > > > > compared to just seconds doing this with hdfs.
> > > > > > > > Cloning this table takes considerably longer.
> > > > > > > >
> > > > > > > > For cases where someone would want to run Map/Reduce over
> > > > > > > > snapshots this could be much faster, as we could take an hdfs
> > > > > > > > snapshot and bypass the clone.
> > > > > > > >
> > > > > > > > rahul
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Thu, Jul 9, 2015 at 12:20 PM, Matteo Bertozzi <[email protected]>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables
> > > > > > > > > > > it can take hours to Snapshot and Clone a table.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > At snapshot time, the only thing that can take hours is
> > > > > > > > > > "flush".
> > > > > > > > > > If you don't need that (which is what you get with hdfs
> > > > > > > > > > snapshots) you can specify SKIP_FLUSH => true
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Matteo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > HBase snapshots are a very useful feature, but they were
> > > > > > > > > > > implemented back before there was the ability to
> > > > > > > > > > > snapshot via HDFS.
> > > > > > > > > >
> > > > > > > > > > > Newer versions of Hadoop support HDFS snapshots.  I was
> > > > > > > > > > > wondering if the community would be interested in
> > > > > > > > > > > something like a Snapshot V3 where we use HDFS to take
> > > > > > > > > > > these snapshots.
> > > > > > > > > >
> > > > > > > > > > > Even with manifests (Snapshot V2) for our larger tables
> > > > > > > > > > > it can take hours to Snapshot and Clone a table.
> > > > > > > > > >
> > > > > > > > > > Would this feature be of use to anyone?
> > > > > > > > > >
> > > > > > > > > > thanks
> > > > > > > > > > rahul
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>