Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
Thanks, I tried briefly but maybe I didn't do quite the right search. In
any case, thanks for the help.


Saad


On Mon, Mar 19, 2018 at 2:50 PM, Huaxiang Sun  wrote:

> You can google search the exception stack and mostly it will find the JIRA.
>
> Regards,
>
> Huaxiang
>
> > On Mar 19, 2018, at 10:52 AM, Saad Mufti  wrote:
> >
> > Thanks!!! Wish that was documented somewhere in the manual.
> >
> > Cheers.
> >
> > 
> > Saad
> >
> >
> > On Mon, Mar 19, 2018 at 1:38 PM, Huaxiang Sun  wrote:
> >
> >> Mostly it is due to HBASE-15430 <https://issues.apache.org/jira/browse/HBASE-15430>;
> >> “snapshot.manifest.size.limit” needs to be configured as 64MB or 128MB.
> >>
> >> Regards,
> >>
> >> Huaxiang Sun
> >>
> >>
> >>> On Mar 19, 2018, at 10:16 AM, Saad Mufti  wrote:
> >>>
> >>> [original question and stack trace quoted in full; trimmed; see the
> >>> original post later in this digest]

Re: HBase master doesn't start from command line

2018-03-19 Thread Dima Spivak
Hey Muni,

This is probably a better question for Cloudera support, especially if you
happen to be using Cloudera Manager, where the normal command-line
functionality may not apply.


-Dima

On Mon, Mar 19, 2018 at 10:41 AM, Muni Adusumalli  wrote:

> Hi,
>
> We have a cloudera cluster running in production.
> One of the hosts went down as it ran out of disk space when running hbase
> backups.
> After that master is marked as Busy on cloudera.
> cant restart the master using command line.
> All the region servers are up, but the master is missing.
>
> Any inputs on what to look for?
>
> Regards,
> RekDev
>


Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Huaxiang Sun
You can do a Google search for the exception stack trace; it will usually turn up the relevant JIRA.

Regards,

Huaxiang

> On Mar 19, 2018, at 10:52 AM, Saad Mufti  wrote:
> 
> Thanks!!! Wish that was documented somewhere in the manual.
> 
> Cheers.
> 
> 
> Saad
> 
> 
> On Mon, Mar 19, 2018 at 1:38 PM, Huaxiang Sun  wrote:
> 
>> Mostly it is due to HBASE-15430 <https://issues.apache.org/jira/browse/HBASE-15430>;
>> “snapshot.manifest.size.limit” needs to be configured as 64MB or 128MB.
>> 
>> Regards,
>> 
>> Huaxiang Sun
>> 
>> 
>>> On Mar 19, 2018, at 10:16 AM, Saad Mufti  wrote:
>>> 
>>> [original question and stack trace quoted in full; trimmed; see the
>>> original post later in this digest]

HBase master doesn't start from command line

2018-03-19 Thread Muni Adusumalli
Hi,

We have a Cloudera cluster running in production.
One of the hosts went down when it ran out of disk space while running HBase
backups.
After that, the master is marked as Busy in Cloudera Manager.
We can't restart the master from the command line.
All the region servers are up, but the master is missing.

Any inputs on what to look for?

Regards,
RekDev


Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
Thanks!!! Wish that was documented somewhere in the manual.

Cheers.


Saad


On Mon, Mar 19, 2018 at 1:38 PM, Huaxiang Sun  wrote:

> Mostly it is due to HBASE-15430 <https://issues.apache.org/jira/browse/HBASE-15430>;
> “snapshot.manifest.size.limit” needs to be configured as 64MB or 128MB.
>
> Regards,
>
> Huaxiang Sun
>
>
> > On Mar 19, 2018, at 10:16 AM, Saad Mufti  wrote:
> >
> > [original question and stack trace quoted in full; trimmed; see the
> > original post later in this digest]

Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Huaxiang Sun
Mostly it is due to HBASE-15430 <https://issues.apache.org/jira/browse/HBASE-15430>;
“snapshot.manifest.size.limit” needs to be configured as 64MB or 128MB.
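A sketch of how that setting might look in hbase-site.xml on the HMaster; the 128 MB value (134217728 bytes) is illustrative, pick a limit larger than your actual manifest:

```xml
<!-- hbase-site.xml (HMaster). 134217728 bytes = 128 MB; illustrative value. -->
<property>
  <name>snapshot.manifest.size.limit</name>
  <value>134217728</value>
</property>
```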

Regards,

Huaxiang Sun


> On Mar 19, 2018, at 10:16 AM, Saad Mufti  wrote:
> 
> [original question and stack trace quoted in full; trimmed; see the
> original post later in this digest]

CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
Hi,

We are running on HBase 1.4.0 on an AWS EMR/HBase cluster.

We have started seeing the following stacktrace when trying to take a
snapshot of a table with a very large number of files (12000 regions and
roughly 36 - 40 files). The number of files should go down, since we
haven't been compacting for a while for other operational reasons and are
now running compaction. But I'd like to understand why our snapshots are
failing with the following:

2018-03-19 16:05:56,948 ERROR
> [MASTER_TABLE_OPERATIONS-ip-10-194-208-6:16000-0]
> snapshot.TakeSnapshotHandler: Failed taking snapshot {
> ss=pgs-device.03-19-2018-15 table=pgs-device type=SKIPFLUSH } due to
> exception:unable to parse data manifest Protocol message was too large.  May
> be malicious.  Use CodedInputStream.setSizeLimit() to increase the size
> limit.
>
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: unable to
> parse data manifest Protocol message was too large.  May be malicious.  Use
> CodedInputStream.setSizeLimit() to increase the size limit.
>
> at
> org.apache.hadoop.hbase.snapshot.SnapshotManifest.readDataManifest(SnapshotManifest.java:468)
>
> at
> org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:297)
>
> at
> org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:129)
>
> at
> org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:108)
>
> at
> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:203)
>
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message was too large.  May be malicious.  Use
> CodedInputStream.setSizeLimit() to increase the size limit.
>
> at
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>
> at
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>
> at
> com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
>
> at
> com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$StoreFile.(SnapshotProtos.java:1313)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$StoreFile.(SnapshotProtos.java:1263)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$StoreFile$1.parsePartialFrom(SnapshotProtos.java:1364)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$StoreFile$1.parsePartialFrom(SnapshotProtos.java:1359)
>
> at
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$FamilyFiles.(SnapshotProtos.java:2161)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$FamilyFiles.(SnapshotProtos.java:2103)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$FamilyFiles$1.parsePartialFrom(SnapshotProtos.java:2197)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$FamilyFiles$1.parsePartialFrom(SnapshotProtos.java:2192)
>
> at
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest.(SnapshotProtos.java:1165)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest.(SnapshotProtos.java:1094)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$1.parsePartialFrom(SnapshotProtos.java:1201)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotRegionManifest$1.parsePartialFrom(SnapshotProtos.java:1196)
>
> at
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotDataManifest.(SnapshotProtos.java:3858)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotDataManifest.(SnapshotProtos.java:3792)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotDataManifest$1.parsePartialFrom(SnapshotProtos.java:3894)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos$SnapshotDataManifest$1.parsePartialFrom(SnapshotP
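Whatever the exact file count, a back-of-envelope estimate shows why a manifest for a table of this size can blow past protobuf's 64 MiB default read limit. The per-region file count and per-entry byte size below are assumed, illustrative figures, not HBase internals:

```python
# Back-of-envelope: estimated size of a snapshot data manifest for a
# table with ~12,000 regions. FILES_PER_REGION and
# BYTES_PER_STOREFILE_ENTRY are assumptions for illustration only.
REGIONS = 12_000
FILES_PER_REGION = 30              # assumed average
BYTES_PER_STOREFILE_ENTRY = 250    # assumed serialized size per StoreFile message

manifest_bytes = REGIONS * FILES_PER_REGION * BYTES_PER_STOREFILE_ENTRY
default_limit = 64 * 1024 * 1024   # CodedInputStream's default size limit

print(f"estimated manifest size: {manifest_bytes / (1024 * 1024):.0f} MiB")
print(f"exceeds default limit: {manifest_bytes > default_limit}")
```

Under these assumptions the manifest comes out well over 64 MiB, which matches the "Protocol message was too large" failure above.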

Re: Scan problem

2018-03-19 Thread Saad Mufti
Another option, if you have enough disk space/off-heap memory, is to enable
the bucket cache to cache even more of your data, and set the
PREFETCH_ON_OPEN => true option on the column families you want always
cached. That way HBase will prefetch your data into the bucket cache and
your scan won't hit that initial slowdown. If you want to do it
globally for all column families, set the configuration flag
"hbase.rs.prefetchblocksonopen" to "true". Keep in mind, though, that if you
do this you should have enough bucket cache space for all your data;
otherwise there will be a lot of useless eviction activity at HBase
startup and even later.
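If going the global route, a sketch of the hbase-site.xml setting on each RegionServer, assuming the flag name cited above:

```xml
<!-- hbase-site.xml (RegionServers): prefetch blocks into the cache when
     store files are opened, for all column families. -->
<property>
  <name>hbase.rs.prefetchblocksonopen</name>
  <value>true</value>
</property>
```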

Also, where a region is located is heavily influenced by which region
balancer you have chosen and how you have tuned it, in terms of how often
it runs and its other parameters. A split region will initially stay on
the same region server, but the balancer, if and when it runs, can move it
(and indeed any region) elsewhere to satisfy its criteria.

Cheers.


Saad


On Mon, Mar 19, 2018 at 1:14 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Hi
>
> First regarding the scans,
>
> Generally the data resides in the store files which is in HDFS. So probably
> the first scan that you are doing is reading from HDFS which involves disk
> reads. Once the blocks are read, they are cached in the Block cache of
> HBase. So your further reads go through that and hence you see further
> speed up in the scans.
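The read pattern described above can be sketched with a toy cache model (illustrative only; the names and the dict-based cache are not HBase internals):

```python
# Toy model of the read path described above: the first access to a block
# goes to "HDFS" (slow disk read); later accesses hit the block cache.
block_cache = {}

def read_block(block_id):
    """Return (data, source): source is 'hdfs' on a miss, 'cache' on a hit."""
    if block_id in block_cache:
        return block_cache[block_id], "cache"
    data = f"data-{block_id}"      # stands in for reading the HFile block from HDFS
    block_cache[block_id] = data   # populate the block cache for later scans
    return data, "hdfs"

_, first = read_block("b1")    # first scan: miss, reads from "HDFS"
_, second = read_block("b1")   # later scans: served from the block cache
print(first, second)           # hdfs cache
```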
>
> >> And another question about region split: I want to know which
> >> RegionServer will load the new region after the split. Will it be the
> >> same one that hosted the old region?
> Yes. Generally the same region server hosts it.
>
> In master, the code is here:
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
>
> You may need to understand the entire flow to know how the regions are
> opened after a split.
>
> Regards
> Ram
>
> On Sat, Mar 17, 2018 at 9:02 PM, Yang Zhang 
> wrote:
>
> > Hello everyone
> >
> > I run many Scans using a RegionScanner in a coprocessor, and every
> > time the first Scan costs about 10 times as much as the others.
> > I don't know why this happens:
> >
> > OneBucket Scan cost is : 8794 ms Num is : 710
> > OneBucket Scan cost is : 91 ms Num is : 776
> > OneBucket Scan cost is : 87 ms Num is : 808
> > OneBucket Scan cost is : 105 ms Num is : 748
> > OneBucket Scan cost is : 68 ms Num is : 200
> >
> >
> > And another question about region split: I want to know which
> > RegionServer will load the new region after the split. Will it be the
> > same one that hosted the old region? Does anyone know where I can find
> > the code to learn about that?
> >
> >
> > Thanks for your help
> >
>