Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread Stack
On Mon, May 22, 2017 at 9:31 AM, jeff saremi  wrote:

> while I'm still trying to find anything useful in the logs, my question is
> why isn't HBase self managing this?
>

It should do better here, yes (I thought TooBusy retried, but I am not
finding it at the mo.). The exception is thrown for reasons such as those
James lists -- in essence, running out of resources -- including the case
where we fail to obtain a lock within the configured timeouts (a row lock on
write, or the region lock during a bulk load). As James notes, you should see
the too-busy condition dumped into the regionserver log at the time of the
issue. With that, you can figure out which resource is constrained. Is there
no more detail on the client side about the root of the TooBusy exceptions?


Thanks,
S
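Stack frames the exception as a temporary shortage (a full memstore, a lock timeout), so the natural client reaction to Jeff's "why isn't HBase self managing this?" is to back off and retry. The HBase client has its own retry settings (`hbase.client.retries.number`, `hbase.client.pause`), and whether they apply to TooBusy is exactly what Stack was unsure of, so the sketch below is plain Java illustrating capped exponential backoff only. `TooBusyException` and `withBackoff` are hypothetical names, not HBase API.

```java
import java.util.concurrent.Callable;

final class Backoff {
    /** Hypothetical stand-in for a "server temporarily overloaded" error. */
    static final class TooBusyException extends Exception {
        TooBusyException(String msg) { super(msg); }
    }

    /** Delay before retry number attempt (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long d = baseMillis << Math.min(attempt, 30); // cap the shift to avoid overflow
        return Math.min(d, maxMillis);
    }

    /** Runs op, retrying only on TooBusyException, for at most maxAttempts attempts in total. */
    static <T> T withBackoff(Callable<T> op, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (TooBusyException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and surface the last error
                }
                Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new TooBusyException("Above memstore limit");
            }
            return "ok";
        }, 5, 10, 1_000);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

Backing off rather than retrying immediately gives the region time to flush; hammering it keeps the memstore above its limit.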



>
> 
> From: jeff saremi 
> Sent: Friday, May 19, 2017 8:18:59 PM
> To: user@hbase.apache.org
> Subject: Re: What is the cause for RegionTooBusyException?
>
> Thanks Ted. I will look deeper as you suggested
>
> 
> From: Ted Yu 
> Sent: Friday, May 19, 2017 4:18:12 PM
> To: user@hbase.apache.org
> Subject: Re: What is the cause for RegionTooBusyException?
>
> Have you checked region server log ?
> Please take a look at the following method in HRegion:
>
>   private void checkResources() throws RegionTooBusyException {
>     ...
>     if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {
>       blockedRequestsCount.increment();
>       requestFlush();
>       throw new RegionTooBusyException("Above memstore limit, " +
>           ...
>
> Which hbase release are you using ?
>
> Cheers
>
> On Fri, May 19, 2017 at 3:59 PM, jeff saremi 
> wrote:
>
> > We're getting errors like this. Where should we be looking to solve
> > this?
> >
> >
> > Failed 69261 actions: RegionTooBusyException: 12695 times,
> > RemoteWithExtrasException: 56566 times
> >
> > thanks
> >
> > Jeff
> >
> >
>
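The `checkResources()` excerpt Ted quotes rejects a write once the region's memstore already exceeds `blockingMemStoreSize`. As far as I know, HRegion derives that limit as the per-region flush size (`hbase.hregion.memstore.flush.size`) times `hbase.hregion.memstore.block.multiplier`; the sketch models only that arithmetic and the check-then-write order. The values used (128 MB, multiplier 4) are illustrative assumptions to verify against your release's hbase-default.xml, and `MemstoreGate` is a hypothetical class, not HBase code.

```java
final class MemstoreGate {
    private final long blockingMemStoreSize; // flush size * block multiplier
    private long memstoreDataSize;           // bytes currently buffered in this region's memstore
    private long blockedRequestsCount;       // mirrors blockedRequestsCount in the excerpt
    private long flushRequests;              // stands in for requestFlush()

    MemstoreGate(long flushSizeBytes, long blockMultiplier) {
        this.blockingMemStoreSize = flushSizeBytes * blockMultiplier;
    }

    long blockingSize() { return blockingMemStoreSize; }
    long blocked() { return blockedRequestsCount; }
    long flushRequests() { return flushRequests; }

    /**
     * Same order as the excerpt: check first, then accept the write. Returns false where
     * HRegion would throw RegionTooBusyException, to keep the sketch dependency-free.
     */
    boolean tryWrite(long bytes) {
        if (memstoreDataSize > blockingMemStoreSize) {
            blockedRequestsCount++;
            flushRequests++;
            return false;
        }
        memstoreDataSize += bytes;
        return true;
    }

    /** A finished flush empties the memstore and lets writes through again. */
    void flushed() { memstoreDataSize = 0; }

    public static void main(String[] args) {
        // Illustrative values only: 128 MB flush size, multiplier 4 -> 512 MB blocking limit.
        MemstoreGate region = new MemstoreGate(128L << 20, 4);
        System.out.println(region.tryWrite(600L << 20)); // true: the check runs before the write
        System.out.println(region.tryWrite(1));          // false: now above 512 MB, "too busy"
        region.flushed();
        System.out.println(region.tryWrite(1));          // true again after the flush
    }
}
```

The rejection is back-pressure: writes arrived faster than flushes drained the memstore, which is why the region server log around the time of the error is the place to look.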


Re: region files

2017-05-26 Thread Ted Yu
The HFiles of a region are stored on HDFS. By default, HDFS uses a replication
factor of 3.
If you're not using the read replica feature, any single region is served by
exactly one region server (however, the data blocks of its HFiles may not be on
the same node as that region server).

Cheers

On Thu, May 25, 2017 at 11:45 PM, Rajeshkumar J  wrote:

> Hi,
>
>We have the region max file size set to 10 GB. Are all the HFiles of a region
> on the same region server, or will they be distributed?
>
> Thanks
>
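Ted's point that a region's HFile blocks need not sit on the serving region server can be made concrete with a locality fraction: the share of a region's HFile bytes that have a replica on the host serving the region. HBase reports a similar per-region locality figure, but the class below is a hypothetical, simplified model (`Block`, `locality`, and the host names are illustrative), not HBase's implementation.

```java
import java.util.List;
import java.util.Set;

final class Locality {
    /** One HDFS block of an HFile: its size and the hosts holding a replica of it. */
    static final class Block {
        final long bytes;
        final Set<String> replicaHosts;

        Block(long bytes, Set<String> replicaHosts) {
            this.bytes = bytes;
            this.replicaHosts = replicaHosts;
        }
    }

    /** Fraction of a region's HFile bytes with a replica on serverHost; 1.0 means fully local. */
    static double locality(List<Block> blocks, String serverHost) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.bytes;
            if (b.replicaHosts.contains(serverHost)) {
                local += b.bytes;
            }
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    public static void main(String[] args) {
        // Hypothetical region served by rs1: one block has a local replica, one does not.
        List<Block> blocks = List.of(
                new Block(128L << 20, Set.of("rs1", "dn2", "dn3")),
                new Block(128L << 20, Set.of("dn2", "dn3", "dn4")));
        System.out.println(locality(blocks, "rs1")); // 0.5
    }
}
```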


Re: region files

2017-05-26 Thread Rajeshkumar J
Thanks Ted. If the data blocks of the HFiles may not be on the same node as the
region server, then how is data locality achieved when MapReduce is run over
HBase tables?



Re: region files

2017-05-26 Thread Ted Yu
Consider running a major compaction, which restores data locality.

Thanks
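Why a major compaction helps: it rewrites all of a store's HFiles through the region server that currently hosts the region, and HDFS's default block placement puts the first replica of each newly written block on the writer's own node when the writer is also a DataNode. The sketch models only that effect, assuming the region server is co-located with a DataNode; the classes and host names (`rs1`, `remote-a`, ...) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CompactionLocality {
    /** One HDFS block: its size and the hosts holding a replica. */
    static final class Block {
        final long bytes;
        final Set<String> hosts;

        Block(long bytes, Set<String> hosts) {
            this.bytes = bytes;
            this.hosts = hosts;
        }
    }

    /** Fraction of bytes with a replica on host. */
    static double locality(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.bytes;
            if (b.hosts.contains(host)) local += b.bytes;
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    /**
     * Models a major compaction: the region server on writerHost rewrites every block.
     * ASSUMPTION: the writer runs on a DataNode, so HDFS places the first replica locally.
     * "remote-a"/"remote-b" are placeholders; where other replicas land does not affect locality.
     */
    static List<Block> majorCompact(List<Block> blocks, String writerHost) {
        List<Block> rewritten = new ArrayList<>();
        for (Block b : blocks) {
            Set<String> hosts = new HashSet<>(Arrays.asList(writerHost, "remote-a", "remote-b"));
            rewritten.add(new Block(b.bytes, hosts));
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<Block> before = List.of(
                new Block(100, Set.of("dn2", "dn3", "dn4")), // e.g. written while the region lived elsewhere
                new Block(300, Set.of("rs1", "dn2", "dn3")));
        System.out.println(locality(before, "rs1"));                      // 0.75
        System.out.println(locality(majorCompact(before, "rs1"), "rs1")); // 1.0
    }
}
```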



Re: region files

2017-05-26 Thread Rajeshkumar J
I have seen in the code that, while creating input splits, they also send
region info with the splits. Is there any reason for that, given that not all
the HFiles are going to be on that server?
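As far as I can tell, the region info attached to each split is a placement hint: TableInputFormat creates one split per region and reports the hosting region server's hostname as the split's location, so the MapReduce scheduler prefers to run that map task on the same machine. The map task reads through that region server; whether the server in turn reads its HFile blocks locally depends on HDFS locality, which is what Ted's major compaction suggestion addresses. The greedy scheduler below is a hypothetical simplification, not YARN's algorithm; `Split`, `place`, and the host names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SplitPlacement {
    /** A split for one region, carrying the hosting region server's hostname as a hint. */
    static final class Split {
        final String region;
        final String locationHint;

        Split(String region, String locationHint) {
            this.region = region;
            this.locationHint = locationHint;
        }
    }

    /**
     * Greedy placement: run each split on its hinted host if that host has a free slot,
     * otherwise on any host with a free slot (a non-local task). Returns region -> host.
     */
    static Map<String, String> place(List<Split> splits, Map<String, Integer> freeSlots) {
        Map<String, Integer> slots = new HashMap<>(freeSlots); // don't mutate the caller's map
        Map<String, String> placement = new HashMap<>();
        for (Split s : splits) {
            String host = null;
            if (slots.getOrDefault(s.locationHint, 0) > 0) {
                host = s.locationHint; // local: same machine as the region server
            } else {
                for (Map.Entry<String, Integer> e : slots.entrySet()) {
                    if (e.getValue() > 0) {
                        host = e.getKey();
                        break;
                    }
                }
            }
            if (host == null) {
                throw new IllegalStateException("no free slot for " + s.region);
            }
            slots.merge(host, -1, Integer::sum);
            placement.put(s.region, host);
        }
        return placement;
    }
}
```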



Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread jeff saremi
Hi Stack

No, there are no details in the exception. I mentioned that in another thread.
When you perform a batch operation, I believe no details are communicated.
I am not sure about individual Puts, though. That makes it hard to go through
the logs, because we don't know which of hundreds of RSs' logs to look at.

I have an issue with this exception being thrown, period. I think the resource
management needs a lot of work. I will soon post another note about my
impression of this whole thing.

Jeff



Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread Dima Spivak
Sending this back to the user mailing list.

RegionServers can die for many reasons. Looking at your RegionServer log
files should give hints as to why it's happening.


-Dima

On Fri, May 26, 2017 at 9:48 AM, jeff saremi  wrote:

> I had posted this to the user mailing list and have not gotten any direct
> answer to my question.
>
> Where do dead RSs come from and how can they be cleaned up? Someone among
> the developers should know this.
>
> thanks
>
> Jeff
>
> 
> From: jeff saremi 
> Sent: Thursday, May 25, 2017 10:23:17 AM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> I'm still looking to get hints on how to remove the dead regions. thanks
>
> 
> From: jeff saremi 
> Sent: Wednesday, May 24, 2017 12:27:06 PM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> I'm trying to eliminate the dead region servers.
>
> 
> From: Ted Yu 
> Sent: Wednesday, May 24, 2017 12:17:40 PM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> bq. running hbck (many times
>
> Can you describe the specific inconsistencies you were trying to resolve?
> Depending on the inconsistencies, advice can be given on the best known
> hbck command arguments to use.
>
> Feel free to pastebin master log if needed.
>
> On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> wrote:
>
> > these are the things I have done so far:
> >
> >
> > - restarting master (few times)
> >
> > - running hbck (many times; this tool does not seem to be doing anything
> > at all)
> >
> > - checking the list of region servers in ZK (none of the dead ones are
> > listed here)
> >
> > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > are listed here with "-splitting" at the end of their names and they
> > contain one single file like: 1493846660401..meta.1493922323600.meta
> >
> >
> >
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > To: user@hbase.apache.org
> > Subject: What is Dead Region Servers and how to clear them up?
> >
> > Apparently having dead region servers is so common that a section of the
> > master console is dedicated to that?
> > How can we clean this up (preferably in an automated fashion)? Why isn't
> > this being done by HBase automatically?
> >
> >
> > thanks
> >
>


Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread James Moore
One mechanism for revealing the error in question is to print one of the
individual exceptions included in the batch call's response. We use this in
a few places to allow inspection of individual exceptions; you can see an
example of how to do this here:
https://github.com/apache/hbase/blob/master/hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java#L1222
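Applied to Jeff's concern about not knowing which region server logs to read: the failed-batch exception carries one cause per failed action, together with the server it was sent to. In the HBase 1.x client this is `RetriesExhaustedWithDetailsException`, which (as far as I recall; check your version's javadoc) exposes `getNumExceptions()`, `getCause(int)` and `getHostnamePort(int)`. To keep the sketch runnable without HBase jars, `FailedAction` and `TooBusyStandIn` are hypothetical stand-ins; the part that carries over is tallying exception types per server.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BatchFailures {
    /** Hypothetical stand-in for RegionTooBusyException, so the sketch needs no HBase jars. */
    static final class TooBusyStandIn extends RuntimeException {
        TooBusyStandIn(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for one failed action: the server it was sent to and its cause. */
    static final class FailedAction {
        final String hostnamePort;
        final Throwable cause;

        FailedAction(String hostnamePort, Throwable cause) {
            this.hostnamePort = hostnamePort;
            this.cause = cause;
        }
    }

    /** Tallies failures as server -> (exception class -> count), sorted by server name. */
    static Map<String, Map<String, Integer>> byServer(List<FailedAction> failures) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (FailedAction f : failures) {
            out.computeIfAbsent(f.hostnamePort, k -> new TreeMap<>())
               .merge(f.cause.getClass().getSimpleName(), 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // With the real client you would instead loop i = 0 .. e.getNumExceptions() - 1,
        // reading e.getHostnamePort(i) and e.getCause(i) from the caught exception.
        List<FailedAction> failures = List.of(
                new FailedAction("rs17:16020", new TooBusyStandIn("Above memstore limit")),
                new FailedAction("rs17:16020", new TooBusyStandIn("Above memstore limit")),
                new FailedAction("rs03:16020", new RuntimeException("timeout")));
        // Servers with the most TooBusy failures are the logs to read first.
        System.out.println(byServer(failures));
    }
}
```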



Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread jeff saremi
@James, thank you very much. That was extremely helpful



Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
Thank you for the GFY answer

And I guess, to figure out how to fix these, I can always go through the HBase
source code.



Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread James Moore
In HBase, all data is stored in HDFS rather than inside the region server.
The cluster treats each individual region server process as a region server,
and when that process dies it is considered a dead region server. This
tracking is particularly important during crash recovery and when dealing
with network partitions. There isn't any need to clean up dead region servers
as an out-of-band maintenance task; they will be cleaned up by the HMasters
eventually.
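James's point that each region server *process* is tracked separately matches how HBase names servers: a ServerName is hostname, port and start code (the process start time in milliseconds), written `host,port,startcode`, which is also the form of the per-server directory names under the WALs directory. The sketch is a simplified, hypothetical model of a dead-server list. Clearing old entries when a new process registers on the same host and port is my reading of Enis's remark later in the thread, not a description of HBase's exact code; host names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class DeadServers {
    /** host,port,startcode: two processes on the same host:port differ only by start code. */
    static final class ServerName {
        final String host;
        final int port;
        final long startCode;

        ServerName(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        static ServerName parse(String s) {
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host,port,startcode: " + s);
            }
            return new ServerName(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        boolean sameHostAndPort(ServerName other) {
            return host.equals(other.host) && port == other.port;
        }

        @Override
        public String toString() {
            return host + "," + port + "," + startCode;
        }
    }

    private final List<ServerName> dead = new ArrayList<>();

    /** The master has decided this particular process is gone. */
    void markDead(ServerName sn) { dead.add(sn); }

    /** A new process registered: forget dead entries left by earlier processes on its host:port. */
    void onServerStarted(ServerName sn) { dead.removeIf(d -> d.sameHostAndPort(sn)); }

    List<ServerName> deadServers() { return new ArrayList<>(dead); }

    public static void main(String[] args) {
        DeadServers ds = new DeadServers();
        ds.markDead(ServerName.parse("rs1.example.com,16020,1493846660401"));
        ds.markDead(ServerName.parse("rs2.example.com,16020,1493846660999"));
        ds.onServerStarted(ServerName.parse("rs1.example.com,16020,1495000000000"));
        System.out.println(ds.deadServers()); // [rs2.example.com,16020,1493846660999]
    }
}
```

Under this model, a dead entry whose host never comes back on the same port simply lingers in the list without affecting anything else.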



Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread Enis Söztutar
Jeff, please be respectful to the people who are trying to help you. This is
not acceptable behavior and will result in consequences next time.

On the specific issue you are seeing, it is highly likely that you are
hitting this: https://issues.apache.org/jira/browse/HBASE-14223. Having
those servers in the dead servers list will not hurt operations, runtime
behavior, or anything else. Possibly, for those servers, there is no new
instance of the regionserver running on the same host and port.

If you want to clean these out manually, you can follow these steps:
 - Manually move these directories out of the way on the file system:
/WALs/dead-server-splitting
 - ONLY do this if you are sure that no WAL recovery is happening, and that
the only WAL files present have names containing ".meta."
 - Restart the HBase master.

Upon restart, you should see that these no longer show up. For more
technical details, please refer to the JIRA link.

Enis
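Enis's second step has two conditions. The file-name half (every WAL in the `-splitting` directory has `.meta.` in its name) can be checked mechanically; the other half (no WAL recovery in progress) cannot be decided from file names and is not modelled here. A minimal sketch, assuming you have already listed the directory's file names (for example with `hdfs dfs -ls`); `onlyMetaWals` is a hypothetical helper, and the sample name is the one Jeff reported.

```java
import java.util.List;

final class SplittingDirCheck {
    /**
     * True if the listing is non-empty and every file name contains ".meta.".
     * Covers only the file-name condition; it says nothing about whether recovery is running.
     * An empty listing returns false, i.e. "don't conclude anything".
     */
    static boolean onlyMetaWals(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            return false;
        }
        for (String name : fileNames) {
            if (!name.contains(".meta.")) {
                return false; // a regular WAL: user data may still need to be replayed
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(onlyMetaWals(List.of("1493846660401..meta.1493922323600.meta"))); // true
    }
}
```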



Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
thanks Enis

I apologize for earlier

This looks very close to our issue
When you say: "there is no "WAL" recovery is happening", how can I make sure
of that? Thanks

Jeff



Re: mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles

2017-05-26 Thread Ted Yu
You can compare 27c0a905ee174c9898d324acf1554bf9 with region names of the
target table and find out which region the hfile should go to.

Then check the region server log where that region was hosted and see what
the cause might be.

Cheers
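Ted suggests matching the HFile against the target table's regions. A complementary way to do that from the log itself is to use the `first=` and `last=` row keys printed for each HFile and look them up against the table's region start keys. If the two keys land in different regions, the HFile straddles a region boundary and has to be split before loading; as far as I understand LoadIncrementalHFiles, that is when it logs "Split occured while grouping HFiles" and makes another grouping pass. The lookup below is a hypothetical, simplified sketch over byte-array start keys, not HBase's code; the region boundaries in the example are made up.

```java
final class RegionLookup {
    /** Unsigned lexicographic byte comparison, the ordering HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    /**
     * Index of the region containing row, given sorted region start keys where
     * startKeys[0] is the empty key. Binary search for the last start key <= row.
     */
    static int regionIndex(byte[][] startKeys, byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        int ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(startKeys[mid], row) <= 0) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    /** True if an HFile with these first/last keys spans more than one region. */
    static boolean needsSplit(byte[][] startKeys, byte[] first, byte[] last) {
        return regionIndex(startKeys, first) != regionIndex(startKeys, last);
    }

    public static void main(String[] args) {
        // Hypothetical regions: ["", "J"), ["J", "W"), ["W", end).
        byte[][] starts = {new byte[0], "J".getBytes(), "W".getBytes()};
        System.out.println(needsSplit(starts, "5NPEB4AC3DH620091".getBytes(),
                "JF2GPADC8GH331037".getBytes())); // true: crosses the "J" boundary
    }
}
```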

On Thu, May 25, 2017 at 5:53 PM, anil gupta  wrote:

> Cross posting since this seems to be an HBase issue.
> I think completeBulkLoad step is failing. Please refer to the mail below.
>
> -- Forwarded message --
> From: anil gupta 
> Date: Thu, May 25, 2017 at 4:38 PM
> Subject: [IndexTool NOT working] mapreduce.LoadIncrementalHFiles: Split
> occured while grouping HFiles
> To: "u...@phoenix.apache.org" 
>
>
> Hi,
>
> We are using HDP 2.3.2 (Phoenix 4.4 and HBase 1.1). We created a secondary
> index on an already existing table and paused all writes to the primary table.
> Then we invoked IndexTool to populate the secondary index table. We have tried
> the same steps many times, but we keep getting the following error (we have also
> tried dropping the index and adding it again):
>
> 2017-05-24 18:00:10,281 WARN  [LoadIncrementalHFiles-2] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
> 2017-05-24 18:00:10,340 WARN  [LoadIncrementalHFiles-12] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
> 2017-05-24 18:00:10,342 INFO  [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/c79ae6d27824424f99523dad586e86b1 first=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01Wj\x03r1defc4d301e4ec172b49be4a7ea33c2f7 last=JTHBK1GG4E2122477\x00\x80\x00$\xE4\x80\x00\x01[\xAD`{\x17901d036d588292854ac5b1d4c29d8e1e
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/f0e97b218aed4abf9949cf49a57e559b first=5NPEB4AC3DH620091\x00\x80\x00\xE0\x16\x80\x00\x01X\xE5g\xD6\x0B81d210ac753ed281e8627e5edb7eb59f last=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01W]&\xE54f37d636104f6cd916b2b07bf3aa94d3f
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/27c0a905ee174c9898d324acf1554bf9 first=WMWZP3C58FT708786\x00\x80\x00\xE0\x16\x80\x00\x01Y\xB8\x95U\xA0d21d32aed18af976dd53735705c728cd last=`1GCRCPE05BZ430377\x00\x80\x00}\x05\x80\x00\x01[\xDEE\x91L383768c6ac5f306fa99f68964b4f18aa
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-12] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/d0a6c4b727bb416f840ed254658f3982 first=1N4BZ0CP4GC308715\x00\x80\x01T\xFC\x80\x00\x01U\xE3\x7FL\x9A37b77d47941e99e430fcb0e0657f5558 last=2GKALMEK7H6220949\x00\x80\x00!\x1A\x80\x00\x01Y\x18\xE6\xB3\xB42e72036f7e7e03078f41fc82712c5de7
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/11627a2861e3446e9d6f684ab534563e first=3C7WRNAJ6GG313342\x00\x80\x00NB\x80\x00\x01V}\xFD\xE4+65bbebdd06dedd8466a31ebd33841a51 last=3N1CE2CP2FL407481\x00\x80\x00\xE0\x16\x80\x00\x01W\x1B\x0A\x02\xC1fc95d4114d5e91197a5e41bf37c9e8c7
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/23df78bafd304ff887385a2b6becf06d first=1C4RJFLT6HC742023\x00\x80\x00x\xFF\x80\x00\x01[J8\x0Ac8b65a80fe1662fb25d80798a66cc83dc last=1FMCU9J90EUB68140\x00\x80\x01X\xA4\x80\x00\x01[\x1B\xDD\xB2\x1C577502512ec987844b0108738a9ec6ba
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/39dc73882bec49a0bdd5d787b06ac032 first=1G1JD5SB5H4136951\x00\x80\x00!\x8A\x80\x00\x01Z%\xF6\x7Ffef0b8faeeeb4a10103e1a67ea5ebdbec last=1GNKVHKD7HJ275239\x00\x80\x00$\x87\x80\x00\x01Z%\xF6s\xDC0961566a370af3b7da440e9705bc4c8c
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/a37a2a56ff5c48399cf1abd92f99662f first=###239824\x00\x80\x01(\xFE\x80\x00\x01Z\xAE\xD6\xE0Xe5a45a2beab337228bdba90c06f34a12 last=1C4RJFLT6HC742023\x00\x80\x00x\xFF\x80\x00\x01[H\xF9w\x8D60edb518c27ef80f8a751701926d9174
> 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/b5843a62a6bd47fbbfc29303bee158e3 first=5FNRL5H4XFB033259\x00\x80\x00\x1EZ\x80\x00\x01\x5C"\x87s\xF5ce24ec7e2a3698836386bccabc1265af last=5NPEB4AC3DH620091\x00\x80\x00\xE0\x16\x80\x00\x01X\xE4\

Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
@James

Thanks for the insight. I think that's also our case. I see the dead region servers
list, but our cluster seems to be operating properly.
However, from a maintenance standpoint I'd like the cluster to always report as
healthy. And having a list of "dead" servers is not a healthy thing to have.
So I was hoping that, from the comments I'm collecting here, I could write a
shell script that would do this cleanup in an automated fashion. I just needed
insight into what I should be cleaning up and when it's safe to do so.

jeff


From: James Moore 
Sent: Friday, May 26, 2017 11:35:22 AM
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

In HBase, all data is stored in HDFS rather than inside the region
server. The HBase cluster considers each individual region server
process a region server, and when that process dies it is considered
a dead region server. This tracking is particularly important during the
crash recovery process and when dealing with network partitions. There is no
need to clean up dead region servers as an out-of-band maintenance task; they
will be cleaned up by the HMasters eventually.

On Fri, May 26, 2017 at 2:03 PM, jeff saremi  wrote:

> Thank you for the GFY answer
>
> And I guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > These are the things I have done so far:
> > >
> > >
> > > - restarting master (a few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread Dima Spivak
Actually, it's a "Please give us the details another member of the project
already asked for."

This is a community mailing list, which means we volunteer our time to help
people with questions. If you're looking for customer support, you should
be taking your question to a consultant or vendor that provides such
services. Being a jerk is incredibly counterproductive.

-Dima

On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
wrote:

> Thank you for the GFY answer
>
> And I guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > These are the things I have done so far:
> > >
> > >
> > > - restarting master (a few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
Sir

You're not only not helping but you're also polluting my post and reducing its 
visibility. Now you're asking for recognition for that too?

If you don't have anything to add to my question, please don't respond to it. 
Let someone else who might have something to say not get tricked into thinking 
that my post was already addressed.

Jeff


From: Dima Spivak 
Sent: Friday, May 26, 2017 1:27:33 PM
To: hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Actually, it's a "Please give us the details another member of the project
already asked for."

This is a community mailing list, which means we volunteer our time to help
people with questions. If you're looking for customer support, you should
be taking your question to a consultant or vendor that provides such
services. Being a jerk is incredibly counterproductive.

-Dima

On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
wrote:

> Thank you for the GFY answer
>
> And I guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > These are the things I have done so far:
> > >
> > >
> > > - restarting master (a few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread Enis Söztutar
In general, if there are no regions in transition, the WAL recovery has
already finished. You can watch the master's log4j log for those entries,
but the lack of regions in transition is the easiest way to tell.
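Enis's two conditions (no regions in transition, and only WAL files whose names contain ".meta." left in the -splitting directory) can be written down as a pre-check to run before moving a directory aside. This is a hypothetical sketch: the class and method names are made up, the inputs are assumed to be gathered by hand (a listing of the -splitting directory and the regions-in-transition count from the master UI), and it only encodes the rule stated in this thread. It does not talk to HBase or HDFS.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-check for moving a dead server's "-splitting" WAL directory aside. */
public class SplittingDirCheck {

    // Returns true only when both conditions described on the list hold:
    // no regions are in transition, and every remaining WAL file is a meta WAL.
    static boolean safeToMoveAside(List<String> walFileNames, int regionsInTransition) {
        if (regionsInTransition > 0) {
            return false; // WAL recovery may still be in progress
        }
        for (String name : walFileNames) {
            if (!name.contains(".meta.")) {
                return false; // a non-meta WAL may still hold edits that need replaying
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // File name taken from the directory listing quoted in this thread.
        List<String> files = Arrays.asList("1493846660401..meta.1493922323600.meta");
        System.out.println(safeToMoveAside(files, 0));
        System.out.println(safeToMoveAside(files, 3));
    }
}
```

This prints true and then false. An empty file list also passes, since nothing is left to replay; whether that is acceptable for a given cluster is a judgment call.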

Enis

On Fri, May 26, 2017 at 12:14 PM, jeff saremi 
wrote:

> thanks Enis
>
> I apologize for earlier
>
> This looks very close to our issue
> When you say: "there is no "WAL" recovery is happening", how could I make
> sure of that? Thanks
>
> Jeff
>
>
> 
> From: Enis Söztutar 
> Sent: Friday, May 26, 2017 11:47:11 AM
> To: d...@hbase.apache.org
> Cc: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Jeff, please be respectful to the people who are trying to help you. This is
> not acceptable behavior and will result in consequences next time.
>
> On the specific issue that you are seeing, it is highly likely that you are
> seeing this: https://issues.apache.org/jira/browse/HBASE-14223. Having
> those servers in the dead servers list will not hurt operations,
> runtime behavior, or anything else. Possibly, for those servers, there is no new
> instance of the region server running on the same host and port.
>
> If you want to clean these out manually, you can follow these steps:
>  - Manually move these directories from the file system:
> /WALs/dead-server-splitting
>  - ONLY do this if you are sure that there is no "WAL" recovery is
> happening, and there are only WAL files with names containing ".meta."
>  - Restart the HBase master.
>
> Upon restart, you can see that these do not show up anymore. For more
> technical details, please refer to the jira link.
>
> Enis
>
> On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
> wrote:
>
> > Thank you for the GFY answer
> >
> > And I guess to figure out how to fix these I can always go through the
> > HBase source code.
> >
> >
> > 
> > From: Dima Spivak 
> > Sent: Friday, May 26, 2017 9:58:00 AM
> > To: hbase-user
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > Sending this back to the user mailing list.
> >
> > RegionServers can die for many reasons. Looking at your RegionServer log
> > files should give hints as to why it's happening.
> >
> >
> > -Dima
> >
> > On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> > wrote:
> >
> > > I had posted this to the user mailing list and I have not got any direct
> > > answer to my question.
> > >
> > > Where do dead RS's come from and how can they be cleaned up? Someone in
> > > the midst of developers should know this.
> > >
> > > thanks
> > >
> > > Jeff
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Thursday, May 25, 2017 10:23:17 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > >
> > > I'm still looking to get hints on how to remove the dead regions. thanks
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > >
> > > I'm trying to eliminate the dead region servers.
> > >
> > > 
> > > From: Ted Yu 
> > > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > >
> > > bq. running hbck (many times
> > >
> > > Can you describe the specific inconsistencies you were trying to resolve?
> > > Depending on the inconsistencies, advice can be given on the best known
> > > hbck command arguments to use.
> > >
> > > Feel free to pastebin master log if needed.
> > >
> > > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > > wrote:
> > >
> > > > These are the things I have done so far:
> > > >
> > > >
> > > > - restarting master (a few times)
> > > >
> > > > - running hbck (many times; this tool does not seem to be doing anything
> > > > at all)
> > > >
> > > > - checking the list of region servers in ZK (none of the dead ones are
> > > > listed here)
> > > >
> > > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > > are listed here with "-splitting" at the end of their names and they
> > > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > > From: jeff saremi 
> > > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > > To: user@hbase.apache.org
> > > > Subject: What is Dead Region Servers and how to clear them up?
> > > >
> > > > Apparently having dead region servers is so common that a section of the
> > > > master console is dedicated to that?
> > > > How can we clean this up (preferably in an automated fashion)? Why isn't
> > > > this being done by HBase automatically?
> > > >
> > > >
> > > > thanks
> > > >
> > >
> >
>


Re: mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles

2017-05-26 Thread Sergey Soldatov
Check the HBase logs for any additional exceptions. The main problem with HBase
bulk load is that for almost any kind of problem it reports that there was a
split. Sometimes it is related to the HA configuration (people use the
cluster name with a port number), and that is visible in the logs (errors saying
the HDFS name is incorrect).
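One way to spot the HA misconfiguration Sergey mentions is to check whether an HDFS URI names the logical nameservice and also carries a port. With HDFS HA the nameservice ID is normally used without a port (for example hdfs://HDFS1/..., as in the log lines in this thread). The sketch below is hypothetical: the class and method names are made up, it assumes you already know your nameservice IDs (in Hadoop they are listed in dfs.nameservices), and it only inspects the URI string with java.net.URI rather than consulting the Hadoop configuration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check: flags HDFS URIs that combine an HA nameservice ID with a port. */
public class HaUriCheck {

    static boolean nameserviceWithPort(String uri, Set<String> nameservices) {
        URI parsed = URI.create(uri);
        String host = parsed.getHost();
        // getPort() returns -1 when the URI has no explicit port.
        return host != null && nameservices.contains(host) && parsed.getPort() != -1;
    }

    public static void main(String[] args) {
        Set<String> nameservices = new HashSet<>(Arrays.asList("HDFS1"));
        System.out.println(nameserviceWithPort(
                "hdfs://HDFS1/user/hbase/idx_test_7", nameservices));
        System.out.println(nameserviceWithPort(
                "hdfs://HDFS1:8020/user/hbase/idx_test_7", nameservices));
    }
}
```

This prints false for the first URI, which uses the same nameservice form as the log lines, and true for the second, which adds a port to the logical name.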

Thanks,
Sergey

On Fri, May 26, 2017 at 12:18 PM, Ted Yu  wrote:

> You can compare 27c0a905ee174c9898d324acf1554bf9 with region names of the
> target table and find out which region the hfile should go to.
>
> Then check the region server log where that region was hosted and see what
> the cause might be.
>
> Cheers
>
> On Thu, May 25, 2017 at 5:53 PM, anil gupta  wrote:
>
> > Cross posting since this seems to be an HBase issue.
> > I think the completeBulkLoad step is failing. Please refer to the mail below.
> >
> > -- Forwarded message --
> > From: anil gupta 
> > Date: Thu, May 25, 2017 at 4:38 PM
> > Subject: [IndexTool NOT working] mapreduce.LoadIncrementalHFiles: Split
> > occured while grouping HFiles
> > To: "u...@phoenix.apache.org" 
> >
> >
> > Hi,
> >
> > We are using HDP 2.3.2 (Phoenix 4.4 and HBase 1.1). We created a secondary
> > index on an already existing table and paused all writes to the primary table.
> > Then we invoked IndexTool to populate the secondary index table. We have tried
> > the same steps many times, but we keep getting the following error (we have also
> > tried dropping the index and adding it again):
> >
> > 2017-05-24 18:00:10,281 WARN  [LoadIncrementalHFiles-2] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
> > 2017-05-24 18:00:10,340 WARN  [LoadIncrementalHFiles-12] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
> > 2017-05-24 18:00:10,342 INFO  [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/c79ae6d27824424f99523dad586e86b1 first=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01Wj\x03r1defc4d301e4ec172b49be4a7ea33c2f7 last=JTHBK1GG4E2122477\x00\x80\x00$\xE4\x80\x00\x01[\xAD`{\x17901d036d588292854ac5b1d4c29d8e1e
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/f0e97b218aed4abf9949cf49a57e559b first=5NPEB4AC3DH620091\x00\x80\x00\xE0\x16\x80\x00\x01X\xE5g\xD6\x0B81d210ac753ed281e8627e5edb7eb59f last=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01W]&\xE54f37d636104f6cd916b2b07bf3aa94d3f
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/27c0a905ee174c9898d324acf1554bf9 first=WMWZP3C58FT708786\x00\x80\x00\xE0\x16\x80\x00\x01Y\xB8\x95U\xA0d21d32aed18af976dd53735705c728cd last=`1GCRCPE05BZ430377\x00\x80\x00}\x05\x80\x00\x01[\xDEE\x91L383768c6ac5f306fa99f68964b4f18aa
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-12] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/d0a6c4b727bb416f840ed254658f3982 first=1N4BZ0CP4GC308715\x00\x80\x01T\xFC\x80\x00\x01U\xE3\x7FL\x9A37b77d47941e99e430fcb0e0657f5558 last=2GKALMEK7H6220949\x00\x80\x00!\x1A\x80\x00\x01Y\x18\xE6\xB3\xB42e72036f7e7e03078f41fc82712c5de7
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/11627a2861e3446e9d6f684ab534563e first=3C7WRNAJ6GG313342\x00\x80\x00NB\x80\x00\x01V}\xFD\xE4+65bbebdd06dedd8466a31ebd33841a51 last=3N1CE2CP2FL407481\x00\x80\x00\xE0\x16\x80\x00\x01W\x1B\x0A\x02\xC1fc95d4114d5e91197a5e41bf37c9e8c7
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/23df78bafd304ff887385a2b6becf06d first=1C4RJFLT6HC742023\x00\x80\x00x\xFF\x80\x00\x01[J8\x0Ac8b65a80fe1662fb25d80798a66cc83dc last=1FMCU9J90EUB68140\x00\x80\x01X\xA4\x80\x00\x01[\x1B\xDD\xB2\x1C577502512ec987844b0108738a9ec6ba
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/39dc73882bec49a0bdd5d787b06ac032 first=1G1JD5SB5H4136951\x00\x80\x00!\x8A\x80\x00\x01Z%\xF6\x7Ffef0b8faeeeb4a10103e1a67ea5ebdbec last=1GNKVHKD7HJ275239\x00\x80\x00$\x87\x80\x00\x01Z%\xF6s\xDC0961566a370af3b7da440e9705bc4c8c
> > 2017-05-24 18:00:10,343 INFO  [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/a37a2a56ff5c48399cf1abd92

Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread Stack
On Fri, May 26, 2017 at 9:47 AM, jeff saremi  wrote:

>
> I have an issue with this exception being thrown, period. I think the
> resource management needs a lot of work. I will soon post another note
> about my impression of this whole thing.
>
>
Please. Would appreciate your input Jeff.
Thanks,
St.Ack



> Jeff
>
> 
> From: saint@gmail.com  on behalf of Stack <st...@duboce.net>
> Sent: Friday, May 26, 2017 12:05:36 AM
> To: Hbase-User
> Subject: Re: What is the cause for RegionTooBusyException?
>
> On Mon, May 22, 2017 at 9:31 AM, jeff saremi 
> wrote:
>
> > While I'm still trying to find anything useful in the logs, my question is:
> > why isn't HBase self-managing this?
> >
>
> It should do better here, yes (I thought TooBusy retried but I am not
> finding it at the mo.). The exception is thrown for reasons such as those James
> lists (in essence, out of resources), including the case where we fail
> to obtain a lock inside the configured timeouts (row lock on write or region
> lock doing bulk load). As James notes, you should see the too busy dumped
> into the regionserver log at time of issue. Having this, you can figure
> what resource is crimped. Is there no more detail on client side on the
> root of the TooBusy exceptions?
>
>
> Thanks,
> S
>
>
>
> >
> > 
> > From: jeff saremi 
> > Sent: Friday, May 19, 2017 8:18:59 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is the cause for RegionTooBusyException?
> >
> > Thanks Ted. I will look deeper as you suggested
> >
> > 
> > From: Ted Yu 
> > Sent: Friday, May 19, 2017 4:18:12 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is the cause for RegionTooBusyException?
> >
> > Have you checked the region server log?
> > Please take a look at the following method in HRegion:
> >
> >   private void checkResources() throws RegionTooBusyException {
> >     ...
> >     if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {
> >       blockedRequestsCount.increment();
> >       requestFlush();
> >       throw new RegionTooBusyException("Above memstore limit, " +
> >
> > Which HBase release are you using?
> >
> > Cheers
> >
> > On Fri, May 19, 2017 at 3:59 PM, jeff saremi 
> > wrote:
> >
> > > We're getting errors like this. Where should we be looking to solve
> > > this?
> > >
> > >
> > > Failed 69261 actions: RegionTooBusyException: 12695 times,
> > > RemoteWithExtrasException: 56566 times
> > >
> > > thanks
> > >
> > > Jeff
> > >
> > >
> >
>
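The memstore check Ted quotes from HRegion.checkResources() can be modeled in a few lines. This is a simplified sketch, not HBase code. It assumes the blocking threshold is the region's memstore flush size multiplied by hbase.hregion.memstore.block.multiplier, which is how HRegion appears to derive blockingMemStoreSize in this era. The 128 MB and 4 values below are illustrative assumptions, not any particular cluster's settings. The model shows why a region that cannot flush fast enough starts rejecting writes with RegionTooBusyException until the flush it requested drains the memstore.

```java
/** Simplified model of the "Above memstore limit" check quoted from HRegion. */
public class MemstoreBlockCheck {

    static final long MB = 1024L * 1024L;

    // Threshold above which writes to the region are rejected (assumed formula).
    static long blockingMemStoreSize(long flushSize, long blockMultiplier) {
        return flushSize * blockMultiplier;
    }

    // Mirrors the quoted condition: memstoreDataSize > blockingMemStoreSize.
    static boolean wouldReject(long memstoreDataSize, long flushSize, long blockMultiplier) {
        return memstoreDataSize > blockingMemStoreSize(flushSize, blockMultiplier);
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB; // assumed flush size
        long multiplier = 4;       // assumed block multiplier
        System.out.println(blockingMemStoreSize(flushSize, multiplier) / MB);
        System.out.println(wouldReject(400 * MB, flushSize, multiplier));
        System.out.println(wouldReject(600 * MB, flushSize, multiplier));
    }
}
```

With these assumed values the program prints 512, false, true: the region rejects writes only once its memstore holds more than 512 MB.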


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread Yu Li
bq. And having a list of "dead" servers is not a healthy thing to have.
I don't think the existence of "dead" servers means the service is
unhealthy, especially in a distributed system. Besides HBase, HDFS also
shows live and dead nodes in the NameNode UI, and people don't regard HDFS as
unhealthy if there are dead nodes.

In HBase, if an RS aborts due to an unexpected issue such as a long GC pause, we
normally restart it, and once it has restarted and reported to the master, it is
removed from the dead server list. So when we observe a dead server in the
Master UI, the first thing to do is check the root cause, and then restart it if
that won't cause further issues.

However, sometimes we find that a server aborted due to a hardware
failure and we must take it offline for repair. Or we need to move
some nodes to other clusters, so we stop the RS process on purpose. I
guess this is the case you're dealing with, @jeff? If so, I think it's a
reasonable requirement that we supply a command in HBase to clear the dead
nodes once the operator confirms they are no longer serving.
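The behavior described in this thread (a dead entry disappears once a server comes back on the same host and port, and otherwise lingers) can be modeled as below. This is a hypothetical, simplified sketch, not HBase's own dead-server tracking. A server is identified by host, port and start code, a report from any instance on the same host and port clears older entries, and clear() stands in for the operator command proposed here; it is not an existing HBase API. The host names and port are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a master's dead-server list. */
public class DeadServerModel {

    /** A server instance: the same host and port with a new start code is a restart. */
    static final class Server {
        final String host;
        final int port;
        final long startCode;

        Server(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        boolean sameHostAndPort(String otherHost, int otherPort) {
            return host.equals(otherHost) && port == otherPort;
        }
    }

    private final List<Server> dead = new ArrayList<>();

    void markDead(Server s) {
        dead.add(s);
    }

    // A (re)started server reporting in clears earlier instances on the same host and port.
    void onServerReport(Server s) {
        dead.removeIf(d -> d.sameHostAndPort(s.host, s.port));
    }

    // Stand-in for an operator command that forgets a decommissioned server.
    boolean clear(String host, int port) {
        return dead.removeIf(d -> d.sameHostAndPort(host, port));
    }

    int deadCount() {
        return dead.size();
    }

    public static void main(String[] args) {
        DeadServerModel master = new DeadServerModel();
        master.markDead(new Server("rs1.example.com", 16020, 1493846660401L));
        master.markDead(new Server("rs2.example.com", 16020, 1493846660402L));
        // rs1 is restarted and reports in with a new start code.
        master.onServerReport(new Server("rs1.example.com", 16020, 1495800000000L));
        System.out.println(master.deadCount()); // rs2 was never restarted
        // rs2 was decommissioned, so the operator clears it explicitly.
        master.clear("rs2.example.com", 16020);
        System.out.println(master.deadCount());
    }
}
```

This prints 1 and then 0. A server that never returns on the same host and port stays listed until something explicitly removes it, which matches the situation Jeff describes.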

Best Regards,
Yu

On 27 May 2017 at 04:49, Enis Söztutar  wrote:

> In general, if there are no regions in transition, the WAL recovery has
> already finished. You can watch the master's log4j log for those entries,
> but the lack of regions in transition is the easiest way to tell.
>
> Enis
>
> On Fri, May 26, 2017 at 12:14 PM, jeff saremi 
> wrote:
>
> > thanks Enis
> >
> > I apologize for earlier
> >
> > This looks very close to our issue
> > When you say: "there is no "WAL" recovery is happening", how could I make
> > sure of that? Thanks
> >
> > Jeff
> >
> >
> > 
> > From: Enis Söztutar 
> > Sent: Friday, May 26, 2017 11:47:11 AM
> > To: d...@hbase.apache.org
> > Cc: hbase-user
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > Jeff, please be respectful to the people who are trying to help you. This is
> > not acceptable behavior and will result in consequences next time.
> >
> > On the specific issue that you are seeing, it is highly likely that you are
> > seeing this: https://issues.apache.org/jira/browse/HBASE-14223. Having
> > those servers in the dead servers list will not hurt operations,
> > runtime behavior, or anything else. Possibly, for those servers, there is no new
> > instance of the region server running on the same host and port.
> >
> > If you want to clean these out manually, you can follow these steps:
> >  - Manually move these directories from the file system:
> > /WALs/dead-server-splitting
> >  - ONLY do this if you are sure that there is no "WAL" recovery is
> > happening, and there are only WAL files with names containing ".meta."
> >  - Restart the HBase master.
> >
> > Upon restart, you can see that these do not show up anymore. For more
> > technical details, please refer to the jira link.
> >
> > Enis
> >
> > On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
> > wrote:
> >
> > > Thank you for the GFY answer
> > >
> > > And I guess to figure out how to fix these I can always go through the
> > > HBase source code.
> > >
> > >
> > > 
> > > From: Dima Spivak 
> > > Sent: Friday, May 26, 2017 9:58:00 AM
> > > To: hbase-user
> > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > >
> > > Sending this back to the user mailing list.
> > >
> > > RegionServers can die for many reasons. Looking at your RegionServer log
> > > files should give hints as to why it's happening.
> > >
> > >
> > > -Dima
> > >
> > > On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> > > wrote:
> > >
> > > > I had posted this to the user mailing list and I have not got any direct
> > > > answer to my question.
> > > >
> > > > Where do dead RS's come from and how can they be cleaned up? Someone in
> > > > the midst of developers should know this.
> > > >
> > > > thanks
> > > >
> > > > Jeff
> > > >
> > > > 
> > > > From: jeff saremi 
> > > > Sent: Thursday, May 25, 2017 10:23:17 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > > >
> > > > I'm still looking to get hints on how to remove the dead regions. thanks
> > > >
> > > > 
> > > > From: jeff saremi 
> > > > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > > >
> > > > I'm trying to eliminate the dead region servers.
> > > >
> > > > 
> > > > From: Ted Yu 
> > > > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: What is Dead Region Servers and how to clear them up?
> > > >
> > > > bq. running hbck (many times
> > > >
> > > > Can you describe the specific inconsistencies you were trying to resolve?
> > > > Depending on the inconsistencies, advice can be given on the best known
> > >