Re: Baffling RPC exceptions with our Thrift servers

2017-08-09 Thread jeff saremi
We are getting inundated with RPC exceptions in our Thrift servers. Is there 
anyone here who could point us to where the problem is?
According to the Master UI everything is good: no regions in transition, no 
FAILED_CLOSE, no red tasks or anything like that, no offline regions, nothing.



From: jeff saremi 
Sent: Friday, August 4, 2017 2:25:52 PM
To: user@hbase.apache.org
Subject: Re: Baffling RPC exceptions with our Thrift servers

Actually, going further back in the RS logs, I see these:

java.io.IOException: Got error, status message 
org.apache.hadoop.yarn.server.nodemanager.util.UtilizationBasedNodeBusyChecker 
CPU: 18.97175> 10 , for OP_READ_BLOCK, self=/25.123.83.126:41098, 
remote=/10.27.138.10:10010, for file 
/hbase/SomeData/data/default/SomeTable122016/bfce55b49e2ade82e1bac73c4205d967/info/995f0a2a24b84a048ea55a4879f46e28,
 for pool BP-575538346-25.126.51.77-1446116651710 block 1096040129_23473372
at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:821)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:700)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:358)
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:729)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1651)
at org.apache.hadoop.hdfs.DFSInputStream$3.call(DFSInputStream.java:1610)
at org.apache.hadoop.hdfs.DFSInputStream$3.call(DFSInputStream.java:1602)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



From: jeff saremi 
Sent: Friday, August 4, 2017 2:22:54 PM
To: user@hbase.apache.org
Subject: Baffling RPC exceptions with our Thrift servers

Every once in a while (and this is getting more frequent) our Thrift clients 
report errors all over.

I check, say, one of the Thrift server logs. I see a lot of lines like the 
following:


2017-08-04 14:15:17,089 INFO  [thrift-worker-29] client.RpcRetryingCaller: Call 
exception, tries=14, retries=35, started=108853 ms ago, cancelled=false, 
msg=row 'http://hobartexchange.com.au/classifieds/_g397381.html' on table 
'ClickStreamTable122016' at 
region=ClickStreamTable122016,http://hifimov.com/youtube-videos/mcent-hack-unlimited-money-cracked-apk,1501285634230.bfce55b49e2ade82e1bac73c4205d967.,
 hostname=co4aps197b537e,16020,1501339699340, seqNum=54295


I go to the master. Check status. No issues whatsoever.

I check the logs for the RS mentioned in the log. No issues that I can see.

I restarted all Thrift servers and that didn't help. I bounced the active 
master. Still nothing.


What else can I check? What could be the reason? How can we get Thrift working 
again?

thanks

Jeff




Re: Region remains in FAILED_CLOSE

2017-08-07 Thread jeff saremi
thanks Stephen!


From: Stephen Jiang 
Sent: Monday, August 7, 2017 7:22:52 PM
To: user@hbase.apache.org
Subject: Re: Region remains in FAILED_CLOSE

The easiest way is to restart the region server that hosted this troubled
region.  Shut down co4aap80c321419,16020,1502140564865 would force the
region to be re-assigned.
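If a full restart of the RS is too disruptive, a lighter-weight sketch is to
force the reassignment from the hbase shell (an expert-only command; this
assumes your release accepts the encoded region name, otherwise pass the full
region name from the FAILED_CLOSE line):

-
# force-unassign the stuck region so the master re-assigns it elsewhere
unassign 'a8a539f31ca0f4cf4b1c1eec36ed567b', true
-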

Thanks
Stephen

On Mon, Aug 7, 2017 at 5:10 PM, jeff saremi  wrote:

> How can this be fixed? I have run hbck -repair a few times but
>
> it doesn't make a difference.
>
> This is what's reported in the hbck log:
>
>
> java.io.IOException: Region {ENCODED => ff1472457c0dba52ca09f464bf691244,
> NAME => 'ClickStreamTable122016,http://100-poems.com/poems/best/
> 1495001.htm,1501285596868.ff1472457c0dba52ca09f464bf691244.', STARTKEY =>
> 'http://100-poems.com/poems/best/1495001.htm', ENDKEY => '
> http://100-the.blogspot.com/2010/11/e'} failed to move out of transition
> within timeout 12ms
>
> and this is from Master UI (shown in RED):
>
> ClickStreamTable122016,http://lankaconnections.com/Thread-
> Contact-Details-of-National-Savings-Bank-NSB-Anamaduwa-
> Branch-in-Sri-Lanka,1501285611734.a8a539f31ca0f4cf4b1c1eec36ed567b.
> state=FAILED_CLOSE, ts=Mon Aug 07 16:39:23 PDT 2017 (738s ago),
> server=co4aap80c321419,16020,1502140564865
>
>


Region remains in FAILED_CLOSE

2017-08-07 Thread jeff saremi
How can this be fixed? I have run hbck -repair a few times but

it doesn't make a difference.

This is what's reported in the hbck log:


java.io.IOException: Region {ENCODED => ff1472457c0dba52ca09f464bf691244, NAME 
=> 
'ClickStreamTable122016,http://100-poems.com/poems/best/1495001.htm,1501285596868.ff1472457c0dba52ca09f464bf691244.',
 STARTKEY => 'http://100-poems.com/poems/best/1495001.htm', ENDKEY => 
'http://100-the.blogspot.com/2010/11/e'} failed to move out of transition 
within timeout 12ms

and this is from Master UI (shown in RED):

ClickStreamTable122016,http://lankaconnections.com/Thread-Contact-Details-of-National-Savings-Bank-NSB-Anamaduwa-Branch-in-Sri-Lanka,1501285611734.a8a539f31ca0f4cf4b1c1eec36ed567b.
 state=FAILED_CLOSE, ts=Mon Aug 07 16:39:23 PDT 2017 (738s ago), 
server=co4aap80c321419,16020,1502140564865



Re: Thrift servers not connecting to RegionServers

2017-08-07 Thread jeff saremi
Just checked.

The keys are URLs. The region name has that, and the intended lookup key is also 
a URL. So I think they are consistent.



From: jeff saremi 
Sent: Monday, August 7, 2017 11:18:52 AM
To: user@hbase.apache.org
Subject: Re: Thrift servers not connecting to RegionServers


Not intended at all.
let me investigate that

thanks Ted


From: Ted Yu 
Sent: Monday, August 7, 2017 10:52:10 AM
To: user@hbase.apache.org
Subject: Re: Thrift servers not connecting to RegionServers

The formation of row key (starting with http) looks to be different from
the region (which starts with table name).

Is this expected ?

On Mon, Aug 7, 2017 at 10:34 AM, jeff saremi  wrote:

> This happens so frequently to us and we still haven't figured out why
>
> We have this problem a few times a day where the Thrift server reports
> errors like:
>
>
> 2017-08-07 10:28:45,686 INFO  [thrift-worker-54] client.RpcRetryingCaller:
> Call exception, tries=31, retries=35, started=452454 ms ago,
> cancelled=false, msg=row 'http://anotherhost.com/
> colonel-brewery-tours-lebanon.html' on table 'OurTable' at
> region=ClickStreamTable122016,http://somehost.com/,1501285630924.
> 9772ed631b4a7a3a76a9c8647c6f8273., 
> hostname=co4aapa29412175,16020,1501881562692,
> seqNum=10880
>
>
> and the regionserver in question (co4aapa29412175) shows no problems
> either on the status page or in its logs.
>
> Restarting Thrift servers is not helping either
>
> Does anyone have any ideas on what to check?
> thanks
>
>


Re: Thrift servers not connecting to RegionServers

2017-08-07 Thread jeff saremi
Not intended at all.
let me investigate that

thanks Ted


From: Ted Yu 
Sent: Monday, August 7, 2017 10:52:10 AM
To: user@hbase.apache.org
Subject: Re: Thrift servers not connecting to RegionServers

The formation of row key (starting with http) looks to be different from
the region (which starts with table name).

Is this expected ?

On Mon, Aug 7, 2017 at 10:34 AM, jeff saremi  wrote:

> This happens so frequently to us and we still haven't figured out why
>
> We have this problem a few times a day where the Thrift server reports
> errors like:
>
>
> 2017-08-07 10:28:45,686 INFO  [thrift-worker-54] client.RpcRetryingCaller:
> Call exception, tries=31, retries=35, started=452454 ms ago,
> cancelled=false, msg=row 'http://anotherhost.com/
> colonel-brewery-tours-lebanon.html' on table 'OurTable' at
> region=ClickStreamTable122016,http://somehost.com/,1501285630924.
> 9772ed631b4a7a3a76a9c8647c6f8273., 
> hostname=co4aapa29412175,16020,1501881562692,
> seqNum=10880
>
>
> and the regionserver in question (co4aapa29412175) shows no problems
> either on the status page or in its logs.
>
> Restarting Thrift servers is not helping either
>
> Does anyone have any ideas on what to check?
> thanks
>
>


Thrift servers not connecting to RegionServers

2017-08-07 Thread jeff saremi
This happens so frequently to us and we still haven't figured out why

We have this problem a few times a day where the Thrift server reports errors 
like:


2017-08-07 10:28:45,686 INFO  [thrift-worker-54] client.RpcRetryingCaller: Call 
exception, tries=31, retries=35, started=452454 ms ago, cancelled=false, 
msg=row 'http://anotherhost.com/colonel-brewery-tours-lebanon.html' on table 
'OurTable' at 
region=ClickStreamTable122016,http://somehost.com/,1501285630924.9772ed631b4a7a3a76a9c8647c6f8273.,
 hostname=co4aapa29412175,16020,1501881562692, seqNum=10880


and the regionserver in question (co4aapa29412175) shows no problems either on 
the status page or in its logs.
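One way to isolate the failing layer (a minimal sketch; the row key is copied
from the log line above) is to repeat the read from the hbase shell, bypassing
Thrift entirely. If this GET also stalls in retries, the problem is in
client-to-RegionServer RPC rather than in the Thrift server itself:

-
get 'ClickStreamTable122016', 'http://anotherhost.com/colonel-brewery-tours-lebanon.html'
-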

Restarting Thrift servers is not helping either

Does anyone have any ideas on what to check?
thanks



Re: Baffling RPC exceptions with our Thrift servers

2017-08-04 Thread jeff saremi
Actually, going further back in the RS logs, I see these:

java.io.IOException: Got error, status message 
org.apache.hadoop.yarn.server.nodemanager.util.UtilizationBasedNodeBusyChecker 
CPU: 18.97175> 10 , for OP_READ_BLOCK, self=/25.123.83.126:41098, 
remote=/10.27.138.10:10010, for file 
/hbase/SomeData/data/default/SomeTable122016/bfce55b49e2ade82e1bac73c4205d967/info/995f0a2a24b84a048ea55a4879f46e28,
 for pool BP-575538346-25.126.51.77-1446116651710 block 1096040129_23473372
at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:821)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:700)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:358)
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:729)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1651)
at org.apache.hadoop.hdfs.DFSInputStream$3.call(DFSInputStream.java:1610)
at org.apache.hadoop.hdfs.DFSInputStream$3.call(DFSInputStream.java:1602)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



From: jeff saremi 
Sent: Friday, August 4, 2017 2:22:54 PM
To: user@hbase.apache.org
Subject: Baffling RPC exceptions with our Thrift servers

Every once in a while (and this is getting more frequent) our Thrift clients 
report errors all over.

I check, say, one of the Thrift server logs. I see a lot of lines like the 
following:


2017-08-04 14:15:17,089 INFO  [thrift-worker-29] client.RpcRetryingCaller: Call 
exception, tries=14, retries=35, started=108853 ms ago, cancelled=false, 
msg=row 'http://hobartexchange.com.au/classifieds/_g397381.html' on table 
'ClickStreamTable122016' at 
region=ClickStreamTable122016,http://hifimov.com/youtube-videos/mcent-hack-unlimited-money-cracked-apk,1501285634230.bfce55b49e2ade82e1bac73c4205d967.,
 hostname=co4aps197b537e,16020,1501339699340, seqNum=54295


I go to the master. Check status. No issues whatsoever.

I check the logs for the RS mentioned in the log. No issues that I can see.

I restarted all Thrift servers and that didn't help. I bounced the active 
master. Still nothing.


What else can I check? What could be the reason? How can we get Thrift working 
again?

thanks

Jeff




Baffling RPC exceptions with our Thrift servers

2017-08-04 Thread jeff saremi
Every once in a while (and this is getting more frequent) our Thrift clients 
report errors all over.

I check, say, one of the Thrift server logs. I see a lot of lines like the 
following:


2017-08-04 14:15:17,089 INFO  [thrift-worker-29] client.RpcRetryingCaller: Call 
exception, tries=14, retries=35, started=108853 ms ago, cancelled=false, 
msg=row 'http://hobartexchange.com.au/classifieds/_g397381.html' on table 
'ClickStreamTable122016' at 
region=ClickStreamTable122016,http://hifimov.com/youtube-videos/mcent-hack-unlimited-money-cracked-apk,1501285634230.bfce55b49e2ade82e1bac73c4205d967.,
 hostname=co4aps197b537e,16020,1501339699340, seqNum=54295


I go to the master. Check status. No issues whatsoever.

I check the logs for the RS mentioned in the log. No issues that I can see.

I restarted all Thrift servers and that didn't help. I bounced the active 
master. Still nothing.


What else can I check? What could be the reason? How can we get Thrift working 
again?
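For what it's worth, the tries=14, retries=35 counters in the log line map to
the client retry settings, and 35 is the hbase.client.retries.number default in
1.x, so a single stuck call can pin a thrift-worker thread for many minutes. A
hedged hbase-site.xml sketch for the Thrift server's client config (values are
illustrative, only for making failures surface faster while debugging):

-
<!-- illustrative values: cap retries so stuck calls fail fast -->
<property>
  <name>hbase.client.retries.number</name>
  <value>5</value>
</property>
<property>
  <name>hbase.client.pause</name>
  <value>100</value>
</property>
-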

thanks

Jeff




Re: distributing new regions immediately

2017-07-27 Thread jeff saremi
very interesting. Thanks Ted


From: Ted Yu 
Sent: Thursday, July 27, 2017 2:13:25 PM
To: user@hbase.apache.org
Subject: Re: distributing new regions immediately

Since you're more concerned with write load, you can take a look at the
following parameter:

hbase.master.balancer.stochastic.writeRequestCost

Default value is 5, much smaller than default value for region count cost
(500).
Consider raising the value so that load balancer reacts more responsively.
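A sketch of what that change could look like in the master's hbase-site.xml
(the property name is from the note above; the value is illustrative, not a
recommendation):

-
<property>
  <name>hbase.master.balancer.stochastic.writeRequestCost</name>
  <!-- default 5; raising it weights write load more heavily -->
  <value>50</value>
</property>
-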

On Thu, Jul 27, 2017 at 12:17 PM, jeff saremi 
wrote:

> We haven't done enough testing for me to say this with certainty, but as we
> insert data and new regions get created, it can be a while before those
> regions are distributed. If the data ingestion continues in the meantime, the
> load on the region server becomes overwhelming.
>
> Is there a way to expedite the distribution of regions among available
> region servers?
>
> thanks
>
>


Re: distributing new regions immediately

2017-07-27 Thread jeff saremi
Thanks Dima


From: Dima Spivak 
Sent: Thursday, July 27, 2017 12:38:56 PM
To: user@hbase.apache.org
Subject: Re: distributing new regions immediately

Presplitting tables is typically how this is addressed in production cases.
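A minimal presplit sketch in the hbase shell; the table/family names and split
points are placeholders and should come from the real key distribution (URL
prefixes, in this thread):

-
create 'SomeTable122016', 'info', SPLITS => ['http://d', 'http://h', 'http://m', 'http://r']
-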

On Thu, Jul 27, 2017 at 12:17 PM jeff saremi  wrote:

> We haven't done enough testing for me to say this with certainty, but as we
> insert data and new regions get created, it can be a while before those
> regions are distributed. If the data ingestion continues in the meantime, the
> load on the region server becomes overwhelming.
>
> Is there a way to expedite the distribution of regions among available
> region servers?
>
> thanks
>
> --
-Dima


distributing new regions immediately

2017-07-27 Thread jeff saremi
We haven't done enough testing for me to say this with certainty, but as we 
insert data and new regions get created, it can be a while before those 
regions are distributed. If the data ingestion continues in the meantime, the 
load on the region server becomes overwhelming.

Is there a way to expedite the distribution of regions among available region 
servers?

thanks



Re: How to get a list of running tasks in hbase shell?

2017-07-13 Thread jeff saremi
Thanks very much Josh


From: Josh Elser 
Sent: Thursday, July 13, 2017 11:44:52 AM
To: user@hbase.apache.org
Subject: Re: How to get a list of running tasks in hbase shell?

TaskMonitor is probably relying on the fact that it's invoked inside of
the RegionServer/Master process (note the maven module and the fact that
it's a template file for the webUI).

You would have to invoke an RPC to the server to get the list of Tasks.
I'm not sure if such an RPC already exists.
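One workaround worth trying (an assumption based on the filter/format
variables visible in the quoted template below, so verify it on your release)
is to pull the task list as JSON from the status web pages instead of over RPC:

-
# default web UI ports in 1.x: 16010 (master), 16030 (regionserver)
curl 'http://<master-host>:16010/master-status?format=json&filter=general'
curl 'http://<rs-host>:16030/rs-status?format=json&filter=general'
-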

On 7/13/17 12:45 PM, jeff saremi wrote:
> would someone throw this dog a bone please? thanks
>
> ____
> From: jeff saremi 
> Sent: Tuesday, July 11, 2017 10:55:40 AM
> To: user@hbase.apache.org
> Subject: How to get a list of running tasks in hbase shell?
>
> I sent this earlier in another thread. Thought I'd create its own thread to 
> get an answer. thanks
>
>
> How do you get an instance of TaskMonitor in Jruby (bin/hbase shell)?
> I tried the following and it didn't produce anything:
>
> -
> taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
> taskmonitor.get_tasks.each do |task|
>  printf("%s\r\n", task.to_string)
> end
> exit
> -
>
> I was trying to mimic the following code in 
> "hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon"
>
>
> TaskMonitor taskMonitor = TaskMonitor.get();
> ...
> List<MonitoredTask> tasks = taskMonitor.getTasks();
>
>


Re: How to get a list of running tasks in hbase shell?

2017-07-13 Thread jeff saremi
would someone throw this dog a bone please? thanks


From: jeff saremi 
Sent: Tuesday, July 11, 2017 10:55:40 AM
To: user@hbase.apache.org
Subject: How to get a list of running tasks in hbase shell?

I sent this earlier in another thread. Thought I'd create its own thread to get 
an answer. thanks


How do you get an instance of TaskMonitor in Jruby (bin/hbase shell)?
I tried the following and it didn't produce anything:

-
taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
taskmonitor.get_tasks.each do |task|
printf("%s\r\n", task.to_string)
end
exit
-

I was trying to mimic the following code in 
"hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon"


TaskMonitor taskMonitor = TaskMonitor.get();
...
List<MonitoredTask> tasks = taskMonitor.getTasks();



How to get a list of running tasks in hbase shell?

2017-07-11 Thread jeff saremi
I sent this earlier in another thread. Thought I'd create its own thread to get 
an answer. thanks


How do you get an instance of TaskMonitor in Jruby (bin/hbase shell)?
I tried the following and it didn't produce anything:

-
taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
taskmonitor.get_tasks.each do |task|
printf("%s\r\n", task.to_string)
end
exit
-

I was trying to mimic the following code in 
"hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon"


TaskMonitor taskMonitor = TaskMonitor.get();
...
List<MonitoredTask> tasks = taskMonitor.getTasks();



Re: Command line tool to get the list of Failed Regions

2017-07-10 Thread jeff saremi
@Sean Busbey<mailto:sean.bus...@gmail.com>
How do you get an instance of TaskMonitor in Jruby?
I tried the following and it didn't produce anything:


taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
taskmonitor.get_tasks.each do |task|
printf("%s\r\n", task.to_string)
end
exit


I was trying to mimic the following code in

hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon


TaskMonitor taskMonitor = TaskMonitor.get();
String filter = "general";
String format = "html";

<%java>
List<MonitoredTask> tasks = taskMonitor.getTasks();


____
From: jeff saremi 
Sent: Monday, July 10, 2017 11:20 AM
To: user@hbase.apache.org
Subject: Re: Command line tool to get the list of Failed Regions

Yep, it looks like the sky is the limit with the shell tool.

Great suggestion! thanks Sean


From: Sean Busbey 
Sent: Friday, July 7, 2017 2:11 PM
To: user@hbase.apache.org
Subject: Re: Command line tool to get the list of Failed Regions

you could use the shell in non-interactive mode to do this.

I don't have access to an instance ATM, but the JIRA for getting some shell
examples has a start

https://issues.apache.org/jira/browse/HBASE-15611
[HBASE-15611] add examples to shell docs - ASF 
JIRA<https://issues.apache.org/jira/browse/HBASE-15611>
issues.apache.org
It would be nice if our shell documentation included some additional examples 
of operational tasks one can perform. things to include to come in comments. 
when we ...


look for the one "how do I list regions for a table". if you use that
example in an interactive session, you should be able to see what other
methods exist on the region object instead of turning it into a string. I'd
expect there should be a status on it.
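A minimal sketch of that approach, using the public Java client API from the
shell's JRuby runtime (class and method names assume the 1.x client; the table
name is taken from the earlier threads):

-
conf = org.apache.hadoop.hbase.HBaseConfiguration.create
conn = org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(conf)
table = org.apache.hadoop.hbase.TableName.valueOf('ClickStreamTable122016')
locator = conn.getRegionLocator(table)
# print each region together with the server currently hosting it
locator.getAllRegionLocations.each do |loc|
  printf("%s => %s\r\n", loc.getRegionInfo.getRegionNameAsString, loc.getServerName)
end
conn.close
exit
-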

On Fri, Jul 7, 2017 at 5:06 PM, Josh Elser  wrote:

> HBCK can do this, something like `hbase hbck -summary `.
>
> This wouldn't be easily machine-consumable, but the content should be
> there.
>
>
> On 7/7/17 3:48 PM, jeff saremi wrote:
>
>> Is there a command line option that would give us a list of offline
>> Regions for a table? or a list of all regions and their status similar to
>> the Tasks section of master-status web page?
>>
>> thanks
>>
>>
>>
>>


--
Sean


Re: Command line tool to get the list of Failed Regions

2017-07-10 Thread jeff saremi
Yep, it looks like the sky is the limit with the shell tool.

Great suggestion! thanks Sean


From: Sean Busbey 
Sent: Friday, July 7, 2017 2:11 PM
To: user@hbase.apache.org
Subject: Re: Command line tool to get the list of Failed Regions

you could use the shell in non-interactive mode to do this.

I don't have access to an instance ATM, but the JIRA for getting some shell
examples has a start

https://issues.apache.org/jira/browse/HBASE-15611
[HBASE-15611] add examples to shell docs - ASF 
JIRA<https://issues.apache.org/jira/browse/HBASE-15611>
issues.apache.org
It would be nice if our shell documentation included some additional examples 
of operational tasks one can perform. things to include to come in comments. 
when we ...



look for the one "how do I list regions for a table". if you use that
example in an interactive session, you should be able to see what other
methods exist on the region object instead of turning it into a string. I'd
expect there should be a status on it.

On Fri, Jul 7, 2017 at 5:06 PM, Josh Elser  wrote:

> HBCK can do this, something like `hbase hbck -summary `.
>
> This wouldn't be easily machine-consumable, but the content should be
> there.
>
>
> On 7/7/17 3:48 PM, jeff saremi wrote:
>
>> Is there a command line option that would give us a list of offline
>> Regions for a table? or a list of all regions and their status similar to
>> the Tasks section of master-status web page?
>>
>> thanks
>>
>>
>>
>>


--
Sean


Re: Command line tool to get the list of Failed Regions

2017-07-10 Thread jeff saremi
Thanks Josh

The output is a little hard to parse automatically, but it may work out if 
nothing else does the trick.



From: Josh Elser 
Sent: Friday, July 7, 2017 2:06 PM
To: user@hbase.apache.org
Subject: Re: Command line tool to get the list of Failed Regions

HBCK can do this, something like `hbase hbck -summary `.

This wouldn't be easily machine-consumable, but the content should be there.
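For machine consumption, a rough sketch is to filter the hbck output; the
exact message wording varies by version, so treat the grep pattern as an
assumption to adjust:

-
bin/hbase hbck -details 2>/dev/null | grep -E 'ERROR|FAILED_(OPEN|CLOSE)|not deployed'
-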

On 7/7/17 3:48 PM, jeff saremi wrote:
> Is there a command line option that would give us a list of offline Regions 
> for a table? or a list of all regions and their status similar to the Tasks 
> section of master-status web page?
>
> thanks
>
>
>


Command line tool to get the list of Failed Regions

2017-07-07 Thread jeff saremi
Is there a command line option that would give us a list of offline Regions for 
a table? or a list of all regions and their status similar to the Tasks section 
of master-status web page?

thanks




Error getting hdfs block distribution - suspicious path

2017-06-06 Thread jeff saremi
We're seeing errors like these in our master logs. The problem is manifested on 
the master-status page in section "Regions in Transition" in the following form 
(in red):


4498c6d15d8b92f0519de59853d76992  
ImageFeaturesTable,fa/mKp,1496337620103.4498c6d15d8b92f0519de59853d76992. 
state=FAILED_OPEN, ts=Tue Jun 06 13:54:26 PDT 2017 (4253s ago), 
server=co4aap71d094619,16020,1496782457320  4253018


Then in the logs we see entries like this:


2017-06-06 13:54:15,415 WARN  [ProcedureExecutor-28] regionserver.HRegion: 
Error getting hdfs block distribution for 
hdfs://NameNode3-VIP.Yarn-Prod-CO4.CO4.ap.gbl/hbase/MaltaData/data/default/ImageFeaturesTable/9af0e6227720802390dc58a2caaea6f1/info/c56295eb2a5f4811ba330de423b0fb69.ac7dd8c2274ed24f082edfdd36eaf2de-hdfs://NameNode3-VIP.Yarn-Prod-CO4.CO4.ap.gbl/hbase/MaltaData/data/default/ImageFeaturesTable/ac7dd8c2274ed24f082edfdd36eaf2de/info/c56295eb2a5f4811ba330de423b0fb69-bottom

Looking at the path in this message, it looks like two HDFS paths are 
concatenated. What do you make of that?

We're using hbase 1.2.5.


thanks

Jeff


Re: Question about the behavior of hbck

2017-05-31 Thread jeff saremi
Ted, thanks a lot for the hints. I'll have to go through the log and check the 
Jira item as well. We're using release 1.2.5



From: Ted Yu 
Sent: Wednesday, May 31, 2017 3:06:11 PM
To: user@hbase.apache.org
Subject: Re: Question about the behavior of hbck

bq. See logs for detail

Did you get clue from hbck log ?

Which hbase release are you using ?

Please check out this JIRA:

HBASE-16008 A robust way deal with early termination of HBCK

Cheers

On Wed, May 31, 2017 at 3:00 PM, jeff saremi  wrote:

> I'm running hbck like the following:
>
> bin\hbase.cmd hbck -repair
>
>
> Then I get this exception printed out to the console. However, the program
> does not exit, nor does it seem to be doing anything else. When I kill it using
> CTRL-C, I'm not able to run subsequent hbck commands unless I go and clear
> the locks in HDFS.
> Is this expected?
>
>
> Exception in thread "main" java.io.IOException: 1 region(s) could not be
> checked or repaired.  See logs for detail.
> at org.apache.hadoop.hbase.util.HBaseFsck.checkAndFixConsistency(
> HBaseFsck.java:1846)
> at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(
> HBaseFsck.java:672)
> at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(
> HBaseFsck.java:694)
> at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:
> 4732)
> at org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(
> HBaseFsck.java:4531)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:
> 4519)
>
>
>


Question about the behavior of hbck

2017-05-31 Thread jeff saremi
I'm running hbck like the following:

bin\hbase.cmd hbck -repair


Then I get this exception printed out to the console. However, the program does 
not exit, nor does it seem to be doing anything else. When I kill it using 
CTRL-C, I'm not able to run subsequent hbck commands unless I go and clear the 
locks in HDFS.
Is this expected?


Exception in thread "main" java.io.IOException: 1 region(s) could not be 
checked or repaired.  See logs for detail.
at 
org.apache.hadoop.hbase.util.HBaseFsck.checkAndFixConsistency(HBaseFsck.java:1846)
at 
org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:672)
at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:694)
at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:4732)
at 
org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(HBaseFsck.java:4531)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:4519)
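A hedged sketch of the lock cleanup mentioned above: in 1.2.x hbck keeps its
lock file under the root directory's .tmp folder, but verify the path on your
cluster before removing anything:

-
hdfs dfs -rm <hbase.rootdir>/.tmp/hbase-hbck.lock
-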




Re: What is Dead Region Servers and how to clear them up?

2017-05-30 Thread jeff saremi
wonderful! thanks


From: Yu Li 
Sent: Tuesday, May 30, 2017 4:33:59 AM
To: jeff saremi
Cc: d...@hbase.apache.org; hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Thanks for the confirmation Jeff, have opened 
HBASE-18131<https://issues.apache.org/jira/browse/HBASE-18131> for this, FYI.

Best Regards,
Yu

On 29 May 2017 at 03:48, jeff saremi 
mailto:jeffsar...@hotmail.com>> wrote:

Yes Yu. What you're suggesting would work for us too and would still be 
appreciated.

thanks a lot

jeff


From: Yu Li mailto:car...@gmail.com>>
Sent: Sunday, May 28, 2017 10:13:38 AM
To: jeff saremi
Cc: d...@hbase.apache.org<mailto:d...@hbase.apache.org>; hbase-user

Subject: Re: What is Dead Region Servers and how to clear them up?

Thanks for the additional information Jeff, interesting scenario.

Let me re-explain: a dead server means that on this node (or container, in your 
case) there was a regionserver process once, but not now. This doesn't indicate 
the current health state of the cluster; it only records the fact and alerts the 
operator to check those nodes/containers to see what problem caused them to die. 
But I admit that this might cause confusion.

And as I proposed in my previous mail, I think in the Yarn/Mesos deployment 
scenario we need to supply a command to clear those dead servers. To be more 
specific: after all the actions, whether automatic ones like WAL split and 
zk clearance or manual ones like hbck -repair, as long as we're sure we 
don't need to care about those dead servers any more, we could remove them from 
the master UI. If this satisfies what you desire, I could open a JIRA and get the 
work done (smile).

Let me know your thoughts, thanks.

Best Regards,
Yu

On 28 May 2017 at 23:26, jeff saremi 
mailto:jeffsar...@hotmail.com>> wrote:

I think more and more deployments are being made dynamic using Yarn and Mesos. 
Going back to a fixed set of servers is not going to eliminate the problem I'm 
talking about. Making assumptions that the region servers come back on the same 
node is too optimistic.

Let me try this a different way to see if I can make my point:

- A cluster is either healthy or not healthy.

- If the cluster is unhealthy, then it can be made healthy using either 
external tools (hbck) or the internal agreement of master-regionserver. If this 
is not achievable, then the cluster must be discarded.

- The cluster is now healthy, meaning that no information should be lingering 
on such as dead server, dead regions, or whatever anywhere in the system. And 
moreover no such information must ever be brought up to the attention of the 
administrators of the cluster.

- If there is such information still hiding in some place in the system, then 
it only means that the mechanism (hbck or HBase itself) that made the system 
healthy did not complete its job of cleaning up what needed to be cleaned up.




From: Ted Yu mailto:yuzhih...@gmail.com>>
Sent: Saturday, May 27, 2017 1:54:50 PM

To: d...@hbase.apache.org<mailto:d...@hbase.apache.org>
Cc: Hbase-User; Yu Li
Subject: Re: What is Dead Region Servers and how to clear them up?

The involvement of Yarn can explain why you observed relatively more dead
servers (compared to traditional deployment).

Suppose in first run, Yarn allocates containers for region servers on a set
of nodes. Subsequently, Yarn may choose nodes (for the same number of
servers) which are not exactly the same nodes in the previous run.

What Yu Li described as restarting server is on the same node where the
server was running previously.

Cheers

On Sat, May 27, 2017 at 11:59 AM, jeff saremi 
mailto:jeffsar...@hotmail.com>>
wrote:

> Yes. We don't have fixed servers, with the exception of the ZK machines.
>
> We have 3 yarn jobs one for each of master, region, and thrift servers
> each launched separately with different number of nodes. I hope that's not
> what is causing problems.
>
> 
> From: Ted Yu mailto:yuzhih...@gmail.com>>
> Sent: Saturday, May 27, 2017 11:27:36 AM
> To: d...@hbase.apache.org<mailto:d...@hbase.apache.org>
> Cc: Hbase-User; Yu Li
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Jeff:
> bq. We run our cluster on Yarn and upon restarting jobs in Yarn
>
> Can you clarify a bit more - are you running hbase processes inside Yarn
> container ?
>
> Cheers
>
> On Sat, May 27, 2017 at 10:58 AM, jeff saremi 
> mailto:jeffsar...@hotmail.com>>
> wrote:
>
> > Thanks @Yu Li<mailto:car...@gmail.com>
> >
> > You are absolutely correct. Dead RS's will happen regardless. My issue
> > with this is more "psychological". If I have done everything needed to be
> > 

Re: What is Dead Region Servers and how to clear them up?

2017-05-28 Thread jeff saremi
Yes Yu. What you're suggesting would work for us too and would still be 
appreciated.

thanks a lot

jeff


From: Yu Li 
Sent: Sunday, May 28, 2017 10:13:38 AM
To: jeff saremi
Cc: d...@hbase.apache.org; hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Thanks for the additional information Jeff, interesting scenario.

Let me re-explain: a dead server means that on this node (or container, in your 
case) there was a regionserver process once, but not now. This doesn't indicate 
the current health state of the cluster; it only records the fact and alerts the 
operator to check those nodes/containers to see what problem caused them to die. 
But I admit that this might cause confusion.

And as I proposed in my previous mail, I think in the Yarn/Mesos deployment 
scenario we need to supply a command to clear those dead servers. To be more 
specific: after all the actions, whether automatic ones like WAL split and 
zk clearance or manual ones like hbck -repair, as long as we're sure we 
don't need to care about those dead servers any more, we could remove them from 
the master UI. If this satisfies what you desire, I could open a JIRA and get the 
work done (smile).

Let me know your thoughts, thanks.

Best Regards,
Yu

On 28 May 2017 at 23:26, jeff saremi 
mailto:jeffsar...@hotmail.com>> wrote:

I think more and more deployments are being made dynamic using Yarn and Mesos. 
Going back to a fixed set of servers is not going to eliminate the problem I'm 
talking about. Making assumptions that the region servers come back on the same 
node is too optimistic.

Let me try this a different way to see if I can make my point:

- A cluster is either healthy or not healthy.

- If the cluster is unhealthy, then it can be made healthy using either 
external tools (hbck) or the internal agreement of master-regionserver. If this 
is not achievable, then the cluster must be discarded.

- The cluster is now healthy, meaning that no information should be lingering 
on such as dead server, dead regions, or whatever anywhere in the system. And 
moreover no such information must ever be brought up to the attention of the 
administrators of the cluster.

- If there is such information still hiding in some place in the system, then 
it only means that the mechanism (hbck or HBase itself) that made the system 
healthy did not complete its job of cleaning up what needed to be cleaned up.




From: Ted Yu mailto:yuzhih...@gmail.com>>
Sent: Saturday, May 27, 2017 1:54:50 PM

To: d...@hbase.apache.org<mailto:d...@hbase.apache.org>
Cc: Hbase-User; Yu Li
Subject: Re: What is Dead Region Servers and how to clear them up?

The involvement of Yarn can explain why you observed relatively more dead
servers (compared to traditional deployment).

Suppose in first run, Yarn allocates containers for region servers on a set
of nodes. Subsequently, Yarn may choose nodes (for the same number of
servers) which are not exactly the same nodes in the previous run.

What Yu Li described as restarting server is on the same node where the
server was running previously.

Cheers

On Sat, May 27, 2017 at 11:59 AM, jeff saremi 
mailto:jeffsar...@hotmail.com>>
wrote:

> Yes. We don't have fixed servers, with the exception of the ZK machines.
>
> We have 3 yarn jobs one for each of master, region, and thrift servers
> each launched separately with different number of nodes. I hope that's not
> what is causing problems.
>
> 
> From: Ted Yu mailto:yuzhih...@gmail.com>>
> Sent: Saturday, May 27, 2017 11:27:36 AM
> To: d...@hbase.apache.org<mailto:d...@hbase.apache.org>
> Cc: Hbase-User; Yu Li
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Jeff:
> bq. We run our cluster on Yarn and upon restarting jobs in Yarn
>
> Can you clarify a bit more - are you running hbase processes inside Yarn
> container ?
>
> Cheers
>
> On Sat, May 27, 2017 at 10:58 AM, jeff saremi 
> mailto:jeffsar...@hotmail.com>>
> wrote:
>
> > Thanks @Yu Li<mailto:car...@gmail.com>
> >
> > You are absolutely correct. Dead RS's will happen regardless. My issue
> > with this is more "psychological". If I have done everything needed to be
> > done to ensure that RSs are running fine and regions are assigned and
> such
> > and hbck reports are consistent then how is this list of dead region
> > servers helping me? other than causing anxiety?
> > We run our cluster on Yarn and upon restarting jobs in Yarn we get a lot
> > of inconsistent, unavailable regions. (and this is only one scenario).
> Then
> > we'll run hbck with -repair option (and i was wrong here too: hbck does
> > take care of some issues) and restart the 

Re: What is Dead Region Servers and how to clear them up?

2017-05-28 Thread jeff saremi
I think more and more deployments are being made dynamic using Yarn and Mesos. 
Going back to a fixed set of servers is not going to eliminate the problem I'm 
talking about. Making assumptions that the region servers come back on the same 
node is too optimistic.

Let me try this a different way to see if I can make my point:

- A cluster is either healthy or not healthy.

- If the cluster is unhealthy, then it can be made healthy using either 
external tools (hbck) or the internal agreement of master-regionserver. If this 
is not achievable, then the cluster must be discarded.

- The cluster is now healthy, meaning that no information should be lingering 
on such as dead server, dead regions, or whatever anywhere in the system. And 
moreover no such information must ever be brought up to the attention of the 
administrators of the cluster.

- If there is such information still hiding in some place in the system, then 
it only means that the mechanism (hbck or HBase itself) that made the system 
healthy did not complete its job of cleaning up what needed to be cleaned up.




From: Ted Yu 
Sent: Saturday, May 27, 2017 1:54:50 PM
To: d...@hbase.apache.org
Cc: Hbase-User; Yu Li
Subject: Re: What is Dead Region Servers and how to clear them up?

The involvement of Yarn can explain why you observed relatively more dead
servers (compared to traditional deployment).

Suppose in first run, Yarn allocates containers for region servers on a set
of nodes. Subsequently, Yarn may choose nodes (for the same number of
servers) which are not exactly the same nodes in the previous run.

What Yu Li described as restarting server is on the same node where the
server was running previously.

Cheers

On Sat, May 27, 2017 at 11:59 AM, jeff saremi 
wrote:

> Yes. We don't have fixed servers, with the exception of the ZK machines.
>
> We have 3 yarn jobs one for each of master, region, and thrift servers
> each launched separately with different number of nodes. I hope that's not
> what is causing problems.
>
> 
> From: Ted Yu 
> Sent: Saturday, May 27, 2017 11:27:36 AM
> To: d...@hbase.apache.org
> Cc: Hbase-User; Yu Li
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Jeff:
> bq. We run our cluster on Yarn and upon restarting jobs in Yarn
>
> Can you clarify a bit more - are you running hbase processes inside Yarn
> container ?
>
> Cheers
>
> On Sat, May 27, 2017 at 10:58 AM, jeff saremi 
> wrote:
>
> > Thanks @Yu Li<mailto:car...@gmail.com>
> >
> > You are absolutely correct. Dead RS's will happen regardless. My issue
> > with this is more "psychological". If I have done everything needed to be
> > done to ensure that RSs are running fine and regions are assigned and
> such
> > and hbck reports are consistent then how is this list of dead region
> > servers helping me? other than causing anxiety?
> > We run our cluster on Yarn and upon restarting jobs in Yarn we get a lot
> > of inconsistent, unavailable regions. (and this is only one scenario).
> Then
> > we'll run hbck with -repair option (and i was wrong here too: hbck does
> > take care of some issues) and restart the master(s). After that there
> seem
> > to be no more issues other than dead region servers being still reported.
> > We should not have this anymore after having taken all precautions to
> reset
> > the system properly.
> >
> > I was trying to write something similar to what hbck would do to take
> > care of this specific issue. I wouldn't mind contributing to the hbck
> > itself either. However I needed to understand where this list comes from
> > and why. These are things that I could possibly automate (after all the
> > other steps i mentioned):
> > - check the ZK list of RS's. If any of the dead RS's found, remove node
> >
> > - check hdfs root WALs folder. If there are any with the dead RS's name
> in
> > them, delete them. (here we need to take precaution as @Enis mentioned;
> > possibly if the node timestamp has not been changed in a while)
> >
> > - what else? These steps are not enough
> >
> > For instance, we currently have 17 servers being reported as dead. Only
> > 3-4 of them show up in HDFS with "-splitting" in their WALs folder. Where
> > do the rest come from?
> > thanks
> >
> > Jeff
> >
> > 
> > From: Yu Li 
> > Sent: Friday, May 26, 2017 10:18:09 PM
> > To: Hbase-User
> > Cc: d...@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. And having a list of "dead"

Re: What is Dead Region Servers and how to clear them up?

2017-05-27 Thread jeff saremi
Yes. We don't have fixed servers, with the exception of the ZK machines.

We have 3 yarn jobs one for each of master, region, and thrift servers each 
launched separately with different number of nodes. I hope that's not what is 
causing problems.


From: Ted Yu 
Sent: Saturday, May 27, 2017 11:27:36 AM
To: d...@hbase.apache.org
Cc: Hbase-User; Yu Li
Subject: Re: What is Dead Region Servers and how to clear them up?

Jeff:
bq. We run our cluster on Yarn and upon restarting jobs in Yarn

Can you clarify a bit more - are you running hbase processes inside Yarn
container ?

Cheers

On Sat, May 27, 2017 at 10:58 AM, jeff saremi 
wrote:

> Thanks @Yu Li<mailto:car...@gmail.com>
>
> You are absolutely correct. Dead RS's will happen regardless. My issue
> with this is more "psychological". If I have done everything needed to be
> done to ensure that RSs are running fine and regions are assigned and such
> and hbck reports are consistent then how is this list of dead region
> servers helping me? other than causing anxiety?
> We run our cluster on Yarn and upon restarting jobs in Yarn we get a lot
> of inconsistent, unavailable regions. (and this is only one scenario). Then
> we'll run hbck with -repair option (and i was wrong here too: hbck does
> take care of some issues) and restart the master(s). After that there seem
> to be no more issues other than dead region servers being still reported.
> We should not have this anymore after having taken all precautions to reset
> the system properly.
>
> I was trying to write something similar to what hbck would do to take
> care of this specific issue. I wouldn't mind contributing to the hbck
> itself either. However I needed to understand where this list comes from
> and why. These are things that I could possibly automate (after all the
> other steps i mentioned):
> - check the ZK list of RS's. If any of the dead RS's found, remove node
>
> - check hdfs root WALs folder. If there are any with the dead RS's name in
> them, delete them. (here we need to take precaution as @Enis mentioned;
> possibly if the node timestamp has not been changed in a while)
>
> - what else? These steps are not enough
>
> For instance, we currently have 17 servers being reported as dead. Only
> 3-4 of them show up in HDFS with "-splitting" in their WALs folder. Where
> do the rest come from?
> thanks
>
> Jeff
>
> 
> From: Yu Li 
> Sent: Friday, May 26, 2017 10:18:09 PM
> To: Hbase-User
> Cc: d...@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> bq. And having a list of "dead" servers is not a healthy thing to have.
> I don't think the existence of "dead" servers means the service is
> unhealthy, especially in a distributed system. Besides hbase, HDFS also
> shows Live and Dead nodes in namenode UI, and people won't regard HDFS as
> unhealthy if there're dead nodes.
>
> In HBase, if some RS aborts due to unexpected issue like long GC, normally
> we will restart it and once it's restarted and report to master, it will be
> removed from the dead server list. So when we observed dead server in
> Master UI, the first thing is to check the root cause and restart it if it
> won't cause further issue.
>
> However, sometimes we may find the server aborted due to some hardware
> failure and we must offline the server for repairing. Or we need to move
> some nodes to join other clusters so we stop the RS process on purpose. I
> guess this is the case you're dealing with @jeff? If so, I think it's a
> reasonable requirement that we supply a command in hbase to clear the dead
> nodes when operator assure they no longer serves.
>
> Best Regards,
> Yu
>
On 27 May 2017 at 04:49, Enis Söztutar  wrote:
>
> > In general if there are no regions in transition, the WAL recovery has
> > already finished. You can watch the master's log4j log for those entries,
> > but the lack of regions in transition is the easiest way to identify.
> >
> > Enis
> >
> > On Fri, May 26, 2017 at 12:14 PM, jeff saremi 
> > wrote:
> >
> > > thanks Enis
> > >
> > > I apologize for earlier
> > >
> > > This looks very close to our issue
> > > When you say: "there is no "WAL" recovery is happening", how could i
> make
> > > sure of that? Thanks
> > >
> > > Jeff
> > >
> > >
> > > 
> > > From: Enis Söztutar 
> > > Sent: Friday, May 26, 2017 11:47:11 AM
> > > To: d...@hbase.apach

Re: What is Dead Region Servers and how to clear them up?

2017-05-27 Thread jeff saremi
Thanks @Yu Li<mailto:car...@gmail.com>

You are absolutely correct. Dead RS's will happen regardless. My issue with 
this is more "psychological". If I have done everything needed to be done to 
ensure that RSs are running fine and regions are assigned and such and hbck 
reports are consistent then how is this list of dead region servers helping me? 
other than causing anxiety?
We run our cluster on Yarn and upon restarting jobs in Yarn we get a lot of 
inconsistent, unavailable regions. (and this is only one scenario). Then we'll 
run hbck with -repair option (and i was wrong here too: hbck does take care of 
some issues) and restart the master(s). After that there seem to be no more 
issues other than dead region servers being still reported. We should not have 
this anymore after having taken all precautions to reset the system properly.

I was trying to write something similar to what hbck would do to take care of 
this specific issue. I wouldn't mind contributing to the hbck itself either. 
However I needed to understand where this list comes from and why. These are 
things that I could possibly automate (after all the other steps i mentioned):
- check the ZK list of RS's. If any of the dead RS's found, remove node

- check hdfs root WALs folder. If there are any with the dead RS's name in 
them, delete them. (here we need to take precaution as @Enis mentioned; 
possibly if the node timestamp has not been changed in a while)

- what else? These steps are not enough
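A sketch of the first two checks in the list above (the znode parent and root
directory default to /hbase; adjust to your zookeeper.znode.parent and
hbase.rootdir):

-
bin/hbase zkcli ls /hbase/rs          # live RS znodes; dead servers should be absent
hdfs dfs -ls <hbase.rootdir>/WALs     # look for leftover <server>-splitting dirs
-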

For instance, we currently have 17 servers being reported as dead. Only 3-4 of 
them show up in HDFS with "-splitting" in their WALs folder. Where do the rest 
come from?
thanks

Jeff


From: Yu Li 
Sent: Friday, May 26, 2017 10:18:09 PM
To: Hbase-User
Cc: d...@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

bq. And having a list of "dead" servers is not a healthy thing to have.
I don't think the existence of "dead" servers means the service is
unhealthy, especially in a distributed system. Besides hbase, HDFS also
shows Live and Dead nodes in namenode UI, and people won't regard HDFS as
unhealthy if there're dead nodes.

In HBase, if some RS aborts due to an unexpected issue like a long GC, normally
we will restart it, and once it's restarted and reports to the master, it will be
removed from the dead server list. So when we observe a dead server in the
Master UI, the first thing is to check the root cause and restart it if that
won't cause further issues.

However, sometimes we may find the server aborted due to some hardware
failure and we must take the server offline for repair. Or we need to move
some nodes to join other clusters, so we stop the RS process on purpose. I
guess this is the case you're dealing with, @jeff? If so, I think it's a
reasonable requirement that we supply a command in hbase to clear the dead
nodes once the operator is sure they no longer serve.

Best Regards,
Yu

On 27 May 2017 at 04:49, Enis Söztutar  wrote:

> In general if there are no regions in transition, the WAL recovery has
> already finished. You can watch the master's log4j log for those entries,
> but the lack of regions in transition is the easiest way to identify.
>
> Enis
>
> On Fri, May 26, 2017 at 12:14 PM, jeff saremi 
> wrote:
>
> > thanks Enis
> >
> > I apologize for earlier
> >
> > This looks very close to our issue
> > When you say: "there is no "WAL" recovery is happening", how could i make
> > sure of that? Thanks
> >
> > Jeff
> >
> >
> > 
> > From: Enis Söztutar 
> > Sent: Friday, May 26, 2017 11:47:11 AM
> > To: d...@hbase.apache.org
> > Cc: hbase-user
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > Jeff, please be respectful to be people who are trying to help you. This
> is
> > not acceptable behavior and will result in consequences next time.
> >
> > On the specific issue that you are seeing, it is highly likely that you
> are
> > seeing this: https://issues.apache.org/jira/browse/HBASE-14223. Having
> > those servers in the dead servers list will not hurt operations, or
> > runtimes or anything else. Possibly for those servers, there is not new
> > instance of the regionserver running in the same host and ports.
> >
> > If you want to manually clean out these, you can follow these steps:
> >  - Manually move these directories from the file system:
> > <hbase.rootdir>/WALs/<dead-server>-splitting
> >  - ONLY do this if you are sure that no "WAL" recovery is
> > happening, and there are only WAL files with names containing ".meta."
>

Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
Sir

You're not only not helping but you're also polluting my post and reducing its 
visibility. Now you're asking for recognition for that too?

If you don't have anything to add to my question, please don't respond to it. 
Let someone else who might have something to say not get tricked into thinking 
that my post was already addressed.

Jeff


From: Dima Spivak 
Sent: Friday, May 26, 2017 1:27:33 PM
To: hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Actually, it's a "Please give us the details another member of the project
already asked for."

This is a community mailing list, which means we volunteer our time to help
people with questions. If you're looking for customer support, you should
be taking your question to a consultant or vendor that provides such
services. Being a jerk is incredibly counterproductive.

-Dima

On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
wrote:

> Thank you for the GFY answer
>
> And i guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve
> ?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > these are the things I have done so far:
> > >
> > >
> > > - restarting master (few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing
> anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of
> the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why
> isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
@James

Thanks for the insight. I think that's also our case. I see the dead region 
list but it seems like our cluster is operating properly.
However, from a maintenance standpoint I'd like the cluster to always report as 
healthy, and having a list of "dead" servers is not a healthy thing to have.
So I was hoping that from the comments I'd be collecting here, I could write a 
shell script that would do this cleanup in an automated fashion. I just needed 
insight as to what I should be cleaning up and when it's safe to do so.

jeff


From: James Moore 
Sent: Friday, May 26, 2017 11:35:22 AM
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

In HBase, all data is stored in HDFS rather than inside of the region
server. The HBase cluster considers any individual region server process a
region server, and when that process dies it is considered a dead region
server. This tracking is particularly important during the crash recovery
process and when dealing with network partitions. There isn't any need to clean
up dead region servers as an out-of-band maintenance task; they will be cleaned
up by the HMasters eventually.

On Fri, May 26, 2017 at 2:03 PM, jeff saremi  wrote:

> Thank you for the GFY answer
>
> And i guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > i'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve
> ?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > these are the things I have done so far:
> > >
> > >
> > > - restarting master (few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing
> anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of
> the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why
> isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
thanks Enis

I apologize for earlier

This looks very close to our issue
When you say "there is no "WAL" recovery happening", how could I make sure
of that? Thanks

Jeff



From: Enis Söztutar 
Sent: Friday, May 26, 2017 11:47:11 AM
To: d...@hbase.apache.org
Cc: hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Jeff, please be respectful to the people who are trying to help you. This is
not acceptable behavior and will result in consequences next time.

On the specific issue that you are seeing, it is highly likely that you are
seeing this: https://issues.apache.org/jira/browse/HBASE-14223. Having
those servers in the dead servers list will not hurt operations, or
runtimes or anything else. Possibly for those servers, there is no new
instance of the regionserver running on the same host and port.

If you want to manually clean out these, you can follow these steps:
 - Manually move these directories from the file system:
/WALs/dead-server-splitting
 - ONLY do this if you are sure that there is no "WAL" recovery
happening, and there are only WAL files with names containing ".meta."
 - Restart HBase master.
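
A rough sketch of the move step with the HDFS Java API (the path assumes the
default /hbase root dir, and the backup location is illustrative; only run it
after the checks above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveStaleSplittingDirs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path wals = new Path("/hbase/WALs");
    Path backup = new Path("/hbase-stale-splitting");   // illustrative
    fs.mkdirs(backup);
    for (FileStatus st : fs.listStatus(wals)) {
      // only the leftover recovery dirs end with "-splitting"
      if (st.getPath().getName().endsWith("-splitting")) {
        System.out.println("moving " + st.getPath());
        fs.rename(st.getPath(), new Path(backup, st.getPath().getName()));
      }
    }
  }
}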

Upon restart, you can see that these do not show up anymore. For more
technical details, please refer to the jira link.

Enis

On Fri, May 26, 2017 at 11:03 AM, jeff saremi 
wrote:

> Thank you for the GFY answer
>
> And i guess to figure out how to fix these I can always go through the
> HBase source code.
>
>
> 
> From: Dima Spivak 
> Sent: Friday, May 26, 2017 9:58:00 AM
> To: hbase-user
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> Sending this back to the user mailing list.
>
> RegionServers can die for many reasons. Looking at your RegionServer log
> files should give hints as to why it's happening.
>
>
> -Dima
>
> On Fri, May 26, 2017 at 9:48 AM, jeff saremi 
> wrote:
>
> > I had posted this to the user mailing list and I have not got any direct
> > answer to my question.
> >
> > Where do dead RS's come from and how can they be cleaned up? Someone in
> > the midst of developers should know this.
> >
> > thanks
> >
> > Jeff
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, May 25, 2017 10:23:17 AM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > I'm still looking to get hints on how to remove the dead regions. thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 12:27:06 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > i'm trying to eliminate the dead region servers.
> >
> > 
> > From: Ted Yu 
> > Sent: Wednesday, May 24, 2017 12:17:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is Dead Region Servers and how to clear them up?
> >
> > bq. running hbck (many times
> >
> > Can you describe the specific inconsistencies you were trying to resolve
> ?
> > Depending on the inconsistencies, advice can be given on the best known
> > hbck command arguments to use.
> >
> > Feel free to pastebin master log if needed.
> >
> > On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> > wrote:
> >
> > > these are the things I have done so far:
> > >
> > >
> > > - restarting master (few times)
> > >
> > > - running hbck (many times; this tool does not seem to be doing
> anything
> > > at all)
> > >
> > > - checking the list of region servers in ZK (none of the dead ones are
> > > listed here)
> > >
> > > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > > are listed here with "-splitting" at the end of their names and they
> > > contain one single file like: 1493846660401..meta.1493922323600.meta
> > >
> > >
> > >
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > > To: user@hbase.apache.org
> > > Subject: What is Dead Region Servers and how to clear them up?
> > >
> > > Apparently having dead region servers is so common that a section of
> the
> > > master console is dedicated to that?
> > > How can we clean this up (preferably in an automated fashion)? Why
> isn't
> > > this being done by HBase automatically?
> > >
> > >
> > > thanks
> > >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-26 Thread jeff saremi
Thank you for the GFY answer

And I guess to figure out how to fix these I can always go through the HBase
source code.



From: Dima Spivak 
Sent: Friday, May 26, 2017 9:58:00 AM
To: hbase-user
Subject: Re: What is Dead Region Servers and how to clear them up?

Sending this back to the user mailing list.

RegionServers can die for many reasons. Looking at your RegionServer log
files should give hints as to why it's happening.


-Dima

On Fri, May 26, 2017 at 9:48 AM, jeff saremi  wrote:

> I had posted this to the user mailing list and I have not got any direct
> answer to my question.
>
> Where do dead RS's come from and how can they be cleaned up? Someone in
> the midst of developers should know this.
>
> thanks
>
> Jeff
>
> ________
> From: jeff saremi 
> Sent: Thursday, May 25, 2017 10:23:17 AM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> I'm still looking to get hints on how to remove the dead regions. thanks
>
> 
> From: jeff saremi 
> Sent: Wednesday, May 24, 2017 12:27:06 PM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> i'm trying to eliminate the dead region servers.
>
> 
> From: Ted Yu 
> Sent: Wednesday, May 24, 2017 12:17:40 PM
> To: user@hbase.apache.org
> Subject: Re: What is Dead Region Servers and how to clear them up?
>
> bq. running hbck (many times
>
> Can you describe the specific inconsistencies you were trying to resolve ?
> Depending on the inconsistencies, advice can be given on the best known
> hbck command arguments to use.
>
> Feel free to pastebin master log if needed.
>
> On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
> wrote:
>
> > these are the things I have done so far:
> >
> >
> > - restarting master (few times)
> >
> > - running hbck (many times; this tool does not seem to be doing anything
> > at all)
> >
> > - checking the list of region servers in ZK (none of the dead ones are
> > listed here)
> >
> > - checking the WALs under /WALs. Out of 11 dead ones only 3
> > are listed here with "-splitting" at the end of their names and they
> > contain one single file like: 1493846660401..meta.1493922323600.meta
> >
> >
> >
> >
> > 
> > From: jeff saremi 
> > Sent: Wednesday, May 24, 2017 9:04:11 AM
> > To: user@hbase.apache.org
> > Subject: What is Dead Region Servers and how to clear them up?
> >
> > Apparently having dead region servers is so common that a section of the
> > master console is dedicated to that?
> > How can we clean this up (preferably in an automated fashion)? Why isn't
> > this being done by HBase automatically?
> >
> >
> > thanks
> >
>


Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread jeff saremi
@James, thank you very much. That was extremely helpful


From: James Moore 
Sent: Friday, May 26, 2017 10:24:42 AM
To: user@hbase.apache.org
Subject: Re: What is the cause for RegionTooBusyException?

One mechanism for revealing the error in question is to print one of the
individual exceptions included in the batch call's response. We use this in a
few places to allow inspection of individual exceptions; you can see an example
of how to do this here:
https://github.com/apache/hbase/blob/master/hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java#L1222
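
Concretely, something like the following (a rough sketch against the 1.2
client API; the table handle and action list are placeholders):

import java.util.List;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

class BatchErrorDump {
  static void runBatch(Table table, List<Row> actions) throws Exception {
    Object[] results = new Object[actions.size()];
    try {
      table.batch(actions, results);
    } catch (RetriesExhaustedWithDetailsException e) {
      // each index yields the failed action, its cause, and the server
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("row=" + Bytes.toStringBinary(e.getRow(i).getRow())
            + " host=" + e.getHostnamePort(i)
            + " cause=" + e.getCause(i));
      }
    }
  }
}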


On Fri, May 26, 2017 at 12:47 PM, jeff saremi 
wrote:

> Hi Stack
>
> no there are no details in the exception. I mentioned that in another
> thread. When you perform a Batch operation, I believe no details will be
> communicated. I am not sure about individual Put's though. That makes it
> hard to go through logs cause we don't know out of hundreds of RS's which
> logs to look at
>
> I have an issue with this exception being thrown period. I think the
> resource management needs a lot of work. I will soon post another note
> about my impression of this whole thing.
>
> Jeff
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Friday, May 26, 2017 12:05:36 AM
> To: Hbase-User
> Subject: Re: What is the cause for RegionTooBusyException?
>
> On Mon, May 22, 2017 at 9:31 AM, jeff saremi 
> wrote:
>
> > while I'm still trying to find anything useful in the logs, my question
> is
> > why isn't HBase self managing this?
> >
>
> It should do better here, yes (I thought TooBusy retried but I am not
> finding it at the mo.). Exception is thrown for such as the reasons James
> lists -- in essence out of resources --  including the case where we fail
> to obtain lock inside the configured timeouts (row lock on write or region
> lock doing bulk load). As James notes, you should see the too busy dumped
> into the regionserver log at time of issue. Having this, you can figure
> what resource is crimped. Is there no more detail on client side on the
> root of the TooBusy exceptions?
>
>
> Thanks,
> S
>
>
>
> >
> > 
> > From: jeff saremi 
> > Sent: Friday, May 19, 2017 8:18:59 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is the cause for RegionTooBusyException?
> >
> > Thanks Ted. I will look deeper as you suggested
> >
> > 
> > From: Ted Yu 
> > Sent: Friday, May 19, 2017 4:18:12 PM
> > To: user@hbase.apache.org
> > Subject: Re: What is the cause for RegionTooBusyException?
> >
> > Have you checked region server log ?
> > Please take a look at the following method in HRegion:
> >
> >   private void checkResources() throws RegionTooBusyException {
> >
> > ...
> >
> > if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {
> >
> >   blockedRequestsCount.increment();
> >
> >   requestFlush();
> >
> >   throw new RegionTooBusyException("Above memstore limit, " +
> >
> > Which hbase release are you using ?
> >
> > Cheers
> >
> > On Fri, May 19, 2017 at 3:59 PM, jeff saremi 
> > wrote:
> >
> > > We're getting errors like this. Where should we be looking into to
> solve
> > > this?
> > >
> > >
> > > Failed 69261 actions: RegionTooBusyException: 12695 times,
> > > RemoteWithExtrasException: 56566 times
> > >
> > > thanks
> > >
> > > Jeff
> > >
> > >
> >
>


Re: What is the cause for RegionTooBusyException?

2017-05-26 Thread jeff saremi
Hi Stack

No, there are no details in the exception. I mentioned that in another thread.
When you perform a batch operation, I believe no details will be communicated.
I am not sure about individual Puts, though. That makes it hard to go through
the logs, because out of hundreds of RSs we don't know which logs to look at.

I have an issue with this exception being thrown, period. I think the resource
management needs a lot of work. I will soon post another note about my
impression of this whole thing.

Jeff


From: saint@gmail.com  on behalf of Stack 

Sent: Friday, May 26, 2017 12:05:36 AM
To: Hbase-User
Subject: Re: What is the cause for RegionTooBusyException?

On Mon, May 22, 2017 at 9:31 AM, jeff saremi  wrote:

> while I'm still trying to find anything useful in the logs, my question is
> why isn't HBase self managing this?
>

It should do better here, yes (I thought TooBusy retried but I am not
finding it at the moment). The exception is thrown for reasons such as the
ones James lists -- in essence, out of resources -- including the case where
we fail to obtain a lock inside the configured timeouts (row lock on write,
or region lock doing a bulk load). As James notes, you should see the
too-busy exception dumped into the regionserver log at the time of issue.
Having this, you can figure out what resource is crimped. Is there no more
detail on the client side on the root of the TooBusy exceptions?


Thanks,
S



>
> ____
> From: jeff saremi 
> Sent: Friday, May 19, 2017 8:18:59 PM
> To: user@hbase.apache.org
> Subject: Re: What is the cause for RegionTooBusyException?
>
> Thanks Ted. I will look deeper as you suggested
>
> 
> From: Ted Yu 
> Sent: Friday, May 19, 2017 4:18:12 PM
> To: user@hbase.apache.org
> Subject: Re: What is the cause for RegionTooBusyException?
>
> Have you checked region server log ?
> Please take a look at the following method in HRegion:
>
>   private void checkResources() throws RegionTooBusyException {
>
> ...
>
> if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {
>
>   blockedRequestsCount.increment();
>
>   requestFlush();
>
>   throw new RegionTooBusyException("Above memstore limit, " +
>
> Which hbase release are you using ?
>
> Cheers
>
> On Fri, May 19, 2017 at 3:59 PM, jeff saremi 
> wrote:
>
> > We're getting errors like this. Where should we be looking into to solve
> > this?
> >
> >
> > Failed 69261 actions: RegionTooBusyException: 12695 times,
> > RemoteWithExtrasException: 56566 times
> >
> > thanks
> >
> > Jeff
> >
> >
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-25 Thread jeff saremi
I'm still looking to get hints on how to remove the dead region servers. thanks


From: jeff saremi 
Sent: Wednesday, May 24, 2017 12:27:06 PM
To: user@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

I'm trying to eliminate the dead region servers.


From: Ted Yu 
Sent: Wednesday, May 24, 2017 12:17:40 PM
To: user@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

bq. running hbck (many times

Can you describe the specific inconsistencies you were trying to resolve ?
Depending on the inconsistencies, advice can be given on the best known
hbck command arguments to use.

Feel free to pastebin master log if needed.

On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
wrote:

> these are the things I have done so far:
>
>
> - restarting master (few times)
>
> - running hbck (many times; this tool does not seem to be doing anything
> at all)
>
> - checking the list of region servers in ZK (none of the dead ones are
> listed here)
>
> - checking the WALs under /WALs. Out of 11 dead ones only 3
> are listed here with "-splitting" at the end of their names and they
> contain one single file like: 1493846660401..meta.1493922323600.meta
>
>
>
>
> 
> From: jeff saremi 
> Sent: Wednesday, May 24, 2017 9:04:11 AM
> To: user@hbase.apache.org
> Subject: What is Dead Region Servers and how to clear them up?
>
> Apparently having dead region servers is so common that a section of the
> master console is dedicated to that?
> How can we clean this up (preferably in an automated fashion)? Why isn't
> this being done by HBase automatically?
>
>
> thanks
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-24 Thread jeff saremi
I'm trying to eliminate the dead region servers.


From: Ted Yu 
Sent: Wednesday, May 24, 2017 12:17:40 PM
To: user@hbase.apache.org
Subject: Re: What is Dead Region Servers and how to clear them up?

bq. running hbck (many times

Can you describe the specific inconsistencies you were trying to resolve ?
Depending on the inconsistencies, advice can be given on the best known
hbck command arguments to use.

Feel free to pastebin master log if needed.

On Wed, May 24, 2017 at 12:10 PM, jeff saremi 
wrote:

> these are the things I have done so far:
>
>
> - restarting master (few times)
>
> - running hbck (many times; this tool does not seem to be doing anything
> at all)
>
> - checking the list of region servers in ZK (none of the dead ones are
> listed here)
>
> - checking the WALs under /WALs. Out of 11 dead ones only 3
> are listed here with "-splitting" at the end of their names and they
> contain one single file like: 1493846660401..meta.1493922323600.meta
>
>
>
>
> 
> From: jeff saremi 
> Sent: Wednesday, May 24, 2017 9:04:11 AM
> To: user@hbase.apache.org
> Subject: What is Dead Region Servers and how to clear them up?
>
> Apparently having dead region servers is so common that a section of the
> master console is dedicated to that?
> How can we clean this up (preferably in an automated fashion)? Why isn't
> this being done by HBase automatically?
>
>
> thanks
>


Re: What is Dead Region Servers and how to clear them up?

2017-05-24 Thread jeff saremi
these are the things I have done so far:


- restarting master (few times)

- running hbck (many times; this tool does not seem to be doing anything at all)

- checking the list of region servers in ZK (none of the dead ones are listed 
here)

- checking the WALs under /WALs. Out of 11 dead ones only 3 are 
listed here with "-splitting" at the end of their names and they contain one 
single file like: 1493846660401..meta.1493922323600.meta




____
From: jeff saremi 
Sent: Wednesday, May 24, 2017 9:04:11 AM
To: user@hbase.apache.org
Subject: What is Dead Region Servers and how to clear them up?

Apparently having dead region servers is so common that a section of the master 
console is dedicated to that?
How can we clean this up (preferably in an automated fashion)? Why isn't this 
being done by HBase automatically?


thanks


What is Dead Region Servers and how to clear them up?

2017-05-24 Thread jeff saremi
Apparently having dead region servers is so common that a section of the master 
console is dedicated to that?
How can we clean this up (preferably in an automated fashion)? Why isn't this 
being done by HBase automatically?


thanks


Re: How to get useful info from Client exceptions

2017-05-23 Thread jeff saremi
Thanks Ted.
We might have a stale hbase-site.xml deployed alongside the code, which runs
on Spark.

However, what we set is the ZooKeeper quorum for the right cluster.

I was told that the third batch starts to throw errors like these and the first
two are fine. But then again, I'm not running this myself.

We're going to increase the log level to DEBUG to see if we see anything.


From: Ted Yu 
Sent: Tuesday, May 23, 2017 3:49:27 PM
To: user@hbase.apache.org
Subject: Re: How to get useful info from Client exceptions

The exception is composed this way (see AsyncProcess):

  return new RetriesExhaustedWithDetailsException(new ArrayList<Throwable>(throwables),
      new ArrayList<Row>(actions), new ArrayList<String>(addresses));

In your case there were 10456 UnknownHostExceptions.

Was there any other clue in the client-side log?
I think one improvement we can do is to selectively log which host(s) were
involved in the UnknownHostExceptions.

BTW was hbase-site.xml on the classpath of your client ?
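
In the meantime, the exception already carries per-action details that can be
pulled out on the client side; a rough sketch (assuming e is the caught
RetriesExhaustedWithDetailsException):

// tally failures per region server so you know which RS logs to read
java.util.Map<String, Integer> byHost = new java.util.TreeMap<String, Integer>();
for (int i = 0; i < e.getNumExceptions(); i++) {
  String host = e.getHostnamePort(i);
  Integer n = byHost.get(host);
  byHost.put(host, n == null ? 1 : n + 1);
}
System.err.println(byHost);   // e.g. {somehost:16020=10456}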

On Tue, May 23, 2017 at 3:28 PM, jeff saremi  wrote:

> We get errors like below which are not helping at all.
>
> For instance in this case we have no clue what server/region it is talking
> about
>
> Is there a setting we missed or somehow we could augment this information
> to help us? We're using hbase 1.2.5
>
>
> Failed 10456 actions: UnknownHostException: 10456 times,
> Stack Trace: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 10456 actions: UnknownHostException: 10456 times,
> at org.apache.hadoop.hbase.client.AsyncProcess$
> BatchErrors.makeException(AsyncProcess.java:258)
> at org.apache.hadoop.hbase.client.AsyncProcess$
> BatchErrors.access$2000(AsyncProcess.java:238)
> at org.apache.hadoop.hbase.client.AsyncProcess.
> waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
> at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> backgroundFlushCommits(BufferedMutatorImpl.java:240)
> at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> mutate(BufferedMutatorImpl.java:146)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
>
> thanks
>
>
>
>


How to get useful info from Client exceptions

2017-05-23 Thread jeff saremi
We get errors like the one below, which are not helping at all.

For instance, in this case we have no clue what server/region it is talking
about.

Is there a setting we missed, or some way we could augment this information to
help us? We're using HBase 1.2.5.


Failed 10456 actions: UnknownHostException: 10456 times,
Stack Trace: 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
10456 actions: UnknownHostException: 10456 times,
at 
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:258)
at 
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(AsyncProcess.java:238)
at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)

thanks





Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread jeff saremi
Vladimir, thanks a lot for helping us out

So I checked the number of RSs in the master console. It was more than what we
allotted.

Then I went to the list of FAILED_CLOSE regions, copied the server names, and
then issued deletes against those nodes in ZK.

I restarted the masters (I don't think I needed to do this step) and now all
regions show as fine.

Happy now!
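
For the record, those deletes can be scripted with the raw ZooKeeper client;
a rough sketch (the quorum address and znode paths are illustrative, and the
parent depends on your zookeeper.znode.parent and on which nodes the master UI
pointed at):

import org.apache.zookeeper.ZooKeeper;

public class DeleteStaleNodes {
  public static void main(String[] args) throws Exception {
    // a null watcher is fine for plain synchronous calls
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
    // one path per stale server name copied from the master UI
    String[] stale = { "/hbase/rs/somehost,16020,1493846660401" };
    for (String path : stale) {
      zk.delete(path, -1);   // -1 matches any node version
    }
    zk.close();
  }
}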


From: Vladimir Rodionov 
Sent: Tuesday, May 23, 2017 2:41:30 PM
To: user@hbase.apache.org
Subject: Re: Regions in Transition: FAILED_CLOSE status

My bad, that is FAILED_CLOSE.

Anyway, start with the Master log, find the region name in FAILED_CLOSE, then
check the log of the RS that hosts this region.

On Tue, May 23, 2017 at 2:35 PM, James Moore  wrote:

> How many region servers are dead? And were they colocated with DataNodes?
>
> On Tue, May 23, 2017 at 5:20 PM, Vladimir Rodionov  >
> wrote:
>
> > When Master attempt to assign region to RS and assignment fails, there
> > should be something in RS log file (check errors),
> > that explains reason of a failure.
> >
> > How many not-assigned region do you have? You can try to assign them
> > manually in hbase shell
> >
> > On Tue, May 23, 2017 at 1:25 PM, jeff saremi 
> > wrote:
> >
> > > Are dead region servers to blame? Is this possibly stale information in
> > > the ZK?
> > >
> > > 
> > > From: Vladimir Rodionov 
> > > Sent: Tuesday, May 23, 2017 12:20:16 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Regions in Transition: FAILED_CLOSE status
> > >
> > > You should check RS logs to see why regions can not be assigned.
> > > Get RS name from master log and check RS log
> > >
> > > -Vlad
> > >
> > > On Tue, May 23, 2017 at 11:47 AM, jeff saremi 
> > > wrote:
> > >
> > > > Our write code throws exceptions like the following:
> > > >
> > > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > Failed 10331 actions: NotServingRegionException: 10331 times,at
> > > > org.apache.hadoop.hbase.client.AsyncProcess$
> BatchErrors.makeException(
> > > > AsyncProcess.java:258)
> > > >   at org.apache.hadoop.hbase.client.AsyncProcess$
> > > BatchErrors.access$2000(
> > > > AsyncProcess.java:238)
> > > >   at org.apache.hadoop.hbase.client.AsyncProcess.
> > > > waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
> > > >   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> > > > backgroundFlushCommits(BufferedMutatorImpl.java:240)
> > > >   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> > > > mutate(BufferedMutatorImpl.java:146)
> > > >   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
> > > >   at com.microsoft.bing.malta.hbaseClient11$$anon$2.run(
> > > > ImageFeaturesHdfsToHbaseInjector.scala:115)
> > > >   at java.lang.Thread.run(Thread.java:745)
> > > >
> > > >
> > > > 
> > > > From: jeff saremi 
> > > > Sent: Tuesday, May 23, 2017 11:36:11 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Regions in Transition: FAILED_CLOSE status
> > > >
> > > > Why are a few hundred of our regions in this state? and what can we
> do
> > to
> > > > fix this?
> > > > I have been running hbck a few times (is running one time enough?) to
> > no
> > > > avail.
> > > >
> > > > Internet search does not come up with anything useful either.
> > > >
> > > > I have restarted all masters and all region servers with no luck.
> > > >
> > > > Jeff
> > > >
> > >
> >
>


Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread jeff saremi
Are dead region servers to blame? Is this possibly stale information in the ZK?


From: Vladimir Rodionov 
Sent: Tuesday, May 23, 2017 12:20:16 PM
To: user@hbase.apache.org
Subject: Re: Regions in Transition: FAILED_CLOSE status

You should check RS logs to see why regions can not be assigned.
Get RS name from master log and check RS log

-Vlad

On Tue, May 23, 2017 at 11:47 AM, jeff saremi 
wrote:

> Our write code throws exceptions like the following:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 10331 actions: NotServingRegionException: 10331 times,at
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(
> AsyncProcess.java:258)
>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(
> AsyncProcess.java:238)
>   at org.apache.hadoop.hbase.client.AsyncProcess.
> waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
>   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> backgroundFlushCommits(BufferedMutatorImpl.java:240)
>   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.
> mutate(BufferedMutatorImpl.java:146)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
>   at com.microsoft.bing.malta.hbaseClient11$$anon$2.run(
> ImageFeaturesHdfsToHbaseInjector.scala:115)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> 
> From: jeff saremi 
> Sent: Tuesday, May 23, 2017 11:36:11 AM
> To: user@hbase.apache.org
> Subject: Regions in Transition: FAILED_CLOSE status
>
> Why are a few hundred of our regions in this state? and what can we do to
> fix this?
> I have been running hbck a few times (is running one time enough?) to no
> avail.
>
> Internet search does not come up with anything useful either.
>
> I have restarted all masters and all region servers with no luck.
>
> Jeff
>


Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread jeff saremi
Our write code throws exceptions like the following:

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
10331 actions: NotServingRegionException: 10331 times,at 
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:258)
  at 
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(AsyncProcess.java:238)
  at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
  at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
  at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
  at 
com.microsoft.bing.malta.hbaseClient11$$anon$2.run(ImageFeaturesHdfsToHbaseInjector.scala:115)
  at java.lang.Thread.run(Thread.java:745)



From: jeff saremi 
Sent: Tuesday, May 23, 2017 11:36:11 AM
To: user@hbase.apache.org
Subject: Regions in Transition: FAILED_CLOSE status

Why are a few hundred of our regions in this state? and what can we do to fix 
this?
I have been running hbck a few times (is running one time enough?) to no avail.

Internet search does not come up with anything useful either.

I have restarted all masters and all region servers with no luck.

Jeff


Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread jeff saremi
Why are a few hundred of our regions in this state? and what can we do to fix 
this?
I have been running hbck a few times (is running one time enough?) to no avail.

Internet search does not come up with anything useful either.

I have restarted all masters and all region servers with no luck.

Jeff


Re: What is the cause for RegionTooBusyException?

2017-05-22 Thread jeff saremi
While I'm still trying to find anything useful in the logs, my question is why
isn't HBase self-managing this?
In my 2-3 decades of using databases I have never had to stop a write operation
for anything such as compactions or whatever. The only time a write would fail
would be if the DB ran out of physical storage. The client could see slowness
due to many writes, but throwing exceptions was unheard of.


____
From: jeff saremi 
Sent: Friday, May 19, 2017 8:18:59 PM
To: user@hbase.apache.org
Subject: Re: What is the cause for RegionTooBusyException?

Thanks Ted. I will look deeper as you suggested


From: Ted Yu 
Sent: Friday, May 19, 2017 4:18:12 PM
To: user@hbase.apache.org
Subject: Re: What is the cause for RegionTooBusyException?

Have you checked region server log ?
Please take a look at the following method in HRegion:

  private void checkResources() throws RegionTooBusyException {

...

if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {

  blockedRequestsCount.increment();

  requestFlush();

  throw new RegionTooBusyException("Above memstore limit, " +

Which hbase release are you using ?

Cheers

On Fri, May 19, 2017 at 3:59 PM, jeff saremi  wrote:

> We're getting errors like this. Where should we be looking into to solve
> this?
>
>
> Failed 69261 actions: RegionTooBusyException: 12695 times,
> RemoteWithExtrasException: 56566 times
>
> thanks
>
> Jeff
>
>


Re: What is the cause for RegionTooBusyException?

2017-05-19 Thread jeff saremi
Thanks Ted. I will look deeper as you suggested


From: Ted Yu 
Sent: Friday, May 19, 2017 4:18:12 PM
To: user@hbase.apache.org
Subject: Re: What is the cause for RegionTooBusyException?

Have you checked region server log ?
Please take a look at the following method in HRegion:

  private void checkResources() throws RegionTooBusyException {

...

if (this.memstoreDataSize.get() > this.blockingMemStoreSize) {

  blockedRequestsCount.increment();

  requestFlush();

  throw new RegionTooBusyException("Above memstore limit, " +

Which hbase release are you using ?

Cheers

On Fri, May 19, 2017 at 3:59 PM, jeff saremi  wrote:

> We're getting errors like this. Where should we be looking into to solve
> this?
>
>
> Failed 69261 actions: RegionTooBusyException: 12695 times,
> RemoteWithExtrasException: 56566 times
>
> thanks
>
> Jeff
>
>


Re: What is the cause for RegionTooBusyException?

2017-05-19 Thread jeff saremi
thanks James for the hints


From: James Moore 
Sent: Friday, May 19, 2017 7:42:02 PM
To: user@hbase.apache.org
Subject: Re: What is the cause for RegionTooBusyException?

That error appears to be coming from a batch call: 12695 out of 69261
operations failed with a RegionTooBusyException. Some of the causes can be:

1. A full MemStore, e.g. if you write to the MemStore faster than it can
flush, or if it is too small to fit incoming writes
2. Too many storefiles for the region without compaction.  The default
number of store files which will start blocking inbound writes is 15
3. The region is in the process of being closed but has long-running
operations, such as a scanner fetch, active.

You should be able to see some additional info for the exact cause of the
exception in your region servers logs or in the message of a specific
RegionTooBusy exception.
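
If cause 1 or 2 is what the RS logs show, the blocking thresholds can also be
raised per table; a rough sketch (the keys are the standard blocking settings,
but the values and table name here are only illustrative):

import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

// assumes an open Admin handle named admin
HTableDescriptor desc = admin.getTableDescriptor(TableName.valueOf("SomeTable"));
desc.setConfiguration("hbase.hstore.blockingStoreFiles", "30");
desc.setConfiguration("hbase.hregion.memstore.block.multiplier", "8");
admin.modifyTable(TableName.valueOf("SomeTable"), desc);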

cheers,

--James

On Fri, May 19, 2017 at 6:59 PM, jeff saremi  wrote:

> We're getting errors like this. Where should we be looking into to solve
> this?
>
>
> Failed 69261 actions: RegionTooBusyException: 12695 times,
> RemoteWithExtrasException: 56566 times
>
> thanks
>
> Jeff
>
>


What is the cause for RegionTooBusyException?

2017-05-19 Thread jeff saremi
We're getting errors like this. Where should we be looking to solve this?


Failed 69261 actions: RegionTooBusyException: 12695 times, 
RemoteWithExtrasException: 56566 times

thanks

Jeff



Re: Is stop row included in the scan or not?

2017-05-03 Thread jeff saremi
thanks Ted.


From: Ted Yu 
Sent: Wednesday, May 3, 2017 3:32:12 PM
To: user@hbase.apache.org
Subject: Re: Is stop row included in the scan or not?

bq. stopRow - row to stop scanner before (exclusive)

On Wed, May 3, 2017 at 3:08 PM, jeff saremi  wrote:

> by reading the docs for 1.2 (https://hbase.apache.org/1.2/
> apidocs/org/apache/hadoop/hbase/client/Scan.html) i'm not able to tell if
> the stop row is returned in the results from a Scan or not. Could someone
> clear this up please? thanks
>
>
>
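
So the stop row is exclusive. For anyone else hitting this, a quick sketch of
both variants (the keys are illustrative):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// returns row-a up to, but never including, row-z
Scan exclusive = new Scan(Bytes.toBytes("row-a"), Bytes.toBytes("row-z"));

// to effectively include the stop row, append a trailing zero byte so the
// stop key is the smallest key sorting after "row-z"
byte[] stopInclusive = Bytes.add(Bytes.toBytes("row-z"), new byte[] { 0 });
Scan inclusive = new Scan(Bytes.toBytes("row-a"), stopInclusive);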


Is stop row included in the scan or not?

2017-05-03 Thread jeff saremi
By reading the docs for 1.2
(https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/client/Scan.html)
I'm not able to tell whether the stop row is returned in the results from a
Scan or not. Could someone clear this up please? thanks




Re: Baffling situation with tableExists and createTable

2017-04-26 Thread jeff saremi
Sorry about that. I missed this post.

Ted,
This is version 1.2.2 of HBase. zkcli was used to remove a stale node with the
table name. Prior to that I ran hbck, which didn't do anything.
As I mentioned, I opened a new issue in JIRA for this.
thanks

Jeff


From: Ted Yu 
Sent: Tuesday, April 25, 2017 5:42:24 PM
To: user@hbase.apache.org
Subject: Re: Baffling situation with tableExists and createTable

Which hbase release are you using ?

Can you check master log to see if there is some clue w.r.t. LoadTest ?

Using "hbase zkcli", you can inspect the znode status. Below is a sample:

[zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 2] ls
/hbase-unsecure/table
[hbase:meta, hbase:namespace, IntegrationTestBigLinkedList, datatsv,
usertable, hbase:backup, TestTable, t2]
[zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 3] ls
/hbase-unsecure/table/2
Node does not exist: /hbase-unsecure/table/2
[zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 4] ls
/hbase-unsecure/table/t2
[]
[zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 5] get
/hbase-unsecure/table/t2
�master:16000K��W�,�PBUF
cZxid = 0x1000a7f01
ctime = Mon Mar 27 16:50:52 UTC 2017
mZxid = 0x1000a7f17
mtime = Mon Mar 27 16:50:52 UTC 2017
pZxid = 0x1000a7f01
cversion = 0
dataVersion = 2

On Tue, Apr 25, 2017 at 4:09 PM, jeff saremi  wrote:

> BTW on the page
> http://localhost:16010/master-status#userTables
> there is no sign of the supposedly existing table either
>
> ________
> From: jeff saremi 
> Sent: Tuesday, April 25, 2017 4:05:56 PM
> To: user@hbase.apache.org
> Subject: Baffling situation with tableExists and createTable
>
> I have a super simple piece of code which tries to create a test table if
> it does not exist
>
> calling admin.tableExists(TableName.valueOf(table)) returns false causing
> the control to be passed to the line that creates it 
> admin.createTable(tableDescriptor).
> Then i get an exception that the table exists!
>
> Exception in thread "main" org.apache.hadoop.hbase.TableExistsException:
> LoadTest
>
>
> String table = config.tableName;
> ...
> Connection conn = ConnectionFactory.createConnection(hbaseconf);
> Admin admin = conn.getAdmin();
> if(!admin.tableExists(TableName.valueOf(table))) {
> Log.info("table " + table + " does not exist. Creating it...");
> HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.
> valueOf(table));
> tableDescriptor.addFamily(new HColumnDescriptor(config.FAMILY));
> admin.createTable(tableDescriptor);
> }
>
> Jeff
>


Re: Baffling situation with tableExists and createTable

2017-04-26 Thread jeff saremi
I created this:
https://issues.apache.org/jira/browse/HBASE-17966

to track this issue.

Overall, none of this should have been pushed out to the user. HBase should
transparently take care of inconsistencies in ZooKeeper. However, I didn't
make this ticket comprehensive enough to include what I just said. Let's get
tableExists and createTable consistent for now.

____
From: jeff saremi 
Sent: Wednesday, April 26, 2017 8:31:23 AM
To: user@hbase.apache.org
Subject: Re: Baffling situation with tableExists and createTable

Yes, I had to go to ZooKeeper and manually delete a node under Tables.

My question is why there are multiple standards in how tables are handled.

The same logic that is in tableExists() should exist in createTable(), and vice
versa.



From: ashish singhi 
Sent: Wednesday, April 26, 2017 2:29:49 AM
To: user@hbase.apache.org
Subject: RE: Baffling situation with tableExists and createTable

This is already handled through Procedure-V2 code in HBase 1.1+ versions.

Regards,
Ashish

-Original Message-
From: Anoop John [mailto:anoop.hb...@gmail.com]
Sent: 26 April 2017 15:31
To: user@hbase.apache.org
Subject: Re: Baffling situation with tableExists and createTable

Your earlier attempt to create this table would have failed in between, so the
status of the table in ZK and in the master may be different. The tableExists
check might be looking at one, and the next steps of createTable at the other.
Sorry, I forgot that area of code, but I have seen this kind of situation.
Not sure whether these kinds of problems are solved in some later versions or
not.

-Anoop-

On Wed, Apr 26, 2017 at 6:12 AM, Ted Yu  wrote:
> Which hbase release are you using ?
>
> Can you check master log to see if there is some clue w.r.t. LoadTest ?
>
> Using "hbase zkcli", you can inspect the znode status. Below is a sample:
>
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 2]
> ls /hbase-unsecure/table [hbase:meta, hbase:namespace,
> IntegrationTestBigLinkedList, datatsv, usertable, hbase:backup,
> TestTable, t2]
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 3]
> ls
> /hbase-unsecure/table/2
> Node does not exist: /hbase-unsecure/table/2
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 4]
> ls
> /hbase-unsecure/table/t2
> []
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 5]
> get
> /hbase-unsecure/table/t2
>  master:16000K  W , PBUF
> cZxid = 0x1000a7f01
> ctime = Mon Mar 27 16:50:52 UTC 2017
> mZxid = 0x1000a7f17
> mtime = Mon Mar 27 16:50:52 UTC 2017
> pZxid = 0x1000a7f01
> cversion = 0
> dataVersion = 2
>
> On Tue, Apr 25, 2017 at 4:09 PM, jeff saremi  wrote:
>
>> BTW on the page
>> http://localhost:16010/master-status#userTables
>> there is no sign of the supposedly existing table either
>>
>> 
>> From: jeff saremi 
>> Sent: Tuesday, April 25, 2017 4:05:56 PM
>> To: user@hbase.apache.org
>> Subject: Baffling situation with tableExists and createTable
>>
>> I have a super simple piece of code which tries to create a test
>> table if it does not exist
>>
>> calling admin.tableExists(TableName.valueOf(table)) returns false
>> causing the control to be passed to the line that creates it 
>> admin.createTable(tableDescriptor).
>> Then i get an exception that the table exists!
>>
>> Exception in thread "main" org.apache.hadoop.hbase.TableExistsException:
>> LoadTest
>>
>>
>> String table = config.tableName;
>> ...
>> Connection conn = ConnectionFactory.createConnection(hbaseconf);
>> Admin admin = conn.getAdmin();
>> if(!admin.tableExists(TableName.valueOf(table))) {
>> Log.info("table " + table + " does not exist. Creating it...");
>> HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.
>> valueOf(table));
>> tableDescriptor.addFamily(new HColumnDescriptor(config.FAMILY));
>> admin.createTable(tableDescriptor);
>> }
>>
>> Jeff
>>


Re: Baffling situation with tableExists and createTable

2017-04-26 Thread jeff saremi
Yes, I had to go to ZooKeeper and manually delete a node under Tables.

My question is why there are multiple standards in how tables are handled.

The same logic that is in tableExists() should exist in createTable(), and vice
versa.



From: ashish singhi 
Sent: Wednesday, April 26, 2017 2:29:49 AM
To: user@hbase.apache.org
Subject: RE: Baffling situation with tableExists and createTable

This is already handled through Procedure-V2 code in HBase 1.1+ versions.

Regards,
Ashish

-Original Message-
From: Anoop John [mailto:anoop.hb...@gmail.com]
Sent: 26 April 2017 15:31
To: user@hbase.apache.org
Subject: Re: Baffling situation with tableExists and createTable

Your earlier attempt to create this table would have failed in between, so the
status of the table in ZK and in the master may be different. The tableExists
check might be looking at one, and the next steps of createTable at the other.
Sorry, I forgot that area of code, but I have seen this kind of situation.
Not sure whether these kinds of problems are solved in some later versions or
not.

-Anoop-

On Wed, Apr 26, 2017 at 6:12 AM, Ted Yu  wrote:
> Which hbase release are you using ?
>
> Can you check master log to see if there is some clue w.r.t. LoadTest ?
>
> Using "hbase zkcli", you can inspect the znode status. Below is a sample:
>
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 2]
> ls /hbase-unsecure/table [hbase:meta, hbase:namespace,
> IntegrationTestBigLinkedList, datatsv, usertable, hbase:backup,
> TestTable, t2]
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 3]
> ls
> /hbase-unsecure/table/2
> Node does not exist: /hbase-unsecure/table/2
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 4]
> ls
> /hbase-unsecure/table/t2
> []
> [zk: cn011.x.com:2181,cn013.x.com:2181,cn012.x.com:2181(CONNECTED) 5]
> get
> /hbase-unsecure/table/t2
>  master:16000K  W , PBUF
> cZxid = 0x1000a7f01
> ctime = Mon Mar 27 16:50:52 UTC 2017
> mZxid = 0x1000a7f17
> mtime = Mon Mar 27 16:50:52 UTC 2017
> pZxid = 0x1000a7f01
> cversion = 0
> dataVersion = 2
>
> On Tue, Apr 25, 2017 at 4:09 PM, jeff saremi  wrote:
>
>> BTW on the page
>> http://localhost:16010/master-status#userTables
>> there is no sign of the supposedly existing table either
>>
>> 
>> From: jeff saremi 
>> Sent: Tuesday, April 25, 2017 4:05:56 PM
>> To: user@hbase.apache.org
>> Subject: Baffling situation with tableExists and createTable
>>
>> I have a super simple piece of code which tries to create a test
>> table if it does not exist
>>
>> calling admin.tableExists(TableName.valueOf(table)) returns false
>> causing the control to be passed to the line that creates it 
>> admin.createTable(tableDescriptor).
>> Then i get an exception that the table exists!
>>
>> Exception in thread "main" org.apache.hadoop.hbase.TableExistsException:
>> LoadTest
>>
>>
>> String table = config.tableName;
>> ...
>> Connection conn = ConnectionFactory.createConnection(hbaseconf);
>> Admin admin = conn.getAdmin();
>> if(!admin.tableExists(TableName.valueOf(table))) {
>> Log.info("table " + table + " does not exist. Creating it...");
>> HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.
>> valueOf(table));
>> tableDescriptor.addFamily(new HColumnDescriptor(config.FAMILY));
>> admin.createTable(tableDescriptor);
>> }
>>
>> Jeff
>>


Re: Baffling situation with tableExists and createTable

2017-04-25 Thread jeff saremi
BTW on the page
http://localhost:16010/master-status#userTables
there is no sign of the supposedly existing table either


From: jeff saremi 
Sent: Tuesday, April 25, 2017 4:05:56 PM
To: user@hbase.apache.org
Subject: Baffling situation with tableExists and createTable

I have a super simple piece of code which tries to create a test table if it 
does not exist

calling admin.tableExists(TableName.valueOf(table)) returns false, causing
control to pass to the line that creates it,
admin.createTable(tableDescriptor). Then I get an exception that the table
exists!

Exception in thread "main" org.apache.hadoop.hbase.TableExistsException: 
LoadTest


String table = config.tableName;
...
Connection conn = ConnectionFactory.createConnection(hbaseconf);
Admin admin = conn.getAdmin();
if (!admin.tableExists(TableName.valueOf(table))) {
    Log.info("table " + table + " does not exist. Creating it...");
    HTableDescriptor tableDescriptor =
        new HTableDescriptor(TableName.valueOf(table));
    tableDescriptor.addFamily(new HColumnDescriptor(config.FAMILY));
    admin.createTable(tableDescriptor);
}

Jeff


Baffling situation with tableExists and createTable

2017-04-25 Thread jeff saremi
I have a super simple piece of code which tries to create a test table if it 
does not exist

calling admin.tableExists(TableName.valueOf(table)) returns false, causing
control to pass to the line that creates it,
admin.createTable(tableDescriptor). Then I get an exception that the table
exists!

Exception in thread "main" org.apache.hadoop.hbase.TableExistsException: 
LoadTest


String table = config.tableName;
...
Connection conn = ConnectionFactory.createConnection(hbaseconf);
Admin admin = conn.getAdmin();
if (!admin.tableExists(TableName.valueOf(table))) {
    Log.info("table " + table + " does not exist. Creating it...");
    HTableDescriptor tableDescriptor =
        new HTableDescriptor(TableName.valueOf(table));
    tableDescriptor.addFamily(new HColumnDescriptor(config.FAMILY));
    admin.createTable(tableDescriptor);
}

Jeff


RequestsPerSecond on master status page

2017-04-13 Thread jeff saremi
There is a metric that we can't find among the Hadoop metrics2-compatible
ones. It's called "Requests Per Second" and it shows up on the master status
page under Base Stats for region servers.

Is this also found in the metrics2 metrics? Under what name?
thanks
Jeff



Http 500 error when accessing master:16010/master-status - ESAPI

2017-04-13 Thread jeff saremi
I'm running a test instance of hbase 1.2.2 on my local machine (zk, master, 
regionserver,... all on one machine)

I looked at the esapi-2.1.0.1.jar file and it does not have a resource called: 
ESAPI.properties




org.owasp.esapi.errors.ConfigurationException: 
java.lang.reflect.InvocationTargetException SecurityConfiguration class 
(org.owasp.esapi.reference.DefaultSecurityConfiguration) CTOR threw exception.
at org.owasp.esapi.util.ObjFactory.make(ObjFactory.java:129)
at org.owasp.esapi.ESAPI.securityConfiguration(ESAPI.java:184)
at org.owasp.esapi.ESAPI.encoder(ESAPI.java:99)
at 
org.apache.hadoop.hbase.tmpl.common.TaskMonitorTmplImpl.encodeFilter(TaskMonitorTmplImpl.java:29)
at 
org.apache.hadoop.hbase.tmpl.common.TaskMonitorTmplImpl.renderNoFlush(TaskMonitorTmplImpl.java:160)
at 
org.apache.hadoop.hbase.tmpl.common.TaskMonitorTmpl.renderNoFlush(TaskMonitorTmpl.java:180)
at 
org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:350)
at 
org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:387)
...
at 
org.owasp.esapi.reference.DefaultSecurityConfiguration.getInstance(DefaultSecurityConfiguration.java:67)
... 43 more
Caused by: java.lang.IllegalArgumentException: Failed to load ESAPI.properties 
as a classloader resource.
at 
org.owasp.esapi.reference.DefaultSecurityConfiguration.loadConfigurationFromClasspath(DefaultSecurityConfiguration.java:682)
at 
org.owasp.esapi.reference.DefaultSecurityConfiguration.loadConfiguration(DefaultSecurityConfiguration.java:440)
... 46 more



Re: How to healthcheck a regionserver

2017-03-30 Thread jeff saremi
I couldn't find the relationship between Slider and RegionServers. Do you have
a specific link?
What I'm looking for is something like this, using the Java native client:
- get a list of regions

- for each region, get the name of the region server

- somehow find a record belonging to that region (starting hash?)

- do a GET on one record of each region and report back
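
Roughly, I imagine something like this (an untested sketch against the 1.2
client API; the table name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;

public class RegionHealthCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName name = TableName.valueOf("SomeTable");   // placeholder
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name)) {
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        byte[] start = loc.getRegionInfo().getStartKey();
        try {
          // a Get at the region's start key exercises the hosting RS even
          // when no such row exists (an empty Result still means success);
          // the first region has an empty start key, so substitute one byte
          table.get(new Get(start.length == 0 ? new byte[] { 0 } : start));
          System.out.println("OK   " + loc.getServerName());
        } catch (Exception e) {
          System.out.println("FAIL " + loc.getServerName() + " : " + e);
        }
      }
    }
  }
}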


From: Ted Yu 
Sent: Wednesday, March 29, 2017 10:20:17 PM
To: user@hbase.apache.org
Subject: Re: How to healthcheck a regionserver

Have you heard of http://slider.incubator.apache.org/ (since you mentioned
Yarn) ?

Slider provides several methods of monitoring region server health.

FYI

On Wed, Mar 29, 2017 at 9:57 PM, jeff saremi  wrote:

> We have our region servers assigned by Yarn and occasionally we get a
> false list of servers in zookeeper.
>
> I'm writing a monitor program and I'd like to instead of pinging the
> server, perform a simple query to get the health of a specific region
> server. Is this possible? how? thanks
>
> Jeff
>
>


How to healthcheck a regionserver

2017-03-29 Thread jeff saremi
We have our region servers assigned by Yarn, and occasionally we get a false
list of servers in ZooKeeper.

I'm writing a monitor program, and instead of pinging the server I'd like to
perform a simple query to get the health of a specific region server. Is this
possible? How? thanks

Jeff



Re: Need guidance on Custom Compaction Policy

2017-03-22 Thread jeff saremi
@Ted
No, I had not looked at DateTiered. I will. Thanks.

@Vladimir
I will look at the example code you gave me. thanks







From: Ted Yu 
Sent: Wednesday, March 22, 2017 2:12:00 PM
To: user@hbase.apache.org
Subject: Re: Need guidance on Custom Compaction Policy

Have you taken look at http://hbase.apache.org/book.html#ops.date.tiered ?

Cheers

On Wed, Mar 22, 2017 at 12:29 PM, jeff saremi 
wrote:

> I mentioned some of this in another thread. We have a readonly database
> which get bulk loaded using HFiles.
> We want to keep only two versions/generations of data. Since the size of
> data is massive we need to delete the older generation.
>
> Since we write one single HBase for each region for each CF for each
> generation, could we just yank the older files using a separate standalone
> process? (sounds a little scary)
>
> If not, could we write a custom compactor? what's involved (some pointers
> please)? thanks
>
> Jeff
>
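
For reference, pointing an existing table at date-tiered compaction looks
roughly like this (a sketch; it needs a release that ships
DateTieredStoreEngine, and the table name is a placeholder):

import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

// assumes an open Admin handle named admin
HTableDescriptor desc = admin.getTableDescriptor(TableName.valueOf("SomeTable"));
desc.setConfiguration("hbase.hstore.engine.class",
    "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine");
admin.modifyTable(TableName.valueOf("SomeTable"), desc);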


Need guidance on Custom Compaction Policy

2017-03-22 Thread jeff saremi
I mentioned some of this in another thread. We have a readonly database which
gets bulk loaded using HFiles.
We want to keep only two versions/generations of data. Since the size of the
data is massive, we need to delete the older generation.

Since we write one single HFile for each region, for each CF, for each
generation, could we just yank the older files using a separate standalone
process? (Sounds a little scary.)

If not, could we write a custom compactor? What's involved (some pointers
please)? thanks

Jeff


Re: Optimizations for a Read-only database

2017-03-17 Thread jeff saremi
Thanks Anoop. We'll be sure to keep it below 80%


From: Anoop John 
Sent: Friday, March 17, 2017 12:49:30 PM
To: user@hbase.apache.org
Subject: Re: Optimizations for a Read-only database

From the HBase server perspective, we need to restrict memstore size + block
cache size to be at most 80%.  And memstore size alone can go down to 5%,
if I am not wrong.

We need to be careful when using G1 and giving this 80%.  The cache
will be mostly full since, as you said, it will be a read workload, making
your working set large.  The default value for Initial Heap Occupancy
Percentage (IHOP) in G1 is 45%.  You can raise this, but whether having it
above 80% is really advisable, I am not sure.


-Anoop-

On Fri, Mar 17, 2017 at 11:50 PM, jeff saremi  wrote:
> I'll go through these recommendations, Kevin. Thanks a lot
>
> 
> From: Kevin O'Dell 
> Sent: Friday, March 17, 2017 10:55:49 AM
> To: user@hbase.apache.org
> Subject: Re: Optimizations for a Read-only database
>
> Hi Jeff,
>
>   You can definitely lower the memstore, the last time I looked there it
> had to be set to .1 at lowest it could go. I would not recommend disabling
> compactions ever, bad things will occur and it can end up impacting your
> read performance greatly.  I would recommend looking at the Intel G1GC
> <https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase>
> blog series to leverage really large chunks of block cache, and then using
> the remaining memory for off heap caching. You should make sure to turn on
> things like Snappy compression, FAST_DIFF for data block encoding, and with
> all the extra memory you will have available it might be worth using the
> ROW+COL bloom filters, though you should have very few underlying HFiles
> depending on how often you bulk load. I think short-circuit reads are on by
> default these days, but it will greatly speed up read performance if not
> already turned on. From an upfront design make sure you pre-split your
> tables so your first few bulk loads don't cause split and compaction
> pains.  Hope this helps!
>
> On Fri, Mar 17, 2017 at 1:32 PM, jeff saremi  wrote:
>
>> We're creating a readonly database and would like to know the recommended
>> optimizations we could do. We'd be loading data via direct write to HFiles.
>>
>> One thing i could immediately think of is to eliminate the memory for
>> Memstore. What is the minimum that we could get away with?
>>
>> How about disabling some regular operations to save CPU time. I think
>> Compaction is one of those we'd like to stop.
>>
>> thanks
>>
>> Jeff
>>
>
>
>
> --
> Kevin O'Dell
> Field Engineer
> 850-496-1298 | ke...@rocana.com
> @kevinrodell
> <http://www.rocana.com>


Re: Optimizations for a Read-only database

2017-03-17 Thread jeff saremi
I'll go through these recommendations, Kevin. Thanks a lot


From: Kevin O'Dell 
Sent: Friday, March 17, 2017 10:55:49 AM
To: user@hbase.apache.org
Subject: Re: Optimizations for a Read-only database

Hi Jeff,

  You can definitely lower the memstore; the last time I looked, it had to be
set to 0.1 at the lowest. I would not recommend ever disabling
compactions; bad things will occur, and it can end up impacting your
read performance greatly.  I would recommend looking at the Intel G1GC
<https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase>
blog series to leverage really large chunks of block cache, and then using
the remaining memory for off heap caching. You should make sure to turn on
things like Snappy compression, FAST_DIFF for data block encoding, and with
all the extra memory you will have available it might be worth using the
ROW+COL bloom filters, though you should have very few underlying HFiles
depending on how often you bulk load. I think short-circuit reads are on by
default these days, but it will greatly speed up read performance if not
already turned on. From an upfront design make sure you pre-split your
tables so your first few bulk loads don't cause split and compaction
pains.  Hope this helps!
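
Rolled together, creating such a table might look like this (a sketch; the
table name, family, and split points are placeholders):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

class CreateReadOnlyTable {
  static void create(Admin admin) throws Exception {
    HTableDescriptor desc =
        new HTableDescriptor(TableName.valueOf("ReadOnlyTable"));
    HColumnDescriptor cf = new HColumnDescriptor("info");
    cf.setCompressionType(Compression.Algorithm.SNAPPY);
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    cf.setBloomFilterType(BloomType.ROWCOL);
    desc.addFamily(cf);
    // pre-split so the first bulk loads don't trigger splits
    byte[][] splits = new byte[][] {
        Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c") };
    admin.createTable(desc, splits);
  }
}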

On Fri, Mar 17, 2017 at 1:32 PM, jeff saremi  wrote:

> We're creating a readonly database and would like to know the recommended
> optimizations we could do. We'd be loading data via direct write to HFiles.
>
> One thing i could immediately think of is to eliminate the memory for
> Memstore. What is the minimum that we could get away with?
>
> How about disabling some regular operations to save CPU time. I think
> Compaction is one of those we'd like to stop.
>
> thanks
>
> Jeff
>



--
Kevin O'Dell
Field Engineer
850-496-1298 | ke...@rocana.com
@kevinrodell
<http://www.rocana.com>


Optimizations for a Read-only database

2017-03-17 Thread jeff saremi
We're creating a read-only database and would like to know the recommended 
optimizations we could do. We'd be loading data via direct writes to HFiles.

One thing I could immediately think of is to eliminate the memory for the 
Memstore. What is the minimum that we could get away with?

How about disabling some regular operations to save CPU time? I think 
Compaction is one of those we'd like to stop.

thanks

Jeff


Re: Is deploying Region server as a YARN job a customary thing to do?

2017-03-06 Thread jeff saremi
Thanks Ted. We're not using Slider yet, but we can manually submit YARN jobs. I 
guess this is an acceptable alternative to having a static list of 
regionservers around.


From: Ted Yu 
Sent: Monday, March 6, 2017 10:14:47 AM
To: user@hbase.apache.org
Subject: Re: Is deploying Region server as a YARN job a customary thing to do?

Related:
https://slider.incubator.apache.org/

Consider polling Slider mailing list.

FYI

On Mon, Mar 6, 2017 at 10:08 AM, jeff saremi  wrote:

> We have the option of running our region server dynamically as a YARN job.
> I'd like to know if this is what everyone else does? Is this recommended at
> all?
>
> thanks
>
>


Is deploying Region server as a YARN job a customary thing to do?

2017-03-06 Thread jeff saremi
We have the option of running our region server dynamically as a YARN job. I'd 
like to know if this is what everyone else does? Is this recommended at all?

thanks



Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-03 Thread jeff saremi
Yu,
Of the patches attached to HBASE-15160, do I need to apply all of them (v2, v3, ...) or 
just HBASE-15160.patch?
Also, how would I know which version this patch was created against?
thanks
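In case it helps: a quick way to test whether a given patch file applies to your 
checkout (assuming a git clone of the HBase source) is

git apply --check HBASE-15160.patch

and the attachment comments on the JIRA usually say which branch each patch was 
generated against.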





From: jeff saremi 
Sent: Friday, March 3, 2017 10:34:00 AM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Thanks a lot Yu

These are truly the metrics we care about at this point. It is sad to see 
that such important metrics were removed from the code.

I will try to apply your patch on my own to the version of HBase we have. We 
definitely need these.

Other solutions like HTrace are not as urgent as having these few metrics you 
talked about here. So if we can get these merged with the code we should be 
happy.



From: Yu Li 
Sent: Friday, March 3, 2017 9:54:29 AM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Hi Jeff,

If the question is simply monitoring HDFS read/write latencies, please
refer to HBASE-15160 <https://issues.apache.org/jira/browse/HBASE-15160>,
there's a patch but not committed yet, and probably cannot apply cleanly on
current code base, but still some good reference IMHO, so JFYI.

To get an overview of how quickly the system could respond and what might
be the root cause of the spikes, we only need to monitor the
average/p99/p999 latency of below metrics (stages):
1. totalCallTime: time from request arriving at server to sending response
2. processCallTime: time for the server to process the call, regardless of
the time this call being queued
3. queueCallTime: time the call has been queued
4. HDFS read/pread/write time: time of HFile reading/writing, added in
HBASE-15160
5. WAL sync time: time of WAL sync to HDFS, critical path of writing request
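For what it's worth, these stages are exported through the regionserver's JMX
JSON servlet. A rough way to pull them (assuming the default 1.x info port of
16030; exact attribute names and casing vary by version):

curl 'http://<regionserver-host>:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=IPC'

The response should contain attributes along the lines of
totalCallTime_99th_percentile, processCallTime_mean, and
queueCallTime_99th_percentile.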

However, for your original question, that to monitor the whole trace of a
single request, I'm afraid no mature solution for the time being just as
Stack mentioned.

Hope my answer helps (smile).

Best Regards,
Yu

On 4 March 2017 at 00:48, jeff saremi  wrote:

> anything would help. thanks
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Thursday, March 2, 2017 9:53:41 PM
> To: Hbase-User
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> On Thu, Mar 2, 2017 at 10:26 PM, jeff saremi 
> wrote:
>
> > So I'd like to come back to my original question on how to go about
> > separating the latency of HDFS from HBase.
> >
> >
> That is a simple question to which we do not have an answer unfortunately
> (we should). If interested, I could describe how you might do it. I don't
> think it would take much work.
>
> St.Ack
>
>
>
> > Is there a most appropriate log4j TRACE option that could print out this
> > information to the logs?
> > Thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, March 2, 2017 12:45:59 PM
> > To: Hbase-User
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > Thanks so much for the advice! Looking forward to when Tracing gets
> picked
> > up again
> >
> > 
> > From: saint@gmail.com  on behalf of Stack <
> > st...@duboce.net>
> > Sent: Thursday, March 2, 2017 12:17:35 PM
> > To: Hbase-User
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > HBase/HTrace integration once worked but has long since rotted.
> > Refactorings of internals without proper respect for trace connections is
> > the main culprit. Updates in htrace and hdfs that need attention
> > reconnecting spans, etc., is another. On top of this, zipkin project has
> > seen a burst of effort of late that would seem to offer much promise if
> > someone of us spent some time rejiggering how HTrace and Zipkin relate.
> >
> > I would not waste any time on trying to setup HTrace for HBase at least
> > until after HBASE-14451 goes in, an issue that has been put aside with a
> > while now. Sorry if you've burned time on this to date.
> >
> > Yours,
> > St.Ack
> >
> > On Thu, Mar 2, 2017 at 6:28 AM, jeff saremi 
> > wrote:
> >
> > > Where would i seek help for issues revolving around HTrace and zipkin?
> > > Here? Because I have configured everything the way documentation said
> > but i
> > > see nothing in the zipkin server or in the logs. nothing at all
>

Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-03 Thread jeff saremi
Thanks a lot Yu

These are truly the metrics we care about at this point. It is sad to see 
that such important metrics were removed from the code.

I will try to apply your patch on my own to the version of HBase we have. We 
definitely need these.

Other solutions like HTrace are not as urgent as having these few metrics you 
talked about here. So if we can get these merged with the code we should be 
happy.



From: Yu Li 
Sent: Friday, March 3, 2017 9:54:29 AM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Hi Jeff,

If the question is simply monitoring HDFS read/write latencies, please
refer to HBASE-15160 <https://issues.apache.org/jira/browse/HBASE-15160>,
there's a patch but not committed yet, and probably cannot apply cleanly on
current code base, but still some good reference IMHO, so JFYI.

To get an overview of how quickly the system could respond and what might
be the root cause of the spikes, we only need to monitor the
average/p99/p999 latency of below metrics (stages):
1. totalCallTime: time from request arriving at server to sending response
2. processCallTime: time for the server to process the call, regardless of
the time this call being queued
3. queueCallTime: time the call has been queued
4. HDFS read/pread/write time: time of HFile reading/writing, added in
HBASE-15160
5. WAL sync time: time of WAL sync to HDFS, critical path of writing request

However, for your original question, that to monitor the whole trace of a
single request, I'm afraid no mature solution for the time being just as
Stack mentioned.

Hope my answer helps (smile).

Best Regards,
Yu

On 4 March 2017 at 00:48, jeff saremi  wrote:

> anything would help. thanks
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Thursday, March 2, 2017 9:53:41 PM
> To: Hbase-User
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> On Thu, Mar 2, 2017 at 10:26 PM, jeff saremi 
> wrote:
>
> > So I'd like to come back to my original question on how to go about
> > separating the latency of HDFS from HBase.
> >
> >
> That is a simple question to which we do not have an answer unfortunately
> (we should). If interested, I could describe how you might do it. I don't
> think it would take much work.
>
> St.Ack
>
>
>
> > Is there a most appropriate log4j TRACE option that could print out this
> > information to the logs?
> > Thanks
> >
> > 
> > From: jeff saremi 
> > Sent: Thursday, March 2, 2017 12:45:59 PM
> > To: Hbase-User
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > Thanks so much for the advice! Looking forward to when Tracing gets
> picked
> > up again
> >
> > 
> > From: saint@gmail.com  on behalf of Stack <
> > st...@duboce.net>
> > Sent: Thursday, March 2, 2017 12:17:35 PM
> > To: Hbase-User
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > HBase/HTrace integration once worked but has long since rotted.
> > Refactorings of internals without proper respect for trace connections is
> > the main culprit. Updates in htrace and hdfs that need attention
> > reconnecting spans, etc., is another. On top of this, zipkin project has
> > seen a burst of effort of late that would seem to offer much promise if
> > someone of us spent some time rejiggering how HTrace and Zipkin relate.
> >
> > I would not waste any time on trying to setup HTrace for HBase at least
> > until after HBASE-14451 goes in, an issue that has been put aside with a
> > while now. Sorry if you've burned time on this to date.
> >
> > Yours,
> > St.Ack
> >
> > On Thu, Mar 2, 2017 at 6:28 AM, jeff saremi 
> > wrote:
> >
> > > Where would i seek help for issues revolving around HTrace and zipkin?
> > > Here? Because I have configured everything the way documentation said
> > but i
> > > see nothing in the zipkin server or in the logs. nothing at all
> > >
> > > 
> > > From: jeff saremi 
> > > Sent: Tuesday, February 28, 2017 12:52:32 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Need guidance on getting detailed elapsed times in every
> > > stage of processing a request
> > >
> > > No I had not, but it looks like what I needed

Re: The complete list of metrics

2017-03-03 Thread jeff saremi
Ted and Yu

Good suggestions. thank you


From: Yu Li 
Sent: Friday, March 3, 2017 10:02:31 AM
To: Hbase-User
Subject: Re: The complete list of metrics

It seems we're truly missing such docs...

Other than reading the source code, an easier way is to check the
http://<regionserver-host>:<port>/jmx
page of a running regionserver; you can see all metrics there.
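Concretely (host and port are placeholders; the regionserver info port defaults
to 16030 on 1.x releases and 60030 on older ones):

curl http://<regionserver-host>:16030/jmx

Appending a filter such as ?qry=Hadoop:service=HBase,name=RegionServer,sub=Server
narrows the dump to a single metrics group.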

Best Regards,
Yu

On 4 March 2017 at 01:54, jeff saremi  wrote:

> Is there a page listing all metrics in HBase?
>
> I have checked this:
>
> http://hbase.apache.org/book.html#hbase_metrics
>
>
> but it's only the most important ones not the whole thing
>


The complete list of metrics

2017-03-03 Thread jeff saremi
Is there a page listing all metrics in HBase?

I have checked this:

http://hbase.apache.org/book.html#hbase_metrics


but it's only the most important ones not the whole thing


Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-03 Thread jeff saremi
anything would help. thanks


From: saint@gmail.com  on behalf of Stack 

Sent: Thursday, March 2, 2017 9:53:41 PM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

On Thu, Mar 2, 2017 at 10:26 PM, jeff saremi  wrote:

> So I'd like to come back to my original question on how to go about
> separating the latency of HDFS from HBase.
>
>
That is a simple question to which we do not have an answer unfortunately
(we should). If interested, I could describe how you might do it. I don't
think it would take much work.

St.Ack



> Is there a most appropriate log4j TRACE option that could print out this
> information to the logs?
> Thanks
>
> ____
> From: jeff saremi 
> Sent: Thursday, March 2, 2017 12:45:59 PM
> To: Hbase-User
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> Thanks so much for the advice! Looking forward to when Tracing gets picked
> up again
>
> 
> From: saint@gmail.com  on behalf of Stack <
> st...@duboce.net>
> Sent: Thursday, March 2, 2017 12:17:35 PM
> To: Hbase-User
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> HBase/HTrace integration once worked but has long since rotted.
> Refactorings of internals without proper respect for trace connections is
> the main culprit. Updates in htrace and hdfs that need attention
> reconnecting spans, etc., is another. On top of this, zipkin project has
> seen a burst of effort of late that would seem to offer much promise if
> someone of us spent some time rejiggering how HTrace and Zipkin relate.
>
> I would not waste any time on trying to setup HTrace for HBase at least
> until after HBASE-14451 goes in, an issue that has been put aside with a
> while now. Sorry if you've burned time on this to date.
>
> Yours,
> St.Ack
>
> On Thu, Mar 2, 2017 at 6:28 AM, jeff saremi 
> wrote:
>
> > Where would i seek help for issues revolving around HTrace and zipkin?
> > Here? Because I have configured everything the way documentation said
> but i
> > see nothing in the zipkin server or in the logs. nothing at all
> >
> > 
> > From: jeff saremi 
> > Sent: Tuesday, February 28, 2017 12:52:32 PM
> > To: user@hbase.apache.org
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > No I had not, but it looks like what I needed. Thanks Ted.
> >
> > I'll see if I have any more questions after reading this.
> >
> > 
> > From: Ted Yu 
> > Sent: Tuesday, February 28, 2017 12:47:08 PM
> > To: user@hbase.apache.org
> > Subject: Re: Need guidance on getting detailed elapsed times in every
> > stage of processing a request
> >
> > Have you looked at:
> > http://hbase.apache.org/book.html#tracing
> >
> > On Tue, Feb 28, 2017 at 12:37 PM, jeff saremi 
> > wrote:
> >
> > > I think we need to get detailed information from HBase RegionServer
> logs
> > > on how a request (read or write) is processed. Specifically speaking, i
> > > need to know of say 100 ms time spent in processing a write, how much
> of
> > it
> > > was spent waiting for the HDFS?
> > > What is the most efficient way of enabling this in log4j properties?
> Are
> > > there better mechanisms to get this information?
> > >
> > > If I can get this in the log, then I can process the logs offline or in
> > > neartime and mount some dashboards on the top.
> > >
> > > thanks
> > >
> > >
> > > Jeff
> > >
> >
>


Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-02 Thread jeff saremi
So I'd like to come back to my original question on how to go about separating 
the latency of HDFS from HBase.

Is there a most appropriate log4j TRACE option that could print out this 
information to the logs?
Thanks


From: jeff saremi 
Sent: Thursday, March 2, 2017 12:45:59 PM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Thanks so much for the advice! Looking forward to when Tracing gets picked up 
again


From: saint@gmail.com  on behalf of Stack 

Sent: Thursday, March 2, 2017 12:17:35 PM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

HBase/HTrace integration once worked but has long since rotted.
Refactorings of internals without proper respect for trace connections is
the main culprit. Updates in htrace and hdfs that need attention
reconnecting spans, etc., is another. On top of this, zipkin project has
seen a burst of effort of late that would seem to offer much promise if
someone of us spent some time rejiggering how HTrace and Zipkin relate.

I would not waste any time on trying to setup HTrace for HBase at least
until after HBASE-14451 goes in, an issue that has been put aside with a
while now. Sorry if you've burned time on this to date.

Yours,
St.Ack

On Thu, Mar 2, 2017 at 6:28 AM, jeff saremi  wrote:

> Where would i seek help for issues revolving around HTrace and zipkin?
> Here? Because I have configured everything the way documentation said but i
> see nothing in the zipkin server or in the logs. nothing at all
>
> ____
> From: jeff saremi 
> Sent: Tuesday, February 28, 2017 12:52:32 PM
> To: user@hbase.apache.org
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> No I had not, but it looks like what I needed. Thanks Ted.
>
> I'll see if I have any more questions after reading this.
>
> 
> From: Ted Yu 
> Sent: Tuesday, February 28, 2017 12:47:08 PM
> To: user@hbase.apache.org
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> Have you looked at:
> http://hbase.apache.org/book.html#tracing
>
> On Tue, Feb 28, 2017 at 12:37 PM, jeff saremi 
> wrote:
>
> > I think we need to get detailed information from HBase RegionServer logs
> > on how a request (read or write) is processed. Specifically speaking, i
> > need to know of say 100 ms time spent in processing a write, how much of
> it
> > was spent waiting for the HDFS?
> > What is the most efficient way of enabling this in log4j properties? Are
> > there better mechanisms to get this information?
> >
> > If I can get this in the log, then I can process the logs offline or in
> > neartime and mount some dashboards on the top.
> >
> > thanks
> >
> >
> > Jeff
> >
>


Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-02 Thread jeff saremi
Thanks so much for the advice! Looking forward to when Tracing gets picked up 
again


From: saint@gmail.com  on behalf of Stack 

Sent: Thursday, March 2, 2017 12:17:35 PM
To: Hbase-User
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

HBase/HTrace integration once worked but has long since rotted.
Refactorings of internals without proper respect for trace connections is
the main culprit. Updates in htrace and hdfs that need attention
reconnecting spans, etc., is another. On top of this, zipkin project has
seen a burst of effort of late that would seem to offer much promise if
someone of us spent some time rejiggering how HTrace and Zipkin relate.

I would not waste any time on trying to setup HTrace for HBase at least
until after HBASE-14451 goes in, an issue that has been put aside with a
while now. Sorry if you've burned time on this to date.

Yours,
St.Ack

On Thu, Mar 2, 2017 at 6:28 AM, jeff saremi  wrote:

> Where would i seek help for issues revolving around HTrace and zipkin?
> Here? Because I have configured everything the way documentation said but i
> see nothing in the zipkin server or in the logs. nothing at all
>
> ____
> From: jeff saremi 
> Sent: Tuesday, February 28, 2017 12:52:32 PM
> To: user@hbase.apache.org
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> No I had not, but it looks like what I needed. Thanks Ted.
>
> I'll see if I have any more questions after reading this.
>
> 
> From: Ted Yu 
> Sent: Tuesday, February 28, 2017 12:47:08 PM
> To: user@hbase.apache.org
> Subject: Re: Need guidance on getting detailed elapsed times in every
> stage of processing a request
>
> Have you looked at:
> http://hbase.apache.org/book.html#tracing
>
> On Tue, Feb 28, 2017 at 12:37 PM, jeff saremi 
> wrote:
>
> > I think we need to get detailed information from HBase RegionServer logs
> > on how a request (read or write) is processed. Specifically speaking, i
> > need to know of say 100 ms time spent in processing a write, how much of
> it
> > was spent waiting for the HDFS?
> > What is the most efficient way of enabling this in log4j properties? Are
> > there better mechanisms to get this information?
> >
> > If I can get this in the log, then I can process the logs offline or in
> > neartime and mount some dashboards on the top.
> >
> > thanks
> >
> >
> > Jeff
> >
>


Re: On HBase Read Replicas

2017-03-02 Thread jeff saremi
Thank you Biju


From: Biju N 
Sent: Wednesday, March 1, 2017 2:11:15 PM
To: user@hbase.apache.org
Subject: Re: On HBase Read Replicas

>From the table definition. For e.g.
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html#getRegionReplication--
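For illustration, a minimal sketch of that lookup with the 1.x Java client (the
table name is a placeholder; imports match the client snippets elsewhere in the
thread):

try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
     Admin admin = conn.getAdmin()) {
  HTableDescriptor desc = admin.getTableDescriptor(TableName.valueOf("MyTable"));
  // replica count configured for the table; 1 means no secondaries
  System.out.println("region replication: " + desc.getRegionReplication());
}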

On Tue, Feb 28, 2017 at 3:30 PM, jeff saremi  wrote:

> Enis
>
> just one more question. How would i go about getting the count of the
> replica's for a table or columngroup? thanks
>
> 
> From: Enis Söztutar 
> Sent: Wednesday, February 22, 2017 1:38:41 PM
> To: hbase-user
> Subject: Re: On HBase Read Replicas
>
> If you are doing a get to a specific replica, it will execute as a read
> with retries to a single "copy". There will not be any backup / fallback
> RPCs to any other replica.
>
> Only in timeline consistency mode there will be fallback RPCs.
>
> Enis
>
> On Sun, Feb 19, 2017 at 9:43 PM, Anoop John  wrote:
>
> > Thanks Enis.. I didn't know about the way of setting the replica id
> > specifically. So what will happen if that replica is down at
> > read time? Will the read go to another replica?
> >
> > -Anoop-
> >
> > On Sat, Feb 18, 2017 at 3:34 AM, Enis Söztutar 
> wrote:
> > > You can do gets using two different "modes":
> > >  - Do a read with backup RPCs. In case, the algorithm that I have above
> > > will be used. 1 RPC to primary, and 2 more RPCs after primary timeouts.
> > >  - Do a read to a single replica. In this case, there is only 1 RPC
> that
> > > will happen to that given replica.
> > >
> > > Enis
> > >
> > > On Fri, Feb 17, 2017 at 12:03 PM, jeff saremi 
> > > wrote:
> > >
> > >> Enis
> > >>
> > >> Thanks for taking the time to reply
> > >>
> > >> So i thought that a read request is sent to all Replicas regardless.
> If
> > we
> > >> have the option of Sending to one, analyzing response, and then
> sending
> > to
> > >> another, this bodes well with our scenarios.
> > >>
> > >> Please confirm
> > >>
> > >> thanks
> > >>
> > >> 
> > >> From: Enis Söztutar 
> > >> Sent: Friday, February 17, 2017 11:38:42 AM
> > >> To: hbase-user
> > >> Subject: Re: On HBase Read Replicas
> > >>
> > >> You can use read-replicas to distribute the read-load if you are fine
> > with
> > >> stale reads. The read replicas normally have a "backup rpc" path,
> which
> > >> implements a logic like this:
> > >>  - Send the RPC to the primary replica
> > >>  - if no response for 100ms (or configured timeout), send RPCs to the
> > other
> > >> replicas
> > >>  - return the first non-exception response.
> > >>
> > >> However, there is also another feature for read replicas, where you
> can
> > >> indicate which exact replica_id you want to read from when you are
> > doing a
> > >> get. If you do this:
> > >> Get get = new Get(row);
> > >> get.setReplicaId(2);
> > >>
> > >> the Get RPC will only go to the replica_id=2. Note that if you have
> > region
> > >> replication = 3, then you will have regions with replica ids: {0, 1,
> 2}
> > >> where replica_id=0 is the primary.
> > >>
> > >> So you can do load-balancing with a get.setReplicaId(random() %
> > >> num_replicas) kind of pattern.
> > >>
> > >> Enis
> > >>
> > >>
> > >>
> > >> On Thu, Feb 16, 2017 at 9:41 AM, Anoop John 
> > wrote:
> > >>
> > >> > Never saw this kind of discussion.
> > >> >
> > >> > -Anoop-
> > >> >
> > >> > On Thu, Feb 16, 2017 at 10:13 PM, jeff saremi <
> jeffsar...@hotmail.com
> > >
> > >> > wrote:
> > >> > > Thanks Anoop.
> > >> > >
> > >> > > Understood.
> > >> > >
> > >> > > Have there been enhancement requests or discussions on load
> > balancing
> > >> by
> > >> > providing additional replicas in the past? Has anyone else come up
> > with
> > >> > anything on this?
> > >> > > thanks
> > >> > >
> > >> > > 

Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-01 Thread jeff saremi
Where would I seek help for issues revolving around HTrace and Zipkin? Here? 
Because I have configured everything the way the documentation said, but I see 
nothing in the Zipkin server or in the logs. Nothing at all.


From: jeff saremi 
Sent: Tuesday, February 28, 2017 12:52:32 PM
To: user@hbase.apache.org
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

No I had not, but it looks like what I needed. Thanks Ted.

I'll see if I have any more questions after reading this.


From: Ted Yu 
Sent: Tuesday, February 28, 2017 12:47:08 PM
To: user@hbase.apache.org
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Have you looked at:
http://hbase.apache.org/book.html#tracing

On Tue, Feb 28, 2017 at 12:37 PM, jeff saremi 
wrote:

> I think we need to get detailed information from HBase RegionServer logs
> on how a request (read or write) is processed. Specifically speaking, i
> need to know of say 100 ms time spent in processing a write, how much of it
> was spent waiting for the HDFS?
> What is the most efficient way of enabling this in log4j properties? Are
> there better mechanisms to get this information?
>
> If I can get this in the log, then I can process the logs offline or in
> neartime and mount some dashboards on the top.
>
> thanks
>
>
> Jeff
>


Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-02-28 Thread jeff saremi
No I had not, but it looks like what I needed. Thanks Ted.

I'll see if I have any more questions after reading this.


From: Ted Yu 
Sent: Tuesday, February 28, 2017 12:47:08 PM
To: user@hbase.apache.org
Subject: Re: Need guidance on getting detailed elapsed times in every stage of 
processing a request

Have you looked at:
http://hbase.apache.org/book.html#tracing

On Tue, Feb 28, 2017 at 12:37 PM, jeff saremi 
wrote:

> I think we need to get detailed information from HBase RegionServer logs
> on how a request (read or write) is processed. Specifically speaking, i
> need to know of say 100 ms time spent in processing a write, how much of it
> was spent waiting for the HDFS?
> What is the most efficient way of enabling this in log4j properties? Are
> there better mechanisms to get this information?
>
> If I can get this in the log, then I can process the logs offline or in
> neartime and mount some dashboards on the top.
>
> thanks
>
>
> Jeff
>


Need guidance on getting detailed elapsed times in every stage of processing a request

2017-02-28 Thread jeff saremi
I think we need to get detailed information from HBase RegionServer logs on how 
a request (read or write) is processed. Specifically speaking, i need to know 
of say 100 ms time spent in processing a write, how much of it was spent 
waiting for the HDFS?
What is the most efficient way of enabling this in log4j properties? Are there 
better mechanisms to get this information?

If I can get this in the log, then I can process the logs offline or in 
neartime and mount some dashboards on the top.

thanks


Jeff


Re: On HBase Read Replicas

2017-02-28 Thread jeff saremi
Enis

just one more question. How would i go about getting the count of the replica's 
for a table or columngroup? thanks


From: Enis Söztutar 
Sent: Wednesday, February 22, 2017 1:38:41 PM
To: hbase-user
Subject: Re: On HBase Read Replicas

If you are doing a get to a specific replica, it will execute as a read
with retries to a single "copy". There will not be any backup / fallback
RPCs to any other replica.

Only in timeline consistency mode there will be fallback RPCs.

Enis

On Sun, Feb 19, 2017 at 9:43 PM, Anoop John  wrote:

> Thanks Enis.. I didn't know about the way of setting the replica id
> specifically. So what will happen if that replica is down at
> read time? Will the read go to another replica?
>
> -Anoop-
>
> On Sat, Feb 18, 2017 at 3:34 AM, Enis Söztutar  wrote:
> > You can do gets using two different "modes":
> >  - Do a read with backup RPCs. In case, the algorithm that I have above
> > will be used. 1 RPC to primary, and 2 more RPCs after primary timeouts.
> >  - Do a read to a single replica. In this case, there is only 1 RPC that
> > will happen to that given replica.
> >
> > Enis
> >
> > On Fri, Feb 17, 2017 at 12:03 PM, jeff saremi 
> > wrote:
> >
> >> Enis
> >>
> >> Thanks for taking the time to reply
> >>
> >> So i thought that a read request is sent to all Replicas regardless. If
> we
> >> have the option of Sending to one, analyzing response, and then sending
> to
> >> another, this bodes well with our scenarios.
> >>
> >> Please confirm
> >>
> >> thanks
> >>
> >> 
> >> From: Enis Söztutar 
> >> Sent: Friday, February 17, 2017 11:38:42 AM
> >> To: hbase-user
> >> Subject: Re: On HBase Read Replicas
> >>
> >> You can use read-replicas to distribute the read-load if you are fine
> with
> >> stale reads. The read replicas normally have a "backup rpc" path, which
> >> implements a logic like this:
> >>  - Send the RPC to the primary replica
> >>  - if no response for 100ms (or configured timeout), send RPCs to the
> other
> >> replicas
> >>  - return the first non-exception response.
> >>
> >> However, there is also another feature for read replicas, where you can
> >> indicate which exact replica_id you want to read from when you are
> doing a
> >> get. If you do this:
> >> Get get = new Get(row);
> >> get.setReplicaId(2);
> >>
> >> the Get RPC will only go to the replica_id=2. Note that if you have
> region
> >> replication = 3, then you will have regions with replica ids: {0, 1, 2}
> >> where replica_id=0 is the primary.
> >>
> >> So you can do load-balancing with a get.setReplicaId(random() %
> >> num_replicas) kind of pattern.
> >>
> >> Enis
> >>
> >>
> >>
> >> On Thu, Feb 16, 2017 at 9:41 AM, Anoop John 
> wrote:
> >>
> >> > Never saw this kind of discussion.
> >> >
> >> > -Anoop-
> >> >
> >> > On Thu, Feb 16, 2017 at 10:13 PM, jeff saremi  >
> >> > wrote:
> >> > > Thanks Anoop.
> >> > >
> >> > > Understood.
> >> > >
> >> > > Have there been enhancement requests or discussions on load
> balancing
> >> by
> >> > providing additional replicas in the past? Has anyone else come up
> with
> >> > anything on this?
> >> > > thanks
> >> > >
> >> > > 
> >> > > From: Anoop John 
> >> > > Sent: Thursday, February 16, 2017 2:35:48 AM
> >> > > To: user@hbase.apache.org
> >> > > Subject: Re: On HBase Read Replicas
> >> > >
> >> > > The region replica feature came in to reduce the MTTR and so
> >> > > increase data availability.  When the RS containing the master region
> >> > > dies, the clients can read from the secondary regions.  But keep one
> >> > > thing in mind: the data from secondary regions will be a bit out of
> >> > > sync, as the replica is eventually consistent.  For this reason,
> >> > > changing the client to share the load across different RSs might be tough.
> >> > >
> >> > > -Anoop-
> >> > >
> >> > > On Sun, Feb 1

Re: On HBase Read Replicas

2017-02-17 Thread jeff saremi
Enis

Thanks for taking the time to reply

So i thought that a read request is sent to all Replicas regardless. If we have 
the option of Sending to one, analyzing response, and then sending to another, 
this bodes well with our scenarios.

Please confirm

thanks


From: Enis Söztutar 
Sent: Friday, February 17, 2017 11:38:42 AM
To: hbase-user
Subject: Re: On HBase Read Replicas

You can use read-replicas to distribute the read-load if you are fine with
stale reads. The read replicas normally have a "backup rpc" path, which
implements a logic like this:
 - Send the RPC to the primary replica
 - if no response for 100ms (or configured timeout), send RPCs to the other
replicas
 - return the first non-exception response.

However, there is also another feature for read replicas, where you can
indicate which exact replica_id you want to read from when you are doing a
get. If you do this:
Get get = new Get(row);
get.setReplicaId(2);

the Get RPC will only go to the replica_id=2. Note that if you have region
replication = 3, then you will have regions with replica ids: {0, 1, 2}
where replica_id=0 is the primary.

So you can do load-balancing with a get.setReplicaId(random() %
num_replicas) kind of pattern.
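Fleshing that pattern out, a minimal sketch (the table handle, row key, and a
replication count of 3 are assumptions; depending on the client version the read
may also need TIMELINE consistency to be served by a secondary):

int numReplicas = 3;  // region replication configured on the table
Get get = new Get(Bytes.toBytes("row1"));
get.setConsistency(Consistency.TIMELINE);  // allow a possibly-stale secondary read
get.setReplicaId(ThreadLocalRandom.current().nextInt(numReplicas));
Result result = table.get(get);  // 'table' is a Table obtained from the Connection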

Enis



On Thu, Feb 16, 2017 at 9:41 AM, Anoop John  wrote:

> Never saw this kind of discussion.
>
> -Anoop-
>
> On Thu, Feb 16, 2017 at 10:13 PM, jeff saremi 
> wrote:
> > Thanks Anoop.
> >
> > Understood.
> >
> > Have there been enhancement requests or discussions on load balancing by
> providing additional replicas in the past? Has anyone else come up with
> anything on this?
> > thanks
> >
> > 
> > From: Anoop John 
> > Sent: Thursday, February 16, 2017 2:35:48 AM
> > To: user@hbase.apache.org
> > Subject: Re: On HBase Read Replicas
> >
> > The region replica feature came in to reduce the MTTR and so
> > increase data availability.  When the RS containing the master region
> > dies, the clients can read from the secondary regions.  But keep one
> > thing in mind: the data from secondary regions will be a bit out of
> > sync, as the replica is eventually consistent.  For this reason,
> > changing the client to share the load across different RSs might be tough.
> >
> > -Anoop-
> >
> > On Sun, Feb 12, 2017 at 8:13 AM, jeff saremi 
> wrote:
> >> Yes indeed. thank you very much Ted
> >>
> >> 
> >> From: Ted Yu 
> >> Sent: Saturday, February 11, 2017 3:40:50 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: On HBase Read Replicas
> >>
> >> Please take a look at the design doc attached to
> >> https://issues.apache.org/jira/browse/HBASE-10070.
> >>
> >> Your first question would be answered by that document.
> >>
> >> Cheers
> >>
> >> On Sat, Feb 11, 2017 at 2:06 PM, jeff saremi 
> wrote:
> >>
> >>> The first time I heard replicas in HBase the following thought
> immediately
> >>> came to my mind:
> >>> To alleviate the load in read-heavy clusters, one could assign Region
> >>> servers to be replicas of others so that the load is distributed and
> there
> >>> is less pressure on the main RS.
> >>>
> >>> Just 2 days ago a colleague quoted a paragraph from HBase manual that
> >>> contradicted this completely. Apparently, the replicas do not help
> with the
> >>> load but they actually contribute to more traffic on the network and
> on the
> >>> underlying file system
> >>>
> >>> Would someone be able to give us some insight on why anyone would want
> >>> replicas?
> >>>
> >>> And also could one easily change this behavior in the HBase native Java
> >>> client to support what I had been imagining as the concept for
> replicas?
> >>>
> >>>
> >>> thanks
> >>>
>


HBase Metrics are unnecessarily flattened

2017-02-16 Thread jeff saremi
This is specifically true for Regions metrics such as
Namespace_hbase_table_meta_region_1657623790_metric_storeCount=1,
Even though the Hadoop metrics2 framework allows for "tags" (which are called 
"dimensions" everywhere else), HBase does not take full advantage of that and 
instead spews metric names with dimension values embedded in them.
If I'm reading the above metric name correctly, it should have been like the 
following:

metric name: storeCount

metric value: 1

dimension (name=value): namespace=hbase

dimension (name=value): table=meta

dimension (name=value): region=1657623790

This way I could slice and dice these records however I wanted, whereas now I 
have to either live with this limitation or write my own code to break these 
metrics up into a proper representation.
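As a sketch of that break-up code (assuming the
Namespace_<ns>_table_<table>_region_<region>_metric_<name> layout above; table
names containing underscores would need a smarter split, so this is only
illustrative):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlattenedMetricParser {
  private static final Pattern REGION_METRIC =
      Pattern.compile("Namespace_(.+)_table_(.+)_region_(.+)_metric_(.+)");

  public static void main(String[] args) {
    Matcher m = REGION_METRIC.matcher(
        "Namespace_hbase_table_meta_region_1657623790_metric_storeCount");
    if (m.matches()) {
      System.out.println("namespace=" + m.group(1));  // hbase
      System.out.println("table=" + m.group(2));      // meta
      System.out.println("region=" + m.group(3));     // 1657623790
      System.out.println("metric=" + m.group(4));     // storeCount
    }
  }
}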




Re: On HBase Read Replicas

2017-02-16 Thread jeff saremi
Thanks Anoop.

Understood.

Have there been enhancement requests or discussions on load balancing by 
providing additional replicas in the past? Has anyone else come up with 
anything on this?
thanks


From: Anoop John 
Sent: Thursday, February 16, 2017 2:35:48 AM
To: user@hbase.apache.org
Subject: Re: On HBase Read Replicas

The region replica feature came in to reduce the MTTR and so
increase data availability.  When the RS containing the master region
dies, the clients can read from the secondary regions.  But keep one
thing in mind: the data from secondary regions will be a bit out of
sync, as the replica is eventually consistent.  For this reason,
changing the client to share the load across different RSs might be tough.

-Anoop-

On Sun, Feb 12, 2017 at 8:13 AM, jeff saremi  wrote:
> Yes indeed. thank you very much Ted
>
> 
> From: Ted Yu 
> Sent: Saturday, February 11, 2017 3:40:50 PM
> To: user@hbase.apache.org
> Subject: Re: On HBase Read Replicas
>
> Please take a look at the design doc attached to
> https://issues.apache.org/jira/browse/HBASE-10070.
>
> Your first question would be answered by that document.
>
> Cheers
>
> On Sat, Feb 11, 2017 at 2:06 PM, jeff saremi  wrote:
>
>> The first time I heard replicas in HBase the following thought immediately
>> came to my mind:
>> To alleviate the load in read-heavy clusters, one could assign Region
>> servers to be replicas of others so that the load is distributed and there
>> is less pressure on the main RS.
>>
>> Just 2 days ago a colleague quoted a paragraph from HBase manual that
>> contradicted this completely. Apparently, the replicas do not help with the
>> load but they actually contribute to more traffic on the network and on the
>> underlying file system
>>
>> Would someone be able to give us some insight on why anyone would want
>> replicas?
>>
>> And also could one easily change this behavior in the HBase native Java
>> client to support what I had been imagining as the concept for replicas?
>>
>>
>> thanks
>>


Certain metric groups are missing when using a custom sink in place of FileSink

2017-02-14 Thread jeff saremi
When we use a FileSink to log metrics from HBase we can see more names than 
when we use a custom metric sink. Is there something undocumented that we're 
missing?
For instance when using FileSink, we can see WAL, RegionServer, 
Replication,Server , and Regions in the log.
However if we use a custom sink (which happens to be writing whatever it gets 
to the log as well) we can only see WAL and RegionServer.

These are the config lines we're using:

hbase.sink.mdmsink.class=mycomp.metrics.MdmSink
hbase.sink.mdmsink.server=myserver:8880
*.period=10



Re: Metric sink client keeps crashing the components with SSL-related exception

2017-02-14 Thread jeff saremi
This turned out to be a result of multiple versions of Apache HttpComponents on 
the classpath. I made sure my code used the same versions of the dependencies as 
the HBase instance we had.
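For anyone hitting the same NoSuchFieldError: a generic way to spot such a
version clash in a Maven build (not specific to this sink) is

mvn dependency:tree -Dincludes=org.apache.httpcomponents

and then pinning httpclient/httpcore to the versions HBase ships with.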


From: jeff saremi 
Sent: Tuesday, February 14, 2017 12:10:58 PM
To: user@hbase.apache.org
Subject: Metric sink client keeps crashing the components with SSL-related 
exception

This is really not an HBase issue but rather hadoop metrics issue (and it may 
not be that either) but I'll just post here to see if someone knows why this is 
happening.
I've created  a new Metrics2 Sink which will send the metrics over http to some 
server.

I can test this locally with no issues. However as soon as we place this in 
HBase, we get the following error:


2017-02-14 12:03:01,407 ERROR [main] master.HMasterCommandLine: Master exiting

java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMaster.

at 
org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2422)
...

Caused by: java.lang.NoSuchFieldError: INSTANCE

at 
org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)

at 
org.apache.http.impl.conn.BasicHttpClientConnectionManager.getDefaultRegistry(BasicHttpClientConnectionManager.java:117)

at 
org.apache.http.impl.conn.BasicHttpClientConnectionManager.<init>(BasicHttpClientConnectionManager.java:161)

at 
com.bing.hadoop.metrics.MdmServiceClient.<init>(MdmServiceClient.java:30)

at com.bing.hadoop.metrics.MdmSink.init(MdmSink.java:37)

I have not even set the URL yet and my url is plain http and not https. I don't 
want SSL at all in this case.
Here's my construction code:

public MdmServiceClient(String server, int port) throws Exception {
    // A basic (non-pooling) connection manager; this 'new' call is line 30 in the source.
    HttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager();
    _httpClient = HttpClientBuilder.create().setConnectionManager(connectionManager).build();
    // Plain-HTTP endpoint; nothing here asks for SSL.
    _uri = new URIBuilder().setScheme("http").setHost(server).setPort(port)
        .setPath("/metrics").build();
}

Line 30 refers to new BasicHttp...

This is line 37 in MdmSink:

_mdmClient = new MdmServiceClient(server, port);

thanks


Metric sink client keeps crashing the components with SSL-related exception

2017-02-14 Thread jeff saremi
This is really not an HBase issue but rather hadoop metrics issue (and it may 
not be that either) but I'll just post here to see if someone knows why this is 
happening.
I've created  a new Metrics2 Sink which will send the metrics over http to some 
server.

I can test this locally with no issues. However as soon as we place this in 
HBase, we get the following error:


2017-02-14 12:03:01,407 ERROR [main] master.HMasterCommandLine: Master exiting

java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMaster.

at 
org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2422)
...

Caused by: java.lang.NoSuchFieldError: INSTANCE

at 
org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)

at 
org.apache.http.impl.conn.BasicHttpClientConnectionManager.getDefaultRegistry(BasicHttpClientConnectionManager.java:117)

at 
org.apache.http.impl.conn.BasicHttpClientConnectionManager.<init>(BasicHttpClientConnectionManager.java:161)

at 
com.bing.hadoop.metrics.MdmServiceClient.<init>(MdmServiceClient.java:30)

at com.bing.hadoop.metrics.MdmSink.init(MdmSink.java:37)

I have not even set the URL yet and my url is plain http and not https. I don't 
want SSL at all in this case.
Here's my construction code:

public MdmServiceClient(String server, int port) throws Exception {
    // A basic (non-pooling) connection manager; this 'new' call is line 30 in the source.
    HttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager();
    _httpClient = HttpClientBuilder.create().setConnectionManager(connectionManager).build();
    // Plain-HTTP endpoint; nothing here asks for SSL.
    _uri = new URIBuilder().setScheme("http").setHost(server).setPort(port)
        .setPath("/metrics").build();
}

Line 30 refers to new BasicHttp...

This is line 37 in MdmSink:

_mdmClient = new MdmServiceClient(server, port);

thanks


Re: On HBase Read Replicas

2017-02-11 Thread jeff saremi
Yes indeed. thank you very much Ted


From: Ted Yu 
Sent: Saturday, February 11, 2017 3:40:50 PM
To: user@hbase.apache.org
Subject: Re: On HBase Read Replicas

Please take a look at the design doc attached to
https://issues.apache.org/jira/browse/HBASE-10070.

Your first question would be answered by that document.

Cheers

On Sat, Feb 11, 2017 at 2:06 PM, jeff saremi  wrote:

> The first time I heard replicas in HBase the following thought immediately
> came to my mind:
> To alleviate the load in read-heavy clusters, one could assign Region
> servers to be replicas of others so that the load is distributed and there
> is less pressure on the main RS.
>
> Just 2 days ago a colleague quoted a paragraph from HBase manual that
> contradicted this completely. Apparently, the replicas do not help with the
> load but they actually contribute to more traffic on the network and on the
> underlying file system
>
> Would someone be able to give us some insight on why anyone would want
> replicas?
>
> And also could one easily change this behavior in the HBase native Java
> client to support what I had been imagining as the concept for replicas?
>
>
> thanks
>


On HBase Read Replicas

2017-02-11 Thread jeff saremi
The first time I heard of replicas in HBase, the following thought immediately 
came to my mind:
To alleviate the load in read-heavy clusters, one could assign Region servers 
to be replicas of others so that the load is distributed and there is less 
pressure on the main RS.

Just 2 days ago a colleague quoted a paragraph from HBase manual that 
contradicted this completely. Apparently, the replicas do not help with the 
load but they actually contribute to more traffic on the network and on the 
underlying file system

Would someone be able to give us some insight on why anyone would want replicas?

And also could one easily change this behavior in the HBase native Java client 
to support what I had been imagining as the concept for replicas?


thanks


Re: Performance: HBase Native Java Client versus Thrift Java client

2017-01-30 Thread jeff saremi
Got it.

It looks like there's some benefit in having a Thrift server to serve one-off 
requests, for which the creation of a single HBase client may be a little too 
expensive.



From: Josh Elser 
Sent: Monday, January 30, 2017 1:05 PM
To: user@hbase.apache.org
Subject: Re: Performance: HBase Native Java Client versus Thrift Java client

Right you are, Jeff. The Master is just coordinating where these Regions
are hosted (among other things...).

Clients would be caching the mapping of which RegionServers is hosting a
Region. Typically this information does not change often. In the case
that it does change, the client can often react very quickly to this.

Unless you have some use-case where your Java client is not long-lived,
the Thrift server would always be overhead on top of what the HBase
client itself would do. The Thrift server is just wrapping the HBase
client you'd use in your own Java application and exposing it in a
different manner.
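To make the location lookup concrete, a minimal sketch with the 1.x Java client
(table and row key are placeholders); the Connection caches these locations
internally:

try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
     RegionLocator locator = conn.getRegionLocator(TableName.valueOf("MyTable"))) {
  HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("some-row"));
  System.out.println("row is served by " + loc.getHostname() + ":" + loc.getPort());
}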

jeff saremi wrote:
> Thanks Josh
>
> I made a mistake in mentioning the master. It looks like the client contacts 
> the RegionServer which holds the meta table.
>
> For a query into the meta table to get the RegionServer for a given key, it 
> wasn't clear to me what was being cached on the client. Also on the same 
> topic, is a Thrift server assisting this process in any shape or form, to 
> make its presence necessary?
>
> Is there anything else that the Thrift server might be contributing to 
> positively?
>
>
>
> 
> From: Josh Elser
> Sent: Monday, January 30, 2017 11:57 AM
> To: user@hbase.apache.org
> Subject: Re: Performance: HBase Native Java Client versus Thrift Java client
>
> Would recommend that you brush up on your understanding of the HBase
> architecture.
>
> Clients do not receive table data from the HBase Master at any point.
> This is purely a RegionServer operation.
>
> http://hbase.apache.org/book.html#_architecture


>
> jeff saremi wrote:
>> I'd like to understand if there are any considerations on why one would use 
>> thrift versus the direct client?
>>
>> I was told that the Thrift server allows key-caching, which would result in faster 
>> key-to-regionserver queries as opposed to getting that from the HBase master 
>> nodes. It would also alleviate the load on the master.
>>
>> At the same time, we know that going through Thrift would add to the latency, 
>> since it's an indirect way of getting at data.
>>
>>
>> thanks
>>
>> Jeff
>>
>


Re: Performance: HBase Native Java Client versus Thrift Java client

2017-01-30 Thread jeff saremi
Thanks Josh

I made a mistake in mentioning the master. It looks like the client contacts 
the RegionServer which holds the meta table.

For a query into the meta table to get the RegionServer for a given key, it 
wasn't clear to me what was being cached on the client. Also on the same topic, 
is a Thrift server assisting this process in any shape or form, to make its 
presence necessary?

Is there anything else that the Thrift server might be contributing to 
positively?




From: Josh Elser 
Sent: Monday, January 30, 2017 11:57 AM
To: user@hbase.apache.org
Subject: Re: Performance: HBase Native Java Client versus Thrift Java client

Would recommend that you brush up on your understanding of the HBase
architecture.

Clients do not receive table data from the HBase Master at any point.
This is purely a RegionServer operation.

http://hbase.apache.org/book.html#_architecture

jeff saremi wrote:
> I'd like to understand if there are any considerations on why one would use 
> thrift versus the direct client?
>
> I was told that the Thrift server allows key-caching, which would result in faster 
> key-to-regionserver queries as opposed to getting that from the HBase master 
> nodes. It would also alleviate the load on the master.
>
> At the same time, we know that going through Thrift would add to the latency, 
> since it's an indirect way of getting at data.
>
>
> thanks
>
> Jeff
>


Re: HBase Metrics and time series systems

2017-01-30 Thread jeff saremi
Thank you Otis!

Yes we were still weighing options on this



From: Otis Gospodnetic 
Sent: Monday, January 30, 2017 11:27 AM
To: user@hbase.apache.org
Subject: Re: HBase Metrics and time series systems

Hi Jeff,

Not sure if you are searching for tools or solutions, but if the latter is
of interest, see https://sematext.com/spm/integrations/hbase-monitoring/


(incidentally, powered by HBase :))
You don't need to export metrics - the SPM agent will collect them for
you.  Handy if you want to correlate your HBase and other metrics with your
logs in https://sematext.com/logsene



Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Jan 24, 2017 at 11:17 PM, jeff saremi 
wrote:

> Has anyone found a re-usable way of exporting HBase metrics to Grafana or
> Kibana via tools such as Collectd, Telegraf to influxdb, opentsdb,
> Elasticsearch?
> thanks
>
> Jeff
>


Performance: HBase Native Java Client versus Thrift Java client

2017-01-30 Thread jeff saremi
I'd like to understand if there are any considerations on why one would use 
Thrift versus the direct client.

I was told that the Thrift server allows key-caching, which would result in faster 
key-to-regionserver queries as opposed to getting that from the HBase master 
nodes. It would also alleviate the load on the master.

At the same time, we know that going through Thrift would add to the latency, 
since it's an indirect way of getting at data.


thanks

Jeff


Re: Writing/Importing large number of records into HBase

2017-01-28 Thread jeff saremi
No I had not. I will take a look. Thanks Ted



From: Ted Yu 
Sent: Friday, January 27, 2017 7:41 PM
To: user@hbase.apache.org
Subject: Re: Writing/Importing large number of records into HBase

Have you looked at hbase-spark module (currently in master branch) ?

See 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
and 
hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
for examples.

There may be other options.

FYI

On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi  wrote:

> Hi
> I'm seeking some pointers/guidance on what we could do to insert billions
> of records that we already have in avro files in hadoop into HBase.
>
> I read some articles online and one of them recommended using HFile
> format. I took a cursory look at the documentation for that. Given the
> complexity of that I think that may be the last resort we want to pursue.
> Unless some library is out there that easily helps us write our files into
> that format. I didn't see any.
> Assuming that the HBase native client may be our best bet, is there any
> advice around pre-partitioning our records or such techniques that we could
> use?
> thanks
>
> Jeff
>


Re: Writing/Importing large number of records into HBase

2017-01-28 Thread jeff saremi
Thank you Chetan



From: Chetan Khatri 
Sent: Friday, January 27, 2017 8:15 PM
To: user@hbase.apache.org
Subject: Re: Writing/Importing large number of records into HBase

Oh. Sorry.
https://github.com/apache/hbase/blob/master/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkPutExample.java



On Sat, Jan 28, 2017 at 9:27 AM, Ted Yu  wrote:

> Chetan:
> The link you posted was from personal repo.
>
> There hasn't been commit for at least a year.
>
> Meanwhile, the hbase-spark module in hbase repo is being actively
> maintained.
>
> FYI
>
> > On Jan 27, 2017, at 7:47 PM, Chetan Khatri 
> wrote:
> >
> > Adding to @Ted Check Bulk Put Example -
> > https://github.com/tmalaska/SparkOnHBase/blob/master/src/


> main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/
> HBaseBulkPutExampleFromFile.scala
> >
> >> On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu  wrote:
> >>
> >> Have you looked at hbase-spark module (currently in master branch) ?
> >>
> >> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/
> >> example/datasources/AvroSource.scala
> >> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/
> >> DefaultSourceSuite.scala
> >> for examples.
> >>
> >> There may be other options.
> >>
> >> FYI
> >>
> >> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi 
> >> wrote:
> >>
> >>> Hi
> >>> I'm seeking some pointers/guidance on what we could do to insert
> billions
> >>> of records that we already have in avro files in hadoop into HBase.
> >>>
> >>> I read some articles online and one of them recommended using HFile
> >>> format. I took a cursory look at the documentation for that. Given the
> >>> complexity of that I think that may be the last resort we want to
> pursue.
> >>> Unless some library is out there that easily helps us write our files
> >> into
> >>> that format. I didn't see any.
> >>> Assuming that the HBase native client may be our best bet, is there any
> >>> advice around pre-partitioning our records or such techniques that we
> >> could
> >>> use?
> >>> thanks
> >>>
> >>> Jeff
> >>
>


Writing/Importing large number of records into HBase

2017-01-27 Thread jeff saremi
Hi
I'm seeking some pointers/guidance on what we could do to insert billions of 
records that we already have in Avro files in Hadoop into HBase.

I read some articles online, and one of them recommended using the HFile format. 
I took a cursory look at the documentation for that. Given its complexity, I 
think that may be the last resort we want to pursue, unless some library is out 
there that easily helps us write our files into that format. I didn't see any.
Assuming that the HBase native client may be our best bet, is there any advice 
around pre-partitioning our records or such techniques that we could use?
thanks

Jeff
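For reference, a minimal sketch of the HFile bulk-load path mentioned above,
using the 1.x mapreduce APIs (table name, paths, and the Avro-to-KeyValue
mapper are placeholders; the mapper itself is not shown):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path hfileDir = new Path("/tmp/hfiles");
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      TableName name = TableName.valueOf("MyTable");
      Job job = Job.getInstance(conf, "prepare-hfiles");
      job.setJarByClass(BulkLoadDriver.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(KeyValue.class);
      // input format + a mapper emitting (rowkey, KeyValue) pairs go here (not shown)
      HFileOutputFormat2.configureIncrementalLoad(job,
          conn.getTable(name), conn.getRegionLocator(name));
      FileOutputFormat.setOutputPath(job, hfileDir);
      if (job.waitForCompletion(true)) {
        // hand the finished HFiles over to the region servers
        new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir,
            conn.getAdmin(), conn.getTable(name), conn.getRegionLocator(name));
      }
    }
  }
}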


Re: HBase Metrics and time series systems

2017-01-24 Thread jeff saremi
excellent. thanks Jeremy



From: Jeremy Carroll 
Sent: Tuesday, January 24, 2017 8:50 PM
To: user@hbase.apache.org
Subject: Re: HBase Metrics and time series systems

Tcollector has modules for all hbase and hadoop stats.

On Tue, Jan 24, 2017 at 8:49 PM jeff saremi  wrote:

> it looks like Jolokia is the required adapter for JMX based systems.
>
>
> ____
> From: jeff saremi 
> Sent: Tuesday, January 24, 2017 8:17 PM
> To: user@hbase.apache.org
> Subject: HBase Metrics and time series systems
>
> Has anyone found a reusable way of exporting HBase metrics to Grafana or
> Kibana via tools such as collectd or Telegraf feeding InfluxDB, OpenTSDB, or
> Elasticsearch?
> thanks
>
> Jeff
>


Re: HBase Metrics and time series systems

2017-01-24 Thread jeff saremi
It looks like Jolokia is the required adapter for JMX-based systems.



From: jeff saremi 
Sent: Tuesday, January 24, 2017 8:17 PM
To: user@hbase.apache.org
Subject: HBase Metrics and time series systems

Has anyone found a reusable way of exporting HBase metrics to Grafana or 
Kibana via tools such as collectd or Telegraf feeding InfluxDB, OpenTSDB, or 
Elasticsearch?
thanks

Jeff


HBase Metrics and time series systems

2017-01-24 Thread jeff saremi
Has anyone found a reusable way of exporting HBase metrics to Grafana or 
Kibana via tools such as collectd or Telegraf feeding InfluxDB, OpenTSDB, or 
Elasticsearch?
thanks

Jeff
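
All of the tools mentioned in this thread ultimately read the same JMX beans,
so a quick way to see what is exportable is to query one directly. A minimal
sketch, assuming remote JMX has been enabled on the region server at port 10102
(it is not on by default) and using bean/attribute names as they appear in
HBase 1.x, which can vary by version.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RsJmxProbe {
  public static void main(String[] args) throws Exception {
    // Assumes the RS JVM runs with -Dcom.sun.management.jmxremote.port=10102.
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://regionserver-host:10102/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
      // Region server metrics are published under Hadoop:service=HBase,...
      ObjectName server = new ObjectName(
          "Hadoop:service=HBase,name=RegionServer,sub=Server");
      System.out.println("readRequestCount = "
          + mbsc.getAttribute(server, "readRequestCount"));
    }
  }
}

Jolokia bridges the same beans to HTTP/JSON for non-JVM collectors, and the
region server info port (16030 by default in 1.x) serves them pre-serialized
at /jmx for HTTP-based pollers.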


Re: HBase Thrift server and Consistency.Timeline

2017-01-24 Thread jeff saremi
Thanks Ted!



From: Ted Yu 
Sent: Tuesday, January 24, 2017 1:51 PM
To: user@hbase.apache.org
Subject: Re: HBase Thrift server and Consistency.Timeline

Looks like there is no such support at the moment.

Logged HBASE-17523

FYI

On Tue, Jan 24, 2017 at 1:42 PM, jeff saremi  wrote:

> We are enabling read replicas. We're also using Thrift endpoints for our
> HBase. How can we enable Consistency.TIMELINE for the Thrift server?
> thanks
>
> Jeff
>


HBase Thrift server and Consistency.Timeline

2017-01-24 Thread jeff saremi
We are enabling read replicas. We're also using Thrift endpoints for our 
HBase. How can we enable Consistency.TIMELINE for the Thrift server?
thanks

Jeff
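
For context on what HBASE-17523 would expose through Thrift: in the native
Java client, timeline-consistent reads against region replicas are a
per-operation flag. A minimal sketch; the table and row names are hypothetical.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineGetSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("SomeTable"))) {
      Get get = new Get(Bytes.toBytes("some-row"));
      // TIMELINE allows the read to be served by a secondary region replica
      // instead of blocking on the primary.
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      // true when the answer came from a (possibly lagging) replica
      System.out.println("stale = " + result.isStale());
    }
  }
}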


Re: HBase Thrift Client for C#: OutofMemoryException

2017-01-13 Thread jeff saremi
here you go:

https://issues.apache.org/jira/browse/HBASE-17467



From: Ted Yu 
Sent: Friday, January 13, 2017 4:02 PM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException

bq. i can create a pull request for them

That would be wonderful.

Please log a JIRA, polish the C# example, and attach it to the JIRA.

In hbase, we're not at the stage of reviewing / committing pull requests yet.

On Fri, Jan 13, 2017 at 3:45 PM, jeff saremi  wrote:

> Sorry, Ted, for wasting your time.
>
> It turned out I was using the wrong port for this.
>
> What a misleading error for a mistake so common! But that's Thrift.
>
>
> On another note, I have now converted the DemoClient from hbase-examples
> to C#, and I have the generated files as well. If there's interest, I can
> create a pull request for them.
>
>
>
> 
> From: jeff saremi 
> Sent: Friday, January 13, 2017 2:11 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
> Thanks Ted.
>
> I looked at this. We didn't know that a multiplexing protocol existed until
> you mentioned it to us.
> We're using the stock Thrift server that ships with HBase.
> If you could point us to where we should be checking, I'd appreciate it.
>
>
>
> 
> From: Ted Yu 
> Sent: Friday, January 13, 2017 1:34 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
> I haven't touched C# for a decade.
>
> After a brief search, I found:
> http://stackoverflow.com/questions/17843749/apache-
> thrift-client-run-time-issues-in-c-sharp
>
> Can you take a look at the answer to see if it is relevant?
>
> Cheers
>
> On Fri, Jan 13, 2017 at 11:10 AM, jeff saremi 
> wrote:
>
> > The result is the same: OutOfMemoryException.
> >
> > I again ran my C++ client to make sure nothing weird is going on
> > server-side.
> > I found the thrift compiler here: http://www-us.apache.org/dist/
> > thrift/0.9.3/
> >
> > I regenerated all files and deleted all old ones.
> >
> > Here's a sample of a generated file for you to see that 0.9.3 is used:
> >
> >
> > /**
> >  * Autogenerated by Thrift Compiler (0.9.3)
> >  *
> >  * DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
> >  *  @generated
> >  */
> > using System;
> > using System.Collections;
> > using System.Collections.Generic;
> > using System.Text;
> > using System.IO;
> > using Thrift;
> > using Thrift.Collections;
> > using System.Runtime.Serialization;
> > using Thrift.Protocol;
> > using Thrift.Transport;
> >
> > public partial class Hbase {
> >   publ
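
Since the root cause in this thread turned out to be a connection to the wrong
port, a quick smoke test against the Thrift server's default service port
(9090) can save a lot of head-scratching. A sketch using the Java binding of
the same generated Hbase service; the host name is hypothetical.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftSmokeTest {
  public static void main(String[] args) throws Exception {
    // 9090 is the Thrift server's default service port; the web UI and
    // other HTTP listeners on the same host will not speak this protocol.
    TTransport transport = new TSocket("thrift-host", 9090);
    transport.open();
    Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));
    for (ByteBuffer name : client.getTableNames()) {
      // table names come back as binary; decode for display
      System.out.println(StandardCharsets.UTF_8.decode(name.duplicate()));
    }
    transport.close();
  }
}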

Re: HBase Thrift Client for C#: OutofMemoryException

2017-01-13 Thread jeff saremi
Sorry, Ted, for wasting your time.

It turned out I was using the wrong port for this.

What a misleading error for a mistake so common! But that's Thrift.


On another note, I have now converted the DemoClient from hbase-examples to 
C#, and I have the generated files as well. If there's interest, I can create a 
pull request for them.



____
From: jeff saremi 
Sent: Friday, January 13, 2017 2:11 PM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException

Thanks Ted.

I looked at this. We didn't know that a multiplexing protocol existed until you 
mentioned it to us.
We're using the stock Thrift server that ships with HBase.
If you could point us to where we should be checking, I'd appreciate it.




From: Ted Yu 
Sent: Friday, January 13, 2017 1:34 PM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException

I haven't touched C# for a decade.

After a brief search, I found:
http://stackoverflow.com/questions/17843749/apache-thrift-client-run-time-issues-in-c-sharp

Can you take a look at the answer to see if it is relevant?

Cheers

On Fri, Jan 13, 2017 at 11:10 AM, jeff saremi 
wrote:

> The result is the same: OutOfMemoryException.
>
> I again ran my C++ client to make sure nothing weird is going on server-side.
> I found the thrift compiler here: http://www-us.apache.org/dist/
> thrift/0.9.3/
>
> I regenerated all files and deleted all old ones.
>
> Here's a sample of a generated file for you to see that 0.9.3 is used:
>
>
> /**
>  * Autogenerated by Thrift Compiler (0.9.3)
>  *
>  * DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
>  *  @generated
>  */
> using System;
> using System.Collections;
> using System.Collections.Generic;
> using System.Text;
> using System.IO;
> using Thrift;
> using Thrift.Collections;
> using System.Runtime.Serialization;
> using Thrift.Protocol;
> using Thrift.Transport;
>
> public partial class Hbase {
>   public interface Iface {
>
>
>
> 
> From: jeff saremi 
> Sent: Friday, January 13, 2017 10:39 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
>
> Oh, I see. Sure, I'll do that and report back.
>
>
> 
> From: Ted Yu 
> Sent: Friday, January 13, 2017 10:32 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
> I am not sure about compatibility between thrift 0.10.0 and 0.9.3
>
> Is it possible for you to locate the 0.9.3 Thrift compiler and try again?
>
> On Fri, Jan 13, 2017 at 10:27 AM, jeff saremi 
> wrote:
>
> > I used the following thrift compiler. I did not see any mentions of
> > versions.
> > http://www.apache.org/dyn/closer.cgi?path=/thrift/0.10.
> 0/thrift-0.10.0.exe

Re: HBase Thrift Client for C#: OutofMemoryException

2017-01-13 Thread jeff saremi
Thanks Ted.

I looked at this. We didn't know that a multiplexing protocol existed until you 
mentioned it to us.
We're using the stock Thrift server that ships with HBase.
If you could point us to where we should be checking, I'd appreciate it.




From: Ted Yu 
Sent: Friday, January 13, 2017 1:34 PM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException

I haven't touched C# for a decade.

After a brief search, I found:
http://stackoverflow.com/questions/17843749/apache-thrift-client-run-time-issues-in-c-sharp

Can you take a look at the answer to see if it is relevant?

Cheers

On Fri, Jan 13, 2017 at 11:10 AM, jeff saremi 
wrote:

> The result is the same: OutOfMemoryException.
>
> I again ran my C++ client to make sure nothing weird is going on server-side.
> I found the thrift compiler here: http://www-us.apache.org/dist/
> thrift/0.9.3/
>
> I regenerated all files and deleted all old ones.
>
> Here's a sample of a generated file for you to see that 0.9.3 is used:
>
>
> /**
>  * Autogenerated by Thrift Compiler (0.9.3)
>  *
>  * DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
>  *  @generated
>  */
> using System;
> using System.Collections;
> using System.Collections.Generic;
> using System.Text;
> using System.IO;
> using Thrift;
> using Thrift.Collections;
> using System.Runtime.Serialization;
> using Thrift.Protocol;
> using Thrift.Transport;
>
> public partial class Hbase {
>   public interface Iface {
>
>
>
> 
> From: jeff saremi 
> Sent: Friday, January 13, 2017 10:39 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
>
> Oh, I see. Sure, I'll do that and report back.
>
>
> 
> From: Ted Yu 
> Sent: Friday, January 13, 2017 10:32 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
> I am not sure about compatibility between thrift 0.10.0 and 0.9.3
>
> Is it possible for you to locate the 0.9.3 Thrift compiler and try again?
>
> On Fri, Jan 13, 2017 at 10:27 AM, jeff saremi 
> wrote:
>
> > I used the following thrift compiler. I did not see any mentions of
> > versions.
> > http://www.apache.org/dyn/closer.cgi?path=/thrift/0.10.
> 0/thrift-0.10.0.exe
>
> >
> >
> > Here's the stack trace. I am building for the AnyCPU platform:
> >
> >
> > private  string ReadStringBody(int size)
> > {
> > byte[] buf = new byte[size];
> >
> >
> > size = 1213486160
> >
> >
> >
> >at Thrift.Protocol.TBinaryProtocol.ReadStringBody(Int32 size) in
> > D:\repos\thrift\lib\csharp\src\Protocol\TBinaryProtocol.cs:line 383
> >at Thrift.Protocol.TBinaryProtocol.ReadMessageBegin() in
> > D:\repos\thrift\lib\csharp\src\Protocol\TBinaryProtocol.cs:line 239
> >at Hbase.Client.recv_getTableNames() in
> D:\Projects\HBaseThrift\Hbase.cs:line
> > 1418
> >at Hbase.Client.getTableNames() in D:\Projects\HBaseThrift\Hbase.
> cs:line
> > 1391
> >at DemoClient.Main(String[] args) in D:\Projects\HBaseThriftClient\
> DemoClient.cs:line
> > 97
> >at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly,
> > String[] args)
> >at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence
> > assemblySecurity, String[] args)
> >at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
> >at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
> >at System.Threading.ExecutionContext.RunInternal(ExecutionContext
> > executionContext, ContextCallback callback, Object state, Boolean
> > preserveSyncCtx)
> >at System.Threading.ExecutionContext.Run(ExecutionContext
> > executionContext, ContextCallback callback, Object state, Boolean
> > preserveSyncCtx)
> >at System.Threading.ExecutionContext.Run(ExecutionContext
> > executionContext

Re: HBase Thrift Client for C#: OutofMemoryException

2017-01-13 Thread jeff saremi
The result is the same: OutOfMemoryException.

I again ran my C++ client to make sure nothing weird is going on server-side.
I found the thrift compiler here: http://www-us.apache.org/dist/thrift/0.9.3/

I regenerated all files and deleted all old ones.

Here's a sample of a generated file for you to see that 0.9.3 is used:


/**
 * Autogenerated by Thrift Compiler (0.9.3)
 *
 * DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
 *  @generated
 */
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.IO;
using Thrift;
using Thrift.Collections;
using System.Runtime.Serialization;
using Thrift.Protocol;
using Thrift.Transport;

public partial class Hbase {
  public interface Iface {




From: jeff saremi 
Sent: Friday, January 13, 2017 10:39 AM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException


Oh, I see. Sure, I'll do that and report back.



From: Ted Yu 
Sent: Friday, January 13, 2017 10:32 AM
To: user@hbase.apache.org
Subject: Re: HBase Thrift Client for C#: OutofMemoryException

I am not sure about compatibility between thrift 0.10.0 and 0.9.3

Is it possible for you to locate the 0.9.3 Thrift compiler and try again?

On Fri, Jan 13, 2017 at 10:27 AM, jeff saremi 
wrote:

> I used the following thrift compiler. I did not see any mentions of
> versions.
> http://www.apache.org/dyn/closer.cgi?path=/thrift/0.10.0/thrift-0.10.0.exe
>
>
> Here's the stack trace. I am building for the AnyCPU platform:
>
>
> private  string ReadStringBody(int size)
> {
> byte[] buf = new byte[size];
>
>
> size = 1213486160
>
>
>
>at Thrift.Protocol.TBinaryProtocol.ReadStringBody(Int32 size) in
> D:\repos\thrift\lib\csharp\src\Protocol\TBinaryProtocol.cs:line 383
>at Thrift.Protocol.TBinaryProtocol.ReadMessageBegin() in
> D:\repos\thrift\lib\csharp\src\Protocol\TBinaryProtocol.cs:line 239
>at Hbase.Client.recv_getTableNames() in 
> D:\Projects\HBaseThrift\Hbase.cs:line
> 1418
>at Hbase.Client.getTableNames() in D:\Projects\HBaseThrift\Hbase.cs:line
> 1391
>at DemoClient.Main(String[] args) in 
> D:\Projects\HBaseThriftClient\DemoClient.cs:line
> 97
>at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly,
> String[] args)
>at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence
> assemblySecurity, String[] args)
>at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
>at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
>at System.Threading.ExecutionContext.RunInternal(ExecutionContext
> executionContext, ContextCallback callback, Object state, Boolean
> preserveSyncCtx)
>at System.Threading.ExecutionContext.Run(ExecutionContext
> executionContext, ContextCallback callback, Object state, Boolean
> preserveSyncCtx)
>at System.Threading.ExecutionContext.Run(ExecutionContext
> executionContext, ContextCallback callback, Object state)
>at System.Threading.ThreadHelper.ThreadStart()
>
>
> 
> From: Ted Yu 
> Sent: Friday, January 13, 2017 10:00 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Thrift Client for C#: OutofMemoryException
>
> Which Thrift version did you use to generate the C# code?
>
> hbase uses 0.9.3.
>
> Can you pastebin the whole stack trace for the exception?
>
> I assume you run your code on a 64-bit machine.
>
> Cheers
>
> On Fri, Jan 13, 2017 at 9:53 AM, jeff saremi 
> wrote:
>
> > I have cloned the latest thrift and hbase code, and used the Thrift generator
> > to generate C# code from hbase-thrift\src\main\resources\org\apache\hadoop\
> > hbase\thrift. I then created a single VS solution with the generated code,
> > the Thrift lib for C# (thrift\lib\csharp\src\Thrift.csproj), and I also added
> > a DemoClient (from hbase-examples) converted from C++ to C#. When I run it, I
> > keep getting OutOfMemoryException with not a lot of other useful information.
> > I have done the same process for C++, and the DemoClient code from
> > hbase-examples runs with no issues at all.
> >
> >
> > Here's the client code:
> >
> > TTransport socket = new TSocket(args[0], Convert.ToInt32(args[1]));
> > TTransport transport = new TBufferedTransport((TStreamTransport)socket);
> > TProtocol protocol = new TBinaryProtocol(transport);
> > transport.Open();  // the transport must be opened before the first call
> > Hbase.Client client = new Hbase.Client(protocol);
> > List<byte[]> tables = client.getTableNames();
> >
> > The last line is where the exception is thrown. Thanks.
> >
>
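
A footnote on why the wrong port surfaced as OutOfMemoryException rather than
a connection error: TBinaryProtocol reads the first four bytes off the wire as
a big-endian string length and tries to allocate a buffer of that size. The
bogus size in the traces above decodes straight to ASCII, which strongly
suggests the client had reached an HTTP listener. A quick check in plain Java:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeBogusLength {
  public static void main(String[] args) {
    // The "length" TBinaryProtocol tried to allocate in the traces above.
    byte[] firstFour = ByteBuffer.allocate(4).putInt(1213486160).array();
    // Prints "HTTP" (0x48 0x54 0x54 0x50): the first bytes of the peer's
    // reply, read as a big-endian Thrift string length.
    System.out.println(new String(firstFour, StandardCharsets.US_ASCII));
  }
}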

