Re: Regionservers going down during compaction

2015-07-17 Thread lars hofhansl
Yep. Agreed. Local NUMA zones are usually not what you want for something like 
HBase.
-- Lars
From: Vladimir Rodionov
To: "user@hbase.apache.org"; lars hofhansl
Sent: Thursday, July 16, 2015 12:01 PM
Subject: Re: Regionservers going down during compaction
Ankit,
First of all, you need to make sure that your systems do not swap (they do, I
am pretty sure). There are two reasons why a system goes to swap:

1. The default setting for 'vm.swappiness' (60) plus high memory pressure (not your case).
2. No high memory pressure, but not enough free memory in a particular
allocation zone when 'vm.swappiness=0'.

I think 2. is what you have.
Your boxes have at least 2 CPUs (NUMA nodes), probably 4. That means Linux
divides the overall RAM into 2-4 zones (16-32GB each). It is possible that
during compaction one of the zones (the one where the compaction thread
runs?) runs out of free pages and file-backed pages, and Linux starts
swapping (or even kills the process).

To verify this you will need to:

1. Confirm that you see si/so events (swapping) in vmstat during compaction.
2. Dig into /proc/<pid>/numa_maps and verify that you have uneven memory
allocation between the zones.
Your output will be something similar to:

ad3e000 default anon=13240527 dirty=13223315 swapcache=3440324 active=13202235 N0=7865429 N1=5375098

There are two zones: N0 and N1. Memory is counted in pages (4K). If you
confirm both 1 and 2, you should change the NUMA kernel memory allocation
policy from the default (local) to interleave all:

cmd="/usr/bin/numactl --interleave all $cmd"

Check 'man numactl' for how to run an application with different NUMA policies.
-Vlad



On Wed, Jul 15, 2015 at 8:23 AM, lars hofhansl  wrote:

We're running fine with a 31g heap (31 to be able to make use of compressed oops)
after a lot of tuning. Maybe your pattern is different...?

Or... since it is ParNew on an only-1GB young gen taking that much time... maybe
you ran into this: http://www.evanjones.ca/jvm-mmap-pause.html ?
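For reference, the workaround that post describes is the JVM flag below, which
stops the JVM from backing its perf counters with a memory-mapped file. (The
region server options quoted later in this thread do already include it.)

    -XX:+PerfDisableSharedMem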

-- Lars
From: Vladimir Rodionov
To: "user@hbase.apache.org"
 Sent: Monday, July 13, 2015 10:16 AM
 Subject: Re: Regionservers going down during compaction

Ankit,

-Xms31744m -Xmx31744 seems too high.

You run on SSDs and you probably do not need a large (on-heap) block cache.
A large heap plus major compaction can result in bad GC behavior and cluster
instability. It's very hard to tune. Unless you are 100% sure that 30GB is
absolutely necessary, I would suggest reducing the heap.

-Vlad



On Mon, Jul 13, 2015 at 8:28 AM, Dave Latham  wrote:

> What JDK are you using?  I've seen such behavior when a machine was
> swapping.  Can you tell if there was any swap in use?
>
> On Mon, Jul 13, 2015 at 3:24 AM, Ankit Singhal
>  wrote:
> > Hi Team,
> >
> > We are seeing regionservers going down whenever major compaction is
> > triggered on a table (8.5TB in size).
> > Can anybody help with the resolution or give pointers to resolve this?
> >
> > Below are the current observations:
> >
> > - The above behaviour is seen even when compaction is run on
> > already-compacted tables.
> > - Load average seems to be normal and under 4 (for a 32-core machine).
> > - Except for bad datanode and JVM pause errors, no other error is seen
> > in the logs.
> >
> >
> > Cluster configuration:-
> > 79 Nodes
> > 32 core machine,64GB RAM ,1.2TB SSDs
> >
> > JVM OPTs:-
> >
> > export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+PerfDisableSharedMem
> > -XX:+UseConcMarkSweepGC -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log"
> > $HBASE_REGIONSERVER_OPTS -XX:+PerfDisableSharedMem -XX:PermSize=128m
> > -XX:MaxPermSize=256m -XX:+UseCMSInitiatingOccupancyOnly -Xmn1024m
> > -XX:CMSInitiatingOccupancyFraction=70 -Xms31744m -Xmx31744
> >
> > HBase-site.xml:-
> > PFA
> >
> > GC logs:-
> >
> > 2015-07-12T23:15:29.485-0700: 9260.407: [GC2015-07-12T23:15:29.485-0700:
> > 9260.407: [ParNew: 839872K->947K(943744K), 0.0324180 secs]
> > 1431555K->592630K(32401024K), 0.0325930 secs] [Times: user=0.72 sys=0.00,
> > real=0.03 secs]
> >
> > 2015-07-12T23:15:30.532-0700: 9261.454: [GC2015-07-12T23:15:30.532-0700:
> > 9261.454: [ParNew: 839859K->1017K(943744K), 31.0324970 secs]
> > 1431542K->592702K(32401024K), 31.0326950 secs] [Times: user=0.89 sys=0.02,
> > real=31.03 secs]
> >
> > 2015-07-12T23:16:02.490-0700: 9293.412: [GC2015-07-12T23:16:02.490-0700:
> > 9293.412: [ParNew: 839929K->1100K(943744K), 0.0319400 secs]
> > 1431614K->592785K(32401024K), 0.0321580 secs] [Times: user=0.71 sys=0.00,
> > real=0.03 secs]
> >
> > 2015-07-12T23:16:03.747-0700: 9294.669: [GC2015-07-12T23:16:03.747-0700:
> > 9294.669: [ParNew: 840012K->894K(943744K), 0.0304370 secs]
> > 1431697K->592579K(32401024K), 0.0305330 secs] [Times: user=0.67 sys=0.01,
> > real=0.03 secs]
> >
> > Heap
> >
> >  par new generation  total 943744K, used 76608K [0x7f54d400,
> > 0x7f551400, 0x7f551400)
> >
> >  eden space 838912K,  9% used [0x7f54d400, 0x7f54d89f0728,
> > 0x7f550734)
> >
> >  from space 104832K,  0% used [0x7f550734, 0x

How can I tell when a client is connected and ready to go?

2015-07-17 Thread Dmitry Minkovsky
I am using HBase 1.1.0 and create HBase client connections like this:

    try {
        Configuration config = HBaseConfiguration.create();
        connection = ConnectionFactory.createConnection(config);
    }
    catch (IOException e) {
        logger.info("Error {}", e);
    }

However, I noticed that when ZooKeeper and HBase Master/Region servers are
down, the catch clause is never reached. The code runs as if the connection
is made, and connection.isClosed() returns false. So:

- What happens when I use this connection? Are the RPC calls buffered and
retried at some interval in background?

- How can I tell whether the connection is actually ready for use? Should I
try to do this, or just use it? My inclination is not to run a client
service without the underlying datastores all actually ready, but perhaps
this is not the idea with the new HBase API.

- Under what conditions is the IOException catch actually reached?


My goal is to be able to fail fast. Is that the wrong idea with this client?


Thanks,
Dmitry


Re: How can I tell when a client is connected and ready to go?

2015-07-17 Thread Vladimir Rodionov
>> - Under what conditions is the IOException catch actually reached?

ConnectionFactory just instantiates and initializes the connection
implementation; I do not think it does anything beyond that. (The only way
to get cluster status is to connect to the master, use an HBaseAdmin
instance, and wait until rpcTimeout expires if the master is not available.)
No magic instant-cluster-status notification yet. You may try reducing the
rpc timeout hbase.rpc.timeout from its default of 60000 ms, and
hbase.client.operation.timeout from its default of 1200000 ms, to some
lower values:

    Configuration conf = HBaseConfiguration.create();
    conf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 2000);
    conf.setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 2000);

    Connection conn = ConnectionFactory.createConnection(conf);

That gets you a connection which fails faster than the default one (20 min :)).
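If your goal is to fail fast at startup, one option is to force a master RPC
up front and treat failure as "not ready". A minimal sketch under those
assumptions (the class and method names here are made up; only the HBase
calls are real API):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class HBaseReadinessProbe {
        // Returns true only if the active master answered an RPC within the short timeouts.
        public static boolean clusterIsUp() {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 2000);
            conf.setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 2000);
            conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 1);
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                admin.getClusterStatus(); // a real RPC to the master; throws if unreachable
                return true;
            } catch (IOException e) {
                return false; // ZooKeeper/master not reachable within the timeouts
            }
        }
    }

Note the probe builds its own short-timeout Configuration, so the connection
you actually serve traffic with can keep saner production values.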

-Vlad


On Fri, Jul 17, 2015 at 6:36 AM, Dmitry Minkovsky 
wrote:

> I am using HBase 1.1.0 and create HBase client connections like this:
>
> try {
>     Configuration config = HBaseConfiguration.create();
>     connection = ConnectionFactory.createConnection(config);
> }
> catch (IOException e) {
>     logger.info("Error {}", e);
> }
>
> However, I noticed that when ZooKeeper and HBase Master/Region servers are
> down, the catch clause is never reached. The code runs as if the connection
> is made, and connection.isClosed() returns false. So:
>
> - What happens when I use this connection? Are the RPC calls buffered and
> retried at some interval in background?
>
> - How can I tell whether the connection is actually ready for use? Should I
> try to do this, or just use it? My inclination is not to run a client
> service without the underlying datastores all actually ready, but perhaps
> this is not the idea with the new HBase API.
>
> - Under what conditions is the IOException catch actually reached?
>
>
> My goal is to be able to fail fast. Is that the wrong idea with this
> client?
>
>
> Thanks,
> Dmitry
>


Fwd: Hbase Fully distribution mode - Cannot resolve regionserver hostname

2015-07-17 Thread Dima Spivak
+user@, dev@ to bcc

Pubudu,

I think you'll get more help on an issue like this on the users list.

-Dima

-- Forwarded message --
From: Ted Yu 
Date: Fri, Jul 17, 2015 at 5:40 AM
Subject: Re: Hbase Fully distribution mode - Cannot resolve regionserver
hostname
To: "d...@hbase.apache.org" 


Have you looked at
HBASE-12954 (Ability impaired using HBase on multihomed hosts)?
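If I remember right, the setting that issue introduced is
hbase.regionserver.hostname, which tells each region server what name to
publish so the master never has to resolve it via DNS or /etc/hosts. A
sketch for the region server's hbase-site.xml (the hostname value below is
illustrative, not a recommendation):

    <property>
      <name>hbase.regionserver.hostname</name>
      <!-- a name or address the master can actually reach; this one is made up -->
      <value>pod-36.example.internal</value>
    </property>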

Cheers

On Fri, Jul 17, 2015 at 3:32 AM, Pubudu Gunatilaka 
wrote:

> Hi Devs,
>
> I am trying to run HBase in fully distributed mode. First I started the
> master node, then I started a regionserver, but I am getting the following
> error:
>
> 2015-07-17 05:12:02,260 WARN  [pod-35:16020.activeMasterManager]
> master.AssignmentManager: Failed assignment of hbase:meta,,1.1588230740 to
> pod-36,16020,1437109916288, trying to assign elsewhere instead; try=1 of 10
> java.net.UnknownHostException: unknown host: pod-36
> at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.<init>(RpcClientImpl.java:296)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.createConnection(RpcClientImpl.java:129)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.getConnection(RpcClientImpl.java:1278)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1152)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
> at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:21711)
> at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:712)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2101)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1567)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1545)
> at org.apache.hadoop.hbase.master.AssignmentManager.assignMeta(AssignmentManager.java:2630)
> at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:820)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:685)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1428)
> at java.lang.Thread.run(Thread.java:745)
>
>
> This error occurs because the master node cannot resolve the hostname of
> the regionserver. For my requirements, I want to automate the HBase
> installation with 1 master node and 4 regionservers, but at the moment I
> don't have any way of updating the master's /etc/hosts file. Can I solve
> the problem from the HBase configuration side?
>
> If HBase could communicate via IP addresses, or use the hostname that the
> regionserver already sends to the master, this issue could be solved
> without updating the /etc/hosts file. A similar approach can be found in
> Hadoop: once a datanode connects to the namenode, the namenode can
> communicate with that datanode without any /etc/hosts updates.
>
> Any help on this is appreciated.
>
> Thank you!
>
> --
>
> *Pubudu Gunatilaka*
>


Disable Base64 encoding in Stargate request and Return as String

2015-07-17 Thread anil gupta
Hi All,

We have a String rowkey and String cell values.
Still, Stargate returns the data with Base64 encoding, due to which a user
can't read it. Is there a way to disable Base64 encoding so that a REST
request just returns strings?

-- 
Thanks & Regards,
Anil Gupta


Re: Disable Base64 encoding in Stargate request and Return as String

2015-07-17 Thread Andrew Purtell
The closest you can get to just a string is to have your client use an accept
header of "Accept: application/octet-stream" when making a query. This will
return zero or one value in the response. If a value is present in the
table at the requested location, the response body will be the unencoded
bytes. If you've stored a string, you'll get back a string. If you've
stored an image, you'll get back the raw image bytes. Note that using an
accept header of "application/octet-stream" implicitly limits you to
queries that only return zero or one value. (Strictly speaking, per the
package doc: "If binary encoding is requested, only one cell can be
returned, the first to match the resource specification. The row, column,
and timestamp associated with the cell will be transmitted in X headers:
X-Row, X-Column, and X-Timestamp, respectively. Depending on the precision
of the resource specification, some of the X-headers may be elided as
redundant.")
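As an illustration, here is a minimal Java sketch of such a request (the
host, port, table, row, and column names are all hypothetical):

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StargateOctetStreamGet {
        public static void main(String[] args) throws Exception {
            // GET /<table>/<row>/<column> against the REST gateway -- names are made up
            URL url = new URL("http://resthost.example.com:8080/mytable/row1/f1:q1");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/octet-stream"); // ask for raw bytes
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                for (int n; (n = in.read(chunk)) != -1; ) {
                    buf.write(chunk, 0, n);
                }
                // Exactly the stored cell bytes, no Base64; safe to print if you stored UTF-8 text.
                System.out.println(buf.toString("UTF-8"));
            }
        }
    }

The X-Row/X-Column/X-Timestamp response headers mentioned above can be read
via conn.getHeaderField(...) if you need them.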
In general, the REST gateway supports several alternate encodings. See
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html
for some examples.

Note that HBase cell data is binary, not string. It does not make sense to
turn off base64 encoding for the default response encoding, XML, because
that would produce invalid XML if a value happens to include non-XML-safe
bytes. HBase can't know that in advance. We need to encode keys and values
in a safe manner to avoid blowing up your client's XML.

The same is roughly true for JSON.
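If you do stay with the XML/JSON representations, string values are still
easy to recover on the client side; a sketch using Java 8's built-in decoder
(the encoded sample is made up):

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class DecodeCell {
        public static void main(String[] args) {
            String encoded = "aGVsbG8=";  // a cell value as it appears in the JSON/XML response (hypothetical)
            byte[] bytes = Base64.getDecoder().decode(encoded);
            System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "hello"
        }
    }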


If your client sends an accept header of "Accept: application/protobuf"
you'll get back a protobuf encoded object. Your client will need to be
prepared to handle that representation. This is probably not what you want.

Why are we even talking about using XML, JSON, or protobuf to encode
responses? Because for many types of REST queries, HBase must return a
structured response. The client has asked for more than simply one value,
simply one string. The response must include keys, values, timestamps;
maybe a whole row's worth of keys, values, and timestamps; maybe multiple
rows. It depends on the query you issued. (See the 'Cell or Row Query
(Multiple Values)' section in the package doc.)




On Fri, Jul 17, 2015 at 2:20 PM, anil gupta  wrote:

> Hi All,
>
> We have a String rowkey and String cell values.
> Still, Stargate returns the data with Base64 encoding, due to which a user
> can't read it. Is there a way to disable Base64 encoding so that a REST
> request just returns strings?
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Region assignment after full cluster restart

2015-07-17 Thread Ted Yu
bq. the assignment is not always preserved

Can you provide more information on this scenario?
The master should have retained region assignments across the cluster restart.

If you can pastebin the relevant portion of the master log w.r.t. the
region(s) whose location was not retained, that would be nice.

Thanks
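
For reference, the startup-wait knobs mentioned in the quoted mail below
live in hbase-site.xml. A sketch with illustrative values (check your
release's hbase-default.xml for the real defaults):

    <property>
      <name>hbase.master.wait.on.regionservers.mintostart</name>
      <!-- illustrative: total number of region servers in the cluster -->
      <value>4</value>
    </property>
    <property>
      <name>hbase.master.wait.on.regionservers.timeout</name>
      <!-- illustrative: ms to keep waiting for that many servers to check in -->
      <value>60000</value>
    </property>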

On Thu, Jul 16, 2015 at 4:37 AM, Ricardo Manuel Pereira Vilaça <
rmvil...@di.uminho.pt> wrote:

> Hi,
>
> We are using HBase 0.98.6 and Hadoop 2.5.0 - CDH 5.3.5.
> We have some doubts regarding region assignment.
>
> We do manual region splits and merges, and also assign regions to
> regionservers manually.
> Is there any way to ensure that the assignment remains after a full
> restart of the cluster? How?
>
> We did some experiments with hbase.master.wait.on.regionservers.mintostart
> set to the total number of region servers, but the assignment is not
> always preserved.
>
> Thanks in advance,
>
> Ricardo Vilaça