Re: Spikes when writing data to HBase

2015-08-12 Thread Serega Sheypak
I agree.
  99% = 112.09 milliseconds
I could make 3 gets within those 112 ms.


2015-08-12 9:24 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:

 OK, this is actually checkAndPut: get, then check, then put. The latency is
 dominated by the get operation. Unless you have SSDs, a 10-40 ms mean read
 latency is normal.
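
 For reference, a minimal sketch of what such a conditional write looks like
 with the 0.98-era client API (the table, family and qualifier names below
 are made up; only the checkAndPut call itself is the point):

 import java.io.IOException;
 import org.apache.hadoop.hbase.client.HConnection;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public final class ConditionalWrite {

     // checkAndPut is executed on the region server as read -> compare -> write,
     // so every call pays for a Get plus a Put; the Get dominates on spinning disks.
     public static boolean writeIfAbsent(HConnection connection, String row)
             throws IOException {
         HTableInterface table = connection.getTable("users"); // hypothetical table
         try {
             Put put = new Put(Bytes.toBytes(row));
             put.add(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("active"));
             return table.checkAndPut(
                     Bytes.toBytes(row),      // row
                     Bytes.toBytes("d"),      // family
                     Bytes.toBytes("status"), // qualifier
                     null,                    // expected value: cell must be absent
                     put);                    // applied only if the check passes
         } finally {
             table.close();
         }
     }
 }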

 -Vlad

 On Tue, Aug 11, 2015 at 11:24 PM, Serega Sheypak serega.shey...@gmail.com
 wrote:

  Hi, here it is:
  https://gist.github.com/seregasheypak/00ef1a44e6293d13e56e
 
  2015-08-12 4:25 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:
 
   Can you post a code snippet? A Pastebin link is fine.
  
   -Vlad
  
   On Tue, Aug 11, 2015 at 4:03 PM, Serega Sheypak serega.shey...@gmail.com
   wrote:
  
Probably I found something: response time degrades as parallelism grows.
What I did (a sketch of this harness follows the list):

1. wrap the business-logic controller into a Java main class. The controller
   does some logic and puts/gets to HBase, sometimes with checkAndPut
2. create an HConnection
3. pass the HConnection to the controller
4. wrap the controller execution in Codahale metrics
5. execute the controller in several threads simultaneously, the same way it
   happens in the servlet environment
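
A minimal sketch of such a harness, assuming a hypothetical Controller class
standing in for the real business logic (names, table and counts are
illustrative, not the actual code from the gist):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class ControllerBenchmark {

    // Hypothetical stand-in for the business-logic controller from step 1.
    static class Controller {
        private final HConnection connection;
        Controller(HConnection connection) { this.connection = connection; }
        void handleRequest() {
            // the real controller does gets/puts/checkAndPut via the shared connection
        }
    }

    public static void main(String[] args) throws Exception {
        final int threads = 10;     // later raised to 100
        final int iterations = 10;

        final MetricRegistry registry = new MetricRegistry();
        final Timer putTimer = registry.timer("putTimer");
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();

        // Steps 2-3: one shared, thread-safe HConnection handed to the controller.
        final HConnection connection =
                HConnectionManager.createConnection(HBaseConfiguration.create());
        final Controller controller = new Controller(connection);

        // Step 5: drive the controller from many threads at once.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < iterations; i++) {
                        // Step 4: time every controller call with a Codahale Timer.
                        Timer.Context ctx = putTimer.time();
                        try {
                            controller.handleRequest();
                        } finally {
                            ctx.stop();
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        reporter.report(); // prints Meters/Timers blocks like the ones below
        connection.close();
    }
}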
   
I can't explain the result.
1. I used 10 threads with 10 iterations each.

*RESULT: 99% = 28.81 milliseconds, which sounds GOOD!*
-- Meters ----------------------------------------------------------------
putMeter
             count = 414914
         mean rate = 885.58 events/second
     1-minute rate = 911.56 events/second
     5-minute rate = 778.16 events/second
    15-minute rate = 549.72 events/second

-- Timers ----------------------------------------------------------------
putTimer
             count = 414914
         mean rate = 884.66 calls/second
     1-minute rate = 911.53 calls/second
     5-minute rate = 765.60 calls/second
    15-minute rate = 515.06 calls/second
               min = 4.87 milliseconds
               max = 211.77 milliseconds
              mean = 10.81 milliseconds
            stddev = 5.43 milliseconds
            median = 10.34 milliseconds
               75% = 11.59 milliseconds
               95% = 14.41 milliseconds
               98% = 19.59 milliseconds
               99% = 28.81 milliseconds
             99.9% = 60.67 milliseconds
   
I've increased the thread count to 100:
*RESULT: 99% = 112.09 milliseconds*
-- Meters ----------------------------------------------------------------
putMeter
             count = 1433056
         mean rate = 2476.46 events/second
     1-minute rate = 2471.18 events/second
     5-minute rate = 2483.28 events/second
    15-minute rate = 2512.52 events/second

-- Timers ----------------------------------------------------------------
putTimer
             count = 1433058
         mean rate = 2474.61 calls/second
     1-minute rate = 2468.45 calls/second
     5-minute rate = 2446.45 calls/second
    15-minute rate = 2383.23 calls/second
               min = 10.03 milliseconds
               max = 853.05 milliseconds
              mean = 40.71 milliseconds
            stddev = 39.04 milliseconds
            median = 35.60 milliseconds
               75% = 47.69 milliseconds
               95% = 71.79 milliseconds
               98% = 85.83 milliseconds
               99% = 112.09 milliseconds
             99.9% = 853.05 milliseconds
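
As a quick sanity check on these numbers (this is just arithmetic on the
figures above, not an explanation of the spikes): by Little's law, throughput
is roughly concurrency divided by mean latency, so

    10 threads:   10 / 0.0108 s  =  ~925 calls/second
    100 threads: 100 / 0.0407 s  = ~2460 calls/second

which is close to the reported mean rates (884.66 and 2474.61 calls/second).
The extra threads buy throughput, but each individual call waits longer
somewhere between the client and the region servers.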
   
Is it possible to explain this? Could it be a problem with pooling/threading
inside HConnection?
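
If it does turn out to be contention on the client's RPC connections, one
knob that might be worth experimenting with (whether it actually helps here
would have to be measured) is the size of the client-side IPC connection
pool, set before the shared HConnection is created:

// Fragment only; the value 5 is illustrative, not a recommendation.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.ipc.pool.size", 5);
HConnection connection = HConnectionManager.createConnection(conf);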
   
Please see what happened to compactions during the test:
http://www.bigdatapath.com/wp-content/uploads/2015/08/compations.png

get/put ops:
http://www.bigdatapath.com/wp-content/uploads/2015/08/get_ops.png

slow ops:
http://www.bigdatapath.com/wp-content/uploads/2015/08/slow_ops.png
   
2015-08-11 23:43 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:
   
 How about GC activity? ApplicationStopTime? Do you track that?
 Yes, jvisualvm says it's OK, and New Relic doesn't show anything strange
 either. HBase also says it's OK.

 The profiler says the thread spends most of its time waiting for a response
 from the HBase side. My assumptions are:
 1. I have a weird bug in the HBase configuration
 2. I have undiscovered networking problems (BUT the same tomcats write data
    to Flume at a higher rate, with no data loss at all)
 3. I have a weird problem with HConnection/HConnectionManager in a
    multithreaded environment, where the same servlet instance is shared
    across many threads (see the sketch after this list)
 4. some mystic process somewhere in the cluster
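
 On point 3: the commonly recommended pattern for a servlet environment is a
 single shared HConnection per webapp plus a short-lived, per-request
 HTableInterface. A rough sketch, with hypothetical servlet, table and column
 names:

 import java.io.IOException;
 import javax.servlet.ServletException;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HConnection;
 import org.apache.hadoop.hbase.client.HConnectionManager;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class ProfileServlet extends HttpServlet {

     // One HConnection per webapp: it is thread-safe and owns the ZooKeeper
     // session, the region location cache and the RPC resources.
     private HConnection connection;

     @Override
     public void init() throws ServletException {
         try {
             connection = HConnectionManager.createConnection(HBaseConfiguration.create());
         } catch (IOException e) {
             throw new ServletException(e);
         }
     }

     @Override
     protected void doPost(HttpServletRequest req, HttpServletResponse resp)
             throws IOException {
         // HTableInterface is cheap to obtain and NOT thread-safe:
         // take one per request and close it; never close the connection here.
         HTableInterface table = connection.getTable("profiles"); // hypothetical table
         try {
             String id = req.getParameter("id");
             Put put = new Put(Bytes.toBytes(id == null ? "unknown" : id));
             put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes("1"));
             table.put(put);
         } finally {
             table.close();
         }
         resp.setStatus(HttpServletResponse.SC_OK);
     }

     @Override
     public void destroy() {
         try {
             connection.close();
         } catch (IOException ignored) {
             // nothing useful to do at shutdown
         }
     }
 }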

 Is the issue reproducible, or did you see it just once?
 Always. Spikes disappear during the night, but RPM doesn't change too much.

 I will run my controller code out of tomcat and see how it goes. I'm going
 to isolate components...

 2015-08-11 23:36 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:

  How about GC activity? ApplicationStopTime? Do you track that?

  Is the issue reproducible, or did you see it just once?

  Start with RS logs and try to find anything suspicious in a period of a
  very high latency. 1.5 sec HBase write latency does not look right.

  -Vlad

  On Tue, Aug 11, 2015 at 2:08 PM, Serega Sheypak serega.shey...@gmail.com
  wrote:

   Hi Vladimir!
   Here are graphs. Servlet (3 tomcats on 3 different hosts write to HBase)