Re: Spikes when writing data to HBase
I agree. 99% = 112.09 milliseconds; I could make 3 gets in 112 ms.

2015-08-12 9:24 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:
OK, this is actually checkAndPut: get, check, put. The latency is dominated by the get operation. Unless you have SSDs, a 10-40 ms mean read latency is normal. -Vlad

On Tue, Aug 11, 2015 at 11:24 PM, Serega Sheypak serega.shey...@gmail.com wrote:
Hi, here it is: https://gist.github.com/seregasheypak/00ef1a44e6293d13e56e

2015-08-12 4:25 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:
Can you post a code snippet? A Pastebin link is fine. -Vlad

On Tue, Aug 11, 2015 at 4:03 PM, Serega Sheypak serega.shey...@gmail.com wrote:
Probably I found something: response time degrades as parallelism grows. What I did:
1. Wrapped the business-logic controller in a Java main class. The controller does some logic and puts/gets to HBase, sometimes with checkAndPut.
2. Created an HConnection.
3. Passed the HConnection to the controller.
4. Wrapped controller execution in Codahale metrics.
5. Executed the controller in several threads simultaneously.
The same happens in the servlet environment. I can't explain the result. I used 10 threads with 10 iterations each.
*RESULT: 99% = 28.81 milliseconds, which sounds GOOD!*

-- Meters --
putMeter
             count = 414914
         mean rate = 885.58 events/second
     1-minute rate = 911.56 events/second
     5-minute rate = 778.16 events/second
    15-minute rate = 549.72 events/second

-- Timers --
putTimer
             count = 414914
         mean rate = 884.66 calls/second
     1-minute rate = 911.53 calls/second
     5-minute rate = 765.60 calls/second
    15-minute rate = 515.06 calls/second
               min = 4.87 milliseconds
               max = 211.77 milliseconds
              mean = 10.81 milliseconds
            stddev = 5.43 milliseconds
            median = 10.34 milliseconds
               75% = 11.59 milliseconds
               95% = 14.41 milliseconds
               98% = 19.59 milliseconds
               99% = 28.81 milliseconds
             99.9% = 60.67 milliseconds

I've increased the thread count to 100:

*RESULT: 99% = 112.09 milliseconds*

-- Meters --
putMeter
             count = 1433056
         mean rate = 2476.46 events/second
     1-minute rate = 2471.18 events/second
     5-minute rate = 2483.28 events/second
    15-minute rate = 2512.52 events/second

-- Timers --
putTimer
             count = 1433058
         mean rate = 2474.61 calls/second
     1-minute rate = 2468.45 calls/second
     5-minute rate = 2446.45 calls/second
    15-minute rate = 2383.23 calls/second
               min = 10.03 milliseconds
               max = 853.05 milliseconds
              mean = 40.71 milliseconds
            stddev = 39.04 milliseconds
            median = 35.60 milliseconds
               75% = 47.69 milliseconds
               95% = 71.79 milliseconds
               98% = 85.83 milliseconds
               99% = 112.09 milliseconds
             99.9% = 853.05 milliseconds

Is it possible to explain this? Could it be a problem with pooling/threading inside HConnection? Please see what happened during the test:
compactions: http://www.bigdatapath.com/wp-content/uploads/2015/08/compations.png
get/put ops: http://www.bigdatapath.com/wp-content/uploads/2015/08/get_ops.png
slow ops: http://www.bigdatapath.com/wp-content/uploads/2015/08/slow_ops.png

2015-08-11 23:43 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:
> How about GC activity? ApplicationStopTime? Do you track that?
Yes, jvisualvm says it's OK, and New Relic also doesn't show anything strange. HBase also says it's OK.
The profiler says the thread spends most of its time waiting for a response from the HBase side. My assumptions:
1. I have a weird bug in the HBase configuration.
2. I have undiscovered networking problems (BUT the same Tomcats write data to Flume at a higher rate, with no data loss at all).
3. I have a weird problem with HConnection/HConnectionManager in a multithreaded environment, where the same servlet instance is shared across many threads.
4. Some mystic process somewhere in the cluster.

> Is the issue reproducible? or you got it first time?
Always. The spikes disappear during the night, but RPM doesn't change much. I will run my controller code outside of Tomcat and see how it goes. I'm going to isolate components...
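Vladimir's explanation above, that checkAndPut is internally a get, a check, and a put, is worth making concrete. The sketch below is a minimal in-memory analogue in plain Java (the `CheckAndPutSketch` class is hypothetical, not the HBase client API): it shows why every checkAndPut starts with a read, which is where a 10-40 ms HDD read latency lands.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory analogue of HBase's checkAndPut: atomically write `value`
// only if the cell currently holds `expected`. The point is the shape:
// step 1 is always a read, so checkAndPut latency is bounded below by
// read latency regardless of how fast writes are.
public class CheckAndPutSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    public synchronized boolean checkAndPut(String row, byte[] expected, byte[] value) {
        byte[] current = store.get(row);          // 1. get
        if (!Arrays.equals(current, expected)) {  // 2. check
            return false;                         //    mismatch: nothing written
        }
        store.put(row, value);                    // 3. put
        return true;
    }

    public synchronized byte[] get(String row) {
        return store.get(row);
    }

    public static void main(String[] args) {
        CheckAndPutSketch s = new CheckAndPutSketch();
        System.out.println(s.checkAndPut("r1", null, "a".getBytes()));
        System.out.println(s.checkAndPut("r1", "b".getBytes(), "x".getBytes()));
    }
}
```

In the real client the whole sequence runs server-side as a single RPC (`checkAndPut` on the client table interface), so the caller sees one round trip, but the region server still performs the internal get before the put.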
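The `99% = ...` lines in the reports above are percentiles over recorded per-call durations. A rough sketch of that reduction in plain Java (`Percentiles` and its method names are hypothetical; Codahale/Dropwizard Metrics computes these from a sampling reservoir rather than by keeping every sample):

```java
import java.util.Arrays;

// Minimal percentile reduction over recorded call durations, the kind of
// summary a metrics timer prints as "99% = 112.09 milliseconds".
public class Percentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // a fraction q of all samples are <= it.
    public static double percentile(double[] samplesMs, double q) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length); // 1-based nearest rank
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] latencies = {4.87, 10.3, 10.4, 11.6, 14.4, 19.6, 28.8, 60.7, 211.8};
        System.out.println("p50 = " + percentile(latencies, 0.50));
        System.out.println("p99 = " + percentile(latencies, 0.99));
    }
}
```

Note how sensitive the high percentiles are to a few slow calls: in the 100-thread run above, p99.9 jumps straight to the 853 ms maximum, which is typical when a shared resource occasionally stalls.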
Re: Spikes when writing data to HBase
2015-08-11 23:36 GMT+02:00 Vladimir Rodionov vladrodio...@gmail.com:
How about GC activity? ApplicationStopTime? Do you track that? Is the issue reproducible, or did you get it the first time? Start with the RS logs and try to find anything suspicious during a period of very high latency. 1.5 sec HBase write latency does not look right. -Vlad

On Tue, Aug 11, 2015 at 2:08 PM, Serega Sheypak serega.shey...@gmail.com wrote:
Hi Vladimir! Here are the graphs. Servlet (3 Tomcats on 3 different hosts) writes to HBase
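Serega's plan to run the controller outside Tomcat and isolate components can be sketched as a plain ExecutorService harness. Everything below is stdlib Java; `Client` is a hypothetical stand-in for the controller holding the shared HConnection, not HBase code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Skeleton of the "run the controller outside Tomcat" experiment:
// one shared client object, N worker threads, a fixed number of
// iterations per thread.
public class SharedClientHarness {
    // Hypothetical stand-in for the controller + its HConnection.
    interface Client { void call() throws Exception; }

    public static long run(Client shared, int threads, int itersPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        LongAdder completed = new LongAdder();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < itersPerThread; i++) {
                    try {
                        shared.call();        // same instance from every thread
                        completed.increment();
                    } catch (Exception ignored) { }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.sum();
    }

    public static void main(String[] args) throws Exception {
        // A no-op client; the real experiment would time shared.call()
        // around a put/checkAndPut, as the Codahale timers above do.
        long done = run(() -> { }, 10, 10);
        System.out.println("calls completed: " + done);
    }
}
```

If p99 degrades in this harness the same way it does under Tomcat (as the 10-thread vs 100-thread runs above suggest), the servlet container is exonerated and the contention sits in the client/connection layer or on the server side.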