Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread Bryan Beaudreault
That one long GC aside, look at the timings of the others as well.  Even
the smaller GCs are taking up the majority of each second.

For a heap that size you might want to try a Java version newer than 7u60 and
use G1GC.  Otherwise there are a bunch of resources on the web, including the
HBase reference guide, on how to tune GC for HBase.
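
A minimal sketch of what that could look like in hbase-env.sh, assuming Java
7u60 or newer (the pause target and occupancy threshold below are illustrative
starting points, not tuned values, and the existing CMS/ParNew flags would need
to be removed first):

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:InitiatingHeapOccupancyPercent=65 \
  -XX:+ParallelRefProcEnabled"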


On Fri, May 29, 2015 at 11:49 AM, rahul malviya malviyarahul2...@gmail.com
wrote:

 This is a sample from the GC log file. At the end I see long GC pauses. Is
 there a way I can tune this?

 2015-04-29T22:46:12.387+: 98061.660: [GC2015-04-29T22:46:12.387+:
 98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs]
 13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09
 sys=0.07, real=0.65 secs]
 2015-04-29T22:46:13.165+: 98062.438: [GC2015-04-29T22:46:13.165+:
 98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs]
 13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39
 sys=0.08, real=1.20 secs]
 2015-04-29T22:46:14.485+: 98063.759: [GC2015-04-29T22:46:14.485+:
 98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs]
 13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07
 sys=0.05, real=0.90 secs]
 2015-04-29T22:46:15.519+: 98064.793: [GC2015-04-29T22:46:15.519+:
 98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs]
 13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98
 sys=0.08, real=0.88 secs]
 2015-04-29T22:46:16.547+: 98065.821: [GC2015-04-29T22:46:16.547+:
 98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs]
 13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99
 sys=0.04, real=0.63 secs]
 2015-04-29T22:46:17.273+: 98066.547: [GC2015-04-29T22:46:17.273+:
 98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs]
 13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72
 sys=0.05, real=0.64 secs]
 2015-04-29T22:46:18.035+: 98067.309: [GC2015-04-29T22:46:18.035+:
 98067.309: [ParNew: 56K->52607K(580608K), 0.7116560 secs]
 13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16
 sys=0.05, real=0.72 secs]
 2015-04-29T22:46:18.875+: 98068.149: [GC2015-04-29T22:46:18.875+:
 98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs]
 13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70
 sys=0.05, real=0.53 secs]
 2015-04-29T22:46:19.521+: 98068.795: [GC2015-04-29T22:46:19.521+:
 98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs]
 13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57
 sys=0.08, real=1.08 secs]
 2015-04-29T22:46:20.751+: 98070.024: [GC2015-04-29T22:46:20.751+:
 98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs]
 13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48
 sys=0.05, real=0.57 secs]
 2015-04-29T22:46:21.393+: 98070.667: [GC2015-04-29T22:46:21.393+:
 98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs]
 13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49
 sys=0.07, real=0.71 secs]
 2015-04-29T22:46:22.214+: 98071.487: [GC2015-04-29T22:46:22.214+:
 98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500
 secs]2015-04-29T22:46:22.243+: 98071.517: [CMS:
 13403504K->7807614K(19420524K), 284.4275150 secs]
 13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)],
 284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
 Heap
  par new generation   total 2146944K, used 868806K [0x00016360,
 0x0001f4f9, 0x0001f4f9)
   eden space 1908416K,  45% used [0x00016360, 0x0001986718f8,
 0x0001d7db)
   from space 238528K,   0% used [0x0001d7db, 0x0001d7db,
 0x0001e66a)
   to   space 238528K,   0% used [0x0001e66a, 0x0001e66a,
 0x0001f4f9)
  concurrent mark-sweep generation total 19420524K, used 7807614K
 [0x0001f4f9, 0x0006964eb000, 0x0007fae0)
  concurrent-mark-sweep perm gen total 51000K, used 30498K
 [0x0007fae0, 0x0007fdfce000, 0x0008)

 Thanks,
 Rahul


 On Fri, May 29, 2015 at 8:41 AM, Bryan Beaudreault 
 bbeaudrea...@hubspot.com
  wrote:

  
   2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
   slept
   14912ms instead of 3000ms, this is likely due to a long garbage
  collecting
   pause and it's usually bad, see
   http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 
 
  I would check your gc logs for long gc pauses.
 
  On Fri, May 29, 2015 at 11:38 AM, rahul malviya 
  malviyarahul2...@gmail.com
  wrote:
 
   Hi All,
  
   In our cluster, the region server logs are filled with "response too slow"
   messages. This is causing jobs to slow down. How can I debug the reason for
   this slowness?
  
   We have enabled short circuit reads and region server has 27GB RAM.
  
   Here is a trace when regionserver starts.
  
   Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
   core file 

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread Vladimir Rodionov
Is there any reason for the 27G heap? It seems you are running an M/R job? If
yes, then I would recommend trying M/R over snapshots. The combination of a
large HBase heap and M/R is very hard to tune, if it is possible at all.
You can also try reducing the number of map tasks and checking your M/R job's
resource consumption.
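
For reference, a rough sketch of wiring an M/R job over a snapshot with
TableSnapshotInputFormat via TableMapReduceUtil, assuming an HBase version that
ships it (0.98+). The snapshot name, restore directory, and mapper are
placeholders; the point is that the job reads the snapshot's store files
directly from HDFS instead of scanning through the region server heap:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {

  // Placeholder mapper: rows arrive as (row key, Result), read straight from
  // the snapshot's HFiles rather than through the RegionServer RPC path.
  static class SnapshotMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // process the row here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-over-snapshot");
    job.setJarByClass(SnapshotScanJob.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false);  // usual setting for full-table M/R scans

    // "my-snapshot" and the restore dir are placeholders; the restore dir is
    // HDFS scratch space where the snapshot is temporarily materialized.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my-snapshot", scan, SnapshotMapper.class,
        NullWritable.class, NullWritable.class, job,
        true, new Path("/tmp/snapshot-restore"));

    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}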
On May 29, 2015 8:58 AM, Bryan Beaudreault bbeaudrea...@hubspot.com
wrote:

 That one long GC aside, look at the timings of the others as well.  Even
 the smaller GCs are taking up the majority of each second.

 For a heap that size you might want to try a Java version newer than 7u60 and
 use G1GC.  Otherwise there are a bunch of resources on the web, including the
 HBase reference guide, on how to tune GC for HBase.


 On Fri, May 29, 2015 at 11:49 AM, rahul malviya 
 malviyarahul2...@gmail.com
 wrote:

  This is a sample from the GC log file. At the end I see long GC pauses. Is
  there a way I can tune this?
 
  2015-04-29T22:46:12.387+: 98061.660: [GC2015-04-29T22:46:12.387+:
  98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs]
  13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09
  sys=0.07, real=0.65 secs]
  2015-04-29T22:46:13.165+: 98062.438: [GC2015-04-29T22:46:13.165+:
  98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs]
  13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39
  sys=0.08, real=1.20 secs]
  2015-04-29T22:46:14.485+: 98063.759: [GC2015-04-29T22:46:14.485+:
  98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs]
  13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07
  sys=0.05, real=0.90 secs]
  2015-04-29T22:46:15.519+: 98064.793: [GC2015-04-29T22:46:15.519+:
  98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs]
  13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98
  sys=0.08, real=0.88 secs]
  2015-04-29T22:46:16.547+: 98065.821: [GC2015-04-29T22:46:16.547+:
  98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs]
  13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99
  sys=0.04, real=0.63 secs]
  2015-04-29T22:46:17.273+: 98066.547: [GC2015-04-29T22:46:17.273+:
  98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs]
  13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72
  sys=0.05, real=0.64 secs]
  2015-04-29T22:46:18.035+: 98067.309: [GC2015-04-29T22:46:18.035+:
  98067.309: [ParNew: 56K->52607K(580608K), 0.7116560 secs]
  13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16
  sys=0.05, real=0.72 secs]
  2015-04-29T22:46:18.875+: 98068.149: [GC2015-04-29T22:46:18.875+:
  98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs]
  13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70
  sys=0.05, real=0.53 secs]
  2015-04-29T22:46:19.521+: 98068.795: [GC2015-04-29T22:46:19.521+:
  98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs]
  13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57
  sys=0.08, real=1.08 secs]
  2015-04-29T22:46:20.751+: 98070.024: [GC2015-04-29T22:46:20.751+:
  98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs]
  13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48
  sys=0.05, real=0.57 secs]
  2015-04-29T22:46:21.393+: 98070.667: [GC2015-04-29T22:46:21.393+:
  98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs]
  13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49
  sys=0.07, real=0.71 secs]
  2015-04-29T22:46:22.214+: 98071.487: [GC2015-04-29T22:46:22.214+:
  98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500
  secs]2015-04-29T22:46:22.243+: 98071.517: [CMS:
  13403504K->7807614K(19420524K), 284.4275150 secs]
  13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)],
  284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
  Heap
   par new generation   total 2146944K, used 868806K [0x00016360,
  0x0001f4f9, 0x0001f4f9)
eden space 1908416K,  45% used [0x00016360, 0x0001986718f8,
  0x0001d7db)
from space 238528K,   0% used [0x0001d7db, 0x0001d7db,
  0x0001e66a)
to   space 238528K,   0% used [0x0001e66a, 0x0001e66a,
  0x0001f4f9)
   concurrent mark-sweep generation total 19420524K, used 7807614K
  [0x0001f4f9, 0x0006964eb000, 0x0007fae0)
   concurrent-mark-sweep perm gen total 51000K, used 30498K
  [0x0007fae0, 0x0007fdfce000, 0x0008)
 
  Thanks,
  Rahul
 
 
  On Fri, May 29, 2015 at 8:41 AM, Bryan Beaudreault 
  bbeaudrea...@hubspot.com
   wrote:
 
   
2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept
14912ms instead of 3000ms, this is likely due to a long garbage
   collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
  
  
   I would check your gc logs for long gc pauses.
  
   On Fri, May 29, 

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread Bryan Beaudreault

 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
 slept
 14912ms instead of 3000ms, this is likely due to a long garbage collecting
 pause and it's usually bad, see
 http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired


I would check your gc logs for long gc pauses.
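
If GC logging isn't already enabled, a minimal sketch of the standard flags in
hbase-env.sh will produce per-collection entries like the GC log samples in
this thread (the log path is a placeholder; adjust it for your install):

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hbase/regionserver-gc.log"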

On Fri, May 29, 2015 at 11:38 AM, rahul malviya malviyarahul2...@gmail.com
wrote:

 Hi All,

 In our cluster, the region server logs are filled with "response too slow"
 messages. This is causing jobs to slow down. How can I debug the reason for
 this slowness?

 We have enabled short circuit reads and region server has 27GB RAM.

 Here is a trace when regionserver starts.

 Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 966365
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 32768
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 966365
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 2014-08-14 20:23:53,341 WARN org.apache.hadoop.conf.Configuration:
 fs.default.name is deprecated. Instead, use fs.defaultFS
 2014-08-14 20:23:53,342 WARN org.apache.hadoop.conf.Configuration:
 mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 2014-08-14 20:23:53,884 WARN org.apache.hadoop.conf.Configuration:
 slave.host.name is deprecated. Instead, use
 mapreduce.tasktracker.host.name
 2014-08-14 20:24:03,999 WARN org.apache.hadoop.conf.Configuration:
 hadoop.native.lib is deprecated. Instead, use io.native.lib.available
 2014-08-14 20:26:47,605 ERROR
 org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent
 configuration. Previous configuration for using table name in metrics:
 true, new configuration: false
 2014-08-14 20:28:23,491 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:18725,call:next(-8041903839443097981, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.248:58716

 ,starttimems:1408048084720,queuetimems:0,class:HRegionServer,responsesize:5031595,method:next}
 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 14912ms instead of 3000ms, this is likely due to a long garbage collecting
 pause and it's usually bad, see
 http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 2014-08-14 21:42:28,477 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:16968,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:36657

 ,starttimems:1408052531504,queuetimems:0,class:HRegionServer,responsesize:1959532,method:next}
 2014-08-14 21:42:56,923 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:10591,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:40818

 ,starttimems:1408052566327,queuetimems:1,class:HRegionServer,responsesize:2987578,method:next}
 2014-08-14 21:44:24,372 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:10656,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:41993

 ,starttimems:1408052653710,queuetimems:1,class:HRegionServer,responsesize:3039779,method:next}
 2014-08-14 21:45:50,598 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:12418,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:45197

 ,starttimems:1408052738174,queuetimems:10,class:HRegionServer,responsesize:2476903,method:next}
 2014-08-14 21:46:15,187 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:23766,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:49425

 ,starttimems:1408052751414,queuetimems:0,class:HRegionServer,responsesize:5681175,method:next}
 2014-08-14 21:47:09,041 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:12320,call:next(5487686201525374976, 10), rpc
 version=1, client version=29, methodsFingerPrint=-1368823753,client:
 17.170.176.249:50269

 ,starttimems:1408052816698,queuetimems:1,class:HRegionServer,responsesize:2986949,method:next}
 2014-08-14 21:49:23,833 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:11389,call:next(1227841280814011139, 10), rpc
 version=1, 

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread rahul malviya
This is a sample from the GC log file. At the end I see long GC pauses. Is
there a way I can tune this?

2015-04-29T22:46:12.387+: 98061.660: [GC2015-04-29T22:46:12.387+:
98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs]
13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09
sys=0.07, real=0.65 secs]
2015-04-29T22:46:13.165+: 98062.438: [GC2015-04-29T22:46:13.165+:
98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs]
13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39
sys=0.08, real=1.20 secs]
2015-04-29T22:46:14.485+: 98063.759: [GC2015-04-29T22:46:14.485+:
98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs]
13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07
sys=0.05, real=0.90 secs]
2015-04-29T22:46:15.519+: 98064.793: [GC2015-04-29T22:46:15.519+:
98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs]
13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98
sys=0.08, real=0.88 secs]
2015-04-29T22:46:16.547+: 98065.821: [GC2015-04-29T22:46:16.547+:
98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs]
13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99
sys=0.04, real=0.63 secs]
2015-04-29T22:46:17.273+: 98066.547: [GC2015-04-29T22:46:17.273+:
98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs]
13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72
sys=0.05, real=0.64 secs]
2015-04-29T22:46:18.035+: 98067.309: [GC2015-04-29T22:46:18.035+:
98067.309: [ParNew: 56K->52607K(580608K), 0.7116560 secs]
13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16
sys=0.05, real=0.72 secs]
2015-04-29T22:46:18.875+: 98068.149: [GC2015-04-29T22:46:18.875+:
98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs]
13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70
sys=0.05, real=0.53 secs]
2015-04-29T22:46:19.521+: 98068.795: [GC2015-04-29T22:46:19.521+:
98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs]
13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57
sys=0.08, real=1.08 secs]
2015-04-29T22:46:20.751+: 98070.024: [GC2015-04-29T22:46:20.751+:
98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs]
13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48
sys=0.05, real=0.57 secs]
2015-04-29T22:46:21.393+: 98070.667: [GC2015-04-29T22:46:21.393+:
98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs]
13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49
sys=0.07, real=0.71 secs]
2015-04-29T22:46:22.214+: 98071.487: [GC2015-04-29T22:46:22.214+:
98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500
secs]2015-04-29T22:46:22.243+: 98071.517: [CMS:
13403504K->7807614K(19420524K), 284.4275150 secs]
13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)],
284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
Heap
 par new generation   total 2146944K, used 868806K [0x00016360,
0x0001f4f9, 0x0001f4f9)
  eden space 1908416K,  45% used [0x00016360, 0x0001986718f8,
0x0001d7db)
  from space 238528K,   0% used [0x0001d7db, 0x0001d7db,
0x0001e66a)
  to   space 238528K,   0% used [0x0001e66a, 0x0001e66a,
0x0001f4f9)
 concurrent mark-sweep generation total 19420524K, used 7807614K
[0x0001f4f9, 0x0006964eb000, 0x0007fae0)
 concurrent-mark-sweep perm gen total 51000K, used 30498K
[0x0007fae0, 0x0007fdfce000, 0x0008)

Thanks,
Rahul


On Fri, May 29, 2015 at 8:41 AM, Bryan Beaudreault bbeaudrea...@hubspot.com
 wrote:

 
  2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
  slept
  14912ms instead of 3000ms, this is likely due to a long garbage
 collecting
  pause and it's usually bad, see
  http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired


 I would check your gc logs for long gc pauses.

 On Fri, May 29, 2015 at 11:38 AM, rahul malviya 
 malviyarahul2...@gmail.com
 wrote:

  Hi All,
 
  In our cluster, the region server logs are filled with "response too slow"
  messages. This is causing jobs to slow down. How can I debug the reason for
  this slowness?
 
  We have enabled short circuit reads and region server has 27GB RAM.
 
  Here is a trace when regionserver starts.
 
  Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
  core file size  (blocks, -c) 0
  data seg size   (kbytes, -d) unlimited
  scheduling priority (-e) 0
  file size   (blocks, -f) unlimited
  pending signals (-i) 966365
  max locked memory   (kbytes, -l) 64
  max memory size (kbytes, -m) unlimited
  open files  (-n) 32768
  pipe size(512 bytes, -p) 8
  POSIX message queues (bytes, -q) 819200
  real-time priority  (-r) 0
  stack size  (kbytes, -s) 

Response Too Slow in RegionServer Logs

2015-05-29 Thread rahul malviya
Hi All,

In our cluster, the region server logs are filled with "response too slow"
messages. This is causing jobs to slow down. How can I debug the reason for
this slowness?

We have enabled short circuit reads and region server has 27GB RAM.

Here is a trace when regionserver starts.

Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 966365
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 32768
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 966365
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
2014-08-14 20:23:53,341 WARN org.apache.hadoop.conf.Configuration:
fs.default.name is deprecated. Instead, use fs.defaultFS
2014-08-14 20:23:53,342 WARN org.apache.hadoop.conf.Configuration:
mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-08-14 20:23:53,884 WARN org.apache.hadoop.conf.Configuration:
slave.host.name is deprecated. Instead, use mapreduce.tasktracker.host.name
2014-08-14 20:24:03,999 WARN org.apache.hadoop.conf.Configuration:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-14 20:26:47,605 ERROR
org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent
configuration. Previous configuration for using table name in metrics:
true, new configuration: false
2014-08-14 20:28:23,491 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:18725,call:next(-8041903839443097981, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.248:58716
,starttimems:1408048084720,queuetimems:0,class:HRegionServer,responsesize:5031595,method:next}
2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
14912ms instead of 3000ms, this is likely due to a long garbage collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-08-14 21:42:28,477 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:16968,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:36657
,starttimems:1408052531504,queuetimems:0,class:HRegionServer,responsesize:1959532,method:next}
2014-08-14 21:42:56,923 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:10591,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:40818
,starttimems:1408052566327,queuetimems:1,class:HRegionServer,responsesize:2987578,method:next}
2014-08-14 21:44:24,372 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:10656,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:41993
,starttimems:1408052653710,queuetimems:1,class:HRegionServer,responsesize:3039779,method:next}
2014-08-14 21:45:50,598 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:12418,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:45197
,starttimems:1408052738174,queuetimems:10,class:HRegionServer,responsesize:2476903,method:next}
2014-08-14 21:46:15,187 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:23766,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:49425
,starttimems:1408052751414,queuetimems:0,class:HRegionServer,responsesize:5681175,method:next}
2014-08-14 21:47:09,041 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:12320,call:next(5487686201525374976, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.249:50269
,starttimems:1408052816698,queuetimems:1,class:HRegionServer,responsesize:2986949,method:next}
2014-08-14 21:49:23,833 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:11389,call:next(1227841280814011139, 10), rpc
version=1, client version=29, methodsFingerPrint=-1368823753,client:
17.170.176.122:41976
,starttimems:1408052952415,queuetimems:0,class:HRegionServer,responsesize:3160025,method:next}
2014-08-14 21:49:23,869 WARN org.apache.hadoop.ipc.HBaseServer: Exception
while changing ops : java.nio.channels.CancelledKeyException
2014-08-14 21:49:23,900 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{processingtimems:11428,call:next(9103372947568217267, 10), rpc
version=1, client 

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread rahul malviya
Yes, we are running MR on HBase. I tried running MR over snapshots, but the data
in our HBase changes very frequently, so we end up occupying twice the space and
running into full disks.

I think we are hitting the large-HBase-heap-plus-M/R problem. I will try to add
some more space to our cluster, try MR over snapshots again, and see how it goes.

Thanks for the suggestion.

Thanks,
Rahul

On Fri, May 29, 2015 at 9:14 AM, Vladimir Rodionov vladrodio...@gmail.com
wrote:

 Is there any reason for the 27G heap? It seems you are running an M/R job? If
 yes, then I would recommend trying M/R over snapshots. The combination of a
 large HBase heap and M/R is very hard to tune, if it is possible at all.
 You can also try reducing the number of map tasks and checking your M/R job's
 resource consumption.
 On May 29, 2015 8:58 AM, Bryan Beaudreault bbeaudrea...@hubspot.com
 wrote:

  That one long GC aside, look at the timings of the others as well.  Even
  the smaller GCs are taking up the majority of each second.
 
  For a heap that size you might want to try a Java version newer than 7u60 and
  use G1GC.  Otherwise there are a bunch of resources on the web, including the
  HBase reference guide, on how to tune GC for HBase.
 
 
  On Fri, May 29, 2015 at 11:49 AM, rahul malviya 
  malviyarahul2...@gmail.com
  wrote:
 
   This is a sample from the GC log file. At the end I see long GC pauses. Is
   there a way I can tune this?
  
   2015-04-29T22:46:12.387+: 98061.660:
 [GC2015-04-29T22:46:12.387+:
   98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs]
   13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09
   sys=0.07, real=0.65 secs]
   2015-04-29T22:46:13.165+: 98062.438:
 [GC2015-04-29T22:46:13.165+:
   98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs]
   13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39
   sys=0.08, real=1.20 secs]
   2015-04-29T22:46:14.485+: 98063.759:
 [GC2015-04-29T22:46:14.485+:
   98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs]
   13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07
   sys=0.05, real=0.90 secs]
   2015-04-29T22:46:15.519+: 98064.793:
 [GC2015-04-29T22:46:15.519+:
   98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs]
   13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98
   sys=0.08, real=0.88 secs]
   2015-04-29T22:46:16.547+: 98065.821:
 [GC2015-04-29T22:46:16.547+:
   98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs]
   13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99
   sys=0.04, real=0.63 secs]
   2015-04-29T22:46:17.273+: 98066.547:
 [GC2015-04-29T22:46:17.273+:
   98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs]
   13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72
   sys=0.05, real=0.64 secs]
   2015-04-29T22:46:18.035+: 98067.309:
 [GC2015-04-29T22:46:18.035+:
   98067.309: [ParNew: 56K->52607K(580608K), 0.7116560 secs]
   13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16
   sys=0.05, real=0.72 secs]
   2015-04-29T22:46:18.875+: 98068.149:
 [GC2015-04-29T22:46:18.875+:
   98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs]
   13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70
   sys=0.05, real=0.53 secs]
   2015-04-29T22:46:19.521+: 98068.795:
 [GC2015-04-29T22:46:19.521+:
   98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs]
   13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57
   sys=0.08, real=1.08 secs]
   2015-04-29T22:46:20.751+: 98070.024:
 [GC2015-04-29T22:46:20.751+:
   98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs]
   13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48
   sys=0.05, real=0.57 secs]
   2015-04-29T22:46:21.393+: 98070.667:
 [GC2015-04-29T22:46:21.393+:
   98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs]
   13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49
   sys=0.07, real=0.71 secs]
   2015-04-29T22:46:22.214+: 98071.487:
 [GC2015-04-29T22:46:22.214+:
   98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500
   secs]2015-04-29T22:46:22.243+: 98071.517: [CMS:
   13403504K->7807614K(19420524K), 284.4275150 secs]
   13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)],
   284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
   Heap
par new generation   total 2146944K, used 868806K [0x00016360,
   0x0001f4f9, 0x0001f4f9)
 eden space 1908416K,  45% used [0x00016360,
 0x0001986718f8,
   0x0001d7db)
 from space 238528K,   0% used [0x0001d7db,
 0x0001d7db,
   0x0001e66a)
 to   space 238528K,   0% used [0x0001e66a,
 0x0001e66a,
   0x0001f4f9)
concurrent mark-sweep generation total 19420524K, used 7807614K
   [0x0001f4f9, 0x0006964eb000, 0x0007fae0)
concurrent-mark-sweep perm