Re: Response Too Slow in RegionServer Logs
That one long GC aside, look at the timings of the others as well. Even the smaller GCs are taking up the majority of each second: in the sample, the ParNew pauses alone add up to roughly 8.5 of the ~10 seconds of wall time leading up to the long CMS pause. For a heap that size you might want to try a Java version of 7u60 or later and use G1GC. Otherwise there are a bunch of resources on the web, including in the refguide, for how to tune GCs for HBase.
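For anyone wanting a concrete starting point, G1 is usually switched on through HBASE_REGIONSERVER_OPTS in hbase-env.sh. A minimal sketch follows; the heap size and pause target are illustrative placeholders, not tuned recommendations:

# Hypothetical hbase-env.sh fragment: G1 starting point for a ~20G heap.
# Requires a recent JVM (7u60 or later); values must be tested per workload.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms20g -Xmx20g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:+ParallelRefProcEnabled"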
Re: Response Too Slow in RegionServer Logs
Is there any reason for the 27G heap? It seems you run M/R jobs? If yes, then I would recommend trying M/R over snapshots. The combination of a large HBase heap and M/R is very hard to tune, if it is possible at all. You can also try reducing the number of map tasks and checking your M/R job's resource consumption.
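For reference, this is roughly what "M/R over snapshots" looks like with TableSnapshotInputFormat, wired up via TableMapReduceUtil. A sketch only: the snapshot name, restore directory, and mapper logic are hypothetical placeholders. The point is that map tasks read the snapshot's HFiles directly from HDFS, bypassing the RegionServers (and their heap) entirely.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {

  // Trivial example mapper: emits a cell count per row. Replace with real logic.
  static class RowMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(row, new LongWritable(value.size()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "mr-over-snapshot");
    job.setJarByClass(SnapshotScanJob.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // no RegionServer block cache is involved here anyway

    // "my_snapshot" must already exist, and the restore dir must be a
    // writable HDFS location the job can use for temporary region metadata.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot", scan,
        RowMapper.class, ImmutableBytesWritable.class, LongWritable.class,
        job, true, new Path("/tmp/snapshot-restore"));

    job.setNumReduceTasks(0); // map-only scan
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}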
Re: Response Too Slow in RegionServer Logs
2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 14912ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

I would check your GC logs for long GC pauses.
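If GC logging isn't already on, it is typically enabled with standard HotSpot flags in hbase-env.sh. A sketch; the log path is a hypothetical example:

# Hypothetical hbase-env.sh fragment: verbose GC logging for the RegionServer,
# with wall-clock timestamps so pauses can be matched against the server logs.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/hbase/gc-regionserver.log"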
Re: Response Too Slow in RegionServer Logs
This is a sample from the GC log file. At the end I see long GC pauses. Is there a way I can tune this?

2015-04-29T22:46:12.387+0000: 98061.660: [GC2015-04-29T22:46:12.387+0000: 98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs] 13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09 sys=0.07, real=0.65 secs]
2015-04-29T22:46:13.165+0000: 98062.438: [GC2015-04-29T22:46:13.165+0000: 98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs] 13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39 sys=0.08, real=1.20 secs]
2015-04-29T22:46:14.485+0000: 98063.759: [GC2015-04-29T22:46:14.485+0000: 98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs] 13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07 sys=0.05, real=0.90 secs]
2015-04-29T22:46:15.519+0000: 98064.793: [GC2015-04-29T22:46:15.519+0000: 98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs] 13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98 sys=0.08, real=0.88 secs]
2015-04-29T22:46:16.547+0000: 98065.821: [GC2015-04-29T22:46:16.547+0000: 98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs] 13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99 sys=0.04, real=0.63 secs]
2015-04-29T22:46:17.273+0000: 98066.547: [GC2015-04-29T22:46:17.273+0000: 98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs] 13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72 sys=0.05, real=0.64 secs]
2015-04-29T22:46:18.035+0000: 98067.309: [GC2015-04-29T22:46:18.035+0000: 98067.309: [ParNew: 560000K->52607K(580608K), 0.7116560 secs] 13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16 sys=0.05, real=0.72 secs]
2015-04-29T22:46:18.875+0000: 98068.149: [GC2015-04-29T22:46:18.875+0000: 98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs] 13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70 sys=0.05, real=0.53 secs]
2015-04-29T22:46:19.521+0000: 98068.795: [GC2015-04-29T22:46:19.521+0000: 98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs] 13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57 sys=0.08, real=1.08 secs]
2015-04-29T22:46:20.751+0000: 98070.024: [GC2015-04-29T22:46:20.751+0000: 98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs] 13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48 sys=0.05, real=0.57 secs]
2015-04-29T22:46:21.393+0000: 98070.667: [GC2015-04-29T22:46:21.393+0000: 98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs] 13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49 sys=0.07, real=0.71 secs]
2015-04-29T22:46:22.214+0000: 98071.487: [GC2015-04-29T22:46:22.214+0000: 98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500 secs]2015-04-29T22:46:22.243+0000: 98071.517: [CMS: 13403504K->7807614K(19420524K), 284.4275150 secs] 13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)], 284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
Heap
 par new generation   total 2146944K, used 868806K [0x0000000163600000, 0x00000001f4f90000, 0x00000001f4f90000)
  eden space 1908416K,  45% used [0x0000000163600000, 0x00000001986718f8, 0x00000001d7db0000)
  from space 238528K,   0% used [0x00000001d7db0000, 0x00000001d7db0000, 0x00000001e66a0000)
  to   space 238528K,   0% used [0x00000001e66a0000, 0x00000001e66a0000, 0x00000001f4f90000)
 concurrent mark-sweep generation total 19420524K, used 7807614K [0x00000001f4f90000, 0x00000006964eb000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 51000K, used 30498K [0x00000007fae00000, 0x00000007fdfce000, 0x0000000800000000)

Thanks,
Rahul
Response Too Slow in RegionServer Logs
Hi All,

In our cluster the region server logs are filled with "response too slow" messages, and this is causing jobs to slow down. How can I debug the reason for this slowness? We have enabled short-circuit reads, and each region server has 27GB RAM. Here is a trace from when the regionserver starts:

Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 966365
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 966365
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2014-08-14 20:23:53,341 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2014-08-14 20:23:53,342 WARN org.apache.hadoop.conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-08-14 20:23:53,884 WARN org.apache.hadoop.conf.Configuration: slave.host.name is deprecated. Instead, use mapreduce.tasktracker.host.name
2014-08-14 20:24:03,999 WARN org.apache.hadoop.conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-14 20:26:47,605 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
2014-08-14 20:28:23,491 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:18725,call:next(-8041903839443097981, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.248:58716,starttimems:1408048084720,queuetimems:0,class:HRegionServer,responsesize:5031595,method:next}
2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 14912ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-08-14 21:42:28,477 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:16968,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:36657,starttimems:1408052531504,queuetimems:0,class:HRegionServer,responsesize:1959532,method:next}
2014-08-14 21:42:56,923 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:10591,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:40818,starttimems:1408052566327,queuetimems:1,class:HRegionServer,responsesize:2987578,method:next}
2014-08-14 21:44:24,372 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:10656,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:41993,starttimems:1408052653710,queuetimems:1,class:HRegionServer,responsesize:3039779,method:next}
2014-08-14 21:45:50,598 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:12418,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:45197,starttimems:1408052738174,queuetimems:10,class:HRegionServer,responsesize:2476903,method:next}
2014-08-14 21:46:15,187 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:23766,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:49425,starttimems:1408052751414,queuetimems:0,class:HRegionServer,responsesize:5681175,method:next}
2014-08-14 21:47:09,041 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:12320,call:next(5487686201525374976, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.249:50269,starttimems:1408052816698,queuetimems:1,class:HRegionServer,responsesize:2986949,method:next}
2014-08-14 21:49:23,833 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:11389,call:next(1227841280814011139, 10), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client: 17.170.176.122:41976,starttimems:1408052952415,queuetimems:0,class:HRegionServer,responsesize:3160025,method:next}
2014-08-14 21:49:23,869 WARN org.apache.hadoop.ipc.HBaseServer: Exception while changing ops : java.nio.channels.CancelledKeyException
2014-08-14 21:49:23,900 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:11428,call:next(9103372947568217267, 10), rpc version=1, client
Re: Response Too Slow in RegionServer Logs
Yes, we are running M/R on HBase. I tried running M/R on snapshots, but the data in our HBase changes very frequently, so we end up occupying twice the space and running into full disks. I think we are hitting the large-HBase-heap-plus-M/R problem. I will try to add some more space to our cluster, try M/R on snapshots again, and see how it goes. Thanks for the suggestion.

Thanks,
Rahul
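On the space concern: a snapshot is cheap to take, since it only references the table's existing HFiles; extra disk usage accrues while the snapshot exists, as flushes and compactions rewrite those files underneath it. Keeping each snapshot only for the lifetime of the job therefore bounds the overhead. A sketch in the HBase shell, with hypothetical table and snapshot names:

# Take a snapshot just before launching the job...
hbase> snapshot 'my_table', 'mr_snap_20150529'
# ...run the M/R job against 'mr_snap_20150529', then drop the snapshot so
# the archived HFiles it pins can be cleaned up.
hbase> delete_snapshot 'mr_snap_20150529'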