RE: region server down when scanning using mapreduce

2013-03-12 Thread Lu, Wei

We turned the block cache off and tried again; the region servers still crash
one after another.
There are many scanner lease timeouts, and then the master logs:
RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]
So the problem does not seem to be caused by the block cache.


Thanks
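
[Editor's note for the archive: the scanner lease that is expiring here is controlled by hbase.regionserver.lease.period in 0.94-era HBase (later versions renamed it hbase.client.scanner.timeout.period). A sketch of raising it in hbase-site.xml follows; the value is illustrative, not a recommendation for this cluster:]

```xml
<!-- hbase-site.xml: scanner lease timeout in ms (default 60000 in HBase 0.94).
     Raising it gives slow mappers more time between next() calls, at the cost
     of holding orphaned scanners open longer. 120000 is an example value. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
```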


RE: region server down when scanning using mapreduce

2013-03-12 Thread Anoop Sam John
How is the GC pattern on the region servers that are going down? In the RS logs
you will likely see YouAreDeadExceptions.
Please try tuning your RS memory and GC options.
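
[Editor's note: sketching what that tuning might look like, a common 0.94-era starting point in conf/hbase-env.sh is shown below. The heap size, occupancy fraction, and log path are illustrative values only, not recommendations for this cluster:]

```shell
# hbase-env.sh: example region server GC settings (values illustrative).
# CMS with an early initiating occupancy reduces the long stop-the-world
# pauses that expire the ZooKeeper session and get the region server killed.
export HBASE_HEAPSIZE=8000   # MB
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```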

-Anoop-


RE: region server down when scanning using mapreduce

2013-03-12 Thread Azuryy Yu
How are you using the scanner? Please paste some code here.
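
[Editor's note: no code was ever posted in the thread, so for the archive here is a rough sketch of what a 0.94-era TableMapper dump job typically looks like. TableDump, DumpMapper, and MyTable are hypothetical names, the caching values are illustrative, and running it requires an HBase cluster and the 0.94 client jars:]

```java
// Hypothetical sketch of an MR dump job over an HBase table (0.94 client API).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class TableDump {

  // One map() call per row; the actual dump logic is omitted.
  static class DumpMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // write the row to the dump target here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table dump");
    job.setJarByClass(TableDump.class);

    Scan scan = new Scan();
    scan.setCaching(100);        // rows fetched per next() RPC; too large a
                                 // batch can push one next() past the lease
    scan.setCacheBlocks(false);  // full scans should not churn the block cache

    TableMapReduceUtil.initTableMapperJob(
        "MyTable", scan, DumpMapper.class, null, null, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The setCaching value is the knob most relevant to this thread: each next() RPC returns that many rows, so a value of 1000 (as in the logs) combined with a GC-pausing server can easily exceed the scanner lease.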

Re: region server down when scanning using mapreduce

2013-03-11 Thread Azuryy Yu
Did you disable the block cache when you ran the MR dump?
On Mar 12, 2013 1:06 PM, Lu, Wei w...@microstrategy.com wrote:

 Hi,

 When we use MapReduce to dump data from a pretty large table on HBase, one
 region server crashes and then another. MapReduce is deployed on the same
 nodes as HBase.

 1) From the region server log, both next and multi operations are in
 flight. Could a read/write conflict be causing the scanner timeouts?
 2) The region server has 24 cores and max map tasks is also 24; the table
 has about 30 regions (each about 0.5 GB) on this region server. Could
 MapReduce be consuming all the CPU, slowing the region server until it
 times out?
 3) hbase.regionserver.handler.count is at its default of 10; should it be
 increased?

 Please give us some advice.

 Thanks,
 Wei


 Log information:


 [Regionserver rs21:]

 2013-03-11 18:36:28,148 INFO
 org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/
 adcbg21.machine.wisdom.com,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
 entries=22417, filesize=127539793.  for
 /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We
 slept 28183ms instead of 3000ms, this is likely due to a long garbage
 collecting pause and it's usually bad, see
 http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:29830,call:next(1656517918313948447, 1000), rpc
 version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.127.21:56058
 ,starttimems:1363027030280,queuetimems:4602,class:HRegionServer,responsesize:2774484,method:next}
 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:31195,call:next(-8353194140406556404, 1000), rpc
 version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.127.21:56529
 ,starttimems:1363027028804,queuetimems:3634,class:HRegionServer,responsesize:2270919,method:next}
 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:30965,call:next(2623756537510669130, 1000), rpc
 version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.127.21:56146
 ,starttimems:1363027028807,queuetimems:3484,class:HRegionServer,responsesize:2753299,method:next}
 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:31023,call:next(5293572780165196795, 1000), rpc
 version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.127.21:56069
 ,starttimems:1363027029086,queuetimems:3589,class:HRegionServer,responsesize:2722543,method:next}
 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:31160,call:next(-4285417329791344278, 1000), rpc
 version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.127.21:56586
 ,starttimems:1363027029204,queuetimems:3707,class:HRegionServer,responsesize:2938870,method:next}
 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:31249,call:multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a),
 rpc version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.109.21:35342
 ,starttimems:1363027031505,queuetimems:5720,class:HRegionServer,responsesize:0,method:multi}
 2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:38813,call:multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e),
 rpc version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.125.11:57078
 ,starttimems:1363027030273,queuetimems:4663,class:HRegionServer,responsesize:0,method:multi}
 2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:38893,call:multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb),
 rpc version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.109.20:51698
 ,starttimems:1363027031505,queuetimems:5720,class:HRegionServer,responsesize:0,method:multi}
 2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:40037,call:multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf),
 rpc version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.125.11:57078
 ,starttimems:1363027030601,queuetimems:4818,class:HRegionServer,responsesize:0,method:multi}
 2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 {processingtimems:10880,call:multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b),
 rpc version=1, client version=29, methodsFingerPrint=54742778,client:
 10.20.125.11:57076
 ,starttimems:1363027060645,queuetimems:34763,class:HRegionServer,responsesize:0,method:multi}
 2013-03-11 18:37:51,776 WARN org.apache.hadoop.ipc.HBaseServer:
 (responseTooSlow):
 

RE: region server down when scanning using mapreduce

2013-03-11 Thread Lu, Wei
No. Does the block cache matter? By the way, the MR dump is an MR program we
implemented ourselves rather than the stock HBase tool.

Thanks


Re: region server down when scanning using mapreduce

2013-03-11 Thread Azuryy Yu
Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) for some
background on the block cache.
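
[Editor's note for the archive: the block cache can be bypassed either per scan or per column family. Both calls below exist in the 0.94 client API; the family name "cf" is hypothetical:]

```java
// Per-scan: skip the block cache for this read only -- the usual choice for
// one-off full scans such as an MR dump.
Scan scan = new Scan();
scan.setCacheBlocks(false);

// Per-family: disable block caching at the schema level, for all reads.
HColumnDescriptor family = new HColumnDescriptor("cf");
family.setBlockCacheEnabled(false);
```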

