RE: region server down when scanning using mapreduce
How did you use the scanner? Please paste some code here.

On Mar 12, 2013 4:13 PM, "Lu, Wei" wrote:

> We turned the block cache off and tried again; the region servers still
> crash one after another. There are a lot of scanner lease timeouts, and then
> the master logs: RegionServer ephemeral node deleted, processing expiration
> [rs21,60020,1363010589837]. It seems the problem is not caused by the block
> cache.
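Since the scanner code was never posted, here is a minimal sketch of what a
TableMapReduceUtil-based dump job typically looks like on this HBase line,
with the two knobs the thread keeps coming back to (scanner caching and block
caching) called out. The table name, column family, and mapper are
hypothetical placeholders, not the poster's actual code.

// Hypothetical sketch of a scan-based dump job (0.92/0.94-era API).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class TableDumpJob {

  // Identity mapper: emits each row unchanged.
  static class DumpMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(row, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-dump");
    job.setJarByClass(TableDumpJob.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));   // placeholder column family
    scan.setCaching(100);        // rows per next() RPC; large values make each
                                 // next() slow and risk scanner lease timeouts
    scan.setCacheBlocks(false);  // don't churn the RS block cache on a full scan
    // If the job wires up TableInputFormat directly instead, the equivalent
    // job properties are TableInputFormat.SCAN_CACHEDROWS and SCAN_CACHEBLOCKS.

    TableMapReduceUtil.initTableMapperJob(
        "big_table",                       // placeholder table name
        scan, DumpMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);              // map-only dump
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}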
RE: region server down when scanning using mapreduce
How is the GC pattern in your RSs that are going down? In the RS logs you
might be seeing YouAreDeadExceptions. Please try tuning your RS memory and GC
options.

-Anoop-

From: Lu, Wei [w...@microstrategy.com]
Sent: Tuesday, March 12, 2013 1:42 PM
To: user@hbase.apache.org
Subject: RE: region server down when scanning using mapreduce

We turned the block cache off and tried again; the region servers still crash
one after another. There are a lot of scanner lease timeouts, and then the
master logs: RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]. It seems the problem is not caused by the block
cache.
RE: region server down when scanning using mapreduce
We turned the block cache off and tried again; the region servers still crash
one after another. There are a lot of scanner lease timeouts, and then the
master logs: RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]. It seems the problem is not caused by the block
cache.

Thanks

-----Original Message-----
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, March 12, 2013 1:41 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) to get
some background on the block cache.
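The scanner lease timeouts are consistent with the region server log posted at
the bottom of this thread: each next(scannerId, 1000) call returns roughly
2-3 MB and takes 30+ seconds, so one batch plus a long GC pause can approach
the region server's scanner lease (hbase.regionserver.lease.period, 60000 ms
by default in this HBase line). A hedged mitigation, assuming the dump job
takes its scanner caching from the job configuration rather than hard-coding
it on the Scan, is to shrink the per-RPC batch:

// A sketch, not the poster's code: lower client-side scanner caching so each
// next() RPC finishes well inside the scanner lease. The lease itself
// (hbase.regionserver.lease.period) is a server-side hbase-site.xml setting
// and cannot be overridden from the client.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ScannerCachingTuning {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // The log shows next(<scannerId>, 1000), i.e. 1000 rows per RPC. The value
    // 100 is only a guess to be tuned: more RPCs, but shorter, lease-friendly
    // next() calls.
    conf.setInt("hbase.client.scanner.caching", 100);
    // ... pass this Configuration to the dump job as usual ...
  }
}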
Re: region server down when scanning using mapreduce
Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) to get
some background on the block cache.

On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei wrote:

> No, does the block cache matter? By the way, the mr dump is an MR program we
> implemented ourselves rather than the hbase tool.
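"Turning the block cache to false" can mean two different things here:
disabling block caching for the scan itself (scan.setCacheBlocks(false), as in
the job sketch earlier in the thread) or disabling it for the column family in
the table schema. A hedged sketch of the latter, using the 0.92/0.94-era admin
API and hypothetical table and family names:

// Hypothetical sketch: disable the block cache at the column-family level.
// "big_table" and "cf" are placeholder names, not from the original post.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DisableFamilyBlockCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("big_table"));
      HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
      hcd.setBlockCacheEnabled(false);        // stop caching this family's data blocks
      admin.disableTable("big_table");        // schema change is safest on a disabled table
      admin.modifyColumn("big_table", hcd);   // push the modified family descriptor
      admin.enableTable("big_table");
    } finally {
      admin.close();
    }
  }
}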
RE: region server down when scanning using mapreduce
No, does the block cache matter? By the way, the mr dump is an MR program we
implemented ourselves rather than the hbase tool.

Thanks

-----Original Message-----
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, March 12, 2013 1:18 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

Did you disable the block cache when you ran the mr dump?
Re: region server down when scanning using mapreduce
Did you disable the block cache when you ran the mr dump?

On Mar 12, 2013 1:06 PM, "Lu, Wei" wrote:

> Hi,
>
> When we use MapReduce to dump data from a pretty large table on HBase, one
> region server crashes and then another. MapReduce is deployed together with
> HBase.
>
> 1) From the region server log, there are both "next" and "multi" operations
> going on. Is it because there is a read/write conflict that causes the
> scanner timeout?
> 2) The region server has 24 cores, and the max number of map tasks is also
> 24; the table has about 30 regions (each about 0.5 GB) on the region server.
> Is it because the CPU is fully used by MapReduce, which makes the region
> server slow and then time out?
> 3) hbase.regionserver.handler.count is currently the default of 10; should
> it be increased?
>
> Please give us some advice.
>
> Thanks,
> Wei
region server down when scanning using mapreduce
Hi,

When we use MapReduce to dump data from a pretty large table on HBase, one
region server crashes and then another. MapReduce is deployed together with
HBase.

1) From the region server log, there are both "next" and "multi" operations
going on. Is it because there is a read/write conflict that causes the scanner
timeout?
2) The region server has 24 cores, and the max number of map tasks is also 24;
the table has about 30 regions (each about 0.5 GB) on the region server. Is it
because the CPU is fully used by MapReduce, which makes the region server slow
and then time out?
3) hbase.regionserver.handler.count is currently the default of 10; should it
be increased?

Please give us some advice.

Thanks,
Wei


Log information:

[Regionserver rs21:]

2013-03-11 18:36:28,148 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/adcbg21.machine.wisdom.com,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488, entries=22417, filesize=127539793. for /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28183ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56058","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56529","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56146","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56069","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.127.21:56586","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.109.21:35342","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":38813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030273,"queuetimems":4663,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":38893,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.109.20:51698","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":40037,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030601,"queuetimems":4818,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10880,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027060645,"queuetimems":34763,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,776