RE: region server down when scanning using mapreduce

2013-03-12 Thread Azuryy Yu
How did you use the scanner? Please paste some code here.
On Mar 12, 2013 4:13 PM, "Lu, Wei"  wrote:

>
> We turned the block cache off and tried again; region servers still
> crash one after another.
> There are a lot of scanner lease timeouts, and then the master logs:
> RegionServer ephemeral node deleted, processing expiration
> [rs21,60020,1363010589837]
> It seems the problem is not caused by the block cache.
>
>
> Thanks

RE: region server down when scanning using mapreduce

2013-03-12 Thread Anoop Sam John
What does the GC pattern look like on the RSs that are going down? In the RS
logs you may be seeing YouAreDeadExceptions.
Please try tuning your RS memory and GC options.

-Anoop-
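
GC settings of the sort suggested here normally live in conf/hbase-env.sh. As
an illustrative sketch only (the heap sizes and log path are made-up
placeholders, not a recommendation for this cluster), CMS-era options often
look like:

    # conf/hbase-env.sh -- illustrative values only
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -Xms8g -Xmx8g \
      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -Xloggc:/var/log/hbase/gc-regionserver.log"

The GC log is what lets you correlate the Sleeper "We slept 28183ms" warnings
below with actual collection pauses.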

From: Lu, Wei [w...@microstrategy.com]
Sent: Tuesday, March 12, 2013 1:42 PM
To: user@hbase.apache.org
Subject: RE: region server down when scanning using mapreduce

We turned the block cache off and tried again; region servers still crash
one after another.
There are a lot of scanner lease timeouts, and then the master logs:
RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]
It seems the problem is not caused by the block cache.


Thanks


RE: region server down when scanning using mapreduce

2013-03-12 Thread Lu, Wei

We turned the block cache off and tried again; region servers still crash
one after another.
There are a lot of scanner lease timeouts, and then the master logs:
RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]
It seems the problem is not caused by the block cache.


Thanks
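
The lease timeouts reported here typically mean a scanner was not touched
again within hbase.regionserver.lease.period (typically 60000 ms by default in
this HBase generation), often because the previous next() batch was still
being processed or the server was stalled in GC. One lever is to make each
next() call cheaper by lowering scanner caching; a minimal sketch, with a
hypothetical class name and values:

    import org.apache.hadoop.hbase.client.Scan;

    public class LeaseFriendlyScan {
        // Each next() RPC returns 'caching' rows. The ~2.7 MB responses in
        // the logs came from caching=1000; smaller batches finish well
        // inside the scanner lease, at the cost of more round trips.
        public static Scan create(int caching) {
            Scan scan = new Scan();
            scan.setCaching(caching);    // e.g. 100 instead of 1000
            scan.setCacheBlocks(false);  // block cache already off, per this mail
            return scan;
        }
    }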

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com] 
Sent: Tuesday, March 12, 2013 1:41 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) to
get some background on the block cache.

Re: region server down when scanning using mapreduce

2013-03-11 Thread Azuryy Yu
Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) to
get some background on the block cache.


On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei  wrote:

> No, does the block cache matter? BTW, the MR dump is an MR program we
> implemented rather than the HBase tool.
>
> Thanks

RE: region server down when scanning using mapreduce

2013-03-11 Thread Lu, Wei
No, does the block cache matter? BTW, the MR dump is an MR program we
implemented rather than the HBase tool.

Thanks

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com] 
Sent: Tuesday, March 12, 2013 1:18 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

Did you disable the block cache when you ran the MR dump?

Re: region server down when scanning using mapreduce

2013-03-11 Thread Azuryy Yu
Did you disable the block cache when you ran the MR dump?
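
For context, "disabling" the block cache for a scan is done per Scan object
rather than cluster-wide; a minimal sketch (the class name is hypothetical):

    import org.apache.hadoop.hbase.client.Scan;

    public class NoBlockCacheScan {
        // A Scan that asks region servers not to cache the blocks it reads,
        // so a one-off MR dump does not evict the hot working set from the
        // block cache.
        public static Scan create() {
            Scan scan = new Scan();
            scan.setCacheBlocks(false);
            return scan;
        }
    }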

region server down when scanning using mapreduce

2013-03-11 Thread Lu, Wei
Hi, 

When we use MapReduce to dump data from a pretty large table on HBase, one
region server crashes, and then another. MapReduce is deployed together with HBase.

1) From the region server log, there are both "next" and "multi" operations
ongoing. Is a write/read conflict causing the scanner timeouts?
2) The region server has 24 cores, and max map tasks is 24 too; the table has
about 30 regions (each about 0.5 GB) on the region server. Is the CPU being
fully consumed by MapReduce, making the region server slow and then time out?
3) hbase.regionserver.handler.count is 10 by default; should it be
enlarged?

Please give us some advice.

Thanks,
Wei
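
(The dump code itself was never posted in this thread. As a hypothetical
reconstruction only: a scan-based dump job of the kind described above is
usually wired up with TableMapReduceUtil, roughly as below. The table name is
made up, and setCaching(1000) mirrors the next(<id>, 1000) calls visible in
the logs.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class TableDump {
        static class DumpMapper extends TableMapper<ImmutableBytesWritable, Result> {
            @Override
            protected void map(ImmutableBytesWritable row, Result value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.write(row, value);  // emit the row unchanged
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "table dump");
            job.setJarByClass(TableDump.class);

            Scan scan = new Scan();
            scan.setCaching(1000);       // rows per next() RPC; matches the logs
            scan.setCacheBlocks(false);  // usual setting for full-table MR scans

            TableMapReduceUtil.initTableMapperJob("mytable", scan, DumpMapper.class,
                    ImmutableBytesWritable.class, Result.class, job);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class); // a real dump would write somewhere
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With 24 map slots per node, up to 24 such scanners run concurrently against
the node's ~30 regions, which is exactly the load question 2) asks about.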


Log information: 


[Regionserver rs21:]

2013-03-11 18:36:28,148 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/adcbg21.machine.wisdom.com,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
 entries=22417, filesize=127539793.  for 
/hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
28183ms instead of 3000ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":29830,"call":"next(1656517918313948447, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56058","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31195,"call":"next(-8353194140406556404, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56529","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":30965,"call":"next(2623756537510669130, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56146","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31023,"call":"next(5293572780165196795, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56069","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31160,"call":"next(-4285417329791344278, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56586","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.109.21:35342","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":38813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030273,"queuetimems":4663,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":38893,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.109.20:51698","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":40037,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030601,"queuetimems":4818,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":10880,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027060645,"queuetimems":34763,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,776