[ 
https://issues.apache.org/jira/browse/PHOENIX-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042350#comment-15042350
 ] 

Samarth Jain edited comment on PHOENIX-2408 at 12/4/15 10:34 PM:
-----------------------------------------------------------------

I spent the last couple of days trying to figure out what is going on here. On
my laptop (1 region server), I loaded a table with 400 million rows distributed
over 8 regions and added logging in a few places. I see errors like this in the
server-side logs:

Exception caught in post scanner open for scan: 4. Exception: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting on region TESTXYZ,\x04\x00\x00\x00\x00\x00\x00\x00\x00,1449215361195.5fa492cebc9f25b9602ecaf1d4601daf., call org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl@3efcf4dd after 121324 ms, since caller disconnected
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4061)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4048)
        at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:288)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:191)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1305)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1619)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1694)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1658)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1300)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3214)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30946)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2093)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

It looks like org.apache.hadoop.hbase.ipc.CallerDisconnectedException is a
regular IOException and not a DoNotRetryIOException. As a result,
BaseScannerRegionObserver#doPostScannerOpen() re-throws a regular IOException
back to the client, which makes the client retry. These retries, however, never
succeed, and we end up retrying the default number of times (31).
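
To illustrate why the exception type matters here, below is a minimal,
self-contained sketch of a client-style retry loop. It is not HBase or Phoenix
code: the two nested exception classes are local stand-ins that only mirror the
inheritance relationship described above, and RetrySketch, attemptsUntilGiveUp
and the maxAttempts parameter are hypothetical names. Backoff between attempts
is omitted, and the value 31 from the comment is treated simply as the attempt
limit.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
    // Local stand-in (assumption) for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Local stand-in (assumption) for
    // org.apache.hadoop.hbase.ipc.CallerDisconnectedException: a plain
    // IOException that does NOT extend DoNotRetryIOException.
    static class CallerDisconnectedException extends IOException {
        CallerDisconnectedException(String msg) { super(msg); }
    }

    /** Runs the call up to maxAttempts times; returns how many attempts were made. */
    static int attemptsUntilGiveUp(Callable<Void> call, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                call.call();
                return attempts;              // success: stop
            } catch (DoNotRetryIOException e) {
                return attempts;              // marked non-retriable: give up now
            } catch (IOException e) {
                // plain IOException: treated as transient, loop and try again
            } catch (Exception e) {
                throw new RuntimeException(e); // anything else is unexpected here
            }
        }
        return attempts;                       // retry budget exhausted
    }

    public static void main(String[] args) {
        int plain = attemptsUntilGiveUp(
            () -> { throw new CallerDisconnectedException("caller disconnected"); }, 31);
        int fatal = attemptsUntilGiveUp(
            () -> { throw new DoNotRetryIOException("do not retry"); }, 31);
        System.out.println("plain IOException attempts: " + plain);       // 31
        System.out.println("DoNotRetryIOException attempts: " + fatal);   // 1
    }
}
```

In this sketch, a call that always fails with the plain IOException subclass
consumes the whole retry budget, while one that fails with the
DoNotRetryIOException stand-in stops after the first attempt.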

One thought I had was that I might be maxing out the I/O on my laptop's SSD.
However, reducing the number of region server handler threads from the default
to 2 (to limit the I/O) didn't help either.

Will keep digging.


> Update statistics fails to complete
> -----------------------------------
>
>                 Key: PHOENIX-2408
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2408
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Samarth Jain
>             Fix For: 4.7.0
>
>
> On a production cluster, when UPDATE STATISTICS is run, it fails to complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
