[ 
https://issues.apache.org/jira/browse/PHOENIX-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474930#comment-15474930
 ] 

Lars Hofhansl commented on PHOENIX-3081:
----------------------------------------

+1

Patch looks good. Any reason to not commit it? [~elserj]

> Misleading exception on async stats update after major compaction
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-3081
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3081
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Minor
>             Fix For: 4.9.0, 4.8.1
>
>         Attachments: PHOENIX-3081.001.patch
>
>
> Saw an error in some $dayjob testing where, while a RegionServer was going 
> down due to an exception, there was a scary-looking exception about being 
> unable to write to the stats table because an hconnection was closed. Pardon 
> the mismatched line numbers:
> {noformat}
> 2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>   at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
>   at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
>   at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
>   at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
>   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
>   at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
>   at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
>   at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
>   at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
>   at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: hconnection-0x5314972b closed
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
>   at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
>   at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
>   at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
>   ... 17 more
> {noformat}
> Looking into this some more, this async task to update the stats was still 
> running after the RegionServer had already started shutting down. The 
> RegionServer had already closed all of the "userRegions", but, because this 
> task is async, it was still running and using the RegionServer's 
> CoprocessorHConnection. So, the RegionServer thinks all of the user regions 
> are closed and that it is safe to close the HConnection. In reality, there is 
> still code tied to those user regions that might be running (as we can see 
> with the above stacktrace). The next time the StatisticsScannerCallable tries 
> to use the HConnection, it fails with the error above.
> I think the simple fix is to just use the CoprocessorEnvironment to access 
> the RegionServerServices and use the {{isClosing()}} and {{isClosed()}} 
> methods. This is all pretty minor because the RegionServer is already 
> shutting down, but it is likely misleading to less-experienced users who 
> would think that the last exception in the log is the problem.
> Will put up a patch shortly.
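
The fix proposed above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and it does not use the HBase or Phoenix classes: ServerState is a hypothetical stand-in for whatever object exposes isClosing() and isClosed() through the CoprocessorEnvironment, and runStatsUpdate, Outcome, and fixedState are names invented for this example. The idea is only that an IOException raised while the server is shutting down gets downgraded to a quiet message, while the same exception on a healthy server is still reported and rethrown.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class StatsUpdateSketch {

    /** Hypothetical stand-in; NOT the real HBase RegionServerServices type. */
    interface ServerState {
        boolean isClosing();
        boolean isClosed();
    }

    /** What happened to one async stats update attempt. */
    enum Outcome { UPDATED, SKIPPED_SHUTDOWN }

    /** Test/demo helper: a ServerState with fixed answers. */
    static ServerState fixedState(final boolean closing, final boolean closed) {
        return new ServerState() {
            @Override public boolean isClosing() { return closing; }
            @Override public boolean isClosed() { return closed; }
        };
    }

    /**
     * Runs the update. If it fails with an IOException while the server is
     * closing or closed, the failure is treated as expected and only noted
     * quietly; otherwise it is reported as an error and rethrown.
     */
    static Outcome runStatsUpdate(ServerState server, Callable<Void> update) throws Exception {
        try {
            update.call();
            return Outcome.UPDATED;
        } catch (IOException e) {
            if (server.isClosing() || server.isClosed()) {
                // Expected during shutdown: the shared connection may already be closed.
                System.out.println("DEBUG: ignoring stats update failure during shutdown: "
                        + e.getMessage());
                return Outcome.SKIPPED_SHUTDOWN;
            }
            System.err.println("ERROR: Failed to update statistics table! " + e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated update that always fails the way the stack trace above does.
        Callable<Void> failing = () -> { throw new IOException("hconnection closed"); };

        // Server is shutting down: the failure is swallowed.
        System.out.println(runStatsUpdate(fixedState(true, false), failing));

        // Server is healthy: the failure is still surfaced to the caller.
        try {
            runStatsUpdate(fixedState(false, false), failing);
        } catch (IOException expected) {
            System.out.println("rethrown: " + expected.getMessage());
        }
    }
}
```

The check is done in the catch block rather than before the call so that a shutdown that begins midway through the update is still recognized; whether the real patch checks before, after, or both is not stated in this thread.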



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
