[ https://issues.apache.org/jira/browse/PHOENIX-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474930#comment-15474930 ]
Lars Hofhansl commented on PHOENIX-3081:
----------------------------------------

+1
Patch looks good. Any reason to not commit it? [~elserj]

> Misleading exception on async stats update after major compaction
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-3081
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3081
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Minor
>             Fix For: 4.9.0, 4.8.1
>
>         Attachments: PHOENIX-3081.001.patch
>
>
> Saw an error in some $dayjob testing where, while a RegionServer was going
> down due to an exception, there was a scary-looking exception about being
> unable to write to the stats table because an hconnection was closed. Pardon
> the mismatched line numbers:
> {noformat}
> 2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
> 	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> 	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
> 	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
> 	at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
> 	at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
> 	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
> 	at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
> 	at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
> 	at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
> 	at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
> 	at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: hconnection-0x5314972b closed
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
> 	at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
> 	at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
> 	at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
> 	... 17 more
> {noformat}
>
> Looking into this some more, the async task that updates the stats was still
> running after the RegionServer was already in the process of shutting down.
> The RegionServer had already closed all of its "userRegions", but because this
> task is async, it was still running and using the RegionServer's
> CoprocessorHConnection. So the RegionServer believes that all of the user
> regions are closed and that it is safe to close the HConnection. In reality,
> code tied to those user regions might still be running, as the stack trace
> above shows. The next time the StatisticsScannerCallable tries to use the
> HConnection, it fails with the error above.
>
> I think the simple fix is to use the CoprocessorEnvironment to access the
> RegionServerServices and use the {{isClosing()}} and {{isClosed()}} methods to
> detect that shutdown is in progress. This is all pretty minor because the
> RegionServer is already shutting down, but the exception is likely misleading
> to less-experienced users, who would assume that the last exception in the log
> is the actual problem.
>
> Will put up a patch shortly.
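The proposed fix can be illustrated with a minimal, self-contained Java sketch of the guard pattern. This is not the contents of PHOENIX-3081.001.patch and uses no HBase or Phoenix APIs: `Lifecycle` and `SharedConnection` are hypothetical stand-ins for the shutdown state (its `isClosing()`/`isClosed()` only mirror the method names mentioned in the description) and for the CoprocessorHConnection. The task checks the lifecycle once before writing and again when a write fails, so a failure caused by shutdown is reported quietly while a failure on a healthy server is still treated as an error.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch (not Phoenix or HBase code): an async stats task checks
 * whether its host is shutting down before, and again after, using a shared
 * connection, so shutdown-time failures are not reported as real errors.
 */
public class StatsUpdateGuardSketch {

    /** Hypothetical stand-in for the shutdown flags the issue refers to. */
    static class Lifecycle {
        private final AtomicBoolean closing = new AtomicBoolean(false);
        private final AtomicBoolean closed = new AtomicBoolean(false);

        boolean isClosing() { return closing.get(); }
        boolean isClosed() { return closed.get(); }
        void beginShutdown() { closing.set(true); }
        void finishShutdown() { closed.set(true); }

        boolean shuttingDown() { return isClosing() || isClosed(); }
    }

    /** Hypothetical stand-in for the shared (coprocessor) connection. */
    static class SharedConnection {
        private volatile boolean open = true;

        void close() { open = false; }

        void write(String row) throws IOException {
            if (!open) {
                throw new IOException("connection closed");
            }
        }
    }

    enum Outcome { WRITTEN, SKIPPED_SHUTDOWN, FAILED }

    static Outcome updateStats(Lifecycle lifecycle, SharedConnection conn) {
        // Cheap pre-check: do not start the write if shutdown is under way.
        if (lifecycle.shuttingDown()) {
            return Outcome.SKIPPED_SHUTDOWN;
        }
        try {
            conn.write("stats-row");
            return Outcome.WRITTEN;
        } catch (IOException e) {
            // Shutdown may have begun after the pre-check passed, so check again
            // before deciding whether this failure deserves an ERROR-level log.
            if (lifecycle.shuttingDown()) {
                return Outcome.SKIPPED_SHUTDOWN; // e.g. log at DEBUG and exit quietly
            }
            return Outcome.FAILED; // a genuine failure: keep the ERROR log
        }
    }

    public static void main(String[] args) {
        // 1. Normal operation: the write succeeds.
        System.out.println(updateStats(new Lifecycle(), new SharedConnection()));

        // 2. Shutdown already begun: the task skips the write entirely.
        Lifecycle stopping = new Lifecycle();
        stopping.beginShutdown();
        System.out.println(updateStats(stopping, new SharedConnection()));

        // 3. Shutdown races with the write: the connection is closed mid-task.
        Lifecycle racing = new Lifecycle();
        SharedConnection racyConn = new SharedConnection() {
            @Override
            void write(String row) throws IOException {
                racing.beginShutdown(); // the server starts stopping...
                close();                // ...and closes the shared connection
                super.write(row);       // so this write now fails
            }
        };
        System.out.println(updateStats(racing, racyConn));

        // 4. Connection closed while the server is healthy: still an error.
        SharedConnection broken = new SharedConnection();
        broken.close();
        System.out.println(updateStats(new Lifecycle(), broken));
    }
}
```

Running `main` prints `WRITTEN`, `SKIPPED_SHUTDOWN`, `SKIPPED_SHUTDOWN`, `FAILED`. The second check matters because the pre-check alone is racy: shutdown can begin after it passes, which is the window the stack trace shows the stats task falling into.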