[
https://issues.apache.org/jira/browse/PHOENIX-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser updated PHOENIX-3081:
--------------------------------
Attachment: PHOENIX-3081.001.patch
.001: a small patch with a small test.
> Misleading exception on async stats update after major compaction
> -----------------------------------------------------------------
>
> Key: PHOENIX-3081
> URL: https://issues.apache.org/jira/browse/PHOENIX-3081
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Minor
> Fix For: 4.9.0, 4.8.1
>
> Attachments: PHOENIX-3081.001.patch
>
>
> Saw an error in some $dayjob testing where, while a RegionServer was going
> down due to an exception, there was a scary-looking exception about being
> unable to write to the stats table because an hconnection was closed. Pardon
> the mismatched line numbers:
> {noformat}
> 2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
> at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
> at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
> at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
> at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
> at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
> at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
> at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
> at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: hconnection-0x5314972b closed
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
> at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
> at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
> at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
> ... 17 more
> {noformat}
> Looking into this some more, the async task that updates the stats was still
> running after the RegionServer had begun shutting down. The RegionServer had
> already closed all of the "userRegions", but, because the task is async, it
> was still running and using the RegionServer's CoprocessorHConnection. So, the
> RegionServer thinks all of the user regions are closed and that it is safe to
> close the HConnection. In reality, there is still code tied to those user
> regions that might be running (as we can see in the above stacktrace). The
> next time the StatisticsScannerCallable tries to use the HConnection, it
> fails.
> I think the simple fix is to just use the CoprocessorEnvironment to access
> the RegionServerServices and use the {{isClosing()}} and {{isClosed()}}
> methods. This is all pretty minor because the RegionServer is already
> shutting down, but it is likely misleading to less-experienced users who
> would think that the last exception in the log is the problem.
> Will put up a patch shortly.
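
A minimal sketch of the idea in the description, not the attached patch: when the stats update fails, check whether the server is closing or closed, and if so log quietly instead of at ERROR. Everything here is a hypothetical stand-in so the example runs without HBase or Phoenix on the classpath. `ServerState` models only the two checks named above, `StatsUpdate` models the async write to the stats table, and `LOG` is a plain list in place of a real logger.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StatsUpdateGuard {

    /** Hypothetical stand-in for the server state the description proposes to consult. */
    public interface ServerState {
        boolean isClosing();
        boolean isClosed();
    }

    /** Hypothetical stand-in for the work that writes to the stats table. */
    public interface StatsUpdate {
        void run() throws IOException;
    }

    /** Collected log lines; a real task would use its logger instead. */
    public static final List<String> LOG = new ArrayList<>();

    /** Builds a ServerState that always reports the given flags. */
    public static ServerState state(final boolean closing, final boolean closed) {
        return new ServerState() {
            @Override public boolean isClosing() { return closing; }
            @Override public boolean isClosed() { return closed; }
        };
    }

    /**
     * Runs the update. A failure while the server is shutting down is
     * expected, so it is recorded at DEBUG; any other failure stays at ERROR.
     */
    public static void updateStats(ServerState server, StatsUpdate update) {
        try {
            update.run();
        } catch (IOException e) {
            if (server.isClosing() || server.isClosed()) {
                LOG.add("DEBUG Ignoring stats update failure during shutdown: " + e.getMessage());
            } else {
                LOG.add("ERROR Failed to update statistics table! " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        // Simulates the failure from the stack trace: the connection is already closed.
        StatsUpdate failing = () -> { throw new IOException("hconnection closed"); };

        updateStats(state(true, false), failing);   // server shutting down
        updateStats(state(false, false), failing);  // server healthy
        for (String line : LOG) {
            System.out.println(line);
        }
    }
}
```

The check sits in the failure path rather than before the update, so a healthy server pays nothing extra and a genuine failure is still reported at ERROR.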