[ https://issues.apache.org/jira/browse/PHOENIX-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302951#comment-15302951 ]
Josh Elser commented on PHOENIX-2940:
-------------------------------------

{quote}
bq. do not cache stats on the server side at all

I'm not sure if this is good or not. Presumably we'll want to have access to statistics from within the coprocessors to facilitate local decisions during query execution. I started to ask Maryann Xue about these kinds of requirements in her Calcite efforts. Maybe she can comment?
{quote}

Speaking from my failed experiments on how to approach this one, it's certainly difficult to cache properly given the varying currentSCN a client could use (and still be able to consistently explain how Phoenix will behave). I did wonder in passing whether we could build a PTableStats impl smart enough to provide a view at some later-given currentSCN, if you consider the current implementation a snapshot at a single point in time (a rough sketch of what I mean is at the end of this message). That would make the server-side caching logic much easier to implement, but it's definitely a down-the-road thought.

I had talked briefly with Nick about all of the stats that Calcite will need (both what Phoenix could provide at the "sql table" level and what HBase itself could provide at the file level). It seems prudent to fork off that discussion too, to start filling in more gaps for the calcite branch. That said, the last time I looked at all those stats, Calcite was still going by Optiq sooooo :)

> Remove STATS RPCs from rowlock
> ------------------------------
>
>                 Key: PHOENIX-2940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2940
>             Project: Phoenix
>          Issue Type: Improvement
>         Environment: HDP 2.3 + Apache Phoenix 4.6.0
>            Reporter: Nick Dimiduk
>            Assignee: Josh Elser
>
> We have an unfortunate situation wherein we potentially execute many RPCs while holding a row lock. This problem is discussed in detail on the user list thread ["Write path blocked by MetaDataEndpoint acquiring region lock"|http://search-hadoop.com/m/9UY0h2qRaBt6Tnaz1&subj=Write+path+blocked+by+MetaDataEndpoint+acquiring+region+lock].
> In some situations, the [MetaDataEndpoint|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L492] coprocessor will attempt to refresh its view of the schema definitions and statistics. This involves [taking a rowlock|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2862], executing a scan against the [local region|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L542], and then a scan against a [potentially remote|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L964] statistics table.
> This issue is apparently exacerbated by the use of user-provided timestamps (in my case, the ROW_TIMESTAMP feature, or perhaps as in PHOENIX-2607). When combined with other issues (PHOENIX-2939), we end up with total gridlock in our handler threads: everyone queued behind the rowlock, scanning and rescanning SYSTEM.STATS. Because this happens in the MetaDataEndpoint, the means by which all clients refresh their knowledge of the schema, gridlock in that region server can effectively stop all forward progress on the cluster.
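
To make the failure mode concrete for anyone skimming the thread, the pattern in question boils down to something like the sketch below. This is a hedged simplification against HBase 1.x-era coprocessor APIs; the class and method names are illustrative, not the actual MetaDataEndpointImpl code (the links above point at the real call sites):

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.Region;

public class StatsUnderRowLockSketch {

  void refreshTableUnderRowLock(RegionCoprocessorEnvironment env,
                                byte[] headerRowKey) throws IOException {
    Region region = env.getRegion();
    // Take the row lock on the table's header row in SYSTEM.CATALOG.
    // (The boolean argument's meaning differs across HBase 1.x releases.)
    Region.RowLock lock = region.getRowLock(headerRowKey, false);
    try {
      // 1. Scan the local SYSTEM.CATALOG region to rebuild the PTable.
      //    Local I/O, but every other handler that needs this row is
      //    already queued behind the lock.
      //    region.getScanner(new Scan()) ... build the metadata ...

      // 2. Scan SYSTEM.STATS, which can live on a *different* region
      //    server: remote RPCs issued while the row lock is still held.
      try (Table stats = env.getTable(TableName.valueOf("SYSTEM.STATS"));
           ResultScanner guideposts = stats.getScanner(new Scan())) {
        guideposts.forEach(r -> { /* read guideposts */ });
      }
    } finally {
      lock.release();
    }
  }
}
{code}

Everything inside the try block runs while the header row is locked, so the remote SYSTEM.STATS scan stretches the lock hold time by at least one full round trip, which is the gridlock the description calls out.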
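
Separately, here is a minimal sketch of the "view at some later-given currentSCN" idea from the comment above, using only JDK collections. Everything here is hypothetical (TimeAwareStatsCache, record, snapshotAt are not Phoenix classes); the point is just that a NavigableMap keyed by the stats collection timestamp turns the per-SCN lookup into a floorEntry call:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class TimeAwareStatsCache {

  // tableName -> (stats collection timestamp -> guidepost keys at that time)
  private final Map<String, ConcurrentNavigableMap<Long, List<byte[]>>> history =
      new ConcurrentHashMap<>();

  /** Record the guideposts observed for a table at a given timestamp. */
  public void record(String tableName, long statsTimestamp, List<byte[]> guideposts) {
    history.computeIfAbsent(tableName, t -> new ConcurrentSkipListMap<>())
           .put(statsTimestamp, guideposts);
  }

  /**
   * The view at the client's currentSCN: the newest snapshot whose
   * timestamp is <= currentSCN, or null if none is cached (the caller
   * would fall back to scanning SYSTEM.STATS as it does today).
   */
  public List<byte[]> snapshotAt(String tableName, long currentSCN) {
    ConcurrentNavigableMap<Long, List<byte[]>> snapshots = history.get(tableName);
    if (snapshots == null) {
      return null;
    }
    Map.Entry<Long, List<byte[]>> entry = snapshots.floorEntry(currentSCN);
    return entry == null ? null : entry.getValue();
  }
}
{code}

The appeal would be that the server keeps one history per table rather than one cache entry per (table, currentSCN) pair, which is what makes caching under varying currentSCN values awkward today.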