[ https://issues.apache.org/jira/browse/PHOENIX-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302951#comment-15302951 ]

Josh Elser commented on PHOENIX-2940:
-------------------------------------

{quote}
bq.    do not cache stats on the server side at all

I'm not sure if this is good or not. Presumably we'll want to have access to 
statistics from within the coprocessors to facilitate local decisions during 
query execution. I started to ask Maryann Xue about these kinds of requirements 
in her Calcite efforts. Maybe she can comment?
{quote}

Speaking from my failed experiments on how to approach this one, it's certainly 
difficult to cache properly given the varying currentSCN a client could use (and 
still be able to explain consistently how Phoenix will behave). I did have a 
passing thought that we could build a PTableStats impl smart enough to provide a 
view as of some (later-given) currentSCN, if you consider the current 
implementation a snapshot at a single point in time. That would make the 
server-side caching logic much easier to implement, but it's definitely a 
down-the-road thought.
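
To make that idea a bit more concrete, here's a rough sketch of what such a 
time-aware stats view might look like (everything below is hypothetical, not the 
existing PTableStats API):

{code:java}
// Hypothetical sketch only -- not the existing PTableStats API.
// Idea: cache one object per table that can answer "what did the stats look
// like as of this client timestamp", rather than a snapshot that is only
// valid for a single currentSCN.
public interface TimeAwareTableStats {

    /** Guidepost row keys visible at the given client timestamp (SCN). */
    java.util.List<byte[]> getGuidePostsAsOf(long clientTimestamp);

    /** Highest stats timestamp this cached view covers; a request with a
     *  later SCN would force a refresh from SYSTEM.STATS. */
    long getMaxTimestamp();
}
{code}

The server-side cache would then hold one such object per table and only go back 
to SYSTEM.STATS when a client's SCN is newer than what the cached view covers.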

I had talked briefly with Nick about all of the stats that Calcite would be 
needing (both what Phoenix could provide at the "sql table" level and what 
HBase itself could provide at the file level). Seems prudent to also fork off 
that discussion to start filling in more gaps for the calcite branch. That 
said, the last time I looked at all those stats, Calcite was still going by 
Optiq sooooo :)

> Remove STATS RPCs from rowlock
> ------------------------------
>
>                 Key: PHOENIX-2940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2940
>             Project: Phoenix
>          Issue Type: Improvement
>         Environment: HDP 2.3 + Apache Phoenix 4.6.0
>            Reporter: Nick Dimiduk
>            Assignee: Josh Elser
>
> We have an unfortunate situation wherein we potentially execute many RPCs 
> while holding a row lock. This problem is discussed in detail on the user 
> list thread ["Write path blocked by MetaDataEndpoint acquiring region 
> lock"|http://search-hadoop.com/m/9UY0h2qRaBt6Tnaz1&subj=Write+path+blocked+by+MetaDataEndpoint+acquiring+region+lock].
>  In some situations, the 
> [MetaDataEndpoint|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L492]
>  coprocessor will attempt to refresh its view of the schema definitions and 
> statistics. This involves [taking a 
> rowlock|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2862],
>  executing a scan against the [local 
> region|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L542],
>  and then a scan against a [potentially 
> remote|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L964]
>  statistics table.
> This issue is apparently exacerbated by the use of user-provided timestamps 
> (in my case, the use of the ROW_TIMESTAMP feature, or perhaps as in 
> PHOENIX-2607). When combined with other issues (PHOENIX-2939), we end up with 
> total gridlock in our handler threads -- everyone queued behind the rowlock, 
> scanning and rescanning SYSTEM.STATS. Because this happens in the 
> MetaDataEndpoint, the means by which all clients refresh their knowledge of 
> schema, gridlock in that RS can effectively stop all forward progress on the 
> cluster.
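
For anyone skimming the linked code, a rough sketch of the pattern the 
description above is calling out (simplified, not the actual MetaDataEndpointImpl 
code; acquireRowLock, buildTableFromLocalScan, fetchStatsOverRpc and withStats 
are hypothetical stand-ins for the real calls):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.regionserver.Region;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.stats.PTableStats;

// Simplified illustration only -- helper methods are hypothetical stand-ins.
private PTable getTableWithStats(Region region, byte[] headerRowKey, byte[] statsKey)
        throws IOException {
    Region.RowLock rowLock = acquireRowLock(region, headerRowKey); // lock on the table header row
    try {
        // Scan of the local SYSTEM.CATALOG region: stays on this regionserver.
        PTable table = buildTableFromLocalScan(region, headerRowKey);
        // Scan of SYSTEM.STATS: potentially a remote RPC, executed while the
        // row lock is held, so every other handler needing this row queues
        // behind a cross-region round trip.
        PTableStats stats = fetchStatsOverRpc(statsKey);
        return withStats(table, stats);
    } finally {
        rowLock.release();
    }
}
{code}

The improvement tracked here amounts to getting that SYSTEM.STATS lookup (and any 
other cross-region RPC) out from under the row lock, whether by fetching stats 
outside the critical section or by not caching them server-side at all, as 
discussed above.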



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
