[jira] [Updated] (PHOENIX-7876) Improve EXPLAIN

Andrew Kyle Purtell (Jira) Wed, 03 Jun 2026 14:55:08 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrew Kyle Purtell updated PHOENIX-7876:
-----------------------------------------
    Description: 
Phoenix's {{EXPLAIN [WITH REGIONS]}} output is today incomplete and easy to 
misread for both query analysis and performance investigation.

DDL EXPLAINs don't show columns/PK/families/salt/splits. Non-trivial joins 
render with almost no detail. The cost vector advertises three dimensions but 
only compares IO, and estimates/statistics are unreliable. Some optimizations 
are incorrectly categorized, for example some shown as row-eliminating 
predicates are actually column-projection optimizations. The operator trees of 
complex queries are flattened to a single indent level. Many planner facts are 
not surfaced at all, such as which rule chooses an index, which indexes the 
optimizer considered but rejected, what query rewrites took place (e.g. 
subquery decorrelation, star-join detection, right-to-left normalization, 
HAVING lift, RVC-offset translation, reverse-scan substitution, UNION ORDER BY, 
index expression substitution, and more), the specific hash-join strategy 
chosen, salt bucket counts, local vs. global vs. uncovered-global index 
distinctions, the particular flavor of atomic upsert chosen and server-side 
atomic update expressions, multi-tenant context, CDC scope, transaction 
provider, projection lists, predicate-to-filter attribution, hints honored vs. 
ignored, and the structure of the JSON/BSON/array path expressions evaluated 
server-side. 

This proposal closes all of those gaps. Details are provided in the design 
document. 

Design document: 
https://docs.google.com/document/d/10H_MNWGQL7ZzsPmIVBMz7T1jJroez8AYpvwouKcBVQo/edit?tab=t.0

  was:
Phoenix's {{EXPLAIN [WITH REGIONS]}} output is today incomplete and easy to 
misread for both query analysis and performance investigation.

DDL EXPLAINs don't show columns/PK/families/salt/splits. Non-trivial joins 
render with almost no detail. The cost vector advertises three dimensions but 
only compares IO, and estimates/statistics are unreliable. Some optimizations 
are incorrectly categorized, for example some shown as row-eliminating 
predicates are actually column-projection optimizations. The operator trees of 
complex queries are flattened to a single indent level. Many planner facts are 
not surfaced at all, such as which rule chooses an index, which indexes the 
optimizer considered but rejected, what query rewrites took place (e.g. 
subquery decorrelation, star-join detection, right-to-left normalization, 
HAVING lift, RVC-offset translation, reverse-scan substitution, UNION ORDER BY, 
index expression substitution, and more), the specific hash-join strategy 
chosen, salt bucket counts, local vs. global vs. uncovered-global index 
distinctions, the particular flavor of atomic upsert chosen and server-side 
atomic update expressions, multi-tenant context, CDC scope, transaction 
provider, projection lists, predicate-to-filter attribution, hints honored vs. 
ignored, and the structure of the JSON/BSON/array path expressions evaluated 
server-side. 

This proposal closes all of those gaps. Details are provided in the design 
document. 


> Improve EXPLAIN
> ---------------
>
>                 Key: PHOENIX-7876
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7876
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>
> Phoenix's {{EXPLAIN [WITH REGIONS]}} output is today incomplete and easy to 
> misread for both query analysis and performance investigation.
> DDL EXPLAINs don't show columns/PK/families/salt/splits. Non-trivial joins 
> render with almost no detail. The cost vector advertises three dimensions but 
> only compares IO, and estimates/statistics are unreliable. Some optimizations 
> are incorrectly categorized, for example some shown as row-eliminating 
> predicates are actually column-projection optimizations. The operator trees 
> of complex queries are flattened to a single indent level. Many planner facts 
> are not surfaced at all, such as which rule chooses an index, which indexes 
> the optimizer considered but rejected, what query rewrites took place (e.g. 
> subquery decorrelation, star-join detection, right-to-left normalization, 
> HAVING lift, RVC-offset translation, reverse-scan substitution, UNION ORDER 
> BY, index expression substitution, and more), the specific hash-join strategy 
> chosen, salt bucket counts, local vs. global vs. uncovered-global index 
> distinctions, the particular flavor of atomic upsert chosen and server-side 
> atomic update expressions, multi-tenant context, CDC scope, transaction 
> provider, projection lists, predicate-to-filter attribution, hints honored vs. 
> ignored, and the structure of the JSON/BSON/array path expressions evaluated 
> server-side. 
> This proposal closes all of those gaps. Details are provided in the design 
> document. 
> Design document: 
> https://docs.google.com/document/d/10H_MNWGQL7ZzsPmIVBMz7T1jJroez8AYpvwouKcBVQo/edit?tab=t.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
