[
https://issues.apache.org/jira/browse/PHOENIX-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated PHOENIX-7876:
-----------------------------------------
Description:
Phoenix's {{EXPLAIN [WITH REGIONS]}} output is currently incomplete and easy to
misread for both query analysis and performance investigation.
DDL EXPLAINs don't show columns/PK/families/salt/splits. Non-trivial joins
render with almost no detail. The cost vector advertises three dimensions but
only compares IO, and estimates/statistics are unreliable. Some optimizations
are incorrectly categorized; for example, some shown as row-eliminating
predicates are actually column-projection optimizations. The operator trees of
complex queries are flattened to a single indentation level. Many planner facts
are not surfaced at all, such as which rule chose an index, which indexes the
optimizer considered but rejected, what query rewrites took place (e.g.
subquery decorrelation, star-join detection, right-to-left normalization,
HAVING lift, RVC-offset translation, reverse-scan substitution, UNION ORDER BY,
index expression substitution, and more), the specific hash-join strategy
chosen, salt bucket counts, local vs. global vs. uncovered-global index
distinctions, the particular flavor of atomic upsert chosen and server-side
atomic update expressions, multi-tenant context, CDC scope, transaction
provider, projection lists, predicate-to-filter attribution, hints honored vs.
ignored, and the structure of the JSON/BSON/array path expressions evaluated
server-side.
This proposal closes all of those gaps. Details are provided in the design
document.
Design document:
https://docs.google.com/document/d/10H_MNWGQL7ZzsPmIVBMz7T1jJroez8AYpvwouKcBVQo/edit?tab=t.0
> Improve EXPLAIN
> ---------------
>
> Key: PHOENIX-7876
> URL: https://issues.apache.org/jira/browse/PHOENIX-7876
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major