[ https://issues.apache.org/jira/browse/IMPALA-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-7293:
----------------------------------
    Description: 
There's currently no way to tell in the explain plan what the contents of each tuple are. At explain_level>=2 we include "tuple-ids" but no information about what is actually in the tuples.

{noformat}
[localhost:21000] default> explain select min(regexp_replace(l_comment, ".", "x"))
from tpch.lineitem; summary;
Query: explain select min(regexp_replace(l_comment, ".", "x"))
from tpch.lineitem
+---------------------------------------------------------------------------------------+
| Explain String                                                                        |
+---------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00MB Threads=3                            |
| Per-Host Resource Estimates: Memory=284.00MB                                          |
|                                                                                       |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                 |
| |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B thread-reservation=1   |
| PLAN-ROOT SINK                                                                        |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                            |
| |                                                                                     |
| 03:AGGREGATE [FINALIZE]                                                               |
| |  output: min:merge(regexp_replace(l_comment, '.', 'x'))                             |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| 02:EXCHANGE [UNPARTITIONED]                                                           |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                            |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                        |
| Per-Host Resources: mem-estimate=274.00MB mem-reservation=8.00MB thread-reservation=2 |
| 01:AGGREGATE                                                                          |
| |  output: min(regexp_replace(l_comment, '.', 'x'))                                   |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| 00:SCAN HDFS [tpch.lineitem, RANDOM]                                                  |
|    partitions=1/1 files=1 size=718.94MB                                               |
|    stored statistics:                                                                 |
|      table: rows=6001215 size=718.94MB                                                |
|      columns: all                                                                     |
|    extrapolated-rows=disabled max-scan-range-rows=1068457                             |
|    mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1                  |
|    tuple-ids=0 row-size=42B cardinality=6001215                                       |
+---------------------------------------------------------------------------------------+
Fetched 32 row(s) in 0.01s
Summary not available
{noformat}

We already have a debugString() method that prints a human-readable representation. We could start off by printing a tuple descriptor per line at the end of the explain plan with basic information. I think we should only print materialised slots and print them in order of offset. For example, for "select l_comment from tpch.lineitem where l_orderkey < 10" we would print something like this at the end of the explain plan:

{noformat}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=9.02MB mem-reservation=0B thread-reservation=1
PLAN-ROOT SINK
|  mem-estimate=0B mem-reservation=0B thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=9.02MB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=46B cardinality=600122
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=2
00:SCAN HDFS [tpch.lineitem, RANDOM]
   partitions=1/1 files=1 size=718.94MB
   predicates: l_orderkey < CAST(10 AS BIGINT)
   stored statistics:
     table: rows=6001215 size=718.94MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1068457
   mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1
   tuple-ids=0 row-size=46B cardinality=600122
   in pipelines: 00(GETNEXT)
Tuple 0:
  Slot 1: offset=0 type=STRING path=tpch.lineitem.l_comment nullable=true
  Slot 2: offset=12 type=BIGINT path=tpch.lineitem.l_orderkey nullable=true
{noformat}

  was:
There's currently no way to tell in the explain plan what the contents of each tuple are. At explain_level>=2 we include "tuple-ids" but no information about what is actually in the tuples.

{noformat}
[localhost:21000] default> explain select min(regexp_replace(l_comment, ".", "x"))
from tpch.lineitem; summary;
Query: explain select min(regexp_replace(l_comment, ".", "x"))
from tpch.lineitem
+---------------------------------------------------------------------------------------+
| Explain String                                                                        |
+---------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00MB Threads=3                            |
| Per-Host Resource Estimates: Memory=284.00MB                                          |
|                                                                                       |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                 |
| |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B thread-reservation=1   |
| PLAN-ROOT SINK                                                                        |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                            |
| |                                                                                     |
| 03:AGGREGATE [FINALIZE]                                                               |
| |  output: min:merge(regexp_replace(l_comment, '.', 'x'))                             |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| 02:EXCHANGE [UNPARTITIONED]                                                           |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                            |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                        |
| Per-Host Resources: mem-estimate=274.00MB mem-reservation=8.00MB thread-reservation=2 |
| 01:AGGREGATE                                                                          |
| |  output: min(regexp_replace(l_comment, '.', 'x'))                                   |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=1 row-size=16B cardinality=1                                             |
| |                                                                                     |
| 00:SCAN HDFS [tpch.lineitem, RANDOM]                                                  |
|    partitions=1/1 files=1 size=718.94MB                                               |
|    stored statistics:                                                                 |
|      table: rows=6001215 size=718.94MB                                                |
|      columns: all                                                                     |
|    extrapolated-rows=disabled max-scan-range-rows=1068457                             |
|    mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1                  |
|    tuple-ids=0 row-size=42B cardinality=6001215                                       |
+---------------------------------------------------------------------------------------+
Fetched 32 row(s) in 0.01s
Summary not available
{noformat}

We already have a debugString()
method that prints a human-readable representation. We could start off by printing a tuple descriptor per line at the end of the explain plan and tweaking it a little where necessary to make it more readable, e.g. hiding non-materialized slots.


> Show tuple layout in explain plan
> ---------------------------------
>
>                 Key: IMPALA-7293
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7293
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: observability
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
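The proposal in the description prints each tuple at the end of the plan, shows only materialized slots, and orders them by offset. It could be sketched in Java, the language of Impala's frontend. This is a minimal illustration under stated assumptions, not Impala code. `TupleLayoutSketch`, `SlotInfo`, `TupleInfo`, `format` and `formatAll` are hypothetical names. A real patch would presumably read the same fields from the existing tuple and slot descriptors rather than from these stand-in records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the tuple-layout section proposed in IMPALA-7293.
// SlotInfo and TupleInfo are illustrative stand-ins, not Impala's
// TupleDescriptor/SlotDescriptor classes.
public class TupleLayoutSketch {

  // Hypothetical per-slot fields needed for one output line.
  public record SlotInfo(int id, int offset, String type, String path,
      boolean nullable, boolean materialized) {}

  // Hypothetical tuple: an id plus its slots in arbitrary order.
  public record TupleInfo(int id, List<SlotInfo> slots) {}

  // Renders one tuple: a header line, then one line per materialized slot,
  // ordered by byte offset. Non-materialized slots are skipped entirely.
  public static String format(TupleInfo tuple) {
    List<SlotInfo> visible = new ArrayList<>();
    for (SlotInfo slot : tuple.slots()) {
      if (slot.materialized()) {
        visible.add(slot);
      }
    }
    visible.sort(Comparator.comparingInt(SlotInfo::offset));

    StringBuilder sb = new StringBuilder();
    sb.append("Tuple ").append(tuple.id()).append(":\n");
    for (SlotInfo slot : visible) {
      sb.append(String.format("  Slot %d: offset=%d type=%s path=%s nullable=%b\n",
          slot.id(), slot.offset(), slot.type(), slot.path(), slot.nullable()));
    }
    return sb.toString();
  }

  // Renders every tuple in tuple-id order, e.g. for appending after the plan tree.
  public static String formatAll(List<TupleInfo> tuples) {
    List<TupleInfo> ordered = new ArrayList<>(tuples);
    ordered.sort(Comparator.comparingInt(TupleInfo::id));
    StringBuilder sb = new StringBuilder();
    for (TupleInfo tuple : ordered) {
      sb.append(format(tuple));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Scan tuple for "select l_comment from tpch.lineitem where l_orderkey < 10".
    // Tuple id 0 matches tuple-ids=0 on the scan node. The offsets are
    // illustrative values, not computed by this sketch.
    TupleInfo scanTuple = new TupleInfo(0, List.of(
        new SlotInfo(2, 12, "BIGINT", "tpch.lineitem.l_orderkey", true, true),
        new SlotInfo(1, 0, "STRING", "tpch.lineitem.l_comment", true, true)));
    System.out.print(formatAll(List.of(scanTuple)));
  }
}
```

Filtering out non-materialized slots before sorting means hidden slots never influence the printed order. Sorting by offset rather than by slot id makes the listing follow the order in which the slots sit inside the tuple.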