[ 
https://issues.apache.org/jira/browse/HIVE-28938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28938:
----------------------------------
    Labels: pull-request-available  (was: )

> Error in LATERAL VIEW with non native tables due to presence of incorrect 
> virtual columns in RowResolver
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28938
>                 URL: https://issues.apache.org/jira/browse/HIVE-28938
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.1
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>
> To reproduce, run:
> {noformat}
> create external table test(id int, arr array<string>) stored by iceberg;
> insert into test values (1, array("a", "b")), (2, array("c", "d")), (3, 
> array("e", "f"));
> select * from test
> lateral view explode(arr) tbl1 as name
> lateral view explode(arr) tbl2 as name1;
> {noformat}
> Error:
> {noformat}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference 'BLOCK__OFFSET__INSIDE__FILE'
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:14032)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13971)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:5027)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4801)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11888)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11827)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12812)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12665)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12696)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12678)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:688)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13572)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:489)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:324)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:227)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:108){noformat}
>  
> BLOCK_{_}OFFSET{_}{_}INSIDE{_}_FILE should not be present for non native 
> tables, but it is present because of the misclassification of ICEBERG tables 
> as TableType.NATIVE in the method [obtainTableType(Table 
> tabMetaData)|https://github.com/apache/hive/blob/e7cdc0cac935b8e2c573edad9d6e790d8beeee40/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L3272].
>  This returns NATIVE for ICEBERG tables and subsequently adds incorrect 
> virtual columns in 
> [genTableLogicalPlan|https://github.com/apache/hive/blob/e7cdc0cac935b8e2c573edad9d6e790d8beeee40/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L3079].
> Virtual column adding logic should be similar to 
> [here|https://github.com/apache/hive/blob/e7cdc0cac935b8e2c573edad9d6e790d8beeee40/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L11990].
> P.S. This only occurs for multiple lateral views. My guess is that when we 
> just have one lateral view, unnecessary fields are trimmed from the 
> HiveProject on top of the HiveTableFunctionScan. But when we have two or more 
> lateral views, HiveProjects which are not at the top of the plan has all 
> fields as the lateral function on top of the Project (in 
> HiveTableFunctionScan) refers to them.
> For example, this is a plan with two lateral views:
> {noformat}
> query: explain cbo
> select * from test
> lateral view explode(arr) tbl1 as name
> lateral view explode(arr) tbl2 as name1
> CBO PLAN:
> HiveProject(id=[$0], arr=[$1], name=[$8], name1=[$9])
>   HiveTableFunctionScan(invocation=[LATERAL(explode($1), $0, $1, $2, $3, $4, 
> $5, $6, $7, $8)], rowType=[RecordType(INTEGER id, VARCHAR(2147483647) ARRAY 
> arr, INTEGER PARTITION__SPEC__ID, BIGINT PARTITION__HASH, VARCHAR(2147483647) 
> FILE__PATH, BIGINT ROW__POSITION, VARCHAR(2147483647) PARTITION__PROJECTION, 
> BIGINT SNAPSHOT__ID, VARCHAR(2147483647) tbl1.name, VARCHAR(2147483647) 
> tbl2.name1)])
>     HiveProject(id=[$0], arr=[$1], PARTITION__SPEC__ID=[$2], 
> PARTITION__HASH=[$3], FILE__PATH=[$4], ROW__POSITION=[$5], 
> PARTITION__PROJECTION=[$6], SNAPSHOT__ID=[$7], tbl1.name=[$8])
>       HiveTableFunctionScan(invocation=[LATERAL(explode($1), $0, $1, $2, $3, 
> $4, $5, $6, $7)], rowType=[RecordType(INTEGER id, VARCHAR(2147483647) ARRAY 
> arr, INTEGER PARTITION__SPEC__ID, BIGINT PARTITION__HASH, VARCHAR(2147483647) 
> FILE__PATH, BIGINT ROW__POSITION, VARCHAR(2147483647) PARTITION__PROJECTION, 
> BIGINT SNAPSHOT__ID, VARCHAR(2147483647) tbl1.name)])
>         HiveTableScan(table=[[default, test]], table:alias=[test]) {noformat}
> We can see the second Project has all columns because the lateral function on 
> top of it refers to them. Whereas, this is a plan with just one lateral view:
> {noformat}
> query: explain cbo
> select * from test
> lateral view explode(arr) tbl1 as name
> CBO PLAN:
> HiveProject(id=[$0], arr=[$1], name=[$8])
>   HiveTableFunctionScan(invocation=[LATERAL(explode($1), $0, $1, $2, $3, $4, 
> $5, $6, $7)], rowType=[RecordType(INTEGER id, VARCHAR(2147483647) ARRAY arr, 
> INTEGER PARTITION__SPEC__ID, BIGINT PARTITION__HASH, VARCHAR(2147483647) 
> FILE__PATH, BIGINT ROW__POSITION, VARCHAR(2147483647) PARTITION__PROJECTION, 
> BIGINT SNAPSHOT__ID, VARCHAR(2147483647) tbl1.name)])
>     HiveTableScan(table=[[default, test]], table:alias=[test]){noformat}
>  
> P.P.S: The plans above are after applying the fix, so we only see correct 
> virtual columns, but if the fix was not present, we would also see virtual 
> columns like BLOCK_{_}OFFSET{_}{_}INSIDE{_}_FILE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to