[
https://issues.apache.org/jira/browse/FLINK-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557839#comment-17557839
]
luoyuxia edited comment on FLINK-28212 at 6/23/22 1:15 PM:
-----------------------------------------------------------
This seems to be a Calcite bug: it does not adjust the index of the window's
lowerBound/upperBound.
To fix it in Flink, the idea is straightforward: trim the produced project
node in HiveParser. The logical plan produced by the Hive parser should look like:
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, EXPR$2])
LogicalProject(ctinyint=[$0], cint=[$2], EXPR$2=[COUNT($5) OVER (PARTITION BY
$0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}
The project node contains only the needed fields.
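
To illustrate the trimming idea, here is a minimal plain-Java sketch. It is not the actual HiveParser change: strings stand in for Calcite expressions, and the names `ProjectTrimSketch` and `mergeProjects` are hypothetical. It replaces each reference in the top project with the expression it points at in the bottom project, so only one project with the needed fields remains.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; strings stand in for Calcite RexNode expressions.
class ProjectTrimSketch {

    // Hypothetical helper: replace each reference of the top project with the
    // expression it points at in the bottom project, so one project remains.
    static List<String> mergeProjects(List<String> bottomExprs, int[] topRefs) {
        List<String> merged = new ArrayList<>();
        for (int ref : topRefs) {
            merged.add(bottomExprs.get(ref));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Bottom project: all 12 table fields ($0..$11) plus the window expression at $12.
        List<String> bottom = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            bottom.add("$" + i);
        }
        bottom.add("COUNT($5) OVER (...)");
        // Top project selects $0, $2 and $12, as in the plan from the issue.
        System.out.println(mergeProjects(bottom, new int[] {0, 2, 12}));
    }
}
```

With a single project directly on the table scan, no intermediate all-fields project is left for a later rule to prune.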
> IndexOutOfBoundsException is thrown when project contains window which
> doesn't refer to all fields of input when using Hive dialect
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-28212
> URL: https://issues.apache.org/jira/browse/FLINK-28212
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / Hive
> Reporter: luoyuxia
> Priority: Major
> Fix For: 1.16.0
>
>
> Can be reproduced with the following SQL when using the Hive dialect:
> {code:java}
> CREATE TABLE alltypesorc(
> ctinyint TINYINT,
> csmallint SMALLINT,
> cint INT,
> cbigint BIGINT,
> cfloat FLOAT,
> cdouble DOUBLE,
> cstring1 STRING,
> cstring2 STRING,
> ctimestamp1 TIMESTAMP,
> ctimestamp2 TIMESTAMP,
> cboolean1 BOOLEAN,
> cboolean2 BOOLEAN);
> select a.ctinyint, a.cint, count(a.cdouble)
> over(partition by a.ctinyint order by a.cint desc
> rows between 1 preceding and 1 following)
> from alltypesorc a; {code}
> Then it will throw the exception "caused by:
> java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)".
>
> The reason is that, for such SQL, the Hive dialect generates a RelNode:
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, _o__c2])
> LogicalProject(ctinyint=[$0], cint=[$2], _o__c2=[$12])
> LogicalProject(ctinyint=[$0], csmallint=[$1], cint=[$2], cbigint=[$3],
> cfloat=[$4], cdouble=[$5], cstring1=[$6], cstring2=[$7], ctimestamp1=[$8],
> ctimestamp2=[$9], cboolean1=[$10], cboolean2=[$11], _o__col13=[COUNT($5) OVER
> (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1
> FOLLOWING)])
> LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}
> Note: the first project node, from bottom to top, contains all fields.
> The window's input therefore also contains all fields of that project node,
> and the "{*}1{*} PRECEDING AND *1* FOLLOWING" bounds in the window are
> converted to RexInputRef in Calcite. So the window will look like:
> {code:java}
> COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN $11
> PRECEDING AND $11 FOLLOWING){code}
> Note: `$11` is a special field for windows, which is actually recorded in
> the window's constants.
>
> But in the rule "ProjectWindowTransposeRule", the unnecessary fields (those
> not referenced by the top project or the window) will be removed,
> so the input of the window will contain only 4 fields (ctinyint, cint,
> cdouble, count(cdouble)).
> Finally, in RelExplainUtil, when building the bound string, it cannot find
> {*}$11{*}, so the exception "Caused by: java.lang.IndexOutOfBoundsException:
> index (8) must be less than size (1)" is thrown.
> {code:java}
> val ref = bound.getOffset.asInstanceOf[RexInputRef]
> // ref.getIndex will be 11 but origin input size of the window is 3
> val boundIndex = ref.getIndex - calcOriginInputRows(window)
> // offset = 8, but the window's constants only contains one single element "1"
> val offset = window.constants.get(boundIndex).getValue2
> val offsetKind = if (bound.isPreceding) "PRECEDING" else "FOLLOWING"
> s"$offset $offsetKind" {code}
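
The index arithmetic in the quoted snippet can be sketched in plain Java as follows. This is only an illustration using the numbers from the issue description, not Flink's code: `BoundOffsetSketch`, `constantIndex`, and `resolve` are hypothetical names, and a plain list stands in for the window's constants.

```java
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of how a bound reference is mapped into the window's constants.
class BoundOffsetSketch {

    // A reference past the window's original input fields points into the constants list.
    static int constantIndex(int refIndex, int originInputFieldCount) {
        return refIndex - originInputFieldCount;
    }

    // Looks up the constant; List.get throws IndexOutOfBoundsException when the
    // reference was not adjusted after the input fields were pruned.
    static <T> T resolve(int refIndex, int originInputFieldCount, List<T> constants) {
        return constants.get(constantIndex(refIndex, originInputFieldCount));
    }

    public static void main(String[] args) {
        List<Integer> constants = Collections.singletonList(1);
        // Stale reference $11 against a pruned input of 3 fields: 11 - 3 = 8.
        System.out.println(constantIndex(11, 3));
        try {
            resolve(11, 3, constants);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
        // Under the same convention, a reference adjusted to $3 would map to constants[0].
        System.out.println(resolve(3, 3, constants));
    }
}
```

The sketch shows why the reference has to be adjusted whenever the input fields are pruned: the subtraction only lands inside the constants list if the reference index was shifted along with the input.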