[jira] [Work logged] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
     [ https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=795141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795141 ]

ASF GitHub Bot logged work on HIVE-26408:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/22 06:40
            Start Date: 26/Jul/22 06:40
    Worklog Time Spent: 10m
      Work Description: abstractdog merged PR #3452:
URL: https://github.com/apache/hive/pull/3452

Issue Time Tracking
-------------------

    Worklog Id:     (was: 795141)
    Time Spent: 20m  (was: 10m)

> Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26408
>                 URL: https://issues.apache.org/jira/browse/HIVE-26408
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a
> vectorized expression tree like the one below (I'll attach a simple repro
> query when it's possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 61:string)(children:
>   StringColumnInList(col 13, values TermDeposit, RecurringDeposit, CertificateOfDeposit) -> 67:boolean,
>   VectorCoalesce(columns [61, 62])(children:
>     VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( _col1 AS DATE)), 'MM-dd-'))(children:
>       VectorUDFUnixTimeStampDate(col 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 61:string,
>     ConstantVectorExpression(val ) -> 62:string) -> 63:string,
>   ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> The relevant part of the query was:
> {code}
> CASE WHEN DLY_BAL.PDELP_VALUE in (
>   'TermDeposit', 'RecurringDeposit',
>   'CertificateOfDeposit'
> ) THEN NVL(
>   (
>     from_unixtime(
>       unix_timestamp(
>         cast(DLY_BAL.APATD_MTRTY_DATE as date)
>       ),
>       'MM-dd-'
>     )
>   ),
>   ' '
> ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem described:
> 1. IfExprCondExprColumn has 62:string as its
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
> which is a reused scratch column (see 5) )
> 2. at evaluation time, [isRepeating is
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. in order to evaluate IfExprCondExprColumn, the conditional evaluation of
> its children is required, so we go to
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val ) -> 62:string, which
> belongs to the second branch of VectorCoalesce, i.e. to the '' empty string in
> NVL's second argument
> 5. in 4), the 62:string column is set to an isRepeating column (and it is
> released by
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
> so it is marked as a reusable scratch column
> 6. after the conditional evaluation in 3), the final output of
> IfExprCondExprColumn is set
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
> but at that point we get an exception
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
> 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
> 	at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
> 	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
> 	at
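
The sequence in steps 1 to 6 of the description can be modelled with a toy example. This is only a sketch under stated assumptions: ToyColumn, ToyScratchPool and runScenario are hypothetical stand-ins and not Hive's BytesColumnVector, VectorizationContext or IfExprCondExprColumn. It only mirrors the reported behaviour: a constant child marks its column as isRepeating, and a parent that shares that column as its output then fails when writing a row other than 0. The exception message is copied from the report; the toy throws IllegalStateException rather than relying on Java assertions being enabled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ScratchColumnReuseSketch {

    /** Hypothetical stand-in for a column vector: a few values plus an isRepeating flag. */
    static final class ToyColumn {
        final String[] values = new String[4];
        boolean isRepeating;

        /** A constant child writes slot 0 and marks the whole column as repeating. */
        void fillConstant(String value) {
            values[0] = value;
            isRepeating = true;
        }

        /** Guard modelled on the reported error: a repeating column only accepts row 0. */
        void setElement(int row, String value) {
            if (isRepeating && row != 0) {
                throw new IllegalStateException(
                    "Output column number expected to be 0 when isRepeating");
            }
            values[row] = value;
        }
    }

    /** Hypothetical scratch-column pool: released indices are handed out again. */
    static final class ToyScratchPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private int next = 0;

        int allocate() {
            return free.isEmpty() ? next++ : free.pop();
        }

        void release(int column) {
            free.push(column);
        }
    }

    /**
     * Returns true if the parent could write its output, false if the guard fired.
     * When the constant child's column is released before the parent's output is
     * allocated, both end up on the same scratch column.
     */
    static boolean runScenario(boolean releaseConstantChildEarly) {
        ToyScratchPool pool = new ToyScratchPool();
        ToyColumn[] batch = { new ToyColumn(), new ToyColumn(), new ToyColumn() };

        int constantChildColumn = pool.allocate();
        if (releaseConstantChildEarly) {
            pool.release(constantChildColumn);  // column becomes reusable (step 5)
        }
        int parentOutputColumn = pool.allocate();  // may be the same index (step 1)

        batch[parentOutputColumn].isRepeating = false;   // parent resets the flag (step 2)
        batch[constantChildColumn].fillConstant("");     // child evaluated afterwards (steps 3-4)
        try {
            batch[parentOutputColumn].setElement(1, "x");  // parent writes its result (step 6)
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared scratch column ok?   " + runScenario(true));
        System.out.println("distinct scratch column ok? " + runScenario(false));
    }
}
```

In this toy model the write only succeeds when the constant child keeps its own column, which is the direction the issue title suggests (not reusing a child ConstantVectorExpression's column as an output). How the actual patch achieves that is not shown here.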
[jira] [Work logged] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
     [ https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=792373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792373 ]

ASF GitHub Bot logged work on HIVE-26408:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jul/22 22:11
            Start Date: 18/Jul/22 22:11
    Worklog Time Spent: 10m
      Work Description: abstractdog opened a new pull request, #3452:
URL: https://github.com/apache/hive/pull/3452

   ### What changes were proposed in this pull request?

   ### Why are the changes needed?

   ### Does this PR introduce _any_ user-facing change?

   ### How was this patch tested?

Issue Time Tracking
-------------------

            Worklog Id:     (was: 792373)
    Remaining Estimate: 0h
            Time Spent: 10m

> Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26408
>                 URL: https://issues.apache.org/jira/browse/HIVE-26408
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h