[jira] [Work logged] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=795141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795141
 ]

ASF GitHub Bot logged work on HIVE-26408:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 06:40
Start Date: 26/Jul/22 06:40
Worklog Time Spent: 10m 
  Work Description: abstractdog merged PR #3452:
URL: https://github.com/apache/hive/pull/3452




Issue Time Tracking
---

Worklog Id: (was: 795141)
Time Spent: 20m  (was: 10m)

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the one below (I'll attach a simple repro 
> query when possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> The relevant part of the query was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> The problem, step by step:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see step 5).
> 2. At evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
> 3. To evaluate IfExprCondExprColumn, its children must be evaluated 
> conditionally, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
> 4. One of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, i.e. the ' ' string literal 
> in NVL's second argument.
> 5. In step 4, the 62:string column is set to isRepeating. It was also released 
> by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459],
>  which is why it was marked as a reusable scratch column.
> 6. After the conditional evaluation in step 3, the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but that call throws an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at 
> 
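
The six steps quoted above can be approximated with a small stand-alone toy model. This is only a sketch under stated assumptions: ToyColumn, ToyScratchPool and evaluateSucceeds are hypothetical names invented for illustration, not Hive classes, and the check in setElement throws a plain IllegalStateException rather than the AssertionError seen in the stack trace. It shows how a parent expression that shares its output column with a constant child can fail when it writes a row other than 0.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: these classes are hypothetical and are not Hive's implementation. */
public class ScratchColumnReuseSketch {

    /** Stand-in for a string column vector that carries an isRepeating flag. */
    static final class ToyColumn {
        final String[] values = new String[4];
        boolean isRepeating;

        /** What a constant expression does: store one value and mark it as repeating. */
        void fillConstant(String value) {
            values[0] = value;
            isRepeating = true;
        }

        /** Mirrors the check named in the stack trace (thrown here as a plain exception). */
        void setElement(int row, String value) {
            if (isRepeating && row != 0) {
                throw new IllegalStateException(
                    "Output column number expected to be 0 when isRepeating");
            }
            values[row] = value;
        }
    }

    /** Stand-in for scratch column bookkeeping: released columns are handed out again. */
    static final class ToyScratchPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private int next = 62;

        int allocate() {
            return free.isEmpty() ? next++ : free.pop();
        }

        void release(int column) {
            free.push(column);
        }
    }

    /** Returns true if the parent can write row 1 of its output, false if the check fires. */
    public static boolean evaluateSucceeds(boolean releaseConstantChild) {
        ToyScratchPool pool = new ToyScratchPool();
        Map<Integer, ToyColumn> batch = new HashMap<>();

        // Plan time: the constant child gets a scratch column, which may be released
        // before the parent allocates its own output column.
        int childColumn = pool.allocate();
        if (releaseConstantChild) {
            pool.release(childColumn);
        }
        int outputColumn = pool.allocate();

        ToyColumn child = batch.computeIfAbsent(childColumn, k -> new ToyColumn());
        ToyColumn output = batch.computeIfAbsent(outputColumn, k -> new ToyColumn());

        try {
            output.isRepeating = false;   // step 2: the parent resets its output
            child.fillConstant(" ");      // steps 3-5: the constant child marks its column repeating
            output.setElement(1, "");     // step 6: the parent writes a row other than 0
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("constant child released first: " + evaluateSucceeds(true));
        System.out.println("constant child kept allocated: " + evaluateSucceeds(false));
    }
}
```

In this model, releasing the constant child's column before the parent allocates its output makes both expressions share column 62, and the write to row 1 fails. Keeping the child's column allocated gives the parent a distinct column, which is the idea the issue title describes.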

[jira] [Work logged] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=792373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792373
 ]

ASF GitHub Bot logged work on HIVE-26408:
-

Author: ASF GitHub Bot
Created on: 18/Jul/22 22:11
Start Date: 18/Jul/22 22:11
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request, #3452:
URL: https://github.com/apache/hive/pull/3452

   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 792373)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h