soumyakanti3578 commented on PR #6027:
URL: https://github.com/apache/hive/pull/6027#issuecomment-3190855413
@kasakrisz
Sorry I should have clarified it in more detail. The bug was fixed by this:
```
if (longColVector.isRepeating) {
if (longColVector.noNulls || !longColVector.isNull[0]) {
lastValue = longColVector.vector[0];
isGroupResultNull = false;
} else {
-- isGroupResultNull = true;
++ isGroupResultNull = doesRespectNulls() || lastValue == null;
}
}
```
When `longColVector.isRepeating = true`, it means that the column vector
contains the same value for all the rows in the batch and thus we just store
the value in the `0th` index.
When `if (longColVector.noNulls || !longColVector.isNull[0])` condition is
`true`, we can simply pick up the value from the `0th` index because all rows
in the batch has the same repeating non null value, so `lastValue` would be the
same value too, and hence the group result is not null as indicated by
`isGroupResultNull = false`.
But when the condition is `false`, i.e., when values in the batch are
repeating for the column (`longColVector.isRepeating = true`) AND the value in
the first element is null (`!(longColVector.noNulls ||
!longColVector.isNull[0])`) we were simply setting `isGroupResultNull = true;`
as indeed then the value in the column vector is null. This was fine when we
were not supporting `IGNORE NULLS`.
Now, we must check if we respect nulls with `doesRespectNulls()`. If we do,
then the group result is indeed null and we can set it to `true`. Else if we
are ignoring nulls, then we must check if lastValue was set by the previous
batch or not. If it was set, then we must set `isGroupResultNull = false;`
In short, when we are in the `else` block, the value in the column vector is
indeed `null`. But we must check the `lastValue` because it could have been set
by the previous batch.
In the int test file, we have these three rows for id = 2: `(2, 20), (2,
null), (2, null)`. Since we have `id` and `int_col` in the ORDER BY of
`LAST_VALUE(int_col IGNORE NULLS) OVER(PARTITION BY id ORDER BY id, int_col) AS
last_int` this would create two batches:
```
batch 1: [2, 20]
batch 2: [[2, null], [2, null]]
```
as we are dealing with `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`,
i.e., both `(2, null)`s will be together. Also note that for batch 2,
`longColVector.isRepeating = true`. While computing batch 2, we will go to the
else block and we would have set the result to null before.
Hope this clarifies the bug!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]