Soumyakanti Das created HIVE-29122:
--------------------------------------

             Summary: Vectorization - IGNORE NULLS not supported for 
FIRST_VALUE and LAST_VALUE
                 Key: HIVE-29122
                 URL: https://issues.apache.org/jira/browse/HIVE-29122
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
    Affects Versions: 4.2.0
            Reporter: Soumyakanti Das
            Assignee: Soumyakanti Das


Currently we get wrong results for FIRST_VALUE and LAST_VALUE functions with 
IGNORE NULLS when vectorization is enabled.

Consider this table:
{noformat}
CREATE TABLE window_double_test (
  id INT,
  double_col DOUBLE
);

INSERT INTO window_double_test VALUES
  (1, NULL),
  (2, NULL),
  (1, NULL),
  (1, NULL),
  (2, 25.5),
  (3, NULL),
  (6, 65.5),
  (2, NULL),
  (3, 32.5),
  (4, NULL),
  (3, 30.5),
  (5, 50.5),
  (4, 42.3),
  (5, NULL),
  (6, 65.2),
  (7, NULL),
  (NULL, 80.5),
  (NULL, NULL);{noformat}
Non-vectorized:
{noformat}
set hive.vectorized.execution.enabled=false;
SELECT id, double_col,
  FIRST_VALUE(double_col) IGNORE NULLS OVER(ORDER BY id) AS first_double,
  LAST_VALUE(double_col) IGNORE NULLS OVER(ORDER BY id) AS last_double
FROM window_double_test;


id  double_col first_double last_double
1 NULL NULL NULL
1 NULL NULL NULL
1 NULL NULL NULL
2 NULL 25.5 25.5
2 25.5 25.5 25.5
2 NULL 25.5 25.5
3 NULL 25.5 32.5
3 30.5 25.5 32.5
3 32.5 25.5 32.5
4 NULL 25.5 42.3
4 42.3 25.5 42.3
5 50.5 25.5 50.5
5 NULL 25.5 50.5
6 65.2 25.5 65.5
6 65.5 25.5 65.5
7 NULL 25.5 65.5
NULL  NULL 25.5 80.5
NULL  80.5 25.5 80.5{noformat}
Vectorized:
{noformat}
set hive.vectorized.execution.enabled=true;
SELECT id, double_col,
  FIRST_VALUE(double_col) IGNORE NULLS OVER(ORDER BY id) AS first_double,
  LAST_VALUE(double_col) IGNORE NULLS OVER(ORDER BY id) AS last_double
FROM window_double_test;


id  double_col first_double last_double
1 NULL NULL NULL
1 NULL NULL NULL
1 NULL NULL NULL
2 NULL NULL NULL
2 25.5 NULL NULL
2 NULL NULL NULL
3 NULL NULL 32.5
3 30.5 NULL 32.5
3 32.5 NULL 32.5
4 NULL NULL 42.3
4 42.3 NULL 42.3
5 50.5 NULL NULL
5 NULL NULL NULL
6 65.2 NULL 65.5
6 65.5 NULL 65.5
7 NULL NULL NULL
NULL  NULL NULL 80.5
NULL  80.5 NULL 80.5{noformat}
We can see the differences for id = 2, for example. Similar results can also be 
seen for int and decimal columns too. We don't see the same for a String column 
as only LONG, DECIMAL, and DOUBLE are supported, as can be seen in 
Vectorizer#validatePTFOperator.

There are other scenarios where it bypasses the Vectorizer, for example, if we 
explicitly use ROWS instead of the default RANGE, etc. Anything that skips 
vectorization produces correct results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to