Krisztian Kasa created HIVE-24775:
-------------------------------------

             Summary: Incorrect null handling when rebuilding Materialized view 
incrementally
                 Key: HIVE-24775
                 URL: https://issues.apache.org/jira/browse/HIVE-24775
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


{code}
CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc 
TBLPROPERTIES ('transactional'='true');

INSERT INTO t1 VALUES
 (NULL, 'null_value', 100.77, 7),
 (1, 'calvin', 978.76, 3),
 (1, 'charlie', 9.8, 1);

CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS
  SELECT a, b, sum(d)
  FROM t1
  WHERE c > 10.0
  GROUP BY a, b;

INSERT INTO t1 VALUES
 (NULL, 'null_value', 100.88, 8),
 (1, 'charlie', 15.8, 1);

ALTER MATERIALIZED VIEW mat1 REBUILD;

SELECT * FROM mat1
ORDER BY a, b;
{code}
View contains:
{code}
1       calvin  3
1       charlie 1
NULL    null_value      8
NULL    null_value      7
{code}
but it should contain:
{code}
1       calvin  3
1       charlie 1
NULL    null_value      15
{code}

Rows with aggregate key columns having NULL values are not aggregated because 
incremental materialized view rebuild plan is altered by 
[applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]:
  IS NOT NULL filter added for each of these columns on top of the view scan 
when joining with the branch pulls the rows inserted after the last rebuild:
{code}
HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, 
+($5, $2))])
  HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), =($1, 
$4)))])
    HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], 
algorithm=[none], cost=[not available])
      HiveProject(a=[$0], b=[$1], _c2=[$2])
        HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))])
          HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
      HiveProject(a=[$0], b=[$1], $f2=[$2])
        HiveAggregate(group=[{0, 1}], agg#0=[sum($3)])
          HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))])
            HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to