Cheng Lian created SPARK-1959: --------------------------------- Summary: String "NULL" is interpreted as null value Key: SPARK-1959 URL: https://issues.apache.org/jira/browse/SPARK-1959 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Cheng Lian
The {{HiveTableScan}} operator unwraps string "NULL" (case insensitive) into null values even if the column type is {{STRING}}. To reproduce the bug, we use {{sql/hive/src/test/resources/groupby_groupingid.txt}} as test input, copied to {{/tmp/groupby_groupingid.txt}}. Hive session: {code} hive> CREATE TABLE test_null(key INT, value STRING); hive> LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null; hive> SELECT * FROM test_null WHERE value IS NOT NULL; ... OK 1 NULL 1 1 2 2 3 3 3 NULL 4 5 1 NULL 1 1 2 2 3 3 3 NULL 4 5 {code} We can see that the {{NULL}} cells in the original input file are interpreted as string {{"NULL"}} in Hive. Spark SQL session ({{sbt/sbt hive/console}}): {code} scala> hql("CREATE TABLE test_null(key INT, value STRING)") scala> hql("LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null") scala> hql("SELECT * FROM test_null WHERE value IS NOT NULL").foreach(println) ... [1,1] [2,2] [3,3] [4,5] {code} As we can see, string {{"NULL"}} is interpreted as null values in Spark SQL. -- This message was sent by Atlassian JIRA (v6.2#6252)