[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-19887:
-------------------------------
    Affects Version/s: 2.2.0

> __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19887
>                 URL: https://issues.apache.org/jira/browse/SPARK-19887
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Cheng Lian
>
> The following Spark shell snippet under Spark 2.1 reproduces this issue:
> {code}
> val data = Seq(
>   ("p1", 1, 1),
>   ("p2", 2, 2),
>   (null, 3, 3)
> )
>
> // Correct case: saving partitioned data to the file system.
> val path = "/tmp/partitioned"
> data.
>   toDF("a", "b", "c").
>   write.
>   mode("overwrite").
>   partitionBy("a", "b").
>   parquet(path)
>
> spark.read.parquet(path).filter($"a".isNotNull).show(truncate = false)
> // +---+---+---+
> // |c  |a  |b  |
> // +---+---+---+
> // |2  |p2 |2  |
> // |1  |p1 |1  |
> // +---+---+---+
>
> // Incorrect case: saving partitioned data as a persisted table.
> data.
>   toDF("a", "b", "c").
>   write.
>   mode("overwrite").
>   partitionBy("a", "b").
>   saveAsTable("test_null")
>
> spark.table("test_null").filter($"a".isNotNull).show(truncate = false)
> // +---+--------------------------+---+
> // |c  |a                         |b  |
> // +---+--------------------------+---+
> // |3  |__HIVE_DEFAULT_PARTITION__|3  |   <-- This row should not be here
> // |1  |p1                        |1  |
> // |2  |p2                        |2  |
> // +---+--------------------------+---+
> {code}
> Hive-style partitioned tables use the magic string {{\_\_HIVE_DEFAULT_PARTITION\_\_}} to indicate {{NULL}} partition values in partition directory names. However, in the case of persisted partitioned tables, this magic string is not interpreted as {{NULL}} but is treated as a regular string.
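To make the directory-naming convention mentioned in the description concrete, here is a small sketch (not part of the original report) that lists the partition directories written by the correct file-system case above; it assumes the same {{/tmp/partitioned}} path on a local file system:

{code}
// Sketch: inspect the partition directories written by the parquet case above.
// Assumes the same local path "/tmp/partitioned" used in the reproduction.
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

Files.list(Paths.get("/tmp/partitioned")).iterator().asScala.foreach(println)
// The listing should include a directory named a=__HIVE_DEFAULT_PARTITION__:
// this is how the NULL value of partition column "a" is encoded on disk.
{code}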
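Until this is fixed, one possible workaround (an assumption on my part, not something from the report) is to map the magic string back to {{NULL}} after reading the persisted table:

{code}
// Hypothetical workaround sketch: translate the magic partition string back to
// NULL when reading the persisted table. This masks the symptom on the read
// path only; the table metadata still stores the magic string.
import org.apache.spark.sql.functions.{lit, when}

val restored = spark.table("test_null").
  withColumn("a", when($"a" === "__HIVE_DEFAULT_PARTITION__", lit(null)).otherwise($"a"))

restored.filter($"a".isNotNull).show(truncate = false)
// With the column rewritten, the NULL-partition row is filtered out, matching
// the behavior of the file-system (parquet) case.
{code}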