[ https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106370#comment-15106370 ]
Charles Pritchard commented on HIVE-10888: ------------------------------------------ I'm seeing a similar issue, in Hive 0.14. I have a two-level partition -- partitioned by (date string, bucket string) and it seems that most queries do not include the default partition (for bucket) when run. While I can run create temp table as select *, and get a fully functioning table, I can not simply run select * where, and get useable results from the default partition, when I have a where query. This may be a regression introduced in HIVE-4878. I'll check through some support channels to see what I can find. > Hive Dynamic Partition + Default Partition makes Null Values Not querable > ------------------------------------------------------------------------- > > Key: HIVE-10888 > URL: https://issues.apache.org/jira/browse/HIVE-10888 > Project: Hive > Issue Type: Bug > Components: Hive, Query Processor > Reporter: Goden Yao > > This is reported by Pivotal.io (Noa Horn) > And HAWQ latest version should have this fixed in our queries. > === Expected Behavior === > When dynamic partition enabled and mode = nonstrict, the null value in the > default partition should still be returned when user specify that in > "...WHERE.... is Null". > === Problem statment === > *Enable dynamic partitions* > {code} > hive.exec.dynamic.partition = true > hive.exec.dynamic.partition.mode = nonstrict > #Get default partition name: > hive.exec.default.partition.name > Default Value: _HIVE_DEFAULT_PARTITION_ > {code} > Hive creates a default partition if the partition key value doesn’t conform > to the field type. For example, if the partition key is NULL. > *Hive Example* > Add the following parameters to hive-site.xml > {code} > <property> > <name>hive.exec.dynamic.partition</name> > <value>true</value> > </property> > <property> > <name>hive.exec.dynamic.partition.mode</name> > <value>true</value> > </property> > {code} > Create data: > vi /tmp/base_data.txt > 1,1.0,1900-01-01 > 2,2.2,1994-04-14 > 3,3.3,2011-03-31 > 4,4.5,bla > 5,5.0,2013-12-06 > Create hive table and load the data to it. This table is used to load data to > the partition table. > {code} > hive> > CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ','; > LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base; > SELECT * FROM base; > OK > 1 1.0 1900-01-01 > 2 2.2 1994-04-14 > 3 3.3 2011-03-31 > 4 4.5 NULL > 5 5.0 2013-12-06 > {code} > Note that one of the rows has NULL in its date field. > Create hive partition table and load data from base table to it. The data > will be dynamically partitioned > {code} > CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date > date); > INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base; > SELECT * FROM sales; > OK > 1 1.0 1900-01-01 > 2 2.2 1994-04-14 > 3 3.3 2011-03-31 > 5 5.0 2013-12-06 > 4 4.5 NULL > {code} > Check that the table has different partitions > {code} > hdfs dfs -ls /hive/warehouse/sales > Found 5 items > drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 > /hive/warehouse/sales/date=1900-01-01 > drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 > /hive/warehouse/sales/date=1994-04-14 > drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 > /hive/warehouse/sales/date=2011-03-31 > drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 > /hive/warehouse/sales/date=2013-12-06 > drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 > /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__ > {code} > Hive queries with default partition > Queries without a filter or with a filter on a different field returns the > default partition data: > {code} > hive> select * from sales; > OK > 1 1.0 1900-01-01 > 2 2.2 1994-04-14 > 3 3.3 2011-03-31 > 5 5.0 2013-12-06 > 4 4.5 NULL > Time taken: 0.578 seconds, Fetched: 5 row(s) > {code} > Queries with a filter on the partition field omit the default partition data: > {code} > hive> select * from sales where date <> '2013-12-06'; > OK > 1 1.0 1900-01-01 > 2 2.2 1994-04-14 > 3 3.3 2011-03-31 > Time taken: 0.19 seconds, Fetched: 3 row(s) > hive> select * from sales where date is null; > OK > Time taken: 0.035 seconds > hive> select * from sales where date is not null; > OK > 1 1.0 1900-01-01 > 2 2.2 1994-04-14 > 3 3.3 2011-03-31 > 5 5.0 2013-12-06 > Time taken: 0.042 seconds, Fetched: 4 row(s) > hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__'; > OK > Time taken: 0.056 seconds > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)