Alan Jackoway created IMPALA-6954:
-------------------------------------

             Summary: Kudu CTAS Loses Partitioning
                 Key: IMPALA-6954
                 URL: https://issues.apache.org/jira/browse/IMPALA-6954
             Project: IMPALA
          Issue Type: Bug
            Reporter: Alan Jackoway


For certain queries, a CTAS statement stored as Kudu loses its partitioning: the table is created unpartitioned even though the statement specifies a PARTITION BY clause.

To reproduce:
Create the transactions table:
{code:sql}
create table alanj_transactions(account_id string, transaction_id string, total double, close_date string)
{code}

No data needs to be loaded into it. Then create a Kudu table from it that computes, for each account, the longest-lived record (days from the earliest close date to now):
{code:sql}
create table alanj_kudu 
primary key (account_id) 
partition by hash(account_id) partitions 5
stored as kudu
as
select account_id,
datediff(now(), min(cast(close_date AS TIMESTAMP))) AS tenure_days
from alanj_transactions
group by 1
{code}

Impala returns a warning like "Unpartitioned Kudu tables are inefficient for large data sizes." Both SHOW CREATE TABLE and the Kudu web UI confirm that the partitions were not created.
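The SHOW CREATE TABLE check can be run as below (a minimal sketch, assuming the {{alanj_kudu}} table from the repro exists). For a correctly partitioned table the output should contain a clause along the lines of {{PARTITION BY HASH (account_id) PARTITIONS 5}}; in this repro that clause is absent.

{code:sql}
-- Inspect the DDL Impala recorded for the CTAS target table.
-- If partitioning had been kept, this would include a PARTITION BY HASH (account_id) PARTITIONS 5 clause.
show create table alanj_kudu;
{code}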

If the datediff line is replaced with something like {{sum(total) as account_total}}, the table is created with the expected hash partitioning. Something about the datediff expression causes the partitioning to be lost.
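For comparison, the variant that keeps the partitioning is sketched below. It is the same CTAS with only the datediff line swapped for {{sum(total) as account_total}}; the table name {{alanj_kudu_sum}} is hypothetical, chosen only so it does not clash with {{alanj_kudu}}.

{code:sql}
-- Same statement as the failing repro, with the datediff aggregate replaced by sum(total).
-- alanj_kudu_sum is a hypothetical name used only to avoid clashing with alanj_kudu.
create table alanj_kudu_sum
primary key (account_id)
partition by hash(account_id) partitions 5
stored as kudu
as
select account_id,
sum(total) as account_total
from alanj_transactions
group by 1;

-- Per the report, this table does get its 5 hash partitions.
show create table alanj_kudu_sum;
{code}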



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
