[ https://issues.apache.org/jira/browse/HIVE-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-17923:
----------------------------------
    Issue Type: Bug  (was: Sub-task)
        Parent:     (was: HIVE-17458)

> 'cluster by' should not be needed for a bucketed table
> ------------------------------------------------------
>
>                 Key: HIVE-17923
>                 URL: https://issues.apache.org/jira/browse/HIVE-17923
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Priority: Blocker
>
> Given
> {noformat}
> CREATE TABLE over10k_orc_bucketed(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
> {noformat}
> the statement
> {noformat}
> insert into over10k_orc_bucketed select * from over10k
> {noformat}
> produces 1 data file (bucket 0). It should produce 4 based on input data.
> {noformat}
> insert into over10k_orc_bucketed select * from over10k cluster by si
> {noformat}
> does the right thing.
> acid_vectorization_original.q has the full script.
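
One possible way to check how many data files an insert produced, without listing the table directory, is to group the rows by Hive's INPUT__FILE__NAME virtual column. This is only a sketch and not part of the original report: the aliases data_file and row_count are illustrative names, and the query assumes the over10k_orc_bucketed table defined above has just been loaded by one of the two insert statements.
{noformat}
-- Sketch: one result row per data file that currently holds rows.
-- data_file and row_count are illustrative aliases, not from the report.
SELECT INPUT__FILE__NAME AS data_file,
       count(*)          AS row_count
FROM over10k_orc_bucketed
GROUP BY INPUT__FILE__NAME;
{noformat}
If the behaviour is as the report describes, the plain insert would show a single file here, and the 'cluster by si' variant would show four. A bucket that receives no rows would not appear in the result, so the row count of the result is a lower bound on the number of files written.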