[ https://issues.apache.org/jira/browse/HIVE-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-17923:
----------------------------------
    Description:
given
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
    si smallint,
    i int,
    b bigint,
    f float,
    d double,
    bo boolean,
    s string,
    ts timestamp,
    `dec` decimal(4,2),
    bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces 1 data file (bucket 0). It should produce 4 based on input data.
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing.

acid_vectorization_original.q has the full script (HIVE-17458)

  was:
given
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
    si smallint,
    i int,
    b bigint,
    f float,
    d double,
    bo boolean,
    s string,
    ts timestamp,
    `dec` decimal(4,2),
    bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces 1 data file (bucket 0). It should produce 4 based on input data.
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing.

acid_vectorization_original.q has the full script


> 'cluster by' should not be needed for a bucketed table
> ------------------------------------------------------
>
>                 Key: HIVE-17923
>                 URL: https://issues.apache.org/jira/browse/HIVE-17923
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Priority: Blocker
>
> given
> {noformat}
> CREATE TABLE over10k_orc_bucketed(t tinyint,
>     si smallint,
>     i int,
>     b bigint,
>     f float,
>     d double,
>     bo boolean,
>     s string,
>     ts timestamp,
>     `dec` decimal(4,2),
>     bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
> {noformat}
> {noformat}
> insert into over10k_orc_bucketed select * from over10k
> {noformat}
> produces 1 data file (bucket 0). It should produce 4 based on input data.
> {noformat}
> insert into over10k_orc_bucketed select * from over10k cluster by si
> {noformat}
> does the right thing.
> acid_vectorization_original.q has the full script (HIVE-17458)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
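
One way to check how many data files an insert produced is to group the table's rows by Hive's INPUT__FILE__NAME virtual column. This is a minimal sketch and not part of the original report. It assumes the over10k_orc_bucketed table from the description exists and has been loaded by one of the inserts above; the row_count alias is only illustrative.
{noformat}
-- One output row per data file backing the table,
-- with the number of rows stored in that file.
SELECT INPUT__FILE__NAME, count(*) AS row_count
FROM over10k_orc_bucketed
GROUP BY INPUT__FILE__NAME;
{noformat}
Going by the description, this would be expected to list a single file (bucket 0) after the plain insert, and 4 files after the 'cluster by si' insert, based on the input data.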