[jira] [Updated] (HIVE-17923) 'cluster by' should not be needed for a bucketed table

Eugene Koifman (JIRA) Mon, 30 Oct 2017 11:37:24 -0700

     [ https://issues.apache.org/jira/browse/HIVE-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Eugene Koifman updated HIVE-17923:
----------------------------------
    Description: 
Given
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
the statement
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces only 1 data file (bucket 0). Based on the input data, it should produce 4, one per bucket. By contrast,
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing and produces the expected 4 files.

The full script is in acid_vectorization_original.q (HIVE-17458).
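For context, a table declared CLUSTERED BY(si) INTO 4 BUCKETS is expected to route each row to a bucket derived from a hash of si. Input with varied si values should therefore populate more than bucket 0. The Java sketch below is a non-authoritative illustration. It assumes Hive's legacy bucketing scheme: a smallint hashes to its own value, and the bucket id is (hash & Integer.MAX_VALUE) % numBuckets. The class name BucketSketch and the sample si values are hypothetical and do not come from over10k.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: simulates how rows of a table declared
// CLUSTERED BY(si) INTO 4 BUCKETS would be assigned to buckets,
// ASSUMING the legacy Hive bucketing scheme (not confirmed by this issue).
class BucketSketch {

    // Assumed legacy hash for a smallint column: the value itself, widened to int.
    static int hashSmallint(short si) {
        return si;
    }

    // Assumed bucket id: clear the sign bit, then take the remainder.
    static int bucketFor(short si, int numBuckets) {
        return (hashSmallint(si) & Integer.MAX_VALUE) % numBuckets;
    }

    // Counts rows per bucket id for a sample of si values (sorted by bucket id).
    static Map<Integer, Integer> rowsPerBucket(short[] siValues, int numBuckets) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (short si : siValues) {
            counts.merge(bucketFor(si, numBuckets), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical si values; not taken from the over10k data set.
        short[] sample = {256, 257, 258, 259, 260, -3, 511};
        // Under the assumed scheme, each non-empty bucket maps to its own data file.
        System.out.println(rowsPerBucket(sample, 4));
    }
}
```

Running main prints {0=2, 1=2, 2=1, 3=2}. All 4 buckets are non-empty, which is why a plain insert that writes only bucket 0 points to missing bucketing rather than skewed data.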

  was:
Given
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
the statement
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces only 1 data file (bucket 0). Based on the input data, it should produce 4, one per bucket. By contrast,
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing and produces the expected 4 files.

The full script is in acid_vectorization_original.q.


> 'cluster by' should not be needed for a bucketed table
> ------------------------------------------------------
>
>                 Key: HIVE-17923
>                 URL: https://issues.apache.org/jira/browse/HIVE-17923
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Priority: Blocker
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
