[jira] [Updated] (HIVE-9481) allow column list specification in INSERT statement

Eugene Koifman (JIRA) Mon, 09 Feb 2015 17:34:36 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eugene Koifman updated HIVE-9481:
---------------------------------
    Description: 
Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) 
select x,y from T.  The expectation is that 'x' is written to column 'c' and 
'y' is written column 'b' and 'a' is set to NULL, assuming column 'a' is 
NULLABLE.

Hive does not support this.  In Hive one has to ensure that the data producing 
statement has a schema that matches target table schema.

Since Hive doesn't support DEFAULT value for columns in CREATE TABLE, when 
target schema is explicitly provided, missing columns will be set to NULL if 
they are NULLABLE, otherwise an error will be raised.

If/when DEFAULT clause is supported, this can be enhanced to set default value 
rather than NULL.

Thus, given {noformat}
create table source (a int, b int);
create table target (x int, y int, z int);
create table target2 (x int, y int, z int);
{noformat}
{noformat}insert into target(y,z) select * from source;{noformat}
will mean 
{noformat}insert into target select null as x, a, b from source;{noformat}
and 
{noformat}insert into target(z,y) select * from source;{noformat}
will meant 
{noformat}insert into target select null as x, b, a from source;{noformat}

Also,
{noformat}
from source insert into target(y,z) select null as x, * insert into 
target2(y,z) select null as x, source.*;
{noformat}

and for partitioned tables, given

{noformat}
Given:
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, "from" STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS 
STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link)  
VALUES ('jsmith', 'mail.com');
{noformat}
And dynamic partitioning
{noformat}
INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link)  
VALUES ('jsmith', '2014-09-23', 'mail.com');
{noformat}

In all cases, the schema specification contains columns of the target table 
which are matched by position to the values produced by VALUES clause/SELECT 
statement.  If the producer side provides values for a dynamic partition 
column, the column should be in the specified schema.  Static partition values 
are part of the partition spec and thus are not produced by the producer and 
should not be part of the schema specification.

  was:
Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) 
select x,y from T.  The expectation is that 'x' is written to column 'c' and 
'y' is written column 'b' and 'a' is set to NULL, assuming column 'a' is 
NULLABLE.

Hive does not support this.  In Hive one has to ensure that the data producing 
statement has a schema that matches target table schema.

Since Hive doesn't support DEFAULT value for columns in CREATE TABLE, when 
target schema is explicitly provided, missing columns will be set to NULL if 
they are NULLABLE, otherwise an error will be raised.

If/when DEFAULT clause is supported, this can be enhanced to set default value 
rather than NULL.

Thus, given {noformat}
create table source (a int, b int);
create table target (x int, y int, z int);
create table target2 (x int, y int, z int);
{noformat}
{noformat}insert into target(y,z) select * from source;{noformat}
will mean 
{noformat}insert into target select null as x, a, b from source;{noformat}
and 
{noformat}insert into target(z,y) select * from source;{noformat}
will meant 
{noformat}insert into target select null as x, b, a from source;{noformat}

Also,
{noformat}
from source insert into target(y,z) select null as x, * insert into 
target2(y,z) select null as x, source.*;
{noformat}

and for partitioned tables, given

{noformat}
Given:
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, "from" STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS 
STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link)  
VALUES ('jsmith', 'mail.com');
{noformat}


> allow column list specification in INSERT statement
> ---------------------------------------------------
>
>                 Key: HIVE-9481
>                 URL: https://issues.apache.org/jira/browse/HIVE-9481
>             Project: Hive
>          Issue Type: Bug
>          Components: Parser, Query Processor, SQL
>    Affects Versions: 0.14.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-9481.2.patch, HIVE-9481.patch
>
>
> Given a table FOO(a int, b int, c int), ANSI SQL supports insert into 
> FOO(c,b) select x,y from T.  The expectation is that 'x' is written to column 
> 'c' and 'y' is written column 'b' and 'a' is set to NULL, assuming column 'a' 
> is NULLABLE.
> Hive does not support this.  In Hive one has to ensure that the data 
> producing statement has a schema that matches target table schema.
> Since Hive doesn't support DEFAULT value for columns in CREATE TABLE, when 
> target schema is explicitly provided, missing columns will be set to NULL if 
> they are NULLABLE, otherwise an error will be raised.
> If/when DEFAULT clause is supported, this can be enhanced to set default 
> value rather than NULL.
> Thus, given {noformat}
> create table source (a int, b int);
> create table target (x int, y int, z int);
> create table target2 (x int, y int, z int);
> {noformat}
> {noformat}insert into target(y,z) select * from source;{noformat}
> will mean 
> {noformat}insert into target select null as x, a, b from source;{noformat}
> and 
> {noformat}insert into target(z,y) select * from source;{noformat}
> will meant 
> {noformat}insert into target select null as x, b, a from source;{noformat}
> Also,
> {noformat}
> from source insert into target(y,z) select null as x, * insert into 
> target2(y,z) select null as x, source.*;
> {noformat}
> and for partitioned tables, given
> {noformat}
> Given:
> CREATE TABLE pageviews (userid VARCHAR(64), link STRING, "from" STRING)
>   PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS 
> STORED AS ORC;
> INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link) 
>  VALUES ('jsmith', 'mail.com');
> {noformat}
> And dynamic partitioning
> {noformat}
> INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link)  
> VALUES ('jsmith', '2014-09-23', 'mail.com');
> {noformat}
> In all cases, the schema specification contains columns of the target table 
> which are matched by position to the values produced by VALUES clause/SELECT 
> statement.  If the producer side provides values for a dynamic partition 
> column, the column should be in the specified schema.  Static partition 
> values are part of the partition spec and thus are not produced by the 
> producer and should not be part of the schema specification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9481) allow column list specification in INSERT statement

Reply via email to