Re: How do I INSERT OVERWRITE into a new table if it's partitioned?

Zheng Shao Mon, 21 Dec 2009 00:47:59 -0800

You are correct.

Just opened https://issues.apache.org/jira/browse/HIVE-1002
This is a highly wanted feature from a lot of users.


Please comment on the JIRA. Let's figure out how we want to do it.

Zheng
On Wed, Dec 16, 2009 at 6:36 PM,  <[email protected]> wrote:
> Correct me if I’m wrong, but it looks like I’ll need an INSERT statement for
> each (DS, TYPE) partition:
>
>
>
> hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE
> PARTITION(ds='2009-12-07',type='user') select month, day, time, host,
> logline from syslog where syslog.ds='2009-12-07' and syslog.type='user';
>
> Loading data to table syslog_sequence partition {ds=2009-12-07, type=user}
>
> OK
>
>
>
> Since I’m programmatically loading the data from files into the syslog table
> in the first place, I just need to add another statement to that program
> that adds the data to syslog_sequence every time it  adds a new partition to
> syslog.
>
>
>
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, December 16, 2009 6:16 PM
> To: [email protected]
> Subject: How do I INSERT OVERWRITE into a new table if it's partitioned?
>
>
>
> Hello,
>
>
>
> I’d like to store my log file data that’s imported into Hive in compressed
> format. I was following some steps outlined by Zheng on how to do that,
> where he says:
>
>
>
> CREATE TABLE texttable (...) STORED AS TEXTFILE;
>
> LOAD DATA ... OVERWRITE INTO texttable;
>
> CREATE TABLE seqtable (...) STORED AS SEQUENCEFILE;
>
> set hive.exec.compress.output=true;
>
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.output.compression.type=BLOCK;
>
> INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;
>
>
>
> but I get stuck on the last step.
>
>
>
> I can’t write to my new SYSLOG_SEQUENCE  table because the tables are
> partitioned:
>
>
>
> hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE  SELECT * FROM SYSLOG;
>
> FAILED: Error in semantic analysis: need to specify partition columns
> because the destination table is partitioned.
>
>
>
> What syntax can I use to get the data in the new table? It has DS and TYPE
> as partition columns:
>
>
>
> hive> describe syslog;
>
> OK
>
> month   string  from deserializer
>
> day     string  from deserializer
>
> time    string  from deserializer
>
> host    string  from deserializer
>
> logline string  from deserializer
>
> ds      string
>
> type    string
>
>
>
> I took a stab at it like below, but that only gives me two partitions in
> total, and of course what I want is the same partitions as exist in the
> original SYSLOG table.
>
>
>
> Any pointers welcome-
>
> Thanks
>
> Ken
>
>
>
> hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE PARTITION(ds='*',type='*')
> select month, day, time, host, logline from syslog;
>
> OK
>
> Time taken: 94.885 seconds
>
>



-- 
Yours,
Zheng

Re: How do I INSERT OVERWRITE into a new table if it's partitioned?

Reply via email to