Hello, I'd like to store my log file data that's imported into Hive in compressed format. I was following some steps outlined by Zheng on how to do that, where he says:
    CREATE TABLE texttable (...) STORED AS TEXTFILE;
    LOAD DATA ... OVERWRITE INTO TABLE texttable;
    CREATE TABLE seqtable (...) STORED AS SEQUENCEFILE;
    set hive.exec.compress.output=true;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    set mapred.output.compression.type=BLOCK;
    INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;

But I get stuck on the last step. I can't write to my new SYSLOG_SEQUENCE table because the tables are partitioned:

    hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE SELECT * FROM SYSLOG;
    FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned

What syntax can I use to get the data into the new table? It has DS and TYPE as partition columns:

    hive> describe syslog;
    OK
    month    string  from deserializer
    day      string  from deserializer
    time     string  from deserializer
    host     string  from deserializer
    logline  string  from deserializer
    ds       string
    type     string

I took a stab at it as shown below, but that gives me only two partitions in total, and of course what I want is the same partitions that exist in the original SYSLOG table.

    hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE PARTITION(ds='*',type='*') select month, day, time, host, logline from syslog;
    OK
    Time taken: 94.885 seconds

Any pointers welcome.

Thanks,
Ken
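One possible way to write the partitioned insert, sketched under assumptions: it relies on Hive's dynamic-partition inserts (introduced in Hive 0.6, so possibly newer than the build used above), keeps the same Gzip/BLOCK compression settings from Zheng's steps, and assumes SYSLOG_SEQUENCE was created with the same five data columns and the same ds/type partition columns as SYSLOG. With dynamic partitioning the PARTITION clause names the partition columns without values, and the SELECT supplies them as its last columns, in the same order; Hive then writes one output partition per distinct (ds, type) pair it sees in the source rows.

```sql
-- Compression settings carried over from Zheng's steps
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

-- Enable dynamic partitions; nonstrict mode is required because
-- neither ds nor type is given a static value below
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Data columns first, then the partition columns in PARTITION-clause order
INSERT OVERWRITE TABLE syslog_sequence PARTITION (ds, type)
SELECT month, day, time, host, logline, ds, type
FROM syslog;

-- Compare against SHOW PARTITIONS syslog; to confirm the layouts match
SHOW PARTITIONS syslog_sequence;
```

On a Hive release without dynamic partitions, the fallback would be one static insert per existing partition, e.g. `INSERT OVERWRITE TABLE syslog_sequence PARTITION (ds='<ds value>', type='<type value>') SELECT month, day, time, host, logline FROM syslog WHERE ds='<ds value>' AND type='<type value>';`, where the placeholders are hypothetical and would be filled in from the output of `SHOW PARTITIONS syslog;`. If the source table has many partitions, the `hive.exec.max.dynamic.partitions` and `hive.exec.max.dynamic.partitions.pernode` limits may also need raising.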
