Re: How to output SeqFile
I do have that. However, I am not writing directly to the table partition. Instead, I first write my data to a tmp directory (eventually moved to the HDFS table partition) and then publish that partition using an ALTER TABLE statement in the metastore. Something like this:

  -- create table x ... stored as SeqFile
  -- insert overwrite directory 'd' select * from table y
  -- distcp 'd' x/dateint=.../hour=...
  -- alter table x add partition

In the second step above I need to produce SeqFile.

Thanks for the prompt reply.

Gaurav Jain

----- Original Message -----
From: Yang tedd...@gmail.com
To: jainy_gau...@yahoo.com
Sent: Wed, October 6, 2010 1:28:42 PM
Subject: Re: How to output SeqFile

Gaurav:

Not sure if I understand your question correctly. When you create the output table, there is an option to set the output table SerDe.

Regards,
Yang

On Wed, Oct 6, 2010 at 1:18 PM, gaurav jain jainy_gau...@yahoo.com wrote:

> How can I produce a sequence file from the query "insert overwrite directory"? I have set:
>
>   SET io.seqfile.compression.type=BLOCK;
>   SET hive.exec.compress.output=true;
>   set mapred.output.compression.type=BLOCK;
>   set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> It seems to produce Text .gz format files.
>
> Regards,
> Gaurav Jain
Re: How to output SeqFile
Thanks Yang. I thought about it as well. But as you said, it's a hack.

hive-dev@, can you please verify if this is possible?

----- Original Message -----
From: Yang tedd...@gmail.com
To: hive-u...@hadoop.apache.org
Sent: Wed, October 6, 2010 1:52:21 PM
Subject: Re: How to output SeqFile

If this is indeed a feature that is still missing, I have a hack: create a temp table in SeqFile format, dump to that table, and then, since you know the location, just copy the part files from that location. Then delete that partition from the table manually.

Of course, you may run into issues such as "partition already exists" when you insert into the temp table the next time, so you may need to do an explicit delete from the temp table too.

Y

On Wed, Oct 6, 2010 at 1:46 PM, gaurav jain jainy_gau...@yahoo.com wrote:

> I was hoping there would be a configuration where I can set the outputformat for my query.
>
> Regards,
> Gaurav Jain
>
> ----- Original Message -----
> From: Jacob R Rideout apa...@jacobrideout.net
> To: hive-u...@hadoop.apache.org
> Sent: Wed, October 6, 2010 1:42:57 PM
> Subject: Re: How to output SeqFile
>
> On Wed, Oct 6, 2010 at 2:35 PM, gaurav jain jainy_gau...@yahoo.com wrote:
>
>> I do have that. However, I am not writing directly to the table partition. Instead, I first write my data to a tmp directory (eventually moved to the HDFS table partition) and then publish that partition using an ALTER TABLE statement in the metastore. Something like this:
>>
>>   -- create table x ... stored as SeqFile
>>   -- insert overwrite directory 'd' select * from table y
>>   -- distcp 'd' x/dateint=.../hour=...
>>   -- alter table x add partition
>>
>> In the second step above I need to produce SeqFile.
>
> If you are inserting into the directory rather than the table, Hive won't know to look at the metadata description of the table. You need something like:
>
>   insert overwrite table x select * from table y
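
A minimal HiveQL sketch of the two suggestions above (the temp-table workaround and inserting into a table rather than a directory). The table name tmp_seq and the columns id and payload are hypothetical, since the thread does not give the schema of x or y, and the partition values are left as placeholders because the thread elides them.

```sql
-- Hypothetical staging table; the columns are assumptions and must
-- match the output of the SELECT below.
CREATE TABLE tmp_seq (id BIGINT, payload STRING)
STORED AS SEQUENCEFILE;

-- Compression settings quoted from the original question.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Writing into a table (not a directory) lets Hive use the table's
-- declared storage format for the output files.
INSERT OVERWRITE TABLE tmp_seq
SELECT id, payload FROM y;

-- After copying the part files from tmp_seq's location into the target
-- partition directory (for example with distcp, as in the original
-- workflow), publish the partition. <dateint> and <hour> are placeholders.
-- ALTER TABLE x ADD PARTITION (dateint=<dateint>, hour=<hour>);
```

The staging table here is unpartitioned, so each INSERT OVERWRITE replaces its previous contents. That is one possible way to sidestep the "partition already exists" issue mentioned above, not necessarily what the posters had in mind.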
Re: How to output SeqFile
Can you try

  set hive.query.result.fileformat=sequencefile;

If that does not work, you can also try

  set hive.default.fileformat=sequencefile;

Thanks,
yongqiang

On Wed, Oct 6, 2010 at 2:29 PM, gaurav jain jainy_gau...@yahoo.com wrote:

> Thanks Yang. I thought about it as well. But as you said, it's a hack.
>
> hive-dev@, can you please verify if this is possible?
Re: How to output SeqFile
I tried your suggestions with the config

  set hive.query.result.fileformat=sequencefile;

and then (separately)

  set hive.default.fileformat=sequencefile;

It does not work. As per the docs, I think these options are only applied for CREATE TABLE queries?

Any other suggestion would be helpful.

gaurav jain

----- Original Message -----
From: yongqiang he heyongqiang...@gmail.com
To: hive-dev@hadoop.apache.org
Sent: Wed, October 6, 2010 6:34:53 PM
Subject: Re: How to output SeqFile

Can you try set hive.query.result.fileformat=sequencefile; if that does not work, you can also try set hive.default.fileformat=sequencefile;

Thanks,
yongqiang
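
To illustrate the point about CREATE TABLE, here is a small HiveQL sketch of where hive.default.fileformat does take effect: it supplies the storage format for a CREATE TABLE statement that omits STORED AS. The table name tmp_default and its columns are hypothetical, and the exact metadata output may differ between Hive versions.

```sql
-- Default storage format for CREATE TABLE statements that omit STORED AS.
SET hive.default.fileformat=SequenceFile;

-- Hypothetical table; there is no STORED AS clause, so the default
-- set above is what determines the storage format.
CREATE TABLE tmp_default (id BIGINT, payload STRING);

-- Inspect the table metadata. The input format would be expected to be
-- org.apache.hadoop.mapred.SequenceFileInputFormat rather than a text format.
DESCRIBE EXTENDED tmp_default;
```

A table created this way can then be the target of INSERT OVERWRITE TABLE, which is consistent with the observation above that the setting alone does not change what INSERT OVERWRITE DIRECTORY writes.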