Re: How to output SeqFile

2010-10-06 Thread gaurav jain
I do have that.

However, I am not writing directly to the table partition. Instead, I first
write my data to a tmp directory (which is eventually moved to the HDFS table
partition) and then publish that partition using an ALTER TABLE statement in
the metastore.

Something like this:

-- create table x ... stored as SeqFile
-- insert overwrite directory 'd' select * from table y
-- distcp 'd'  x/dateint=.../hour=...
-- alter table x add partition 

In the second step above I need to produce SeqFile.
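
Written out in full, the four steps above might look roughly like the sketch below. The column list, the staging path /tmp/staging/d, the warehouse path, and the partition values are all hypothetical stand-ins for the parts elided in the message, and the distcp step runs from a shell rather than from Hive.

```sql
-- Step 1: target table stored as SequenceFile (columns are hypothetical).
CREATE TABLE x (id BIGINT, payload STRING)
PARTITIONED BY (dateint INT, hour INT)
STORED AS SEQUENCEFILE;

-- Step 2: write the query output to a staging directory (hypothetical path).
-- This is the step reported to produce gzip-compressed text files.
INSERT OVERWRITE DIRECTORY '/tmp/staging/d'
SELECT * FROM y;

-- Step 3 (run from a shell, not from Hive; both paths are hypothetical):
--   hadoop distcp /tmp/staging/d /user/hive/warehouse/x/dateint=20101006/hour=13

-- Step 4: publish the copied directory as a partition in the metastore.
ALTER TABLE x ADD PARTITION (dateint = 20101006, hour = 13)
LOCATION '/user/hive/warehouse/x/dateint=20101006/hour=13';
```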


Thanks for the prompt reply.
Gaurav Jain


----- Original Message -----
From: Yang tedd...@gmail.com
To: jainy_gau...@yahoo.com
Sent: Wed, October 6, 2010 1:28:42 PM
Subject: Re: How to output SeqFile

Gaurav:

Not sure if I understand your question correctly.
When you create the output table, there is an option to set the
output table's SerDe.

Regards
Yang

On Wed, Oct 6, 2010 at 1:18 PM, gaurav jain jainy_gau...@yahoo.com wrote:




 How can I produce a sequence file from an INSERT OVERWRITE DIRECTORY query?


 I have set:

 SET io.seqfile.compression.type=BLOCK;
 SET hive.exec.compress.output=true;
 set mapred.output.compression.type=BLOCK;
 set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;



 It seems to produce gzip-compressed text (.gz) files instead.



 Regards,
 Gaurav Jain







  


Re: How to output SeqFile

2010-10-06 Thread gaurav jain


Thanks Yang. I thought about it as well. But as you said, it's a hack.

hive-dev@, can you please verify if this is possible?



----- Original Message -----
From: Yang tedd...@gmail.com
To: hive-u...@hadoop.apache.org
Sent: Wed, October 6, 2010 1:52:21 PM
Subject: Re: How to output SeqFile

If this is indeed a feature that is still missing, I have a hack:

Create a temp table in SeqFile format and dump your query output into it.
Since you know the table's location, just copy the part files from there,
then delete that partition from the table manually. Of course, you may run
into some issues, such as the partition already existing the next time you
insert into the temp table, so you may need to do an explicit delete from
the temp table too.
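
A minimal sketch of this workaround, under stated assumptions: the staging table name x_staging and its columns are hypothetical, and the table is left unpartitioned, which is a variation on the description above rather than what was proposed.

```sql
-- Staging table in SequenceFile format (name and columns are hypothetical).
CREATE TABLE x_staging (id BIGINT, payload STRING)
STORED AS SEQUENCEFILE;

-- Dump the query output into the staging table. Because the target is a
-- table, Hive writes the files in the table's declared format.
INSERT OVERWRITE TABLE x_staging
SELECT id, payload FROM y;

-- Look up the table's HDFS location, then copy the part files from there
-- with distcp or hadoop fs -cp (run from a shell).
DESCRIBE EXTENDED x_staging;

-- Clean up only after the copy has finished: dropping a managed table
-- also removes its files.
DROP TABLE x_staging;
```

With an unpartitioned staging table, INSERT OVERWRITE replaces the previous contents on each run, so the "partition already exists" concern should not come up. A partitioned staging table would need the explicit partition cleanup described above.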

Y

On Wed, Oct 6, 2010 at 1:46 PM, gaurav jain jainy_gau...@yahoo.com wrote:
 I was hoping there would be a configuration where I can set the output format
 for my query.

 Regards,
 Gaurav Jain



 ----- Original Message -----
 From: Jacob R Rideout apa...@jacobrideout.net
 To: hive-u...@hadoop.apache.org
 Sent: Wed, October 6, 2010 1:42:57 PM
 Subject: Re: How to output SeqFile

 If you are inserting into a directory rather than a table, Hive
 won't know to look at the table's metadata description.

 You need something like:
 insert overwrite table x select * from table y
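
Since table x is described earlier in the thread as partitioned by dateint and hour, a direct insert would presumably also need a partition clause. The sketch below assumes hypothetical partition values and assumes that y's columns line up with x's non-partition columns.

```sql
-- Write straight into one partition of x (partition values are hypothetical).
-- Hive takes the output file format from x's table definition.
INSERT OVERWRITE TABLE x PARTITION (dateint = 20101006, hour = 13)
SELECT * FROM y;
```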








  


Re: How to output SeqFile

2010-10-06 Thread yongqiang he
Can you try:
set hive.query.result.fileformat=sequencefile;

If that does not work, you can also try:
set hive.default.fileformat=sequencefile;

thanks
yongqiang


Re: How to output SeqFile

2010-10-06 Thread gaurav jain
I tried your suggestions, first with

set hive.query.result.fileformat=sequencefile;

and then (separately) with

set hive.default.fileformat=sequencefile;


Neither works.

As per the docs, I think these options are only applied to CREATE TABLE queries?


Any other suggestions would be helpful.
gaurav jain
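
For readers on newer releases: the Hive language manual lists a STORED AS clause on directory inserts, noted there as available from Hive 0.11.0, which is later than the version discussed in this thread. A sketch under that assumption, with a hypothetical staging path:

```sql
-- Assumes a Hive release that accepts STORED AS on directory inserts
-- (documented for 0.11.0 and later); the path is hypothetical.
INSERT OVERWRITE DIRECTORY '/tmp/staging/d'
STORED AS SEQUENCEFILE
SELECT * FROM y;
```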



----- Original Message -----
From: yongqiang he heyongqiang...@gmail.com
To: hive-dev@hadoop.apache.org
Sent: Wed, October 6, 2010 6:34:53 PM
Subject: Re: How to output SeqFile

can you try
set hive.query.result.fileformat=sequencefile;

if not work, you can also try
set hive.default.fileformat=sequencefile;

thanks
yongqiang