RE: help with compression and index

2012-02-21 Thread Hamilton, Robert (Austin)
The automatic index handling will be very cool. I'm now testing 0.8.1 on our 
system and will see how it goes.
Thanks Mark and Bejoy!


-----Original Message-----
From: Mark Grover [mailto:mgro...@oanda.com] 
Sent: Tuesday, February 21, 2012 4:03 PM
To: user@hive.apache.org
Subject: Re: help with compression and index

Hi Robert,
As per https://issues.apache.org/jira/browse/HIVE-1644, Hive 0.8 introduces 
automatic use of indexes. That might come in handy too!
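A minimal sketch of what that might look like, assuming Hive 0.8 or later, a compact index already built on the filtered column, and that `hive.optimize.index.filter` is the switch for this feature (check the configuration docs for your exact version). `detail_table`, `key_col` and `'some_key'` are hypothetical names:

```sql
-- Sketch only: assumes Hive 0.8+ and an existing compact index on key_col.
-- detail_table, key_col and 'some_key' are hypothetical placeholders.

-- Allow the optimizer to consult indexes for filter predicates (HIVE-1644).
SET hive.optimize.index.filter=true;

-- A plain filtered select; with automatic index use there should be no need
-- to set hive.index.compact.file or hive.input.format by hand.
SELECT *
FROM detail_table
WHERE key_col = 'some_key';
```

Hive also has a minimum input size below which it does not use a compact index automatically (`hive.optimize.index.filter.compact.minsize`), so a small test table may not exercise the index at all.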

Mark

Mark Grover, Business Intelligence Analyst OANDA Corporation 

www: oanda.com www: fxtrade.com
e: mgro...@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "Bejoy Ks" 
To: user@hive.apache.org
Sent: Tuesday, February 21, 2012 11:47:56 AM
Subject: Re: help with compression and index

Hi Hamilton
When you generate the index files, is compression enabled? If so, you are 
running into this known issue: 
https://issues.apache.org/jira/browse/HIVE-2331


It is fixed in Hive 0.8. An upgrade should get it working for you, and is 
recommended.


Regards
Bejoy.K.S 

From: "Hamilton, Robert (Austin)"  
To: "user@hive.apache.org"  
Sent: Tuesday, February 21, 2012 8:48 PM 
Subject: help with compression and index 

Hi all. I sent this to common-user@hadoop hoping there was an easy answer, but 
got no response.

I have a couple of users who basically have no use case other than the need to 
extract specific rows based on a predetermined set of keys, so I would like to 
just provide them with an index and show them how to join to the detail table 
using the index. So I'm looking for a reliable way to combine compression and 
indexing in Hive. To give an idea of the data size: my files add up to about 
80 TB uncompressed, currently gzipped down to only 10 TB. I need to keep it 
small(ish) until I can get more disk space, so it has to stay compressed.

I don't mind recompressing to LZO or bzip, but I need to prove that it would 
actually work first :)

I've done my testing on LZO and uncompressed test samples. If I use 
uncompressed files, the indexed select works OK. If I use LZO, it returns only a 
fraction of the rows I expect. I gather that files compressed with other 
methods cannot be indexed at all with Hive 0.7.1?

I'm following the prescribed procedure: select the bucket names/offsets into 
a temporary file, set hive.index.compact.file to that file, set 
hive.input.format to HiveCompactIndexInputFormat, and run my select. That 
doesn't let me use subselects, but I don't mind, as this is only a very limited 
use case that I need to support.
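Roughly, that procedure looks like this in HiveQL. It is a sketch modelled on the compact-index steps as documented for Hive 0.7, not a tested recipe: `detail_table`, `key_col`, `key_col_idx`, `'some_key'` and `/tmp/key_col_idx_result` are hypothetical names, and the index table name assumes Hive's `default__<table>_<index>__` naming convention.

```sql
-- Sketch of the manual compact-index lookup on Hive 0.7.x.
-- All table, column, index and path names here are hypothetical.

-- One-time setup: build a compact index on the lookup column.
CREATE INDEX key_col_idx ON TABLE detail_table (key_col)
AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX key_col_idx ON detail_table REBUILD;

-- Step 1: write the bucket (file) names and offsets for the wanted key
-- into a temporary directory.
INSERT OVERWRITE DIRECTORY '/tmp/key_col_idx_result'
SELECT `_bucketname`, `_offsets`
FROM default__detail_table_key_col_idx__
WHERE key_col = 'some_key';

-- Step 2: point the compact-index input format at that result.
SET hive.index.compact.file=/tmp/key_col_idx_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- Step 3: run the select. The filter is repeated because the index only
-- narrows which parts of the files are read; those parts can still hold
-- rows for other keys.
SELECT *
FROM detail_table
WHERE key_col = 'some_key';
```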

This is the only method I could find documented on the net. Is there a better 
way to do this? I don't mind upgrading Hive (currently on 0.7.1) or Hadoop 
(currently on 0.20.2).
