Re: Using snappy compresscodec in hive

2018-07-25 Thread Zhefu Peng
Hi Gopal,


Thanks for your reply! One more question: is the effect of using the pure-java 
version the same as that of using the Hadoop SnappyCodec? In other words, is 
there any difference between the two methods in terms of compression result?


Looking forward to your reply and help.


Best,
Zhefu Peng




------------------ Original ------------------
From: "Gopal Vijayaraghavan";
Date: 2018-07-24 (Tue) 10:53
To: "user@hive.apache.org";
Subject: Re: Using snappy compresscodec in hive




> "TBLPROPERTIES ("orc.compress"="Snappy"); " 

That doesn't use the Hadoop SnappyCodec, but uses a pure-java version (which is 
slower, but always works).

The Hadoop SnappyCodec needs libsnappy installed on all hosts.

Cheers,
Gopal
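Gopal's distinction between the two code paths can be sketched as follows (a 
sketch only; the table name `logs_orc` is made up):

```sql
-- Path 1: ORC's own compression, declared per table.
-- Handled inside the ORC writer, which can fall back to a
-- pure-java Snappy implementation, so it works without libsnappy.
CREATE TABLE logs_orc (id INT, msg STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Path 2: the Hadoop SnappyCodec, enabled per session for plain
-- textfile output. This path needs libsnappy installed on every
-- node, and fails with a missing-library error otherwise.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```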


Using snappy compresscodec in hive

2018-07-23 Thread Zhefu Peng
Hi,


Here is something that has confused me these days: I did not install or build 
snappy on my hadoop cluster, but when I tested and compared the compression 
ratios of the Parquet and ORC storage formats, I could still set the 
compression for both formats, for example using "TBLPROPERTIES 
("orc.compress"="Snappy"); " or "set parquet.compression=snappy;", and both 
commands worked. However, when I tried to compress the textfile format with 
snappy compression, it said "can not find or access the snappy library".


I wonder why this happens, and I really doubt whether the ORC or Parquet files 
are actually using "Snappy" compression. But the storage really does become 
smaller, and it differs from the "gzip" or "zlib" compression.
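One way to check whether the ORC files really declare Snappy compression is to 
inspect the table metadata (a sketch; `my_orc_table` is a hypothetical table 
name):

```sql
-- The "Table Parameters" section of the output lists the
-- declared codec, e.g. a line such as:  orc.compress  SNAPPY
DESCRIBE FORMATTED my_orc_table;
```

The ORC files themselves can also be inspected with `hive --orcfiledump 
<hdfs-path>`, whose header reports the compression kind used when the file was 
written.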


Looking forward to your reply and help.


Best,
Zhefu Peng


Re: Does Hive 3.0 only work with hadoop3.x.y?

2018-07-19 Thread Zhefu Peng
Hi Sungwoo,


Just want to confirm, does that mean I just need to update the hive version, 
without updating the hadoop version?


Thanks!


Best,
Zhefu Peng




------------------ Original ------------------
From: "Sungwoo Park";
Date: 2018-07-19 (Thu) 8:20
To: "user";
Subject: Re: Does Hive 3.0 only work with hadoop3.x.y?



Hive 3.0 makes a few function calls that depend on Hadoop 3.x, but they are easy 
to replace with code that compiles okay on Hadoop 2.8+. I am currently running 
Hive 3.x on Hadoop 2.7.6 and HDP 2.6.4 to test with the TPC-DS benchmark, and 
have not encountered any compatibility issues yet. I previously posted a diff 
file that lets us compile Hive 3.x on Hadoop 2.8+.

http://mail-archives.apache.org/mod_mbox/hive-user/201806.mbox/%3CCAKHFPXDDFn52buKetHzSXTtjzX3UMHf%3DQvxm9QNNkv9r5xBs-Q%40mail.gmail.com%3E
 

--- Sungwoo Park





On Thu, Jul 19, 2018 at 8:21 PM, Zhefu Peng <461292...@qq.com> wrote:
Hi,


I have already deployed Hive 2.2.0 on our hadoop cluster. Recently, we deployed 
a Spark 2.3.0 cluster, aiming to use the Hive-on-Spark engine. However, when I 
checked the Hive releases page, I found the text below:

21 May 2018 : release 3.0.0 available

This release works with Hadoop 3.x.y.

The hadoop version we have deployed is 2.7.6. I wonder: does Hive 3.0 only work 
with hadoop 3.x.y? Or, if we want to use Hive 3.0, do we have to upgrade hadoop 
to 3.x.y?

Looking forward to your reply and help.

Best,

Zhefu Peng