Re: Using snappy compresscodec in hive
Hi Gopal,

Thanks for your reply! One more question: is the effect of using the pure-java version the same as that of using SnappyCodec? In other words, is there any difference between these two methods in terms of the compression result and effect?

Looking forward to your reply and help.

Best,
Zhefu Peng

------------------ Original ------------------
From: "Gopal Vijayaraghavan";
Date: 2018-07-24 (Tue) 10:53
To: "user@hive.apache.org";
Subject: Re: Using snappy compresscodec in hive

> "TBLPROPERTIES ("orc.compress"="Snappy"); "

That doesn't use the Hadoop SnappyCodec, but uses a pure-java version (which is slower, but always works). The Hadoop SnappyCodec needs libsnappy installed on all hosts.

Cheers,
Gopal
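For reference, the two paths Gopal contrasts can be sketched in HiveQL as follows (table names and columns here are hypothetical, and the textfile settings assume the default MapReduce execution path):

```sql
-- ORC with Snappy: compression is handled inside the ORC writer itself,
-- which falls back to a pure-java Snappy when native libsnappy is absent.
CREATE TABLE t_orc (id INT, msg STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Textfile with the Hadoop SnappyCodec: this goes through the Hadoop
-- compression codec framework, so native libsnappy must be installed
-- on every node in the cluster.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
CREATE TABLE t_text (id INT, msg STRING)
STORED AS TEXTFILE;
```

This is why the ORC table property "always works" while the textfile route fails on a cluster without the native library.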
Using snappy compresscodec in hive
Hi,

Here is a confusion I encountered these days: I did not install or build snappy on my Hadoop cluster, but I tested and compared the compression ratios of the Parquet and ORC storage formats. During the test, I could set the compression for both storage formats, for example using "TBLPROPERTIES ("orc.compress"="Snappy");" or "set parquet.compression=snappy;", and both commands worked. However, when I want to compress the textfile format with snappy compression, it says "can not find or access the snappy library". I wonder why this happens, and I really doubt whether the ORC or Parquet files are actually using "Snappy" compression. But the storage really does become smaller, and differs from the "gzip" or "zlib" compression.

Looking forward to your reply and help.

Best,
Zhefu Peng
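One way to check whether Snappy actually took effect, without relying on file size alone, is a sketch like the following (assuming a hypothetical table name `t_parquet`; the Parquet writer bundles its own Snappy implementation, so no system libsnappy is required):

```sql
-- Parquet with Snappy: no native libsnappy needed on the cluster.
SET parquet.compression=SNAPPY;
CREATE TABLE t_parquet (id INT, msg STRING)
STORED AS PARQUET;

-- DESCRIBE FORMATTED shows the table's storage and property details,
-- so the configured compression can be confirmed from the metadata
-- rather than inferred only from smaller files on HDFS.
DESCRIBE FORMATTED t_parquet;
```

Comparing the resulting file sizes against a ZLIB/GZIP variant of the same data, as described above, is a reasonable secondary check that the codec was applied.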
Re: Does Hive 3.0 only work with hadoop3.x.y?
Hi Sungwoo,

Just want to confirm: does that mean I only need to update the Hive version, without updating the Hadoop version? Thanks!

Best,
Zhefu Peng

------------------ Original ------------------
From: "Sungwoo Park";
Date: 2018-07-19 (Thu) 8:20
To: "user";
Subject: Re: Does Hive 3.0 only works with hadoop3.x.y?

Hive 3.0 makes a few function calls that depend on Hadoop 3.x, but they are easy to replace with code that compiles okay on Hadoop 2.8+. I am currently running Hive 3.x on Hadoop 2.7.6 and HDP 2.6.4 to test with the TPC-DS benchmark, and have not encountered any compatibility issues yet. I previously posted a diff file that lets us compile Hive 3.x on Hadoop 2.8+.

http://mail-archives.apache.org/mod_mbox/hive-user/201806.mbox/%3CCAKHFPXDDFn52buKetHzSXTtjzX3UMHf%3DQvxm9QNNkv9r5xBs-Q%40mail.gmail.com%3E

--- Sungwoo Park

On Thu, Jul 19, 2018 at 8:21 PM, Zhefu Peng <461292...@qq.com> wrote:

Hi,

I have already deployed Hive 2.2.0 on our Hadoop cluster. Recently, we deployed a Spark cluster at version 2.3.0, aiming to use the Hive-on-Spark engine. However, when I checked the Hive release page, I found the text below:

21 May 2018: release 3.0.0 available
This release works with Hadoop 3.x.y.

The Hadoop version we have deployed is 2.7.6. I wonder, does Hive 3.0 only work with Hadoop 3.x.y? Or, if we want to use Hive 3.0, do we have to upgrade Hadoop to 3.x.y?

Looking forward to your reply and help.

Best,
Zhefu Peng