org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi,

I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and 
HBase.
With default setting in Spark 1.0.2, when trying to load a file I got Class 
org.apache.hadoop.io.compress.SnappyCodec not found

Can you please advise how to enable snappy in Spark?

Regards
Arthur
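
For reference, inFILE is a plain text RDD; the thread does not show how it was created, so the following one-liner is an assumed minimal setup (the path is a placeholder, not from the original message):

val inFILE = sc.textFile("hdfs:///path/to/input.txt")  // placeholder path; any text file read through TextInputFormat triggers the codec lookup below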


scala> inFILE.first()
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.RDD.take(RDD.scala:983)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1015)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 55 more
Caused by: java.lang.IllegalArgumentException: Compression codec   org.apache.hadoop.io.compress.SnappyCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi,

my check native result:

hadoop checknative
14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so
zlib:   true /lib64/libz.so.1
snappy: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1
lz4:    true revision:99
bzip2:  false

Any idea how to enable or disable snappy in Spark?

Regards
Arthur


Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 55 more
Caused by: java.lang.IllegalArgumentException: Compression codec   org.apache.hadoop.io.compress.GzipCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 60 more
Caused by: java.lang.ClassNotFoundException: Class   org.apache.hadoop.io.compress.GzipCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... 62 more


Any idea how to fix this issue?
Regards
Arthur



RE: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread linkpatrickliu
Hi,
You can set the settings in conf/spark-env.sh like this:

export SPARK_LIBRARY_PATH=/usr/lib/hadoop/lib/native/
SPARK_JAVA_OPTS+="-Djava.library.path=$SPARK_LIBRARY_PATH "
SPARK_JAVA_OPTS+="-Dspark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec "
SPARK_JAVA_OPTS+="-Dio.compression.codecs=org.apache.hadoop.io.compress.SnappyCodec "
export SPARK_JAVA_OPTS

And make sure the snappy native library is located in the /usr/lib/hadoop/lib/native/ directory.
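
Once those settings are in place, a quick way to verify that the driver JVM can actually see the native library is to ask Hadoop's own loader from spark-shell (a sketch; NativeCodeLoader is the class Hadoop uses internally for this check):

import org.apache.hadoop.util.NativeCodeLoader

// True only if libhadoop.so was found and loaded by this JVM.
if (NativeCodeLoader.isNativeCodeLoaded) {
  // buildSupportsSnappy() is a native method, so only call it once the library is loaded.
  println("snappy supported: " + NativeCodeLoader.buildSupportsSnappy())
} else {
  println("libhadoop.so not loaded; check java.library.path")
}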
Also, if you are using a 64-bit system, remember to re-compile the snappy native library as 64-bit, because the official snappy lib is 32-bit.
$ file /usr/lib/hadoop/lib/native/*
/usr/lib/hadoop/lib/native/libhadoop.a:        current ar archive
/usr/lib/hadoop/lib/native/libhadooppipes.a:   current ar archive
/usr/lib/hadoop/lib/native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
/usr/lib/hadoop/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
/usr/lib/hadoop/lib/native/libhadooputils.a:   current ar archive
/usr/lib/hadoop/lib/native/libhdfs.a:          current ar archive
/usr/lib/hadoop/lib/native/libsnappy.so:       symbolic link to `libsnappy.so.1.1.3'
/usr/lib/hadoop/lib/native/libsnappy.so.1:     symbolic link to `libsnappy.so.1.1.3'
/usr/lib/hadoop/lib/native/libsnappy.so.1.1.3: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
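
To see where the driver JVM actually searches for native libraries, print its search path from spark-shell (on typical Linux setups the JRE's own lib directory ends up on this path, which is why dropping libsnappy.so there can also work, as the reply below reports):

// Directories the JVM scans when loading native libraries such as libsnappy.so.
println(System.getProperty("java.library.path"))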

BRs,
Patrick Liu

On Thu, 28 Aug 2014 18:36:25 -0700, arthur.hk.c...@gmail.com wrote:

Hi, I fixed the issue by copying libsnappy.so to the Java JRE.

Regards
Arthur

On 29 Aug, 2014, at 8:12 am, arthur.hk.c...@gmail.com wrote:

Hi,
If I change my etc/hadoop/core-site.xml from

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>

to

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>

and run the test again, I found that this time it cannot find org.apache.hadoop.io.compress.GzipCodec:
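
Whichever codec is listed first is the one that fails to resolve, which suggests a class-loading problem rather than anything Snappy-specific. A quick check from spark-shell shows whether the codec classes resolve in the driver JVM at all (a sketch; Class.forName only approximates the Configuration classloader lookup that CompressionCodecFactory performs):

// Hypothetical diagnostic: try to resolve each configured codec class and report the missing ones.
Seq("org.apache.hadoop.io.compress.SnappyCodec",
    "org.apache.hadoop.io.compress.GzipCodec",
    "org.apache.hadoop.io.compress.DefaultCodec",
    "org.apache.hadoop.io.compress.BZip2Codec").foreach { name =>
  try { Class.forName(name); println(name + " -> OK") }
  catch { case _: ClassNotFoundException => println(name + " -> NOT FOUND") }
}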
scala> inFILE.first()
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.RDD.take(RDD.scala:983)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1015)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608