I first saw this using SparkSQL but the result is the same with plain Spark.
14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)

Full stack below ....

I tried many different things without luck:
* Extracted libsnappyjava.so from the Spark assembly and put it on the library path
* Added -Djava.library.path=... to SPARK_MASTER_OPTS and SPARK_WORKER_OPTS
* Added the library path to SPARK_LIBRARY_PATH
* Added the Hadoop library path to SPARK_LIBRARY_PATH
* Rebuilt Spark with different versions (previous and next) of Snappy (as seen when Googling)

Env:
CentOS 6.4
Hadoop 2.3 (CDH5.1)
Running in standalone/local mode

Any help would be appreciated.

Thank you
Stephane

scala> import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.io.BytesWritable

scala> import org.apache.hadoop.io.Text
import org.apache.hadoop.io.Text

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable

scala> var seq = sc.sequenceFile[NullWritable,Text]("/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0").map(_._2.toString())
14/11/07 19:46:19 INFO MemoryStore: ensureFreeSpace(157973) called with curMem=0, maxMem=278302556
14/11/07 19:46:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 154.3 KB, free 265.3 MB)
seq: org.apache.spark.rdd.RDD[String] = MappedRDD[2] at map at <console>:15

scala> seq.collect().foreach(println)
14/11/07 19:46:35 INFO FileInputFormat: Total input paths to process : 1
14/11/07 19:46:35 INFO SparkContext: Starting job: collect at <console>:18
14/11/07 19:46:35 INFO DAGScheduler: Got job 0 (collect at <console>:18) with 2 output partitions (allowLocal=false)
14/11/07 19:46:35 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:18)
14/11/07 19:46:35 INFO DAGScheduler: Parents of final stage: List()
14/11/07 19:46:35 INFO DAGScheduler: Missing parents: List()
14/11/07 19:46:35 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at <console>:15), which has no missing parents
14/11/07 19:46:35 INFO MemoryStore: ensureFreeSpace(2928) called with curMem=157973, maxMem=278302556
14/11/07 19:46:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 265.3 MB)
14/11/07 19:46:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[2] at map at <console>:15)
14/11/07 19:46:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/07 19:46:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/11/07 19:46:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0:6504064+6504065
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0:0+6504064
14/11/07 19:46:36 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/07 19:46:36 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/07 19:46:36 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/07 19:46:36 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/07 19:46:36 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/07 19:46:36 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
    org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 ERROR TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
14/11/07 19:46:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/11/07 19:46:36 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor localhost: java.lang.UnsatisfiedLinkError (org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z)  [duplicate 1]
14/11/07 19:46:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
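
Since several of the attempts above depend on what actually ends up on java.library.path, a quick check from the same spark-shell session may help narrow things down. The following is only a rough sketch: findNativeLibs is a hypothetical helper (not part of Spark or Hadoop), and the file names libhadoop.so and libsnappy.so are assumptions about a typical Linux install, not something confirmed for this setup. It only inspects the JVM it runs in (the driver, which in local mode also runs the tasks), and it only tests whether the files exist, not whether they load or were built with Snappy support.

```scala
import java.io.File

// Hypothetical helper: for every directory on a library path string,
// report whether each named file exists in that directory.
def findNativeLibs(libraryPath: String, names: Seq[String]): Seq[(File, Boolean)] =
  for {
    dir  <- libraryPath.split(File.pathSeparator).toSeq if dir.nonEmpty
    name <- names
  } yield {
    val f = new File(dir, name)
    (f, f.isFile)
  }

// Assumed file names for the Hadoop native library and the system Snappy library.
val candidates = Seq("libhadoop.so", "libsnappy.so")

// The library path as seen by the current JVM.
val path = sys.props.getOrElse("java.library.path", "")
println(s"java.library.path = $path")

findNativeLibs(path, candidates).foreach { case (f, found) =>
  println(s"$f -> ${if (found) "found" else "missing"}")
}
```

If every entry comes back as missing, the -Djava.library.path and SPARK_LIBRARY_PATH settings are probably not reaching this JVM at all, which would be a different problem from a library that is found but fails to link.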