I first saw this using Spark SQL, but the result is the same with plain
Spark.
14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
Full stack trace below.
I tried many different things, without luck:
* extracted libsnappyjava.so from the Spark assembly and put it on the
library path
* added -Djava.library.path=... to SPARK_MASTER_OPTS and SPARK_WORKER_OPTS
* added the library path to SPARK_LIBRARY_PATH
* added the Hadoop native library path to SPARK_LIBRARY_PATH
* rebuilt Spark against different versions (previous and next) of Snappy
(as suggested by various search results)
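For reference, the settings I tried looked roughly like the fragment below in conf/spark-env.sh (the paths are illustrative placeholders, not my actual ones):

```shell
# Illustrative spark-env.sh fragment -- paths are examples only.
# Point both the master and worker JVMs at the directory containing
# libsnappyjava.so and the Hadoop native libraries.
export SPARK_MASTER_OPTS="-Djava.library.path=/opt/hadoop/lib/native"
export SPARK_WORKER_OPTS="-Djava.library.path=/opt/hadoop/lib/native"
export SPARK_LIBRARY_PATH="/opt/hadoop/lib/native:/opt/snappy/lib"
```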
Env:
* CentOS 6.4
* Hadoop 2.3 (CDH5.1)
* Running in standalone/local mode
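In case it helps with diagnosis, Hadoop ships a checknative tool (present in recent Hadoop 2.x / CDH5 releases, if available in your version) that reports whether the native Snappy bindings are visible to the client:

```shell
# Lists which native libraries the Hadoop client can load
# (hadoop, zlib, snappy, lz4, bzip2). A "false" next to snappy
# would be consistent with the UnsatisfiedLinkError above.
hadoop checknative -a
```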
Any help would be appreciated.
Thank you,
Stephane
scala> import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.io.BytesWritable

scala> import org.apache.hadoop.io.Text
import org.apache.hadoop.io.Text

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable

scala> var seq = sc.sequenceFile[NullWritable,Text]("/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/00_0").map(_._2.toString())
14/11/07 19:46:19 INFO MemoryStore: ensureFreeSpace(157973) called with curMem=0, maxMem=278302556
14/11/07 19:46:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 154.3 KB, free 265.3 MB)
seq: org.apache.spark.rdd.RDD[String] = MappedRDD[2] at map at <console>:15
scala> seq.collect().foreach(println)
14/11/07 19:46:35 INFO FileInputFormat: Total input paths to process : 1
14/11/07 19:46:35 INFO SparkContext: Starting job: collect at <console>:18
14/11/07 19:46:35 INFO DAGScheduler: Got job 0 (collect at <console>:18) with 2 output partitions (allowLocal=false)
14/11/07 19:46:35 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:18)
14/11/07 19:46:35 INFO DAGScheduler: Parents of final stage: List()
14/11/07 19:46:35 INFO DAGScheduler: Missing parents: List()
14/11/07 19:46:35 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at <console>:15), which has no missing parents
14/11/07 19:46:35 INFO MemoryStore: ensureFreeSpace(2928) called with curMem=157973, maxMem=278302556
14/11/07 19:46:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 265.3 MB)
14/11/07 19:46:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[2] at map at <console>:15)
14/11/07 19:46:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/07 19:46:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/11/07 19:46:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/00_0:6504064+6504065
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/00_0:0+6504064
14/11/07 19:46:36 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/07 19:46:36 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/07 19:46:36 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/07 19:46:36 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/07 19:46:36 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/07 19:46:36 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
    at