Hi guys, I'm running into an issue where my Spark jobs are failing with the error below. I'm using Spark 1.6.0 with CDH 5.13.0.
I've tried to figure it out without success, so I'd appreciate any help or a pointer on how to attack this issue.

User class threw exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, xxxxxx, executor 1):
java.lang.RuntimeException: native-lzo library not available
	at com.hadoop.compression.lzo.LzoCodec.getDecompressorType(LzoCodec.java:193)
	at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1995)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1881)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
	at com.liveperson.dallas.lp.utils.incremental.DallasGenericTextFileRecordReader.initialize(DallasGenericTextFileRecordReader.java:64)
	at com.liveperson.hadoop.fs.inputs.LPCombineFileRecordReaderWrapper.initialize(LPCombineFileRecordReaderWrapper.java:38)
	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initialize(CombineFileRecordReader.java:63)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:168)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

I do see the hadoop-lzo files under GPextras:

ll
total 104
-rw-r--r-- 1 cloudera-scm cloudera-scm 35308 Oct 4 2017 COPYING.hadoop-lzo
-rw-r--r-- 1 cloudera-scm cloudera-scm 62268 Oct 4 2017 hadoop-lzo-0.4.15-cdh5.13.0.jar
lrwxrwxrwx 1 cloudera-scm cloudera-scm    31 May 3 07:23 hadoop-lzo.jar -> hadoop-lzo-0.4.15-cdh5.13.0.jar
drwxr-xr-x 2 cloudera-scm cloudera-scm  4096 Oct 4 2017 native

--
Take Care
Fawze Abujaber
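P.S. For reference, the kind of spark-submit configuration I understand should expose the hadoop-lzo jar and its native .so files to the driver and executors looks roughly like the sketch below. The GPLEXTRAS parcel paths are my assumption based on a default CDH layout, and the class/jar names are placeholders, so please treat this as a sketch rather than something I've confirmed works:

# Sketch only, not a verified fix: put hadoop-lzo.jar on the classpath and
# its native directory on the driver and executor library paths.
# Paths assume the default CDH GPL Extras parcel location; adjust as needed.
spark-submit \
  --master yarn-cluster \
  --jars /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar \
  --driver-library-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native \
  --class com.example.MyJob \
  my-job.jar

Does that look like the right direction, or is something else likely missing on the executor nodes?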