My code:

    val dwsite = sc.sequenceFile("/sys/edw/dw_sites/snapshot/2015/10/18/00/part-r-00000",
      classOf[Text], classOf[Text])
    // Keep only records whose value mentions "Bhutan"
    val records = dwsite.filter { case (k, v) => v.toString.contains("Bhutan") }
    records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
      classOf[org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat[Text, Text]])

Error:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0
    failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 4, localhost):
    java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without
    native-hadoop code!

I cannot install any libraries on this machine or on the cluster, because I do not have write access of any kind. I am thinking of switching to a different compression codec and re-running the same program in the hope that it works, so I added:

    sc.getConf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec")

but I still get the same error, which implies that this line did not change the compression codec used by the sequence-file output format. What is the fix? Any suggestions? I appreciate your time.

-- Deepak
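For reference, here is the alternative I was planning to try next: passing the codec through a per-job Hadoop Configuration on the save call itself, instead of mutating the SparkConf after the context has been created. This is only a sketch; I have not confirmed that this overload actually overrides the codec on this cluster, and the Snappy codec choice is just the one from my attempt above.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

    // Copy the context's Hadoop configuration and override the output codec
    // using the new-API (mapreduce.*) property names.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true")
    hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
      "org.apache.hadoop.io.compress.SnappyCodec")

    // saveAsNewAPIHadoopFile has an overload that accepts a Configuration,
    // so the codec setting travels with this one job.
    records.saveAsNewAPIHadoopFile(
      "dw_output12",
      classOf[Text],
      classOf[Text],
      classOf[SequenceFileOutputFormat[Text, Text]],
      hadoopConf)
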