Looks like a jar conflict to me:

java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()

You have multiple versions of the same jars on the classpath.
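A quick way to confirm is to list every copy of FileSystem.class the driver's classloader can see. A minimal sketch (paste it into a spark-shell on the affected node); more than one URL printed usually means duplicate hadoop-common versions:

    import scala.collection.JavaConverters._

    // Every jar / classes dir that supplies org.apache.hadoop.fs.FileSystem.
    // More than one entry here means conflicting hadoop-common jars.
    getClass.getClassLoader
      .getResources("org/apache/hadoop/fs/FileSystem.class")
      .asScala
      .foreach(println)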
Thanks
Best Regards

On Wed, Jul 1, 2015 at 6:58 AM, nkd <kalidas.nimmaga...@gmail.com> wrote:

> I am running a Spark application in a standalone cluster on a Windows 7
> environment. Following are the details:
>
> Spark version = 1.4.0
> Windows / standalone mode
>
> I built Hadoop 2.6.0 on Windows and set the env params like so:
> HADOOP_HOME = E:\hadooptar260\hadoop-2.6.0
> HADOOP_CONF_DIR = E:\hadooptar260\hadoop-2.6.0\etc\hadoop   // where core-site.xml resides
> and added E:\hadooptar260\hadoop-2.6.0\bin to the path.
>
> Note: I am not starting Hadoop. I wanted to ensure that the Hadoop libraries
> are made available to Spark, especially that hadoop-hdfs.jar and
> hadoop-common.jar are on the classpath and winutils is on the system path.
>
> @rem start the master
> spark-class2.cmd org.apache.spark.deploy.master.Master --host machine1.QQQ.HYD --port 7077
>
> @rem start a worker. This worker runs on the same machine as the master
> spark-class2.cmd org.apache.spark.deploy.worker.Worker spark://machine1.QQQ.HYD:7077
>
> @rem start a worker. This worker runs on a second machine
> spark-class2.cmd org.apache.spark.deploy.worker.Worker spark://machine1.QQQ.HYD:7077
>
> @rem start the app. This command is run from the machine where the master
> @rem and the first worker are running
> spark-submit2 --verbose --jars /app/lib/ojdbc7.jar
>   --driver-class-path /app/lib/ojdbc7.jar
>   --driver-library-path /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin
>   --class "org.ETLProcess" --name MyETL --master spark://machine1.QQQ.HYD:7077
>   --deploy-mode client /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
>
> @rem to avoid the NoSuchMethodException, tried the following
> spark-submit2 --verbose
>   --jars /app/lib/ojdbc7.jar,/app/lib/hadoop-common-2.6.0.jar,/app/lib/hadoop-hdfs-2.6.0.jar
>   --driver-class-path /app/lib/ojdbc7.jar
>   --driver-library-path /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin
>   --class "org.dwh.oem.transform.ETLProcess" --name SureETL
>   --master spark://machine1.QQQ.HYD:7077
>   --deploy-mode client /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
>
> The ETL job above completes successfully, fetching the data from the db and
> storing it as JSON files on each of the worker nodes.
>
> *On the first node the files are properly committed: I can see the
> _temporary folder being removed and the _SUCCESS marker written.*
>
> *The issue is that the files on the second node remain in the _temporary
> folder, making them unusable for further jobs. Help required to overcome
> this issue.*
>
> This is line 176 of SparkHadoopUtil.scala, where the exception below occurs:
>
>   private def getFileSystemThreadStatistics(): Seq[AnyRef] = {
>     val stats = FileSystem.getAllStatistics()
>     stats.map(Utils.invoke(classOf[Statistics], _, "getThreadStatistics"))  // <== line 176
>   }
>
> Following are extracts from the log, which also contains these exceptions:
>
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
>
> java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
>
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
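Those two NoSuchMethodExceptions are exactly what you'd see if an older hadoop-common is shadowing your 2.6.0 build: the thread-level statistics methods only exist in newer Hadoop releases (around 2.5, IIRC), which is why Spark probes for them reflectively and only logs a DEBUG when they're missing. A hedged probe to see which Statistics class actually got loaded and whether it has the method (getCodeSource can be null for bootstrap classes, but Hadoop classes come off the app classpath):

    // Which jar supplied FileSystem$Statistics, and does it have the
    // thread-statistics API that Spark 1.4 reflects on?
    val stats = Class.forName("org.apache.hadoop.fs.FileSystem$Statistics")
    println("loaded from: " + stats.getProtectionDomain.getCodeSource.getLocation)
    try {
      stats.getDeclaredMethod("getThreadStatistics")
      println("getThreadStatistics(): present")
    } catch {
      case _: NoSuchMethodException =>
        println("getThreadStatistics(): missing -- an older hadoop-common won")
    }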
> -----------------------------------------------
>
> 2015-06-30 15:55:48 DEBUG NativeCodeLoader:46 - Trying to load the custom-built native-hadoop library...
> 2015-06-30 15:55:48 DEBUG NativeCodeLoader:50 - Loaded the native-hadoop library
> 2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMapping:50 - Using JniBasedUnixGroupsMapping for Group resolution
> 2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMappingWithFallback:44 - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
> 2015-06-30 15:55:48 DEBUG Groups:80 - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
> 2015-06-30 15:55:48 DEBUG UserGroupInformation:193 - hadoop login
> 2015-06-30 15:55:48 DEBUG UserGroupInformation:142 - hadoop login commit
>
> -----------------------------------------------
>
> 2015-06-30 15:55:50 DEBUG Master:56 - [actor] received message RegisterApplication(ApplicationDescription(SureETL)) from Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
> 2015-06-30 15:55:50 INFO Master:59 - Registering app SureETL
> 2015-06-30 15:55:50 INFO Master:59 - Registered app SureETL with ID app-20150630155550-0001
> 2015-06-30 15:55:50 INFO Master:59 - Launching executor app-20150630155550-0001/0 on worker worker-20150630154548-172.16.11.212-59791
> 2015-06-30 15:55:50 INFO Master:59 - Launching executor app-20150630155550-0001/1 on worker worker-20150630155002-172.16.11.133-61908
> 2015-06-30 15:55:50 DEBUG Master:62 - [actor] handled message (8.672752 ms) RegisterApplication(ApplicationDescription(SureETL)) from Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
>
> -----------------------------------------------
>
> 2015-06-30 15:56:02 DEBUG Server:228 - rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@604d28c6
> 2015-06-30 15:56:02 DEBUG Client:63 - getting client out of cache: org.apache.hadoop.ipc.Client@1511d157
> 2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:56 - [actor] received message AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
> 2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:63 - Received RPC message: AkkaMessage(ReviveOffers,false)
> 2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:62 - [actor] handled message (1.73455 ms) AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
> 2015-06-30 15:56:03 DEBUG BlockReaderLocal:105 - Both short-circuit local reads and UNIX domain socket are disabled.
> 2015-06-30 15:56:03 DEBUG PairRDDFunctions:63 - Saving as hadoop file of type (NullWritable, Text)
> 2015-06-30 15:56:03 DEBUG HadoopRDD:84 - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Unknown Source)
>         at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
>         at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
>         at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
>         at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>         at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:61)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1093)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
>         at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
>         at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
>         at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
>         at org.apache.spark.sql.json.DefaultSource.createRelation(JSONRelation.scala:99)
>         at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:305)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>         at org.dwh.oem.extract.OrderLookupExtractor$.orderLookupExtractionProcss(OrderingLookupExtractor.scala:61)
>         at org.dwh.oem.transform.ETLProcess$.main(ETLProcess.scala:33)
>         at org.dwh.oem.transform.ETLProcess.main(ETLProcess.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
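This ClassNotFoundException is the same story from the mapreduce side: InputSplitWithLocationInfo only ships with the newer mapreduce client jars, and Spark deliberately falls back to the old location-info code when it's absent (hence the DEBUG rather than a failure). To see the Hadoop jars each JVM was actually started with, a sketch like this (run as-is it prints the driver's view; wrapped in rdd.foreachPartition, the same listing lands in each executor's stderr log):

    import java.io.File
    import java.lang.management.ManagementFactory

    // List every hadoop-* entry on this JVM's classpath; mixed version
    // numbers here would explain the missing class.
    ManagementFactory.getRuntimeMXBean.getClassPath
      .split(File.pathSeparator)
      .filter(_.toLowerCase.contains("hadoop"))
      .foreach(println)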
> 2015-06-30 15:56:03 INFO deprecation:1009 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
> 2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
> 2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 2015-06-30 15:56:03 INFO deprecation:1009 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure <function2> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13}) +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.serialVersionUID
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1 org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.$outer
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final org.apache.spark.SerializableWritable org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.wrappedConf$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final org.apache.spark.SparkHadoopWriter org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.writer$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final java.lang.Object org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(java.lang.Object,java.lang.Object)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1 org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$anonfun$$$outer()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      <function0>
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + populating accessed fields because this is the starting closure
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by starting closure: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class org.apache.spark.rdd.PairRDDFunctions,Set())
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a closure, so do not clone it: (class org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cloning the object <function0> of class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cleaning cloned closure <function0> recursively (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure <function0> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1}) +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.serialVersionUID
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final org.apache.spark.rdd.PairRDDFunctions org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.$outer
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final org.apache.hadoop.mapred.JobConf org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.conf$4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final java.lang.Object org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public org.apache.spark.rdd.PairRDDFunctions org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$$outer()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 5
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$apply$mcV$sp$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by starting closure: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class org.apache.spark.rdd.PairRDDFunctions,Set())
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a closure, so do not clone it: (class org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ closure <function0> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1) is now cleaned +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ closure <function2> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13) is now cleaned +++
> 2015-06-30 15:56:03 INFO SparkContext:59 - Starting job: save at OrderingLookupExtractor.scala:61
>
> -----------------------------------------------------------------------------------------
>
> 2015-06-30 15:56:11 DEBUG SparkHadoopUtil:84 - Couldn't find method for retrieving thread-level FileSystem output data
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
>         at java.lang.Class.getDeclaredMethod(Unknown Source)
>         at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:182)
>         at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:162)
>         at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> 2015-06-30 15:56:11 DEBUG HadoopRDD:84 - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Unknown Source)
>         at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
>         at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
>         at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
>         at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>         at org.apache.spark.SparkHadoopWriter.setup(SparkHadoopWriter.scala:70)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1103)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> 2015-06-30 15:56:11 DEBUG NativeIO:191 - Initialized cache for IDs to User/Group mapping with a cache timeout of 14400 seconds.
> 2015-06-30 15:56:11 INFO JDBCRDD:59 - closed connection
> 2015-06-30 15:56:11 INFO FileOutputCommitter:439 - Saved output of task 'attempt_201506301556_0000_m_000000_0' to file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list/_temporary/0/task_201506301556_0000_m_000000
> 2015-06-30 15:56:11 INFO SparkHadoopMapRedUtil:59 - attempt_201506301556_0000_m_000000_0: Committed
> 2015-06-30 15:56:11 INFO JDBCRDD:59 - closed connection
> 2015-06-30 15:56:11 INFO Executor:59 - Finished task 0.0 in stage 0.0 (TID 0). 624 bytes result sent to driver
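Since the first node commits cleanly (the FileOutputCommitter lines above) while the second doesn't, it may also help to check each node's output directory programmatically after the job: a committed save should have a _SUCCESS marker and no leftover _temporary. A minimal sketch with the Hadoop FileSystem API, using the path from your log (adjust per node):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Verify that a saveAsTextFile/JSON output directory was fully committed.
    val out = new Path("file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list")
    val fs  = FileSystem.get(out.toUri, new Configuration())
    val committed = fs.exists(new Path(out, "_SUCCESS")) &&
                    !fs.exists(new Path(out, "_temporary"))
    println(s"$out committed: $committed")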
> --------------------------------------------------------------------------------------
>
> 2015-06-30 15:57:03 DEBUG SparkHadoopUtil:84 - Couldn't find method for retrieving thread-level FileSystem output data
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>         at java.lang.Class.getDeclaredMethod(Unknown Source)
>         at org.apache.spark.util.Utils$.invoke(Utils.scala:2069)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:750)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:176)
>         at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:161)
>         at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
>
> -----------------------------------------------
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/output-folder-structure-not-getting-commited-and-remains-as-temporary-tp23557.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org