[ https://issues.apache.org/jira/browse/SPARK-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5350.
------------------------------
    Resolution: Not a Problem

Hadoop dependencies should be 'provided' as well, so that you pick up the cluster's version when you run spark-submit. Use mvn dependency:tree to understand where other 1.x dependencies could be coming from. I think this is a question that could continue on the mailing list if needed, as so far it does not look like any Spark issue. It can be reopened if that proves incorrect.
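As a rough sketch of that scoping (the artifact IDs and versions below are illustrative placeholders, not taken from the reporter's build), the application's pom.xml would declare the Spark and Hadoop dependencies with 'provided' scope, so the jars shipped with the cluster are used at runtime instead of being bundled into the application:

{code:xml}
<!-- Illustrative only: adjust artifact IDs and versions to match your cluster. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>  <!-- supplied by spark-submit at runtime, not packaged -->
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>  <!-- match the Hadoop version of the target cluster -->
  <scope>provided</scope>
</dependency>
{code}

Running mvn dependency:tree on the project then shows which declared dependency is transitively pulling in the remaining Hadoop 1.x artifacts.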
> There are issues when combining Spark and CDK (https://github.com/egonw/cdk).
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5350
>                 URL: https://issues.apache.org/jira/browse/SPARK-5350
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.1, 1.2.0
>         Environment: Running Spark on a local computer, using both Mac OS X and a VM with Linux Ubuntu.
>            Reporter: Staffan Arvidsson
>
> I'm using Maven and Eclipse to build my project. When I import the CDK (https://github.com/egonw/cdk) jar files that I need, set up the SparkContext, and try, for instance, to read a file (simply "val lines = sc.textFile(filePath)"), I get the following errors in the log:
> {quote}
> [main] DEBUG org.apache.spark.rdd.HadoopRDD - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:191)
>   at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:381)
>   at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391)
>   at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:390)
>   at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>   at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:159)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
>   at org.apache.spark.rdd.RDD.foreach(RDD.scala:765)
> {quote}
> Later in the log:
> {quote}
> [Executor task launch worker-0] DEBUG org.apache.spark.deploy.SparkHadoopUtil - Couldn't find method for retrieving thread-level FileSystem input data
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>   at java.lang.Class.getDeclaredMethod(Class.java:2009)
>   at org.apache.spark.util.Utils$.invoke(Utils.scala:1733)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:178)
>   at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:138)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> There have also been issues related to "HADOOP_HOME" not being set, but these seem to be intermittent and only occur sometimes.
> After testing different versions of both CDK and Spark, I've found that Spark version 0.9.1 seems to get things working. This will not solve my problem, though, as I will later need functionality from MLlib that is only available in newer versions of Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org