[ 
https://issues.apache.org/jira/browse/SPARK-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5350.
------------------------------
    Resolution: Not a Problem

Hadoop dependencies should be 'provided' as well, so that you pick up the 
cluster's version when you run spark-submit. Use mvn dependency:tree to 
understand where other 1.x dependencies could be coming from.
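
A minimal sketch of what that looks like in the application's pom.xml, assuming the
project declares spark-core and hadoop-client directly (the artifact names and version
numbers below are illustrative, not taken from the reporter's build):

{code:xml}
<!-- Spark and Hadoop are supplied by the cluster at runtime via spark-submit,
     so mark them 'provided' to keep them out of the application jar. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- match the cluster's actual Hadoop version; 2.4.0 is only an example -->
  <version>2.4.0</version>
  <scope>provided</scope>
</dependency>
{code}

Running mvn dependency:tree on the project afterwards shows which remaining
dependencies still pull in Hadoop 1.x artifacts transitively.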

I think this is a question that could continue on the mailing list if needed, 
as so far it does not look like any Spark issue. It can be reopened if that 
proves incorrect.

> There are issues when combining Spark and CDK (https://github.com/egonw/cdk). 
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5350
>                 URL: https://issues.apache.org/jira/browse/SPARK-5350
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.1, 1.2.0
>         Environment: Running Spark on a local computer, on both Mac OS X and an 
> Ubuntu Linux VM.
>            Reporter: Staffan Arvidsson
>
> I'm using Maven and Eclipse to build my project. When I import the CDK 
> (https://github.com/egonw/cdk) jar files that I need, set up the SparkContext, and 
> try, for instance, to read a file (simply "val lines = sc.textFile(filePath)"), I 
> get the following errors in the log:
> {quote}
> [main] DEBUG org.apache.spark.rdd.HadoopRDD  - SplitLocationInfo and other 
> new Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException: 
> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:191)
>       at 
> org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:381)
>       at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391)
>       at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:390)
>       at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>       at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:159)
>       at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
>       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>       at scala.Option.getOrElse(Option.scala:120)
>       at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>       at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>       at scala.Option.getOrElse(Option.scala:120)
>       at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>       at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
>       at org.apache.spark.rdd.RDD.foreach(RDD.scala:765)
> {quote}
> Later in the log: 
> {quote}
> [Executor task launch worker-0] DEBUG org.apache.spark.deploy.SparkHadoopUtil 
>  - Couldn't find method for retrieving thread-level FileSystem input data
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>       at java.lang.Class.getDeclaredMethod(Class.java:2009)
>       at org.apache.spark.util.Utils$.invoke(Utils.scala:1733)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>       at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>       at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>       at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>       at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:178)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:138)
>       at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>       at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>       at org.apache.spark.scheduler.Task.run(Task.scala:56)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {quote}
> There have also been issues related to "HADOOP_HOME" not being set, etc., but 
> these seem to occur only intermittently. 
> After testing different versions of both CDK and Spark, I've found that Spark 
> version 0.9.1 seems to get things to work. This does not solve my problem, though, 
> as I will later need functionality from MLlib that is only in the newer versions 
> of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
