[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580689#comment-14580689 ]
Dev Lakhani commented on SPARK-8142:
------------------------------------

To clarify [~srowen]:

1) I meant the other way around: we choose to use Apache Spark, which "provides" Apache Hadoop libs, and we then choose a Cloudera Hadoop distribution on the rest of our cluster and use Cloudera Hadoop clients in the application code. Spark will provide Apache Hadoop libs whereas our cluster will be CDH5. Is there any issue in doing this? We choose Apache Spark because CDH is a version behind the official Spark release and we don't want to wait for, say, DataFrames support.

2) If I mark my spark-core dependency as "provided" right now, my code compiles, but when I run my application in my IDE using Spark "local" I get:

NoClassDefFoundError: org/apache/spark/api/java/function/Function

This is why I am suggesting Maven profiles: one for local testing and one for deployment.

Getting back to the issue raised in this JIRA, which we seem to be ignoring: even when Hadoop and Spark are provided and the HBase client/protocol/server jars are packaged, we run into SPARK-1867, whose latest comment suggests a dependency is missing and that this causes the obscure exception. Whether this is on the Hadoop side or the Spark side is not known, but as that JIRA suggests, it was caused by a missing dependency. I cannot see any missing class/dependency exception anywhere in the Spark logs. This suggests that anyone using Spark who sets any of the userClassPathFirst* options and misses a primary, secondary or tertiary dependency will encounter SPARK-1867. Therefore we are stuck; any suggestions to overcome this are welcome. Either there is a need to make ChildFirstURLClassLoader ignore Spark and Hadoop libs, or to help Spark log what is causing SPARK-1867.
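The Maven-profiles idea in point 2 could be sketched roughly as below. This is only an illustration, not a tested build: the profile ids, the `spark.scope` property name, and the version/artifact coordinates are assumptions chosen for the example.

```xml
<!-- Hedged sketch: two profiles so spark-core is "provided" for cluster
     builds (spark-submit supplies it) but on the compile classpath for
     local IDE runs. Profile names and the spark.scope property are
     illustrative, not an established convention. -->
<profiles>
  <profile>
    <id>local</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <spark.scope>compile</spark.scope>
    </properties>
  </profile>
  <profile>
    <id>cluster</id>
    <properties>
      <spark.scope>provided</spark.scope>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
    <scope>${spark.scope}</scope>
  </dependency>
</dependencies>
```

With something like this, `mvn package` would build for the IDE (default `local` profile, spark-core on the classpath, so no NoClassDefFoundError in "local" mode), while `mvn -Pcluster package` would produce the jar for spark-submit without bundling Spark itself.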
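The last suggestion, making a child-first loader ignore Spark and Hadoop libs, could look something like the sketch below. This is not Spark's actual ChildFirstURLClassLoader; it is a minimal illustration of the delegation policy being proposed, with an assumed class name and an assumed prefix list.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hedged sketch: a child-first URLClassLoader that always delegates
// framework classes (JDK, Scala, Spark, Hadoop) to the parent loader,
// so user jars can win for everything else (e.g. a newer Jersey)
// without shadowing the classes Spark's scheduler needs to share.
public class FilteringChildFirstClassLoader extends URLClassLoader {

    // Prefixes that must always come from the parent (framework) loader.
    // This list is an assumption for illustration, not Spark's own list.
    private static final String[] PARENT_FIRST = {
        "java.", "javax.", "scala.", "org.apache.spark.", "org.apache.hadoop."
    };

    public FilteringChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    // Normal parent-first delegation for framework classes.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: look in the user jars first.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in user jars: fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    static boolean isParentFirst(String name) {
        for (String prefix : PARENT_FIRST) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under a policy like this, org.apache.spark.scheduler.ResultTask and Task would always resolve through the same parent loader, which is exactly what the ClassCastException in this JIRA indicates is not happening when userClassPathFirst shadows Spark's own classes.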
> Spark Job Fails with ResultTask ClassCastException
> --------------------------------------------------
>
>                 Key: SPARK-8142
>                 URL: https://issues.apache.org/jira/browse/SPARK-8142
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Dev Lakhani
>
> When running a Spark Job, I get no failures in the application code
> whatsoever but a weird ResultTask Class exception. In my job, I create a RDD
> from HBase and for each partition do a REST call on an API, using a REST
> client. This has worked in IntelliJ but when I deploy to a cluster using
> spark-submit.sh I get:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 3, host): java.lang.ClassCastException:
> org.apache.spark.scheduler.ResultTask cannot be cast to
> org.apache.spark.scheduler.Task
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> These are the configs I set to override the spark classpath because I want to
> use my own glassfish jersey version:
>
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
>
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will
> help debug the issue. Using spark 1.3.1 hadoop 2.6.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org