[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580689#comment-14580689 ]

Dev Lakhani commented on SPARK-8142:
------------------------------------

To clarify, [~srowen]:

1) I meant the other way around: we choose to use Apache Spark, which 
"provides" the Apache Hadoop libs, then run a Cloudera Hadoop distribution 
on the rest of our cluster and use Cloudera Hadoop clients in the 
application code. Spark will provide the Apache Hadoop libs whereas our cluster 
will be CDH5. Is there any issue in doing this?
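To make it concrete, the dependency layout I have in mind is roughly the 
following (just a sketch, not our exact pom; the CDH version is a placeholder):

<!-- Apache Spark, marked provided so the cluster's Spark installation supplies it -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <scope>provided</scope>
</dependency>
<!-- Cloudera HBase client, packaged into our application jar -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.0.0-cdh5.4.0</version> <!-- placeholder: whichever CDH5 version the cluster runs -->
</dependency>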

We chose Apache Spark because CDH is a version behind the official Spark 
release and we don't want to wait for, say, DataFrame support.

2) If I mark spark-core as "provided" right now, my code compiles, but when I 
run my application in my IDE with a Spark "local" master I get 
NoClassDefFoundError: org/apache/spark/api/java/function/Function. This is why 
I am asking whether we need Maven profiles, one for local testing and one for 
deployment.
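What I am picturing is something like this in the pom (only a sketch; the 
profile ids and property name are made up):

<profiles>
  <!-- local/IDE testing: bundle spark-core so a "local" master can find the Spark classes -->
  <profile>
    <id>local</id>
    <properties>
      <spark.scope>compile</spark.scope>
    </properties>
  </profile>
  <!-- cluster deployment: spark-core is provided by the cluster, so keep it out of the jar -->
  <profile>
    <id>cluster</id>
    <properties>
      <spark.scope>provided</spark.scope>
    </properties>
  </profile>
</profiles>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <scope>${spark.scope}</scope>
</dependency>

The cluster profile would be used when building the jar for spark-submit, and 
the local profile activated in the IDE.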

So getting back to the issue raised in this JIRA, which we seem to be ignoring: 
even when Hadoop and Spark are provided and the HBase client/protocol/server 
jars are packaged, we run into SPARK-1867, whose latest comment suggests that a 
missing dependency is what produces the obscure exception. Whether that is on 
the Hadoop side or the Spark side is not known, but as that JIRA suggests, it 
was caused by a missing dependency. I cannot see any missing class/dependency 
exception anywhere in the Spark logs. This suggests that anyone who sets the 
userClassPathFirst options and misses a primary, secondary or tertiary 
dependency will encounter SPARK-1867.

Therefore we are stuck; any suggestions to overcome this are welcome. Either 
ChildFirstURLClassLoader needs to ignore the Spark and Hadoop libs, or Spark 
needs to log what is actually causing SPARK-1867.
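For what it's worth, one thing I may try, purely to rule out a missing 
secondary/tertiary dependency, is shading everything that is not provided into 
a single uber jar, roughly (sketch only, I have not verified this avoids 
SPARK-1867):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

With the default configuration this bundles all compile/runtime dependencies 
(but not the provided Spark/Hadoop ones) into the application jar, so at least 
nothing on the userClassPathFirst side should be missing.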



> Spark Job Fails with ResultTask ClassCastException
> --------------------------------------------------
>
>                 Key: SPARK-8142
>                 URL: https://issues.apache.org/jira/browse/SPARK-8142
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Dev Lakhani
>
> When running a Spark job, I get no failures in the application code 
> whatsoever, but a weird ResultTask ClassCastException. In my job I create an 
> RDD from HBase and, for each partition, make a REST call to an API using a 
> REST client. This works in IntelliJ, but when I deploy to a cluster using 
> spark-submit.sh I get:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3, host): java.lang.ClassCastException: 
> org.apache.spark.scheduler.ResultTask cannot be cast to 
> org.apache.spark.scheduler.Task
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> These are the configs I set to override the Spark classpath, because I want to 
> use my own Glassfish Jersey version:
>  
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will 
> help debug the issue. Using Spark 1.3.1 and Hadoop 2.6.



