[ 
https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580730#comment-14580730
 ] 

Dev Lakhani commented on SPARK-8142:
------------------------------------

Hi [~vanzin]

bq.  if you want to use the glassfish jersey version, you shouldn't need to do 
this, right?  Spark depends on the old one that is under com.sun.*, IIRC.

Yes, I need to make use of Glassfish Jersey 2.x in my application rather than 
the com.sun.* version that Spark provides, but the same could apply to any 
other dependency that needs to supersede the one Spark ships.
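
For what it's worth, the workaround I'd otherwise be forced into is shading. A 
sketch (assuming a Maven build; the relocation pattern is illustrative and may 
also need to cover the javax.ws.rs packages) of relocating our Jersey so it 
cannot collide with the provided one:

{code:xml}
<!-- Sketch only: maven-shade-plugin relocation so our Jersey 2.x classes
     live under a different package and never clash with the version on
     Spark's classpath. The shadedPattern name here is made up. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.glassfish.jersey</pattern>
            <shadedPattern>myapp.shaded.org.glassfish.jersey</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

That avoids userClassPathFirst entirely, but at the cost of a more complicated 
build for every affected dependency.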

bq. marking all dependencies (including hbase) as provided and using 
{{spark.\{driver,executor\}.extraClassPath}} might be the easiest way out if 
you really need to use userClassPathFirst.

This is an option, but it might be a challenge to scale if different 
clusters/nodes use different folder layouts for their HBase and Hadoop 
installs. This can be (and usually is) the case when new servers are added 
alongside existing ones, for example. If one node has /disk4/path/to/hbase/libs 
and another has /disk3/another/path/to/hbase/libs and so on, then 
extraClassPath has to include all of them and grows significantly, and the 
spark-submit args grow along with it. Also, every time we upgrade HBase we then 
have to change this classpath again.
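
To make that concrete, a spark-defaults.conf along these lines (the paths are 
hypothetical) would have to enumerate every distinct node layout:

{code}
# Hypothetical spark-defaults.conf entries; every distinct HBase/Hadoop
# install location that exists anywhere in the cluster has to appear here.
spark.driver.extraClassPath    /disk4/path/to/hbase/libs/*:/disk3/another/path/to/hbase/libs/*
spark.executor.extraClassPath  /disk4/path/to/hbase/libs/*:/disk3/another/path/to/hbase/libs/*
{code}

and each new layout or HBase upgrade means editing these entries again.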

Maybe the ideal way is to have, as you suggest, a blacklist containing the 
Spark and Hadoop libs. Then we could put whatever we wanted into one uber/fat 
jar, and it wouldn't matter where HBase and Hadoop are installed or what is 
provided versus compiled in; we would let Spark work it out.
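
To illustrate the idea (purely hypothetical syntax; no such option exists in 
Spark today), the blacklist could be a single config entry:

{code}
# Hypothetical, illustrative syntax -- NOT an existing Spark option.
# Classes matching these prefixes would always load from Spark's own
# classpath; everything else would load from the user jar first.
spark.userClassPathFirst.blacklist  org.apache.spark.,org.apache.hadoop.,scala.
{code}

One uber jar per application would then work unchanged across clusters with 
different install layouts.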

These are just my thoughts, I'm sure others will have different preferences 
and/or better approaches. Thanks anyway for your input on this JIRA.

> Spark Job Fails with ResultTask ClassCastException
> --------------------------------------------------
>
>                 Key: SPARK-8142
>                 URL: https://issues.apache.org/jira/browse/SPARK-8142
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Dev Lakhani
>
> When running a Spark Job, I get no failures in the application code 
> whatsoever, but a weird ResultTask ClassCastException. In my job, I create an 
> RDD from HBase and, for each partition, make a REST call on an API using a 
> REST client. This has worked in IntelliJ, but when I deploy to a cluster 
> using spark-submit.sh I get:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3, host): java.lang.ClassCastException: 
> org.apache.spark.scheduler.ResultTask cannot be cast to 
> org.apache.spark.scheduler.Task
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> These are the configs I set to override the spark classpath because I want to 
> use my own glassfish jersey version:
>  
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will 
> help debug the issue. Using Spark 1.3.1 with Hadoop 2.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
