[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580730#comment-14580730 ]
Dev Lakhani commented on SPARK-8142:
------------------------------------

Hi [~vanzin]

bq. if you want to use the glassfish jersey version, you shouldn't need to do this, right? Spark depends on the old one that is under com.sun.*, IIRC.

Yes, I need to use the Glassfish Jersey 2.x version in my application rather than the com.sun.* one Spark provides, but the same problem could apply to any other dependency that needs to supersede one of Spark's provided dependencies.

bq. marking all dependencies (including hbase) as provided and using {{spark.{driver,executor}.extraClassPath}} might be the easiest way out if you really need to use userClassPathFirst.

This is an option, but it might be hard to scale if different clusters/nodes use different folder layouts for the HBase and Hadoop installs. This can be (and usually is) the case when new servers are added to existing ones, for example. If one node has /disk4/path/to/hbase/libs and another has /disk3/another/path/to/hbase/libs, and so on, then extraClassPath needs to include all of them and grows significantly, and the spark-submit arguments grow along with it. Also, whenever we upgrade HBase we then have to change this classpath again.

Maybe the ideal approach is, as you suggest, a blacklist containing the Spark and Hadoop libs. Then we could put whatever we wanted into one uber/fat jar, and it wouldn't matter where HBase and Hadoop are installed or what is provided versus compiled; we would let Spark work it out.

These are just my thoughts; I'm sure others will have different preferences and/or better approaches. Thanks anyway for your input on this JIRA.
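(For concreteness, the provided-plus-extraClassPath workaround discussed above might look like the spark-defaults.conf sketch below. spark.driver.extraClassPath and spark.executor.extraClassPath are real Spark properties, but the directories are the hypothetical per-node layouts from the comment; a real cluster's paths will differ, which is exactly the scaling concern being raised.)

```
# Sketch only: with HBase/Hadoop marked "provided" in the build, their jars
# are supplied at runtime via the extra classpath instead of the fat jar.
# Every distinct per-node layout must be enumerated here, and the entries
# grow as layouts diverge across the cluster.
spark.driver.extraClassPath    /disk4/path/to/hbase/libs/*:/disk3/another/path/to/hbase/libs/*
spark.executor.extraClassPath  /disk4/path/to/hbase/libs/*:/disk3/another/path/to/hbase/libs/*
```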
> Spark Job Fails with ResultTask ClassCastException
> --------------------------------------------------
>
>                 Key: SPARK-8142
>                 URL: https://issues.apache.org/jira/browse/SPARK-8142
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Dev Lakhani
>
> When running a Spark job, I get no failures in the application code
> whatsoever, but a strange ResultTask ClassCastException. In my job I create
> an RDD from HBase and, for each partition, make a REST call on an API using
> a REST client. This worked in IntelliJ, but when I deploy to a cluster using
> spark-submit.sh I get:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 3, host): java.lang.ClassCastException:
> org.apache.spark.scheduler.ResultTask cannot be cast to
> org.apache.spark.scheduler.Task
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> These are the configs I set to override the Spark classpath, because I want
> to use my own Glassfish Jersey version:
>
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
>
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will
> help debug the issue. Using Spark 1.3.1, Hadoop 2.6.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
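(Editor's note on the exception quoted above: a ClassCastException between two of Spark's own classes, ResultTask and Task, is the classic symptom of the same class being defined by two different classloaders, which userClassPathFirst makes possible when Spark jars end up on the user classpath. The minimal, self-contained Java sketch below reproduces the phenomenon; the class names `Dup` and `ChildFirstLoader` are hypothetical, not Spark's.)

```java
import java.io.IOException;
import java.io.InputStream;

// A child-first loader that re-defines a class from the application
// classpath instead of delegating to its parent first -- a rough model
// of what spark.{driver,executor}.userClassPathFirst does.
class ChildFirstLoader extends ClassLoader {
    ChildFirstLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.startsWith("Dup")) {  // only re-define our demo class
            try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
                byte[] bytes = in.readAllBytes();
                Class<?> c = defineClass(name, bytes, 0, bytes.length);
                if (resolve) resolveClass(c);
                return c;
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
        return super.loadClass(name, resolve);  // everything else: normal delegation
    }
}

public class Dup {
    public static void main(String[] args) throws Exception {
        Class<?> original = Dup.class;
        Class<?> reloaded =
            new ChildFirstLoader(Dup.class.getClassLoader()).loadClass("Dup");

        // Same fully-qualified name, but two distinct Class objects:
        System.out.println(original == reloaded);              // false
        System.out.println(reloaded.getName().equals("Dup"));  // true

        // An instance from one loader is not an instance of the other
        // loader's type, so a cast between them throws ClassCastException --
        // just like ResultTask -> Task in the reported stack trace.
        Object o = reloaded.getDeclaredConstructor().newInstance();
        System.out.println(original.isInstance(o));            // false
    }
}
```

This is why the blacklist idea above matters: if Spark's own classes are never taken from the user classpath, they can only ever be defined once.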