Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
On Wed, July 2, 2014 2:00 am, Mayur Rustagi wrote:
> Two job contexts cannot share data. Are you collecting the data to the
> master and then sending it to the other context?
> [...]

Both of these contexts were executing independently, but they were still
giving us issues, mostly because of lazy evaluation in Scala. The error
was coming up because I was using one SparkContext and one SharkContext
in the same job. I got it resolved by stopping the existing SparkContext
before creating the SharkContext.
Thanks for your help, Mayur.
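For anyone who hits the same thing, here is a minimal sketch of the
pattern that worked, with made-up paths and names, and with a plain
SparkContext standing in for the SharkContext so it runs on stock Spark
0.9:

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TwoPhaseJob {
      def main(args: Array[String]) {
        // Phase 1: the streaming job. The StreamingContext owns its own
        // SparkContext.
        val ssc = new StreamingContext("local[2]", "stream-phase", Seconds(10))
        ssc.socketTextStream("localhost", 9999)
           .saveAsTextFiles("hdfs:///tmp/stream-out/batch")
        ssc.start()
        Thread.sleep(60000)  // run for a bounded window in this sketch
        ssc.stop()           // also stops the underlying SparkContext

        // Phase 2: only now bring up the second context. In the real job
        // this is where the SharkContext is created and the table loaded.
        val sc2 = new SparkContext("local[2]", "table-phase")
        println("lines from phase 1: " +
          sc2.textFile("hdfs:///tmp/stream-out/*").count())
        sc2.stop()
      }
    }

The point is simply that only one context is ever alive at a time, so no
RDD lineage can straddle the two.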
Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
Ideally you should be converting the RDD to a SchemaRDD, no? Are you
creating a UnionRDD to join across the DStream's RDDs?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
<honeyjo...@ideata-analytics.com> wrote:
> Hi, I am trying to run a project which takes data as a DStream and
> dumps the data into a Shark table after various operations. I am
> getting the following error:
> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition
> cannot be cast to org.apache.spark.rdd.HadoopPartition
> [...]
> Can someone please explain the cause of this error? I am also using a
> SparkContext alongside the existing StreamingContext.
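For context, the shape Mayur is asking about looks roughly like the
sketch below (hypothetical paths and names). This is where a UnionRDD,
and hence UnionPartition, enters the lineage: unioning a Hadoop-backed
RDD with each batch RDD of the stream.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc  = new SparkContext("local[2]", "union-shape")
    val ssc = new StreamingContext(sc, Seconds(10))

    // A static reference RDD read through the Hadoop input path; its
    // partitions are HadoopPartitions.
    val reference: RDD[String] = sc.textFile("hdfs:///tmp/reference")

    // union() wraps both inputs in a UnionRDD, whose partitions are
    // UnionPartitions. Anything downstream that still assumes a
    // Hadoop-backed RDD hits exactly the ClassCastException from this
    // thread.
    ssc.socketTextStream("localhost", 9999).foreachRDD { batch: RDD[String] =>
      println("combined count: " + batch.union(reference).count())
    }
    ssc.start()
    ssc.awaitTermination()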
Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
> Ideally you should be converting the RDD to a SchemaRDD, no? Are you
> creating a UnionRDD to join across the DStream's RDDs?
> [...]

I am using Spark 0.9.0-incubating, so it doesn't have anything to do
with SchemaRDD. This error is probably coming up because I am trying to
use one SparkContext and one SharkContext in the same job. Is there any
way to incorporate two contexts in one job?

Regards,
Honey Joshi
Ideata-Analytics
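Stripped of the Shark specifics, the failing shape reduces to something
like this (made-up paths; a second plain SparkContext stands in for the
SharkContext). Because RDD evaluation is lazy, the first context's work
is only forced after the second context already exists:

    import org.apache.spark.SparkContext

    object TwoLiveContexts {
      def main(args: Array[String]) {
        // Two contexts alive at once in one driver JVM; pre-1.0 Spark
        // does not guard against this.
        val sc1 = new SparkContext("local[2]", "first-context")

        // Lazy: nothing runs at the definition site.
        val upper = sc1.textFile("hdfs:///tmp/in").map(_.toUpperCase)

        // Stands in for the SharkContext created while sc1 is alive.
        val sc2 = new SparkContext("local[2]", "second-context")

        // The pipeline is only forced here, with both contexts live;
        // this is roughly where the trouble in this thread shows up.
        println(upper.count())

        sc2.stop()
        sc1.stop()
      }
    }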
Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
Two job contexts cannot share data. Are you collecting the data to the
master and then sending it to the other context?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi
<honeyjo...@ideata-analytics.com> wrote:
> I am using Spark 0.9.0-incubating, so it doesn't have anything to do
> with SchemaRDD. This error is probably coming up because I am trying
> to use one SparkContext and one SharkContext in the same job. Is there
> any way to incorporate two contexts in one job?
> [...]
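The handoff Mayur describes would look something like this (made-up
paths and names; it is only viable when the collected result fits in
driver memory):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // pair-RDD implicits (reduceByKey) in 0.9

    object ContextHandoff {
      def main(args: Array[String]) {
        // Context A computes a small summary and collects it to the
        // driver ("the master" in Mayur's phrasing).
        val scA = new SparkContext("local[2]", "context-a")
        val summary: Array[(String, Int)] =
          scA.textFile("hdfs:///tmp/in")
             .map(line => (line.split(",")(0), 1))
             .reduceByKey(_ + _)
             .collect()          // materialized driver-side
        scA.stop()               // release A before B starts

        // Context B re-parallelizes the driver-side data. RDDs never
        // cross the context boundary; only plain local data does.
        val scB = new SparkContext("local[2]", "context-b")
        println(scB.parallelize(summary).count())
        scB.stop()
      }
    }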
Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition
Hi,

I am trying to run a project which takes data as a DStream and dumps the
data into a Shark table after various operations. I am getting the
following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task 0.0:0 failed 1 times (most recent failure: Exception failure:
java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
be cast to org.apache.spark.rdd.HadoopPartition)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Can someone please explain the cause of this error? I am also using a
SparkContext alongside the existing StreamingContext.
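To make the cast concrete: RDDs read through the Hadoop input path carry
HadoopPartition partitions, while union() wraps its inputs in a UnionRDD
whose partitions are UnionPartitions, so any code that casts a partition
back to HadoopPartition breaks as soon as a union enters the lineage. A
tiny illustration on stock Spark (hypothetical HDFS paths):

    import org.apache.spark.SparkContext

    object PartitionTypes {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "partition-types")
        val a  = sc.textFile("hdfs:///tmp/a")  // Hadoop-backed RDD
        val b  = sc.textFile("hdfs:///tmp/b")
        val u  = a.union(b)                    // UnionRDD over a and b

        // Printing the concrete partition classes shows the mismatch
        // the trace above complains about.
        println(a.partitions.head.getClass)    // ...rdd.HadoopPartition
        println(u.partitions.head.getClass)    // ...rdd.UnionPartition
        sc.stop()
      }
    }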