Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

2014-07-03 Thread Honey Joshi
On Wed, July 2, 2014 2:00 am, Mayur Rustagi wrote:
 Two job contexts cannot share data. Are you collecting the data to the
 master and then sending it to the other context?

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi




 On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi 
 honeyjo...@ideata-analytics.com wrote:

 On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:

 Ideally you should be converting the RDD to a SchemaRDD, no?
 Are you creating a UnionRDD to join across DStream RDDs?




 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi





 On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
 honeyjo...@ideata-analytics.com


 wrote:



 Hi,
 I am trying to run a project that takes data as a DStream and dumps the
 data into a Shark table after various operations. I am getting the
 following error:

 Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at scala.Option.foreach(Option.scala:236)
     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 Can someone please explain the cause of this error? I am also using a
 SparkContext with the existing StreamingContext.





 I am using Spark 0.9.0-incubating, so it doesn't have anything to do
 with SchemaRDD. This error probably appears because I am using one
 Spark context and one Shark context in the same job. Is there any way
 to incorporate two contexts in one job? Regards


 Honey Joshi
 Ideata-Analytics




Both of these contexts execute independently, but they were still giving
us issues, mostly because of Scala's lazy evaluation. The error appeared
when we used one Spark context and one Shark context in the same job. We
resolved it by stopping the existing Spark context before creating the
Shark context. Thanks for your help, Mayur.
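
For reference, a rough sketch of what worked for us (untested as written
here; it assumes Shark 0.9's SharkEnv.initWithSharkContext helper, and
the table and app names are made up, so adjust to your actual Shark API):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local[2]", "streaming-job")
    // ... streaming work here, with results staged to HDFS ...
    sc.stop() // stop the first context before Shark brings up its own

    // Hypothetical: Shark initializes its own context afterwards
    val sharkContext = shark.SharkEnv.initWithSharkContext("shark-job")
    sharkContext.sql("INSERT INTO TABLE my_table SELECT * FROM staging")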



Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

2014-07-02 Thread Mayur Rustagi
Ideally you should be converting the RDD to a SchemaRDD, no?
Are you creating a UnionRDD to join across DStream RDDs?
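
For example, if each batch is unioned with a static HDFS-backed RDD, the
result is a UnionRDD whose partitions are UnionPartitions rather than
HadoopPartitions, which would match your exception. A sketch of that
pattern (path, host, and port are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc = new SparkContext("local[2]", "union-example")
    val ssc = new StreamingContext(sc, Seconds(10))

    val staticRdd = sc.textFile("hdfs:///path/to/reference") // HadoopRDD
    val stream = ssc.socketTextStream("localhost", 9999)

    // transform exposes each batch's RDD; union yields a UnionRDD
    val joined = stream.transform(batchRdd => batchRdd.union(staticRdd))
    joined.print()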


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi honeyjo...@ideata-analytics.com
 wrote:

 Hi,
 I am trying to run a project that takes data as a DStream and dumps the
 data into a Shark table after various operations. I am getting the
 following error:

 Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at scala.Option.foreach(Option.scala:236)
     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 Can someone please explain the cause of this error? I am also using a
 SparkContext with the existing StreamingContext.



Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

2014-07-02 Thread Honey Joshi
On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
 Ideally you should be converting the RDD to a SchemaRDD, no?
 Are you creating a UnionRDD to join across DStream RDDs?



 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi




 On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
 honeyjo...@ideata-analytics.com

 wrote:


 Hi,
 I am trying to run a project that takes data as a DStream and dumps the
 data into a Shark table after various operations. I am getting the
 following error:

 Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
     at scala.Option.foreach(Option.scala:236)
     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 Can someone please explain the cause of this error? I am also using a
 SparkContext with the existing StreamingContext.




I am using Spark 0.9.0-incubating, so it doesn't have anything to do with
SchemaRDD. This error probably appears because I am using one Spark
context and one Shark context in the same job. Is there any way to
incorporate two contexts in one job?
Regards

Honey Joshi
Ideata-Analytics



Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

2014-07-02 Thread Mayur Rustagi
Two job contexts cannot share data. Are you collecting the data to the
master and then sending it to the other context?
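
If you do need to move data across, something like this is the usual
pattern (untested sketch; the masters, app names, and data are made up):

    import org.apache.spark.SparkContext

    val sc1 = new SparkContext("local[2]", "producer")
    val data = sc1.parallelize(1 to 100).map(_ * 2).collect() // to the driver
    sc1.stop() // release the first context before starting the second

    val sc2 = new SparkContext("local[2]", "consumer")
    val rdd = sc2.parallelize(data) // redistribute in the new context
    println(rdd.count())
    sc2.stop()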

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi 
honeyjo...@ideata-analytics.com wrote:

 On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
  Ideally you should be converting the RDD to a SchemaRDD, no?
  Are you creating a UnionRDD to join across DStream RDDs?
 
 
 
  Mayur Rustagi
  Ph: +1 (760) 203 3257
  http://www.sigmoidanalytics.com
  @mayur_rustagi https://twitter.com/mayur_rustagi
 
 
 
 
  On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
  honeyjo...@ideata-analytics.com
 
  wrote:
 
 
  Hi,
  I am trying to run a project that takes data as a DStream and dumps the
  data into a Shark table after various operations. I am getting the
  following error:

  Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
      at scala.Option.foreach(Option.scala:236)
      at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
      at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
      at akka.actor.ActorCell.invoke(ActorCell.scala:456)
      at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
      at akka.dispatch.Mailbox.run(Mailbox.scala:219)
      at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

  Can someone please explain the cause of this error? I am also using a
  SparkContext with the existing StreamingContext.
 
 
 

 I am using Spark 0.9.0-incubating, so it doesn't have anything to do with
 SchemaRDD. This error probably appears because I am using one Spark
 context and one Shark context in the same job. Is there any way to
 incorporate two contexts in one job?
 Regards

 Honey Joshi
 Ideata-Analytics




Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

2014-07-01 Thread Honey Joshi
Hi,
I am trying to run a project that takes data as a DStream and dumps the
data into a Shark table after various operations. I am getting the
following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Can someone please explain the cause of this error? I am also using a
SparkContext with the existing StreamingContext.
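
For context, the setup looks roughly like this (a simplified sketch, not
the actual project code; the paths, batch interval, and transformation
are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc  = new SparkContext("local[2]", "dstream-to-shark")
    val ssc = new StreamingContext(sc, Seconds(10)) // reuse the same SparkContext

    val events = ssc.textFileStream("hdfs:///incoming")
    events.map(_.toUpperCase).foreachRDD { rdd =>
      // stage each transformed batch where the Shark table can pick it up
      rdd.saveAsTextFile("hdfs:///staging/batch-" + System.currentTimeMillis)
    }

    ssc.start()
    ssc.awaitTermination()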