SparkContext declared as object variable

2015-09-18 Thread Priya Ch
Hello All,

  Instead of declaring the SparkContext in main, I declared it as an
object-level value:

 object sparkDemo {

   val conf = new SparkConf
   val sc = new SparkContext(conf)

   def main(args: Array[String]) {
     val baseRdd = sc.parallelize()
       .
       .
       .
   }
 }

But this piece of code gives:
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1155)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)

Why shouldn't we declare sc as an object variable?
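
For comparison, here is a minimal sketch (object name and sample data are
hypothetical, not from our product code) that creates the SparkContext inside
main, so it only ever exists on the driver and is never part of object
initialization that executors may re-run when they deserialize the object:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkDemoInMain {
  def main(args: Array[String]): Unit = {
    // The context is built on the driver only, inside main.
    val conf = new SparkConf().setAppName("sparkDemo")
    val sc = new SparkContext(conf)
    try {
      val baseRdd = sc.parallelize(Seq("a", "b", "c"))  // hypothetical sample data
      println(baseRdd.count())
    } finally {
      sc.stop()  // release cluster resources even if the job fails
    }
  }
}
```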

Regards,
Padma Ch


Re: SparkContext declared as object variable

2015-09-22 Thread Akhil Das
It's a "value", not a variable. And what are you parallelizing here?

Thanks
Best Regards

On Fri, Sep 18, 2015 at 11:21 PM, Priya Ch 
wrote:



Re: SparkContext declared as object variable

2015-09-23 Thread Akhil Das
Yes, of course it works.

[image: Inline image 1]

Thanks
Best Regards

On Tue, Sep 22, 2015 at 4:53 PM, Priya Ch 
wrote:

> Parallelizing some collection (an array of strings). In fact, in our product
> we are reading data from Kafka using KafkaUtils.createStream and applying
> some transformations.
>
> Does creating the SparkContext at object level, instead of in main, not
> work?
>


Re: SparkContext declared as object variable

2015-09-24 Thread Priya Ch
object StreamJob {

  val conf = new SparkConf
  val sc = new SparkContext(conf)

  def main(args: Array[String]) {
    val baseRDD = sc.parallelize(
      Array("hi", "hai", "hi", "bye", "bye", "hi", "hai", "hi", "bye", "bye"))
    val words = baseRDD.flatMap(line => line.split(","))
    val wordPairs = words.map(word => (word, 1))
    val reducedWords = wordPairs.reduceByKey((a, b) => a + b)
    // An RDD has no print method (that belongs to DStream); collect to the driver instead.
    reducedWords.collect().foreach(println)
  }
}

Please try to run this code in cluster mode, preferably on YARN; the Spark
version is 1.3.0.

This has thrown:
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
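
For reference, a rewritten sketch of the same job (object name hypothetical)
that moves the context construction into main, which may behave differently
in cluster mode since nothing Spark-related runs during object
initialization:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StreamJobInMain {
  def main(args: Array[String]): Unit = {
    // Build conf and context inside main, on the driver only.
    val sc = new SparkContext(new SparkConf().setAppName("StreamJob"))
    try {
      val baseRDD = sc.parallelize(
        Array("hi", "hai", "hi", "bye", "bye", "hi", "hai", "hi", "bye", "bye"))
      val reducedWords = baseRDD
        .flatMap(line => line.split(","))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      // collect() brings the counts to the driver; an RDD has no print method.
      reducedWords.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```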

Regards,
Padma Ch

On Thu, Sep 24, 2015 at 11:25 AM, Akhil Das 
wrote:

> It should; I don't see any reason for it not to run in cluster mode.
>
> Thanks
> Best Regards
>
> On Wed, Sep 23, 2015 at 8:56 PM, Priya Ch 
> wrote:
>
>> Does it run in cluster mode?
>>