Re: Fwd: Sample Spark Program Error

2014-12-31 Thread RK
If you look at your program output closely, you can see the following output:
Lines with a: 24, Lines with b: 15

The exception seems to be happening during Spark's cleanup after your code has
executed. Try adding sc.stop() at the end of your program to see if the
exception goes away.
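
For example, here is a minimal sketch of the pattern (the try/finally wrapper is only one way to guarantee the call runs, and the app name and file path are placeholders, not your actual values):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "Simple App")
    try {
      // run your jobs while the context is active
      val lines = sc.textFile("README.md").count()
      println(lines)
    } finally {
      sc.stop() // disconnect and let Spark clean up before the JVM exits
    }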

 

 On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire 
vmadh...@umail.iu.edu wrote:

Hi All,
I am trying to run a sample Spark program using Scala and sbt.
Below is the program:

import org.apache.spark.SparkContext

def main(args: Array[String]) {
  val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system
  val sc = new SparkContext("local", "Simple App",
    "E:/ApacheSpark/usb/usb/spark/bin", List("target/scala-2.10/sbt2_2.10-1.0.jar"))
  val logData = sc.textFile(logFile, 2).cache()
  val numAs = logData.filter(line => line.contains("a")).count()
  val numBs = logData.filter(line => line.contains("b")).count()
  println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
}

Below is the error log:

14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called with curMem=34047, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 2032.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on zealot:61452 (size: 2032.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_0
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it
14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3507 ms on localhost (1/2)
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called with curMem=36079, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 1912.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on zealot:61452 (size: 1912.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_1
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 261 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at Test1.scala:19) finished in 3.811 s
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:19, took 3.997365232 s
14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at Test1.scala:20
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at Test1.scala:20) with 2 output partitions (allowLocal=false)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1 (FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called with curMem=37991, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 267.2 MB)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (FilteredRDD[3] at filter at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_0 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1731 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_1 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1731 bytes

Re: Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire
Yes, the exception is gone now after adding stop() at the end.
Can you please tell me what stop() does at the end? Does it disable
the SparkContext?

On Wed, Dec 31, 2014 at 10:09 AM, RK prk...@yahoo.com wrote:

 If you look at your program output closely, you can see the following
 output:
 Lines with a: 24, Lines with b: 15

 The exception seems to be happening during Spark's cleanup after your
 code has executed. Try adding sc.stop() at the end of your program to see
 if the exception goes away.




   On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire 
 vmadh...@umail.iu.edu wrote:




 Hi All,

 I am trying to run a sample Spark program using Scala and sbt.

 Below is the program:

 import org.apache.spark.SparkContext

 def main(args: Array[String]) {

   val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should
 be some file on your system
   val sc = new SparkContext("local", "Simple App",
     "E:/ApacheSpark/usb/usb/spark/bin", List("target/scala-2.10/sbt2_2.10-1.0.jar"))
   val logData = sc.textFile(logFile, 2).cache()

   val numAs = logData.filter(line => line.contains("a")).count()
   val numBs = logData.filter(line => line.contains("b")).count()

   println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

 }


 Below is the error log:


 14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split:
 file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
 14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called
 with curMem=34047, maxMem=280248975
 14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values
 in memory (estimated size 2032.0 B, free 267.2 MB)
 14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory
 on zealot:61452 (size: 2032.0 B, free: 267.3 MB)
 14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block
 rdd_1_0
 14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0
 (TID 0). 2300 bytes result sent to driver
 14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in
 stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
 14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0
 (TID 1)
 14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found,
 computing it
 14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split:
 file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
 14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in
 stage 0.0 (TID 0) in 3507 ms on localhost (1/2)
 14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called
 with curMem=36079, maxMem=280248975
 14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values
 in memory (estimated size 1912.0 B, free 267.2 MB)
 14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory
 on zealot:61452 (size: 1912.0 B, free: 267.3 MB)
 14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block
 rdd_1_1
 14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0
 (TID 1). 2300 bytes result sent to driver
 14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in
 stage 0.0 (TID 1) in 261 ms on localhost (2/2)
 14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
 whose tasks have all completed, from pool
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at
 Test1.scala:19) finished in 3.811 s
 14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at
 Test1.scala:19, took 3.997365232 s
 14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at
 Test1.scala:20
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at
 Test1.scala:20) with 2 output partitions (allowLocal=false)
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1(count
 at Test1.scala:20)
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage:
 List()
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1
 (FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
 14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called
 with curMem=37991, maxMem=280248975
 14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 stored as
 values in memory (estimated size 2.5 KB, free 267.2 MB)
 14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
 from Stage 1 (FilteredRDD[3] at filter at Test1.scala:20)
 14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0
 with 2 tasks
 14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 0.0 in
 stage 1.0 (TID 2, localhost, ANY, 1264 bytes)
 14/12/30 23:20:21 INFO executor.Executor: Running task 0.0 in stage 1.0
 (TID 2)
 14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_0 locally
 14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 1.0
 (TID 2). 1731 bytes result sent to driver
 14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in
 stage 

RE: Fwd: Sample Spark Program Error

2014-12-31 Thread Kapil Malik
Hi Naveen,

Quoting 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext :

"SparkContext is the main entry point for Spark functionality. A SparkContext 
represents the connection to a Spark cluster, and can be used to create RDDs, 
accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active 
SparkContext before creating a new one."

So stop() shuts down the connection between the driver program and the Spark 
master, and does some cleanup. Indeed, after calling it, you cannot perform any 
operation on that context, or on any RDD created via it.
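
For illustration, a minimal sketch of that rule (the local master URL and app names are placeholders; this assumes the same Spark 1.x constructor your program already uses):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "First App")
    val rdd = sc.parallelize(1 to 10)
    println(rdd.count())   // works: the context is still active
    sc.stop()              // disconnects from the master and cleans up
    // rdd.count() here would fail: its SparkContext has been stopped
    val sc2 = new SparkContext("local", "Second App") // only legal after stop()
    sc2.stop()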

Regards,

Kapil

From: Naveen Madhire [mailto:vmadh...@umail.iu.edu]
Sent: 31 December 2014 22:08
To: RK
Cc: user@spark.apache.org
Subject: Re: Fwd: Sample Spark Program Error

Yes, the exception is gone now after adding stop() at the end.
Can you please tell me what stop() does at the end? Does it disable the 
SparkContext?

On Wed, Dec 31, 2014 at 10:09 AM, RK prk...@yahoo.com wrote:
If you look at your program output closely, you can see the following output:
Lines with a: 24, Lines with b: 15

The exception seems to be happening during Spark's cleanup after your code has 
executed. Try adding sc.stop() at the end of your program to see if the 
exception goes away.



On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire 
vmadh...@umail.iu.edu wrote:


Hi All,

I am trying to run a sample Spark program using Scala and sbt.

Below is the program:

import org.apache.spark.SparkContext

def main(args: Array[String]) {

  val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be 
some file on your system
  val sc = new SparkContext("local", "Simple App",
    "E:/ApacheSpark/usb/usb/spark/bin", List("target/scala-2.10/sbt2_2.10-1.0.jar"))
  val logData = sc.textFile(logFile, 2).cache()

  val numAs = logData.filter(line => line.contains("a")).count()
  val numBs = logData.filter(line => line.contains("b")).count()

  println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

}


Below is the error log:


14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: 
file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called with 
curMem=34047, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values in 
memory (estimated size 2032.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on 
zealot:61452 (size: 2032.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_0
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 
0). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 
(TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found, 
computing it
14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: 
file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 
(TID 0) in 3507 ms on localhost (1/2)
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called with 
curMem=36079, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values in 
memory (estimated size 1912.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on 
zealot:61452 (size: 1912.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_1
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 
1). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 
(TID 1) in 261 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose 
tasks have all completed, from pool
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at 
Test1.scala:19) finished in 3.811 s
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at 
Test1.scala:19, took 3.997365232 s
14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at Test1.scala:20
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at 
Test1.scala:20) with 2 output partitions (allowLocal=false)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at 
Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1 
(FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called with 
curMem=37991, maxMem=280248975
14/12/30 23