It shows a NullPointerException; could your data be corrupted? Try putting a try/catch inside the operation you are performing. Also, are you running a worker process on the master node as well? If not, only one node will be doing the processing. If you are, try setting the level of parallelism and the number of partitions when creating/transforming the RDD.
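A minimal sketch of the try/catch idea, assuming your job parses comma-separated text lines. Your code isn't shown, so the `Record` type and the `parseRecord` parser below are hypothetical. Each record is wrapped in `Try`, so a null or malformed line becomes a `Left` carrying the error instead of an exception that fails the whole task. The per-partition function has the `Iterator[String] => Iterator[...]` shape that `RDD.mapPartitions` accepts.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical record type: the original post does not say what the job parses.
final case class Record(id: Long, value: Double)

object DefensiveParse {
  // Hypothetical parser for an "id,value" line. It throws on bad input:
  // a null line gives a NullPointerException from split, and a
  // non-numeric field gives a NumberFormatException.
  def parseRecord(line: String): Record = {
    val fields = line.split(",")
    Record(fields(0).trim.toLong, fields(1).trim.toDouble)
  }

  // Wraps the per-record work in Try so one bad line yields a Left with a
  // description of the error, and the remaining lines are still processed.
  def parsePartition(lines: Iterator[String]): Iterator[Either[String, Record]] =
    lines.map { line =>
      Try(parseRecord(line)) match {
        case Success(record) => Right(record)
        case Failure(e)      => Left(s"${e.getClass.getSimpleName} on input: $line")
      }
    }

  def main(args: Array[String]): Unit = {
    // Local stand-in for one partition's lines, including a null and a bad value.
    val sample = Iterator("1,2.5", null, "3,abc", "4,1.0")
    parsePartition(sample).foreach(println)
  }
}
```

In the job you could plug it in with something like `sc.textFile(path, 16).mapPartitions(lines => DefensiveParse.parsePartition(lines))` and then `filter(_.isLeft)` to inspect the rejected lines. The 16 is only an illustration (2 nodes with 8 cores each); the second argument of `textFile` is a minimum-partitions hint. With a 256 MB block size and the default hint, a 1 GB file comes out as roughly 4 input splits (about 2 for the 360 MB file), which is too few tasks to keep 16 cores busy. `rdd.repartition(n)` and the `spark.default.parallelism` setting are the other usual knobs.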
Thanks
Best Regards

On Fri, Nov 14, 2014 at 5:17 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:

> Hi All,
>
> We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02); each node has
> 32 GB RAM, 1 TB of hard disk capacity and 8 CPU cores. We have set up HDFS
> with 2 TB capacity and a block size of 256 MB. When we try to process a 1 GB
> file on Spark, we see the following exception:
>
> 14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@IMPETUS-DSRV02:41124/user/Executor#539551156] with ID 0
> 14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block manager NODE-DSRV05.impetus.co.in:60432 with 2.1 GB RAM
> 14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block manager NODE-DSRV02:47844 with 2.1 GB RAM
> 14/11/14 17:01:43 INFO network.ConnectionManager: Accepted connection from [NODE-DSRV05.impetus.co.in/192.168.145.195:51447]
> 14/11/14 17:01:43 INFO network.SendingConnection: Initiating connection to [NODE-DSRV05.impetus.co.in/192.168.145.195:60432]
> 14/11/14 17:01:43 INFO network.SendingConnection: Connected to [NODE-DSRV05.impetus.co.in/192.168.145.195:60432], 1 messages pending
> 14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on NODE-DSRV05.impetus.co.in:60432 (size: 17.1 KB, free: 2.1 GB)
> 14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on NODE-DSRV05.impetus.co.in:60432 (size: 14.1 KB, free: 2.1 GB)
> 14/11/14 17:01:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, NODE-DSRV05.impetus.co.in): java.lang.NullPointerException:
>         org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
>         org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         java.lang.Thread.run(Thread.java:722)
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 1]
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 2]
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 0.0 (TID 4, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 5, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 3]
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.1 in stage 0.0 (TID 4) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 4]
> 14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.2 in stage 0.0 (TID 7, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
>
> What I see is that it couldn't launch tasks on NODE-DSRV05 and is processing
> on a single node, i.e. NODE-DSRV02. When we tried with 360 MB of data, I don't
> see any exception, but the entire processing is done by only one node. I
> couldn't figure out where the issue lies.
>
> Any suggestions on what kind of situations might cause such an issue?
>
> Thanks,
> Padma Ch