Hi,

I have successfully reduced my data and stored it in a JavaDStream<BSONObject>.
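(For context, this is roughly how the reduced pairs are turned into BSON documents. The field names match the log output further down; the key parsing shown here is simplified and only for illustration.)

    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.bson.BSONObject;
    import org.bson.BasicBSONObject;
    import scala.Tuple2;

    // Simplified sketch: map each reduced (key, count) pair, e.g.
    // ("articleId_488691_host_luxpresso.com", 1), into a document like
    // { "articleId" : "488691" , "host" : "luxpresso.com" , "count" : 1 }.
    static JavaDStream<BSONObject> toDocuments(JavaPairDStream<String, Integer> counts) {
        return counts.map(new Function<Tuple2<String, Integer>, BSONObject>() {
            @Override
            public BSONObject call(Tuple2<String, Integer> pair) {
                // illustrative parsing only: [articleId, 488691, host, luxpresso.com]
                String[] parts = pair._1().split("_");
                BSONObject doc = new BasicBSONObject();
                doc.put("articleId", parts[1]);
                doc.put("host", parts[3]);
                doc.put("count", pair._2());
                return doc;
            }
        });
    }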
Now I want to save this data to MongoDB; that is why I used the BSONObject type. But when I try to save it, I get an exception. I also tried simply saving it with saveAsTextFile, but I get the same exception.

Error log: the full log file is attached. Excerpt from the log file:

2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_4_piece0
2015-08-11 11:18:52,664 INFO (org.apache.spark.SparkContext:59) - Created broadcast 4 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting 2 missing tasks from Stage 7 (MapPartitionsRDD[5] at foreach at DirectStream.java:167)
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Adding task set 7.0 with 2 tasks
2015-08-11 11:18:52,665 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 0.0 in stage 7.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 1.0 in stage 7.0 (TID 6, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 0.0 in stage 7.0 (TID 5)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 1.0 in stage 7.0 (TID 6)
2015-08-11 11:18:52,827 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,828 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,846 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,847 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 1.0 in stage 7.0 (TID 6)
java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 0.0 in stage 7.0 (TID 5)
java.lang.NullPointerException

Code for saving output:

    // for MongoDB
    Configuration outputConfig = new Configuration();
    outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/test.spark");
    outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");

    JavaDStream<BSONObject> suspectedStream;  // populated earlier (elided here)

    suspectedStream.foreach(new Function<JavaRDD<BSONObject>, Void>() {
        private static final long serialVersionUID = 4414703053334523053L;

        @Override
        public Void call(JavaRDD<BSONObject> rdd) throws Exception {
            logger.info(rdd.first());
            rdd.saveAsTextFile("E://");
            rdd.saveAsNewAPIHadoopFile("", Object.class, BSONObject.class,
                    MongoOutputFormat.class, outputConfig);
            return null;
        }
    });

Regards,
Deepesh
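PS: In case it helps to see what I was aiming for with the Mongo write: my understanding (untested; the pair conversion and the null key are my own assumption) is that MongoOutputFormat wants key/value pairs, so the save step would look roughly like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.PairFunction;
    import org.bson.BSONObject;
    import scala.Tuple2;
    import com.mongodb.hadoop.MongoOutputFormat;

    // Sketch only: turn the documents into (key, value) pairs and write them
    // through MongoOutputFormat. The null key and the dummy path are my
    // assumptions about how this is meant to be wired; the actual target
    // should come from mongo.output.uri in outputConfig.
    static void saveToMongo(JavaRDD<BSONObject> rdd, Configuration outputConfig) {
        JavaPairRDD<Object, BSONObject> pairs = rdd.mapToPair(
                new PairFunction<BSONObject, Object, BSONObject>() {
                    @Override
                    public Tuple2<Object, BSONObject> call(BSONObject doc) {
                        return new Tuple2<Object, BSONObject>(null, doc);
                    }
                });
        pairs.saveAsNewAPIHadoopFile("file:///tmp/unused", Object.class,
                BSONObject.class, MongoOutputFormat.class, outputConfig);
    }

If that shape (or the key type) is wrong, I would be glad to know.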
Full log file (attached):

2015-08-11 11:18:52,265 INFO (org.apache.spark.streaming.scheduler.JobScheduler:59) - Finished job streaming job 1439272130000 ms.1 from job set of time 1439272130000 ms
2015-08-11 11:18:52,265 INFO (org.apache.spark.streaming.scheduler.JobScheduler:59) - Starting job streaming job 1439272130000 ms.2 from job set of time 1439272130000 ms
2015-08-11 11:18:52,271 INFO (org.apache.spark.SparkContext:59) - Starting job: foreach at DirectStream.java:167
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Got job 2 (foreach at DirectStream.java:167) with 1 output partitions (allowLocal=true)
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Final stage: Stage 5(foreach at DirectStream.java:167)
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Parents of final stage: List(Stage 4)
2015-08-11 11:18:52,276 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Missing parents: List()
2015-08-11 11:18:52,276 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting Stage 5 (MapPartitionsRDD[4] at map at DirectStream.java:119), which has no missing parents
2015-08-11 11:18:52,278 INFO (org.apache.spark.storage.MemoryStore:59) - ensureFreeSpace(3152) called with curMem=16781, maxMem=1018932756
2015-08-11 11:18:52,279 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_3 stored as values in memory (estimated size 3.1 KB, free 971.7 MB)
2015-08-11 11:18:52,281 INFO (org.apache.spark.storage.MemoryStore:59) - ensureFreeSpace(2261) called with curMem=19933, maxMem=1018932756
2015-08-11 11:18:52,281 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB, free 971.7 MB)
2015-08-11 11:18:52,283 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Added broadcast_3_piece0 in memory on localhost:65012 (size: 2.2 KB, free: 971.7 MB)
2015-08-11 11:18:52,284 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_3_piece0
2015-08-11 11:18:52,284 INFO (org.apache.spark.SparkContext:59) - Created broadcast 3 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,285 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting 1 missing tasks from Stage 5 (MapPartitionsRDD[4] at map at DirectStream.java:119)
2015-08-11 11:18:52,285 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Adding task set 5.0 with 1 tasks
2015-08-11 11:18:52,286 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 0.0 in stage 5.0 (TID 4, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,287 INFO (org.apache.spark.executor.Executor:59) - Running task 0.0 in stage 5.0 (TID 4)
2015-08-11 11:18:52,290 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,290 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 0 ms
2015-08-11 11:18:52,307 INFO (com.spark.DirectStream:123) - (articleId_488691_host_luxpresso.com,1)
2015-08-11 11:18:52,308 INFO (com.spark.DirectStream:130) - { "articleId" : "488691" , "host" : "luxpresso.com" , "count" : 1}
2015-08-11 11:18:52,309 INFO (org.apache.spark.executor.Executor:59) - Finished task 0.0 in stage 5.0 (TID 4). 1248 bytes result sent to driver
2015-08-11 11:18:52,311 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Finished task 0.0 in stage 5.0 (TID 4) in 24 ms on localhost (1/1)
2015-08-11 11:18:52,311 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Removed TaskSet 5.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,312 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Stage 5 (foreach at DirectStream.java:167) finished in 0.026 s
2015-08-11 11:18:52,313 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Job 2 finished: foreach at DirectStream.java:167, took 0.040501 s
2015-08-11 11:18:52,313 INFO (com.spark.DirectStream:174) - { "articleId" : "488691" , "host" : "luxpresso.com" , "count" : 1}
2015-08-11 11:18:52,349 INFO (org.apache.spark.storage.BlockManager:59) - Removing broadcast 3
2015-08-11 11:18:52,351 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_3_piece0
2015-08-11 11:18:52,352 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_3_piece0 of size 2261 dropped from memory (free 1018912823)
2015-08-11 11:18:52,355 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Removed broadcast_3_piece0 on localhost:65012 in memory (size: 2.2 KB, free: 971.7 MB)
2015-08-11 11:18:52,383 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_3_piece0
2015-08-11 11:18:52,384 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_3
2015-08-11 11:18:52,384 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_3 of size 3152 dropped from memory (free 1018915975)
2015-08-11 11:18:52,408 INFO (org.apache.spark.ContextCleaner:59) - Cleaned broadcast 3
2015-08-11 11:18:52,414 INFO (org.apache.spark.storage.BlockManager:59) - Removing broadcast 2
2015-08-11 11:18:52,416 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_2_piece0
2015-08-11 11:18:52,416 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_2_piece0 of size 2261 dropped from memory (free 1018918236)
2015-08-11 11:18:52,418 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Removed broadcast_2_piece0 on localhost:65012 in memory (size: 2.2 KB, free: 971.7 MB)
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_2_piece0
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_2
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_2 of size 3152 dropped from memory (free 1018921388)
2015-08-11 11:18:52,420 INFO (org.apache.spark.ContextCleaner:59) - Cleaned broadcast 2
2015-08-11 11:18:52,421 INFO (org.apache.spark.storage.BlockManager:59) - Removing broadcast 1
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_1
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_1 of size 2264 dropped from memory (free 1018923652)
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_1_piece0
2015-08-11 11:18:52,423 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_1_piece0 of size 1675 dropped from memory (free 1018925327)
2015-08-11 11:18:52,424 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Removed broadcast_1_piece0 on localhost:65012 in memory (size: 1675.0 B, free: 971.7 MB)
2015-08-11 11:18:52,424 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_1_piece0
2015-08-11 11:18:52,425 INFO (org.apache.spark.ContextCleaner:59) - Cleaned broadcast 1
2015-08-11 11:18:52,426 INFO (org.apache.spark.storage.BlockManager:59) - Removing broadcast 0
2015-08-11 11:18:52,426 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_0
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_0 of size 4328 dropped from memory (free 1018929655)
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.BlockManager:59) - Removing block broadcast_0_piece0
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_0_piece0 of size 3101 dropped from memory (free 1018932756)
2015-08-11 11:18:52,429 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Removed broadcast_0_piece0 on localhost:65012 in memory (size: 3.0 KB, free: 971.7 MB)
2015-08-11 11:18:52,429 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_0_piece0
2015-08-11 11:18:52,432 INFO (org.apache.spark.ContextCleaner:59) - Cleaned broadcast 0
2015-08-11 11:18:52,569 INFO (org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2015-08-11 11:18:52,569 INFO (org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2015-08-11 11:18:52,569 INFO (org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2015-08-11 11:18:52,570 INFO (org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2015-08-11 11:18:52,570 INFO (org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.job.id is deprecated. Instead, use mapreduce.job.id
2015-08-11 11:18:52,630 INFO (org.apache.spark.SparkContext:59) - Starting job: foreach at DirectStream.java:167
2015-08-11 11:18:52,631 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Got job 3 (foreach at DirectStream.java:167) with 2 output partitions (allowLocal=false)
2015-08-11 11:18:52,632 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Final stage: Stage 7(foreach at DirectStream.java:167)
2015-08-11 11:18:52,632 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Parents of final stage: List(Stage 6)
2015-08-11 11:18:52,633 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Missing parents: List()
2015-08-11 11:18:52,633 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting Stage 7 (MapPartitionsRDD[5] at foreach at DirectStream.java:167), which has no missing parents
2015-08-11 11:18:52,659 INFO (org.apache.spark.storage.MemoryStore:59) - ensureFreeSpace(107792) called with curMem=0, maxMem=1018932756
2015-08-11 11:18:52,660 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_4 stored as values in memory (estimated size 105.3 KB, free 971.6 MB)
2015-08-11 11:18:52,662 INFO (org.apache.spark.storage.MemoryStore:59) - ensureFreeSpace(64018) called with curMem=107792, maxMem=1018932756
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.MemoryStore:59) - Block broadcast_4_piece0 stored as bytes in memory (estimated size 62.5 KB, free 971.6 MB)
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerInfo:59) - Added broadcast_4_piece0 in memory on localhost:65012 (size: 62.5 KB, free: 971.7 MB)
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_4_piece0
2015-08-11 11:18:52,664 INFO (org.apache.spark.SparkContext:59) - Created broadcast 4 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting 2 missing tasks from Stage 7 (MapPartitionsRDD[5] at foreach at DirectStream.java:167)
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Adding task set 7.0 with 2 tasks
2015-08-11 11:18:52,665 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 0.0 in stage 7.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 1.0 in stage 7.0 (TID 6, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 0.0 in stage 7.0 (TID 5)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 1.0 in stage 7.0 (TID 6)
2015-08-11 11:18:52,827 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,828 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,846 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,847 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 1.0 in stage 7.0 (TID 6)
java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 0.0 in stage 7.0 (TID 5)
java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,975 WARN (org.apache.spark.scheduler.TaskSetManager:71) - Lost task 0.0 in stage 7.0 (TID 5, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,978 ERROR (org.apache.spark.scheduler.TaskSetManager:75) - Task 0 in stage 7.0 failed 1 times; aborting job
2015-08-11 11:18:52,980 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Removed TaskSet 7.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,981 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Lost task 1.0 in stage 7.0 (TID 6) on executor localhost: java.lang.NullPointerException (null) [duplicate 1]
2015-08-11 11:18:52,981 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Removed TaskSet 7.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,983 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Cancelling stage 7
2015-08-11 11:18:52,987 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Job 3 failed: foreach at DirectStream.java:167, took 0.356095 s
2015-08-11 11:18:52,989 ERROR (org.apache.spark.streaming.scheduler.JobScheduler:96) - Error running job streaming job 1439272130000 ms.2
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 5, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 5, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)