Spark Streaming : Could not compute split, block not found
We are using Spark 1.0 for Spark Streaming on a Spark Standalone cluster and are seeing the following error:

Job aborted due to stage failure: Task 3475.0:15 failed 4 times, most recent failure: Exception failure in TID 216394 on host hslave33102.sjc9.service-now.com: java.lang.Exception: Could not compute split, block input-0-140686934 not found
    org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)

We are using the MEMORY_AND_DISK_SER storage level for the input stream, and the stream is also being persisted since we have multiple transformations happening on it:

val lines = KafkaUtils.createStream[String, Array[Byte], StringDecoder, DefaultDecoder](
  ssc, kafkaParams, topicpMap, StorageLevel.MEMORY_AND_DISK_SER)
lines.persist(StorageLevel.MEMORY_AND_DISK_SER)

We are aggregating data every 15 minutes as well as every hour, and we set spark.streaming.blockInterval=1 to minimize the number of blocks of data read. The problem started at the 15-minute interval, but since last night I'm seeing it happen every hour as well. Any suggestions?

Thanks
Kanwal

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
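For context, the relationship the poster is relying on (a larger spark.streaming.blockInterval means fewer blocks per batch) can be sketched in plain Scala. The helper below is hypothetical, not a Spark API, and it assumes both intervals are expressed in the same unit (milliseconds), which the thread does not confirm:

```scala
// Sketch: how many input blocks a single receiver contributes to one batch.
// A Spark Streaming receiver cuts incoming data into a new block every
// spark.streaming.blockInterval, so one batch holds roughly
// batchInterval / blockInterval blocks per receiver. Each block backs one
// partition of the batch's BlockRDD, which is what the
// "Could not compute split, block input-... not found" error refers to
// when a block is dropped before its task runs.
object BlockMath {
  def blocksPerBatch(batchIntervalMs: Long, blockIntervalMs: Long): Long = {
    require(blockIntervalMs > 0, "blockInterval must be positive")
    math.max(1L, batchIntervalMs / blockIntervalMs)
  }
}
```

So a larger blockInterval yields fewer, larger blocks (and fewer tasks) per batch, at the cost of each lost block taking more data with it.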
Re: Spark Streaming : Could not compute split, block not found
We are using Spark 1.0. I'm using DStream operations such as map, filter, and reduceByKeyAndWindow, and doing a foreach operation on the DStream.
Re: Spark Streaming : Could not compute split, block not found
All the operations being done use the DStream. I do read an RDD into memory, which is collected and converted into a map used for lookups as part of the DStream operations. This RDD is loaded only once and converted into a map that is then used on the streamed data. Do you mean non-streaming jobs on RDDs using raw Kafka data?

Log file attached: streaming.gz http://apache-spark-user-list.1001560.n3.nabble.com/file/n11229/streaming.gz
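The lookup pattern described above (a small table collected to the driver once, then used to enrich streamed records) can be sketched as follows. This is plain Scala illustrating only the lookup step; the names Enricher, enrich, and lookupTable are hypothetical, and in a real job the table would typically come from rdd.collect().toMap and be distributed to executors with sc.broadcast:

```scala
// Sketch of a map-side lookup: a small reference table, loaded once,
// enriches each streamed record without a shuffle or join.
object Enricher {
  // Attach a category to each (eventId, count) pair; unknown ids get a
  // default rather than being dropped.
  def enrich(events: Seq[(String, Long)],
             lookupTable: Map[String, String]): Seq[(String, String, Long)] =
    events.map { case (id, count) =>
      (id, lookupTable.getOrElse(id, "unknown"), count)
    }
}
```

Inside a DStream this would appear as something like lines.map(record => enrich(..., broadcastTable.value)), so the table is read-only shared state rather than part of the stream.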
Re: Spark Streaming : Could not compute split, block not found
Not at all. Don't have any such code.
Re: Spark Streaming : Could not compute split, block not found
Here is the log file: streaming.gz http://apache-spark-user-list.1001560.n3.nabble.com/file/n11240/streaming.gz

There are quite a few AskTimeouts that happen for about 2 minutes, followed by the block-not-found errors.

Thanks
Kanwal
Kafka Streaming - Error Could not compute split
We are using Spark 1.0.0 deployed on a Spark Standalone cluster, and I'm getting the following exception. With the previous version I saw this error occur along with OutOfMemory errors, which I'm not seeing with Spark 1.0. Any suggestions?

Job aborted due to stage failure: Task 3748.0:20 failed 4 times, most recent failure: Exception failure in TID 225792 on host hslave32106.sjc9.service-now.com: java.lang.Exception: Could not compute split, block input-0-1403458929600 not found
    org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    org.apache.spark.scheduler.Task.run(Task.scala:51)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    java.lang.Thread.run(Thread.java:662)
Driver stacktrace:
Re: Writing data to HBase using Spark
Please see sample code attached at https://issues.apache.org/jira/browse/SPARK-944.
KafkaInputDStream Stops reading new messages
The Spark Streaming job was running on two worker nodes, and then there was an error on one of the nodes. The Spark job showed as running, but no progress was being made and no new messages were being processed. Based on the driver log files I see the following errors. I would expect the stream reading to be retried and processing of new messages to continue. Is there any configuration that I could be missing?

System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "com.snc.sinet.streaming.StreamAggregatorKryoRegistrator")
System.setProperty("spark.local.dir", Configuration.streamingConfig.localDir)
System.setProperty("spark.ui.port", Configuration.streamingConfig.uiPort.toString)

2014-04-05 18:22:26,507 ERROR akka.remote.EndpointWriter spark-akka.actor.default-dispatcher-3 - AssociationError [akka.tcp://sp...@hclient01.sea1.service-now.com:49048] - [akka.tcp://sp...@hclient02.sea1.service-now.com:50888]: Error [Shut down address: akka.tcp://sp...@hclient02.sea1.service-now.com:50888] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://sp...@hclient02.sea1.service-now.com:50888
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]

akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkexecu...@hclient02.sea1.service-now.com:47512]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hclient02.sea1.service-now.com/10.196.32.78:47512
]

2014-04-05 18:21:52,893 WARN o.a.spark.scheduler.TaskSetManager - Loss was due to java.lang.IllegalStateException
java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2418)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1379)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1988)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1912)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:64)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1793)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
Re: Using ProtoBuf 2.5 for messages with Spark Streaming
Yes, I'm using Akka as well. But if that were the problem, I would expect to be facing this issue in my local setup too; I only run into this error when using the Spark standalone cluster. I will try out your suggestion and let you know.

Thanks
Kanwal
Re: Using ProtoBuf 2.5 for messages with Spark Streaming
I've removed the dependency on Akka in a separate project but am still running into the same error. In the POM dependency hierarchy I do see both 2.4.1-shaded and 2.5.0 being included. If there were a conflict with a project dependency, I would think I should be getting the same error in my local setup as well. Here are the dependencies I'm using:

<dependencies>
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-core</artifactId>
    <version>1.0.13</version>
  </dependency>
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.0.13</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>0.9.0-incubating</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>0.9.0-incubating</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.15-cdh4.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.6.0</version>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.5</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.2</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-actors</artifactId>
    <version>2.10.2</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.10.2</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.5</version>
  </dependency>
</dependencies>
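If the shaded 2.4.1 protobuf is being pulled in transitively, the hbase and hadoop-client artifacts are the likely sources here (that is an assumption, not something confirmed in this thread). One common way to resolve such a conflict is a Maven exclusion so that only protobuf-java 2.5.0 remains on the classpath, e.g.:

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.94.15-cdh4.6.0</version>
  <exclusions>
    <!-- Assumption: drop the transitive protobuf so the explicit 2.5.0
         dependency wins; repeat for hadoop-client if it also brings one. -->
    <exclusion>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running mvn dependency:tree after the change shows whether 2.4.1 is really gone.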