Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
We are using Spark 1.0 for Spark Streaming on a Spark standalone cluster and are
seeing the following error.

Job aborted due to stage failure: Task 3475.0:15 failed 4 times, most
recent failure: Exception failure in TID 216394 on host
hslave33102.sjc9.service-now.com: java.lang.Exception: Could not compute
split, block input-0-140686934 not found
org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51) 

We are using the MEMORY_AND_DISK_SER storage level for the input streams. The
stream is also being persisted, since we have multiple transformations happening
on the input stream.


import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.{DefaultDecoder, StringDecoder}

val lines = KafkaUtils.createStream[String, Array[Byte], StringDecoder, DefaultDecoder](
  ssc, kafkaParams, topicpMap, StorageLevel.MEMORY_AND_DISK_SER)

// Persisted because multiple transformations run on the same input stream.
lines.persist(StorageLevel.MEMORY_AND_DISK_SER)

We are aggregating data every 15 minutes as well as hourly. We have set
spark.streaming.blockInterval=1 to minimize the number of blocks of data read.
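
For illustration, roughly how the streaming context and block interval are wired
together (a minimal sketch; the app name and batch duration below are placeholders,
not our actual values):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder app name and batch duration. Note that Spark interprets
// spark.streaming.blockInterval in milliseconds.
val conf = new SparkConf()
  .setAppName("StreamAggregator")
  .set("spark.streaming.blockInterval", "1")

val ssc = new StreamingContext(conf, Seconds(60))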

The problem started with the 15-minute interval, but since last night I'm also
seeing it happen every hour.

Any suggestions?

Thanks
Kanwal



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
We are using Spark 1.0.

I'm using DStream operations such as map, filter, and reduceByKeyAndWindow, and
doing a foreach operation on the DStream.
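
Roughly, the shape of the pipeline is as follows (a simplified sketch with made-up
field handling and window sizes, not the actual job; lines is the Kafka DStream
from the first message):

import org.apache.spark.streaming.Minutes

// lines: DStream[(String, Array[Byte])] from KafkaUtils.createStream above.
val counts = lines
  .map { case (key, bytes) => (key, bytes.length.toLong) }   // illustrative mapping
  .filter { case (key, _) => key.nonEmpty }
  .reduceByKeyAndWindow(_ + _, Minutes(15), Minutes(15))     // 15-minute window

counts.foreachRDD { rdd =>
  rdd.collect().foreach(println)                             // placeholder sink
}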





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186p11209.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
All the operations being done use the DStream. I do read one RDD into memory,
which is collected and converted into a map used for lookups as part of the
DStream operations. This RDD is loaded only once and converted into a map that is
then applied to the streamed data.
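
Something along these lines (a sketch with hypothetical names and a placeholder
path; the real reference data and enrichment logic differ):

// Hypothetical reference RDD, loaded once at startup.
val referenceRdd = ssc.sparkContext
  .textFile("hdfs:///path/to/reference")
  .map { line => val parts = line.split(","); (parts(0), parts(1)) }

// Collected to the driver once and turned into a plain Map for lookups.
val lookup: Map[String, String] = referenceRdd.collect().toMap

// Used inside DStream operations on the streamed data.
val enriched = lines.map { case (key, bytes) =>
  (lookup.getOrElse(key, "unknown"), bytes)
}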

Do you mean non-streaming jobs on RDDs using raw Kafka data?

Log File attached:
streaming.gz
http://apache-spark-user-list.1001560.n3.nabble.com/file/n11229/streaming.gz  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186p11229.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
Not at all. Don't have any such code.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186p11231.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
Here is the log file.
streaming.gz
http://apache-spark-user-list.1001560.n3.nabble.com/file/n11240/streaming.gz  

There are quite a few AskTimeouts that happen for about 2 minutes, followed by
the block-not-found errors.

Thanks
Kanwal




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186p11240.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Kafka Streaming - Error Could not compute split

2014-06-23 Thread Kanwaldeep
We are using Spark 1.0.0 deployed on a Spark standalone cluster, and I'm getting
the following exception. With the previous version I saw this error occur along
with OutOfMemory errors, which I'm not seeing with Spark 1.0.

Any suggestions?

Job aborted due to stage failure: Task 3748.0:20 failed 4 times, most recent
failure: Exception failure in TID 225792 on host
hslave32106.sjc9.service-now.com: java.lang.Exception: Could not compute
split, block input-0-1403458929600 not found
org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
org.apache.spark.scheduler.Task.run(Task.scala:51)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662) Driver stacktrace:



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Streaming-Error-Could-not-compute-split-tp8112.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Writing data to HBase using Spark

2014-06-10 Thread Kanwaldeep
Please see sample code attached at
https://issues.apache.org/jira/browse/SPARK-944. 
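
For reference, the general shape of one common approach (a simplified sketch, not
the attached sample itself; it assumes HBase's new-API TableOutputFormat and
Spark's saveAsNewAPIHadoopDataset, with placeholder table and column names):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "HBaseWriteSketch")

// Placeholder table name, column family, and qualifier.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "events")
val job = Job.getInstance(hbaseConf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
  .map { case (rowKey, value) =>
    val put = new Put(Bytes.toBytes(rowKey))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    (new ImmutableBytesWritable, put)
  }
  .saveAsNewAPIHadoopDataset(job.getConfiguration)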




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-data-to-HBase-using-Spark-tp7304p7305.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


KafkaInputDStream Stops reading new messages

2014-04-09 Thread Kanwaldeep
The Spark Streaming job was running on two worker nodes, and then there was an
error on one of the nodes. The Spark job still showed as running, but no progress
was being made and no new messages were being processed. Based on the driver log
files, I see the following errors.

I would expect the stream reading to be retried and processing of new messages to
continue. Is there any configuration that I could be missing?

System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "com.snc.sinet.streaming.StreamAggregatorKryoRegistrator")
System.setProperty("spark.local.dir", Configuration.streamingConfig.localDir)
System.setProperty("spark.ui.port", Configuration.streamingConfig.uiPort.toString)


2014-04-05 18:22:26,507 ERROR akka.remote.EndpointWriter
spark-akka.actor.default-dispatcher-3 -
AssociationError
[akka.tcp://sp...@hclient01.sea1.service-now.com:49048] -
[akka.tcp://sp...@hclient02.sea1.service-now.com:50888]: Error [Shut down
address: akka.tcp://sp...@hclient02.sea1.service-now.com:50888] [
akka.remote.ShutDownAssociation: Shut down address:
akka.tcp://sp...@hclient02.sea1.service-now.com:50888
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The
remote system terminated the association because it is shutting down.
]

akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkexecu...@hclient02.sea1.service-now.com:47512]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: hclient02.sea1.service-now.com/10.196.32.78:47512
]

2014-04-05 18:21:52,893 WARN  o.a.spark.scheduler.TaskSetManager  -
Loss was due to
java.lang.IllegalStateException
java.lang.IllegalStateException: unread block data
at
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2418)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1379)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1988)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1912)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
at
org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:64)
at
org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
at
java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1834)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1793)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
at
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-Stops-reading-new-messages-tp4016.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Kanwaldeep
Yes, I'm using Akka as well. But if that were the problem, then I should have
been facing this issue in my local setup as well; I'm only running into this
error when using the Spark standalone cluster.

But I will try out your suggestion and let you know.

Thanks
Kanwal



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Using-ProtoBuf-2-5-for-messages-with-Spark-Streaming-tp3396p3582.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Kanwaldeep
I've removed the dependency on Akka in a separate project but am still running
into the same error. In the POM dependency hierarchy I do see both 2.4.1 (shaded)
and 2.5.0 being included. If there were a conflict with a project dependency, I
would think I should be getting the same error in my local setup as well.

Here are the dependencies I'm using:

<dependencies>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
        <version>1.0.13</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>1.0.13</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.0-incubating</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>0.9.0-incubating</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>0.9.0-incubating</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase</artifactId>
        <version>0.94.15-cdh4.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-cdh4.6.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>2.5.0</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.5</version>
    </dependency>

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.2</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-actors</artifactId>
        <version>2.10.2</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-reflect</artifactId>
        <version>2.10.2</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.5</version>
    </dependency>
</dependencies>
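
As an aside, one way to check which protobuf copy actually wins on the cluster
classpath is to log the jar that com.google.protobuf.Message is loaded from (a
small diagnostic sketch, not part of the job itself):

// Prints the jar the protobuf Message class was loaded from, which helps tell a
// shaded 2.4.1 copy apart from the intended protobuf-java 2.5.0.
val protoSource = classOf[com.google.protobuf.Message]
  .getProtectionDomain.getCodeSource.getLocation
println(s"protobuf loaded from: $protoSource")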






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Using-ProtoBuf-2-5-for-messages-with-Spark-Streaming-tp3396p3585.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.