Hi TD,
We also struggled with this error for a long while. The recurring scenario
is when the job takes longer to compute than the batch interval and a
backlog starts to pile up.
Hint: check the storage level. If the DStream storage level is set to
MEMORY_ONLY_SER and memory runs out, then you will get a 'Cannot compute
split' error.
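Roughly like this (an untested sketch; the host, port, and batch interval
are placeholders, not from this thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("StorageLevelSketch")
    val ssc = new StreamingContext(conf, Seconds(60))

    // socketTextStream accepts the storage level directly; with
    // MEMORY_AND_DISK_SER, received blocks spill to disk instead of
    // being dropped outright when executor memory fills up.
    val lines = ssc.socketTextStream("localhost", 9999,
      StorageLevel.MEMORY_AND_DISK_SER)

    lines.count().print()

    ssc.start()
    ssc.awaitTermination()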
Gerard,
That is a good observation. However, the strange thing I ran into is that
with MEMORY_AND_DISK_SER the job fails even earlier. In my case, it takes
10 seconds to process the data of each batch, while the batch interval is
one minute. It fails after 10 hours with the 'Cannot compute split' error.
Bill
If it regularly fails after 8 hours, could you get me the log4j logs? To
limit their size, set the default log level to WARN and the log level for
all classes in the package o.a.s.streaming (org.apache.spark.streaming) to
DEBUG. Then I can take a look.
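If it helps, one way to set those levels from the driver, using the log4j
1.x API that Spark bundles (a log4j.properties file with the equivalent
entries works just as well):

    import org.apache.log4j.{Level, Logger}

    // Quiet everything down to WARN ...
    Logger.getRootLogger.setLevel(Level.WARN)
    // ... but keep full detail for the streaming internals.
    Logger.getLogger("org.apache.spark.streaming").setLevel(Level.DEBUG)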
I have found this paper, which seems to answer most of the questions about
RDD lifetime:
https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf
Tian
Just to add one more point: if Spark Streaming knows when an RDD will not
be used any more, I believe it will not try to retrieve data it no longer
needs. However, in practice I often encounter the 'cannot compute split'
error. Based on my understanding, this is because Spark cleared out the
data of an RDD that is still needed.
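For what it is worth, two knobs seem relevant here (the durations below are
illustrative, not what anyone in this thread runs): spark.streaming.unpersist
controls whether generated RDDs are aggressively unpersisted, and
StreamingContext.remember() asks Spark Streaming to keep generated RDDs
around for at least the given duration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    val conf = new SparkConf()
      .setAppName("RememberSketch")
      // When "true", generated RDDs are unpersisted as soon as the
      // framework believes they are no longer needed.
      .set("spark.streaming.unpersist", "true")

    val ssc = new StreamingContext(conf, Minutes(1))

    // Keep generated RDDs for at least 10 minutes, so that jobs which
    // still reference them do not hit 'Cannot compute split'.
    ssc.remember(Minutes(10))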
Can you elaborate on the usage pattern that leads to 'cannot compute
split'? Are you using the RDDs generated by a DStream outside the
DStream logic? Something like running interactive Spark jobs
(independent of the Spark Streaming ones) on RDDs generated by
DStreams? If that is the case, what is
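To make the question concrete, the pattern I mean looks like this (a
hypothetical sketch, where lines stands for any DStream[String]):

    import org.apache.spark.rdd.RDD

    // Stash the latest batch's RDD outside the streaming logic.
    @volatile var lastBatch: RDD[String] = null
    lines.foreachRDD { rdd => lastBatch = rdd }

    // Later, an independent job touches the stashed RDD. By then the
    // framework may already have cleared its blocks, and recomputing
    // them fails with 'Cannot compute split'.
    if (lastBatch != null) println(lastBatch.count())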
Hi TD,
I am using Spark Streaming to consume data from Kafka, do some
aggregation, and ingest the results into RDS. I do use foreachRDD in the
program. I am planning to use Spark Streaming in our production pipeline,
and it performs well in generating the results. Unfortunately, we plan to
have
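The skeleton of the program looks roughly like this (the ZooKeeper quorum,
topic, JDBC URL, table, and credentials are placeholders, not our real
setup):

    import java.sql.DriverManager
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaToRdsSketch")
    val ssc = new StreamingContext(conf, Minutes(1))

    // Receiver-based Kafka stream: (zkQuorum, groupId, topic -> threads).
    val stream = KafkaUtils.createStream(ssc, "zk1:2181", "my-group",
      Map("events" -> 1))

    // Aggregate within each one-minute batch.
    val counts = stream
      .map { case (_, value) => (value, 1L) }
      .reduceByKey(_ + _)

    // Write each batch to RDS, one JDBC connection per partition.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val conn = DriverManager.getConnection(
          "jdbc:mysql://my-rds-host:3306/mydb", "user", "password")
        try {
          val stmt = conn.prepareStatement(
            "INSERT INTO counts (word, cnt) VALUES (?, ?)")
          partition.foreach { case (k, v) =>
            stmt.setString(1, k); stmt.setLong(2, v); stmt.executeUpdate()
          }
        } finally {
          conn.close()
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()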
Hey Experts,
I wanted to understand in detail about the lifecycle of rdd(s) in a
streaming app.
From my current understanding:
- An rdd gets created out of the realtime input stream.
- Transformation functions are applied in a lazy fashion on the RDD to
transform it into other rdd(s).
- Actions are applied on the rdd(s), which is what triggers the actual
computation (see the sketch below).
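A small sketch of those three steps in one place (the source and intervals
are made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    val conf = new SparkConf().setAppName("RddLifecycleSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines  = ssc.socketTextStream("localhost", 9999) // 1. one RDD per batch
    val words  = lines.flatMap(_.split(" "))             // 2. lazy transformation
    val counts = words.map((_, 1L)).reduceByKey(_ + _)   // 2. lazy transformation

    counts.print() // 3. output operation: triggers the actual computation

    ssc.start()
    ssc.awaitTermination()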
Any pointers guys?