Hi,

Suppose I have an RDD that is loaded from some file, and I also have a
DStream that has data coming from some stream. I want to keep unioning some
of the tuples from the DStream into my RDD. For this I can use something
like this:

  var myRDD: RDD[(String, Long)] = sc.fromText...
  dstream.foreachRDD{ rdd =>
    myRDD = myRDD.union(rdd.filter(myfilter))
  }
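
For completeness, a self-contained sketch of the same pattern is below. The
file path, socket source, port, batch interval, and the myfilter predicate
are all made up for illustration; substitute your own source and parsing:

  import org.apache.spark.SparkConf
  import org.apache.spark.rdd.RDD
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object UnionStreamIntoRDD {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("UnionStreamIntoRDD")
      val ssc  = new StreamingContext(conf, Seconds(10)) // hypothetical batch interval
      val sc   = ssc.sparkContext

      // Hypothetical filter: keep tuples with a positive count.
      def myfilter(t: (String, Long)): Boolean = t._2 > 0

      // Base RDD loaded from a (hypothetical) tab-separated file.
      var myRDD: RDD[(String, Long)] = sc.textFile("/data/base.tsv")
        .map(_.split("\t"))
        .map(a => (a(0), a(1).toLong))

      // A stream of the same tuple type, here parsed from a socket source.
      val dstream = ssc.socketTextStream("localhost", 9999)
        .map(_.split("\t"))
        .map(a => (a(0), a(1).toLong))

      // foreachRDD runs on the driver, so reassigning the driver-side var works.
      dstream.foreachRDD { rdd =>
        // Each batch extends myRDD's lineage by one more union.
        myRDD = myRDD.union(rdd.filter(myfilter))
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }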

My question is: how long will Spark keep the RDDs underlying the DStream
around? Is there some configuration knob that can control that?

Regards,
Anand