Could anyone answer my question?

_____________________________________________
From: Chen, Yan I
Sent: June 14, 2016 1:34 PM
To: 'user@spark.apache.org'
Subject: restarting of spark streaming

Hi,

I notice that when a Spark Streaming application is restarted, it tries to recover and replay all the batches it missed. During this replay, are the streams checkpointed the same way they are during normal processing? Does anyone know?

Sometimes our cluster goes down for maintenance, and our streaming process is shut down for, e.g., one day and then restarted. If the batches from that period are replayed without checkpointing, the RDD chain will become very long, and memory usage will keep going up until all the missed batches have been replayed. That growth in memory usage until the replay finishes is what we observe now.

Thanks,
Yan Chen
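
To make the concern concrete, here is a rough back-of-the-envelope sketch in plain Python. It is only a toy model of lineage growth, not a description of what Spark actually does during recovery (that is exactly the open question). The 10-second batch interval and the checkpoint-every-10-batches setting are assumed values for illustration, not our real configuration, and the function names are made up.

```python
from typing import Optional


def missed_batches(downtime_s: int, batch_interval_s: int) -> int:
    """Number of whole batch intervals that elapse while the job is down."""
    return downtime_s // batch_interval_s


def max_lineage_depth(num_batches: int, checkpoint_every: Optional[int]) -> int:
    """Longest chain of dependent state RDDs seen while replaying num_batches.

    Toy model: each batch's state depends on the previous batch's state,
    and a checkpoint (if any) cuts the chain back to zero.
    """
    depth = 0
    longest = 0
    for i in range(1, num_batches + 1):
        depth += 1                      # this batch extends the chain by one
        longest = max(longest, depth)
        if checkpoint_every and i % checkpoint_every == 0:
            depth = 0                   # checkpoint truncates the lineage
    return longest


if __name__ == "__main__":
    # Assumed values: 1 day of downtime, 10-second batches.
    n = missed_batches(downtime_s=24 * 60 * 60, batch_interval_s=10)
    print("missed batches:", n)
    print("chain length, no checkpoint during replay:", max_lineage_depth(n, None))
    print("chain length, checkpoint every 10 batches:", max_lineage_depth(n, 10))
```

Under these assumed numbers, a one-day outage means thousands of batches to replay. In the model, the chain grows with every replayed batch if nothing is checkpointed, and stays bounded by the checkpoint interval if checkpointing continues during replay.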