Could anyone answer my question?

_____________________________________________
From: Chen, Yan I
Sent: 2016, June, 14 1:34 PM
To: 'user@spark.apache.org'
Subject: restarting of spark streaming


Hi,

I have noticed that when Spark Streaming restarts, it tries to recover/replay 
all the batches it missed. During this recovery, are the streams checkpointed 
the same way they are during normal processing?

Does anyone know?

Sometimes our cluster goes down for maintenance, and our streaming process is 
shut down for, e.g., 1 day and then restarted. If the batches from this period 
are replayed without checkpointing, the RDD lineage chain will grow very long, 
and memory usage will keep going up until all missed batches are replayed.

This is what we observe now: memory usage keeps going up until all missed 
batches are replayed.
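
To give a rough sense of scale, here is a minimal sketch of how many batches a 
restart would have to replay after one day of downtime. The 10-second batch 
interval and the function name are assumptions for illustration only, not our 
actual configuration.

```python
# Rough estimate of how many batches a restart has to replay.
# The 10-second batch interval below is a hypothetical value.

def missed_batches(downtime_seconds: int, batch_interval_seconds: int) -> int:
    """Return the number of whole batch intervals that elapsed during downtime."""
    if batch_interval_seconds <= 0:
        raise ValueError("batch interval must be positive")
    return downtime_seconds // batch_interval_seconds


one_day_seconds = 24 * 60 * 60  # 86400 seconds of downtime
print(missed_batches(one_day_seconds, 10))
```

With these assumed numbers, thousands of batches would be queued for replay, 
which is why it matters whether checkpointing happens during recovery.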

Thanks,
Yan Chen
