With 3x replication, we should be able to achieve fault tolerance, since a checkpointed block is lost only if every node holding one of its three replicas fails. This checkpointed RDD can be cleared once another in-memory checkpointed RDD exists further down the lineage. It can also avoid hitting disk if enough memory is available. We need to investigate more to find a good solution.

-Xiangrui
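To make the trade-off concrete, here is a toy Python model of what happens once lineage is truncated at a replicated in-memory checkpoint. It is not Spark code: `place_replicas`, `lost_blocks`, and `min_failures_to_lose_data` are hypothetical names, and round-robin replica placement is an assumption. A block becomes unrecoverable only when every node holding one of its replicas has failed, because there is no lineage left to recompute it from.

```python
import itertools


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(num_blocks)
    }


def lost_blocks(placement, failed_nodes):
    """Blocks whose replicas all lived on failed nodes.

    With lineage truncated at the checkpoint, these blocks cannot be recomputed.
    """
    failed = set(failed_nodes)
    return sorted(b for b, replicas in placement.items()
                  if all(n in failed for n in replicas))


def min_failures_to_lose_data(placement, nodes):
    """Smallest number of simultaneous node failures that loses at least one block."""
    for k in range(1, len(nodes) + 1):
        for failed in itertools.combinations(nodes, k):
            if lost_blocks(placement, failed):
                return k
    return None


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3", "n4"]
    for r in (1, 3):
        placement = place_replicas(10, nodes, r)
        print(r, min_failures_to_lose_data(placement, nodes))
```

With `replication=1`, a single node failure already loses data, which is the concern raised about truncating lineage on unreliable storage. With `replication=3` and this placement, at least three simultaneous failures are needed before any block is lost, though correlated failures (e.g. a rack outage) are not modeled here.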
On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> Effectively this is persist without fault tolerance.
> Failure of any node means complete lack of fault tolerance.
> I would be very skeptical of truncating lineage if it is not reliable.
>
> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote:
>
>> Xiangrui Meng created SPARK-1855:
>> ------------------------------------
>>
>> Summary: Provide memory-and-local-disk RDD checkpointing
>> Key: SPARK-1855
>> URL: https://issues.apache.org/jira/browse/SPARK-1855
>> Project: Spark
>> Issue Type: New Feature
>> Components: MLlib, Spark Core
>> Affects Versions: 1.0.0
>> Reporter: Xiangrui Meng
>>
>> Checkpointing is used to cut long lineage while maintaining fault
>> tolerance. The current implementation is HDFS-based. Using the BlockRDD, we
>> can create in-memory-and-local-disk (with replication) checkpoints that are
>> not as reliable as the HDFS-based solution but are faster.
>>
>> It can help applications that require many iterations.
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)