BTW, for what it’s worth, I agree this is a good option to add. The only
tricky part will be making sure the checkpoint blocks are not garbage-collected
by the block store, though I don’t think they will be.
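
For reference, the replicated storage levels mentioned in the quoted reply
below can be used roughly as follows. This is only a minimal sketch against
the public persist API, not the checkpointing feature proposed in SPARK-1855;
the object name, app name, local master, and sample data are assumptions made
up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver object; names and data are illustrative only.
object ReplicatedPersistSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master so the sketch runs without a cluster.
    // In local mode there are no peer executors, so blocks are not
    // actually replicated; on a real cluster they would be.
    val conf = new SparkConf()
      .setAppName("replicated-persist-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Built-in level: memory first, spill to local disk, two replicas.
    val twoCopies = sc.parallelize(1 to 1000)
      .map(_ * 2)
      .persist(StorageLevel.MEMORY_AND_DISK_2)

    // Custom level with the same flags but a replication factor of 3.
    val threeReplicas = StorageLevel(
      useDisk = true, useMemory = true, deserialized = true, replication = 3)
    val threeCopies = sc.parallelize(1 to 1000)
      .map(_ + 1)
      .persist(threeReplicas)

    // Actions force the RDDs to be computed and cached.
    println(twoCopies.count())
    println(threeCopies.count())

    sc.stop()
  }
}
```

Note that persist keeps the lineage intact, so lost blocks can still be
recomputed. Truncating the lineage on top of such blocks is the part under
discussion in this thread.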

Matei
On May 17, 2014, at 2:20 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> We do actually have replicated StorageLevels in Spark. You can use 
> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
> replication factor.
> 
> BTW you guys should probably have this discussion on the JIRA rather than the 
> dev list; I think the replies somehow ended up on the dev list.
> 
> Matei
> 
> On May 17, 2014, at 1:36 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> 
>> We don't have 3x replication in Spark :-)
>> And if we use a replicated StorageLevel, it decreases the odds of failure
>> but does not eliminate it (since we are not doing a great job with
>> replication anyway from a fault-tolerance point of view).
>> Also, replicated levels take a nontrivial performance hit.
>> 
>> Regards,
>> Mridul
>> On 17-May-2014 8:16 am, "Xiangrui Meng" <men...@gmail.com> wrote:
>> 
>>> With 3x replication, we should be able to achieve fault tolerance.
>>> This checkpointed RDD can be cleared if we have another in-memory
>>> checkpointed RDD down the line. It can avoid hitting disk if we have
>>> enough memory to use. We need to investigate more to find a good
>>> solution. -Xiangrui
>>> 
>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com>
>>> wrote:
>>>> Effectively this is persist without fault tolerance.
>>>> Failure of any node means complete lack of fault tolerance.
>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote:
>>>> 
>>>>> Xiangrui Meng created SPARK-1855:
>>>>> ------------------------------------
>>>>> 
>>>>>            Summary: Provide memory-and-local-disk RDD checkpointing
>>>>>                Key: SPARK-1855
>>>>>                URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>>            Project: Spark
>>>>>         Issue Type: New Feature
>>>>>         Components: MLlib, Spark Core
>>>>>   Affects Versions: 1.0.0
>>>>>           Reporter: Xiangrui Meng
>>>>> 
>>>>> 
>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD
>>>>> we can create in-memory-and-local-disk (with replication) checkpoints
>>>>> that are not as reliable as HDFS-based solution but faster.
>>>>> 
>>>>> It can help applications that require many iterations.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
> 
