Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

Mridul Muralidharan Sun, 18 May 2014 21:17:17 -0700

My bad ... I was replying via mobile, and I did not realize that responses
to JIRA mails are not mirrored to JIRA, unlike PR responses!



Regards,
Mridul

On Sun, May 18, 2014 at 2:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> We do actually have replicated StorageLevels in Spark. You can use 
> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
> replication factor.
>
> BTW you guys should probably have this discussion on the JIRA rather than the 
> dev list; I think the replies somehow ended up on the dev list.
>
> Matei
>
> On May 17, 2014, at 1:36 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> We don't have 3x replication in Spark :-)
>> And if we use a replicated StorageLevel, it decreases the odds of failure
>> but does not eliminate them (since we are not doing a great job with
>> replication anyway from a fault tolerance point of view).
>> Also, replicated levels take a nontrivial performance hit.
>>
>> Regards,
>> Mridul
>> On 17-May-2014 8:16 am, "Xiangrui Meng" <men...@gmail.com> wrote:
>>
>>> With 3x replication, we should be able to achieve fault tolerance.
>>> This checkPointed RDD can be cleared if we have another in-memory
>>> checkPointed RDD down the line. It can avoid hitting disk if we have
>>> enough memory to use. We need to investigate more to find a good
>>> solution. -Xiangrui
>>>
>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com>
>>> wrote:
>>>> Effectively this is persist without fault tolerance.
>>>> Failure of any node means complete lack of fault tolerance.
>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote:
>>>>
>>>>> Xiangrui Meng created SPARK-1855:
>>>>> ------------------------------------
>>>>>
>>>>>             Summary: Provide memory-and-local-disk RDD checkpointing
>>>>>                 Key: SPARK-1855
>>>>>                 URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>>             Project: Spark
>>>>>          Issue Type: New Feature
>>>>>          Components: MLlib, Spark Core
>>>>>    Affects Versions: 1.0.0
>>>>>            Reporter: Xiangrui Meng
>>>>>
>>>>>
>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>>>>> can create in-memory-and-local-disk (with replication) checkpoints that are
>>>>> not as reliable as HDFS-based solution but faster.
>>>>>
>>>>> It can help applications that require many iterations.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> This message was sent by Atlassian JIRA
>>>>> (v6.2#6252)
>>>>>
>>>
>
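
The trade-off argued over in this thread (replication lowers the odds of losing a checkpointed block but does not eliminate them) can be illustrated with a rough back-of-the-envelope sketch. This is a toy model, not Spark code: it assumes every node holding a replica fails independently with the same probability `p`, and that blocks are lost independently of each other. Neither assumption comes from the thread, and the function names are hypothetical.

```python
# Toy model (an assumption, not Spark code): each node holding a replica
# fails independently with probability p before the checkpoint is reused.


def loss_probability(p: float, replicas: int) -> float:
    """Probability that every replica of a single block is lost."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return p ** replicas


def any_block_lost(p: float, replicas: int, blocks: int) -> float:
    """Probability that at least one of `blocks` blocks loses all replicas.

    Treats blocks as independent, which is a simplification: real blocks
    share nodes, so their failures are correlated.
    """
    if blocks < 0:
        raise ValueError("blocks must be >= 0")
    return 1.0 - (1.0 - loss_probability(p, replicas)) ** blocks


if __name__ == "__main__":
    # Compare 1x, 2x and 3x replication for a 1000-block RDD.
    for r in (1, 2, 3):
        print(r, loss_probability(0.01, r), any_block_lost(0.01, r, 1000))
```

Under this model the loss probability shrinks geometrically with the replication factor but stays above zero for any `p > 0`, so a checkpoint that truncates lineage can still become unrecoverable.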
