Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan
My bad ... I was replying via mobile and did not realize that responses
to JIRA mails are not mirrored to JIRA, unlike PR responses!


Regards,
Mridul

On Sun, May 18, 2014 at 2:50 AM, Matei Zaharia  wrote:
> We do actually have replicated StorageLevels in Spark. You can use 
> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
> replication factor.
>
> BTW you guys should probably have this discussion on the JIRA rather than the 
> dev list; I think the replies somehow ended up on the dev list.
>
> Matei
>
> On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:
>
>> We don't have 3x replication in spark :-)
>> And if we use replicated storagelevel, while decreasing odds of failure, it
>> does not eliminate it (since we are not doing a great job with replication
>> anyway from fault tolerance point of view).
>> Also it does take a nontrivial performance hit with replicated levels.
>>
>> Regards,
>> Mridul
>> On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
>>
>>> With 3x replication, we should be able to achieve fault tolerance.
>>> This checkPointed RDD can be cleared if we have another in-memory
>>> checkPointed RDD down the line. It can avoid hitting disk if we have
>>> enough memory to use. We need to investigate more to find a good
>>> solution. -Xiangrui
>>>
>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan 
>>> wrote:
>>>> Effectively this is persist without fault tolerance.
>>>> Failure of any node means complete lack of fault tolerance.
>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>>>>
>>>>> Xiangrui Meng created SPARK-1855:
>>>>>
>>>>> Summary: Provide memory-and-local-disk RDD checkpointing
>>>>> Key: SPARK-1855
>>>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>> Project: Spark
>>>>> Issue Type: New Feature
>>>>> Components: MLlib, Spark Core
>>>>> Affects Versions: 1.0.0
>>>>> Reporter: Xiangrui Meng
>>>>>
>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>>>>> can create in-memory-and-local-disk (with replication) checkpoints that are
>>>>> not as reliable as HDFS-based solution but faster.
>>>>>
>>>>> It can help applications that require many iterations.
>>>>>
>>>>> --
>>>>> This message was sent by Atlassian JIRA
>>>>> (v6.2#6252)
>>>>>
>>>
>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
BTW in Spark the consensus so far was that we’d use the dev@ list for 
high-level discussions (e.g. change in the development process, major features, 
proposals of new components, release votes) and keep lower-level issue tracking 
in JIRA. This is just how the project operated before so it was the easiest way 
for people to continue.

Matei

On May 18, 2014, at 4:01 PM, Matei Zaharia  wrote:

> Ah, maybe it’s just different in other Apache projects. All the ones I’ve 
> participated in have had their design discussions on JIRA. For example take a 
> look at https://issues.apache.org/jira/browse/HDFS-4949. (Most design 
> discussions in Hadoop are also on JIRA).
> 
> Hosting it this way is more convenient because most users come in looking at 
> the issue tracker, not at mailing list archives (if only because the issue 
> tracker is much more searchable for issues).
> 
> Matei
> 
> On May 18, 2014, at 2:19 PM, Jacek Laskowski  wrote:
> 
>> On Sun, May 18, 2014 at 8:28 PM, Andrew Ash  wrote:
>>> The nice thing about putting discussion on the Jira is that everything
>>> about the bug is in one place.  So people looking to understand the
>>> discussion a few years from now only have to look on the jira ticket rather
>>> than also search the mailing list archives and hope commenters all put the
>>> string "SPARK-1855" into the messages.
>> 
>> My understanding is that JIRA is not for discussions. In a sense it
>> could be used for a few opinions, but have never seen it elsewhere and
>> am curious if it's an approach for the project (that I might accept
>> ultimately, but that would require some adoption time).
>> 
>> What's wrong with linking a discussion thread to a JIRA issue?
>> 
>> Jacek
>> 
>> -- 
>> Jacek Laskowski | http://blog.japila.pl
>> "Never discourage anyone who continually makes progress, no matter how
>> slow." Plato
> 



Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
Ah, maybe it’s just different in other Apache projects. All the ones I’ve 
participated in have had their design discussions on JIRA. For example take a 
look at https://issues.apache.org/jira/browse/HDFS-4949. (Most design 
discussions in Hadoop are also on JIRA).

Hosting it this way is more convenient because most users come in looking at 
the issue tracker, not at mailing list archives (if only because the issue 
tracker is much more searchable for issues).

Matei

On May 18, 2014, at 2:19 PM, Jacek Laskowski  wrote:

> On Sun, May 18, 2014 at 8:28 PM, Andrew Ash  wrote:
>> The nice thing about putting discussion on the Jira is that everything
>> about the bug is in one place.  So people looking to understand the
>> discussion a few years from now only have to look on the jira ticket rather
>> than also search the mailing list archives and hope commenters all put the
>> string "SPARK-1855" into the messages.
> 
> My understanding is that JIRA is not for discussions. In a sense it
> could be used for a few opinions, but have never seen it elsewhere and
> am curious if it's an approach for the project (that I might accept
> ultimately, but that would require some adoption time).
> 
> What's wrong with linking a discussion thread to a JIRA issue?
> 
> Jacek
> 
> -- 
> Jacek Laskowski | http://blog.japila.pl
> "Never discourage anyone who continually makes progress, no matter how
> slow." Plato



Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Jacek Laskowski
On Sun, May 18, 2014 at 8:28 PM, Andrew Ash  wrote:
> The nice thing about putting discussion on the Jira is that everything
> about the bug is in one place.  So people looking to understand the
> discussion a few years from now only have to look on the jira ticket rather
> than also search the mailing list archives and hope commenters all put the
> string "SPARK-1855" into the messages.

My understanding is that JIRA is not for discussions. In a sense it
could be used for a few opinions, but have never seen it elsewhere and
am curious if it's an approach for the project (that I might accept
ultimately, but that would require some adoption time).

What's wrong with linking a discussion thread to a JIRA issue?

Jacek

-- 
Jacek Laskowski | http://blog.japila.pl
"Never discourage anyone who continually makes progress, no matter how
slow." Plato


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
JIRA comments are mirrored to the iss...@spark.apache.org list, so people who 
want to get them by email can do so. In theory one should also be able to reply 
to one of those emails and have the message show up in JIRA, but I don’t think 
ours is configured that way. I’m not sure why it wouldn’t be the “ASF way” when 
the JIRA instance is hosted by the ASF and mirrored on ASF lists.

Matei

On May 18, 2014, at 11:28 AM, Andrew Ash  wrote:

> The nice thing about putting discussion on the Jira is that everything
> about the bug is in one place.  So people looking to understand the
> discussion a few years from now only have to look on the jira ticket rather
> than also search the mailing list archives and hope commenters all put the
> string "SPARK-1855" into the messages.
> 
> 
> On Sun, May 18, 2014 at 10:34 AM, Jacek Laskowski  wrote:
> 
>> Hi,
>> 
>> I'm curious if it's a common approach to have discussions in JIRA, not here.
>> I don't think it's the ASF way.
>> 
>> Regards,
>> Jacek Laskowski
>> http://blog.japila.pl
>> On 17 May 2014 at 23:55, "Matei Zaharia"  wrote:
>> 
>>> We do actually have replicated StorageLevels in Spark. You can use
>>> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
>>> replication factor.
>>> 
>>> BTW you guys should probably have this discussion on the JIRA rather than
>>> the dev list; I think the replies somehow ended up on the dev list.
>>> 
>>> Matei
>>> 
>>> On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:
>>>
>>>> We don't have 3x replication in spark :-)
>>>> And if we use replicated storagelevel, while decreasing odds of failure, it
>>>> does not eliminate it (since we are not doing a great job with replication
>>>> anyway from fault tolerance point of view).
>>>> Also it does take a nontrivial performance hit with replicated levels.
>>>>
>>>> Regards,
>>>> Mridul
>>>> On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
>>>>
>>>>> With 3x replication, we should be able to achieve fault tolerance.
>>>>> This checkPointed RDD can be cleared if we have another in-memory
>>>>> checkPointed RDD down the line. It can avoid hitting disk if we have
>>>>> enough memory to use. We need to investigate more to find a good
>>>>> solution. -Xiangrui
>>>>>
>>>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>> Effectively this is persist without fault tolerance.
>>>>>> Failure of any node means complete lack of fault tolerance.
>>>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>>>>>>
>>>>>>> Xiangrui Meng created SPARK-1855:
>>>>>>>
>>>>>>> Summary: Provide memory-and-local-disk RDD checkpointing
>>>>>>> Key: SPARK-1855
>>>>>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>>>> Project: Spark
>>>>>>> Issue Type: New Feature
>>>>>>> Components: MLlib, Spark Core
>>>>>>> Affects Versions: 1.0.0
>>>>>>> Reporter: Xiangrui Meng
>>>>>>>
>>>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>>>>>>> can create in-memory-and-local-disk (with replication) checkpoints that are
>>>>>>> not as reliable as HDFS-based solution but faster.
>>>>>>>
>>>>>>> It can help applications that require many iterations.
>>>>>>>
>>>>>>> --
>>>>>>> This message was sent by Atlassian JIRA
>>>>>>> (v6.2#6252)
>>>>>>>
>>>>>
>>>
>>



Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Andrew Ash
The nice thing about putting discussion on the Jira is that everything
about the bug is in one place.  So people looking to understand the
discussion a few years from now only have to look on the jira ticket rather
than also search the mailing list archives and hope commenters all put the
string "SPARK-1855" into the messages.


On Sun, May 18, 2014 at 10:34 AM, Jacek Laskowski  wrote:

> Hi,
>
> I'm curious if it's a common approach to have discussions in JIRA, not here.
> I don't think it's the ASF way.
>
> Regards,
> Jacek Laskowski
> http://blog.japila.pl
> On 17 May 2014 at 23:55, "Matei Zaharia"  wrote:
>
> > We do actually have replicated StorageLevels in Spark. You can use
> > MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
> > replication factor.
> >
> > BTW you guys should probably have this discussion on the JIRA rather than
> > the dev list; I think the replies somehow ended up on the dev list.
> >
> > Matei
> >
> > On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:
> >
> > > We don't have 3x replication in spark :-)
> > > And if we use replicated storagelevel, while decreasing odds of failure, it
> > > does not eliminate it (since we are not doing a great job with replication
> > > anyway from fault tolerance point of view).
> > > Also it does take a nontrivial performance hit with replicated levels.
> > >
> > > Regards,
> > > Mridul
> > > On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
> > >
> > >> With 3x replication, we should be able to achieve fault tolerance.
> > >> This checkPointed RDD can be cleared if we have another in-memory
> > >> checkPointed RDD down the line. It can avoid hitting disk if we have
> > >> enough memory to use. We need to investigate more to find a good
> > >> solution. -Xiangrui
> > >>
> > >> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com>
> > >> wrote:
> > >>> Effectively this is persist without fault tolerance.
> > >>> Failure of any node means complete lack of fault tolerance.
> > >>> I would be very skeptical of truncating lineage if it is not reliable.
> > >>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
> > >>>
> > >>>> Xiangrui Meng created SPARK-1855:
> > >>>>
> > >>>> Summary: Provide memory-and-local-disk RDD checkpointing
> > >>>> Key: SPARK-1855
> > >>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
> > >>>> Project: Spark
> > >>>> Issue Type: New Feature
> > >>>> Components: MLlib, Spark Core
> > >>>> Affects Versions: 1.0.0
> > >>>> Reporter: Xiangrui Meng
> > >>>>
> > >>>> Checkpointing is used to cut long lineage while maintaining fault
> > >>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
> > >>>> can create in-memory-and-local-disk (with replication) checkpoints that are
> > >>>> not as reliable as HDFS-based solution but faster.
> > >>>>
> > >>>> It can help applications that require many iterations.
> > >>>>
> > >>>> --
> > >>>> This message was sent by Atlassian JIRA
> > >>>> (v6.2#6252)
> > >>>>
> > >>
> >
> >
>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Jacek Laskowski
Hi,

I'm curious if it's a common approach to have discussions in JIRA, not here.
I don't think it's the ASF way.

Regards,
Jacek Laskowski
http://blog.japila.pl
On 17 May 2014 at 23:55, "Matei Zaharia"  wrote:

> We do actually have replicated StorageLevels in Spark. You can use
> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
> replication factor.
>
> BTW you guys should probably have this discussion on the JIRA rather than
> the dev list; I think the replies somehow ended up on the dev list.
>
> Matei
>
> On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:
>
> > We don't have 3x replication in spark :-)
> > And if we use replicated storagelevel, while decreasing odds of failure, it
> > does not eliminate it (since we are not doing a great job with replication
> > anyway from fault tolerance point of view).
> > Also it does take a nontrivial performance hit with replicated levels.
> >
> > Regards,
> > Mridul
> > On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
> >
> >> With 3x replication, we should be able to achieve fault tolerance.
> >> This checkPointed RDD can be cleared if we have another in-memory
> >> checkPointed RDD down the line. It can avoid hitting disk if we have
> >> enough memory to use. We need to investigate more to find a good
> >> solution. -Xiangrui
> >>
> >> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan 
> >> wrote:
> >>> Effectively this is persist without fault tolerance.
> >>> Failure of any node means complete lack of fault tolerance.
> >>> I would be very skeptical of truncating lineage if it is not reliable.
> >>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
> >>>
> >>>> Xiangrui Meng created SPARK-1855:
> >>>>
> >>>> Summary: Provide memory-and-local-disk RDD checkpointing
> >>>> Key: SPARK-1855
> >>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
> >>>> Project: Spark
> >>>> Issue Type: New Feature
> >>>> Components: MLlib, Spark Core
> >>>> Affects Versions: 1.0.0
> >>>> Reporter: Xiangrui Meng
> >>>>
> >>>> Checkpointing is used to cut long lineage while maintaining fault
> >>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
> >>>> can create in-memory-and-local-disk (with replication) checkpoints that are
> >>>> not as reliable as HDFS-based solution but faster.
> >>>>
> >>>> It can help applications that require many iterations.
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.2#6252)
> >>>>
> >>
>
>

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Matei Zaharia
We do actually have replicated StorageLevels in Spark. You can use 
MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
replication factor.

BTW you guys should probably have this discussion on the JIRA rather than the 
dev list; I think the replies somehow ended up on the dev list.

Matei

On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:

> We don't have 3x replication in spark :-)
> And if we use replicated storagelevel, while decreasing odds of failure, it
> does not eliminate it (since we are not doing a great job with replication
> anyway from fault tolerance point of view).
> Also it does take a nontrivial performance hit with replicated levels.
> 
> Regards,
> Mridul
> On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
> 
>> With 3x replication, we should be able to achieve fault tolerance.
>> This checkPointed RDD can be cleared if we have another in-memory
>> checkPointed RDD down the line. It can avoid hitting disk if we have
>> enough memory to use. We need to investigate more to find a good
>> solution. -Xiangrui
>> 
>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan 
>> wrote:
>>> Effectively this is persist without fault tolerance.
>>> Failure of any node means complete lack of fault tolerance.
>>> I would be very skeptical of truncating lineage if it is not reliable.
>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>>> 
>>>> Xiangrui Meng created SPARK-1855:
>>>>
>>>> Summary: Provide memory-and-local-disk RDD checkpointing
>>>> Key: SPARK-1855
>>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>> Project: Spark
>>>> Issue Type: New Feature
>>>> Components: MLlib, Spark Core
>>>> Affects Versions: 1.0.0
>>>> Reporter: Xiangrui Meng
>>>>
>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>>>> can create in-memory-and-local-disk (with replication) checkpoints that are
>>>> not as reliable as HDFS-based solution but faster.
>>>>
>>>> It can help applications that require many iterations.
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.2#6252)
>>>>
>>



Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Matei Zaharia
BTW for what it’s worth I agree this is a good option to add, the only tricky 
thing will be making sure the checkpoint blocks are not garbage-collected by 
the block store. I don’t think they will be though.

Matei
On May 17, 2014, at 2:20 PM, Matei Zaharia  wrote:

> We do actually have replicated StorageLevels in Spark. You can use 
> MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom 
> replication factor.
> 
> BTW you guys should probably have this discussion on the JIRA rather than the 
> dev list; I think the replies somehow ended up on the dev list.
> 
> Matei
> 
> On May 17, 2014, at 1:36 AM, Mridul Muralidharan  wrote:
> 
>> We don't have 3x replication in spark :-)
>> And if we use replicated storagelevel, while decreasing odds of failure, it
>> does not eliminate it (since we are not doing a great job with replication
>> anyway from fault tolerance point of view).
>> Also it does take a nontrivial performance hit with replicated levels.
>> 
>> Regards,
>> Mridul
>> On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:
>> 
>>> With 3x replication, we should be able to achieve fault tolerance.
>>> This checkPointed RDD can be cleared if we have another in-memory
>>> checkPointed RDD down the line. It can avoid hitting disk if we have
>>> enough memory to use. We need to investigate more to find a good
>>> solution. -Xiangrui
>>> 
>>> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan 
>>> wrote:
>>>> Effectively this is persist without fault tolerance.
>>>> Failure of any node means complete lack of fault tolerance.
>>>> I would be very skeptical of truncating lineage if it is not reliable.
>>>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>>>>
>>>>> Xiangrui Meng created SPARK-1855:
>>>>>
>>>>> Summary: Provide memory-and-local-disk RDD checkpointing
>>>>> Key: SPARK-1855
>>>>> URL: https://issues.apache.org/jira/browse/SPARK-1855
>>>>> Project: Spark
>>>>> Issue Type: New Feature
>>>>> Components: MLlib, Spark Core
>>>>> Affects Versions: 1.0.0
>>>>> Reporter: Xiangrui Meng
>>>>>
>>>>> Checkpointing is used to cut long lineage while maintaining fault
>>>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>>>>> can create in-memory-and-local-disk (with replication) checkpoints that are
>>>>> not as reliable as HDFS-based solution but faster.
>>>>>
>>>>> It can help applications that require many iterations.
>>>>>
>>>>> --
>>>>> This message was sent by Atlassian JIRA
>>>>> (v6.2#6252)
>>>>>
>>>
>



Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Mridul Muralidharan
We don't have 3x replication in spark :-)
And if we use replicated storagelevel, while decreasing odds of failure, it
does not eliminate it (since we are not doing a great job with replication
anyway from fault tolerance point of view).
Also it does take a nontrivial performance hit with replicated levels.
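
A rough back-of-the-envelope model of that point, in plain Python rather than Spark code. It assumes every node fails independently with the same probability and that replicas always land on distinct nodes, neither of which a real cluster guarantees. The function names and the numbers in the demo loop are illustrative assumptions, not measurements.

```python
def block_loss_probability(p_node_failure: float, replication: int) -> float:
    """Chance that every replica of one block is lost (independence assumed)."""
    if not 0.0 <= p_node_failure <= 1.0:
        raise ValueError("p_node_failure must be in [0, 1]")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return p_node_failure ** replication


def any_block_lost_probability(p_node_failure: float, replication: int,
                               num_blocks: int) -> float:
    """Chance that at least one of num_blocks blocks loses all its replicas.

    Treats blocks as independent of each other, a further simplification.
    """
    per_block = block_loss_probability(p_node_failure, replication)
    return 1.0 - (1.0 - per_block) ** num_blocks


if __name__ == "__main__":
    # Hypothetical inputs: 1% node failure chance, 1000 blocks.
    for replication in (1, 2, 3):
        print(replication, any_block_lost_probability(0.01, replication, 1000))
```

Under these assumptions the loss probability shrinks with every extra replica but stays above zero for any nonzero node failure rate, and once lineage is truncated a lost block cannot be recomputed.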

Regards,
Mridul
 On 17-May-2014 8:16 am, "Xiangrui Meng"  wrote:

> With 3x replication, we should be able to achieve fault tolerance.
> This checkPointed RDD can be cleared if we have another in-memory
> checkPointed RDD down the line. It can avoid hitting disk if we have
> enough memory to use. We need to investigate more to find a good
> solution. -Xiangrui
>
> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan 
> wrote:
> > Effectively this is persist without fault tolerance.
> > Failure of any node means complete lack of fault tolerance.
> > I would be very skeptical of truncating lineage if it is not reliable.
> >  On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
> >
> >> Xiangrui Meng created SPARK-1855:
> >> 
> >>
> >>  Summary: Provide memory-and-local-disk RDD checkpointing
> >>  Key: SPARK-1855
> >>  URL: https://issues.apache.org/jira/browse/SPARK-1855
> >>  Project: Spark
> >>   Issue Type: New Feature
> >>   Components: MLlib, Spark Core
> >> Affects Versions: 1.0.0
> >> Reporter: Xiangrui Meng
> >>
> >>
> >> Checkpointing is used to cut long lineage while maintaining fault
> >> tolerance. The current implementation is HDFS-based. Using the BlockRDD
> we
> >> can create in-memory-and-local-disk (with replication) checkpoints that
> are
> >> not as reliable as HDFS-based solution but faster.
> >>
> >> It can help applications that require many iterations.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.2#6252)
> >>
>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Xiangrui Meng
With 3x replication, we should be able to achieve fault tolerance.
This checkPointed RDD can be cleared if we have another in-memory
checkPointed RDD down the line. It can avoid hitting disk if we have
enough memory to use. We need to investigate more to find a good
solution. -Xiangrui

On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan  wrote:
> Effectively this is persist without fault tolerance.
> Failure of any node means complete lack of fault tolerance.
> I would be very skeptical of truncating lineage if it is not reliable.
>  On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:
>
>> Xiangrui Meng created SPARK-1855:
>> 
>>
>>  Summary: Provide memory-and-local-disk RDD checkpointing
>>  Key: SPARK-1855
>>  URL: https://issues.apache.org/jira/browse/SPARK-1855
>>  Project: Spark
>>   Issue Type: New Feature
>>   Components: MLlib, Spark Core
>> Affects Versions: 1.0.0
>> Reporter: Xiangrui Meng
>>
>>
>> Checkpointing is used to cut long lineage while maintaining fault
>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
>> can create in-memory-and-local-disk (with replication) checkpoints that are
>> not as reliable as HDFS-based solution but faster.
>>
>> It can help applications that require many iterations.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)
>>


Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Mridul Muralidharan
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault tolerance.
I would be very skeptical of truncating lineage if it is not reliable.
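
To make the concern concrete, here is a minimal sketch in plain Python, not Spark code. ToyDataset, persist, checkpoint_locally, lose_worker and make_pair are hypothetical names invented for this illustration; the sketch only models the difference between keeping a local copy with lineage intact and keeping a local copy after cutting lineage.

```python
class LineageLost(Exception):
    """Raised when the stored copy is gone and no lineage remains to rebuild it."""


class ToyDataset:
    """Hypothetical stand-in for an RDD: data derived from an optional parent."""

    def __init__(self, name, fn, parent=None):
        self.name = name
        self.fn = fn            # builds this dataset from the parent's data
        self.parent = parent    # lineage link; None for a source dataset
        self.stored = None      # copy held in a worker's memory or local disk
        self.truncated = False  # True once lineage has been cut

    def compute(self):
        if self.stored is not None:
            return self.stored
        if self.truncated:
            raise LineageLost(f"{self.name}: stored copy lost, lineage truncated")
        parent_data = self.parent.compute() if self.parent is not None else None
        return self.fn(parent_data)

    def persist(self):
        """Keep a local copy and retain lineage."""
        self.stored = self.compute()

    def checkpoint_locally(self):
        """Keep a local copy and cut lineage."""
        self.stored = self.compute()
        self.parent = None
        self.truncated = True

    def lose_worker(self):
        """Simulate failure of the node holding the stored copy."""
        self.stored = None


def make_pair():
    """Build one persisted and one locally checkpointed dataset from a source."""
    source = ToyDataset("source", lambda _: [1, 2, 3])
    persisted = ToyDataset("persisted", lambda xs: [x * 2 for x in xs], source)
    checkpointed = ToyDataset("checkpointed", lambda xs: [x * 2 for x in xs], source)
    persisted.persist()
    checkpointed.checkpoint_locally()
    return persisted, checkpointed
```

After lose_worker() is called on both datasets, the persisted one can still be recomputed from its source, while the locally checkpointed one raises LineageLost because nothing is left to rebuild it from.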
 On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)"  wrote:

> Xiangrui Meng created SPARK-1855:
> 
>
>  Summary: Provide memory-and-local-disk RDD checkpointing
>  Key: SPARK-1855
>  URL: https://issues.apache.org/jira/browse/SPARK-1855
>  Project: Spark
>   Issue Type: New Feature
>   Components: MLlib, Spark Core
> Affects Versions: 1.0.0
> Reporter: Xiangrui Meng
>
>
> Checkpointing is used to cut long lineage while maintaining fault
> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
> can create in-memory-and-local-disk (with replication) checkpoints that are
> not as reliable as HDFS-based solution but faster.
>
> It can help applications that require many iterations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>