Re: Memory-efficient successive calls to repartition()

2015-09-08 Thread Aurélien Bellet
grows regularly throughout the execution until no free space is available, despite the call to the GC. Aurelien
On 9/8/15 6:22 PM, Aurélien Bellet wrote:
Hi, This is what I tried:
    for i in range(1000):
        print i
        data2=data.repartition(50).cache()
        if (i+1) % 10 == 0

Re: Memory-efficient successive calls to repartition()

2015-09-08 Thread Aurélien Bellet
Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>: Thanks a lot for the useful link and comments, Alexis! First of all, the problem occurs without doing anything else in the code (except of course loading my da

Re: Memory-efficient successive calls to repartition()

2015-09-02 Thread Aurélien Bellet
GMT+08:00 Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>:
Dear Alexis, Thanks again for your reply. After reading about checkpointing I have modified my sample code as follows:
    for i in range(1000):

Re: Memory-efficient successive calls to repartition()

2015-09-01 Thread Aurélien Bellet
Dear Alexis, Thanks again for your reply. After reading about checkpointing I have modified my sample code as follows:
    for i in range(1000):
        print i
        data2=data.repartition(50).cache()
        if (i+1) % 10 == 0:
            data2.checkpoint()
        data2.first() # materialize rdd

Re: Random pairs / RDD order

2015-04-19 Thread Aurélien Bellet
    = rdd.sample(true,0.01,42).mapPartitions(scala.util.Random.shuffle)
    val sample2 = rdd.sample(true,0.01,43).mapPartitions(scala.util.Random.shuffle)
    ...
On Fri, Apr 17, 2015 at 3:05 AM, Aurélien Bellet <aurelien.bel...@telecom-paristech.fr> wrote:
Hi Sean

Re: Random pairs / RDD order

2015-04-17 Thread Aurélien Bellet
Hi Sean, Thanks a lot for your reply. The problem is that I need to sample random *independent* pairs. If I draw two samples and build all n*(n-1) pairs, then the pairs are strongly dependent on one another. My current solution is also not satisfactory because some pairs (the closest ones in a partition) have a
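
For readers skimming this archive, the notion of independent pairs mentioned in the message above can be illustrated on a single machine, outside Spark. The sketch below is not taken from the thread: the function name `sample_independent_pairs` and its parameters are hypothetical, and it assumes the items can be addressed by integer indices 0..n-1. Each pair is drawn separately and uniformly from the n*(n-1) ordered pairs of distinct indices, so no two pairs are tied to a shared pre-drawn sample.

```python
import random


def sample_independent_pairs(n_items, n_pairs, seed=None):
    """Draw n_pairs ordered index pairs (i, j) with i != j.

    Hypothetical helper for illustration only. Every pair is drawn
    independently and uniformly from the n_items * (n_items - 1)
    ordered pairs of distinct indices.
    """
    if n_items < 2:
        raise ValueError("need at least two items to form a pair")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i = rng.randrange(n_items)
        # Draw j from the n_items - 1 remaining indices.
        j = rng.randrange(n_items - 1)
        if j >= i:
            j += 1  # shift past i so that j can never equal i
        pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    for pair in sample_independent_pairs(n_items=100, n_pairs=5, seed=42):
        print(pair)
```

Mapping the sampled indices back to RDD elements (for example by indexing the RDD with `zipWithIndex` and joining on the index) is left out here; whether that is efficient enough depends on the data size and is not something this sketch addresses.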