grows regularly throughout the execution
until no free space is available, despite the call to the GC.
Aurelien
On 9/8/15 6:22 PM, Aurélien Bellet wrote:
Hi,
This is what I tried:
for i in range(1000):
    print i
    data2=data.repartition(50).cache()
    if (i+1) % 10 == 0
Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>:
Thanks a lot for the useful link and comments Alexis!
First of all, the problem occurs without doing anything else in the
code (except of course loading my da
GMT+08:00 Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>:
Dear Alexis,
Thanks again for your reply. After reading about checkpointing I have
modified my sample code as follows:
for i in range(1000):
    print i
    data2=data.repartition(50).cache()
    if (i+1) % 10 == 0:
        data2.checkpoint()
    data2.first() # materialize rdd
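
For readers following along, here is a minimal, self-contained sketch of the loop above wrapped in a function. It is not a confirmed fix for the disk-usage growth discussed in this thread. Assumptions not taken from the message: the function name and its parameters are hypothetical, the `unpersist()` call on the previous RDD is an addition, and a checkpoint directory is assumed to have been set beforehand with `sc.setCheckpointDir(...)`.

```python
def repartition_with_checkpoints(data, num_iters=1000, num_partitions=50,
                                 checkpoint_every=10):
    """Repeatedly repartition `data`, marking every few results for checkpointing.

    `data` is expected to be a PySpark RDD whose SparkContext already has a
    checkpoint directory configured (assumption, see lead-in).
    """
    for i in range(num_iters):
        data2 = data.repartition(num_partitions).cache()
        if (i + 1) % checkpoint_every == 0:
            # checkpoint() has to be called before the first action on the RDD
            data2.checkpoint()
        data2.first()     # action: materializes the RDD (and the checkpoint, if marked)
        data.unpersist()  # assumption: drop the previous iteration's cached copy
        data = data2
    return data
```

With a live SparkContext `sc`, this would be called as `data = repartition_with_checkpoints(sc.parallelize(range(100000)))` after `sc.setCheckpointDir(...)` has been pointed at a writable directory.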
=
rdd.sample(true,0.01,42).mapPartitions(scala.util.Random.shuffle)
val sample2 =
rdd.sample(true,0.01,43).mapPartitions(scala.util.Random.shuffle)
...
On Fri, Apr 17, 2015 at 3:05 AM, Aurélien Bellet
<aurelien.bel...@telecom-paristech.fr> wrote:
Hi Sean,
Thanks a lot for your reply. The problem is that I need to sample random
*independent* pairs. If I draw two samples and build all n*(n-1) pairs,
the pairs are heavily dependent on one another. My current solution is also
not satisfactory because some pairs (the closest ones in a partition) have a