data = sc.textFile("Net_gist_train.txt", 50).map(parseLine).cache()
data.count()
for i in range(1000):
    data = data.repartition(50).persist()
    # below, several operations are done on data

It seems that I am doing something wrong, because as the iterations go
the memory usage increases, causing the job to spill onto HDFS.

What am I doing wrong? I tried the following, but it doesn't solve
the issue:

for i in range(1000):
    data2 = data.repartition(50).persist()
    data2.count()       # materialize rdd
    data.unpersist()    # unpersist previous version
    data = data2
Help and suggestions on this would be greatly appreciated! Thanks a lot!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Memory-efficient-successive-calls-to-repartition-tp24358.html
Sent from the Apache Spark User List mailing list archive at Nabble.com