Spark Structured Streaming with periodic persist and unpersist

2021-02-11 Thread act_coder
I am currently building a Spark Structured Streaming application that does a batch-stream join, where the source for the batch data gets updated periodically, so I am planning to persist/unpersist that batch data on a schedule. Below is a sample of the code I am using to persist
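
A minimal sketch (Scala) of the refresh pattern being described, under stated assumptions: batchPath, spark, and the surrounding query wiring are hypothetical, and the join must pick up batchDf per micro-batch (e.g. inside foreachBatch) for the swap to take effect. The idea is to persist and materialize the new snapshot before unpersisting the stale one:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    @volatile var batchDf: DataFrame = _

    def refreshBatchSide(spark: SparkSession, batchPath: String): Unit = {
      val fresh = spark.read.parquet(batchPath) // re-read the updated source
      fresh.persist()
      fresh.count()                             // materialize the new copy first
      val stale = batchDf
      batchDf = fresh
      if (stale != null) stale.unpersist()      // then release the stale copy
    }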

Broadcast variables: destroy/unpersist unexpected behaviour

2018-03-13 Thread Sunil
I ran into the two cases below when unpersisting or destroying broadcast variables in PySpark, but the same works fine in the Spark Scala shell. Any clue why this happens? Is it a bug in PySpark? ***Case 1:*** >>> b1 = sc.broadcast([1,2,3]) >>> b1.value [1, 2, 3] >>> b1.destroy()
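
For comparison, the Scala-side behaviour (a sketch, assuming an active SparkContext sc): unpersist() only drops the executor copies, while destroy() releases the variable entirely, after which reading .value throws.

    val b1 = sc.broadcast(Seq(1, 2, 3))
    println(b1.value)  // List(1, 2, 3) while the broadcast is live
    b1.unpersist()     // drops executor copies; the value can be re-broadcast lazily
    b1.destroy()       // releases everything; b1.value now throws SparkException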

Unpersist all from memory in Spark 2.2

2017-09-25 Thread Cesar
Is there a way to unpersist all DataFrames, Datasets, and/or RDDs in Spark 2.2 in a single call? Thanks -- Cesar Flores
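
A sketch of two calls that together cover this in Spark 2.x (assuming a SparkSession named spark): the catalog call clears cached Datasets/DataFrames, and getPersistentRDDs covers RDDs cached directly.

    spark.catalog.clearCache()   // drops all cached DataFrames/Datasets/tables

    spark.sparkContext.getPersistentRDDs
      .values
      .foreach(_.unpersist())    // releases every RDD the context still tracks as persisted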

Re: Unpersist RDD in Graphx

2016-02-01 Thread Takeshi Yamamuro
Hi, please call "Graph#unpersist", which releases both RDDs, the vertex one and the edge one. "Graph#unpersist" just invokes "Graph#unpersistVertices" and "Graph#edges#unpersist"; "Graph#unpersistVertices" releases memory for vertices and "Graph#edg
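
In code, the equivalence described above looks like this (a sketch, for a previously cached GraphX graph g):

    // Releases both underlying RDDs in one call:
    g.unpersist(blocking = false)

    // Equivalent to the two finer-grained calls:
    g.unpersistVertices(blocking = false)
    g.edges.unpersist(blocking = false)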

Unpersist RDD in Graphx

2016-01-31 Thread Zhang, Jingyu
Hi, what is the best way to unpersist an RDD in GraphX to release memory: RDD.unpersist, or RDD.unpersistVertices plus RDD.edges.unpersist? I studied the source code of Pregel.scala; both of the above are used between line 148 and line 150. Can anyone please tell me what the difference is? In addition, what

Re: automatically unpersist RDDs which are not used for 24 hours?

2016-01-13 Thread Andrew Or
questions, -Andrew 2016-01-13 11:36 GMT-08:00 Alexander Pivovarov <apivova...@gmail.com>: > Is it possible to automatically unpersist RDDs which are not used for 24 > hours? >

automatically unpersist RDDs which are not used for 24 hours?

2016-01-13 Thread Alexander Pivovarov
Is it possible to automatically unpersist RDDs which are not used for 24 hours?
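
There is no per-RDD idle timeout. Two mechanisms come close (a sketch): the ContextCleaner already unpersists RDDs automatically once their driver-side references are garbage-collected, and Spark 1.x offered a time-based cleaner — a blunt instrument, since it also cleans RDDs you still intend to use.

    import org.apache.spark.SparkConf

    // Spark 1.x only (removed later): periodically drop metadata and cached
    // blocks older than the TTL.
    val conf = new SparkConf().set("spark.cleaner.ttl", "86400") // 24 hours, in seconds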

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-12-08 Thread Ewan Higgs
Sean, Thanks. It's a developer API and doesn't appear to be exposed. Ewan On 07/12/15 15:06, Sean Owen wrote: I'm not sure if this is available in Python but from 1.3 on you should be able to call ALS.setFinalRDDStorageLevel with level "none" to ask it to unpersist when it is don

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-12-07 Thread Ewan Higgs
how I could unpersist the data so I could clean up the files, but I was unsuccessful. We're using Spark 1.5.0 Yours, Ewan On 16/07/15 23:18, Stahlman, Jonathan wrote: Hello all, I am running the Spark recommendation algorithm in MLlib and I have been studying its output with various mo

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-12-07 Thread Sean Owen
I'm not sure if this is available in Python but from 1.3 on you should be able to call ALS.setFinalRDDStorageLevel with level "none" to ask it to unpersist when it is done. On Mon, Dec 7, 2015 at 1:42 PM, Ewan Higgs <ewan.hi...@ugent.be> wrote: > Jonathan, > Did you
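
On the Scala side, this is the builder call being referred to (a sketch; ratings is assumed to be an RDD[Rating] prepared elsewhere):

    import org.apache.spark.mllib.recommendation.ALS
    import org.apache.spark.storage.StorageLevel

    val model = new ALS()
      .setRank(10)
      .setIterations(10)
      .setFinalRDDStorageLevel(StorageLevel.NONE) // don't keep the final factor RDDs cached
      .run(ratings)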

Re: How to unpersist a DStream in Spark Streaming

2015-11-06 Thread Adrian Tanase
1. Spark Streaming will automatically unpersist outdated data (you already mentioned the configurations). 2. If the streaming job has started, I think you may lose control of the job: when do you call this unpersist, and how do you call it (from another thread)? Thanks, Saisai On Thu, N

Re: How to unpersist a DStream in Spark Streaming

2015-11-05 Thread swetha kasireddy
usage scenario of DStream unpersisting? > > From my understanding: > > 1. Spark Streaming will automatically unpersist outdated data (you already > mentioned about the configurations). > 2. If streaming job is started, I think you may lose the control of the > job, when do

Re: How to unpersist a DStream in Spark Streaming

2015-11-05 Thread Tathagata Das
> Hi Swetha, >> >> Would you mind elaborating your usage scenario of DStream unpersisting? >> >> From my understanding: >> >> 1. Spark Streaming will automatically unpersist outdated data (you >> already mentioned about the configurations). >> 2.

How to unpersist a DStream in Spark Streaming

2015-11-04 Thread swetha
Hi, how do I unpersist a DStream in Spark Streaming? I know that we can persist using dStream.persist() or dStream.cache, but I don't see any method to unpersist. Thanks, Swetha

Re: How to unpersist a DStream in Spark Streaming

2015-11-04 Thread Saisai Shao
Hi Swetha, would you mind elaborating on your usage scenario for DStream unpersisting? From my understanding: 1. Spark Streaming will automatically unpersist outdated data (you already mentioned the configurations). 2. If the streaming job has started, I think you may lose control of the

Re: How to unpersist a DStream in Spark Streaming

2015-11-04 Thread Yashwanth Kumar
Hi, a DStream (Discretized Stream) is made up of multiple RDDs. You can unpersist each RDD by accessing the individual RDDs using dstream.foreachRDD { rdd => rdd.unpersist() }

Re: How to unpersist a DStream in Spark Streaming

2015-11-04 Thread swetha kasireddy
Other than setting the following: sparkConf.set("spark.streaming.unpersist", "true") sparkConf.set("spark.cleaner.ttl", "7200s") On Wed, Nov 4, 2015 at 5:03 PM, swetha <swethakasire...@gmail.com> wrote: > Hi, > > How to unpersist a DStream

unpersist RDD from another thread

2015-09-16 Thread Paul Weiss
rdd.unpersist while 3. is still running. Questions: * Will the computation in 3) finish properly even if unpersist was called on the rdd while it is running? * What happens if part of the computation fails and the rdd needs to be reconstructed from the DAG lineage; will this still work even though unpersist

Re: unpersist RDD from another thread

2015-09-16 Thread Paul Weiss
So in order to not incur any performance issues I should really wait for all usage of the rdd to complete before calling unpersist, correct? On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > unpredictable. I think it will be safe (as in nothing sh

Re: unpersist RDD from another thread

2015-09-16 Thread Tathagata Das
Yes. On Wed, Sep 16, 2015 at 1:12 PM, Paul Weiss <paulweiss@gmail.com> wrote: > So in order to not incur any performance issues I should really wait for > all usage of the rdd to complete before calling unpersist, correct? > > On Wed, Sep 16, 2015 at 4:08 PM, Tathagata Das
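
Put together, the safe ordering looks like this (a sketch, assuming a cached rdd shared with a job launched from another thread):

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    val job = Future { rdd.count() }   // long-running action using the cached rdd
    Await.ready(job, Duration.Inf)     // wait for every consumer to finish...
    rdd.unpersist()                    // ...before releasing the cached blocks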

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-28 Thread Xiangrui Meng
Hi Stahlman, finalRDDStorageLevel is the storage level for the final user/item factors. It is not common to set it to StorageLevel.NONE, unless you want to save the factors directly to disk. So if it is NONE, we cannot unpersist the intermediate RDDs (in/out blocks) because the final user/item

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-22 Thread Stahlman, Jonathan
of the ALS calculation, these RDDs are no longer needed nor returned, so I would assume the logical choice would be to unpersist() these RDDs. The strategy in the code seems to be set by finalRDDStorageLevel, which for some reason only calls unpersist() on the intermediate RDDs if finalRDDStorageLevel

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-22 Thread Burak Yavuz
Hi Jonathan, I believe calling persist with StorageLevel.NONE doesn't do anything. That's why the unpersist has an if statement before it. Could you give more information about your setup please? Number of cores, memory, number of partitions of ratings_train? Thanks, Burak On Wed, Jul 22, 2015

RE: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-22 Thread Ganelin, Ilya
Hello again, In trying to understand the caching of intermediate RDDs by ALS, I looked into the source code

Re: How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-22 Thread Stahlman, Jonathan
() is being called with finalRDDStorageLevel = StorageLevel.NONE, which I would understand to mean that the intermediate RDDs will not be persisted. Looking here: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L631 unpersist() is only

How to unpersist RDDs generated by ALS/MatrixFactorizationModel

2015-07-16 Thread Stahlman, Jonathan
. A sample of the code in Python is copied below. The issue I have is that each new model that is trained caches a set of RDDs, and eventually the executors run out of memory. Is there any way in PySpark to unpersist() these RDDs after each iteration? The names of the RDDs, which I gather from the UI

Re: Making Unpersist Lazy

2015-07-02 Thread Akhil Das
RDDs which are no longer required will be removed from memory by Spark itself (which you can consider lazy?). Thanks, Best Regards On Wed, Jul 1, 2015 at 7:48 PM, Jem Tucker jem.tuc...@gmail.com wrote: Hi, The current behavior of rdd.unpersist() appears to not be lazily executed and

Re: Making Unpersist Lazy

2015-07-02 Thread Jem Tucker
Hi, after running some tests it appears unpersist is executed as soon as it is reached, so any tasks using this RDD later on will have to recalculate it. This is fine for simple programs, but when an RDD is created within a function and its reference is then lost but children of it continue

RE: Making Unpersist Lazy

2015-07-02 Thread Ganelin, Ilya
Unpersist Lazy Hi, after running some tests it appears unpersist is executed as soon as it is reached, so any tasks using this RDD later on will have to recalculate it. This is fine for simple programs, but when an RDD is created within a function and its reference is then lost but children

Making Unpersist Lazy

2015-07-01 Thread Jem Tucker
Hi, the current behavior of rdd.unpersist() appears not to be lazy, so the call must be placed after an action. Is there any way to emulate lazy execution of this function so it is added to the task queue? Thanks, Jem
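
A sketch of the workaround implied above: since unpersist() runs the moment it is reached, place it after the action that consumes the cached data (expensiveRdd and transform are hypothetical stand-ins).

    val cached = expensiveRdd.cache()
    val result = cached.map(transform).collect() // action finishes while the cache is live
    cached.unpersist()                           // eager, but now safe: nothing recomputes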

Re: When will a cached RDD unpersist its data

2015-06-23 Thread eric wong
In the case that memory cannot hold all of the cached RDDs, the BlockManager will evict older blocks to make room for new RDD blocks. Hope that helps. 2015-06-24 13:22 GMT+08:00 bit1...@163.com bit1...@163.com: I am kind of confused about when a cached RDD will unpersist its data. I know we

When will a cached RDD unpersist its data

2015-06-23 Thread bit1...@163.com
I am kind of confused about when a cached RDD will unpersist its data. I know we can explicitly unpersist it with RDD.unpersist, but can it be unpersisted automatically by the Spark framework? Thanks. bit1...@163.com

Re: Futures timed out during unpersist

2015-01-17 Thread Akhil Das
What is the data size? Have you tried increasing the driver memory? Thanks, Best Regards On Sat, Jan 17, 2015 at 1:01 PM, Kevin (Sangwoo) Kim kevin...@apache.org wrote: Hi experts, I got an error while unpersisting an RDD. Any ideas? java.util.concurrent.TimeoutException: Futures timed out

Re: Futures timed out during unpersist

2015-01-17 Thread Kevin (Sangwoo) Kim
(Sangwoo) Kim kevin...@apache.org wrote: Hi experts, I got an error while unpersisting an RDD. Any ideas? java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise

Futures timed out during unpersist

2015-01-16 Thread Kevin (Sangwoo) Kim
Hi experts, I got an error while unpersisting an RDD. Any ideas? java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223
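
One mitigation (a sketch): by default unpersist(blocking = true) makes the driver wait for every executor to confirm block removal, which is the future that times out here; the non-blocking form fires and forgets.

    rdd.unpersist(blocking = false) // don't wait for executors to acknowledge removal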

Unpersist

2014-09-11 Thread Deep Pradhan
I want to create a temporary variable in Spark code. Can I do this? for (i <- num) { val temp = ... { do something } temp.unpersist() } Thank you

Re: Unpersist

2014-09-11 Thread Akhil Das
like this? var temp = ... for (i <- num) { temp = ... { do something } temp.unpersist() } Thanks, Best Regards On Thu, Sep 11, 2014 at 3:26 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: I want to create a temporary variable in Spark code. Can I do this? for (i <- num) {

Re: Unpersist

2014-09-11 Thread Deep Pradhan
After every loop iteration I want the temp variable to cease to exist. On Thu, Sep 11, 2014 at 4:33 PM, Akhil Das ak...@sigmoidanalytics.com wrote: like this? var temp = ... for (i <- num) { temp = ... { do something } temp.unpersist() } Thanks, Best Regards On Thu, Sep 11, 2014
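
The suggested pattern, reflowed as a sketch (num and makeTemp are hypothetical stand-ins for the loop bound and the per-iteration computation):

    import org.apache.spark.rdd.RDD

    var temp: RDD[Int] = null
    for (i <- 1 to num) {
      temp = makeTemp(i).cache() // hypothetical per-iteration computation
      temp.count()               // "do something" with temp
      temp.unpersist()           // release it before the next iteration
    }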

Regarding function unpersist on rdd

2014-09-02 Thread Zijing Guo
Hello, can someone enlighten me on whether calling unpersist on an RDD is expensive? What is the best way to uncache a cached RDD? Thanks, Edwin

RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
Maybe it would be nice if unpersist() ‘triggered’ the computation of other RDDs that depend on it but have not yet been computed. The pseudocode could be as follows: unpersist() { if (this rdd has not been persisted) return; for (all rdds that depend on this rdd but are not yet computed

Re: Question about RDD cache, unpersist, materialization

2014-06-12 Thread Nicholas Chammas
FYI: Here is a related discussion http://apache-spark-user-list.1001560.n3.nabble.com/Persist-and-unpersist-td6437.html about this. On Thu, Jun 12, 2014 at 8:10 PM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote: Maybe It would be nice that unpersist() ‘triggers’ the computations

RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
Currently I use rdd.count() to force computation, as Nick Pentreath suggested. I think it would be nice to have a method that forcefully computes an RDD, so that unnecessary RDDs can be safely unpersist()ed. Consider a case where rdd_a is a parent of both: (1) a short-term

RE: Question about RDD cache, unpersist, materialization

2014-06-12 Thread innowireless TaeYun Kim
(I've clarified statement (1) of my previous mail. See below.) From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Friday, June 13, 2014 10:05 AM To: user@spark.apache.org Subject: RE: Question about RDD cache, unpersist, materialization Currently I use

RE: Question about RDD cache, unpersist, materialization

2014-06-11 Thread Nick Pentreath
If you want to force materialization, use .count(). Also, if you can, simply don't unpersist anything unless you really need to free the memory. On Wed, Jun 11, 2014 at 5:13 AM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote: BTW, it is possible that rdd.first
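
The count() trick from this thread as a sketch, using the thread's own rdd_a/rdd_b names: materialize the child fully while the parent is still cached, then unpersist the parent.

    rdd_b.cache()
    rdd_b.count()     // materialize the child fully while rdd_a is still cached
    rdd_a.unpersist() // rdd_b no longer has to recompute through rdd_a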

Question about RDD cache, unpersist, materialization

2014-06-10 Thread innowireless TaeYun Kim
Hi, what I seem to know about the RDD persistence API is as follows: - cache() and persist() are not actions; they only mark the RDD. - unpersist() is also not an action; it only removes the marking, but if the RDD is already in memory, it is unloaded. And there seems to be no API to forcefully

RE: Question about RDD cache, unpersist, materialization

2014-06-10 Thread innowireless TaeYun Kim
Subject: Question about RDD cache, unpersist, materialization Hi, what I seem to know about the RDD persistence API is as follows: - cache() and persist() are not actions; they only mark the RDD. - unpersist() is also not an action; it only removes the marking, but if the RDD is already in memory

Persist and unpersist

2014-05-27 Thread Daniel Darabos
I keep bumping into a problem with persisting RDDs. Consider this (silly) example:

def everySecondFromBehind(input: RDD[Int]): RDD[Int] = {
  val count = input.count
  if (count % 2 == 0) {
    return input.filter(_ % 2 == 1)
  } else {
    return input.filter(_ % 2 == 0)
  }
}

The situation is

Re: Persist and unpersist

2014-05-27 Thread Nicholas Chammas
Daniel, is SPARK-1103 https://issues.apache.org/jira/browse/SPARK-1103 related to your example? Automatic unpersist()-ing of unreferenced RDDs would be nice. Nick On Tue, May 27, 2014 at 12:28 PM, Daniel Darabos daniel.dara...@lynxanalytics.com wrote: I keep bumping into a problem

Re: Persist and unpersist

2014-05-27 Thread Ankur Dave
I think what's desired here is for input to be unpersisted automatically as soon as result is materialized. I don't think there's currently a way to do this, but the usual workaround is to force result to be materialized immediately and then unpersist input: input.cache(); val count
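
The snippet cuts off mid-line; the workaround being described amounts to this sketch (someTransform is a hypothetical stand-in for whatever derives result from input):

    input.cache()
    val result = someTransform(input).cache() // derive and cache the child
    val count = result.count()                // materialize result immediately
    input.unpersist()                         // then release the upstream cache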