I've not done this in Scala yet, but in PySpark I've run into a similar
issue where having too many dataframes cached does cause memory issues.
Unpersist by itself did not clear the memory usage; rather, setting the
variable equal to None allowed all the references to be cleared and the
memory to be freed.
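For reference, the Scala-side analogue of that pattern would be to unpersist with blocking and then drop the driver-side reference. This is only a sketch, assuming a SparkSession named spark is in scope (as in spark-shell); it is not from the original post, which described PySpark:

import org.apache.spark.sql.DataFrame

var df: DataFrame = spark.range(1000).toDF("id")
df.persist()
df.count()                     // materialize the cache

df.unpersist(blocking = true)  // synchronously free the cached blocks
df = null                      // drop the reference, like setting it to None in PySpark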
First of all, thank you for your comments.
What I mean by "update" is generating a new data frame with modified data.
A more detailed version of the while loop would be something like the below.
var continue = 1
var dfA = "a data frame"
dfA.persist
while (continue > 0) {
  val temp = "modified dfA"
  ...
}
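To make the shape of that loop concrete, here is one way it might be completed. This is only a sketch: the needsUpdate column and the increment step are invented for illustration, and checkpoint() is used so the query plan does not grow with every iteration (it assumes a SparkSession named spark):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

// Hypothetical starting frame: flag the rows that still need an update.
var dfA: DataFrame = spark.range(100).toDF("value")
  .withColumn("needsUpdate", col("value") % 2 === 0)
dfA.persist()

var continue = dfA.filter(col("needsUpdate")).count()
while (continue > 0) {
  // Produce the "updated" frame; checkpoint() truncates the lineage
  // so the plan stays flat across iterations.
  val temp = dfA
    .withColumn("value",
      when(col("needsUpdate"), col("value") + 1).otherwise(col("value")))
    .withColumn("needsUpdate", lit(false))
    .checkpoint()
  temp.persist()

  continue = temp.filter(col("needsUpdate")).count()
  dfA.unpersist()  // release the previous iteration's cache
  dfA = temp
}

The persist/unpersist swap matches what the PySpark reply above described: cache the new frame, drop the old one, and reassign the variable.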
Yes, iterating over a dataframe and making changes is not uncommon.
Of course RDDs, dataframes and datasets are immutable, but there is some
optimization in the optimizer that can potentially help to dampen the
impact of creating a new rdd, df or ds, as the sketch below shows.
Also, the use-case you cited is ...
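To make that optimizer point concrete: Catalyst will collapse a chain of withColumn calls into a single projection, so building "a new dataframe" per step does not necessarily add a pass over the data. A small sketch, with made-up column names, assuming a SparkSession named spark:

import org.apache.spark.sql.functions._

val base = spark.range(10).toDF("x")
val chained = base
  .withColumn("a", col("x") + 1)
  .withColumn("b", col("a") * 2)
  .withColumn("c", col("b") - 3)

// The optimized plan shows one Project over the Range source rather than
// three stacked projections (Catalyst's CollapseProject rule at work).
chained.explain(true)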
I think most of the "big data" tools, like Spark and Hive, are not designed
to edit data; they are designed to query data. I wonder in what
scenario you need to update a large volume of data repetitively.
On Mon, Oct 17, 2016 at 2:00 PM Divya Gehlot wrote:
If my understanding of your query is correct:
In Spark, dataframes are immutable; you can't update a dataframe in place.
You have to create a new dataframe to "update" the current one.
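A small example of that idiom (the column name and condition are invented, and a SparkSession named spark is assumed): withColumn plus when/otherwise returns a new dataframe with the "updated" values, leaving the original untouched:

import org.apache.spark.sql.functions._

val df = spark.range(5).toDF("amount")

// "Update" the rows where amount > 2 by building a new dataframe.
val updated = df.withColumn("amount",
  when(col("amount") > 2, col("amount") * 10).otherwise(col("amount")))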
Thanks,
Divya
On 17 October 2016 at 09:50, Mungeol Heo wrote:
Hello, everyone.
As I mentioned in the title, I wonder whether Spark is the right tool for
updating a data frame repeatedly until there is no more data to
update.
For example:

while (there is data to update) {
  update data frame A
}

If it is the right tool, then what is the best practice for this?