Hi,
I'm puzzling over the following problem: when I cache a small sample of a
big DataFrame, the sample is recomputed whenever I select a column
(but not when I call show() or count()).
Why does this happen, and how can I avoid recomputing the small sample
DataFrame?
More details:
- I
We will try to address this before Spark 1.5 is released:
https://issues.apache.org/jira/browse/SPARK-9141
On Tue, Jul 28, 2015 at 11:50 AM, Kristina Rogale Plazonic kpl...@gmail.com
wrote: