This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:

     new 9e4411e2450 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring

9e4411e2450 is described below

commit 9e4411e2450d0503933626207b5e03308c30bc72
Author: Paul Staab <paulst...@users.noreply.github.com>
AuthorDate: Wed Oct 25 07:36:15 2023 -0500

[SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring

### What changes were proposed in this pull request?

Corrects the docstring of `DataFrame.cache` so that it names the storage level actually used since Spark 3.0. The docstring of `DataFrame.persist` was updated at that time, but `cache` appears to have been missed.

### Why are the changes needed?

The docstring claims that `cache` uses serialised storage, but it actually uses deserialised storage. I confirmed that this is still the case with Spark 3.5.0 using the example code from the Jira ticket.

### Does this PR introduce _any_ user-facing change?

Yes, the docstring changes.

### How was this patch tested?

The GitHub Actions workflow succeeded.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43229 from paulstaab/SPARK-40154.

Authored-by: Paul Staab <paulst...@users.noreply.github.com>
Signed-off-by: Sean Owen <sro...@gmail.com>
(cherry picked from commit 94607dd001b133a25dc9865f25b3f9e7f5a5daa3)
Signed-off-by: Sean Owen <sro...@gmail.com>
---
 python/pyspark/sql/dataframe.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 30ed73d3c47..5707ae2a31f 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1485,7 +1485,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         self.rdd.foreachPartition(f)  # type: ignore[arg-type]
 
     def cache(self) -> "DataFrame":
-        """Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`).
+        """Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK_DESER`).
 
         .. versionadded:: 1.3.0
 
@@ -1494,7 +1494,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
         Notes
         -----
-        The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0.
+        The default storage level has changed to `MEMORY_AND_DISK_DESER` to match Scala in 3.0.
 
         Returns
         -------