[ https://issues.apache.org/jira/browse/SPARK-38353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503770#comment-17503770 ]
Apache Spark commented on SPARK-38353:
--------------------------------------

User 'heyihong' has created a pull request for this issue:
https://github.com/apache/spark/pull/35790

> Instrument __enter__ and __exit__ magic methods for pandas API on Spark
> -----------------------------------------------------------------------
>
>                 Key: SPARK-38353
>                 URL: https://issues.apache.org/jira/browse/SPARK-38353
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Yihong He
>            Assignee: Yihong He
>            Priority: Minor
>             Fix For: 3.3.0, 3.2.2
>
>
> Creating this ticket since instrumenting the __enter__ and __exit__ magic
> methods for the pandas API on Spark can improve the accuracy of the usage
> data. Besides, we are interested in extending the pandas-on-Spark usage
> logger to other PySpark modules in the future, so this will also improve
> the accuracy of usage data for those modules.
>
> For example, for the following code:
>
> {code:python}
> pdf = pd.DataFrame(
>     [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)], columns=["dogs", "cats"]
> )
> psdf = ps.from_pandas(pdf)
> with psdf.spark.cache() as cached_df:
>     self.assert_eq(isinstance(cached_df, CachedDataFrame), True)
>     self.assert_eq(
>         repr(cached_df.spark.storage_level),
>         repr(StorageLevel(True, True, False, True)),
>     )
> {code}
>
> the pandas-on-Spark usage logger records the internal call
> [self.spark.unpersist()|https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12518]
> because the __enter__ and __exit__ methods of
> [CachedDataFrame|https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12492]
> are not instrumented.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)