[ https://issues.apache.org/jira/browse/SPARK-32423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432744#comment-17432744 ]
Jacob Duenke edited comment on SPARK-32423 at 10/21/21, 11:40 PM:
------------------------------------------------------------------

I'm looking for this same thing a year later. I believe what OP was referring to is the DataFrame class. Some 40 or so of its methods construct and return a new DataFrame object, as limit() does below.

https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761

{code:python}
    def limit(self, num: int) -> "DataFrame":
        """Limits the result count to the number specified.

        .. versionadded:: 1.3.0

        Examples
        --------
        >>> df.limit(1).collect()
        [Row(age=2, name='Alice')]
        >>> df.limit(0).collect()
        []
        """
        jdf = self._jdf.limit(num)
        return DataFrame(jdf, self.sql_ctx)
{code}

If these methods returned an instance of *type(self)* instead, it would be easier to extend the DataFrame class for our own uses. Otherwise, we are forced to copy all 40 or so methods into the subclass to ensure they return our extended class, "MyDataFrameClass". The change would look like this:

{code:python}
    def limit(self, num: int) -> "DataFrame":
        """Limits the result count to the number specified.

        .. versionadded:: 1.3.0

        Examples
        --------
        >>> df.limit(1).collect()
        [Row(age=2, name='Alice')]
        >>> df.limit(0).collect()
        []
        """
        jdf = self._jdf.limit(num)
        return type(self)(jdf, self.sql_ctx)
{code}


was (Author: halfbytemedic):
I'm looking for this same thing a year later. I believe what OP was referring to is the DataFrame class. Some 40 or so methods return a new DataFrame object like below.

https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761

```python
    def limit(self, num: int) -> "DataFrame":
        """Limits the result count to the number specified.

        .. versionadded:: 1.3.0

        Examples
        --------
        >>> df.limit(1).collect()
        [Row(age=2, name='Alice')]
        >>> df.limit(0).collect()
        []
        """
        jdf = self._jdf.limit(num)
        return DataFrame(jdf, self.sql_ctx)
```

If these methods returned `type(self)`, it would be easier to extend the class DataFrame for our own uses. Otherwise, we are forced to re-copy all these 40 or so methods to ensure they return our extended class, "MyDataFrameClass".

> class 'DataFrame' returns instance of type(self) instead of DataFrame
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32423
>                 URL: https://issues.apache.org/jira/browse/SPARK-32423
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Timothy
>            Priority: Minor
>
> To allow for appropriate child classing of DataFrame, I propose the following
> change:
> class 'DataFrame' returns instance of type(self) instead of DataFrame
>
> Therefore child classes using methods such as '.limit()' will return an
> instance of the child class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
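
To make the difference between the two return styles in the comment concrete, here is a minimal sketch that does not use Spark at all. `Frame`, `limit_fixed`, and `row_count` are hypothetical names invented for illustration; only `MyDataFrameClass` and the `type(self)(...)` idiom come from the comment itself, and the list of rows merely stands in for the real `_jdf` handle.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame-like class (not Spark code)."""

    def __init__(self, rows, ctx=None):
        self.rows = list(rows)
        self.ctx = ctx

    def limit_fixed(self, num):
        # Same shape as `return DataFrame(jdf, self.sql_ctx)`:
        # the base class is hard-coded, so subclasses are lost.
        return Frame(self.rows[:num], self.ctx)

    def limit(self, num):
        # Same shape as `return type(self)(jdf, self.sql_ctx)`:
        # the result is built from the caller's own class.
        return type(self)(self.rows[:num], self.ctx)


class MyDataFrameClass(Frame):
    """Subclass that adds one helper and inherits everything else."""

    def row_count(self):
        return len(self.rows)


df = MyDataFrameClass([1, 2, 3])
print(type(df.limit_fixed(2)).__name__)  # Frame
print(type(df.limit(2)).__name__)        # MyDataFrameClass
print(df.limit(2).row_count())           # 2
```

One caveat with this pattern: `type(self)(...)` calls the subclass constructor with the base class's arguments, so it only works if a subclass keeps `__init__` compatible with the parent's signature.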