[ 
https://issues.apache.org/jira/browse/SPARK-32423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432744#comment-17432744
 ] 

Jacob Duenke edited comment on SPARK-32423 at 10/21/21, 11:40 PM:
------------------------------------------------------------------

I'm looking for this same thing a year later. I believe what OP was referring 
to is the DataFrame class. Some 40 or so of its methods construct and return a 
new DataFrame object by name, as in the example below. 
https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761

{code:python}
def limit(self, num: int) -> "DataFrame":
    """Limits the result count to the number specified.
    .. versionadded:: 1.3.0
    Examples
    --------
    >>> df.limit(1).collect()
    [Row(age=2, name='Alice')]
    >>> df.limit(0).collect()
    []
    """
    jdf = self._jdf.limit(num)
    return DataFrame(jdf, self.sql_ctx)
{code}

If these methods returned an instance of *type(self)* instead, it would be much 
easier to extend the DataFrame class for our own uses. As it stands, we are 
forced to override all 40 or so of these methods just to make them return our 
extended class, e.g. "MyDataFrameClass".

{code:python}
def limit(self, num: int) -> "DataFrame":
    """Limits the result count to the number specified.
    .. versionadded:: 1.3.0
    Examples
    --------
    >>> df.limit(1).collect()
    [Row(age=2, name='Alice')]
    >>> df.limit(0).collect()
    []
    """
    jdf = self._jdf.limit(num)
    return type(self)(jdf, self.sql_ctx)
{code}
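
To show the difference in isolation, here is a minimal sketch that does not use PySpark at all. The classes FakeJdf, HardCodedFrame, SelfTypedFrame, and the two subclasses are hypothetical stand-ins I made up for illustration; only the constructor shape (jdf, sql_ctx) and the limit() pattern are taken from the snippet above.

```python
class FakeJdf:
    """Hypothetical stand-in for the JVM DataFrame handle (not a PySpark class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def limit(self, num):
        return FakeJdf(self.rows[:num])


class HardCodedFrame:
    """Mimics the current pattern: the base class is constructed by name."""

    def __init__(self, jdf, sql_ctx):
        self._jdf = jdf
        self.sql_ctx = sql_ctx

    def limit(self, num):
        return HardCodedFrame(self._jdf.limit(num), self.sql_ctx)


class SelfTypedFrame:
    """Mimics the proposed pattern: the caller's own class is constructed."""

    def __init__(self, jdf, sql_ctx):
        self._jdf = jdf
        self.sql_ctx = sql_ctx

    def limit(self, num):
        return type(self)(self._jdf.limit(num), self.sql_ctx)


class MyHardCoded(HardCodedFrame):
    def row_count(self):
        return len(self._jdf.rows)


class MySelfTyped(SelfTypedFrame):
    def row_count(self):
        return len(self._jdf.rows)


hard = MyHardCoded(FakeJdf([1, 2, 3]), sql_ctx=None).limit(2)
soft = MySelfTyped(FakeJdf([1, 2, 3]), sql_ctx=None).limit(2)

print(type(hard).__name__)  # the subclass is lost after limit()
print(type(soft).__name__)  # the subclass survives limit()
print(soft.row_count())     # custom method is still available
```

One caveat with this approach: type(self)(jdf, sql_ctx) only works if the subclass keeps a constructor compatible with the base class signature. A subclass that adds required constructor arguments would still need to override the affected methods.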


was (Author: halfbytemedic):
I'm looking for this same thing a year later. I believe what OP was referring 
to is the DataFrame class. Some 40 or so methods return a new DataFrame object 
like below. 
https://github.com/apache/spark/blob/dc607911a91c515f23d8192f389e7e54e785f94d/python/pyspark/sql/dataframe.py#L761

```python
def limit(self, num: int) -> "DataFrame":
    """Limits the result count to the number specified.
    .. versionadded:: 1.3.0
    Examples
    --------
    >>> df.limit(1).collect()
    [Row(age=2, name='Alice')]
    >>> df.limit(0).collect()
    []
    """
    jdf = self._jdf.limit(num)
    return DataFrame(jdf, self.sql_ctx)
```
 
If these methods returned `type(self)`, it would be easier to extend the class 
DataFrame for our own uses. Otherwise, we are forced to re-copy all these 40 or 
so methods to ensure they return our extended class, "MyDataFrameClass".

> class 'DataFrame' returns instance of type(self) instead of DataFrame 
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32423
>                 URL: https://issues.apache.org/jira/browse/SPARK-32423
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Timothy
>            Priority: Minor
>
> To allow for appropriate child classing of DataFrame, I propose the following 
> change:
> class 'DataFrame' returns instance of type(self) instead of type DataFrame 
>  
> Therefore child classes using methods such as '.limit()' will return an 
> instance of the child class.


