[ 
https://issues.apache.org/jira/browse/SPARK-39939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-39939.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37366
[https://github.com/apache/spark/pull/37366]

> shift() func needs to support periods=0
> ---------------------------------------
>
>                 Key: SPARK-39939
>                 URL: https://issues.apache.org/jira/browse/SPARK-39939
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.2.2
>         Environment: Pandas: 1.3.X/1.4.X
> PySpark: Master
>            Reporter: bo zhao
>            Assignee: bo zhao
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> PySpark raises an AnalysisException when shift() is called with periods=0,
> whereas pandas returns an unshifted copy of the same object. pandas API on
> Spark should match that behavior; a sketch of the periods=0 special case
> follows the two examples below.
>  
> PySpark:
> {code:python}
> >>> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48], 'Col3': [17, 27, 22, 37, 52]}, columns=['Col1', 'Col2', 'Col3'])
> >>> df.Col1.shift(periods=3)
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:52 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:52 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 0     NaN
> 1     NaN
> 2     NaN
> 3    10.0
> 4    20.0
> Name: Col1, dtype: float64
> >>> df.Col1.shift(periods=0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/base.py", line 1170, in shift
>     return self._shift(periods, fill_value).spark.analyzed
>   File "/home/spark/spark/python/pyspark/pandas/spark/accessors.py", line 
> 256, in analyzed
>     return first_series(DataFrame(self._data._internal.resolved_copy))
>   File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in 
> wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1173, in 
> resolved_copy
>     sdf = self.spark_frame.select(self.spark_columns + list(HIDDEN_COLUMNS))
>   File "/home/spark/spark/python/pyspark/sql/dataframe.py", line 2073, in 
> select
>     jdf = self._jdf.select(self._jcols(*cols))
>   File 
> "/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/py4j/java_gateway.py",
>  line 1321, in __call__
>     return_value = get_return_value(
>   File "/home/spark/spark/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Cannot specify window frame for lag 
> function
>  {code}
> Pandas:
> {code:python}
> >>> pdf = pd.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48], 'Col3': [17, 27, 22, 37, 52]}, columns=['Col1', 'Col2', 'Col3'])
> >>> pdf.Col1.shift(periods=3)
> 0     NaN
> 1     NaN
> 2     NaN
> 3    10.0
> 4    20.0
> Name: Col1, dtype: float64
> >>> pdf.Col1.shift(periods=0)
> 0    10
> 1    20
> 2    15
> 3    30
> 4    45
> Name: Col1, dtype: int64
>  {code}
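>
> The actual change in pull request 37366 is not reproduced here. As a hedged
> sketch of the expected semantics (the shift_like_pandas wrapper below is a
> hypothetical caller-side helper, not the patch itself), the periods=0 case
> can simply bypass the lag window and return an unshifted copy, which is what
> pandas does:
> {code:python}
> import pyspark.pandas as ps
>
> def shift_like_pandas(psser: ps.Series, periods: int, fill_value=None):
>     # pandas returns an unshifted copy for periods=0, so skip the window
>     # path that raises "Cannot specify window frame for lag function".
>     if periods == 0:
>         return psser.copy()
>     return psser.shift(periods=periods, fill_value=fill_value)
>
> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45],
>                    'Col2': [13, 23, 18, 33, 48],
>                    'Col3': [17, 27, 22, 37, 52]})
> shift_like_pandas(df.Col1, 0)  # 10, 20, 15, 30, 45 -- same values as pandas
> shift_like_pandas(df.Col1, 3)  # NaN, NaN, NaN, 10.0, 20.0
> {code}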



