[ https://issues.apache.org/jira/browse/SPARK-39939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruifeng Zheng resolved SPARK-39939.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37366
[https://github.com/apache/spark/pull/37366]

> shift() func need support periods=0
> -----------------------------------
>
>                 Key: SPARK-39939
>                 URL: https://issues.apache.org/jira/browse/SPARK-39939
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.2.2
>        Environment: Pandas: 1.3.X/1.4.X
> PySpark: Master
>            Reporter: bo zhao
>            Assignee: bo zhao
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> PySpark raises an error when shift() is called with periods=0, whereas pandas returns an unshifted copy of the same object.
>
> PySpark:
> {code:java}
> >>> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48], 'Col3': [17, 27, 22, 37, 52]}, columns=['Col1', 'Col2', 'Col3'])
> >>> df.Col1.shift(periods=3)
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:51 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:52 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 22/08/02 09:37:52 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 0     NaN
> 1     NaN
> 2     NaN
> 3    10.0
> 4    20.0
> Name: Col1, dtype: float64
> >>> df.Col1.shift(periods=0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/base.py", line 1170, in shift
>     return self._shift(periods, fill_value).spark.analyzed
>   File "/home/spark/spark/python/pyspark/pandas/spark/accessors.py", line 256, in analyzed
>     return first_series(DataFrame(self._data._internal.resolved_copy))
>   File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1173, in resolved_copy
>     sdf = self.spark_frame.select(self.spark_columns + list(HIDDEN_COLUMNS))
>   File "/home/spark/spark/python/pyspark/sql/dataframe.py", line 2073, in select
>     jdf = self._jdf.select(self._jcols(*cols))
>   File "/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     return_value = get_return_value(
>   File "/home/spark/spark/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Cannot specify window frame for lag function
> {code}
> Pandas:
> {code:java}
> >>> pdf = pd.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48], 'Col3': [17, 27, 22, 37, 52]}, columns=['Col1', 'Col2', 'Col3'])
> >>> pdf.Col1.shift(periods=3)
> 0     NaN
> 1     NaN
> 2     NaN
> 3    10.0
> 4    20.0
> Name: Col1, dtype: float64
> >>> pdf.Col1.shift(periods=0)
> 0    10
> 1    20
> 2    15
> 3    30
> 4    45
> Name: Col1, dtype: int64
> {code}
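For context, below is a minimal sketch of the kind of guard that makes periods=0 behave like pandas. This is illustrative only and is not the change made in pull request 37366; the helper name, the ordering column, and the fill-value handling are assumptions (pandas-on-Spark orders by its own internal natural-order column rather than a generated id).

{code:python}
# Hedged sketch: short-circuit a zero-period shift before any lag()/Window
# expression is built, so Spark never hits
# "Cannot specify window frame for lag function".
from typing import Any, Optional

from pyspark.sql import Column, Window
from pyspark.sql import functions as F


def shift_column(col: Column, order_col: Column, periods: int,
                 fill_value: Optional[Any] = None) -> Column:
    """Shift `col` by `periods` rows with pandas-like semantics (sketch)."""
    if periods == 0:
        # pandas returns an unshifted copy for periods=0, so return the
        # column unchanged instead of building a lag() over a window.
        return col
    # Ordering by the caller-supplied column stands in for pandas-on-Spark's
    # internal natural-order column; no explicit frame is attached to lag().
    window = Window.orderBy(order_col)
    shifted = F.lag(col, periods).over(window)
    return shifted if fill_value is None else F.coalesce(shifted, F.lit(fill_value))
{code}

Under this sketch, shift_column(F.col("Col1"), F.monotonically_increasing_id(), 0) would simply return Col1 unchanged, matching the pandas output above, while positive periods go through the usual lag-over-window path.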