[ 
https://issues.apache.org/jira/browse/SPARK-38627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511748#comment-17511748
 ] 

Prakhar Sandhu commented on SPARK-38627:
----------------------------------------

Hi [~hyukjin.kwon],

I am not sure whether I can share the full repository code, but please try 
running the commands below on Spark 3.3. 

The snippet below runs fine with the pandas library but fails when I replace 
pandas with pyspark.pandas: 

{code:python}
import pyspark.pandas as pd
import numpy as np

np.random.seed(0)

rng = pd.date_range('2015-02-24', periods=5, freq='T')
df = pd.DataFrame({ 'Date1': rng,  'Date2': rng}) 

print(df)

df["x"] = df["Date1"] - df["Date2"]

print(df)
{code}
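
For comparison, here is a minimal sketch of the plain pandas baseline that runs without error. It assumes a recent pandas release and uses freq='min' in place of 'T', since newer pandas versions deprecate the 'T' alias. The final division mirrors the Timedelta(days=30) expression from the issue description, and the name months is only illustrative.

```python
import pandas as pd

# Same data as the snippet above, built with plain pandas.
# freq="min" is the minute alias; "T" is deprecated in newer pandas.
rng = pd.date_range("2015-02-24", periods=5, freq="min")
df = pd.DataFrame({"Date1": rng, "Date2": rng})

# Subtracting two datetime columns yields a timedelta column in pandas.
df["x"] = df["Date1"] - df["Date2"]

# Dividing a timedelta column by a Timedelta yields plain floats,
# as in the expression from the issue description.
months = df["x"] / pd.Timedelta(days=30)

print(df)
print(months)
```

Both columns hold the same timestamps, so every difference here is a zero timedelta. This is the behaviour I expected from pyspark.pandas as well.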

> TypeError: Datetime subtraction can only be applied to datetime series
> ----------------------------------------------------------------------
>
>                 Key: SPARK-38627
>                 URL: https://issues.apache.org/jira/browse/SPARK-38627
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Prakhar Sandhu
>            Priority: Major
>
> I am trying to replace pandas with the pyspark.pandas library. I tried the
> following, where pdf is a pyspark.pandas DataFrame:
> {code:python}
> pdf["date_diff"] = (pdf["date1"] - pdf["date2"])/pdf.Timedelta(days=30){code}
> I got the following error:
> {code:java}
> File "C:\Users\abc\Anaconda3\envs\test\lib\site-packages\pyspark\pandas\data_type_ops\datetime_ops.py", line 75, in sub
> raise TypeError("Datetime subtraction can only be applied to datetime series.") {code}
>  


