[ 
https://issues.apache.org/jira/browse/SPARK-38627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511763#comment-17511763
 ] 

Prakhar Sandhu commented on SPARK-38627:
----------------------------------------

Hi [~hyukjin.kwon], nice ^^
 # Did it work on Spark 3.3?
 # What environment are you using?

I have set up a conda environment on my local system with Spark 3.2.

I specified numpy explicitly, and got this error:

 
{code:java}
 df = pd.DataFrame({ 'Date1': rng.to_numpy,  'Date2': rng.to_numpy})
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pyspark\pandas\frame.py", line 519, in __init__
    pdf = pd.DataFrame(data=data, index=index, columns=columns, dtype=dtype, copy=copy)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 254, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 64, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 355, in extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
{code}
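
A possible cause of this ValueError: {{rng.to_numpy}} is written without parentheses, so each dict value is a bound method rather than an array, and pandas treats a non-list-like value as a scalar. The sketch below reproduces this with plain pandas (not pyspark.pandas) and assumes {{rng}} is a {{DatetimeIndex}}; its definition is not shown in this comment, so the {{pd.date_range}} call is only an illustration.

```python
import pandas as pd

# Assumption: rng is a DatetimeIndex; the comment does not show how it was built.
rng = pd.date_range("2022-01-01", periods=3, freq="D")

# Without parentheses, rng.to_numpy is a bound method. pandas sees only
# non-list-like (scalar) values in the dict and cannot infer an index.
try:
    pd.DataFrame({"Date1": rng.to_numpy, "Date2": rng.to_numpy})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Calling the method returns datetime64 arrays, so a default index is inferred.
df = pd.DataFrame({"Date1": rng.to_numpy(), "Date2": rng.to_numpy()})

print(error_message)
print(df.dtypes)
```

Whether the same change is enough under pyspark.pandas 3.2 is not confirmed here; the sketch only explains the pandas-level error at the bottom of the traceback.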
 
> TypeError: Datetime subtraction can only be applied to datetime series
> ----------------------------------------------------------------------
>
>                 Key: SPARK-38627
>                 URL: https://issues.apache.org/jira/browse/SPARK-38627
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Prakhar Sandhu
>            Priority: Major
>
> I am trying to replace pandas with the pyspark.pandas library. Here pdf is a
> pyspark.pandas DataFrame, and I tried this:
> {code:java}
> pdf["date_diff"] = (pdf["date1"] - pdf["date2"])/pdf.Timedelta(days=30){code}
> I got the error below:
> {code:java}
> File "C:\Users\abc\Anaconda3\envs\test\lib\site-packages\pyspark\pandas\data_type_ops\datetime_ops.py", line 75, in sub
> raise TypeError("Datetime subtraction can only be applied to datetime series.")
> {code}
>  
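
For reference, here is a minimal sketch of what the expression in the issue computes in plain pandas (not pyspark.pandas). The sample dates are made up for illustration, and the sketch uses {{pd.Timedelta}} from the pandas module, which is an assumption about what {{pdf.Timedelta}} in the report was meant to be.

```python
import pandas as pd

# Hypothetical sample data; both columns are datetime64 series.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2022-03-01", "2022-04-30"]),
    "date2": pd.to_datetime(["2022-01-30", "2022-03-01"]),
})

# Subtracting two datetime series gives a timedelta series; dividing by a
# 30-day Timedelta turns it into a float count of 30-day periods.
df["date_diff"] = (df["date1"] - df["date2"]) / pd.Timedelta(days=30)

print(df["date_diff"].tolist())
```

This shows only the pandas behaviour the report seems to expect; it does not say how pyspark.pandas 3.2.1 handles the same expression.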


