[jira] [Commented] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-07-15 Thread Gurpreet Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743486#comment-17743486
 ] 

Gurpreet Singh commented on SPARK-42027:


[~gurwls223] Got it. Any pointers on where to get started? I am pretty new to 
this codebase and wanted to get involved. Thanks 

 

> CreateDataframe from Pandas with Struct and Timestamp
> -----------------------------------------------------
>
>              Key: SPARK-42027
>              URL: https://issues.apache.org/jira/browse/SPARK-42027
>          Project: Spark
>       Issue Type: Bug
>       Components: PySpark
> Affects Versions: 3.4.0
>         Reporter: Martin Grund
>         Priority: Major
>
> The following should be supported and correctly truncate the nanosecond 
> timestamps.
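> {code:python}
> from datetime import timedelta, timezone
>
> import pandas as pd
>
> # Nanosecond-precision, timezone-aware timestamp (UTC-08:00).
> ts = pd.Timestamp(year=2019, month=1, day=1, nanosecond=500,
>                   tz=timezone(timedelta(hours=-8)))
> # A struct-like column (dict) that contains the timestamp.
> d = pd.DataFrame({"col1": [1], "col2": [{"a": 1, "b": 2.32, "c": ts}]})
> # Assumes an active SparkSession bound to `spark`.
> spark.createDataFrame(d).collect()
> {code}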
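
For readers unsure what "truncate the nanosecond timestamps" would mean here, the following is a minimal standard-library sketch, not the PySpark implementation. It assumes the timestamp is available as integer nanoseconds since the Unix epoch (pandas exposes this as `Timestamp.value`) and that the target precision is microseconds, which is what both Python's `datetime` and Spark's `TimestampType` store. The helper name `truncate_ns` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate_ns(epoch_ns: int, tz: timezone) -> datetime:
    """Hypothetical helper: drop sub-microsecond digits, return an aware datetime."""
    # Floor division discards the last three (nanosecond) digits.
    micros = epoch_ns // 1_000
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)


# The instant from the report: 2019-01-01 00:00:00 at UTC-08:00, plus 500 ns.
utc_minus_8 = timezone(timedelta(hours=-8))
midnight = datetime(2019, 1, 1, tzinfo=utc_minus_8)
epoch_ns = ((midnight - EPOCH) // timedelta(microseconds=1)) * 1_000 + 500

result = truncate_ns(epoch_ns, utc_minus_8)
print(result.isoformat())
```

With the 500 ns dropped, `result` compares equal to `midnight`. Where in the conversion path a real fix would apply such a truncation to values nested inside a struct is the open question in this issue.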



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-07-15 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743485#comment-17743485
 ] 

Hyukjin Kwon commented on SPARK-42027:
--

From a cursory look (I might be wrong), I think supporting this correctly 
will require a fairly large fix. Please go ahead with a PR, [~gdhuper].




[jira] [Commented] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-07-15 Thread Gurpreet Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743475#comment-17743475
 ] 

Gurpreet Singh commented on SPARK-42027:


[~grundprinzip-db] [~gurwls223] Is this issue up for grabs? I would like to 
take this on. Thanks 




[jira] [Commented] (SPARK-42027) CreateDataframe from Pandas with Struct and Timestamp

2023-01-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676418#comment-17676418
 ] 

Hyukjin Kwon commented on SPARK-42027:
--

Converted to a general issue in PySpark.
