[ https://issues.apache.org/jira/browse/SPARK-41455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinrong Meng updated SPARK-41455:
---------------------------------
    Description:
When implementing date/timestamp functions, we noticed dtypes that are inconsistent with PySpark, as shown below.
{code:python}
>>> sdf.select(SF.current_timestamp()).toPandas().dtypes
current_timestamp()    datetime64[ns]
dtype: object
>>> cdf.select(CF.current_timestamp()).toPandas().dtypes
current_timestamp()    datetime64[ns, America/Los_Angeles]
{code}
Affected functions include:
{code:python}
to_timestamp, from_utc_timestamp, to_utc_timestamp, timestamp_seconds, current_timestamp, date_trunc
{code}
We may have to implement `is_timestamp_ntz_preferred` for Connect.

After the fix, tests of those date/timestamp functions which use `compare_by_show` should be switched to `toPandas` comparison.

  was:
When implementing date/timestamp functions, we noticed dtypes that are inconsistent with PySpark, as shown below.
{code:python}
>>> sdf.select(SF.current_timestamp()).toPandas().dtypes
current_timestamp()    datetime64[ns]
dtype: object
>>> cdf.select(CF.current_timestamp()).toPandas().dtypes
current_timestamp()    datetime64[ns, America/Los_Angeles]
{code}
Affected functions include: `to_timestamp, from_utc_timestamp, to_utc_timestamp, timestamp_seconds, current_timestamp, date_trunc`.

We may have to implement `is_timestamp_ntz_preferred` for Connect.

After the fix, tests of those date/timestamp functions which use `compare_by_show` should be switched to `toPandas` comparison.


> Resolve dtypes inconsistencies of date/timestamp functions
> ----------------------------------------------------------
>
>                 Key: SPARK-41455
>                 URL: https://issues.apache.org/jira/browse/SPARK-41455
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> When implementing date/timestamp functions, we noticed dtypes that are
> inconsistent with PySpark, as shown below.
> {code:python}
> >>> sdf.select(SF.current_timestamp()).toPandas().dtypes
> current_timestamp()    datetime64[ns]
> dtype: object
> >>> cdf.select(CF.current_timestamp()).toPandas().dtypes
> current_timestamp()    datetime64[ns, America/Los_Angeles]
> {code}
> Affected functions include:
> {code:python}
> to_timestamp, from_utc_timestamp, to_utc_timestamp, timestamp_seconds,
> current_timestamp, date_trunc
> {code}
> We may have to implement `is_timestamp_ntz_preferred` for Connect.
> After the fix, tests of those date/timestamp functions which use
> `compare_by_show` should be switched to `toPandas` comparison.
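
For readers who want to see the dtype mismatch without a Spark session, here is a minimal pandas-only sketch. It assumes the two `toPandas()` results differ only in timezone awareness. The frames `naive` and `aware` and the helpers `is_tz_aware` and `strip_timezone` are hypothetical stand-ins for illustration, not part of PySpark or of the eventual fix.

```python
import pandas as pd

COLUMN = "current_timestamp()"

# Hypothetical stand-ins for the two toPandas() results in the report:
# one tz-naive column and one tz-aware column with the same wall-clock value.
stamps = pd.to_datetime(["2022-12-08 10:00:00"])
naive = pd.DataFrame({COLUMN: stamps})
aware = pd.DataFrame({COLUMN: stamps.tz_localize("America/Los_Angeles")})


def is_tz_aware(series: pd.Series) -> bool:
    """Return True if the series holds timezone-aware timestamps."""
    return isinstance(series.dtype, pd.DatetimeTZDtype)


def strip_timezone(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose tz-aware columns are converted to tz-naive
    local wall-clock time, so the dtypes can be compared directly."""
    out = pdf.copy()
    for name in out.columns:
        if is_tz_aware(out[name]):
            out[name] = out[name].dt.tz_localize(None)
    return out


normalized = strip_timezone(aware)

# The raw frames disagree on dtype; after normalization they compare equal.
print([str(t) for t in naive.dtypes], [str(t) for t in aware.dtypes])
pd.testing.assert_frame_equal(naive, normalized)
```

Dropping the timezone here only makes the two frames comparable in a test. Whether Connect should return tz-naive values is the open question this issue tracks.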