GitHub user wesm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r146923941

--- Diff: python/pyspark/serializers.py ---
@@ -224,7 +225,13 @@ def _create_batch(series):
     # If a nullable integer series has been promoted to floating point with NaNs, need to cast
     # NOTE: this is not necessary with Arrow >= 0.7
     def cast_series(s, t):
-        if t is None or s.dtype == t.to_pandas_dtype():
+        if type(t) == pa.TimestampType:
+            # NOTE: convert to 'us' with astype here, unit ignored in `from_pandas` see ARROW-1680
+            return _series_convert_timestamps_internal(s).values.astype('datetime64[us]')
--- End diff --

I fixed the date/time-related casting bugs in pyarrow and added new cast implementations. Right now, converting from one timestamp unit to another in Arrow-land fails silently. All of this will be in the 0.8.0 release, which should land around the week of 11/6:
https://github.com/apache/arrow/pull/1245
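The workaround in the diff casts the timestamp values to microseconds with NumPy before handing them to Arrow, because the unit requested through `from_pandas` is ignored (ARROW-1680). Below is a minimal sketch of just that NumPy step, assuming only NumPy is available. It does not call pyarrow or Spark's `_series_convert_timestamps_internal`, and the sample timestamps are hypothetical. Note that the `ns` to `us` cast drops any sub-microsecond digits.

```python
import numpy as np

# Hypothetical nanosecond-resolution timestamps, standing in for the
# datetime64[ns] values that a pandas Series holds internally.
ns_values = np.array(
    ["2017-10-25T12:00:00.123456789", "2017-10-26T08:30:00.000001500"],
    dtype="datetime64[ns]",
)

# Cast to microsecond resolution up front, mirroring
# `.values.astype('datetime64[us]')` in the diff, so the data already
# carries the 'us' unit instead of relying on from_pandas to apply it.
# Digits finer than one microsecond are discarded by this cast.
us_values = ns_values.astype("datetime64[us]")

print(us_values.dtype)
print(us_values)
```

Doing the cast on the NumPy side keeps the unit explicit in the data itself, which sidesteps the silent unit-conversion failures described above until the fixed casts in pyarrow are available.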