Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19607#discussion_r148601935

    --- Diff: python/pyspark/serializers.py ---
    @@ -274,12 +278,13 @@ def load_stream(self, stream):
             """
             Deserialize ArrowRecordBatches to an Arrow table and return as a list of pandas.Series.
             """
    -        from pyspark.sql.types import _check_dataframe_localize_timestamps
    +        from pyspark.sql.types import _check_dataframe_localize_timestamps, from_arrow_schema
             import pyarrow as pa
             reader = pa.open_stream(stream)
    +        schema = from_arrow_schema(reader.schema)
             for batch in reader:
                 # NOTE: changed from pa.Columns.to_pandas, timezone issue in conversion fixed in 0.7.1
    -            pdf = _check_dataframe_localize_timestamps(batch.to_pandas())
    +            pdf = _check_dataframe_localize_timestamps(batch.to_pandas(), schema, self._timezone)
    --- End diff --

    Oh, maybe I misunderstood the purpose of this conf "spark.sql.execution.pandas.respectSessionTimeZone". If that is true, then what is the behavior of Spark?

    1) convert timestamps in Pandas to remove the timezone and localize to SESSION_LOCAL_TIMEZONE
    2) show Pandas timestamps with SESSION_LOCAL_TIMEZONE set as the timezone

    It seems this change is doing (1), but what's wrong with doing (2)? I think that would be a lot cleaner.
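To make the difference between (1) and (2) concrete, here is a rough sketch in plain pandas. It is not the code in this PR and does not call `_check_dataframe_localize_timestamps`; the timezone name and the sample value are assumptions picked only for illustration.

```python
import pandas as pd

# Hypothetical session timezone, chosen only for illustration.
SESSION_TZ = "America/Los_Angeles"

# Start from a timezone-aware UTC timestamp (sample value is made up).
utc = pd.Series(pd.to_datetime(["2017-01-01 12:00:00"], utc=True))

# (2) keep the values timezone-aware, expressed in the session timezone.
#     The instant is unchanged; only the attached timezone differs.
aware = utc.dt.tz_convert(SESSION_TZ)

# (1) convert to the session timezone, then drop the timezone info,
#     leaving naive wall-clock values in the session timezone.
naive = aware.dt.tz_localize(None)

print(aware[0])
print(naive[0])
```

With (2) the Series still carries a timezone, so it compares equal to the original UTC instant. With (1) that information is gone, and the reader has to know which session timezone produced the values.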