[ https://issues.apache.org/jira/browse/SPARK-36934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-36934:
---------------------------------
Description: 
This is tested with a master build from 04.10.21.

{code}
df = ps.DataFrame({'year': ['2015-2-4', '2016-3-5'],
                   'month': [2, 3],
                   'day': [4, 5],
                   'test': [1, 2]})
df["year"] = ps.to_datetime(df["year"])

df.info()
<class 'pyspark.pandas.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year    2 non-null      datetime64
 1   month   2 non-null      int64
 2   day     2 non-null      int64
 3   test    2 non-null      int64
dtypes: datetime64(1), int64(3)

spark_df_date = df.to_spark()
spark_df_date.printSchema()
root
 |-- year: timestamp (nullable = true)
 |-- month: long (nullable = false)
 |-- day: long (nullable = false)
 |-- test: long (nullable = false)

spark_df_date.write.parquet("s3a://falk0509/spark_df_date.parquet")
{code}

Load the files into Apache Drill (I use the docker image apache/drill:master-openjdk-14):

SELECT * FROM cp.`/data/spark_df_date.*`

It prints the year column as

{code}
\x00\x00\x00\x00\x00\x00\x00\x00\xE2}%\x00
\x00\x00\x00\x00\x00\x00\x00\x00m\x7F%\x00
{code}

The rest of the columns are OK. So is this a Spark problem or an Apache Drill problem?
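For what it's worth, those 12-byte values look like Parquet's legacy INT96 timestamp encoding rather than garbage. A minimal decoding sketch in plain Python, assuming the layout is 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day number (the function name `decode_int96_timestamp` is mine, not part of any library):

```python
import struct
from datetime import date, datetime, timedelta

def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet INT96 timestamp: 8 bytes LE
    nanoseconds-of-day followed by 4 bytes LE Julian day number."""
    nanos, jdn = struct.unpack("<qi", raw)  # int64 + int32 = 12 bytes
    # Julian day number 2451545 corresponds to 2000-01-01.
    day = date(2000, 1, 1) + timedelta(days=jdn - 2451545)
    return datetime(day.year, day.month, day.day) + timedelta(microseconds=nanos // 1000)

# The two byte strings Drill printed for the `year` column:
raw1 = b"\x00" * 8 + b"\xE2\x7D\x25\x00"
raw2 = b"\x00" * 8 + b"\x6D\x7F\x25\x00"
print(decode_int96_timestamp(raw1))  # 2015-02-04 00:00:00
print(decode_int96_timestamp(raw2))  # 2016-03-05 00:00:00
```

The decoded values match the input dates exactly, which suggests Spark wrote valid timestamps in its default INT96 representation (controlled by {{spark.sql.parquet.outputTimestampType}}, INT96 by default in Spark 3.x) and the question is how Drill renders INT96 columns. Writing with that config set to TIMESTAMP_MICROS may sidestep the display issue, though I have not verified that against Drill.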
> Timestamp are written as array bytes.
> -------------------------------------
>
>                 Key: SPARK-36934
>                 URL: https://issues.apache.org/jira/browse/SPARK-36934
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org