[ https://issues.apache.org/jira/browse/SPARK-21163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438535#comment-16438535 ]
Ed Lee commented on SPARK-21163:
--------------------------------

Had a question: in Spark 2.2.1, if I do a .toPandas on a Spark DataFrame with a column of integer type, the dtype in pandas is int64, whereas in Spark 2.3.0 the ints are converted to int32. I ran the below in Spark 2.2.1 and 2.3.0:

```
import pyspark.sql.functions as sf

df = spark.sparkContext.parallelize([(i,) for i in [1, 2, 3]]).toDF(["a"]).select(sf.col('a').cast('int')).toPandas()
df.dtypes
```

Is this intended? We ran into this because we have unit tests in a project that passed in Spark 2.2.1 but fail in Spark 2.3.0.

Left a comment on github: [https://github.com/apache/spark/pull/18378/files/d8ba5452539c5fd5b650b7f5e51e467aabc33739#diff-6fc344560230bf0ef711bb9b5573f1faR1775]

> DataFrame.toPandas should respect the data type
> -----------------------------------------------
>
>                 Key: SPARK-21163
>                 URL: https://issues.apache.org/jira/browse/SPARK-21163
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 2.3.0
>
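In case it helps anyone hitting the same unit-test failures, below is a minimal workaround sketch (my own, not from the issue or the PR): cast the affected column back to int64 on the pandas side before asserting, so the same test passes on both Spark 2.2.1 and 2.3.0. It assumes the `spark` session available in the pyspark shell; the column name `a` is just illustrative.

```
import pyspark.sql.functions as sf

# Hypothetical workaround sketch: normalize the integer dtype back to int64
# after .toPandas(), since Spark 2.3.0 returns int32 for IntegerType columns.
pdf = spark.range(3).select(sf.col('id').cast('int').alias('a')).toPandas()
pdf['a'] = pdf['a'].astype('int64')
assert pdf['a'].dtype == 'int64'
```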