GitHub user 351zyf opened a pull request: https://github.com/apache/spark/pull/22888
SPARK-25881 add parameter coerce_float
https://issues.apache.org/jira/browse/SPARK-25881

## What changes were proposed in this pull request?

When calling `dataframe.toPandas()` in PySpark, columns of Spark type `decimal` become `object` columns in the resulting pandas DataFrame:

```
>>> for i in df_spark.dtypes:
...     print(i)
...
('dt', 'string')
('cost_sum', 'decimal(38,3)')
('req_sum', 'bigint')
('pv_sum', 'bigint')
('click_sum', 'bigint')
>>> df_pd = df_spark.toPandas()
>>> df_pd.dtypes
dt            object
cost_sum      object
req_sum        int64
pv_sum         int64
click_sum     int64
dtype: object
```

The `coerce_float` parameter of `pd.DataFrame.from_records` converts `decimal.Decimal` values to floating point:

```
>>> arr = df_spark.collect()
>>> df2_pd = pd.DataFrame.from_records(df_spark.collect(), columns=df_spark.columns, coerce_float=True)
>>> df2_pd.dtypes
dt            object
cost_sum     float64
req_sum        int64
pv_sum        int64
click_sum     int64
dtype: object
```

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/351zyf/spark SPARK-25881

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22888.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22888

----

commit edc2a6173c89315afddefbd0c29cfd98f80049f8
Author: zhangyefei <zhangyefei@...>
Date: 2018-10-30T07:22:41Z

    add parametere coerce_float

----
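The `coerce_float` behavior the PR relies on can be demonstrated with plain pandas, independent of Spark. This is a minimal sketch assuming only pandas and the standard library; the column names and values below are illustrative, not taken from the PR:

```python
import decimal

import pandas as pd

# Records mimicking rows collected from a Spark DataFrame:
# a string column and a decimal column.
records = [
    ("2018-10-01", decimal.Decimal("12.345")),
    ("2018-10-02", decimal.Decimal("67.890")),
]

# Without coerce_float, decimal.Decimal values are kept as Python
# objects, so the column dtype is 'object'.
df_obj = pd.DataFrame.from_records(records, columns=["dt", "cost_sum"])
print(df_obj.dtypes["cost_sum"])    # object

# With coerce_float=True, decimal.Decimal values are converted to
# floating point, so the column dtype becomes 'float64'.
df_float = pd.DataFrame.from_records(
    records, columns=["dt", "cost_sum"], coerce_float=True
)
print(df_float.dtypes["cost_sum"])  # float64
```

Note that the coercion is lossy for high-precision decimals such as `decimal(38,3)`: `float64` carries roughly 15-17 significant decimal digits, which is why it is exposed as an opt-in parameter rather than applied unconditionally.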