[GitHub] spark pull request #22891: [SPARK-25881][pyspark] df.toPandas() convert deci...
Github user 351zyf closed the pull request at: https://github.com/apache/spark/pull/22891

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22891: SPARK-25881
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/22891

SPARK-25881

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/22888

A decimal type should be treated as a number, not as an object (string).

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/351zyf/spark SPARK-25881

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22891.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22891

commit 5a73e1710bdb663cfd6fa4a3f228737dc309e0e4
Author: zhangyefei
Date: 2018-10-30T09:28:37Z
[SPARK-25881] deal with decimal type

commit 403b4d00934de8e51b2c19c76170624fb91b1fb6
Author: zhangyefei
Date: 2018-10-30T07:22:41Z
add parametere coerce_float

commit 11b7cf47e83018c1d9a4ae9bf8df4f507680e0c4
Author: zhangyefei
Date: 2018-10-30T07:52:39Z
comment

commit 891a25f344db6d476e7d0b2857b09943d6c84720
Author: zhangyefei
Date: 2018-10-30T09:35:28Z
[SPARK-25881] deal with decimal type
[GitHub] spark pull request #22888: SPARK-25881
Github user 351zyf closed the pull request at: https://github.com/apache/spark/pull/22888
[GitHub] spark issue #22888: SPARK-25881
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/22888

OK
[GitHub] spark issue #22888: SPARK-25881
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/22888

> Then, you can convert the type into double or floats in Spark DataFrame. This is super easily able to work around at Pandas DataFrame or Spark's DataFrame. I don't think we should add this flag.
>
> BTW, the same feature should be added to when Arrow optimization is enabled as well.

Or could we correct this conversion in the function dataframe._to_corrected_pandas_type? Converting the decimal type manually every time does not sound good.
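To make the suggestion concrete: a hypothetical sketch of what teaching such a type-correction helper about decimals could look like. This is not Spark's actual `_to_corrected_pandas_type` code; the function name, signature, and mapping here are illustrative only.

```python
# Hypothetical sketch, NOT Spark's real implementation: map a Spark SQL
# type name to a numpy dtype for post-toPandas() correction, or return
# None to leave the column as-is (object dtype).
import numpy as np

def to_corrected_pandas_type(spark_type_name):
    """Illustrative type-correction map for a collected column."""
    mapping = {
        "tinyint": np.int8,
        "smallint": np.int16,
        "int": np.int32,
        "float": np.float32,
    }
    if spark_type_name.startswith("decimal"):
        # The idea under discussion: treat decimal columns as floating
        # point instead of leaving them as Python Decimal objects.
        return np.float64
    return mapping.get(spark_type_name)
```

Note the trade-off such a change would imply: `decimal(38,3)` can hold values that `float64` cannot represent exactly, which is presumably why the conversion is not done unconditionally.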
[GitHub] spark issue #22888: SPARK-25881
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/22888

And this also has no effect on timestamp values; tested.
[GitHub] spark issue #22888: SPARK-25881
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/22888

> I think you can just manually convert from Pandas DataFrame, no?

If I'm using the function toPandas, I don't think decimal-to-object is right. Aren't decimal values usually values to calculate with? I mean, numbers.
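The manual conversion the reviewer suggests looks roughly like this on the pandas side; a minimal sketch, where the `cost_sum` column name and values are made up for illustration and stand in for any decimal column that came back from toPandas():

```python
# Per-call workaround: cast a Decimal-valued object column to float64
# after the fact. This is the step callers must repeat for every
# decimal column of every collected DataFrame.
import decimal
import pandas as pd

df = pd.DataFrame({"cost_sum": [decimal.Decimal("1.5"), decimal.Decimal("2.5")]})
print(df["cost_sum"].dtype)  # object: Decimal values are opaque to pandas

df["cost_sum"] = df["cost_sum"].astype("float64")
print(df["cost_sum"].dtype)  # float64: now usable in numeric operations
```

The alternative workaround is to cast on the Spark side before collecting, e.g. `df_spark.withColumn("cost_sum", col("cost_sum").cast("double"))`, which avoids the object dtype entirely.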
[GitHub] spark pull request #22888: SPARK-25881
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/22888

SPARK-25881 add parameter coerce_float

https://issues.apache.org/jira/browse/SPARK-25881

## What changes were proposed in this pull request?

When using pyspark dataframe.toPandas(), a decimal type in the Spark DataFrame turns into object in the pandas DataFrame:

>>> for i in df_spark.dtypes:
...     print(i)
...
('dt', 'string')
('cost_sum', 'decimal(38,3)')
('req_sum', 'bigint')
('pv_sum', 'bigint')
('click_sum', 'bigint')
>>> df_pd = df_spark.toPandas()
>>> df_pd.dtypes
dt           object
cost_sum     object
req_sum       int64
pv_sum        int64
click_sum     int64
dtype: object

The parameter coerce_float in pd.DataFrame.from_records converts decimal.Decimal values to floating point:

>>> arr = df_spark.collect()
>>> df2_pd = pd.DataFrame.from_records(arr, columns=df_spark.columns, coerce_float=True)
>>> df2_pd.dtypes
dt           object
cost_sum    float64
req_sum       int64
pv_sum        int64
click_sum     int64
dtype: object

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/351zyf/spark SPARK-25881

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22888.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22888

commit edc2a6173c89315afddefbd0c29cfd98f80049f8
Author: zhangyefei
Date: 2018-10-30T07:22:41Z
add parametere coerce_float
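The coerce_float behavior described in the PR can be reproduced without Spark at all, since toPandas() ultimately builds the pandas DataFrame from collected rows. A minimal sketch, where the rows, column names, and values are invented to mirror the dtypes shown above:

```python
# Demonstrate pd.DataFrame.from_records(coerce_float=...) on rows that
# contain decimal.Decimal values, as collected Spark rows would.
import decimal
import pandas as pd

rows = [("2018-10-29", decimal.Decimal("12.345"), 100),
        ("2018-10-30", decimal.Decimal("67.890"), 200)]
cols = ["dt", "cost_sum", "req_sum"]

# Default: Decimal is an arbitrary Python object, so the column is object.
df_plain = pd.DataFrame.from_records(rows, columns=cols)
print(df_plain.dtypes["cost_sum"])    # object

# coerce_float=True: non-string, non-numeric objects like Decimal are
# converted to floating point where possible.
df_coerced = pd.DataFrame.from_records(rows, columns=cols, coerce_float=True)
print(df_coerced.dtypes["cost_sum"])  # float64
```

Note that the coercion is lossy for decimals wider than float64 precision, which is one reason reviewers may prefer an explicit cast over a default behavior change.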
[GitHub] spark issue #16485: [SPARK-19099] correct the wrong time display in history ...
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/16485

But the time displayed on the history server web UI is not correct. It is 8 hours earlier than the actual time here. Am I using the wrong configuration?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #16485: [SPARK-19099] correct the wrong time display in h...
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/16485

[SPARK-19099] correct the wrong time display in history server web UI

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19099

Correct the wrong job start/end time display in the Spark history server web UI. I am a user from China; the job time is 8 hours behind the actual time due to the hard-coded rawOffsetValue of 0.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/351zyf/spark zyf_b1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16485.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16485

commit 7c80d96fa6c1dcea07eec56363a115a9f145e6eb
Author: Johnson Zhang <johnson...@qq.com>
Date: 2017-01-06T06:44:34Z
correct the wrong time display in history server web UI
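The 8-hour discrepancy described above is exactly what a hard-coded zero UTC offset produces for a viewer in China (UTC+8). A minimal sketch, independent of the Spark code itself, with an invented epoch timestamp standing in for a job event:

```python
# Show how rendering an epoch timestamp with a zero raw offset (UTC)
# displays a wall-clock time 8 hours behind what a UTC+8 user expects.
from datetime import datetime, timedelta, timezone

event_ms = 1483689600000  # 2017-01-06 08:00:00 UTC, e.g. a job start time

# Hard-coded zero offset: the UI shows the UTC wall clock.
utc_display = datetime.fromtimestamp(event_ms / 1000, tz=timezone.utc)

# What a viewer in China (UTC+8) expects to see instead.
cst = timezone(timedelta(hours=8))
local_display = datetime.fromtimestamp(event_ms / 1000, tz=cst)

print(utc_display.strftime("%H:%M"))    # 08:00 -- 8 hours behind local
print(local_display.strftime("%H:%M"))  # 16:00
```

The fix direction is therefore to format the stored epoch time using the viewer's (or a configured) timezone offset rather than a fixed offset of zero.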