[GitHub] spark pull request #22891: [SPARK-25881][pyspark] df.toPandas() convert deci...

2018-10-31 Thread 351zyf
Github user 351zyf closed the pull request at:

https://github.com/apache/spark/pull/22891


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22891: SPARK-25881

2018-10-30 Thread 351zyf
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/22891

SPARK-25881

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/22888

decimal type should be considered a number, not an object (string)


## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/351zyf/spark SPARK-25881

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22891.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22891


commit 5a73e1710bdb663cfd6fa4a3f228737dc309e0e4
Author: zhangyefei 
Date:   2018-10-30T09:28:37Z

[SPARK-25881] deal with decimal type

commit 403b4d00934de8e51b2c19c76170624fb91b1fb6
Author: zhangyefei 
Date:   2018-10-30T07:22:41Z

add parameter coerce_float

commit 11b7cf47e83018c1d9a4ae9bf8df4f507680e0c4
Author: zhangyefei 
Date:   2018-10-30T07:52:39Z

comment

commit 891a25f344db6d476e7d0b2857b09943d6c84720
Author: zhangyefei 
Date:   2018-10-30T09:35:28Z

[SPARK-25881] deal with decimal type




---




[GitHub] spark pull request #22888: SPARK-25881

2018-10-30 Thread 351zyf
Github user 351zyf closed the pull request at:

https://github.com/apache/spark/pull/22888


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread 351zyf
Github user 351zyf commented on the issue:

https://github.com/apache/spark/pull/22888
  
OK


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread 351zyf
Github user 351zyf commented on the issue:

https://github.com/apache/spark/pull/22888
  
> Then, you can convert the type into double or floats in Spark DataFrame. 
This is super easily able to work around at Pandas DataFrame or Spark's 
DataFrame. I don't think we should add this flag.
> 
> BTW, the same feature should be added to when Arrow optimization is 
enabled as well.

Or can we correct this conversion in the function 
dataframe._to_corrected_pandas_type? 
Converting decimal types manually every time doesn't sound good.
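
For context, a rough sketch of the kind of type mapping _to_corrected_pandas_type applies, extended so decimal types map to float64 instead of falling through to object dtype. This is an illustrative stand-in, not Spark's actual code; the function name and mapping entries here are assumptions for the example.

```python
import numpy as np

# Hypothetical sketch (not Spark's real implementation) of a mapping from
# Spark SQL type names to NumPy dtypes, with the proposed decimal entry.
def corrected_pandas_type(spark_type_name):
    mapping = {
        "tinyint": np.int8,
        "smallint": np.int16,
        "int": np.int32,
        "bigint": np.int64,
        "float": np.float32,
        "double": np.float64,
        "decimal": np.float64,  # proposed: treat decimal as numeric
    }
    base = spark_type_name.split("(")[0]  # "decimal(38,3)" -> "decimal"
    return mapping.get(base)  # None means "leave as object dtype"
```

With such an entry, a decimal(38,3) column would be corrected to float64 in one place instead of in every caller.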


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread 351zyf
Github user 351zyf commented on the issue:

https://github.com/apache/spark/pull/22888
  
And this also has no effect on timestamp values.
Tested.


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread 351zyf
Github user 351zyf commented on the issue:

https://github.com/apache/spark/pull/22888
  
> I think you can just manually convert from Pandas DataFrame, no?

If I'm using the function toPandas, I don't think decimal to object is right. 
Aren't decimal values usually meant for calculation? I mean, numbers.
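
For reference, the manual conversion the reviewer suggests amounts to something like this pandas-only sketch; the column name is taken from the dtype listing in #22888, and the sample values are invented:

```python
import decimal
import pandas as pd

# Simulate the result of toPandas(): a decimal column with object dtype.
pdf = pd.DataFrame({"cost_sum": [decimal.Decimal("12.345"),
                                 decimal.Decimal("0.500")]})

# The manual workaround: cast the decimal column to float64 afterwards.
pdf["cost_sum"] = pdf["cost_sum"].astype("float64")
```

This works, but has to be repeated for every decimal column in every DataFrame, which is the repetition the comment above objects to.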


---




[GitHub] spark pull request #22888: SPARK-25881

2018-10-30 Thread 351zyf
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/22888

SPARK-25881

add parameter coerce_float
https://issues.apache.org/jira/browse/SPARK-25881

## What changes were proposed in this pull request?
When using pyspark dataframe.toPandas(), 
decimal columns in the Spark DataFrame turn into object dtype in the pandas DataFrame:

>>> for i in df_spark.dtypes:
...   print(i)
... 
('dt', 'string')
('cost_sum', 'decimal(38,3)')
('req_sum', 'bigint')
('pv_sum', 'bigint')
('click_sum', 'bigint')

>>> df_pd = df_spark.toPandas()

>>> df_pd.dtypes
dt           object
cost_sum     object
req_sum       int64
pv_sum        int64
click_sum     int64
dtype: object

The parameter coerce_float in pd.DataFrame.from_records converts 
decimal.Decimal values to floating point:

>>> arr = df_spark.collect()
>>> df2_pd = pd.DataFrame.from_records(arr, columns=df_spark.columns, 
coerce_float=True)
>>> df2_pd.dtypes
dt            object
cost_sum     float64
req_sum        int64
pv_sum         int64
click_sum      int64
dtype: object
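
The same behavior can be reproduced without Spark; the records below stand in for the Row tuples returned by df_spark.collect(), with invented sample values:

```python
import decimal
import pandas as pd

# Rows as they would come back from collect(): a string, a Decimal, an int.
records = [("2018-10-29", decimal.Decimal("12.345"), 100),
           ("2018-10-30", decimal.Decimal("67.890"), 200)]
cols = ["dt", "cost_sum", "req_sum"]

# Without coerce_float, Decimal values stay as Python objects.
df_obj = pd.DataFrame.from_records(records, columns=cols)

# With coerce_float=True, the Decimal column becomes float64.
df_flt = pd.DataFrame.from_records(records, columns=cols, coerce_float=True)
```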


## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/351zyf/spark SPARK-25881

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22888


commit edc2a6173c89315afddefbd0c29cfd98f80049f8
Author: zhangyefei 
Date:   2018-10-30T07:22:41Z

add parameter coerce_float




---




[GitHub] spark issue #16485: [SPARK-19099] correct the wrong time display in history ...

2017-01-06 Thread 351zyf
Github user 351zyf commented on the issue:

https://github.com/apache/spark/pull/16485
  
But the time display on the history server web UI is not correct. It is 8 hours 
earlier than the actual time here.

Am I using the wrong configuration?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #16485: [SPARK-19099] correct the wrong time display in h...

2017-01-05 Thread 351zyf
GitHub user 351zyf opened a pull request:

https://github.com/apache/spark/pull/16485

[SPARK-19099] correct the wrong time display in history server web UI

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19099

Correct the wrong job start/end time display in the Spark history server web UI.
I am a user from China. The job time shown is 8 hours less than the actual time 
due to the hard-coded rawOffset of 0.
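
The effect of the bug can be illustrated without Spark: the same instant rendered with a hard-coded zero offset (UTC) reads 8 hours earlier on the clock than the UTC+8 local time a user in China expects. The epoch value below is invented for the example.

```python
from datetime import datetime, timezone, timedelta

epoch_ms = 1483684800000  # a made-up job start time in epoch milliseconds

# What a zero rawOffset renders: the UTC wall clock.
utc_wall = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# What a UTC+8 user expects: the same instant in China Standard Time.
cst_wall = utc_wall.astimezone(timezone(timedelta(hours=8)))
```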


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/351zyf/spark zyf_b1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16485


commit 7c80d96fa6c1dcea07eec56363a115a9f145e6eb
Author: Johnson Zhang <johnson...@qq.com>
Date:   2017-01-06T06:44:34Z

correct the wrong time display in history server web UI




---
---
