Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16792#discussion_r99471688

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1272,16 +1272,18 @@ def replace(self, to_replace, value, subset=None):
             """Returns a new :class:`DataFrame` replacing a value with another value.
             :func:`DataFrame.replace` and :func:`DataFrameNaFunctions.replace` are
             aliases of each other.
    +        Values `to_replace` and `value` should be homogeneous. Mixed string and numeric
    --- End diff --

    I don't think we need to cast the types -- if you look inside of `replace0`, all of the numerics are turned into doubles in the map (but we should probably, in your other PR, add a test around that so that if the internals change we know we need to update the Python side).

    Doing `sc.parallelize([Row(name='Alice', age=0, height=80)]).toDF().replace(0, 12.5).collect()` is what I was talking about with cutting off the decimal component (so while it runs, it arguably doesn't do what the user expects).

    What about something along the lines of: "`to_replace` and `value` should contain either all numerics, all booleans, or all strings. When replacing, the new value will be cast to the type of the existing column." I think this more clearly communicates the requirements, but it is still a bit awkward -- can you think of something better?
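    A minimal plain-Python sketch of the semantics described above, for anyone following along without a Spark cluster: when `replace()` substitutes a value, the replacement is cast to the existing column's type, so replacing the integer `0` with `12.5` in an integer column silently drops the fractional part. The helper `replace_in_column` below is purely illustrative and is not part of the PySpark API.

    ```python
    def replace_in_column(rows, column, to_replace, value):
        """Replace to_replace with value in one column of a list of dicts,
        casting the new value to the type of the value it replaces -- a
        rough model of how Spark casts to the existing column type."""
        out = []
        for row in rows:
            new_row = dict(row)
            old = new_row[column]
            if old == to_replace:
                # Mimic Spark: the replacement is cast to the column's
                # current type, so int(12.5) -> 12 in an integer column.
                new_row[column] = type(old)(value)
            out.append(new_row)
        return out

    rows = [{"name": "Alice", "age": 0, "height": 80}]
    print(replace_in_column(rows, "age", 0, 12.5))
    # The age column ends up as 12, not 12.5: the decimal component is
    # cut off, which is the surprise the comment is pointing at.
    ```

    This is why the proposed docstring wording matters: the call runs without error, but the result is not what a user passing `12.5` likely intended.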