Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16792#discussion_r99471688
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1272,16 +1272,18 @@ def replace(self, to_replace, value, subset=None):
             """Returns a new :class:`DataFrame` replacing a value with another 
value.
             :func:`DataFrame.replace` and :func:`DataFrameNaFunctions.replace` 
are
             aliases of each other.
    +        Values `to_replace` and `value` should be homogeneous. Mixed 
string and numeric
    --- End diff ---
    
    I don't think we need to cast the types - if you look inside of `replace0`, all of the numerics are turned into doubles in the map. (But we should probably add a test around that in your other PR, so that if the internals change we know we need to update the Python side.)
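
    Something along these lines would pin the current behavior down (just a sketch - the test name, and using the `SQLTests` harness in `python/pyspark/sql/tests.py` with `self.spark`, are my assumptions rather than a final shape):

    ```python
    from pyspark.sql import Row  # already imported at the top of tests.py

    def test_replace_numeric_value_cast_to_column_type(self):
        # Hypothetical regression test: replace an int with a float. Today the
        # numerics all go through doubles inside `replace0`, and the new value
        # is cast back to the existing integer column type.
        df = self.spark.createDataFrame([Row(name='Alice', age=0, height=80)])
        row = df.replace(0, 12.5).first()
        self.assertEqual(row.age, 12)       # decimal component is dropped
        self.assertEqual(row.height, 80)    # non-matching values stay as-is
    ```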
    
    Doing `sc.parallelize([Row(name='Alice', age=0, height=80)]).toDF().replace(0, 12.5).collect()` is what I was talking about cutting off the decimal component (so while it runs, it arguably doesn't do what the user expects).
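
    Spelled out (a rough sketch, assuming a PySpark shell where `sc` and a SQLContext are already set up so `toDF()` works):

    ```python
    from pyspark.sql import Row

    df = sc.parallelize([Row(name='Alice', age=0, height=80)]).toDF()
    # `age` is an integer column, so the replacement value 12.5 gets cast to
    # it: the call succeeds, but the row comes back with age=12 - the decimal
    # component is silently dropped.
    df.replace(0, 12.5).collect()
    ```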
    
    What about something along the lines of: "`to_replace` and `value` should contain either all numerics, all booleans, or all strings. When replacing, the new value will be cast to the type of the existing column."
    
    I think this more clearly communicates the requirements, but is still a bit awkward -- can you think of something better?

