Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17688#discussion_r112823211
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
                 Value to replace null values with.
                 If the value is a dict, then `subset` is ignored and `value` 
must be a mapping
                 from column name (string) to replacement value. The 
replacement value must be
    -            an int, long, float, or string.
    +            an int, long, float, boolean, or string.
    --- End diff --
    
    I think this indicates the replacement `If the value is a dict` whereas 
`param value` can't be a bool as below:
    
    ```python
    >>> from pyspark.sql import Row
    >>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna({"a": 
True}).first()
    Row(a=True)
    >>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna(True).first()
    Row(a=None)
    ```
    
    I can't find `def fill(value: Boolean)` in `functions.scala`. Namely, this 
will call it with `int`. So,
    
    ```python
    >>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(True).first()
    Row(a=1)
    >>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(False).first()
    Row(a=0)
    ```
    
    So, the current status looks correct to me.
    
    BTW, ideally, we should throw an exception in 
    
    ```python
    if not isinstance(value, (float, int, long, basestring, dict)):
        raise ValueError("value should be a float, int, long, string, or dict")
    ```
    
    However, in Python boolean is a int - 
https://www.python.org/dev/peps/pep-0285/
    
    >   6) Should bool inherit from int?
    >
    >    => Yes.
    
    
    ```python
    >>> isinstance(True, int)
    True
    ```
    
    However, this looks just a documentation fix and I guess there are many 
instances with it. I think it is fine with not fixing it here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to