xuchen-plus opened a new issue, #7038:
URL: https://github.com/apache/arrow-datafusion/issues/7038

   ### Describe the bug
   
   Applying an `IsNull` filter to a dataframe whose only row contains a `null` value returns an empty result.
   
   ### To Reproduce
   
   I used the DataFusion Python bindings, version 27.0.0:
   ```
   >>> import datafusion
   >>> datafusion.__version__
   '27.0.0'
   ```
   
   ```python
   import pandas as pd
   from datafusion import SessionContext
   from datafusion import functions as f
   
   pandas_df = pd.DataFrame({"a": [None], "b": [1]})
   ctx = SessionContext()
   df = ctx.from_pandas(pandas_df)
   df.show()
   ```
   
   The above prints the dataframe where `a`'s value is null:
   ```
   >>> df.show()
   DataFrame()
   +---+---+
   | a | b |
   +---+---+
   |   | 1 |
   +---+---+
   ```
   
   Now filter column `a` with `is_null()`:
   ```python
   df.filter(f.col("a").is_null()).show()
   ```
   
   The result is empty:
   ```
   DataFrame()
   ++
   ++
   ```
   
   However, if the dataframe also contains rows with non-null values, the filtered result is correct:
   ```
   >>> pandas_df = pd.DataFrame({"a": [None, 1], "b": [1, 2]})
   >>> df = ctx.from_pandas(pandas_df)
   >>> df.show()
   DataFrame()
   +-----+---+
   | a   | b |
   +-----+---+
   |     | 1 |
   | 1.0 | 2 |
   +-----+---+
   >>> df.filter(f.col("a").is_null()).show()
   DataFrame()
   +---+---+
   | a | b |
   +---+---+
   |   | 1 |
   +---+---+
   ```
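
One difference between the two cases that may be relevant (a guess, not a confirmed cause) is the dtype pandas infers for column `a`. The sketch below uses only pandas and compares the two frames from the examples above. How the Arrow conversion inside `from_pandas` treats an all-`None` column is not checked here.

```python
import pandas as pd

# Same data as in the two reproduction cases above.
single = pd.DataFrame({"a": [None], "b": [1]})
mixed = pd.DataFrame({"a": [None, 1], "b": [1, 2]})

# With only None values pandas has no numeric type to infer for "a";
# once a number is present, None is stored as NaN in a float column.
print("single-row dtype of a:", single["a"].dtype)
print("two-row dtype of a:   ", mixed["a"].dtype)
```

If the dtypes differ, the two dataframes probably reach DataFusion with different column types for `a`, which would explain why only the single-row case is affected.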
   
   ### Expected behavior
   
   The filtered result should contain the single row, since its value in column `a` is null.
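
To make the expectation concrete, here is a plain-Python model of the semantics I expect from the filter. `filter_is_null` is a hypothetical helper written for this illustration and is not part of the datafusion API.

```python
def filter_is_null(rows, column):
    """Keep only the rows whose value in `column` is None (i.e. null)."""
    return [row for row in rows if row[column] is None]


# The two dataframes from the reproduction, written as lists of dicts.
single = [{"a": None, "b": 1}]
mixed = [{"a": None, "b": 1}, {"a": 1.0, "b": 2}]

# Both cases should yield exactly the row where "a" is null.
expected_single = filter_is_null(single, "a")
expected_mixed = filter_is_null(mixed, "a")
print(expected_single)
print(expected_mixed)
```

In both cases the expected output is the one row with `b == 1`, so the row count of the input should not change the result.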
   
   ### Additional context
   
   _No response_

