fuxi611 opened a new pull request, #55987:
URL: https://github.com/apache/spark/pull/55987

   ### What changes were proposed in this pull request?
   
   This PR fixes pandas-on-Spark equality and inequality comparisons between incompatible dtypes under ANSI mode.
   
   The change makes pandas-on-Spark return pandas-compatible boolean results for incompatible dtype comparisons instead of delegating them to Spark SQL casting behavior:
   
   - `eq` returns all `False`
   - `ne` returns all `True`
   
   This covers comparisons such as a numeric Series/Index against a string Series/Index or a string scalar.
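The rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation: `compare_eq_ne`, `_kind`, and `_column_kind` are hypothetical names, and "incompatible dtypes" is approximated here as the two operands having different coarse kinds (all numbers vs. all strings, for instance).

```python
from numbers import Number
from typing import Any, List, Sequence, Union


def _kind(value: Any) -> str:
    """Coarse 'dtype kind' of one value (hypothetical, for this sketch only)."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, Number):  # note: bools also count as numeric here
        return "numeric"
    return "other"


def _column_kind(values: Sequence[Any]) -> str:
    """Kind of a whole column; mixed or empty columns are reported as 'other'."""
    kinds = {_kind(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else "other"


def compare_eq_ne(
    left: Sequence[Any], right: Union[Sequence[Any], Any], op: str
) -> List[bool]:
    """Hypothetical eq/ne that never casts across incompatible kinds."""
    if op not in ("eq", "ne"):
        raise ValueError(f"unsupported op: {op!r}")
    # Broadcast a scalar right-hand side (e.g. a string literal) to a column.
    rhs = list(right) if isinstance(right, (list, tuple)) else [right] * len(left)
    if len(rhs) != len(left):
        raise ValueError("operands must have the same length")
    if _column_kind(left) != _column_kind(rhs):
        # Incompatible kinds: skip any cast; eq -> all False, ne -> all True.
        return [op == "ne"] * len(left)
    # Compatible kinds: ordinary element-wise comparison.
    if op == "eq":
        return [a == b for a, b in zip(left, rhs)]
    return [a != b for a, b in zip(left, rhs)]
```

For example, `compare_eq_ne([1, 2, 3], "a", "eq")` gives `[False, False, False]`, `compare_eq_ne([1, 2, 3], "a", "ne")` gives `[True, True, True]`, and a compatible pair such as `compare_eq_ne([1, 2, 3], [1, 5, 3], "eq")` falls through to a normal element-wise comparison.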
   
   ### Why are the changes needed?
   
   ANSI mode should not change the behavior of the pandas API on Spark. Without this fix, Spark SQL may try to cast incompatible operands under ANSI mode, which can either produce results that differ from pandas or raise errors for comparisons where pandas simply returns boolean results.
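For reference, this is the plain pandas behavior being matched: comparing a numeric Series with a string scalar or with a string Series yields element-wise booleans rather than a cast error. The snippet uses only pandas; `dtype=object` is an assumption added to keep the example independent of pandas' optional string dtype.

```python
import pandas as pd

nums = pd.Series([1, 2, 3])
# dtype=object pins the classic string storage so the result does not depend
# on whether pandas' newer string dtype is enabled.
strs = pd.Series(["1", "2", "3"], dtype=object)

# Numeric Series vs. string scalar: no cast, so `==` is all False, `!=` all True.
eq_scalar = nums == "a"
ne_scalar = nums != "a"

# Numeric Series vs. string Series: 1 == "1" is False element-wise, no cast.
eq_series = nums == strs
ne_series = nums != strs

print(eq_scalar.tolist())  # [False, False, False]
print(ne_scalar.tolist())  # [True, True, True]
print(eq_series.tolist())  # [False, False, False]
print(ne_series.tolist())  # [True, True, True]
```

With this PR, the equivalent pandas-on-Spark comparisons are expected to return the same all-`False` / all-`True` results under ANSI mode.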
   
   ### Does this PR introduce any user-facing change?
   
   Yes. Under ANSI mode, pandas-on-Spark equality and inequality comparisons between incompatible dtypes now behave consistently with pandas.
   
   ### How was this patch tested?
   
   Ran:
   
   ```bash
   python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_num_ops
   python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_boolean_ops
   ```

