zhengruifeng opened a new pull request, #55698:
URL: https://github.com/apache/spark/pull/55698

   ### What changes were proposed in this pull request?
   
   Gate one `assertRaises(PythonException)` block in 
`ArrowPythonUDFTestsMixin.test_type_coercion_string_to_numeric` on 
`LooseVersion(pd.__version__) < "3.0.0"`. Specifically, the `string("1","2") -> 
decimal` failure assertion is skipped on pandas 3+. The other failure 
assertions (`"1.1" -> int`, `"1.1" -> decimal`) and all success cases are 
unchanged.
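
A minimal sketch of the gating pattern, using only the standard library. The helper `pandas_lt_3`, the class `GateExample`, the hard-coded version string, and the `TypeError` are hypothetical stand-ins; the actual test compares `LooseVersion(pd.__version__)` against `"3.0.0"` and expects `PythonException`.

```python
import unittest


def pandas_lt_3(version: str) -> bool:
    """Return True when the major component of `version` is below 3 (hypothetical helper)."""
    return int(version.split(".")[0]) < 3


class GateExample(unittest.TestCase):
    # Assumed version string; the real test reads pd.__version__.
    pandas_version = "2.3.3"

    def convert(self):
        # Stand-in for the string -> decimal conversion that fails on pandas 2.
        raise TypeError("int or Decimal object expected, got str")

    def test_string_to_decimal_failure(self):
        # The failure assertion only runs when the gate is true.
        if pandas_lt_3(self.pandas_version):
            with self.assertRaises(TypeError):
                self.convert()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GateExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With a version string of `"3.0.0"` the body of the `if` is skipped, so the test passes without exercising the failure assertion.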
   
   ### Why are the changes needed?
   
   `ArrowPythonUDFLegacyTests.test_type_coercion_string_to_numeric` is failing 
on the scheduled `Build / Python-only (master, Python 3.12, Pandas 3)` job, 
e.g. https://github.com/apache/spark/actions/runs/25402959034/job/74508177526.
   
   Root cause: in pandas 3, a series of strings defaults to `StringDtype`, 
whose backing array implements `__arrow_array__`. In 
`PandasToArrowConversion.convert` (`python/pyspark/sql/conversion.py`), the 
relevant path is
   
   ```python
   mask = None if hasattr(series.array, "__arrow_array__") else series.isnull()
   ...
   pa.Array.from_pandas(series, mask=mask, type=arrow_type, safe=safecheck)
   ```
   
   On pandas 2, the result series of strings has object dtype and its array 
has no `__arrow_array__`, so `mask` is `series.isnull()` and `from_pandas` 
with `type=decimal128(...)` raises `ArrowTypeError` ("int or Decimal object 
expected, got str"), which surfaces as `PythonException`. On pandas 3, the 
series has `StringDtype`, `mask` is `None`, and the `__arrow_array__` protocol 
casts `"1"` to `Decimal("1")`. The conversion succeeds without error, so 
`assertRaises(PythonException)` fails.
   
   The non-legacy `ArrowPythonUDF` path is unaffected because it converts a 
Python list directly via `pa.array(list, type=...)`, where pyarrow's 
per-element type check still rejects `str` for `Decimal`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Test-only.
   
   ### How was this patch tested?
   
   Pending CI on the Pandas 3 image. Locally (pandas 2.3.3) the version check 
evaluates to true, so the assertion still runs and existing behavior is 
unchanged.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

