randolf-scholz opened a new issue, #34976:
URL: https://github.com/apache/arrow/issues/34976

   ### Describe the enhancement requested
   
   I have a large array of string data in which numerical values are mixed
   with categorical values. `pyarrow` seems to offer no straightforward way
   to separate them.
   
   ```python
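   import pyarrow as pa
   import pyarrow.compute as pc  # explicit import; `pa.compute` may not be loaded by `import pyarrow` alone
   
   arr = pa.array(["3", "+5", "-4.2", "1,000.00", "foo", "7e-3"], type="string")
   # Only "3" consists entirely of numeric characters, so only the first entry is true
   print(pc.utf8_is_numeric(arr))
   # Raises ArrowInvalid: Failed to parse string: 'foo' as a scalar of type float
   pc.cast(arr, pa.float32())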
   ```
   
   It would be great to have either (or both) of the following:
   
   - a function that returns a boolean mask indicating whether each string can be cast to float
   - an option for `pyarrow.compute.cast` that replaces values that fail to parse with nulls
   
   My current workaround is to go through pandas:
   `pd.to_numeric(pd.Series(arr, dtype="string[pyarrow]"), errors="coerce")`.
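   
   An Arrow-only approximation of both requested behaviours can be sketched
   with a regular expression. This is only a sketch under stated assumptions:
   the helper name `cast_or_null` and the pattern `FLOAT_PATTERN` are
   hypothetical, and the pattern is deliberately conservative (optional
   leading minus, digits, optional fraction, optional exponent), so it
   rejects some strings a more lenient parser might accept, such as `"+5"`.
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   # Hypothetical, conservative pattern: "-12", "3.5", "7e-3" match;
   # "+5", "1,000.00", "foo", "inf" and "nan" do not.
   FLOAT_PATTERN = r"^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$"
   
   
   def cast_or_null(strings: pa.Array) -> tuple[pa.Array, pa.Array]:
       """Return (mask, floats); entries that do not match become null."""
       # Boolean mask: true where the whole string matches the pattern.
       mask = pc.match_substring_regex(strings, FLOAT_PATTERN)
       # Replace non-matching entries with a null string so the cast cannot fail on them.
       null_string = pa.scalar(None, type=pa.string())
       cleaned = pc.if_else(mask, strings, null_string)
       return mask, pc.cast(cleaned, pa.float64())
   
   
   arr = pa.array(["3", "+5", "-4.2", "1,000.00", "foo", "7e-3"], type=pa.string())
   mask, floats = cast_or_null(arr)
   print(mask)
   print(floats)
   ```
   
   Unlike `errors="coerce"` in pandas, the mask here reflects the regular
   expression rather than the parser itself, so the two can disagree on
   edge cases.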
   
   ### Component(s)
   
   Python


