AlenkaF commented on issue #46138:
URL: https://github.com/apache/arrow/issues/46138#issuecomment-2809340768

   I had a look, and I agree there is a bug in `Table.cast()`, along with an
unclear error message and missing documentation and examples.
   
   First of all, this behaviour also occurs when using an array directly:
   
   ```python
   >>> import pandas as pd
   >>> import pyarrow as pa
   >>> import pyarrow.compute as pc
   >>> arr = pa.array([pd.to_datetime("2023-03-15T15:15:00.123456789Z")], pa.timestamp("ns", "UTC"))
   >>> arr.cast(pa.timestamp("us", "UTC"))
   ```
   This throws:
   
   ```Python
   ArrowInvalid: Casting from timestamp[ns, tz=UTC] to timestamp[us, tz=UTC] would lose data: ...
   ```
   
   When trying to add `CastOptions`, you run into the error reported in the issue:
   
   ```python
   >>> arr.cast(pa.timestamp("us", "UTC"), options=pc.CastOptions(allow_time_truncate=True))
   ValueError: Must either pass values for 'target_type' and 'safe' or pass a value for 'options'
   ```
   
   This behavior is actually intentional; see [this PR
comment](https://github.com/apache/arrow/pull/13109#discussion_r904379778). The
key is that when using `CastOptions`, the `target_type` should be passed inside
the options, not as a separate argument:
   
   ```python
   >>> arr.cast(options=pc.CastOptions(target_type=pa.timestamp("us", "UTC"), allow_time_truncate=True))
   <pyarrow.lib.TimestampArray object at 0x105ad1ba0>
   [
     2023-03-15 15:15:00.123456Z
   ]
   ```
   
   This works and truncates as expected for an `Array` or a `ChunkedArray`
because the PR mentioned above made `target_type` optional. That is not the
case for a `Table` or a `RecordBatch`, where `target_schema` is still required:
   
   
https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/table.pxi#L3300
   
https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/table.pxi#L4666
   
   I think we need to update the `cast()` method for `RecordBatch` and `Table`,
probably by making the target schema optional. We would also need to clarify
the docstrings of all the cast functions and update [the tests in
test_compute.py](https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/tests/test_compute.py#L1887-L1902),
where we should match exception messages and maybe add some comments and tests
for tables and batches.
   
   Happy to help with review if someone wants to send in a PR!


