AlenkaF commented on issue #46138: URL: https://github.com/apache/arrow/issues/46138#issuecomment-2809340768
I had a look, and I agree there is a bug in `Table.cast()`, together with an unclear error message and missing documentation and examples.

First of all, this behaviour also occurs when using an array directly:

```python
>>> arr = pa.array([pd.to_datetime("2023-03-15T15:15:00.123456789Z")], pa.timestamp("ns", "UTC"))
>>> arr.cast(pa.timestamp("us", "UTC"))
```

This throws:

```python
ArrowInvalid: Casting from timestamp[ns, tz=UTC] to timestamp[us, tz=UTC] would lose data: ...
```

When trying to add `CastOptions`, you run into the error from above:

```python
>>> arr.cast(pa.timestamp("us", "UTC"), options=pc.CastOptions(allow_time_truncate=True))
ValueError: Must either pass values for 'target_type' and 'safe' or pass a value for 'options'
```

This behavior is actually intentional; see [this PR comment](https://github.com/apache/arrow/pull/13109#discussion_r904379778). The key is that when using `CastOptions`, the `target_type` should be passed inside the options, not as a separate argument:

```python
>>> arr.cast(options=pc.CastOptions(target_type=pa.timestamp("us", "UTC"), allow_time_truncate=True))
<pyarrow.lib.TimestampArray object at 0x105ad1ba0>
[
  2023-03-15 15:15:00.123456Z
]
```

That works correctly and truncates as expected for an `Array` or a `ChunkedArray` because the mentioned PR made `target_type` optional. That is not the case for a `Table` or a `RecordBatch`, where `target_schema` is not optional:

https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/table.pxi#L3300
https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/table.pxi#L4666

I think we need to update the `cast()` method for `RecordBatch` and `Table`, probably by making the target schema optional.
We would also need to clarify the docstrings of all the cast functions and update [the tests in test_compute.py](https://github.com/apache/arrow/blob/586ed925f4bb4d333f8e2a0beb07564bade355e8/python/pyarrow/tests/test_compute.py#L1887-L1902), where we should match exception messages and maybe add some comments and tests for tables and batches.

Happy to help with review if someone wants to send in a PR!