This is an automated email from the ASF dual-hosted git repository.
AlenkaF pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 1fc0e194e8 GH-45644: [Doc][Python] Document timezone loss when
converting timestamp arrays to NumPy (#49843)
1fc0e194e8 is described below
commit 1fc0e194e869ea4a66b03809d0bc3204ed83947f
Author: Alexandros Anastasiou <[email protected]>
AuthorDate: Tue May 5 13:26:15 2026 +0100
GH-45644: [Doc][Python] Document timezone loss when converting timestamp
arrays to NumPy (#49843)
### Rationale for this change
NumPy's `datetime64` type does not support timezones. When converting a
timezone-aware Arrow timestamp array to NumPy via `to_numpy()`, the timezone
information is silently dropped. This behaviour is expected but undocumented,
which can surprise users (see #45644).
### What changes are included in this PR?
Adds a "Timezone-aware Timestamps" subsection to
`docs/source/python/numpy.rst` that:
- Explains the timezone loss when calling `to_numpy()` on tz-aware
timestamp arrays
- Shows a code example demonstrating the behavior
- Documents two alternatives: `to_pandas()` for tz-aware Series, and
`to_pylist()` for Python `datetime` objects with `tzinfo`
### Are these changes tested?
Documentation-only change. All code examples were verified against pyarrow
24.0.0 and `sphinx-lint` passes clean.
### Are there any user-facing changes?
No behaviour changes. This adds documentation for existing behaviour.
### AI-generated code disclosure
This PR was developed with assistance from an AI coding tool (Claude,
Anthropic). All changes have been reviewed, understood, and verified.
* GitHub Issue: #45644
Closes #45644
Authored-by: Alexandros Anastasiou <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
---
docs/source/python/numpy.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/docs/source/python/numpy.rst b/docs/source/python/numpy.rst
index 01fb1982d5..07a6aa803f 100644
--- a/docs/source/python/numpy.rst
+++ b/docs/source/python/numpy.rst
@@ -73,3 +73,55 @@ representation as Arrow, and assuming the Arrow data has no
nulls.
For more complex data types, you have to use the :meth:`~pyarrow.Array.to_pandas`
method (which will construct a Numpy array with Pandas semantics for, e.g.,
representation of null values).
+
+Timezone-aware Timestamps
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+NumPy's ``datetime64`` type does not support timezones. When converting a
+timezone-aware Arrow timestamp array to NumPy via :meth:`~pyarrow.Array.to_numpy`,
+the timezone information is silently dropped:
+
+.. code-block:: python
+
+   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC"))
+   >>> arr.type
+   TimestampType(timestamp[s, tz=UTC])
+   >>> arr.to_numpy()
+   array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'],
+         dtype='datetime64[s]')
+
+If you need to preserve timezone information, there are two alternatives:
+
+* Convert to a Pandas Series, which supports timezone-aware ``datetime64`` dtypes:
+
+  .. code-block:: python
+
+     >>> arr.to_pandas()
+     0   2025-01-01 00:00:00+00:00
+     1   2025-01-01 00:00:00+00:00
+     dtype: datetime64[s, UTC]
+
+  To get a NumPy array that keeps the timezone, pass
+  ``timestamp_as_object=True``; the result is an ``object`` array of
+  timezone-aware ``datetime.datetime`` values:
+
+  .. code-block:: python
+
+     >>> arr.to_pandas(timestamp_as_object=True).to_numpy()  # doctest: +ELLIPSIS
+     array([datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...),
+            datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...)],
+           dtype=object)
+
+  .. note::
+
+     For nested types (e.g., list arrays containing timestamps),
+     ``to_pandas()`` may not preserve timezone information. Structs and maps
+     do retain timezones, but lists currently do not. See
+     `GH-41162 <https://github.com/apache/arrow/issues/41162>`_ for details.
+
+* Convert to Python ``datetime`` objects, which carry ``tzinfo``:
+
+  .. code-block:: python
+
+     >>> arr.to_pylist()  # doctest: +SKIP
+     [datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')),
+      datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))]