This is an automated email from the ASF dual-hosted git repository.

AlenkaF pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new ea8cef531c GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array (#49878)
ea8cef531c is described below

commit ea8cef531c0340fd1a92f9cca6a61634af33d806
Author: AnkitAhlawat <[email protected]>
AuthorDate: Wed May 6 13:20:58 2026 +0530

    GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array (#49878)
    
    ### Rationale for this change
    
    When converting a `pandas.Categorical` with tz-aware datetime categories to a
    PyArrow array, the timezone was dropped from the dictionary array's value
    type. This is silent data loss: no warning or error is raised, yet the
    timezone metadata is gone.
    
    ### What changes are included in this PR?
    
    In `python/pyarrow/array.pxi`, the Categorical conversion built the
    dictionary from `values.categories.values`, a raw NumPy array that has
    already lost the timezone because NumPy's `datetime64` does not support
    tz-aware datetimes. It now passes `values.categories` (a pandas Index)
    together with `from_pandas=True`, so PyArrow uses its pandas conversion
    path, which preserves the timezone metadata. The same change is applied to
    the fallback branch taken after a `TypeError`.
    
    ### Are these changes tested?
    
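    Yes. A regression test, `test_categorical_with_timezone`, was added to
    `python/pyarrow/tests/test_pandas.py`, and the fix was also verified manually.
    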
    ### Are there any user-facing changes?
    
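    Yes, this is a bug fix for the issue reported in #49875: tz-aware categories
    now keep their timezone when a Categorical is converted to an Arrow array.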
    
    This PR contains a **"Critical Fix"**: timezone information was silently
    lost during conversion, with no warning or error.
    * GitHub Issue: #49875
    
    Authored-by: [email protected] <[email protected]>
    Signed-off-by: AlenkaF <[email protected]>
---
 python/pyarrow/array.pxi            | 7 ++++---
 python/pyarrow/tests/test_pandas.py | 9 +++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index b7f3a46f9e..ecdbb342d3 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -356,8 +356,8 @@ def array(object obj, type=None, mask=None, size=None, from_pandas=None,
                 values.codes, mask, index_type, memory_pool)
             try:
                 dictionary = array(
-                    values.categories.values, type=value_type,
-                    memory_pool=memory_pool)
+                    values.categories, type=value_type,
+                    from_pandas=True, memory_pool=memory_pool)
             except TypeError:
                 # TODO when removing the deprecation warning, this whole
                 # try/except can be removed (to bubble the TypeError of
@@ -371,7 +371,8 @@ def array(object obj, type=None, mask=None, size=None, from_pandas=None,
                         "TypeError",
                         FutureWarning, stacklevel=2)
                     dictionary = array(
-                        values.categories.values, memory_pool=memory_pool)
+                        values.categories, from_pandas=True,
+                        memory_pool=memory_pool)
                 else:
                     raise
 
diff --git a/python/pyarrow/tests/test_pandas.py b/python/pyarrow/tests/test_pandas.py
index 0339975f45..063532140c 100644
--- a/python/pyarrow/tests/test_pandas.py
+++ b/python/pyarrow/tests/test_pandas.py
@@ -3047,6 +3047,15 @@ class TestConvertMisc:
         df['a'] = df['a'].astype('category')
         _check_pandas_roundtrip(df)
 
+    def test_categorical_with_timezone(self):
+        # GH-49875: timezone was dropped when converting tz-aware categorical
+        cats = pd.DatetimeIndex(["2024-01-01", "2024-01-02"]).tz_localize("US/Eastern")
+        cat = pd.Categorical(values=[cats[0], cats[1], cats[0]], categories=cats)
+
+        arr = pa.array(cat, from_pandas=True)
+
+        assert arr.type.value_type.tz == "US/Eastern"
+
     def test_empty_arrays(self):
         for dtype_str, pa_type in self.type_pairs:
             if (Version(pd.__version__) >= Version("3.0.0") and
