AlenkaF commented on code in PR #49271:
URL: https://github.com/apache/arrow/pull/49271#discussion_r2923544413


##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -3096,10 +3100,10 @@ def _check(v):
             result = arr.to_pandas()
             tm.assert_series_equal(pd.Series(result), pd.Series(v))
 
+        base = pd.Categorical(['a', 'b', 'c'])
         arrays = [
-            pd.Categorical(['a', 'b', 'c'], categories=['a', 'b']),
-            pd.Categorical(['a', 'b', 'c'], categories=['a', 'b'],
-                           ordered=True)
+            base.set_categories(['a', 'b']),
+            base.set_categories(['a', 'b']).as_ordered(),

Review Comment:
   I think it would be simpler to do:
   
   ```python
   pd.Categorical(['a', 'b', None], categories=['a', 'b'])
   ```
   
   This should cover the same use case as the one reported in 
https://github.com/apache/arrow/issues/19704.
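
As a rough illustration of the suggestion above (a minimal sketch using only pandas, not the final test code): passing `None` explicitly yields a missing value with code `-1`, while every non-missing value is still a declared category, so nothing depends on out-of-category values being coerced. The `ordered` variant is my assumption about how the second entry in `arrays` would look.

```python
import pandas as pd

# Explicit missing value; every other value is a declared category.
cat = pd.Categorical(['a', 'b', None], categories=['a', 'b'])

# Ordered variant of the same data (assumed shape of the second test case).
cat_ordered = pd.Categorical(['a', 'b', None], categories=['a', 'b'],
                             ordered=True)

print(cat.codes.tolist())       # [0, 1, -1]  (-1 marks the missing value)
print(cat.categories.tolist())  # ['a', 'b']
print(cat.isna().tolist())      # [False, False, True]
print(cat_ordered.ordered)      # True
```

Both objects could then go into the `arrays` list in place of the `set_categories` calls.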



##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -3069,15 +3069,19 @@ def test_category(self):
         v2 = [4, 5, 6, 7, 8]
         v3 = [b'foo', None, b'bar', b'qux', np.nan]
 
+        cat_strings = pd.Categorical(v1 * repeats)
+        cat_strings_with_na = cat_strings.set_categories(['foo', 'bar'])

Review Comment:
   This probably still relies on values with missing categories being silently 
converted to `NaN`, and pandas is moving away from that behavior AFAIU.
   
   Since the idea of the test is to have `NaN` in the constructed categorical 
array, we could simply remove this line, as `cat_strings` already includes 
them:
   
   ```python
   In [19]: pd.Categorical(v1 * repeats)
       ...: 
   Out[19]: 
   ['foo', NaN, 'bar', 'qux', NaN, ..., 'foo', NaN, 'bar', 'qux', NaN]
   Length: 25
   Categories (3, str): ['bar', 'foo', 'qux']
   ```
   
   What we could do is add:
   
   ```python
   v0 = ['foo', 'bar', 'qux']
   ```
   
   and use it for `cat_strings`? This way we do not need a workaround that 
relies on deprecated behavior to construct `NaN` values from missing 
categories.
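
To make the proposal concrete, here is a hedged sketch of how the two categoricals could be built. The contents of `v1` and the value of `repeats` are not visible in this hunk; I am inferring them from the `Out[19]` output above (five values per repeat, length 25), so treat both as assumptions. Building `cat_strings_with_na` directly from `v1` is also only my reading of the suggestion.

```python
import numpy as np
import pandas as pd

repeats = 5  # assumed: the output above has length 25 with 5 values per repeat

v0 = ['foo', 'bar', 'qux']                # proposed: no missing values
v1 = ['foo', None, 'bar', 'qux', np.nan]  # assumed from the Out[19] output

# Without missing values: built from v0.
cat_strings = pd.Categorical(v0 * repeats)

# With missing values: None/NaN are passed explicitly, so there is no need
# for set_categories() to drop a category and coerce its values to NaN.
cat_strings_with_na = pd.Categorical(v1 * repeats)

print(int(cat_strings.isna().sum()))            # 0
print(int(cat_strings_with_na.isna().sum()))    # 10
print(cat_strings_with_na.categories.tolist())  # ['bar', 'foo', 'qux']
```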


