Copilot commented on code in PR #49658:
URL: https://github.com/apache/arrow/pull/49658#discussion_r3497851433


##########
python/pyarrow/tests/test_array.py:
##########
@@ -4468,3 +4468,19 @@ def test_dunders_checked_overflow():
         arr ** pa.scalar(2, type=pa.int8())
     with pytest.raises(pa.ArrowInvalid, match=error_match):
         arr / (-arr)
+
+
+def test_dictionary_large_string_and_binary():
+    # Test dictionary with large_string values
+    arr_str = pa.array(
+        ["a", "b", "a"], type=pa.dictionary(pa.int32(), pa.large_string())
+    )
+    assert arr_str.type.value_type == pa.large_string()
+    assert arr_str.to_pylist() == ["a", "b", "a"]
+
+    # Test dictionary with large_binary values
+    arr_bin = pa.array(
+        [b"x", b"y", b"x"], type=pa.dictionary(pa.int32(), pa.large_binary())
+    )
+    assert arr_bin.type.value_type == pa.large_binary()
+    assert arr_bin.to_pylist() == [b"x", b"y", b"x"]

Review Comment:
   The new regression test doesn't cover the original reported failure mode 
(GH-49505): converting an *empty* Python sequence with an explicitly-provided 
dictionary type whose value_type is large_string/large_binary. Adding 
empty-input assertions will prevent regressions where dispatch works for 
non-empty inputs but still fails for empty sequences.



##########
python/pyarrow/src/arrow/python/python_to_arrow.cc:
##########
@@ -826,7 +827,7 @@ class PyDictionaryConverter<U, enable_if_has_string_view<U>>
     } else {
       ARROW_RETURN_NOT_OK(
           PyValue::Convert(this->value_type_, this->options_, value, view_));
-      return this->value_builder_->Append(view_.bytes, 
static_cast<int32_t>(view_.size));
+      return this->value_builder_->Append(std::string_view(view_.bytes, 
view_.size));
     }

Review Comment:
   `PyBytesView::size` is a signed `Py_ssize_t`, but `std::string_view` expects 
a `size_t` length. Making the cast explicit avoids implicit signed→unsigned 
conversion (and potential -Wsign-conversion / -Werror build breaks).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to