pitrou commented on code in PR #48813:
URL: https://github.com/apache/arrow/pull/48813#discussion_r2683064671
##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -704,15 +704,19 @@ class TypeInferrer {
Py_TYPE(key_obj)->tp_name, "'");
}
// Get or create visitor for this key
- auto it = struct_inferrers_.find(key);
- if (it == struct_inferrers_.end()) {
- it = struct_inferrers_
- .insert(
- std::make_pair(key, TypeInferrer(pandas_null_sentinels_,
- validate_interval_,
make_unions_)))
- .first;
+ TypeInferrer* visitor;
+ auto it = struct_field_index_.find(key);
+ if (it == struct_field_index_.end()) {
+ // New field - add to vector and index
+ size_t new_index = struct_inferrers_.size();
+ struct_inferrers_.emplace_back(
+ key, TypeInferrer(pandas_null_sentinels_, validate_interval_,
make_unions_));
+ struct_field_index_.emplace(key, new_index);
Review Comment:
We can perhaps save a copy here, though probably not very important.
```suggestion
struct_field_index_.emplace(std::move(key), new_index);
```
##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -71,10 +71,10 @@ def __int__(self):
def check_struct_type(ty, expected):
Review Comment:
AFAICT, this function was only necessary because we wanted to ignore field
order. So now all call sites could use an equality assertion between types
directly?
##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2054,6 +2058,23 @@ def test_struct_from_dicts_inference():
pa.array([1, {'a': 2}])
+def test_struct_from_dicts_field_order():
Review Comment:
Is this test doing something different from
`test_struct_from_dicts_inference` above?
##########
python/pyarrow/tests/test_convert_builtin.py:
##########
Review Comment:
How about we also add a type check here too?
##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2014,21 +2014,25 @@ def test_struct_from_dicts_inference():
assert arr.to_pylist() == data
# With omitted values
+ # Field order is determined by first occurrence: a, c from first dict,
then b from fourth
data = [{'a': 5, 'c': True},
None,
{},
{'a': None, 'b': 'bar'}]
- expected = [{'a': 5, 'b': None, 'c': True},
+ expected_type_omitted = pa.struct([pa.field('a', pa.int64()),
+ pa.field('c', pa.bool_()),
+ pa.field('b', pa.string())])
+ expected = [{'a': 5, 'c': True, 'b': None},
None,
- {'a': None, 'b': None, 'c': None},
- {'a': None, 'b': 'bar', 'c': None}]
+ {'a': None, 'c': None, 'b': None},
+ {'a': None, 'c': None, 'b': 'bar'}]
arr = pa.array(data)
data_as_ndarray = np.empty(len(data), dtype=object)
data_as_ndarray[:] = data
arr2 = pa.array(data)
- check_struct_type(arr.type, expected_type)
+ check_struct_type(arr.type, expected_type_omitted)
Review Comment:
For example this could be replaced with:
```suggestion
assert arr.type == expected_type_omitted
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]