tustvold commented on a change in pull request #1476:
URL: https://github.com/apache/arrow-rs/pull/1476#discussion_r834338546
##########
File path: arrow/src/array/equal/list.rs
##########
@@ -131,17 +159,46 @@ pub(super) fn list_equal<T: OffsetSizeTrait>(
lengths_equal(
&lhs_offsets[lhs_start..lhs_start + len],
&rhs_offsets[rhs_start..rhs_start + len],
- ) && equal_range(
- lhs_values,
- rhs_values,
- child_lhs_nulls.as_ref(),
- child_rhs_nulls.as_ref(),
- lhs_offsets[lhs_start].to_usize().unwrap(),
- rhs_offsets[rhs_start].to_usize().unwrap(),
- (lhs_offsets[lhs_start + len] - lhs_offsets[lhs_start])
- .to_usize()
- .unwrap(),
- )
+ ) && {
+ match lhs.data_type() {
+ DataType::Map(_, _) => {
+ // Don't use `equal_range` which calls `utils::base_equal`
that checks
Review comment:
Same as above
##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -66,7 +66,25 @@ pub(super) fn equal_nulls(
#[inline]
pub(super) fn base_equal(lhs: &ArrayData, rhs: &ArrayData) -> bool {
- lhs.data_type() == rhs.data_type() && lhs.len() == rhs.len()
+ let equal_type = match (lhs.data_type(), rhs.data_type()) {
+ (DataType::Map(l_field, l_sorted), DataType::Map(r_field, r_sorted))
=> {
+ let field_equal = match (l_field.data_type(), r_field.data_type())
{
+ (DataType::Struct(l_fields), DataType::Struct(r_fields))
+ if l_fields.len() == 2 && r_fields.len() == 2 =>
+ {
+ // We don't enforce the equality of field names
+ l_fields.get(0).unwrap().data_type()
Review comment:
Should this also check nullability and metadata? I also wonder if this
logic is better contained within the equality logic for DataType itself
:thinking:
##########
File path: integration-testing/src/bin/arrow-json-integration-test.rs
##########
@@ -121,7 +164,7 @@ fn validate(arrow_name: &str, json_name: &str, verbose:
bool) -> Result<()> {
let arrow_schema = arrow_reader.schema().as_ref().to_owned();
// compare schemas
- if json_file.schema != arrow_schema {
+ if canonicalize_schema(&json_file.schema) !=
canonicalize_schema(&arrow_schema) {
Review comment:
Something feels a bit off about this logic living at the edges, i.e. the
interface. If the field names of a DataType::Map really don't matter, I would
expect this to either not be stored, or at least stored as an implementation
detail...
##########
File path: arrow/src/array/equal/list.rs
##########
@@ -58,22 +61,47 @@ fn offset_value_equal<T: OffsetSizeTrait>(
lhs_pos: usize,
rhs_pos: usize,
len: usize,
+ data_type: &DataType,
) -> bool {
let lhs_start = lhs_offsets[lhs_pos].to_usize().unwrap();
let rhs_start = rhs_offsets[rhs_pos].to_usize().unwrap();
let lhs_len = lhs_offsets[lhs_pos + len] - lhs_offsets[lhs_pos];
let rhs_len = rhs_offsets[rhs_pos + len] - rhs_offsets[rhs_pos];
- lhs_len == rhs_len
- && equal_range(
- lhs_values,
- rhs_values,
- lhs_nulls,
- rhs_nulls,
- lhs_start,
- rhs_start,
- lhs_len.to_usize().unwrap(),
- )
+ lhs_len == rhs_len && {
+ match data_type {
+ DataType::Map(_, _) => {
+ // Don't use `equal_range` which calls `utils::base_equal`
that checks
Review comment:
I'm probably missing something but as far as I can tell you have
modified base_equal to not check the names, and so I'm not sure why this
special case is still needed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]