jorgecarleitao opened a new pull request #8541: URL: https://github.com/apache/arrow/pull/8541
This is a major refactor of the `equal.rs` module. :/ The rational for this change is many fold: * currently array comparison requires downcasting the array ref to its concrete types. This is painful and not very ergonomics, as the user must "guess" what to downcast for comparison. We can see this in the hacks around `sort`, `take` and `concatenate` kernel's tests, and some of the tests of the builders. * the code in array comparison is hard to follow given the amount of calls that they perform. * The implementation currently indirectly uses many of the `unsafe` APIs that we have (via `is_null`, `value` and the like), which makes it risky to operate and mutate. * Some code is being repeated. This PR: 1. adds `impl PartialEq for dyn Array`, to allow `Array` comparison based on `Array::data` (main change) 2. Makes array equality to only depend on `ArrayData`, i.e. it no longer depends on concrete array types (such as `PrimitiveArray` and related API) to perform comparisons. 3. Significantly reduces the risk of panics and UB when composite arrays are of different types, by checking the types on `range` comparison 4. Makes array equality be statically dispatched, via `match datatype`. 5. DRY the code around array equality 6. Fixes an error in equality of dictionary with equal values 7. Added tests to equalities that were not tested (fixed binary, some edge cases of dictionaries) 8. splits `equal.rs` in smaller, more manageable files. 9. Removes `ArrayListOps`, since it it no longer needed 10. Moves Json equality to its own module, for clarity. 11. removes the need to have two functions per type to compare arrays. 12. Adds, as part of the specification, the number of buffers and their respective width to datatypes. This was backported from #8401 This is a draft as there is a significant design change here: all our code base so far focused on implementing the logic on the implementers of `Array` instead of `ArrayData`. This PR breaks that design, by using an Array-independent implementation, so that we can use it on `dyn Array == dyn Array`. Note that this does not implement `PartialEq` for `ArrayData`, only `dyn Array`, as different data does not a different array (due to the nullability). That implementation is being worked on #8200. IMO this PR significantly simplifies the code around array comparison, to the point where many implementations are 5 lines long. The first 3 commits are almost independent of the rest. The remaining are historically correct, but not very interesting to read. The functions are divided by modules for readability. All tests are there, plus new tests for some of the edge cases and untested arrays. This change is backward incompatible `array1.equals(&array2)` no longer works: use `array1 == array2` instead, which is the idiomatic way of comparing structs and trait objects in rust. :P ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org