jorgecarleitao opened a new pull request #8541:
URL: https://github.com/apache/arrow/pull/8541


   This is a major refactor of the `equal.rs` module. :/
   
   The rational for this change is many fold: 
   
   * currently array comparison requires downcasting the array ref to its 
concrete types. This is painful and not very ergonomics, as the user must 
"guess" what to downcast for comparison. We can see this in the hacks around 
`sort`, `take` and `concatenate` kernel's tests, and some of the tests of the 
builders.
   * the code in array comparison is hard to follow given the amount of calls 
that they perform.
   * The implementation currently indirectly uses many of the `unsafe` APIs 
that we have (via `is_null`, `value` and the like), which makes it risky to 
operate and mutate.
   * Some code is being repeated.
   
   This PR:
   
   1. adds `impl PartialEq for dyn Array`, to allow `Array` comparison based on 
`Array::data` (main change)
   2. Makes array equality to only depend on `ArrayData`, i.e. it no longer 
depends on concrete array types (such as `PrimitiveArray` and related API) to 
perform comparisons.
   3. Significantly reduces the risk of panics and UB when composite arrays are 
of different types, by checking the types on `range` comparison
   4. Makes array equality be statically dispatched, via `match datatype`.
   5. DRY the code around array equality
   6. Fixes an error in equality of dictionary with equal values
   7. Added tests to equalities that were not tested (fixed binary, some edge 
cases of dictionaries)
   8. splits `equal.rs` in smaller, more manageable files.
   9. Removes `ArrayListOps`, since it it no longer needed
   10. Moves Json equality to its own module, for clarity.
   11. removes the need to have two functions per type to compare arrays.
   12. Adds, as part of the specification, the number of buffers and their 
respective width to datatypes. This was backported from #8401
   
   This is a draft as there is a significant design change here: all our code 
base so far focused on implementing the logic on the implementers of `Array` 
instead of `ArrayData`. This PR breaks that design, by using an 
Array-independent implementation, so that we can use it on `dyn Array == dyn 
Array`.
   
   Note that this does not implement `PartialEq` for `ArrayData`, only `dyn 
Array`, as different data does not a different array (due to the nullability). 
That implementation is being worked on #8200.
   
   IMO this PR significantly simplifies the code around array comparison, to 
the point where many implementations are 5 lines long.
   
   The first 3 commits are almost independent of the rest. The remaining are 
historically correct, but not very interesting to read. The functions are 
divided by modules for readability.
   
   All tests are there, plus new tests for some of the edge cases and untested 
arrays.
   
   This change is backward incompatible `array1.equals(&array2)` no longer 
works: use `array1 == array2` instead, which is the idiomatic way of comparing 
structs and trait objects in rust. :P


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to