alamb commented on code in PR #8849:
URL: https://github.com/apache/arrow-rs/pull/8849#discussion_r2531894200
##########
arrow-ord/src/cmp.rs:
##########
@@ -390,14 +390,14 @@ fn take_bits(v: &dyn AnyDictionaryArray, buffer:
BooleanBuffer) -> BooleanBuffer
/// Invokes `f` with values `0..len` collecting the boolean results into a new
`BooleanBuffer`
///
-/// This is similar to [`MutableBuffer::collect_bool`] but with
+/// This is similar to [`arrow_buffer::MutableBuffer::collect_bool`] but with
/// the option to efficiently negate the result
fn collect_bool(len: usize, neg: bool, f: impl Fn(usize) -> bool) ->
BooleanBuffer {
- let mut buffer = MutableBuffer::new(ceil(len, 64) * 8);
+ let mut buffer = Vec::with_capacity(ceil(len, 64));
Review Comment:
Maybe we could make the `neg` a generic argument on
MutableBuffer::collect_bool and then avoid the duplication (as a follow on PR)
Or maybe make a `collect_bool` function in `bit_util` that returns a Vec and
have the mutable buffer and this one call it 🤔
##########
arrow-select/src/take.rs:
##########
@@ -422,9 +422,10 @@ fn take_native<T: ArrowNativeType, I: ArrowPrimitiveType>(
.enumerate()
.map(|(idx, index)| match values.get(index.as_usize()) {
Some(v) => *v,
- None => match n.is_null(idx) {
- true => T::default(),
- false => panic!("Out-of-bounds index {index:?}"),
+ // SAFETY: idx<indices.len()
Review Comment:
I had to read this several times to convince myself it is correct -- namely
that `idx` i doesn't come from `indices` (provided by the user) but instead
comes from iterating `indices`
I found this whole method actually pretty confusing as there are multiple
things called values and indices (and `indices.values()`..)
I also double checked that there is a test for out of bounds indexes here:
https://github.com/apache/arrow-rs/blob/f93da94e61e731344ce84146dee946a94fe36602/arrow-select/src/take.rs#L2084-L2083
##########
arrow-select/src/take.rs:
##########
@@ -422,9 +422,10 @@ fn take_native<T: ArrowNativeType, I: ArrowPrimitiveType>(
.enumerate()
.map(|(idx, index)| match values.get(index.as_usize()) {
Some(v) => *v,
- None => match n.is_null(idx) {
- true => T::default(),
- false => panic!("Out-of-bounds index {index:?}"),
+ // SAFETY: idx<indices.len()
+ None => match unsafe { n.inner().value_unchecked(idx) } {
+ false => T::default(),
Review Comment:
@Dandandan do you know why this even bothers to look at the null buffer
again? I realize you didn't change this code, but it seems to me like checking
`n.inner()` (the nulls) is unecessary - it was already implicitly checked by
calling `values.get()` (which returns Some/None).
It seems like all this is doing is re-checking that `value()` and the nulls
match up.
so, TLDR I suggest we remove this clause entirely (could be a follow on PR)
##########
arrow-select/src/take.rs:
##########
@@ -448,8 +449,10 @@ fn take_bits<I: ArrowPrimitiveType>(
let mut output_buffer = MutableBuffer::new_null(len);
let output_slice = output_buffer.as_slice_mut();
nulls.valid_indices().for_each(|idx| {
- if values.value(indices.value(idx).as_usize()) {
- bit_util::set_bit(output_slice, idx);
+ // SAFETY: idx<indices.len()
Review Comment:
The logic goes something like idx comes from a valid index in indices.nulls,
which means it is guaranteed to be a valid index into the values too. Make a
comment like this would make this clearer:
```suggestion
// SAFETY: idx is a valid index in indices.nulls() -->
idx<indices.len()
```
##########
arrow-select/src/take.rs:
##########
@@ -448,8 +449,10 @@ fn take_bits<I: ArrowPrimitiveType>(
let mut output_buffer = MutableBuffer::new_null(len);
let output_slice = output_buffer.as_slice_mut();
nulls.valid_indices().for_each(|idx| {
- if values.value(indices.value(idx).as_usize()) {
- bit_util::set_bit(output_slice, idx);
+ // SAFETY: idx<indices.len()
+ if values.value(unsafe {
indices.value_unchecked(idx).as_usize() }) {
+ // SAFETY: idx < indices.len()
Review Comment:
I think the safety of this line relies on the fact that
`MutableBuffer::new_len()` was created with sufficeint space for indices.len()
bits
```suggestion
// SAFETY: MutableBuffer was created with space for
indices.len() bit, and idx < indices.len()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]