tustvold commented on code in PR #3473:
URL: https://github.com/apache/arrow-rs/pull/3473#discussion_r1063751538


##########
arrow-select/src/take.rs:
##########
@@ -831,10 +815,18 @@ where
 /// Where a list array has indices `[0,2,5,10]`, taking indices of `[2,0]` 
returns
 /// an array of the indices `[5..10, 0..2]` and offsets `[0,5,7]` (5 elements 
and 2
 /// elements)
+#[allow(clippy::type_complexity)]

Review Comment:
   I think it would be helpful to at least have some documentation of what the 
various fields are. 



##########
arrow-select/src/take.rs:
##########
@@ -691,27 +691,11 @@ where
 {
     // TODO: Some optimizations can be done here such as if it is
     // taking the whole list or a contiguous sublist
-    let (list_indices, offsets) =
+    let (list_indices, offsets, null_buf) =
         take_value_indices_from_list::<IndexType, OffsetType>(values, 
indices)?;
 
     let taken = take_impl::<OffsetType>(values.values().as_ref(), 
&list_indices, None)?;
-    // determine null count and null buffer, which are a function of `values` 
and `indices`
-    let mut null_count = 0;
-    let num_bytes = bit_util::ceil(indices.len(), 8);
-    let mut null_buf = MutableBuffer::new(num_bytes).with_bitset(num_bytes, 
true);
-    {
-        let null_slice = null_buf.as_slice_mut();
-        offsets[..].windows(2).enumerate().for_each(
-            |(i, window): (usize, &[OffsetType::Native])| {
-                if window[0] == window[1] {
-                    // offsets are equal, slot is null
-                    bit_util::unset_bit(null_slice, i);
-                    null_count += 1;
-                }
-            },
-        );
-    }
-    let value_offsets = Buffer::from_slice_ref(&offsets);
+    let value_offsets = Buffer::from_slice_ref(offsets);

Review Comment:
   Not something from this PR, but it occurs to me that it would be better for 
take_value_indices_from_list to simply build and return this buffer, e.g. using 
BufferBuilder, instead of this additional copy.



##########
arrow-select/src/take.rs:
##########
@@ -850,6 +842,12 @@ where
     let mut current_offset = OffsetType::Native::zero();
     // add first offset
     new_offsets.push(OffsetType::Native::zero());
+
+    // Initialize null buffer
+    let num_bytes = bit_util::ceil(indices.len(), 8);
+    let mut null_buf = MutableBuffer::new(num_bytes).with_bitset(num_bytes, 
true);
+    let null_slice = null_buf.as_slice_mut();
+

Review Comment:
   Rather than moving this into here, I think it might be faster and simpler to 
use `take_bits` on the list null buffer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to