alamb commented on a change in pull request #1291: URL: https://github.com/apache/arrow-datafusion/pull/1291#discussion_r764054319
##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -909,31 +913,16 @@ fn equal_rows(
 // Produces a batch for left-side rows that have/have not been matched during the whole join
 fn produce_from_matched(
-    visited_left_side: &[bool],
+    visited_left_side: &BooleanBufferBuilder,
     schema: &SchemaRef,
     column_indices: &[ColumnIndex],
     left_data: &JoinLeftData,
     unmatched: bool,
 ) -> ArrowResult<RecordBatch> {
-    // Find indices which didn't match any right row (are false)
-    let indices = if unmatched {
-        UInt64Array::from_iter_values(
-            visited_left_side
-                .iter()
-                .enumerate()
-                .filter(|&(_, &value)| !value)
-                .map(|(index, _)| index as u64),
-        )
-    } else {
-        // produce those that did match
-        UInt64Array::from_iter_values(
-            visited_left_side
-                .iter()
-                .enumerate()
-                .filter(|&(_, &value)| value)
-                .map(|(index, _)| index as u64),
-        )
-    };
+    let indices =
+        UInt64Array::from_iter_values((0..visited_left_side.len()).filter_map(|v| {
+            (unmatched ^ visited_left_side.get_bit(v)).then(|| v as u64)
+        }));

Review comment:
Maybe, to remove all doubt, we could skip checking `unmatched` on every iteration and branch on it once up front instead. Something like (untested):

```rust
let indices = if unmatched {
    // Left-side rows that never matched any right-side row (bit is false)
    UInt64Array::from_iter_values((0..visited_left_side.len()).filter_map(|v| {
        (!visited_left_side.get_bit(v)).then(|| v as u64)
    }))
} else {
    // Left-side rows that matched at least one right-side row (bit is true)
    UInt64Array::from_iter_values((0..visited_left_side.len()).filter_map(|v| {
        visited_left_side.get_bit(v).then(|| v as u64)
    }))
};
```
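As a sanity check, the sketch below compares the single-loop XOR filter from the diff (`unmatched ^ visited_left_side.get_bit(v)`) with the hoisted two-branch version proposed in the comment. It assumes a plain `&[bool]` as a hypothetical stand-in for `BooleanBufferBuilder` and collects into a `Vec<u64>` instead of a `UInt64Array`, so it runs without the arrow crate. The helper names `indices_xor` and `indices_hoisted` are illustrative only.

```rust
/// Hypothetical helper mirroring the diff: keep index `v` when its visited bit
/// differs from `unmatched` (i.e. `unmatched ^ bit` is true).
fn indices_xor(visited: &[bool], unmatched: bool) -> Vec<u64> {
    (0..visited.len())
        .filter_map(|v| (unmatched ^ visited[v]).then(|| v as u64))
        .collect()
}

/// Hypothetical helper mirroring the review suggestion: test `unmatched` once,
/// then run a loop that only inspects the visited bit.
fn indices_hoisted(visited: &[bool], unmatched: bool) -> Vec<u64> {
    if unmatched {
        // Rows never matched: bit is false
        (0..visited.len())
            .filter_map(|v| (!visited[v]).then(|| v as u64))
            .collect()
    } else {
        // Rows matched at least once: bit is true
        (0..visited.len())
            .filter_map(|v| visited[v].then(|| v as u64))
            .collect()
    }
}

fn main() {
    let visited = [true, false, false, true, false];
    // Both formulations must agree for either value of `unmatched`
    for unmatched in [true, false] {
        assert_eq!(indices_xor(&visited, unmatched), indices_hoisted(&visited, unmatched));
    }
    println!("unmatched: {:?}", indices_hoisted(&visited, true)); // [1, 2, 4]
    println!("matched:   {:?}", indices_hoisted(&visited, false)); // [0, 3]
}
```

The two forms select the same indices: `true ^ b == !b` and `false ^ b == b`. The hoisted form only makes explicit that `unmatched` is loop-invariant. An optimizing compiler may already perform that hoisting on the XOR version, so any performance difference would need measuring rather than assuming.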