[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1291: Left join could use bitmap for left join instead of Vec

GitBox Tue, 23 Nov 2021 10:59:22 -0800


alamb commented on a change in pull request #1291:
URL: https://github.com/apache/arrow-datafusion/pull/1291#discussion_r755424450




##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -909,31 +913,16 @@ fn equal_rows(
 
 // Produces a batch for left-side rows that have/have not been matched during the whole join
 fn produce_from_matched(
-    visited_left_side: &[bool],
+    visited_left_side: &BooleanBufferBuilder,
     schema: &SchemaRef,
     column_indices: &[ColumnIndex],
     left_data: &JoinLeftData,
     unmatched: bool,
 ) -> ArrowResult<RecordBatch> {
-    // Find indices which didn't match any right row (are false)
-    let indices = if unmatched {
-        UInt64Array::from_iter_values(
-            visited_left_side
-                .iter()
-                .enumerate()
-                .filter(|&(_, &value)| !value)
-                .map(|(index, _)| index as u64),
-        )
-    } else {
-        // produce those that did match
-        UInt64Array::from_iter_values(
-            visited_left_side
-                .iter()
-                .enumerate()
-                .filter(|&(_, &value)| value)
-                .map(|(index, _)| index as u64),
-        )
-    };
+    let indices =
+        UInt64Array::from_iter_values((0..visited_left_side.len()).filter_map(|v| {
+            (unmatched ^ visited_left_side.get_bit(v)).then(|| v as u64)
+        }));

Review comment:
       Can we perhaps just run the TPC-H benchmarks to make sure we didn't introduce any major performance regression?
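
For reference, the single-pass XOR selection in the added lines should pick the same indices as the two removed branches. Below is a minimal sketch of that equivalence. It assumes a plain `&[bool]` as a stand-in for `BooleanBufferBuilder`, and the function names `indices_branching` and `indices_xor` are hypothetical, not part of the PR. It says nothing about performance, which is what the benchmark run would have to show.

```rust
/// Removed logic, reduced to a plain slice: one branch per mode.
/// (Hypothetical helper; returns a Vec<u64> instead of a UInt64Array.)
fn indices_branching(visited: &[bool], unmatched: bool) -> Vec<u64> {
    if unmatched {
        // Rows that never matched any right-side row (flag is false).
        visited
            .iter()
            .enumerate()
            .filter(|&(_, &value)| !value)
            .map(|(index, _)| index as u64)
            .collect()
    } else {
        // Rows that matched at least once (flag is true).
        visited
            .iter()
            .enumerate()
            .filter(|&(_, &value)| value)
            .map(|(index, _)| index as u64)
            .collect()
    }
}

/// Added logic, reduced to a plain slice: `unmatched ^ flag` is true
/// exactly when the row belongs to the requested group.
/// (Hypothetical helper; `visited[i]` stands in for `get_bit(i)`.)
fn indices_xor(visited: &[bool], unmatched: bool) -> Vec<u64> {
    (0..visited.len())
        .filter_map(|i| (unmatched ^ visited[i]).then(|| i as u64))
        .collect()
}
```

With `visited = [true, false, false, true, false]`, both helpers return `[1, 2, 4]` for `unmatched = true` and `[0, 3]` for `unmatched = false`.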




