houqp commented on a change in pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r619864093



##########
File path: datafusion/src/execution/dataframe_impl.rs
##########
@@ -252,7 +252,12 @@ mod tests {
         let right = test_table()?.select_columns(&["c1", "c3"])?;
         let left_rows = left.collect().await?;
         let right_rows = right.collect().await?;
-        let join = left.join(right, JoinType::Inner, &["c1"], &["c1"])?;
+        let join = left.join(
+            right,
+            JoinType::Inner,
+            vec![Column::from_name("c1".to_string())],

Review comment:
       That's a great idea!

##########
File path: datafusion/src/physical_optimizer/coalesce_batches.rs
##########
@@ -58,7 +59,17 @@ impl PhysicalOptimizerRule for CoalesceBatches {
         // See https://issues.apache.org/jira/browse/ARROW-11068
         let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some()
             || plan_any.downcast_ref::<HashJoinExec>().is_some()
-            || plan_any.downcast_ref::<RepartitionExec>().is_some();
+            || {
+                match plan_any.downcast_ref::<RepartitionExec>() {
+                    Some(p) => match p.partitioning() {
+                        // do not coalesce hash partitions since other plans 
like partitioned hash

Review comment:
       Good call, I started with handling this in `HashJoinStream`, but 
couldn't get it to work in all cases due to a small bug. I took a second look 
at my first attempt and noticed it could be fixed by changing 3 characters :P I 
have now reverted my change in the coalesce optimizer and added back my initial 
fix through 
https://github.com/apache/arrow-datafusion/pull/55/commits/696f8e04178fef877a658d0ef08c3c4ceebfcfdf.
   
   Let me know if this fix aligns with what you have in mind. Also I suck at 
naming, `preserve_left` is the best I can come up with, suggestions welcome :)

##########
File path: datafusion/src/execution/dataframe_impl.rs
##########
@@ -252,7 +252,12 @@ mod tests {
         let right = test_table()?.select_columns(&["c1", "c3"])?;
         let left_rows = left.collect().await?;
         let right_rows = right.collect().await?;
-        let join = left.join(right, JoinType::Inner, &["c1"], &["c1"])?;
+        let join = left.join(
+            right,
+            JoinType::Inner,
+            vec![Column::from_name("c1".to_string())],

Review comment:
       Turns out it is not possible to use `Vec<impl Into<Column>>` in the 
signature because Dataframe is used as trait object. I changed the signature 
back to `&[&str]` for dataframe API and switched to `Vec<impl Into<Column>>` in 
the logical plan builder to keep the high level API simple.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to