houqp commented on a change in pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r619864093
########## File path: datafusion/src/execution/dataframe_impl.rs ########## @@ -252,7 +252,12 @@ mod tests { let right = test_table()?.select_columns(&["c1", "c3"])?; let left_rows = left.collect().await?; let right_rows = right.collect().await?; - let join = left.join(right, JoinType::Inner, &["c1"], &["c1"])?; + let join = left.join( + right, + JoinType::Inner, + vec![Column::from_name("c1".to_string())], Review comment: That's a great idea! ########## File path: datafusion/src/physical_optimizer/coalesce_batches.rs ########## @@ -58,7 +59,17 @@ impl PhysicalOptimizerRule for CoalesceBatches { // See https://issues.apache.org/jira/browse/ARROW-11068 let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some() || plan_any.downcast_ref::<HashJoinExec>().is_some() - || plan_any.downcast_ref::<RepartitionExec>().is_some(); + || { + match plan_any.downcast_ref::<RepartitionExec>() { + Some(p) => match p.partitioning() { + // do not coalesce hash partitions since other plans like partitioned hash Review comment: Good call, I started with handling this in `HashJoinStream`, but couldn't get it to work in all cases due to a small bug. I took a second look at my first attempt and noticed it could be fixed by changing 3 characters :P I have now reverted my change in the coalesce optimizer and added back my initial fix through https://github.com/apache/arrow-datafusion/pull/55/commits/696f8e04178fef877a658d0ef08c3c4ceebfcfdf. Let me know if this fix aligns with what you have in mind. Also I suck at naming, `preserve_left` is the best I can come up with, suggestions welcome :) ########## File path: datafusion/src/execution/dataframe_impl.rs ########## @@ -252,7 +252,12 @@ mod tests { let right = test_table()?.select_columns(&["c1", "c3"])?; let left_rows = left.collect().await?; let right_rows = right.collect().await?; - let join = left.join(right, JoinType::Inner, &["c1"], &["c1"])?; + let join = left.join( + right, + JoinType::Inner, + vec![Column::from_name("c1".to_string())], Review comment: Turns out it is not possible to use `Vec<impl Into<Column>>` in the signature because Dataframe is used as trait object. I changed the signature back to `&[&str]` for dataframe API and switched to `Vec<impl Into<Column>>` in the logical plan builder to keep the high level API simple. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org