alamb commented on a change in pull request #8748:
URL: https://github.com/apache/arrow/pull/8748#discussion_r541908029
##########
File path: rust/datafusion/src/physical_plan/mod.rs
##########
@@ -80,8 +80,9 @@ pub trait ExecutionPlan: Debug + Send + Sync {
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>>;
- /// creates an iterator
- async fn execute(&self, partition: usize) ->
Result<SendableRecordBatchStream>;
+ /// creates a vector of streams of [RecordBatch]es, each stream
corresponding
+ /// to a part of the partition.
+ async fn execute(&self) -> Result<Vec<SendableRecordBatchStream>>;
Review comment:
I am sorry I have just started looking at this PR more carefully -- it
seems like the proposal is to change execute so that it returns a `Vec` of
streams rather than a single stream. I think this is reasonable as it allows
physical plan implementations more flexibility when reading the inputs (they
can always merge the streams into one if they choose too)
##########
File path: rust/datafusion/src/physical_plan/hash_join.rs
##########
@@ -143,13 +143,10 @@ impl ExecutionPlan for HashJoinExec {
self.right.output_partitioning()
}
- async fn execute(&self, partition: usize) ->
Result<SendableRecordBatchStream> {
+ async fn execute(&self) -> Result<Vec<SendableRecordBatchStream>> {
Review comment:
Ah, I had missed this detail on the earlier versions of this PR -- that
each invocation of execute for the probe side needed to call execute on the
build side.
An alternate implementation would be to store the hash table on `self` (in a
Arc<Mutex<..>>) and reuse it on subsequent invocations.
I like the pattern of this PR where each `execute` method is invoked once
and then the async streams are used to coordinate the threading / execution.
👍
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]