alamb commented on a change in pull request #8748:
URL: https://github.com/apache/arrow/pull/8748#discussion_r541908029



##########
File path: rust/datafusion/src/physical_plan/mod.rs
##########
@@ -80,8 +80,9 @@ pub trait ExecutionPlan: Debug + Send + Sync {
         children: Vec<Arc<dyn ExecutionPlan>>,
     ) -> Result<Arc<dyn ExecutionPlan>>;
 
-    /// creates an iterator
-    async fn execute(&self, partition: usize) -> 
Result<SendableRecordBatchStream>;
+    /// creates a vector of streams of [RecordBatch]es, each stream 
corresponding
+    /// to a part of the partition.
+    async fn execute(&self) -> Result<Vec<SendableRecordBatchStream>>;

Review comment:
       I am sorry I have just started looking at this PR more carefully -- it 
seems like the proposal is to change execute so that it returns a `Vec` of 
streams rather than a single stream. I think this is reasonable as it allows 
physical plan implementations more flexibility when reading the inputs (they 
can always merge the streams into one if they choose too)

##########
File path: rust/datafusion/src/physical_plan/hash_join.rs
##########
@@ -143,13 +143,10 @@ impl ExecutionPlan for HashJoinExec {
         self.right.output_partitioning()
     }
 
-    async fn execute(&self, partition: usize) -> 
Result<SendableRecordBatchStream> {
+    async fn execute(&self) -> Result<Vec<SendableRecordBatchStream>> {

Review comment:
       Ah, I had missed this detail on the earlier versions of this PR -- that 
each invocation of execute for the probe side needed to call execute on the 
build side.
   
   An alternate implementation would be to store the hash table on `self` (in a 
Arc<Mutex<..>>) and reuse it on subsequent invocations.
   
   I like the pattern of this PR where each `execute` method is invoked once 
and then the async streams are used to coordinate the threading / execution. 
   
   👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to