LiaCastaneda commented on code in PR #19937:
URL: https://github.com/apache/datafusion/pull/19937#discussion_r2717084931


##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -287,17 +293,18 @@ impl DynamicFilterPhysicalExpr {
 
     /// Wait asynchronously until this dynamic filter is marked as complete.
     ///
-    /// This method returns immediately if the filter is already complete or 
if the filter
-    /// is not being used by any consumers.
+    /// This method returns immediately if the filter is already complete.
     /// Otherwise, it waits until [`Self::mark_complete`] is called.
     ///
     /// Unlike [`Self::wait_update`], this method guarantees that when it 
returns,
     /// the filter is fully complete with no more updates expected.
-    pub async fn wait_complete(self: &Arc<Self>) {
-        if !self.is_used() {
-            return;
-        }
-
+    ///
+    /// # Note
+    ///
+    /// This method should only be called on filters that have consumers. If 
you don't
+    /// know whether the filter is being used, call [`Self::is_used`] first to 
avoid
+    /// waiting indefinitely.

Review Comment:
   > Under normal operation it would be a consumer calling wait_complete and 
hence it knows that a consumer exists because it is a consumer. In other words, 
under normal operation wait_completed is only called by consumers and thus 
is_used would always be true.
   
   🤔 I agree -- when I implemented this API, I had in mind that it would be 
used mainly in custom probe nodes of a `HashJoinExec`, so yes, ideally 
`wait_complete` is always called by consumers (in those cases it's impossible 
for `wait_complete` to wait indefinitely).
   
   However, I remember when I added `is_used` in `HashJoinExec::execute(`), I 
saw in the tests we were waiting indefinitely (but this was mainly because 
before, `is_used` only checked the inner struct, which is not the case 
anymore), which is why I then added `is_used` inside `wait_complete`.
   
   I included this note mainly thinking about a scenario where some third-party 
node has a reference to the DynamicFilter of the HashJoin but doesn't know if 
it has a consumer or not. However, in that case, the third-party node would 
hold a reference and `is_used` would return true, then filters would be 
computed and `wait_complete` would return successfully.
   
   So yes, if this happens, it would be a programming error. I will remove the 
comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to