LiaCastaneda commented on code in PR #19937:
URL: https://github.com/apache/datafusion/pull/19937#discussion_r2717084931
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -287,17 +293,18 @@ impl DynamicFilterPhysicalExpr {
/// Wait asynchronously until this dynamic filter is marked as complete.
///
- /// This method returns immediately if the filter is already complete or
if the filter
- /// is not being used by any consumers.
+ /// This method returns immediately if the filter is already complete.
/// Otherwise, it waits until [`Self::mark_complete`] is called.
///
/// Unlike [`Self::wait_update`], this method guarantees that when it
returns,
/// the filter is fully complete with no more updates expected.
- pub async fn wait_complete(self: &Arc<Self>) {
- if !self.is_used() {
- return;
- }
-
+ ///
+ /// # Note
+ ///
+ /// This method should only be called on filters that have consumers. If
you don't
+ /// know whether the filter is being used, call [`Self::is_used`] first to
avoid
+ /// waiting indefinitely.
Review Comment:
> Under normal operation it would be a consumer calling wait_complete and
hence it knows that a consumer exists because it is a consumer. In other words,
under normal operation wait_completed is only called by consumers and thus
is_used would always be true.
🤔 I agree -- when I implemented this API, I had in mind that it would be
used mainly in custom probe nodes of a `HashJoinExec`, so yes, ideally
`wait_complete` is always called by consumers (in those cases it's impossible
for `wait_complete` to wait indefinitely).
However, I remember when I added `is_used` in `HashJoinExec::execute(`), I
saw in the tests we were waiting indefinitely (but this was mainly because
before, `is_used` only checked the inner struct, which is not the case
anymore), which is why I then added `is_used` inside `wait_complete`.
I included this note mainly thinking about a scenario where some third-party
node has a reference to the DynamicFilter of the HashJoin but doesn't know if
it has a consumer or not. However, in that case, the third-party node would
hold a reference and `is_used` would return true, then filters would be
computed and `wait_complete` would return successfully.
So yes, if this happens, it would be a programming error. I will remove the
comment.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]