mpurins-coralogix opened a new issue, #21893:
URL: https://github.com/apache/datafusion/issues/21893

   ### Describe the bug
   
   `FilterExecStream::poll_next` re-polls its input via `continue` without 
yielding when the predicate rejects every row in a batch. With many such 
batches in a row, the task cannot be cancelled and pins a Tokio worker thread. 
The problem is worst with expensive predicates or non-cooperative inputs. 
`EnsureCooperative` does not help because it skips lazy non-leaf operators.
   
   ### To Reproduce
   
   I don't have an exact reproducer, but we are hitting this with queries 
like
   ```
   SELECT * FROM unbounded_source
   WHERE expensive_udf(col) = 'never_matches';
   ```
   
   ### Expected behavior
   
   `FilterExec` yields cooperatively so that cancellation takes effect within 
bounded time regardless of predicate cost or whether anything upstream is 
cooperative.
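   
   To make the expected behavior concrete, here is a minimal, std-only sketch 
of a filter that yields after a bounded number of fully rejected batches. It is 
not DataFusion's code: `AlwaysReady`, `YieldingFilter`, `budget`, and 
`poll_once_rejecting_everything` are hypothetical names, a `Vec<i64>` stands in 
for a record batch, and a fixed counter stands in for whatever cooperative 
budget a real fix would use.
   
   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::Arc;
   use std::task::{Context, Poll, Wake, Waker};

   /// Toy stand-in for a record batch.
   type Batch = Vec<i64>;

   /// Hypothetical non-cooperative input: always ready, never ends.
   struct AlwaysReady {
       produced: usize,
   }

   impl AlwaysReady {
       fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<Batch>> {
           self.produced += 1;
           Poll::Ready(Some(vec![1, 2, 3]))
       }
   }

   /// Filter that gives control back to the executor after `budget`
   /// consecutive batches in which the predicate rejected every row.
   struct YieldingFilter<P> {
       input: AlwaysReady,
       predicate: P,
       budget: usize,
   }

   impl<P: Fn(&i64) -> bool> YieldingFilter<P> {
       fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Batch>> {
           let mut rejected = 0;
           loop {
               match self.input.poll_next(cx) {
                   Poll::Ready(Some(batch)) => {
                       let kept: Batch = batch
                           .into_iter()
                           .filter(|v| (self.predicate)(v))
                           .collect();
                       if !kept.is_empty() {
                           return Poll::Ready(Some(kept));
                       }
                       rejected += 1;
                       if rejected >= self.budget {
                           // Ask to be polled again, then yield so the
                           // executor can observe cancellation or run
                           // other tasks.
                           cx.waker().wake_by_ref();
                           return Poll::Pending;
                       }
                       // Below budget: fall through and poll the input again.
                   }
                   Poll::Ready(None) => return Poll::Ready(None),
                   Poll::Pending => return Poll::Pending,
               }
           }
       }
   }

   /// Waker that only counts how often it was woken.
   struct CountingWaker(AtomicUsize);

   impl Wake for CountingWaker {
       fn wake(self: Arc<Self>) {
           self.0.fetch_add(1, Ordering::SeqCst);
       }
   }

   /// Polls the filter once with a predicate that never matches and returns
   /// (returned Pending?, batches consumed, number of wake-ups requested).
   fn poll_once_rejecting_everything(budget: usize) -> (bool, usize, usize) {
       let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
       let waker = Waker::from(counter.clone());
       let mut cx = Context::from_waker(&waker);
       let mut filter = YieldingFilter {
           input: AlwaysReady { produced: 0 },
           predicate: |_v: &i64| false,
           budget,
       };
       let pending = filter.poll_next(&mut cx).is_pending();
       (pending, filter.input.produced, counter.0.load(Ordering::SeqCst))
   }
   ```
   
   Without the budget check, a single `poll_next` call on this input would 
never return. With it, each call consumes at most `budget` batches before 
returning `Poll::Pending` and requesting a re-poll.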
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

