Re: [D] Why the unbounded channel is used inside the RepartitionExec? [datafusion]

via GitHub Mon, 22 Sep 2025 20:54:07 -0700


GitHub user YjyJeff closed a discussion: Why the unbounded channel is used 
inside the RepartitionExec?


According to the document in the `execute` method in `RepartitionExec`: 
>  Note that this operator uses unbounded channels to avoid deadlocks because 
> the output partitions can be read in any order and this could cause input 
> partitions to be blocked when sending data to output UnboundedReceivers that 
> are not being read yet. 

Can you explain it in detail? Why it will cause deadlocks? As far as I know, 
when we send the `RecordBatch` to channels that are not being read yet will 
cause the `Task` to sleep(scheduled off from the cpu by the tokio Scheduler and 
can be waked up by the scheduler later when the channel has space) rather than 
block the thread, which will never cause deadlock. A naive example:

```rust
// Single thread executor
#[tokio::main]
async fn main() {
    let (tx, mut rx) = tokio::sync::mpsc::channel(1);

    tokio::spawn(async move {
        for i in 0..10 {
            tx.send(i).await.unwrap();
            println!("Send {i}")
        }
    });

    // Sleep for a while, rx is not being readed now, which will cause the 
tx.send to sleep
    tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;
    while let Some(value) = rx.recv().await {
        println!("{value}")
    }
}

```

Thanks in advance!

GitHub link: https://github.com/apache/datafusion/discussions/4052

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [D] Why the unbounded channel is used inside the RepartitionExec? [datafusion]

Reply via email to