milenkovicm opened a new issue, #15087:
URL: https://github.com/apache/datafusion/issues/15087

   ### Describe the bug
   
   There might be a regression in v46
   
   After updating to 46.0.0, calling `PhysicalPlanNode::try_from_physical_plan` on a relatively simple plan overflows the stack and panics with `thread 'testing::should_not_overflow_stack' has overflowed its stack`
   
   This is a strange one to reproduce
   
   ### To Reproduce
   
   Create a new datafusion project (for some reason I can't reproduce it on datafusion `main` nor on the `46.0.0` tag)
   
   ```toml
   [package]
   name = "bug_reproducer"
   version = "0.1.0"
   edition = "2024"
   
   [dependencies]
   tokio = { version = "1", features = ["full"] }
   datafusion = { version = "46" }
   datafusion-proto = { version = "46" }
   ```
   
   ```rust
   #[cfg(test)]
   mod testing {
       use datafusion::prelude::*;
       use datafusion_proto::physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec};
       use datafusion_proto::protobuf::PhysicalPlanNode;

       #[tokio::test]
       async fn should_not_overflow_stack() {
           let ctx = SessionContext::new();

           // path to a directory containing alltypes_plain.parquet
           // (datafusion's parquet-testing data)
           let test_data = crate::common::example_test_data();

           ctx.register_parquet(
               "pt",
               &format!("{test_data}/alltypes_plain.parquet"),
               Default::default(),
           )
           .await
           .unwrap();

           let plan = ctx
               .sql("select id, string_col, timestamp_col from pt where id > 4 order by string_col")
               .await
               .unwrap()
               .create_physical_plan()
               .await
               .unwrap();

           // this call panics:
           //
           // thread 'testing::should_not_overflow_stack' has overflowed its stack
           // fatal runtime error: stack overflow
           let node: PhysicalPlanNode =
               PhysicalPlanNode::try_from_physical_plan(plan, &DefaultPhysicalExtensionCodec {})
                   .unwrap();

           let plan = node
               .try_into_physical_plan(&ctx, &ctx.runtime_env(), &DefaultPhysicalExtensionCodec {})
               .unwrap();

           let _ = plan.execute(0, ctx.task_ctx()).unwrap();
       }
   }
   ```
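
   Note: `crate::common::example_test_data()` comes from datafusion's test helpers, so a standalone reproducer needs its own stand-in pointing at a directory that contains `alltypes_plain.parquet`. A minimal hypothetical version, where the env var name and fallback path are assumptions rather than datafusion API:

   ```rust
   // Hypothetical stand-in for datafusion's test helper: resolve a directory
   // containing alltypes_plain.parquet, e.g. a local checkout of
   // https://github.com/apache/parquet-testing
   pub fn example_test_data() -> String {
       std::env::var("PARQUET_TEST_DATA")
           .unwrap_or_else(|_| "./parquet-testing/data".to_string())
   }
   ```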
   run 
   
   ```bash
   cargo test
   ```
   fails with 
   
   ```
   running 1 test
   
   thread 'testing::should_not_overflow_stack' has overflowed its stack
   fatal runtime error: stack overflow
   ```
   
   works ok with
   
   ```bash
   export RUST_MIN_STACK=20971520
   cargo test
   ```
   
   and also with

   ```bash
   cargo test --release
   ```
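
   Both workarounds point at per-frame stack usage: the test harness runs each test on a spawned thread (default stack 2 MiB unless `RUST_MIN_STACK` overrides it), and optimized builds produce much smaller frames. An equivalent in-process workaround is to run the serialization on a thread with an explicitly larger stack; a minimal sketch, where the 20 MiB figure mirrors the `RUST_MIN_STACK` value above:

   ```rust
   use std::thread;

   fn main() {
       // Give just this worker thread a 20 MiB stack instead of setting
       // RUST_MIN_STACK, which applies to every spawned thread.
       let handle = thread::Builder::new()
           .stack_size(20 * 1024 * 1024)
           .spawn(|| {
               // ... call PhysicalPlanNode::try_from_physical_plan here ...
           })
           .expect("failed to spawn worker thread");

       handle.join().expect("worker thread panicked");
   }
   ```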
   
   
   Looking at the plan and doing some quick debugging:
   ```
   SortPreservingMergeExec: [string_col@1 ASC NULLS LAST]
     SortExec: expr=[string_col@1 ASC NULLS LAST], preserve_partitioning=[true]
       CoalesceBatchesExec: target_batch_size=8192
         FilterExec: id@0 > 4
           RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1
             DataSourceExec: file_groups={1 group: [[Users/marko/git/arrow-datafusion-fork/parquet-testing/data/alltypes_plain.parquet]]}, projection=[id, string_col, timestamp_col], file_type=parquet, predicate=id@0 > 4, pruning_predicate=id_null_count@1 != row_count@2 AND id_max@0 > 4, required_guarantees=[]
   ```
   
   The last valid frame before it panics is in the `RepartitionExec` branch of the plan serializer:
   
   ```rust
   if let Some(exec) = plan.downcast_ref::<RepartitionExec>() {
       let input = protobuf::PhysicalPlanNode::try_from_physical_plan(
           exec.input().to_owned(),
           extension_codec,
       )?;

       let pb_partitioning =
           serialize_partitioning(exec.partitioning(), extension_codec)?;
   ```
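
   The plan is only six operators deep, so for this recursion to exhaust a multi-MiB stack each `try_from_physical_plan` frame must be very large. A quick, self-contained way to gauge per-level stack consumption while debugging (hypothetical instrumentation, not datafusion code; run without optimizations so frames aren't collapsed):

   ```rust
   // Estimate how much stack each recursion level consumes by diffing the
   // address of a local variable between consecutive levels.
   fn probe(depth: usize, prev_marker: usize) {
       let marker = 0u8;
       let here = &marker as *const u8 as usize;
       if prev_marker != 0 {
           println!("level {depth}: ~{} bytes per frame", prev_marker.abs_diff(here));
       }
       if depth < 6 {
           // mirror the six operators in the plan above
           probe(depth + 1, here);
       }
   }

   fn main() {
       probe(0, 0);
   }
   ```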
   
   
   
   ### Expected behavior
   
   Expected the round trip to succeed without having to increase the stack size for such a simple plan.
   
   
   ### Additional context
   
   Reproduced on a MacBook Pro (M4) with Rust 1.85.
   I'll note again that I haven't been able to reproduce it directly on `main` nor on the `46.0.0` tag (which puzzles me even more)

