duongcongtoai commented on issue #17446:
URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3366427064

   ```
   Benchmark 1: uv run polar.py sample-1m.parquet
     Time (mean ± σ):     258.6 ms ±  32.2 ms    [User: 514.8 ms, System: 218.1 ms]
     Range (min … max):   238.5 ms … 348.3 ms    10 runs
    
     Warning: The first benchmarking run for this command was significantly slower than the rest (348.3 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
    
   Benchmark 2: uv run df.py sample-1m.parquet
     Time (mean ± σ):     345.3 ms ±   8.4 ms    [User: 2194.9 ms, System: 241.7 ms]
     Range (min … max):   331.7 ms … 360.7 ms    10 runs
    
   Summary
     uv run polar.py sample-1m.parquet ran
       1.34 ± 0.17 times faster than uv run df.py sample-1m.parquet
   
   ```
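For reference, the `1.34 ± 0.17` in the summary is the ratio of the two mean times. Assuming hyperfine combines the two relative standard deviations with ordinary first-order error propagation (an assumption on my part, though it reproduces the reported figure), the numbers work out as:

$$
r = \frac{\bar t_{\text{df}}}{\bar t_{\text{polars}}} = \frac{345.3}{258.6} \approx 1.335,
\qquad
\sigma_r = r\sqrt{\left(\frac{32.2}{258.6}\right)^2 + \left(\frac{8.4}{345.3}\right)^2} \approx 1.335 \times 0.127 \approx 0.17
$$

Most of $\sigma_r$ comes from the polars run's 32.2 ms spread, which hyperfine's warning ties to a slow first (cold-cache) run, so rerunning with `--warmup` would likely narrow the interval.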
   There was a significant improvement after using `interleave`, but there are still some overflow errors when writing to Parquet. I'm fixing those and will push the change for review soon.
   ```
   thread 'tokio-runtime-worker' (2611103) panicked at /home/toai/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-56.1.0/src/transform/mod.rs:676:31:
   MutableArrayData::new is infallible: DictionaryKeyOverflowError
   stack backtrace:
      0: __rustc::rust_begin_unwind
      1: core::panicking::panic_fmt
      2: core::result::unwrap_failed
      3: arrow_data::transform::MutableArrayData::with_capacities
      4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
      5: arrow_data::transform::MutableArrayData::with_capacities
      6: arrow_data::transform::MutableArrayData::with_capacities
      7: arrow_select::interleave::interleave_fallback
      8: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
      9: datafusion_physical_plan::sorts::builder::BatchBuilder::build_record_batch
     10: <datafusion_physical_plan::sorts::merge::SortPreservingMergeStream<C> as futures_core::stream::Stream>::poll_next
     11: datafusion_common_runtime::trace_utils::trace_future::{{closure}}
     12: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
   ```
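The panic is raised inside `MutableArrayData`, which `interleave_fallback` uses to gather rows from the batches buffered by `SortPreservingMergeStream`. One plausible reading (an assumption, not confirmed by the trace alone) is that this fallback path appends every input's dictionary values without deduplicating them and shifts each input's keys past the preceding dictionaries. Under that reading, merging many batches can push the largest key beyond what the dictionary key type can represent. The pure-std sketch below models that behaviour. `DictColumn`, `DictionaryKeyOverflow`, `interleave_dict_fallback` and `key_max` are hypothetical names, not arrow-rs APIs, and the exact boundary check in arrow-data may differ. The `(input_index, row_index)` pairs mirror the index shape that arrow-select's `interleave` takes.

```rust
/// Hypothetical, simplified dictionary-encoded column (not arrow-rs's
/// `DictionaryArray`): row `i` holds `values[keys[i]]`.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    keys: Vec<usize>,
    values: Vec<String>,
}

/// Hypothetical stand-in for arrow's `DictionaryKeyOverflowError`.
#[derive(Debug, PartialEq)]
struct DictionaryKeyOverflow;

/// Models a concatenating fallback for interleaving dictionary columns:
/// every input dictionary is appended unchanged (no deduplication), each
/// input's keys are shifted by the number of values preceding its
/// dictionary, and rows are gathered by `(input_index, row_index)` pairs.
/// `key_max` is the largest key the key type can hold (e.g. 127 for Int8).
fn interleave_dict_fallback(
    inputs: &[DictColumn],
    indices: &[(usize, usize)],
    key_max: usize,
) -> Result<DictColumn, DictionaryKeyOverflow> {
    let mut values = Vec::new();
    let mut offsets = Vec::with_capacity(inputs.len());
    for input in inputs {
        // This input's keys will start after all previously appended values.
        offsets.push(values.len());
        values.extend(input.values.iter().cloned());
    }

    // Shifted keys range over 0..values.len(), so the largest possible key is
    // values.len() - 1. The combined dictionary size, not the number of
    // selected rows, is what has to fit in the key type.
    if values.len().saturating_sub(1) > key_max {
        return Err(DictionaryKeyOverflow);
    }

    // Gather the selected rows, translating each key into the merged dictionary.
    let keys = indices
        .iter()
        .map(|&(src, row)| offsets[src] + inputs[src].keys[row])
        .collect();
    Ok(DictColumn { keys, values })
}
```

For example, gathering rows `(0, 0), (1, 0), (0, 1)` from two columns whose dictionaries are `["x", "y"]` and `["y", "z"]` yields keys `[0, 3, 1]` over `["x", "y", "y", "z"]`: `"y"` is stored twice, and the dictionary grows with every input regardless of how few rows are taken. One way to bound that growth is to deduplicate values across the input dictionaries and remap keys before gathering.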


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
