dhegberg commented on PR #13228:
URL: https://github.com/apache/datafusion/pull/13228#issuecomment-2547418265

   Updated to move the null-parsing regex into a config option.
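The following is a minimal, hypothetical sketch of what a configurable null regex does during parsing. It is written in Python for brevity. It is not DataFusion's Rust API, and `make_null_checker` and `parse_row` are invented names. The sketch assumes that when no pattern is configured, only empty fields are null. It also assumes that a configured pattern must match the whole field. Testing every field against a regex instead of running a cheap emptiness check is one plausible contributor to the slowdown measured below, although the benchmark does not isolate the cause.

```python
import re
from typing import Callable, List, Optional


def make_null_checker(null_regex: Optional[str] = None) -> Callable[[str], bool]:
    """Return a predicate deciding whether a raw CSV field is null.

    Hypothetical helper for illustration only; not DataFusion's API.
    """
    if null_regex is None:
        # No pattern configured (assumed default): only empty fields are null.
        return lambda field: field == ""
    # Compile once and reuse the compiled pattern for every field.
    compiled = re.compile(null_regex)
    # Pattern configured: the whole field must match (a choice made for this
    # sketch). This costs one regex evaluation per field parsed.
    return lambda field: compiled.fullmatch(field) is not None


def parse_row(line: str, is_null: Callable[[str], bool]) -> List[Optional[str]]:
    """Split a simple, unquoted CSV line and map null fields to None."""
    return [None if is_null(field) else field for field in line.split(",")]


default_check = make_null_checker()
# The trailing empty alternative keeps empty fields null as well.
regex_check = make_null_checker(r"NULL|N/A|")

print(parse_row("1,,NULL,abc", default_check))  # ['1', None, 'NULL', 'abc']
print(parse_row("1,,NULL,N/A", regex_check))    # ['1', None, None, None]
```

In this sketch the pattern is compiled once, when the checker is built, so the remaining per-field cost is the match itself.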
   
   The benchmark comparison does show a small regression (roughly 2% to 5%) when the regex is used:
   
   Before:
   ```
        Running benches/csv_load.rs (/Users/dhegberg/workplace/datafusion/target/release/deps/csv_load-0ca64cec5e99a8c3)
   Gnuplot not found, using plotters backend
   Generated test dataset with 69642 rows
   Benchmarking load csv testing/default csv read options
   Benchmarking load csv testing/default csv read options: Warming up for 3.0000 s
   Benchmarking load csv testing/default csv read options: Collecting 100 samples in estimated 20.457 s (1200 iterations)
   Benchmarking load csv testing/default csv read options: Analyzing
   load csv testing/default csv read options
                           time:   [20.305 ms 20.536 ms 20.763 ms]
   mean   [20.305 ms 20.763 ms] std. dev.      [1.0513 ms 1.2800 ms]
   median [20.127 ms 21.042 ms] med. abs. dev. [1.0398 ms 1.6551 ms]
   ```
   
   After:
   ```
   Gnuplot not found, using plotters backend
   Generated test dataset with 69642 rows
   Benchmarking load csv testing/default csv read options
   Benchmarking load csv testing/default csv read options: Warming up for 3.0000 s
   Benchmarking load csv testing/default csv read options: Collecting 100 samples in estimated 21.606 s (1200 iterations)
   Benchmarking load csv testing/default csv read options: Analyzing
   load csv testing/default csv read options
                           time:   [21.583 ms 21.856 ms 22.166 ms]
                           change: [+1.9609% +3.6130% +5.3682%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     1 (1.00%) high severe
   mean   [21.583 ms 22.166 ms] std. dev.      [988.76 µs 2.0538 ms]
   median [21.438 ms 21.965 ms] med. abs. dev. [776.50 µs 1.3261 ms]
   ```


-- 
This is an automated message from the Apache Git Service.