yjshen commented on PR #2146:
URL: https://github.com/apache/arrow-datafusion/pull/2146#issuecomment-1087134690

   # TPC-H SF=1
   
   `master`:
   
   ```
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "../../tpch-parquet/", file_format: "parquet", mem_table: false, output_path: None }
   Query 1 iteration 0 took 2851.7 ms and returned 6001214 rows
   Query 1 iteration 1 took 2817.7 ms and returned 6001214 rows
   Query 1 iteration 2 took 2735.9 ms and returned 6001214 rows
   Query 1 avg time: 2801.75 ms
   ```
   
   This PR:
   
   ```
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "/home/yijie/sort_test/tpch-parquet", file_format: "parquet", mem_table: false, output_path: None }
   Query 1 iteration 0 took 3174.9 ms and returned 6001214 rows
   Query 1 iteration 1 took 3130.8 ms and returned 6001214 rows
   Query 1 iteration 2 took 3058.3 ms and returned 6001214 rows
   Query 1 avg time: 3121.35 ms
   ```
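
As a quick check on the two averages above, the relative slowdown can be computed directly. This is only a sketch: the helper name `relative_slowdown` is made up for illustration and is not part of the benchmark tool.

```python
# Average Query 1 times (ms), copied from the two benchmark runs above.
master_avg_ms = 2801.75
pr_avg_ms = 3121.35


def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Fractional slowdown of the candidate relative to the baseline."""
    return (candidate_ms - baseline_ms) / baseline_ms


slowdown = relative_slowdown(master_avg_ms, pr_avg_ms)
print(f"slowdown: {slowdown:.1%}")  # prints "slowdown: 11.4%"
```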
   
   The row format comes at the price of more computation: Query 1 is ~11% slower on average (3121.35 ms vs. 2801.75 ms on `master`). Although this PR shows better cache locality, the extra computation cost outweighs the cache benefits:
   
   ```
   sudo perf stat -a \
     -e cache-misses,cache-references,l3_cache_accesses,l3_misses,dTLB-load-misses,dTLB-loads \
     target/release/tpch benchmark datafusion --iterations 3 \
     --path /home/yijie/sort_test/tpch-parquet --format parquet --query 1 --batch-size 4096
   ```
   
   `master`:
   
   ```
    Performance counter stats for 'system wide':
   
          756,702,553      cache-misses              #   34.256 % of all cache refs
        2,208,936,269      cache-references
        1,156,898,644      l3_cache_accesses
          362,860,081      l3_misses
          215,166,268      dTLB-load-misses          #   45.27% of all dTLB cache accesses
          475,312,480      dTLB-loads
   
          8.774750150 seconds time elapsed
   ```
   
   This PR:
   
   ```
    Performance counter stats for 'system wide':
   
          593,785,538      cache-misses              #   25.841 % of all cache refs
        2,297,807,480      cache-references
          835,622,737      l3_cache_accesses
          227,838,803      l3_misses
          146,442,785      dTLB-load-misses          #   55.76% of all dTLB cache accesses
          262,616,456      dTLB-loads
   
         10.556249442 seconds time elapsed
   ```
   Cache access behavior is much better with the row format: the cache-miss rate drops from 34.3% to 25.8%, and L3 misses fall from about 363M to about 228M.
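
The miss rates can be recomputed from the raw counters in the two `perf stat` runs above. The sketch below does only that arithmetic; the dictionary layout and the `miss_rates` helper are illustrative assumptions, not part of `perf` or DataFusion. The counters were collected system-wide (`-a`), so they may include activity from other processes.

```python
# Counter values copied from the two `perf stat` outputs above.
counters = {
    "master": {
        "cache-misses": 756_702_553,
        "cache-references": 2_208_936_269,
        "l3_misses": 362_860_081,
        "l3_cache_accesses": 1_156_898_644,
        "dTLB-load-misses": 215_166_268,
        "dTLB-loads": 475_312_480,
    },
    "pr": {
        "cache-misses": 593_785_538,
        "cache-references": 2_297_807_480,
        "l3_misses": 227_838_803,
        "l3_cache_accesses": 835_622_737,
        "dTLB-load-misses": 146_442_785,
        "dTLB-loads": 262_616_456,
    },
}

# (miss counter, access counter) pairs used to form each rate.
PAIRS = [
    ("cache-misses", "cache-references"),
    ("l3_misses", "l3_cache_accesses"),
    ("dTLB-load-misses", "dTLB-loads"),
]


def miss_rates(run: dict) -> dict:
    """Return miss / access for each counter pair in one run."""
    return {miss: run[miss] / run[access] for miss, access in PAIRS}


for name, run in counters.items():
    for metric, rate in miss_rates(run).items():
        print(f"{name:6} {metric:18} {run[metric]:>13,} misses  {rate:.2%}")
```

Note that the dTLB load-miss *rate* is higher with this PR (55.76% vs. 45.27%) even though the absolute number of dTLB load misses is lower, because the number of dTLB loads drops by a larger factor.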

