lidavidm commented on pull request #9656:
URL: https://github.com/apache/arrow/pull/9656#issuecomment-812079330


   Antoine's suggestion helps the microbenchmarks and the datasets benchmark. 
Now the throughput loss is more like 50% in the worst case rather than 70%. 
Note that ReadTempFileAsync here is using a separate thread for I/O (it's not 
overriding ReadAsync).
   
   For datasets, the median read time for 16 files is now 0.39 seconds, versus 
0.28 seconds for the baseline (Arrow 3.0). (I ran the test locally for 
convenience, so it's not directly comparable to the figures above. I'll re-test 
on EC2 with S3 as well.)
   
   <details>
   <summary>Microbenchmarks</summary>
   
   ```
   
   -------------------------------------------------------------------------------------------------
   Benchmark                                       Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------------
   ReadFile/1/real_time                         8101 ns         8101 ns        85085 bytes_per_second=120.545G/s
   ReadFile/4/real_time                        10056 ns        10056 ns        64400 bytes_per_second=97.1103G/s
   ReadFile/16/real_time                       18447 ns        18446 ns        37541 bytes_per_second=52.9399G/s
   ReadFile/64/real_time                       56345 ns        56344 ns        12021 bytes_per_second=17.3319G/s
   ReadFile/256/real_time                     275449 ns       275443 ns         2530 bytes_per_second=3.54535G/s
   ReadFile/1024/real_time                   1075215 ns      1075215 ns          650 bytes_per_second=930.046M/s
   ReadFile/4096/real_time                   4280521 ns      4280466 ns          164 bytes_per_second=233.616M/s
   ReadFile/8192/real_time                   8277543 ns      8277531 ns           83 bytes_per_second=120.809M/s
   ReadFileAsync/1/real_time                   14815 ns        14814 ns        46994 bytes_per_second=65.9191G/s
   ReadFileAsync/4/real_time                   15624 ns        15624 ns        44248 bytes_per_second=62.5038G/s
   ReadFileAsync/16/real_time                  27335 ns        27335 ns        25732 bytes_per_second=35.7253G/s
   ReadFileAsync/64/real_time                  73688 ns        73687 ns         9198 bytes_per_second=13.2528G/s
   ReadFileAsync/256/real_time                291492 ns       291469 ns         2449 bytes_per_second=3.35022G/s
   ReadFileAsync/1024/real_time              1102951 ns      1102895 ns          619 bytes_per_second=906.658M/s
   ReadFileAsync/4096/real_time              4314662 ns      4314682 ns          163 bytes_per_second=231.768M/s
   ReadFileAsync/8192/real_time              8694463 ns      8694232 ns           83 bytes_per_second=115.016M/s
   ReadTempFile/1/real_time                    71343 ns        71307 ns         9745 bytes_per_second=219.011G/s
   ReadTempFile/4/real_time                    74333 ns        74289 ns         9359 bytes_per_second=210.204G/s
   ReadTempFile/16/real_time                   86159 ns        86120 ns         8013 bytes_per_second=181.351G/s
   ReadTempFile/64/real_time                  134295 ns       134260 ns         5204 bytes_per_second=116.348G/s
   ReadTempFile/256/real_time                 348257 ns       348257 ns         2033 bytes_per_second=44.8663G/s
   ReadTempFile/1024/real_time               1182705 ns      1182686 ns          578 bytes_per_second=13.2112G/s
   ReadTempFile/4096/real_time               4513926 ns      4513829 ns          152 bytes_per_second=3.46151G/s
   ReadTempFile/8192/real_time               9022289 ns      9022091 ns           82 bytes_per_second=1.73182G/s
   ReadTempFileAsync/1/real_time               92365 ns        14635 ns         7266 bytes_per_second=169.166G/s
   ReadTempFileAsync/4/real_time               95914 ns        16114 ns         7146 bytes_per_second=162.906G/s
   ReadTempFileAsync/16/real_time             116399 ns        23281 ns         5929 bytes_per_second=134.237G/s
   ReadTempFileAsync/64/real_time             171039 ns        43051 ns         3991 bytes_per_second=91.3533G/s
   ReadTempFileAsync/256/real_time            432781 ns       142806 ns         1704 bytes_per_second=36.1037G/s
   ReadTempFileAsync/1024/real_time          1371883 ns       498746 ns          500 bytes_per_second=11.3895G/s
   ReadTempFileAsync/4096/real_time          4835772 ns      2005729 ns          146 bytes_per_second=3.23113G/s
   ReadTempFileAsync/8192/real_time          8852002 ns      3848468 ns           76 bytes_per_second=1.76514G/s
   ReadMmapFile/1/real_time                    20815 ns        20769 ns        31614 bytes_per_second=750.644G/s
   ReadMmapFile/4/real_time                    23754 ns        23691 ns        29596 bytes_per_second=657.78G/s
   ReadMmapFile/16/real_time                   34881 ns        34836 ns        19372 bytes_per_second=447.949G/s
   ReadMmapFile/64/real_time                   89626 ns        89578 ns         8409 bytes_per_second=174.335G/s
   ReadMmapFile/256/real_time                 299948 ns       299949 ns         2438 bytes_per_second=52.0924G/s
   ReadMmapFile/1024/real_time               1154208 ns      1154170 ns          633 bytes_per_second=13.5374G/s
   ReadMmapFile/4096/real_time               4645056 ns      4645020 ns          144 bytes_per_second=3.36379G/s
   ReadMmapFile/8192/real_time               9252529 ns      9252098 ns           84 bytes_per_second=1.68873G/s
   ReadMmapFileAsync/1/real_time               30271 ns        30206 ns        23769 bytes_per_second=516.174G/s
   ReadMmapFileAsync/4/real_time               33118 ns        33064 ns        20024 bytes_per_second=471.792G/s
   ReadMmapFileAsync/16/real_time              46112 ns        46051 ns        15327 bytes_per_second=338.847G/s
   ReadMmapFileAsync/64/real_time              94376 ns        94332 ns         7292 bytes_per_second=165.561G/s
   ReadMmapFileAsync/256/real_time            303248 ns       303204 ns         2287 bytes_per_second=51.5255G/s
   ReadMmapFileAsync/1024/real_time          1155417 ns      1155200 ns          616 bytes_per_second=13.5233G/s
   ReadMmapFileAsync/4096/real_time          4420082 ns      4419477 ns          158 bytes_per_second=3.535G/s
   ReadMmapFileAsync/8192/real_time          9202209 ns      9200604 ns           83 bytes_per_second=1.69796G/s
   ReadCompressedFile/1/real_time           30842424 ns      1062249 ns           22 bytes_per_second=518.766M/s
   ReadCompressedFile/4/real_time            9152438 ns       670243 ns           73 bytes_per_second=1.7072G/s
   ReadCompressedFile/16/real_time           5901434 ns      1896983 ns          100 bytes_per_second=2.64766G/s
   ReadCompressedFile/64/real_time           8682105 ns      6752406 ns           84 bytes_per_second=1.79968G/s
   ReadCompressedFile/256/real_time         22773192 ns     21679357 ns           31 bytes_per_second=702.58M/s
   ReadCompressedFile/1024/real_time        89945075 ns     84703673 ns            8 bytes_per_second=177.886M/s
   ReadCompressedFile/4096/real_time       369957748 ns    348617092 ns            2 bytes_per_second=43.2482M/s
   ReadCompressedFile/8192/real_time       738977444 ns    706855214 ns            1 bytes_per_second=21.6515M/s
   ReadCompressedFileAsync/1/real_time      33837552 ns      1357830 ns           22 bytes_per_second=472.847M/s
   ReadCompressedFileAsync/4/real_time      10317374 ns       786148 ns           63 bytes_per_second=1.51444G/s
   ReadCompressedFileAsync/16/real_time      6362766 ns      2010616 ns          109 bytes_per_second=2.45569G/s
   ReadCompressedFileAsync/64/real_time      9536251 ns      6972457 ns           71 bytes_per_second=1.63848G/s
   ReadCompressedFileAsync/256/real_time    23441839 ns     22122635 ns           30 bytes_per_second=682.54M/s
   ReadCompressedFileAsync/1024/real_time   92509529 ns     86519080 ns            8 bytes_per_second=172.955M/s
   ReadCompressedFileAsync/4096/real_time  371194475 ns    345331236 ns            2 bytes_per_second=43.1041M/s
   ReadCompressedFileAsync/8192/real_time  660117227 ns    639096322 ns            1 bytes_per_second=24.2381M/s
   ```
   
   </details>
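
   For anyone who wants to check the relative numbers, here is a minimal Python 
sketch of the arithmetic. It is not part of the PR. The throughput pairs are 
copied from the size-1 rows of the table above, and the choice of those rows as 
the sync/async comparison is an assumption; `throughput_loss` is a hypothetical 
helper name.

```python
# bytes_per_second values (as printed by the benchmark) for the
# synchronous and asynchronous variants, taken from the size-1 rows.
SYNC_VS_ASYNC = {
    "ReadFile/1": (120.545, 65.9191),
    "ReadTempFile/1": (219.011, 169.166),
    "ReadMmapFile/1": (750.644, 516.174),
}


def throughput_loss(sync: float, async_: float) -> float:
    """Fraction of the synchronous throughput lost by the async variant."""
    return 1.0 - async_ / sync


losses = {
    name: throughput_loss(sync, async_)
    for name, (sync, async_) in SYNC_VS_ASYNC.items()
}
for name, loss in losses.items():
    print(f"{name}: {loss:.1%} lower throughput with the async variant")

# Datasets benchmark: median read time for 16 files, in seconds.
baseline_s, current_s = 0.28, 0.39
dataset_slowdown = current_s / baseline_s - 1.0
print(f"datasets: {dataset_slowdown:.1%} slower than the Arrow 3.0 baseline")
```

   Among these three rows the largest loss is for ReadFile/1, which is in line 
with the "more like 50%" worst case mentioned above.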
   

