lidavidm edited a comment on pull request #9656: URL: https://github.com/apache/arrow/pull/9656#issuecomment-811445456
I've rebased this to use the background generator; however, it doesn't help much, and it makes us non-reentrant, so we also lose any advantage with compressed data since we can no longer parallelize decompression. The async reader gets anywhere from 30-90% of the throughput of the synchronous one. Cases here are numbered by the number of columns in the file. The cases with very few columns are a worst case for async, since decoding is basically free and async is pure overhead. Conversely, the cases with many columns are a best case, since decoding is expensive. Even there, async doesn't help, because I/O is relatively cheap in all of the cases benchmarked here, so there is no pipelining to be had.

Frankly, the fastest approach I tested was to just wrap the synchronous reader in a Future and block the caller (sketched after the numbers below), which isn't encouraging. A flamegraph shows that using the thread pool for the decoding work is still rather expensive, so it might be better to use something like the background generator for that as well. In that case it would be convenient if we could somehow pull directly from the background generator's queue instead of having to get and block on futures; it also still means we can't get any benefit from parallelizing decompression if needed. For datasets with at least as many files as cores, that's probably not a big deal if you only care about throughput (we'll still decode in parallel), but if you need results in order and/or have few files relative to cores, it won't be optimal.

You may wonder why in-memory (ReadFile) is slower than a temp file (ReadTempFile). In the flamegraphs, the culprit appears to be BufferReader's use of MemoryAdviseWillNeed, which spends a significant amount of time in the kernel; removing it improves performance drastically.

```
----------------------------------------------------------------------------------------------------
Benchmark                                    Time            CPU      Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ReadFile/1/real_time                         7858 ns        7858 ns        85629   bytes_per_second=124.269G/s
ReadFile/4/real_time                        10698 ns       10698 ns        64406   bytes_per_second=91.2852G/s
ReadFile/16/real_time                       21661 ns       21661 ns        32684   bytes_per_second=45.0839G/s
ReadFile/64/real_time                       67470 ns       67470 ns        10406   bytes_per_second=14.4741G/s
ReadFile/256/real_time                     275292 ns      275282 ns         2553   bytes_per_second=3.54738G/s
ReadFile/1024/real_time                   1071125 ns     1071065 ns          652   bytes_per_second=933.598M/s
ReadFile/4096/real_time                   4245107 ns     4245052 ns          165   bytes_per_second=235.565M/s
ReadFile/8192/real_time                   8157924 ns     8157957 ns           85   bytes_per_second=122.58M/s
ReadFileAsync/1/real_time                   23883 ns        7835 ns        29390   bytes_per_second=40.8887G/s
ReadFileAsync/4/real_time                   27242 ns        9040 ns        25836   bytes_per_second=35.8478G/s
ReadFileAsync/16/real_time                  40988 ns       14562 ns        17154   bytes_per_second=23.8253G/s
ReadFileAsync/64/real_time                  93104 ns       33633 ns         7334   bytes_per_second=10.489G/s
ReadFileAsync/256/real_time                303852 ns      116901 ns         2313   bytes_per_second=3.21394G/s
ReadFileAsync/1024/real_time              1430233 ns      531043 ns          546   bytes_per_second=699.187M/s
ReadFileAsync/4096/real_time              4589980 ns     1895584 ns          153   bytes_per_second=217.866M/s
ReadFileAsync/8192/real_time              8793373 ns     3865574 ns           82   bytes_per_second=113.722M/s
ReadTempFile/1/real_time                    70972 ns       70936 ns         9712   bytes_per_second=220.157G/s
ReadTempFile/4/real_time                    74053 ns       74022 ns         9243   bytes_per_second=210.997G/s
ReadTempFile/16/real_time                   85777 ns       85749 ns         8100   bytes_per_second=182.158G/s
ReadTempFile/64/real_time                  132803 ns      132783 ns         5331   bytes_per_second=117.656G/s
ReadTempFile/256/real_time                 333974 ns      333967 ns         2093   bytes_per_second=46.785G/s
ReadTempFile/1024/real_time               1131198 ns     1131179 ns          607   bytes_per_second=13.8128G/s
ReadTempFile/4096/real_time               4330575 ns     4330568 ns          161   bytes_per_second=3.60807G/s
ReadTempFile/8192/real_time               8270275 ns     8270100 ns           85   bytes_per_second=1.8893G/s
ReadTempFileAsync/1/real_time               88569 ns       12731 ns         7814   bytes_per_second=176.417G/s
ReadTempFileAsync/4/real_time               94127 ns       14422 ns         7477   bytes_per_second=165.998G/s
ReadTempFileAsync/16/real_time             104455 ns       20203 ns         6652   bytes_per_second=149.586G/s
ReadTempFileAsync/64/real_time             158604 ns       38862 ns         4443   bytes_per_second=98.516G/s
ReadTempFileAsync/256/real_time            372728 ns      122446 ns         1831   bytes_per_second=41.9207G/s
ReadTempFileAsync/1024/real_time          1347728 ns      485078 ns          520   bytes_per_second=11.5936G/s
ReadTempFileAsync/4096/real_time          4649311 ns     1930484 ns          151   bytes_per_second=3.36071G/s
ReadTempFileAsync/8192/real_time          8773800 ns     3815852 ns           80   bytes_per_second=1.78087G/s
ReadCompressedFile/1/real_time           30636840 ns     1421583 ns           23   bytes_per_second=522.247M/s
ReadCompressedFile/4/real_time            9529811 ns      628655 ns           65   bytes_per_second=1.63959G/s
ReadCompressedFile/16/real_time           5673642 ns     1863531 ns          122   bytes_per_second=2.75396G/s
ReadCompressedFile/64/real_time           8372634 ns     6633169 ns           84   bytes_per_second=1.8662G/s
ReadCompressedFile/256/real_time         22590210 ns    21607133 ns           28   bytes_per_second=708.271M/s
ReadCompressedFile/1024/real_time        84274350 ns    81412117 ns            9   bytes_per_second=189.856M/s
ReadCompressedFile/4096/real_time       330157333 ns   317542733 ns            2   bytes_per_second=48.4617M/s
ReadCompressedFile/8192/real_time       648075491 ns   627804731 ns            1   bytes_per_second=24.6885M/s
ReadCompressedFileAsync/1/real_time      57512529 ns     1849864 ns            9   bytes_per_second=278.2M/s
ReadCompressedFileAsync/4/real_time       9702801 ns      553906 ns           71   bytes_per_second=1.61036G/s
ReadCompressedFileAsync/16/real_time      6001873 ns     1765858 ns          114   bytes_per_second=2.60335G/s
ReadCompressedFileAsync/64/real_time      8414578 ns     6398791 ns           81   bytes_per_second=1.8569G/s
ReadCompressedFileAsync/256/real_time    22844448 ns    20703843 ns           30   bytes_per_second=700.389M/s
ReadCompressedFileAsync/1024/real_time   83260767 ns    75605439 ns            8   bytes_per_second=192.167M/s
ReadCompressedFileAsync/4096/real_time  329809506 ns   298760917 ns            2   bytes_per_second=48.5129M/s
ReadCompressedFileAsync/8192/real_time  643886356 ns   584995701 ns            1   bytes_per_second=24.8491M/s
```
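For reference, the "wrap the synchronous reader in a Future and block the caller" baseline amounts to something like the sketch below. This is a minimal illustration using `std::async` rather than Arrow's own Future/thread-pool machinery, and `ReadNextBatchSync()` is a hypothetical stand-in for the synchronous IPC reader's per-batch read call, not an actual API:

```cpp
// Minimal sketch, assuming a hypothetical ReadNextBatchSync(); this is not the
// Arrow API, just an illustration of deferring the synchronous read to another
// thread and having the caller block on the result.
#include <future>
#include <memory>

struct RecordBatch {};  // stand-in for arrow::RecordBatch

// Hypothetical synchronous read; returns nullptr at end-of-stream.
std::shared_ptr<RecordBatch> ReadNextBatchSync() { return nullptr; }

// "Async" wrapper: hand the synchronous read to another thread and return a
// future the caller can wait on.
std::future<std::shared_ptr<RecordBatch>> ReadNextBatchAsync() {
  return std::async(std::launch::async, ReadNextBatchSync);
}

int main() {
  // The caller blocks on each future immediately, so this degenerates to the
  // synchronous reader plus one thread hop per batch: no pipelining, but also
  // almost none of the async scheduling overhead.
  while (auto batch = ReadNextBatchAsync().get()) {
    // ... process batch ...
  }
  return 0;
}
```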
