bkietz commented on pull request #8188:
URL: https://github.com/apache/arrow/pull/8188#issuecomment-693622612
For a range of file and column counts, the time to read is as follows:
```
nfiles  ncolumns  legacy_time  default_time  regression
     1         1     0.490398      0.401345   -0.181592
     1         2     0.642569      0.523074   -0.185964
     1         4     0.988469      0.945871   -0.043095
     1         8     1.541519      1.602061    0.039274
     2         1     1.078602      0.622690   -0.422688
     2         2     1.275463      0.922737   -0.276548
     2         4     1.601820      2.001778    0.249689
     2         8     2.847058      4.283226    0.504439
     4         1     2.116808      0.760073   -0.640935
     4         2     2.458016      1.472731   -0.400846
     4         4     3.975070      2.648561   -0.333707
     4         8     6.531598      6.030903   -0.076657
```
(Times are in seconds. Regression is computed as
(default_time - legacy_time) / legacy_time, so a negative value means the
default is faster than legacy.)
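As a quick sanity check of the regression column, here is a minimal sketch that recomputes it for three rows of the table. The helper name `regression` and the choice of rows are only for illustration; this is not the benchmark script used to produce the numbers.

```python
def regression(legacy_time, default_time):
    """Relative change of default_time versus legacy_time.

    Negative means the default is faster, positive means it is slower.
    """
    return (default_time - legacy_time) / legacy_time


# (nfiles, ncolumns, legacy_time, default_time), copied from the table above
rows = [
    (1, 1, 0.490398, 0.401345),
    (2, 8, 2.847058, 4.283226),
    (4, 1, 2.116808, 0.760073),
]

for nfiles, ncolumns, legacy, default in rows:
    print(f"{nfiles} {ncolumns} {regression(legacy, default):+.6f}")
```

The recomputed values should match the table to about five decimal places. The last digit can differ slightly because the times shown in the table are already rounded.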
`$ python -m pyperf system show`
<details>
<pre>
System state
============
CPU: use 8 logical CPUs: 0-7
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-7=min=max=1800 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-7
IRQ affinity: IRQ affinity: IRQ
0-17,51,120-127,129-130,138-139,146,155-158=CPU 0-7; IRQ 128=CPU 0; IRQ 131=CPU
1; IRQ 132=CPU 2; IRQ 133=CPU 3; IRQ 134=CPU 4; IRQ 135=CPU 5; IRQ 136=CPU 6;
IRQ 137=CPU 7
Power supply: the power cable is plugged
</pre>
</details>
With the defaults, reads are faster in most configurations, including a moderate
improvement (roughly 18%) when reading a single file with 1 or 2 columns. Note
the significant regressions when reading two files with 4 or 8 columns (about
25% and 50% slower, respectively). This is to be expected, since legacy is able
to divide that work across 4 or 8 threads instead of only 2.