[ https://issues.apache.org/jira/browse/ARROW-17913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612323#comment-17612323 ]
Joris Van den Bossche edited comment on ARROW-17913 at 10/3/22 2:49 PM:
------------------------------------------------------------------------

I am not sure what <=6.0 did differently, but looking at the current implementation this is somewhat expected (it may still be possible to implement it in a better way, of course): when columns are specified, each column is read separately from the MemoryMappedFile (instead of with a single ReadAt call), and each read chunk is copied into a single output buffer. Because of this copy, the memory mapping basically has no effect in this case (https://github.com/apache/arrow/blob/ec579df631deaa8f6186208ed2a4ebec00581dfa/cpp/src/arrow/io/file.h#L182-L185)

This can also be seen when you compare timings with and without memory mapping (with {{memory_map=False}}, there is no longer any difference between manually selecting all columns or not):

{code}
In [5]: %timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=True)
29.4 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=False)
35.3 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit feather.read_table('test.feather', memory_map=True)
239 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [8]: %timeit feather.read_table('test.feather', memory_map=False)
35 ms ± 428 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{code}

Now, I would have assumed that the buffers of all columns do not need to live in a single body, so I am not 100% sure why each field needs to be copied to a single output.
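To illustrate the difference described in the comment above, here is a minimal sketch using only the Python standard library. It is not Arrow's implementation: the helper names ({{make_file}}, {{read_zero_copy}}, {{read_with_copy}}) and the fixed-size "column" layout are assumptions made up for the illustration. It contrasts slicing one view over a memory-mapped file (no bytes copied) with reading each column separately and copying it into a single output buffer (every byte copied, so the mapping no longer helps).

```python
import mmap
import os
import tempfile


def make_file(n_cols=4, col_bytes=1024):
    """Write n_cols fixed-size 'columns' back to back into a temp file.

    Hypothetical layout for illustration only; column c is filled with byte c.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for c in range(n_cols):
            f.write(bytes([c]) * col_bytes)
    return path


def read_zero_copy(mm, n_cols, col_bytes):
    """Take one view over the whole mapping and slice it per column.

    Slicing a memoryview does not copy; each slice still refers to the mapping.
    """
    whole = memoryview(mm)
    return [whole[c * col_bytes:(c + 1) * col_bytes] for c in range(n_cols)]


def read_with_copy(mm, n_cols, col_bytes):
    """Read each column separately and copy it into one output buffer.

    The result is backed by freshly allocated memory, not by the mapping.
    """
    out = bytearray(n_cols * col_bytes)
    for c in range(n_cols):
        start = c * col_bytes
        # Slicing an mmap object returns a new bytes object (a copy),
        # which is then copied again into the output buffer.
        out[start:start + col_bytes] = mm[start:start + col_bytes]
    view = memoryview(out)
    return [view[c * col_bytes:(c + 1) * col_bytes] for c in range(n_cols)]
```

Both functions return the same bytes per column. The difference is where they live: the {{.obj}} attribute of each returned view is the mmap object in the first case and a bytearray in the second. The cost of the second path grows with the amount of data read, which is consistent with the timings above where selecting columns with {{memory_map=True}} is close to the {{memory_map=False}} case.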
> feather.read_table 150x slower when reading columns in newer versions
> ---------------------------------------------------------------------
>
>                 Key: ARROW-17913
>                 URL: https://issues.apache.org/jira/browse/ARROW-17913
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 7.0.0, 8.0.0, 9.0.0
>         Environment: python 3.9, ubuntu 20.04
>            Reporter: Håkon Magne Holmen
>            Priority: Major
>              Labels: feather, performance
>
> h3. Description
> Performance when reading columns using {{feather.read_table}} on Arrow 7.0.0-9.0.0 is drastically slower than it was in 6.0.0.
> Profiling the code below shows that the bottleneck is somewhere in the {{read_names}} function of {{pyarrow._feather.FeatherReader}}.
> h5. Example
> Setup code:
> {code}
> import pandas as pd
> from pyarrow import feather
> rows, cols = (1_000_000, 10)
> data = {f'c{c}': range(rows) for c in range(cols)}
> df = pd.DataFrame(data=data)
> feather.write_feather(df, 'test.feather', compression="uncompressed"){code}
> Benchmarks Arrow 9.0.0:
> {code}
> %timeit feather.read_table('test.feather', memory_map=True)
> %timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=True)
>
> 178 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> 33.8 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> Benchmarks Arrow 6.0.0:
> {code}
> %timeit feather.read_table('test.feather', memory_map=True)
> %timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=True)
>
> 173 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> 224 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)