pitrou opened a new pull request, #46886:
URL: https://github.com/apache/arrow/pull/46886
### Rationale for this change
Reading FIXED_LEN_BYTE_ARRAY columns goes through an intermediate array of
FLBA structures, even when the end goal is to decode to a FixedSizeBinaryArray.
This makes reading FLOAT16 data slower than FLOAT, even though the data is
smaller in memory.
### What changes are included in this PR?
Improve the performance of reading FIXED_LEN_BYTE_ARRAY columns to Arrow by
avoiding the intermediate read into FLBA structures. This especially speeds up
reading FLOAT16 columns, which become faster to read than FLOAT columns.
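
To illustrate the idea only, here is a minimal sketch of the two decoding strategies for PLAIN-encoded fixed-width values. It is not the code from this PR: `FlbaView`, `DecodeViaViews` and `DecodeDirect` are hypothetical names, and the real reader also has to handle nulls, other encodings and Arrow builders. The point is that the two-step path materializes one pointer-sized view per value (8 bytes per 2-byte FLOAT16 value on a typical 64-bit platform) before copying, while the direct path copies the contiguous bytes once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a per-value FLBA structure: one pointer per value.
struct FlbaView {
  const uint8_t* ptr;
};

// Two-step path: build an intermediate array of views, then copy value by value.
std::vector<uint8_t> DecodeViaViews(const std::vector<uint8_t>& page,
                                    std::size_t byte_width) {
  assert(byte_width > 0 && page.size() % byte_width == 0);
  const std::size_t num_values = page.size() / byte_width;

  std::vector<FlbaView> views(num_values);  // the intermediate array
  for (std::size_t i = 0; i < num_values; ++i) {
    views[i].ptr = page.data() + i * byte_width;
  }

  std::vector<uint8_t> out(page.size());
  for (std::size_t i = 0; i < num_values; ++i) {
    std::memcpy(out.data() + i * byte_width, views[i].ptr, byte_width);
  }
  return out;
}

// Direct path: fixed-width PLAIN values are already contiguous in the page,
// so a single bulk copy produces the same output buffer.
std::vector<uint8_t> DecodeDirect(const std::vector<uint8_t>& page,
                                  std::size_t byte_width) {
  assert(byte_width > 0 && page.size() % byte_width == 0);
  std::vector<uint8_t> out(page.size());
  if (!page.empty()) {
    std::memcpy(out.data(), page.data(), page.size());
  }
  return out;
}
```

Both functions return identical bytes for the same input; only the amount of intermediate work differs.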
#### GH-43891 reproducer
* on git main:
```
.
writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=float
Parquet size=1.8 GB
finished writing parquet file in 1.91 seconds
`ParquetReader.read_row_groups`, dtype:float, duration:1.02 seconds
.
writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=halffloat
Parquet size=896.9 MB
finished writing parquet file in 3.85 seconds
`ParquetReader.read_row_groups`, dtype:halffloat, duration:1.94 seconds
```
* on this PR:
```
.
writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=float
Parquet size=1.8 GB
finished writing parquet file in 2.57 seconds
`ParquetReader.read_row_groups`, dtype:float, duration:0.97 seconds
.
writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=halffloat
Parquet size=896.9 MB
finished writing parquet file in 3.75 seconds
`ParquetReader.read_row_groups`, dtype:halffloat, duration:0.69 seconds
```
#### Float16 micro-benchmarks
```
------------------------------------------------------------------------------------------------------------------------
Non-regressions: (12)
------------------------------------------------------------------------------------------------------------------------
benchmark                                                                   baseline         contender        change %  counters
BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1            601.727 MiB/sec  3.400 GiB/sec    478.579   {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21}
BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1  525.749 MiB/sec  2.062 GiB/sec    301.588   {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17}
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100            387.569 MiB/sec  1.366 GiB/sec    260.962   {'family_index': 11, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100  411.361 MiB/sec  1.443 GiB/sec    259.201   {'family_index': 13, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99   354.204 MiB/sec  1.053 GiB/sec    204.556   {'family_index': 13, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12}
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99             349.165 MiB/sec  1.029 GiB/sec    201.659   {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12}
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0              463.178 MiB/sec  1.311 GiB/sec    189.900   {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0    392.840 MiB/sec  1.079 GiB/sec    181.219   {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50             187.701 MiB/sec  514.172 MiB/sec  173.930   {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50   188.327 MiB/sec  506.043 MiB/sec  168.704   {'family_index': 13, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1    325.579 MiB/sec  848.216 MiB/sec  160.525   {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11}
BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1              391.533 MiB/sec  976.771 MiB/sec  149.473   {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
```
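
For readers checking the numbers, the `change %` column appears consistent with the relative throughput change `(contender - baseline) / baseline * 100`, after converting GiB/sec to MiB/sec. The sketch below assumes that formula; the helper names are hypothetical and not part of any benchmarking tool.

```cpp
#include <cassert>
#include <cmath>

// Convert a throughput given in GiB/sec to MiB/sec (1 GiB = 1024 MiB).
double GiBToMiB(double gib_per_sec) { return gib_per_sec * 1024.0; }

// Relative change of the contender over the baseline, in percent.
// Both arguments must use the same unit (here: MiB/sec).
double ChangePercent(double baseline, double contender) {
  assert(baseline > 0.0);
  return (contender - baseline) / baseline * 100.0;
}

// Helper for comparing against the rounded figures printed in the table.
bool ApproxEqual(double a, double b, double tolerance) {
  return std::fabs(a - b) <= tolerance;
}
```

For example, `ChangePercent(187.701, 514.172)` gives about 173.93, matching the `null_probability:50` PLAIN row. For the first row, `ChangePercent(601.727, GiBToMiB(3.400))` gives about 478.6, slightly off the reported 478.579 because the printed GiB/sec figure is rounded.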
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.