pitrou opened a new pull request, #46886:
URL: https://github.com/apache/arrow/pull/46886

   ### Rationale for this change
   
   Reading FIXED_LEN_BYTE_ARRAY columns goes through an intermediate array of FLBA structures, even when the end goal is to decode to a FixedSizeBinaryArray. This makes reading FLOAT16 data slower than FLOAT, even though the data is smaller in memory.
   
   ### What changes are included in this PR?
   
   Improve the performance of reading FIXED_LEN_BYTE_ARRAY columns to Arrow by avoiding the intermediate read into FLBA structures. This especially speeds up reading FLOAT16 columns, which now read faster than FLOAT columns.
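
To illustrate the idea (this is not the Arrow C++ code), here is a minimal Python sketch of the two decoding strategies for PLAIN-encoded fixed-width values. The function names `decode_via_views` and `decode_direct` are hypothetical, and the per-value `memoryview` slices only stand in for the intermediate FLBA structures; the sketch assumes a page whose values are stored back to back with no nulls.

```python
import struct


def decode_via_views(page: bytes, width: int, count: int) -> bytes:
    """Two-step path: build one view per value, then copy each value out.

    The list of views is a stand-in (an assumption for illustration) for an
    intermediate array of per-value descriptors.
    """
    buf = memoryview(page)
    views = [buf[i * width:(i + 1) * width] for i in range(count)]
    out = bytearray()
    for view in views:
        out += view  # one small copy per value
    return bytes(out)


def decode_direct(page: bytes, width: int, count: int) -> bytes:
    """Direct path: the values are already contiguous, so copy them in bulk."""
    return bytes(page[:count * width])


# Four float16 values, little-endian, 2 bytes each.
values = (1.0, -2.5, 0.5, 65504.0)
page = struct.pack("<4e", *values)

# Both paths produce the same fixed-size binary buffer.
assert decode_via_views(page, 2, 4) == decode_direct(page, 2, 4)
```

Both functions return identical bytes; the direct path simply skips the per-value bookkeeping, which is the kind of overhead this PR sets out to remove.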
   
   #### GH-43891 reproducer
   
   * on git main:
   ```
   .
   writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=float
   Parquet size=1.8 GB
   finished writing parquet file in 1.91 seconds
   `ParquetReader.read_row_groups`, dtype:float, duration:1.02 seconds
   .
   writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=halffloat
   Parquet size=896.9 MB
   finished writing parquet file in 3.85 seconds
   `ParquetReader.read_row_groups`, dtype:halffloat, duration:1.94 seconds
   ```
   * on this PR:
   ```
   .
   writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=float
   Parquet size=1.8 GB
   finished writing parquet file in 2.57 seconds
   `ParquetReader.read_row_groups`, dtype:float, duration:0.97 seconds
   .
   writing parquet file:/tmp/my.parquet, columns=7000, row_groups=1, rows=64000, compression=None, dtype=halffloat
   Parquet size=896.9 MB
   finished writing parquet file in 3.75 seconds
   `ParquetReader.read_row_groups`, dtype:halffloat, duration:0.69 seconds
   ```
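
As a rough sanity check on the reproducer figures above, the following sketch recomputes the raw data sizes and the read-time ratios. It assumes the reported sizes use decimal units (1 GB = 10^9 bytes) and that each duration comes from a single run, so the ratios are indicative only. The helper names are made up for this example.

```python
# Figures copied from the reproducer output above.
COLUMNS = 7000
ROWS = 64000


def raw_size_bytes(bytes_per_value: int) -> int:
    """Size of the uncompressed values alone, ignoring Parquet metadata."""
    return COLUMNS * ROWS * bytes_per_value


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the read became."""
    return before_s / after_s


float_bytes = raw_size_bytes(4)  # FLOAT: 4 bytes per value
half_bytes = raw_size_bytes(2)   # FLOAT16: 2 bytes per value

halffloat_speedup = speedup(1.94, 0.69)  # main vs. this PR, dtype=halffloat
float_speedup = speedup(1.02, 0.97)      # main vs. this PR, dtype=float
half_vs_float_on_pr = 0.69 / 0.97        # halffloat read time relative to float

print(f"float raw size:     {float_bytes / 1e9:.3f} GB")
print(f"halffloat raw size: {half_bytes / 1e6:.1f} MB")
print(f"halffloat speedup:  {halffloat_speedup:.2f}x")
print(f"float speedup:      {float_speedup:.2f}x")
print(f"halffloat/float read time on this PR: {half_vs_float_on_pr:.2f}")
```

The raw sizes (1.792 GB and 896.0 MB) line up with the reported file sizes, and the halffloat read goes from roughly twice as slow as float on main to noticeably faster than float on this PR.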
   
   #### Float16 micro-benchmarks
   ```
   ----------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (12)
   ----------------------------------------------------------------------------------------------------------------------------
                                                                    benchmark        baseline       contender  change % counters
             BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1 601.727 MiB/sec   3.400 GiB/sec   478.579 {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21}
   BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1 525.749 MiB/sec   2.062 GiB/sec   301.588 {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17}
             BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100 387.569 MiB/sec   1.366 GiB/sec   260.962 {'family_index': 11, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10}
   BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100 411.361 MiB/sec   1.443 GiB/sec   259.201 {'family_index': 13, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
    BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99 354.204 MiB/sec   1.053 GiB/sec   204.556 {'family_index': 13, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12}
              BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99 349.165 MiB/sec   1.029 GiB/sec   201.659 {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12}
               BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0 463.178 MiB/sec   1.311 GiB/sec   189.900 {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15}
     BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0 392.840 MiB/sec   1.079 GiB/sec   181.219 {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
              BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50 187.701 MiB/sec 514.172 MiB/sec   173.930 {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7}
    BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50 188.327 MiB/sec 506.043 MiB/sec   168.704 {'family_index': 13, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6}
     BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1 325.579 MiB/sec 848.216 MiB/sec   160.525 {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11}
               BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1 391.533 MiB/sec 976.771 MiB/sec   149.473 {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13}
   ```
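
The `change %` column in the micro-benchmark results above appears to be the relative throughput difference, `(contender - baseline) / baseline * 100`. That reading is inferred from the reported numbers rather than taken from the benchmark tool's documentation, so treat it as an assumption. A small sketch to recompute it, with hypothetical helper names:

```python
# Conversion factors to MiB/sec for the two units seen in the table.
UNIT_TO_MIB = {"MiB/sec": 1.0, "GiB/sec": 1024.0}


def to_mib_per_sec(text: str) -> float:
    """Parse a throughput string such as '3.400 GiB/sec' into MiB/sec."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MIB[unit]


def change_percent(baseline: str, contender: str) -> float:
    """Relative throughput change of contender over baseline, in percent."""
    base = to_mib_per_sec(baseline)
    return (to_mib_per_sec(contender) - base) / base * 100.0


# Two rows from the table: one in MiB/sec only, one mixing MiB/sec and GiB/sec.
plain_nulls_50 = change_percent("187.701 MiB/sec", "514.172 MiB/sec")
plain_no_nulls = change_percent("601.727 MiB/sec", "3.400 GiB/sec")

print(f"{plain_nulls_50:.1f} %")
print(f"{plain_no_nulls:.1f} %")
```

Small differences from the reported values are expected, since the table rounds throughput to three decimals before display.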
   
   ### Are these changes tested?
   
   Yes.
   
   ### Are there any user-facing changes?
   
   No.
   

