[GitHub] [arrow] mapleFU commented on issue #34510: Reading FixedSizeList from parquet is slower than reading values into more rows

via GitHub Fri, 10 Mar 2023 06:29:45 -0800


mapleFU commented on issue #34510:
URL: https://github.com/apache/arrow/issues/34510#issuecomment-1463884481


   > and it could be optimized to use a single static value, right?
   
   Yes, a developer may optimize this in the future. If `FixedSizeArray` is 
non-nullable, Parquet can use a single static value, but if `FixedSizeArray` 
is nullable, it cannot.
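
To illustrate why nullability matters here, below is a minimal sketch of how repetition and definition levels come out for a column of fixed-size lists. It is a simplified model of Parquet's level encoding, not the Arrow C++ implementation: the names `shred_fixed_size_lists` and `static_levels` are hypothetical, and it assumes list elements are always required and the list size is at least 1. In the non-nullable case the levels depend only on the row count and list size, so they could be produced without decoding anything; a single null list breaks that pattern.

```python
from typing import List, Optional, Sequence, Tuple

Row = Optional[Sequence[float]]


def shred_fixed_size_lists(
    rows: Sequence[Row], list_size: int, nullable: bool
) -> Tuple[List[int], List[int], List[float]]:
    """Return (rep_levels, def_levels, values) for fixed-size list rows.

    Simplified model (an assumption of this sketch): elements are
    required, so only the repeated node and, if nullable, the optional
    list itself contribute to the definition level.
    """
    if list_size < 1:
        raise ValueError("list_size must be >= 1 in this sketch")
    max_def = 2 if nullable else 1
    rep_levels: List[int] = []
    def_levels: List[int] = []
    values: List[float] = []
    for row in rows:
        if row is None:
            if not nullable:
                raise ValueError("null list in a non-nullable column")
            # A null list still occupies one level slot but stores no value.
            rep_levels.append(0)
            def_levels.append(0)
            continue
        if len(row) != list_size:
            raise ValueError("row length does not match list_size")
        for i, value in enumerate(row):
            # 0 starts a new row, 1 continues the current list.
            rep_levels.append(0 if i == 0 else 1)
            def_levels.append(max_def)
            values.append(value)
    return rep_levels, def_levels, values


def static_levels(num_rows: int, list_size: int) -> Tuple[List[int], List[int]]:
    """Levels for the non-nullable case, built without reading any data."""
    rep = ([0] + [1] * (list_size - 1)) * num_rows
    dfn = [1] * (list_size * num_rows)
    return rep, dfn


# Non-nullable: levels are fully determined by (num_rows, list_size).
rep, dfn, vals = shred_fixed_size_lists(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], list_size=3, nullable=False
)

# Nullable with one null row: the levels now depend on the data.
rep_n, dfn_n, vals_n = shred_fixed_size_lists(
    [[1.0, 2.0, 3.0], None, [4.0, 5.0, 6.0]], list_size=3, nullable=True
)
```

In this model every stored value carries one repetition level and one definition level, so a reader that cannot skip level decoding handles two level streams alongside the values.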
   
   > other reasons or replevels cascade somehow into even worse perf
   
   I've profiled the C++ part on my macOS machine with a release build (O2):
   
   <img width="1305" alt="C811842B-4DDA-423A-B8A1-EC6A7E4ADE33" src="https://user-images.githubusercontent.com/24351052/224341042-8d2cb020-2aa6-41f4-8c66-a913436ea1c2.png">
   
   1. Decoding doubles is fast.
   2. Decoding levels takes nearly as much time as decoding doubles.
   3. Constructing the list takes a little time.
   
   The benchmark uses list rather than FixedSizeList, but I think the results 
would be similar.
   
   I'm not very familiar with the Python part; maybe someone can profile that path.


