[GitHub] [arrow] Dandandan commented on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

GitBox Fri, 05 Mar 2021 02:33:47 -0800


Dandandan commented on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-791332249



   @yordan-pavlov thanks a lot for the detailed descriptions. I think that's a 
great overview of the current situation.
   
   I can help with reviewing your next PR (maybe the `10-15%` improvement 
would already be useful to have as a PR of its own?).
   
   I will also have a look at using a sampling profiler (`perf`?). So far I 
have been using callgrind / cachegrind to collect profiles, which doesn't 
always give perfect results (although run time and instruction counts are 
quite correlated) and is quite slow.
   Using MS Visual Studio sounds like a great idea too. I think it may be 
worth documenting the steps to profile Arrow/Parquet/DataFusion with different 
profiling tools; these are my current steps with callgrind:
   
   
https://docs.google.com/document/d/1OqM1SSFmopcbz4JtOXJ8pXE7c1b4A2zDm4w207KBjq0/edit?usp=sharing
 
    
   I think the source of the "hot path" might differ a lot between queries. 
For example, so far I haven't seen `ComplexObjectArrayReader` show up much in 
my tests, so it would also be good to profile / optimize a variety of Parquet 
files / queries.
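
   As a rough illustration of comparing several workloads before deciding 
which one to profile in depth, here is a minimal sketch using only the Rust 
standard library. The `time_workload` helper and the two workloads 
(`sum_integers`, `total_string_len`) are hypothetical stand-ins made up for 
this example; they are not Arrow, Parquet or DataFusion code, and real queries 
would take their place.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `workload` `iterations` times and return the total wall-clock time.
/// Hypothetical helper for illustration only.
fn time_workload<F: FnMut() -> u64>(iterations: u32, mut workload: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the result.
        black_box(workload());
    }
    start.elapsed()
}

/// Stand-in for a query dominated by integer arithmetic.
fn sum_integers(n: u64) -> u64 {
    (0..n).map(black_box).sum()
}

/// Stand-in for a query dominated by string formatting and allocation.
fn total_string_len(n: u64) -> u64 {
    (0..n).map(|i| i.to_string().len() as u64).sum()
}

fn main() {
    // Time each workload separately; their hot paths are likely to differ.
    let int_time = time_workload(10, || sum_integers(100_000));
    let str_time = time_workload(10, || total_string_len(100_000));
    println!("sum_integers:     {:?}", int_time);
    println!("total_string_len: {:?}", str_time);
}
```

   Wall-clock timings like these only show which workload is worth a closer 
look; a profiler such as callgrind or `perf` is still needed to see where the 
time goes inside it.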


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
