hn5092 edited a comment on pull request #7288:
URL: https://github.com/apache/arrow/pull/7288#issuecomment-637438925


   > @hn5092 thank you for the PR. Could you add some benchmarks on your 
machine that shows this improves things? I might be looking at the wrong place 
but it appears memory is retained as capacity in the buffers after a reset 
(assuming nothing is written). I agree that the ordering is strange though.
   
   
   That is because TypedRecordReader::ReadValuesDense does not call 
ResetValues, so when the buffer is full, calling 
record_reader_->Reserve(batch_size) resizes it to twice its current size.
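
To illustrate the growth pattern described above, here is a minimal toy model. It is not Arrow's actual code: `ToyReader`, `Simulate`, and the exact doubling policy are assumptions made for this sketch. It compares reserving space with and without resetting the written-value count before each batch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a record reader's value buffer (not Arrow's class).
struct ToyReader {
  std::size_t capacity = 0;        // slots currently allocated
  std::size_t values_written = 0;  // slots currently in use
  std::size_t reallocations = 0;   // how many times Reserve had to grow

  // Make room for `extra` more values. Assumed policy: double the capacity
  // when the buffer is too small, or grow to the exact need if doubling
  // is still not enough.
  void Reserve(std::size_t extra) {
    const std::size_t needed = values_written + extra;
    if (needed <= capacity) return;
    std::size_t new_capacity = (capacity == 0) ? needed : capacity * 2;
    if (new_capacity < needed) new_capacity = needed;
    capacity = new_capacity;
    ++reallocations;
  }

  // Simulate decoding n values into the buffer.
  void Write(std::size_t n) {
    assert(values_written + n <= capacity);
    values_written += n;
  }

  // Discard the contents but keep the allocated capacity.
  void Reset() { values_written = 0; }
};

// Read `batches` batches of `batch_size` values and return the final state.
ToyReader Simulate(std::size_t batches, std::size_t batch_size,
                   bool reset_before_reserve) {
  ToyReader reader;
  for (std::size_t i = 0; i < batches; ++i) {
    if (reset_before_reserve) reader.Reset();
    reader.Reserve(batch_size);
    reader.Write(batch_size);
  }
  return reader;
}
```

In this model, calling `Simulate(8, 1024, false)` keeps growing the buffer as batches accumulate, while `Simulate(8, 1024, true)` allocates once and reuses the same capacity for every batch.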
   Data: a Parquet file with 6,000,000 rows (about 250 MB); reading 3 long-type 
columns (about 45 MB total).
   Env: mac pro 2019, Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, 256 GB SSD
   
   The improvement is about 10%.
   Before and after results are below:
   before: 
   
![image](https://user-images.githubusercontent.com/10030046/83508621-c14d3d80-a4fc-11ea-9bd6-f6fa3b1e2ca3.png)
   
   
   after:
   
![image](https://user-images.githubusercontent.com/10030046/83508123-1472c080-a4fc-11ea-9570-d1685fcc83d6.png)
   
   
   before profile:
   
![image](https://user-images.githubusercontent.com/10030046/83508517-a11d7e80-a4fc-11ea-8077-12f3fa677c19.png)
   
   
   after profile:
   
![image](https://user-images.githubusercontent.com/10030046/83508427-80552900-a4fc-11ea-894d-95cfdd8aa5c0.png)
   
   
   We can see that the time spent in the Reserve method drops from 24% to 16%.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

