Re: [I] Data set which is much bigger than RAM [datafusion]

via GitHub Fri, 14 Jun 2024 11:44:08 -0700


alamb commented on issue #10897:
URL: https://github.com/apache/datafusion/issues/10897#issuecomment-2168571852


   > Will it swallow all memory and fail or it will be running in a kind on 
streaming format?
   
   Hi @Smotrov, given your description and code, I would expect this query to 
run incrementally and not buffer all the results in memory. That is, I would 
expect the query to stream.
   
   Some operators (grouping, joins, sorts) may need to buffer all of their 
input, but your query doesn't seem to use any of them.
   
   I am not super familiar with exactly how the JSON writing is implemented, but 
I believe that should be streaming as well.
   
   > How could I limit the amount of memory 
   
   You can limit the amount of memory using 
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/trait.MemoryPool.html
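
   To make the idea of a bounded pool concrete, here is a minimal stand-alone 
sketch using only the Rust standard library. It is not DataFusion's code: 
`BoundedPool` is a hypothetical type, and its method names only loosely mirror 
the `MemoryPool` trait linked above. The point it illustrates is that an 
operator asks the pool before it buffers more data, and the request fails once 
the limit would be exceeded.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a bounded memory pool (illustration only,
/// not DataFusion's implementation).
pub struct BoundedPool {
    /// Maximum number of bytes that may be reserved at once.
    limit: usize,
    /// Number of bytes currently reserved.
    used: AtomicUsize,
}

impl BoundedPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`. Fails, leaving the pool unchanged, if the
    /// reservation would push usage above the limit.
    pub fn try_grow(&self, bytes: usize) -> Result<(), String> {
        self.used
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |used| {
                let new_used = used.checked_add(bytes)?;
                (new_used <= self.limit).then_some(new_used)
            })
            .map(|_| ())
            .map_err(|used| {
                format!(
                    "cannot reserve {bytes} bytes: {used} of {} already in use",
                    self.limit
                )
            })
    }

    /// Give back `bytes` that were reserved earlier (never goes below zero).
    pub fn shrink(&self, bytes: usize) {
        let _ = self
            .used
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |used| {
                Some(used.saturating_sub(bytes))
            });
    }

    /// Bytes currently reserved.
    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}
```

   With a limit of 100 bytes, a first `try_grow(60)` succeeds and a second one 
returns an error until `shrink(60)` has been called. In a real query engine, an 
operator that receives such an error would have to spill to disk or abort the 
query rather than keep allocating.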
   
   However, as I mentioned, I wouldn't expect your query to buffer large amounts 
of memory, so if it does, maybe we need to adjust the writer settings or there is 
some improvement to make to DataFusion.
   
   Let us know how it goes!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


