alamb commented on PR #7581:
URL: 
https://github.com/apache/arrow-datafusion/pull/7581#issuecomment-1885365940

   I thought more about this PR last night and in the shower. First of all, I 
think the direction it is headed is great and would make a good addition to 
DataFusion. Thank you @matthewgapp  both for the code as well as for your 
patience
   
   In terms of implementation, I think there are a few items that we need prior 
to making this feature available for everyone
   1. More test coverage (I hilighted some areas)
   2. Memory limits (so a recursive CTE where the intermediate relation grows 
without limit will not cause the process to be OOM killed). This is to prevent 
a denial of service (DOS) attack vector against systems that use DataFusion by 
exploding memory
   
   Some nice to have (but not strictly required) items:
   1. Simpler API (as @jonahgao  and I have suggested)
   2. A config flag to enable/disable support (as a way to protect against DOS)
   
   
   In order to avoid a massive PR, my suggestion is to implement this feature 
in stages:
   1. Add a config flag to enable/disable CTE support, default to disable
   2. Add parser / LogicalPlan support in one PR
   3. Add execution support in a second PR
   4. Add memory limiting in another PR
   5. Iterate / update tests
   6. When we are happy that everything is working well, change the switches 
default to enabled by default
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to