matthewgapp commented on PR #7581: URL: https://github.com/apache/arrow-datafusion/pull/7581#issuecomment-1886010269
> I thought more about this PR last night and in the shower. First of all, I think the direction it is headed is great and would make a good addition to DataFusion. Thank you @matthewgapp both for the code as well as for your patience > > In terms of implementation, I think there are a few items that we need prior to making this feature available for everyone > > 1. More test coverage (I hilighted some areas) > 2. Memory limits (so a recursive CTE where the intermediate relation grows without limit will not cause the process to be OOM killed). This is to prevent a denial of service (DOS) attack vector against systems that use DataFusion by exploding memory > > Some nice to have (but not strictly required) items: > > 1. Simpler API (as @jonahgao and I have suggested) > 2. A config flag to enable/disable support (as a way to protect against DOS) > > In order to avoid a massive PR, my suggestion is to implement this feature in stages: > > 1. Add a config flag to enable/disable CTE support, default to disable > 2. Add parser / LogicalPlan support in one PR > 3. Add execution support in a second PR > 4. Add memory limiting in another PR > 5. Iterate / update tests > 6. When we are happy that everything is working well, change the switches default to enabled by default Thanks @alamb! Think this impl plan makes sense. I'll start knocking out the PRs. Also going to have a go at reworking the existing PR to use MemTable in the two plans (as @jonahgao mentioned here https://github.com/apache/arrow-datafusion/pull/7581#discussion_r1446062679). I'll link the resulting PRs to this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
