altmannmarcelo opened a new pull request, #2378:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2378

   ## Summary
   
   `Parser::parse_function_args` runs in `O(2^n)` time on `n`-deep nested 
function-call arguments (e.g. `replace(replace(replace(...)))`) for any dialect 
where `supports_named_fn_args_with_expr_name()` is true — notably 
**PostgreSQL** and **MSSQL**. A 20-level chain takes ~43s; a 30-level chain 
would take ~12 hours. The fix makes parse time linear.
   
   ## Root cause
   
   To support expression-named arguments like `JSON_OBJECT(a + b : c)`, 
`parse_function_args` speculatively parses the *entire* argument as an 
expression to see whether a named-argument operator (`=>`, `:=`, `:`) follows:
   
   ```rust
   let arg = if self.dialect.supports_named_fn_args_with_expr_name() {
       self.maybe_parse(|p| {
           let name = p.parse_expr()?;                         // parse the 
WHOLE argument
           let operator = p.parse_function_named_arg_operator()?;  // needs 
=>/:=/:
           let arg = p.parse_wildcard_expr()?.into();
           Ok(FunctionArg::ExprNamed { name, arg, operator })
       })?
   } else { ... };
   if let Some(arg) = arg { return Ok(arg); }
   let wildcard_expr = self.parse_wildcard_expr()?;            // RE-PARSE the 
same argument
   ```
   
   For a plain positional argument that is itself a function call, the 
speculative `parse_expr()` consumes the whole nested subtree, finds no 
named-arg operator, `maybe_parse` rewinds, and `parse_wildcard_expr()` parses 
the identical subtree again. Each nesting level parses its child twice: `T(n) = 
2*T(n-1)` → `O(2^n)`.
   
   Introduced in commit `92be237` — "Add support for MSSQL's 
`JSON_ARRAY`/`JSON_OBJECT` expr (#1507)" — which changed named-argument 
detection from a single-token `parse_identifier` lookahead to this speculative 
full-expression parse.
   
   ## Fix
   
   Parse the leading expression once, then check for a named-argument operator 
(an O(1) token check). If one follows, it is a named argument; otherwise the 
same already-parsed expression is the positional argument. The positional tail 
(`* EXCLUDE`, `expr AS alias`) is factored into `parse_unnamed_function_arg`, 
shared by both paths. No re-parse; behavior is otherwise unchanged.
   
   ## Evidence
   
   Pathological input: `SELECT replace(replace(... x ..., 'a','b'), 'a','b')`, 
PostgreSQL dialect. Parse time per nesting depth, before vs after the fix 
(milliseconds):
   
   | depth | before | after |
   |------:|-------:|------:|
   | 10 | 48 ms | 0.025 ms |
   | 20 | 42,600 ms | 0.049 ms |
   | 30 | ~43,000,000 ms (extrapolated, not run) | 0.071 ms |
   
   Both columns are parse-only (min of 3 runs). Before: parse time doubles per 
level — `O(2^n)` (additional measured points: depth 15 = 1,350 ms, depth 18 = 
10,600 ms). After: linear, ~0.0024 ms per level. At depth 20 that is roughly an 
870,000x speedup. The depth-30 case was **not run** before the fix; 
extrapolating the doubling (42,600 ms x 2^10) puts it near 12 hours.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to