Re: [PR] Fix column definition `COLLATE` parsing [datafusion-sqlparser-rs]

via GitHub Fri, 01 Aug 2025 10:04:24 -0700


mvzink commented on code in PR #1986:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1986#discussion_r2248453508



##########
src/parser/mod.rs:
##########
@@ -1248,6 +1248,12 @@ impl<'a> Parser<'a> {
         debug!("parsing expr");
         let mut expr = self.parse_prefix()?;
 
+        // We would have exited early in `parse_prefix` before checking for `COLLATE`, and there's
+        // no infix operator handling for `COLLATE`, so we must return now.
+        if self.in_column_definition_state() && self.peek_keyword(Keyword::COLLATE) {
+            return Ok(expr);
+        }

Review Comment:
   If there is no infix handling for a given token and we try to `parse_infix` anyway, we get a `No infix parser for token` error.
   
   In practice, we avoid this (I would say accidentally) for all dialects other than PostgreSQL, because the default precedence of `COLLATE` is 0. In PostgreSQL, it is 120.
   
   Consider the column options `DEFAULT 'foo' COLLATE 'en_US'`. Without this early return, after parsing `'foo'` we flow through to checking the precedence of the next token. By default it is `0`, which is `<=` the current precedence (also 0), so we break and return `'foo'`. But for PostgreSQL it is 120, so we flow into infix parsing, i.e. we treat `COLLATE` as an infix operator, which we don't handle because, technically, it isn't one.
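   To make that control flow concrete, here is a minimal, hypothetical Rust model of the precedence loop. `Token`, `Expr`, `Dialect`, `next_precedence`, and `parse_default` are illustrative stand-ins, not sqlparser-rs's actual types or signatures; only the precedence values (0 by default, 120 for `COLLATE` in PostgreSQL) come from the discussion.

   ```rust
   // Toy model of the precedence loop; NOT sqlparser-rs code. All types and
   // functions here are simplified, hypothetical stand-ins.

   #[derive(Debug, Clone, PartialEq)]
   enum Token {
       Str(String),
       Collate,
       Eof,
   }

   #[derive(Debug, PartialEq)]
   enum Expr {
       Value(String),
   }

   #[derive(Clone, Copy)]
   enum Dialect {
       Generic,
       PostgreSql,
   }

   impl Dialect {
       /// Precedence of the upcoming token (0 = do not continue the expression).
       /// Uses the values quoted above: 0 by default, 120 for COLLATE in PostgreSQL.
       fn next_precedence(self, tok: &Token) -> u8 {
           match (self, tok) {
               (Dialect::PostgreSql, Token::Collate) => 120,
               _ => 0,
           }
       }
   }

   struct Parser {
       tokens: Vec<Token>,
       pos: usize,
       dialect: Dialect,
   }

   impl Parser {
       fn peek(&self) -> Token {
           self.tokens.get(self.pos).cloned().unwrap_or(Token::Eof)
       }

       fn parse_prefix(&mut self) -> Result<Expr, String> {
           match self.peek() {
               Token::Str(s) => {
                   self.pos += 1;
                   Ok(Expr::Value(s))
               }
               other => Err(format!("Expected an expression, found {:?}", other)),
           }
       }

       /// There is no infix handling for COLLATE, so reaching here with it fails.
       fn parse_infix(&mut self, _lhs: Expr) -> Result<Expr, String> {
           Err(format!("No infix parser for token {:?}", self.peek()))
       }

       fn parse_subexpr(&mut self, precedence: u8) -> Result<Expr, String> {
           let mut expr = self.parse_prefix()?;
           loop {
               let next = self.dialect.next_precedence(&self.peek());
               if next <= precedence {
                   break; // e.g. 0 <= 0: stop and return `expr` as-is
               }
               expr = self.parse_infix(expr)?; // e.g. 120 > 0 for PostgreSQL
           }
           Ok(expr)
       }
   }

   /// Parses the expression after `DEFAULT` in `DEFAULT 'foo' COLLATE 'en_US'`.
   fn parse_default(dialect: Dialect) -> Result<Expr, String> {
       let tokens = vec![
           Token::Str("foo".to_string()),
           Token::Collate,
           Token::Str("en_US".to_string()),
       ];
       let mut parser = Parser { tokens, pos: 0, dialect };
       parser.parse_subexpr(0)
   }

   fn main() {
       println!("{:?}", parse_default(Dialect::Generic)); // Ok(Value("foo"))
       println!("{:?}", parse_default(Dialect::PostgreSql)); // Err("No infix parser for token Collate")
   }
   ```

   The only difference between the two calls is the precedence the dialect reports for `COLLATE`, which is exactly what decides whether the loop breaks or falls into `parse_infix`.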
   
   The result is this:
   
   ```
       2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] prefix: Value(ValueWithSpan { value: SingleQuotedString("foo"), span: Span(Location(0,0)..Location(0,0)) })
       2025-08-01T16:56:30.976Z DEBUG [sqlparser::dialect::postgresql] get_next_precedence() TokenWithSpan { token: Word(Word { value: "COLLATE", quote_style: None, keyword: COLLATE }), span: Span(Location(0,0)..Location(0,0)) }
       2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] next precedence: 120
       2025-08-01T16:56:30.976Z DEBUG [sqlparser::parser] infix: TokenWithSpan { token: Word(Word { value: "COLLATE", quote_style: None, keyword: COLLATE }), span: Span(Location(0,0)..Location(0,0)) }

       thread 'test_parse_default_with_collate_column_option' panicked at src/test_utils.rs:157:61:
       CREATE TABLE foo (abc TEXT DEFAULT 'foo' COLLATE 'en_US'): ParserError("No infix parser for token Word(Word { value: \"COLLATE\", quote_style: None, keyword: COLLATE })")
   ```
   
   I am not 100% sure this special case is the best way to fix this, but as I understand it, it is necessary as long as we have special handling for `COLLATE` in `parse_prefix`; and that, in turn, is necessary as long as we don't parse the RHS as an expression.
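   A similarly hypothetical sketch of the early return from the diff: tokens are plain strings, `in_column_definition` stands in for `in_column_definition_state()`, `collate_precedence` stands in for the dialect's precedence lookup, and `parse_default` is an illustrative helper. With the guard, `'foo'` is returned and `COLLATE` is left as the next token for whatever parses the column options.

   ```rust
   // Hypothetical model of the guard in this PR; not the real sqlparser-rs API.
   struct Parser<'a> {
       tokens: Vec<&'a str>,
       pos: usize,
       collate_precedence: u8,     // 0 by default, 120 for PostgreSQL
       in_column_definition: bool, // stand-in for `in_column_definition_state()`
   }

   impl<'a> Parser<'a> {
       fn peek(&self) -> &'a str {
           self.tokens.get(self.pos).copied().unwrap_or("")
       }

       fn peek_keyword(&self, keyword: &str) -> bool {
           self.peek().eq_ignore_ascii_case(keyword)
       }

       fn next_precedence(&self) -> u8 {
           if self.peek_keyword("COLLATE") {
               self.collate_precedence
           } else {
               0
           }
       }

       /// Only single-quoted string literals are valid prefixes in this toy.
       fn parse_prefix(&mut self) -> Result<String, String> {
           let tok = self.peek();
           if tok.starts_with('\'') {
               self.pos += 1;
               Ok(tok.to_string())
           } else {
               Err(format!("Expected an expression, found {tok:?}"))
           }
       }

       fn parse_subexpr(&mut self, precedence: u8) -> Result<String, String> {
           let expr = self.parse_prefix()?;
           // The early return: leave COLLATE unconsumed for the column-option parser.
           if self.in_column_definition && self.peek_keyword("COLLATE") {
               return Ok(expr);
           }
           // This toy has no infix operators, so any token that would continue
           // the expression fails, like COLLATE in the log above.
           if self.next_precedence() > precedence {
               return Err(format!("No infix parser for token {:?}", self.peek()));
           }
           Ok(expr)
       }
   }

   /// Returns the parsed DEFAULT expression and the token left for the caller.
   fn parse_default(
       collate_precedence: u8,
       in_column_definition: bool,
   ) -> (Result<String, String>, &'static str) {
       let mut parser = Parser {
           tokens: vec!["'foo'", "COLLATE", "'en_US'"],
           pos: 0,
           collate_precedence,
           in_column_definition,
       };
       let result = parser.parse_subexpr(0);
       (result, parser.peek())
   }

   fn main() {
       println!("{:?}", parse_default(120, true)); // (Ok("'foo'"), "COLLATE")
       println!("{:?}", parse_default(120, false)); // (Err("No infix parser for token \"COLLATE\""), "COLLATE")
   }
   ```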
   
   I could experiment with treating `COLLATE` as an infix operator, but I don't really know how that would go; at least PostgreSQL and MySQL don't allow anything other than a single collation name on the right-hand side.
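   One possible shape for that experiment (a hypothetical sketch, not something this PR implements) is to handle `COLLATE` in the operator position but read exactly one collation name on the right-hand side instead of recursing into a sub-expression. The `Expr::Collate` variant and `parse_value_with_collate` below are toy illustrations, not sqlparser-rs's types or functions.

   ```rust
   // Hypothetical sketch; not sqlparser-rs's `Expr` or parser.
   #[derive(Debug, PartialEq)]
   enum Expr {
       Value(String),
       Collate { expr: Box<Expr>, collation: String },
   }

   /// Parses `<value> [COLLATE <name>]` and returns the expression plus the
   /// number of tokens consumed.
   fn parse_value_with_collate(tokens: &[&str]) -> Result<(Expr, usize), String> {
       let value = tokens.first().ok_or("expected a value")?;
       let mut expr = Expr::Value(value.to_string());
       let mut consumed = 1;
       if tokens
           .get(consumed)
           .map_or(false, |t| t.eq_ignore_ascii_case("COLLATE"))
       {
           // The right-hand side is a single name token, never a sub-expression.
           let name = tokens
               .get(consumed + 1)
               .ok_or("expected a collation name after COLLATE")?;
           expr = Expr::Collate {
               expr: Box::new(expr),
               collation: name.to_string(),
           };
           consumed += 2;
       }
       Ok((expr, consumed))
   }

   fn main() {
       println!("{:?}", parse_value_with_collate(&["'foo'", "COLLATE", "\"C\""]));
       println!("{:?}", parse_value_with_collate(&["'foo'", "COLLATE"]));
   }
   ```

   Even with something like this, a column definition such as `DEFAULT 'foo' COLLATE 'en_US'` would presumably still need to stop before `COLLATE`, since there it is a column option rather than part of the `DEFAULT` expression.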



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

