morrySnow opened a new pull request, #63225:
URL: https://github.com/apache/doris/pull/63225

   ## Summary
   
   When a user writes a statement like `CREATE DATABASE load`, Doris previously 
produced an overwhelming ANTLR-generated error listing hundreds of expected 
tokens:
   
   ```
   mismatched input 'load' expecting {'{', '}', 'ACTIONS', 'AFTER', 
'AGG_STATE', 'AGGREGATE', ...hundreds of tokens...}(line 1, pos 16)
   ```
   
   This message is cryptic and gives no actionable guidance.
   
   ## Root Cause
   
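   `LOAD` is a reserved keyword. At the position where the parser expects an identifier (here, the database name), it receives a keyword token instead. ANTLR's default message then lists every token that would have been acceptable, "expecting {all non-reserved keywords}", which is useless to the user.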
   
   ## Research: Other Databases
   
   | Database | Error Message |
   |---|---|
   | **BigQuery** | `Syntax error: Unexpected keyword LOAD at [1:17]` (names the keyword explicitly) |
   | **PostgreSQL/DuckDB** | `syntax error at or near "load"` (short and concise) |
   | **Spark SQL** | Suggests backtick quoting for keyword-as-identifier |
   | **Trino** | Same verbose ANTLR output (same problem) |
   
   ## Fix
   
   Improved `ParseErrorListener` to:
   
   1. **Detect reserved-keyword-as-identifier errors**: When an `InputMismatchException` fires, the expected tokens contain `IDENTIFIER`/`BACKQUOTED_IDENTIFIER`, and the offending token both has a grammar literal name and looks like a word (not punctuation such as `;`), emit a targeted message.
   2. **New message format** (inspired by BigQuery + Spark):
      ```
      Syntax error near 'load': 'load' is a reserved keyword and cannot be used 
as an identifier without quoting.
      If you want to use 'load' as an identifier, please use backtick quotes: 
`load`
      (line 1, pos 16)
      ```
   3. **Trim long expected-token lists**: For other mismatch errors where the 
expected-token list exceeds 200 chars, strip the list to avoid overwhelming 
users
   4. **pom.xml**: Added default `<argLine/>` property so Maven Surefire can 
run tests without the JaCoCo coverage profile
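
The decision logic in items 1 to 3 can be sketched roughly as follows. This is an ANTLR-free illustration, not the actual `ParseErrorListener` code: the class name `KeywordErrorSketch`, its method names, and the "word" test are hypothetical, and it assumes that seeing either `IDENTIFIER` or `BACKQUOTED_IDENTIFIER` in the expected set is enough to trigger the new message.

```java
import java.util.Set;

/** Hypothetical, ANTLR-free sketch of the listener's decision logic. */
final class KeywordErrorSketch {
    // Threshold from the description: expected-token lists longer than this are dropped.
    static final int MAX_EXPECTED_LIST_CHARS = 200;

    /** True when the token text looks like a word rather than punctuation such as ';'. */
    static boolean looksLikeWord(String text) {
        return !text.isEmpty()
                && text.chars().allMatch(c -> Character.isLetterOrDigit(c) || c == '_');
    }

    /**
     * Assumption: an identifier was expected if either identifier token type
     * appears among the expected token names.
     */
    static boolean isReservedKeywordAsIdentifier(
            Set<String> expectedTokenNames, boolean hasLiteralName, String offendingText) {
        boolean identifierExpected = expectedTokenNames.contains("IDENTIFIER")
                || expectedTokenNames.contains("BACKQUOTED_IDENTIFIER");
        return identifierExpected && hasLiteralName && looksLikeWord(offendingText);
    }

    /** Builds the targeted message in the format shown in item 2. */
    static String reservedKeywordMessage(String text, int line, int pos) {
        return "Syntax error near '" + text + "': '" + text
                + "' is a reserved keyword and cannot be used as an identifier without quoting.\n"
                + "If you want to use '" + text
                + "' as an identifier, please use backtick quotes: `" + text + "`\n"
                + "(line " + line + ", pos " + pos + ")";
    }

    /** Drops the " expecting {...}" suffix when the list exceeds the threshold. */
    static String trimExpectedList(String antlrMessage) {
        String marker = " expecting ";
        int idx = antlrMessage.indexOf(marker);
        if (idx < 0) {
            return antlrMessage; // nothing to trim
        }
        String list = antlrMessage.substring(idx + marker.length());
        return list.length() > MAX_EXPECTED_LIST_CHARS
                ? antlrMessage.substring(0, idx)
                : antlrMessage;
    }
}
```

In this sketch, a call such as `KeywordErrorSketch.reservedKeywordMessage("load", 1, 16)` would produce the three-line message from item 2, while a mismatch on `;` falls through to the trimming path because `looksLikeWord` rejects it.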
   
   ## Testing
   
   Added `NereidsParserTest#testReservedKeywordAsIdentifierError`:
   - Verifies `CREATE DATABASE load` produces "reserved keyword" message with 
backtick hint
   - Verifies `CREATE DATABASE select` likewise  
   - Verifies `CREATE DATABASE \`load\`` still parses successfully
   
   Existing `testErrorListener` passes unchanged (its short expected-token list 
is under the 200-char trim threshold).
   
   ### Check List (For Author)
   
   - Test: Unit Test — added 
`NereidsParserTest#testReservedKeywordAsIdentifierError`
   - Behavior changed: Yes — parse errors for reserved-keyword-as-identifier 
show a human-friendly message instead of raw ANTLR output
   - Does this need documentation: No
   

