Re: [PR] Remove Whitespace Tokens from Parser [datafusion-sqlparser-rs]

via GitHub Tue, 11 Nov 2025 11:36:25 -0800


Viicos commented on code in PR #2077:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2077#discussion_r2515477432



##########
src/tokenizer.rs:
##########
@@ -449,29 +449,6 @@ impl Word {
     }
 }
 
-#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
-#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
-#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
-pub enum Whitespace {
-    Space,
-    Newline,
-    Tab,
-    SingleLineComment { comment: String, prefix: String },
-    MultiLineComment(String),
-}

Review Comment:
   Actually, having a comment token kind would defeat the purpose of this PR, 
because the parser would still need the same skipping logic, just for comment 
tokens instead of whitespace tokens.
   
   The Ruff parser solves this with a 
[`TokenSource`](https://github.com/astral-sh/ruff/blob/bd8812127daa556bd86fa81c9a79f5f49a2feaa8/crates/ruff_python_parser/src/token_source.rs#L11)
 struct that acts as a bridge between the lexer/tokenizer and the parser. It has [a 
couple of 
methods](https://github.com/astral-sh/ruff/blob/bd8812127daa556bd86fa81c9a79f5f49a2feaa8/crates/ruff_python_parser/src/token_source.rs#L138-L161)
 that bump the tokens while ignoring the trivia tokens (in our case, that would only 
be the comment tokens). Maybe we could take inspiration from this pattern?
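
To make the idea concrete, here is a rough sketch of what such a bridge could look like. It is only loosely inspired by the Ruff pattern and does not reproduce Ruff's actual API. The `Token` enum, the `is_trivia` helper, and the `TokenSource` methods below are all hypothetical names chosen for illustration; none of them are existing sqlparser-rs types.

```rust
/// Hypothetical, simplified token type (NOT the real sqlparser `Token`).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    Comma,
    /// Comments are the only trivia left once whitespace tokens are gone.
    Comment(String),
}

impl Token {
    /// Trivia tokens are never shown to the parser.
    fn is_trivia(&self) -> bool {
        matches!(self, Token::Comment(_))
    }
}

/// Hypothetical bridge between the tokenizer output and the parser.
pub struct TokenSource {
    tokens: Vec<Token>,
    pos: usize,
}

impl TokenSource {
    pub fn new(tokens: Vec<Token>) -> Self {
        let mut source = TokenSource { tokens, pos: 0 };
        // Make sure the first token the parser sees is not trivia.
        source.skip_trivia();
        source
    }

    /// The current non-trivia token, or `None` at end of input.
    pub fn current(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Return the current token and advance to the next non-trivia token.
    pub fn bump(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned()?;
        self.pos += 1;
        self.skip_trivia();
        Some(token)
    }

    /// Advance past any run of trivia tokens at the current position.
    fn skip_trivia(&mut self) {
        while let Some(token) = self.tokens.get(self.pos) {
            if !token.is_trivia() {
                break;
            }
            self.pos += 1;
        }
    }
}
```

With something like this, the skipping logic lives in one place, and the parser only ever calls `current()` and `bump()` without checking for comments itself.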



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

