iffyio commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2210549842
########## src/parser/mod.rs: ########## @@ -7724,6 +7737,27 @@ impl<'a> Parser<'a> { return option; } + self.with_state( + ColumnDefinition, + |parser| -> Result<Option<ColumnOption>, ParserError> { + parser.parse_optional_column_option_inner() + }, + ) + } + + fn parse_optional_column_option_inner(&mut self) -> Result<Option<ColumnOption>, ParserError> { + /// In some cases, we need to revert to [ParserState::Normal] when parsing nested expressions + /// In those cases we use the following macro to parse instead of calling [parse_expr] directly. + macro_rules! parse_expr_normal { Review Comment: Can we use a regular function for this vs a macro? In terms of fn signature, would this work just well `fn parse_column_option_expr(&mut self) -> Result<Expr, ParserError>`? then the caller can explicitly do e.g. `Ok(Some(ColumnOption::Default(self.parse_column_option()?)))` ########## src/dialect/mod.rs: ########## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> bool { false } + + /// Returns true if the dialect supports the passed in alias. + /// See [IsNotNullAlias]. + fn supports_is_not_null_alias(&self, alias: IsNotNullAlias) -> bool { Review Comment: Yeah I think the approach look reasonable thanks! Having to explicitly configure each option that accepts arbitrary expression would be the downside but that feels like an acceptable trade-off. We would indeed need to do a pass through the existing options like Ephemeral and do similar like we have for `MATERIALIZED` and `CHECK`. We also have a couple like `OnUpdate` and some in `self.parse_optional_column_option_generated()` ########## src/parser/mod.rs: ########## @@ -28,16 +28,16 @@ use helpers::attached_token::AttachedToken; use log::debug; -use recursion::RecursionCounter; -use IsLateral::*; -use IsOptional::*; - use crate::ast::helpers::stmt_create_table::{CreateTableBuilder, CreateTableConfiguration}; use crate::ast::Statement::CreatePolicy; use crate::ast::*; use crate::dialect::*; use crate::keywords::{Keyword, ALL_KEYWORDS}; use crate::tokenizer::*; +use recursion::RecursionCounter; +use sqlparser::parser::ParserState::ColumnDefinition; +use IsLateral::*; +use IsOptional::*; Review Comment: was the change intentional or autofmt it looks like only one import is being added, if the latter could we revert to a minimal diff? ########## tests/sqlparser_clickhouse.rs: ########## @@ -1705,6 +1705,15 @@ fn parse_table_sample() { clickhouse().verified_stmt("SELECT * FROM tbl SAMPLE 1 / 10 OFFSET 1 / 2"); } +#[test] +fn test_parse_not_null_in_column_options() { + // In addition to DEFAULT and CHECK ClickHouse also supports MATERIALIZED, all of which + // can contain `IS NOT NULL` and thus `NOT NULL` as an alias. + let canonical = "CREATE TABLE foo (abc INT DEFAULT (42 IS NOT NULL) NOT NULL, not_null BOOL MATERIALIZED (abc IS NOT NULL), CHECK (abc IS NOT NULL))"; + clickhouse().verified_stmt(canonical); + clickhouse().one_statement_parses_to("CREATE TABLE foo (abc INT DEFAULT (42 NOT NULL) NOT NULL, not_null BOOL MATERIALIZED (abc IS NOT NULL), CHECK (abc NOT NULL) )", canonical); Review Comment: ```suggestion clickhouse().one_statement_parses_to("CREATE TABLE foo (abc INT DEFAULT (42 NOT NULL) NOT NULL, not_null BOOL MATERIALIZED (abc NOT NULL), CHECK (abc NOT NULL) )", canonical); ``` I think this would be the intention? ########## src/parser/mod.rs: ########## @@ -16514,6 +16549,10 @@ impl<'a> Parser<'a> { Ok(None) } } + + pub fn in_normal_state(&self) -> bool { Review Comment: ```suggestion fn in_normal_state(&self) -> bool { ``` ########## src/parser/mod.rs: ########## @@ -275,6 +275,9 @@ enum ParserState { /// PRIOR expressions while still allowing prior as an identifier name /// in other contexts. ConnectBy, + /// The state when parsing column definitions. This state prohibits + /// NOT NULL as an alias for IS NOT NULL. + ColumnDefinition, Review Comment: Could we include a sql example to illustrate this? I'm wondering that folks won't be able to fully get the picture without reading the code otherwise ########## tests/sqlparser_clickhouse.rs: ########## @@ -1705,6 +1705,15 @@ fn parse_table_sample() { clickhouse().verified_stmt("SELECT * FROM tbl SAMPLE 1 / 10 OFFSET 1 / 2"); } +#[test] +fn test_parse_not_null_in_column_options() { + // In addition to DEFAULT and CHECK ClickHouse also supports MATERIALIZED, all of which + // can contain `IS NOT NULL` and thus `NOT NULL` as an alias. + let canonical = "CREATE TABLE foo (abc INT DEFAULT (42 IS NOT NULL) NOT NULL, not_null BOOL MATERIALIZED (abc IS NOT NULL), CHECK (abc IS NOT NULL))"; Review Comment: For the longer strings, in order to break them up into lines could we use this [concat pattern](https://github.com/apache/datafusion-sqlparser-rs/blob/b7d4e1e7fcf0abcbf2692485e90796b70297b08b/tests/sqlparser_clickhouse.rs#L545-L553)? ########## src/parser/mod.rs: ########## @@ -16514,6 +16549,10 @@ impl<'a> Parser<'a> { Ok(None) } } + + pub fn in_normal_state(&self) -> bool { + matches!(self.state, ParserState::Normal) Review Comment: I'm wondering should this check be `self.state != ParserState::ColumnDefinition`? Thinking that covers the enum being extended to other interactions later on (feels like the not null logic only cares that its not in this state in order to be correct)? ########## src/parser/mod.rs: ########## @@ -7724,6 +7737,27 @@ impl<'a> Parser<'a> { return option; } + self.with_state( + ColumnDefinition, + |parser| -> Result<Option<ColumnOption>, ParserError> { + parser.parse_optional_column_option_inner() + }, + ) + } + + fn parse_optional_column_option_inner(&mut self) -> Result<Option<ColumnOption>, ParserError> { + /// In some cases, we need to revert to [ParserState::Normal] when parsing nested expressions + /// In those cases we use the following macro to parse instead of calling [parse_expr] directly. + macro_rules! parse_expr_normal { + ($option:expr) => { + if matches!(self.peek_token().token, Token::LParen) { Review Comment: ```suggestion if self.peek_token_ref().token == Token::LParen { ``` ########## src/parser/mod.rs: ########## @@ -7724,6 +7737,27 @@ impl<'a> Parser<'a> { return option; } + self.with_state( + ColumnDefinition, + |parser| -> Result<Option<ColumnOption>, ParserError> { + parser.parse_optional_column_option_inner() + }, + ) + } + + fn parse_optional_column_option_inner(&mut self) -> Result<Option<ColumnOption>, ParserError> { + /// In some cases, we need to revert to [ParserState::Normal] when parsing nested expressions + /// In those cases we use the following macro to parse instead of calling [parse_expr] directly. + macro_rules! parse_expr_normal { Review Comment: Documentation wise, I'm thinking it could be helpful to illustrate why we need this function and with an example like below. And that we also callout that any column option that parses an arbitrary expression likely wants to call this variant instead of `parse_expr()`, for folks that wander around this area of the code in the future looking to add/update column options support. ```rust CREATE TABLE foo (abc (42 NOT NULL) NOT NULL); vs CREATE TABLE foo (abc 42 NOT NULL); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org