davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2130307887
@metegenez Of course on it being a community decision. It is an ASF project after all. 😄 I've probably led everyone astray by naming this "BigQuery Options" because that just happened to be what the internal AST node type was labeled as in the sqlparser crate. I'm not at all tied to that syntax or even the general shape of the syntax. It was just the easiest way I found to attach arbitrary metadata directly to each column in a `create external table` statement. Now that its using `METADATA` instead of `OPTIONS`, its not even BigQuery related anymore. @ozankabak That's an interesting point about anything that wants to support `COPY` might have to support the string based parsing. However, [given this PR](https://github.com/apache/datafusion/pull/9604), it seems like there'd be openness on creating new syntax that mimics any new syntax on the `CREATE EXTERNAL TABLE` statement. As to your three options, unless I'm mistaken (and I'll point out again, I'm very new here, so I could absolutely be missing something), but my reading of the implementation of `DFParser` vs `sqlparser::Parser` makes it seem to me like Option 3 has already been chosen (implicitly or not). Which means all we need to do is figure out what DataFusion should support and implement it. As to comparing against other dialects, perhaps I wasn't as clear as I could be. But if we assume that `sqlparser`'s existing dialects are all correctly/completely implemented then we have these options to consider: 1. Don't attach metadata directly to columns (i.e., the current DataFusion approach) 2. BigQuery's `OPTIONS` keyword in the column list 3. Some new syntax we invent to attach metadata to columns Since I've replaced `OPTIONS` with `METADATA` I guess this PR is technically in the third option, though I'd also consider a completely different syntax if someone cares enough to go and invent it. I'd even be happy to implement it if we just come up with a grammar everyone agrees on. Though for my needs a `col1 BIGINT NOT NULL METADATA(key1=val1, key2=val2, ...)` syntax is more than enough and also feels like the right level of "general enough for others while not significantly increasing parser complexity". Also, it occurs to me to point out that this syntax wouldn't be mandatory for anything. Patching the parser with this extra syntax doesn't require other `TableProviderFactory` implementations from supporting or using it. Though I'd personally vote in favor of it over the existing approach and would be more than happy to do that implementation as part of this PR. That said, I understand not wanting to rush into new syntax given backwards compatibility concerns. Though I think allowing syntax vs using syntax should be separate concerns as allowing it seems to fall squarely into the "reducing friction for downstream users" category. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org