davisp commented on PR #10590:
URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2130307887

   @metegenez Of course on it being a community decision. It is an ASF project 
after all. 😄 
   
   I've probably led everyone astray by naming this "BigQuery Options" because 
that just happened to be what the internal AST node type was labeled as in the 
sqlparser crate. I'm not at all tied to that syntax or even the general shape 
of the syntax. It was just the easiest way I found to attach arbitrary metadata 
directly to each column in a `create external table` statement. Now that its 
using `METADATA` instead of `OPTIONS`, its not even BigQuery related anymore.
   
   @ozankabak That's an interesting point about anything that wants to support 
`COPY` might have to support the string based parsing. However, [given this 
PR](https://github.com/apache/datafusion/pull/9604), it seems like there'd be 
openness on creating new syntax that mimics any new syntax on the `CREATE 
EXTERNAL TABLE` statement.
   
   As to your three options, unless I'm mistaken (and I'll point out again, I'm 
very new here, so I could absolutely be missing something), but my reading of 
the implementation of `DFParser` vs `sqlparser::Parser` makes it seem to me 
like Option 3 has already been chosen (implicitly or not). Which means all we 
need to do is figure out what DataFusion should support and implement it.
   
   As to comparing against other dialects, perhaps I wasn't as clear as I could 
be. But if we assume that `sqlparser`'s existing dialects are all 
correctly/completely implemented then we have these options to consider:
   
   1. Don't attach metadata directly to columns (i.e., the current DataFusion 
approach)
   2. BigQuery's `OPTIONS` keyword in the column list
   3. Some new syntax we invent to attach metadata to columns
   
   Since I've replaced `OPTIONS` with `METADATA` I guess this PR is technically 
in the third option, though I'd also consider a completely different syntax if 
someone cares enough to go and invent it. I'd even be happy to implement it if 
we just come up with a grammar everyone agrees on. Though for my needs a `col1 
BIGINT NOT NULL METADATA(key1=val1, key2=val2, ...)` syntax is more than enough 
and also feels like the right level of "general enough for others while not 
significantly increasing parser complexity".
   
   Also, it occurs to me to point out that this syntax wouldn't be mandatory 
for anything. Patching the parser with this extra syntax doesn't require other 
`TableProviderFactory` implementations from supporting or using it. Though I'd 
personally vote in favor of it over the existing approach and would be more 
than happy to do that implementation as part of this PR. That said, I 
understand not wanting to rush into new syntax given backwards compatibility 
concerns. Though I think allowing syntax vs using syntax should be separate 
concerns as allowing it seems to fall squarely into the "reducing friction for 
downstream users" category.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to