+1 for CREATE EXTERNAL TABLE. It is a good balance between the general SQL expectation of having tables as an abstraction and reinforcing that Beam does not store your data.
On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <[email protected]> wrote: > > I think users will be more confused to find that 'CREATE TABLE' doesn't > exist then to learn that it might not always create a table. > > >> I think that having CREATE TABLE do something unexpected or not do > something expected (or do the opposite things depending on the table type > or some flag) is worse than having users look up the correct way of > creating a data source in Beam SQL without expecting something we don't > promise. > > I agree on this. Enforcing users to look up documentation for the correct > way is better than letting them use an ambiguous way that could fail their > expectation. > > > -Rui > > On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <[email protected]> wrote: > >> I think that something unique along the lines of `REGISTER EXTERNAL DATA >> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of >> other dialects. >> >> > There is a lot of value in making sure our common operations closely >> map to the equivalent common operations in other SQL dialects. >> >> We're trying to make opposite points using the same arguments :) A lot of >> popular dialects make difference between CREATE TABLE and CREATE EXTERNAL >> TABLE (or similar): >> - T-SQL: >> create: >> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql >> create external: >> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017 >> external datasource: >> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017 >> - PL/SQL: >> create: >> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369 >> create external: >> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127 >> - postgres: >> import foreign schema: >> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html >> create table: >> https://www.postgresql.org/docs/9.1/static/sql-createtable.html >> - redshift: >> create external schema: >> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html >> create table: >> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html >> - hive internal and external: >> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables >> >> My understanding is that the behavior of create table is somewhat similar >> in all of the above dialects, from the high-level perspective it usually >> creates a persistent table in the current storage context (database). >> That's not what Beam SQL's create table does right now, and my opinion is >> that it should not be called create table for this reason. >> >> > I think users will be more confused to find that 'CREATE TABLE' >> doesn't exist then to learn that it might not always create a table. >> >> I think that having CREATE TABLE do something unexpected or not do >> something expected (or do the opposite things depending on the table type >> or some flag) is worse than having users look up the correct way of >> creating a data source in Beam SQL without expecting something we don't >> promise. >> >> > (For example, a user guessing at the syntax of CREATE TABLE would have >> a better experience with the error being "field LOCATION not specified" >> rather than "operation CREATE TABLE not found".) >> >> They have to look it up anyway (what format is location for a Pubsub >> topic? or is it a subscription?), and when doing so I think it would be >> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL >> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE >> TABLE`. >> >> External tables and schemas don't have a standard approach and I don't >> have a strong preference between any one from the above. >> >> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote: >> >>> Adding dev@ back now. >>> >>> -Rui >>> >>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]> >>> wrote: >>> >>>> Did we drop the dev list from this on purpose? (I haven't added it >>>> back, but we probably should.) >>>> >>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE >>>> SCHEMA' if there is only to be one option. Sticking with those names >>>> minimizes both our deviation from other implementations and user surprise. >>>> There is a lot of value in making sure our common operations closely map to >>>> the equivalent common operations in other SQL dialects. I think users will >>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn >>>> that it might not always create a table. This minimizes the overhead of >>>> learning our dialect of SQL and maximizes the odds that a user will be able >>>> to guess at the syntax of something and have it work. (For example, a user >>>> guessing at the syntax of CREATE TABLE would have a better experience with >>>> the error being "field LOCATION not specified" rather than "operation >>>> CREATE TABLE not found".) >>>> >>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA >>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names >>>> remove the ambiguity around the operation creating and the data source >>>> being a table. >>>> >>>> Andrew >>>> >>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> wrote: >>>> >>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE >>>>> EXTERNAL TABLE` for existing semantics. My main reasons are: >>>>> - user friendliness, matching expectations, readability. Current >>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to >>>>> users familiar with SQL who expect that `CREATE TABLE` will actually >>>>> create >>>>> a table; >>>>> - forward-compatibility. We could potentially support non-external >>>>> `CREATE TABLE` at some point in the future, whatever semantics it might >>>>> have. It will be wrong to use the same syntax for external and >>>>> non-external >>>>> CREATEs; >>>>> >>>>> I agree that typing extra word each time is not ideal, but my opinion >>>>> is on the side that readability of code (including SQL) is important (how >>>>> much time you spend reading / understanding code vs writing it) and we >>>>> should try to improve it if we can. In case of DDL every non-trivial >>>>> statement will already have a ton of unavoidable words (field names, >>>>> types, >>>>> location, options) so I would argue that adding extra one word would not >>>>> noticeably reduce your happiness of writing it :) But it would improve >>>>> readability and reduce ambiguity, which I think is worth it. >>>>> >>>>> I think that making it optional only introduces more confusion (e.g. >>>>> what's the difference between the two DDL statements without reading the >>>>> doc?) and would make situation worse. >>>>> >>>>> Regards, >>>>> Anton >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]> >>>>> wrote: >>>>> >>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to >>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making >>>>>> `EXTERNAL` as optional? >>>>>> >>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <[email protected] >>>>>> > wrote: >>>>>> >>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from >>>>>>> a documentation prospective, but I'd be really unhappy if I had to type >>>>>>> out >>>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL >>>>>>> SCHEMA'.) >>>>>>> >>>>>>> Andrew >>>>>>> >>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> wrote: >>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> I know you are probably using CREATE TABLE, Can I know your >>>>>>>> thoughts on this? >>>>>>>> >>>>>>>> -Rui >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be >>>>>>>>> better. >>>>>>>>> >>>>>>>>> "create external" is being widely used by different systems with >>>>>>>>> similar meaning so "create" usually is ok to external data sources. >>>>>>>>> >>>>>>>>> -Rui >>>>>>>>> >>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that >>>>>>>>>> present, when I was triaging post-commit tests. >>>>>>>>>> >>>>>>>>>> Do we have any terms that specify connection to external table? >>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be >>>>>>>>>> a new >>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but >>>>>>>>>> adding >>>>>>>>>> something more explicit for the task might be even better. >>>>>>>>>> >>>>>>>>>> --Mikhail >>>>>>>>>> >>>>>>>>>> Have feedback <http://go/migryz-feedback>? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We >>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE >>>>>>>>>>> (a-la >>>>>>>>>>> T-SQL >>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>), >>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or >>>>>>>>>>> restrict >>>>>>>>>>> DML statements. >>>>>>>>>>> >>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :) >>>>>>>>>>> >>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Community, >>>>>>>>>>>> >>>>>>>>>>>> BeamSQL allows CREATE TABLE >>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/> >>>>>>>>>>>> statements to register virtual tables from external storage >>>>>>>>>>>> systems (e.g. >>>>>>>>>>>> BigQuery). >>>>>>>>>>>> >>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by >>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be >>>>>>>>>>>> registered by >>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a >>>>>>>>>>>> LOCATION and >>>>>>>>>>>> BeamSQL will register the table outside of current execution >>>>>>>>>>>> environment >>>>>>>>>>>> based on LOCATION. >>>>>>>>>>>> >>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in >>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and >>>>>>>>>>>> BeamSQL >>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at >>>>>>>>>>>> least on >>>>>>>>>>>> some storage systems, if not all). >>>>>>>>>>>> >>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional. >>>>>>>>>>>> >>>>>>>>>>>> If we make the EXTERNAL keyword required: >>>>>>>>>>>> >>>>>>>>>>>> Pros: >>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE >>>>>>>>>>>> TABLE. >>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the >>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only >>>>>>>>>>>> register >>>>>>>>>>>> tables in BeamSQL. >>>>>>>>>>>> >>>>>>>>>>>> Cons: >>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing >>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes. >>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time, >>>>>>>>>>>> especially in SQL Shell. >>>>>>>>>>>> >>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed >>>>>>>>>>>> pros and cons above. >>>>>>>>>>>> >>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required >>>>>>>>>>>> or optional? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Rui >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> ---- >>>>>> Mingmin >>>>>> >>>>>
