Raising this topic once more. The PR[1] has been open for a while, if there is no further input, I'm going to merge it by end of day.
[1]: https://github.com/apache/beam/pull/6252 Thank you, Anton On Wed, Aug 15, 2018 at 10:48 PM Tim <[email protected]> wrote: > +1 for CREATE EXTERNAL TABLE with similar reasoning given by others on > this thread. > > Tim > > On 15 Aug 2018, at 23:01, Charles Chen <[email protected]> wrote: > > +1 for CREATE EXTERNAL TABLE. It is a good balance between the general > SQL expectation of having tables as an abstraction and reinforcing that > Beam does not store your data. > > On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <[email protected]> wrote: > >> > I think users will be more confused to find that 'CREATE TABLE' >> doesn't exist then to learn that it might not always create a table. >> >> >> I think that having CREATE TABLE do something unexpected or not do >> something expected (or do the opposite things depending on the table type >> or some flag) is worse than having users look up the correct way of >> creating a data source in Beam SQL without expecting something we don't >> promise. >> >> I agree on this. Enforcing users to look up documentation for the correct >> way is better than letting them use an ambiguous way that could fail their >> expectation. >> >> >> -Rui >> >> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <[email protected]> wrote: >> >>> I think that something unique along the lines of `REGISTER EXTERNAL DATA >>> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of >>> other dialects. >>> >>> > There is a lot of value in making sure our common operations closely >>> map to the equivalent common operations in other SQL dialects. >>> >>> We're trying to make opposite points using the same arguments :) A lot >>> of popular dialects make difference between CREATE TABLE and CREATE >>> EXTERNAL TABLE (or similar): >>> - T-SQL: >>> create: >>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql >>> create external: >>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017 >>> external datasource: >>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017 >>> - PL/SQL: >>> create: >>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369 >>> create external: >>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127 >>> - postgres: >>> import foreign schema: >>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html >>> create table: >>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html >>> - redshift: >>> create external schema: >>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html >>> create table: >>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html >>> - hive internal and external: >>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables >>> >>> My understanding is that the behavior of create table is somewhat >>> similar in all of the above dialects, from the high-level perspective it >>> usually creates a persistent table in the current storage context >>> (database). That's not what Beam SQL's create table does right now, and my >>> opinion is that it should not be called create table for this reason. >>> >>> > I think users will be more confused to find that 'CREATE TABLE' >>> doesn't exist then to learn that it might not always create a table. >>> >>> I think that having CREATE TABLE do something unexpected or not do >>> something expected (or do the opposite things depending on the table type >>> or some flag) is worse than having users look up the correct way of >>> creating a data source in Beam SQL without expecting something we don't >>> promise. >>> >>> > (For example, a user guessing at the syntax of CREATE TABLE would >>> have a better experience with the error being "field LOCATION not >>> specified" rather than "operation CREATE TABLE not found".) >>> >>> They have to look it up anyway (what format is location for a Pubsub >>> topic? or is it a subscription?), and when doing so I think it would be >>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL >>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE >>> TABLE`. >>> >>> External tables and schemas don't have a standard approach and I don't >>> have a strong preference between any one from the above. >>> >>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote: >>> >>>> Adding dev@ back now. >>>> >>>> -Rui >>>> >>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]> >>>> wrote: >>>> >>>>> Did we drop the dev list from this on purpose? (I haven't added it >>>>> back, but we probably should.) >>>>> >>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE >>>>> SCHEMA' if there is only to be one option. Sticking with those names >>>>> minimizes both our deviation from other implementations and user surprise. >>>>> There is a lot of value in making sure our common operations closely map >>>>> to >>>>> the equivalent common operations in other SQL dialects. I think users will >>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn >>>>> that it might not always create a table. This minimizes the overhead of >>>>> learning our dialect of SQL and maximizes the odds that a user will be >>>>> able >>>>> to guess at the syntax of something and have it work. (For example, a user >>>>> guessing at the syntax of CREATE TABLE would have a better experience with >>>>> the error being "field LOCATION not specified" rather than "operation >>>>> CREATE TABLE not found".) >>>>> >>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA >>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names >>>>> remove the ambiguity around the operation creating and the data source >>>>> being a table. >>>>> >>>>> Andrew >>>>> >>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> wrote: >>>>> >>>>>> My preference is to make `EXTERNAL` mandatory and only support >>>>>> `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are: >>>>>> - user friendliness, matching expectations, readability. Current >>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to >>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually >>>>>> create >>>>>> a table; >>>>>> - forward-compatibility. We could potentially support non-external >>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might >>>>>> have. It will be wrong to use the same syntax for external and >>>>>> non-external >>>>>> CREATEs; >>>>>> >>>>>> I agree that typing extra word each time is not ideal, but my opinion >>>>>> is on the side that readability of code (including SQL) is important (how >>>>>> much time you spend reading / understanding code vs writing it) and we >>>>>> should try to improve it if we can. In case of DDL every non-trivial >>>>>> statement will already have a ton of unavoidable words (field names, >>>>>> types, >>>>>> location, options) so I would argue that adding extra one word would not >>>>>> noticeably reduce your happiness of writing it :) But it would improve >>>>>> readability and reduce ambiguity, which I think is worth it. >>>>>> >>>>>> I think that making it optional only introduces more confusion (e.g. >>>>>> what's the difference between the two DDL statements without reading the >>>>>> doc?) and would make situation worse. >>>>>> >>>>>> Regards, >>>>>> Anton >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to >>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making >>>>>>> `EXTERNAL` as optional? >>>>>>> >>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer >>>>>>>> from a documentation prospective, but I'd be really unhappy if I had to >>>>>>>> type out 'EXTERNAL' every time. (I have the same concern with 'CREATE >>>>>>>> EXTERNAL SCHEMA'.) >>>>>>>> >>>>>>>> Andrew >>>>>>>> >>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi guys, >>>>>>>>> >>>>>>>>> I know you are probably using CREATE TABLE, Can I know your >>>>>>>>> thoughts on this? >>>>>>>>> >>>>>>>>> -Rui >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be >>>>>>>>>> better. >>>>>>>>>> >>>>>>>>>> "create external" is being widely used by different systems with >>>>>>>>>> similar meaning so "create" usually is ok to external data sources. >>>>>>>>>> >>>>>>>>>> -Rui >>>>>>>>>> >>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that >>>>>>>>>>> present, when I was triaging post-commit tests. >>>>>>>>>>> >>>>>>>>>>> Do we have any terms that specify connection to external table? >>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be >>>>>>>>>>> a new >>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but >>>>>>>>>>> adding >>>>>>>>>>> something more explicit for the task might be even better. >>>>>>>>>>> >>>>>>>>>>> --Mikhail >>>>>>>>>>> >>>>>>>>>>> Have feedback <http://go/migryz-feedback>? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We >>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE >>>>>>>>>>>> (a-la >>>>>>>>>>>> T-SQL >>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>), >>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or >>>>>>>>>>>> restrict >>>>>>>>>>>> DML statements. >>>>>>>>>>>> >>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :) >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Community, >>>>>>>>>>>>> >>>>>>>>>>>>> BeamSQL allows CREATE TABLE >>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/> >>>>>>>>>>>>> statements to register virtual tables from external storage >>>>>>>>>>>>> systems (e.g. >>>>>>>>>>>>> BigQuery). >>>>>>>>>>>>> >>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by >>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be >>>>>>>>>>>>> registered by >>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a >>>>>>>>>>>>> LOCATION and >>>>>>>>>>>>> BeamSQL will register the table outside of current execution >>>>>>>>>>>>> environment >>>>>>>>>>>>> based on LOCATION. >>>>>>>>>>>>> >>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in >>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and >>>>>>>>>>>>> BeamSQL >>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at >>>>>>>>>>>>> least on >>>>>>>>>>>>> some storage systems, if not all). >>>>>>>>>>>>> >>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional. >>>>>>>>>>>>> >>>>>>>>>>>>> If we make the EXTERNAL keyword required: >>>>>>>>>>>>> >>>>>>>>>>>>> Pros: >>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE >>>>>>>>>>>>> TABLE. >>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the >>>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only >>>>>>>>>>>>> register >>>>>>>>>>>>> tables in BeamSQL. >>>>>>>>>>>>> >>>>>>>>>>>>> Cons: >>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing >>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes. >>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time, >>>>>>>>>>>>> especially in SQL Shell. >>>>>>>>>>>>> >>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have >>>>>>>>>>>>> reversed pros and cons above. >>>>>>>>>>>>> >>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required >>>>>>>>>>>>> or optional? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Rui >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ---- >>>>>>> Mingmin >>>>>>> >>>>>>
