The change of documentation on Beam website is done (see: create-external-table <https://beam.apache.org/documentation/dsls/sql/create-external-table/>). This proposal is finally transformed into master branch. Thanks all.
-Rui On Fri, Sep 14, 2018 at 12:29 PM Anton Kedin <[email protected]> wrote: > Raising this topic once more. The PR[1] has been open for a while, if > there is no further input, I'm going to merge it by end of day. > > [1]: https://github.com/apache/beam/pull/6252 > > Thank you, > Anton > > > On Wed, Aug 15, 2018 at 10:48 PM Tim <[email protected]> wrote: > >> +1 for CREATE EXTERNAL TABLE with similar reasoning given by others on >> this thread. >> >> Tim >> >> On 15 Aug 2018, at 23:01, Charles Chen <[email protected]> wrote: >> >> +1 for CREATE EXTERNAL TABLE. It is a good balance between the general >> SQL expectation of having tables as an abstraction and reinforcing that >> Beam does not store your data. >> >> On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <[email protected]> wrote: >> >>> > I think users will be more confused to find that 'CREATE TABLE' >>> doesn't exist then to learn that it might not always create a table. >>> >>> >> I think that having CREATE TABLE do something unexpected or not do >>> something expected (or do the opposite things depending on the table type >>> or some flag) is worse than having users look up the correct way of >>> creating a data source in Beam SQL without expecting something we don't >>> promise. >>> >>> I agree on this. Enforcing users to look up documentation for the >>> correct way is better than letting them use an ambiguous way that could >>> fail their expectation. >>> >>> >>> -Rui >>> >>> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <[email protected]> wrote: >>> >>>> I think that something unique along the lines of `REGISTER EXTERNAL >>>> DATA SOURCE` is probably fine, as it doesn't conflict with existing >>>> behaviors of other dialects. >>>> >>>> > There is a lot of value in making sure our common operations closely >>>> map to the equivalent common operations in other SQL dialects. >>>> >>>> We're trying to make opposite points using the same arguments :) A lot >>>> of popular dialects make difference between CREATE TABLE and CREATE >>>> EXTERNAL TABLE (or similar): >>>> - T-SQL: >>>> create: >>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql >>>> create external: >>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017 >>>> external datasource: >>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017 >>>> - PL/SQL: >>>> create: >>>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369 >>>> create external: >>>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127 >>>> - postgres: >>>> import foreign schema: >>>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html >>>> create table: >>>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html >>>> - redshift: >>>> create external schema: >>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html >>>> create table: >>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html >>>> - hive internal and external: >>>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables >>>> >>>> My understanding is that the behavior of create table is somewhat >>>> similar in all of the above dialects, from the high-level perspective it >>>> usually creates a persistent table in the current storage context >>>> (database). That's not what Beam SQL's create table does right now, and my >>>> opinion is that it should not be called create table for this reason. >>>> >>>> > I think users will be more confused to find that 'CREATE TABLE' >>>> doesn't exist then to learn that it might not always create a table. >>>> >>>> I think that having CREATE TABLE do something unexpected or not do >>>> something expected (or do the opposite things depending on the table type >>>> or some flag) is worse than having users look up the correct way of >>>> creating a data source in Beam SQL without expecting something we don't >>>> promise. >>>> >>>> > (For example, a user guessing at the syntax of CREATE TABLE would >>>> have a better experience with the error being "field LOCATION not >>>> specified" rather than "operation CREATE TABLE not found".) >>>> >>>> They have to look it up anyway (what format is location for a Pubsub >>>> topic? or is it a subscription?), and when doing so I think it would be >>>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL >>>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE >>>> TABLE`. >>>> >>>> External tables and schemas don't have a standard approach and I don't >>>> have a strong preference between any one from the above. >>>> >>>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote: >>>> >>>>> Adding dev@ back now. >>>>> >>>>> -Rui >>>>> >>>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]> >>>>> wrote: >>>>> >>>>>> Did we drop the dev list from this on purpose? (I haven't added it >>>>>> back, but we probably should.) >>>>>> >>>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE >>>>>> SCHEMA' if there is only to be one option. Sticking with those names >>>>>> minimizes both our deviation from other implementations and user >>>>>> surprise. >>>>>> There is a lot of value in making sure our common operations closely map >>>>>> to >>>>>> the equivalent common operations in other SQL dialects. I think users >>>>>> will >>>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn >>>>>> that it might not always create a table. This minimizes the overhead of >>>>>> learning our dialect of SQL and maximizes the odds that a user will be >>>>>> able >>>>>> to guess at the syntax of something and have it work. (For example, a >>>>>> user >>>>>> guessing at the syntax of CREATE TABLE would have a better experience >>>>>> with >>>>>> the error being "field LOCATION not specified" rather than "operation >>>>>> CREATE TABLE not found".) >>>>>> >>>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL >>>>>> DATA >>>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names >>>>>> remove the ambiguity around the operation creating and the data source >>>>>> being a table. >>>>>> >>>>>> Andrew >>>>>> >>>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> My preference is to make `EXTERNAL` mandatory and only support >>>>>>> `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are: >>>>>>> - user friendliness, matching expectations, readability. Current >>>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing >>>>>>> to >>>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually >>>>>>> create >>>>>>> a table; >>>>>>> - forward-compatibility. We could potentially support non-external >>>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might >>>>>>> have. It will be wrong to use the same syntax for external and >>>>>>> non-external >>>>>>> CREATEs; >>>>>>> >>>>>>> I agree that typing extra word each time is not ideal, but my >>>>>>> opinion is on the side that readability of code (including SQL) is >>>>>>> important (how much time you spend reading / understanding code vs >>>>>>> writing >>>>>>> it) and we should try to improve it if we can. In case of DDL every >>>>>>> non-trivial statement will already have a ton of unavoidable words >>>>>>> (field >>>>>>> names, types, location, options) so I would argue that adding extra one >>>>>>> word would not noticeably reduce your happiness of writing it :) But it >>>>>>> would improve readability and reduce ambiguity, which I think is worth >>>>>>> it. >>>>>>> >>>>>>> I think that making it optional only introduces more confusion (e.g. >>>>>>> what's the difference between the two DDL statements without reading the >>>>>>> doc?) and would make situation worse. >>>>>>> >>>>>>> Regards, >>>>>>> Anton >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to >>>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making >>>>>>>> `EXTERNAL` as optional? >>>>>>>> >>>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer >>>>>>>>> from a documentation prospective, but I'd be really unhappy if I had >>>>>>>>> to >>>>>>>>> type out 'EXTERNAL' every time. (I have the same concern with 'CREATE >>>>>>>>> EXTERNAL SCHEMA'.) >>>>>>>>> >>>>>>>>> Andrew >>>>>>>>> >>>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>> I know you are probably using CREATE TABLE, Can I know your >>>>>>>>>> thoughts on this? >>>>>>>>>> >>>>>>>>>> -Rui >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be >>>>>>>>>>> better. >>>>>>>>>>> >>>>>>>>>>> "create external" is being widely used by different systems with >>>>>>>>>>> similar meaning so "create" usually is ok to external data sources. >>>>>>>>>>> >>>>>>>>>>> -Rui >>>>>>>>>>> >>>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate >>>>>>>>>>>> that present, when I was triaging post-commit tests. >>>>>>>>>>>> >>>>>>>>>>>> Do we have any terms that specify connection to external table? >>>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will >>>>>>>>>>>> be a new >>>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, >>>>>>>>>>>> but adding >>>>>>>>>>>> something more explicit for the task might be even better. >>>>>>>>>>>> >>>>>>>>>>>> --Mikhail >>>>>>>>>>>> >>>>>>>>>>>> Have feedback <http://go/migryz-feedback>? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We >>>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE >>>>>>>>>>>>> (a-la >>>>>>>>>>>>> T-SQL >>>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>), >>>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns >>>>>>>>>>>>> or restrict >>>>>>>>>>>>> DML statements. >>>>>>>>>>>>> >>>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :) >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Community, >>>>>>>>>>>>>> >>>>>>>>>>>>>> BeamSQL allows CREATE TABLE >>>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/> >>>>>>>>>>>>>> statements to register virtual tables from external storage >>>>>>>>>>>>>> systems (e.g. >>>>>>>>>>>>>> BigQuery). >>>>>>>>>>>>>> >>>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by >>>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be >>>>>>>>>>>>>> registered by >>>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a >>>>>>>>>>>>>> LOCATION and >>>>>>>>>>>>>> BeamSQL will register the table outside of current execution >>>>>>>>>>>>>> environment >>>>>>>>>>>>>> based on LOCATION. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in >>>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, >>>>>>>>>>>>>> and BeamSQL >>>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at >>>>>>>>>>>>>> least on >>>>>>>>>>>>>> some storage systems, if not all). >>>>>>>>>>>>>> >>>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If we make the EXTERNAL keyword required: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pros: >>>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE >>>>>>>>>>>>>> TABLE. >>>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in >>>>>>>>>>>>>> the future if we want CREATE TABLE to create, rather than not >>>>>>>>>>>>>> only register >>>>>>>>>>>>>> tables in BeamSQL. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cons: >>>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing >>>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes. >>>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time, >>>>>>>>>>>>>> especially in SQL Shell. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have >>>>>>>>>>>>>> reversed pros and cons above. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required >>>>>>>>>>>>>> or optional? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Rui >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ---- >>>>>>>> Mingmin >>>>>>>> >>>>>>>
