+1 to option #1.

I think it makes sense to enhance the datagen connector.
In this case, we could make "sequence" the default generation strategy for
TIMESTAMP columns, with an optional start point.
Users could still change the strategy to "constant", "random", or others.
It would be really helpful if we could support this, and that is why I
prefer #1 over #2:

CREATE TEMPORARY TABLE Orders WITH (
    'connector' = 'datagen'
) LIKE Orders (EXCLUDING ALL)
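
To make the idea more concrete, here is a rough sketch of how the strategy
and start point might be configured. 'rows-per-second', 'fields.#.kind' and
'fields.#.start' are existing datagen options, but applying 'sequence' and
'start' to a TIMESTAMP column, and the value format of the start point, are
assumptions for illustration only, not a settled design:

```sql
CREATE TEMPORARY TABLE Orders WITH (
    'connector' = 'datagen',
    'rows-per-second' = '10',
    -- assumption: 'sequence' is not supported for TIMESTAMP columns today
    'fields.order_time.kind' = 'sequence',
    -- assumption: hypothetical value format for the optional start point
    'fields.order_time.start' = '2020-07-27 00:00:00'
) LIKE Orders (EXCLUDING ALL)
```

If both 'fields.order_time.*' options were left out, the default "sequence"
strategy described above would apply.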

Regarding #2, if we want to extend the LIKE clause, maybe we can add an
option along the lines of "OVERWRITING COLUMNS".
But unless we have other strong use cases for it, I think it would make
things more complicated (the current LIKE options are already puzzling).
Maybe @Dawid Wysakowicz <dwysakow...@apache.org> has more thoughts on
this.
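
For reference, such an option might look roughly like the sketch below.
"OVERWRITING COLUMNS" is purely hypothetical syntax and does not exist in the
LIKE clause today:

```sql
CREATE TEMPORARY TABLE Orders (
    -- redefines the physical order_time column as a computed column
    order_time AS LOCALTIMESTAMP
) WITH (
    'connector' = 'datagen'
    -- hypothetical: OVERWRITING COLUMNS is not a valid LIKE option today
) LIKE Orders (EXCLUDING ALL OVERWRITING COLUMNS)
```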


Best,
Jark

On Mon, 27 Jul 2020 at 10:50, godfrey he <godfre...@gmail.com> wrote:

> Hi Seth,
> Thanks for bringing up this topic.
>
> I think the second approach is a more generic solution.
> Other connectors can also benefit from it.
> It also keeps the flexibility to generate random timestamps for some
> scenarios.
>
> Best,
> Godfrey
>
> On Fri, 24 Jul 2020 at 23:30, Seth Wiesman <sjwies...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > Currently, the datagen table source only supports a subset of Flink SQL
> > types. One missing type in particular is TIMESTAMP(3). I suspect it was
> > not added originally because random timestamps don't really make sense.
> > What you really want is for them to be ascending. In data generation use
> > cases, users typically don't care about late data. The workaround
> > proposed in the docs is to create your event time attribute using a
> > computed column.
> >
> > CREATE TABLE t (
> >     ts AS LOCALTIMESTAMP
> > ) WITH (
> >     'connector' = 'datagen'
> > )
> >
> > The problem is that this does not play well with the LIKE clause. Many
> > users do not create datagen-backed tables from scratch but instead use
> > the LIKE clause to shadow a physical table in their catalog, such as a
> > Kafka table.
> >
> > Because the LIKE clause does not allow redefining columns, there is no
> > way to do this for a table with an event time attribute. The example
> > below will fail.
> >
> > CREATE TABLE Orders (
> >     order_id   BIGINT,
> >     order_time TIMESTAMP(3),
> >     quantity   INT,
> >     cost       AS price * quantity,
> >     WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
> >     PRIMARY KEY (order_id) NOT ENFORCED
> > ) WITH (
> >     'connector' = 'kafka',
> >     'topic' = 'orders',
> >     'properties.bootstrap.servers' = 'localhost:9092',
> >     'properties.group.id' = 'orderGroup',
> >     'format' = 'csv'
> > )
> >
> > CREATE TEMPORARY TABLE Orders WITH (
> >     'connector' = 'datagen'
> > ) LIKE Orders (EXCLUDING ALL)
> >
> >
> > I see two solutions to this and would like to hear what people think.
> >
> > 1) Support TIMESTAMP in datagen tables but always supply strictly
> > ascending timestamps. The above would now "just work". This semantic
> > makes sense given the way event time attributes are used in streaming
> > applications and we can clearly document the behavior.
> >
> > 2) Relax the constraints of the LIKE clause to allow overriding physical
> > columns with computed columns. This would make it clearer to the user
> > what is happening but would require substantially higher development
> > effort and I don't know if this feature would add value beyond this one
> > use case. In practice, this would allow the following.
> >
> > CREATE TEMPORARY TABLE Orders (
> >     order_time AS LOCALTIMESTAMP
> > ) WITH (
> >     'connector' = 'datagen'
> > ) LIKE Orders (EXCLUDING ALL)
> >
> > Please let me know what you think.
> >
> > Seth
> >
>
