Hi, I'm +1 to use the naming of from/toDataStream, rather than from/toInsertStream. So we don't need to deprecate the existing `fromDataStream`.
I'm +1 to Danny's proposal: fromDataStream(dataStream, Schema) and toDataStream(table, AbstractDataType<?>) I think we can also keep the method `createTemporaryView(path, DataStream<T>)`. I don't have a strong opinion on deprecating fromDataStream(datastream, exprs), but slightly prefer to keep them. Best, Jark On Wed, 9 Sep 2020 at 14:34, Danny Chan <yuzhao....@gmail.com> wrote: > Thanks for the summary Timo ~ > > I want to clarify a little bit, what is the conclusion about the > fromChangelogStream and toChangelogStream, should we use this name or we > use fromDataStream with an optional ChangelogMode flag ? > > Best, > Danny Chan > 在 2020年9月8日 +0800 PM8:22,Timo Walther <twal...@apache.org>,写道: > > Hi Danny, > > > > Your proposed signatures sound good to me. > > > > fromDataStream(dataStream, Schema) > > toDataStream(table, AbstractDataType<?>) > > > > They address all my concerns. The API would not be symmetric anymore, > > but this is fine with me. Others raised concerns about deprecating > > `fromDataStream(dataStream, Expression)`. Are they fine with this as > well? > > > > If there are no objections, I would update the FLIP with the methods > > above. Bu let me briefly summarize my thoughts on this again, so that we > > are all on the same page: > > - The biggest discussion point seems the fromInsertStream/toInsertStream. > > - I don’t have a strong opinion on naming, from/toDataStream would be > > fine for me. But: > > - It slightly different type mappings and might break existing pipelines > > silently. This point can be neglected as the differences are only minor. > > - We need a way of declaring the rowtime attribute but without declaring > > all columns again. Reduce manual schema work as much as possible. > > - Both Dawid and I don’t like the current either “position based” or > > “name based” expression logic that looks like a projection but is not. > > - Actually name based expressions are not necessary, since we have > > positions for all new data types. > > - Schema is not suitable to influence the output type for toDataStream. > > It should be DataType. > > > > All items are solved by Danny's suggestion. > > > > Regards, > > Timo > > > > > > > > On 08.09.20 14:04, Danny Chan wrote: > > > Hi, Timo ~ > > > > > > "It is not about changelog mode compatibility, it is about the type > > > compatibility.” > > > > > > For fromDataStream(dataStream, Schema), there should not be > compatibility problem or data type inconsistency. We know the logical type > from Schema and physical type from the dataStream itself. > > > > > > For toDataStream(table, AbstractDataType<?>), we can get the logical > type from the table itself > > > and the physical type from the passed data type. > > > > > > If both behavior are deterministic, what's the problem for type > compatibility and safety? > > > > > > My concern is that in most of the cases, people use the "insert > stream", they do not need to care about > > > the data stream ChangelogMode, so there is no need to distinguish them > from the APIs, an optional param is enough. If we introduces 2 new API > there, people have to choose between them, and can fromChangelogStream() > > > accept an insert stream ? What is the behavior if fromInsertStream() > accepts a changelog stream ? > > > > > > > > > "This means potentially duplicate definition of fields and their data > types etc” > > > > > > I agree, because table already has an underlying schema there. > > > > > > Best, > > > Danny Chan > > > 在 2020年9月3日 +0800 PM8:12,Timo Walther <twal...@apache.org>,写道: > > > > Hi Danny, > > > > > > > > "if ChangelogMode.INSERT is the default, existing pipelines should be > > > > compatible" > > > > > > > > It is not about changelog mode compatibility, it is about the type > > > > compatibility. The renaming to `toInsertStream` is only to have a > mean > > > > of dealing with data type inconsistencies that could break existing > > > > pipelines. > > > > > > > > As the FLIP describes, the following new behavior should be > implemented: > > > > > > > > - It does this by translating the TypeInformation to DataType. > > > > - This will happen with a new TypeInfoDataTypeConverter that will no > > > > longer produce LegacyTypeInformationType. > > > > - All types from DataStream API should be supported by this > converter. > > > > - TupleTypeInfoBase will be translated into a proper RowType or > > > > StructuredType. > > > > - BigDecimals will be converted to DECIMAL(38,18) by default. > > > > - Composite types (tuples, POJOs, rows) will be flattened by default > if > > > > they are used as top-level records (similar to the old behavior). > > > > - The order of POJO field's is determined by the DataTypeExtractor > and > > > > must not be defined manually anymore. > > > > - GenericTypeInfo is converted to RawType immediately by considering > the > > > > current configuration. > > > > - A DataStream that originated from Table API will keep its DataType > > > > information due to ExternalTypeInfo implementing DataTypeQueryable. > > > > > > > > I would feel safer if we do this under a new method name. > > > > > > > > "toDataStream(table, schema.bindTo(DataType))" > > > > > > > > This is what I meant with "integrate the DataType into the Schema > class > > > > itself". Yes, we can do that if everybody is fine with it. But why > > > > should a user specify both a schema and a data type? This means > > > > potentially duplicate definition of fields and their data types etc. > > > > > > > > Regards, > > > > Timo > > > > > > > > > > > > On 03.09.20 11:31, Danny Chan wrote: > > > > > "It is a more conservative approach to introduce that in a > > > > > new method rather than changing the existing one under the hood and > > > > > potentially break existing pipelines silently” > > > > > > > > > > I like the idea actually, but if ChangelogMode.INSERT is the > default, existing pipelines should be compatible. We can see the other > kinds of ChangelogMode as an extension. > > > > > > > > > > “for `toDataStream` users need to be > > > > > able to express whether they would prefer Row, POJO or atomic” > > > > > > > > > > I think most of the cases people do not need to convert the stream > to a Row or POJO, because the table projection always returns a flatternned > internal row, if people did want a POJO there, how about we bind the > DataType to the existing schema, like this > > > > > > > > > > toDataStream(table, schema.bindTo(DataType)) > > > > > > > > > > Best, > > > > > Danny Chan > > > > > 在 2020年9月3日 +0800 PM3:18,dev@flink.apache.org,写道: > > > > > > > > > > > > It is a more conservative approach to introduce that in a > > > > > > new method rather than changing the existing one under the hood > and > > > > > > potentially break existing pipelines silently > > > > > > > > > > > > > > >