Hi Jark,

the fieldNames map is not intended for users. I would also be fine to make it a default scope constructor and access it with some internal utility class next to the Row class. The fieldNames map must only be used by serializers and converters. A user has no benefit in using it.

For the creation of new rows (without reusing, which only advanced users usually do), I don't see a benefit of having:

final Row reuse = new Row(Arrays.asList("myField", "myOtherField"))
reuse.setField("myField", 12);
reuse.setField("myOtherField", "This is a test");

The purpose of Row.withName() is too create a Row easily and readable without declaring 50+ column names or dealing with indices in this range.

Personally, I would like to make Row an interface and have concrete row implementations for different purposes but this would break existing programs too much.

What do you think?

Regards,
Timo


On 18.09.20 11:04, Jark Wu wrote:
Personally I think the fieldNames Map is confusing and not handy.
I just have an idea but not sure what you think.
What about adding a new constructor with List field names, this enables all
name-based setter/getters.
Regarding to List -> Map cost for every record, we can suggest users to
reuse the Row in the task.

new Row(int arity)
new Row(List<String> fieldNames)

final Row reuse = new Row(Arrays.asList("myField", "myOtherField"))
reuse.setField("myField", 12);
reuse.setField("myOtherField", "This is a test");

My point is that, if we can have a handy constructor for named Row, we may
not need to distinguish the named-only or positionAndNamed mode.
This can avoid (fast-fail) the potential problem when setting an invalid
field.

We can also come up with a new class for the field names which will
construct the Map and be shared among all Row instances.

What do you think?

Best,
Jark

On Thu, 17 Sep 2020 at 16:48, Timo Walther <twal...@apache.org> wrote:

Hi everyone,

thanks for all the feedback. I updated the FLIP again on Thursday to
integrate the feedback I got from Jingsong and Jark offline. In
particular I updated the `Improve dealing with Row in DataStream API`
section another time. We introduced static methods for Row that should
make the semantics clear to users:

// allows to use index-based setters and getters (equivalent to new
Row(int))
// method exists for completeness
public static withPositions(int length);

// allows to use name-based setters and getters
public static withNames();

// allows to use both name-based and position-based setters and getters
public static withNamesAndPositions(Map<String, Integer> fieldNames);

In any case, non of the existing methods will be deprecated and only
additional functionality will be available through the methods above.

I started a voting thread on Friday. Please feel free to vote.

Regards,
Timo

On 10.09.20 10:21, Danny Chan wrote:
Thanks for driving this Timo, +1 for voting ~

Best,
Danny Chan
在 2020年9月10日 +0800 PM3:54,Timo Walther <twal...@apache.org>,写道:
Thanks everyone for this healthy discussion. I updated the FLIP with the
outcome. I think the result is one of the last core API refactoring and
users will be happy to have a consistent changelog support. Thanks for
all the contributions.

If there are no objections, I would continue with a voting.

What do you think?

Regards,
Timo


On 09.09.20 14:31, Danny Chan wrote:
Thanks, i'm fine with that.


Timo Walther <twal...@apache.org> 于2020年9月9日周三 下午7:02写道:

I agree with Jark. It reduces confusion.

The DataStream API doesn't know changelog processing at all. A
DataStream of Row can be used with both `fromDataStream` and
`fromChangelogStream`. But only the latter API will interpret it as a
changelog something.

And as I mentioned before, the `toChangelogStream` must work with Row
otherwise users are confused due to duplicate records with a missing
changeflag.

I will update the FLIP-136 a last time. I hope we can then continue
to a
vote.

Regards,
Timo


On 09.09.20 10:50, Danny Chan wrote:
I think it would bring in much confusion by a different API name just
because the DataStream generic type is different.
If there are ChangelogMode that only works for Row, can we have a
type
check there ?

Switch to a new API name does not really solve the problem well,
people
still need to declare the ChangelogMode explicitly, and there are some
confusions:

• Should DataStream of Row type always use #fromChangelogStream ?
• Does fromChangelogStream works for only INSERT ChangelogMode ?


Best,
Danny Chan
在 2020年9月9日 +0800 PM4:21,Timo Walther <twal...@apache.org>,写道:
I had this in the inital design, but Jark had concerns at least for
the
`toChangelogStream(ChangelogMode)` (see earlier discussion).

`fromDataStream(dataStream, schema, changelogMode)` would be
possible.

But in this case I would vote for a symmetric API. If we keep
toChangelogStream we should also have a fromChangelogStream.

And if we unify `toChangelogStream` and `toDataStream`, retractions
cannot be represented for non-Rows and users will experience
duplicate
records with a missing changeflag.

Regards,
Timo


On 09.09.20 09:31, Danny Chan wrote:
“But I think the planner needs to
know whether the input is insert-only or not.”

Does fromDataStream(dataStream, schema, changelogMode)

solve your concerns ? People can pass around whatever ChangelogMode
they like as an optional param.
By default: fromDataStream(dataStream, schema), the ChangelogMode
is
INSERT.

Best,
Danny Chan
在 2020年9月9日 +0800 PM2:53,dev@flink.apache.org,写道:

But I think the planner needs to
know whether the input is insert-only or not.












Reply via email to