[jira] [Created] (FLINK-21346) Tasks on AZP should timeout more gracefully

2021-02-09 Thread Arvid Heise (Jira)
Arvid Heise created FLINK-21346:
---

 Summary: Tasks on AZP should timeout more gracefully
 Key: FLINK-21346
 URL: https://issues.apache.org/jira/browse/FLINK-21346
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Affects Versions: 1.13.0
Reporter: Arvid Heise
Assignee: Arvid Heise


Currently, whenever a non-e2e test times out, we are left with little 
information on what went wrong. The e2e tests have a good watchdog that 
uploads debugging output (stack traces and logs) on timeout.

 

The goal of this ticket is to unify the treatment of tests and e2e tests and 
always upload stack traces and logs on timeout.
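
A rough sketch of the watchdog idea (the variable and helper names below are 
hypothetical, not taken from the actual CI scripts):
{code}
# Start a background watchdog that collects debug data shortly before the
# CI-level timeout would kill the job, then run the tests as usual.
(
  sleep $(( (TIMEOUT_MINUTES - 2) * 60 ))
  for pid in $(jps -q); do
    jstack "$pid" > "$DEBUG_DIR/jstack-$pid.txt" 2>/dev/null
  done
  cp -r "$FLINK_LOG_DIR" "$DEBUG_DIR/"
) &
WATCHDOG_PID=$!
run_tests
TEST_EXIT=$?
kill "$WATCHDOG_PID" 2>/dev/null
exit $TEST_EXIT
{code}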



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Flink upsert-kafka connector not working with Avro confluent

2021-02-09 Thread Svend Vanderveken
Hi Shamit,

In this snippet:

Table definition with upsert-kafka is below (not working):
>
> CREATE TABLE proposalLine (
>   PROPOSAL_LINE_ID BIGINT,
>   LAST_MODIFIED_BY STRING,
>   PRIMARY KEY (PROPOSAL_LINE_ID) NOT ENFORCED
> ) WITH (
>   'connector' = 'upsert-kafka',
>   'properties.bootstrap.servers' = 'localhost:9092',
>   'properties.auto.offset.reset' = 'earliest',
>   'topic' = 'lndcdcadsprpslproposalline',
>   'key.format' = 'avro',
>   'value.format' = 'avro',
>   'properties.group.id' = 'dd',
>   'properties.schema.registry.url' = 'http://localhost:8081',
>   'properties.key.deserializer' = 'org.apache.kafka.common.serialization.LongDeserializer',
>   'properties.value.deserializer' = 'io.confluent.kafka.serializers.KafkaAvroDeserializer'
> )
>

I believe this part of the connector configuration needs to be updated:

'key.format' = 'avro',
'value.format' = 'avro',

to this:

'key.format' = 'avro-confluent',
'value.format' = 'avro-confluent'

Also, the 'properties.key.deserializer' and 'properties.value.deserializer'
options should not be specified. They actually have no effect, since Flink will
override them based on the information we provide in the 'key.format' and
'value.format' entries (see here:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/kafka.html#properties
).
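
Putting both changes together, the table definition would look roughly like
this (a sketch only; per the avro-confluent documentation linked below, the
schema registry URL moves into the format options, so please double-check
the exact option keys for your Flink version):

CREATE TABLE proposalLine (
  PROPOSAL_LINE_ID BIGINT,
  LAST_MODIFIED_BY STRING,
  PRIMARY KEY (PROPOSAL_LINE_ID) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'lndcdcadsprpslproposalline',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'avro-confluent',
  'key.avro-confluent.schema-registry.url' = 'http://localhost:8081',
  'value.format' = 'avro-confluent',
  'value.avro-confluent.schema-registry.url' = 'http://localhost:8081'
);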

In the next version of the documentation, there will be examples of how to
use the Confluent Avro format for both the key and the value, including an
example with the upsert-kafka connector. You can already preview it here:

https://ci.apache.org/projects/flink/flink-docs-release-1.13/dev/table/connectors/formats/avro-confluent.html#how-to-create-tables-with-avro-confluent-format

I hope this helps,

Cheers,

Svend

On Sun, Feb 7, 2021 at 6:22 AM shamit jain  wrote:

> Hello Team,
>
> I am facing an issue with the "upsert-kafka" connector, which should read the
> Avro messages serialized using
> "io.confluent.kafka.serializers.KafkaAvroDeserializer". The same is working
> with the "kafka" connector.
>
> It looks like we are not able to pass the schema registry URL and subject
> name the way we do when using the "kafka" connector.
>
> Please help.
>
>
> Table definition with upsert-kafka is below (not working):
>
> CREATE TABLE proposalLine (
>   PROPOSAL_LINE_ID BIGINT,
>   LAST_MODIFIED_BY STRING,
>   PRIMARY KEY (PROPOSAL_LINE_ID) NOT ENFORCED
> ) WITH (
>   'connector' = 'upsert-kafka',
>   'properties.bootstrap.servers' = 'localhost:9092',
>   'properties.auto.offset.reset' = 'earliest',
>   'topic' = 'lndcdcadsprpslproposalline',
>   'key.format' = 'avro',
>   'value.format' = 'avro',
>   'properties.group.id' = 'dd',
>   'properties.schema.registry.url' = 'http://localhost:8081',
>   'properties.key.deserializer' = 'org.apache.kafka.common.serialization.LongDeserializer',
>   'properties.value.deserializer' = 'io.confluent.kafka.serializers.KafkaAvroDeserializer'
> )
>
> ERROR:
>  Caused by: java.io.IOException: Failed to deserialize Avro record.
> at
> org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:101)
> at
> org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:44)
> at
> org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:82)
> at
> org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:130)
> at
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:179)
> at
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.runFetchLoop(KafkaFetcher.java:142)
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:826)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
> at
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
> at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
> at
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
> at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
> at
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> at
> org.apache.avro.g

[jira] [Created] (FLINK-21345) NullPointerException LogicalCorrelateToJoinFromTemporalTableFunctionRule.scala:157

2021-02-09 Thread lynn1.zhang (Jira)
lynn1.zhang created FLINK-21345:
---

 Summary: NullPointerException 
LogicalCorrelateToJoinFromTemporalTableFunctionRule.scala:157
 Key: FLINK-21345
 URL: https://issues.apache.org/jira/browse/FLINK-21345
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.12.1
 Environment: Planner: BlinkPlanner
Flink Version: 1.12.1_2.11
Java Version: 1.8
OS: mac os
Reporter: lynn1.zhang


First Step: Create 2 Source Tables as below:
{code:java}
CREATE TABLE test_streaming(
 vid BIGINT,
 ts BIGINT,
 proc AS proctime()
) WITH (
 'connector' = 'kafka',
 'topic' = 'test-streaming',
 'properties.bootstrap.servers' = '127.0.0.1:9092',
 'scan.startup.mode' = 'latest-offset',
 'format' = 'json'
);
CREATE TABLE test_streaming2(
 vid BIGINT,
 ts BIGINT,
 proc AS proctime()
) WITH (
 'connector' = 'kafka',
 'topic' = 'test-streaming2',
 'properties.bootstrap.servers' = '127.0.0.1:9092',
 'scan.startup.mode' = 'latest-offset',
 'format' = 'json'
);
{code}
Second Step: Create a TEMPORARY table function, with function name {{dim}}, key 
{{vid}}, and timestamp {{proctime()}} (a minimal sketch follows).
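
A minimal sketch of this step (the dim table itself is not shown in this 
report; the table name {{dim_table}} and the {{tEnv}} variable, a 
StreamTableEnvironment, are assumptions):
{code:java}
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.functions.TemporalTableFunction;

// Build a temporal table function over the (assumed) dim_table, versioned
// by the processing-time attribute proc and keyed on vid.
TemporalTableFunction dim = tEnv
        .from("dim_table")
        .createTemporalTableFunction($("proc"), $("vid"));

// Temporal table functions are registered via the legacy function API.
tEnv.registerFunction("dim", dim);
{code}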

Third Step: {{test_streaming}} UNION ALL {{test_streaming2}}, joined with {{dim}} as below:

{code:java}
SELECT r.vid,d.name,timestamp_from_long(r.ts)
FROM (
SELECT * FROM test_streaming UNION ALL SELECT * FROM test_streaming2
) AS r,
LATERAL TABLE (dim(r.proc)) AS d
WHERE r.vid = d.vid;
{code}
Exception Detail (without the UNION ALL, the program runs OK):

{code:java}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromTemporalTableFunctionRule.getRelOptSchema(LogicalCorrelateToJoinFromTemporalTableFunctionRule.scala:157)
at 
org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromTemporalTableFunctionRule.onMatch(LogicalCorrelateToJoinFromTemporalTableFunctionRule.scala:99)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:155)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:155)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:155)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:155)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:155)
at scala.collection.immutable.Range.foreach(Range.scala:166)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:155)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58)
at 
scala.collection.TraversableOnce$$anon

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Shengkai Fang
Hi, guys.

I have updated the FLIP.  It seems we have reached agreement. Maybe we can
start the vote soon. If anyone has other questions, please leave your
comments.

Best,
Shengkai

On Tue, Feb 9, 2021 at 7:52 PM Rui Li wrote:

> Hi guys,
>
> The conclusion sounds good to me.
>
> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang  wrote:
>
> > Hi, Timo, Jark.
> >
> > I am fine with the new option name.
> >
> > Best,
> > Shengkai
> >
> > On Tue, Feb 9, 2021 at 5:35 PM Timo Walther wrote:
> >
> > > Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > >
> > > @Rui, Shengkai: Are you also fine with this conclusion?
> > >
> > > Thanks,
> > > Timo
> > >
> > > On 09.02.21 10:14, Jark Wu wrote:
> > > > I'm fine with `table.multi-dml-sync`.
> > > >
> > > > My previous concern about "multi" is that DML in the CLI looks like
> > > > a single statement.
> > > > But we can treat the CLI as a multi-line session accepting statements
> > > > from opening to closing.
> > > > Thus, I'm fine with `table.multi-dml-sync`.
> > > >
> > > > So the conclusion is `table.multi-dml-sync` (false by default), and
> we
> > > will
> > > > support this config
> > > > in SQL CLI first, will support it in
> TableEnvironment#executeMultiSql()
> > > in
> > > > the future, right?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 9 Feb 2021 at 16:37, Timo Walther 
> wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I understand Rui's concerns. `table.dml-sync` should not apply to
> > > >> regular `executeSql`. Actually, this option only makes sense when
> > > >> executing multiple statements. Once we have a
> > > >> `TableEnvironment.executeMultiSql()` this config could be
> considered.
> > > >>
> > > >> Maybe we can find a better generic name? Other platforms will also
> > need
> > > >> to have this config option, which is why I would like to avoid a SQL
> > > >> Client specific option. Otherwise every platform has to come up with
> > > >> this important config option separately.
> > > >>
> > > >> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other
> > > >> opinions?
> > > >>
> > > >> Regards,
> > > >> Timo
> > > >>
> > > >> On 09.02.21 08:50, Shengkai Fang wrote:
> > > >>> Hi, all.
> > > >>>
> > > >>> I think it may confuse users. The main problem is that we have no
> > > >>> means to detect conflicting configuration, e.g. users set the option
> > > >>> to true and use `TableResult#await` together.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai.
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>


Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Jark Wu
Hi Timo,

1) I'm fine with `Column`, but are we going to introduce new interfaces
for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce
a new stack, it would be better to have different names; otherwise it's
easy for users to pick the wrong class.

Best,
Jark

On Wed, 10 Feb 2021 at 09:49, Rui Li  wrote:

> I see. Makes sense to me. Thanks Timo for the detailed explanation!
>
> On Tue, Feb 9, 2021 at 9:48 PM Timo Walther  wrote:
>
> > Hi Rui,
> >
> > 1. It depends whether you would like to declare (unresolved) or use
> > (resolved) a schema. In catalogs and APIs, people would actually like to
> > declare a schema. Because the schema might reference objects from other
> > catalogs etc. However, whenever the schema comes out of the framework it
> > is fully resolved and people can use it to configure their UI, connector,
> etc.
> > 2. No, `getTable` doesn't have to return a resolved schema. Actually,
> > this was my initial design (see Rejected Alternatives 1) where we pass
> > the SchemaResolver into the Catalog. However, a catalog must not deal
> > with resolution. When storing a table we need a resolved schema to
> > persist the fully expanded properties, however, when reading those
> > properties in again the schema can be resolved in a later stage.
> >
> > Regards,
> > Timo
> >
> > On 09.02.21 14:07, Rui Li wrote:
> > > Hi Timo,
> > >
> > > Thanks for the FLIP. It looks good to me overall. I have two questions.
> > > 1. When should we use a resolved schema and when to use an unresolved
> > one?
> > > 2. The FLIP mentions only resolved tables/views can be stored into a
> > > catalog. Does that mean the getTable method should also return a
> resolved
> > > object?
> > >
> > > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther 
> wrote:
> > >
> > >> Hi Jark,
> > >>
> > >> thanks for your feedback. Let me answer some of your comments:
> > >>
> > >> 1) Since we decided to build an entire new stack, we can also
> introduce
> > >> better names for columns, constraints, and watermark spec. My goal was
> > >> to shorten the names during this refactoring. Therefore, `TableSchema`
> > >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits
> > >> better to a `CatalogView` that has a schema but is actually not a
> table
> > >> but a view. So `Column` is very generic. What do you think?
> > >>
> > >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will
> store
> > >> `ResolvedExpression`.
> > >>
> > >> 3) I adopted most of the methods from `TableSchema` in
> `ResolvedSchema`.
> > >> However, I skipped `getColumnDataTypes()` because the behavior is not
> > >> clear to me. Should it include computed columns or virtual metadata
> > >> columns? I think we should force users to think about what they
> require.
> > >> Otherwise we implicitly introduce bugs.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >> On 09.02.21 10:56, Jark Wu wrote:
> > >>> Hi Timo,
> > >>>
> > >>> The messy TableSchema confuses many developers.
> > > >>> It's great to see we can finally come up with a clean interface
> > > >>> hierarchy and still stay backward compatible.
> > >>>
> > >>> Thanks for preparing the nice FLIP. It looks good to me. I have some
> > >> minor
> > >>> comments:
> > >>>
> > > >>> 1) Should `ResolvedSchema#getColumn(int)` return `TableColumn`
> > > >>> instead of `Column`?
> > >>>
> > >>> 2) You mentioned ResolvedSchema should store ResolvedExpression,
> should
> > >> we
> > >>> extend
> > >>> `ComputedColumn` and `WatermarkSpec` to allow
> `ResolvedExpression`?
> > >>>
> > >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better
> > to
> > >>> add un-deprecated
> > >>> methods of `TableSchema` into `ResolvedSchema`
> > >>> (e.g. `getColumnDataTypes()`).
> > >>> Then users can have a smooth migration.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <
> dwysakow...@apache.org>
> > >>> wrote:
> > >>>
> >  Hi Timo,
> > 
> >    From my perspective the proposed changes look good. I agree it is
> an
> >  important step towards FLIP-129 and FLIP-136. Personally I feel
> >  comfortable voting on the document.
> > 
> >  Best,
> > 
> >  Dawid
> > 
> >  On 05/02/2021 16:09, Timo Walther wrote:
> > > Hi everyone,
> > >
> > > you might have seen that we discussed a better schema API in the past
> > > as part of FLIP-129 and FLIP-136. We also discussed this topic during
> > > different releases:
> > >
> > > https://issues.apache.org/jira/browse/FLINK-17793
> > >
> > > Jark and I had an offline discussion about how we can finally fix this
> > > shortcoming and maintain backwards compatibility for a couple of
> > > releases to give people time to update their code.
> > >
> > > I would like to propose the following FLIP:
> > >
> > >
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs

Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Rui Li
I see. Makes sense to me. Thanks Timo for the detailed explanation!

On Tue, Feb 9, 2021 at 9:48 PM Timo Walther  wrote:

> Hi Rui,
>
> 1. It depends whether you would like to declare (unresolved) or use
> (resolved) a schema. In catalogs and APIs, people would actually like to
> declare a schema. Because the schema might reference objects from other
> catalogs etc. However, whenever the schema comes out of the framework it
> is fully resolved and people can use it to configure their UI, connector, etc.
> 2. No, `getTable` doesn't have to return a resolved schema. Actually,
> this was my initial design (see Rejected Alternatives 1) where we pass
> the SchemaResolver into the Catalog. However, a catalog must not deal
> with resolution. When storing a table we need a resolved schema to
> persist the fully expanded properties, however, when reading those
> properties in again the schema can be resolved in a later stage.
>
> Regards,
> Timo
>
> On 09.02.21 14:07, Rui Li wrote:
> > Hi Timo,
> >
> > Thanks for the FLIP. It looks good to me overall. I have two questions.
> > 1. When should we use a resolved schema and when to use an unresolved
> one?
> > 2. The FLIP mentions only resolved tables/views can be stored into a
> > catalog. Does that mean the getTable method should also return a resolved
> > object?
> >
> > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther  wrote:
> >
> >> Hi Jark,
> >>
> >> thanks for your feedback. Let me answer some of your comments:
> >>
> >> 1) Since we decided to build an entire new stack, we can also introduce
> >> better names for columns, constraints, and watermark spec. My goal was
> >> to shorten the names during this refactoring. Therefore, `TableSchema`
> >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits
> >> better to a `CatalogView` that has a schema but is actually not a table
> >> but a view. So `Column` is very generic. What do you think?
> >>
> >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store
> >> `ResolvedExpression`.
> >>
> >> 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`.
> >> However, I skipped `getColumnDataTypes()` because the behavior is not
> >> clear to me. Should it include computed columns or virtual metadata
> >> columns? I think we should force users to think about what they require.
> >> Otherwise we implicitly introduce bugs.
> >>
> >> Regards,
> >> Timo
> >>
> >> On 09.02.21 10:56, Jark Wu wrote:
> >>> Hi Timo,
> >>>
> >>> The messy TableSchema confuses many developers.
> >>> It's great to see we can finally come up with a clean interface
> >>> hierarchy and still stay backward compatible.
> >>>
> >>> Thanks for preparing the nice FLIP. It looks good to me. I have some
> >> minor
> >>> comments:
> >>>
> >>> 1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead
> >>> of `Column`?
> >>>
> >>> 2) You mentioned ResolvedSchema should store ResolvedExpression, should
> >> we
> >>> extend
> >>> `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?
> >>>
> >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better
> to
> >>> add un-deprecated
> >>> methods of `TableSchema` into `ResolvedSchema`
> >>> (e.g. `getColumnDataTypes()`).
> >>> Then users can have a smooth migration.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
> >>> wrote:
> >>>
>  Hi Timo,
> 
>    From my perspective the proposed changes look good. I agree it is an
>  important step towards FLIP-129 and FLIP-136. Personally I feel
>  comfortable voting on the document.
> 
>  Best,
> 
>  Dawid
> 
>  On 05/02/2021 16:09, Timo Walther wrote:
> > Hi everyone,
> >
> > you might have seen that we discussed a better schema API in the past as
> > part of FLIP-129 and FLIP-136. We also discussed this topic during
> > different releases:
> >
> > https://issues.apache.org/jira/browse/FLINK-17793
> >
> > Jark and I had an offline discussion about how we can finally fix this
> > shortcoming and maintain backwards compatibility for a couple of
> > releases to give people time to update their code.
> >
> > I would like to propose the following FLIP:
> >
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
> >
> >
> > The FLIP updates the class hierarchy to achieve the following goals:
> >
> > - make it visible whether a schema is resolved or unresolved and when
> > the resolution happens
> > - offer a unified API for FLIP-129, FLIP-136, and catalogs
> > - allow arbitrary data types and expressions in the schema for
> > watermark spec or columns
> > - have access to other catalogs for declaring a data type or
> > expression via CatalogManager
> > - a cleaned up TableSchema
> > - remain backwards compatible in the persisted properties and API
> >
> >>

[jira] [Created] (FLINK-21344) Support switching from/to rocks db with heap timers

2021-02-09 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21344:


 Summary: Support switching from/to rocks db with heap timers
 Key: FLINK-21344
 URL: https://issues.apache.org/jira/browse/FLINK-21344
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.13.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21343) Update documentation with possible migration strategies

2021-02-09 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21343:


 Summary: Update documentation with possible migration strategies
 Key: FLINK-21343
 URL: https://issues.apache.org/jira/browse/FLINK-21343
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Runtime / State Backends
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.13.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21342) Various classes extending TestBases cannot be instantiated

2021-02-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-21342:


 Summary: Various classes extending TestBases cannot be instantiated
 Key: FLINK-21342
 URL: https://issues.apache.org/jira/browse/FLINK-21342
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.13.0


Various tests (mostly serializer tests) use a pattern where they have an inner 
class extending a test base. When the naming conventions are relaxed these are 
picked up as test classes, but they lack a public constructor.

Putting aside how janky this approach is, we can add {{@Ignore}} as a workaround.
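
A sketch of the workaround (the class names are made up for illustration):
{code:java}
import org.junit.Ignore;

public class MySerializerUpgradeTest {

    // The relaxed naming rules would pick this inner class up as a test
    // class; @Ignore keeps JUnit from trying to instantiate and run it
    // directly, while the enclosing test can still use it as before.
    @Ignore
    static class InnerSerializerTest extends SomeSerializerTestBase {
    }
}
{code}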



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21341) Update state reader to return InputFormat

2021-02-09 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-21341:


 Summary: Update state reader to return InputFormat
 Key: FLINK-21341
 URL: https://issues.apache.org/jira/browse/FLINK-21341
 Project: Flink
  Issue Type: Improvement
  Components: API / State Processor
Affects Versions: 1.13.0
Reporter: Seth Wiesman


The state processor API currently takes an ExecutionEnvironment on read and 
returns a DataSet. As Flink now supports bounded DataStream, we want to support 
that as well without having to maintain parallel APIs. To that end, we propose 
the following.

Introduce a new `load` method to Savepoint.

 
{code:java}
SavepointReader Savepoint#load(String path, StateBackend backend);
{code}
SavepointReader will contain the same read methods as ExistingSavepoint but 
they will instead return an InputFormat. This way the input format can be 
used with either the DataSet, DataStream, or Table API.
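
A hypothetical usage sketch (the method names follow the proposal above; 
exact signatures are not final):
{code:java}
SavepointReader reader = Savepoint.load("hdfs://checkpoints/savepoint-1", new MemoryStateBackend());

// The reader hands back an InputFormat instead of a DataSet ...
InputFormat<MyPojo, ?> format = reader.readListState("my-uid", "state-name", Types.POJO(MyPojo.class));

// ... which either API can then consume:
DataSet<MyPojo> dataSet = batchEnv.createInput(format);
DataStream<MyPojo> stream = streamEnv.createInput(format, Types.POJO(MyPojo.class));
{code}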

 

The reader methods in ExistingSavepoint should be deprecated.

 

Additionally, OperatorStateInputFormat and KeyedStateInputFormat should both 
now extend ResultTypeQueryable so users get an efficient serializer for their 
reads.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21340) CheckForbiddenMethodsUsage is not run and fails

2021-02-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-21340:


 Summary: CheckForbiddenMethodsUsage is not run and fails
 Key: FLINK-21340
 URL: https://issues.apache.org/jira/browse/FLINK-21340
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
 Fix For: 1.13.0


This seems to be intended as a manual test that checks that certain methods 
are not being used.

It currently fails for 35 instances, 26 of which are for classes from 
flink-shaded (i.e., out of our control).

We should either adjust the test to ignore flink-shaded and fix the remaining 
issues, or remove the test.

flink-shaded classes could be ignored pretty easily like this:
{code}
methodUsages.removeIf(
memberUsage ->
memberUsage
.getDeclaringClass()
.getPackageName()
.startsWith("org.apache.flink.shaded"));
{code}

These are the remaining failures:
{code}
private static void 
org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtilTest.modifySerialVersionUID(byte[],java.lang.String,long)
 throws java.lang.Exception
public void 
org.apache.flink.util.IOUtilsTest.testTryReadFullyFromLongerStream() throws 
java.io.IOException
public void 
org.apache.flink.core.io.PostVersionedIOReadableWritableTest.testReadNonVersionedWithLongPayload()
 throws java.io.IOException
public void 
org.apache.flink.streaming.api.operators.AbstractStreamOperatorTest.testCustomRawKeyedStateSnapshotAndRestore()
 throws java.lang.Exception
public void 
org.apache.flink.util.IOUtilsTest.testTryReadFullyFromShorterStream() throws 
java.io.IOException
public void 
org.apache.flink.core.io.PostVersionedIOReadableWritableTest.testReadVersioned()
 throws java.io.IOException
private 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriter$ChannelStateWriteResult
 
org.apache.flink.runtime.state.ChannelPersistenceITCase.write(long,java.util.Map,java.util.Map)
 throws java.lang.Exception
public void 
org.apache.flink.core.memory.HybridOnHeapMemorySegmentTest.testReadOnlyByteBufferPut()
private static org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle 
org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.create(java.util.Random)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21339) ExceptionUtilsITCases is not run and fails

2021-02-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-21339:


 Summary: ExceptionUtilsITCases is not run and fails
 Key: FLINK-21339
 URL: https://issues.apache.org/jira/browse/FLINK-21339
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
 Fix For: 1.13.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Timo Walther

Hi Rui,

1. It depends whether you would like to declare (unresolved) or use 
(resolved) a schema. In catalogs and APIs, people would actually like to 
declare a schema. Because the schema might reference objects from other 
catalogs etc. However, whenever the schema comes out of the framework it 
is fully resolved and people can use it to configure their UI, connector, etc.
2. No, `getTable` doesn't have to return a resolved schema. Actually, 
this was my initial design (see Rejected Alternatives 1) where we pass 
the SchemaResolver into the Catalog. However, a catalog must not deal 
with resolution. When storing a table we need a resolved schema to 
persist the fully expanded properties, however, when reading those 
properties in again the schema can be resolved in a later stage.


Regards,
Timo

On 09.02.21 14:07, Rui Li wrote:

Hi Timo,

Thanks for the FLIP. It looks good to me overall. I have two questions.
1. When should we use a resolved schema and when to use an unresolved one?
2. The FLIP mentions only resolved tables/views can be stored into a
catalog. Does that mean the getTable method should also return a resolved
object?

On Tue, Feb 9, 2021 at 6:29 PM Timo Walther  wrote:


Hi Jark,

thanks for your feedback. Let me answer some of your comments:

1) Since we decided to build an entire new stack, we can also introduce
better names for columns, constraints, and watermark spec. My goal was
to shorten the names during this refactoring. Therefore, `TableSchema`
becomes `Schema` and `TableColumn` becomes `Column`. This also fits
better to a `CatalogView` that has a schema but is actually not a table
but a view. So `Column` is very generic. What do you think?

2) `ComputedColumn` and `WatermarkSpec` of the new generation will store
`ResolvedExpression`.

3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`.
However, I skipped `getColumnDataTypes()` because the behavior is not
clear to me. Should it include computed columns or virtual metadata
columns? I think we should force users to think about what they require.
Otherwise we implicitly introduce bugs.

Regards,
Timo

On 09.02.21 10:56, Jark Wu wrote:

Hi Timo,

The messy TableSchema confuses many developers.
It's great to see we can finally come up with a clean interface hierarchy
and still stay backward compatible.

Thanks for preparing the nice FLIP. It looks good to me. I have some minor
comments:

1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead
of `Column`?

2) You mentioned ResolvedSchema should store ResolvedExpression, should we
extend `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?

3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to
add un-deprecated
methods of `TableSchema` into `ResolvedSchema`
(e.g. `getColumnDataTypes()`).
Then users can have a smooth migration.

Best,
Jark

On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
wrote:


Hi Timo,

  From my perspective the proposed changes look good. I agree it is an
important step towards FLIP-129 and FLIP-136. Personally I feel
comfortable voting on the document.

Best,

Dawid

On 05/02/2021 16:09, Timo Walther wrote:

Hi everyone,

you might have seen that we discussed a better schema API in the past as
part of FLIP-129 and FLIP-136. We also discussed this topic during
different releases:

https://issues.apache.org/jira/browse/FLINK-17793

Jark and I had an offline discussion about how we can finally fix this
shortcoming and maintain backwards compatibility for a couple of
releases to give people time to update their code.

I would like to propose the following FLIP:





https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs



The FLIP updates the class hierarchy to achieve the following goals:

- make it visible whether a schema is resolved or unresolved and when
the resolution happens
- offer a unified API for FLIP-129, FLIP-136, and catalogs
- allow arbitrary data types and expressions in the schema for
watermark spec or columns
- have access to other catalogs for declaring a data type or
expression via CatalogManager
- a cleaned up TableSchema
- remain backwards compatible in the persisted properties and API

Looking forward to your feedback.

Thanks,
Timo














Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Rui Li
Hi Timo,

Thanks for the FLIP. It looks good to me overall. I have two questions.
1. When should we use a resolved schema and when to use an unresolved one?
2. The FLIP mentions only resolved tables/views can be stored into a
catalog. Does that mean the getTable method should also return a resolved
object?

On Tue, Feb 9, 2021 at 6:29 PM Timo Walther  wrote:

> Hi Jark,
>
> thanks for your feedback. Let me answer some of your comments:
>
> 1) Since we decided to build an entire new stack, we can also introduce
> better names for columns, constraints, and watermark spec. My goal was
> to shorten the names during this refactoring. Therefore, `TableSchema`
> becomes `Schema` and `TableColumn` becomes `Column`. This also fits
> better to a `CatalogView` that has a schema but is actually not a table
> but a view. So `Column` is very generic. What do you think?
>
> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store
> `ResolvedExpression`.
>
> 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`.
> However, I skipped `getColumnDataTypes()` because the behavior is not
> clear to me. Should it include computed columns or virtual metadata
> columns? I think we should force users to think about what they require.
> Otherwise we implicitly introduce bugs.
>
> Regards,
> Timo
>
> On 09.02.21 10:56, Jark Wu wrote:
> > Hi Timo,
> >
> > The messy TableSchema confuses many developers.
> > It's great to see we can finally come up with a clean interface hierarchy
> > and still stay backward compatible.
> >
> > Thanks for preparing the nice FLIP. It looks good to me. I have some
> minor
> > comments:
> >
> > 1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead
> > of `Column`?
> >
> > 2) You mentioned ResolvedSchema should store ResolvedExpression, should
> we
> > extend
> >`ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?
> >
> > 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to
> > add un-deprecated
> > methods of `TableSchema` into `ResolvedSchema`
> > (e.g. `getColumnDataTypes()`).
> > Then users can have a smooth migration.
> >
> > Best,
> > Jark
> >
> > On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
> > wrote:
> >
> >> Hi Timo,
> >>
> >>  From my perspective the proposed changes look good. I agree it is an
> >> important step towards FLIP-129 and FLIP-136. Personally I feel
> >> comfortable voting on the document.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 05/02/2021 16:09, Timo Walther wrote:
> >>> Hi everyone,
> >>>
> >>> you might have seen that we discussed a better schema API in the past as
> >>> part of FLIP-129 and FLIP-136. We also discussed this topic during
> >>> different releases:
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-17793
> >>>
> >>> Jark and I had an offline discussion about how we can finally fix this
> >>> shortcoming and maintain backwards compatibility for a couple of
> >>> releases to give people time to update their code.
> >>>
> >>> I would like to propose the following FLIP:
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
> >>>
> >>>
> >>> The FLIP updates the class hierarchy to achieve the following goals:
> >>>
> >>> - make it visible whether a schema is resolved or unresolved and when
> >>> the resolution happens
> >>> - offer a unified API for FLIP-129, FLIP-136, and catalogs
> >>> - allow arbitrary data types and expressions in the schema for
> >>> watermark spec or columns
> >>> - have access to other catalogs for declaring a data type or
> >>> expression via CatalogManager
> >>> - a cleaned up TableSchema
> >>> - remain backwards compatible in the persisted properties and API
> >>>
> >>> Looking forward to your feedback.
> >>>
> >>> Thanks,
> >>> Timo
> >>
> >>
> >
>
>

-- 
Best regards!
Rui Li


[jira] [Created] (FLINK-21338) Relax test naming constraints

2021-02-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-21338:


 Summary: Relax test naming constraints
 Key: FLINK-21338
 URL: https://issues.apache.org/jira/browse/FLINK-21338
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.13.0


Issues like FLINK-21337 or FLINK-21031 show that accidents happen where tests 
were added with incorrect names, causing them to not be run on CI, potentially 
hiding regressions.

[~rmetzger] had the neat idea to relax the constraints such that in the verify 
phase we simply run everything that does not end in {{Test.java}}.
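
A sketch of how this could look in the pom (assuming the usual surefire 
setup; this is not the actual build configuration):
{code}
<execution>
  <id>integration-tests</id>
  <phase>verify</phase>
  <goals>
    <goal>test</goal>
  </goals>
  <configuration>
    <!-- run every test class except those the default test phase
         already covers via the *Test.java convention -->
    <includes>
      <include>**/*.java</include>
    </includes>
    <excludes>
      <exclude>**/*Test.java</exclude>
    </excludes>
  </configuration>
</execution>
{code}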



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Apache Flink 1.12.2

2021-02-09 Thread Yuan Mei
Thanks very much for your replies!

I've been reaching out to the owners of the issues/blockers mentioned
above. By now, we have

Blocker Issues:

   -

   https://issues.apache.org/jira/browse/FLINK-21013: fixed
   -

   https://issues.apache.org/jira/browse/FLINK-21030: estimated to be finished
   by the end of this week, but should not be a release blocker

Best to Include:

   -

   https://issues.apache.org/jira/browse/FLINK-20417: fixed
   -

   https://issues.apache.org/jira/browse/FLINK-20663: estimated to be finished
   by the end of this week, but should not be a release blocker
   -

   https://github.com/apache/flink/pull/14848: under review


Hence, let's target *a feature freeze on Feb. 15, next Monday*!

If there is any concern on anything, please feel free to contact me.

Best
Yuan


On Tue, Feb 9, 2021 at 4:38 AM Matthias Pohl  wrote:

> Fabian and I were investigating strange behavior with stop-with-savepoint
> not terminating when using the new fromSource to add a source to a job
> definition. I created FLINK-21323 [1] to cover the issue. This might not be
> a blocker for 1.12.2 since this bug would have been already around since
> 1.11, if I'm not mistaken? I wanted to bring this to your attention,
> anyway. It would be good if someone more familiar with this part of the
> source code could verify our findings and confirm the severity of the
> issue.
>
> Best,
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-21323
>
> On Mon, Feb 8, 2021 at 7:25 PM Thomas Weise  wrote:
>
> > +1 for the 1.12.2 release
> >
> > On Mon, Feb 8, 2021 at 3:20 AM Matthias Pohl 
> > wrote:
> >
> > > Thanks Yuan for driving 1.12.2.
> > > +1 for releasing 1.12.2
> > >
> > > One comment about FLINK-21030 [1]: I hope to fix it this week. But
> there
> > > are still some uncertainties. The underlying problem is older than
> 1.12.
> > > Hence, the suggestion is to not block the 1.12.2 release because of
> > > FLINK-21030 [1]. I will leave it as a blocker issue, though, to
> underline
> > > that it should be fixed for 1.13.
> > >
> > > Best,
> > > Matthias
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-21030
> > >
> > > On Sun, Feb 7, 2021 at 4:10 AM Xintong Song 
> > wrote:
> > >
> > > > Thanks Yuan,
> > > >
> > > > +1 for releasing 1.12.2 and Yuan as the release manager.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Sat, Feb 6, 2021 at 3:41 PM Yu Li  wrote:
> > > >
> > > > > +1 for releasing 1.12.2, and thanks for volunteering to be our
> > release
> > > > > manager Yuan.
> > > > >
> > > > > Besides the mentioned issues, I could see two more blockers with
> > 1.12.2
> > > > as
> > > > > fix version [1] and need some tracking:
> > > > > * FLINK-21013: Blink planner does not ingest timestamp into StreamRecord
> > > > > * FLINK-21030: Broken job restart for job with disjoint graph
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > > [1] https://s.apache.org/flink-1.12.2-blockers
> > > > >
> > > > >
> > > > > On Sat, 6 Feb 2021 at 08:57, Kurt Young  wrote:
> > > > >
> > > > > > Thanks for being our release manager Yuan.
> > > > > >
> > > > > > We found a out of memory issue [1] which will affect most batch
> > jobs
> > > > > thus I
> > > > > > think
> > > > > > it would be great if we can include this fix in 1.12.2.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-20663
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Sat, Feb 6, 2021 at 12:36 AM Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for volunteering for being our release manager Yuan :-)
> > > > > > >
> > > > > > > +1 for a timely bug fix release.
> > > > > > >
> > > > > > > I will try to review the PR for FLINK-20417 [1] which is a good
> > > > > > > fix to
> > > > > > > include in the next bug fix release. We don't have to block the
> > > > release
> > > > > > on
> > > > > > > this fix though.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-20417
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Fri, Feb 5, 2021 at 5:12 PM Piotr Nowojski <
> > > pnowoj...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Thanks Yuan for bringing up this topic.
> > > > > > > >
> > > > > > > > +1 for the quick 1.12.2 release.
> > > > > > > >
> > > > > > > > As Yuan mentioned, me and Roman can help whenever committer
> > > rights
> > > > > will
> > > > > > > be
> > > > > > > > required.
> > > > > > > >
> > > > > > > > > I am a Firefox user and I just fixed a long lasting bug
> > > > > > > > https://github.com/apache/flink/pull/14848 , wish this would
> > be
> > > > > merged
> > > > > > > in
> > > > > > > > this release.
> > > > > > > >
> > > >

[jira] [Created] (FLINK-21337) ComparableInputTypeStrategyTests is not running

2021-02-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-21337:


 Summary: ComparableInputTypeStrategyTests is not running
 Key: FLINK-21337
 URL: https://issues.apache.org/jira/browse/FLINK-21337
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Tests
Affects Versions: 1.13.0
Reporter: Chesnay Schepler


The test is incorrectly named; the trailing "s" is disabling the test.

ping [~dwysakowicz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Rui Li
Hi guys,

The conclusion sounds good to me.

On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang  wrote:

> Hi, Timo, Jark.
>
> I am fine with the new option name.
>
> Best,
> Shengkai
>
> On Tue, Feb 9, 2021 at 5:35 PM Timo Walther wrote:
>
> > Yes, `TableEnvironment#executeMultiSql()` can be future work.
> >
> > @Rui, Shengkai: Are you also fine with this conclusion?
> >
> > Thanks,
> > Timo
> >
> > On 09.02.21 10:14, Jark Wu wrote:
> > > I'm fine with `table.multi-dml-sync`.
> > >
> > > My previous concern about "multi" is that DML in the CLI looks like
> > > a single statement.
> > > But we can treat the CLI as a multi-line session accepting statements
> > > from opening to closing.
> > > Thus, I'm fine with `table.multi-dml-sync`.
> > >
> > > So the conclusion is `table.multi-dml-sync` (false by default), and we
> > will
> > > support this config
> > > in SQL CLI first, will support it in TableEnvironment#executeMultiSql()
> > in
> > > the future, right?
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 9 Feb 2021 at 16:37, Timo Walther  wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I understand Rui's concerns. `table.dml-sync` should not apply to
> > >> regular `executeSql`. Actually, this option only makes sense when
> > >> executing multiple statements. Once we have a
> > >> `TableEnvironment.executeMultiSql()` this config could be considered.
> > >>
> > >> Maybe we can find a better generic name? Other platforms will also
> need
> > >> to have this config option, which is why I would like to avoid a SQL
> > >> Client specific option. Otherwise every platform has to come up with
> > >> this important config option separately.
> > >>
> > >> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other
> > >> opinions?
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>> Hi, all.
> > >>>
> > >>> I think it may confuse users. The main problem is that we have no
> > >>> means to detect conflicting configuration, e.g. users set the option
> > >>> to true and use `TableResult#await` together.
> > >>>
> > >>> Best,
> > >>> Shengkai.
> > >>>
> > >>
> > >>
> > >
> >
> >
>


-- 
Best regards!
Rui Li


Re: Activate bloom filter in RocksDB State Backend via Flink configuration

2021-02-09 Thread Jun Qin
Thanks Till and Yun Tang. 

I’ve created https://issues.apache.org/jira/browse/FLINK-21336 
 and I will work on it.

Thanks
Jun

> On Feb 9, 2021, at 7:52 AM, Yun Tang  wrote:
> 
> Hi Jun,
> 
> Some predefined options would also activate bloom filters, e.g. 
> PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a 
> configurable option is a good idea. +1 for this.
> 
> When talking about the bloom filter default value, I slightly prefer to use the 
> full format [1] instead of the old block-based format. This is related to 
> FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.
> 
> [1] 
> https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
> [2] https://issues.apache.org/jira/browse/FLINK-20496
> 
> Best
> Yun Tang
> 
> From: Till Rohrmann 
> Sent: Monday, February 8, 2021 17:06
> To: dev 
> Subject: Re: Activate bloom filter in RocksDB State Backend via Flink 
> configuration
> 
> Hi Jun,
> 
> Making things easier to use and configure is a good idea. Hence, +1 for
> this proposal. Maybe create a JIRA ticket for it.
> 
> For the concrete default values it would be nice to hear the opinion of a
> RocksDB expert.
> 
> Cheers,
> Till
> 
> On Sun, Feb 7, 2021 at 7:23 PM Jun Qin  wrote:
> 
>> Hi,
>> 
>> Activating the bloom filter in the RocksDB state backend improves read
>> performance. Currently, activating the bloom filter can only be done by
>> implementing a custom ConfigurableRocksDBOptionsFactory. I think we should
>> provide an option to activate the bloom filter via Flink configuration.  What
>> do you think? If so, what about the following configuration?
>> 
>> state.backend.rocksdb.bloom-filter.enabled: false (default)
>> state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
>> state.backend.rocksdb.bloom-filter.block-based: true (default)
>> 
>> 
>> Thanks
>> Jun



[jira] [Created] (FLINK-21336) Activate bloom filter in RocksDB State Backend via Flink configuration

2021-02-09 Thread Jun Qin (Jira)
Jun Qin created FLINK-21336:
---

 Summary: Activate bloom filter in RocksDB State Backend via Flink 
configuration
 Key: FLINK-21336
 URL: https://issues.apache.org/jira/browse/FLINK-21336
 Project: Flink
  Issue Type: Improvement
Reporter: Jun Qin
Assignee: Jun Qin


Activating the bloom filter in the RocksDB state backend improves read 
performance. Currently, activating the bloom filter can only be done by 
implementing a custom ConfigurableRocksDBOptionsFactory. I think we should 
provide an option to activate the bloom filter via Flink configuration.
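
For context, this is roughly what users have to write today (a sketch; the 
exact RocksDB setter names depend on the RocksDB version bundled with Flink):
{code:java}
import java.util.Collection;

import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.contrib.streaming.state.ConfigurableRocksDBOptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class BloomFilterOptionsFactory implements ConfigurableRocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // 10 bits per key, block-based format: the values the proposed
        // configuration keys would expose as defaults.
        return currentOptions.setTableFormatConfig(
                new BlockBasedTableConfig().setFilter(new BloomFilter(10, true)));
    }

    @Override
    public RocksDBOptionsFactory configure(ReadableConfig configuration) {
        return this;
    }
}
{code}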

See also the discussion in ML:

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Activate-bloom-filter-in-RocksDB-State-Backend-via-Flink-configuration-td48636.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Timo Walther

Hi Jark,

thanks for your feedback. Let me answer some of your comments:

1) Since we decided to build an entire new stack, we can also introduce 
better names for columns, constraints, and watermark spec. My goal was 
to shorten the names during this refactoring. Therefore, `TableSchema` 
becomes `Schema` and `TableColumn` becomes `Column`. This also fits 
better to a `CatalogView` that has a schema but is actually not a table 
but a view. So `Column` is very generic. What do you think?
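
For illustration, a rough sketch of how the shortened names would read 
(builder-style declaration as outlined in the FLIP; the exact method names 
may still change):

// declared, unresolved schema: expressions and types are not validated yet
Schema schema = Schema.newBuilder()
        .column("id", DataTypes.BIGINT())
        .column("payload", DataTypes.STRING())
        .columnByExpression("proc", "PROCTIME()")
        .primaryKey("id")
        .build();

// once the framework resolves it (via the CatalogManager), the result is a
// ResolvedSchema whose Column instances carry validated data types and
// ResolvedExpression for computed columns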


2) `ComputedColumn` and `WatermarkSpec` of the new generation will store 
`ResolvedExpression`.


3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`. 
However, I skipped `getColumnDataTypes()` because the behavior is not 
clear to me. Should it include computed columns or virtual metadata 
columns? I think we should force users to think about what they require. 
Otherwise we implicitly introduce bugs.


Regards,
Timo

On 09.02.21 10:56, Jark Wu wrote:

Hi Timo,

The messy TableSchema confuses many developers.
It's great to see we can finally come up with a clean interface hierarchy
and still stay backward compatible.

Thanks for preparing the nice FLIP. It looks good to me. I have some minor
comments:

1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead of
`Column`?

2) You mentioned ResolvedSchema should store ResolvedExpression, should we
extend
   `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?

3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to
add un-deprecated
methods of `TableSchema` into `ResolvedSchema`
(e.g. `getColumnDataTypes()`).
Then users can have a smooth migration.

Best,
Jark

On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
wrote:


Hi Timo,

 From my perspective the proposed changes look good. I agree it is an
important step towards FLIP-129 and FLIP-136. Personally I feel
comfortable voting on the document.

Best,

Dawid

On 05/02/2021 16:09, Timo Walther wrote:

Hi everyone,

you might have seen that we discussed a better schema API in the past as
part of FLIP-129 and FLIP-136. We also discussed this topic during
different releases:

https://issues.apache.org/jira/browse/FLINK-17793

Jark and I had an offline discussion about how we can finally fix this
shortcoming and maintain backwards compatibility for a couple of
releases to give people time to update their code.

I would like to propose the following FLIP:



https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs



The FLIP updates the class hierarchy to achieve the following goals:

- make it visible whether a schema is resolved or unresolved and when
the resolution happens
- offer a unified API for FLIP-129, FLIP-136, and catalogs
- allow arbitrary data types and expressions in the schema for
watermark spec or columns
- have access to other catalogs for declaring a data type or
expression via CatalogManager
- a cleaned up TableSchema
- remain backwards compatible in the persisted properties and API

Looking forward to your feedback.

Thanks,
Timo









Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Jark Wu
Hi Timo,

The messy TableSchema confuses many developers.
It's great to see we can finally come up with a clean interface hierarchy
and still stay backward compatible.

Thanks for preparing the nice FLIP. It looks good to me. I have some minor
comments:

1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead of
`Column`?

2) You mentioned ResolvedSchema should store ResolvedExpression, should we
extend
  `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?

3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to
add un-deprecated
methods of `TableSchema` into `ResolvedSchema`
(e.g. `getColumnDataTypes()`).
Then users can have a smooth migration.

Best,
Jark

On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
wrote:

> Hi Timo,
>
> From my perspective the proposed changes look good. I agree it is an
> important step towards FLIP-129 and FLIP-136. Personally I feel
> comfortable voting on the document.
>
> Best,
>
> Dawid
>
> On 05/02/2021 16:09, Timo Walther wrote:
> > Hi everyone,
> >
> > you might have seen that we discussed a better schema API in the past as
> > part of FLIP-129 and FLIP-136. We also discussed this topic during
> > different releases:
> >
> > https://issues.apache.org/jira/browse/FLINK-17793
> >
> > Jark and I had an offline discussion about how we can finally fix this
> > shortcoming and maintain backwards compatibility for a couple of
> > releases to give people time to update their code.
> >
> > I would like to propose the following FLIP:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
> >
> >
> > The FLIP updates the class hierarchy to achieve the following goals:
> >
> > - make it visible whether a schema is resolved or unresolved and when
> > the resolution happens
> > - offer a unified API for FLIP-129, FLIP-136, and catalogs
> > - allow arbitrary data types and expressions in the schema for
> > watermark spec or columns
> > - have access to other catalogs for declaring a data type or
> > expression via CatalogManager
> > - a cleaned up TableSchema
> > - remain backwards compatible in the persisted properties and API
> >
> > Looking forward to your feedback.
> >
> > Thanks,
> > Timo
>
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Shengkai Fang
Hi, Timo, Jark.

I am fine with the new option name.

Best,
Shengkai

On Tue, Feb 9, 2021 at 5:35 PM Timo Walther wrote:

> Yes, `TableEnvironment#executeMultiSql()` can be future work.
>
> @Rui, Shengkai: Are you also fine with this conclusion?
>
> Thanks,
> Timo
>
> On 09.02.21 10:14, Jark Wu wrote:
> > I'm fine with `table.multi-dml-sync`.
> >
> > My previous concern about "multi" is that DML in the CLI looks like a single
> > statement.
> > But we can treat the CLI as a multi-line session accepting statements from
> > opening to closing.
> > Thus, I'm fine with `table.multi-dml-sync`.
> >
> > So the conclusion is `table.multi-dml-sync` (false by default), and we
> will
> > support this config
> > in SQL CLI first, will support it in TableEnvironment#executeMultiSql()
> in
> > the future, right?
> >
> > Best,
> > Jark
> >
> > On Tue, 9 Feb 2021 at 16:37, Timo Walther  wrote:
> >
> >> Hi everyone,
> >>
> >> I understand Rui's concerns. `table.dml-sync` should not apply to
> >> regular `executeSql`. Actually, this option only makes sense when
> >> executing multiple statements. Once we have a
> >> `TableEnvironment.executeMultiSql()` this config could be considered.
> >>
> >> Maybe we can find a better generic name? Other platforms will also need
> >> to have this config option, which is why I would like to avoid a SQL
> >> Client specific option. Otherwise every platform has to come up with
> >> this important config option separately.
> >>
> >> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other opinions?
> >>
> >> Regards,
> >> Timo
> >>
> >> On 09.02.21 08:50, Shengkai Fang wrote:
> >>> Hi, all.
> >>>
> >>> I think it may confuse users. The main problem is that we have no
> >>> means to detect conflicting configuration, e.g. users set the option
> >>> to true and use `TableResult#await` together.
> >>>
> >>> Best,
> >>> Shengkai.
> >>>
> >>
> >>
> >
>
>


[jira] [Created] (FLINK-21335) README should use "mvn install" instead of "mvn package" to compile Flink

2021-02-09 Thread Dong Lin (Jira)
Dong Lin created FLINK-21335:


 Summary: README should use "mvn install" instead of "mvn package" 
to compile Flink
 Key: FLINK-21335
 URL: https://issues.apache.org/jira/browse/FLINK-21335
 Project: Flink
  Issue Type: Improvement
Reporter: Dong Lin


Currently, the README suggests that users use "mvn package" to build Flink. As a 
result, the compiled modules will not be put into the local Maven repository, and 
"mvn test" will download/use snapshots from the remote repository instead of 
testing the local code change.

This is a bit unintuitive, since developers (including those who are not so 
familiar with Maven's mechanics) generally expect "mvn test" to test local 
code changes.

So it seems better to update the README to use "mvn install" instead of "mvn 
package". To the best of my knowledge, this change has no downside.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Timo Walther

Yes, `TableEnvironment#executeMultiSql()` can be future work.

@Rui, Shengkai: Are you also fine with this conclusion?

Thanks,
Timo

On 09.02.21 10:14, Jark Wu wrote:

I'm fine with `table.multi-dml-sync`.

My previous concern about "multi" was that a DML statement in the CLI looks
like a single statement.
But we can treat the CLI as a multi-statement session, accepting statements
from opening to closing.
Thus, I'm fine with `table.multi-dml-sync`.

So the conclusion is `table.multi-dml-sync` (false by default), and we will
support this config in the SQL CLI first and in
TableEnvironment#executeMultiSql() in the future, right?

Best,
Jark

On Tue, 9 Feb 2021 at 16:37, Timo Walther  wrote:


Hi everyone,

I understand Rui's concerns. `table.dml-sync` should not apply to
regular `executeSql`. Actually, this option only makes sense when
executing multiple statements. Once we have a
`TableEnvironment.executeMultiSql()` this config could be considered.

Maybe we can find a better generic name? Other platforms will also need
to have this config option, which is why I would like to avoid a SQL
Client specific option. Otherwise every platform has to come up with
this important config option separately.

Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Other opinions?

Regards,
Timo

On 09.02.21 08:50, Shengkai Fang wrote:

Hi, all.

I think it may confuse users. The main problem is that we have no means
to detect conflicting configurations, e.g. when users set the option to
true and use `TableResult#await` together.

Best,
Shengkai.


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Jark Wu
I'm fine with `table.multi-dml-sync`.

My previous concern about "multi" was that a DML statement in the CLI looks
like a single statement.
But we can treat the CLI as a multi-statement session, accepting statements
from opening to closing.
Thus, I'm fine with `table.multi-dml-sync`.

So the conclusion is `table.multi-dml-sync` (false by default), and we will
support this config in the SQL CLI first and in
TableEnvironment#executeMultiSql() in the future, right?
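
For illustration, a CLI session could then behave roughly like this (a
sketch; the table names are made up and the exact SET syntax is not final):

    SET 'table.multi-dml-sync' = 'true';
    -- the client now waits for this job to finish...
    INSERT INTO sink1 SELECT * FROM src;
    -- ...before submitting the next statement
    INSERT INTO sink2 SELECT * FROM src;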

Best,
Jark

On Tue, 9 Feb 2021 at 16:37, Timo Walther  wrote:

> Hi everyone,
>
> I understand Rui's concerns. `table.dml-sync` should not apply to
> regular `executeSql`. Actually, this option only makes sense when
> executing multiple statements. Once we have a
> `TableEnvironment.executeMultiSql()` this config could be considered.
>
> Maybe we can find a better generic name? Other platforms will also need
> to have this config option, which is why I would like to avoid a SQL
> Client specific option. Otherwise every platform has to come up with
> this important config option separately.
>
> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Other opinions?
>
> Regards,
> Timo
>
> On 09.02.21 08:50, Shengkai Fang wrote:
> > Hi, all.
> >
> > I think it may confuse users. The main problem is that we have no means
> > to detect conflicting configurations, e.g. when users set the option to
> > true and use `TableResult#await` together.
> >
> > Best,
> > Shengkai.
> >
>
>


[jira] [Created] (FLINK-21334) Run kubernetes pyflink application test fails with Connection reset by peer on setup

2021-02-09 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-21334:
--

 Summary: Run kubernetes pyflink application test fails with 
Connection reset by peer on setup
 Key: FLINK-21334
 URL: https://issues.apache.org/jira/browse/FLINK-21334
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Tests
Affects Versions: 1.13.0
Reporter: Piotr Nowojski


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=13105&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529


{noformat}
2021-02-08T16:22:20.9868739Z Feb 08 16:22:20 Collecting pyarrow<0.18.0,>=0.15.1 
(from apache-flink==1.13.dev0)
2021-02-08T16:22:21.0637683Z Feb 08 16:22:21   Downloading 
https://files.pythonhosted.org/packages/9c/2e/793ea698895b6d131546ee1bad995ce6b690695edd8a2e0d3eb9568b0a65/pyarrow-0.17.1-cp37-cp37m-manylinux1_x86_64.whl
 (64.5MB)
2021-02-08T16:22:21.2131155Z Feb 08 16:22:21 Exception:
2021-02-08T16:22:21.2131756Z Feb 08 16:22:21 Traceback (most recent call last):
2021-02-08T16:22:21.2143148Z Feb 08 16:22:21   File 
"/usr/share/python-wheels/urllib3-1.24.1-py2.py3-none-any.whl/urllib3/response.py",
 line 360, in _error_catcher
2021-02-08T16:22:21.2143871Z Feb 08 16:22:21 yield
2021-02-08T16:22:21.2144799Z Feb 08 16:22:21   File 
"/usr/share/python-wheels/urllib3-1.24.1-py2.py3-none-any.whl/urllib3/response.py",
 line 442, in read
2021-02-08T16:22:21.2145398Z Feb 08 16:22:21 data = self._fp.read(amt)
2021-02-08T16:22:21.2146258Z Feb 08 16:22:21   File 
"/usr/share/python-wheels/CacheControl-0.11.7-py2.py3-none-any.whl/cachecontrol/filewrapper.py",
 line 60, in read
2021-02-08T16:22:21.2146902Z Feb 08 16:22:21 data = self.__fp.read(amt)
2021-02-08T16:22:21.2147398Z Feb 08 16:22:21   File 
"/usr/lib/python3.7/http/client.py", line 457, in read
2021-02-08T16:22:21.2147887Z Feb 08 16:22:21 n = self.readinto(b)
2021-02-08T16:22:21.2148392Z Feb 08 16:22:21   File 
"/usr/lib/python3.7/http/client.py", line 501, in readinto
2021-02-08T16:22:21.2148869Z Feb 08 16:22:21 n = self.fp.readinto(b)
2021-02-08T16:22:21.2149362Z Feb 08 16:22:21   File 
"/usr/lib/python3.7/socket.py", line 589, in readinto
2021-02-08T16:22:21.2149867Z Feb 08 16:22:21 return self._sock.recv_into(b)
2021-02-08T16:22:21.2150364Z Feb 08 16:22:21   File 
"/usr/lib/python3.7/ssl.py", line 1052, in recv_into
2021-02-08T16:22:21.2150848Z Feb 08 16:22:21 return self.read(nbytes, 
buffer)
2021-02-08T16:22:21.2151391Z Feb 08 16:22:21   File 
"/usr/lib/python3.7/ssl.py", line 911, in read
2021-02-08T16:22:21.2151900Z Feb 08 16:22:21 return self._sslobj.read(len, 
buffer)
2021-02-08T16:22:21.2152437Z Feb 08 16:22:21 ConnectionResetError: [Errno 104] 
Connection reset by peer
2021-02-08T16:22:21.2152869Z Feb 08 16:22:21 
2021-02-08T16:22:21.2153371Z Feb 08 16:22:21 During handling of the above 
exception, another exception occurred:
2021-02-08T16:22:21.2154047Z Feb 08 16:22:21 
2021-02-08T16:22:21.2154442Z Feb 08 16:22:21 Traceback (most recent call last):
2021-02-08T16:22:21.2155326Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 143, 
in main
2021-02-08T16:22:21.2156441Z Feb 08 16:22:21 status = self.run(options, 
args)
2021-02-08T16:22:21.2167187Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 338, 
in run
2021-02-08T16:22:21.2167793Z Feb 08 16:22:21 
resolver.resolve(requirement_set)
2021-02-08T16:22:21.2168610Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/resolve.py", line 102, in resolve
2021-02-08T16:22:21.2169187Z Feb 08 16:22:21 
self._resolve_one(requirement_set, req)
2021-02-08T16:22:21.2170026Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/resolve.py", line 256, in 
_resolve_one
2021-02-08T16:22:21.2170627Z Feb 08 16:22:21 abstract_dist = 
self._get_abstract_dist_for(req_to_install)
2021-02-08T16:22:21.2171509Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/resolve.py", line 209, in 
_get_abstract_dist_for
2021-02-08T16:22:21.2172063Z Feb 08 16:22:21 self.require_hashes
2021-02-08T16:22:21.2172906Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 283, 
in prepare_linked_requirement
2021-02-08T16:22:21.2173495Z Feb 08 16:22:21 progress_bar=self.progress_bar
2021-02-08T16:22:21.2174286Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/download.py", line 836, in 
unpack_url
2021-02-08T16:22:21.2174843Z Feb 08 16:22:21 progress_bar=progress_bar
2021-02-08T16:22:21.2175630Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-packages/pip/_internal/download.py", line 673, in 
unpack_http_url
2021-02-08T16:22:21.2176336Z Feb 08 16:22:21 progress_bar)
2021-02-08T16:22:21.2177120Z Feb 08 16:22:21   File 
"/usr/lib/python3/dist-

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Timo Walther

Hi everyone,

I understand Rui's concerns. `table.dml-sync` should not apply to 
regular `executeSql`. Actually, this option only makes sense when 
executing multiple statements. Once we have a 
`TableEnvironment.executeMultiSql()` this config could be considered.


Maybe we can find a better generic name? Other platforms will also need 
to have this config option, which is why I would like to avoid a SQL 
Client specific option. Otherwise every platform has to come up with 
this important config option separately.


Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Other opinions?

Regards,
Timo

On 09.02.21 08:50, Shengkai Fang wrote:

Hi, all.

I think it may confuse users. The main problem is that we have no means
to detect conflicting configurations, e.g. when users set the option to
true and use `TableResult#await` together.

Best,
Shengkai.


[jira] [Created] (FLINK-21333) Introduce stopping with savepoint state

2021-02-09 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-21333:
--

 Summary: Introduce stopping with savepoint state
 Key: FLINK-21333
 URL: https://issues.apache.org/jira/browse/FLINK-21333
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Robert Metzger


The declarative scheduler is also affected by the problem described in 
FLINK-21030. We want to solve this problem by introducing a separate state that 
the scheduler enters while taking a savepoint for stopping Flink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)