[jira] [Created] (FLINK-25242) UDF with primitive int argument does not accept int values even after a not null filter

2021-12-09 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-25242:
---

 Summary: UDF with primitive int argument does not accept int 
values even after a not null filter
 Key: FLINK-25242
 URL: https://issues.apache.org/jira/browse/FLINK-25242
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Caizhi Weng


Add the following test case to {{TableEnvironmentITCase}} to reproduce this 
issue.

{code:scala}
@Test
def myTest(): Unit = {
  tEnv.executeSql("CREATE TEMPORARY FUNCTION MyUdf AS 
'org.apache.flink.table.api.MyUdf'")
  tEnv.executeSql(
"""
  |CREATE TABLE T (
  |  a INT
  |) WITH (
  |  'connector' = 'values',
  |  'bounded' = 'true'
  |)
  |""".stripMargin)
  tEnv.executeSql("SELECT MyUdf(a) FROM T WHERE a IS NOT NULL").print()
}
{code}

UDF code
{code:scala}
class MyUdf extends ScalarFunction {
  def eval(a: Int): Int = {
    a + 1
  }
}
{code}
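
A possible workaround, assuming the planner only rejects the primitive {{Int}} signature, is to declare the parameter and return type as boxed {{java.lang.Integer}} so the function also accepts a (possibly nullable) INT argument. This is an untested sketch, not a confirmed fix, and the class name is only for illustration.

{code:scala}
import org.apache.flink.table.functions.ScalarFunction

// Untested sketch: boxed parameter type, so a nullable INT argument is accepted.
class MyBoxedUdf extends ScalarFunction {
  def eval(a: java.lang.Integer): java.lang.Integer = {
    if (a == null) null else Integer.valueOf(a.intValue() + 1)
  }
}
{code}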

Exception stack
{code}
org.apache.flink.table.api.ValidationException: SQL validation failed. Invalid 
function call:
default_catalog.default_database.MyUdf(INT)

at 
org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:168)
at 
org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:107)
at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:219)
at 
org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:101)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:736)
at 
org.apache.flink.table.api.TableEnvironmentITCase.myTest(TableEnvironmentITCase.scala:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at 
com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: 

[jira] [Created] (FLINK-25241) Architectural tests do not fail on CI if violations are removed

2021-12-09 Thread Jira
Ingo Bürk created FLINK-25241:
-

 Summary: Architectural tests do not fail on CI if violations are 
removed
 Key: FLINK-25241
 URL: https://issues.apache.org/jira/browse/FLINK-25241
 Project: Flink
  Issue Type: Improvement
Reporter: Ingo Bürk
Assignee: Ingo Bürk






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25240) Update log4j2 version to 2.15.0

2021-12-09 Thread Ada Wong (Jira)
Ada Wong created FLINK-25240:


 Summary: Update log4j2 version to 2.15.0 
 Key: FLINK-25240
 URL: https://issues.apache.org/jira/browse/FLINK-25240
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.14.0
Reporter: Ada Wong


2.0 <= Apache log4j2 <= 2.14.1 have a RCE zero day.

https://www.cyberkendra.com/2021/12/worst-log4j-rce-zeroday-dropped-on.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-09 Thread wenlong.lwl
Hi, Timo, thanks for updating the doc.

I have a comment on plan migration:
I think we may need to add a version field for every exec node when
serialising. In earlier discussions we concluded that the version of the
plan could be treated as the version of each node, but that assumption
breaks in this case.
Take the following example from the FLIP: if in 1.17 we introduce an
incompatible version 3 and drop version 1, the migration can only move the
affected node to version 2, so the version should be tracked per exec node
rather than derived from the plan's Flink version.

ExecNode version *1* is not supported anymore. Even though the state is
actually compatible. The plan restore will fail with a helpful exception
that forces users to perform plan migration.

COMPILE PLAN '/mydir/plan_new.json' FROM '/mydir/plan_old.json';

The plan migration will safely replace the old version *1* with *2*. The
JSON plan flinkVersion changes to 1.17.
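
To make this concrete, here is a rough illustration of what per-node versions
could look like in a serialized plan. The class and field names below are
placeholders invented for this sketch, not Flink APIs.

object PlanVersionSketch {
  // Placeholder types invented for this sketch; these are not Flink classes.
  final case class ExecNodeHeader(id: Int, nodeType: String, version: Int)
  final case class CompiledPlan(flinkVersion: String, nodes: Seq[ExecNodeHeader])

  // The plan as a whole is stamped with the Flink version that compiled it,
  // but every exec node keeps its own version: in the 1.17 scenario above one
  // node can only be migrated to version 2 while another is already at 3, so
  // a single plan-level version cannot express the difference.
  val plan: CompiledPlan = CompiledPlan(
    flinkVersion = "1.17",
    nodes = Seq(
      ExecNodeHeader(1, "stream-exec-calc", version = 3),
      ExecNodeHeader(2, "stream-exec-join", version = 2)))
}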


Best,

Wenlong

On Thu, 9 Dec 2021 at 18:36, Timo Walther  wrote:

> Hi Jing and Godfrey,
>
> I had another iteration over the document. There are two major changes:
>
> 1. Supported Flink Upgrade Versions
>
> I got the feedback via various channels that a step size of one minor
> version is not very convenient. As you said, "because upgrading to a new
> version is a time-consuming process". I rephrased this section:
>
> Upgrading usually involves work which is why many users perform this
> task rarely (e.g. only once per year). Also, skipping versions is
> common until a new feature has been introduced for which it is worth
> upgrading. We will support the upgrade to the most recent Flink version
> from a set of previous versions. We aim to support upgrades from the
> last 2-3 releases on a best-effort basis; maybe even more depending on
> the maintenance overhead. However, in order to not grow the testing
> matrix infinitely and to perform important refactoring if necessary, we
> only guarantee upgrades with a step size of a single minor version (i.e.
> a cascade of upgrades).
>
> 2. Annotation Design
>
> I also adopted the multiple annotations design for the previous
> supportedPlanFormat. So no array of versions anymore. I reworked the
> section; please have a look at the updated examples:
>
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI
>
> I also got the feedback offline that `savepoint` might not be the right
> terminology for the annotation. I changed that to minPlanVersion and
> minStateVersion.
>
> Let me know what you think.
>
> Regards,
> Timo
>
>
>
> On 09.12.21 08:44, Jing Zhang wrote:
> > Hi Timo,
> > Thanks a lot for driving this discussion.
> > I believe it could solve many of the problems we are suffering from
> > when upgrading.
> >
> > I only have a small complaint about the following point.
> >
> >> For simplification of the design, we assume that upgrades use a step
> size
> > of a single minor version. We don't guarantee skipping minor versions
> (e.g.
> > 1.11 to
> > 1.14).
> >
> > In our internal production environment, we catch up with the community's
> > latest stable release roughly once a year, because upgrading to a new
> > version is a time-consuming process.
> > So we might have skipped 1~3 versions by the time we upgrade to the
> > latest version. This probably happens in other companies too.
> > Could we guarantee that FLIP-190 works as long as we skip fewer minor
> > versions than a specified threshold?
> > Then we would know which version is good for us when preparing an upgrade.
> >
> > Best,
> > Jing Zhang
> >
> > godfrey he  于2021年12月8日周三 22:16写道:
> >
> >> Hi Timo,
> >>
> >> Thanks for the explanation, it's much clearer now.
> >>
> >> One thing I want to confirm about `supportedPlanFormat`
> >> and `supportedSavepointFormat`:
> >> `supportedPlanFormat` supports multiple versions,
> >> while `supportedSavepointFormat` supports only one version?
> >> A JSON plan can be deserialized by multiple versions
> >> because default values will be set for new fields.
> >> In theory, a savepoint can be restored by more than one version
> >> of the operators even if a state layout is changed,
> >> such as by deleting a whole state and starting the job with
> >> `allowNonRestoredState`=true.
> >> I think this is a corner case, and it's hard to understand compared
> >> to `supportedPlanFormat` supporting multiple versions.
> >> So, in most cases, when the state layout is changed, the savepoint is
> >> incompatible, and `supportedSavepointFormat` and the version need to
> >> be changed.
> >>
> >> I think we need a detailed explanation of the annotation change story
> >> in the Javadoc of the `ExecNodeMetadata` class for all developers
> >> (esp. those unfamiliar with this part).
> >>
> >> Best,
> >> Godfrey
> >>
> >> Timo Walther  于2021年12月8日周三 下午4:57写道:
> >>>
> >>> Hi Wenlong,
> >>>
> >>> thanks for the feedback. Great that we reached consensus here. I will
> >>> update the entire document with my previous example shortly.
> >>>
> >>>   > if we don't update 

[jira] [Created] (FLINK-25239) Delete useless variables

2021-12-09 Thread Ada Wong (Jira)
Ada Wong created FLINK-25239:


 Summary: Delete useless variables
 Key: FLINK-25239
 URL: https://issues.apache.org/jira/browse/FLINK-25239
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.14.0
Reporter: Ada Wong


    public static final int DEFAULT_FLUSH_MAX_SIZE = 5000;
    public static final long DEFAULT_FLUSH_INTERVAL_MILLS = 0L;



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Creating an external connector repository

2021-12-09 Thread Thomas Weise
+1 for repo per connector from my side also

Thanks for trying out the different approaches.

Where would the common/infra pieces live? In a separate repository
with its own release?

Thomas

On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann  wrote:
>
> Sorry if I was a bit unclear. +1 for the single repo per connector approach.
>
> Cheers,
> Till
>
> On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann  wrote:
>
> > +1 for the single repo approach.
> >
> > Cheers,
> > Till
> >
> > On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser 
> > wrote:
> >
> >> I also agree that it feels more natural to go with a repo for each
> >> individual connector. Each repository can be made available at
> >> flink-packages.org so users can find them, next to referring to them in
> >> documentation. +1 from my side.
> >>
> >> On Thu, 9 Dec 2021 at 15:38, Arvid Heise  wrote:
> >>
> >> > Hi all,
> >> >
> >> > We tried out Chesnay's proposal and went with Option 2. Unfortunately,
> >> we
> >> > experienced tough nuts to crack and feel like we hit a dead end:
> >> > - The main pain point with the outlined Frankensteinian connector repo
> >> is
> >> > how to handle shared code / infra code. If we have it in some common
> >> > branch, then we need to merge the common branch in the connector branch
> >> on
> >> > update. However, it's unclear to me how improvements in the common
> >> branch
> >> > that naturally appear while working on a specific connector go back into
> >> > the common branch. You can't use a pull request from your branch or else
> >> > your connector code would poison the connector-less common branch. So
> >> you
> >> > would probably manually copy the files over to a common branch and
> >> create a
> >> > PR branch for that.
> >> > - A weird solution could be to have the common branch as a submodule in
> >> the
> >> > repo itself (if that's even possible). I'm sure that this setup would
> >> blow
> >> > up the minds of all newcomers.
> >> > - Similarly, it's mandatory to have safeguards against code from
> >> connector
> >> > A poisoning connector B, common, or main. I had some similar setup in
> >> the
> >> > past and code from two "distinct" branch types constantly swept over.
> >> > - We could also say that we simply release common independently and
> >> just
> >> > have a maven (SNAPSHOT) dependency on it. But that would create a weird
> >> > flow if you need to change in common where you need to constantly switch
> >> > branches back and forth.
> >> > - In general, Frankensteinian's approach is very switch intensive. If
> >> you
> >> > maintain 3 connectors and need to fix 1 build stability each at the same
> >> > time (quite common nowadays for some reason) and you have 2 review
> >> rounds,
> >> > you need to switch branches 9 times ignoring changes to common.
> >> >
> >> > Additionally, we still have the rather user/dev unfriendly main that is
> >> > mostly empty. I'm also not sure we can generate an overview README.md to
> >> > make it more friendly here because in theory every connector branch
> >> should
> >> > be based on main and we would get merge conflicts.
> >> >
> >> > I'd like to propose once again to go with individual repositories.
> >> > - The only downside that we discussed so far is that we have more
> >> initial
> >> > setup to do. Since we organically grow the number of
> >> connector/repositories
> >> > that load is quite distributed. We can offer templates after finding a
> >> good
> >> > approach that can even be used by outside organizations.
> >> > - Regarding secrets, I think it's actually an advantage that the Kafka
> >> > connector has no access to the AWS secrets. If there are secrets to be
> >> > shared across connectors, we can and should use Azure's Variable Groups
> >> (I
> >> > have used it in the past to share Nexus creds across repos). That would
> >> > also make rotation easy.
> >> > - Working on different connectors would be rather easy as all modern IDE
> >> > support multiple repo setups in the same project. You still need to do
> >> > multiple releases in case you update common code (either accessed
> >> through
> >> > Nexus or git submodule) and you want to release your connector.
> >> > - There is no difference in respect to how many CI runs there are in both
> >> > approaches.
> >> > - Individual repositories also have the advantage of allowing external
> >> > incubation. Let's assume someone builds connector A and hosts it in
> >> their
> >> > organization (very common setup). If they want to contribute the code to
> >> > Flink, we could simply transfer the repository into ASF after ensuring
> >> > Flink coding standards. Then we retain git history and Github issues.
> >> >
> >> > Is there any point that I'm missing?
> >> >
> >> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler 
> >> > wrote:
> >> >
> >> > > For sharing workflows we should be able to use composite actions. We'd
> >> > > have the main definition files in the flink-connectors repo, that we
> >> > > also need to tag/release, which 

[jira] [Created] (FLINK-25238) flink iceberg source reading array types fail with Cast Exception

2021-12-09 Thread Praneeth Ramesh (Jira)
Praneeth Ramesh created FLINK-25238:
---

 Summary: flink iceberg source reading array types fail with Cast 
Exception
 Key: FLINK-25238
 URL: https://issues.apache.org/jira/browse/FLINK-25238
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.13.2
Reporter: Praneeth Ramesh
 Attachments: Screen Shot 2021-12-09 at 6.58.56 PM.png, Screen Shot 
2021-12-09 at 7.04.10 PM.png

I have a stream with an Iceberg table as a source. The table has a few columns
of array types.

I try to read it using the Iceberg connector.

Flink Version : 1.13.2

Iceberg Flink Version: 0.12.1

 

I see the error as below.

java.lang.ClassCastException: class 
org.apache.iceberg.flink.data.FlinkParquetReaders$ReusableArrayData cannot be 
cast to class org.apache.flink.table.data.ColumnarArrayData 
(org.apache.iceberg.flink.data.FlinkParquetReaders$ReusableArrayData and 
org.apache.flink.table.data.ColumnarArrayData are in unnamed module of loader 
'app')
    at 
org.apache.flink.table.runtime.typeutils.ArrayDataSerializer.copy(ArrayDataSerializer.java:90)
    at 
org.apache.flink.table.runtime.typeutils.ArrayDataSerializer.copy(ArrayDataSerializer.java:47)
    at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:170)
    at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:131)
    at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:48)
    at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:69)
    at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
    at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
    at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
    at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
    at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317)
    at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411)
    at 
org.apache.iceberg.flink.source.StreamingReaderOperator.processSplits(StreamingReaderOperator.java:155)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:359)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:323)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.base/java.lang.Thread.run(Thread.java:834)

 

This could be the same issue as https://issues.apache.org/jira/browse/FLINK-21247,
except that it is happening for another type.

I see that Iceberg uses custom types other than the types from
org.apache.flink.table.data, such as
org.apache.iceberg.flink.data.FlinkParquetReaders.ReusableArrayData, and these
types are not handled in
org.apache.flink.table.runtime.typeutils.ArrayDataSerializer. !Screen Shot
2021-12-09 at 6.58.56 PM.png!

Just to try it out, I changed the above code to handle the Iceberg type as a
binary array, built it locally, used it in my application, and that worked.

 

!Screen Shot 2021-12-09 at 7.04.10 PM.png!

Not sure if this is already handled in some newer versions. 
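
For reference, a generic element-by-element copy that does not assume a
particular ArrayData implementation could look roughly like the sketch below.
This only illustrates the idea and is not the actual change I built locally;
the object and method names are made up for this example.

{code:scala}
import org.apache.flink.table.data.{ArrayData, GenericArrayData}
import org.apache.flink.table.types.logical.LogicalType

object ArrayCopySketch {
  // Sketch only: copy any ArrayData implementation by reading elements through
  // the ArrayData interface instead of casting to ColumnarArrayData.
  def copyGenerically(from: ArrayData, elementType: LogicalType): ArrayData = {
    val getter = ArrayData.createElementGetter(elementType)
    val elements = new Array[AnyRef](from.size())
    var i = 0
    while (i < from.size()) {
      elements(i) = getter.getElementOrNull(from, i)
      i += 1
    }
    new GenericArrayData(elements)
  }
}
{code}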



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25237) ElasticsearchSinkITCase fails due to NoNodeAvailableException

2021-12-09 Thread Yun Tang (Jira)
Yun Tang created FLINK-25237:


 Summary: ElasticsearchSinkITCase fails due to 
NoNodeAvailableException
 Key: FLINK-25237
 URL: https://issues.apache.org/jira/browse/FLINK-25237
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch, Tests
Reporter: Yun Tang


instance: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27900=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d

{code:java}
Dec 09 17:28:50 Caused by: 
ElasticsearchException[java.util.concurrent.ExecutionException: 
java.net.ConnectException: Connection refused]; nested: 
ExecutionException[java.net.ConnectException: Connection refused]; nested: 
ConnectException[Connection refused];
Dec 09 17:28:50 at 
org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2078)
Dec 09 17:28:50 at 
org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1732)
Dec 09 17:28:50 at 
org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1717)
Dec 09 17:28:50 at 
org.elasticsearch.client.RestHighLevelClient.ping(RestHighLevelClient.java:815)
Dec 09 17:28:50 at 
org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.verifyClientConnection(Elasticsearch7ApiCallBridge.java:144)
Dec 09 17:28:50 at 
org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.verifyClientConnection(Elasticsearch7ApiCallBridge.java:46)
Dec 09 17:28:50 at 
org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.open(ElasticsearchSinkBase.java:317)
Dec 09 17:28:50 at 
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
Dec 09 17:28:50 at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:100)
Dec 09 17:28:50 at 
org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
Dec 09 17:28:50 at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:111)
Dec 09 17:28:50 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:696)
Dec 09 17:28:50 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
Dec 09 17:28:50 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:672)
Dec 09 17:28:50 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:639)
Dec 09 17:28:50 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
Dec 09 17:28:50 at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
Dec 09 17:28:50 at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
Dec 09 17:28:50 at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
Dec 09 17:28:50 at java.lang.Thread.run(Thread.java:748)
Dec 09 17:28:50 Caused by: java.util.concurrent.ExecutionException: 
java.net.ConnectException: Connection refused
Dec 09 17:28:50 at 
org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:262)
Dec 09 17:28:50 at 
org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:249)
Dec 09 17:28:50 at 
org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:76)
Dec 09 17:28:50 at 
org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2075)
Dec 09 17:28:50 ... 19 more
Dec 09 17:28:50 Caused by: java.net.ConnectException: Connection refused
Dec 09 17:28:50 at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
Method)
Dec 09 17:28:50 at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
Dec 09 17:28:50 at 
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
Dec 09 17:28:50 at 
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
Dec 09 17:28:50 at 
org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
Dec 09 17:28:50 at 
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
Dec 09 17:28:50 at 
org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
Dec 09 17:28:50 ... 1 more
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Creating an external connector repository

2021-12-09 Thread Till Rohrmann
Sorry if I was a bit unclear. +1 for the single repo per connector approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann  wrote:

> +1 for the single repo approach.
>
> Cheers,
> Till
>
> On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser 
> wrote:
>
>> I also agree that it feels more natural to go with a repo for each
>> individual connector. Each repository can be made available at
>> flink-packages.org so users can find them, next to referring to them in
>> documentation. +1 from my side.
>>
>> On Thu, 9 Dec 2021 at 15:38, Arvid Heise  wrote:
>>
>> > Hi all,
>> >
>> > We tried out Chesnay's proposal and went with Option 2. Unfortunately,
>> we
>> > experienced tough nuts to crack and feel like we hit a dead end:
>> > - The main pain point with the outlined Frankensteinian connector repo
>> is
>> > how to handle shared code / infra code. If we have it in some common
>> > branch, then we need to merge the common branch in the connector branch
>> on
>> > update. However, it's unclear to me how improvements in the common
>> branch
>> > that naturally appear while working on a specific connector go back into
>> > the common branch. You can't use a pull request from your branch or else
>> > your connector code would poison the connector-less common branch. So
>> you
>> > would probably manually copy the files over to a common branch and
>> create a
>> > PR branch for that.
>> > - A weird solution could be to have the common branch as a submodule in
>> the
>> > repo itself (if that's even possible). I'm sure that this setup would
>> blow
>> > up the minds of all newcomers.
>> > - Similarly, it's mandatory to have safeguards against code from
>> connector
>> > A poisoning connector B, common, or main. I had some similar setup in
>> the
>> > past and code from two "distinct" branch types constantly swept over.
>> > - We could also say that we simply release common independently and
>> just
>> > have a maven (SNAPSHOT) dependency on it. But that would create a weird
>> > flow if you need to change in common where you need to constantly switch
>> > branches back and forth.
>> > - In general, Frankensteinian's approach is very switch intensive. If
>> you
>> > maintain 3 connectors and need to fix 1 build stability each at the same
>> > time (quite common nowadays for some reason) and you have 2 review
>> rounds,
>> > you need to switch branches 9 times ignoring changes to common.
>> >
>> > Additionally, we still have the rather user/dev unfriendly main that is
>> > mostly empty. I'm also not sure we can generate an overview README.md to
>> > make it more friendly here because in theory every connector branch
>> should
>> > be based on main and we would get merge conflicts.
>> >
>> > I'd like to propose once again to go with individual repositories.
>> > - The only downside that we discussed so far is that we have more
>> initial
>> > setup to do. Since we organically grow the number of
>> connector/repositories
>> > that load is quite distributed. We can offer templates after finding a
>> good
>> > approach that can even be used by outside organizations.
>> > - Regarding secrets, I think it's actually an advantage that the Kafka
>> > connector has no access to the AWS secrets. If there are secrets to be
>> > shared across connectors, we can and should use Azure's Variable Groups
>> (I
>> > have used it in the past to share Nexus creds across repos). That would
>> > also make rotation easy.
>> > - Working on different connectors would be rather easy as all modern IDE
>> > support multiple repo setups in the same project. You still need to do
>> > multiple releases in case you update common code (either accessed
>> through
>> > Nexus or git submodule) and you want to release your connector.
>> > - There is no difference in respect to how many CI runs there are in both
>> > approaches.
>> > - Individual repositories also have the advantage of allowing external
>> > incubation. Let's assume someone builds connector A and hosts it in
>> their
>> > organization (very common setup). If they want to contribute the code to
>> > Flink, we could simply transfer the repository into ASF after ensuring
>> > Flink coding standards. Then we retain git history and Github issues.
>> >
>> > Is there any point that I'm missing?
>> >
>> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler 
>> > wrote:
>> >
>> > > For sharing workflows we should be able to use composite actions. We'd
>> > > have the main definition files in the flink-connectors repo, that we
>> > > also need to tag/release, which other branches/repos can then import.
>> > > These are also versioned, so we don't have to worry about accidentally
>> > > breaking stuff.
>> > > These could also be used to enforce certain standards / interfaces
>> such
>> > > that we can automate more things (e.g., integration into the Flink
>> > > documentation).
>> > >
>> > > It is true that Option 2) and dedicated repositories share a lot of
>> > > properties. While I did say in an offline 

[jira] [Created] (FLINK-25236) Add a mechanism to generate and validate a jobgraph with a checkpoint before submission

2021-12-09 Thread Ben Augarten (Jira)
Ben Augarten created FLINK-25236:


 Summary: Add a mechanism to generate and validate a jobgraph with 
a checkpoint before submission
 Key: FLINK-25236
 URL: https://issues.apache.org/jira/browse/FLINK-25236
 Project: Flink
  Issue Type: Improvement
Reporter: Ben Augarten


I've mostly worked on Flink 1.9-1.12, but I believe this is still an issue
today.

 

I've worked on a few Flink applications now that have struggled to reliably
activate new versions of a currently running job. Sometimes users make changes
to a job graph such that its state cannot be restored. Sometimes users make
changes to a job graph that make it impossible to schedule on a given cluster
(increased parallelism with insufficient task slots on the cluster). These
validations are [performed
here|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L120]

 

It's not Flink's problem that these issues arise, but they are only detected
when the JM tries to run the given job graph. For exactly-once applications
(and other applications where running two job graphs for the same application
is undesirable) there is unneeded downtime when users submit job graphs with
breaking changes, because users must cancel the old job, submit the new job to
see whether it is valid and will activate, and then resubmit the old job when
activation fails. For a user with low-latency requirements, this change
management workflow is unfortunate, and there doesn't seem to be anything
technical preventing these validations from happening earlier.

 

Suggestion: provide a mechanism for users to (1) create and (2) validate the
new job graph + checkpoint without running it, so that they do not need to
cancel a currently running version of the job until they're sure the new one
will activate.
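
An entirely hypothetical sketch of what such a dry-run validation API could
look like follows. None of these classes or methods exist in Flink; the names
are placeholders purely to illustrate the suggested workflow.

{code:scala}
// Hypothetical placeholder types; nothing here is an existing Flink API.
final case class ValidationResult(ok: Boolean, problems: Seq[String])

trait JobGraphDryRunValidator {
  /** Check that operator state in the checkpoint maps onto the new job graph. */
  def validateStateCompatibility(jobGraphPath: String, checkpointPath: String): ValidationResult

  /** Check that the job graph can be scheduled with the resources currently available. */
  def validateScheduling(jobGraphPath: String, availableTaskSlots: Int): ValidationResult
}
{code}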



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Creating an external connector repository

2021-12-09 Thread Till Rohrmann
+1 for the single repo approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser  wrote:

> I also agree that it feels more natural to go with a repo for each
> individual connector. Each repository can be made available at
> flink-packages.org so users can find them, next to referring to them in
> documentation. +1 from my side.
>
> On Thu, 9 Dec 2021 at 15:38, Arvid Heise  wrote:
>
> > Hi all,
> >
> > We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
> > experienced tough nuts to crack and feel like we hit a dead end:
> > - The main pain point with the outlined Frankensteinian connector repo is
> > how to handle shared code / infra code. If we have it in some common
> > branch, then we need to merge the common branch in the connector branch
> on
> > update. However, it's unclear to me how improvements in the common branch
> > that naturally appear while working on a specific connector go back into
> > the common branch. You can't use a pull request from your branch or else
> > your connector code would poison the connector-less common branch. So you
> > would probably manually copy the files over to a common branch and
> create a
> > PR branch for that.
> > - A weird solution could be to have the common branch as a submodule in
> the
> > repo itself (if that's even possible). I'm sure that this setup would
> blow
> > up the minds of all newcomers.
> > - Similarly, it's mandatory to have safeguards against code from
> connector
> > A poisoning connector B, common, or main. I had some similar setup in the
> > past and code from two "distinct" branch types constantly swept over.
> > - We could also say that we simply release common independently and
> just
> > have a maven (SNAPSHOT) dependency on it. But that would create a weird
> > flow if you need to change in common where you need to constantly switch
> > branches back and forth.
> > - In general, Frankensteinian's approach is very switch intensive. If you
> > maintain 3 connectors and need to fix 1 build stability each at the same
> > time (quite common nowadays for some reason) and you have 2 review
> rounds,
> > you need to switch branches 9 times ignoring changes to common.
> >
> > Additionally, we still have the rather user/dev unfriendly main that is
> > mostly empty. I'm also not sure we can generate an overview README.md to
> > make it more friendly here because in theory every connector branch
> should
> > be based on main and we would get merge conflicts.
> >
> > I'd like to propose once again to go with individual repositories.
> > - The only downside that we discussed so far is that we have more initial
> > setup to do. Since we organically grow the number of
> connector/repositories
> > that load is quite distributed. We can offer templates after finding a
> good
> > approach that can even be used by outside organizations.
> > - Regarding secrets, I think it's actually an advantage that the Kafka
> > connector has no access to the AWS secrets. If there are secrets to be
> > shared across connectors, we can and should use Azure's Variable Groups
> (I
> > have used it in the past to share Nexus creds across repos). That would
> > also make rotation easy.
> > - Working on different connectors would be rather easy as all modern IDE
> > support multiple repo setups in the same project. You still need to do
> > multiple releases in case you update common code (either accessed through
> > Nexus or git submodule) and you want to release your connector.
> > - There is no difference in respect to how many CI runs there are in both
> > approaches.
> > - Individual repositories also have the advantage of allowing external
> > incubation. Let's assume someone builds connector A and hosts it in their
> > organization (very common setup). If they want to contribute the code to
> > Flink, we could simply transfer the repository into ASF after ensuring
> > Flink coding standards. Then we retain git history and Github issues.
> >
> > Is there any point that I'm missing?
> >
> > On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler 
> > wrote:
> >
> > > For sharing workflows we should be able to use composite actions. We'd
> > > have the main definition files in the flink-connectors repo, that we
> > > also need to tag/release, which other branches/repos can then import.
> > > These are also versioned, so we don't have to worry about accidentally
> > > breaking stuff.
> > > These could also be used to enforce certain standards / interfaces such
> > > that we can automate more things (e.g., integration into the Flink
> > > documentation).
> > >
> > > It is true that Option 2) and dedicated repositories share a lot of
> > > properties. While I did say in an offline conversation that we in that
> > > case might just as well use separate repositories, I'm not so sure
> > > anymore. One repo would make administration a bit easier, for example
> > > secrets wouldn't have to be applied to each repo (we wouldn't want
> > > certain secrets to be set 

Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread David Morávek
+1, agreed with Seth's reasoning. There has been no real activity in the MapR
FS module for years [1], so any remaining users should be fine using the jars
from older Flink versions for quite some time.

[1]
https://github.com/apache/flink/commits/master/flink-filesystems/flink-mapr-fs

Best,
D.

On Thu, Dec 9, 2021 at 4:28 PM Konstantin Knauf  wrote:

> +1 (what Seth said)
>
> On Thu, Dec 9, 2021 at 4:15 PM Seth Wiesman  wrote:
>
> > +1
> >
> > I actually thought we had already dropped this FS. If anyone is still
> > relying on it in production, the file system abstraction in Flink has
> been
> > incredibly stable over the years. They should be able to use the 1.14
> MapR
> > FS with later versions of Flink.
> >
> > Seth
> >
> > On Wed, Dec 8, 2021 at 10:03 AM Martijn Visser 
> > wrote:
> >
> >> Hi all,
> >>
> >> Flink supports multiple file systems [1], which include MapR FS. MapR as
> >> a company hasn't existed since 2019; the technology and intellectual
> >> property have been sold to Hewlett Packard.
> >>
> >> I don't think that there's anyone who's using MapR anymore and therefore
> >> I think it would be good to deprecate this for Flink 1.15 and then
> remove
> >> it in Flink 1.16. Removing this from Flink will slightly shrink the
> >> codebase and CI runtime.
> >>
> >> I'm also cross posting this to the User mailing list, in case there's
> >> still anyone who's using MapR.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> >>
> >
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


Re: [DISCUSS] Releasing Flink 1.14.1

2021-12-09 Thread David Morávek
Hi Martijn, I've just opened a backport PR [1] for FLINK-23946 [2].

[1] https://github.com/apache/flink/pull/18066
[2] https://issues.apache.org/jira/browse/FLINK-23946

Best,
D.

On Thu, Dec 9, 2021 at 4:59 PM Fabian Paul  wrote:

> Actually I meant https://issues.apache.org/jira/browse/FLINK-25126
> sorry for the confusion.
>
> On Thu, Dec 9, 2021 at 4:55 PM Fabian Paul  wrote:
> >
> > Hi Martijn,
> >
> > I just opened the backport for
> > https://issues.apache.org/jira/browse/FLINK-25132. The changes are
> > already approved I only wait for a green Azure build.
> >
> > Best,
> > Fabian
> >
> > On Thu, Dec 9, 2021 at 4:01 PM Martijn Visser 
> wrote:
> > >
> > > Hi all,
> > >
> > > Thanks for the fixes Jingsong and Zhu!
> > >
> > > That means that we still have the following tickets open:
> > >
> > > * https://issues.apache.org/jira/browse/FLINK-23946 - Application mode
> > > fails fatally when being shut down -> A PR is there, just pending a
> review.
> > > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka connector
> tries
> > > to commit aborted transaction in batch mode -> I believe this is
> pending a
> > > backport, correct @fp...@apache.org  ?
> > > * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> cannot
> > > work with object-reusing DeserializationSchema -> @renqs...@gmail.com
> > >  can you provide an ETA for this ticket?
> > > * https://issues.apache.org/jira/browse/FLINK-25199 - fromValues does
> not
> > > emit final MAX watermark -> @Marios Trivyzas  can
> you
> > > provide an ETA for this ticket?
> > > * https://issues.apache.org/jira/browse/FLINK-25227 - Comparing the
> > > equality of the same (boxed) numeric values returns false -> @Caizhi
> Weng
> > >  mentioned that a fix is planned for
> today/tomorrow.
> > > I am wondering if this is indeed a blocker for 1.14.1, but given that
> there
> > > are still some blockers waiting to be merged we could probably include
> it.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Thu, 9 Dec 2021 at 07:31, Caizhi Weng  wrote:
> > >
> > > > Hi devs!
> > > >
> > > > Sorry for the interruptions, but I just found an issue [1] (which I
> think
> > > > is a blocking one) in every Flink version, including Flink 1.14.1.
> > > >
> > > > For Flink < 1.15, this issue will cause incorrect result when user
> cast
> > > > two strings to numerics and compare the numerics.
> > > >
> > > > I'm planning for a quick fix today or tomorrow.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-25227
> > > >
> > > > Zhu Zhu  于2021年12月9日周四 10:48写道:
> > > >
> > > >> update: backport of FLINK-19142 is done
> > > >>
> > > >> Thanks,
> > > >> Zhu
> > > >>
> > > >> Zhu Zhu  于2021年12月8日周三 19:35写道:
> > > >>
> > > >> > Hi Martijn,
> > > >> >
> > > >> > I'd like to backport the fix of FLINK-19142 to 1.14.1.
> > > >> > The backport is in progress.
> > > >> > Will update it here when it is done.
> > > >> >
> > > >> > Thanks,
> > > >> > Zhu
> > > >> >
> > > >> > Jingsong Li  于2021年12月8日周三 10:33写道:
> > > >> >
> > > >> >> Hi Martijn,
> > > >> >>
> > > >> >> We just created a cherry-pick pull-request for
> > > >> >> https://issues.apache.org/jira/browse/FLINK-20370
> > > >> >> We could finish it as soon as possible.
> > > >> >>
> > > >> >> Best,
> > > >> >> Jingsong
> > > >> >>
> > > >> >> On Fri, Dec 3, 2021 at 10:25 PM Fabian Paul 
> wrote:
> > > >> >> >
> > > >> >> > I just opened a PR for
> > > >> >> > https://issues.apache.org/jira/browse/FLINK-25126 I'll expect
> to
> > > >> merge
> > > >> >> > it sometime next week.
> > > >> >> >
> > > >> >> > Best,
> > > >> >> > Fabian
> > > >> >> >
> > > >> >> > On Fri, Dec 3, 2021 at 10:49 AM Martijn Visser <
> > > >> mart...@ververica.com>
> > > >> >> wrote:
> > > >> >> > >
> > > >> >> > > Hi all,
> > > >> >> > >
> > > >> >> > > Just a status update on the open blockers for 1.14.1:
> > > >> >> > > * https://issues.apache.org/jira/browse/FLINK-22113 -
> UniqueKey
> > > >> >> constraint is lost with multiple sources join in SQL -> I
> believe most
> > > >> >> review comments have been fixed and it's just the final review
> remarks
> > > >> >> before it's ready.
> > > >> >> > > * https://issues.apache.org/jira/browse/FLINK-23946 -
> Application
> > > >> >> mode fails fatally when being shut down -> @David Morávek can you
> > > >> provide
> > > >> >> an update?
> > > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25022 -
> ClassLoader
> > > >> >> leak with ThreadLocals on the JM when submitting a job through
> the
> > > >> REST API
> > > >> >> -> I think this is just pending on a merge to master and then
> creating
> > > >> a
> > > >> >> backport?
> > > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka
> > > >> >> connector tries to commit aborted transaction in batch mode ->
> This is
> > > >> a
> > > >> >> new blocker. @fp...@apache.org can you give an update?
> > > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25132 -
> KafkaSource
> > > >> >> cannot work 

Re: [DISCUSS] Releasing Flink 1.14.1

2021-12-09 Thread Fabian Paul
Actually I meant https://issues.apache.org/jira/browse/FLINK-25126
sorry for the confusion.

On Thu, Dec 9, 2021 at 4:55 PM Fabian Paul  wrote:
>
> Hi Martijn,
>
> I just opened the backport for
> https://issues.apache.org/jira/browse/FLINK-25132. The changes are
> already approved I only wait for a green Azure build.
>
> Best,
> Fabian
>
> On Thu, Dec 9, 2021 at 4:01 PM Martijn Visser  wrote:
> >
> > Hi all,
> >
> > Thanks for the fixes Jingsong and Zhu!
> >
> > That means that we still have the following tickets open:
> >
> > * https://issues.apache.org/jira/browse/FLINK-23946 - Application mode
> > fails fatally when being shut down -> A PR is there, just pending a review.
> > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka connector tries
> > to commit aborted transaction in batch mode -> I believe this is pending a
> > backport, correct @fp...@apache.org  ?
> > * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource cannot
> > work with object-reusing DeserializationSchema -> @renqs...@gmail.com
> >  can you provide an ETA for this ticket?
> > * https://issues.apache.org/jira/browse/FLINK-25199 - fromValues does not
> > emit final MAX watermark -> @Marios Trivyzas  can you
> > provide an ETA for this ticket?
> > * https://issues.apache.org/jira/browse/FLINK-25227 - Comparing the
> > equality of the same (boxed) numeric values returns false -> @Caizhi Weng
> >  mentioned that a fix is planned for today/tomorrow.
> > I am wondering if this is indeed a blocker for 1.14.1, but given that there
> > are still some blockers waiting to be merged we could probably include it.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, 9 Dec 2021 at 07:31, Caizhi Weng  wrote:
> >
> > > Hi devs!
> > >
> > > Sorry for the interruptions, but I just found an issue [1] (which I think
> > > is a blocking one) in every Flink version, including Flink 1.14.1.
> > >
> > > For Flink < 1.15, this issue will cause incorrect results when users cast
> > > two strings to numerics and compare the numerics.
> > >
> > > I'm planning for a quick fix today or tomorrow.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-25227
> > >
> > > Zhu Zhu  于2021年12月9日周四 10:48写道:
> > >
> > >> update: backport of FLINK-19142 is done
> > >>
> > >> Thanks,
> > >> Zhu
> > >>
> > >> Zhu Zhu  于2021年12月8日周三 19:35写道:
> > >>
> > >> > Hi Martijn,
> > >> >
> > >> > I'd like to backport the fix of FLINK-19142 to 1.14.1.
> > >> > The backport is in progress.
> > >> > Will update it here when it is done.
> > >> >
> > >> > Thanks,
> > >> > Zhu
> > >> >
> > >> > Jingsong Li  于2021年12月8日周三 10:33写道:
> > >> >
> > >> >> Hi Martijn,
> > >> >>
> > >> >> We just created a cherry-pick pull-request for
> > >> >> https://issues.apache.org/jira/browse/FLINK-20370
> > >> >> We could finish it as soon as possible.
> > >> >>
> > >> >> Best,
> > >> >> Jingsong
> > >> >>
> > >> >> On Fri, Dec 3, 2021 at 10:25 PM Fabian Paul  wrote:
> > >> >> >
> > >> >> > I just opened a PR for
> > >> >> > https://issues.apache.org/jira/browse/FLINK-25126 I'll expect to
> > >> merge
> > >> >> > it sometime next week.
> > >> >> >
> > >> >> > Best,
> > >> >> > Fabian
> > >> >> >
> > >> >> > On Fri, Dec 3, 2021 at 10:49 AM Martijn Visser <
> > >> mart...@ververica.com>
> > >> >> wrote:
> > >> >> > >
> > >> >> > > Hi all,
> > >> >> > >
> > >> >> > > Just a status update on the open blockers for 1.14.1:
> > >> >> > > * https://issues.apache.org/jira/browse/FLINK-22113 - UniqueKey
> > >> >> constraint is lost with multiple sources join in SQL -> I believe most
> > >> >> review comments have been fixed and it's just the final review remarks
> > >> >> before it's ready.
> > >> >> > > * https://issues.apache.org/jira/browse/FLINK-23946 - Application
> > >> >> mode fails fatally when being shut down -> @David Morávek can you
> > >> provide
> > >> >> an update?
> > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25022 - ClassLoader
> > >> >> leak with ThreadLocals on the JM when submitting a job through the
> > >> REST API
> > >> >> -> I think this is just pending on a merge to master and then creating
> > >> a
> > >> >> backport?
> > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka
> > >> >> connector tries to commit aborted transaction in batch mode -> This is
> > >> a
> > >> >> new blocker. @fp...@apache.org can you give an update?
> > >> >> > > * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> > >> >> cannot work with object-reusing DeserializationSchema -> There's a PR
> > >> >> that's being reviewed and then needs a backport.
> > >> >> > >
> > >> >> > > It would be great if we can finish all these blockers next week to
> > >> >> start a release. Do the assignees think that's realistic?
> > >> >> > >
> > >> >> > > Best regards,
> > >> >> > >
> > >> >> > > Martijn
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > > On Thu, 2 Dec 2021 at 14:25, Marios Trivyzas 
> > >> >> wrote:
> > >> >> > >>
> > >> >> > >>  

Re: [DISCUSS] Releasing Flink 1.14.1

2021-12-09 Thread Fabian Paul
Hi Martijn,

I just opened the backport for
https://issues.apache.org/jira/browse/FLINK-25132. The changes are
already approved I only wait for a green Azure build.

Best,
Fabian

On Thu, Dec 9, 2021 at 4:01 PM Martijn Visser  wrote:
>
> Hi all,
>
> Thanks for the fixes Jingsong and Zhu!
>
> That means that we still have the following tickets open:
>
> * https://issues.apache.org/jira/browse/FLINK-23946 - Application mode
> fails fatally when being shut down -> A PR is there, just pending a review.
> * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka connector tries
> to commit aborted transaction in batch mode -> I believe this is pending a
> backport, correct @fp...@apache.org  ?
> * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource cannot
> work with object-reusing DeserializationSchema -> @renqs...@gmail.com
>  can you provide an ETA for this ticket?
> * https://issues.apache.org/jira/browse/FLINK-25199 - fromValues does not
> emit final MAX watermark -> @Marios Trivyzas  can you
> provide an ETA for this ticket?
> * https://issues.apache.org/jira/browse/FLINK-25227 - Comparing the
> equality of the same (boxed) numeric values returns false -> @Caizhi Weng
>  mentioned that a fix is planned for today/tomorrow.
> I am wondering if this is indeed a blocker for 1.14.1, but given that there
> are still some blockers waiting to be merged we could probably include it.
>
> Best regards,
>
> Martijn
>
> On Thu, 9 Dec 2021 at 07:31, Caizhi Weng  wrote:
>
> > Hi devs!
> >
> > Sorry for the interruptions, but I just found an issue [1] (which I think
> > is a blocking one) in every Flink version, including Flink 1.14.1.
> >
> > For Flink < 1.15, this issue will cause incorrect results when users cast
> > two strings to numerics and compare the numerics.
> >
> > I'm planning for a quick fix today or tomorrow.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-25227
> >
> > Zhu Zhu  于2021年12月9日周四 10:48写道:
> >
> >> update: backport of FLINK-19142 is done
> >>
> >> Thanks,
> >> Zhu
> >>
> >> Zhu Zhu  于2021年12月8日周三 19:35写道:
> >>
> >> > Hi Martijn,
> >> >
> >> > I'd like to backport the fix of FLINK-19142 to 1.14.1.
> >> > The backport is in progress.
> >> > Will update it here when it is done.
> >> >
> >> > Thanks,
> >> > Zhu
> >> >
> >> > Jingsong Li  于2021年12月8日周三 10:33写道:
> >> >
> >> >> Hi Martijn,
> >> >>
> >> >> We just created a cherry-pick pull-request for
> >> >> https://issues.apache.org/jira/browse/FLINK-20370
> >> >> We could finish it as soon as possible.
> >> >>
> >> >> Best,
> >> >> Jingsong
> >> >>
> >> >> On Fri, Dec 3, 2021 at 10:25 PM Fabian Paul  wrote:
> >> >> >
> >> >> > I just opened a PR for
> >> >> > https://issues.apache.org/jira/browse/FLINK-25126 I'll expect to
> >> merge
> >> >> > it sometime next week.
> >> >> >
> >> >> > Best,
> >> >> > Fabian
> >> >> >
> >> >> > On Fri, Dec 3, 2021 at 10:49 AM Martijn Visser <
> >> mart...@ververica.com>
> >> >> wrote:
> >> >> > >
> >> >> > > Hi all,
> >> >> > >
> >> >> > > Just a status update on the open blockers for 1.14.1:
> >> >> > > * https://issues.apache.org/jira/browse/FLINK-22113 - UniqueKey
> >> >> constraint is lost with multiple sources join in SQL -> I believe most
> >> >> review comments have been fixed and it's just the final review remarks
> >> >> before it's ready.
> >> >> > > * https://issues.apache.org/jira/browse/FLINK-23946 - Application
> >> >> mode fails fatally when being shut down -> @David Morávek can you
> >> provide
> >> >> an update?
> >> >> > > * https://issues.apache.org/jira/browse/FLINK-25022 - ClassLoader
> >> >> leak with ThreadLocals on the JM when submitting a job through the
> >> REST API
> >> >> -> I think this is just pending on a merge to master and then creating
> >> a
> >> >> backport?
> >> >> > > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka
> >> >> connector tries to commit aborted transaction in batch mode -> This is
> >> a
> >> >> new blocker. @fp...@apache.org can you give an update?
> >> >> > > * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
> >> >> cannot work with object-reusing DeserializationSchema -> There's a PR
> >> >> that's being reviewed and then needs a backport.
> >> >> > >
> >> >> > > It would be great if we can finish all these blockers next week to
> >> >> start a release. Do the assignees think that's realistic?
> >> >> > >
> >> >> > > Best regards,
> >> >> > >
> >> >> > > Martijn
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Thu, 2 Dec 2021 at 14:25, Marios Trivyzas 
> >> >> wrote:
> >> >> > >>
> >> >> > >>  https://issues.apache.org/jira/browse/FLINK-22113 will be merged
> >> >> today (most probably)
> >> >> > >>
> >> >> > >> On Mon, Nov 29, 2021 at 10:15 AM Martijn Visser <
> >> >> mart...@ververica.com> wrote:
> >> >> > >>>
> >> >> > >>> Thanks all for the updates! To summarize, these are open tickets
> >> >> that are considered blockers for Flink 1.14.1:
> >> >> > >>>
> >> >> > >>> * 

Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread Konstantin Knauf
+1 (what Seth said)

On Thu, Dec 9, 2021 at 4:15 PM Seth Wiesman  wrote:

> +1
>
> I actually thought we had already dropped this FS. If anyone is still
> relying on it in production, the file system abstraction in Flink has been
> incredibly stable over the years. They should be able to use the 1.14 MapR
> FS with later versions of Flink.
>
> Seth
>
> On Wed, Dec 8, 2021 at 10:03 AM Martijn Visser 
> wrote:
>
>> Hi all,
>>
>> Flink supports multiple file systems [1], which include MapR FS. MapR as
>> a company hasn't existed since 2019; the technology and intellectual
>> property have been sold to Hewlett Packard.
>>
>> I don't think that there's anyone who's using MapR anymore and therefore
>> I think it would be good to deprecate this for Flink 1.15 and then remove
>> it in Flink 1.16. Removing this from Flink will slightly shrink the
>> codebase and CI runtime.
>>
>> I'm also cross posting this to the User mailing list, in case there's
>> still anyone who's using MapR.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread Seth Wiesman
+1

I actually thought we had already dropped this FS. If anyone is still
relying on it in production, the file system abstraction in Flink has been
incredibly stable over the years. They should be able to use the 1.14 MapR
FS with later versions of Flink.

Seth

On Wed, Dec 8, 2021 at 10:03 AM Martijn Visser 
wrote:

> Hi all,
>
> Flink supports multiple file systems [1], which include MapR FS. MapR as a
> company hasn't existed since 2019; the technology and intellectual
> property have been sold to Hewlett Packard.
>
> I don't think that there's anyone who's using MapR anymore and therefore I
> think it would be good to deprecate this for Flink 1.15 and then remove it
> in Flink 1.16. Removing this from Flink will slightly shrink the codebase
> and CI runtime.
>
> I'm also cross posting this to the User mailing list, in case there's
> still anyone who's using MapR.
>
> Best regards,
>
> Martijn
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>


Re: [DISCUSS] Releasing Flink 1.14.1

2021-12-09 Thread Martijn Visser
Hi all,

Thanks for the fixes Jingsong and Zhu!

That means that we still have the following tickets open:

* https://issues.apache.org/jira/browse/FLINK-23946 - Application mode
fails fatally when being shut down -> A PR is there, just pending a review.
* https://issues.apache.org/jira/browse/FLINK-25126 - Kafka connector tries
to commit aborted transaction in batch mode -> I believe this is pending a
backport, correct @fp...@apache.org  ?
* https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource cannot
work with object-reusing DeserializationSchema -> @renqs...@gmail.com
 can you provide an ETA for this ticket?
* https://issues.apache.org/jira/browse/FLINK-25199 - fromValues does not
emit final MAX watermark -> @Marios Trivyzas  can you
provide an ETA for this ticket?
* https://issues.apache.org/jira/browse/FLINK-25227 - Comparing the
equality of the same (boxed) numeric values returns false -> @Caizhi Weng
 mentioned that a fix is planned for today/tomorrow.
I am wondering if this is indeed a blocker for 1.14.1, but given that there
are still some blockers waiting to be merged we could probably include it.
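Purely as an illustration of the symptom the FLINK-25227 title describes, and not of the planner code path the fix will touch, the classic boxed-comparison pitfall on the JVM looks like this (a minimal sketch):

{code:scala}
// Minimal sketch (not Flink code): equal boxed integers outside the small-value
// cache are distinct objects, so a reference comparison reports "not equal".
val a: java.lang.Integer = java.lang.Integer.valueOf(1000)
val b: java.lang.Integer = java.lang.Integer.valueOf(1000)

println(a eq b)      // false – reference equality, what Java's `==` does on boxed values
println(a.equals(b)) // true  – value equality
{code}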

Best regards,

Martijn

On Thu, 9 Dec 2021 at 07:31, Caizhi Weng  wrote:

> Hi devs!
>
> Sorry for the interruptions, but I just found an issue [1] (which I think
> is a blocking one) in every Flink version, including Flink 1.14.1.
>
> For Flink < 1.15, this issue will cause incorrect results when users cast
> two strings to numerics and compare them.
>
> I'm planning for a quick fix today or tomorrow.
>
> [1] https://issues.apache.org/jira/browse/FLINK-25227
>
> Zhu Zhu  于2021年12月9日周四 10:48写道:
>
>> update: backport of FLINK-19142 is done
>>
>> Thanks,
>> Zhu
>>
>> Zhu Zhu  于2021年12月8日周三 19:35写道:
>>
>> > Hi Martijn,
>> >
>> > I'd like to backport the fix of FLINK-19142 to 1.14.1.
>> > The backport is in progress.
>> > Will update it here when it is done.
>> >
>> > Thanks,
>> > Zhu
>> >
>> > Jingsong Li  于2021年12月8日周三 10:33写道:
>> >
>> >> Hi Martijn,
>> >>
>> >> We just created a cherry-pick pull-request for
>> >> https://issues.apache.org/jira/browse/FLINK-20370
>> >> We could finish it as soon as possible.
>> >>
>> >> Best,
>> >> Jingsong
>> >>
>> >> On Fri, Dec 3, 2021 at 10:25 PM Fabian Paul  wrote:
>> >> >
>> >> > I just opened a PR for
>> >> > https://issues.apache.org/jira/browse/FLINK-25126 I'll expect to
>> merge
>> >> > it sometime next week.
>> >> >
>> >> > Best,
>> >> > Fabian
>> >> >
>> >> > On Fri, Dec 3, 2021 at 10:49 AM Martijn Visser <
>> mart...@ververica.com>
>> >> wrote:
>> >> > >
>> >> > > Hi all,
>> >> > >
>> >> > > Just a status update on the open blockers for 1.14.1:
>> >> > > * https://issues.apache.org/jira/browse/FLINK-22113 - UniqueKey
>> >> constraint is lost with multiple sources join in SQL -> I believe most
>> >> review comments have been fixed and it's just the final review remarks
>> >> before it's ready.
>> >> > > * https://issues.apache.org/jira/browse/FLINK-23946 - Application
>> >> mode fails fatally when being shut down -> @David Morávek can you
>> provide
>> >> an update?
>> >> > > * https://issues.apache.org/jira/browse/FLINK-25022 - ClassLoader
>> >> leak with ThreadLocals on the JM when submitting a job through the
>> REST API
>> >> -> I think this is just pending on a merge to master and then creating
>> a
>> >> backport?
>> >> > > * https://issues.apache.org/jira/browse/FLINK-25126 - Kafka
>> >> connector tries to commit aborted transaction in batch mode -> This is
>> a
>> >> new blocker. @fp...@apache.org can you give an update?
>> >> > > * https://issues.apache.org/jira/browse/FLINK-25132 - KafkaSource
>> >> cannot work with object-reusing DeserializationSchema -> There's a PR
>> >> that's being reviewed and then needs a backport.
>> >> > >
>> >> > > It would be great if we can finish all these blockers next week to
>> >> start a release. Do the assignees think that's realistic?
>> >> > >
>> >> > > Best regards,
>> >> > >
>> >> > > Martijn
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, 2 Dec 2021 at 14:25, Marios Trivyzas 
>> >> wrote:
>> >> > >>
>> >> > >>  https://issues.apache.org/jira/browse/FLINK-22113 will be merged
>> >> today (most probably)
>> >> > >>
>> >> > >> On Mon, Nov 29, 2021 at 10:15 AM Martijn Visser <
>> >> mart...@ververica.com> wrote:
>> >> > >>>
>> >> > >>> Thanks all for the updates! To summarize, these are open tickets
>> >> that are considered blockers for Flink 1.14.1:
>> >> > >>>
>> >> > >>> * https://issues.apache.org/jira/browse/FLINK-22113 - UniqueKey
>> >> constraint is lost with multiple sources join in SQL -> @Marios
>> Trivyzas
>> >> can you give an estimate when you expect this to be resolved?
>> >> > >>> * https://issues.apache.org/jira/browse/FLINK-23946 -
>> Application
>> >> mode fails fatally when being shut down -> A patch is being prepared.
>> >> @David Morávek do you have an estimate when this patch will be there?
>> >> > >>> * https://issues.apache.org/jira/browse/FLINK-24596 - Bugs in

Re: [DISCUSS] Creating an external connector repository

2021-12-09 Thread Martijn Visser
I also agree that it feels more natural to go with a repo for each
individual connector. Each repository can be made available on
flink-packages.org so users can find them, in addition to referencing them in
the documentation. +1 from my side.

On Thu, 9 Dec 2021 at 15:38, Arvid Heise  wrote:

> Hi all,
>
> We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
> experienced tough nuts to crack and feel like we hit a dead end:
> - The main pain point with the outlined Frankensteinian connector repo is
> how to handle shared code / infra code. If we have it in some common
> branch, then we need to merge the common branch into the connector branch on
> update. However, it's unclear to me how improvements in the common branch
> that naturally appear while working on a specific connector go back into
> the common branch. You can't use a pull request from your branch or else
> your connector code would poison the connector-less common branch. So you
> would probably manually copy the files over to a common branch and create a
> PR branch for that.
> - A weird solution could be to have the common branch as a submodule in the
> repo itself (if that's even possible). I'm sure that this setup would blow
> up the minds of all newcomers.
> - Similarly, it's mandatory to have safeguards against code from connector
> A poisoning connector B, common, or main. I had some similar setup in the
> past and code from two "distinct" branch types constantly swept over.
> - We could also say that we simply release common independently and just
> have a maven (SNAPSHOT) dependency on it. But that would create a weird
> flow if you need to change in common where you need to constantly switch
> branches back and forth.
> - In general, Frankensteinian's approach is very switch intensive. If you
> maintain 3 connectors and need to fix 1 build stability each at the same
> time (quite common nowadays for some reason) and you have 2 review rounds,
> you need to switch branches 9 times ignoring changes to common.
>
> Additionally, we still have the rather user/dev unfriendly main that is
> mostly empty. I'm also not sure we can generate an overview README.md to
> make it more friendly here because in theory every connector branch should
> be based on main and we would get merge conflicts.
>
> I'd like to propose once again to go with individual repositories.
> - The only downside that we discussed so far is that we have more initial
> setup to do. Since we organically grow the number of connector/repositories
> that load is quite distributed. We can offer templates after finding a good
> approach that can even be used by outside organizations.
> - Regarding secrets, I think it's actually an advantage that the Kafka
> connector has no access to the AWS secrets. If there are secrets to be
> shared across connectors, we can and should use Azure's Variable Groups (I
> have used it in the past to share Nexus creds across repos). That would
> also make rotation easy.
> - Working on different connectors would be rather easy as all modern IDE
> support multiple repo setups in the same project. You still need to do
> multiple releases in case you update common code (either accessed through
> Nexus or git submodule) and you want to release your connector.
> - There is no difference with respect to how many CI runs there are in the
> two approaches.
> - Individual repositories also have the advantage of allowing external
> incubation. Let's assume someone builds connector A and hosts it in their
> organization (very common setup). If they want to contribute the code to
> Flink, we could simply transfer the repository into ASF after ensuring
> Flink coding standards. Then we retain git history and Github issues.
>
> Is there any point that I'm missing?
>
> On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler 
> wrote:
>
> > For sharing workflows we should be able to use composite actions. We'd
> > have the main definition files in the flink-connectors repo, that we
> > also need to tag/release, which other branches/repos can then import.
> > These are also versioned, so we don't have to worry about accidentally
> > breaking stuff.
> > These could also be used to enforce certain standards / interfaces such
> > that we can automate more things (e.g., integration into the Flink
> > documentation).
> >
> > It is true that Option 2) and dedicated repositories share a lot of
> > properties. While I did say in an offline conversation that we in that
> > case might just as well use separate repositories, I'm not so sure
> > anymore. One repo would make administration a bit easier, for example
> > secrets wouldn't have to be applied to each repo (we wouldn't want
> > certain secrets to be set up organization-wide).
> > I overall also like that one repo would present a single access point;
> > you can't "miss" a connector repo, and I would hope that having it as
> > one repo would nurture more collaboration between the connectors, which
> > after all need to solve similar 

Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-12-09 Thread Timo Walther

Hi everyone,

thanks for the healthy discussion. Let us find a compromise to make all 
parties happy.


> NULL in SQL essentially means "UNKNOWN"

This argument is true, but as a user I would like to know the `Why?` behind 
a value being UNKNOWN. I could imagine users having spent hours finding the 
root cause of a NULL value in their pipeline. This might not always be 
reported in JIRA afterwards.


> I'm ok to change the behavior, but just not now

As Marios mentioned, we have introduced the flag for both new and legacy 
behavior. I would suggest we wait a couple of releases before enabling 
it. But not necessarily until Flink 2.0.


I'm sure we will have more of these cases in the future as many built-in 
functions have not been properly tested or verified against other 
vendors. Nobody has reserved the time yet to look at the built-in 
functions as a whole. We have very specialized ones whereas common 
functions are missing.


> a group of people managing some streaming platform

Especially streaming platforms should have a way of setting a config option 
globally for all existing pipelines. What we should fix in the near future is 
that a flink-conf.yaml option can be propagated into the code generation. We 
currently don't have a complete layered configuration story that would allow 
such a use case.


Also the upgrade story (FLIP-190) will help us as functions are 
versioned in the compiled plan. We can introduce a new CAST function 
version. However, personally I would have liked to solve this before, as 
CAST is a fundamental operation.


How about the following steps?

1) Introduce CAST with option and TRY_CAST
2) Add a warning to the documentation to let users prepare
3) Ease the management overhead via upgrade story and layered 
configuration. Maybe even with unified failure management.

4) Change the default value of the option to new behavior
5) Drop option in 2.0
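
For illustration, a minimal sketch of the semantics the steps above aim for; TRY_CAST and the option key/value in the last line are taken from this thread and should be read as assumptions, not a finished API:

{code:scala}
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val tEnv = TableEnvironment.create(
  EnvironmentSettings.newInstance().inStreamingMode().build())

// Proposed default: a failing CAST raises an error instead of silently returning NULL.
// tEnv.executeSql("SELECT CAST('not-a-number' AS INT)")   // would fail at runtime

// TRY_CAST (once introduced) keeps the "NULL on failure" semantics for users who rely on them.
tEnv.executeSql("SELECT TRY_CAST('not-a-number' AS INT)").print() // proposed: prints NULL

// Transition flag (key and value assumed from the discussion) to restore the legacy behaviour:
// tEnv.getConfig.getConfiguration.setString("table.exec.legacy-cast-behaviour", "ENABLED")
{code}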

Regards,
Timo



On 01.12.21 17:13, Marios Trivyzas wrote:

FYI: Opened https://github.com/apache/flink/pull/17985 which will introduce
the config option,
so we can continue working on the CAST fixes and improvements. It will be
very easy to flip
the default behaviour (currently on the PR: legacy = ENABLED) when this
discussion concludes,
and update the documentation accordingly.

On Mon, Nov 29, 2021 at 10:37 AM Marios Trivyzas  wrote:


I definitely understand the argument to continue supporting the existing
(return null) as the default behaviour.
I'd like to point out though that if we decide on this (option no.2), it's
kind of unnatural to directly drop the flag in *Flink 2.0*,
for example, and force the users at that point to use either *CAST* (throw
errors) or *TRY_CAST* (define a default return value).

The decision on the default value of this flag is a commitment, because
in my opinion changing this default value in the future
to throw errors instead is not an option, as this would definitely confuse
the users. So the next step (in future versions) would be to
completely drop the flag and have the users choose between *CAST* and
*TRY_CAST*.

Therefore, and speaking from a development-cycle perspective, my personal
preference is to go with option no.1, which is in line
with the usual approach (at least in my experience :)) in open source
software.

Best regards,
Marios


On Tue, Nov 23, 2021 at 12:59 PM Martijn Visser 
wrote:


Hi all,

My conclusion at this point is that there is consensus that the default
behaviour of CAST should raise errors when it fails and that it should be
configurable via a configuration flag.

The open discussion is on when we want to fix the current (incorrect)
behaviour:

1. Doing it in the next Flink release (1.15) by setting the configuration
flag to fail by default
2. Keep the current (incorrect) behaviour in Flink 1.15 by setting the
configuration flag to the current situation and only changing this
if/whenever a Flink 2.0 version is released.

 From my perspective, I can understand that going for option 2 is a
preferred option for those that are running large Flink setups with a
great
number of users. I am wondering if those platforms also have the ability
to
set default values and/or override user configuration. That could be a way
to solve this issue for these platform teams.

I would prefer to go for option 1, because I think correct execution of
CAST is important, especially for new Flink users. These new users should
have a smooth user experience and shouldn't need to change configuration
flags to get correct behaviour. I do expect that users who have used Flink
before are more familiar with checking release notes and interpreting how
this potentially affects them. That's why we have release notes. I also
doubt that we will have a Flink 2.0 release any time soon, meaning that we
are only going to make the pain even bigger for more users if we change
this incorrect behaviour at a later time.

Best regards,

Martijn

On Tue, 23 Nov 2021 at 02:10, Kurt Young  wrote:


This is what I don't 

[jira] [Created] (FLINK-25235) Re-enable ZooKeeperLeaderElectionITCase#testJobExecutionOnClusterWithLeaderChange

2021-12-09 Thread Jira
David Morávek created FLINK-25235:
-

 Summary: Re-enable 
ZooKeeperLeaderElectionITCase#testJobExecutionOnClusterWithLeaderChange
 Key: FLINK-25235
 URL: https://issues.apache.org/jira/browse/FLINK-25235
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: David Morávek
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Creating an external connector repository

2021-12-09 Thread Arvid Heise
Hi all,

We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
experienced tough nuts to crack and feel like we hit a dead end:
- The main pain point with the outlined Frankensteinian connector repo is
how to handle shared code / infra code. If we have it in some common
branch, then we need to merge the common branch into the connector branch on
update. However, it's unclear to me how improvements in the common branch
that naturally appear while working on a specific connector go back into
the common branch. You can't use a pull request from your branch or else
your connector code would poison the connector-less common branch. So you
would probably manually copy the files over to a common branch and create a
PR branch for that.
- A weird solution could be to have the common branch as a submodule in the
repo itself (if that's even possible). I'm sure that this setup would blow
up the minds of all newcomers.
- Similarly, it's mandatory to have safeguards against code from connector
A poisoning connector B, common, or main. I had some similar setup in the
past and code from two "distinct" branch types constantly swept over.
- We could also say that we simply release common independently and just
have a maven (SNAPSHOT) dependency on it. But that would create a weird
flow if you need to change something in common, where you need to constantly
switch branches back and forth.
- In general, the Frankensteinian approach is very switch-intensive. If you
maintain 3 connectors and need to fix one build stability issue in each at the
same time (quite common nowadays for some reason) and you have 2 review rounds,
you need to switch branches 9 times, ignoring changes to common.

Additionally, we still have the rather user/dev unfriendly main that is
mostly empty. I'm also not sure we can generate an overview README.md to
make it more friendly here because in theory every connector branch should
be based on main and we would get merge conflicts.

I'd like to propose once again to go with individual repositories.
- The only downside that we discussed so far is that we have more initial
setup to do. Since we grow the number of connectors/repositories organically,
that load is quite distributed. We can offer templates after finding a good
approach that can even be used by outside organizations.
- Regarding secrets, I think it's actually an advantage that the Kafka
connector has no access to the AWS secrets. If there are secrets to be
shared across connectors, we can and should use Azure's Variable Groups (I
have used it in the past to share Nexus creds across repos). That would
also make rotation easy.
- Working on different connectors would be rather easy as all modern IDEs
support multi-repo setups in the same project. You still need to do
multiple releases in case you update common code (either accessed through
Nexus or a git submodule) and you want to release your connector.
- There is no difference with respect to how many CI runs there are in the
two approaches.
- Individual repositories also have the advantage of allowing external
incubation. Let's assume someone builds connector A and hosts it in their
organization (very common setup). If they want to contribute the code to
Flink, we could simply transfer the repository into ASF after ensuring
Flink coding standards. Then we retain git history and Github issues.

Is there any point that I'm missing?

On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler  wrote:

> For sharing workflows we should be able to use composite actions. We'd
> have the main definition files in the flink-connectors repo, that we
> also need to tag/release, which other branches/repos can then import.
> These are also versioned, so we don't have to worry about accidentally
> breaking stuff.
> These could also be used to enforce certain standards / interfaces such
> that we can automate more things (e.g., integration into the Flink
> documentation).
>
> It is true that Option 2) and dedicated repositories share a lot of
> properties. While I did say in an offline conversation that we in that
> case might just as well use separate repositories, I'm not so sure
> anymore. One repo would make administration a bit easier, for example
> secrets wouldn't have to be applied to each repo (we wouldn't want
> certain secrets to be set up organization-wide).
> I overall also like that one repo would present a single access point;
> you can't "miss" a connector repo, and I would hope that having it as
> one repo would nurture more collaboration between the connectors, which
> after all need to solve similar problems.
>
> It is a fair point that the branching model would be quite weird, but I
> think that would subside pretty quickly.
>
> Personally I'd go with Option 2, and if that doesn't work out we can
> still split the repo later on. (Which should then be a trivial matter of
> copying all /* branches and renaming them).
>
> On 26/11/2021 12:47, Till Rohrmann wrote:
> > Hi Arvid,
> >
> > Thanks for updating this thread with the latest 

[jira] [Created] (FLINK-25234) Flink should parse ISO timestamp in UTC format

2021-12-09 Thread Egor Ryashin (Jira)
Egor Ryashin created FLINK-25234:


 Summary: Flink should parse ISO timestamp in UTC format
 Key: FLINK-25234
 URL: https://issues.apache.org/jira/browse/FLINK-25234
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.14.0
Reporter: Egor Ryashin


Error parsing timestamp with ISO-8601 format:
{code:java}
[ERROR] Could not execute SQL statement. Reason:
java.time.format.DateTimeParseException: Text '2021-12-08T12:59:57.028Z' could 
not be parsed, unparsed text found at index 23 {code}
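
The failure at index 23 is consistent with a parser that expects a zone-less (local) date-time and stops at the trailing UTC designator 'Z'. A minimal JVM-level illustration of that difference (not the Flink format code itself):

{code:scala}
import java.time.{Instant, LocalDateTime}

// A local (zone-less) parser chokes on the trailing 'Z' at index 23:
// LocalDateTime.parse("2021-12-08T12:59:57.028Z")
//   -> java.time.format.DateTimeParseException: ... unparsed text found at index 23

// A zone-aware parser accepts the same ISO-8601 UTC literal:
val instant = Instant.parse("2021-12-08T12:59:57.028Z")
println(instant) // 2021-12-08T12:59:57.028Z
{code}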



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-09 Thread Timo Walther

Hi Jing and Godfrey,

I had another iteration over the document. There are two major changes:

1. Supported Flink Upgrade Versions

I got the feedback via various channels that a step size of one minor 
version is not very convenient. As you said, "because upgrading to a new

version is a time-consuming process". I rephrased this section:

Upgrading usually involves work, which is why many users perform this 
task rarely (e.g. only once per year). Also, skipping versions is 
common until a new feature has been introduced for which it is worth 
upgrading. We will support the upgrade to the most recent Flink version 
from a set of previous versions. We aim to support upgrades from the 
last 2-3 releases on a best-effort basis; maybe even more depending on 
the maintenance overhead. However, in order to not grow the testing 
matrix infinitely and to perform important refactoring if necessary, we 
only guarantee upgrades with a step size of a single minor version (i.e. 
a cascade of upgrades).


2. Annotation Design

I also adopted the multiple annotations design for the previous 
supportedPlanFormat. So no array of versions anymore. I reworked the 
section; please have a look at the updated examples:


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI

I also got the feedback offline that `savepoint` might not be the right 
terminology for the annotation. I changed that to minPlanVersion and 
minStateVersion.
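
To make the renamed properties a bit more tangible, a purely illustrative sketch follows; the stand-in annotation and version enum are hypothetical and only mirror the shape described above, not the final ExecNodeMetadata API:

{code:scala}
import scala.annotation.StaticAnnotation

// Hypothetical stand-ins so the sketch is self-contained:
object FlinkVersion extends Enumeration { val v1_15, v1_16 = Value }
class ExecNodeMetadata(
    name: String,
    version: Int,
    minPlanVersion: FlinkVersion.Value,   // oldest JSON plan format this node can read
    minStateVersion: FlinkVersion.Value   // oldest savepoint state layout it can restore
) extends StaticAnnotation

// A node would carry one such annotation per supported version
// instead of an array of versions:
@ExecNodeMetadata(name = "stream-exec-example", version = 1,
  minPlanVersion = FlinkVersion.v1_15, minStateVersion = FlinkVersion.v1_15)
class StreamExecExample
{code}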


Let me know what you think.

Regards,
Timo



On 09.12.21 08:44, Jing Zhang wrote:

Hi Timo,
Thanks a lot for driving this discussion.
I believe it could solve many of the problems we are suffering from when upgrading.

I only have a small complaint about the following point.


For simplification of the design, we assume that upgrades use a step size
of a single minor version. We don't guarantee skipping minor versions (e.g.
1.11 to 1.14).

In our internal production environment, we catch up with the community's
latest stable release version roughly once a year because upgrading to a new
version is a time-consuming process.
So we might have skipped 1~3 versions by the time we upgrade to the latest
version. This might also be the case at other companies.
Could we guarantee that FLIP-190 works if we skip fewer minor versions than a
specified threshold?
Then we could know which version is good for us when preparing an upgrade.

Best,
Jing Zhang

godfrey he  于2021年12月8日周三 22:16写道:


Hi Timo,

Thanks for the explanation, it's much clearer now.

One thing I want to confirm about `supportedPlanFormat`
and `supportedSavepointFormat`:
`supportedPlanFormat` supports multiple versions,
while `supportedSavepointFormat` supports only one version?
A JSON plan can be deserialized by multiple versions
because default values will be set for new fields.
In theory, a savepoint can be restored by more than one version
of the operators even if a state layout is changed,
such as deleting a whole state and starting the job with
`allowNonRestoredState`=true.
I think this is a corner case, and it's hard to understand compared
to `supportedPlanFormat` supporting multiple versions.
So, in most cases, when the state layout is changed, the savepoint is
incompatible, and `supportedSavepointFormat` and its version need to be changed.

I think we need a detailed explanation of the annotation change story in
the Javadoc of the `ExecNodeMetadata` class for all developers
(esp. those unfamiliar with this part).

Best,
Godfrey

Timo Walther  于2021年12月8日周三 下午4:57写道:


Hi Wenlong,

thanks for the feedback. Great that we reached consensus here. I will
update the entire document with my previous example shortly.

  > if we don't update the version when plan format changes, we can't
find that the plan cannot be deserialized in 1.15

This should not be a problem as the entire plan file has a version as
well. We should not allow reading a 1.16 plan in 1.15. We can throw a
helpful exception early.

Reading a 1.15 plan in 1.16 is possible until we drop the old
`supportedPlanFormat` from one of used ExecNodes. Afterwards all
`supportedPlanFormat` of ExecNodes must be equal or higher then the plan
version.

Regards,
Timo

On 08.12.21 03:07, wenlong.lwl wrote:

Hi Timo, +1 for multi metadata.

The compatible change I mean in the last email is the slight state change
example you gave, so we have got consensus on this actually, IMO.

Another question based on the example you gave:
In the example "JSON node gets an additional property in 1.16", if we don't
update the version when the plan format changes, we can't detect that the plan
cannot be deserialized in 1.15, although the savepoint state is compatible.
The error message may not be so friendly if we just throw a deserialization
failure.

On Tue, 7 Dec 2021 at 16:49, Timo Walther  wrote:


Hi Wenlong,

   > First, we add a newStateLayout because of some improvement in state, in
   > order to keep compatibility we may still keep the old state for the first

[jira] [Created] (FLINK-25233) UpsertKafkaTableITCase.testAggregate fails on AZP

2021-12-09 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25233:
-

 Summary: UpsertKafkaTableITCase.testAggregate fails on AZP
 Key: FLINK-25233
 URL: https://issues.apache.org/jira/browse/FLINK-25233
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Ecosystem
Affects Versions: 1.15.0
Reporter: Till Rohrmann
 Fix For: 1.15.0


{{UpsertKafkaTableITCase.testAggregate}} fails on AZP with

{code}
2021-12-09T01:41:49.8038402Z Dec 09 01:41:49 [ERROR] 
UpsertKafkaTableITCase.testAggregate  Time elapsed: 90.624 s  <<< ERROR!
2021-12-09T01:41:49.8039372Z Dec 09 01:41:49 
java.util.concurrent.ExecutionException: 
org.apache.flink.table.api.TableException: Failed to wait job finish
2021-12-09T01:41:49.8040303Z Dec 09 01:41:49at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2021-12-09T01:41:49.8040956Z Dec 09 01:41:49at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
2021-12-09T01:41:49.8041862Z Dec 09 01:41:49at 
org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
2021-12-09T01:41:49.8042939Z Dec 09 01:41:49at 
org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
2021-12-09T01:41:49.8044130Z Dec 09 01:41:49at 
org.apache.flink.streaming.connectors.kafka.table.UpsertKafkaTableITCase.wordCountToUpsertKafka(UpsertKafkaTableITCase.java:436)
2021-12-09T01:41:49.8045308Z Dec 09 01:41:49at 
org.apache.flink.streaming.connectors.kafka.table.UpsertKafkaTableITCase.testAggregate(UpsertKafkaTableITCase.java:79)
2021-12-09T01:41:49.8045940Z Dec 09 01:41:49at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2021-12-09T01:41:49.8052892Z Dec 09 01:41:49at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2021-12-09T01:41:49.8053812Z Dec 09 01:41:49at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-12-09T01:41:49.8054458Z Dec 09 01:41:49at 
java.lang.reflect.Method.invoke(Method.java:498)
2021-12-09T01:41:49.8055027Z Dec 09 01:41:49at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2021-12-09T01:41:49.8055649Z Dec 09 01:41:49at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2021-12-09T01:41:49.8056644Z Dec 09 01:41:49at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2021-12-09T01:41:49.8057911Z Dec 09 01:41:49at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2021-12-09T01:41:49.8058858Z Dec 09 01:41:49at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2021-12-09T01:41:49.8059907Z Dec 09 01:41:49at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2021-12-09T01:41:49.8060871Z Dec 09 01:41:49at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2021-12-09T01:41:49.8061847Z Dec 09 01:41:49at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2021-12-09T01:41:49.8062898Z Dec 09 01:41:49at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
2021-12-09T01:41:49.8063804Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2021-12-09T01:41:49.8064963Z Dec 09 01:41:49at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2021-12-09T01:41:49.8065992Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2021-12-09T01:41:49.8066940Z Dec 09 01:41:49at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2021-12-09T01:41:49.8067939Z Dec 09 01:41:49at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2021-12-09T01:41:49.8068904Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2021-12-09T01:41:49.8069837Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2021-12-09T01:41:49.8070715Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2021-12-09T01:41:49.8071587Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2021-12-09T01:41:49.8072582Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2021-12-09T01:41:49.8073540Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2021-12-09T01:41:49.8074407Z Dec 09 01:41:49at 
org.junit.runners.Suite.runChild(Suite.java:128)
2021-12-09T01:41:49.8075215Z Dec 09 01:41:49at 
org.junit.runners.Suite.runChild(Suite.java:27)
2021-12-09T01:41:49.8080759Z Dec 09 01:41:49at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2021-12-09T01:41:49.8081754Z Dec 09 

[jira] [Created] (FLINK-25232) flink sql supports fine-grained state configuration

2021-12-09 Thread ZhuoYu Chen (Jira)
ZhuoYu Chen created FLINK-25232:
---

 Summary: flink sql supports fine-grained state configuration
 Key: FLINK-25232
 URL: https://issues.apache.org/jira/browse/FLINK-25232
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: ZhuoYu Chen


In production, I found that if I configure the state TTL in SQL, the TTL takes 
effect globally.
If I have the following SQL grouped by region:
select count(1), region from (select * from A join B on a.uid = b.uid) group by region
If I configure a global TTL, the state of the GroupAggFunction (the count) is 
also subject to it, e.g. the accumulated count is cleared after one day.
If I don't configure a TTL, the state of the regular join keeps growing.
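
For context, a minimal sketch of the only knob available today, which applies one retention time to every stateful operator of the job; the fine-grained per-operator configuration requested here does not exist yet, and the configuration key mentioned in the comment is an assumption:

{code:scala}
import java.time.Duration
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val tEnv = TableEnvironment.create(
  EnvironmentSettings.newInstance().inStreamingMode().build())

// Global idle-state retention: the join state AND the GroupAggFunction
// accumulators are cleared after one day – there is no per-operator override.
tEnv.getConfig.setIdleStateRetention(Duration.ofDays(1))
// roughly equivalent to setting 'table.exec.state.ttl' = '1 d' (assumed key)
{code}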



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] FLIP-194: Introduce the JobResultStore

2021-12-09 Thread David Morávek
Hi everyone,

I'm happy to announce that we have unanimously approved this FLIP.

There are 5 approving votes, 4 of which are binding:

Till Rohrmann (binding)
Xintong Song (binding)
Yang Wang (non-binding)
Zhu Zhu (binding)
Matthias Pohl (binding)

There are no disapproving votes. Thanks everyone for voting!

Best,
D.

On Tue, Dec 7, 2021 at 12:33 PM Matthias Pohl 
wrote:

> +1 (binding)
>
> On Tue, Dec 7, 2021 at 3:32 AM Zhu Zhu  wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > Zhu
> >
> > Yang Wang  于2021年12月7日周二 上午9:52写道:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yang
> > >
> > > Xintong Song  于2021年12月7日周二 上午9:38写道:
> > >
> > > > +1 (binding)
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Mon, Dec 6, 2021 at 5:02 PM Till Rohrmann 
> > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Dec 6, 2021 at 10:00 AM David Morávek 
> > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I'd like to open a vote on FLIP-194: Introduce the JobResultStore
> > [1]
> > > > > which
> > > > > > has been
> > > > > > discussed in this thread [2].
> > > > > > The vote will be open for at least 72 hours unless there is an
> > > > objection
> > > > > or
> > > > > > not enough votes.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726435
> > > > > > [2]
> > https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94
> > > > > >
> > > > > > Best,
> > > > > > D.
> > > > > >
> > > > >
> > > >
> > >
>


[jira] [Created] (FLINK-25231) Update PyFlink to use the new type system

2021-12-09 Thread Dian Fu (Jira)
Dian Fu created FLINK-25231:
---

 Summary: Update PyFlink to use the new type system
 Key: FLINK-25231
 URL: https://issues.apache.org/jira/browse/FLINK-25231
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.15.0


Currently, there are a lot of places in PyFlink Table API still using the 
legacy type system. We need to revisit this and migrate them to the new type 
system(DataType/LogicalType).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25230) Harden type serialization in JSON plan

2021-12-09 Thread Timo Walther (Jira)
Timo Walther created FLINK-25230:


 Summary: Harden type serialization in JSON plan
 Key: FLINK-25230
 URL: https://issues.apache.org/jira/browse/FLINK-25230
 Project: Flink
  Issue Type: Sub-task
Reporter: Timo Walther


1. Introduce two representations for LogicalType

Compact one (using asSerializableString):

{code}
// compact one
outputType: "ROW"

// full one for all kinds of logical types (time attributes, char(0), inline 
structured, etc.)
outputType: {
  "root" : "ROW",
  "nullable" : true,
  "fields" : [ {
"i" : "INT"
  }, {
"s" : "VARCHAR(2147483647)"
  }]
}
{code}

2. Drop support of legacy types and symbol classes which should not be part of 
the plan

3. Rework DataView support (shorten, remove concrete classes, support any 
external type in accumulators)

4. Implement a DataTypeJsonDeSerializer

5. Replace RelDataTypeJsonDeSerializer with LogicalTypeJsonDeSerializer



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25229) Introduce flink-table-api-bridge-common

2021-12-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25229:
---

 Summary: Introduce flink-table-api-bridge-common
 Key: FLINK-25229
 URL: https://issues.apache.org/jira/browse/FLINK-25229
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Planner
Reporter: Francesco Guardiani


This package should deduplicate code from api-java-bridge and api-scala-bridge, 
notably:

* the various operations provided by both {{ScalaDataStreamQueryOperation}} and 
{{JavaDataStreamQueryOperation}} (which are essentially the same code)
* some code in {{StreamTableEnvironmentImpl}} and {{StreamStatementSetImpl}}

The end goal is that the planner should remove the runtime (not test) dependency on 
flink-scala-api and flink-scala-api-bridge.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25228) Introduce flink-table-test-utils

2021-12-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25228:
---

 Summary: Introduce flink-table-test-utils
 Key: FLINK-25228
 URL: https://issues.apache.org/jira/browse/FLINK-25228
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Ecosystem, Table SQL / 
Planner
Reporter: Francesco Guardiani


Introduce a package to ship test utilities for formats, connectors and end 
users.

This package should provide:

* Assertions for data types, logical types and internal data structures.
* Test cases for formats and connectors

The end goal is to remove the test-jar planner dependency in formats and 
connectors and replace it with this package, so formats and connectors can then 
just depend on table-planner-loader.   



--
This message was sent by Atlassian Jira
(v8.20.1#820001)