[jira] [Created] (FLINK-26273) Test checkpoints restore modes & formats

2022-02-20 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-26273:


 Summary: Test checkpoints restore modes & formats
 Key: FLINK-26273
 URL: https://issues.apache.org/jira/browse/FLINK-26273
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: Dawid Wysakowicz
 Fix For: 1.15.0


We should manually test the changes introduced in [FLINK-25276] & [FLINK-25154].

Proposal:
Take a canonical savepoint, a native savepoint, and an externalised checkpoint 
(with RocksDB), perform claim and no-claim recoveries from each, and verify that:
1. in claim mode, after a couple of checkpoints the claimed files have been 
cleaned up
2. in no-claim mode, after a single successful checkpoint you can remove the 
start-up files and fail over the job without any errors
3. a native, incremental RocksDB savepoint can be moved to a different 
directory and restored from there

documentation:
1. 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#restore-mode
2. 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#savepoint-format
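
A minimal sketch of driving the new options from code for the manual test (the class and option names below are assumptions based on the linked docs; verify them against the 1.15 release):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.RestoreMode;
import org.apache.flink.runtime.jobgraph.SavepointConfigOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hedged sketch: restore a test job in CLAIM mode from an existing snapshot.
public class RestoreModeTestSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point at the canonical/native savepoint or externalised checkpoint
        // taken in the setup step.
        conf.set(SavepointConfigOptions.SAVEPOINT_PATH,
                "s3://bucket/savepoints/savepoint-ab12cd");
        // CLAIM: Flink takes ownership and may delete the files once they are
        // no longer needed; use NO_CLAIM for scenario 2.
        conf.set(SavepointConfigOptions.RESTORE_MODE, RestoreMode.CLAIM);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... build the test job, enable checkpointing, env.execute() ...
    }
}
{code}

If I read the linked docs correctly, the CLI equivalents are {{bin/flink run -s <snapshotPath> -restoreMode claim ...}} for the restore mode and {{bin/flink savepoint --type native <jobId>}} for the savepoint format.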





[jira] [Created] (FLINK-26272) Elasticsearch7SinkITCase.testWriteJsonToElasticsearch fails with socket timeout

2022-02-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-26272:
-

 Summary: Elasticsearch7SinkITCase.testWriteJsonToElasticsearch 
fails with socket timeout
 Key: FLINK-26272
 URL: https://issues.apache.org/jira/browse/FLINK-26272
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Affects Versions: 1.15.0
Reporter: Matthias Pohl


We observed a test failure in [this 
build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31883&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=12917]
 with {{Elasticsearch7SinkITCase.testWriteJsonToElasticsearch}} failing due to 
a {{SocketTimeoutException}}:

{code}
Feb 18 18:04:20 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 80.248 s <<< FAILURE! - in 
org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkITCase
Feb 18 18:04:20 [ERROR] 
org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkITCase.testWriteJsonToElasticsearch(BiFunction)[1]
  Time elapsed: 31.525 s  <<< ERROR!
Feb 18 18:04:20 org.apache.flink.runtime.client.JobExecutionException: Job 
execution failed.
Feb 18 18:04:20 at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
Feb 18 18:04:20 at 
org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20 at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:259)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20 at 
org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
Feb 18 18:04:20 at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
Feb 18 18:04:20 at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
Feb 18 18:04:20 at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20 at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20 at 
org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
Feb 18 18:04:20 at akka.dispatch.OnComplete.internal(Future.scala:300)
Feb 18 18:04:20 at akka.dispatch.OnComplete.internal(Future.scala:297)
Feb 18 18:04:20 at 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
Feb 18 18:04:20 at 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
Feb 18 18:04:20 at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
Feb 18 18:04:20 at 
org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
Feb 18 18:04:20 at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
[...]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1241)
Feb 18 18:04:20 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1126)
Feb 18 18:04:20 ... 13 more
Feb 18 18:04:20 Caused by: java.net.SocketTimeoutException: 30,000 milliseconds 
timeout on connection http-outgoing-3 [ACTIVE]
Feb 18 18:04:20 at 
org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
Feb 18 18:04:20 at 
org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
Feb 18 18:04:20 at 
org.apache.http.impl.nio.client.InternalIODispatch
[...]
{code}

[jira] [Created] (FLINK-26271) TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed

2022-02-20 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26271:
-

 Summary: TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed
 Key: FLINK-26271
 URL: https://issues.apache.org/jira/browse/FLINK-26271
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.13.6
Reporter: Till Rohrmann


The test {{TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed}} failed on 
AZP with

{code}
Feb 21 03:27:40 [ERROR] 
testCancelTaskExceptionAfterTaskMarkedFailed(org.apache.flink.runtime.taskmanager.TaskTest)
  Time elapsed: 0.043 s  <<< FAILURE!
Feb 21 03:27:40 java.lang.AssertionError: expected: but was:
Feb 21 03:27:40 at org.junit.Assert.fail(Assert.java:88)
Feb 21 03:27:40 at org.junit.Assert.failNotEquals(Assert.java:834)
Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:118)
Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:144)
Feb 21 03:27:40 at 
org.apache.flink.runtime.taskmanager.TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed(TaskTest.java:598)
Feb 21 03:27:40 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Feb 21 03:27:40 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 21 03:27:40 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 21 03:27:40 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 21 03:27:40 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
Feb 21 03:27:40 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 21 03:27:40 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Feb 21 03:27:40 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Feb 21 03:27:40 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
Feb 21 03:27:40 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
Feb 21 03:27:40 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
Feb 21 03:27:40 at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31903&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=c2734c79-73b6-521c-e85a-67c7ecae9107&l=5762





[jira] [Created] (FLINK-26270) Flink SQL writes data to Kafka in CSV format; decimal type is converted to scientific notation

2022-02-20 Thread fengjk (Jira)
fengjk created FLINK-26270:
--

 Summary: Flink SQL writes data to Kafka in CSV format; decimal 
type is converted to scientific notation
 Key: FLINK-26270
 URL: https://issues.apache.org/jira/browse/FLINK-26270
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.4
Reporter: fengjk
 Fix For: 1.12.4
 Attachments: image-2022-02-21-14-11-49-617.png, 
image-2022-02-21-14-12-17-845.png, image-2022-02-21-14-13-28-605.png

Source: Oracle

Field type: decimal

!image-2022-02-21-14-12-17-845.png|width=362,height=137!

!image-2022-02-21-14-11-49-617.png|width=756,height=268!

Sink: Kafka

Field type: decimal

Format: CSV

!image-2022-02-21-14-13-28-605.png|width=259,height=184!

There is no option to disable the conversion to scientific notation.
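
For illustration: this is consistent with how Java's BigDecimal renders values, where {{toString()}} switches to scientific notation for small adjusted exponents while {{toPlainString()}} never does (whether the CSV encoder goes through this exact path is an assumption):

{code:java}
import java.math.BigDecimal;

public class DecimalNotation {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("0.00000001");
        // toString() uses scientific notation once the adjusted exponent
        // drops below -6 (see the BigDecimal javadoc).
        System.out.println(d.toString());      // prints 1E-8
        // toPlainString() always renders the plain decimal expansion.
        System.out.println(d.toPlainString()); // prints 0.00000001
    }
}
{code}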

 





Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-20 Thread Yang Wang
I also lean towards persisting the FlinkDeployment and its status via K8s
resources. Unless necessary, we should not introduce other external
dependencies (e.g. MySQL); that would make the k8s operator more complicated.


Best,
Yang

Gyula Fóra  于2022年2月21日周一 02:48写道:

> Hi!
>
> Thank you for your interest in contributing to the operator.
>
> The operator persists information in the status of the FlinkDeployment
> resource. We should not need any additional persistence layer on top of
> this in the current design.
>
> Could you please give me a concrete example of what is not working with the
> current design?
>
> Thanks,
> Gyula
>
> On Sat, Feb 19, 2022 at 7:06 AM zhengyu chen  wrote:
>
> > Hi, regarding the construction of the k8s Flink Operator, I have already
> > completed some functions. I hope to contribute this part of the work and
> > discuss with the community how to improve it. How should I start?
> >
> > So far I have seen that the component has no operation persistence.
> > Should we persist its operations? For example, when I have a
> > SessionCluster deployment, I need to write its metadata as YAML to an
> > external storage system, such as MySQL. This design idea is similar to
> > etcd in k8s. If our k8s Flink Operator application is restarted, we can
> > recover metadata about deployed jobs, clusters, and so on from the
> > database.
> >
> > Best
> > ConradJam
> >
> > On 2022/01/25 05:08:01 Thomas Weise wrote:
> > > Hi,
> > >
> > > As promised in [1] we would like to start the discussion on the
> > > addition of a Kubernetes operator to the Flink project as FLIP-212:
> > >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > >
> > > Please note that the FLIP is currently focussed on the overall
> > > direction; the intention is to fill in more details once we converge
> > > on the high level plan.
> > >
> > > Thanks and looking forward to a lively discussion!
> > >
> > > Thomas
> > >
> > > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
> > >
> >
>


[jira] [Created] (FLINK-26269) Add clustering algorithm support for KMeans in ML Python API

2022-02-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-26269:


 Summary: Add clustering algorithm support for KMeans in ML Python 
API
 Key: FLINK-26269
 URL: https://issues.apache.org/jira/browse/FLINK-26269
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
 Fix For: ml-2.1.0








[jira] [Created] (FLINK-26268) Add classification algorithm support for LogisticRegression, KNN and NaiveBayes in ML Python API

2022-02-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-26268:


 Summary: Add classification algorithm support for 
LogisticRegression, KNN and NaiveBayes in ML Python API
 Key: FLINK-26268
 URL: https://issues.apache.org/jira/browse/FLINK-26268
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
 Fix For: ml-2.1.0








[jira] [Created] (FLINK-26267) Add common params interface in ML Python API

2022-02-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-26267:


 Summary: Add common params interface in ML Python API
 Key: FLINK-26267
 URL: https://issues.apache.org/jira/browse/FLINK-26267
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
 Fix For: ml-2.1.0


We will add the commonly used params interfaces: HasDistanceMeasure, 
HasFeaturesCol, HasGlobalBatchSize, HasHandleInvalid, HasInputCols, 
HasLabelCol, HasLearningRate, HasMaxIter, HasMultiClass, HasOutputCols, 
HasPredictionCol, HasRawPredictionCol, HasReg, HasSeed, HasTol and HasWeightCol.
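
For context, a hypothetical sketch of the shared-parameter mixin pattern these interfaces follow (names and signatures are illustrative assumptions, not the actual Flink ML API):

{code:java}
// Illustrative mixin pattern: each HasXxx interface contributes one shared
// parameter with typed getter/setter defaults. Not the real Flink ML API.
interface WithParams<T> {
    <V> T set(String name, V value);

    <V> V get(String name);
}

interface HasMaxIter<T> extends WithParams<T> {
    String MAX_ITER = "maxIter";

    default T setMaxIter(int value) {
        return set(MAX_ITER, value);
    }

    default int getMaxIter() {
        return get(MAX_ITER);
    }
}
{code}

An algorithm then declares something like {{class LogisticRegression implements HasMaxIter<...>, HasLabelCol<...>}} and inherits all the accessors.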





[jira] [Created] (FLINK-26266) Support Vector and Matrix

2022-02-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-26266:


 Summary: Support Vector and Matrix
 Key: FLINK-26266
 URL: https://issues.apache.org/jira/browse/FLINK-26266
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
 Fix For: ml-2.1.0


We will add the classes DenseVector, SparseVector, DenseMatrix and SparseMatrix.
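
As a rough illustration of the dense vs. sparse representations (a generic sketch, not the planned API):

{code:java}
// Generic sketch: dense storage keeps every entry, sparse storage keeps
// only the non-zero entries plus their positions.
class DenseVector {
    final double[] values; // one slot per dimension

    DenseVector(double[] values) {
        this.values = values;
    }
}

class SparseVector {
    final int size;        // logical dimensionality
    final int[] indices;   // sorted positions of the non-zero entries
    final double[] values; // the non-zero entries themselves

    SparseVector(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }
}
{code}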





[jira] [Created] (FLINK-26265) Support Java Algorithm in Python ML API

2022-02-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-26265:


 Summary: Support Java Algorithm in Python ML API
 Key: FLINK-26265
 URL: https://issues.apache.org/jira/browse/FLINK-26265
 Project: Flink
  Issue Type: New Feature
  Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: ml-2.1.0


In Flink ML 2.0, we provide a new version of the ML Pipeline API and implement 
some traditional machine learning algorithms on top of the Java API. Although 
Flink ML 2.0 also provides the Python Pipeline API, we do not intend to 
re-implement these algorithms in Python. Instead, we need to provide a 
mechanism that exposes the Java implementations of these algorithms to Python 
users.
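
PyFlink already bridges Python and the JVM via Py4J, so such a mechanism would presumably instantiate the Java stage and forward parameter calls to it. A hedged sketch of that idea (all names here are hypothetical, not the planned design):

{code:java}
import java.lang.reflect.Method;

// Hypothetical helper illustrating how a Python wrapper could drive a
// Java-implemented algorithm; not the planned Flink ML design.
public class PythonStageBridge {
    // Instantiate a Java stage by class name,
    // e.g. "org.apache.flink.ml.clustering.kmeans.KMeans".
    public static Object createStage(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Forward a Python-side parameter call such as set_k(2) to setK(2).
    public static Object invokeSetter(Object stage, String setter, Object value)
            throws Exception {
        for (Method m : stage.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                return m.invoke(stage, value);
            }
        }
        throw new NoSuchMethodException(setter);
    }
}
{code}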





[jira] [Created] (FLINK-26264) Python test failed in its docs building phase

2022-02-20 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-26264:
--

 Summary: Python test failed in its docs building phase
 Key: FLINK-26264
 URL: https://issues.apache.org/jira/browse/FLINK-26264
 Project: Flink
  Issue Type: Bug
Reporter: Leonard Xu


 
{code:java}
Feb 18 15:28:17 installing tox...
Feb 18 15:28:19 install tox... [SUCCESS]
Feb 18 15:28:19 installing flake8...
Feb 18 15:28:21 install flake8... [SUCCESS]
Feb 18 15:28:21 installing sphinx...
Feb 18 15:28:28 install sphinx... [SUCCESS]
Feb 18 15:28:28 installing mypy...
Feb 18 15:28:34 install mypy... [SUCCESS]
Feb 18 15:28:34 ===install environment... [SUCCESS]===
Feb 18 15:28:34 ===checks starting
Feb 18 15:28:34 flake8 checks=
Feb 18 15:28:36 ==flake8 checks... [SUCCESS]==
Feb 18 15:28:36 =mypy checks==
Feb 18 15:28:40 Success: no issues found in 65 source files
Feb 18 15:28:41 ===mypy checks... [SUCCESS]===
Feb 18 15:28:41 rm -rf _build/*
Feb 18 15:28:41 /__w/1/s/flink-python/dev/.conda/bin/sphinx-build -b html -d 
_build/doctrees  -a -W . _build/html
Feb 18 15:28:41 Running Sphinx v2.4.4
Feb 18 15:28:41 
Feb 18 15:28:41 Warning, treated as error:
Feb 18 15:28:41 node class 'meta' is already registered, its visitors will be 
overridden
Feb 18 15:28:41 Makefile:76: recipe for target 'html' failed
Feb 18 15:28:41 make: *** [html] Error 2
Feb 18 15:28:41 ==sphinx checks... [FAILED]===
Feb 18 15:28:41 Process exited with EXIT CODE: 1.
Feb 18 15:28:41 Trying to KILL watchdog (2871).
/__w/1/s/tools/ci/watchdog.sh: line 100:  2871 Terminated  watchdog
Feb 18 15:28:41 Searching for .dump, .dumpstream and related files in '/__w/1/s'
The STDIO streams did not close within 10 seconds of the exit event from 
process '/bin/bash'. This may indicate a child process inherited the STDIO 
streams and has not yet exited.
##[error]Bash exited with code '1'.
Finishing: Test - python {code}
 

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31869&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=c67e71ed-6451-5d26-8920-5a8cf9651901





[jira] [Created] (FLINK-26263) Check data size in LogisticRegression

2022-02-20 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-26263:


 Summary: Check data size in LogisticRegression
 Key: FLINK-26263
 URL: https://issues.apache.org/jira/browse/FLINK-26263
 Project: Flink
  Issue Type: Bug
  Components: Library / Machine Learning
Affects Versions: ml-2.0.0
Reporter: Yunfeng Zhou


In Flink ML LogisticRegression, the algorithm fails if the parallelism is 
larger than the input data size. For example, if we add the following code to 
{{LogisticRegressionTest.testFitAndPredict()}}:

{code:java}
env.setParallelism(12);
{code}

Then the test case fails with the following exception:

{code}

Caused by: java.lang.IllegalArgumentException: bound must be positive
    at java.base/java.util.Random.nextInt(Random.java:388)
    at 
org.apache.flink.ml.classification.logisticregression.LogisticRegression$CacheDataAndDoTrain.getMiniBatchData(LogisticRegression.java:351)
    at 
org.apache.flink.ml.classification.logisticregression.LogisticRegression$CacheDataAndDoTrain.onEpochWatermarkIncremented(LogisticRegression.java:381)
    at 
org.apache.flink.iteration.operator.AbstractWrapperOperator.notifyEpochWatermarkIncrement(AbstractWrapperOperator.java:129)
    at 
org.apache.flink.iteration.operator.allround.AbstractAllRoundWrapperOperator.lambda$1(AbstractAllRoundWrapperOperator.java:105)
    at 
org.apache.flink.iteration.operator.OperatorUtils.processOperatorOrUdfIfSatisfy(OperatorUtils.java:79)
    at 
org.apache.flink.iteration.operator.allround.AbstractAllRoundWrapperOperator.onEpochWatermarkIncrement(AbstractAllRoundWrapperOperator.java:102)
    at 
org.apache.flink.iteration.progresstrack.OperatorEpochWatermarkTracker.tryUpdateLowerBound(OperatorEpochWatermarkTracker.java:79)
    at 
org.apache.flink.iteration.progresstrack.OperatorEpochWatermarkTracker.onEpochWatermark(OperatorEpochWatermarkTracker.java:63)
    at 
org.apache.flink.iteration.operator.AbstractWrapperOperator.onEpochWatermarkEvent(AbstractWrapperOperator.java:121)
    at 
org.apache.flink.iteration.operator.allround.TwoInputAllRoundWrapperOperator.processElement(TwoInputAllRoundWrapperOperator.java:77)
    at 
org.apache.flink.iteration.operator.allround.TwoInputAllRoundWrapperOperator.processElement2(TwoInputAllRoundWrapperOperator.java:59)
    at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord2(StreamTwoInputProcessorFactory.java:225)
    at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$1(StreamTwoInputProcessorFactory.java:194)
    at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:266)
    at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at 
org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:86)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
    at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
    at java.base/java.lang.Thread.run(Thread.java:834)

{code}

The cause of this exception is that LogisticRegression does not handle the 
case in which a subtask receives zero input records, which happens when the 
parallelism exceeds the input data size. This can be resolved by adding an 
additional check.
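
A minimal sketch of such a guard (illustrative only; {{getMiniBatchData}} is the method named in the stack trace, but this is not the actual patch):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class MiniBatchSampler {
    // Random.nextInt(bound) throws IllegalArgumentException when bound <= 0,
    // so skip sampling when this subtask has cached no records at all.
    static <T> List<T> getMiniBatchData(List<T> cachedData, int batchSize, Random random) {
        if (cachedData.isEmpty()) {
            // Parallelism exceeds the input data size: nothing was cached here.
            return Collections.emptyList();
        }
        List<T> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            batch.add(cachedData.get(random.nextInt(cachedData.size())));
        }
        return batch;
    }
}
{code}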





[jira] [Created] (FLINK-26262) Rest API to submit PyFlink job

2022-02-20 Thread Thanh Thao Huynh (Jira)
Thanh Thao Huynh created FLINK-26262:


 Summary: Rest API to submit PyFlink job
 Key: FLINK-26262
 URL: https://issues.apache.org/jira/browse/FLINK-26262
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: Thanh Thao Huynh


According to this document 
[https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/],
there are REST APIs for working with Flink. I would like to request REST APIs 
for working with PyFlink jobs, such as submitting a Python file, ...





[jira] [Created] (FLINK-26261) Reconciliation should try to start job when not already started or move to permanent error

2022-02-20 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-26261:


 Summary: Reconciliation should try to start job when not already 
started or move to permanent error
 Key: FLINK-26261
 URL: https://issues.apache.org/jira/browse/FLINK-26261
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Thomas Weise


When job submission fails, the operator currently keeps trying to find the job 
status. In the case I'm looking at, the cluster wasn't created because the 
image could not be resolved. We either need logic to re-attempt the job 
submission or to flag the submission as failed so that JobStatusObserver does 
not attempt to check again. We should also capture the submission error as an 
event on the CR.
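
A rough sketch of the intended decision flow (illustrative only; every name below is hypothetical, not the operator's actual code):

{code:java}
// Hypothetical reconcile logic: retry submission a bounded number of times,
// then flag a permanent error so the status observer stops polling.
class SubmissionReconcilerSketch {
    enum SubmissionState { NOT_STARTED, SUBMITTED, PERMANENT_ERROR }

    private SubmissionState state = SubmissionState.NOT_STARTED;
    private int attempts = 0;
    private static final int MAX_ATTEMPTS = 3; // made-up retry budget

    void reconcile() {
        if (state == SubmissionState.PERMANENT_ERROR) {
            return; // JobStatusObserver must not poll a job that never started
        }
        if (state == SubmissionState.NOT_STARTED) {
            try {
                submitJob(); // e.g. fails when the image cannot be resolved
                state = SubmissionState.SUBMITTED;
            } catch (Exception e) {
                attempts++;
                recordEventOnCustomResource(e); // surface the error on the CR
                if (attempts >= MAX_ATTEMPTS) {
                    state = SubmissionState.PERMANENT_ERROR;
                }
            }
        }
    }

    private void submitJob() throws Exception { /* deploy cluster + submit job */ }

    private void recordEventOnCustomResource(Exception e) { /* emit a k8s Event */ }
}
{code}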





[jira] [Created] (FLINK-26260) Support watching specific namespace for FlinkDeployments

2022-02-20 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-26260:
--

 Summary: Support watching specific namespace for FlinkDeployments
 Key: FLINK-26260
 URL: https://issues.apache.org/jira/browse/FLINK-26260
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Gyula Fora


Currently the operator watches all namespaces for FlinkDeployments. We should 
support configuring it to watch a specific namespace instead.

cc [~mbalassi] 





Re: [DISCUSS] Plan to externalize connectors and versioning

2022-02-20 Thread Chesnay Schepler
> If we don't make a release, I think it would appear as partially
> externalized (since the binaries are still only created with Flink core,
> not from the external repository).


I'm wondering what you are referring to when you say "it appear[s]". Users 
don't know about it, and contributors can easily be informed about the 
current state. Whose opinion are you worried about?


> doing a release [means] that we have then completely externalized the
> connector


In my mind the connector is completely externalized once the connector 
project is complete and can act independently from the core repo. That 
includes having all the code, working CI and the documentation being 
integrated into the Flink website. And /then/ we can do a release. I 
don't see how this could work any other way; how could we possibly argue 
that the connector is externalized when development on the connector 
isn't even possible in that repository?


> There are also other connectors (like Opensearch and I believe RabbitMQ)
> that will end up straight in their own repositories



Which is a bit of a different situation because here the only source of 
this connector will have been that repository.


> Would you prefer to remove a connector in a Flink patch release?


No. I think I misread your statement; when you said that there "is 1 
release cycle where the connector both exists in Flink core and the 
external repo", you are referring to 1.15, correct? (although this 
should also apply to 1.14 so isn't it 2 releases...?)
How I understood it was that we'd keep the connector around until 1.16, 
which would obviously be terrible.


On 19/02/2022 13:30, Martijn Visser wrote:

Hi Chesnay,

I think the advantage of also doing a release is that we have then
completely externalized the connector. If we don't make a release, I think
it would appear as partially externalized (since the binaries are still
only created with Flink core, not from the external repository). It would
also mean that in our documentation we would still point to the binary
created with the Flink core release.

There are also other connectors (like Opensearch and I believe RabbitMQ)
that will end up straight in their own repositories. Since we also would
like to document those, I don't think the situation will be messy. We can
also solve it with an information hint in the documentation.

With regards to point 6, do you have an alternative? Would you prefer to
remove a connector in a Flink patch release?

Best regards,

Martijn

On Fri, 18 Feb 2022 at 16:17, Chesnay Schepler  wrote:


Why do you want to immediately do a release for the elasticsearch
connector? What does that provide us?

I'd rather first have a fully working setup and integrated documentation
before thinking about releasing anything.
Once we have that we may be able to externalize all connectors within 1
release cycle and do a clean switch; otherwise we end up with a bit of a
messy situation for users where some connectors use version scheme A and
others B.

I also doubt the value of 6). They'll have to update the version anyway
(and discover at some point that the version scheme has changed), so I
don't see what this makes easier.

On 18/02/2022 14:54, Martijn Visser wrote:

Hi everyone,

As a follow-up to earlier discussions [1] [2] to externalize the connectors
from the Flink repository, I would like to propose a plan to externalize
these connectors. The goal of this plan is to start with moving connectors
to their own repositories without introducing regressions for connector
developers.

The plan is as follows:

1. A new repository is requested for a connector.
2. The code for that connector is moved to its individual repository,
including the commit history.
3. Any first release made for a connector in an external connector
repository starts with major version 3, so 3.0.0. The reason for that is
that we want to decouple the Flink releases from a connector release. If we
would start with major version 2, it could cause some confusion because
people could think a Flink 2.0 has been released. This does mean that each
connector needs to have a compatibility matrix generated, stating which
version of the connector is compatible with which Flink version.
4. The group id and artifact id for the connector will remain the same;
only the version is different.
5. The connector dependencies on the Flink website are updated to point to
the newly released connector artifact.
6. If a connector is moved, there is one release cycle where there will be
binary releases for that connector both in Flink core and from the
connector repository. This is to make upgrading slightly easier for Flink
users. We will have to make a note in the release notes that a connector
has been moved and that a user should update any references from the
original connector artifact (from the Flink release) to the new connector
artifact (from the external connector repository).
We propose to first try to execute this plan for the Elasticsearch connector.

Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-20 Thread Gyula Fóra
Hi!

Thank you for your interest in contributing to the operator.

The operator persists information in the status of the FlinkDeployment
resource. We should not need any additional persistence layer on top of
this in the current design.

Could you please give me a concrete example of what is not working with the
current design?

Thanks,
Gyula

On Sat, Feb 19, 2022 at 7:06 AM zhengyu chen  wrote:

> Hi, regarding the construction of the k8s Flink Operator, I have already
> completed some functions. I hope to contribute this part of the work and
> discuss with the community how to improve it. How should I start?
>
> So far I have seen that the component has no operation persistence.
> Should we persist its operations? For example, when I have a
> SessionCluster deployment, I need to write its metadata as YAML to an
> external storage system, such as MySQL. This design idea is similar to
> etcd in k8s. If our k8s Flink Operator application is restarted, we can
> recover metadata about deployed jobs, clusters, and so on from the
> database.
>
> Best
> ConradJam
>
> On 2022/01/25 05:08:01 Thomas Weise wrote:
> > Hi,
> >
> > As promised in [1] we would like to start the discussion on the
> > addition of a Kubernetes operator to the Flink project as FLIP-212:
> >
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> >
> > Please note that the FLIP is currently focussed on the overall
> > direction; the intention is to fill in more details once we converge
> > on the high level plan.
> >
> > Thanks and looking forward to a lively discussion!
> >
> > Thomas
> >
> > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
> >
>