[jira] [Created] (FLINK-26273) Test checkpoints restore modes & formats
Dawid Wysakowicz created FLINK-26273:

Summary: Test checkpoints restore modes & formats
Key: FLINK-26273
URL: https://issues.apache.org/jira/browse/FLINK-26273
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: Dawid Wysakowicz
Fix For: 1.15.0

We should manually test the changes introduced in [FLINK-25276] & [FLINK-25154].

Proposal: Take a canonical savepoint / native savepoint / externalized checkpoint (with RocksDB), perform claim (1) / no-claim (2) recoveries, and verify that:
1. after a couple of checkpoints, the claimed files have been cleaned up;
2. after a single successful checkpoint, you can remove the start-up files and fail over the job without any errors;
3. additionally, take a native, incremental RocksDB savepoint, move it to a different directory, and restore from it (a configuration sketch for the restore step follows below).

Documentation:
1. https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#restore-mode
2. https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#savepoint-format
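For the recoveries above, the restore mode can also be supplied through configuration when resubmitting the job. Below is a minimal sketch, assuming the configuration keys from the restore-mode documentation linked above ({{execution.savepoint.path}}, {{execution.savepoint-restore-mode}}); the savepoint path is a placeholder, and whether a locally created environment honors these keys exactly like the CLI does is an assumption.

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestoreModeSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder path of the savepoint / externalized checkpoint taken in the step above.
        conf.setString("execution.savepoint.path", "file:///tmp/savepoints/savepoint-0000-000000000000");
        // Restore mode under test: "claim", "no_claim" (default) or "legacy".
        conf.setString("execution.savepoint-restore-mode", "claim");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // Trivial pipeline; a real test would run the stateful job that produced the snapshot.
        env.fromElements(1, 2, 3).print();
        env.execute("restore-mode-smoke-test");
    }
}
{code}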
[jira] [Created] (FLINK-26272) Elasticsearch7SinkITCase.testWriteJsonToElasticsearch fails with socket timeout
Matthias Pohl created FLINK-26272:

Summary: Elasticsearch7SinkITCase.testWriteJsonToElasticsearch fails with socket timeout
Key: FLINK-26272
URL: https://issues.apache.org/jira/browse/FLINK-26272
Project: Flink
Issue Type: Bug
Components: Connectors / ElasticSearch
Affects Versions: 1.15.0
Reporter: Matthias Pohl

We observed a test failure in [this build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31883&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=12917] with {{Elasticsearch7SinkITCase.testWriteJsonToElasticsearch}} failing due to a {{SocketTimeoutException}}:

{code}
Feb 18 18:04:20 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 80.248 s <<< FAILURE! - in org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkITCase
Feb 18 18:04:20 [ERROR] org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkITCase.testWriteJsonToElasticsearch(BiFunction)[1] Time elapsed: 31.525 s <<< ERROR!
Feb 18 18:04:20 org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
Feb 18 18:04:20     at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
Feb 18 18:04:20     at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20     at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:259)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20     at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
Feb 18 18:04:20     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
Feb 18 18:04:20     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
Feb 18 18:04:20     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
Feb 18 18:04:20     at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
Feb 18 18:04:20     at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
Feb 18 18:04:20     at akka.dispatch.OnComplete.internal(Future.scala:300)
Feb 18 18:04:20     at akka.dispatch.OnComplete.internal(Future.scala:297)
Feb 18 18:04:20     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
Feb 18 18:04:20     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
Feb 18 18:04:20     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
Feb 18 18:04:20     at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
Feb 18 18:04:20     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
[...]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1241)
Feb 18 18:04:20     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1126)
Feb 18 18:04:20     ... 13 more
Feb 18 18:04:20 Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-3 [ACTIVE]
Feb 18 18:04:20     at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
Feb 18 18:04:20     at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
Feb 18 18:04:20     at org.apache.http.impl.nio.client.InternalIODispatch
{code}
[jira] [Created] (FLINK-26271) TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed
Till Rohrmann created FLINK-26271:

Summary: TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed
Key: FLINK-26271
URL: https://issues.apache.org/jira/browse/FLINK-26271
Project: Flink
Issue Type: Improvement
Components: Runtime / Task
Affects Versions: 1.13.6
Reporter: Till Rohrmann

The test {{TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed}} failed on AZP with

{code}
Feb 21 03:27:40 [ERROR] testCancelTaskExceptionAfterTaskMarkedFailed(org.apache.flink.runtime.taskmanager.TaskTest) Time elapsed: 0.043 s <<< FAILURE!
Feb 21 03:27:40 java.lang.AssertionError: expected: but was:
Feb 21 03:27:40     at org.junit.Assert.fail(Assert.java:88)
Feb 21 03:27:40     at org.junit.Assert.failNotEquals(Assert.java:834)
Feb 21 03:27:40     at org.junit.Assert.assertEquals(Assert.java:118)
Feb 21 03:27:40     at org.junit.Assert.assertEquals(Assert.java:144)
Feb 21 03:27:40     at org.apache.flink.runtime.taskmanager.TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed(TaskTest.java:598)
Feb 21 03:27:40     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 21 03:27:40     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 21 03:27:40     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 21 03:27:40     at java.lang.reflect.Method.invoke(Method.java:498)
Feb 21 03:27:40     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
Feb 21 03:27:40     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 21 03:27:40     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
Feb 21 03:27:40     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 21 03:27:40     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Feb 21 03:27:40     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Feb 21 03:27:40     at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Feb 21 03:27:40     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
Feb 21 03:27:40     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
Feb 21 03:27:40     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
Feb 21 03:27:40     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
Feb 21 03:27:40     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
Feb 21 03:27:40     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
Feb 21 03:27:40     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
Feb 21 03:27:40     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
Feb 21 03:27:40     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
Feb 21 03:27:40     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
Feb 21 03:27:40     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
Feb 21 03:27:40     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Feb 21 03:27:40     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Feb 21 03:27:40     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Feb 21 03:27:40     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Feb 21 03:27:40     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Feb 21 03:27:40     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Feb 21 03:27:40     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Feb 21 03:27:40     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31903&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=c2734c79-73b6-521c-e85a-67c7ecae9107&l=5762
[jira] [Created] (FLINK-26270) Flink SQL writes data to Kafka in CSV format, decimal type is converted to scientific notation
fengjk created FLINK-26270:

Summary: Flink SQL writes data to Kafka in CSV format, decimal type is converted to scientific notation
Key: FLINK-26270
URL: https://issues.apache.org/jira/browse/FLINK-26270
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.4
Reporter: fengjk
Fix For: 1.12.4
Attachments: image-2022-02-21-14-11-49-617.png, image-2022-02-21-14-12-17-845.png, image-2022-02-21-14-13-28-605.png

Source: Oracle, field type: decimal
!image-2022-02-21-14-12-17-845.png|width=362,height=137!
!image-2022-02-21-14-11-49-617.png|width=756,height=268!

Sink: Kafka, field type: decimal, format: CSV
!image-2022-02-21-14-13-28-605.png|width=259,height=184!

There appears to be no option to disable the conversion to scientific notation (an illustration of the underlying behavior follows below).
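The report does not name the root cause, but the symptom matches how {{java.math.BigDecimal}} renders very small values: {{toString()}} switches to scientific notation while {{toPlainString()}} keeps the plain form. The snippet below only illustrates that behavior; whether the CSV encoder relies on {{toString()}} here is an assumption.

{code:java}
import java.math.BigDecimal;

public class DecimalNotationDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.000000123");

        // BigDecimal#toString() uses scientific notation for small adjusted exponents: prints 1.23E-7
        System.out.println("toString():      " + value.toString());

        // BigDecimal#toPlainString() keeps the plain decimal form: prints 0.000000123
        System.out.println("toPlainString(): " + value.toPlainString());
    }
}
{code}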
Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
I also lean toward persisting the FlinkDeployment and its status via K8s resources. Unless necessary, we should not introduce other external dependencies (e.g. MySQL); it would make the k8s operator more complicated.

Best,
Yang

On Mon, Feb 21, 2022 at 02:48, Gyula Fóra wrote:

> Hi!
>
> Thank you for your interest in contributing to the operator.
>
> The operator persists information in the status of the FlinkDeployment
> resource. We should not need any additional persistence layer on top of
> this in the current design.
>
> Could you please give me a concrete example of what is not working with the
> current design?
>
> Thanks,
> Gyula
>
> On Sat, Feb 19, 2022 at 7:06 AM zhengyu chen wrote:
>
> > Hi, regarding the construction of k8s Flink Operator, I have already
> > completed some functions. I hope to contribute this part of the functions
> > and discuss with the community how to improve it. How should I start?
> >
> > So far I have seen that the component has no operation persistence. Should
> > we persist its operation? for example, when I have a SessionCluster
> > deployment, I need to write its metadata to an external storage system in
> > yaml mode, such as use mysql for storage. This design idea is similar to etcd in
> > k8s. If our k8s Flink Operator application is restarted, We can recover
> > metadata information about deployment jobs, clusters, and so on based on
> > the database
> >
> > Best
> > ConradJam
> >
> > On 2022/01/25 05:08:01 Thomas Weise wrote:
> > > Hi,
> > >
> > > As promised in [1] we would like to start the discussion on the
> > > addition of a Kubernetes operator to the Flink project as FLIP-212:
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > >
> > > Please note that the FLIP is currently focussed on the overall
> > > direction; the intention is to fill in more details once we converge
> > > on the high level plan.
> > >
> > > Thanks and looking forward to a lively discussion!
> > >
> > > Thomas
> > >
> > > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
> > >
[jira] [Created] (FLINK-26269) Add clustering algorithm support for KMeans in ML Python API
Huang Xingbo created FLINK-26269:

Summary: Add clustering algorithm support for KMeans in ML Python API
Key: FLINK-26269
URL: https://issues.apache.org/jira/browse/FLINK-26269
Project: Flink
Issue Type: Sub-task
Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Fix For: ml-2.1.0
[jira] [Created] (FLINK-26268) Add classification algorithm support for LogisticRegression, KNN and NaiveBayes in ML Python API
Huang Xingbo created FLINK-26268:

Summary: Add classification algorithm support for LogisticRegression, KNN and NaiveBayes in ML Python API
Key: FLINK-26268
URL: https://issues.apache.org/jira/browse/FLINK-26268
Project: Flink
Issue Type: Sub-task
Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Fix For: ml-2.1.0
[jira] [Created] (FLINK-26267) Add common params interface in ML Python API
Huang Xingbo created FLINK-26267:

Summary: Add common params interface in ML Python API
Key: FLINK-26267
URL: https://issues.apache.org/jira/browse/FLINK-26267
Project: Flink
Issue Type: Sub-task
Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Fix For: ml-2.1.0

We will add the commonly used parameter interfaces HasDistanceMeasure, HasFeaturesCol, HasGlobalBatchSize, HasHandleInvalid, HasInputCols, HasLabelCol, HasLearningRate, HasMaxIter, HasMultiClass, HasOutputCols, HasPredictionCol, HasRawPredictionCol, HasReg, HasSeed, HasTol and HasWeightCol (a small illustration of this mixin pattern follows below).
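For readers unfamiliar with these "Has*" interfaces, the sketch below illustrates the shared-parameter mixin pattern they follow. It is a hypothetical, self-contained illustration and deliberately does not use the actual Flink ML Param classes.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a stage that stores its parameters in a map. */
interface WithParamMap {
    Map<String, Object> paramMap();
}

/** A HasMaxIter-style mixin: any stage implementing it gets maxIter getters/setters for free. */
interface HasMaxIter<T extends WithParamMap> extends WithParamMap {
    default int getMaxIter() {
        return (int) paramMap().getOrDefault("maxIter", 20);
    }

    @SuppressWarnings("unchecked")
    default T setMaxIter(int value) {
        paramMap().put("maxIter", value);
        return (T) this;
    }
}

public class ParamMixinDemo implements HasMaxIter<ParamMixinDemo> {
    private final Map<String, Object> params = new HashMap<>();

    @Override
    public Map<String, Object> paramMap() {
        return params;
    }

    public static void main(String[] args) {
        ParamMixinDemo stage = new ParamMixinDemo().setMaxIter(100);
        System.out.println(stage.getMaxIter()); // prints 100
    }
}
{code}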
[jira] [Created] (FLINK-26266) Support Vector and Matrix
Huang Xingbo created FLINK-26266:

Summary: Support Vector and Matrix
Key: FLINK-26266
URL: https://issues.apache.org/jira/browse/FLINK-26266
Project: Flink
Issue Type: Sub-task
Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Fix For: ml-2.1.0

We will add the DenseVector, SparseVector, DenseMatrix and SparseMatrix classes.
[jira] [Created] (FLINK-26265) Support Java Algorithm in Python ML API
Huang Xingbo created FLINK-26265:

Summary: Support Java Algorithm in Python ML API
Key: FLINK-26265
URL: https://issues.apache.org/jira/browse/FLINK-26265
Project: Flink
Issue Type: New Feature
Components: API / Python, Library / Machine Learning
Affects Versions: ml-2.1.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
Fix For: ml-2.1.0

In Flink ML 2.0 we provided a new version of the ML Pipeline API and implemented some traditional machine learning algorithms on top of the Java API. Although we also provided the Python Pipeline API in Flink ML 2.0, we do not intend to re-implement these algorithms in the Python API. Instead, we need to provide a mechanism to expose the algorithms implemented in Java to Python users.
[jira] [Created] (FLINK-26264) Python Test failed on its docs building phase
Leonard Xu created FLINK-26264:

Summary: Python Test failed on its docs building phase
Key: FLINK-26264
URL: https://issues.apache.org/jira/browse/FLINK-26264
Project: Flink
Issue Type: Bug
Reporter: Leonard Xu

{code:java}
Feb 18 15:28:17 installing tox...
Feb 18 15:28:19 install tox... [SUCCESS]
Feb 18 15:28:19 installing flake8...
Feb 18 15:28:21 install flake8... [SUCCESS]
Feb 18 15:28:21 installing sphinx...
Feb 18 15:28:28 install sphinx... [SUCCESS]
Feb 18 15:28:28 installing mypy...
Feb 18 15:28:34 install mypy... [SUCCESS]
Feb 18 15:28:34 ===install environment... [SUCCESS]===
Feb 18 15:28:34 ===checks starting
Feb 18 15:28:34 flake8 checks=
Feb 18 15:28:36 ==flake8 checks... [SUCCESS]==
Feb 18 15:28:36 =mypy checks==
Feb 18 15:28:40 Success: no issues found in 65 source files
Feb 18 15:28:41 ===mypy checks... [SUCCESS]===
Feb 18 15:28:41 rm -rf _build/*
Feb 18 15:28:41 /__w/1/s/flink-python/dev/.conda/bin/sphinx-build -b html -d _build/doctrees -a -W . _build/html
Feb 18 15:28:41 Running Sphinx v2.4.4
Feb 18 15:28:41
Feb 18 15:28:41 Warning, treated as error:
Feb 18 15:28:41 node class 'meta' is already registered, its visitors will be overridden
Feb 18 15:28:41 Makefile:76: recipe for target 'html' failed
Feb 18 15:28:41 make: *** [html] Error 2
Feb 18 15:28:41 ==sphinx checks... [FAILED]===
Feb 18 15:28:41 Process exited with EXIT CODE: 1.
Feb 18 15:28:41 Trying to KILL watchdog (2871).
/__w/1/s/tools/ci/watchdog.sh: line 100: 2871 Terminated watchdog
Feb 18 15:28:41 Searching for .dump, .dumpstream and related files in '/__w/1/s'
The STDIO streams did not close within 10 seconds of the exit event from process '/bin/bash'. This may indicate a child process inherited the STDIO streams and has not yet exited.
##[error]Bash exited with code '1'.
Finishing: Test - python
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31869&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=c67e71ed-6451-5d26-8920-5a8cf9651901
[jira] [Created] (FLINK-26263) Check data size in LogisticRegression
Yunfeng Zhou created FLINK-26263:

Summary: Check data size in LogisticRegression
Key: FLINK-26263
URL: https://issues.apache.org/jira/browse/FLINK-26263
Project: Flink
Issue Type: Bug
Components: Library / Machine Learning
Affects Versions: ml-2.0.0
Reporter: Yunfeng Zhou

In Flink ML LogisticRegression, the algorithm fails if the parallelism is larger than the input data size. For example, if we add the following code to `LogisticRegressionTest.testFitAndPredict()`

```java
env.setParallelism(12);
```

then the test case fails with the following exception:

```
Caused by: java.lang.IllegalArgumentException: bound must be positive
	at java.base/java.util.Random.nextInt(Random.java:388)
	at org.apache.flink.ml.classification.logisticregression.LogisticRegression$CacheDataAndDoTrain.getMiniBatchData(LogisticRegression.java:351)
	at org.apache.flink.ml.classification.logisticregression.LogisticRegression$CacheDataAndDoTrain.onEpochWatermarkIncremented(LogisticRegression.java:381)
	at org.apache.flink.iteration.operator.AbstractWrapperOperator.notifyEpochWatermarkIncrement(AbstractWrapperOperator.java:129)
	at org.apache.flink.iteration.operator.allround.AbstractAllRoundWrapperOperator.lambda$1(AbstractAllRoundWrapperOperator.java:105)
	at org.apache.flink.iteration.operator.OperatorUtils.processOperatorOrUdfIfSatisfy(OperatorUtils.java:79)
	at org.apache.flink.iteration.operator.allround.AbstractAllRoundWrapperOperator.onEpochWatermarkIncrement(AbstractAllRoundWrapperOperator.java:102)
	at org.apache.flink.iteration.progresstrack.OperatorEpochWatermarkTracker.tryUpdateLowerBound(OperatorEpochWatermarkTracker.java:79)
	at org.apache.flink.iteration.progresstrack.OperatorEpochWatermarkTracker.onEpochWatermark(OperatorEpochWatermarkTracker.java:63)
	at org.apache.flink.iteration.operator.AbstractWrapperOperator.onEpochWatermarkEvent(AbstractWrapperOperator.java:121)
	at org.apache.flink.iteration.operator.allround.TwoInputAllRoundWrapperOperator.processElement(TwoInputAllRoundWrapperOperator.java:77)
	at org.apache.flink.iteration.operator.allround.TwoInputAllRoundWrapperOperator.processElement2(TwoInputAllRoundWrapperOperator.java:59)
	at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord2(StreamTwoInputProcessorFactory.java:225)
	at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$1(StreamTwoInputProcessorFactory.java:194)
	at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:266)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:86)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
	at java.base/java.lang.Thread.run(Thread.java:834)
```

The cause of this exception is that LogisticRegression does not handle the case where the data assigned to a subtask is empty, which happens when the parallelism is larger than the input data size. This can be resolved by adding an additional check, as sketched below.
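A minimal sketch of such a check, modeled on the {{getMiniBatchData}} step from the stack trace. The method name and signature here are hypothetical (the element type is left generic), so this illustrates the guard rather than the actual Flink ML code.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MiniBatchSampler {

    /** Hypothetical stand-in for LogisticRegression$CacheDataAndDoTrain#getMiniBatchData. */
    static <T> List<T> getMiniBatchData(List<T> cachedData, int batchSize, Random random) {
        // Guard: with parallelism > input size, some subtasks cache no records,
        // and Random#nextInt(0) would throw "bound must be positive".
        if (cachedData == null || cachedData.isEmpty()) {
            return Collections.emptyList();
        }
        List<T> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            batch.add(cachedData.get(random.nextInt(cachedData.size())));
        }
        return batch;
    }

    public static void main(String[] args) {
        // An empty cache no longer fails; it simply yields an empty mini-batch.
        System.out.println(getMiniBatchData(Collections.<Integer>emptyList(), 32, new Random(2022)));
        System.out.println(getMiniBatchData(List.of(1, 2, 3), 4, new Random(2022)));
    }
}
{code}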
[jira] [Created] (FLINK-26262) Rest API to submit PyFlink job
Thanh Thao Huynh created FLINK-26262:

Summary: Rest API to submit PyFlink job
Key: FLINK-26262
URL: https://issues.apache.org/jira/browse/FLINK-26262
Project: Flink
Issue Type: New Feature
Components: API / Python
Reporter: Thanh Thao Huynh

According to this document [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/], there are REST APIs to work with Flink. I would like to request REST APIs to work with PyFlink jobs as well, such as submitting a Python file.
[jira] [Created] (FLINK-26261) Reconciliation should try to start job when not already started or move to permanent error
Thomas Weise created FLINK-26261:

Summary: Reconciliation should try to start job when not already started or move to permanent error
Key: FLINK-26261
URL: https://issues.apache.org/jira/browse/FLINK-26261
Project: Flink
Issue Type: Sub-task
Components: Kubernetes Operator
Reporter: Thomas Weise

When job submission fails, the operator currently keeps trying to find the job status. In the case I'm looking at, the cluster wasn't created because the image could not be resolved. We either need the logic to re-attempt job submission or to flag the submission as failed so that JobStatusObserver does not attempt to check again. We should also capture the submission error as an event on the CR. A rough sketch of the intended decision follows below.
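A rough, self-contained sketch of the decision described above. All type and member names here (SubmissionState, reconcile, MAX_SUBMISSION_ATTEMPTS) are hypothetical and do not refer to actual classes in the operator; it only illustrates the retry-or-permanent-error logic.

{code:java}
import java.util.Optional;

public class ReconcileSketch {

    /** Hypothetical view of what the operator knows about a FlinkDeployment's job submission. */
    enum SubmissionState { NOT_SUBMITTED, SUBMITTED, FAILED_PERMANENTLY }

    static final int MAX_SUBMISSION_ATTEMPTS = 3;

    /** Decide what the reconcile loop should do instead of endlessly polling a job that never started. */
    static String reconcile(SubmissionState state, int attempts, Optional<String> lastError) {
        switch (state) {
            case NOT_SUBMITTED:
                if (attempts < MAX_SUBMISSION_ATTEMPTS) {
                    return "re-attempt job submission (attempt " + (attempts + 1) + ")";
                }
                // Record the error as an event on the CR and stop retrying, so the
                // job status observer no longer polls a cluster that never came up.
                return "mark FAILED_PERMANENTLY, emit event: " + lastError.orElse("unknown submission error");
            case FAILED_PERMANENTLY:
                return "skip observation; deployment is in permanent error";
            case SUBMITTED:
            default:
                return "observe job status as usual";
        }
    }

    public static void main(String[] args) {
        System.out.println(
                reconcile(SubmissionState.NOT_SUBMITTED, 3, Optional.of("image could not be resolved")));
    }
}
{code}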
[jira] [Created] (FLINK-26260) Support watching specific namespace for FlinkDeployments
Gyula Fora created FLINK-26260:

Summary: Support watching specific namespace for FlinkDeployments
Key: FLINK-26260
URL: https://issues.apache.org/jira/browse/FLINK-26260
Project: Flink
Issue Type: Sub-task
Components: Kubernetes Operator
Reporter: Gyula Fora

Currently the operator watches all namespaces for FlinkDeployments. We should support configuring it to watch a specific namespace instead.

cc [~mbalassi]
Re: [DISCUSS] Plan to externalize connectors and versioning
> If we don't make a release, I think it would appear as partially
> externalized (since the binaries are still only created with Flink core,
> not from the external repository).

I'm wondering who you are referring to when you say "it appear[s]". Users don't know about it, and contributors can be easily informed about the current state. Whose opinion are you worried about?

> doing a release [means] that we have then completely externalized the connector

In my mind the connector is completely externalized once the connector project is complete and can act independently from the core repo. That includes having all the code, working CI and the documentation being integrated into the Flink website. And /then/ we can do a release. I don't see how this could work any other way; how could we possibly argue that the connector is externalized when development on the connector isn't even possible in that repository?

> There are also other connectors (like Opensearch and I believe RabbitMQ)
> that will end up straight in their own repositories

Which is a bit of a different situation because here the only source of this connector will have been that repository.

> Would you prefer to remove a connector in a Flink patch release?

No. I think I misread your statement; when you said that there "is 1 release cycle where the connector both exists in Flink core and the external repo", you are referring to 1.15, correct? (although this should also apply to 1.14 so isn't it 2 releases...?) How I understood it was that we'd keep the connector around until 1.16, which would obviously be terrible.

On 19/02/2022 13:30, Martijn Visser wrote:

Hi Chesnay,

I think the advantage of also doing a release is that we have then completely externalized the connector. If we don't make a release, I think it would appear as partially externalized (since the binaries are still only created with Flink core, not from the external repository). It would also mean that in our documentation we would still point to the binary created with the Flink core release.

There are also other connectors (like Opensearch and I believe RabbitMQ) that will end up straight in their own repositories. Since we also would like to document those, I don't think the situation will be messy. We can also solve it with an information hint in the documentation.

With regards to point 6, do you have an alternative? Would you prefer to remove a connector in a Flink patch release?

Best regards,

Martijn

On Fri, 18 Feb 2022 at 16:17, Chesnay Schepler wrote:

Why do you want to immediately do a release for the elasticsearch connector? What does that provide us?

I'd rather first have a fully working setup and integrated documentation before thinking about releasing anything. Once we have that we may be able to externalize all connectors within 1 release cycle and do a clean switch; otherwise we end up with a bit of a messy situation for users where some connectors use version scheme A and others B.

I also doubt the value of 6). They'll have to update the version anyway (and discover at some point that the version scheme has changed), so I don't see what this makes easier.

On 18/02/2022 14:54, Martijn Visser wrote:

Hi everyone,

As a follow-up to earlier discussions [1] [2] to externalize the connectors from the Flink repository, I would like to propose a plan to externalize these connectors. The goal of this plan is to start with moving connectors to their own repositories without introducing regressions for connector developers. The plan is as follows:

1. A new repository is requested for a connector.
2. The code for that connector is moved to its individual repository, including the commit history.
3. Any first release made for a connector in an external connector repository starts with major version 3, so 3.0.0. The reason for that is that we want to decouple the Flink releases from a connector release. If we would start with major version 2, it could cause some confusion because people could think a Flink 2.0 has been released. This does mean that each connector needs to have a compatibility matrix generated, stating which version number of the connector is compatible with which Flink version.
4. The group id and artifact id for the connector will remain the same, only the version is different.
5. The connector dependencies on the Flink website are updated to point to the newly released connector artifact.
6. If a connector is moved, there is one release cycle where there will be binary releases for that connector both in Flink core and from the connector repository. This is to make upgrading slightly easier for Flink users. We will have to make a note in the release notes that a connector has been moved and that a user should update any references from the original connector artifact (from the Flink connector) to the new connector artifact (from the external connector version).

We propose to first try to execute this plan for th
Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
Hi!

Thank you for your interest in contributing to the operator.

The operator persists information in the status of the FlinkDeployment resource. We should not need any additional persistence layer on top of this in the current design.

Could you please give me a concrete example of what is not working with the current design?

Thanks,
Gyula

On Sat, Feb 19, 2022 at 7:06 AM zhengyu chen wrote:

> Hi, regarding the construction of k8s Flink Operator, I have already
> completed some functions. I hope to contribute this part of the functions
> and discuss with the community how to improve it. How should I start?
>
> So far I have seen that the component has no operation persistence. Should
> we persist its operation? for example, when I have a SessionCluster
> deployment, I need to write its metadata to an external storage system in
> yaml mode, such as use mysql for storage. This design idea is similar to etcd in
> k8s. If our k8s Flink Operator application is restarted, We can recover
> metadata information about deployment jobs, clusters, and so on based on
> the database
>
> Best
> ConradJam
>
> On 2022/01/25 05:08:01 Thomas Weise wrote:
> > Hi,
> >
> > As promised in [1] we would like to start the discussion on the
> > addition of a Kubernetes operator to the Flink project as FLIP-212:
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> >
> > Please note that the FLIP is currently focussed on the overall
> > direction; the intention is to fill in more details once we converge
> > on the high level plan.
> >
> > Thanks and looking forward to a lively discussion!
> >
> > Thomas
> >
> > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
> >