[GitHub] [flink] kgrimsby commented on pull request #20685: [FLINK-28429][python] Optimize PyFlink tests
kgrimsby commented on PR #20685: URL: https://github.com/apache/flink/pull/20685#issuecomment-1257563464 Hi, doesn't this commit introduce a bug? I've tried to set up from the flink release-1.16 branch and found that the change in `pyflink/fn_execution/flink_fn_execution_pb2.py` at `DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'...')` will never yield a descriptor, since `AddSerializedFile` in protobuf <3.18 has no return statement:

```python
def AddSerializedFile(self, serialized_file_desc_proto):
    """Adds the FileDescriptorProto and its types to this pool.

    Args:
      serialized_file_desc_proto (bytes): A bytes string, serialization of the
        :class:`FileDescriptorProto` to add.
    """
    # pylint: disable=g-import-not-at-top
    from google.protobuf import descriptor_pb2
    file_desc_proto = descriptor_pb2.FileDescriptorProto.FromString(
        serialized_file_desc_proto)
    self.Add(file_desc_proto)
```

Am I missing something here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-29315) HDFSTest#testBlobServerRecovery fails on CI
[ https://issues.apache.org/jira/browse/FLINK-29315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609324#comment-17609324 ] Matthias Pohl commented on FLINK-29315: --- Great. Thanks for solving it. [~wangyang0918] I'm wondering what triggered that behavior? Did an automatic update of the kernel to {{3.10.0-1160.62.1.el7.x86_64}} happen recently? > HDFSTest#testBlobServerRecovery fails on CI > --- > > Key: FLINK-29315 > URL: https://issues.apache.org/jira/browse/FLINK-29315 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / FileSystem, Tests >Affects Versions: 1.16.0, 1.15.2 >Reporter: Chesnay Schepler >Priority: Blocker > Labels: pull-request-available > Fix For: 1.16.0, 1.17.0 > > > The test started failing 2 days ago on different branches. I suspect > something's wrong with the CI infrastructure. > {code:java} > Sep 15 09:11:22 [ERROR] Failures: > Sep 15 09:11:22 [ERROR] HDFSTest.testBlobServerRecovery Multiple Failures > (2 failures) > Sep 15 09:11:22 java.lang.AssertionError: Test failed Error while > running command to get file permissions : java.io.IOException: Cannot run > program "ls": error=1, Operation not permitted > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.runCommand(Shell.java:913) > Sep 15 09:11:22 at org.apache.hadoop.util.Shell.run(Shell.java:869) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1264) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1246) > Sep 15 09:11:22 at > org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1089) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1501) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:851) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:485) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:444) > Sep 15 09:11:22 at > org.apache.flink.hdfstests.HDFSTest.createHDFS(HDFSTest.java:93) > Sep 15 09:11:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Sep 15 09:11:22 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Sep 15 09:11:22 at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > Sep 15 09:11:22 ... 67 more > Sep 15 09:11:22 > Sep 15 09:11:22 java.lang.NullPointerException: > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] morhidi commented on pull request #384: [FLINK-29159] Harden initial deployment logic
morhidi commented on PR #384: URL: https://github.com/apache/flink-kubernetes-operator/pull/384#issuecomment-1257554675 +1 Thanks, Gyula. I have the feeling, though, that in the long term we should rely on a more consistent state-machine model instead of introducing new states/substates in utility methods. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-29410) Add checks to guarantee the non-deprecated options not conflicting with standard YAML
Yun Tang created FLINK-29410: Summary: Add checks to guarantee the non-deprecated options not conflicting with standard YAML Key: FLINK-29410 URL: https://issues.apache.org/jira/browse/FLINK-29410 Project: Flink Issue Type: Sub-task Components: Runtime / Configuration, Tests Reporter: Yun Tang To support the standard YAML parser, we added suffixes to all necessary options in FLINK-29372. However, this cannot guarantee that newly added options will still obey this rule, so we should add test checks to enforce it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
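A minimal sketch of the kind of guard test this asks for, assuming SnakeYAML as the reference parser; the sample entries and the failure policy below are illustrative, not the actual Flink test:

{code:java}
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

// Sketch: feed "key: value" entries through a standards-compliant YAML parser
// and fail when an option key or value does not survive as a single map entry.
public class YamlCompatibilityCheck {
    public static void main(String[] args) {
        Yaml yaml = new Yaml();
        String[] sampleEntries = {
            "taskmanager.numberOfTaskSlots: 4",
            "pipeline.jars: ['file:///tmp/a.jar']"
        };
        for (String entry : sampleEntries) {
            Map<String, Object> parsed = yaml.load(entry);
            if (parsed == null || parsed.size() != 1) {
                throw new AssertionError("Option entry is not standard YAML: " + entry);
            }
        }
    }
}
{code}

In the real check, the entries would be generated from Flink's ConfigOption metadata rather than hard-coded.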
[jira] [Closed] (FLINK-28131) FLIP-168: Speculative Execution for Batch Job
[ https://issues.apache.org/jira/browse/FLINK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-28131. --- Release Note: Speculative execution (FLIP-168) is introduced in Flink 1.16 to mitigate batch-job slowness caused by problematic nodes. A problematic node may have hardware problems, transient I/O contention, or high CPU load. These problems can make the hosted tasks run much slower than tasks on other nodes and affect the overall execution time of a batch job. When speculative execution is enabled, Flink keeps detecting slow tasks. Once slow tasks are detected, the nodes they are located on are identified as problematic and get blocked via the blocklist mechanism (FLIP-224). The scheduler creates new attempts for the slow tasks and deploys them to nodes that are not blocked, while the existing attempts keep running. Each new attempt processes the same input data and produces the same data as the original attempt. The first attempt to finish is admitted as the only finished attempt of the task, and the remaining attempts of the task are canceled. Most existing sources work with speculative execution (FLIP-245); only a source that uses SourceEvent must implement SupportsHandleExecutionAttemptSourceEvent to support it. Sinks do not support speculative execution yet, so speculative execution will not happen on sinks at the moment. The Web UI & REST API are also improved (FLIP-249) to display multiple concurrent attempts of tasks and blocked task managers. Resolution: Done > FLIP-168: Speculative Execution for Batch Job > - > > Key: FLINK-28131 > URL: https://issues.apache.org/jira/browse/FLINK-28131 > Project: Flink > Issue Type: New Feature > Components: Runtime / Coordination >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.16.0 > > > Speculative execution is helpful to mitigate slow tasks caused by > problematic nodes. The basic idea is to start mirror tasks on other nodes > when a slow task is detected. The mirror task processes the same input data > and produces the same data as the original task. > More details can be found in > [FLIP-168|https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job]. > > This is the umbrella ticket to track all the changes of this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
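A minimal sketch of how a batch job could opt in, assuming the configuration keys documented for Flink 1.16; the keys and values below are illustrative and should be verified against the docs for your version:

{code:java}
import org.apache.flink.configuration.Configuration;

public class SpeculativeExecutionConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Speculative execution is built on the adaptive batch scheduler.
        conf.setString("jobmanager.scheduler", "AdaptiveBatch");
        conf.setString("jobmanager.adaptive-batch-scheduler.speculative.enabled", "true");
        // Upper bound on concurrent attempts per task, original attempt included.
        conf.setString(
                "jobmanager.adaptive-batch-scheduler.speculative.max-concurrent-executions", "2");
        // How long a detected slow node stays on the blocklist (FLIP-224).
        conf.setString(
                "jobmanager.adaptive-batch-scheduler.speculative.block-slow-node-duration", "1 min");
        System.out.println(conf);
    }
}
{code}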
[jira] [Reopened] (FLINK-28131) FLIP-168: Speculative Execution for Batch Job
[ https://issues.apache.org/jira/browse/FLINK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reopened FLINK-28131: - > FLIP-168: Speculative Execution for Batch Job > - > > Key: FLINK-28131 > URL: https://issues.apache.org/jira/browse/FLINK-28131 > Project: Flink > Issue Type: New Feature > Components: Runtime / Coordination >Reporter: Zhu Zhu >Assignee: Zhu Zhu >Priority: Major > Fix For: 1.16.0 > > > Speculative execution is helpful to mitigate slow tasks caused by > problematic nodes. The basic idea is to start mirror tasks on other nodes > when a slow task is detected. The mirror task processes the same input data > and produces the same data as the original task. > More details can be found in > [FLIP-168|https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job]. > > This is the umbrella ticket to track all the changes of this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] fsk119 commented on pull request #20891: [FLINK-29274][hive] Fix ObjectStore leak when different users has dif…
fsk119 commented on PR #20891: URL: https://github.com/apache/flink/pull/20891#issuecomment-1257528078 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] pvary commented on a diff in pull request #382: [FLINK-29392] Trigger error when session job is lost without HA
pvary commented on code in PR #382: URL: https://github.com/apache/flink-kubernetes-operator/pull/382#discussion_r979582956 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/JobStatusObserver.java: ## @@ -72,22 +81,92 @@ public boolean observe(
         }
         if (!clusterJobStatuses.isEmpty()) {
+            // There are jobs on the cluster, we filter the ones for this resource
             Optional<JobStatusMessage> targetJobStatusMessage = filterTargetJob(jobStatus, clusterJobStatuses);
+
             if (targetJobStatusMessage.isEmpty()) {
-                jobStatus.setState(org.apache.flink.api.common.JobStatus.RECONCILING.name());
+                LOG.warn("No matching jobs found on the cluster");
+                ifRunningMoveToReconciling(jobStatus, previousJobStatus);
+                // We could list the jobs but cannot find the one for this resource
+                if (resource instanceof FlinkDeployment) {
+                    // This should never happen for application clusters, there is something wrong
+                    setUnknownJobError((FlinkDeployment) resource);
Review Comment: Could this be normal if we concurrently removed a job as well with a complex configuration change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB
[ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609298#comment-17609298 ] Donatien commented on FLINK-29402: -- Indeed, with the RocksDB API it is easy to add this new option (two new options, with {{use_direct_io_for_flush_and_compaction}}). I am not quite familiar with the process of adding a new option, e.g. adding it to the docs, localization, ..., but if you give me some resources about the guidelines I'll be happy to create a PR. I will edit my ticket later to add some Grafana examples of the impact of DirectIO on performance. > Add USE_DIRECT_READ configuration parameter for RocksDB > --- > > Key: FLINK-29402 > URL: https://issues.apache.org/jira/browse/FLINK-29402 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.15.2 >Reporter: Donatien >Priority: Not a Priority > Labels: Enhancement, rocksdb > Fix For: 1.15.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > RocksDB allows the use of DirectIO for read operations to bypass the Linux > Page Cache. To understand the impact of the Linux Page Cache on performance, one > can run a heavy workload on a single-tasked Task Manager with a container > memory limit identical to the TM process memory. Running this same workload > on a TM with no container memory limit will result in better performance, but > with host memory use exceeding the TM requirement. > The Linux Page Cache is of course useful but can give false results when > benchmarking the Managed Memory used by RocksDB. DirectIO is typically > enabled for benchmarks on working-set estimation [Zwaenepoel et > al.|https://arxiv.org/abs/1702.04323]. > I propose to add a configuration key allowing users to enable the use of > DirectIO for reads via the RocksDB API. This configuration would be > disabled by default. -- This message was sent by Atlassian Jira (v8.20.10#820010)
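A minimal sketch of what such a switch could map to internally, assuming it is wired through Flink's existing {{RocksDBOptionsFactory}} extension point and the RocksDB JNI setters that already exist today (an illustration, not the ticket's actual patch):

{code:java}
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

import java.util.Collection;

// Sketch: an options factory that enables DirectIO for reads (and optionally
// for flush/compaction), which users can already plug in while a first-class
// configuration key is pending.
public class DirectIoOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions
                .setUseDirectReads(true) // bypass the Linux page cache for reads
                .setUseDirectIoForFlushAndCompaction(true);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }
}
{code}

Such a factory can be registered via the {{state.backend.rocksdb.options-factory}} option; the proposed key would reduce this to a one-line config change.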
[jira] [Assigned] (FLINK-29020) Add document for CTAS feature
[ https://issues.apache.org/jira/browse/FLINK-29020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu reassigned FLINK-29020: --- Assignee: tartarus > Add document for CTAS feature > - > > Key: FLINK-29020 > URL: https://issues.apache.org/jira/browse/FLINK-29020 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.16.0 >Reporter: dalongliu >Assignee: tartarus >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29020) Add document for CTAS feature
[ https://issues.apache.org/jira/browse/FLINK-29020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609297#comment-17609297 ] Jark Wu commented on FLINK-29020: - Fixed in - master: 8ce056c59439c1a3cedd6b32c0a98a14febc7ffb > Add document for CTAS feature > - > > Key: FLINK-29020 > URL: https://issues.apache.org/jira/browse/FLINK-29020 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.16.0 >Reporter: dalongliu >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] wuchong merged pull request #20653: [FLINK-29020][docs] add document for CTAS feature
wuchong merged PR #20653: URL: https://github.com/apache/flink/pull/20653 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] wangyang0918 closed pull request #20879: Debug CI
wangyang0918 closed pull request #20879: Debug CI URL: https://github.com/apache/flink/pull/20879 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-29409) Add Transformer and Estimator for VarianceThresholdSelector
[ https://issues.apache.org/jira/browse/FLINK-29409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-29409: --- Labels: pull-request-available (was: ) > Add Transformer and Estimator for VarianceThresholdSelector > > > Key: FLINK-29409 > URL: https://issues.apache.org/jira/browse/FLINK-29409 > Project: Flink > Issue Type: New Feature > Components: Library / Machine Learning >Reporter: Jiang Xin >Priority: Major > Labels: pull-request-available > > Add Transformer and Estimator for VarianceThresholdSelector -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-ml] jiangxin369 opened a new pull request, #158: [FLINK-29409] Add Transformer and Estimator for VarianceThresholdSelector
jiangxin369 opened a new pull request, #158: URL: https://github.com/apache/flink-ml/pull/158 ## What is the purpose of the change Adding Transformer and Estimator for VarianceThresholdSelector ## Brief change log - Added Java/Python source/test/example for the Transformer and Estimator of VarianceThresholdSelector in Flink ML. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with @Public(Evolving): (no) - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (Java doc) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-29409) Add Transformer and Estimator for VarianceThresholdSelector
Jiang Xin created FLINK-29409: - Summary: Add Transformer and Estimator for VarianceThresholdSelector Key: FLINK-29409 URL: https://issues.apache.org/jira/browse/FLINK-29409 Project: Flink Issue Type: New Feature Components: Library / Machine Learning Reporter: Jiang Xin Add Transformer and Estimator for VarianceThresholdSelector -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29272) Document DataStream API (DataStream to Table) for table store
[ https://issues.apache.org/jira/browse/FLINK-29272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingsong Lee updated FLINK-29272: - Fix Version/s: (was: table-store-0.2.1) > Document DataStream API (DataStream to Table) for table store > - > > Key: FLINK-29272 > URL: https://issues.apache.org/jira/browse/FLINK-29272 > Project: Flink > Issue Type: New Feature > Components: Table Store >Reporter: Jingsong Lee >Priority: Major > Fix For: table-store-0.3.0 > > > We can have documentation to describe how to convert from DataStream to Table > to write to TableStore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] xintongsong closed pull request #20857: [hotfix] Fix the problem that BatchShuffleItCase not subject to configuration.
xintongsong closed pull request #20857: [hotfix] Fix the problem that BatchShuffleItCase not subject to configuration. URL: https://github.com/apache/flink/pull/20857 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-29406) Expose Finish Method For TableFunction
[ https://issues.apache.org/jira/browse/FLINK-29406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lincoln lee updated FLINK-29406: Affects Version/s: 1.14.5 > Expose Finish Method For TableFunction > -- > > Key: FLINK-29406 > URL: https://issues.apache.org/jira/browse/FLINK-29406 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.14.5, 1.16.0, 1.15.2 >Reporter: lincoln lee >Priority: Major > Fix For: 1.17.0 > > > FLIP-260: Expose Finish Method For TableFunction > https://cwiki.apache.org/confluence/display/FLINK/FLIP-260%3A+Expose+Finish+Method+For+TableFunction -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-29315) HDFSTest#testBlobServerRecovery fails on CI
[ https://issues.apache.org/jira/browse/FLINK-29315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huang Xingbo resolved FLINK-29315. -- Resolution: Fixed > HDFSTest#testBlobServerRecovery fails on CI > --- > > Key: FLINK-29315 > URL: https://issues.apache.org/jira/browse/FLINK-29315 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / FileSystem, Tests >Affects Versions: 1.16.0, 1.15.2 >Reporter: Chesnay Schepler >Priority: Blocker > Labels: pull-request-available > Fix For: 1.16.0, 1.17.0 > > > The test started failing 2 days ago on different branches. I suspect > something's wrong with the CI infrastructure. > {code:java} > Sep 15 09:11:22 [ERROR] Failures: > Sep 15 09:11:22 [ERROR] HDFSTest.testBlobServerRecovery Multiple Failures > (2 failures) > Sep 15 09:11:22 java.lang.AssertionError: Test failed Error while > running command to get file permissions : java.io.IOException: Cannot run > program "ls": error=1, Operation not permitted > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.runCommand(Shell.java:913) > Sep 15 09:11:22 at org.apache.hadoop.util.Shell.run(Shell.java:869) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1264) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1246) > Sep 15 09:11:22 at > org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1089) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1501) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:851) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:485) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:444) > Sep 15 09:11:22 at > org.apache.flink.hdfstests.HDFSTest.createHDFS(HDFSTest.java:93) > Sep 15 09:11:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Sep 15 09:11:22 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Sep 15 09:11:22 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > Sep 15 09:11:22 ... 
67 more > Sep 15 09:11:22 > Sep 15 09:11:22 java.lang.NullPointerException: > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29315) HDFSTest#testBlobServerRecovery fails on CI
[ https://issues.apache.org/jira/browse/FLINK-29315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609282#comment-17609282 ] Huang Xingbo commented on FLINK-29315: -- After upgrading the kernel version, this test has passed successfully in the last two days of CI runs. Thanks to [~wangyang0918], [~chesnay] and [~mapohl] for their help in solving this tricky problem. > HDFSTest#testBlobServerRecovery fails on CI > --- > > Key: FLINK-29315 > URL: https://issues.apache.org/jira/browse/FLINK-29315 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / FileSystem, Tests >Affects Versions: 1.16.0, 1.15.2 >Reporter: Chesnay Schepler >Priority: Blocker > Labels: pull-request-available > Fix For: 1.16.0, 1.17.0 > > > The test started failing 2 days ago on different branches. I suspect > something's wrong with the CI infrastructure. > {code:java} > Sep 15 09:11:22 [ERROR] Failures: > Sep 15 09:11:22 [ERROR] HDFSTest.testBlobServerRecovery Multiple Failures > (2 failures) > Sep 15 09:11:22 java.lang.AssertionError: Test failed Error while > running command to get file permissions : java.io.IOException: Cannot run > program "ls": error=1, Operation not permitted > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.runCommand(Shell.java:913) > Sep 15 09:11:22 at org.apache.hadoop.util.Shell.run(Shell.java:869) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1264) > Sep 15 09:11:22 at > org.apache.hadoop.util.Shell.execCommand(Shell.java:1246) > Sep 15 09:11:22 at > org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1089) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697) > Sep 15 09:11:22 at > org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141) > Sep 15 09:11:22 at > org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1501) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:851) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:485) > Sep 15 09:11:22 at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:444) > Sep 15 09:11:22 at > org.apache.flink.hdfstests.HDFSTest.createHDFS(HDFSTest.java:93) > Sep 15 09:11:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Sep 15 09:11:22 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Sep 15 09:11:22 at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Sep 15 09:11:22 at > java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > Sep 15 09:11:22 ... 67 more > Sep 15 09:11:22 > Sep 15 09:11:22 java.lang.NullPointerException: > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] flinkbot commented on pull request #20898: [FLINK-29349][table-runtime] Use state ttl instead of timer to clean up state in proctime unbounded over aggregate
flinkbot commented on PR #20898: URL: https://github.com/apache/flink/pull/20898#issuecomment-1257419037 ## CI report: * 2ec2c1645096c01744e1bd7c19fe6535063ee687 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] zezaeoh commented on a diff in pull request #377: [FLINK-28979] Add owner reference to flink deployment object
zezaeoh commented on code in PR #377: URL: https://github.com/apache/flink-kubernetes-operator/pull/377#discussion_r979518665 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java: ## @@ -426,4 +431,17 @@ protected boolean flinkVersionChanged(SPEC oldSpec, SPEC newSpec) {
         }
         return false;
     }
+
+    private void setOwnerReference(CR owner, Configuration deployConfig) {
+        final Map<String, String> ownerReference =
+                Map.of(
+                        "apiVersion", owner.getApiVersion(),
+                        "kind", owner.getKind(),
+                        "name", owner.getMetadata().getName(),
+                        "uid", owner.getMetadata().getUid(),
+                        "blockOwnerDeletion", "false",
+                        "controller", "false");
+        deployConfig.set(
+                KubernetesConfigOptions.JOB_MANAGER_OWNER_REFERENCE, List.of(ownerReference));
+    }
Review Comment: @gyfora I tried it, but the problem is with the apiVersion and kind. Is there a good way to add the apiVersion and kind of CustomResources, if not in the `AbstractFlinkResourceReconciler`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-29349) Use state ttl instead of timer to clean up state in proctime unbounded over aggregate
[ https://issues.apache.org/jira/browse/FLINK-29349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-29349: --- Labels: pull-request-available (was: ) > Use state ttl instead of timer to clean up state in proctime unbounded over > aggregate > - > > Key: FLINK-29349 > URL: https://issues.apache.org/jira/browse/FLINK-29349 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Affects Versions: 1.16.0, 1.15.2 >Reporter: lincoln lee >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > > > Currently we rely on timer-based state cleaning in proctime over > aggregate; this can be optimized to use state TTL, which is more efficient -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] lincoln-lil opened a new pull request, #20898: [FLINK-29349][table-runtime] Use state ttl instead of timer to clean up state in proctime unbounded over aggregate
lincoln-lil opened a new pull request, #20898: URL: https://github.com/apache/flink/pull/20898 ## What is the purpose of the change Currently we rely on timer-based state cleaning in proctime unbounded over aggregate; this can be optimized to use state TTL, which is more efficient. ## Brief change log Use TTL instead of a timer in ProcTimeUnboundedPrecedingFunction (see the sketch after this message). ## Verifying this change ProcTimeUnboundedPrecedingFunctionTest ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with @Public(Evolving): (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
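A rough sketch of the pattern, not the PR's actual code: TTL is configured on the state descriptor so the state backend expires entries lazily instead of registering one cleanup timer per key. The retention value below is illustrative.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;

public class StateTtlSketch {
    public static void main(String[] args) {
        // Illustrative retention; in the real operator this would come from
        // the configured idle-state retention time.
        StateTtlConfig ttlConfig =
                StateTtlConfig.newBuilder(Time.minutes(10))
                        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                        .build();

        ValueStateDescriptor<Long> accState = new ValueStateDescriptor<>("accState", Types.LONG);
        // The state backend now cleans up expired entries itself; no per-key
        // processing-time timer is required.
        accState.enableTimeToLive(ttlConfig);
        System.out.println(accState);
    }
}
```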
[GitHub] [flink-kubernetes-operator] wangyang0918 commented on a diff in pull request #383: [FLINK-29384] Bump snakeyaml version to 1.32
wangyang0918 commented on code in PR #383: URL: https://github.com/apache/flink-kubernetes-operator/pull/383#discussion_r979525831 ## flink-kubernetes-operator/pom.xml: ## @@ -47,6 +47,20 @@ under the License.
             <groupId>io.fabric8</groupId>
             <artifactId>kubernetes-client</artifactId>
             <version>${fabric8.version}</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.yaml</groupId>
+                    <artifactId>snakeyaml</artifactId>
+                </exclusion>
+            </exclusions>
Review Comment: Could you please share why we include the dependency explicitly and do not use `dependencyManagement`? ## flink-kubernetes-shaded/pom.xml: ## @@ -88,6 +103,7 @@
                 <exclude>META-INF/DEPENDENCIES</exclude>
                 <exclude>META-INF/LICENSE</exclude>
                 <exclude>META-INF/MANIFEST.MF</exclude>
+                <exclude>org/apache/flink/kubernetes/shaded/org/yaml/snakeyaml/**</exclude>
Review Comment: I am not sure whether it is possible to update the bundled `NOTICE` file to also remove the `org.yaml:snakeyaml:1.27` entry. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB
[ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609277#comment-17609277 ] Yanfei Lei edited comment on FLINK-29402 at 9/26/22 3:05 AM: - This is a very interesting proposal; I think this is not hard to implement in Flink. From the [wiki|https://github.com/facebook/rocksdb/wiki/Direct-IO] there are two options to control DirectIO: {{use_direct_reads}} and {{use_direct_io_for_flush_and_compaction}}, and these two options are supported by the current {{frocksdb-jni (6.20.3)}}. BTW, do you have quantitative benchmark results for DirectIO *ON* vs DirectIO *OFF*? was (Author: yanfei lei): This is a very interesting proposal; I think this is not hard to implement in Flink. From the [wiki|https://github.com/facebook/rocksdb/wiki/Direct-IO] there are two options to control DirectIO: {{use_direct_reads}} and {{use_direct_io_for_flush_and_compaction}}, and these two options are supported by the current {{frocksdb-jni (6.20.3)}}. BTW, do you have quantitative benchmark results for DirectIO *ON* vs DirectIO *OFF*? > Add USE_DIRECT_READ configuration parameter for RocksDB > --- > > Key: FLINK-29402 > URL: https://issues.apache.org/jira/browse/FLINK-29402 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.15.2 >Reporter: Donatien >Priority: Not a Priority > Labels: Enhancement, rocksdb > Fix For: 1.15.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > RocksDB allows the use of DirectIO for read operations to bypass the Linux > Page Cache. To understand the impact of the Linux Page Cache on performance, one > can run a heavy workload on a single-tasked Task Manager with a container > memory limit identical to the TM process memory. Running this same workload > on a TM with no container memory limit will result in better performance, but > with host memory use exceeding the TM requirement. > The Linux Page Cache is of course useful but can give false results when > benchmarking the Managed Memory used by RocksDB. DirectIO is typically > enabled for benchmarks on working-set estimation [Zwaenepoel et > al.|https://arxiv.org/abs/1702.04323]. > I propose to add a configuration key allowing users to enable the use of > DirectIO for reads via the RocksDB API. This configuration would be > disabled by default. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB
[ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609277#comment-17609277 ] Yanfei Lei commented on FLINK-29402: This is a very interesting proposal; I think this is not hard to implement in Flink. From the [wiki|https://github.com/facebook/rocksdb/wiki/Direct-IO] there are two options to control DirectIO: {{use_direct_reads}} and {{use_direct_io_for_flush_and_compaction}}, and these two options are supported by the current {{frocksdb-jni (6.20.3)}}. BTW, do you have quantitative benchmark results for DirectIO *ON* vs DirectIO *OFF*? > Add USE_DIRECT_READ configuration parameter for RocksDB > --- > > Key: FLINK-29402 > URL: https://issues.apache.org/jira/browse/FLINK-29402 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.15.2 >Reporter: Donatien >Priority: Not a Priority > Labels: Enhancement, rocksdb > Fix For: 1.15.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > RocksDB allows the use of DirectIO for read operations to bypass the Linux > Page Cache. To understand the impact of the Linux Page Cache on performance, one > can run a heavy workload on a single-tasked Task Manager with a container > memory limit identical to the TM process memory. Running this same workload > on a TM with no container memory limit will result in better performance, but > with host memory use exceeding the TM requirement. > The Linux Page Cache is of course useful but can give false results when > benchmarking the Managed Memory used by RocksDB. DirectIO is typically > enabled for benchmarks on working-set estimation [Zwaenepoel et > al.|https://arxiv.org/abs/1702.04323]. > I propose to add a configuration key allowing users to enable the use of > DirectIO for reads via the RocksDB API. This configuration would be > disabled by default. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29408) HiveCatalogITCase failed with NPE
Huang Xingbo created FLINK-29408: Summary: HiveCatalogITCase failed with NPE Key: FLINK-29408 URL: https://issues.apache.org/jira/browse/FLINK-29408 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.17.0 Reporter: Huang Xingbo {code:java} 2022-09-25T03:41:07.4212129Z Sep 25 03:41:07 [ERROR] org.apache.flink.table.catalog.hive.HiveCatalogUdfITCase.testFlinkUdf Time elapsed: 0.098 s <<< ERROR! 2022-09-25T03:41:07.4212662Z Sep 25 03:41:07 java.lang.NullPointerException 2022-09-25T03:41:07.4213189Z Sep 25 03:41:07at org.apache.flink.table.catalog.hive.HiveCatalogUdfITCase.testFlinkUdf(HiveCatalogUdfITCase.java:109) 2022-09-25T03:41:07.4213753Z Sep 25 03:41:07at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2022-09-25T03:41:07.4224643Z Sep 25 03:41:07at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 2022-09-25T03:41:07.4225311Z Sep 25 03:41:07at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2022-09-25T03:41:07.4225879Z Sep 25 03:41:07at java.lang.reflect.Method.invoke(Method.java:498) 2022-09-25T03:41:07.4226405Z Sep 25 03:41:07at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) 2022-09-25T03:41:07.4227201Z Sep 25 03:41:07at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 2022-09-25T03:41:07.4227807Z Sep 25 03:41:07at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) 2022-09-25T03:41:07.4228394Z Sep 25 03:41:07at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 2022-09-25T03:41:07.4228966Z Sep 25 03:41:07at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 2022-09-25T03:41:07.4229514Z Sep 25 03:41:07at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) 2022-09-25T03:41:07.4230066Z Sep 25 03:41:07at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 2022-09-25T03:41:07.4230587Z Sep 25 03:41:07at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) 2022-09-25T03:41:07.4231258Z Sep 25 03:41:07at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 2022-09-25T03:41:07.4231823Z Sep 25 03:41:07at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) 2022-09-25T03:41:07.4232384Z Sep 25 03:41:07at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 2022-09-25T03:41:07.4232930Z Sep 25 03:41:07at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) 2022-09-25T03:41:07.4233511Z Sep 25 03:41:07at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) 2022-09-25T03:41:07.4234039Z Sep 25 03:41:07at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 2022-09-25T03:41:07.4234546Z Sep 25 03:41:07at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 2022-09-25T03:41:07.4235057Z Sep 25 03:41:07at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 2022-09-25T03:41:07.4235573Z Sep 25 03:41:07at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 2022-09-25T03:41:07.4236087Z Sep 25 03:41:07at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 2022-09-25T03:41:07.4236635Z Sep 25 03:41:07at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 2022-09-25T03:41:07.4237314Z Sep 25 03:41:07at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 2022-09-25T03:41:07.4238211Z Sep 25 
03:41:07at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) 2022-09-25T03:41:07.4238775Z Sep 25 03:41:07at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) 2022-09-25T03:41:07.4239277Z Sep 25 03:41:07at org.junit.rules.RunRules.evaluate(RunRules.java:20) 2022-09-25T03:41:07.4239769Z Sep 25 03:41:07at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 2022-09-25T03:41:07.4240265Z Sep 25 03:41:07at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 2022-09-25T03:41:07.4240731Z Sep 25 03:41:07at org.junit.runner.JUnitCore.run(JUnitCore.java:137) 2022-09-25T03:41:07.4241196Z Sep 25 03:41:07at org.junit.runner.JUnitCore.run(JUnitCore.java:115) 2022-09-25T03:41:07.4241715Z Sep 25 03:41:07at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) 2022-09-25T03:41:07.4242316Z Sep 25 03:41:07at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) 2022-09-25T03:41:07.4242904Z Sep 25 03:41:07at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72) 2022-09-25T03:41:07.4243528Z S
[jira] [Commented] (FLINK-29387) IntervalJoinITCase.testIntervalJoinSideOutputRightLateData failed with AssertionError
[ https://issues.apache.org/jira/browse/FLINK-29387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609276#comment-17609276 ] Huang Xingbo commented on FLINK-29387: -- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41316&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=8267 > IntervalJoinITCase.testIntervalJoinSideOutputRightLateData failed with > AssertionError > - > > Key: FLINK-29387 > URL: https://issues.apache.org/jira/browse/FLINK-29387 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.17.0 >Reporter: Huang Xingbo >Priority: Critical > Labels: test-stability > > {code:java} > 2022-09-22T04:40:21.9296331Z Sep 22 04:40:21 [ERROR] > org.apache.flink.test.streaming.runtime.IntervalJoinITCase.testIntervalJoinSideOutputRightLateData > Time elapsed: 2.46 s <<< FAILURE! > 2022-09-22T04:40:21.9297487Z Sep 22 04:40:21 java.lang.AssertionError: > expected:<[(key,2)]> but was:<[]> > 2022-09-22T04:40:21.9298208Z Sep 22 04:40:21 at > org.junit.Assert.fail(Assert.java:89) > 2022-09-22T04:40:21.9298927Z Sep 22 04:40:21 at > org.junit.Assert.failNotEquals(Assert.java:835) > 2022-09-22T04:40:21.9299655Z Sep 22 04:40:21 at > org.junit.Assert.assertEquals(Assert.java:120) > 2022-09-22T04:40:21.9300403Z Sep 22 04:40:21 at > org.junit.Assert.assertEquals(Assert.java:146) > 2022-09-22T04:40:21.9301538Z Sep 22 04:40:21 at > org.apache.flink.test.streaming.runtime.IntervalJoinITCase.expectInAnyOrder(IntervalJoinITCase.java:521) > 2022-09-22T04:40:21.9302578Z Sep 22 04:40:21 at > org.apache.flink.test.streaming.runtime.IntervalJoinITCase.testIntervalJoinSideOutputRightLateData(IntervalJoinITCase.java:280) > 2022-09-22T04:40:21.9303641Z Sep 22 04:40:21 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-09-22T04:40:21.9304472Z Sep 22 04:40:21 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-09-22T04:40:21.9305371Z Sep 22 04:40:21 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-09-22T04:40:21.9306195Z Sep 22 04:40:21 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-09-22T04:40:21.9307011Z Sep 22 04:40:21 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 2022-09-22T04:40:21.9308077Z Sep 22 04:40:21 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2022-09-22T04:40:21.9308968Z Sep 22 04:40:21 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 2022-09-22T04:40:21.9309849Z Sep 22 04:40:21 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2022-09-22T04:40:21.9310704Z Sep 22 04:40:21 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 2022-09-22T04:40:21.9311533Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 2022-09-22T04:40:21.9312386Z Sep 22 04:40:21 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > 2022-09-22T04:40:21.9313231Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > 2022-09-22T04:40:21.9314985Z Sep 22 04:40:21 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 2022-09-22T04:40:21.9315857Z Sep 22 04:40:21 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > 
2022-09-22T04:40:21.9316633Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 2022-09-22T04:40:21.9317450Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 2022-09-22T04:40:21.9318209Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > 2022-09-22T04:40:21.9318949Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > 2022-09-22T04:40:21.9319680Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > 2022-09-22T04:40:21.9320401Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 2022-09-22T04:40:21.9321130Z Sep 22 04:40:21 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > 2022-09-22T04:40:21.9321822Z Sep 22 04:40:21 at > org.junit.runner.JUnitCore.run(JUnitCore.java:137) > 2022-09-22T04:40:21.9322498Z Sep 22 04:40:21 at > org.junit.runner.JUnitCore.run(JUnitCore.java:115) > 2022-09-22T04:40:21.9323248Z Sep 22 04:40:21 at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) > 2022
[GitHub] [flink] xintongsong commented on pull request #20857: [hotfix] Fix the problem that BatchShuffleItCase not subject to configuration.
xintongsong commented on PR #20857: URL: https://github.com/apache/flink/pull/20857#issuecomment-1257406264 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-28898) ChangelogRecoverySwitchStateBackendITCase.testSwitchFromEnablingToDisablingWithRescalingOut failed
[ https://issues.apache.org/jira/browse/FLINK-28898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609275#comment-17609275 ] Huang Xingbo commented on FLINK-28898: -- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41318&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798 > ChangelogRecoverySwitchStateBackendITCase.testSwitchFromEnablingToDisablingWithRescalingOut > failed > -- > > Key: FLINK-28898 > URL: https://issues.apache.org/jira/browse/FLINK-28898 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / State Backends >Affects Versions: 1.16.0 >Reporter: Xingbo Huang >Assignee: Hangxiang Yu >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.16.0 > > > {code:java} > 2022-08-10T02:48:19.5711924Z Aug 10 02:48:19 [ERROR] > ChangelogRecoverySwitchStateBackendITCase.testSwitchFromEnablingToDisablingWithRescalingOut > Time elapsed: 6.064 s <<< ERROR! > 2022-08-10T02:48:19.5712815Z Aug 10 02:48:19 > org.apache.flink.runtime.JobException: Recovery is suppressed by > FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=0, > backoffTimeMS=0) > 2022-08-10T02:48:19.5714530Z Aug 10 02:48:19 at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139) > 2022-08-10T02:48:19.5716211Z Aug 10 02:48:19 at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83) > 2022-08-10T02:48:19.5717627Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:256) > 2022-08-10T02:48:19.5718885Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:247) > 2022-08-10T02:48:19.5720430Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:240) > 2022-08-10T02:48:19.5721733Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:738) > 2022-08-10T02:48:19.5722680Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:715) > 2022-08-10T02:48:19.5723612Z Aug 10 02:48:19 at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78) > 2022-08-10T02:48:19.5724389Z Aug 10 02:48:19 at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:477) > 2022-08-10T02:48:19.5725046Z Aug 10 02:48:19 at > sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > 2022-08-10T02:48:19.5725708Z Aug 10 02:48:19 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-08-10T02:48:19.5726374Z Aug 10 02:48:19 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-08-10T02:48:19.5727065Z Aug 10 02:48:19 at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:309) > 2022-08-10T02:48:19.5727932Z Aug 10 02:48:19 at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) > 2022-08-10T02:48:19.5729087Z Aug 10 02:48:19 at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:307) > 2022-08-10T02:48:19.5730134Z Aug 10 02:48:19 at > 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:222) > 2022-08-10T02:48:19.5731536Z Aug 10 02:48:19 at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:84) > 2022-08-10T02:48:19.5732549Z Aug 10 02:48:19 at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:168) > 2022-08-10T02:48:19.5735018Z Aug 10 02:48:19 at > akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) > 2022-08-10T02:48:19.5735821Z Aug 10 02:48:19 at > akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) > 2022-08-10T02:48:19.5736465Z Aug 10 02:48:19 at > scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > 2022-08-10T02:48:19.5737234Z Aug 10 02:48:19 at > scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > 2022-08-10T02:48:19.5737895Z Aug 10 02:48:19 at > akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) > 2022-08-10T02:48:19.5738574Z Aug 10 02:48:19 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > 2022-08-10T02:48:19.5739276Z Aug 10 02:48:19 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) > 2
[jira] [Created] (FLINK-29407) HiveCatalogITCase.testCsvTableViaSQL failed with FileNotFoundException
Huang Xingbo created FLINK-29407: Summary: HiveCatalogITCase.testCsvTableViaSQL failed with FileNotFoundException Key: FLINK-29407 URL: https://issues.apache.org/jira/browse/FLINK-29407 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.16.0 Reporter: Huang Xingbo
{code:java}
Sep 25 04:37:47 [ERROR] Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 65.785 s <<< FAILURE! - in org.apache.flink.table.catalog.hive.HiveCatalogITCase
Sep 25 04:37:47 [ERROR] org.apache.flink.table.catalog.hive.HiveCatalogITCase.testCsvTableViaSQL  Time elapsed: 5.584 s <<< ERROR!
Sep 25 04:37:47 java.lang.RuntimeException: Failed to fetch next result
Sep 25 04:37:47 	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
Sep 25 04:37:47 	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
Sep 25 04:37:47 	at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222)
Sep 25 04:37:47 	at java.util.Iterator.forEachRemaining(Iterator.java:115)
Sep 25 04:37:47 	at org.apache.flink.util.CollectionUtil.iteratorToList(CollectionUtil.java:115)
Sep 25 04:37:47 	at org.apache.flink.table.catalog.hive.HiveCatalogITCase.testCsvTableViaSQL(HiveCatalogITCase.java:140)
Sep 25 04:37:47 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Sep 25 04:37:47 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Sep 25 04:37:47 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Sep 25 04:37:47 	at java.lang.reflect.Method.invoke(Method.java:498)
Sep 25 04:37:47 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Sep 25 04:37:47 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Sep 25 04:37:47 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Sep 25 04:37:47 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Sep 25 04:37:47 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Sep 25 04:37:47 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Sep 25 04:37:47 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Sep 25 04:37:47 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Sep 25 04:37:47 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Sep 25 04:37:47 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Sep 25 04:37:47 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Sep 25 04:37:47 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Sep 25 04:37:47 	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Sep 25 04:37:47 	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:
[GitHub] [flink-kubernetes-operator] zezaeoh commented on a diff in pull request #377: [FLINK-28979] Add owner reference to flink deployment object
zezaeoh commented on code in PR #377: URL: https://github.com/apache/flink-kubernetes-operator/pull/377#discussion_r979518665

## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java:

```
@@ -426,4 +431,17 @@ protected boolean flinkVersionChanged(SPEC oldSpec, SPEC newSpec) {
         }
         return false;
     }
+
+    private void setOwnerReference(CR owner, Configuration deployConfig) {
+        final Map<String, String> ownerReference =
+                Map.of(
+                        "apiVersion", owner.getApiVersion(),
+                        "kind", owner.getKind(),
+                        "name", owner.getMetadata().getName(),
+                        "uid", owner.getMetadata().getUid(),
+                        "blockOwnerDeletion", "false",
+                        "controller", "false");
+        deployConfig.set(
+                KubernetesConfigOptions.JOB_MANAGER_OWNER_REFERENCE, List.of(ownerReference));
+    }
```

Review Comment: @gyfora I tried it, but the problem is with the apiVersion and kind. Is there a good way to get the apiVersion and kind of a CustomResource, if not in the `AbstractFlinkResourceReconciler`?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
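One way to answer the question above, derived from fabric8's `HasMetadata` contract rather than from this PR, is to fall back to the class-level annotation metadata when the fields are not populated on the instance. The helper below is a sketch under that assumption, not the operator's actual code:

```java
import java.util.Map;

import io.fabric8.kubernetes.api.model.HasMetadata;

public final class OwnerReferenceSketch {

    private OwnerReferenceSketch() {}

    // Sketch: fabric8's static HasMetadata helpers read the @Group/@Version
    // annotations on the custom resource class, so apiVersion/kind can be
    // derived even when the instance fields are null.
    public static Map<String, String> ownerReferenceOf(HasMetadata owner) {
        String apiVersion =
                owner.getApiVersion() != null
                        ? owner.getApiVersion()
                        : HasMetadata.getApiVersion(owner.getClass());
        String kind =
                owner.getKind() != null ? owner.getKind() : HasMetadata.getKind(owner.getClass());
        return Map.of(
                "apiVersion", apiVersion,
                "kind", kind,
                "name", owner.getMetadata().getName(),
                "uid", owner.getMetadata().getUid(),
                "blockOwnerDeletion", "false",
                "controller", "false");
    }
}
```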
[jira] [Commented] (FLINK-29405) InputFormatCacheLoaderTest is unstable
[ https://issues.apache.org/jira/browse/FLINK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609272#comment-17609272 ] Huang Xingbo commented on FLINK-29405: -- [~smiralex][~renqs] Could you help take a look? Thx. > InputFormatCacheLoaderTest is unstable > -- > > Key: FLINK-29405 > URL: https://issues.apache.org/jira/browse/FLINK-29405 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / Runtime >Affects Versions: 1.16.0, 1.17.0 >Reporter: Chesnay Schepler >Priority: Critical > > #testExceptionDuringReload/#testCloseAndInterruptDuringReload fail reliably > when run in a loop. > {code} > java.lang.AssertionError: > Expecting AtomicInteger(0) to have value: > 0 > but did not. > at > org.apache.flink.table.runtime.functions.table.fullcache.inputformat.InputFormatCacheLoaderTest.testCloseAndInterruptDuringReload(InputFormatCacheLoaderTest.java:161) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29405) InputFormatCacheLoaderTest is unstable
[ https://issues.apache.org/jira/browse/FLINK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huang Xingbo updated FLINK-29405: - Priority: Critical (was: Major) > InputFormatCacheLoaderTest is unstable > -- > > Key: FLINK-29405 > URL: https://issues.apache.org/jira/browse/FLINK-29405 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / Runtime >Affects Versions: 1.16.0, 1.17.0 >Reporter: Chesnay Schepler >Priority: Critical > > #testExceptionDuringReload/#testCloseAndInterruptDuringReload fail reliably > when run in a loop. > {code} > java.lang.AssertionError: > Expecting AtomicInteger(0) to have value: > 0 > but did not. > at > org.apache.flink.table.runtime.functions.table.fullcache.inputformat.InputFormatCacheLoaderTest.testCloseAndInterruptDuringReload(InputFormatCacheLoaderTest.java:161) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29405) InputFormatCacheLoaderTest is unstable
[ https://issues.apache.org/jira/browse/FLINK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huang Xingbo updated FLINK-29405: - Affects Version/s: 1.16.0 > InputFormatCacheLoaderTest is unstable > -- > > Key: FLINK-29405 > URL: https://issues.apache.org/jira/browse/FLINK-29405 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / Runtime >Affects Versions: 1.16.0, 1.17.0 >Reporter: Chesnay Schepler >Priority: Major > > #testExceptionDuringReload/#testCloseAndInterruptDuringReload fail reliably > when run in a loop. > {code} > java.lang.AssertionError: > Expecting AtomicInteger(0) to have value: > 0 > but did not. > at > org.apache.flink.table.runtime.functions.table.fullcache.inputformat.InputFormatCacheLoaderTest.testCloseAndInterruptDuringReload(InputFormatCacheLoaderTest.java:161) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29405) InputFormatCacheLoaderTest is unstable
[ https://issues.apache.org/jira/browse/FLINK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609271#comment-17609271 ] Huang Xingbo commented on FLINK-29405: -- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41328&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4 1.16 instance > InputFormatCacheLoaderTest is unstable > -- > > Key: FLINK-29405 > URL: https://issues.apache.org/jira/browse/FLINK-29405 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / Runtime >Affects Versions: 1.17.0 >Reporter: Chesnay Schepler >Priority: Major > > #testExceptionDuringReload/#testCloseAndInterruptDuringReload fail reliably > when run in a loop. > {code} > java.lang.AssertionError: > Expecting AtomicInteger(0) to have value: > 0 > but did not. > at > org.apache.flink.table.runtime.functions.table.fullcache.inputformat.InputFormatCacheLoaderTest.testCloseAndInterruptDuringReload(InputFormatCacheLoaderTest.java:161) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-29373) DataStream to table not support BigDecimalTypeInfo
[ https://issues.apache.org/jira/browse/FLINK-29373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingsong Lee reassigned FLINK-29373: Assignee: hk__lrzy
> DataStream to table not support BigDecimalTypeInfo
> --
>
> Key: FLINK-29373
> URL: https://issues.apache.org/jira/browse/FLINK-29373
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.17.0
> Reporter: hk__lrzy
> Assignee: hk__lrzy
> Priority: Major
> Attachments: image-2022-09-21-15-12-11-082.png, image-2022-09-22-18-08-44-385.png
>
> When we try to transform a DataStream to a table, *TypeInfoDataTypeConverter* will try to convert *TypeInformation* to {*}DataType{*}, but if the DataStream's produced type contains {*}BigDecimalTypeInfo{*}, *TypeInfoDataTypeConverter* will finally convert it to a {*}RawDataType{*}. Then, when we want to transform the table back to a DataStream, an exception happens that reports the data types do not match. The Blink planner also has this exception.
> !image-2022-09-22-18-08-44-385.png!
>
> {code:java}
> Query schema: [f0: RAW('java.math.BigDecimal', '...')]
> Sink schema: [f0: RAW('java.math.BigDecimal', ?)]{code}
> How to reproduce:
> {code:java}
> // code placeholder
> StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
> EnvironmentSettings.Builder envBuilder = EnvironmentSettings.newInstance().inStreamingMode();
> StreamTableEnvironment streamTableEnvironment = StreamTableEnvironment.create(executionEnvironment, envBuilder.build());
> FromElementsFunction<BigDecimal> fromElementsFunction = new FromElementsFunction<>(new BigDecimal(1.11D));
> DataStreamSource<BigDecimal> dataStreamSource = executionEnvironment.addSource(fromElementsFunction, new BigDecimalTypeInfo(10, 8));
> streamTableEnvironment.createTemporaryView("tmp", dataStreamSource);
> Table table = streamTableEnvironment.sqlQuery("select * from tmp");
> streamTableEnvironment.toRetractStream(table, table.getSchema().toRowType());{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29373) DataStream to table not support BigDecimalTypeInfo
[ https://issues.apache.org/jira/browse/FLINK-29373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609270#comment-17609270 ] Jingsong Lee commented on FLINK-29373: -- [~hk__lrzy] Thanks, assigned to you. CC [~jark] to review~
> DataStream to table not support BigDecimalTypeInfo
> --
>
> Key: FLINK-29373
> URL: https://issues.apache.org/jira/browse/FLINK-29373
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.17.0
> Reporter: hk__lrzy
> Assignee: hk__lrzy
> Priority: Major
> Attachments: image-2022-09-21-15-12-11-082.png, image-2022-09-22-18-08-44-385.png
>
> When we try to transform a DataStream to a table, *TypeInfoDataTypeConverter* will try to convert *TypeInformation* to {*}DataType{*}, but if the DataStream's produced type contains {*}BigDecimalTypeInfo{*}, *TypeInfoDataTypeConverter* will finally convert it to a {*}RawDataType{*}. Then, when we want to transform the table back to a DataStream, an exception happens that reports the data types do not match. The Blink planner also has this exception.
> !image-2022-09-22-18-08-44-385.png!
>
> {code:java}
> Query schema: [f0: RAW('java.math.BigDecimal', '...')]
> Sink schema: [f0: RAW('java.math.BigDecimal', ?)]{code}
> How to reproduce:
> {code:java}
> // code placeholder
> StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
> EnvironmentSettings.Builder envBuilder = EnvironmentSettings.newInstance().inStreamingMode();
> StreamTableEnvironment streamTableEnvironment = StreamTableEnvironment.create(executionEnvironment, envBuilder.build());
> FromElementsFunction<BigDecimal> fromElementsFunction = new FromElementsFunction<>(new BigDecimal(1.11D));
> DataStreamSource<BigDecimal> dataStreamSource = executionEnvironment.addSource(fromElementsFunction, new BigDecimalTypeInfo(10, 8));
> streamTableEnvironment.createTemporaryView("tmp", dataStreamSource);
> Table table = streamTableEnvironment.sqlQuery("select * from tmp");
> streamTableEnvironment.toRetractStream(table, table.getSchema().toRowType());{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
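Until the converter is fixed, one workaround (a sketch that sidesteps {{BigDecimalTypeInfo}} at the boundary rather than fixing {{TypeInfoDataTypeConverter}} itself) is to hand Flink the basic {{Types.BIG_DEC}} information and narrow the precision with a SQL CAST:
{code:java}
import java.math.BigDecimal;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class DecimalRoundTripWorkaround {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Use the basic BigDecimal type info so the table boundary sees a
        // DECIMAL type instead of a RAW type.
        DataStream<BigDecimal> source =
                env.fromElements(new BigDecimal("1.11")).returns(Types.BIG_DEC);

        tEnv.createTemporaryView("tmp", source);

        // Narrow to the desired precision/scale in SQL instead of relying on
        // BigDecimalTypeInfo(10, 8) at the DataStream boundary.
        Table table = tEnv.sqlQuery("SELECT CAST(f0 AS DECIMAL(10, 8)) AS f0 FROM tmp");
        tEnv.toRetractStream(table, Row.class).print();

        env.execute();
    }
}
{code}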
[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #298: [FLINK-29291] Change DataFileWriter into a factory to create writers
JingsongLi commented on code in PR #298: URL: https://github.com/apache/flink-table-store/pull/298#discussion_r979506509

## flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/MergeTreeWriter.java:

```
@@ -138,34 +137,45 @@ public void flushMemory() throws Exception {
         // stop writing, wait for compaction finished
         trySyncLatestCompaction(true);
     }
+
+    // write changelog file
     List<String> extraFiles = new ArrayList<>();
     if (changelogProducer == ChangelogProducer.INPUT) {
-        extraFiles.add(
-                dataFileWriter
-                        .writeLevel0Changelog(
-                                CloseableIterator.adapterForIterator(memTable.rawIterator()))
-                        .getName());
+        SingleFileWriter writer = writerFactory.createExtraFileWriter();
+        writer.write(memTable.rawIterator());
+        writer.close();
+        extraFiles.add(writer.path().getName());
     }
-    boolean success = false;
+
+    // write lsm level 0 file
     try {
         Iterator iterator = memTable.mergeIterator(keyComparator, mergeFunction);
-        success =
-                dataFileWriter
-                        .writeLevel0(CloseableIterator.adapterForIterator(iterator))
-                        .map(
-                                file -> {
-                                    DataFileMeta fileMeta = file.copy(extraFiles);
-                                    newFiles.add(fileMeta);
-                                    compactManager.addNewFile(fileMeta);
-                                    return true;
-                                })
-                        .orElse(false);
-    } finally {
-        if (!success) {
-            extraFiles.forEach(dataFileWriter::delete);
+        KeyValueDataFileWriter writer = writerFactory.createLevel0Writer();
+        writer.write(iterator);
+        writer.close();
+
+        // In theory, this fileMeta should contain statistics from both the lsm file and the
+        // extra file. However, for level 0 files, as we do not drop DELETE records, keys
+        // appearing in one file will also appear in the other, so we just need to use
+        // statistics from one of them.
+        //
+        // For the value count merge function, it is possible that the changelog first adds
+        // one record and then removes it; after merging, this record will not appear in the
+        // lsm file. This is OK because we can also skip this changelog.
+        DataFileMeta fileMeta = writer.result();
+        if (fileMeta != null) {
```

Review Comment: If `fileMeta == null`, shouldn't `extraFiles` be deleted?

## flink-table-store-core/src/main/java/org/apache/flink/table/store/file/io/KeyValueFileWriterFactory.java:

```
@@ -149,16 +96,35 @@ private KeyValueDataFileWriter createDataFileWriter(int level) {
             level);
     }

-    public void delete(DataFileMeta file) {
-        delete(file.fileName());
+    public SingleFileWriter createExtraFileWriter() {
```

Review Comment: `createChangelogFileWriter`? We may have more kinds of extra files in the future.

## flink-table-store-core/src/main/java/org/apache/flink/table/store/file/io/KeyValueFileWriterFactory.java:

```
@@ -52,20 +44,19 @@ public class DataFileWriter {
     private final DataFilePathFactory pathFactory;
     private final long suggestedFileSize;

-    private DataFileWriter(
+    private KeyValueFileWriterFactory(
             long schemaId,
             RowType keyType,
             RowType valueType,
             BulkWriter.Factory writerFactory,
-            @Nullable FileStatsExtractor fileStatsExtractor,
+            FileStatsExtractor fileStatsExtractor,
```

Review Comment: Why was `@Nullable` removed?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
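To make the first question concrete, the missing cleanup would look roughly like the following (a sketch against the names visible in the diff; `deleteFile` is a stand-in for whatever delete helper the factory ends up exposing):

```java
// Sketch: ensure changelog ("extra") files are removed when the level-0
// write fails or yields no file; otherwise they would be orphaned.
boolean success = false;
try {
    KeyValueDataFileWriter writer = writerFactory.createLevel0Writer();
    writer.write(iterator);
    writer.close();

    DataFileMeta fileMeta = writer.result();
    if (fileMeta != null) {
        DataFileMeta withExtras = fileMeta.copy(extraFiles);
        newFiles.add(withExtras);
        compactManager.addNewFile(withExtras);
        success = true;
    }
} finally {
    if (!success) {
        extraFiles.forEach(writerFactory::deleteFile); // hypothetical delete helper
    }
}
```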
[jira] [Updated] (FLINK-29406) Expose Finish Method For TableFunction
[ https://issues.apache.org/jira/browse/FLINK-29406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lincoln lee updated FLINK-29406: Component/s: Table SQL / API > Expose Finish Method For TableFunction > -- > > Key: FLINK-29406 > URL: https://issues.apache.org/jira/browse/FLINK-29406 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.16.0, 1.15.2 >Reporter: lincoln lee >Priority: Major > Fix For: 1.17.0 > > > FLIP-260: Expose Finish Method For TableFunction > https://cwiki.apache.org/confluence/display/FLINK/FLIP-260%3A+Expose+Finish+Method+For+TableFunction -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] luoyuxia commented on pull request #20883: [WIP] validate for https://github.com/apache/flink/pull/20882
luoyuxia commented on PR #20883: URL: https://github.com/apache/flink/pull/20883#issuecomment-1257357271 The failure should be related to FLINK-29315 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] tweise commented on a diff in pull request #385: [FLINK-29109] Avoid checkpoint path conflicts (Flink < 1.16)
tweise commented on code in PR #385: URL: https://github.com/apache/flink-kubernetes-operator/pull/385#discussion_r979497847 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/status/FlinkDeploymentStatus.java: ## @@ -53,4 +53,7 @@ public class FlinkDeploymentStatus extends CommonStatus { /** Information about the TaskManagers for the scale subresource. */ private TaskManagerInfo taskManager; + +/** The jobId, if it was set by the operator. */ +private String overrideJobId; Review Comment: Reusing jobStatus.jobId is not perfect as we are using the same field for two different purposes. But it works pretty well and is better than adding another field to status for something that is going to be phased out when >=1.16 becomes the minimum version. PTAL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-29406) Expose Finish Method For TableFunction
lincoln lee created FLINK-29406: --- Summary: Expose Finish Method For TableFunction Key: FLINK-29406 URL: https://issues.apache.org/jira/browse/FLINK-29406 Project: Flink Issue Type: Improvement Affects Versions: 1.15.2, 1.16.0 Reporter: lincoln lee Fix For: 1.17.0 FLIP-260: Expose Finish Method For TableFunction https://cwiki.apache.org/confluence/display/FLINK/FLIP-260%3A+Expose+Finish+Method+For+TableFunction -- This message was sent by Atlassian Jira (v8.20.10#820010)
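To illustrate the motivation, the sketch below assumes the finish() hook proposed by the FLIP; TableFunction does not expose it in current releases:
{code:java}
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.TableFunction;

// Sketch only: finish() is the *proposed* FLIP-260 lifecycle hook. Without
// it, a buffering table function has no chance to flush when input ends.
public class BufferingTableFunction extends TableFunction<String> {

    private transient StringBuilder buffer;

    @Override
    public void open(FunctionContext context) throws Exception {
        buffer = new StringBuilder();
    }

    public void eval(String value) {
        buffer.append(value); // buffer instead of collecting eagerly
    }

    public void finish() throws Exception {
        // Emit what is left when the input ends; today such buffered rows
        // would be silently dropped.
        collect(buffer.toString());
    }
}
{code}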
[jira] [Resolved] (FLINK-29389) Update documentation of JDBC and HBase lookup table for new caching options
[ https://issues.apache.org/jira/browse/FLINK-29389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qingsheng Ren resolved FLINK-29389. --- Assignee: Qingsheng Ren Resolution: Fixed > Update documentation of JDBC and HBase lookup table for new caching options > --- > > Key: FLINK-29389 > URL: https://issues.apache.org/jira/browse/FLINK-29389 > Project: Flink > Issue Type: Sub-task > Components: Connectors / HBase, Connectors / JDBC, Documentation >Affects Versions: 1.16.0 >Reporter: Qingsheng Ren >Assignee: Qingsheng Ren >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Update documentation of JDBC and HBase lookup table for new caching options -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-29389) Update documentation of JDBC and HBase lookup table for new caching options
[ https://issues.apache.org/jira/browse/FLINK-29389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609265#comment-17609265 ] Qingsheng Ren edited comment on FLINK-29389 at 9/26/22 1:25 AM: 1.16: 9699ca7ac188d24a3a8d33fc6749b08c10ca85c7 master: 3fa7d03ddad576e99a05ff558e2ee536872a34d2 was (Author: renqs): 1.16: 9699ca7ac188d24a3a8d33fc6749b08c10ca85c7 > Update documentation of JDBC and HBase lookup table for new caching options > --- > > Key: FLINK-29389 > URL: https://issues.apache.org/jira/browse/FLINK-29389 > Project: Flink > Issue Type: Sub-task > Components: Connectors / HBase, Connectors / JDBC, Documentation >Affects Versions: 1.16.0 >Reporter: Qingsheng Ren >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Update documentation of JDBC and HBase lookup table for new caching options -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29389) Update documentation of JDBC and HBase lookup table for new caching options
[ https://issues.apache.org/jira/browse/FLINK-29389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609265#comment-17609265 ] Qingsheng Ren commented on FLINK-29389: --- 1.16: 9699ca7ac188d24a3a8d33fc6749b08c10ca85c7 > Update documentation of JDBC and HBase lookup table for new caching options > --- > > Key: FLINK-29389 > URL: https://issues.apache.org/jira/browse/FLINK-29389 > Project: Flink > Issue Type: Sub-task > Components: Connectors / HBase, Connectors / JDBC, Documentation >Affects Versions: 1.16.0 >Reporter: Qingsheng Ren >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Update documentation of JDBC and HBase lookup table for new caching options -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] PatrickRen merged pull request #20892: [BP-1.16][FLINK-29389][docs] Update documentation of JDBC and HBase lookup table for new caching options
PatrickRen merged PR #20892: URL: https://github.com/apache/flink/pull/20892 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] PatrickRen commented on pull request #20892: [BP-1.16][FLINK-29389][docs] Update documentation of JDBC and HBase lookup table for new caching options
PatrickRen commented on PR #20892: URL: https://github.com/apache/flink/pull/20892#issuecomment-1257352406 Merging as this is a doc-only PR. Doc validation has passed on CI. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] snuyanzin commented on a diff in pull request #20850: [FLINK-20873][Table SQl/API] Update to calcite 1.27
snuyanzin commented on code in PR #20850: URL: https://github.com/apache/flink/pull/20850#discussion_r979468073 ## flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/codegen/InputFormatCodeGenerator.scala: ## @@ -80,6 +80,7 @@ object InputFormatCodeGenerator { @Override public Object nextRecord(Object reuse) { + | ${ctx.reuseLocalVariableCode()} Review Comment: Probably as a result of https://issues.apache.org/jira/browse/CALCITE-4383 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-29365) Millisecond behind latest jumps after Flink 1.15.2 upgrade
[ https://issues.apache.org/jira/browse/FLINK-29365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609254#comment-17609254 ] Hong Liang Teoh commented on FLINK-29365: - Hi [~wilsonwu], Thanks for reporting this. Indeed, we don't expect millisBehindLatest to jump after a version upgrade. I've tried replicating this issue by upgrading my Kinesis consumer application from 1.14.4 to 1.15.2, but was unable to reproduce it. These are the test scenarios I tried:
* tested both Polling and EFO
* tested consuming streams without resharding as well as streams with multiple reshardings (a mix of shards with records and without records)
* tested scaling the job up from parallelism 2 to parallelism 5 at the same time as the upgrade
* started the consumer from both TRIM_HORIZON and AT_TIMESTAMP (this shouldn't matter when restoring from a snapshot anyway)

In all these cases, there was no spike in client-side millisBehindLatest / server-side IteratorAgeMilliseconds. However, in your case, it seems you were able to reliably replicate the spike in MillisBehindLatest. It is likely that there is some difference in our setup / Kinesis stream that causes this issue to happen in your case. To help debug further, would you be able to provide the following, to help root cause the issue?
* Logs from the taskmanager during the 1.15.2 -> 1.15.2 redeploy (no spike) and the 1.14.4 -> 1.15.2 upgrade (with spike). We can compare and contrast the restored state for each shard.
* Kinesis streams have [Enhanced metrics|https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-cloudwatch.html#kinesis-metrics-shard]. If you enable them, we can see IteratorAgeMilliseconds for each shard, and thus which shardId is seeing the spike in MillisBehindLatest.
* If possible, it would be nice to have the result of the describe-stream call (i.e. {*}aws kinesis describe-stream --stream-name <stream-name>{*}), as this will help us determine the shard parent-child relations.
> Millisecond behind latest jumps after Flink 1.15.2 upgrade
> --
>
> Key: FLINK-29365
> URL: https://issues.apache.org/jira/browse/FLINK-29365
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kinesis
> Affects Versions: 1.15.2
> Environment: Redeployment from 1.14.4 to 1.15.2
> Reporter: Wilson Wu
> Priority: Major
> Attachments: Screen Shot 2022-09-19 at 2.50.56 PM.png
>
> (First time filing a ticket in the Flink community, please let me know if there are any guidelines I need to follow.)
> I noticed a very strange behavior with a recent version bump from Flink 1.14.4 to 1.15.2. My project consumes around 30K records per second from a sharded Kinesis stream, and during a version upgrade it follows the best practice of first triggering a savepoint from the running job, starting the new job from the savepoint, and then removing the old job. So far so good, and the above logic has been tested multiple times without any issue on 1.14.4. Usually, after a version upgrade, our job will have a few minutes of delay in millisecond behind latest, but it catches up quickly (within 30 mins). Our savepoint is around one hundred MB, and our job DAG will become 90-100% busy with some backpressure when we redeploy, but after 10-20 minutes it goes back to normal.
> Then the strange thing happened: when I tried to redeploy with the 1.15.2 upgrade from a running 1.14.4 job, I could see that a savepoint had been created and the new job was running, and all the metrics looked fine, except that [millisecond behind latest|https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html] suddenly jumped to 10 hours!! It took days for my application to catch up with the latest records in the Kinesis stream. I don't understand why it jumps from 0 seconds to 10+ hours when we restart the new job. The only main change I introduced with the version bump was changing [failOnError|https://nightlies.apache.org/flink/flink-docs-release-1.15/api/java/org/apache/flink/connector/kinesis/sink/KinesisStreamsSink.html] from true to false, but I don't think this is the root cause.
> I tried to redeploy the new 1.15.2 job by changing our parallelism; redeploying a job from 1.15.2 does not introduce a big delay, so I assume the issue above only happens when we bump the version from 1.14.4 to 1.15.2 (note the attached screenshot)? I did try the bump twice and saw the same 10hrs+ jump in delay, and we do not have any timezone-related changes. Please let me know if this can be filed as a bug, as I do not have a running project with all the Kinesis setup available that can reproduce the issue.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-29131) Kubernetes operator webhook can use hostPort
[ https://issues.apache.org/jira/browse/FLINK-29131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609248#comment-17609248 ] Dylan Meissner edited comment on FLINK-29131 at 9/25/22 8:54 PM: - The Helm chart changes dramatically to do this work. I'm not even clear if it could cleanly upgrade. Testing many upgrade scenarios is intimidating. When I first brought up this idea in Slack we agreed _if you have time opening a JIRA ticket and a minimal PR to address it, we would be happy to review and merge this_ Do we still envision a small change? was (Author: dylanmei): The Helm chart changes dramatically to do this work. I'm not even clear if it could cleanly upgrade. Testing may upgrade scenarios is intimidating. When I first brought up this idea in Slack we agreed _if you have time opening a JIRA ticket and a minimal PR to address it, we would be happy to review and merge this_ Do we still envision a small change? > Kubernetes operator webhook can use hostPort > > > Key: FLINK-29131 > URL: https://issues.apache.org/jira/browse/FLINK-29131 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.1.0 >Reporter: Dylan Meissner >Assignee: Dylan Meissner >Priority: Minor > > When running Flink operator on EKS cluster with Calico networking the > control-plane (managed by AWS) cannot reach the webhook. Requests to create > Flink resources fail with {_}Address is not allowed{_}. > When the webhook listens on hostPort the requests to create Flink resources > are successful. However, a pod security policy is generally required to allow > webhook to listen on such ports. > To support this scenario with the Helm chart make changes so that we can > * Specify a hostPort value for the webhook > * Name the port that the webhook listens on > * Use the named port in the webhook service > * Add a "use" pod security policy verb to cluster role -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29131) Kubernetes operator webhook can use hostPort
[ https://issues.apache.org/jira/browse/FLINK-29131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609248#comment-17609248 ] Dylan Meissner commented on FLINK-29131: The Helm chart changes dramatically to do this work. I'm not even clear if it could cleanly upgrade. Testing many upgrade scenarios is intimidating. When I first brought up this idea in Slack we agreed _if you have time opening a JIRA ticket and a minimal PR to address it, we would be happy to review and merge this_ Do we still envision a small change?
> Kubernetes operator webhook can use hostPort
> 
>
> Key: FLINK-29131
> URL: https://issues.apache.org/jira/browse/FLINK-29131
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.1.0
> Reporter: Dylan Meissner
> Assignee: Dylan Meissner
> Priority: Minor
>
> When running the Flink operator on an EKS cluster with Calico networking, the control plane (managed by AWS) cannot reach the webhook. Requests to create Flink resources fail with {_}Address is not allowed{_}.
> When the webhook listens on a hostPort, the requests to create Flink resources are successful. However, a pod security policy is generally required to allow the webhook to listen on such ports.
> To support this scenario with the Helm chart, make changes so that we can:
> * Specify a hostPort value for the webhook
> * Name the port that the webhook listens on
> * Use the named port in the webhook service
> * Add a "use" pod security policy verb to the cluster role
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #385: [FLINK-29109] Avoid checkpoint path conflicts (Flink < 1.16)
gyfora commented on code in PR #385: URL: https://github.com/apache/flink-kubernetes-operator/pull/385#discussion_r979441596 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/status/FlinkDeploymentStatus.java: ## @@ -53,4 +53,7 @@ public class FlinkDeploymentStatus extends CommonStatus { /** Information about the TaskManagers for the scale subresource. */ private TaskManagerInfo taskManager; + +/** The jobId, if it was set by the operator. */ +private String overrideJobId; Review Comment: All the current logic uses the jobStatus.jobId for savepointing, cancelling, observing etc, it would avoid a lot of complications :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #385: [FLINK-29109] Avoid checkpoint path conflicts (Flink < 1.16)
gyfora commented on code in PR #385: URL: https://github.com/apache/flink-kubernetes-operator/pull/385#discussion_r979441460 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/status/FlinkDeploymentStatus.java: ## @@ -53,4 +53,7 @@ public class FlinkDeploymentStatus extends CommonStatus { /** Information about the TaskManagers for the scale subresource. */ private TaskManagerInfo taskManager; + +/** The jobId, if it was set by the operator. */ +private String overrideJobId; Review Comment: I think we should simply set it into the jobStatus.jobId and avoid introducing an extra field for it. As long as the job is running we should assume that the jobStatus reflects the correct jobid -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-29109) Checkpoint path conflict with stateless upgrade mode
[ https://issues.apache.org/jira/browse/FLINK-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-29109: --- Labels: pull-request-available (was: )
> Checkpoint path conflict with stateless upgrade mode
> 
>
> Key: FLINK-29109
> URL: https://issues.apache.org/jira/browse/FLINK-29109
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.1.0
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: pull-request-available
>
> A stateful job with stateless upgrade mode (yes, there are such use cases) fails with a checkpoint path conflict due to the constant jobId and FLINK-19358 (applies to Flink < 1.16.x). Since with stateless upgrade mode the checkpoint id resets on restart, the job is going to write to previously used locations and fail. The workaround is to rotate the jobId on every redeploy when the upgrade mode is stateless. While this can be worked around externally, it is best done in the operator itself, because reconciliation resolves when a restart is actually required, while rotating the jobId externally may trigger unnecessary restarts.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] tweise opened a new pull request, #385: [FLINK-29109] Avoid checkpoint path conflicts (Flink < 1.16)
tweise opened a new pull request, #385: URL: https://github.com/apache/flink-kubernetes-operator/pull/385

## What is the purpose of the change

Automatically set the jobId for Flink versions < 1.16 to avoid conflicts in the checkpoint path (and other jobId-based effects).

TODO:
* Unit test
* This breaks stateful upgrade if the CRD is not updated
* We may want a better way to accommodate this in the status in the future rather than adding individual fields. Should we add a map that can be used for various purposes?

## Brief change log

* Set a jobId when no jobId is defined by the user and the Flink version is < 1.16.
* Rotate the jobId when the upgrade mode is stateless.

## Verifying this change

TODO: This change added tests and can be verified as follows:

Manual verification:
* No jobId set in the Flink config
* Set upgrade mode to stateless. Trigger CR changes and check that every redeploy uses a new ID.
* Set upgrade mode to last_state. Trigger CR changes and check that the job ID is retained across redeploys.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changes to the `CustomResourceDescriptors`: (**yes** / no)
- Core observer or reconciler logic that is regularly executed: (yes / no)

A field is added to the status, and the CRD needs to be updated for this change to work. If the CRD is not updated, the last-state upgrade mode breaks.

## Documentation

- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
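The rotation rule can be pictured with a short sketch; the option key and method shape are assumptions for illustration, not the PR's actual code:

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.configuration.Configuration;

public final class JobIdAssigner {

    private JobIdAssigner() {}

    // Sketch: rotate the pinned job ID for stateless upgrades on Flink < 1.16
    // so every redeploy writes to fresh checkpoint paths; keep it otherwise.
    public static void assignJobId(
            Configuration deployConfig, boolean statelessUpgrade, String previousJobId) {
        String jobId =
                (statelessUpgrade || previousJobId == null)
                        ? new JobID().toHexString()
                        : previousJobId;
        // "$internal.pipeline.job-id" is assumed to be the option Flink uses to
        // pin a job ID; verify against the target Flink version.
        deployConfig.setString("$internal.pipeline.job-id", jobId);
    }
}
```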
[GitHub] [flink] flinkbot commented on pull request #20897: [hotfix][docs] Fix typo in try-flink/datastream.md
flinkbot commented on PR #20897: URL: https://github.com/apache/flink/pull/20897#issuecomment-1257241665 ## CI report: * 6c41532ee60e7e4c9471b2a2216aeb25fe9aa568 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] saikikun opened a new pull request, #20897: [hotfix][docs] Fix typo in try-flink/datastream.md
saikikun opened a new pull request, #20897: URL: https://github.com/apache/flink/pull/20897 Fix typo in try-flink/datastream.md -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-29405) InputFormatCacheLoaderTest is unstable
Chesnay Schepler created FLINK-29405: Summary: InputFormatCacheLoaderTest is unstable Key: FLINK-29405 URL: https://issues.apache.org/jira/browse/FLINK-29405 Project: Flink Issue Type: Technical Debt Components: Table SQL / Runtime Affects Versions: 1.17.0 Reporter: Chesnay Schepler #testExceptionDuringReload/#testCloseAndInterruptDuringReload fail reliably when run in a loop. {code} java.lang.AssertionError: Expecting AtomicInteger(0) to have value: 0 but did not. at org.apache.flink.table.runtime.functions.table.fullcache.inputformat.InputFormatCacheLoaderTest.testCloseAndInterruptDuringReload(InputFormatCacheLoaderTest.java:161) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
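For reference, "run in a loop" typically means a repeated-execution harness along these lines (a generic JUnit 5 + AssertJ sketch, not the Flink test itself):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

import org.junit.jupiter.api.RepeatedTest;

import static org.assertj.core.api.Assertions.assertThat;

class RepeatedRunSketch {

    private final AtomicInteger pendingReloads = new AtomicInteger();

    @RepeatedTest(500) // surface the race by re-running the body many times
    void reloadLeavesNoPendingWork() throws Exception {
        Thread reload = new Thread(pendingReloads::incrementAndGet);
        reload.start();
        reload.join();
        pendingReloads.decrementAndGet();
        assertThat(pendingReloads).hasValue(0);
    }
}
{code}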
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #384: [FLINK-29159] Harden initial deployment logic
gyfora commented on PR #384: URL: https://github.com/apache/flink-kubernetes-operator/pull/384#issuecomment-1257210648 cc @morhidi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-29159) Revisit/harden initial deployment logic
[ https://issues.apache.org/jira/browse/FLINK-29159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-29159: --- Labels: pull-request-available (was: ) > Revisit/harden initial deployment logic > --- > > Key: FLINK-29159 > URL: https://issues.apache.org/jira/browse/FLINK-29159 > Project: Flink > Issue Type: Technical Debt > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.1.0 >Reporter: Thomas Weise >Assignee: Gyula Fora >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.2.0 > > > Found isFirstDeployment logic not working as expected for a deployment that > had never successfully deployed (image pull error). We are probably also > lacking test coverage for the initialSavepointPath field. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #384: [FLINK-29159] Harden initial deployment logic
gyfora opened a new pull request, #384: URL: https://github.com/apache/flink-kubernetes-operator/pull/384 ## What is the purpose of the change Fixes a small flaw in the initialDeployment logic that made it useless after the job was actually submitted. This also simplifies code in other places. ## Brief change log - *Rename ReconStatus#isFirstDeployment to isBeforeFirstDeployment to better reflect logic* - *Fix reconciliation metadata isFirstDeployment computation* - *Add some tests* ## Verifying this change Partly covered by existing tests + new tests added for the fixed logic ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changes to the `CustomResourceDescriptors`: no - Core observer or reconciler logic that is regularly executed: yes ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
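The intended semantics can be reduced to one predicate; the accessor below is an assumption for illustration, not the PR's implementation:

```java
// Sketch: a resource stays "before first deployment" until some spec has been
// deployed successfully, so a first attempt that never became healthy (e.g.
// an image pull error) still counts as the first deployment.
static boolean isBeforeFirstDeployment(FlinkDeploymentStatus status) {
    return status.getReconciliationStatus().getLastStableSpec() == null;
}
```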
[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #383: [FLINK-29384] Bump snakeyaml version to 1.32
gyfora commented on code in PR #383: URL: https://github.com/apache/flink-kubernetes-operator/pull/383#discussion_r979406834 ## flink-kubernetes-shaded/pom.xml: ## @@ -88,6 +103,7 @@ under the License. META-INF/DEPENDENCIES META-INF/LICENSE META-INF/MANIFEST.MF + org/apache/flink/kubernetes/shaded/org/yaml/snakeyaml/** Review Comment: Wouldn't this line be the only change we need for `flink-kubernetes-shaded`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-29315) HDFSTest#testBlobServerRecovery fails on CI
[ https://issues.apache.org/jira/browse/FLINK-29315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609156#comment-17609156 ] Yang Wang commented on FLINK-29315: --- After more debugging with strace, I found that it might be related to the kernel version. Then I upgraded the kernel version on all the CI machines from {{3.10.0-1160.62.1.el7.x86_64}} to {{3.10.0-1160.76.1.el7.x86_64}}. It seems that the CI can pass now. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41323&view=results
> HDFSTest#testBlobServerRecovery fails on CI
> ---
>
> Key: FLINK-29315
> URL: https://issues.apache.org/jira/browse/FLINK-29315
> Project: Flink
> Issue Type: Technical Debt
> Components: Connectors / FileSystem, Tests
> Affects Versions: 1.16.0, 1.15.2
> Reporter: Chesnay Schepler
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.16.0, 1.17.0
>
> The test started failing 2 days ago on different branches. I suspect something's wrong with the CI infrastructure.
> {code:java}
> Sep 15 09:11:22 [ERROR] Failures:
> Sep 15 09:11:22 [ERROR] HDFSTest.testBlobServerRecovery Multiple Failures (2 failures)
> Sep 15 09:11:22 java.lang.AssertionError: Test failed Error while running command to get file permissions : java.io.IOException: Cannot run program "ls": error=1, Operation not permitted
> Sep 15 09:11:22 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> Sep 15 09:11:22 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:913)
> Sep 15 09:11:22 	at org.apache.hadoop.util.Shell.run(Shell.java:869)
> Sep 15 09:11:22 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
> Sep 15 09:11:22 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:1264)
> Sep 15 09:11:22 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:1246)
> Sep 15 09:11:22 	at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1089)
> Sep 15 09:11:22 	at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> Sep 15 09:11:22 	at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> Sep 15 09:11:22 	at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233)
> Sep 15 09:11:22 	at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
> Sep 15 09:11:22 	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1501)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:851)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:485)
> Sep 15 09:11:22 	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:444)
> Sep 15 09:11:22 	at org.apache.flink.hdfstests.HDFSTest.createHDFS(HDFSTest.java:93)
> Sep 15 09:11:22 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 15 09:11:22 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Sep 15 09:11:22 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Sep 15 09:11:22 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> Sep 15 09:11:22 	... 67 more
> Sep 15 09:11:22
> Sep 15 09:11:22 java.lang.NullPointerException:
> {code}
[GitHub] [flink] flinkbot commented on pull request #20896: Update Event Time Temporal Join .md
flinkbot commented on PR #20896: URL: https://github.com/apache/flink/pull/20896#issuecomment-1257183323 ## CI report: * 61ad4277694a7a12e8befb414e8a089d7df85fdb UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] nicai008 opened a new pull request, #20896: Update Event Time Temporal Join .md
nicai008 opened a new pull request, #20896: URL: https://github.com/apache/flink/pull/20896 In the Event Time Temporal Join example, both tables in the join query have a currency field, so the unqualified `currency` in the query should be changed to `orders.currency`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #20895: Update datastream_api.md
flinkbot commented on PR #20895: URL: https://github.com/apache/flink/pull/20895#issuecomment-1257155574 ## CI report: * 83d8ee9181512a6466e2eaddf59d113ec26c3cdd UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zs19940317 opened a new pull request, #20895: Update datastream_api.md
zs19940317 opened a new pull request, #20895: URL: https://github.com/apache/flink/pull/20895 Translate a sentence into Chinese that appears to have been missed when the rest of the page was translated. The sentence is `In production, commonly used sinks include the StreamingFileSink, various databases, and several pub-sub systems`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org