[jira] [Created] (FLINK-35159) CreatingExecutionGraph can leak CheckpointCoordinator and cause JM crash
Chesnay Schepler created FLINK-35159: Summary: CreatingExecutionGraph can leak CheckpointCoordinator and cause JM crash Key: FLINK-35159 URL: https://issues.apache.org/jira/browse/FLINK-35159 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.2, 1.20.0, 1.19.1 When a task manager dies while the JM is generating an ExecutionGraph in the background, {{CreatingExecutionGraph#handleExecutionGraphCreation}} can transition back into WaitingForResources if the TM hosted one of the slots that we planned to use in {{tryToAssignSlots}}. At this point the ExecutionGraph was already transitioned to running, which implicitly kicks off periodic checkpointing by the CheckpointCoordinator, without the operator coordinator holders being initialized yet (as this happens after we assigned slots). This effectively leaks that CheckpointCoordinator, including the timer thread that will continue to try triggering checkpoints, which will naturally fail to trigger. This can cause a JM crash because it results in {{OperatorCoordinatorHolder#abortCurrentTriggering}} being called, which fails with an NPE since the {{mainThreadExecutor}} was not initialized yet.
{code}
java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: java.lang.NullPointerException
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$8(CheckpointCoordinator.java:707)
	at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
	at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.CompletionException: java.lang.NullPointerException
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
	... 7 more
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.abortCurrentTriggering(OperatorCoordinatorHolder.java:388)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:985)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:961)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$7(CheckpointCoordinator.java:693)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
	... 8 more
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService
Chesnay Schepler created FLINK-34672: Summary: HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService Key: FLINK-34672 URL: https://issues.apache.org/jira/browse/FLINK-34672 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1 Reporter: Chesnay Schepler Fix For: 1.18.2, 1.20.0, 1.19.1 We recently observed a deadlock in the JM within the HA system. (see below for the thread dump) [~mapohl] and I looked a bit into it and there appears to be a race condition when leadership is revoked while a JobMaster is being started. It appears to be caused by {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} forwarding futures while holding a lock; depending on whether the forwarded future is already complete the next stage may or may not run while holding that same lock. We haven't determined yet whether we should be holding that lock or not. {{code}} "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 nid=0x19d waiting for monitor entry [0x7f53084fd000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462) - waiting to lock <0xf1c0e088> (a java.lang.Object) at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown Source) at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452) at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown Source) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549) - locked <0xf0e3f4d8> (a java.lang.Object) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown Source) at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628) at java.lang.Thread.run(java.base@11.0.22/Thread.java:829) {{code}} {{code}} "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry [0x7f530567d000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366) - waiting to lock <0xf0e3f4d8> (a java.lang.Object) at org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52) at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509) at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520) - locked <0xf1c0e088> (a java.lang.Object) at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown Source) at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837) at 
java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.22/CompletableFuture.java:506) at java.util.concurrent.CompletableFuture.complete(java.base@11.0.22/CompletableFuture.java:2079) at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.registerJobMasterServiceFutures(DefaultJobMasterServiceProcess.java:124) at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:114) at
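For readers unfamiliar with the behavior described in FLINK-34672: a callback registered on a {{CompletableFuture}} runs synchronously in the registering thread if the future is already complete, and otherwise runs later in whichever thread completes it. The following is a minimal, self-contained sketch of just that behavior. It is not Flink code; the class and method names are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only; not taken from JobMasterServiceLeadershipRunner. */
final class ForwardUnderLockSketch {
    private static final Object LOCK = new Object();

    /** Returns whether the callback ran while the registering thread held LOCK. */
    static boolean callbackHeldLock(boolean completeBeforeRegistering) {
        CompletableFuture<String> source = new CompletableFuture<>();
        AtomicBoolean heldLock = new AtomicBoolean();
        if (completeBeforeRegistering) {
            source.complete("done");
        }
        synchronized (LOCK) {
            // If source is already complete, this lambda runs right here, under LOCK.
            source.whenComplete((value, error) -> heldLock.set(Thread.holdsLock(LOCK)));
        }
        // No-op if already completed; otherwise the lambda runs now, outside LOCK.
        source.complete("done");
        return heldLock.get();
    }
}
```

The same registration code therefore runs its next stage under the lock in one case and outside it in the other, which is what makes the lock ordering depend on timing.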
[jira] [Created] (FLINK-34640) Replace DummyMetricGroup usage with UnregisteredMetricsGroup
Chesnay Schepler created FLINK-34640: Summary: Replace DummyMetricGroup usage with UnregisteredMetricsGroup Key: FLINK-34640 URL: https://issues.apache.org/jira/browse/FLINK-34640 Project: Flink Issue Type: Technical Debt Components: Runtime / Metrics, Tests Reporter: Chesnay Schepler Fix For: 1.20.0 The {{DummyMetricGroup}} is terrible because it is decidedly unsafe to use. Use the {{UnregisteredMetricsGroup}} instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34499) Configuration#toString should hide sensitive values
Chesnay Schepler created FLINK-34499: Summary: Configuration#toString should hide sensitive values Key: FLINK-34499 URL: https://issues.apache.org/jira/browse/FLINK-34499 Project: Flink Issue Type: Improvement Components: Runtime / Configuration Reporter: Chesnay Schepler Fix For: 1.20.0 Time and time again people log the entire Flink configuration for no reason, risking that sensitive values are logged in plain text. We should make this harder by changing {{Configuration#toString}} to automatically hide sensitive values, for example like this:
{code}
@Override
public String toString() {
    return ConfigurationUtils.hideSensitiveValues(
                    this.confData.entrySet().stream()
                            .collect(
                                    Collectors.toMap(
                                            Map.Entry::getKey,
                                            entry -> entry.getValue().toString())))
            .toString();
}
{code}
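To make the idea in FLINK-34499 concrete without depending on Flink classes, here is a small standalone sketch of masking values before a configuration map is rendered. The marker list, the placeholder string and all names are assumptions for illustration; they are not Flink's actual rules or API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only; Flink's own helper may use different rules. */
final class SensitiveConfigSketch {
    // Hypothetical key fragments that mark a value as sensitive.
    private static final String[] SENSITIVE_MARKERS = {"password", "secret", "token"};
    private static final String HIDDEN = "******";

    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String marker : SENSITIVE_MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Renders the map with sensitive values replaced; keys are sorted for stable output. */
    static String render(Map<String, String> config) {
        Map<String, String> masked = new TreeMap<>();
        config.forEach((key, value) -> masked.put(key, isSensitive(key) ? HIDDEN : value));
        return masked.toString();
    }
}
```

Doing the masking inside the string rendering means that an accidental log statement on the whole object no longer prints secrets, while typed accessors still return the real values.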
[jira] [Created] (FLINK-34498) GSFileSystemFactory logs full Flink config
Chesnay Schepler created FLINK-34498: Summary: GSFileSystemFactory logs full Flink config Key: FLINK-34498 URL: https://issues.apache.org/jira/browse/FLINK-34498 Project: Flink Issue Type: Bug Components: Connectors / FileSystem Affects Versions: 1.18.1 Reporter: Chesnay Schepler Fix For: 1.19.0, 1.18.2, 1.20.0 This can cause secrets from the config to be logged.
{code}
@Override
public void configure(Configuration flinkConfig) {
    LOGGER.info("Configuring GSFileSystemFactory with Flink configuration {}", flinkConfig);
{code}
[jira] [Created] (FLINK-34496) Classloading deadlock between ExecNodeMetadataUtil and JsonSerdeUtil
Chesnay Schepler created FLINK-34496: Summary: Classloading deadlock between ExecNodeMetadataUtil and JsonSerdeUtil Key: FLINK-34496 URL: https://issues.apache.org/jira/browse/FLINK-34496 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.18.1 Reporter: Chesnay Schepler Fix For: 1.19.0, 1.18.2, 1.20.0 This is a fun one! ExecNodeMetadataUtil and JsonSerdeUtil have a circular dependency in their static initialization, which can cause a classloading lockup when 2 threads are running the class initialization of each class at the same time because during class initialization they hold a lock.
{code}
Feb 22 00:31:58 "ForkJoinPool-3-worker-11" #25 daemon prio=5 os_prio=0 cpu=219.87ms elapsed=995.99s tid=0x7ff11c50e000 nid=0xf0fc in Object.wait() [0x7ff12a4f3000]
Feb 22 00:31:58    java.lang.Thread.State: RUNNABLE
Feb 22 00:31:58 	at org.apache.flink.table.planner.plan.nodes.exec.serde.JsonSerdeUtil.createFlinkTableJacksonModule(JsonSerdeUtil.java:133)
Feb 22 00:31:58 	at org.apache.flink.table.planner.plan.nodes.exec.serde.JsonSerdeUtil.<clinit>(JsonSerdeUtil.java:111)
Feb 22 00:31:58 "ForkJoinPool-3-worker-7" #23 daemon prio=5 os_prio=0 cpu=54.83ms elapsed=996.00s tid=0x7ff11c50c000 nid=0xf0fb in Object.wait() [0x7ff12a5f4000]
Feb 22 00:31:58    java.lang.Thread.State: RUNNABLE
Feb 22 00:31:58 	at org.apache.flink.table.planner.plan.utils.ExecNodeMetadataUtil.addToLookupMap(ExecNodeMetadataUtil.java:235)
Feb 22 00:31:58 	at org.apache.flink.table.planner.plan.utils.ExecNodeMetadataUtil.<clinit>(ExecNodeMetadataUtil.java:156)
{code}
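As a rough illustration of the kind of cycle FLINK-34496 describes, the sketch below has two classes whose static initializers reference each other. It deliberately stays single-threaded so that it terminates: in one thread the cycle only shows up as one class observing the other's field before it is assigned. With two threads each starting a different initializer, each would hold its own class initialization lock while waiting for the other's, which is the lockup from the thread dump. The class names are stand-ins, not the real planner classes.

```java
/** Illustrative only; Serde and Metadata are hypothetical stand-ins. */
final class StaticCycleSketch {
    static final class Serde {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final String MODULE = "module-for-" + Metadata.lookupSize();
    }

    static final class Metadata {
        // Reads Serde.MODULE, possibly while Serde is still being initialized.
        static final String SEEN = Serde.MODULE;

        static int lookupSize() {
            return 1;
        }
    }
}
```

When `Serde` is initialized first in a single thread, `Metadata.SEEN` ends up `null` because `Serde.MODULE` has not been assigned yet at that point. Breaking the cycle, for example by moving the shared state into a third class or initializing lazily, removes both the surprising value and the cross-thread lockup.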
[jira] [Created] (FLINK-34485) Token delegation doesn't work with Presto S3 filesystem
Chesnay Schepler created FLINK-34485: Summary: Token delegation doesn't work with Presto S3 filesystem Key: FLINK-34485 URL: https://issues.apache.org/jira/browse/FLINK-34485 Project: Flink Issue Type: Bug Components: Connectors / FileSystem Affects Versions: 1.18.1 Reporter: Chesnay Schepler Fix For: 1.20.0 AFAICT it's not possible to use token delegation with the Presto filesystem. The token delegation relies on the {{DynamicTemporaryAWSCredentialsProvider}}, but it doesn't have a constructor that Presto requires (ruling out presto.s3.credentials-provider), and other providers can't be used due to FLINK-13602.
[jira] [Created] (FLINK-34431) Move static BlobWriter methods to separate util
Chesnay Schepler created FLINK-34431: Summary: Move static BlobWriter methods to separate util Key: FLINK-34431 URL: https://issues.apache.org/jira/browse/FLINK-34431 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.20.0 The BlobWriter interface contains several static methods, some being used, others being de-facto internal methods. We should move these into a dedicated BlobWriterUtils class so we can properly deal with method visibility. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34422) BatchTestBase doesn't actually use MiniClusterExtension
Chesnay Schepler created FLINK-34422: Summary: BatchTestBase doesn't actually use MiniClusterExtension Key: FLINK-34422 URL: https://issues.apache.org/jira/browse/FLINK-34422 Project: Flink Issue Type: Technical Debt Components: Test Infrastructure Affects Versions: 1.18.1 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0, 1.18.2, 1.20.0 BatchTestBase sets up a table environment in instance fields, which runs before the BeforeEachCallback from the MiniClusterExtension has time to run. As a result _all_ tests extending the BatchTestBase are spawning separate mini clusters for every single job.
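The ordering problem in FLINK-34422 can be shown without JUnit: instance field initializers run as part of constructing the test object, and any per-test callback can only run after that object exists. The sketch below simulates that lifecycle by hand; the class, the {{beforeEach}} method and the event strings are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; simulates "construct test instance, then run callbacks". */
final class InitOrderSketch {
    static final List<String> EVENTS = new ArrayList<>();

    static String record(String event) {
        EVENTS.add(event);
        return event;
    }

    /** Stand-in for a test base class that builds its environment in an instance field. */
    static class EagerTestBase {
        final String environment = record("create environment");

        void beforeEach() {
            record("register shared cluster");
        }
    }

    static List<String> runLifecycle() {
        EVENTS.clear();
        EagerTestBase test = new EagerTestBase(); // field initializer runs here
        test.beforeEach();                        // callback runs only afterwards
        return new ArrayList<>(EVENTS);
    }
}
```

Anything created in the field initializer cannot see what the callback sets up, so creating the environment in a setup method that runs after the extension's callback would avoid the problem.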
[jira] [Created] (FLINK-34421) Skip post-compile checks in compile.sh if fast profile is active
Chesnay Schepler created FLINK-34421: Summary: Skip post-compile checks in compile.sh if fast profile is active Key: FLINK-34421 URL: https://issues.apache.org/jira/browse/FLINK-34421 Project: Flink Issue Type: Improvement Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0, 1.18.2, 1.20.0 We currently waste time in our e2e tests, re-running a bunch of post-compile checks (like packaging/licensing). Let's couple this to the -Dfast/-Pfast switches. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34397) Resource wait timeout can't be disabled
Chesnay Schepler created FLINK-34397: Summary: Resource wait timeout can't be disabled Key: FLINK-34397 URL: https://issues.apache.org/jira/browse/FLINK-34397 Project: Flink Issue Type: Bug Components: Runtime / Configuration Affects Versions: 1.17.2 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0, 1.17.3, 1.18.2 The documentation for {{jobmanager.adaptive-scheduler.resource-wait-timeout}} states that: ??Setting a negative duration will disable the resource timeout: The JobManager will wait indefinitely for resources to appear.?? However, we don't support parsing negative durations.
{code}
Could not parse value '-1 s' for key 'jobmanager.adaptive-scheduler.resource-wait-timeout'.
Caused by: java.lang.NumberFormatException: text does not start with a number
	at org.apache.flink.util.TimeUtils.parseDuration(TimeUtils.java:80)
	at org.apache.flink.configuration.ConfigurationUtils.convertToDuration(ConfigurationUtils.java:399)
	at org.apache.flink.configuration.ConfigurationUtils.convertValue(ConfigurationUtils.java:331)
	at org.apache.flink.configuration.Configuration.lambda$getOptional$3(Configuration.java:729)
	at java.base/java.util.Optional.map(Optional.java:260)
	at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:729)
	... 2 more
{code}
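To illustrate the gap in FLINK-34397, the sketch below is a tiny duration parser that accepts an optional leading minus sign, so that a value like '-1 s' can be read and interpreted as "timeout disabled". It is not Flink's {{TimeUtils}} and does not claim to be how the issue was resolved; the supported units (only "ms" and "s") and all names are assumptions made for the example.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; a far smaller grammar than a real duration parser. */
final class SignedDurationSketch {
    // Optional minus sign, digits, optional whitespace, then "ms" or "s".
    private static final Pattern FORMAT = Pattern.compile("(-?)(\\d+)\\s*(ms|s)");

    static Duration parse(String text) {
        Matcher matcher = FORMAT.matcher(text.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Cannot parse duration: '" + text + "'");
        }
        long value = Long.parseLong(matcher.group(2));
        Duration magnitude =
                matcher.group(3).equals("ms") ? Duration.ofMillis(value) : Duration.ofSeconds(value);
        return matcher.group(1).isEmpty() ? magnitude : magnitude.negated();
    }

    /** Documented semantics from the issue: a negative duration disables the timeout. */
    static boolean disablesTimeout(Duration timeout) {
        return timeout.isNegative();
    }
}
```

An alternative that avoids negative durations entirely would be a separate "disabled" sentinel value; the issue text itself only states that the documented negative value cannot be parsed.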
[jira] [Created] (FLINK-34286) Attach cluster config map labels at creation time
Chesnay Schepler created FLINK-34286: Summary: Attach cluster config map labels at creation time Key: FLINK-34286 URL: https://issues.apache.org/jira/browse/FLINK-34286 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 We attach a set of labels to config maps that we create to ease the manual cleanup by users in case Flink fails unrecoverably. For cluster config maps (that are used for leader election), these labels are not set at creation time, but when leadership is acquired, in contrast to job config maps. This means there's a gap where we create a CM without any labels being attached, and should Flink fail before leadership can be acquired it will continue to lack labels indefinitely. AFAICT it should be straight-forward, at least API-wise, to set these labels at creation time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34097) Remove unused JobMasterGateway#requestJobDetails
Chesnay Schepler created FLINK-34097: Summary: Remove unused JobMasterGateway#requestJobDetails Key: FLINK-34097 URL: https://issues.apache.org/jira/browse/FLINK-34097 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 This method is wired all the way to the scheduler; remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34004) TestingCheckpointIDCounter can easily lead to NPEs
Chesnay Schepler created FLINK-34004: Summary: TestingCheckpointIDCounter can easily lead to NPEs Key: FLINK-34004 URL: https://issues.apache.org/jira/browse/FLINK-34004 Project: Flink Issue Type: Technical Debt Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 The TestingCheckpointIDCounter builder doesn't define safe defaults for all builder parameters. Using it can easily lead to surprising null pointer exceptions in tests when code is being modified to call more methods. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33352) OpenAPI spec is lacking mappings for discriminator properties
Chesnay Schepler created FLINK-33352: Summary: OpenAPI spec is lacking mappings for discriminator properties Key: FLINK-33352 URL: https://issues.apache.org/jira/browse/FLINK-33352 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.17.2, 1.19.0, 1.18.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32888) File upload runs into EndOfDataDecoderException
Chesnay Schepler created FLINK-32888: Summary: File upload runs into EndOfDataDecoderException Key: FLINK-32888 URL: https://issues.apache.org/jira/browse/FLINK-32888 Project: Flink Issue Type: Bug Components: Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.2 With the right request the FileUploadHandler runs into an EndOfDataDecoderException although everything is fine.
[jira] [Created] (FLINK-32834) Allow compile.sh to be used manually
Chesnay Schepler created FLINK-32834: Summary: Allow compile.sh to be used manually Key: FLINK-32834 URL: https://issues.apache.org/jira/browse/FLINK-32834 Project: Flink Issue Type: Improvement Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 For debugging purposes it would be nice if you could run compile.sh locally. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32745) Add a flag to skip InputSelectable preValidate step
Chesnay Schepler created FLINK-32745: Summary: Add a flag to skip InputSelectable preValidate step Key: FLINK-32745 URL: https://issues.apache.org/jira/browse/FLINK-32745 Project: Flink Issue Type: Improvement Components: API / DataStream, Runtime / Configuration Reporter: Chesnay Schepler Fix For: 1.19.0 {{StreamingJobGraphGenerator#preValidate}} has a step where it checks that no operator implements {{InputSelectable}} if checkpointing is enabled, because these features aren't compatible. This step can be extremely expensive when the {{CodeGenOperatorFactory}} is used, because it requires all generated operator classes to actually be compiled (which usually only happens on the task manager). If you know what jobs you're running this step can be pure overhead. It would be nice if we'd have a flag to skip this validation step. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
Chesnay Schepler created FLINK-32681: Summary: RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable Key: FLINK-32681 URL: https://issues.apache.org/jira/browse/FLINK-32681 Project: Flink Issue Type: Technical Debt Components: Runtime / State Backends, Tests Affects Versions: 1.18.0 Reporter: Chesnay Schepler Fix For: 1.18.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef
[jira] [Created] (FLINK-32571) Prebuild HBase testing docker image
Chesnay Schepler created FLINK-32571: Summary: Prebuild HBase testing docker image Key: FLINK-32571 URL: https://issues.apache.org/jira/browse/FLINK-32571 Project: Flink Issue Type: Technical Debt Components: Connectors / HBase Reporter: Chesnay Schepler Fix For: hbase-3.0.0 For testing we currently build an HBase docker image on-demand during testing. We can improve reliability and testing times by building this image ahead of time, as the only parameter is the HBase version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32544) PythonFunctionFactoryTest fails on Java 17
Chesnay Schepler created FLINK-32544: Summary: PythonFunctionFactoryTest fails on Java 17 Key: FLINK-32544 URL: https://issues.apache.org/jira/browse/FLINK-32544 Project: Flink Issue Type: Sub-task Components: API / Python, Legacy Components / Flink on Tez Affects Versions: 1.18.0 Reporter: Chesnay Schepler https://dev.azure.com/chesnay/flink/_build/results?buildId=3676=logs=fba17979-6d2e-591d-72f1-97cf42797c11=727942b6-6137-54f7-1ef9-e66e706ea068 {code} Jul 05 10:17:23 Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make field private static java.util.IdentityHashMap java.lang.ApplicationShutdownHooks.hooks accessible: module java.base does not "opens java.lang" to unnamed module @1880a322 Jul 05 10:17:23 at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) Jul 05 10:17:23 at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) Jul 05 10:17:23 at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178) Jul 05 10:17:23 at java.base/java.lang.reflect.Field.setAccessible(Field.java:172) Jul 05 10:17:23 at org.apache.flink.client.python.PythonFunctionFactoryTest.closeStartedPythonProcess(PythonFunctionFactoryTest.java:115) Jul 05 10:17:23 at org.apache.flink.client.python.PythonFunctionFactoryTest.cleanEnvironment(PythonFunctionFactoryTest.java:79) Jul 05 10:17:23 at org.apache.flink.client.python.PythonFunctionFactoryTest.main(PythonFunctionFactoryTest.java:52) {code} Side-notes: * maybe re-evaluate if the test could be run through maven now * The shutdown hooks business is quite sketchy, and AFAICT would be unnecessary if the test were an ITCase -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception
Chesnay Schepler created FLINK-32536: Summary: Python tests fail with Arrow DirectBuffer exception Key: FLINK-32536 URL: https://issues.apache.org/jira/browse/FLINK-32536 Project: Flink Issue Type: Sub-task Components: API / Python, Tests Affects Versions: 1.18.0 Reporter: Chesnay Schepler https://dev.azure.com/chesnay/flink/_build/results?buildId=3674=logs=fba17979-6d2e-591d-72f1-97cf42797c11=727942b6-6137-54f7-1ef9-e66e706ea068 {code} 2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame. 2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E : java.lang.RuntimeException: Arrow depends on DirectByteBuffer.(long, int) which is not available. Please set the system property 'io.netty.tryReflectionSetAccessible' to 'true'. 2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 E at org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184) 2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 E at org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546) 2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 E at jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source) 2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 E at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 E at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 E at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 E at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) 2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 E at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) 2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 E at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 E at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 E at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) 2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 E at java.base/java.lang.Thread.run(Thread.java:833) {code} {code} 2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas 2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas 2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas 2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table 2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas 2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas 2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas 2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table 2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time 2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function 
2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_with_aux_group 2023-07-04T12:54:15.5667526Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_without_keys 2023-07-04T12:54:15.5667882Z Jul 04 12:54:15 FAILED pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_over_window_aggregate_function 2023-07-04T12:54:15.5668242Z Jul 04 12:54:15 FAILED
[jira] [Created] (FLINK-32482) Add Java 17 to Docker build matrix
Chesnay Schepler created FLINK-32482: Summary: Add Java 17 to Docker build matrix Key: FLINK-32482 URL: https://issues.apache.org/jira/browse/FLINK-32482 Project: Flink Issue Type: Sub-task Components: flink-docker, Release System Reporter: Chesnay Schepler Assignee: Chesnay Schepler -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32479) Tests revoke leadership too early
Chesnay Schepler created FLINK-32479: Summary: Tests revoke leadership too early Key: FLINK-32479 URL: https://issues.apache.org/jira/browse/FLINK-32479 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination, Tests Affects Versions: 1.18.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 A few tests issue a request to the dispatcher and immediately revoke leadership. In this case there is no guarantee that the request arrived before leadership was revoked, so the test can fail if the request arrives afterwards, since we reject requests if we aren't the leader anymore.
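The race in FLINK-32479 goes away if the test waits until the request has been processed before revoking leadership. The sketch below shows that ordering with a made-up dispatcher stand-in; {{FakeDispatcher}}, its methods and the request name are hypothetical and only mirror the "reject when not leader" rule from the issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only; not the real Dispatcher or its test utilities. */
final class LeadershipRaceSketch {
    /** Hypothetical dispatcher that rejects requests once leadership is gone. */
    static final class FakeDispatcher {
        private final AtomicBoolean leader = new AtomicBoolean(true);

        String handle(String request) {
            if (!leader.get()) {
                throw new IllegalStateException("not the leader anymore");
            }
            return "accepted " + request;
        }

        void revokeLeadership() {
            leader.set(false);
        }
    }

    static String submitThenRevoke() {
        FakeDispatcher dispatcher = new FakeDispatcher();
        // The request is handled on another thread, as an RPC would be.
        CompletableFuture<String> response =
                CompletableFuture.supplyAsync(() -> dispatcher.handle("job-1"));
        // Wait for the request to be processed BEFORE revoking leadership.
        String result = response.join();
        dispatcher.revokeLeadership();
        return result;
    }
}
```

Without the `join()` before `revokeLeadership()`, the asynchronous call could observe the revoked state and fail, which is the flakiness the issue describes.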
[jira] [Created] (FLINK-32467) Move CleanupOnCloseRpcSystem to rpc-core
Chesnay Schepler created FLINK-32467: Summary: Move CleanupOnCloseRpcSystem to rpc-core Key: FLINK-32467 URL: https://issues.apache.org/jira/browse/FLINK-32467 Project: Flink Issue Type: Sub-task Components: Runtime / RPC Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 This class is useful for any rpc system implementation and should thus be shared. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32380) Serialization of Java records fails
Chesnay Schepler created FLINK-32380: Summary: Serialization of Java records fails Key: FLINK-32380 URL: https://issues.apache.org/jira/browse/FLINK-32380 Project: Flink Issue Type: Sub-task Components: API / Type Serialization System Reporter: Chesnay Schepler Reportedly Java records are not supported, because they are neither detected by our Pojo serializer nor supported by Kryo 2.x -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32379) Skip archunit tests in java1X-target profiles
Chesnay Schepler created FLINK-32379: Summary: Skip archunit tests in java1X-target profiles Key: FLINK-32379 URL: https://issues.apache.org/jira/browse/FLINK-32379 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 When compiling to Java 11/17 byte code archunit fails; not sure why. Maybe it finds more/less stuff or signatures are represented differently. In any case let's use the Java 8 bytecode version as the "canonical" version and skip archunit otherwise. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32378) 2.0 Breaking Metric system changes
Chesnay Schepler created FLINK-32378: Summary: 2.0 Breaking Metric system changes Key: FLINK-32378 URL: https://issues.apache.org/jira/browse/FLINK-32378 Project: Flink Issue Type: Technical Debt Components: Runtime / Metrics Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Umbrella issue for all breaking changes to the metric system -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32377) 2.0 Breaking REST API changes
Chesnay Schepler created FLINK-32377: Summary: 2.0 Breaking REST API changes Key: FLINK-32377 URL: https://issues.apache.org/jira/browse/FLINK-32377 Project: Flink Issue Type: Technical Debt Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 2.0.0 Umbrella issue for all breaking changes to the REST API. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32370) JDBC SQL gateway e2e test is unstable
Chesnay Schepler created FLINK-32370: Summary: JDBC SQL gateway e2e test is unstable Key: FLINK-32370 URL: https://issues.apache.org/jira/browse/FLINK-32370 Project: Flink Issue Type: Technical Debt Affects Versions: 1.18.0 Reporter: Chesnay Schepler Fix For: 1.18.0 Attachments: flink-vsts-sql-gateway-0-fv-az75-650.log, flink-vsts-standalonesession-0-fv-az75-650.log, flink-vsts-taskexecutor-0-fv-az75-650.log The client is failing while trying to collect data when the job already finished on the cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32369) Setup cron build
Chesnay Schepler created FLINK-32369: Summary: Setup cron build Key: FLINK-32369 URL: https://issues.apache.org/jira/browse/FLINK-32369 Project: Flink Issue Type: Sub-task Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32359) AdaptiveSchedulerBuilder should accept executor service in constructor
Chesnay Schepler created FLINK-32359: Summary: AdaptiveSchedulerBuilder should accept executor service in constructor Key: FLINK-32359 URL: https://issues.apache.org/jira/browse/FLINK-32359 Project: Flink Issue Type: Technical Debt Components: Tests Reporter: Chesnay Schepler Fix For: 1.18.0 The ASBuilder currently accepts mandatory arguments in both the constructor and the final {{build()}} method. This makes it difficult to create composite helper factory methods, since you always need to pass a special value in {{build()}}, usually leaking details of the test setup. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32358) CI may unintentionally use fallback akka loader
Chesnay Schepler created FLINK-32358: Summary: CI may unintentionally use fallback akka loader Key: FLINK-32358 URL: https://issues.apache.org/jira/browse/FLINK-32358 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 We have a fallback akka loader for developer convenience in the IDE, which is on the classpath of most modules. Depending on the order of jars on the classpath it can happen that the fallback loader appears first, which we don't want because it slows down the build and creates noisy logs. We can add a simple prioritization scheme to the rpc system loading to remedy that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
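One way such a prioritization scheme could look, as a rough sketch: every discovered loader reports a priority and the lowest value wins, so classpath order no longer matters. The {{Loader}} class, its fields, and the priority values are assumptions made for illustration and are not the actual Flink loader API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LoaderPrioritySketch {

    /** Hypothetical loader descriptor; a lower priority value means "prefer this one". */
    static final class Loader {
        final String name;
        final int priority;

        Loader(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Picks the preferred loader regardless of the order in which loaders were discovered. */
    static Optional<Loader> pick(List<Loader> discovered) {
        return discovered.stream().min(Comparator.comparingInt((Loader l) -> l.priority));
    }
}
```

With this, a fallback loader registered with a high value (say 100) loses against the regular loader (say 0) even if it appears first on the classpath.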
[jira] [Created] (FLINK-32338) Add FailsOnJava17 annotation
Chesnay Schepler created FLINK-32338: Summary: Add FailsOnJava17 annotation Key: FLINK-32338 URL: https://issues.apache.org/jira/browse/FLINK-32338 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Add an annotation for disabling specific tests on Java 17, similar to FailsOnJava11. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32336) PartitionITCase#ComparablePojo should be public
Chesnay Schepler created FLINK-32336: Summary: PartitionITCase#ComparablePojo should be public Key: FLINK-32336 URL: https://issues.apache.org/jira/browse/FLINK-32336 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 POJOs should be public, but this one is private, forcing it to go through Kryo, which is currently failing for some odd reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32330) Setup Java 17 in e2e builds
Chesnay Schepler created FLINK-32330: Summary: Setup Java 17 in e2e builds Key: FLINK-32330 URL: https://issues.apache.org/jira/browse/FLINK-32330 Project: Flink Issue Type: Sub-task Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32329) Do not overwrite env.java.opts.all in HA e2e test
Chesnay Schepler created FLINK-32329: Summary: Do not overwrite env.java.opts.all in HA e2e test Key: FLINK-32329 URL: https://issues.apache.org/jira/browse/FLINK-32329 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Avoid overriding env.java.opts.all since it will soon contain the module declarations required for running Java 17. This is a bit of a hack; a nicer approach would be to append to the existing value, but ain't no one got time to deal with bash. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32328) Ensure surefire baseLine is picked up by IntelliJ
Chesnay Schepler created FLINK-32328: Summary: Ensure surefire baseLine is picked up by IntelliJ Key: FLINK-32328 URL: https://issues.apache.org/jira/browse/FLINK-32328 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 We currently configure JVM arguments exclusively within the surefire executions, which IntelliJ doesn't read. We should also set the baseArgsLine (which in the future will contain module declarations) to the base surefire configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32327) Python Kafka connector runs into strange NullPointerException
Chesnay Schepler created FLINK-32327: Summary: Python Kafka connector runs into strange NullPointerException Key: FLINK-32327 URL: https://issues.apache.org/jira/browse/FLINK-32327 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Chesnay Schepler The following error occurs when running the python kafka tests: (this uses a slightly modified version of the code, but the error also happens without it)
{code:python}
    def set_record_serializer(self, record_serializer: 'KafkaRecordSerializationSchema') \
            -> 'KafkaSinkBuilder':
        """
        Sets the :class:`KafkaRecordSerializationSchema` that transforms incoming records to kafka
        producer records.

        :param record_serializer: The :class:`KafkaRecordSerializationSchema`.
        """
        # NOTE: If topic selector is a generated first-column selector, do extra preprocessing
        j_topic_selector = get_field_value(record_serializer._j_serialization_schema,
                                           'topicSelector')
        caching_name_suffix = 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
        if j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
            class_name = get_field_value(j_topic_selector, 'topicSelector')\
                .getClass().getCanonicalName()
>           if class_name.startswith('com.sun.proxy') or class_name.startswith('jdk.proxy'):
E           AttributeError: 'NoneType' object has no attribute 'startswith'
{code}
My assumption is that {{getCanonicalName}} returns {{null}} for some objects, and this set of objects may have increased in Java 17. I tried adding a null check, but that caused other tests to fail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
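For reference, the JVM behaviour behind that assumption can be shown in plain Java: {{Class#getCanonicalName}} is documented to return {{null}} for classes without a canonical name, such as anonymous classes. The helper below is a hypothetical illustration of falling back to {{getName()}}; it is not a proposed fix for the Python code, where a plain null check reportedly broke other tests.

```java
final class CanonicalNameSketch {

    /** Returns the canonical name if there is one, otherwise the binary name. */
    static String nameOrFallback(Object o) {
        String canonical = o.getClass().getCanonicalName();
        // getCanonicalName() yields null for anonymous and local classes (among others).
        return canonical != null ? canonical : o.getClass().getName();
    }

    public static void main(String[] args) {
        Object anonymous = new Object() {};
        System.out.println(anonymous.getClass().getCanonicalName()); // null
        System.out.println(nameOrFallback(anonymous));
        System.out.println(nameOrFallback("text"));
    }
}
```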
[jira] [Created] (FLINK-32314) Ignore class-loading errors after RPC system shutdown
Chesnay Schepler created FLINK-32314: Summary: Ignore class-loading errors after RPC system shutdown Key: FLINK-32314 URL: https://issues.apache.org/jira/browse/FLINK-32314 Project: Flink Issue Type: Improvement Components: Runtime / RPC, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 In tests we occasionally see the akka rpc service throwing class loading errors _after_ it was shut down. AFAICT our shutdown procedure is correct, and it's just akka shutting down some things asynchronously. I couldn't figure out why/what is still running, so as a bandaid I suggest to ignore classloading errors after the rpc service shutdown has completed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32304) Reduce rpc-akka jar
Chesnay Schepler created FLINK-32304: Summary: Reduce rpc-akka jar Key: FLINK-32304 URL: https://issues.apache.org/jira/browse/FLINK-32304 Project: Flink Issue Type: Improvement Components: Build System, Runtime / RPC Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.2 We bundle unnecessary dependencies in the rpc-akka jar; we can easily shave off 15 MB of dependencies. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32302) Disable Hbase 2.x tests on Java 17
Chesnay Schepler created FLINK-32302: Summary: Disable Hbase 2.x tests on Java 17 Key: FLINK-32302 URL: https://issues.apache.org/jira/browse/FLINK-32302 Project: Flink Issue Type: Sub-task Components: Connectors / HBase, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Lacking support on the HBase side. Version bumps may solve it, but that's out of scope of this issue since the connector is being externalized. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32301) common.sh#create_ha_config should use set_config_key
Chesnay Schepler created FLINK-32301: Summary: common.sh#create_ha_config should use set_config_key Key: FLINK-32301 URL: https://issues.apache.org/jira/browse/FLINK-32301 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Instead of replacing the entire configuration, set the desired individual options instead. The current approach isn't great because it prevents us from setting required defaults in the flink-dist config. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32297) Use Temurin image in FlinkImageBuilder
Chesnay Schepler created FLINK-32297: Summary: Use Temurin image in FlinkImageBuilder Key: FLINK-32297 URL: https://issues.apache.org/jira/browse/FLINK-32297 Project: Flink Issue Type: Sub-task Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 The FlinkImageBuilder currently uses openjdk images. I've seen issues with these on Java 17, and propose to use Temurin, similar to the prod images. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32295) Try out Infra-provided Gradle Enterprise
Chesnay Schepler created FLINK-32295: Summary: Try out Infra-provided Gradle Enterprise Key: FLINK-32295 URL: https://issues.apache.org/jira/browse/FLINK-32295 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Infra has a Gradle Enterprise instance that can be used for Github Action branch builds (not PRs). We could try this out in one of the connector repos to see if it provides value to us; if so rolling this out to all connector/auxiliary repos could be interesting. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32291) Hive E2E test fails consistently
Chesnay Schepler created FLINK-32291: Summary: Hive E2E test fails consistently Key: FLINK-32291 URL: https://issues.apache.org/jira/browse/FLINK-32291 Project: Flink Issue Type: Technical Debt Components: Connectors / Hive, Tests Affects Versions: 1.18.0 Reporter: Chesnay Schepler Fix For: 1.18.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=49754&view=results -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32290) Enable -XX:+IgnoreUnrecognizedVMOptions
Chesnay Schepler created FLINK-32290: Summary: Enable -XX:+IgnoreUnrecognizedVMOptions Key: FLINK-32290 URL: https://issues.apache.org/jira/browse/FLINK-32290 Project: Flink Issue Type: Sub-task Components: API / Python, Build System, Deployment / YARN Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 We can make our lives a lot easier by enabling {{IgnoreUnrecognizedVMOptions}} for all processes. With this we can set add-opens/add-exports independent of what JDK is actually being used, removing a major source of complexity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32239) Unify TestJvmProcess and TestProcessBuilder
Chesnay Schepler created FLINK-32239: Summary: Unify TestJvmProcess and TestProcessBuilder Key: FLINK-32239 URL: https://issues.apache.org/jira/browse/FLINK-32239 Project: Flink Issue Type: Technical Debt Components: Test Infrastructure Reporter: Chesnay Schepler Fix For: 1.18.0 Both of these utility classes are used to spawn additional JVM processes during tests, and contain a fair bit of duplicated logic. We can unify them to ease maintenance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32238) Stable approach for installing libssl
Chesnay Schepler created FLINK-32238: Summary: Stable approach for installing libssl Key: FLINK-32238 URL: https://issues.apache.org/jira/browse/FLINK-32238 Project: Flink Issue Type: Technical Debt Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.16.3, 1.17.2 I think I found a stable way to install libssl on CI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32236) Ease YarnTestBase allowlist address regex
Chesnay Schepler created FLINK-32236: Summary: Ease YarnTestBase allowlist address regex Key: FLINK-32236 URL: https://issues.apache.org/jira/browse/FLINK-32236 Project: Flink Issue Type: Sub-task Components: Deployment / YARN, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 The YarnTestBase contains allow-list items like this: {code}Remote connection to \\[null\\] failed with java.net.ConnectException: Connection refused{code} I've seen this exception a few times without the address being null. This could be due to difference in how Java 17 resolves addresses (?). In any case I don't see any harm in relaxing this regex to accept any address. -- This message was sent by Atlassian Jira (v8.20.10#820010)
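A small sketch of what relaxing the allow-list entry could look like. The strict pattern is the one quoted in the issue; the relaxed pattern and the sample address are assumptions for illustration, and the class is not part of {{YarnTestBase}}.

```java
import java.util.regex.Pattern;

final class AllowlistRegexSketch {

    /** The entry quoted in the issue: only matches when the address is literally "null". */
    static final Pattern STRICT = Pattern.compile(
            "Remote connection to \\[null\\] failed with java.net.ConnectException: Connection refused");

    /** Hypothetical relaxed variant: accepts any address between the brackets. */
    static final Pattern RELAXED = Pattern.compile(
            "Remote connection to \\[.*\\] failed with java.net.ConnectException: Connection refused");

    static boolean isAllowed(Pattern pattern, String logLine) {
        return pattern.matcher(logLine).find();
    }
}
```

For a line such as {{Remote connection to [localhost/127.0.0.1:6123] failed with java.net.ConnectException: Connection refused}} only the relaxed pattern matches, while both still match the {{[null]}} form.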
[jira] [Created] (FLINK-32226) RestClusterClient leaks jobgraph file if submission fails
Chesnay Schepler created FLINK-32226: Summary: RestClusterClient leaks jobgraph file if submission fails Key: FLINK-32226 URL: https://issues.apache.org/jira/browse/FLINK-32226 Project: Flink Issue Type: Bug Components: Client / Job Submission Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.2
{code:java}
submissionFuture
        .thenCompose(ignored -> jobGraphFileFuture)
        .thenAccept(
                jobGraphFile -> {
                    try {
                        Files.delete(jobGraphFile);
                    } catch (IOException e) {
                        LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
                    }
                });
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
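The leak follows from how {{CompletableFuture}} chains work: {{thenCompose}} and {{thenAccept}} are skipped when the upstream future failed, so the delete never runs. The sketch below reproduces that with plain JDK classes; the class and method names are hypothetical, and {{cleanupAlways}} is just one possible shape of a fix, not necessarily the one applied in Flink.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

final class JobGraphCleanupSketch {

    /** Mirrors the quoted pattern: the file is only deleted if the submission succeeded. */
    static CompletableFuture<Void> cleanupOnSuccessOnly(
            CompletableFuture<?> submission, CompletableFuture<Path> file) {
        return submission
                .thenCompose(ignored -> file)
                .thenAccept(JobGraphCleanupSketch::deleteQuietly);
    }

    /**
     * Deletes the file whether or not the submission failed. Note that the returned future
     * no longer carries the submission error; real code would have to propagate it separately.
     */
    static CompletableFuture<Void> cleanupAlways(
            CompletableFuture<?> submission, CompletableFuture<Path> file) {
        return submission
                .<Void>handle((ignored, error) -> null) // swallow the outcome, continue either way
                .thenCompose(ignored -> file)
                .thenAccept(JobGraphCleanupSketch::deleteQuietly);
    }

    static void deleteQuietly(Path path) {
        try {
            Files.deleteIfExists(path);
        } catch (IOException e) {
            System.err.println("Could not delete temporary file " + path + ": " + e);
        }
    }

    /** Test helper: creates a temporary file standing in for the serialized job graph. */
    static Path newTempFile() {
        try {
            return Files.createTempFile("jobgraph", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```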
[jira] [Created] (FLINK-32185) Remove M2_HOME usages
Chesnay Schepler created FLINK-32185: Summary: Remove M2_HOME usages Key: FLINK-32185 URL: https://issues.apache.org/jira/browse/FLINK-32185 Project: Flink Issue Type: Sub-task Components: Build System, Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 Apparently M2_HOME is no longer evaluated by Maven, so we either need to adjust some CI stuff or outright remove existing usages. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32184) Use revision version property
Chesnay Schepler created FLINK-32184: Summary: Use revision version property Key: FLINK-32184 URL: https://issues.apache.org/jira/browse/FLINK-32184 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 With the revision property we can centrally define the project version in the root pom, and no longer have to change the poms of all modules when creating a release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32183) Use maven.multiModuleProjectDirectory property instead of rootDir plugin
Chesnay Schepler created FLINK-32183: Summary: Use maven.multiModuleProjectDirectory property instead of rootDir plugin Key: FLINK-32183 URL: https://issues.apache.org/jira/browse/FLINK-32183 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 Drop the now redundant rootDir plugin in favor of a new built-in property. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32182) Use original japicmp plugin
Chesnay Schepler created FLINK-32182: Summary: Use original japicmp plugin Key: FLINK-32182 URL: https://issues.apache.org/jira/browse/FLINK-32182 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.19.0 We currently use a japicmp fork for Maven 3.2.5 compatibility, which we can now drop. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32181) Drop support for Maven 3.2.5
Chesnay Schepler created FLINK-32181: Summary: Drop support for Maven 3.2.5 Key: FLINK-32181 URL: https://issues.apache.org/jira/browse/FLINK-32181 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Fix For: 1.19.0 Collection of improvements we can make when dropping support for Maven 3.2.5. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32179) Handle more repo names for automatic dist discovery
Chesnay Schepler created FLINK-32179: Summary: Handle more repo names for automatic dist discovery Key: FLINK-32179 URL: https://issues.apache.org/jira/browse/FLINK-32179 Project: Flink Issue Type: Technical Debt Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.2 The e2e tests have a routine to auto-detect the distribution that they need to actually run Flink. When Flink is checked out in a directory not starting with "flink" the auto-discovery doesn't find it. We can improve this slightly by adjusting the iteration condition. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32169) Show allocated slots on TM page
Chesnay Schepler created FLINK-32169: Summary: Show allocated slots on TM page Key: FLINK-32169 URL: https://issues.apache.org/jira/browse/FLINK-32169 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination, Runtime / Web Frontend Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Show the allocated slots on the TM page, so that you can better understand which job is consuming what resources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32168) Log required/available resources in RM
Chesnay Schepler created FLINK-32168: Summary: Log required/available resources in RM Key: FLINK-32168 URL: https://issues.apache.org/jira/browse/FLINK-32168 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 When matching requirements against available resources the RM currently doesn't log anything apart from whether it could fulfill the requirements or not. We can make the system easier to audit by logging the current requirements, available resources, and how many resources are left after the matching. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32167) Log dynamic slot creation on task manager
Chesnay Schepler created FLINK-32167: Summary: Log dynamic slot creation on task manager Key: FLINK-32167 URL: https://issues.apache.org/jira/browse/FLINK-32167 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 When a slot is dynamically allocated on the TM we should log that this happens, what resources it consumes and what the remaining resources are. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32166) Show unassigned/total TM resources in web ui
Chesnay Schepler created FLINK-32166: Summary: Show unassigned/total TM resources in web ui Key: FLINK-32166 URL: https://issues.apache.org/jira/browse/FLINK-32166 Project: Flink Issue Type: Sub-task Components: Runtime / Web Frontend Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 It is important to know how many resources of a TM are currently _assigned_ to jobs. This is different from what resources are currently _used_, since you can have assigned 1 GB of memory to a job that is only using 10 MB at this time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32165) Improve observability of fine-grained resource management
Chesnay Schepler created FLINK-32165: Summary: Improve observability of fine-grained resource management Key: FLINK-32165 URL: https://issues.apache.org/jira/browse/FLINK-32165 Project: Flink Issue Type: Improvement Components: Runtime / Coordination, Runtime / Web Frontend Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Right now fine-grained resource management is way too much of a black box, with the only source of information being the taskmanager REST endpoints. While this is fine-ish for services built around it, the developer experience is suffering greatly and it becomes impossible to reason about the system afterwards (because we don't even log anything). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32162) Misleading log message due to missing null check
Chesnay Schepler created FLINK-32162: Summary: Misleading log message due to missing null check Key: FLINK-32162 URL: https://issues.apache.org/jira/browse/FLINK-32162 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Updating the job requirements always logs "Failed to update requirements for job {}." because we don't check whether the error is not null. -- This message was sent by Atlassian Jira (v8.20.10#820010)
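The kind of bug described here typically comes from a {{whenComplete}} callback that logs unconditionally, because the callback runs on success and on failure alike. A minimal sketch of the guarded version, with a hypothetical method name and a list standing in for the logger:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class RequirementsUpdateLoggingSketch {

    /** Collects the messages that would be logged; a stand-in for a real logger. */
    static List<String> logOutcome(CompletableFuture<Void> update, String jobId) {
        List<String> log = new ArrayList<>();
        update.whenComplete(
                (ignored, error) -> {
                    // whenComplete runs on success too; error is null in that case.
                    if (error != null) {
                        log.add("Failed to update requirements for job " + jobId + ".");
                    }
                });
        return log;
    }
}
```

Without the {{error != null}} guard the message would be added for every update, which matches the misleading log output described above.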
[jira] [Created] (FLINK-32154) Setup checkstyle rule to forbid mockito/powermock
Chesnay Schepler created FLINK-32154: Summary: Setup checkstyle rule to forbid mockito/powermock Key: FLINK-32154 URL: https://issues.apache.org/jira/browse/FLINK-32154 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32153) Limit powermock to flink-core/-runtime
Chesnay Schepler created FLINK-32153: Summary: Limit powermock to flink-core/-runtime Key: FLINK-32153 URL: https://issues.apache.org/jira/browse/FLINK-32153 Project: Flink Issue Type: Sub-task Components: Build System, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32152) Consolidate mocking library usage
Chesnay Schepler created FLINK-32152: Summary: Consolidate mocking library usage Key: FLINK-32152 URL: https://issues.apache.org/jira/browse/FLINK-32152 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Use mockito instead of powermock wherever possible, with the goal of restricting powermock to specific modules, eventually dropping it entirely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32149) Remove some unnecessary mocking usages
Chesnay Schepler created FLINK-32149: Summary: Remove some unnecessary mocking usages Key: FLINK-32149 URL: https://issues.apache.org/jira/browse/FLINK-32149 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31972) Remove powermock whitebox usages
Chesnay Schepler created FLINK-31972: Summary: Remove powermock whitebox usages Key: FLINK-31972 URL: https://issues.apache.org/jira/browse/FLINK-31972 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31971) Drop HadoopRecoverableWriterOldHadoopWithNoTruncateSupportTest
Chesnay Schepler created FLINK-31971: Summary: Drop HadoopRecoverableWriterOldHadoopWithNoTruncateSupportTest Key: FLINK-31971 URL: https://issues.apache.org/jira/browse/FLINK-31971 Project: Flink Issue Type: Technical Debt Components: Connectors / Hadoop Compatibility, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 This test explicitly checks behavior for Hadoop < 2.7, which we no longer support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31940) DataStreamCsvITCase#CityPojo should be public
Chesnay Schepler created FLINK-31940: Summary: DataStreamCsvITCase#CityPojo should be public Key: FLINK-31940 URL: https://issues.apache.org/jira/browse/FLINK-31940 Project: Flink Issue Type: Sub-task Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Since the class is package-private it is serialized via Kryo and not the pojo serializer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31934) Remove mocking in RocksDB tests
Chesnay Schepler created FLINK-31934: Summary: Remove mocking in RocksDB tests Key: FLINK-31934 URL: https://issues.apache.org/jira/browse/FLINK-31934 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31933) Remove Whitebox usage in ExpressionKeysTest
Chesnay Schepler created FLINK-31933: Summary: Remove Whitebox usage in ExpressionKeysTest Key: FLINK-31933 URL: https://issues.apache.org/jira/browse/FLINK-31933 Project: Flink Issue Type: Sub-task Components: API / Core, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Reduce illegal reflective accesses. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31919) Skip ClosureCleaner if object can be serialized
Chesnay Schepler created FLINK-31919: Summary: Skip ClosureCleaner if object can be serialized Key: FLINK-31919 URL: https://issues.apache.org/jira/browse/FLINK-31919 Project: Flink Issue Type: Sub-task Components: API / Core Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 Given an object the ClosureCleaner currently recursively digs into every non-static/transient field of the given object. This causes a problem on Java 17 because these reflective accesses all need to be explicitly allowed beforehand. Instead, we could limit the CC to objects that fail serialization, because if something can be serialized there isn't anything for the CC to do. This should allow us to avoid a lot of unnecessary reflection accesses to immutable JDK classes, like Strings/BigDecimals etc etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
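The proposed shortcut can be sketched as "try to serialize first, only clean on failure". The class and method names below are hypothetical and the real ClosureCleaner is considerably more involved; this only illustrates the check itself using standard Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

final class SerializabilityCheckSketch {

    /** Returns true if the object graph can be written with Java serialization. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is an IOException and ends up here.
            return false;
        }
    }

    /** Only objects that fail serialization would need the reflective cleaning pass. */
    static boolean needsCleaning(Object function) {
        return !isSerializable(function);
    }
}
```

Values such as {{String}} or {{BigDecimal}} pass the check immediately, so no reflective access into JDK classes would be needed for them.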
[jira] [Created] (FLINK-31916) Python API only respects deprecated env.java.opts key
Chesnay Schepler created FLINK-31916: Summary: Python API only respects deprecated env.java.opts key Key: FLINK-31916 URL: https://issues.apache.org/jira/browse/FLINK-31916 Project: Flink Issue Type: Sub-task Components: API / Python, Runtime / Configuration Reporter: Chesnay Schepler Fix For: 1.18.0 pyflink_gateway_server.py is only reading the deprecated env.java.opts from the configuration. This key should only be used as a fallback, with env.java.opts.tm/jm/client being the actual keys to support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31915) Python API incorrectly passes env.java.opts as single argument
Chesnay Schepler created FLINK-31915: Summary: Python API incorrectly passes env.java.opts as single argument Key: FLINK-31915 URL: https://issues.apache.org/jira/browse/FLINK-31915 Project: Flink Issue Type: Sub-task Components: API / Python Affects Versions: 1.16.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 The python API passes all java options as a single string argument, which typically means that the JVM will reject them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
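To illustrate the difference: a JVM expects each option as its own argument, so a configured value like {{-Xmx1g -Dfoo=bar}} has to be split before it is put on the command line. The sketch below is in Java for consistency with the rest of the codebase (the affected code is Python), uses a hypothetical helper name, and splits on whitespace only, so it deliberately ignores quoted values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class JavaOptsSplitSketch {

    /** Splits a whitespace-separated options string into individual JVM arguments. */
    static List<String> splitJavaOpts(String opts) {
        String trimmed = opts.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Naive split: does not handle quoting such as -Dkey="a b".
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```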
[jira] [Created] (FLINK-31913) sql-client.sh does not respect env.java.opts.all/client
Chesnay Schepler created FLINK-31913: Summary: sql-client.sh does not respect env.java.opts.all/client Key: FLINK-31913 URL: https://issues.apache.org/jira/browse/FLINK-31913 Project: Flink Issue Type: Sub-task Components: Deployment / Scripts, Table SQL / Client Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31912) Upgrade bytebuddy
Chesnay Schepler created FLINK-31912: Summary: Upgrade bytebuddy Key: FLINK-31912 URL: https://issues.apache.org/jira/browse/FLINK-31912 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31911) Bad address construction in SqlClientTest
Chesnay Schepler created FLINK-31911: Summary: Bad address construction in SqlClientTest Key: FLINK-31911 URL: https://issues.apache.org/jira/browse/FLINK-31911 Project: Flink Issue Type: Sub-task Components: Table SQL / Client, Tests Affects Versions: 1.16.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 The SqlClientTest constructs a host:port pair with this:
{code}
InetSocketAddress.createUnresolved(
                SQL_GATEWAY_REST_ENDPOINT_EXTENSION.getTargetAddress(),
                SQL_GATEWAY_REST_ENDPOINT_EXTENSION.getTargetPort())
        .toString()
{code}
This is unnecessarily complicated and fails on Java 17 because the toString representation is _not_ guaranteed to return something of the form host:port. -- This message was sent by Atlassian Jira (v8.20.10#820010)
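A simpler construction that does not depend on {{toString()}} could look like the sketch below. The helper names are hypothetical; {{getHostString()}} and {{getPort()}} are standard {{InetSocketAddress}} accessors.

```java
import java.net.InetSocketAddress;

final class HostPortSketch {

    /** Builds "host:port" directly from the two values. */
    static String hostPort(String host, int port) {
        return host + ":" + port;
    }

    /** Same result from an existing address, without relying on its toString() format. */
    static String hostPort(InetSocketAddress address) {
        return address.getHostString() + ":" + address.getPort();
    }
}
```

Since the test already has the host and port as separate values, the first variant is sufficient there.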
[jira] [Created] (FLINK-31807) Test architecture tests don't cover all tests
Chesnay Schepler created FLINK-31807: Summary: Test architecture tests don't cover all tests Key: FLINK-31807 URL: https://issues.apache.org/jira/browse/FLINK-31807 Project: Flink Issue Type: Technical Debt Components: Connectors / Cassandra Affects Versions: cassandra-4.0.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31806) Prod architecture tests didn't detect non-public API usage
Chesnay Schepler created FLINK-31806: Summary: Prod architecture tests didn't detect non-public API usage Key: FLINK-31806 URL: https://issues.apache.org/jira/browse/FLINK-31806 Project: Flink Issue Type: Technical Debt Components: Connectors / Cassandra, Tests Affects Versions: cassandra-3.0.0, 1.18.0 Reporter: Chesnay Schepler FLINK-31805 wasn't detected by the production architecture tests. Not sure if this is an issue on the cassandra or Flink side. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31805) Cassandra Source shouldn't use IOUtils
Chesnay Schepler created FLINK-31805: Summary: Cassandra Source shouldn't use IOUtils Key: FLINK-31805 URL: https://issues.apache.org/jira/browse/FLINK-31805 Project: Flink Issue Type: Technical Debt Components: Connectors / Cassandra Affects Versions: cassandra-4.0.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: cassandra-4.0.0 IOUtils is not part of the public API and shouldn't be used.
[jira] [Created] (FLINK-31804) ITCase MiniCluster test architecture rule should accept MiniClusterTestEnvironment
Chesnay Schepler created FLINK-31804: Summary: ITCase MiniCluster test architecture rule should accept MiniClusterTestEnvironment Key: FLINK-31804 URL: https://issues.apache.org/jira/browse/FLINK-31804 Project: Flink Issue Type: Technical Debt Components: Connectors / Common, Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0
[jira] [Created] (FLINK-31744) Extend Adaptive Scheduler sparse EG to contain maxParallelism
Chesnay Schepler created FLINK-31744: Summary: Extend Adaptive Scheduler sparse EG to contain maxParallelism Key: FLINK-31744 URL: https://issues.apache.org/jira/browse/FLINK-31744 Project: Flink Issue Type: Improvement Components: Runtime / Coordination, Runtime / REST Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 When a job is WaitingForResources the adaptive scheduler returns a sparse execution graph that omits many details that are only known at execution time (like subtasks). We could include all JobVertex-level information though, which would cover things like the vertex id/name and the maxParallelism.
[jira] [Created] (FLINK-31738) FlameGraphTypeQueryParameter#Type clashes with java.reflect.Type in generated clients
Chesnay Schepler created FLINK-31738: Summary: FlameGraphTypeQueryParameter#Type clashes with java.reflect.Type in generated clients Key: FLINK-31738 URL: https://issues.apache.org/jira/browse/FLINK-31738 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.1 Generating a client with the openapi generators causes compile errors because the generated file imports java.lang.reflect.Type, but also the generated "Type" model. For convenience it would be neat to give this enum a slightly different name, because working around this issue is surprisingly annoying.
[jira] [Created] (FLINK-31735) JobDetailsInfo plan incorrectly documented as string
Chesnay Schepler created FLINK-31735: Summary: JobDetailsInfo plan incorrectly documented as string Key: FLINK-31735 URL: https://issues.apache.org/jira/browse/FLINK-31735 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.1 The {{plan}} field in the JobDetailsInfo contains an object, not a string. Internally we handle it as a string, but write it out as an object. The docs generators aren't aware of this.
[jira] [Created] (FLINK-31733) Model name clashes in OpenAPI spec
Chesnay Schepler created FLINK-31733: Summary: Model name clashes in OpenAPI spec Key: FLINK-31733 URL: https://issues.apache.org/jira/browse/FLINK-31733 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.1 The OpenAPI spec uses simple class names for naming models. There are, however, several models, usually inner classes, that share simple names, like "Summary". This goes undetected and breaks the model for some API calls.
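The clash described in FLINK-31733 can be illustrated with a small sketch. The two outer classes and the registry below are hypothetical stand-ins, not the actual Flink response types or the real spec generator; the sketch only shows why keying models by simple class name silently drops one of them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ModelNameClash {
    // Two hypothetical response types that both declare an inner class "Summary".
    public static class CheckpointStats { public static class Summary {} }
    public static class SubtaskStats { public static class Summary {} }

    static List<Class<?>> exampleModels() {
        return List.of(CheckpointStats.Summary.class, SubtaskStats.Summary.class);
    }

    /** Registers models under their simple name or their fully qualified binary name. */
    static Map<String, Class<?>> register(List<Class<?>> models, boolean useSimpleName) {
        Map<String, Class<?>> registry = new HashMap<>();
        for (Class<?> model : models) {
            String key = useSimpleName ? model.getSimpleName() : model.getName();
            registry.put(key, model); // a later model silently replaces an earlier one
        }
        return registry;
    }

    public static void main(String[] args) {
        System.out.println("by simple name: " + register(exampleModels(), true).keySet());
        System.out.println("by full name:   " + register(exampleModels(), false).keySet());
    }
}
```

With simple names both classes collapse into a single "Summary" entry, whereas the binary names stay distinct.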
[jira] [Created] (FLINK-31728) Remove Scala API dependencies from batch/streaming examples
Chesnay Schepler created FLINK-31728: Summary: Remove Scala API dependencies from batch/streaming examples Key: FLINK-31728 URL: https://issues.apache.org/jira/browse/FLINK-31728 Project: Flink Issue Type: Technical Debt Components: Build System, Examples Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0 The example modules have leftover Scala API dependencies and build infrastructure. Remove them, along with the scala suffix on these modules.
[jira] [Created] (FLINK-31711) OpenAPI spec omits complete-statement request body
Chesnay Schepler created FLINK-31711: Summary: OpenAPI spec omits complete-statement request body Key: FLINK-31711 URL: https://issues.apache.org/jira/browse/FLINK-31711 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.17.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.1 The OpenAPI generator omits request bodies for GET requests because sending one is usually a bad idea. Still, the generator shouldn't omit this on its own.
[jira] [Created] (FLINK-31672) Requirement validation does not take user-specified maxParallelism into account
Chesnay Schepler created FLINK-31672: Summary: Requirement validation does not take user-specified maxParallelism into account Key: FLINK-31672 URL: https://issues.apache.org/jira/browse/FLINK-31672 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0
[jira] [Created] (FLINK-31657) ConfigurationInfo generates incorrect openapi schema
Chesnay Schepler created FLINK-31657: Summary: ConfigurationInfo generates incorrect openapi schema Key: FLINK-31657 URL: https://issues.apache.org/jira/browse/FLINK-31657 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.16.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.2, 1.18.0, 1.17.1 ConfigurationInfo extends ArrayList, and the schema generator picks up List#isEmpty as a property. This results in an invalid schema, as arrays can't have properties.
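The effect in FLINK-31657 can be reproduced in spirit with plain JavaBeans introspection. This is only an analogy: it assumes nothing about how the actual schema generator discovers properties, and ConfigEntries is a hypothetical stand-in for ConfigurationInfo. It shows that any getter-based scan of a class extending ArrayList sees isEmpty() as a boolean property named "empty".

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Set;
import java.util.TreeSet;

public class ListPropertyExample {
    /** Hypothetical stand-in for a response type that extends ArrayList. */
    public static class ConfigEntries extends ArrayList<String> {}

    /** Returns the names of all JavaBeans properties discovered on the given type. */
    static Set<String> beanProperties(Class<?> type) {
        try {
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor descriptor :
                    Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                names.add(descriptor.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Introspection failed for " + type, e);
        }
    }

    public static void main(String[] args) {
        // The printed set includes "empty", derived from the inherited isEmpty() method.
        System.out.println(beanProperties(ConfigEntries.class));
    }
}
```

A generator that then emits an array schema plus that property produces exactly the kind of invalid combination the issue describes.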
[jira] [Created] (FLINK-31608) Re-evaluate 'min-parallelism-increase' option
Chesnay Schepler created FLINK-31608: Summary: Re-evaluate 'min-parallelism-increase' option Key: FLINK-31608 URL: https://issues.apache.org/jira/browse/FLINK-31608 Project: Flink Issue Type: Sub-task Components: Runtime / Configuration, Runtime / Coordination Reporter: Chesnay Schepler Fix For: 1.18.0 This option was meant to prevent scale-up operations where the benefit doesn't outweigh the cost, like scaling up to increase a single vertex's parallelism by 1. Meanwhile, scale-down operations were always immediately executed, because they were always the result of a stopped TaskManager, causing the job to restart anyway. Now that users can change the requirements at will this has changed, and the expected behavior is overall undefined. We need to answer: * should there be a dedicated option for limiting scale-down operations if the requirements were changed? * should the min-parallelism-*increase* option be generalized to a min-parallelism-*change* option? * How shall operations be handled that scale different vertices up or down at the same time? So far the decision was made on the cumulative parallelism change, but in this case the parallelism distribution can change significantly while the cumulative change is 0.
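The last question in FLINK-31608 can be made concrete with a small worked example. The sketch below is not the scheduler's code; the vertex names, the parallelism values, and both metrics are assumptions chosen to show how a cumulative change of 0 can hide a substantial redistribution. It assumes both maps contain the same vertices.

```java
import java.util.Map;

public class ParallelismChangeExample {
    /** Sum of per-vertex differences (new minus old); opposite changes cancel out. */
    static int cumulativeChange(Map<String, Integer> before, Map<String, Integer> after) {
        int change = 0;
        for (Map.Entry<String, Integer> entry : after.entrySet()) {
            change += entry.getValue() - before.get(entry.getKey());
        }
        return change;
    }

    /** Sum of absolute per-vertex differences; non-zero whenever any vertex changes. */
    static int absoluteChange(Map<String, Integer> before, Map<String, Integer> after) {
        int change = 0;
        for (Map.Entry<String, Integer> entry : after.entrySet()) {
            change += Math.abs(entry.getValue() - before.get(entry.getKey()));
        }
        return change;
    }

    public static void main(String[] args) {
        // Hypothetical job: the source is scaled down by 2 while the sink is scaled up by 2.
        Map<String, Integer> before = Map.of("source", 4, "sink", 2);
        Map<String, Integer> after = Map.of("source", 2, "sink", 4);
        System.out.println("cumulative change: " + cumulativeChange(before, after));
        System.out.println("absolute change:   " + absoluteChange(before, after));
    }
}
```

A threshold on the cumulative value would treat this rescale as a no-op, while a per-vertex or absolute measure would not; which of these is desirable is exactly the open question.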
[jira] [Created] (FLINK-31591) Extend JobGraphWriter to persist requirements
Chesnay Schepler created FLINK-31591: Summary: Extend JobGraphWriter to persist requirements Key: FLINK-31591 URL: https://issues.apache.org/jira/browse/FLINK-31591 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0
[jira] [Created] (FLINK-31590) Allow setting JobResourceRequirements through JobMasterGateway
Chesnay Schepler created FLINK-31590: Summary: Allow setting JobResourceRequirements through JobMasterGateway Key: FLINK-31590 URL: https://issues.apache.org/jira/browse/FLINK-31590 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0
[jira] [Created] (FLINK-31475) Allow project to be user-defined in release scripts
Chesnay Schepler created FLINK-31475: Summary: Allow project to be user-defined in release scripts Key: FLINK-31475 URL: https://issues.apache.org/jira/browse/FLINK-31475 Project: Flink Issue Type: New Feature Components: Connectors / Common, Release System Reporter: Chesnay Schepler Assignee: Chesnay Schepler The connector release scripts derive the project name from the repository. For some esoteric cases (like the flink-connector-shared-utils repo) it would be beneficial to be able to override this on the command-line.
[jira] [Created] (FLINK-31455) Add a simple test project for running CI workflow in shared repo
Chesnay Schepler created FLINK-31455: Summary: Add a simple test project for running CI workflow in shared repo Key: FLINK-31455 URL: https://issues.apache.org/jira/browse/FLINK-31455 Project: Flink Issue Type: Technical Debt Components: Build System / CI, Connectors / Common Reporter: Chesnay Schepler Assignee: Chesnay Schepler connector-shared-utils has no CI for the CI workflow, which has repeatedly proven to be a problem. Set up some simple workflows that at least run CI.
[jira] [Created] (FLINK-31454) Shared CI workflow always caches snapshot binaries
Chesnay Schepler created FLINK-31454: Summary: Shared CI workflow always caches snapshot binaries Key: FLINK-31454 URL: https://issues.apache.org/jira/browse/FLINK-31454 Project: Flink Issue Type: Technical Debt Components: Build System / CI, Connectors / Common Reporter: Chesnay Schepler Assignee: Chesnay Schepler The if conditions need to work on strings because they use environment variables.
[jira] [Created] (FLINK-30967) Add MongoDB connector documentation
Chesnay Schepler created FLINK-30967: Summary: Add MongoDB connector documentation Key: FLINK-30967 URL: https://issues.apache.org/jira/browse/FLINK-30967 Project: Flink Issue Type: Sub-task Components: Connectors / MongoDB, Documentation Reporter: Chesnay Schepler Fix For: mongodb-1.0.0
[jira] [Created] (FLINK-30963) Switch binary downloads to archive.apache.org
Chesnay Schepler created FLINK-30963: Summary: Switch binary downloads to archive.apache.org Key: FLINK-30963 URL: https://issues.apache.org/jira/browse/FLINK-30963 Project: Flink Issue Type: Technical Debt Components: Build System / CI, Connectors / Common Reporter: Chesnay Schepler Assignee: Chesnay Schepler archive.apache.org is the only stable download link for binaries. Now that we properly fixed the binary caching in the connector workflows it should be fine to make use of it.
[jira] [Created] (FLINK-30930) Automatically determine Flink binary download URL from version
Chesnay Schepler created FLINK-30930: Summary: Automatically determine Flink binary download URL from version Key: FLINK-30930 URL: https://issues.apache.org/jira/browse/FLINK-30930 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler
[jira] [Created] (FLINK-30895) SlotSharingSlotAllocator may waste slots
Chesnay Schepler created FLINK-30895: Summary: SlotSharingSlotAllocator may waste slots Key: FLINK-30895 URL: https://issues.apache.org/jira/browse/FLINK-30895 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.16.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.18.0, 1.17.1 The allocator evenly distributes slots across slot sharing groups independent of how many slots the vertices in that group actually need. This can cause slots to be unused.
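A rough sketch of the behavior FLINK-30895 describes, under stated assumptions: the group names, the slot counts, and both helper methods are hypothetical, and the even split below is a simplification rather than the SlotSharingSlotAllocator implementation. It only illustrates how dividing free slots equally can leave some unused while another group is short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotDistributionExample {
    /** Even split: every group gets freeSlots / numberOfGroups slots, regardless of need. */
    static Map<String, Integer> evenSplit(Map<String, Integer> requiredPerGroup, int freeSlots) {
        int perGroup = freeSlots / requiredPerGroup.size();
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (String group : requiredPerGroup.keySet()) {
            assigned.put(group, perGroup);
        }
        return assigned;
    }

    /** Counts slots handed to groups beyond what their vertices can actually use. */
    static int wastedSlots(Map<String, Integer> requiredPerGroup, Map<String, Integer> assigned) {
        int wasted = 0;
        for (Map.Entry<String, Integer> entry : assigned.entrySet()) {
            wasted += Math.max(0, entry.getValue() - requiredPerGroup.get(entry.getKey()));
        }
        return wasted;
    }

    public static void main(String[] args) {
        // Hypothetical job: group A can use at most 1 slot, group B could use 5.
        Map<String, Integer> required = new LinkedHashMap<>();
        required.put("A", 1);
        required.put("B", 5);
        Map<String, Integer> assigned = evenSplit(required, 6);
        System.out.println("assigned: " + assigned);
        System.out.println("wasted:   " + wastedSlots(required, assigned));
    }
}
```

In this example each group receives 3 of the 6 slots, so group A leaves 2 idle even though group B could have used them.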