[jira] [Created] (FLINK-35159) CreatingExecutionGraph can leak CheckpointCoordinator and cause JM crash

2024-04-18 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-35159:


 Summary: CreatingExecutionGraph can leak CheckpointCoordinator and 
cause JM crash
 Key: FLINK-35159
 URL: https://issues.apache.org/jira/browse/FLINK-35159
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.2, 1.20.0, 1.19.1


When a task manager dies while the JM is generating an ExecutionGraph in the 
background then {{CreatingExecutionGraph#handleExecutionGraphCreation}} can 
transition back into WaitingForResources if the TM hosted one of the slots that 
we planned to use in {{tryToAssignSlots}}.

At this point the ExecutionGraph was already transitioned to running, which 
implicitly kicks off periodic checkpointing by the CheckpointCoordinator, 
without the operator coordinator holders being initialized yet (as this happens 
after we assigned slots).

This effectively leaks that CheckpointCoordinator, including the timer thread 
that will continue to try triggering checkpoints, which will naturally fail to 
trigger.
This can cause a JM crash because it results in 
{{OperatorCoordinatorHolder#abortCurrentTriggering}} being called, which fails 
with an NPE since the {{mainThreadExecutor}} was not initialized yet.

{code}
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: java.lang.NullPointerException
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$8(CheckpointCoordinator.java:707)
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at 
java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
at 
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910)
at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.CompletionException: 
java.lang.NullPointerException
at 
java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
at 
java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932)
at 
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
... 7 more
Caused by: java.lang.NullPointerException
at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.abortCurrentTriggering(OperatorCoordinatorHolder.java:388)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at 
java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:985)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:961)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$7(CheckpointCoordinator.java:693)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
... 8 more
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-03-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34672:


 Summary: HA deadlock between JobMasterServiceLeadershipRunner and 
DefaultLeaderElectionService
 Key: FLINK-34672
 URL: https://issues.apache.org/jira/browse/FLINK-34672
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
 Fix For: 1.18.2, 1.20.0, 1.19.1


We recently observed a deadlock in the JM within the HA system.
(see below for the thread dump)

[~mapohl] and I looked a bit into it and there appears to be a race condition 
when leadership is revoked while a JobMaster is being started.
It appears to be caused by 
{{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} 
forwarding futures while holding a lock; depending on whether the forwarded 
future is already complete the next stage may or may not run while holding that 
same lock.
We haven't determined yet whether we should be holding that lock or not.
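
The hazard described above can be reproduced outside Flink. The following is a minimal sketch, not the Flink code: {{InlineCallbackDemo}} and its method are hypothetical names. It assumes a plain {{CompletableFuture}} and shows that a stage attached with {{whenComplete}} runs immediately on the attaching thread, and therefore inside the {{synchronized}} block, if the future is already complete. If the future is completed later, the stage runs on the completing thread without the lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of Flink.
final class InlineCallbackDemo {

    private static final Object LOCK = new Object();

    /**
     * Attaches a callback to the future while holding LOCK and reports
     * whether the callback itself ran while LOCK was held.
     */
    static boolean callbackRanUnderLock(
            CompletableFuture<String> future, boolean completeBeforeAttach) {
        AtomicBoolean ranUnderLock = new AtomicBoolean(false);
        if (completeBeforeAttach) {
            future.complete("done");
        }
        synchronized (LOCK) {
            // If the future is already complete, this lambda runs right here,
            // on the current thread, while LOCK is still held.
            future.whenComplete(
                    (value, error) -> ranUnderLock.set(Thread.holdsLock(LOCK)));
        }
        if (!completeBeforeAttach) {
            // Completing afterwards runs the lambda now, outside of LOCK.
            future.complete("done");
        }
        return ranUnderLock.get();
    }

    public static void main(String[] args) {
        System.out.println("already complete, ran under lock: "
                + callbackRanUnderLock(new CompletableFuture<>(), true));
        System.out.println("completed later, ran under lock: "
                + callbackRanUnderLock(new CompletableFuture<>(), false));
    }
}
```

Whether the next stage takes further locks while the first one is held thus depends on timing alone, which is the kind of ordering that makes a lock cycle like the one in the thread dump possible.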

{{code}}
"DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 daemon 
prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 
nid=0x19d waiting for monitor entry  [0x7f53084fd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
- waiting to lock <0xf1c0e088> (a java.lang.Object)
at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown
 Source)
at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown
 Source)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
- locked <0xf0e3f4d8> (a java.lang.Object)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown
 Source)
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
{{code}}

{{code}}
"jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms 
elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry  
[0x7f530567d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
- waiting to lock <0xf0e3f4d8> (a java.lang.Object)
at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
- locked <0xf1c0e088> (a java.lang.Object)
at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown
 Source)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
at 
java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.22/CompletableFuture.java:506)
at 
java.util.concurrent.CompletableFuture.complete(java.base@11.0.22/CompletableFuture.java:2079)
at 
org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.registerJobMasterServiceFutures(DefaultJobMasterServiceProcess.java:124)
at 
org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:114)
at 

[jira] [Created] (FLINK-34640) Replace DummyMetricGroup usage with UnregisteredMetricsGroup

2024-03-11 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34640:


 Summary: Replace DummyMetricGroup usage with 
UnregisteredMetricsGroup
 Key: FLINK-34640
 URL: https://issues.apache.org/jira/browse/FLINK-34640
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Metrics, Tests
Reporter: Chesnay Schepler
 Fix For: 1.20.0


The {{DummyMetricGroup}} is terrible because it is decidedly unsafe to use. Use 
the {{UnregisteredMetricsGroup}} instead.





[jira] [Created] (FLINK-34499) Configuration#toString should hide sensitive values

2024-02-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34499:


 Summary: Configuration#toString should hide sensitive values
 Key: FLINK-34499
 URL: https://issues.apache.org/jira/browse/FLINK-34499
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Chesnay Schepler
 Fix For: 1.20.0


Time and time again people log the entire Flink configuration for no reason, 
risking that sensitive values are logged in plain text.

We should make this harder by changing {{Configuration#toString}} to 
automatically hide sensitive values, for example like this:

{code}
@Override
public String toString() {
    // Mask sensitive entries before rendering the map.
    return ConfigurationUtils.hideSensitiveValues(
                    this.confData.entrySet().stream()
                            .collect(
                                    Collectors.toMap(
                                            Map.Entry::getKey,
                                            entry -> entry.getValue().toString())))
            .toString();
}
{code}





[jira] [Created] (FLINK-34498) GSFileSystemFactory logs full Flink config

2024-02-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34498:


 Summary: GSFileSystemFactory logs full Flink config
 Key: FLINK-34498
 URL: https://issues.apache.org/jira/browse/FLINK-34498
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
 Fix For: 1.19.0, 1.18.2, 1.20.0


This can cause secrets from the config to be logged.
{code}
@Override
public void configure(Configuration flinkConfig) {
    LOGGER.info("Configuring GSFileSystemFactory with Flink configuration {}", flinkConfig);
{code}





[jira] [Created] (FLINK-34496) Classloading deadlock between ExecNodeMetadataUtil and JsonSerdeUtil

2024-02-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34496:


 Summary: Classloading deadlock between ExecNodeMetadataUtil and 
JsonSerdeUtil
 Key: FLINK-34496
 URL: https://issues.apache.org/jira/browse/FLINK-34496
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
 Fix For: 1.19.0, 1.18.2, 1.20.0


This is a fun one!

ExecNodeMetadataUtil and JsonSerdeUtil have a circular dependency in their 
static initialization, which can cause a classloading lockup when 2 threads are 
running the class initialization of each class at the same time because during 
class initialization they hold a lock.

{code}
Feb 22 00:31:58 "ForkJoinPool-3-worker-11" #25 daemon prio=5 os_prio=0 
cpu=219.87ms elapsed=995.99s tid=0x7ff11c50e000 nid=0xf0fc in Object.wait() 
 [0x7ff12a4f3000]
Feb 22 00:31:58    java.lang.Thread.State: RUNNABLE
Feb 22 00:31:58 at 
org.apache.flink.table.planner.plan.nodes.exec.serde.JsonSerdeUtil.createFlinkTableJacksonModule(JsonSerdeUtil.java:133)
Feb 22 00:31:58 at 
org.apache.flink.table.planner.plan.nodes.exec.serde.JsonSerdeUtil.<clinit>(JsonSerdeUtil.java:111)

Feb 22 00:31:58 "ForkJoinPool-3-worker-7" #23 daemon prio=5 os_prio=0 
cpu=54.83ms elapsed=996.00s tid=0x7ff11c50c000 nid=0xf0fb in Object.wait()  
[0x7ff12a5f4000]
Feb 22 00:31:58    java.lang.Thread.State: RUNNABLE
Feb 22 00:31:58 at 
org.apache.flink.table.planner.plan.utils.ExecNodeMetadataUtil.addToLookupMap(ExecNodeMetadataUtil.java:235)
Feb 22 00:31:58 at 
org.apache.flink.table.planner.plan.utils.ExecNodeMetadataUtil.<clinit>(ExecNodeMetadataUtil.java:156)
{code}
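
The lockup can be illustrated with a small stand-alone sketch. None of this is Flink code: {{Left}} and {{Right}} are hypothetical stand-ins for the two utility classes, and the latch exists only to force both static initializers to start before either one touches the other class. Each thread then holds the initialization lock of its own class while waiting for the other class to finish initializing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo; Left and Right stand in for the two utility classes.
final class ClassInitDeadlockDemo {

    // Released once both static initializers have started.
    static final CountDownLatch BOTH_STARTED = new CountDownLatch(2);

    static void awaitBothStarted() {
        BOTH_STARTED.countDown();
        try {
            // Bounded wait; it only lines the two threads up.
            BOTH_STARTED.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static final class Left {
        static {
            awaitBothStarted();
            Right.touch(); // requires Right to be initialized
        }

        static void touch() {}
    }

    static final class Right {
        static {
            awaitBothStarted();
            Left.touch(); // requires Left to be initialized
        }

        static void touch() {}
    }

    /** Returns true if both initializing threads are still stuck after the timeout. */
    static boolean deadlocks(long timeoutMillis) {
        // Daemon threads, so the stuck threads do not keep the JVM alive.
        Thread initLeft = new Thread(() -> Left.touch(), "init-left");
        Thread initRight = new Thread(() -> Right.touch(), "init-right");
        initLeft.setDaemon(true);
        initRight.setDaemon(true);
        initLeft.start();
        initRight.start();
        try {
            initLeft.join(timeoutMillis);
            initRight.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return initLeft.isAlive() && initRight.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(500));
    }
}
```

One common way out of such a cycle, mentioned here only as a general option and not as the chosen fix, is to move one side of the dependency out of the static initializer so that it is resolved lazily on first use.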





[jira] [Created] (FLINK-34485) Token delegation doesn't work with Presto S3 filesystem

2024-02-21 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34485:


 Summary: Token delegation doesn't work with Presto S3 filesystem
 Key: FLINK-34485
 URL: https://issues.apache.org/jira/browse/FLINK-34485
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
 Fix For: 1.20.0


AFAICT it's not possible to use token delegation with the Presto filesystem.
The token delegation relies on the {{DynamicTemporaryAWSCredentialsProvider}}, 
but it doesn't have a constructor that Presto requires (ruling out 
presto.s3.credentials-provider), and other providers can't be used due to 
FLINK-13602.





[jira] [Created] (FLINK-34431) Move static BlobWriter methods to separate util

2024-02-12 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34431:


 Summary: Move static BlobWriter methods to separate util
 Key: FLINK-34431
 URL: https://issues.apache.org/jira/browse/FLINK-34431
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.20.0


The BlobWriter interface contains several static methods, some being used, 
others being de-facto internal methods.
We should move these into a dedicated BlobWriterUtils class so we can properly 
deal with method visibility.





[jira] [Created] (FLINK-34422) BatchTestBase doesn't actually use MiniClusterExtension

2024-02-10 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34422:


 Summary: BatchTestBase doesn't actually use MiniClusterExtension
 Key: FLINK-34422
 URL: https://issues.apache.org/jira/browse/FLINK-34422
 Project: Flink
  Issue Type: Technical Debt
  Components: Test Infrastructure
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0, 1.18.2, 1.20.0


BatchTestBase sets up a table environment in instance fields, which runs before 
the BeforeEachCallback from the MiniClusterExtension has time to run.
As a result _all_ tests extending the BatchTestBase are spawning separate mini 
clusters for every single job.
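
The ordering problem can be shown without JUnit. The sketch below is a hand-rolled simulation with hypothetical names ({{InitOrderSketch}}, {{TestBaseSketch}}); it assumes only that the test class is instantiated before any before-each callback is invoked. Field initializers run as part of construction, so they always come first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the per-test lifecycle; not JUnit or Flink code.
final class InitOrderSketch {

    static final List<String> EVENTS = new ArrayList<>();

    static String record(String event) {
        EVENTS.add(event);
        return event;
    }

    // Stand-in for a test base class that sets up state in an instance field.
    static class TestBaseSketch {
        // Field initializers run during construction of the test instance.
        final String tableEnv = record("field initializer: create table environment");
    }

    /** Simulates one test: instantiate the class first, then run the callback. */
    static List<String> runOneTest() {
        EVENTS.clear();
        new TestBaseSketch();
        record("beforeEach callback: provide shared mini cluster");
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        runOneTest().forEach(System.out::println);
    }
}
```

Moving such setup from the field initializer into a method that runs after the callback keeps the order under control of the lifecycle rather than of object construction.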





[jira] [Created] (FLINK-34421) Skip post-compile checks in compile.sh if fast profile is active

2024-02-10 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34421:


 Summary: Skip post-compile checks in compile.sh if fast profile is 
active
 Key: FLINK-34421
 URL: https://issues.apache.org/jira/browse/FLINK-34421
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0, 1.18.2, 1.20.0


We currently waste time in our e2e tests, re-running a bunch of post-compile 
checks (like packaging/licensing).
Let's couple this to the -Dfast/-Pfast switches.





[jira] [Created] (FLINK-34397) Resource wait timeout can't be disabled

2024-02-06 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34397:


 Summary: Resource wait timeout can't be disabled
 Key: FLINK-34397
 URL: https://issues.apache.org/jira/browse/FLINK-34397
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Configuration
Affects Versions: 1.17.2
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0, 1.17.3, 1.18.2


The documentation for {{jobmanager.adaptive-scheduler.resource-wait-timeout}} 
states that:

??Setting a negative duration will disable the resource timeout: The JobManager 
will wait indefinitely for resources to appear.??

However, we don't support parsing negative durations.

{code}
Could not parse value '-1 s' for key 
'jobmanager.adaptive-scheduler.resource-wait-timeout'.
Caused by: java.lang.NumberFormatException: text does not start with a number
at org.apache.flink.util.TimeUtils.parseDuration(TimeUtils.java:80)
at 
org.apache.flink.configuration.ConfigurationUtils.convertToDuration(ConfigurationUtils.java:399)
at 
org.apache.flink.configuration.ConfigurationUtils.convertValue(ConfigurationUtils.java:331)
at 
org.apache.flink.configuration.Configuration.lambda$getOptional$3(Configuration.java:729)
at java.base/java.util.Optional.map(Optional.java:260)
at 
org.apache.flink.configuration.Configuration.getOptional(Configuration.java:729)
... 2 more
{code}
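
The exception message suggests a parser that scans leading digits and rejects the input if it finds none. The sketch below is a simplified stand-in under that assumption, not Flink's {{TimeUtils}}; the class name and the small set of units are hypothetical. It shows why a leading minus sign is rejected before the unit is even looked at.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical, simplified stand-in for a "<number> <unit>" duration parser.
final class DurationParseSketch {

    static Duration parse(String text) {
        String trimmed = text.trim();
        int pos = 0;
        // Only ASCII digits are consumed, so a leading '-' stops the scan at once.
        while (pos < trimmed.length()
                && trimmed.charAt(pos) >= '0'
                && trimmed.charAt(pos) <= '9') {
            pos++;
        }
        String number = trimmed.substring(0, pos);
        String unit = trimmed.substring(pos).trim().toLowerCase(Locale.ROOT);
        if (number.isEmpty()) {
            throw new NumberFormatException("text does not start with a number");
        }
        long value = Long.parseLong(number);
        switch (unit) {
            case "ms":
                return Duration.ofMillis(value);
            case "s":
                return Duration.ofSeconds(value);
            case "min":
                return Duration.ofMinutes(value);
            default:
                throw new IllegalArgumentException("unsupported unit: '" + unit + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("10 s"));
        try {
            parse("-1 s");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a parser of this shape, the documented "negative duration" setting cannot be expressed at all, so either the parser or the documentation has to change.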





[jira] [Created] (FLINK-34286) Attach cluster config map labels at creation time

2024-01-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34286:


 Summary: Attach cluster config map labels at creation time
 Key: FLINK-34286
 URL: https://issues.apache.org/jira/browse/FLINK-34286
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


We attach a set of labels to config maps that we create to ease the manual 
cleanup by users in case Flink fails unrecoverably.

For cluster config maps (that are used for leader election), these labels are 
not set at creation time, but when leadership is acquired, in contrast to job 
config maps.

This means there's a gap where we create a CM without any labels being 
attached, and should Flink fail before leadership can be acquired it will 
continue to lack labels indefinitely.

AFAICT it should be straightforward, at least API-wise, to set these labels at 
creation time.





[jira] [Created] (FLINK-34097) Remove unused JobMasterGateway#requestJobDetails

2024-01-15 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34097:


 Summary: Remove unused JobMasterGateway#requestJobDetails
 Key: FLINK-34097
 URL: https://issues.apache.org/jira/browse/FLINK-34097
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


This method is unused, yet it is wired all the way to the scheduler; remove it.





[jira] [Created] (FLINK-34004) TestingCheckpointIDCounter can easily lead to NPEs

2024-01-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34004:


 Summary: TestingCheckpointIDCounter can easily lead to NPEs
 Key: FLINK-34004
 URL: https://issues.apache.org/jira/browse/FLINK-34004
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


The TestingCheckpointIDCounter builder doesn't define safe defaults for all 
builder parameters. Using it can easily lead to surprising null pointer 
exceptions in tests when code is being modified to call more methods.
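
As an illustration of what safe defaults could look like, here is a minimal sketch of a testing builder. It is not the real {{TestingCheckpointIDCounter}}: the class name, the two callbacks and their default values are hypothetical. The point is only that every callback field starts out as a harmless no-op, so a test that configures nothing still gets a usable object.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical test double; not the real TestingCheckpointIDCounter.
final class TestingCounterSketch {

    private final LongSupplier getAndIncrementSupplier;
    private final LongConsumer setCountConsumer;

    private TestingCounterSketch(
            LongSupplier getAndIncrementSupplier, LongConsumer setCountConsumer) {
        this.getAndIncrementSupplier = getAndIncrementSupplier;
        this.setCountConsumer = setCountConsumer;
    }

    long getAndIncrement() {
        return getAndIncrementSupplier.getAsLong();
    }

    void setCount(long newCount) {
        setCountConsumer.accept(newCount);
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Safe defaults: calling an unconfigured method never throws an NPE.
        private LongSupplier getAndIncrementSupplier = () -> 0L;
        private LongConsumer setCountConsumer = ignored -> {};

        Builder withGetAndIncrementSupplier(LongSupplier supplier) {
            this.getAndIncrementSupplier = supplier;
            return this;
        }

        Builder withSetCountConsumer(LongConsumer consumer) {
            this.setCountConsumer = consumer;
            return this;
        }

        TestingCounterSketch build() {
            return new TestingCounterSketch(getAndIncrementSupplier, setCountConsumer);
        }
    }
}
```

A test that only cares about one method overrides that one callback and leaves the rest at their defaults.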





[jira] [Created] (FLINK-33352) OpenAPI spec is lacking mappings for discriminator properties

2023-10-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-33352:


 Summary: OpenAPI spec is lacking mappings for discriminator 
properties
 Key: FLINK-33352
 URL: https://issues.apache.org/jira/browse/FLINK-33352
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.2, 1.19.0, 1.18.1








[jira] [Created] (FLINK-32888) File upload runs into EndOfDataDecoderException

2023-08-17 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32888:


 Summary: File upload runs into EndOfDataDecoderException
 Key: FLINK-32888
 URL: https://issues.apache.org/jira/browse/FLINK-32888
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.2


With the right request the FileUploadHandler runs into an 
EndOfDataDecoderException although everything is fine.





[jira] [Created] (FLINK-32834) Allow compile.sh to be used manually

2023-08-11 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32834:


 Summary: Allow compile.sh to be used manually
 Key: FLINK-32834
 URL: https://issues.apache.org/jira/browse/FLINK-32834
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


For debugging purposes it would be nice if you could run compile.sh locally.





[jira] [Created] (FLINK-32745) Add a flag to skip InputSelectable preValidate step

2023-08-03 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32745:


 Summary: Add a flag to skip InputSelectable preValidate step
 Key: FLINK-32745
 URL: https://issues.apache.org/jira/browse/FLINK-32745
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream, Runtime / Configuration
Reporter: Chesnay Schepler
 Fix For: 1.19.0


{{StreamingJobGraphGenerator#preValidate}} has a step where it checks that no 
operator implements {{InputSelectable}} if checkpointing is enabled, because 
these features aren't compatible.

This step can be extremely expensive when the {{CodeGenOperatorFactory}} is 
used, because it requires all generated operator classes to actually be 
compiled (which usually only happens on the task manager).

If you know what jobs you're running, this step can be pure overhead.
It would be nice if we had a flag to skip this validation step.





[jira] [Created] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable

2023-07-26 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32681:


 Summary: 
RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
 Key: FLINK-32681
 URL: https://issues.apache.org/jira/browse/FLINK-32681
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / State Backends, Tests
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
 Fix For: 1.18.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef





[jira] [Created] (FLINK-32571) Prebuild HBase testing docker image

2023-07-10 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32571:


 Summary: Prebuild HBase testing docker image
 Key: FLINK-32571
 URL: https://issues.apache.org/jira/browse/FLINK-32571
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / HBase
Reporter: Chesnay Schepler
 Fix For: hbase-3.0.0


We currently build an HBase Docker image on demand during testing. 
We can improve reliability and testing times by building this image ahead of 
time, as the only parameter is the HBase version.





[jira] [Created] (FLINK-32544) PythonFunctionFactoryTest fails on Java 17

2023-07-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32544:


 Summary: PythonFunctionFactoryTest fails on Java 17
 Key: FLINK-32544
 URL: https://issues.apache.org/jira/browse/FLINK-32544
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Legacy Components / Flink on Tez
Affects Versions: 1.18.0
Reporter: Chesnay Schepler


https://dev.azure.com/chesnay/flink/_build/results?buildId=3676&view=logs&j=fba17979-6d2e-591d-72f1-97cf42797c11&t=727942b6-6137-54f7-1ef9-e66e706ea068

{code}
Jul 05 10:17:23 Exception in thread "main" 
java.lang.reflect.InaccessibleObjectException: Unable to make field private 
static java.util.IdentityHashMap java.lang.ApplicationShutdownHooks.hooks 
accessible: module java.base does not "opens java.lang" to unnamed module 
@1880a322
Jul 05 10:17:23 at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
Jul 05 10:17:23 at 
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
Jul 05 10:17:23 at 
java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178)
Jul 05 10:17:23 at 
java.base/java.lang.reflect.Field.setAccessible(Field.java:172)
Jul 05 10:17:23 at 
org.apache.flink.client.python.PythonFunctionFactoryTest.closeStartedPythonProcess(PythonFunctionFactoryTest.java:115)
Jul 05 10:17:23 at 
org.apache.flink.client.python.PythonFunctionFactoryTest.cleanEnvironment(PythonFunctionFactoryTest.java:79)
Jul 05 10:17:23 at 
org.apache.flink.client.python.PythonFunctionFactoryTest.main(PythonFunctionFactoryTest.java:52)
{code}

Side-notes:
* maybe re-evaluate if the test could be run through maven now
* The shutdown hooks business is quite sketchy, and AFAICT would be unnecessary 
if the test were an ITCase





[jira] [Created] (FLINK-32536) Python tests fail with Arrow DirectBuffer exception

2023-07-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32536:


 Summary: Python tests fail with Arrow DirectBuffer exception
 Key: FLINK-32536
 URL: https://issues.apache.org/jira/browse/FLINK-32536
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Tests
Affects Versions: 1.18.0
Reporter: Chesnay Schepler


https://dev.azure.com/chesnay/flink/_build/results?buildId=3674&view=logs&j=fba17979-6d2e-591d-72f1-97cf42797c11&t=727942b6-6137-54f7-1ef9-e66e706ea068

{code}
2023-07-04T12:54:15.5296754Z Jul 04 12:54:15 E   
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame.
2023-07-04T12:54:15.5299579Z Jul 04 12:54:15 E   : 
java.lang.RuntimeException: Arrow depends on DirectByteBuffer.<init>(long, int) 
which is not available. Please set the system property 
'io.netty.tryReflectionSetAccessible' to 'true'.
2023-07-04T12:54:15.5302307Z Jul 04 12:54:15 E  at 
org.apache.flink.table.runtime.arrow.ArrowUtils.checkArrowUsable(ArrowUtils.java:184)
2023-07-04T12:54:15.5302859Z Jul 04 12:54:15 E  at 
org.apache.flink.table.runtime.arrow.ArrowUtils.collectAsPandasDataFrame(ArrowUtils.java:546)
2023-07-04T12:54:15.5303177Z Jul 04 12:54:15 E  at 
jdk.internal.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
2023-07-04T12:54:15.5303515Z Jul 04 12:54:15 E  at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-07-04T12:54:15.5303929Z Jul 04 12:54:15 E  at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
2023-07-04T12:54:15.5307338Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
2023-07-04T12:54:15.5309888Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
2023-07-04T12:54:15.5310306Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
2023-07-04T12:54:15.5337220Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
2023-07-04T12:54:15.5341859Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
2023-07-04T12:54:15.5342363Z Jul 04 12:54:15 E  at 
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
2023-07-04T12:54:15.5344866Z Jul 04 12:54:15 E  at 
java.base/java.lang.Thread.run(Thread.java:833)
{code}

{code}
2023-07-04T12:54:15.5663559Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_empty_to_pandas
2023-07-04T12:54:15.5663891Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_from_pandas
2023-07-04T12:54:15.5664299Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas
2023-07-04T12:54:15.5664655Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::BatchPandasConversionTests::test_to_pandas_for_retract_table
2023-07-04T12:54:15.5665003Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_empty_to_pandas
2023-07-04T12:54:15.5665360Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_from_pandas
2023-07-04T12:54:15.5665704Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas
2023-07-04T12:54:15.5666045Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_for_retract_table
2023-07-04T12:54:15.5666415Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_conversion.py::StreamPandasConversionTests::test_to_pandas_with_event_time
2023-07-04T12:54:15.5666840Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_function
2023-07-04T12:54:15.5667189Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_with_aux_group
2023-07-04T12:54:15.5667526Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_group_aggregate_without_keys
2023-07-04T12:54:15.5667882Z Jul 04 12:54:15 FAILED 
pyflink/table/tests/test_pandas_udaf.py::BatchPandasUDAFITTests::test_over_window_aggregate_function
2023-07-04T12:54:15.5668242Z Jul 04 12:54:15 FAILED 

[jira] [Created] (FLINK-32482) Add Java 17 to Docker build matrix

2023-06-29 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32482:


 Summary: Add Java 17 to Docker build matrix
 Key: FLINK-32482
 URL: https://issues.apache.org/jira/browse/FLINK-32482
 Project: Flink
  Issue Type: Sub-task
  Components: flink-docker, Release System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler








[jira] [Created] (FLINK-32479) Tests revoke leadership too early

2023-06-29 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32479:


 Summary: Tests revoke leadership too early
 Key: FLINK-32479
 URL: https://issues.apache.org/jira/browse/FLINK-32479
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination, Tests
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


A few tests issue a request to the dispatcher and immediately revoke 
leadership. In this case there is no guarantee that the request arrived 
before leadership was revoked, so the test can fail if it arrives afterwards, 
since we reject requests if we aren't the leader anymore.





[jira] [Created] (FLINK-32467) Move CleanupOnCloseRpcSystem to rpc-core

2023-06-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32467:


 Summary: Move CleanupOnCloseRpcSystem to rpc-core
 Key: FLINK-32467
 URL: https://issues.apache.org/jira/browse/FLINK-32467
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / RPC
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


This class is useful for any rpc system implementation and should thus be 
shared.





[jira] [Created] (FLINK-32380) Serialization of Java records fails

2023-06-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32380:


 Summary: Serialization of Java records fails
 Key: FLINK-32380
 URL: https://issues.apache.org/jira/browse/FLINK-32380
 Project: Flink
  Issue Type: Sub-task
  Components: API / Type Serialization System
Reporter: Chesnay Schepler


Reportedly Java records are not supported, because they are neither detected by 
our Pojo serializer nor supported by Kryo 2.x





[jira] [Created] (FLINK-32379) Skip archunit tests in java1X-target profiles

2023-06-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32379:


 Summary: Skip archunit tests in java1X-target profiles
 Key: FLINK-32379
 URL: https://issues.apache.org/jira/browse/FLINK-32379
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


When compiling to Java 11/17 bytecode, archunit fails; not sure why. Maybe it 
finds more/less stuff or signatures are represented differently.

In any case let's use the Java 8 bytecode version as the "canonical" version 
and skip archunit otherwise.





[jira] [Created] (FLINK-32378) 2.0 Breaking Metric system changes

2023-06-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32378:


 Summary: 2.0 Breaking Metric system changes
 Key: FLINK-32378
 URL: https://issues.apache.org/jira/browse/FLINK-32378
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Umbrella issue for all breaking changes to the metric system





[jira] [Created] (FLINK-32377) 2.0 Breaking REST API changes

2023-06-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32377:


 Summary: 2.0 Breaking REST API changes
 Key: FLINK-32377
 URL: https://issues.apache.org/jira/browse/FLINK-32377
 Project: Flink
  Issue Type: Technical Debt
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 2.0.0


Umbrella issue for all breaking changes to the REST API.





[jira] [Created] (FLINK-32370) JDBC SQL gateway e2e test is unstable

2023-06-16 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32370:


 Summary: JDBC SQL gateway e2e test is unstable
 Key: FLINK-32370
 URL: https://issues.apache.org/jira/browse/FLINK-32370
 Project: Flink
  Issue Type: Technical Debt
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
 Fix For: 1.18.0
 Attachments: flink-vsts-sql-gateway-0-fv-az75-650.log, 
flink-vsts-standalonesession-0-fv-az75-650.log, 
flink-vsts-taskexecutor-0-fv-az75-650.log

The client is failing while trying to collect data when the job already 
finished on the cluster.






[jira] [Created] (FLINK-32369) Setup cron build

2023-06-16 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32369:


 Summary: Setup cron build
 Key: FLINK-32369
 URL: https://issues.apache.org/jira/browse/FLINK-32369
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-32359) AdaptiveSchedulerBuilder should accept executor service in constructor

2023-06-15 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32359:


 Summary: AdaptiveSchedulerBuilder should accept executor service 
in constructor
 Key: FLINK-32359
 URL: https://issues.apache.org/jira/browse/FLINK-32359
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Reporter: Chesnay Schepler
 Fix For: 1.18.0


The ASBuilder currently accepts mandatory arguments in both the constructor and 
final {{build()}} method.
This makes it difficult to create composite helper factory methods, since you 
always need to pass a special value in build(), usually leaking details of the 
test setup.
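
The difference can be sketched with a small, purely hypothetical builder (this is not the actual {{AdaptiveSchedulerBuilder}} API; the class, its fields and the use of a plain {{Executor}} are assumptions for illustration). Once every mandatory argument goes into the constructor, a helper can hand out a ready-to-build instance without the caller knowing which executor the test setup uses.

```java
import java.util.concurrent.Executor;

// Hypothetical builder, only loosely modelled on the situation described above.
class SchedulerBuilderSketch {
    static final class Scheduler {
        final String jobName;
        final Executor mainThreadExecutor;
        final int parallelism;

        Scheduler(String jobName, Executor mainThreadExecutor, int parallelism) {
            this.jobName = jobName;
            this.mainThreadExecutor = mainThreadExecutor;
            this.parallelism = parallelism;
        }
    }

    private final String jobName;
    private final Executor mainThreadExecutor;
    private int parallelism = 1;

    // All mandatory arguments are passed up front ...
    SchedulerBuilderSketch(String jobName, Executor mainThreadExecutor) {
        this.jobName = jobName;
        this.mainThreadExecutor = mainThreadExecutor;
    }

    SchedulerBuilderSketch setParallelism(int parallelism) {
        this.parallelism = parallelism;
        return this;
    }

    // ... so build() takes no arguments.
    Scheduler build() {
        return new Scheduler(jobName, mainThreadExecutor, parallelism);
    }

    // Composite helper: callers never see which executor the test setup uses.
    static SchedulerBuilderSketch forTest(String jobName) {
        return new SchedulerBuilderSketch(jobName, Runnable::run).setParallelism(4);
    }
}
```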





[jira] [Created] (FLINK-32358) CI may unintentionally use fallback akka loader

2023-06-15 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32358:


 Summary: CI may unintentionally use fallback akka loader
 Key: FLINK-32358
 URL: https://issues.apache.org/jira/browse/FLINK-32358
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


We have a fallback akka loader for developer convenience in the IDE that is on 
the classpath of most modules. Depending on the order of jars on the classpath 
it can happen that the fallback loader appears first, which we don't want 
because it slows down the build and creates noisy logs.

We can add a simple prioritization scheme to the rpc system loading to remedy 
that.
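
A minimal sketch of such a prioritization scheme, assuming each discovered loader exposes a numeric priority where a lower value wins. The {{Loader}} type and its fields are hypothetical and do not claim to match the actual RPC system loader interface.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LoaderPrioritySketch {
    // Hypothetical loader abstraction; a lower value means "prefer me".
    static final class Loader {
        final String name;
        final int priority;

        Loader(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Picks the loader with the lowest priority value, regardless of discovery order.
    static Loader select(List<Loader> discoveredInClasspathOrder) {
        return discoveredInClasspathOrder.stream()
                .min(Comparator.comparingInt((Loader loader) -> loader.priority))
                .orElseThrow(() -> new IllegalStateException("No loader found."));
    }

    public static void main(String[] args) {
        // The fallback loader happens to come first on the classpath here.
        List<Loader> discovered =
                Arrays.asList(new Loader("fallback", 1), new Loader("regular", 0));
        System.out.println(select(discovered).name);
    }
}
```

With this, the order of jars on the classpath no longer decides which loader is used.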





[jira] [Created] (FLINK-32338) Add FailsOnJava17 annotation

2023-06-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32338:


 Summary: Add FailsOnJava17 annotation
 Key: FLINK-32338
 URL: https://issues.apache.org/jira/browse/FLINK-32338
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Add an annotation for disabling specific tests on Java 17, similar to 
FailsOnJava11.





[jira] [Created] (FLINK-32336) PartitionITCase#ComparablePojo should be public

2023-06-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32336:


 Summary: PartitionITCase#ComparablePojo should be public
 Key: FLINK-32336
 URL: https://issues.apache.org/jira/browse/FLINK-32336
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


POJOs should be public, but this one is private, forcing it to go through Kryo, 
which is currently failing for some odd reason.





[jira] [Created] (FLINK-32330) Setup Java 17 in e2e builds

2023-06-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32330:


 Summary: Setup Java 17 in e2e builds
 Key: FLINK-32330
 URL: https://issues.apache.org/jira/browse/FLINK-32330
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-32329) Do not overwrite env.java.opts.all in HA e2e test

2023-06-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32329:


 Summary: Do not overwrite env.java.opts.all in HA e2e test
 Key: FLINK-32329
 URL: https://issues.apache.org/jira/browse/FLINK-32329
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Avoid overriding env.java.opts.all since it will soon contain the module 
declarations required for running Java 17.

This is a bit of a hack; a nicer approach would be to append to the existing 
value, but ain't no one got time to deal with bash.





[jira] [Created] (FLINK-32328) Ensure surefire baseLine is picked up by IntelliJ

2023-06-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32328:


 Summary: Ensure surefire baseLine is picked up by IntelliJ
 Key: FLINK-32328
 URL: https://issues.apache.org/jira/browse/FLINK-32328
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


We currently configure JVM arguments exclusively within the surefire 
executions, which IntelliJ doesn't read. We should also set the baseArgsLine 
(which in the future will contain module declarations) to the base surefire 
configuration.





[jira] [Created] (FLINK-32327) Python Kafka connector runs into strange NullPointerException

2023-06-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32327:


 Summary: Python Kafka connector runs into strange 
NullPointerException
 Key: FLINK-32327
 URL: https://issues.apache.org/jira/browse/FLINK-32327
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Chesnay Schepler


The following error occurs when running the python kafka tests:
(this uses a slightly modified version of the code, but the error also happens 
without it)

{code:python}
    def set_record_serializer(self, record_serializer: 'KafkaRecordSerializationSchema') \
            -> 'KafkaSinkBuilder':
        """
        Sets the :class:`KafkaRecordSerializationSchema` that transforms incoming records to kafka
        producer records.

        :param record_serializer: The :class:`KafkaRecordSerializationSchema`.
        """
        # NOTE: If topic selector is a generated first-column selector, do extra preprocessing
        j_topic_selector = get_field_value(record_serializer._j_serialization_schema,
                                           'topicSelector')

        caching_name_suffix = 'KafkaRecordSerializationSchemaBuilder.CachingTopicSelector'
        if j_topic_selector.getClass().getCanonicalName().endswith(caching_name_suffix):
            class_name = get_field_value(j_topic_selector, 'topicSelector')\
                .getClass().getCanonicalName()
>           if class_name.startswith('com.sun.proxy') or class_name.startswith('jdk.proxy'):
E           AttributeError: 'NoneType' object has no attribute 'startswith'
{code}

My assumption is that {{getCanonicalName}} returns {{null}} for some objects, 
and this set of objects may have increased in Java 17. I tried adding a null 
check, but that caused other tests to fail.
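
For reference, {{Class#getCanonicalName}} is documented to return {{null}} for anonymous and local classes, among others. The sketch below shows this on the Java side, together with a null-safe variant of the proxy-name check. The class and method names are made up for illustration, and the null-safe check is not claimed to resolve this issue, since a null check reportedly broke other tests.

```java
class CanonicalNameSketch {
    // Anonymous classes have no canonical name, so this returns null.
    static String canonicalNameOfAnonymousClass() {
        Object anonymous = new Object() {};
        return anonymous.getClass().getCanonicalName();
    }

    // Ordinary top-level classes do have one.
    static String canonicalNameOfString() {
        return "x".getClass().getCanonicalName();
    }

    // Null-safe version of the startswith check from the Python snippet above.
    static boolean isProxyClassName(String className) {
        return className != null
                && (className.startsWith("com.sun.proxy") || className.startsWith("jdk.proxy"));
    }

    public static void main(String[] args) {
        System.out.println(canonicalNameOfAnonymousClass());
        System.out.println(canonicalNameOfString());
    }
}
```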





[jira] [Created] (FLINK-32314) Ignore class-loading errors after RPC system shutdown

2023-06-12 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32314:


 Summary: Ignore class-loading errors after RPC system shutdown
 Key: FLINK-32314
 URL: https://issues.apache.org/jira/browse/FLINK-32314
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / RPC, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


In tests we occasionally see the akka rpc service throwing class loading errors 
_after_ it was shut down.
AFAICT our shutdown procedure is correct, and it's just akka shutting down some 
things asynchronously.
I couldn't figure out why/what is still running, so as a bandaid I suggest to 
ignore classloading errors after the rpc service shutdown has completed.





[jira] [Created] (FLINK-32304) Reduce rpc-akka jar

2023-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32304:


 Summary: Reduce rpc-akka jar
 Key: FLINK-32304
 URL: https://issues.apache.org/jira/browse/FLINK-32304
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Runtime / RPC
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.2


We bundle unnecessary dependencies in the rpc-akka jar; we can easily shave off 
15 MB of dependencies.





[jira] [Created] (FLINK-32302) Disable Hbase 2.x tests on Java 17

2023-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32302:


 Summary: Disable Hbase 2.x tests on Java 17
 Key: FLINK-32302
 URL: https://issues.apache.org/jira/browse/FLINK-32302
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / HBase, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Lacking support on the HBase side. Version bumps may solve it, but that's out 
of scope of this issue since the connector is being externalized.





[jira] [Created] (FLINK-32301) common.sh#create_ha_config should use set_config_key

2023-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32301:


 Summary: common.sh#create_ha_config should use set_config_key
 Key: FLINK-32301
 URL: https://issues.apache.org/jira/browse/FLINK-32301
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Instead of replacing the entire configuration, set the desired individual 
options instead.
The current approach isn't great because it prevents us from setting required 
defaults in the flink-dist config.





[jira] [Created] (FLINK-32297) Use Temurin image in FlinkImageBuilder

2023-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32297:


 Summary: Use Temurin image in FlinkImageBuilder
 Key: FLINK-32297
 URL: https://issues.apache.org/jira/browse/FLINK-32297
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


The FlinkImageBuilder currently uses openjdk images. I've seen issues with 
these on Java 17, and propose to use Temurin, similar to the prod images.





[jira] [Created] (FLINK-32295) Try out Infra-provided Gradle Enterprise

2023-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32295:


 Summary: Try out Infra-provided Gradle Enterprise
 Key: FLINK-32295
 URL: https://issues.apache.org/jira/browse/FLINK-32295
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


Infra has a Gradle Enterprise instance that can be used for GitHub Actions 
branch builds (not PRs). We could try this out in one of the connector repos to 
see if it provides value to us; if so, rolling this out to all 
connector/auxiliary repos could be interesting.





[jira] [Created] (FLINK-32291) Hive E2E test fails consistently

2023-06-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32291:


 Summary: Hive E2E test fails consistently
 Key: FLINK-32291
 URL: https://issues.apache.org/jira/browse/FLINK-32291
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Hive, Tests
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
 Fix For: 1.18.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=49754=results





[jira] [Created] (FLINK-32290) Enable -XX:+IgnoreUnrecognizedVMOptions

2023-06-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32290:


 Summary: Enable -XX:+IgnoreUnrecognizedVMOptions
 Key: FLINK-32290
 URL: https://issues.apache.org/jira/browse/FLINK-32290
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Build System, Deployment / YARN
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


We can make our lives a lot easier by enabling {{IgnoreUnrecognizedVMOptions}} 
for all processes. With this we can set add-opens/add-exports independent of 
what JDK is actually being used, removing a major source of complexity.





[jira] [Created] (FLINK-32239) Unify TestJvmProcess and TestProcessBuilder

2023-06-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32239:


 Summary: Unify TestJvmProcess and TestProcessBuilder
 Key: FLINK-32239
 URL: https://issues.apache.org/jira/browse/FLINK-32239
 Project: Flink
  Issue Type: Technical Debt
  Components: Test Infrastructure
Reporter: Chesnay Schepler
 Fix For: 1.18.0


Both of these utility classes are used to spawn additional JVM processes during 
tests, and contain a fair bit of duplicated logic. We can unify them to ease 
maintenance.





[jira] [Created] (FLINK-32238) Stable approach for installing libssl

2023-06-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32238:


 Summary: Stable approach for installing libssl
 Key: FLINK-32238
 URL: https://issues.apache.org/jira/browse/FLINK-32238
 Project: Flink
  Issue Type: Technical Debt
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.16.3, 1.17.2


I think I found a stable way to install libssl on CI.





[jira] [Created] (FLINK-32236) Ease YarnTestBase allowlist address regex

2023-06-01 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32236:


 Summary: Ease YarnTestBase allowlist address regex
 Key: FLINK-32236
 URL: https://issues.apache.org/jira/browse/FLINK-32236
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / YARN, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


The YarnTestBase contains allow-list items like this:
{code}Remote connection to \\[null\\] failed with java.net.ConnectException: Connection refused{code}

I've seen this exception a few times without the address being null. This could 
be due to difference in how Java 17 resolves addresses (?).
In any case I don't see any harm in relaxing this regex to accept any address.
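
A minimal sketch of the relaxation using {{java.util.regex}}. The strict pattern is the allow-list entry quoted above; the relaxed pattern and the sample address are assumptions for illustration, not the exact change that was made.

```java
import java.util.regex.Pattern;

class AllowlistRegexSketch {
    // The existing allow-list entry: only matches a literal "[null]" address.
    static final Pattern STRICT =
            Pattern.compile(
                    "Remote connection to \\[null\\] failed with java.net.ConnectException: Connection refused");

    // Relaxed variant (assumption): accepts any address between the brackets.
    static final Pattern RELAXED =
            Pattern.compile(
                    "Remote connection to \\[.*\\] failed with java.net.ConnectException: Connection refused");

    static boolean matches(Pattern pattern, String logLine) {
        return pattern.matcher(logLine).find();
    }

    public static void main(String[] args) {
        // Hypothetical log line with a resolved address instead of "null".
        String line =
                "Remote connection to [/127.0.0.1:6123] failed with java.net.ConnectException: Connection refused";
        System.out.println(matches(STRICT, line) + " " + matches(RELAXED, line));
    }
}
```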





[jira] [Created] (FLINK-32226) RestClusterClient leaks jobgraph file if submission fails

2023-05-31 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32226:


 Summary: RestClusterClient leaks jobgraph file if submission fails
 Key: FLINK-32226
 URL: https://issues.apache.org/jira/browse/FLINK-32226
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.2


{code:java}
submissionFuture
        .thenCompose(ignored -> jobGraphFileFuture)
        .thenAccept(
                jobGraphFile -> {
                    try {
                        Files.delete(jobGraphFile);
                    } catch (IOException e) {
                        LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
                    }
                });
{code}
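
Why a chain of this shape leaks the file on failure can be shown with a self-contained sketch: {{thenCompose}}/{{thenAccept}} stages are skipped when the upstream future completes exceptionally, whereas {{whenComplete}} runs in both cases. The class and method names are hypothetical, and the {{whenComplete}} variant is only one possible remedy, not necessarily the fix that was applied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

class JobGraphCleanupSketch {
    private static void deleteQuietly(Path file) {
        try {
            Files.delete(file);
        } catch (IOException e) {
            System.err.println("Could not delete temporary file " + file);
        }
    }

    // Returns true if the temporary file still exists after the submission completed.
    static boolean leaksFile(boolean submissionFails, boolean cleanUpOnFailure) {
        try {
            Path file = Files.createTempFile("jobgraph-sketch", ".bin");
            CompletableFuture<Path> fileFuture = CompletableFuture.completedFuture(file);
            CompletableFuture<Void> submission = new CompletableFuture<>();

            if (cleanUpOnFailure) {
                // Runs for both normal and exceptional completion.
                submission.whenComplete(
                        (ignored, error) ->
                                fileFuture.thenAccept(JobGraphCleanupSketch::deleteQuietly));
            } else {
                // Same shape as the snippet above: skipped if the submission failed.
                submission
                        .thenCompose(ignored -> fileFuture)
                        .thenAccept(JobGraphCleanupSketch::deleteQuietly);
            }

            if (submissionFails) {
                submission.completeExceptionally(new RuntimeException("submission failed"));
            } else {
                submission.complete(null);
            }

            boolean leaked = Files.exists(file);
            Files.deleteIfExists(file); // keep the sketch itself from leaking files
            return leaked;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```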






[jira] [Created] (FLINK-32185) Remove M2_HOME usages

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32185:


 Summary: Remove M2_HOME usages
 Key: FLINK-32185
 URL: https://issues.apache.org/jira/browse/FLINK-32185
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


Apparently M2_HOME is no longer evaluated by Maven, so we either need to adjust 
some CI stuff or outright remove existing usages.





[jira] [Created] (FLINK-32184) Use revision version property

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32184:


 Summary: Use revision version property
 Key: FLINK-32184
 URL: https://issues.apache.org/jira/browse/FLINK-32184
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


With the revision property we can centrally define the project version in the 
root pom, and no longer have to change the poms of all modules when creating a 
release.





[jira] [Created] (FLINK-32183) Use maven.multiModuleProjectDirectory property instead of rootDir plugin

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32183:


 Summary: Use maven.multiModuleProjectDirectory property instead of 
rootDir plugin
 Key: FLINK-32183
 URL: https://issues.apache.org/jira/browse/FLINK-32183
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


Drop the now redundant rootDir plugin in favor of a new built-in property.





[jira] [Created] (FLINK-32182) Use original japicmp plugin

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32182:


 Summary: Use original japicmp plugin
 Key: FLINK-32182
 URL: https://issues.apache.org/jira/browse/FLINK-32182
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.0


We currently use a japicmp fork for maven 3.2.5 compatibility, which we can now 
drop.





[jira] [Created] (FLINK-32181) Drop support for Maven 3.2.5

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32181:


 Summary: Drop support for Maven 3.2.5
 Key: FLINK-32181
 URL: https://issues.apache.org/jira/browse/FLINK-32181
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System
Reporter: Chesnay Schepler
 Fix For: 1.19.0


Collection of improvements we can make when dropping support for Maven 3.2.5.





[jira] [Created] (FLINK-32179) Handle more repo names for automatic dist discovery

2023-05-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32179:


 Summary: Handle more repo names for automatic dist discovery
 Key: FLINK-32179
 URL: https://issues.apache.org/jira/browse/FLINK-32179
 Project: Flink
  Issue Type: Technical Debt
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.2


The e2e tests have a routine to auto-detect the distribution that they need to 
actually run Flink. When Flink is checked out in a directory not starting with 
"flink" the auto-discovery doesn't find it.
We can improve this slightly by adjusting the iteration condition.





[jira] [Created] (FLINK-32169) Show allocated slots on TM page

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32169:


 Summary: Show allocated slots on TM page
 Key: FLINK-32169
 URL: https://issues.apache.org/jira/browse/FLINK-32169
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Runtime / Web Frontend
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Show the allocated slots on the TM page, so that you can better understand 
which job is consuming what resources.





[jira] [Created] (FLINK-32168) Log required/available resources in RM

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32168:


 Summary: Log required/available resources in RM
 Key: FLINK-32168
 URL: https://issues.apache.org/jira/browse/FLINK-32168
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


When matching requirements against available resources the RM currently doesn't 
log anything apart from whether it could fulfill the requirements or not.

We can make the system easier to audit by logging the current requirements, 
available resources, and how many resources are left after the matching.





[jira] [Created] (FLINK-32167) Log dynamic slot creation on task manager

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32167:


 Summary: Log dynamic slot creation on task manager
 Key: FLINK-32167
 URL: https://issues.apache.org/jira/browse/FLINK-32167
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


When a slot is dynamically allocated on the TM we should log that this happens, 
what resources it consumes and what the remaining resources are.





[jira] [Created] (FLINK-32166) Show unassigned/total TM resources in web ui

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32166:


 Summary: Show unassigned/total TM resources in web ui
 Key: FLINK-32166
 URL: https://issues.apache.org/jira/browse/FLINK-32166
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


It is important to know how many resources of a TM are currently _assigned_ to 
jobs.
This is different from the resources currently _used_, since a job can have 
1 GB of memory assigned while only using 10 MB at this time.





[jira] [Created] (FLINK-32165) Improve observability of fine-grained resource management

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32165:


 Summary: Improve observability of fine-grained resource management
 Key: FLINK-32165
 URL: https://issues.apache.org/jira/browse/FLINK-32165
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / Web Frontend
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Right now fine-grained resource management is way too much of a black box, with 
the only source of information being the taskmanager rest endpoints.

While this is fine-ish for services built around it, the developer experience is 
suffering greatly and it becomes impossible to reason about the system 
afterwards (because we don't even log anything).





[jira] [Created] (FLINK-32162) Misleading log message due to missing null check

2023-05-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32162:


 Summary: Misleading log message due to missing null check
 Key: FLINK-32162
 URL: https://issues.apache.org/jira/browse/FLINK-32162
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Updating the job requirements always logs "Failed to update requirements for 
job {}." because we don't check whether the error is non-null before logging.
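
The pattern can be sketched with a plain {{CompletableFuture}}. The class and method names below are hypothetical and a list stands in for the logger; the point is that a {{whenComplete}} callback also runs on success, with a {{null}} error, so the message must be guarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class NullCheckLogSketch {
    // Collects "log lines" in a list; checkError toggles the missing null check.
    static List<String> updateRequirements(CompletableFuture<Void> update, boolean checkError) {
        List<String> log = new ArrayList<>();
        update.whenComplete(
                (ack, error) -> {
                    // whenComplete also runs on success, in which case error is null.
                    if (!checkError || error != null) {
                        log.add("Failed to update requirements for job {}.");
                    }
                });
        return log;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> success = CompletableFuture.completedFuture(null);
        System.out.println(updateRequirements(success, false).size());
        System.out.println(updateRequirements(success, true).size());
    }
}
```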





[jira] [Created] (FLINK-32154) Setup checkstyle rule to forbid mockito/powermock

2023-05-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32154:


 Summary: Setup checkstyle rule to forbid mockito/powermock
 Key: FLINK-32154
 URL: https://issues.apache.org/jira/browse/FLINK-32154
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-32153) Limit powermock to flink-core/-runtime

2023-05-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32153:


 Summary: Limit powermock to flink-core/-runtime
 Key: FLINK-32153
 URL: https://issues.apache.org/jira/browse/FLINK-32153
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-32152) Consolidate mocking library usage

2023-05-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32152:


 Summary: Consolidate mocking library usage
 Key: FLINK-32152
 URL: https://issues.apache.org/jira/browse/FLINK-32152
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Use mockito instead of powermock wherever possible, with the goal of 
restricting powermock to specific modules, eventually dropping it entirely.





[jira] [Created] (FLINK-32149) Remove some unnecessary mocking usages

2023-05-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-32149:


 Summary: Remove some unnecessary mocking usages
 Key: FLINK-32149
 URL: https://issues.apache.org/jira/browse/FLINK-32149
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31972) Remove powermock whitebox usages

2023-04-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31972:


 Summary: Remove powermock whitebox usages
 Key: FLINK-31972
 URL: https://issues.apache.org/jira/browse/FLINK-31972
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31971) Drop HadoopRecoverableWriterOldHadoopWithNoTruncateSupportTest

2023-04-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31971:


 Summary: Drop 
HadoopRecoverableWriterOldHadoopWithNoTruncateSupportTest
 Key: FLINK-31971
 URL: https://issues.apache.org/jira/browse/FLINK-31971
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Hadoop Compatibility, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


This test explicitly checks behavior for Hadoop < 2.7, which we no longer 
support.





[jira] [Created] (FLINK-31940) DataStreamCsvITCase#CityPojo should be public

2023-04-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31940:


 Summary: DataStreamCsvITCase#CityPojo should be public
 Key: FLINK-31940
 URL: https://issues.apache.org/jira/browse/FLINK-31940
 Project: Flink
  Issue Type: Sub-task
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Since the class is package-private it is serialized via Kryo and not the pojo 
serializer.





[jira] [Created] (FLINK-31934) Remove mocking in RocksDB tests

2023-04-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31934:


 Summary: Remove mocking in RocksDB tests
 Key: FLINK-31934
 URL: https://issues.apache.org/jira/browse/FLINK-31934
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31933) Remove Whitebox usage in ExpressionKeysTest

2023-04-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31933:


 Summary: Remove Whitebox usage in ExpressionKeysTest
 Key: FLINK-31933
 URL: https://issues.apache.org/jira/browse/FLINK-31933
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Reduce illegal reflective accesses.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31919) Skip ClosureCleaner if object can be serialized

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31919:


 Summary: Skip ClosureCleaner if object can be serialized
 Key: FLINK-31919
 URL: https://issues.apache.org/jira/browse/FLINK-31919
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


Given an object the ClosureCleaner currently recursively digs into every 
non-static/transient field of the given object. This causes a problem on Java 
17 because these reflective accesses all need to be explicitly allowed 
beforehand.

Instead, we could limit the CC to objects that fail serialization, because if 
something can be serialized there isn't anything for the CC to do.
This should allow us to avoid a lot of unnecessary reflective accesses to 
immutable JDK classes such as String and BigDecimal.
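
A minimal sketch of the proposed check, assuming plain Java serialization is the criterion. The class and method names are hypothetical and this is not the actual ClosureCleaner code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

/** Hypothetical helper: decide whether an object needs closure cleaning at all. */
final class SerializationProbe {

    /** Attempts Java serialization into a throwaway buffer. */
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    /** Only objects that fail serialization would be handed to the cleaner. */
    static boolean needsCleaning(Object closure) {
        return !isSerializable(closure);
    }

    public static void main(String[] args) {
        System.out.println(needsCleaning("a string"));
        System.out.println(needsCleaning(new Object()));
    }
}
```

The trade-off in this sketch is one extra serialization attempt per object in exchange for skipping the reflective field walk in the common case.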





[jira] [Created] (FLINK-31916) Python API only respects deprecated env.java.opts key

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31916:


 Summary: Python API only respects deprecated env.java.opts key
 Key: FLINK-31916
 URL: https://issues.apache.org/jira/browse/FLINK-31916
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Runtime / Configuration
Reporter: Chesnay Schepler
 Fix For: 1.18.0


pyflink_gateway_server.py only reads the deprecated env.java.opts key from the 
configuration.

This key should only be used as a fallback, with env.java.opts.tm/jm/client 
being the actual keys to support.
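
The intended lookup order could be sketched as follows. This is an illustration in Java with a plain map standing in for the configuration; the helper name is hypothetical and the actual fix would live in the Python gateway script.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup helper: prefer the specific key, fall back to the deprecated one. */
final class JavaOptsLookup {

    static String resolve(Map<String, String> config, String preferredKey, String fallbackKey) {
        String value = config.get(preferredKey);
        if (value != null) {
            return value;
        }
        // Deprecated key is consulted only when the preferred key is absent.
        return config.getOrDefault(fallbackKey, "");
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("env.java.opts", "-Xmx512m");
        System.out.println(resolve(config, "env.java.opts.client", "env.java.opts"));
        config.put("env.java.opts.client", "-Xmx1g");
        System.out.println(resolve(config, "env.java.opts.client", "env.java.opts"));
    }
}
```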





[jira] [Created] (FLINK-31915) Python API incorrectly passes env.java.opts as single argument

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31915:


 Summary: Python API incorrectly passes env.java.opts as single 
argument
 Key: FLINK-31915
 URL: https://issues.apache.org/jira/browse/FLINK-31915
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.16.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


The Python API passes all Java options as a single string argument, which 
typically means that the JVM will reject them.
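
To illustrate the difference, here is a sketch that splits the option string into separate arguments before building the command line. The helper name is hypothetical, and the whitespace split is a simplifying assumption that ignores quoted values containing spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical command builder: each JVM option becomes its own argument. */
final class JvmArgs {

    static List<String> buildCommand(String javaOpts, String mainClass) {
        List<String> command = new ArrayList<>();
        command.add("java");
        String trimmed = javaOpts.trim();
        if (!trimmed.isEmpty()) {
            // Naive split on whitespace; quoted options with spaces are not handled.
            command.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("-Xmx1g  -Dfoo=bar", "org.example.Main"));
    }
}
```

Passing "-Xmx1g -Dfoo=bar" as one argument would make the JVM see a single unknown option, whereas the list form hands it two separate options.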





[jira] [Created] (FLINK-31913) sql-client.sh does not respect env.java.opts.all/client

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31913:


 Summary: sql-client.sh does not respect env.java.opts.all/client
 Key: FLINK-31913
 URL: https://issues.apache.org/jira/browse/FLINK-31913
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Scripts, Table SQL / Client
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31912) Upgrade bytebuddy

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31912:


 Summary: Upgrade bytebuddy
 Key: FLINK-31912
 URL: https://issues.apache.org/jira/browse/FLINK-31912
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31911) Bad address construction in SqlClientTest

2023-04-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31911:


 Summary: Bad address construction in SqlClientTest
 Key: FLINK-31911
 URL: https://issues.apache.org/jira/browse/FLINK-31911
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client, Tests
Affects Versions: 1.16.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


The SqlClientTest constructs a host:port pair with this:

{code}
InetSocketAddress.createUnresolved(
                SQL_GATEWAY_REST_ENDPOINT_EXTENSION.getTargetAddress(),
                SQL_GATEWAY_REST_ENDPOINT_EXTENSION.getTargetPort())
        .toString()
{code}

This is unnecessarily complicated and fails on Java 17 because the toString 
representation is _not_ guaranteed to return something of the form host:port.
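
A simpler alternative could look like the sketch below, which builds the pair directly from the host and port rather than relying on toString. The class and method names are hypothetical; getHostString() and getPort() are standard InetSocketAddress accessors.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper: build host:port without relying on InetSocketAddress#toString. */
final class HostPort {

    static String format(String host, int port) {
        return host + ":" + port;
    }

    /** Variant for callers that already hold an address object. */
    static String format(InetSocketAddress address) {
        return address.getHostString() + ":" + address.getPort();
    }

    public static void main(String[] args) {
        System.out.println(format("localhost", 8083));
        System.out.println(format(InetSocketAddress.createUnresolved("localhost", 8083)));
    }
}
```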





[jira] [Created] (FLINK-31807) Test architecture tests don't cover all tests

2023-04-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31807:


 Summary: Test architecture tests don't cover all tests
 Key: FLINK-31807
 URL: https://issues.apache.org/jira/browse/FLINK-31807
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Cassandra
Affects Versions: cassandra-4.0.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler








[jira] [Created] (FLINK-31806) Prod architecture tests didn't detect non-public API usage

2023-04-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31806:


 Summary: Prod architecture tests didn't detect non-public API usage
 Key: FLINK-31806
 URL: https://issues.apache.org/jira/browse/FLINK-31806
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Cassandra, Tests
Affects Versions: cassandra-3.0.0, 1.18.0
Reporter: Chesnay Schepler


FLINK-31805 wasn't detected by the production architecture tests.

Not sure if this is an issue on the Cassandra or Flink side.





[jira] [Created] (FLINK-31805) Cassandra Source shouldn't use IOUtils

2023-04-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31805:


 Summary: Cassandra Source shouldn't use IOUtils
 Key: FLINK-31805
 URL: https://issues.apache.org/jira/browse/FLINK-31805
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Cassandra
Affects Versions: cassandra-4.0.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: cassandra-4.0.0


IOUtils is not part of the public API and shouldn't be used.





[jira] [Created] (FLINK-31804) ITCase MiniCluster test architecture rule should accept MiniClusterTestEnvironment

2023-04-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31804:


 Summary: ITCase MiniCluster test architecture rule should accept 
MiniClusterTestEnvironment
 Key: FLINK-31804
 URL: https://issues.apache.org/jira/browse/FLINK-31804
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Common, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31744) Extend Adaptive Scheduler sparse EG to contain maxParallelism

2023-04-06 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31744:


 Summary: Extend Adaptive Scheduler sparse EG to contain 
maxParallelism
 Key: FLINK-31744
 URL: https://issues.apache.org/jira/browse/FLINK-31744
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / REST
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


When a job is WaitingForResources the adaptive scheduler returns a sparse 
execution graph that omits many details that are only known at execution time 
(like subtasks).

We could include all JobVertex-level information though, which would cover 
things like the vertex id/name and the maxParallelism.





[jira] [Created] (FLINK-31738) FlameGraphTypeQueryParameter#Type clashes with java.reflect.Type in generated clients

2023-04-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31738:


 Summary: FlameGraphTypeQueryParameter#Type clashes with 
java.reflect.Type in generated clients
 Key: FLINK-31738
 URL: https://issues.apache.org/jira/browse/FLINK-31738
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.1


Generating a client with the openapi generators causes compile errors because 
the generated file imports java.reflect.Type, but also the generated "Type" 
model.

For convenience it would be neat to give this enum a slightly different name, 
because working around this issue is surprisingly annoying.





[jira] [Created] (FLINK-31735) JobDetailsInfo plan incorrectly documented as string

2023-04-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31735:


 Summary: JobDetailsInfo plan incorrectly documented as string
 Key: FLINK-31735
 URL: https://issues.apache.org/jira/browse/FLINK-31735
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.1


The {{plan}} field in the JobDetailsInfo contains an object, not a string. 
Internally we handle it as a string, but write it out as an object.
The docs generators aren't aware of this.





[jira] [Created] (FLINK-31733) Model name clashes in OpenAPI spec

2023-04-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31733:


 Summary: Model name clashes in OpenAPI spec
 Key: FLINK-31733
 URL: https://issues.apache.org/jira/browse/FLINK-31733
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.1


The OpenAPI spec uses simple class names for naming models. There are, however, 
several models, usually inner classes, that share simple names, like "Summary".

This goes undetected and breaks the model for some API calls.
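
The clash can be reproduced with a small sketch, and one possible way to detect it would be to count simple names. The nested classes here are hypothetical stand-ins, not the actual REST model classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical detector for models whose simple class names collide. */
final class ModelNameClash {

    static final class CheckpointStats {
        static final class Summary {}
    }

    static final class SubtaskStats {
        static final class Summary {}
    }

    /** Returns every simple name that is used by more than one model class. */
    static List<String> clashingSimpleNames(Class<?>... models) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Class<?> model : models) {
            counts.merge(model.getSimpleName(), 1, Integer::sum);
        }
        List<String> clashes = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                clashes.add(entry.getKey());
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        System.out.println(
                clashingSimpleNames(
                        CheckpointStats.Summary.class,
                        SubtaskStats.Summary.class,
                        CheckpointStats.class));
    }
}
```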





[jira] [Created] (FLINK-31728) Remove Scala API dependencies from batch/streaming examples

2023-04-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31728:


 Summary: Remove Scala API dependencies from batch/streaming 
examples
 Key: FLINK-31728
 URL: https://issues.apache.org/jira/browse/FLINK-31728
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System, Examples
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0


The example modules have leftover Scala API dependencies and build 
infrastructure. Remove them, along with the scala suffix on these modules.





[jira] [Created] (FLINK-31711) OpenAPI spec omits complete-statement request body

2023-04-03 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31711:


 Summary: OpenAPI spec omits complete-statement request body
 Key: FLINK-31711
 URL: https://issues.apache.org/jira/browse/FLINK-31711
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.1


The OpenAPI generator omits request bodies for GET requests because sending 
one is usually a bad idea.

Still, the generator shouldn't omit them on its own.





[jira] [Created] (FLINK-31672) Requirement validation does not take user-specified maxParallelism into account

2023-03-30 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31672:


 Summary: Requirement validation does not take user-specified 
maxParallelism into account
 Key: FLINK-31672
 URL: https://issues.apache.org/jira/browse/FLINK-31672
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31657) ConfigurationInfo generates incorrect openapi schema

2023-03-29 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31657:


 Summary: ConfigurationInfo generates incorrect openapi schema
 Key: FLINK-31657
 URL: https://issues.apache.org/jira/browse/FLINK-31657
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.16.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.2, 1.18.0, 1.17.1


ConfigurationInfo extends ArrayList, and the schema generator picks up 
List#isEmpty as a property.
This results in an invalid schema, as arrays can't have properties.





[jira] [Created] (FLINK-31608) Re-evaluate 'min-parallelism-increase' option

2023-03-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31608:


 Summary: Re-evaluate 'min-parallelism-increase' option
 Key: FLINK-31608
 URL: https://issues.apache.org/jira/browse/FLINK-31608
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration, Runtime / Coordination
Reporter: Chesnay Schepler
 Fix For: 1.18.0


This option was meant to prevent scale-up operations where the benefit doesn't 
outweigh the cost, like scaling up to increase a single vertex's parallelism by 
1. Meanwhile, scale-down operations were always immediately executed, because 
they were always the result of a stopped TaskManager, causing the job to 
restart anyway.

Now that users can change the requirements at will this has changed, and the 
expected behavior is overall undefined.

We need to answer:
* should there be a dedicated option for limiting scale-down operations if the 
requirements were changed?
* should the min-parallelism-*increase* option be generalized to a 
min-parallelism-*change* option?
* how shall operations be handled that scale different vertices up or down at 
the same time? So far the decision was made on the cumulative parallelism 
change, but in this case the parallelism distribution can change significantly 
while the cumulative change is 0.
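
The last point can be made concrete with a small worked example. The sketch below is not scheduler code; the vertex names and the absolute-change metric are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical metrics for comparing two per-vertex parallelism assignments. */
final class ParallelismChange {

    /** Sum of (new - old) over all vertices; scale-ups and scale-downs cancel out. */
    static int cumulativeChange(Map<String, Integer> before, Map<String, Integer> after) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : after.entrySet()) {
            sum += entry.getValue() - before.getOrDefault(entry.getKey(), 0);
        }
        return sum;
    }

    /** Sum of |new - old| over all vertices; sensitive to redistribution. */
    static int totalAbsoluteChange(Map<String, Integer> before, Map<String, Integer> after) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : after.entrySet()) {
            sum += Math.abs(entry.getValue() - before.getOrDefault(entry.getKey(), 0));
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> before = new HashMap<>();
        before.put("source", 4);
        before.put("sink", 1);
        Map<String, Integer> after = new HashMap<>();
        after.put("source", 1);
        after.put("sink", 4);
        System.out.println(cumulativeChange(before, after));
        System.out.println(totalAbsoluteChange(before, after));
    }
}
```

Swapping the parallelism of the two vertices (4/1 to 1/4) gives a cumulative change of 0 even though every vertex was rescaled.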





[jira] [Created] (FLINK-31591) Extend JobGraphWriter to persist requirements

2023-03-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31591:


 Summary: Extend JobGraphWriter to persist requirements
 Key: FLINK-31591
 URL: https://issues.apache.org/jira/browse/FLINK-31591
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31590) Allow setting JobResourceRequirements through JobMasterGateway

2023-03-23 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31590:


 Summary: Allow setting JobResourceRequirements through 
JobMasterGateway
 Key: FLINK-31590
 URL: https://issues.apache.org/jira/browse/FLINK-31590
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0








[jira] [Created] (FLINK-31475) Allow project to be user-defined in release scripts

2023-03-15 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31475:


 Summary: Allow project to be user-defined in release scripts
 Key: FLINK-31475
 URL: https://issues.apache.org/jira/browse/FLINK-31475
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Common, Release System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


The connector release scripts derive the project name from the repository.
For some esoteric cases (like the flink-connector-shared-utils repo) it would 
be beneficial to be able to override this on the command-line.





[jira] [Created] (FLINK-31455) Add a simple test project for running CI workflow in shared repo

2023-03-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31455:


 Summary: Add a simple test project for running CI workflow in 
shared repo
 Key: FLINK-31455
 URL: https://issues.apache.org/jira/browse/FLINK-31455
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI, Connectors / Common
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


connector-shared-utils has no CI for the CI workflow, which has repeatedly 
proven to be a problem.
Set up some simple workflows that at least run CI.





[jira] [Created] (FLINK-31454) Shared CI workflow always caches snapshot binaries

2023-03-14 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-31454:


 Summary: Shared CI workflow always caches snapshot binaries
 Key: FLINK-31454
 URL: https://issues.apache.org/jira/browse/FLINK-31454
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI, Connectors / Common
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


The if conditions need to work on strings because they use environment 
variables.





[jira] [Created] (FLINK-30967) Add MongoDB connector documentation

2023-02-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30967:


 Summary: Add MongoDB connector documentation
 Key: FLINK-30967
 URL: https://issues.apache.org/jira/browse/FLINK-30967
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / MongoDB, Documentation
Reporter: Chesnay Schepler
 Fix For: mongodb-1.0.0








[jira] [Created] (FLINK-30963) Switch binary downloads to archive.apache.org

2023-02-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30963:


 Summary: Switch binary downloads to archive.apache.org
 Key: FLINK-30963
 URL: https://issues.apache.org/jira/browse/FLINK-30963
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI, Connectors / Common
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


archive.apache.org is the only stable download link for binaries.
Now that we properly fixed the binary caching in the connector workflows it 
should be fine to make use of it.





[jira] [Created] (FLINK-30930) Automatically determine Flink binary download URL from version

2023-02-06 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30930:


 Summary: Automatically determine Flink binary download URL from 
version
 Key: FLINK-30930
 URL: https://issues.apache.org/jira/browse/FLINK-30930
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler








[jira] [Created] (FLINK-30895) SlotSharingSlotAllocator may waste slots

2023-02-03 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30895:


 Summary: SlotSharingSlotAllocator may waste slots
 Key: FLINK-30895
 URL: https://issues.apache.org/jira/browse/FLINK-30895
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.16.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.0, 1.17.1


The allocator evenly distributes slots across slot sharing groups independent 
of how many slots the vertices in that group actually need.

This can cause slots to be unused.
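
A deliberately simplified model of the effect, assuming each group receives an equal integer share of the free slots. The names are hypothetical and this is not the SlotSharingSlotAllocator implementation.

```java
/** Hypothetical model: count slots that an even split hands to groups that cannot use them. */
final class EvenSlotSplit {

    static int unusedSlots(int freeSlots, int[] slotsNeededPerGroup) {
        if (slotsNeededPerGroup.length == 0) {
            throw new IllegalArgumentException("at least one slot sharing group is required");
        }
        // Every group gets the same share, regardless of its actual need.
        int share = freeSlots / slotsNeededPerGroup.length;
        int unused = 0;
        for (int needed : slotsNeededPerGroup) {
            unused += Math.max(0, share - needed);
        }
        return unused;
    }

    public static void main(String[] args) {
        // 6 free slots, two groups needing 1 and 5 slots respectively.
        System.out.println(unusedSlots(6, new int[] {1, 5}));
    }
}
```

With 6 free slots and groups needing 1 and 5, each group gets 3: the first leaves 2 idle while the second runs below the parallelism it could have had.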




