[jira] [Created] (FLINK-35832) IFNULL returns error result in Flink SQL
Yu Chen created FLINK-35832:
---
Summary: IFNULL returns error result in Flink SQL
Key: FLINK-35832
URL: https://issues.apache.org/jira/browse/FLINK-35832
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 2.0.0
Reporter: Yu Chen

Run the following SQL in sql-client. The correct result should be '16', but we got '1' on master.

{code:java}
Flink SQL> SET 'sql-client.execution.result-mode' = 'tableau';
[INFO] Execute statement succeeded.

Flink SQL> select JSON_VALUE('{"a":16}','$.a'), IFNULL(JSON_VALUE('{"a":16}','$.a'),'0');
+----+--------+--------+
| op | EXPR$0 | EXPR$1 |
+----+--------+--------+
| +I |     16 |      1 |
+----+--------+--------+
Received a total of 1 row (0.30 seconds)
{code}

After some quick debugging, I suspect this is caused by [FLINK-24413|https://issues.apache.org/jira/browse/FLINK-24413], which was introduced in Flink 1.15.

I think the wrong result '1' was produced because the SQL simplification step assumed that the first and second arguments of IFNULL have the same type (the literal '0' is a CHAR) and therefore implicitly cast '16' to CHAR, which truncated the value.

I have tested the SQL on the following versions:

||Flink Version||Result||
|1.13|16,16|
|1.17|16,1|
|1.19|16,1|
|master|16,1|

-- This message was sent by Atlassian Jira (v8.20.10#820010)
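To make the suspected mechanism concrete, here is a minimal Java sketch of what happens if the first argument of IFNULL is narrowed to the CHAR length of the second one. It assumes that the literal '0' is typed as a one-character CHAR and that a cast to a fixed-length CHAR truncates or pads to that length. `CharCastSketch`, `castToChar`, `ifNullWithNarrowing` and `ifNull` are hypothetical names for illustration only, not planner code.

```java
// Sketch only: models a cast to a fixed-length CHAR and its effect on IFNULL.
// All names here are hypothetical and do not exist in Flink.
final class CharCastSketch {

    /** Truncates or right-pads the value to exactly `length` characters. */
    static String castToChar(String value, int length) {
        if (value == null) {
            return null;
        }
        if (value.length() >= length) {
            return value.substring(0, length);
        }
        StringBuilder padded = new StringBuilder(value);
        while (padded.length() < length) {
            padded.append(' ');
        }
        return padded.toString();
    }

    /** IFNULL where the first argument is first narrowed to the replacement's length (assumed bug). */
    static String ifNullWithNarrowing(String input, String nullReplacement) {
        String narrowed = castToChar(input, nullReplacement.length());
        return narrowed != null ? narrowed : nullReplacement;
    }

    /** IFNULL without narrowing: a non-null first argument is returned unchanged. */
    static String ifNull(String input, String nullReplacement) {
        return input != null ? input : nullReplacement;
    }

    public static void main(String[] args) {
        System.out.println(ifNullWithNarrowing("16", "0")); // prints 1
        System.out.println(ifNull("16", "0"));              // prints 16
    }
}
```

Under these assumptions the narrowed variant reproduces the '1' seen on 1.17 and later, while the plain variant returns '16' as on 1.13.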
[jira] [Created] (FLINK-34968) Update flink-web copyright to 2024
Yu Chen created FLINK-34968:
---
Summary: Update flink-web copyright to 2024
Key: FLINK-34968
URL: https://issues.apache.org/jira/browse/FLINK-34968
Project: Flink
Issue Type: Improvement
Components: Project Website
Reporter: Yu Chen
[jira] [Created] (FLINK-34622) Typo of execution_mode configuration name in Chinese document
Yu Chen created FLINK-34622:
---
Summary: Typo of execution_mode configuration name in Chinese document
Key: FLINK-34622
URL: https://issues.apache.org/jira/browse/FLINK-34622
Project: Flink
Issue Type: Bug
Components: Documentation
Reporter: Yu Chen
[jira] [Created] (FLINK-34099) CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog is unstable on AZP
Yu Chen created FLINK-34099:
---
Summary: CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog is unstable on AZP
Key: FLINK-34099
URL: https://issues.apache.org/jira/browse/FLINK-34099
Project: Flink
Issue Type: Bug
Affects Versions: 1.19.0
Reporter: Yu Chen

This build [Pipelines - Run 20240115.30 logs (azure.com)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56403=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba] fails as

{code:java}
Jan 15 18:29:51 18:29:51.938 [ERROR] org.apache.flink.test.checkpointing.CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog -- Time elapsed: 2.022 s <<< FAILURE!
Jan 15 18:29:51 org.opentest4j.AssertionFailedError:
Jan 15 18:29:51
Jan 15 18:29:51 expected: 0
Jan 15 18:29:51  but was: 1
Jan 15 18:29:51 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Jan 15 18:29:51 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Jan 15 18:29:51 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Jan 15 18:29:51 	at org.apache.flink.test.checkpointing.CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog(CheckpointIntervalDuringBacklogITCase.java:141)
Jan 15 18:29:51 	at java.lang.reflect.Method.invoke(Method.java:498)
{code}
[jira] [Created] (FLINK-34029) Support different profiling mode on Flink WEB
Yu Chen created FLINK-34029:
---
Summary: Support different profiling mode on Flink WEB
Key: FLINK-34029
URL: https://issues.apache.org/jira/browse/FLINK-34029
Project: Flink
Issue Type: Sub-task
Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Yu Chen
[jira] [Created] (FLINK-33613) Python UDF Runner process leak in Process Mode
Yu Chen created FLINK-33613:
---
Summary: Python UDF Runner process leak in Process Mode
Key: FLINK-33613
URL: https://issues.apache.org/jira/browse/FLINK-33613
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.17.0
Reporter: Yu Chen
Attachments: ps-ef.txt, streaming_word_count-1.py

While working with PyFlink, we found that in Process Mode the Python UDF process may leak after a job failover. This leads to a growing number of processes and threads on the host machine, which eventually makes it impossible to create new threads.

You can try to reproduce it with the attached test job `streaming_word_count-1.py`. (Note that the job will keep failing over, and you can watch the processes leak with `ps -ef` on the Taskmanager.)

Our test environment:
* K8S Application Mode
* 4 Taskmanagers with 12 slots/TM
* Job's parallelism was set to 48

The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be consistent with the parallelism (48), but we found 180 processes after several failovers.
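As a rough programmatic alternative to eyeballing `ps -ef`, the following Java sketch counts processes whose command line contains the `pyflink.fn_execution.beam.beam_boot` marker mentioned above. It uses the standard `ProcessHandle` API (Java 9+); `UdfProcessCountSketch`, `countMatching` and `countOnThisHost` are hypothetical names, and the command line of a process may be unavailable depending on OS permissions, in which case that process is simply not counted.

```java
import java.util.stream.Stream;

// Sketch only: counts processes by a marker in their command line.
// The class and method names are hypothetical, not part of Flink or PyFlink.
final class UdfProcessCountSketch {

    static final String BEAM_BOOT_MARKER = "pyflink.fn_execution.beam.beam_boot";

    /** Counts the command lines that contain the given marker. */
    static long countMatching(Stream<String> commandLines, String marker) {
        return commandLines.filter(cmd -> cmd.contains(marker)).count();
    }

    /** Counts processes visible to this JVM whose command line contains the marker. */
    static long countOnThisHost(String marker) {
        Stream<String> commandLines =
                ProcessHandle.allProcesses()
                        // commandLine() is empty when the OS does not expose it
                        .map(p -> p.info().commandLine().orElse(""));
        return countMatching(commandLines, marker);
    }

    public static void main(String[] args) {
        System.out.println("beam_boot processes: " + countOnThisHost(BEAM_BOOT_MARKER));
    }
}
```

Run on a Taskmanager host after each failover, a count that keeps growing beyond the number of slots in use on that host would point to the leak described here.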
[jira] [Created] (FLINK-33474) ShowPlan throws undefined exception In Flink Web Submit Page
Yu Chen created FLINK-33474:
---
Summary: ShowPlan throws undefined exception In Flink Web Submit Page
Key: FLINK-33474
URL: https://issues.apache.org/jira/browse/FLINK-33474
Project: Flink
Issue Type: Bug
Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Yu Chen
Attachments: image-2023-11-07-13-53-08-216.png

The exception is shown in the figure below. When it occurs, the job plan cannot be displayed properly.

The root cause is that the dagreComponent is located in the nz-drawer and is only loaded when the drawer is visible, so we need to wait for the drawer to finish loading and then render the job plan.

!image-2023-11-07-13-53-08-216.png|width=400,height=190!
[jira] [Created] (FLINK-33436) Documentation on the built-in Profiler
Yu Chen created FLINK-33436:
---
Summary: Documentation on the built-in Profiler
Key: FLINK-33436
URL: https://issues.apache.org/jira/browse/FLINK-33436
Project: Flink
Issue Type: Sub-task
Components: Documentation
Affects Versions: 1.19.0
Reporter: Yu Chen
[jira] [Created] (FLINK-33435) The visualization and download capabilities of profiling history
Yu Chen created FLINK-33435:
---
Summary: The visualization and download capabilities of profiling history
Key: FLINK-33435
URL: https://issues.apache.org/jira/browse/FLINK-33435
Project: Flink
Issue Type: Sub-task
Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Yu Chen
[jira] [Created] (FLINK-33434) Support invoke async-profiler on Taskmanager through REST API
Yu Chen created FLINK-33434:
---
Summary: Support invoke async-profiler on Taskmanager through REST API
Key: FLINK-33434
URL: https://issues.apache.org/jira/browse/FLINK-33434
Project: Flink
Issue Type: Sub-task
Components: Runtime / REST
Affects Versions: 1.19.0
Reporter: Yu Chen
[jira] [Created] (FLINK-33433) Support invoke async-profiler on Jobmanager through REST API
Yu Chen created FLINK-33433:
---
Summary: Support invoke async-profiler on Jobmanager through REST API
Key: FLINK-33433
URL: https://issues.apache.org/jira/browse/FLINK-33433
Project: Flink
Issue Type: Sub-task
Components: Runtime / REST
Affects Versions: 1.19.0
Reporter: Yu Chen
[jira] [Created] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler
Yu Chen created FLINK-33325:
---
Summary: FLIP-375: Built-in cross-platform powerful java profiler
Key: FLINK-33325
URL: https://issues.apache.org/jira/browse/FLINK-33325
Project: Flink
Issue Type: Improvement
Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Yu Chen

This is the umbrella JIRA for [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE].
[jira] [Created] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Flink
Yu Chen created FLINK-33230:
---
Summary: Support Expanding ExecutionGraph to StreamGraph in Flink
Key: FLINK-33230
URL: https://issues.apache.org/jira/browse/FLINK-33230
Project: Flink
Issue Type: Improvement
Reporter: Yu Chen
[jira] [Created] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
Yu Chen created FLINK-32754:
---
Summary: Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
Key: FLINK-32754
URL: https://issues.apache.org/jira/browse/FLINK-32754
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.17.1, 1.17.0
Reporter: Yu Chen
Attachments: image-2023-08-04-18-28-05-897.png

We registered some metrics in the `enumerator` of the FLIP-27 source via `SplitEnumeratorContext.metricGroup()`, but found that the task prints NPE logs in the JM when restoring, suggesting that `SplitEnumeratorContext.metricGroup()` is null. Meanwhile, the task does not fail over, and Checkpoints cannot be created successfully even after the task is in the running state.

We found that the implementation class of `SplitEnumeratorContext` is `LazyInitializedCoordinatorContext`; however, its metricGroup() is only initialized after lazyInitialize() is called. By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so the NPE is thrown.

Q: Why does this bug prevent the task from creating the Checkpoint?
`SourceCoordinator.resetToCheckpoint()` throws an NPE, which leaves the member variable `enumerator` in `SourceCoordinator` null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` go through `runInEventLoop()`, and `runInEventLoop()` returns directly if the enumerator is null.

Q: Why doesn't this bug trigger a task failover?
In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, it catches the exception and calls `cleanAndFailJob` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, and `schedulerExecutor.execute` ignores the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. As a result, the task appears not to fail over.

!image-2023-08-04-18-28-05-897.png|width=2442,height=1123!
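The chain of events described above can be illustrated with a small stand-alone Java sketch. It is not Flink code: `LazyInitNpeSketch`, `LazyContext`, `Coordinator` and `checkpointsAfterRestore` are hypothetical stand-ins that only mimic the pattern in the report (a context whose fields are set in `lazyInitialize()`, a restore that dereferences one of them too early, a swallowed exception, and a checkpoint path that silently returns when the enumerator is null).

```java
// Sketch only: hypothetical stand-ins that mimic the reported pattern, not Flink classes.
final class LazyInitNpeSketch {

    /** Context whose metric group is only set by lazyInitialize(). */
    static final class LazyContext {
        private Object metricGroup; // null until lazyInitialize() runs

        void lazyInitialize() {
            metricGroup = new Object();
        }

        Object metricGroup() {
            return metricGroup;
        }
    }

    /** Coordinator that restores an enumerator and skips checkpoints when it is null. */
    static final class Coordinator {
        private final LazyContext context;
        private Object enumerator; // stays null if the restore fails
        int checkpointsTaken = 0;

        Coordinator(LazyContext context) {
            this.context = context;
        }

        void resetToCheckpoint() {
            // Using the metric group during restore throws an NPE if it is still null.
            context.metricGroup().hashCode();
            enumerator = new Object();
        }

        void checkpoint() {
            if (enumerator == null) {
                return; // silently skipped, so no checkpoint is ever taken
            }
            checkpointsTaken++;
        }
    }

    /** Restores, swallows a possible NPE, then attempts one checkpoint and returns the count. */
    static int checkpointsAfterRestore(boolean initializeFirst) {
        LazyContext context = new LazyContext();
        if (initializeFirst) {
            context.lazyInitialize();
        }
        Coordinator coordinator = new Coordinator(context);
        try {
            coordinator.resetToCheckpoint();
        } catch (NullPointerException e) {
            // Swallowed here to mirror the report: the failure does not fail the job.
        }
        coordinator.checkpoint();
        return coordinator.checkpointsTaken;
    }

    public static void main(String[] args) {
        System.out.println(checkpointsAfterRestore(false)); // prints 0
        System.out.println(checkpointsAfterRestore(true));  // prints 1
    }
}
```

In this toy model, initializing the context before the restore is enough to make the checkpoint succeed; whether the real fix should take that shape is a separate question for the actual code.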
[jira] [Created] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
Yu Chen created FLINK-32186:
---
Summary: Support subtask stack auto-search when redirecting from subtask backpressure tab
Key: FLINK-32186
URL: https://issues.apache.org/jira/browse/FLINK-32186
Project: Flink
Issue Type: Improvement
Components: Runtime / Web Frontend
Affects Versions: 1.18.0
Reporter: Yu Chen
Attachments: image-2023-05-25-15-52-54-383.png, image-2023-05-25-16-08-14-325.png

We introduced a dump link on the backpressure page in [FLINK-29996|https://issues.apache.org/jira/browse/FLINK-29996] (Figure 1), which makes it easier to check what the corresponding subtask is doing.

However, we still have to search for the call stack of the back-pressured subtask in the thread dumps of the whole TaskManager, which is not convenient enough.

Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, so that the thread dumps scroll to the call stack of the back-pressured subtask (as shown in Figure 2).

!image-2023-05-25-15-52-54-383.png!
Figure 1. ThreadDump Link in Backpressure Tab

!image-2023-05-25-16-08-14-325.png!
Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab
[jira] [Created] (FLINK-29107) Bump up spotless version to improve efficiently
Yu Chen created FLINK-29107:
---
Summary: Bump up spotless version to improve efficiently
Key: FLINK-29107
URL: https://issues.apache.org/jira/browse/FLINK-29107
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 1.15.2
Reporter: Yu Chen
Attachments: image-2022-08-25-22-10-54-453.png

Hi all, I noticed a [discussion|https://github.com/diffplug/spotless/issues/927] in the spotless GitHub repository suggesting that we can significantly improve the efficiency of spotless checks by upgrading spotless and enabling `upToDateChecking`.

I ran a simple test locally; the improvement in the spotless check after the upgrade is shown in the figure.

!image-2022-08-25-22-10-54-453.png!