[jira] [Created] (FLINK-36127) Support sorting watermark on flink web
Yu Chen created FLINK-36127: --- Summary: Support sorting watermark on flink web Key: FLINK-36127 URL: https://issues.apache.org/jira/browse/FLINK-36127 Project: Flink Issue Type: Improvement Components: Runtime / Web Frontend Affects Versions: 2.0.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35832) IFNULL returns incorrect result in Flink SQL
[ https://issues.apache.org/jira/browse/FLINK-35832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865787#comment-17865787 ] Yu Chen commented on FLINK-35832: - Hi [~yunta], would you mind pinging someone familiar with the SQL component to help with this problem? > IFNULL returns incorrect result in Flink SQL > > > Key: FLINK-35832 > URL: https://issues.apache.org/jira/browse/FLINK-35832 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 2.0.0 >Reporter: Yu Chen >Priority: Critical > > Run following SQL in sql-client: > The correct result should be '16', but we got '1' on the master. > {code:java} > Flink SQL> SET 'sql-client.execution.result-mode' = 'tableau'; > [INFO] Execute statement succeeded. > Flink SQL> select JSON_VALUE('{"a":16}','$.a'), > IFNULL(JSON_VALUE('{"a":16}','$.a'),'0'); > ++++ > | op | EXPR$0 | EXPR$1 | > ++++ > | +I | 16 | 1 | > ++++ > Received a total of 1 row (0.30 seconds){code} > > With some quick debugging, I guess it may be caused by > [FLINK-24413|https://issues.apache.org/jira/browse/FLINK-24413] which was > introduced in Flink version 1.15. > > I think the wrong result '1' was produced because the simplifying SQL > procedure assumed that parameter 1 and parameter 2 ('0' was char) of IFNULL > were of the same type, and therefore implicitly cast '16' to char, resulting > in the incorrect result. > > I have tested the SQL in the following version: > > ||Flink Version||Result|| > |1.13|16,16| > |1.17|16,1| > |1.19|16,1| > |master|16,1| > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35832) IFNULL returns incorrect result in Flink SQL
[ https://issues.apache.org/jira/browse/FLINK-35832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-35832: Summary: IFNULL returns incorrect result in Flink SQL (was: IFNULL returns error result in Flink SQL) > IFNULL returns incorrect result in Flink SQL > > > Key: FLINK-35832 > URL: https://issues.apache.org/jira/browse/FLINK-35832 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 2.0.0 >Reporter: Yu Chen >Priority: Critical > > Run following SQL in sql-client: > The correct result should be '16', but we got '1' on the master. > {code:java} > Flink SQL> SET 'sql-client.execution.result-mode' = 'tableau'; > [INFO] Execute statement succeeded. > Flink SQL> select JSON_VALUE('{"a":16}','$.a'), > IFNULL(JSON_VALUE('{"a":16}','$.a'),'0'); > ++++ > | op | EXPR$0 | EXPR$1 | > ++++ > | +I | 16 | 1 | > ++++ > Received a total of 1 row (0.30 seconds){code} > > With some quick debugging, I guess it may be caused by > [FLINK-24413|https://issues.apache.org/jira/browse/FLINK-24413] which was > introduced in Flink version 1.15. > > I think the wrong result '1' was produced because the simplifying SQL > procedure assumed that parameter 1 and parameter 2 ('0' was char) of IFNULL > were of the same type, and therefore implicitly cast '16' to char, resulting > in the incorrect result. > > I have tested the SQL in the following version: > > ||Flink Version||Result|| > |1.13|16,16| > |1.17|16,1| > |1.19|16,1| > |master|16,1| > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35832) IFNULL returns error result in Flink SQL
Yu Chen created FLINK-35832: --- Summary: IFNULL returns error result in Flink SQL Key: FLINK-35832 URL: https://issues.apache.org/jira/browse/FLINK-35832 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 2.0.0 Reporter: Yu Chen Run the following SQL in sql-client: The correct result should be '16', but we got '1' on the master. {code:java} Flink SQL> SET 'sql-client.execution.result-mode' = 'tableau'; [INFO] Execute statement succeeded. Flink SQL> select JSON_VALUE('{"a":16}','$.a'), IFNULL(JSON_VALUE('{"a":16}','$.a'),'0'); ++++ | op | EXPR$0 | EXPR$1 | ++++ | +I | 16 | 1 | ++++ Received a total of 1 row (0.30 seconds){code} With some quick debugging, I suspect it was caused by [FLINK-24413|https://issues.apache.org/jira/browse/FLINK-24413], which was introduced in Flink 1.15. I think the wrong result '1' was produced because the SQL simplification procedure assumed that parameter 1 and parameter 2 ('0', which is CHAR(1)) of IFNULL had the same type, and therefore implicitly cast '16' to CHAR(1), truncating it to the incorrect result '1'. I have tested the SQL in the following versions: ||Flink Version||Result|| |1.13|16,16| |1.17|16,1| |1.19|16,1| |master|16,1| -- This message was sent by Atlassian Jira (v8.20.10#820010)
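The truncation described above can be illustrated with a small standalone sketch (purely illustrative, not Flink's actual planner code): if the simplification unifies both IFNULL operands to the CHAR(1) type of the second argument '0', then casting '16' to CHAR(1) keeps only the first character.

```java
public class CharCastSketch {
    // Hypothetical helper mimicking SQL CAST(value AS CHAR(length)) truncation
    // semantics; this is an illustration, not Flink's implementation.
    static String castToChar(String value, int length) {
        return value.length() > length ? value.substring(0, length) : value;
    }

    public static void main(String[] args) {
        // IFNULL(JSON_VALUE('{"a":16}','$.a'), '0'): unifying '16' to the
        // CHAR(1) type of '0' truncates it to '1', matching the wrong result.
        System.out.println(castToChar("16", 1)); // prints "1"
    }
}
```

This also matches the version table: 1.13 (before FLINK-24413) returned '16', while 1.15+ returns the truncated '1'.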
[jira] [Commented] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed
[ https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842230#comment-17842230 ] Yu Chen commented on FLINK-35039: - Hi [~wczhu], sorry for the late response. It does surprise me that YARN doesn't support POST. But one point confuses me: POST requests are already used in many places in Flink's interfaces, such as stop-with-savepoint. Are those interfaces currently inaccessible on YARN? Moreover, according to the principles of RESTful interfaces [1], the biggest difference between POST and PUT is that PUT requests are idempotent: submitting the same request N times yields the same result, whereas each Profiling request should create a new instance whose result is added on the server. Therefore, POST may fit the semantics of this interface better. So I wonder whether it is appropriate to add this compatibility workaround in Flink. WDYT [~yunta]? [1] https://restfulapi.net/rest-put-vs-post/ !image-2024-04-30-11-12-34-734.png|width=414,height=496! > Create Profiling JobManager/TaskManager Instance failed > --- > > Key: FLINK-35039 > URL: https://issues.apache.org/jira/browse/FLINK-35039 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 > Environment: Hadoop 3.2.2 > Flink 1.19 >Reporter: ude >Assignee: ude >Priority: Major > Labels: pull-request-available > Attachments: image-2024-04-08-10-21-31-066.png, > image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png, > image-2024-04-30-11-12-34-734.png, image-2024-04-30-11-14-44-335.png > > > I'm test the "async-profiler" feature in version 1.19, but when I submit a > task in yarn per-job mode, I get an error when I click Create Profiling > Instance on the flink Web UI page. > !image-2024-04-08-10-21-31-066.png! > !image-2024-04-08-10-21-48-417.png! > The error message obviously means that the yarn proxy server does not support > *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found > that the *POST* method is indeed not supported, so I changed it to *PUT* > method and the call was successful. > !image-2024-04-08-10-30-16-683.png! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
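The idempotency distinction argued above can be sketched with a toy server state (purely illustrative, not Flink or YARN code): replaying the same PUT leaves a single resource, while replaying a POST creates a second profiling instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotencySketch {
    // PUT semantics: idempotent -- repeating the same request leaves the
    // same final state on the server.
    static int putTwice() {
        Map<String, String> store = new HashMap<>();
        store.put("profiler-config", "duration=30s");
        store.put("profiler-config", "duration=30s"); // replay of the same request
        return store.size(); // still one resource
    }

    // POST semantics: each request creates a new resource -- here, a new
    // profiling instance appended to the server-side history.
    static int postTwice() {
        List<String> instances = new ArrayList<>();
        instances.add("profiling-instance");
        instances.add("profiling-instance"); // replay creates a second instance
        return instances.size(); // two resources
    }

    public static void main(String[] args) {
        System.out.println("PUT resources: " + putTwice()
                + ", POST resources: " + postTwice());
    }
}
```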
[jira] [Updated] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed
[ https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-35039: Attachment: image-2024-04-30-11-14-44-335.png > Create Profiling JobManager/TaskManager Instance failed > --- > > Key: FLINK-35039 > URL: https://issues.apache.org/jira/browse/FLINK-35039 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 > Environment: Hadoop 3.2.2 > Flink 1.19 >Reporter: ude >Assignee: ude >Priority: Major > Labels: pull-request-available > Attachments: image-2024-04-08-10-21-31-066.png, > image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png, > image-2024-04-30-11-12-34-734.png, image-2024-04-30-11-14-44-335.png > > > I'm test the "async-profiler" feature in version 1.19, but when I submit a > task in yarn per-job mode, I get an error when I click Create Profiling > Instance on the flink Web UI page. > !image-2024-04-08-10-21-31-066.png! > !image-2024-04-08-10-21-48-417.png! > The error message obviously means that the yarn proxy server does not support > *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found > that the *POST* method is indeed not supported, so I changed it to *PUT* > method and the call was successful. > !image-2024-04-08-10-30-16-683.png! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed
[ https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-35039: Attachment: image-2024-04-30-11-12-34-734.png > Create Profiling JobManager/TaskManager Instance failed > --- > > Key: FLINK-35039 > URL: https://issues.apache.org/jira/browse/FLINK-35039 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 > Environment: Hadoop 3.2.2 > Flink 1.19 >Reporter: ude >Assignee: ude >Priority: Major > Labels: pull-request-available > Attachments: image-2024-04-08-10-21-31-066.png, > image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png, > image-2024-04-30-11-12-34-734.png > > > I'm test the "async-profiler" feature in version 1.19, but when I submit a > task in yarn per-job mode, I get an error when I click Create Profiling > Instance on the flink Web UI page. > !image-2024-04-08-10-21-31-066.png! > !image-2024-04-08-10-21-48-417.png! > The error message obviously means that the yarn proxy server does not support > *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found > that the *POST* method is indeed not supported, so I changed it to *PUT* > method and the call was successful. > !image-2024-04-08-10-30-16-683.png! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34968) Update flink-web copyright to 2024
Yu Chen created FLINK-34968: --- Summary: Update flink-web copyright to 2024 Key: FLINK-34968 URL: https://issues.apache.org/jira/browse/FLINK-34968 Project: Flink Issue Type: Improvement Components: Project Website Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34622) Typo of execution_mode configuration name in Chinese document
[ https://issues.apache.org/jira/browse/FLINK-34622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-34622: Description: !image-2024-03-08-14-46-34-859.png|width=794,height=380! > Typo of execution_mode configuration name in Chinese document > - > > Key: FLINK-34622 > URL: https://issues.apache.org/jira/browse/FLINK-34622 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Yu Chen >Priority: Major > Attachments: image-2024-03-08-14-46-34-859.png > > > !image-2024-03-08-14-46-34-859.png|width=794,height=380! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34622) Typo of execution_mode configuration name in Chinese document
[ https://issues.apache.org/jira/browse/FLINK-34622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-34622: Attachment: image-2024-03-08-14-46-34-859.png > Typo of execution_mode configuration name in Chinese document > - > > Key: FLINK-34622 > URL: https://issues.apache.org/jira/browse/FLINK-34622 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Yu Chen >Priority: Major > Attachments: image-2024-03-08-14-46-34-859.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34622) Typo of execution_mode configuration name in Chinese document
Yu Chen created FLINK-34622: --- Summary: Typo of execution_mode configuration name in Chinese document Key: FLINK-34622 URL: https://issues.apache.org/jira/browse/FLINK-34622 Project: Flink Issue Type: Bug Components: Documentation Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler
[ https://issues.apache.org/jira/browse/FLINK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen resolved FLINK-33325. - Release Note: Since Flink 1.19, we support profiling the JobManager/TaskManager process interactively with [async-profiler](https://github.com/async-profiler/async-profiler) via the Flink Web UI, which allows users to create a profiling instance with arbitrary intervals and event modes, e.g. ITIMER, CPU, Lock, Wall-Clock and Allocation. Users can conveniently submit a profiling request and export its result via the Flink Web UI. For example: - First, identify the candidate TaskManager/JobManager with a performance bottleneck, and switch to the corresponding TaskManager/JobManager page (Profiler tab). - Submit a profiling instance with a specified period and mode by simply clicking the `Create Profiling Instance` button. (A description of each profiling mode is shown when hovering over it.) - Once the profiling instance is complete, download the interactive HTML file by clicking on the link. **More Information** - [Documents](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/debugging/profiler/) - [FLIP-375: Built-in cross-platform powerful java profiler](https://cwiki.apache.org/confluence/x/64lEE) Resolution: Resolved > FLIP-375: Built-in cross-platform powerful java profiler > > > Key: FLINK-33325 > URL: https://issues.apache.org/jira/browse/FLINK-33325 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST, Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > Fix For: 1.19.0 > > > This is an umbrella JIRA of > [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34388) Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone and native K8s application mode
[ https://issues.apache.org/jira/browse/FLINK-34388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819897#comment-17819897 ] Yu Chen commented on FLINK-34388: - Hi [~ferenc-csaky] [~lincoln.86xy] [~yunta], I have completed this release testing. The feature works well on the `release-1.19` branch, and I think this ticket can be closed. Here are some testing logs for reference. *> Test 1: Pass local:// job jar in standalone mode, and check the artifacts are not actually copied.* *Passed.* Flink uses the original file and does not copy it. By the way, it also works with an absolute path (without `local://`), but the file is then copied to `user.artifacts.base-dir`. I'm not sure whether that's expected (in my opinion, since it is a local file, we may not need to copy it; cc [~ferenc-csaky]). *> Test 2: Pass multiple artifacts in standalone mode.* *Passed.* In a StandaloneJobCluster, Flink could load jars via `http://` (copied) and `local://` (not copied) simultaneously. *> Test 3: Pass a non-local job jar in native k8s mode.* *Passed.* Tested by starting a native K8s application cluster in `minikube` with the following command: {code:java} ./bin/flink run-application \ --target kubernetes-application \ -Dkubernetes.cluster-id=my-first-application-cluster\ -Dkubernetes.container.image.ref=flink:test_community_1.19_SN \ http://localhost:/data/WordCount.jar {code} Flink throws the expected exception with a hint (set `user.artifacts.raw-http-enabled` to true). {code:java} 2024-02-23 02:20:48,305 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not create application program. java.lang.RuntimeException: java.lang.IllegalArgumentException: Artifact fetching from raw HTTP endpoints are disabled. Set the 'user.artifacts.raw-http-enabled' property to override. 
at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] Caused by: java.lang.IllegalArgumentException: Artifact fetching from raw HTTP endpoints are disabled. Set the 'user.artifacts.raw-http-enabled' property to override. 
at org.apache.flink.client.program.artifact.ArtifactFetchManager.isRawHttp(ArtifactFetchManager.java:166) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.client.program.artifact.ArtifactFetchManager.getFetcher(ArtifactFetchManager.java:142) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifact(ArtifactFetchManager.java:157) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:124) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] ... 5 more {code} After setting the parameter in `flink-conf.yaml`, the jar was copied to `user.artifacts.base-dir` and the job ran as expected. *> Test 4: Pass additional remote artifacts in native k8s mode.* *Passed.* Tested by starting a native K8s application cluster with the commands from Test 3. In addition, we added extra jars in `flink-conf.yaml`; they were copied as expected and the job ran well. {code:java} user.artifacts.artifact-list: http://10.23.171.97:/data/AsyncIO.jar;http://10.23.171.97:/data/WordCount.jar; {code} {code:java} root@my-first-application-cluster-8479579f45-cdpc9:/opt/flink/artifacts/default/my-first-application-cluster# ls AsyncIO.jar WindowJoin.jar WordCount.jar {code} > Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone > and native K8s application mode > --- > >
[jira] [Commented] (FLINK-34388) Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone and native K8s application mode
[ https://issues.apache.org/jira/browse/FLINK-34388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815243#comment-17815243 ] Yu Chen commented on FLINK-34388: - I'd like to take this release testing. Could someone help assign it to me? > Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone > and native K8s application mode > --- > > Key: FLINK-34388 > URL: https://issues.apache.org/jira/browse/FLINK-34388 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Metrics >Affects Versions: 1.19.0 >Reporter: Ferenc Csaky >Assignee: Yu Chen >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > > This ticket covers testing FLINK-28915. More details and the added docs are > accessible on the [PR|https://github.com/apache/flink/pull/24065] > Test 1: Pass {{local://}} job jar in standalone mode, check the artifacts are > not actually copied. > Test 2: Pass multiple artifacts in standalone mode. > Test 3: Pass a non-local job jar in native k8s mode. [1] > Test 4: Pass additional remote artifacts in native k8s mode. > Available config options: > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#artifact-fetching > [1] Custom docker image build instructions: > https://github.com/apache/flink-docker/tree/dev-master > Note: The docker build instructions also contains a web server example that > can be used to serve HTTP artifacts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler
[ https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815079#comment-17815079 ] Yu Chen commented on FLINK-34310: - Sorry for the late response. Thanks [~yunta] for creating the Testing Instructions. {quote}When I test other features in my Local, I found Profiler page throw some exceptions. I'm not sure whether it's expected. {quote} Hi [~fanrui], currently, whether the profiler is enabled is determined by checking if the interface is registered with `WebMonitorEndpoint`, so this behavior is by design. But I think we could implement the check more elegantly in a later version by registering an interface that reports the profiler's enabled status. {quote}[~Yu Chen] Could you estimate when the user doc (https://issues.apache.org/jira/browse/FLINK-33436) can be finished?{quote} Hi [~lincoln.86xy], really sorry for the late response. I have been quite busy recently; is it OK if I finish the documentation within the next week (before 02.18)? > Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform > powerful java profiler > --- > > Key: FLINK-34310 > URL: https://issues.apache.org/jira/browse/FLINK-34310 > Project: Flink > Issue Type: Sub-task > Components: Runtime / REST, Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Yun Tang >Priority: Blocker > Fix For: 1.19.0 > > Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png > > > Instructions: > 1. For the default case, it will print the hint to tell users how to enable > this feature. > !screenshot-2.png! > 2. After we add {{rest.profiling.enabled: true}} in the configurations, we > can use this feature now, and the default mode should be {{ITIMER}} > !screenshot-3.png! > 3. We cannot create another profiling while one is running > !screenshot-4.png! > 4. We can get at most 10 profiling snapshots by default, and the older ones > will be deleted automatically. > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
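The rolling deletion in point 4 can be modeled as a bounded queue (a hypothetical sketch of the policy, not the actual `ProfilingService` code): new snapshots are appended, and once the default limit of 10 is exceeded the oldest ones are evicted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollingSnapshots {
    // Hypothetical model of the rolling-deletion policy described above:
    // keep at most `maxHistory` snapshots, evicting the oldest first.
    private final Deque<String> snapshots = new ArrayDeque<>();
    private final int maxHistory;

    RollingSnapshots(int maxHistory) {
        this.maxHistory = maxHistory;
    }

    void add(String snapshot) {
        snapshots.addLast(snapshot);
        while (snapshots.size() > maxHistory) {
            // Real code would also delete the oldest snapshot file on disk.
            snapshots.removeFirst();
        }
    }

    int size() {
        return snapshots.size();
    }

    String oldest() {
        return snapshots.peekFirst();
    }

    public static void main(String[] args) {
        RollingSnapshots history = new RollingSnapshots(10);
        for (int i = 0; i < 12; i++) {
            history.add("snapshot-" + i);
        }
        // 12 snapshots added, limit 10: the two oldest were evicted.
        System.out.println(history.size() + " kept, oldest = " + history.oldest());
    }
}
```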
[jira] [Created] (FLINK-34099) CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog is unstable on AZP
Yu Chen created FLINK-34099: --- Summary: CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog is unstable on AZP Key: FLINK-34099 URL: https://issues.apache.org/jira/browse/FLINK-34099 Project: Flink Issue Type: Bug Affects Versions: 1.19.0 Reporter: Yu Chen This build [Pipelines - Run 20240115.30 logs (azure.com)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56403&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba] fails as {code:java} Jan 15 18:29:51 18:29:51.938 [ERROR] org.apache.flink.test.checkpointing.CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog -- Time elapsed: 2.022 s <<< FAILURE! Jan 15 18:29:51 org.opentest4j.AssertionFailedError: Jan 15 18:29:51 Jan 15 18:29:51 expected: 0 Jan 15 18:29:51 but was: 1 Jan 15 18:29:51 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Jan 15 18:29:51 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) Jan 15 18:29:51 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) Jan 15 18:29:51 at org.apache.flink.test.checkpointing.CheckpointIntervalDuringBacklogITCase.testNoCheckpointDuringBacklog(CheckpointIntervalDuringBacklogITCase.java:141) Jan 15 18:29:51 at java.lang.reflect.Method.invoke(Method.java:498) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34072) Use JAVA_RUN in shell scripts
[ https://issues.apache.org/jira/browse/FLINK-34072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806446#comment-17806446 ] Yu Chen commented on FLINK-34072: - Hi [~yunta], I'd like to take this ticket if you don't mind. > Use JAVA_RUN in shell scripts > - > > Key: FLINK-34072 > URL: https://issues.apache.org/jira/browse/FLINK-34072 > Project: Flink > Issue Type: Improvement > Components: Deployment / Scripts >Reporter: Yun Tang >Priority: Minor > Fix For: 1.19.0 > > > We should call {{JAVA_RUN}} in all cases when we launch the {{java}} command, > otherwise we might not be able to run {{java}} if JAVA_HOME is not set, > such as: > {code:java} > flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT/bin/config.sh: line 339: > 17 : > syntax error: operand expected (error token is "> 17 ") > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler
[ https://issues.apache.org/jira/browse/FLINK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805374#comment-17805374 ] Yu Chen commented on FLINK-33325: - Hi [~Zhanghao Chen] , thanks for your attention, the feature will be available to everyone soon! > FLIP-375: Built-in cross-platform powerful java profiler > > > Key: FLINK-33325 > URL: https://issues.apache.org/jira/browse/FLINK-33325 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST, Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > > This is an umbrella JIRA of > [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34029) Support different profiling mode on Flink WEB
Yu Chen created FLINK-34029: --- Summary: Support different profiling mode on Flink WEB Key: FLINK-34029 URL: https://issues.apache.org/jira/browse/FLINK-34029 Project: Flink Issue Type: Sub-task Components: Runtime / Web Frontend Affects Versions: 1.19.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP
[ https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804387#comment-17804387 ] Yu Chen commented on FLINK-34013: - Hi [~Sergey Nuyanzin], I have reproduced it on my local machine. It's a bug caused by triggering {{stopProfiling}} twice for each profiling request. I'll create a PR to fix this. Thank you for pointing it out! > ProfilingServiceTest.testRollingDeletion is unstable on AZP > --- > > Key: FLINK-34013 > URL: https://issues.apache.org/jira/browse/FLINK-34013 > Project: Flink > Issue Type: Bug > Components: API / Core >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8258 > fails as > {noformat} > Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: > <3> > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) > Jan 06 02:09:28 at > org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531) > Jan 06 02:09:28 at > org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167) > Jan 06 02:09:28 at > org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117) > Jan 06 02:09:28 at java.lang.reflect.Method.invoke(Method.java:498) > Jan 06 02:09:28 at > 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
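One common way to guard against such a double trigger (a sketch of the general pattern, not the actual fix in Flink) is an atomic once-only flag, so a second `stopProfiling` call for the same request becomes a no-op:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StopOnceSketch {
    // Hypothetical once-only guard: compareAndSet ensures the stop action
    // runs at most once per profiling request, even if triggered twice
    // (the bug described above produced an extra snapshot).
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    int stopCount = 0;

    void stopProfiling() {
        if (stopped.compareAndSet(false, true)) {
            stopCount++; // real code would finalize and archive the snapshot here
        }
    }

    public static void main(String[] args) {
        StopOnceSketch service = new StopOnceSketch();
        service.stopProfiling();
        service.stopProfiling(); // duplicate trigger is ignored
        System.out.println(service.stopCount); // prints 1
    }
}
```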
[jira] [Commented] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP
[ https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804341#comment-17804341 ] Yu Chen commented on FLINK-34013: - Hi [~Sergey Nuyanzin], really sorry about that. Let me take a look and fix it. > ProfilingServiceTest.testRollingDeletion is unstable on AZP > --- > > Key: FLINK-34013 > URL: https://issues.apache.org/jira/browse/FLINK-34013 > Project: Flink > Issue Type: Bug > Components: API / Core >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8258 > fails as > {noformat} > Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: > <3> > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) > Jan 06 02:09:28 at > org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) > Jan 06 02:09:28 at > org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531) > Jan 06 02:09:28 at > org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167) > Jan 06 02:09:28 at > org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117) > Jan 06 02:09:28 at java.lang.reflect.Method.invoke(Method.java:498) > Jan 06 02:09:28 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Jan 06 02:09:28 at > 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Jan 06 02:09:28 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33435) The visualization and download capabilities of profiling history
[ https://issues.apache.org/jira/browse/FLINK-33435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen closed FLINK-33435. --- Resolution: Duplicate > The visualization and download capabilities of profiling history > - > > Key: FLINK-33435 > URL: https://issues.apache.org/jira/browse/FLINK-33435 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33435) The visualization and download capabilities of profiling history
[ https://issues.apache.org/jira/browse/FLINK-33435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791600#comment-17791600 ] Yu Chen commented on FLINK-33435: - This subtask will be completed in FLINK-33433 and FLINK-33434. So I'll close this ticket. > The visualization and download capabilities of profiling history > - > > Key: FLINK-33435 > URL: https://issues.apache.org/jira/browse/FLINK-33435 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33613) Python UDF Runner process leak in Process Mode
[ https://issues.apache.org/jira/browse/FLINK-33613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-33613: Description: While working with PyFlink, we found that in Process Mode, the Python UDF process may leak after a failover of the job. This leads to a rising number of processes (and their threads) on the host machine, which eventually results in a failure to create new threads. You can try to reproduce it with the attached test task `streaming_word_count.py`. (Note that the job will continue to fail over, and you can watch the processes leak via `ps -ef` on the TaskManager.) Our test environment: * K8S Application Mode * 4 TaskManagers with 12 slots/TM * Job's parallelism was set to 48 The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be consistent with the slots of the TM (12), but we found 180 processes on one TaskManager after several failovers. was: While working with PyFlink, we found that in Process Mode, the Python UDF process may leak after a failover of the job. It leads to a rising number of processes with their threads in the host machine, which eventually results in failure to create new threads. You can try to reproduce it with the attached test task `streamin_word_count.py`. (Note that the job will continue failover, and you can watch the process leaks by `ps -ef` on Taskmanager. Our test environment: * K8S Application Mode * 4 Taskmanagers with 12 slots/TM * Job's parallelism was set to 48 The udf process `pyflink.fn_execution.beam.beam_boot` should be consistence with parallelism (48), but we found that there are 180 processes after several failovers. 
> Python UDF Runner process leak in Process Mode > -- > > Key: FLINK-33613 > URL: https://issues.apache.org/jira/browse/FLINK-33613 > Project: Flink > Issue Type: Bug > Components: API / Python >Affects Versions: 1.17.0 >Reporter: Yu Chen >Priority: Major > Attachments: ps-ef.txt, streaming_word_count-1.py > > > While working with PyFlink, we found that in Process Mode, the Python UDF > process may leak after a failover of the job. This leads to a rising number of > processes (and their threads) on the host machine, which eventually results in > a failure to create new threads. > > You can try to reproduce it with the attached test task > `streaming_word_count.py`. > (Note that the job will continue to fail over, and you can watch the processes > leak via `ps -ef` on the TaskManager.) > > Our test environment: > * K8S Application Mode > * 4 TaskManagers with 12 slots/TM > * Job's parallelism was set to 48 > The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be > consistent with the slots of the TM (12), but we found 180 processes on one > TaskManager after several failovers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33613) Python UDF Runner process leak in Process Mode
Yu Chen created FLINK-33613: --- Summary: Python UDF Runner process leak in Process Mode Key: FLINK-33613 URL: https://issues.apache.org/jira/browse/FLINK-33613 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.17.0 Reporter: Yu Chen Attachments: ps-ef.txt, streaming_word_count-1.py While working with PyFlink, we found that in Process Mode, the Python UDF process may leak after a failover of the job. This leads to a rising number of processes (and their threads) on the host machine, which eventually results in a failure to create new threads. You can try to reproduce it with the attached test task `streaming_word_count.py`. (Note that the job will continue to fail over, and you can watch the processes leak via `ps -ef` on the TaskManager.) Our test environment: * K8S Application Mode * 4 TaskManagers with 12 slots/TM * Job's parallelism was set to 48 The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be consistent with the parallelism (48), but we found 180 processes after several failovers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
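The leak check described above can be sketched as follows (a minimal illustration, not part of Flink; the sample `ps -ef` lines are made up). On a healthy TaskManager, the count of `beam_boot` runner processes should not exceed the TM's slot count:

```python
def count_udf_runners(ps_lines):
    """Count Python UDF runner processes in `ps -ef`-style output."""
    return sum("pyflink.fn_execution.beam.beam_boot" in line for line in ps_lines)

# Hypothetical sample output; in practice, feed in `ps -ef` from the TM host.
sample = [
    "flink  101  1  python -m pyflink.fn_execution.beam.beam_boot --id=1",
    "flink  102  1  python -m pyflink.fn_execution.beam.beam_boot --id=2",
    "flink  100  1  java ... TaskManagerRunner",
]
runners = count_udf_runners(sample)
slots_per_tm = 12  # from the reported test environment
print(runners, runners > slots_per_tm)  # → 2 False; 180 runners would flag a leak
```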
[jira] [Created] (FLINK-33474) ShowPlan throws undefined exception In Flink Web Submit Page
Yu Chen created FLINK-33474: --- Summary: ShowPlan throws undefined exception In Flink Web Submit Page Key: FLINK-33474 URL: https://issues.apache.org/jira/browse/FLINK-33474 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.19.0 Reporter: Yu Chen Attachments: image-2023-11-07-13-53-08-216.png The exception is shown in the figure below; meanwhile, the job plan cannot be displayed properly. The root cause is that the dagreComponent is located inside the nz-drawer and is only loaded when the drawer is visible, so we need to wait for the drawer to finish loading before rendering the job plan. !image-2023-11-07-13-53-08-216.png|width=400,height=190! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33436) Documentation on the built-in Profiler
Yu Chen created FLINK-33436: --- Summary: Documentation on the built-in Profiler Key: FLINK-33436 URL: https://issues.apache.org/jira/browse/FLINK-33436 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.19.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33435) The visualization and download capabilities of profiling history
Yu Chen created FLINK-33435: --- Summary: The visualization and download capabilities of profiling history Key: FLINK-33435 URL: https://issues.apache.org/jira/browse/FLINK-33435 Project: Flink Issue Type: Sub-task Components: Runtime / Web Frontend Affects Versions: 1.19.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33434) Support invoke async-profiler on Taskmanager through REST API
Yu Chen created FLINK-33434: --- Summary: Support invoke async-profiler on Taskmanager through REST API Key: FLINK-33434 URL: https://issues.apache.org/jira/browse/FLINK-33434 Project: Flink Issue Type: Sub-task Components: Runtime / REST Affects Versions: 1.19.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33433) Support invoke async-profiler on Jobmanager through REST API
Yu Chen created FLINK-33433: --- Summary: Support invoke async-profiler on Jobmanager through REST API Key: FLINK-33433 URL: https://issues.apache.org/jira/browse/FLINK-33433 Project: Flink Issue Type: Sub-task Components: Runtime / REST Affects Versions: 1.19.0 Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler
Yu Chen created FLINK-33325: --- Summary: FLIP-375: Built-in cross-platform powerful java profiler Key: FLINK-33325 URL: https://issues.apache.org/jira/browse/FLINK-33325 Project: Flink Issue Type: Improvement Components: Runtime / REST, Runtime / Web Frontend Affects Versions: 1.19.0 Reporter: Yu Chen This is an umbrella JIRA of [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775743#comment-17775743 ] Yu Chen commented on FLINK-33230: - Hi [~JunRuiLi], sure, I'll illustrate the details of the implementation as a FLIP and create the discussion on the dev mailing list. > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordIn and numberRecordOut on the Flink Web (as shown in the figure). > > !image-2023-10-10-18-52-38-252.png|width=750,height=263! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774333#comment-17774333 ] Yu Chen commented on FLINK-33230: - Hi [~lsy]. We can store the Json String of the StreamGraph into the ArchiveExecutionGraph in a similar way as JsonPlan. Actually, the execution graph shown in the Web UI was also extracted from the JsonPlan. You can refer to the following code path: StreamingJobGraphGenerator->createJobGraph-[DefaultExecutionGraphBuilder]->executionGraph -> ArchivedExecutionGraph > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Assignee: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). > > !image-2023-10-10-18-52-38-252.png|width=750,height=263! -- This message was sent by Atlassian Jira (v8.20.10#820010)
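The approach in the comment above — serializing the StreamGraph to a JSON string once and carrying it into the archived graph, analogous to the existing JsonPlan — can be sketched like this (a simplified Python illustration; the class and field names are hypothetical stand-ins, not Flink's real `ArchivedExecutionGraph`):

```python
import json

class ArchivedGraph:
    """Holds precomputed JSON strings so the Web UI never needs the live graph."""
    def __init__(self, json_plan, stream_graph_json):
        self.json_plan = json_plan                  # already archived today
        self.stream_graph_json = stream_graph_json  # proposed addition

# At job-graph generation time, serialize once...
stream_graph = {"nodes": [{"id": 1, "operator": "Source", "chained": False}]}
archived = ArchivedGraph(json.dumps({"jid": "abc"}), json.dumps(stream_graph))

# ...and later the Web UI simply parses the stored string.
print(json.loads(archived.stream_graph_json)["nodes"][0]["operator"])  # → Source
```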
[jira] [Updated] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-33230: Attachment: (was: image-2023-10-10-18-45-24-486.png) > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). > > !image-2023-10-10-18-52-38-252.png|width=750,height=263! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-33230: Description: Flink Web shows users the ExecutionGraph (i.e., chained operators), but in some cases, we would like to know the structure of the chained operators as well as the necessary metrics such as the inputs and outputs of data, etc. Thus, we propose to show the stream graphs and some related metrics such as numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). !image-2023-10-10-18-52-38-252.png|width=750,height=263! was: Flink Web shows users the ExecutionGraph (i.e., chained operators), but in some cases, we would like to know the structure of the chained operators as well as the necessary metrics such as the inputs and outputs of data, etc. Thus, we propose to show the stream graphs and some related metrics such as numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). !image-2023-10-10-18-45-42-991.png|width=508,height=178! > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-45-24-486.png, > image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). > > !image-2023-10-10-18-52-38-252.png|width=750,height=263! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-33230: Attachment: image-2023-10-10-18-52-38-252.png > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-45-24-486.png, > image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). > > !image-2023-10-10-18-45-42-991.png|width=508,height=178! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Web UI
[ https://issues.apache.org/jira/browse/FLINK-33230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-33230: Attachment: image-2023-10-10-18-45-24-486.png Component/s: Runtime / Web Frontend Affects Version/s: 1.19.0 Description: Flink Web shows users the ExecutionGraph (i.e., chained operators), but in some cases, we would like to know the structure of the chained operators as well as the necessary metrics such as the inputs and outputs of data, etc. Thus, we propose to show the stream graphs and some related metrics such as numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). !image-2023-10-10-18-45-42-991.png|width=508,height=178! Summary: Support Expanding ExecutionGraph to StreamGraph in Web UI (was: Support Expanding ExecutionGraph to StreamGraph in Flink) > Support Expanding ExecutionGraph to StreamGraph in Web UI > - > > Key: FLINK-33230 > URL: https://issues.apache.org/jira/browse/FLINK-33230 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.19.0 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-10-10-18-45-24-486.png, > image-2023-10-10-18-52-38-252.png > > > Flink Web shows users the ExecutionGraph (i.e., chained operators), but in > some cases, we would like to know the structure of the chained operators as > well as the necessary metrics such as the inputs and outputs of data, etc. > > Thus, we propose to show the stream graphs and some related metrics such as > numberRecordInand numberRecordOut on the Flink Web (As shown in the Figure). > > !image-2023-10-10-18-45-42-991.png|width=508,height=178! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33230) Support Expanding ExecutionGraph to StreamGraph in Flink
Yu Chen created FLINK-33230: --- Summary: Support Expanding ExecutionGraph to StreamGraph in Flink Key: FLINK-33230 URL: https://issues.apache.org/jira/browse/FLINK-33230 Project: Flink Issue Type: Improvement Reporter: Yu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
[ https://issues.apache.org/jira/browse/FLINK-32754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32754: Description: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. {*}Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state{*}. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. *Q: Why does this bug prevent the task from creating the Checkpoint?* `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. *Q: Why this bug doesn't trigger a task failover?* In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=963,height=443! 
was: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. Q: Why does this bug prevent the task from creating the Checkpoint? `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. Q: Why this bug doesn't trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=963,height=443! 
> Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE > -- > > Key: FLINK-32754 > URL: https://issues.apache.org/jira/browse/FLINK-32754 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.17.0, 1.17.1 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-08-04-18-28-05-897.png > > > We registered some metrics in the `enumerator` of the flip-27 source via > `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in > JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. > {*}Meanwhile, the task does not experience failover, and the Checkpoints > cannot be successfully created even after the task is in running state{*}. > We found that the implementation class of `SplitEnumerator` is > `LazyInitializedCoordinatorContext`, however, the metricGroup() is > initialized after calling lazyInitialize(). By reviewing the code, we found > that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() > has not been called yet, so NPE is thrown. > *Q: Why does this bug prevent the task from creating the Checkpoint?* > `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the > member variable `enumerator` in `SourceCoordinator` being null. > Unfortunately, a
[jira] [Updated] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
[ https://issues.apache.org/jira/browse/FLINK-32754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32754: Description: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. metricGroup()` is null. Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. Q: Why does this bug prevent the task from creating the Checkpoint? `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. Q: Why this bug doesn't trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=963,height=443! was: We registered some metrics in the `enumerator` of the flip-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in JM when restoring, suggesting that `SplitEnumerator. 
metricGroup()` is null. Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in running state. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`, however, the metricGroup() is initialized after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so NPE is thrown. Q: Why does this bug prevent the task from creating the Checkpoint? `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` are called via `runInEventLoop()`. In `runInEventLoop()`, if the enumerator is null, it will return directly. Q: Why this bug doesn't trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, then it will catch the exception and call `cleanAndFailJob ` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, while `schedulerExecutor.execute` will ignore the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not failover. !image-2023-08-04-18-28-05-897.png|width=2442,height=1123! > Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE > -- > > Key: FLINK-32754 > URL: https://issues.apache.org/jira/browse/FLINK-32754 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.17.0, 1.17.1 >Reporter: Yu Chen >Priority: Major > Attachments: image-2023-08-04-18-28-05-897.png > > > We registered some metrics in the `enumerator` of the flip-27 source via > `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in > JM when restoring, suggesting that `SplitEnumerator. 
metricGroup()` is null. > Meanwhile, the task does not experience failover, and the Checkpoints cannot > be successfully created even after the task is in running state. > We found that the implementation class of `SplitEnumerator` is > `LazyInitializedCoordinatorContext`, however, the metricGroup() is > initialized after calling lazyInitialize(). By reviewing the code, we found > that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() > has not been called yet, so NPE is thrown. > Q: Why does this bug prevent the task from creating the Checkpoint? > `SourceCoordinator.resetToCheckpoint()` throws an NPE which results in the > member variable `enumerator` in `SourceCoordinator` being null. > Unfortunately, all Checkpoint-r
[jira] [Created] (FLINK-32754) Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE
Yu Chen created FLINK-32754: --- Summary: Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE Key: FLINK-32754 URL: https://issues.apache.org/jira/browse/FLINK-32754 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.17.1, 1.17.0 Reporter: Yu Chen Attachments: image-2023-08-04-18-28-05-897.png We registered some metrics in the `enumerator` of the FLIP-27 source via `SplitEnumerator.metricGroup()`, but found that the task prints NPE logs in the JM when restoring, suggesting that `SplitEnumerator.metricGroup()` is null. Meanwhile, the task does not experience failover, and the Checkpoints cannot be successfully created even after the task is in the running state. We found that the implementation class of `SplitEnumerator` is `LazyInitializedCoordinatorContext`; however, metricGroup() is initialized only after calling lazyInitialize(). By reviewing the code, we found that at the time of SourceCoordinator.resetToCheckpoint(), lazyInitialize() has not been called yet, so an NPE is thrown. Q: Why does this bug prevent the task from creating the Checkpoint? `SourceCoordinator.resetToCheckpoint()` throws an NPE, which results in the member variable `enumerator` in `SourceCoordinator` being null. Unfortunately, all Checkpoint-related calls in `SourceCoordinator` go through `runInEventLoop()`, and `runInEventLoop()` returns directly if the enumerator is null. Q: Why doesn't this bug trigger a task failover? In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, it catches the exception and calls `cleanAndFailJob` to try to fail the job. However, `globalFailureHandler` is also initialized in `lazyInitialize()`, and `schedulerExecutor.execute` will swallow the NPE triggered by `globalFailureHandler.handleGlobalFailure(e)`. Thus it appears that the task did not fail over. !image-2023-08-04-18-28-05-897.png|width=2442,height=1123! 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
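The failure mode can be illustrated with a small sketch (a Python stand-in for the Java classes involved; `LazyContext` is hypothetical, mirroring `LazyInitializedCoordinatorContext`): before `lazy_initialize()` runs, `metric_group()` returns nothing, so registering a metric during restore blows up, while the same call succeeds after initialization.

```python
class LazyContext:
    """Stand-in for a lazily initialized coordinator context."""
    def __init__(self):
        self._metric_group = None  # populated only by lazy_initialize()

    def lazy_initialize(self):
        self._metric_group = {}  # pretend this is the real metric group

    def metric_group(self):
        return self._metric_group

ctx = LazyContext()
try:
    # restoreEnumerator() path: resetToCheckpoint() runs before lazyInitialize(),
    # so this is the Python analogue of the reported NPE.
    ctx.metric_group()["numSplits"] = 0
    failed = False
except TypeError:  # 'NoneType' object does not support item assignment
    failed = True

ctx.lazy_initialize()
ctx.metric_group()["numSplits"] = 0  # fine once initialization has happened
print(failed)  # → True
```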
[jira] [Commented] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726096#comment-17726096 ] Yu Chen commented on FLINK-32186: - Hi [~yunta], could you help to assign this ticket to me? Thank you~ > Support subtask stack auto-search when redirecting from subtask backpressure > tab > > > Key: FLINK-32186 > URL: https://issues.apache.org/jira/browse/FLINK-32186 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.18.0 >Reporter: Yu Chen >Priority: Not a Priority > Attachments: image-2023-05-25-15-52-54-383.png, > image-2023-05-25-16-11-00-374.png > > > Note that we have introduced a dump link on the backpressure page in > FLINK-29996(Figure 1), which helps to check what are the corresponding > subtask doing more easily. > But we still have to search for the corresponding call stack of the > back-pressured subtask from the whole TaskManager thread dumps, it's not > convenient enough. > Therefore, I would like to trigger the search for the editor automatically > after redirecting from the backpressure tab, which will help to scroll the > thread dumps to the corresponding call stack of the back-pressured subtask > (As shown in Figure 2). > !image-2023-05-25-15-52-54-383.png|width=680,height=260! > Figure 1. ThreadDump Link in Backpressure Tab > !image-2023-05-25-16-11-00-374.png|width=680,height=353! > Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32186: Description: Note that we have introduced a dump link on the backpressure page in FLINK-29996 (Figure 1), which makes it easier to check what the corresponding subtask is doing. However, we still have to search the whole TaskManager thread dump for the corresponding call stack of the back-pressured subtask, which is not convenient. Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, which will scroll the thread dump to the corresponding call stack of the back-pressured subtask (as shown in Figure 2). !image-2023-05-25-15-52-54-383.png|width=680,height=260! Figure 1. ThreadDump Link in Backpressure Tab !image-2023-05-25-16-11-00-374.png|width=680,height=353! Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab was: Note that we have introduced a dump link on the backpressure page in FLINK-29996 (Figure 1), which makes it easier to check what the corresponding subtask is doing. However, we still have to search the whole TaskManager thread dump for the corresponding call stack of the back-pressured subtask, which is not convenient. Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, which will scroll the thread dump to the corresponding call stack of the back-pressured subtask (as shown in Figure 2). !image-2023-05-25-15-52-54-383.png|width=680,height=260! Figure 1. ThreadDump Link in Backpressure Tab !image-2023-05-25-16-08-14-325.png|width=676,height=351! Figure 2.
Trigger Auto-search after Redirecting from Backpressure Tab > Support subtask stack auto-search when redirecting from subtask backpressure > tab > > > Key: FLINK-32186 > URL: https://issues.apache.org/jira/browse/FLINK-32186 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.18.0 >Reporter: Yu Chen >Priority: Not a Priority > Attachments: image-2023-05-25-15-52-54-383.png, > image-2023-05-25-16-11-00-374.png > > > Note that we have introduced a dump link on the backpressure page in > FLINK-29996 (Figure 1), which makes it easier to check what the corresponding > subtask is doing. > However, we still have to search the whole TaskManager thread dump for the > corresponding call stack of the back-pressured subtask, which is not convenient. > Therefore, I would like to trigger the editor's search automatically > after redirecting from the backpressure tab, which will scroll the thread > dump to the corresponding call stack of the back-pressured subtask > (as shown in Figure 2). > !image-2023-05-25-15-52-54-383.png|width=680,height=260! > Figure 1. ThreadDump Link in Backpressure Tab > !image-2023-05-25-16-11-00-374.png|width=680,height=353! > Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32186: Attachment: image-2023-05-25-16-11-00-374.png > Support subtask stack auto-search when redirecting from subtask backpressure > tab > > > Key: FLINK-32186 > URL: https://issues.apache.org/jira/browse/FLINK-32186 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.18.0 >Reporter: Yu Chen >Priority: Not a Priority > Attachments: image-2023-05-25-15-52-54-383.png, > image-2023-05-25-16-11-00-374.png > > > Note that we have introduced a dump link on the backpressure page in > FLINK-29996(Figure 1), which helps to check what are the corresponding > subtask doing more easily. > But we still have to search for the corresponding call stack of the > back-pressured subtask from the whole TaskManager thread dumps, it's not > convenient enough. > Therefore, I would like to trigger the search for the editor automatically > after redirecting from the backpressure tab, which will help to scroll the > thread dumps to the corresponding call stack of the back-pressured subtask > (As shown in Figure 2). > !image-2023-05-25-15-52-54-383.png|width=680,height=260! > Figure 1. ThreadDump Link in Backpressure Tab > !image-2023-05-25-16-08-14-325.png|width=676,height=351! > Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32186: Attachment: (was: image-2023-05-25-16-08-14-325.png) > Support subtask stack auto-search when redirecting from subtask backpressure > tab > > > Key: FLINK-32186 > URL: https://issues.apache.org/jira/browse/FLINK-32186 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.18.0 >Reporter: Yu Chen >Priority: Not a Priority > Attachments: image-2023-05-25-15-52-54-383.png, > image-2023-05-25-16-11-00-374.png > > > Note that we have introduced a dump link on the backpressure page in > FLINK-29996(Figure 1), which helps to check what are the corresponding > subtask doing more easily. > But we still have to search for the corresponding call stack of the > back-pressured subtask from the whole TaskManager thread dumps, it's not > convenient enough. > Therefore, I would like to trigger the search for the editor automatically > after redirecting from the backpressure tab, which will help to scroll the > thread dumps to the corresponding call stack of the back-pressured subtask > (As shown in Figure 2). > !image-2023-05-25-15-52-54-383.png|width=680,height=260! > Figure 1. ThreadDump Link in Backpressure Tab > !image-2023-05-25-16-11-00-374.png|width=680,height=353! > Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-32186: Description: Note that we have introduced a dump link on the backpressure page in FLINK-29996 (Figure 1), which makes it easier to check what the corresponding subtask is doing. However, we still have to search the whole TaskManager thread dump for the corresponding call stack of the back-pressured subtask, which is not convenient. Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, which will scroll the thread dump to the corresponding call stack of the back-pressured subtask (as shown in Figure 2). !image-2023-05-25-15-52-54-383.png|width=680,height=260! Figure 1. ThreadDump Link in Backpressure Tab !image-2023-05-25-16-08-14-325.png|width=676,height=351! Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab was: Note that we have introduced a dump link on the backpressure page in [FLINK-29996|https://issues.apache.org/jira/browse/FLINK-29996] (Figure 1), which makes it easier to check what the corresponding subtask is doing. However, we still have to search the whole TaskManager thread dump for the corresponding call stack of the back-pressured subtask, which is not convenient. Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, which will scroll the thread dump to the corresponding call stack of the back-pressured subtask (as shown in Figure 2). !image-2023-05-25-15-52-54-383.png! Figure 1. ThreadDump Link in Backpressure Tab !image-2023-05-25-16-08-14-325.png! Figure 2.
Trigger Auto-search after Redirecting from Backpressure Tab > Support subtask stack auto-search when redirecting from subtask backpressure > tab > > > Key: FLINK-32186 > URL: https://issues.apache.org/jira/browse/FLINK-32186 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.18.0 >Reporter: Yu Chen >Priority: Not a Priority > Attachments: image-2023-05-25-15-52-54-383.png, > image-2023-05-25-16-08-14-325.png > > > Note that we have introduced a dump link on the backpressure page in > FLINK-29996 (Figure 1), which makes it easier to check what the corresponding > subtask is doing. > However, we still have to search the whole TaskManager thread dump for the > corresponding call stack of the back-pressured subtask, which is not convenient. > Therefore, I would like to trigger the editor's search automatically > after redirecting from the backpressure tab, which will scroll the thread > dump to the corresponding call stack of the back-pressured subtask > (as shown in Figure 2). > !image-2023-05-25-15-52-54-383.png|width=680,height=260! > Figure 1. ThreadDump Link in Backpressure Tab > !image-2023-05-25-16-08-14-325.png|width=676,height=351! > Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32186) Support subtask stack auto-search when redirecting from subtask backpressure tab
Yu Chen created FLINK-32186: --- Summary: Support subtask stack auto-search when redirecting from subtask backpressure tab Key: FLINK-32186 URL: https://issues.apache.org/jira/browse/FLINK-32186 Project: Flink Issue Type: Improvement Components: Runtime / Web Frontend Affects Versions: 1.18.0 Reporter: Yu Chen Attachments: image-2023-05-25-15-52-54-383.png, image-2023-05-25-16-08-14-325.png Note that we have introduced a dump link on the backpressure page in [FLINK-29996|https://issues.apache.org/jira/browse/FLINK-29996] (Figure 1), which makes it easier to check what the corresponding subtask is doing. However, we still have to search the whole TaskManager thread dump for the corresponding call stack of the back-pressured subtask, which is not convenient. Therefore, I would like to trigger the editor's search automatically after redirecting from the backpressure tab, which will scroll the thread dump to the corresponding call stack of the back-pressured subtask (as shown in Figure 2). !image-2023-05-25-15-52-54-383.png! Figure 1. ThreadDump Link in Backpressure Tab !image-2023-05-25-16-08-14-325.png! Figure 2. Trigger Auto-search after Redirecting from Backpressure Tab -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29322) Expose savepoint format on Web UI
[ https://issues.apache.org/jira/browse/FLINK-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630231#comment-17630231 ] Yu Chen commented on FLINK-29322: - I have already implemented it; if no one objects, I can take this ticket. > Expose savepoint format on Web UI > - > > Key: FLINK-29322 > URL: https://issues.apache.org/jira/browse/FLINK-29322 > Project: Flink > Issue Type: New Feature > Components: Runtime / Web Frontend >Reporter: Matyas Orhidi >Assignee: Matyas Orhidi >Priority: Major > Fix For: 1.17.0 > > > Savepoint format is not exposed on the Web UI, thus users should remember how > they triggered it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28926) Release Testing: Verify flip-235 hybrid shuffle mode
[ https://issues.apache.org/jira/browse/FLINK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600256#comment-17600256 ] Yu Chen commented on FLINK-28926: - Hi all, according to the docs of [Batch Shuffle|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle/], I have tested the feature locally in a standalone cluster with 1 JM + 2 TMs (2 slots/TM) and submitted the {{WordCount}} example job to the cluster in batch mode with each of the following commands: {code:sh} ./bin/flink run -Dexecution.batch-shuffle-mode=ALL_EXCHANGES_BLOCKING -Dparallelism.default=2 --detached examples/streaming/WordCount.jar --input /tmp/wordcount.txt --output /tmp/wordcount_res ./bin/flink run -Dexecution.batch-shuffle-mode=ALL_EXCHANGES_HYBRID_FULL -Dparallelism.default=2 --detached examples/streaming/WordCount.jar --input /tmp/wordcount.txt --output /tmp/wordcount_res ./bin/flink run -Dexecution.batch-shuffle-mode=ALL_EXCHANGES_HYBRID_SELECTIVE -Dparallelism.default=2 --detached examples/streaming/WordCount.jar --input /tmp/wordcount.txt --output /tmp/wordcount_res {code} Note that {{/tmp/wordcount.txt}} contains {{10,000,000}} random words generated by {{RandomStringUtils.randomAlphabetic(10)}}. Through the Flink Web UI, I verified the following scenarios: # {{HYBRID SHUFFLE}} can utilize all four slots, while {{BLOCKING SHUFFLE}} uses only two slots, which is consistent with the relevant description in the documentation. # By examining the timeline chart of the job, we can see that {{HYBRID SHUFFLE}} makes the upstream and downstream tasks in the {{WordCount}} job start simultaneously. # I also restarted the cluster with the configuration {{jobmanager.scheduler: AdaptiveBatch}} in {{flink-conf.yaml}}. The jobs configured with {{ALL_EXCHANGES_HYBRID_FULL}} and {{ALL_EXCHANGES_HYBRID_SELECTIVE}} produced an error consistent with the documentation's description of the limitations.
Overall, I have not found any problems with this feature; please feel free to contact me if any other cases need to be tested. > Release Testing: Verify flip-235 hybrid shuffle mode > > > Key: FLINK-28926 > URL: https://issues.apache.org/jira/browse/FLINK-28926 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Affects Versions: 1.16.0 >Reporter: Weijie Guo >Assignee: Yu Chen >Priority: Blocker > Labels: release-testing > Fix For: 1.16.0 > > > * Please refer to the release note of FLINK-27862 for a list of changes that need > to be verified. > * Please refer to our documentation for more details: > [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle/] > * Hybrid shuffle has some known limitations: no support for slot sharing, > the adaptive batch scheduler, or speculative execution. Please make sure you do > not use these features in testing. > * The changes should be verified only in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
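The test input described in the comment above ({{10,000,000}} random 10-letter words, one per line, produced with {{RandomStringUtils.randomAlphabetic(10)}}) can be reproduced without a JVM. The following is a minimal shell sketch; the function name {{gen_words}} is illustrative and not part of the ticket:

```shell
# Hedged sketch: generate N random 10-letter words (one per line),
# approximating RandomStringUtils.randomAlphabetic(10) from the test above.
# The function name gen_words is my own, not from the ticket.
gen_words() {
  # $1 = number of words, $2 = output file
  awk -v n="$1" 'BEGIN {
    srand()
    for (i = 1; i <= n; i++) {
      w = ""
      for (j = 1; j <= 10; j++) {
        # Pick one of 52 letters: 0-25 -> A-Z, 26-51 -> a-z
        c = int(rand() * 52)
        w = w sprintf("%c", c < 26 ? 65 + c : 97 + c - 26)
      }
      print w
    }
  }' > "$2"
}

# Example: recreate the input file used by the flink run commands above.
# gen_words 10000000 /tmp/wordcount.txt
```

The generated file can then be passed to the {{--input}} flag of the {{WordCount}} jobs shown in the comment.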
[jira] [Commented] (FLINK-28926) Release Testing: Verify flip-235 hybrid shuffle mode
[ https://issues.apache.org/jira/browse/FLINK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597496#comment-17597496 ] Yu Chen commented on FLINK-28926: - Hi [~hxb] [~Weijie Guo], I have finished setting up the tests and the overall progress is about 30%. No problems have been found yet; if I find any, I will contact you in time. FYI. > Release Testing: Verify flip-235 hybrid shuffle mode > > > Key: FLINK-28926 > URL: https://issues.apache.org/jira/browse/FLINK-28926 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Affects Versions: 1.16.0 >Reporter: Weijie Guo >Assignee: Yu Chen >Priority: Blocker > Labels: release-testing > Fix For: 1.16.0 > > > * Please refer to the release note of FLINK-27862 for a list of changes that need > to be verified. > * Please refer to our documentation for more details: > [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle/] > * Hybrid shuffle has some known limitations: no support for slot sharing, > the adaptive batch scheduler, or speculative execution. Please make sure you do > not use these features in testing. > * The changes should be verified only in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29107) Upgrade spotless version to improve spotless check efficiency
[ https://issues.apache.org/jira/browse/FLINK-29107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-29107: Description: I noticed a [discussion|https://github.com/diffplug/spotless/issues/927] in the spotless GitHub repository that we can improve the efficiency of spotless checks significantly by upgrading the version of spotless and enabling the `upToDateChecking`. I have made a simple test locally and the improvement of the spotless check after the upgrade is shown in the figure. !image-2022-08-25-22-10-54-453.png! was: Hi all, I noticed a [discussion|https://github.com/diffplug/spotless/issues/927] in the spotless GitHub repository that we can improve the efficiency of spotless checks significantly by upgrading the version of spotless and enabling the `upToDateChecking`. I have made a simple test locally and the improvement of the spotless check after the upgrade is shown in the figure. !image-2022-08-25-22-10-54-453.png! > Upgrade spotless version to improve spotless check efficiency > - > > Key: FLINK-29107 > URL: https://issues.apache.org/jira/browse/FLINK-29107 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.2 >Reporter: Yu Chen >Priority: Major > Attachments: image-2022-08-25-22-10-54-453.png > > > I noticed a [discussion|https://github.com/diffplug/spotless/issues/927] in > the spotless GitHub repository that we can improve the efficiency of spotless > checks significantly by upgrading the version of spotless and enabling the > `upToDateChecking`. > I have made a simple test locally and the improvement of the spotless check > after the upgrade is shown in the figure. > !image-2022-08-25-22-10-54-453.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29107) Upgrade spotless version to improve spotless check efficiency
[ https://issues.apache.org/jira/browse/FLINK-29107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Chen updated FLINK-29107: Summary: Upgrade spotless version to improve spotless check efficiency (was: Bump up spotless version to improve efficiently) > Upgrade spotless version to improve spotless check efficiency > - > > Key: FLINK-29107 > URL: https://issues.apache.org/jira/browse/FLINK-29107 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.2 >Reporter: Yu Chen >Priority: Major > Attachments: image-2022-08-25-22-10-54-453.png > > > Hi all, I noticed a > [discussion|https://github.com/diffplug/spotless/issues/927] in the spotless > GitHub repository that we can improve the efficiency of spotless checks > significantly by upgrading the version of spotless and enabling the > `upToDateChecking`. > I have made a simple test locally and the improvement of the spotless check > after the upgrade is shown in the figure. > !image-2022-08-25-22-10-54-453.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29107) Bump up spotless version to improve efficiently
Yu Chen created FLINK-29107: --- Summary: Bump up spotless version to improve efficiently Key: FLINK-29107 URL: https://issues.apache.org/jira/browse/FLINK-29107 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.15.2 Reporter: Yu Chen Attachments: image-2022-08-25-22-10-54-453.png Hi all, I noticed a [discussion|https://github.com/diffplug/spotless/issues/927] in the spotless GitHub repository that we can improve the efficiency of spotless checks significantly by upgrading the version of spotless and enabling the `upToDateChecking`. I have made a simple test locally and the improvement of the spotless check after the upgrade is shown in the figure. !image-2022-08-25-22-10-54-453.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
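For reference, the speed-up discussed in this ticket comes from the plugin's up-to-date checking feature. A minimal sketch of how it can be enabled in a project's {{pom.xml}}, assuming a {{spotless-maven-plugin}} version recent enough to support it (the version number below is illustrative, not taken from the ticket):

```xml
<!-- Hedged sketch: enabling Spotless up-to-date checking in pom.xml.
     The version below is illustrative only. -->
<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>2.27.2</version>
  <configuration>
    <!-- Tracks which files are already known to be correctly formatted,
         so repeated `mvn spotless:check` runs can skip them. -->
    <upToDateChecking>
      <enabled>true</enabled>
    </upToDateChecking>
  </configuration>
</plugin>
```

With this in place, the first {{spotless:check}} run populates an index, and subsequent runs only re-check files that changed, which matches the improvement reported in the linked discussion.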
[jira] [Commented] (FLINK-28926) Release Testing: Verify flip-235 hybrid shuffle mode
[ https://issues.apache.org/jira/browse/FLINK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579497#comment-17579497 ] Yu Chen commented on FLINK-28926: - Hi, I would like to take this release testing; please assign this ticket to me if you don't mind. Thanks. > Release Testing: Verify flip-235 hybrid shuffle mode > > > Key: FLINK-28926 > URL: https://issues.apache.org/jira/browse/FLINK-28926 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Affects Versions: 1.16.0 >Reporter: Weijie Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.16.0 > > > * Please refer to the release note of FLINK-27862 for a list of changes that need > to be verified. > * Please refer to our documentation for more details: > [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/batch_shuffle/] > * Hybrid shuffle has some known limitations: no support for slot sharing, > the adaptive batch scheduler, or speculative execution. Please make sure you do > not use these features in testing. > * The changes should be verified only in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28577) 1.15.1 web ui console report error about checkpoint size
[ https://issues.apache.org/jira/browse/FLINK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571713#comment-17571713 ] Yu Chen commented on FLINK-28577: - Well, I think I've found the root cause of this issue: it was introduced by a mistaken modification of flink-runtime-web in [FLINK-25557|https://issues.apache.org/jira/browse/FLINK-25557]. I can propose a PR to resolve the problem. cc: [~yunta] > 1.15.1 web ui console report error about checkpoint size > > > Key: FLINK-28577 > URL: https://issues.apache.org/jira/browse/FLINK-28577 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.15.1 >Reporter: nobleyd >Priority: Major > > 1.15.1 > 1 start-cluster > 2 submit job: ./bin/flink run -d ./examples/streaming/TopSpeedWindowing.jar > 3 trigger savepoint: ./bin/flink savepoint {jobId} ./sp0 > 4 open web ui for job and change to checkpoint tab, nothing showed. > Chrome console log shows some errors: > {{main.a7e97c2f60a2616e.js:1 ERROR TypeError: Cannot read properties of null > (reading 'checkpointed_size') > at q (253.e9e8f2b56b4981f5.js:1:607974) > at Sl (main.a7e97c2f60a2616e.js:1:186068) > at Br (main.a7e97c2f60a2616e.js:1:184696) > at N8 (main.a7e97c2f60a2616e.js:1:185128) > at Br (main.a7e97c2f60a2616e.js:1:185153) > at N8 (main.a7e97c2f60a2616e.js:1:185128) > at Br (main.a7e97c2f60a2616e.js:1:185153) > at N8 (main.a7e97c2f60a2616e.js:1:185128) > at Br (main.a7e97c2f60a2616e.js:1:185153) > at B8 (main.a7e97c2f60a2616e.js:1:191872)}} -- This message was sent by Atlassian Jira (v8.20.10#820010)