[jira] [Created] (FLINK-23241) Translate the page of "Working with State " into Chinese
baobinliang created FLINK-23241:
---
Summary: Translate the page of "Working with State" into Chinese
Key: FLINK-23241
URL: https://issues.apache.org/jira/browse/FLINK-23241
Project: Flink
Issue Type: Improvement
Components: chinese-translation
Affects Versions: 1.13.0
Reporter: baobinliang
Fix For: 1.13.0

The page url of "Working with State" is https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/datastream/fault-tolerance/state/. The markdown file can be found in flink/docs/content.zh/docs/dev/datastream/fault-tolerance/state.md in English.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-23242) Translate the page of "Handling Application Parameters" into Chinese
baobinliang created FLINK-23242:
---
Summary: Translate the page of "Handling Application Parameters" into Chinese
Key: FLINK-23242
URL: https://issues.apache.org/jira/browse/FLINK-23242
Project: Flink
Issue Type: Improvement
Components: chinese-translation
Affects Versions: 1.13.0
Reporter: baobinliang
Fix For: 1.13.0

The page url of "Handling Application Parameters" is https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/datastream/application_parameters/. The markdown file can be found in flink/docs/content.zh/docs/dev/datastream/application_parameters.md in English.
[jira] [Created] (FLINK-23243) Translate "Windowing table-valued functions (Windowing TVFs)" page into Chinese
Edmond Wang created FLINK-23243:
---
Summary: Translate "Windowing table-valued functions (Windowing TVFs)" page into Chinese
Key: FLINK-23243
URL: https://issues.apache.org/jira/browse/FLINK-23243
Project: Flink
Issue Type: Improvement
Components: chinese-translation, Documentation
Affects Versions: 1.13.0
Reporter: Edmond Wang
Fix For: 1.13.0

The page url is [https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sql/queries/select/]
The markdown file is located in *docs/content.zh/docs/dev/table/sql/select.md*
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:
> Dear devs,
>
> As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.
>
> JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.
>
> The migration path towards JUnit 5 would be:
>
> 1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project
>
> The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.
>
> 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)
>
> 3. Migrate all existing tests to JUnit 5
>
> This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.
>
> 4. Ban JUnit 4 imports in CheckStyle
>
> Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.
>
> 5. Remove the vintage runner and some cleanup
>
> This vote will last for at least 72 hours, following the consensus voting process.
>
> Thanks!
>
> [1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
> [2] https://junit.org/junit5
>
> --
> Best Regards,
>
> *Qingsheng Ren*
>
> Email: renqs...@gmail.com
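Step 4 of the plan above could, as a sketch, be expressed with CheckStyle's `IllegalImport` check. The module name and properties below are real CheckStyle features, but the concrete patterns are an illustrative assumption rather than Flink's actual configuration:

```xml
<!-- Illustrative sketch only: ban JUnit 4 (org.junit.*) and JUnit 3
     (junit.framework.*) imports while still allowing JUnit 5 packages
     (org.junit.jupiter.*, org.junit.platform.*). -->
<module name="IllegalImport">
  <!-- Interpret the illegalPkgs entries below as regular expressions. -->
  <property name="regexp" value="true"/>
  <property name="illegalPkgs" value="^org\.junit(?!\.jupiter|\.platform), ^junit\.framework"/>
</module>
```

Because the vintage engine keeps JUnit 4 tests running during the migration, such a rule would only be enabled once step 3 is complete.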
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

On 05/07/2021 09:45, Arvid Heise wrote:

+1 (binding)

On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:

Dear devs,

As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.

JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.

The migration path towards JUnit 5 would be:

1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project

The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.

2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)

3. Migrate all existing tests to JUnit 5

This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.

4. Ban JUnit 4 imports in CheckStyle

Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.

5. Remove the vintage runner and some cleanup

This vote will last for at least 72 hours, following the consensus voting process.

Thanks!

[1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
[2] https://junit.org/junit5

--
Best Regards,

*Qingsheng Ren*

Email: renqs...@gmail.com
Re: Job Recovery Time on TM Lost
As far as I know, a TM will report connection failure once its connected TM is lost. I suppose the JM can believe the report and fail the tasks in the lost TM if it also encounters a connection failure.

Of course, it won't work if the lost TM is standalone. But I suppose we can use the same strategy as the connected scenario. That is, consider it possibly lost on the first connection loss, and fail it if a double check also fails. The major difference is that the senders of the probes are the same one rather than two different roles, so the results may tend to be the same.

On the other hand, this fact also means that jobs can be fragile in an unstable environment, no matter whether the failover is triggered by the TM or the JM. So maybe it's not worthwhile to introduce extra configuration for fault tolerance of heartbeats, unless we also introduce some retry strategies for netty connections.

On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:

> Could you share the full logs with us for the second experiment, Lu? I cannot tell from the top of my head why it should take 30s unless you have configured a restart delay of 30s.
>
> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>
> I've now implemented FLINK-23209 [1] but it somehow has the problem that in a flaky environment you might not want to mark a TaskExecutor dead on the first connection loss. Maybe this is something we need to make configurable (e.g. introducing a threshold, which admittedly is similar to the heartbeat timeout) so that the user can configure it for her environment. On the upside, if you mark the TaskExecutor dead on the first connection loss (assuming you have a stable network environment), then it can now detect lost TaskExecutors as fast as the heartbeat interval.
>
> [1] https://issues.apache.org/jira/browse/FLINK-23209
>
> Cheers,
> Till
>
> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>
>> Thanks for sharing, Till and Yang.
>>
>> @Lu
>> Sorry but I don't know how to explain the new test with the log. Let's wait for others' reply.
>>
>> @Till
>> It would be nice if the JIRAs could be fixed. Thanks again for proposing them.
>>
>> In addition, I was tracking an issue where the RM keeps allocating and freeing slots after a TM is lost until its heartbeat timeout, when I found the recovery costing as long as the heartbeat timeout. That should be a minor bug introduced by declarative resource management. I have created a JIRA about the problem [1] and we can discuss it there if necessary.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>
>> On Fri, Jul 2, 2021 at 3:13 AM, Lu Niu wrote:
>>
>>> Another side question: shall we add a metric to cover the complete restarting time (phase 1 + phase 2)? The current metric jm.restartingTime only covers phase 1. Thanks!
>>>
>>> Best
>>> Lu
>>>
>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>
>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>
>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>> ```
>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>     && errorFrenquecyInMin > 0
>>>>     && System.currentTimeMillis() - lastStartTime >= errorFrenquecyInMin * 60 * 1000) {
>>>>   lastStartTime = System.currentTimeMillis();
>>>>   throw new RuntimeException("Trigger expected exception at: " + lastStartTime);
>>>> }
>>>> ```
>>>> In this case, I found phase 1 still takes about 30s and phase 2 dropped to 1s (because there is no need for container allocation). Why does phase 1 still take 30s even though no TM is lost?
>>>>
>>>> Related logs:
>>>> ```
>>>> 2021-06-30 00:55:07,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: USER_MATERIALIZED_EVENT_SIGNAL-user_context-frontend_event_source -> ...
>>>> java.lang.RuntimeException: Trigger expected exception at: 1625014507446
>>>> 2021-06-30 00:55:07,509 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RUNNING to RESTARTING.
>>>> 2021-06-30 00:55:37,596 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RESTARTING to RUNNING.
>>>> 2021-06-30 00:55:38,678 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph (time when all tasks switch from CREATED to RUNNING)
>>>> ```
>>>>
>>>> Best
>>>> Lu
>>>>
>>>> On Thu, Jul 1, 2021 at 12:06 PM Lu Niu wrote:
>>>>
>>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>>
>>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>>> ```
>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>     && errorFrenquecyInMin >
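The "double check" strategy described by Gen earlier in this thread can be sketched in isolation. Everything here is a hypothetical illustration; none of these names are Flink APIs:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "double check before failing" idea:
// a first connection loss only marks a TaskExecutor as *possibly* lost,
// and tasks are failed only if an independent re-probe also fails.
class DoubleCheckDetector {

    /** Returns true only when the initial loss is confirmed by a failed re-probe. */
    static boolean isLost(boolean firstConnectionLost, BooleanSupplier reprobe) {
        if (!firstConnectionLost) {
            return false; // no evidence of loss, nothing to do
        }
        // If the re-probe succeeds, the first loss was likely transient.
        return !reprobe.getAsBoolean();
    }

    public static void main(String[] args) {
        // Flaky network: first loss, but the re-probe succeeds -> keep the TM.
        System.out.println(isLost(true, () -> true));  // false
        // Real failure: the re-probe fails as well -> fail the tasks.
        System.out.println(isLost(true, () -> false)); // true
    }
}
```

Whether one failed probe is enough, or a couple of lost heartbeats should be required, is exactly the threshold Till suggests making configurable.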
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

Regards,
Roman

On Mon, Jul 5, 2021 at 9:47 AM Chesnay Schepler wrote:
>
> +1 (binding)
>
> On 05/07/2021 09:45, Arvid Heise wrote:
> > +1 (binding)
> >
> > On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:
> >
> >> Dear devs,
> >>
> >> As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.
> >>
> >> JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.
> >>
> >> The migration path towards JUnit 5 would be:
> >>
> >> 1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project
> >>
> >> The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.
> >>
> >> 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)
> >>
> >> 3. Migrate all existing tests to JUnit 5
> >>
> >> This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.
> >>
> >> 4. Ban JUnit 4 imports in CheckStyle
> >>
> >> Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.
> >>
> >> 5. Remove the vintage runner and some cleanup
> >>
> >> This vote will last for at least 72 hours, following the consensus voting process.
> >>
> >> Thanks!
> >>
> >> [1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
> >> [2] https://junit.org/junit5
> >>
> >> --
> >> Best Regards,
> >>
> >> *Qingsheng Ren*
> >>
> >> Email: renqs...@gmail.com
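Step 2 of the migration plan, rewriting rules as extensions, follows JUnit 5's callback-interface model. The sketch below is self-contained: the two interfaces are simplified local stand-ins for the real callbacks in `org.junit.jupiter.api.extension` (which receive an `ExtensionContext` instead of a list), and `TestLoggerExtension` is a hypothetical replacement for a JUnit 4 logging rule:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-ins for JUnit 5's extension callbacks; the real
// interfaces live in org.junit.jupiter.api.extension and take an
// ExtensionContext rather than a List<String>.
interface BeforeEachCallback { void beforeEach(List<String> log); }
interface AfterEachCallback  { void afterEach(List<String> log); }

// A JUnit 4 rule such as TestLogger becomes an extension: per-test setup and
// teardown are callback methods instead of a Statement wrapper around the test.
class TestLoggerExtension implements BeforeEachCallback, AfterEachCallback {
    @Override public void beforeEach(List<String> log) { log.add("test starting"); }
    @Override public void afterEach(List<String> log)  { log.add("test finished"); }
}

class ExtensionSketch {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TestLoggerExtension ext = new TestLoggerExtension();
        ext.beforeEach(log);   // the framework invokes this before each test
        log.add("test body");  // the test itself runs here
        ext.afterEach(log);    // and this afterwards, even on failure
        System.out.println(log); // [test starting, test body, test finished]
    }
}
```

In real code the class would implement the JUnit 5 interfaces directly and be attached with `@ExtendWith(TestLoggerExtension.class)`.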
Re: [DISCUSS] Dashboard/HistoryServer authentication
Thank you, team. Then based on this agreement we are moving the proposal to the wiki and opening the PR soon.

On Thu, Jul 1, 2021 at 12:28 AM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

> > * Even if there is a major breaking version, Flink releases major versions too, where it could be added
> > Netty framework locking is true, but AFAIK there was a discussion to rewrite the Netty stuff to a more sexy thing but there was no agreement to do that.
>
> Flink major releases seem to happen even less frequently than Netty releases :( It would be unfortunate if a breaking Netty API change ended up in the FLINK-3957 [1] catch-all 2.0 changes.
>
> > All in all I would agree on making it experimental.
>
> Thus I am happy with this compromise, thank you :)
>
> > This would simply restrict use-cases where order is not important. Limiting devs in such an odd way is a no-go.
>
> I think the only case to be made for imposing limitations would be to encourage devs to only use this API in very specific situations, otherwise to solve this in another way, and revisit the API if these limitations are met and alternatives do not work. That said, I am still trying to understand this specific Cloudera case – anything you can say about the limitations of its Flink setup (i.e., difficult to spawn sidecar processes (because of Yarn?)) would be greatly helpful to me and others without this bit of context.
>
> But I think the proposed priority function that you've added is a nice compromise as well, so +1 from my side on the proposal. I would only further suggest that we include the other options to this problem in the docs as the preferred approach, where possible.
>
> Thanks,
> Austin
>
> [1]: https://issues.apache.org/jira/browse/FLINK-3957
>
> On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi wrote:
>
> > Answered here because the text started to be crowded.
> >
> > > It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future.
> >
> > It's not doing any Netty version locking because:
> > * Netty will not necessarily add breaking changes in major versions; the API is quite stable
> > * Even if there is a major breaking version, Flink releases major versions too, where it could be added
> > Netty framework locking is true, but AFAIK there was a discussion to rewrite the Netty stuff to a more sexy thing but there was no agreement to do that.
> > All in all I would agree on making it experimental.
> >
> > > why not restrict the service loader to only allow one?
> >
> > This would simply restrict use-cases where order is not important. Limiting devs in such an odd way is a no-go.
> > I think the ordering came up in multiple places, which I think is a good reason to fill this gap with a priority function. I've updated the doc and added it...
> >
> > BR,
> > G
> >
> > On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:
> >
> >> Hi Gabor,
> >>
> >> Thanks for your answers. I appreciate the explanations. Please see my responses + further questions below.
> >>
> >> > * What stability semantics do you envision for this API?
> >>
> >>> As I foresee, the API will be as stable as the Netty API. Since there is a guarantee of no breaking changes between minor versions, we can give the same guarantee. If for whatever reason we need to break it, we can do it in a major version like every other open source project does.
> >>
> >> > * Does Flink expose dependencies' APIs in other places? Since this exposes the Netty API, will this make it difficult to upgrade Netty?
> >>
> >>> I don't expect breaking changes between minor versions, so in such cases there will be no issues. If there is a breaking change in a major version, we need to wait for a Flink major version too.
> >>
> >> To clarify, you are proposing this new API to have the same stability guarantees as @Public currently does? Where we will not introduce breaking changes unless absolutely necessary (and requiring a FLIP, etc.)?
> >>
> >> If this is the case, I think this puts the community in a tough position where we are forced to maintain compatibility with something that we do not have control over. It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future.
> >>
> >> I am not saying we should not do this; perhaps this is the best solution to finding a good compromise here, but I am trying to discover + acknowledge the full implications of this proposal so they can be discussed.
> >>
> >> What do you think about marking this API as @Experimental and not guaranteeing stability between versions? Then, if we do decide we need to upgrade Netty (or move away from it), we can do so.
> >>
> >> * I share Till's concern about multiple factories – other HTTP middlew
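The priority function mentioned in this discussion could take roughly the following shape. This is a hypothetical sketch: `ChannelHandlerFactory` and its methods are invented for illustration and are not the proposal's actual interface:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical factory interface; in the proposal the instances would be
// discovered via java.util.ServiceLoader rather than passed in explicitly.
interface ChannelHandlerFactory {
    /** Higher value = earlier placement in the handler chain. */
    int priority();
    String name();
}

class FactoryOrdering {
    // The ordering rule itself: sort discovered factories by descending
    // priority, so classpath load order (which ServiceLoader does not
    // guarantee across jars) stops mattering.
    static List<String> orderByPriority(List<ChannelHandlerFactory> factories) {
        return factories.stream()
                .sorted(Comparator.comparingInt(ChannelHandlerFactory::priority).reversed())
                .map(ChannelHandlerFactory::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ChannelHandlerFactory auth = new ChannelHandlerFactory() {
            public int priority() { return 100; }
            public String name() { return "auth"; }
        };
        ChannelHandlerFactory logging = new ChannelHandlerFactory() {
            public int priority() { return 10; }
            public String name() { return "logging"; }
        };
        // Input order does not matter; priority decides placement.
        System.out.println(orderByPriority(List.of(logging, auth))); // [auth, logging]
    }
}
```

A deterministic order like this addresses the concern about multiple service-loaded factories without restricting their number to one.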
[jira] [Created] (FLINK-23244) Refactor the time indicator materialization
JING ZHANG created FLINK-23244:
---
Summary: Refactor the time indicator materialization
Key: FLINK-23244
URL: https://issues.apache.org/jira/browse/FLINK-23244
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: JING ZHANG

Refactor the time indicator materialization, including:
1. Move `time_indicator` materialization after the `logical_rewrite` phase, which is closely before the physical optimization
2. Port `RelTimeIndicatorConverter` from Scala to Java
3. Refactor `RelTimeIndicatorConverter` to match `FlinkLogicalRel`
[jira] [Created] (FLINK-23245) Materialize rowtime attribute fields of regular join's inputs
JING ZHANG created FLINK-23245:
---
Summary: Materialize rowtime attribute fields of regular join's inputs
Key: FLINK-23245
URL: https://issues.apache.org/jira/browse/FLINK-23245
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: JING ZHANG

Materialize rowtime attribute fields of regular join's inputs in `RelTimeIndicatorConverter`
[jira] [Created] (FLINK-23246) Refactor the time indicator materialization
JING ZHANG created FLINK-23246:
---
Summary: Refactor the time indicator materialization
Key: FLINK-23246
URL: https://issues.apache.org/jira/browse/FLINK-23246
Project: Flink
Issue Type: Sub-task
Components: Table SQL / Planner
Reporter: JING ZHANG

Refactor the time indicator materialization, including:
1. Move `time_indicator` materialization after the `logical_rewrite` phase, which is closely before the physical optimization
2. Port `RelTimeIndicatorConverter` from Scala to Java
3. Refactor `RelTimeIndicatorConverter` to match `FlinkLogicalRel`
[RESULT][VOTE] Migrating Test Framework to JUnit 5
The voting time of migrating to JUnit 5 [1] has passed. I'd like to close the vote now.

There were three +1 binding votes:
- Arvid Heise (binding)
- Chesnay Schepler (binding)
- Roman Khachatryan (binding)

There were no -1 votes. The migration plan has been accepted. Thanks everyone for your support!

[1] https://lists.apache.org/x/thread.html/r89a2675bce01ccfdcfc47f2b0af6ef1afdbe4bad96d8c679cf68825e@%3Cdev.flink.apache.org%3E

--
Best Regards,
Qingsheng Ren
Email: renqs...@gmail.com
[jira] [Created] (FLINK-23247) Materialize rowtime attribute fields of regular join's inputs
JING ZHANG created FLINK-23247:
---
Summary: Materialize rowtime attribute fields of regular join's inputs
Key: FLINK-23247
URL: https://issues.apache.org/jira/browse/FLINK-23247
Project: Flink
Issue Type: Sub-task
Components: Table SQL / Planner
Reporter: JING ZHANG

Materialize rowtime attribute fields of regular join's inputs in `RelTimeIndicatorConverter`
[jira] [Created] (FLINK-23248) SinkWriter is not closed when failing
Fabian Paul created FLINK-23248:
---
Summary: SinkWriter is not closed when failing
Key: FLINK-23248
URL: https://issues.apache.org/jira/browse/FLINK-23248
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.12.4, 1.13.1, 1.14.0
Reporter: Fabian Paul

Currently the SinkWriter is only closed when the operator finishes in `AbstractSinkWriterOperator#close()`, but we must also close the SinkWriter in `AbstractSinkWriterOperator#dispose()` to release possibly acquired resources when failing.
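The fix described in the ticket amounts to making the writer's cleanup run on both the success and failure paths. A minimal, hypothetical sketch (not the actual Flink operator code; the names are invented):

```java
// Hypothetical sketch of the fix: close the writer both on normal
// termination (close) and on failure (dispose), and make the close
// idempotent because both methods may run for the same operator.
class SinkWriterOperatorSketch {
    private final AutoCloseable writer;
    private boolean writerClosed;

    SinkWriterOperatorSketch(AutoCloseable writer) {
        this.writer = writer;
    }

    /** Called when the operator finishes normally. */
    void close() throws Exception {
        closeWriter();
    }

    /** Called on failure; must also release the writer's resources. */
    void dispose() throws Exception {
        closeWriter();
    }

    private void closeWriter() throws Exception {
        if (!writerClosed) {
            writerClosed = true;
            writer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        SinkWriterOperatorSketch op =
                new SinkWriterOperatorSketch(() -> System.out.println("writer closed"));
        op.dispose(); // failure path releases the resources
        op.close();   // a later close() is a no-op, so "writer closed" prints once
    }
}
```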
[jira] [Created] (FLINK-23249) Introduce ShuffleMasterContext to ShuffleMaster
Yingjie Cao created FLINK-23249:
---
Summary: Introduce ShuffleMasterContext to ShuffleMaster
Key: FLINK-23249
URL: https://issues.apache.org/jira/browse/FLINK-23249
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination, Runtime / Network
Reporter: Yingjie Cao
Fix For: 1.14.0

Introduce ShuffleMasterContext to ShuffleMaster. Just like the ShuffleEnvironmentContext on the TaskManager side, the ShuffleMasterContext can act as a proxy of the ShuffleMaster and other components of Flink, like the ResourceManagerPartitionTracker.
Re: Job Recovery Time on TM Lost
I think for RPC communication there are retry strategies used by the underlying Akka ActorSystem. So an RpcEndpoint can reconnect to a remote ActorSystem and resume communication. Moreover, there are also reconciliation protocols in place which reconcile the states between the components because of potentially lost RPC messages. So the main question would be whether a single connection loss is good enough for triggering the timeout or whether we want a more elaborate mechanism to reason about the availability of the remote system (e.g. a couple of lost heartbeat messages).

Cheers,
Till

On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote:

> As far as I know, a TM will report connection failure once its connected TM is lost. I suppose the JM can believe the report and fail the tasks in the lost TM if it also encounters a connection failure.
>
> Of course, it won't work if the lost TM is standalone. But I suppose we can use the same strategy as the connected scenario. That is, consider it possibly lost on the first connection loss, and fail it if a double check also fails. The major difference is that the senders of the probes are the same one rather than two different roles, so the results may tend to be the same.
>
> On the other hand, this fact also means that jobs can be fragile in an unstable environment, no matter whether the failover is triggered by the TM or the JM. So maybe it's not worthwhile to introduce extra configuration for fault tolerance of heartbeats, unless we also introduce some retry strategies for netty connections.
>
> On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:
>
>> Could you share the full logs with us for the second experiment, Lu? I cannot tell from the top of my head why it should take 30s unless you have configured a restart delay of 30s.
>>
>> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>>
>> I've now implemented FLINK-23209 [1] but it somehow has the problem that in a flaky environment you might not want to mark a TaskExecutor dead on the first connection loss. Maybe this is something we need to make configurable (e.g. introducing a threshold, which admittedly is similar to the heartbeat timeout) so that the user can configure it for her environment. On the upside, if you mark the TaskExecutor dead on the first connection loss (assuming you have a stable network environment), then it can now detect lost TaskExecutors as fast as the heartbeat interval.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-23209
>>
>> Cheers,
>> Till
>>
>> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>>
>>> Thanks for sharing, Till and Yang.
>>>
>>> @Lu
>>> Sorry but I don't know how to explain the new test with the log. Let's wait for others' reply.
>>>
>>> @Till
>>> It would be nice if the JIRAs could be fixed. Thanks again for proposing them.
>>>
>>> In addition, I was tracking an issue where the RM keeps allocating and freeing slots after a TM is lost until its heartbeat timeout, when I found the recovery costing as long as the heartbeat timeout. That should be a minor bug introduced by declarative resource management. I have created a JIRA about the problem [1] and we can discuss it there if necessary.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>>
>>> On Fri, Jul 2, 2021 at 3:13 AM, Lu Niu wrote:
>>>
>>>> Another side question: shall we add a metric to cover the complete restarting time (phase 1 + phase 2)? The current metric jm.restartingTime only covers phase 1. Thanks!
>>>>
>>>> Best
>>>> Lu
>>>>
>>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>>
>>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>>
>>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>>> ```
>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>     && errorFrenquecyInMin > 0
>>>>>     && System.currentTimeMillis() - lastStartTime >= errorFrenquecyInMin * 60 * 1000) {
>>>>>   lastStartTime = System.currentTimeMillis();
>>>>>   throw new RuntimeException("Trigger expected exception at: " + lastStartTime);
>>>>> }
>>>>> ```
>>>>> In this case, I found phase 1 still takes about 30s and phase 2 dropped to 1s (because there is no need for container allocation). Why does phase 1 still take 30s even though no TM is lost?
>>>>>
>>>>> Related logs:
>>>>> ```
>>>>> 2021-06-30 00:55:07,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: USER_MATERIALIZED_EVENT_SIGNAL-user_context-frontend_event_source -> ...
>>>>> java.lang.RuntimeException: Trigger expected exception at: 1625014507446
>>>>> 2021-06-30 00:55:07,509 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RUNNING to RESTARTING.
>>>>> 2021-06-30 00:55:37,596 INFO
Re: [VOTE] Release 1.13.2, release candidate #1
+1 (non-binding)

- built from sources
- ran a streaming WordCount job
- web UI looks good
- checkpoint and restore look good

Best,
Zakelly

On Mon, Jul 5, 2021 at 2:40 PM Jingsong Li wrote:

> +1 (non-binding)
>
> - Verified checksums and signatures
> - Built from sources
> - ran table example jobs
> - web UI looks good
> - sql-client looks good
>
> I think we should update the unresolved JIRAs in [1] to 1.13.3.
>
> And we should check the resolved JIRAs in [2]; the commits of some are not in 1.13.2. We should exclude them. For example FLINK-23196 FLINK-23166
>
> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.13.2%20AND%20status%20not%20in%20(Closed%2C%20Resolved)%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522
>
> Best,
> Jingsong
>
> On Mon, Jul 5, 2021 at 2:27 PM Xingbo Huang wrote:
>
> > +1 (non-binding)
> >
> > - Verified checksums and signatures
> > - Built from sources
> > - Verified Python wheel package contents
> > - Pip installed the Python wheel package on Mac
> > - Ran a Python UDF job in the Python shell
> >
> > Best,
> > Xingbo
> >
> > On Mon, Jul 5, 2021 at 11:17 AM, Yangze Guo wrote:
> >
> > > +1 (non-binding)
> > >
> > > - built from sources
> > > - run example jobs with standalone and yarn.
> > > - check TaskManager's rest API from the JM master and its standby, everything looks good
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Jul 5, 2021 at 10:10 AM Xintong Song wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > - verified checksums & signatures
> > > > - built from sources
> > > > - run example jobs with standalone and native k8s (with custom image) deployments
> > > >   * job execution looks fine
> > > >   * nothing unexpected found in logs and web ui
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Sun, Jul 4, 2021 at 12:36 PM JING ZHANG wrote:
> > > > >
> > > > > Hi Yun,
> > > > > The website pull request listing [6] and the JIRA release notes [1] both contain unfinished JIRAs (such as FLINK-22955). Is that expected?
> > > > >
> > > > > Best regards,
> > > > > JING ZHANG
> > > > >
> > > > > On Fri, Jul 2, 2021 at 9:05 PM, Dawid Wysakowicz wrote:
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > - verified signatures and checksums
> > > > > > - reviewed the announcement PR
> > > > > > - built from sources and ran an example, quickly checked the Web UI
> > > > > > - checked the diff of pom.xml and NOTICE files from 1.13.1:
> > > > > >   - commons-io updated,
> > > > > >   - bundled guava:failureaccess added in flink-sql-connector-kinesis, which is properly reflected in the NOTICE file
> > > > > >
> > > > > > Best,
> > > > > > Dawid
> > > > > >
> > > > > > On 01/07/2021 12:57, Yun Tang wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > > Please review and vote on the release candidate #1 for the version 1.13.2, as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which includes:
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 78A306590F1081CC6794DC7F62DAD618E07CF996 [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > > > * source code tag "release-1.13.2-rc1" [5],
> > > > > > * website pull request listing the new release and adding the announcement blog post [6].
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Best,
> > > > > > Yun Tang
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522
> > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.2-rc1/
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1429/
> > > > > > [5] https://github.com/apache/flink/releases/tag/release-1.13.2-rc1
> > > > > > [6] https://github.com/apache/flink-web/pull/453
>
> --
> Best, Jingsong Lee
[jira] [Created] (FLINK-23250) Jira Bot Should Not Unassign Automatically
Konstantin Knauf created FLINK-23250:
---
Summary: Jira Bot Should Not Unassign Automatically
Key: FLINK-23250
URL: https://issues.apache.org/jira/browse/FLINK-23250
Project: Flink
Issue Type: Improvement
Reporter: Konstantin Knauf
Assignee: Konstantin Knauf
Re: [DISCUSS] Feedback Collection Jira Bot
Hi everyone,

Ok, let's in addition try out not unassigning anyone from tickets. This makes
it the responsibility of the component maintainers to periodically check for
stale-unassigned tickets and bring them to a resolution. We can monitor the
situation (number of stale-unassigned tickets), and if the number of open
stale-unassigned tickets keeps increasing, we need to revisit this topic.

For reference, here are the tickets for the adjustments:

* https://issues.apache.org/jira/browse/FLINK-23207 (PR available)
* https://issues.apache.org/jira/browse/FLINK-23206 (blocked by INFRA)
* https://issues.apache.org/jira/browse/FLINK-23205 (merged)
* https://issues.apache.org/jira/browse/FLINK-23250 (open)

Cheers,
Konstantin

On Fri, Jul 2, 2021 at 9:43 AM Piotr Nowojski wrote:

> +1 for the unassignment remark from Stephan
>
> Piotrek
>
> On Thu, Jul 1, 2021 at 12:35 Stephan Ewen wrote:
>
> > It is true that the bot surfaces problems that are there (not enough
> > committer attention sometimes), but it also "rubs salt in the wound" of
> > contributors, and that is tricky.
> >
> > We can try it out with the extended periods (although I think that in
> > reality we probably need even longer periods) and see how it goes.
> >
> > One thing I would suggest is to never let the bot unassign issues. It just
> > strikes me as very cold and disrespectful to be unassigned by a bot from an
> > issue in which I invested time and energy. (The committers don't even take
> > the time to talk to me and explain why the contribution will not go
> > forward.)
> > Unassignment should come from another person, possibly in response to a
> > ping from the bot. I think that makes a big difference in contributor
> > treatment.
> >
> > On Wed, Jun 30, 2021 at 12:30 PM Till Rohrmann wrote:
> >
> > > I agree that we shouldn't discourage contributions.
> > >
> > > For me the main idea of the bot is not to clean up JIRA but to improve
> > > our communication and expectation management with the community. There are
> > > many things we could do, but for a lot of things we don't have the time and
> > > capacity. Then to say at some point that we won't do something is just
> > > being honest. This also shows when looking at the JIRA numbers of the
> > > merged commits. We very rarely resolve tickets which are older than x days,
> > > and if we do, then we usually create a new ticket for the problem.
> > >
> > > The fact that we see some tickets with available pull requests go stale is
> > > a symptom that we don't value them as important enough or allocate enough
> > > time for external contributions, imo. Otherwise, they would have gotten the
> > > required attention and been merged. In such a case, raising awareness by
> > > pinging the watchers of the respective ticket is probably better than
> > > silently ignoring the PR. Also, adding labels to filter for these PRs
> > > should help to get them the required attention. But also here, it happens
> > > very rarely that we actually merge a PR that is older than y days.
> > > Ideally we avoid this situation altogether by only assigning contributors
> > > to tickets for which a committer has review capacity. However, this does
> > > not seem to always work.
> > >
> > > In some sense, the JIRA bot shows us the things which fall through the
> > > cracks more explicitly (which is probably not different than before). Of
> > > course we should try to find the time periods for when to ping or
> > > de-prioritize tickets that work best for the community.
> > >
> > > +1 for the proposed changes (extended time periods, "Not a Priority",
> > > default priority and fixVersion).
> > >
> > > @Piotr, I think we have the priorities defined here [1]. Maybe it is enough
> > > to share the link so that everyone can check whether her assumptions are
> > > correct.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jun 30, 2021 at 10:59 AM Piotr Nowojski wrote:
> > >
> > > > > * Introduce "Not a Priority" priority and stop closing tickets.
> > > >
> > > > +1 for this one (I also like the name you proposed for this, Konstantin)
> > > >
> > > > I also have no objections to the other proposals that you summarised. Just a
> > > > remark that, independently of this discussion, we might want to revisit or
> > > > reconfirm the priorities and their definition/interpretation across all
> > > > contributors.
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > On Wed, Jun 30, 2021 at 10:15 Konstantin Knauf wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thank you for the additional comments and suggestions.
> > > > >
> > > > > @Stephan, Kurt: I agree that we shouldn't discourage or dishearten
> > > > > contributors, and probably 14 days until a ticket becomes "stale-assigned"
> > > > > are too few. That's why I've already pro
[jira] [Created] (FLINK-23251) Support more than one retained checkpoints
Roman Khachatryan created FLINK-23251: - Summary: Support more than one retained checkpoints Key: FLINK-23251 URL: https://issues.apache.org/jira/browse/FLINK-23251 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Roman Khachatryan Fix For: 1.14.0 FLINK-23139 should add private state management capabilities to the TM. It does not consider multiple retained checkpoints. From the offline discussions, there seem to be three options:
# pass all num-retained snapshots from the JM to the TM on recovery (and increase counts accordingly)
# do ref counting on the JM on recovery and send these counts to the TM
# store the counts externally (so the JM is not involved)
Option 1 seems preferable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
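The counting idea shared by these options can be sketched as follows. This is a hypothetical illustration, not Flink code: on recovery each retained checkpoint reported to the TM bumps a per-artifact reference count, and a shared artifact may only be deleted once no retained checkpoint references it. All class and method names here are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not a Flink API): reference counting of state
// artifacts shared across retained checkpoints, as in option 1 where the
// JM passes all retained checkpoints to the TM on recovery.
public class RetainedCheckpointRefCount {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Called once per retained checkpoint reported on recovery.
    public void registerCheckpoint(List<String> artifacts) {
        for (String a : artifacts) {
            refCounts.merge(a, 1, Integer::sum);
        }
    }

    // Called when a retained checkpoint is subsumed or discarded.
    // Returns the artifacts whose count dropped to zero and may be deleted.
    public List<String> releaseCheckpoint(List<String> artifacts) {
        List<String> deletable = new ArrayList<>();
        for (String a : artifacts) {
            int c = refCounts.merge(a, -1, Integer::sum);
            if (c <= 0) {
                refCounts.remove(a);
                deletable.add(a);
            }
        }
        return deletable;
    }

    public int count(String artifact) {
        return refCounts.getOrDefault(artifact, 0);
    }

    public static void main(String[] args) {
        RetainedCheckpointRefCount rc = new RetainedCheckpointRefCount();
        rc.registerCheckpoint(List.of("sst-1", "sst-2")); // checkpoint 10
        rc.registerCheckpoint(List.of("sst-2", "sst-3")); // checkpoint 11
        // Checkpoint 10 is subsumed: sst-1 becomes unreferenced, sst-2 survives.
        System.out.println(rc.releaseCheckpoint(List.of("sst-1", "sst-2"))); // [sst-1]
    }
}
```

With a single retained checkpoint every count is 1 and release always deletes, which is why the single-retained case in FLINK-23139 does not need the counts.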
[jira] [Created] (FLINK-23252) Support recovery with altered "changelog.enabled" option
Roman Khachatryan created FLINK-23252: - Summary: Support recovery with altered "changelog.enabled" option Key: FLINK-23252 URL: https://issues.apache.org/jira/browse/FLINK-23252 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Roman Khachatryan Fix For: 1.14.0 Currently, recovery from savepoint with changelog.enabled altered is not supported (an exception will be thrown because of wrong state handle type). It can be implemented by: * {{disabled -> enabled}} : load materialized state with empty non-materialized * {{enabled -> disabled}}: in {{changelog.snapshot()}} , call {{underlyingBackend.snapshot()}} (instead of {{writer.persist()}}) if savepoint -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23253) Expose some interfaces of changelog to users
Zakelly Lan created FLINK-23253: --- Summary: Expose some interfaces of changelog to users Key: FLINK-23253 URL: https://issues.apache.org/jira/browse/FLINK-23253 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing, Runtime / State Backends Affects Versions: 1.14.0 Reporter: Zakelly Lan Users may want to implement some interfaces of the changelog, see the discussion [here|https://github.com/apache/flink/pull/16341#discussion_r662767236]. It would be better to expose some interfaces or APIs as {{@PublicEvolving}} after we finish the other tickets for the changelog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23254) Upgrade Test Framework to JUnit 5
Qingsheng Ren created FLINK-23254: - Summary: Upgrade Test Framework to JUnit 5 Key: FLINK-23254 URL: https://issues.apache.org/jira/browse/FLINK-23254 Project: Flink Issue Type: New Feature Components: Tests Affects Versions: 1.14.0 Reporter: Qingsheng Ren Fix For: 1.14.0 Please see mailing list discussion about the background of this upgrade. [{color:#33}https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E{color}] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23255) Add JUnit 5 jupiter and vintage engine
Qingsheng Ren created FLINK-23255: - Summary: Add JUnit 5 jupiter and vintage engine Key: FLINK-23255 URL: https://issues.apache.org/jira/browse/FLINK-23255 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.14.0 Reporter: Qingsheng Ren Fix For: 1.14.0 Add dependencies on JUnit 5 Jupiter to support JUnit 5 tests, and on the vintage engine to support test cases written in JUnit 4 style. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23256) Explain string shows output of legacy planner
Mans Singh created FLINK-23256: -- Summary: Explain string shows output of legacy planner Key: FLINK-23256 URL: https://issues.apache.org/jira/browse/FLINK-23256 Project: Flink Issue Type: Improvement Components: Documentation, Table SQL / API Affects Versions: 1.13.1 Reporter: Mans Singh Fix For: 1.14.0 The output of {quote}table.explain(){quote} shown on the [Concepts & Common API|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/common/#explaining-a-table/] documentation page is the result of the legacy planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23257) docker-build.sh outdated
Chesnay Schepler created FLINK-23257: Summary: docker-build.sh outdated Key: FLINK-23257 URL: https://issues.apache.org/jira/browse/FLINK-23257 Project: Flink Issue Type: Bug Components: Project Website Reporter: Chesnay Schepler Assignee: Chesnay Schepler The script doesn't work because it tries to use an outdated ruby version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23258) Make incremental jekyll opt in and log a warning
Chesnay Schepler created FLINK-23258: Summary: Make incremental jekyll opt in and log a warning Key: FLINK-23258 URL: https://issues.apache.org/jira/browse/FLINK-23258 Project: Flink Issue Type: Improvement Components: Project Website Reporter: Chesnay Schepler Assignee: Chesnay Schepler In the past we have repeatedly seen issues where using {{--incremental}} can cause surprising problems, such as new pages or blog posts not being displayed. I suggest to a) add a new option to explicitly enable incremental builds and b) log a short warning about the common issues that can arise when it is used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23259) [DOCS]The 'window' link on page A is failed and 404 is returned
wuguihu created FLINK-23259: --- Summary: [DOCS]The 'window' link on page A is failed and 404 is returned Key: FLINK-23259 URL: https://issues.apache.org/jira/browse/FLINK-23259 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu

[https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/#window]

The 'window' link on this page is broken and returns 404. The original is as follows:

```txt
See [windows](windows.html) for a complete description of windows.
```

The link resolves to [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/windows.html], which does not exist. The modification is done as follows:

```txt
[windows](windows.html) --> [windows]({{< ref "docs/dev/datastream/operators/windows" >}})
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23260) [DOCS]The link on page docs/libs/gelly/overview is failed and 404 is returned
wuguihu created FLINK-23260: --- Summary: [DOCS]The link on page docs/libs/gelly/overview is failed and 404 is returned Key: FLINK-23260 URL: https://issues.apache.org/jira/browse/FLINK-23260 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu

[https://ci.apache.org/projects/flink/flink-docs-master/docs/libs/gelly/overview/]

The links shown below return 404:

```txt
* [Graph API](graph_api.html)
* [Iterative Graph Processing](iterative_graph_processing.html)
* [Library Methods](library_methods.html)
* [Graph Algorithms](graph_algorithms.html)
* [Graph Generators](graph_generators.html)
* [Bipartite Graphs](bipartite_graph.html)
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23261) StateFun - HTTP State Access Interface
Stephan Ewen created FLINK-23261: Summary: StateFun - HTTP State Access Interface Key: FLINK-23261 URL: https://issues.apache.org/jira/browse/FLINK-23261 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Stephan Ewen

h2. Functionality

To improve operations of StateFun applications, we should offer an interface to query and manipulate state entries. This can be used for exploration and debugging purposes, and to repair state if the application state needs to be corrected. To make this simple, I would suggest an HTTP REST interface:
- GET: read the state entry for a key
- PUT: set/overwrite the state for a key
- DELETE: drop the state entry for a key

The URLs could be: {{http(s)://statefun-service/state}}, where the string {{//}} is the fully-qualified address of the target, and the {{statename}} is the name under which that persistent state is stored. Keys are always UTF-8 strings in StateFun, so they can be encoded in the URL.

For the responses, we would use the common codes 200 for GET/PUT/DELETE success and 404 for GET/DELETE not found.

The state values, as returned from GET requests, would generally be just the bytes, and not interpreted by this request handling. The integration of the StateFun type system with HTTP content types (MIME types) is up for further discussion. One option is to set the content type response header to {{"statefun/"}}, where all non-simple types map to {{Content-Type: application/octet-stream}}. We may make an exception for strings, which could be returned as {{Content-Type: text/plain; charset=UTF-8}}. Later refinement is possible, like auto-stringifying contents when the request indicates that it only accepts {{text/plain}} responses.

h2. Failure Guarantees

The initial failure guarantees for PUT/DELETE would be none - the requests would be handled best effort. We can easily extend this later in one of two ways:
- Delay responses to the HTTP requests until the next checkpoint is complete.
That makes synchronous interaction easy and sounds like a good match for a more admin-style interface.
- Return the current checkpoint ID and offer a way to poll until the next checkpoint is completed. This avoids blocking requests, but puts more burden on the caller.

Given that the expected use cases for PUT/DELETE are more of an "admin/fix" nature, I would suggest going with synchronous requests, for simplicity.

h2. Implementation

There are two options to implement this:
(1) A Source/Sink (Ingress/Egress) pair
(2) An Operator Coordinator with HTTP Requests

*Option (1) - Source/Sink pair*

We would implement a specific source that is both an ingress and an egress. The source would spawn an HTTP server (singleton per TM process). Requests would be handled as follows:
- An HTTP request gets a generated correlation ID.
- The source injects a new message type (a "control message") into the stream. That message holds the correlation ID, the parallel subtask index of the originating source, and the target address and state name.
- The function dispatcher handles these messages in a special way, retrieving the state and sending an egress message with the correlation ID to the parallel subtask of the egress as indicated by the message's subtask index.
- The egress (which is the same instance as the ingress source) uses the correlation ID to respond to the request.

Advantages:
- No changes necessary in Flink
- Might sustain higher throughput, due to multiple HTTP endpoints

Disadvantages:
- Additional HTTP servers and ports require more setup (like service definitions on K8s).
- Need to introduce a new control message type and extend the function dispatcher to handle it.
- Makes a hard assumption that sources run on all slots. Needs an "ugly" singleton hack to start only one server per TM process.

*Option (2) - Operator Coordinator*

Operator Coordinators are instances that run on the {{JobManager}} and can communicate with the Tasks via RPC.
Coordinators can receive calls from HTTP handlers at the JobManager's HTTP endpoint. An example of this is result fetching through HTTP/OperatorCoordinator requests. We would need a patch to Flink to allow registering custom URLs and passing the path as a parameter to the request. The RPCs can be processed in the mailbox on the Tasks, making them thread safe. This would also completely avoid the round-trip (source-to-sink) problem; the tasks simply need to send a response back to the RPC.

Advantages:
- Reuses the existing HTTP endpoint and port. No need to have an additional HTTP server, port, and service; for these admin-style requests, this approach re-uses Flink's admin HTTP endpoint.
- No need for singleton HTTP server logic in the Tasks
- Does not require the assumption that all TMs run an instance of all operators.
- No need for "control messages" and
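Independent of which transport option is chosen, the GET/PUT/DELETE semantics from the Functionality section can be sketched against an in-memory store. This is a hypothetical illustration, not StateFun code: the class and method names are invented, and only the 200/404 status codes and the (address, state name) addressing follow the proposal above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed state access semantics:
// GET -> 200 with raw bytes, or 404 if absent;
// PUT -> set/overwrite, 200 (best effort, see Failure Guarantees);
// DELETE -> 200 if removed, 404 if there was no entry.
public class StateAccessSketch {
    static final class Response {
        final int status;
        final byte[] body; // uninterpreted bytes, per the proposal
        Response(int status, byte[] body) { this.status = status; this.body = body; }
    }

    private final Map<String, byte[]> store = new HashMap<>();

    private static String key(String address, String stateName) {
        // Keys are UTF-8 strings in StateFun, so a flat string key suffices here.
        return address + "#" + stateName;
    }

    public Response get(String address, String stateName) {
        byte[] v = store.get(key(address, stateName));
        return v == null ? new Response(404, null) : new Response(200, v);
    }

    public int put(String address, String stateName, byte[] value) {
        store.put(key(address, stateName), value);
        return 200;
    }

    public int delete(String address, String stateName) {
        return store.remove(key(address, stateName)) != null ? 200 : 404;
    }

    public static void main(String[] args) {
        StateAccessSketch s = new StateAccessSketch();
        System.out.println(s.put("greeter/alice", "seen_count",
                "3".getBytes(StandardCharsets.UTF_8))); // 200
        System.out.println(s.get("greeter/alice", "seen_count").status); // 200
        System.out.println(s.delete("greeter/bob", "seen_count")); // 404
    }
}
```

In the actual design this lookup would run inside the function dispatcher (option 1) or a task RPC handler (option 2), with the HTTP layer only doing routing and status mapping.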
[jira] [Created] (FLINK-23262) FileReadingWatermarkITCase.testWatermarkEmissionWithChaining fails on azure
Xintong Song created FLINK-23262: Summary: FileReadingWatermarkITCase.testWatermarkEmissionWithChaining fails on azure Key: FLINK-23262 URL: https://issues.apache.org/jira/browse/FLINK-23262 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.14.0 Reporter: Xintong Song Fix For: 1.14.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19942&view=logs&j=219e462f-e75e-506c-3671-5017d866ccf6&t=4c5dc768-5c82-5ab0-660d-086cb90b76a0&l=5584
{code}
Jul 05 22:19:00 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.334 s <<< FAILURE! - in org.apache.flink.test.streaming.api.FileReadingWatermarkITCase
Jul 05 22:19:00 [ERROR] testWatermarkEmissionWithChaining(org.apache.flink.test.streaming.api.FileReadingWatermarkITCase) Time elapsed: 4.16 s <<< FAILURE!
Jul 05 22:19:00 java.lang.AssertionError: too few watermarks emitted: 4
Jul 05 22:19:00 	at org.junit.Assert.fail(Assert.java:89)
Jul 05 22:19:00 	at org.junit.Assert.assertTrue(Assert.java:42)
Jul 05 22:19:00 	at org.apache.flink.test.streaming.api.FileReadingWatermarkITCase.testWatermarkEmissionWithChaining(FileReadingWatermarkITCase.java:65)
Jul 05 22:19:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Jul 05 22:19:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 05 22:19:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 05 22:19:00 	at java.lang.reflect.Method.invoke(Method.java:498)
Jul 05 22:19:00 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Jul 05 22:19:00 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Jul 05 22:19:00 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Jul 05 22:19:00 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release 1.12.5, release candidate #1
+1 (binding)

- verified checksums & signatures
- built from sources
- run example jobs with standalone and native k8s deployments

Thank you~

Xintong Song

On Mon, Jul 5, 2021 at 11:18 AM Jingsong Li wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.12.5,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
>   deployed to dist.apache.org [2], which are signed with the key with
>   fingerprint FBB83C0A4FFB9CA8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.12.5-rc1" [5],
> * website pull request listing the new release and adding announcement blog
>   post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Best,
> Jingsong Lee
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1430
> [5] https://github.com/apache/flink/releases/tag/release-1.12.5-rc1
> [6] https://github.com/apache/flink-web/pull/455
[jira] [Created] (FLINK-23263) LocalBufferPool can not request memory.
Ada Wong created FLINK-23263: Summary: LocalBufferPool can not request memory. Key: FLINK-23263 URL: https://issues.apache.org/jira/browse/FLINK-23263 Project: Flink Issue Type: Improvement Components: Runtime / Network Affects Versions: 1.10.1 Reporter: Ada Wong The Flink job is running, but it cannot consume Kafka data. The following is the exception:

"Map -> to: Tuple2 -> Map -> (from: (id, sid, item, val, unit, dt, after_index, tablename, PROCTIME) -> where: (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'MMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (id AS ID, sid AS STID, item AS ITEM, val AS VAL, unit AS UNIT, dt AS DATATIME), from: (id, sid, item, val, unit, dt, after_index, tablename, PROCTIME) -> where: (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'MMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (sid AS STID, val AS VAL, dt AS DATATIME)) (1/1)" #79 prio=5 os_prio=0 tid=0x7fd4c4a94000 nid=0x21c0 in Object.wait() [0x7fd4d5719000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
	- locked <0x00074e6c8b98> (a java.util.ArrayDeque)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
	at DataStreamCalcRule$4160.processElement(Unknown Source)
	at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
	at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:488)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:465)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:425)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
	at DataStreamSourceConversion$4104.processElement(Unknown Source)
	at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:488)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:465)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:425)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:686)
	at or
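What the thread dump above shows can be modeled with a small sketch: the record writer blocks inside a buffer request until a downstream consumer recycles a buffer, so if downstream never recycles (classic backpressure, or a stuck consumer), the writer stays in TIMED_WAITING and the job stops consuming input. This is an illustration only, not Flink's actual LocalBufferPool; all names are invented.

```java
import java.util.ArrayDeque;

// Simplified model of a bounded buffer pool with a blocking request,
// mirroring the TIMED_WAITING state seen in the dump. Not Flink code.
public class BlockingBufferPoolSketch {
    private final ArrayDeque<byte[]> available = new ArrayDeque<>();

    public BlockingBufferPoolSketch(int numBuffers, int bufferSize) {
        for (int i = 0; i < numBuffers; i++) {
            available.add(new byte[bufferSize]);
        }
    }

    // Waits while the pool is empty; returns null on timeout.
    public synchronized byte[] requestBuffer(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (available.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // caller can surface "can not request memory"
            }
            wait(remaining);
        }
        return available.poll();
    }

    // Called by the consumer side once a buffer has been processed.
    public synchronized void recycle(byte[] buffer) {
        available.add(buffer);
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingBufferPoolSketch pool = new BlockingBufferPoolSketch(1, 1024);
        byte[] b = pool.requestBuffer(100);
        System.out.println(b != null);                        // true: pool had one buffer
        System.out.println(pool.requestBuffer(100) != null);  // false: pool exhausted, request times out
        pool.recycle(b);
        System.out.println(pool.requestBuffer(100) != null);  // true: recycled buffer available again
    }
}
```

The practical takeaway for a dump like the one above is to look downstream of the blocked writer: some operator or sink is not recycling buffers fast enough.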
Re: Job Recovery Time on TM Lost
I know that there are retry strategies for Akka RPC frameworks. I was just
considering that, since the environment is shared by the JM and TMs, and the
connections among TMs (using netty) are flaky in unstable environments, which
will also cause job failures, is it necessary to build a strongly guaranteed
connection between the JM and TMs, or could it be as flaky as the connections
among TMs? As far as I know, connections among TMs will just fail on their
first connection loss, so behaving like this in the JM just means "as flaky
as connections among TMs". In a stable environment that's good enough, but in
an unstable environment it indeed increases the instability.

IMO, though a single connection loss is not reliable, a double check should
be good enough. But since I'm not experienced with unstable environments, I
can't tell whether that's also enough for them.

On Mon, Jul 5, 2021 at 5:59 PM Till Rohrmann wrote:

> I think for RPC communication there are retry strategies used by the
> underlying Akka ActorSystem. So an RpcEndpoint can reconnect to a remote
> ActorSystem and resume communication. Moreover, there are also
> reconciliation protocols in place which reconcile the states between the
> components because of potentially lost RPC messages. So the main question
> would be whether a single connection loss is good enough for triggering the
> timeout, or whether we want a more elaborate mechanism to reason about the
> availability of the remote system (e.g. a couple of lost heartbeat
> messages).
>
> Cheers,
> Till
>
> On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote:
>
>> As far as I know, a TM will report a connection failure once a TM it is
>> connected to is lost. I suppose the JM can believe the report and fail the
>> tasks in the lost TM if it also encounters a connection failure.
>>
>> Of course, it won't work if the lost TM is standalone. But I suppose we
>> can use the same strategy as in the connected scenario. That is, consider it
>> possibly lost on the first connection loss, and fail it if a double check
>> also fails. The major difference is that the senders of the probes are the
>> same one rather than two different roles, so the results may tend to be the
>> same.
>>
>> On the other hand, this also means that jobs can be fragile in an
>> unstable environment, no matter whether the failover is triggered by the TM
>> or the JM. So maybe it's not worth introducing extra configuration for
>> fault tolerance of heartbeats, unless we also introduce some retry
>> strategies for netty connections.
>>
>> On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:
>>
>>> Could you share the full logs with us for the second experiment, Lu? I
>>> cannot tell from the top of my head why it should take 30s unless you have
>>> configured a restart delay of 30s.
>>>
>>> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>>>
>>> I've now implemented FLINK-23209 [1], but it somehow has the problem that
>>> in a flaky environment you might not want to mark a TaskExecutor dead on
>>> the first connection loss. Maybe this is something we need to make
>>> configurable (e.g. introducing a threshold, which admittedly is similar to
>>> the heartbeat timeout) so that the user can configure it for her
>>> environment. On the upside, if you mark the TaskExecutor dead on the first
>>> connection loss (assuming you have a stable network environment), then it
>>> can now detect lost TaskExecutors as fast as the heartbeat interval.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-23209
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>>>
>>>> Thanks for sharing, Till and Yang.
>>>>
>>>> @Lu
>>>> Sorry, but I don't know how to explain the new test with the log.
>>>> Let's wait for others' reply.
>>>>
>>>> @Till
>>>> It would be nice if the JIRAs could be fixed. Thanks again for proposing
>>>> them.
>>>>
>>>> In addition, I was tracking an issue where the RM keeps allocating and
>>>> freeing slots after a TM is lost, until its heartbeat times out, when I
>>>> found the recovery taking as long as the heartbeat timeout. That should be
>>>> a minor bug introduced by declarative resource management. I have created
>>>> a JIRA about the problem [1] and we can discuss it there if necessary.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>>>
>>>> Lu Niu wrote on Fri, Jul 2, 2021 at 3:13 AM:
>>>>
>>>>> Another side question: shall we add a metric to cover the complete
>>>>> restarting time (phase 1 + phase 2)? The current metric
>>>>> jm.restartingTime only covers phase 1. Thanks!
>>>>>
>>>>> Best,
>>>>> Lu
>>>>>
>>>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>>>
>>>>>> Thanks Till and Yang for the help! Also thanks Till for the quick fix!
>>>>>>
>>>>>> I did another test yesterday. In this test, I intentionally throw an
>>>>>> exception from the source operator:
>>>>>> ```
>>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>>     && errorFrenquecyInMin > 0
>>>>>>     && System.currentTimeMillis() - lastStartTime >=
>>>>>> err
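The "double check" idea discussed in this thread (treat the first connection loss as suspicion only, and declare a TaskExecutor dead only if a second, independent probe also fails) can be sketched as a small state machine. This is an illustration with invented names, not Flink's actual JobMaster or heartbeat code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a two-strike failure detector: first failure -> SUSPECT,
// second consecutive failure -> DEAD; a successful probe in between
// clears the suspicion. Hypothetical, not a Flink API.
public class DoubleCheckDetector {
    enum State { HEALTHY, SUSPECT, DEAD }

    private final Map<String, State> tms = new HashMap<>();

    public void register(String tmId) {
        tms.put(tmId, State.HEALTHY);
    }

    // Report one failed probe/connection loss. Returns the new state.
    public State reportFailure(String tmId) {
        State s = tms.getOrDefault(tmId, State.HEALTHY);
        State next = (s == State.HEALTHY) ? State.SUSPECT : State.DEAD;
        tms.put(tmId, next);
        return next;
    }

    // A successful probe clears the suspicion (unless already declared dead).
    public void reportSuccess(String tmId) {
        if (tms.get(tmId) != State.DEAD) {
            tms.put(tmId, State.HEALTHY);
        }
    }

    public State state(String tmId) {
        return tms.get(tmId);
    }

    public static void main(String[] args) {
        DoubleCheckDetector d = new DoubleCheckDetector();
        d.register("tm-1");
        System.out.println(d.reportFailure("tm-1")); // SUSPECT
        d.reportSuccess("tm-1");                     // flaky blip, recovered
        System.out.println(d.reportFailure("tm-1")); // SUSPECT again
        System.out.println(d.reportFailure("tm-1")); // DEAD
    }
}
```

A transient blip therefore never kills a TM by itself, while a genuine loss is confirmed after roughly one extra probe interval, which is the trade-off between reaction time and stability raised in the thread.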
[jira] [Created] (FLINK-23264) Fine Grained Resource Management Phase 2
Yangze Guo created FLINK-23264: -- Summary: Fine Grained Resource Management Phase 2 Key: FLINK-23264 URL: https://issues.apache.org/jira/browse/FLINK-23264 Project: Flink Issue Type: New Feature Components: Runtime / Coordination Reporter: Yangze Guo -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23265) Support evenly spread out in fine-grained resource management
Yangze Guo created FLINK-23265: -- Summary: Support evenly spread out in fine-grained resource management Key: FLINK-23265 URL: https://issues.apache.org/jira/browse/FLINK-23265 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.14.0 Reporter: Yangze Guo -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23266) HA per-job cluster (rocks, non-incremental) hangs on Azure
Xintong Song created FLINK-23266: Summary: HA per-job cluster (rocks, non-incremental) hangs on Azure Key: FLINK-23266 URL: https://issues.apache.org/jira/browse/FLINK-23266 Project: Flink Issue Type: Bug Affects Versions: 1.12.4 Reporter: Xintong Song Fix For: 1.12.5

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19943&view=logs&j=6caf31d6-847a-526e-9624-468e053467d6&t=0b23652f-b18b-5b6e-6eb6-a11070364610&l=1858

{code}
Jul 05 21:56:00 ==
Jul 05 21:56:00 Running 'Running HA per-job cluster (rocks, non-incremental) end-to-end test'
Jul 05 21:56:00 ==
Jul 05 21:56:00 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-00772599944
Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
Jul 05 21:56:01 Starting zookeeper daemon on host fv-az43-4.
Jul 05 21:56:01 Running on HA mode: parallelism=4, backend=rocks, asyncSnapshots=true, incremSnapshots=false and zk=3.4.
Jul 05 21:56:03 Starting standalonejob daemon on host fv-az43-4.
Jul 05 21:56:03 Start 1 more task managers
Jul 05 21:56:04 Starting taskexecutor daemon on host fv-az43-4.
Jul 05 21:56:10 Job () is not yet running.
Jul 05 21:56:18 Job () is running.
Jul 05 21:56:18 Running JM watchdog @ 266158
Jul 05 21:56:18 Running TM watchdog @ 266159
Jul 05 21:56:18 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
Jul 05 21:56:22 Killed JM @ 264313
Jul 05 21:56:22 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:56:26 Killed TM @ 264571
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:56:26 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:57:12 Killed JM @ 267798
Jul 05 21:57:12 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
Jul 05 21:57:15 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_ha.sh: line 151: [: 58)\n\tat org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java: integer expression expected
Jul 05 21:58:07 Killed JM @ 271440
Jul 05 21:58:07 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
Jul 05 21:58:09 Killed TM @ 267660
Jul 05 21:58:09 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
grep: /
[jira] [Created] (FLINK-23267) Activate Java code splitter for all generated classes
Caizhi Weng created FLINK-23267: --- Summary: Activate Java code splitter for all generated classes Key: FLINK-23267 URL: https://issues.apache.org/jira/browse/FLINK-23267 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: Caizhi Weng In the second step we activate Java code splitter introduced in the first step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23268) [DOCS]The link on page docs/dev/table/sql/queries/match_recognize/ is failed and 404 is returned
wuguihu created FLINK-23268: --- Summary: [DOCS]The link on page docs/dev/table/sql/queries/match_recognize/ is failed and 404 is returned Key: FLINK-23268 URL: https://issues.apache.org/jira/browse/FLINK-23268 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu Attachments: image-20210706134442433.png

Some of the links on the page are written incorrectly, resulting in a 404 page.

The page url: [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/match_recognize/]
The markdown file: [https://github.com/apache/flink/blob/master/docs/content/docs/dev/table/sql/queries/match_recognize.md]

When I click the link `[append table](dynamic_tables.html#update-and-append-queries)`, I get a 404 page. The corresponding address is: [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/match_recognize/dynamic_tables.html#update-and-append-queries]

Refer to the document [https://github.com/apache/flink/blob/master/docs/content.zh/docs/dev/table/sql/queries/match_recognize.md] for the correct link information.

The links shown below return 404:

{code:java}
//1 [append table](dynamic_tables.html#update-and-append-queries)
//2 [processing time or event time](time_attributes.html)
//3 [time attributes](time_attributes.html)
//4 rowtime attribute
//5 proctime attribute
//6 [state retention time](query_configuration.html#idle-state-retention-time)
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23269) json format decodes number to varchar incorrectly when a Decimal type is defined in the ddl
silence created FLINK-23269: --- Summary: json format decodes number to varchar incorrectly when a Decimal type is defined in the ddl Key: FLINK-23269 URL: https://issues.apache.org/jira/browse/FLINK-23269 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: silence

When using the json format and a decimal is defined:

json: {"c1":50.0,"c2":50.0}

ddl:
create table(
  c1 varchar,
  c2 decimal
) with (
  'format'='json'
)

output: {"c1":"5E+1","c2":50.0}

The following unit test produces this result:

{"double1":50.0,"double2":50.0,"double3":"50.0","float1":20.0,"float2":20.0,"float3":"20.0"}
java.lang.AssertionError:
Expected :+I[50.0, 50.0, 50.0, 20.0, 20.0, 20.0]
Actual   :+I[5E+1, 50.0, 50.0, 2E+1, 20.0, 20.0]

{code:java}
@Test
public void testDeserialization() throws Exception {
    double doubleValue = 50.0;
    float floatValue = 20.0f;

    ObjectMapper objectMapper = new ObjectMapper();
    ObjectNode root = objectMapper.createObjectNode();
    root.put("double1", doubleValue);
    root.put("double2", doubleValue);
    root.put("double3", String.valueOf(doubleValue));
    root.put("float1", floatValue);
    root.put("float2", floatValue);
    root.put("float3", String.valueOf(floatValue));
    byte[] serializedJson = objectMapper.writeValueAsBytes(root);
    System.out.println(new String(serializedJson));

    DataType dataType = ROW(
        FIELD("double1", STRING()),
        FIELD("double2", DECIMAL(10, 1)),
        FIELD("double3", DOUBLE()),
        FIELD("float1", STRING()),
        FIELD("float2", DECIMAL(10, 1)),
        FIELD("float3", FLOAT()));
    RowType rowType = (RowType) dataType.getLogicalType();
    JsonRowDataDeserializationSchema deserializationSchema =
        new JsonRowDataDeserializationSchema(
            rowType,
            InternalTypeInfo.of(rowType),
            false,
            false,
            TimestampFormat.ISO_8601);

    Row expected = new Row(6);
    expected.setField(0, String.valueOf(doubleValue));
    expected.setField(1, String.valueOf(doubleValue));
    expected.setField(2, doubleValue);
    expected.setField(3, String.valueOf(floatValue));
    expected.setField(4, String.valueOf(floatValue));
    expected.setField(5, floatValue);

    RowData rowData = deserializationSchema.deserialize(serializedJson);
    Row actual = convertToExternal(rowData, dataType);
    assertEquals(expected, actual);
}
{code}

When a DecimalType is defined, the ObjectMapper enables USE_BIG_DECIMAL_FOR_FLOATS, and jsonNode.asText() calls the BigDecimal toString method:

{code:java}
boolean hasDecimalType =
    LogicalTypeChecks.hasNested(rowType, t -> t instanceof DecimalType);
if (hasDecimalType) {
    objectMapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
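As background (my own reading, not stated in the ticket): the "5E+1" output is consistent with plain BigDecimal behavior once trailing zeros are stripped (which Jackson's default node factory does for decimal nodes), because BigDecimal.toString() switches to scientific notation when the scale becomes negative. A minimal sketch using only java.math.BigDecimal:

```java
import java.math.BigDecimal;

public class DecimalToStringDemo {
    public static void main(String[] args) {
        BigDecimal parsed = new BigDecimal("50.0");

        // With the original scale of 1, toString() prints the plain form.
        System.out.println(parsed); // 50.0

        // Stripping trailing zeros leaves unscaled value 5 with scale -1,
        // and toString() then uses scientific notation.
        BigDecimal stripped = parsed.stripTrailingZeros();
        System.out.println(stripped); // 5E+1

        // toPlainString() never uses scientific notation.
        System.out.println(stripped.toPlainString()); // 50
    }
}
```

This suggests why only the varchar columns are affected: converting the parsed BigDecimal to text goes through toString() on the normalized value, whereas the decimal columns keep the numeric representation.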