[jira] [Created] (FLINK-23241) Translate the page of "Working with State " into Chinese
baobinliang created FLINK-23241:
---
Summary: Translate the page of "Working with State" into Chinese
Key: FLINK-23241
URL: https://issues.apache.org/jira/browse/FLINK-23241
Project: Flink
Issue Type: Improvement
Components: chinese-translation
Affects Versions: 1.13.0
Reporter: baobinliang
Fix For: 1.13.0

The page url of "Working with State" is https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/datastream/fault-tolerance/state/. The markdown file can be found in flink/docs/content.zh/docs/dev/datastream/fault-tolerance/state.md in English.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-23242) Translate the page of "Handling Application Parameters" into Chinese
baobinliang created FLINK-23242:
---
Summary: Translate the page of "Handling Application Parameters" into Chinese
Key: FLINK-23242
URL: https://issues.apache.org/jira/browse/FLINK-23242
Project: Flink
Issue Type: Improvement
Components: chinese-translation
Affects Versions: 1.13.0
Reporter: baobinliang
Fix For: 1.13.0

The page url of "Handling Application Parameters" is https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/datastream/application_parameters/. The markdown file can be found in flink/docs/content.zh/docs/dev/datastream/application_parameters.md in English.
[jira] [Created] (FLINK-23243) Translate "Windowing table-valued functions (Windowing TVFs)" page into Chinese
Edmond Wang created FLINK-23243:
---
Summary: Translate "Windowing table-valued functions (Windowing TVFs)" page into Chinese
Key: FLINK-23243
URL: https://issues.apache.org/jira/browse/FLINK-23243
Project: Flink
Issue Type: Improvement
Components: chinese-translation, Documentation
Affects Versions: 1.13.0
Reporter: Edmond Wang
Fix For: 1.13.0

The page url is [https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sql/queries/select/]
The markdown file is located in *docs/content.zh/docs/dev/table/sql/select.md*
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:
> Dear devs,
>
> As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.
>
> JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.
>
> The migration path towards JUnit 5 would be:
>
> 1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project
>
> The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.
>
> 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)
>
> 3. Migrate all existing tests to JUnit 5
>
> This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.
>
> 4. Ban JUnit 4 imports in CheckStyle
>
> Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.
>
> 5. Remove the vintage runner and some cleanup
>
> This vote will last for at least 72 hours, following the consensus voting process.
>
> Thanks!
>
> [1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
> [2] https://junit.org/junit5
>
> --
> Best Regards,
>
> *Qingsheng Ren*
>
> Email: renqs...@gmail.com
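Step 4 of the plan above could, as a sketch, be expressed with CheckStyle's `IllegalImport` check. The module name and properties below are real CheckStyle features, but the concrete patterns are an illustrative assumption rather than Flink's actual configuration:

```xml
<!-- Illustrative sketch only: ban JUnit 4 (org.junit.*) and JUnit 3
     (junit.framework.*) imports while still allowing JUnit 5 packages
     (org.junit.jupiter.*, org.junit.platform.*). -->
<module name="IllegalImport">
  <!-- Interpret the illegalPkgs entries below as regular expressions. -->
  <property name="regexp" value="true"/>
  <property name="illegalPkgs" value="^org\.junit(?!\.jupiter|\.platform), ^junit\.framework"/>
</module>
```

Because the vintage engine keeps JUnit 4 tests running during the migration, such a rule would only be enabled once step 3 is complete.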
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

On 05/07/2021 09:45, Arvid Heise wrote:

+1 (binding)

On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:

Dear devs,

As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.

JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.

The migration path towards JUnit 5 would be:

1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project

The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.

2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)

3. Migrate all existing tests to JUnit 5

This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.

4. Ban JUnit 4 imports in CheckStyle

Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.

5. Remove the vintage runner and some cleanup

This vote will last for at least 72 hours, following the consensus voting process.

Thanks!

[1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
[2] https://junit.org/junit5

--
Best Regards,

*Qingsheng Ren*

Email: renqs...@gmail.com
Re: Job Recovery Time on TM Lost
As far as I know, a TM will report connection failure once its connected TM is lost. I suppose the JM can believe the report and fail the tasks in the lost TM if it also encounters a connection failure.

Of course, it won't work if the lost TM is standalone. But I suppose we can use the same strategy as the connected scenario. That is, consider it possibly lost on the first connection loss, and fail it if a double check also fails. The major difference is that the senders of the probes are the same one rather than two different roles, so the results may tend to be the same.

On the other hand, this fact also means that jobs can be fragile in an unstable environment, no matter whether the failover is triggered by the TM or the JM. So maybe it's not worthwhile to introduce extra configuration for fault tolerance of heartbeats, unless we also introduce some retry strategies for netty connections.

On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:

> Could you share the full logs with us for the second experiment, Lu? I cannot tell from the top of my head why it should take 30s unless you have configured a restart delay of 30s.
>
> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>
> I've now implemented FLINK-23209 [1] but it somehow has the problem that in a flaky environment you might not want to mark a TaskExecutor dead on the first connection loss. Maybe this is something we need to make configurable (e.g. introducing a threshold, which admittedly is similar to the heartbeat timeout) so that the user can configure it for her environment. On the upside, if you mark the TaskExecutor dead on the first connection loss (assuming you have a stable network environment), then it can now detect lost TaskExecutors as fast as the heartbeat interval.
>
> [1] https://issues.apache.org/jira/browse/FLINK-23209
>
> Cheers,
> Till
>
> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>
>> Thanks for sharing, Till and Yang.
>>
>> @Lu
>> Sorry but I don't know how to explain the new test with the log. Let's wait for others' reply.
>>
>> @Till
>> It would be nice if the JIRAs could be fixed. Thanks again for proposing them.
>>
>> In addition, I was tracking an issue where the RM keeps allocating and freeing slots after a TM is lost until its heartbeat timeout, when I found the recovery costing as long as the heartbeat timeout. That should be a minor bug introduced by declarative resource management. I have created a JIRA about the problem [1] and we can discuss it there if necessary.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>
>> On Fri, Jul 2, 2021 at 3:13 AM, Lu Niu wrote:
>>
>>> Another side question: shall we add a metric to cover the complete restarting time (phase 1 + phase 2)? The current metric jm.restartingTime only covers phase 1. Thanks!
>>>
>>> Best
>>> Lu
>>>
>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>
>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>
>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>> ```
>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>     && errorFrenquecyInMin > 0
>>>>     && System.currentTimeMillis() - lastStartTime >= errorFrenquecyInMin * 60 * 1000) {
>>>>   lastStartTime = System.currentTimeMillis();
>>>>   throw new RuntimeException("Trigger expected exception at: " + lastStartTime);
>>>> }
>>>> ```
>>>> In this case, I found phase 1 still takes about 30s and phase 2 dropped to 1s (because there is no need for container allocation). Why does phase 1 still take 30s even though no TM is lost?
>>>>
>>>> Related logs:
>>>> ```
>>>> 2021-06-30 00:55:07,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: USER_MATERIALIZED_EVENT_SIGNAL-user_context-frontend_event_source -> ...
>>>> java.lang.RuntimeException: Trigger expected exception at: 1625014507446
>>>> 2021-06-30 00:55:07,509 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RUNNING to RESTARTING.
>>>> 2021-06-30 00:55:37,596 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RESTARTING to RUNNING.
>>>> 2021-06-30 00:55:38,678 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph (time when all tasks switch from CREATED to RUNNING)
>>>> ```
>>>>
>>>> Best
>>>> Lu
>>>>
>>>> On Thu, Jul 1, 2021 at 12:06 PM Lu Niu wrote:
>>>>
>>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>>
>>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>>> ```
>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>     && errorFrenquecyInMin >
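The "double check" strategy described by Gen earlier in this thread can be sketched in isolation. Everything here is a hypothetical illustration; none of these names are Flink APIs:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "double check before failing" idea:
// a first connection loss only marks a TaskExecutor as *possibly* lost,
// and tasks are failed only if an independent re-probe also fails.
class DoubleCheckDetector {

    /** Returns true only when the initial loss is confirmed by a failed re-probe. */
    static boolean isLost(boolean firstConnectionLost, BooleanSupplier reprobe) {
        if (!firstConnectionLost) {
            return false; // no evidence of loss, nothing to do
        }
        // If the re-probe succeeds, the first loss was likely transient.
        return !reprobe.getAsBoolean();
    }

    public static void main(String[] args) {
        // Flaky network: first loss, but the re-probe succeeds -> keep the TM.
        System.out.println(isLost(true, () -> true));  // false
        // Real failure: the re-probe fails as well -> fail the tasks.
        System.out.println(isLost(true, () -> false)); // true
    }
}
```

Whether one failed probe is enough, or a couple of lost heartbeats should be required, is exactly the threshold Till suggests making configurable.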
Re: [VOTE] Migrating Test Framework to JUnit 5
+1 (binding)

Regards,
Roman

On Mon, Jul 5, 2021 at 9:47 AM Chesnay Schepler wrote:
>
> +1 (binding)
>
> On 05/07/2021 09:45, Arvid Heise wrote:
> > +1 (binding)
> >
> > On Wed, Jun 30, 2021 at 5:56 PM Qingsheng Ren wrote:
> >
> >> Dear devs,
> >>
> >> As discussed in the thread[1], I'd like to start a vote on migrating the test framework of the Flink project to JUnit 5.
> >>
> >> JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as the pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support and nested/dynamic tests.
> >>
> >> The migration path towards JUnit 5 would be:
> >>
> >> 1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project
> >>
> >> The vintage engine will keep all existing JUnit4-style cases valid. Since the classes of JUnit 4 and 5 are located under different packages, there won't be any conflict in having JUnit 4 cases in the project.
> >>
> >> 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)
> >>
> >> 3. Migrate all existing tests to JUnit 5
> >>
> >> This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring.
> >>
> >> 4. Ban JUnit 4 imports in CheckStyle
> >>
> >> Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and splitting the project between 4 & 5 again.
> >>
> >> 5. Remove the vintage runner and some cleanup
> >>
> >> This vote will last for at least 72 hours, following the consensus voting process.
> >>
> >> Thanks!
> >>
> >> [1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E
> >> [2] https://junit.org/junit5
> >>
> >> --
> >> Best Regards,
> >>
> >> *Qingsheng Ren*
> >>
> >> Email: renqs...@gmail.com
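Step 2 of the migration plan, rewriting rules as extensions, follows JUnit 5's callback-interface model. The sketch below is self-contained: the two interfaces are simplified local stand-ins for the real callbacks in `org.junit.jupiter.api.extension` (which receive an `ExtensionContext` instead of a list), and `TestLoggerExtension` is a hypothetical replacement for a JUnit 4 logging rule:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-ins for JUnit 5's extension callbacks; the real
// interfaces live in org.junit.jupiter.api.extension and take an
// ExtensionContext rather than a List<String>.
interface BeforeEachCallback { void beforeEach(List<String> log); }
interface AfterEachCallback  { void afterEach(List<String> log); }

// A JUnit 4 rule such as TestLogger becomes an extension: per-test setup and
// teardown are callback methods instead of a Statement wrapper around the test.
class TestLoggerExtension implements BeforeEachCallback, AfterEachCallback {
    @Override public void beforeEach(List<String> log) { log.add("test starting"); }
    @Override public void afterEach(List<String> log)  { log.add("test finished"); }
}

class ExtensionSketch {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TestLoggerExtension ext = new TestLoggerExtension();
        ext.beforeEach(log);   // the framework invokes this before each test
        log.add("test body");  // the test itself runs here
        ext.afterEach(log);    // and this afterwards, even on failure
        System.out.println(log); // [test starting, test body, test finished]
    }
}
```

In real code the class would implement the JUnit 5 interfaces directly and be attached with `@ExtendWith(TestLoggerExtension.class)`.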
Re: [DISCUSS] Dashboard/HistoryServer authentication
Thank you, team. Then based on this agreement we are moving the proposal to the wiki and opening the PR soon.

On Thu, Jul 1, 2021 at 12:28 AM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:

> > * Even if there is a major breaking version, Flink releases major versions too, where it could be added
> > Netty framework locking is true, but AFAIK there was a discussion to rewrite the Netty stuff to a more sexy thing but there was no agreement to do that.
>
> Flink major releases seem to happen even less frequently than Netty releases :( It would be unfortunate if a breaking Netty API change ended up in the FLINK-3957 [1] catch-all 2.0 changes.
>
> > All in all I would agree on making it experimental.
>
> Thus I am happy with this compromise, thank you :)
>
> > This would simply restrict use-cases where order is not important. Limiting devs in such an odd way is a no-go.
>
> I think the only case to be made for imposing limitations would be to encourage devs to only use this API in very specific situations, otherwise to solve this in another way, and revisit the API if these limitations are met and alternatives do not work. That said, I am still trying to understand this specific Cloudera case – anything you can say about the limitations of its Flink setup (i.e., difficult to spawn sidecar processes (because of Yarn?)) would be greatly helpful to me and others without this bit of context.
>
> But I think the proposed priority function that you've added is a nice compromise as well, so +1 from my side on the proposal. I would only further suggest that we include the other options to this problem in the docs as the preferred approach, where possible.
>
> Thanks,
> Austin
>
> [1]: https://issues.apache.org/jira/browse/FLINK-3957
>
> On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi wrote:
>
> > Answered here because the text started to be crowded.
> >
> > > It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future.
> >
> > It's not doing any Netty version locking because:
> > * Netty will not necessarily add breaking changes in major versions; the API is quite stable
> > * Even if there is a major breaking version, Flink releases major versions too, where it could be added
> > Netty framework locking is true, but AFAIK there was a discussion to rewrite the Netty stuff to a more sexy thing but there was no agreement to do that.
> > All in all I would agree on making it experimental.
> >
> > > why not restrict the service loader to only allow one?
> >
> > This would simply restrict use-cases where order is not important. Limiting devs in such an odd way is a no-go.
> > I think the ordering came up in multiple places, which I think is a good reason to fill this gap with a priority function. I've updated the doc and added it...
> >
> > BR,
> > G
> >
> > On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:
> >
> >> Hi Gabor,
> >>
> >> Thanks for your answers. I appreciate the explanations. Please see my responses + further questions below.
> >>
> >> > * What stability semantics do you envision for this API?
> >>
> >>> As I foresee, the API will be as stable as the Netty API. Since there is a guarantee of no breaking changes between minor versions, we can give the same guarantee. If for whatever reason we need to break it, we can do it in a major version like every other open source project does.
> >>
> >> > * Does Flink expose dependencies' APIs in other places? Since this exposes the Netty API, will this make it difficult to upgrade Netty?
> >>
> >>> I don't expect breaking changes between minor versions, so in such cases there will be no issues. If there is a breaking change in a major version, we need to wait for a Flink major version too.
> >>
> >> To clarify, you are proposing this new API to have the same stability guarantees as @Public currently does? Where we will not introduce breaking changes unless absolutely necessary (and requiring a FLIP, etc.)?
> >>
> >> If this is the case, I think this puts the community in a tough position where we are forced to maintain compatibility with something that we do not have control over. It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future.
> >>
> >> I am not saying we should not do this; perhaps this is the best solution to finding a good compromise here, but I am trying to discover + acknowledge the full implications of this proposal so they can be discussed.
> >>
> >> What do you think about marking this API as @Experimental and not guaranteeing stability between versions? Then, if we do decide we need to upgrade Netty (or move away from it), we can do so.
> >>
> >> * I share Till's concern about multiple factories – other HTTP middlew
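The priority function mentioned in this discussion could take roughly the following shape. This is a hypothetical sketch: `ChannelHandlerFactory` and its methods are invented for illustration and are not the proposal's actual interface:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical factory interface; in the proposal the instances would be
// discovered via java.util.ServiceLoader rather than passed in explicitly.
interface ChannelHandlerFactory {
    /** Higher value = earlier placement in the handler chain. */
    int priority();
    String name();
}

class FactoryOrdering {
    // The ordering rule itself: sort discovered factories by descending
    // priority, so classpath load order (which ServiceLoader does not
    // guarantee across jars) stops mattering.
    static List<String> orderByPriority(List<ChannelHandlerFactory> factories) {
        return factories.stream()
                .sorted(Comparator.comparingInt(ChannelHandlerFactory::priority).reversed())
                .map(ChannelHandlerFactory::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ChannelHandlerFactory auth = new ChannelHandlerFactory() {
            public int priority() { return 100; }
            public String name() { return "auth"; }
        };
        ChannelHandlerFactory logging = new ChannelHandlerFactory() {
            public int priority() { return 10; }
            public String name() { return "logging"; }
        };
        // Input order does not matter; priority decides placement.
        System.out.println(orderByPriority(List.of(logging, auth))); // [auth, logging]
    }
}
```

A deterministic order like this addresses the concern about multiple service-loaded factories without restricting their number to one.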
[jira] [Created] (FLINK-23244) Refactor the time indicator materialization
JING ZHANG created FLINK-23244:
---
Summary: Refactor the time indicator materialization
Key: FLINK-23244
URL: https://issues.apache.org/jira/browse/FLINK-23244
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: JING ZHANG

Refactor the time indicator materialization, including:
1. Move `time_indicator` materialization after the `logical_rewrite` phase, which is closely before the physical optimization
2. Port `RelTimeIndicatorConverter` from Scala to Java
3. Refactor `RelTimeIndicatorConverter` to match `FlinkLogicalRel`
[jira] [Created] (FLINK-23245) Materialize rowtime attribute fields of regular join's inputs
JING ZHANG created FLINK-23245:
---
Summary: Materialize rowtime attribute fields of regular join's inputs
Key: FLINK-23245
URL: https://issues.apache.org/jira/browse/FLINK-23245
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: JING ZHANG

Materialize rowtime attribute fields of regular join's inputs in `RelTimeIndicatorConverter`
[jira] [Created] (FLINK-23246) Refactor the time indicator materialization
JING ZHANG created FLINK-23246:
---
Summary: Refactor the time indicator materialization
Key: FLINK-23246
URL: https://issues.apache.org/jira/browse/FLINK-23246
Project: Flink
Issue Type: Sub-task
Components: Table SQL / Planner
Reporter: JING ZHANG

Refactor the time indicator materialization, including:
1. Move `time_indicator` materialization after the `logical_rewrite` phase, which is closely before the physical optimization
2. Port `RelTimeIndicatorConverter` from Scala to Java
3. Refactor `RelTimeIndicatorConverter` to match `FlinkLogicalRel`
[RESULT][VOTE] Migrating Test Framework to JUnit 5
The voting time of migrating to JUnit 5 [1] has passed. I'd like to close the vote now.

There were three +1 binding votes:
- Arvid Heise (binding)
- Chesnay Schepler (binding)
- Roman Khachatryan (binding)

There were no -1 votes. The migration plan has been accepted. Thanks everyone for your support!

[1] https://lists.apache.org/x/thread.html/r89a2675bce01ccfdcfc47f2b0af6ef1afdbe4bad96d8c679cf68825e@%3Cdev.flink.apache.org%3E

--
Best Regards,
Qingsheng Ren
Email: renqs...@gmail.com
[jira] [Created] (FLINK-23247) Materialize rowtime attribute fields of regular join's inputs
JING ZHANG created FLINK-23247:
---
Summary: Materialize rowtime attribute fields of regular join's inputs
Key: FLINK-23247
URL: https://issues.apache.org/jira/browse/FLINK-23247
Project: Flink
Issue Type: Sub-task
Components: Table SQL / Planner
Reporter: JING ZHANG

Materialize rowtime attribute fields of regular join's inputs in `RelTimeIndicatorConverter`
[jira] [Created] (FLINK-23248) SinkWriter is not closed when failing
Fabian Paul created FLINK-23248:
---
Summary: SinkWriter is not closed when failing
Key: FLINK-23248
URL: https://issues.apache.org/jira/browse/FLINK-23248
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.12.4, 1.13.1, 1.14.0
Reporter: Fabian Paul

Currently the SinkWriter is only closed when the operator finishes in `AbstractSinkWriterOperator#close()`, but we must also close the SinkWriter in `AbstractSinkWriterOperator#dispose()` to release possibly acquired resources when failing.
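The fix described in the ticket amounts to making the writer's cleanup run on both the success and failure paths. A minimal, hypothetical sketch (not the actual Flink operator code; the names are invented):

```java
// Hypothetical sketch of the fix: close the writer both on normal
// termination (close) and on failure (dispose), and make the close
// idempotent because both methods may run for the same operator.
class SinkWriterOperatorSketch {
    private final AutoCloseable writer;
    private boolean writerClosed;

    SinkWriterOperatorSketch(AutoCloseable writer) {
        this.writer = writer;
    }

    /** Called when the operator finishes normally. */
    void close() throws Exception {
        closeWriter();
    }

    /** Called on failure; must also release the writer's resources. */
    void dispose() throws Exception {
        closeWriter();
    }

    private void closeWriter() throws Exception {
        if (!writerClosed) {
            writerClosed = true;
            writer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        SinkWriterOperatorSketch op =
                new SinkWriterOperatorSketch(() -> System.out.println("writer closed"));
        op.dispose(); // failure path releases the resources
        op.close();   // a later close() is a no-op, so "writer closed" prints once
    }
}
```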
[jira] [Created] (FLINK-23249) Introduce ShuffleMasterContext to ShuffleMaster
Yingjie Cao created FLINK-23249:
---
Summary: Introduce ShuffleMasterContext to ShuffleMaster
Key: FLINK-23249
URL: https://issues.apache.org/jira/browse/FLINK-23249
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination, Runtime / Network
Reporter: Yingjie Cao
Fix For: 1.14.0

Introduce ShuffleMasterContext to ShuffleMaster. Just like the ShuffleEnvironmentContext on the TaskManager side, the ShuffleMasterContext can act as a proxy of the ShuffleMaster and other components of Flink, like the ResourceManagerPartitionTracker.
Re: Job Recovery Time on TM Lost
I think for RPC communication there are retry strategies used by the underlying Akka ActorSystem. So an RpcEndpoint can reconnect to a remote ActorSystem and resume communication. Moreover, there are also reconciliation protocols in place which reconcile the states between the components because of potentially lost RPC messages. So the main question would be whether a single connection loss is good enough for triggering the timeout or whether we want a more elaborate mechanism to reason about the availability of the remote system (e.g. a couple of lost heartbeat messages).

Cheers,
Till

On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote:

> As far as I know, a TM will report connection failure once its connected TM is lost. I suppose the JM can believe the report and fail the tasks in the lost TM if it also encounters a connection failure.
>
> Of course, it won't work if the lost TM is standalone. But I suppose we can use the same strategy as the connected scenario. That is, consider it possibly lost on the first connection loss, and fail it if a double check also fails. The major difference is that the senders of the probes are the same one rather than two different roles, so the results may tend to be the same.
>
> On the other hand, this fact also means that jobs can be fragile in an unstable environment, no matter whether the failover is triggered by the TM or the JM. So maybe it's not worthwhile to introduce extra configuration for fault tolerance of heartbeats, unless we also introduce some retry strategies for netty connections.
>
> On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:
>
>> Could you share the full logs with us for the second experiment, Lu? I cannot tell from the top of my head why it should take 30s unless you have configured a restart delay of 30s.
>>
>> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>>
>> I've now implemented FLINK-23209 [1] but it somehow has the problem that in a flaky environment you might not want to mark a TaskExecutor dead on the first connection loss. Maybe this is something we need to make configurable (e.g. introducing a threshold, which admittedly is similar to the heartbeat timeout) so that the user can configure it for her environment. On the upside, if you mark the TaskExecutor dead on the first connection loss (assuming you have a stable network environment), then it can now detect lost TaskExecutors as fast as the heartbeat interval.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-23209
>>
>> Cheers,
>> Till
>>
>> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>>
>>> Thanks for sharing, Till and Yang.
>>>
>>> @Lu
>>> Sorry but I don't know how to explain the new test with the log. Let's wait for others' reply.
>>>
>>> @Till
>>> It would be nice if the JIRAs could be fixed. Thanks again for proposing them.
>>>
>>> In addition, I was tracking an issue where the RM keeps allocating and freeing slots after a TM is lost until its heartbeat timeout, when I found the recovery costing as long as the heartbeat timeout. That should be a minor bug introduced by declarative resource management. I have created a JIRA about the problem [1] and we can discuss it there if necessary.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>>
>>> On Fri, Jul 2, 2021 at 3:13 AM, Lu Niu wrote:
>>>
>>>> Another side question: shall we add a metric to cover the complete restarting time (phase 1 + phase 2)? The current metric jm.restartingTime only covers phase 1. Thanks!
>>>>
>>>> Best
>>>> Lu
>>>>
>>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>>
>>>>> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>>>>>
>>>>> I did another test yesterday. In this test, I intentionally throw an exception from the source operator:
>>>>> ```
>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>     && errorFrenquecyInMin > 0
>>>>>     && System.currentTimeMillis() - lastStartTime >= errorFrenquecyInMin * 60 * 1000) {
>>>>>   lastStartTime = System.currentTimeMillis();
>>>>>   throw new RuntimeException("Trigger expected exception at: " + lastStartTime);
>>>>> }
>>>>> ```
>>>>> In this case, I found phase 1 still takes about 30s and phase 2 dropped to 1s (because there is no need for container allocation). Why does phase 1 still take 30s even though no TM is lost?
>>>>>
>>>>> Related logs:
>>>>> ```
>>>>> 2021-06-30 00:55:07,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: USER_MATERIALIZED_EVENT_SIGNAL-user_context-frontend_event_source -> ...
>>>>> java.lang.RuntimeException: Trigger expected exception at: 1625014507446
>>>>> 2021-06-30 00:55:07,509 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job NRTG_USER_MATERIALIZED_EVENT_SIGNAL_user_context_staging (35c95ee7141334845cfd49b642fa9f98) switched from state RUNNING to RESTARTING.
>>>>> 2021-06-30 00:55:37,596 INFO
Re: [VOTE] Release 1.13.2, release candidate #1
+1 (non-binding)

- built from sources
- ran a streaming WordCount job
- web UI looks good
- checkpoint and restore look good

Best,
Zakelly

On Mon, Jul 5, 2021 at 2:40 PM Jingsong Li wrote:

> +1 (non-binding)
>
> - Verified checksums and signatures
> - Built from sources
> - ran table example jobs
> - web UI looks good
> - sql-client looks good
>
> I think we should update the unresolved JIRAs in [1] to 1.13.3.
>
> And we should check the resolved JIRAs in [2]; the commits of some are not in 1.13.2. We should exclude them. For example FLINK-23196 FLINK-23166
>
> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.13.2%20AND%20status%20not%20in%20(Closed%2C%20Resolved)%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522
>
> Best,
> Jingsong
>
> On Mon, Jul 5, 2021 at 2:27 PM Xingbo Huang wrote:
>
> > +1 (non-binding)
> >
> > - Verified checksums and signatures
> > - Built from sources
> > - Verified Python wheel package contents
> > - Pip installed the Python wheel package on Mac
> > - Ran a Python UDF job in the Python shell
> >
> > Best,
> > Xingbo
> >
> > On Mon, Jul 5, 2021 at 11:17 AM, Yangze Guo wrote:
> >
> > > +1 (non-binding)
> > >
> > > - built from sources
> > > - run example jobs with standalone and yarn.
> > > - check TaskManager's rest API from the JM master and its standby, everything looks good
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Jul 5, 2021 at 10:10 AM Xintong Song wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > - verified checksums & signatures
> > > > - built from sources
> > > > - run example jobs with standalone and native k8s (with custom image) deployments
> > > >   * job execution looks fine
> > > >   * nothing unexpected found in logs and web ui
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Sun, Jul 4, 2021 at 12:36 PM JING ZHANG wrote:
> > > > >
> > > > > Hi Yun,
> > > > > The website pull request listing [6] and the JIRA release notes [1] both contain unfinished JIRAs (such as FLINK-22955). Is that expected?
> > > > >
> > > > > Best regards,
> > > > > JING ZHANG
> > > > >
> > > > > On Fri, Jul 2, 2021 at 9:05 PM, Dawid Wysakowicz wrote:
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > - verified signatures and checksums
> > > > > > - reviewed the announcement PR
> > > > > > - built from sources and ran an example, quickly checked the Web UI
> > > > > > - checked the diff of pom.xml and NOTICE files from 1.13.1:
> > > > > >   - commons-io updated,
> > > > > >   - bundled guava:failureaccess added in flink-sql-connector-kinesis, which is properly reflected in the NOTICE file
> > > > > >
> > > > > > Best,
> > > > > > Dawid
> > > > > >
> > > > > > On 01/07/2021 12:57, Yun Tang wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > > Please review and vote on the release candidate #1 for the version 1.13.2, as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which includes:
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 78A306590F1081CC6794DC7F62DAD618E07CF996 [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > > > * source code tag "release-1.13.2-rc1" [5],
> > > > > > * website pull request listing the new release and adding the announcement blog post [6].
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Best,
> > > > > > Yun Tang
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522
> > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.2-rc1/
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1429/
> > > > > > [5] https://github.com/apache/flink/releases/tag/release-1.13.2-rc1
> > > > > > [6] https://github.com/apache/flink-web/pull/453
>
> --
> Best, Jingsong Lee
[jira] [Created] (FLINK-23250) Jira Bot Should Not Unassign Automatically
Konstantin Knauf created FLINK-23250:
---
Summary: Jira Bot Should Not Unassign Automatically
Key: FLINK-23250
URL: https://issues.apache.org/jira/browse/FLINK-23250
Project: Flink
Issue Type: Improvement
Reporter: Konstantin Knauf
Assignee: Konstantin Knauf
Re: [DISCUSS] Feedback Collection Jira Bot
Hi everyone,

Ok, let's in addition try out not unassigning anyone from tickets. This makes
it the responsibility of the component maintainers to periodically check for
stale-unassigned tickets and bring them to a resolution. We can monitor the
situation (number of stale-unassigned tickets), and if the number of open
stale-unassigned tickets keeps increasing, we need to revisit this topic.

For reference, here are the tickets for the adjustments:

* https://issues.apache.org/jira/browse/FLINK-23207 (PR available)
* https://issues.apache.org/jira/browse/FLINK-23206 (blocked by INFRA)
* https://issues.apache.org/jira/browse/FLINK-23205 (merged)
* https://issues.apache.org/jira/browse/FLINK-23250 (open)

Cheers,
Konstantin

On Fri, Jul 2, 2021 at 9:43 AM Piotr Nowojski wrote:

> +1 for the unassignment remark from Stephan
>
> Piotrek
>
> On Thu, Jul 1, 2021 at 12:35 Stephan Ewen wrote:
>
> > It is true that the bot surfaces problems that are there (not enough
> > committer attention sometimes), but it also "rubs salt in the wound" of
> > contributors, and that is tricky.
> >
> > We can try it out with the extended periods (although I think that in
> > reality we probably need even longer periods) and see how it goes.
> >
> > One thing I would suggest is to never let the bot unassign issues. It just
> > strikes me as very cold and disrespectful to be unassigned by a bot from an
> > issue in which I invested time and energy. (The committers don't even take
> > the time to talk to me and explain why the contribution will not go
> > forward.)
> > Unassignment should come from another person, possibly in response to a
> > ping from the bot. I think that makes a big difference in contributor
> > treatment.
> >
> > On Wed, Jun 30, 2021 at 12:30 PM Till Rohrmann wrote:
> >
> > > I agree that we shouldn't discourage contributions.
> > >
> > > For me the main idea of the bot is not to clean up JIRA but to improve
> > > our communication and expectation management with the community. There are
> > > many things we could do, but for a lot of things we don't have the time and
> > > capacity. Then to say at some point that we won't do something is just
> > > being honest. This also shows when looking at the JIRA numbers of the
> > > merged commits. We very rarely resolve tickets which are older than x days,
> > > and if we do, then we usually create a new ticket for the problem.
> > >
> > > The fact that we see some tickets with available pull requests go stale is
> > > a symptom that we don't value them as important enough or allocate enough
> > > time for external contributions, imo. Otherwise, they would have gotten the
> > > required attention and been merged. In such a case, raising awareness by
> > > pinging the watchers of the respective ticket is probably better than
> > > silently ignoring the PR. Also, adding labels to filter for these PRs
> > > should help to get them the required attention. But also here, it happens
> > > very rarely that we actually merge a PR that is older than y days.
> > > Ideally we avoid this situation altogether by only assigning contributors
> > > to tickets for which a committer has review capacity. However, this does
> > > not seem to always work.
> > >
> > > In some sense, the JIRA bot shows us the things which fall through the
> > > cracks more explicitly (which is probably not different than before). Of
> > > course we should try to find the time periods for when to ping or
> > > de-prioritize tickets that work best for the community.
> > >
> > > +1 for the proposed changes (extended time periods, "Not a Priority",
> > > default priority and fixVersion).
> > >
> > > @Piotr, I think we have the priorities defined here [1]. Maybe it is enough
> > > to share the link so that everyone can check whether her assumptions are
> > > correct.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jun 30, 2021 at 10:59 AM Piotr Nowojski wrote:
> > >
> > > > > * Introduce "Not a Priority" priority and stop closing tickets.
> > > >
> > > > +1 for this one (I also like the name you proposed for this, Konstantin)
> > > >
> > > > I also have no objections to the other proposals that you summarised. Just a
> > > > remark that, independently of this discussion, we might want to revisit or
> > > > reconfirm the priorities and their definition/interpretation across all
> > > > contributors.
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > On Wed, Jun 30, 2021 at 10:15 Konstantin Knauf wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thank you for the additional comments and suggestions.
> > > > >
> > > > > @Stephan, Kurt: I agree that we shouldn't discourage or dishearten
> > > > > contributors, and probably 14 days until a ticket becomes "stale-assigned"
> > > > > are too few. That's why I've already pro
[jira] [Created] (FLINK-23251) Support more than one retained checkpoints
Roman Khachatryan created FLINK-23251: - Summary: Support more than one retained checkpoints Key: FLINK-23251 URL: https://issues.apache.org/jira/browse/FLINK-23251 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Roman Khachatryan Fix For: 1.14.0 FLINK-23139 should add private state management capabilities to the TM. It does not consider multiple retained checkpoints. From the offline discussions, there seem to be three options:
# pass all num-retained snapshots from the JM to the TM on recovery (and increase counts accordingly)
# do ref counting on the JM on recovery and send these counts to the TM
# store the counts externally (so the JM is not involved)
Option 1 seems preferable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
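The counting idea shared by these options can be sketched as follows. This is a hypothetical illustration, not Flink code: on recovery each retained checkpoint reported to the TM bumps a per-artifact reference count, and a shared artifact may only be deleted once no retained checkpoint references it. All class and method names here are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not a Flink API): reference counting of state
// artifacts shared across retained checkpoints, as in option 1 where the
// JM passes all retained checkpoints to the TM on recovery.
public class RetainedCheckpointRefCount {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Called once per retained checkpoint reported on recovery.
    public void registerCheckpoint(List<String> artifacts) {
        for (String a : artifacts) {
            refCounts.merge(a, 1, Integer::sum);
        }
    }

    // Called when a retained checkpoint is subsumed or discarded.
    // Returns the artifacts whose count dropped to zero and may be deleted.
    public List<String> releaseCheckpoint(List<String> artifacts) {
        List<String> deletable = new ArrayList<>();
        for (String a : artifacts) {
            int c = refCounts.merge(a, -1, Integer::sum);
            if (c <= 0) {
                refCounts.remove(a);
                deletable.add(a);
            }
        }
        return deletable;
    }

    public int count(String artifact) {
        return refCounts.getOrDefault(artifact, 0);
    }

    public static void main(String[] args) {
        RetainedCheckpointRefCount rc = new RetainedCheckpointRefCount();
        rc.registerCheckpoint(List.of("sst-1", "sst-2")); // checkpoint 10
        rc.registerCheckpoint(List.of("sst-2", "sst-3")); // checkpoint 11
        // Checkpoint 10 is subsumed: sst-1 becomes unreferenced, sst-2 survives.
        System.out.println(rc.releaseCheckpoint(List.of("sst-1", "sst-2"))); // [sst-1]
    }
}
```

With a single retained checkpoint every count is 1 and release always deletes, which is why the single-retained case in FLINK-23139 does not need the counts.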
[jira] [Created] (FLINK-23252) Support recovery with altered "changelog.enabled" option
Roman Khachatryan created FLINK-23252: - Summary: Support recovery with altered "changelog.enabled" option Key: FLINK-23252 URL: https://issues.apache.org/jira/browse/FLINK-23252 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Roman Khachatryan Fix For: 1.14.0 Currently, recovery from savepoint with changelog.enabled altered is not supported (an exception will be thrown because of wrong state handle type). It can be implemented by: * {{disabled -> enabled}} : load materialized state with empty non-materialized * {{enabled -> disabled}}: in {{changelog.snapshot()}} , call {{underlyingBackend.snapshot()}} (instead of {{writer.persist()}}) if savepoint -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23253) Expose some interfaces of changelog to users
Zakelly Lan created FLINK-23253: --- Summary: Expose some interfaces of changelog to users Key: FLINK-23253 URL: https://issues.apache.org/jira/browse/FLINK-23253 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing, Runtime / State Backends Affects Versions: 1.14.0 Reporter: Zakelly Lan Users may want to implement some interfaces of the changelog, see the discussion [here|https://github.com/apache/flink/pull/16341#discussion_r662767236]. It would be better to expose some interfaces or APIs as {{@PublicEvolving}} after we finish the other tickets for the changelog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23254) Upgrade Test Framework to JUnit 5
Qingsheng Ren created FLINK-23254: - Summary: Upgrade Test Framework to JUnit 5 Key: FLINK-23254 URL: https://issues.apache.org/jira/browse/FLINK-23254 Project: Flink Issue Type: New Feature Components: Tests Affects Versions: 1.14.0 Reporter: Qingsheng Ren Fix For: 1.14.0 Please see mailing list discussion about the background of this upgrade. [{color:#33}https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E{color}] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23255) Add JUnit 5 jupiter and vintage engine
Qingsheng Ren created FLINK-23255: - Summary: Add JUnit 5 jupiter and vintage engine Key: FLINK-23255 URL: https://issues.apache.org/jira/browse/FLINK-23255 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.14.0 Reporter: Qingsheng Ren Fix For: 1.14.0 Add dependencies on JUnit 5 Jupiter to support JUnit 5 tests, and on the vintage engine to support test cases written in JUnit 4 style. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23256) Explain string shows output of legacy planner
Mans Singh created FLINK-23256: -- Summary: Explain string shows output of legacy planner Key: FLINK-23256 URL: https://issues.apache.org/jira/browse/FLINK-23256 Project: Flink Issue Type: Improvement Components: Documentation, Table SQL / API Affects Versions: 1.13.1 Reporter: Mans Singh Fix For: 1.14.0 The output of {quote}table.explain(){quote} shown on the [Concepts & Common API|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/common/#explaining-a-table/] documentation page is the result of the legacy planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23257) docker-build.sh outdated
Chesnay Schepler created FLINK-23257: Summary: docker-build.sh outdated Key: FLINK-23257 URL: https://issues.apache.org/jira/browse/FLINK-23257 Project: Flink Issue Type: Bug Components: Project Website Reporter: Chesnay Schepler Assignee: Chesnay Schepler The script doesn't work because it tries to use an outdated ruby version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23258) Make incremental jekyll opt in and log a warning
Chesnay Schepler created FLINK-23258: Summary: Make incremental jekyll opt in and log a warning Key: FLINK-23258 URL: https://issues.apache.org/jira/browse/FLINK-23258 Project: Flink Issue Type: Improvement Components: Project Website Reporter: Chesnay Schepler Assignee: Chesnay Schepler In the past we have repeatedly seen issues where using {{--incremental}} can cause surprising problems, such as new pages or blog posts not being displayed. I suggest to a) add a new option to explicitly enable incremental builds and b) log a short warning about the common issues that can arise when it is used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23259) [DOCS]The 'window' link on page A is failed and 404 is returned
wuguihu created FLINK-23259: --- Summary: [DOCS]The 'window' link on page A is failed and 404 is returned Key: FLINK-23259 URL: https://issues.apache.org/jira/browse/FLINK-23259 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu

[https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/#window]

The 'window' link on this page is broken and returns 404. The original is as follows:

```txt
See [windows](windows.html) for a complete description of windows.
```

The link resolves to [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/windows.html], which does not exist. The modification is done as follows:

```txt
[windows](windows.html) --> [windows]({{< ref "docs/dev/datastream/operators/windows" >}})
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23260) [DOCS]The link on page docs/libs/gelly/overview is failed and 404 is returned
wuguihu created FLINK-23260: --- Summary: [DOCS]The link on page docs/libs/gelly/overview is failed and 404 is returned Key: FLINK-23260 URL: https://issues.apache.org/jira/browse/FLINK-23260 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu

[https://ci.apache.org/projects/flink/flink-docs-master/docs/libs/gelly/overview/]

The links shown below return 404:

```txt
* [Graph API](graph_api.html)
* [Iterative Graph Processing](iterative_graph_processing.html)
* [Library Methods](library_methods.html)
* [Graph Algorithms](graph_algorithms.html)
* [Graph Generators](graph_generators.html)
* [Bipartite Graphs](bipartite_graph.html)
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23261) StateFun - HTTP State Access Interface
Stephan Ewen created FLINK-23261: Summary: StateFun - HTTP State Access Interface Key: FLINK-23261 URL: https://issues.apache.org/jira/browse/FLINK-23261 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Stephan Ewen

h2. Functionality

To improve operations of StateFun applications, we should offer an interface to query and manipulate state entries. This can be used for exploration and debugging purposes, and to repair state if the application state needs to be corrected. To make this simple, I would suggest an HTTP REST interface:
- GET: read the state entry for a key
- PUT: set/overwrite the state for a key
- DELETE: drop the state entry for a key

The URLs could be: {{http(s)://statefun-service/state}}, where the string {{//}} is the fully-qualified address of the target, and the {{statename}} is the name under which that persistent state is stored. Keys are always UTF-8 strings in StateFun, so they can be encoded in the URL.

For the responses, we would use the common codes 200 for GET/PUT/DELETE success and 404 for GET/DELETE not found.

The state values, as returned from GET requests, would generally be just the bytes, and not interpreted by this request handling. The integration of the StateFun type system with HTTP content types (MIME types) is up for further discussion. One option is to set the content type response header to {{"statefun/"}}, where all non-simple types map to {{Content-Type: application/octet-stream}}. We may make an exception for strings, which could be returned as {{Content-Type: text/plain; charset=UTF-8}}. Later refinement is possible, like auto-stringifying contents when the request indicates that it only accepts {{text/plain}} responses.

h2. Failure Guarantees

The initial failure guarantees for PUT/DELETE would be none - the requests would be handled best effort. We can easily extend this later in one of two ways:
- Delay responses to the HTTP requests until the next checkpoint is complete.
That makes synchronous interaction easy and sounds like a good match for a more admin-style interface.
- Return the current checkpoint ID and offer a way to poll until the next checkpoint is completed. This avoids blocking requests, but puts more burden on the caller.

Given that the expected use cases for PUT/DELETE are more of an "admin/fix" nature, I would suggest going with synchronous requests, for simplicity.

h2. Implementation

There are two options to implement this:
(1) A Source/Sink (Ingress/Egress) pair
(2) An Operator Coordinator with HTTP Requests

*Option (1) - Source/Sink pair*

We would implement a specific source that is both an ingress and an egress. The source would spawn an HTTP server (singleton per TM process). Requests would be handled as follows:
- An HTTP request gets a generated correlation ID.
- The source injects a new message type (a "control message") into the stream. That message holds the correlation ID, the parallel subtask index of the originating source, and the target address and state name.
- The function dispatcher handles these messages in a special way, retrieving the state and sending an egress message with the correlation ID to the parallel subtask of the egress as indicated by the message's subtask index.
- The egress (which is the same instance as the ingress source) uses the correlation ID to respond to the request.

Advantages:
- No changes necessary in Flink
- Might sustain higher throughput, due to multiple HTTP endpoints

Disadvantages:
- Additional HTTP servers and ports require more setup (like service definitions on K8s).
- Need to introduce a new control message type and extend the function dispatcher to handle it.
- Makes a hard assumption that sources run on all slots. Needs an "ugly" singleton hack to start only one server per TM process.

*Option (2) - Operator Coordinator*

Operator Coordinators are instances that run on the {{JobManager}} and can communicate with the Tasks via RPC.
Coordinators can receive calls from HTTP handlers at the JobManager's HTTP endpoint. An example of this is result fetching through HTTP/OperatorCoordinator requests. We would need a patch to Flink to allow registering custom URLs and passing the path as a parameter to the request. The RPCs can be processed in the mailbox on the Tasks, making them thread safe. This would also completely avoid the round-trip (source-to-sink) problem; the tasks simply need to send a response back to the RPC.

Advantages:
- Reuses the existing HTTP endpoint and port. No need to have an additional HTTP server, port, and service; for these admin-style requests, this approach re-uses Flink's admin HTTP endpoint.
- No need for singleton HTTP server logic in the Tasks
- Does not require the assumption that all TMs run an instance of all operators.
- No need for "control messages" and
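Independent of which transport option is chosen, the GET/PUT/DELETE semantics from the Functionality section can be sketched against an in-memory store. This is a hypothetical illustration, not StateFun code: the class and method names are invented, and only the 200/404 status codes and the (address, state name) addressing follow the proposal above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed state access semantics:
// GET -> 200 with raw bytes, or 404 if absent;
// PUT -> set/overwrite, 200 (best effort, see Failure Guarantees);
// DELETE -> 200 if removed, 404 if there was no entry.
public class StateAccessSketch {
    static final class Response {
        final int status;
        final byte[] body; // uninterpreted bytes, per the proposal
        Response(int status, byte[] body) { this.status = status; this.body = body; }
    }

    private final Map<String, byte[]> store = new HashMap<>();

    private static String key(String address, String stateName) {
        // Keys are UTF-8 strings in StateFun, so a flat string key suffices here.
        return address + "#" + stateName;
    }

    public Response get(String address, String stateName) {
        byte[] v = store.get(key(address, stateName));
        return v == null ? new Response(404, null) : new Response(200, v);
    }

    public int put(String address, String stateName, byte[] value) {
        store.put(key(address, stateName), value);
        return 200;
    }

    public int delete(String address, String stateName) {
        return store.remove(key(address, stateName)) != null ? 200 : 404;
    }

    public static void main(String[] args) {
        StateAccessSketch s = new StateAccessSketch();
        System.out.println(s.put("greeter/alice", "seen_count",
                "3".getBytes(StandardCharsets.UTF_8))); // 200
        System.out.println(s.get("greeter/alice", "seen_count").status); // 200
        System.out.println(s.delete("greeter/bob", "seen_count")); // 404
    }
}
```

In the actual design this lookup would run inside the function dispatcher (option 1) or a task RPC handler (option 2), with the HTTP layer only doing routing and status mapping.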
[jira] [Created] (FLINK-23262) FileReadingWatermarkITCase.testWatermarkEmissionWithChaining fails on azure
Xintong Song created FLINK-23262: Summary: FileReadingWatermarkITCase.testWatermarkEmissionWithChaining fails on azure Key: FLINK-23262 URL: https://issues.apache.org/jira/browse/FLINK-23262 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.14.0 Reporter: Xintong Song Fix For: 1.14.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19942&view=logs&j=219e462f-e75e-506c-3671-5017d866ccf6&t=4c5dc768-5c82-5ab0-660d-086cb90b76a0&l=5584
{code}
Jul 05 22:19:00 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.334 s <<< FAILURE! - in org.apache.flink.test.streaming.api.FileReadingWatermarkITCase
Jul 05 22:19:00 [ERROR] testWatermarkEmissionWithChaining(org.apache.flink.test.streaming.api.FileReadingWatermarkITCase) Time elapsed: 4.16 s <<< FAILURE!
Jul 05 22:19:00 java.lang.AssertionError: too few watermarks emitted: 4
Jul 05 22:19:00 	at org.junit.Assert.fail(Assert.java:89)
Jul 05 22:19:00 	at org.junit.Assert.assertTrue(Assert.java:42)
Jul 05 22:19:00 	at org.apache.flink.test.streaming.api.FileReadingWatermarkITCase.testWatermarkEmissionWithChaining(FileReadingWatermarkITCase.java:65)
Jul 05 22:19:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Jul 05 22:19:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 05 22:19:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 05 22:19:00 	at java.lang.reflect.Method.invoke(Method.java:498)
Jul 05 22:19:00 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Jul 05 22:19:00 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Jul 05 22:19:00 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Jul 05 22:19:00 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Jul 05 22:19:00 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 05 22:19:00 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Jul 05 22:19:00 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Jul 05 22:19:00 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release 1.12.5, release candidate #1
+1 (binding)

- verified checksums & signatures
- built from sources
- run example jobs with standalone and native k8s deployments

Thank you~

Xintong Song

On Mon, Jul 5, 2021 at 11:18 AM Jingsong Li wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.12.5,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
>   deployed to dist.apache.org [2], which are signed with the key with
>   fingerprint FBB83C0A4FFB9CA8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.12.5-rc1" [5],
> * website pull request listing the new release and adding announcement blog
>   post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Best,
> Jingsong Lee
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1430
> [5] https://github.com/apache/flink/releases/tag/release-1.12.5-rc1
> [6] https://github.com/apache/flink-web/pull/455
[jira] [Created] (FLINK-23263) LocalBufferPool can not request memory.
Ada Wong created FLINK-23263: Summary: LocalBufferPool can not request memory. Key: FLINK-23263 URL: https://issues.apache.org/jira/browse/FLINK-23263 Project: Flink Issue Type: Improvement Components: Runtime / Network Affects Versions: 1.10.1 Reporter: Ada Wong The Flink job is running, but it cannot consume Kafka data. The following is the exception:

"Map -> to: Tuple2 -> Map -> (from: (id, sid, item, val, unit, dt, after_index, tablename, PROCTIME) -> where: (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'MMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (id AS ID, sid AS STID, item AS ITEM, val AS VAL, unit AS UNIT, dt AS DATATIME), from: (id, sid, item, val, unit, dt, after_index, tablename, PROCTIME) -> where: (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'MMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (sid AS STID, val AS VAL, dt AS DATATIME)) (1/1)" #79 prio=5 os_prio=0 tid=0x7fd4c4a94000 nid=0x21c0 in Object.wait() [0x7fd4d5719000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
	- locked <0x00074e6c8b98> (a java.util.ArrayDeque)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
	at DataStreamCalcRule$4160.processElement(Unknown Source)
	at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
	at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:488)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:465)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:425)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
	at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
	at DataStreamSourceConversion$4104.processElement(Unknown Source)
	at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:488)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:465)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:425)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:686)
	at or
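What the thread dump above shows can be modeled with a small sketch: the record writer blocks inside a buffer request until a downstream consumer recycles a buffer, so if downstream never recycles (classic backpressure, or a stuck consumer), the writer stays in TIMED_WAITING and the job stops consuming input. This is an illustration only, not Flink's actual LocalBufferPool; all names are invented.

```java
import java.util.ArrayDeque;

// Simplified model of a bounded buffer pool with a blocking request,
// mirroring the TIMED_WAITING state seen in the dump. Not Flink code.
public class BlockingBufferPoolSketch {
    private final ArrayDeque<byte[]> available = new ArrayDeque<>();

    public BlockingBufferPoolSketch(int numBuffers, int bufferSize) {
        for (int i = 0; i < numBuffers; i++) {
            available.add(new byte[bufferSize]);
        }
    }

    // Waits while the pool is empty; returns null on timeout.
    public synchronized byte[] requestBuffer(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (available.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // caller can surface "can not request memory"
            }
            wait(remaining);
        }
        return available.poll();
    }

    // Called by the consumer side once a buffer has been processed.
    public synchronized void recycle(byte[] buffer) {
        available.add(buffer);
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingBufferPoolSketch pool = new BlockingBufferPoolSketch(1, 1024);
        byte[] b = pool.requestBuffer(100);
        System.out.println(b != null);                        // true: pool had one buffer
        System.out.println(pool.requestBuffer(100) != null);  // false: pool exhausted, request times out
        pool.recycle(b);
        System.out.println(pool.requestBuffer(100) != null);  // true: recycled buffer available again
    }
}
```

The practical takeaway for a dump like the one above is to look downstream of the blocked writer: some operator or sink is not recycling buffers fast enough.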
Re: Job Recovery Time on TM Lost
I know that there are retry strategies for Akka RPC frameworks. I was just
considering that, since the environment is shared by the JM and TMs, and the
connections among TMs (using netty) are flaky in unstable environments, which
will also cause job failures, is it necessary to build a strongly guaranteed
connection between the JM and TMs, or could it be as flaky as the connections
among TMs? As far as I know, connections among TMs will just fail on their
first connection loss, so behaving like this in the JM just means "as flaky
as connections among TMs". In a stable environment that's good enough, but in
an unstable environment it indeed increases the instability.

IMO, though a single connection loss is not reliable, a double check should
be good enough. But since I'm not experienced with unstable environments, I
can't tell whether that's also enough for them.

On Mon, Jul 5, 2021 at 5:59 PM Till Rohrmann wrote:

> I think for RPC communication there are retry strategies used by the
> underlying Akka ActorSystem. So an RpcEndpoint can reconnect to a remote
> ActorSystem and resume communication. Moreover, there are also
> reconciliation protocols in place which reconcile the states between the
> components because of potentially lost RPC messages. So the main question
> would be whether a single connection loss is good enough for triggering the
> timeout, or whether we want a more elaborate mechanism to reason about the
> availability of the remote system (e.g. a couple of lost heartbeat
> messages).
>
> Cheers,
> Till
>
> On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote:
>
>> As far as I know, a TM will report a connection failure once a TM it is
>> connected to is lost. I suppose the JM can believe the report and fail the
>> tasks in the lost TM if it also encounters a connection failure.
>>
>> Of course, it won't work if the lost TM is standalone. But I suppose we
>> can use the same strategy as in the connected scenario. That is, consider it
>> possibly lost on the first connection loss, and fail it if a double check
>> also fails. The major difference is that the senders of the probes are the
>> same one rather than two different roles, so the results may tend to be the
>> same.
>>
>> On the other hand, this also means that jobs can be fragile in an
>> unstable environment, no matter whether the failover is triggered by the TM
>> or the JM. So maybe it's not worth introducing extra configuration for
>> fault tolerance of heartbeats, unless we also introduce some retry
>> strategies for netty connections.
>>
>> On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:
>>
>>> Could you share the full logs with us for the second experiment, Lu? I
>>> cannot tell from the top of my head why it should take 30s unless you have
>>> configured a restart delay of 30s.
>>>
>>> Let's discuss FLINK-23216 on the JIRA ticket, Gen.
>>>
>>> I've now implemented FLINK-23209 [1], but it somehow has the problem that
>>> in a flaky environment you might not want to mark a TaskExecutor dead on
>>> the first connection loss. Maybe this is something we need to make
>>> configurable (e.g. introducing a threshold, which admittedly is similar to
>>> the heartbeat timeout) so that the user can configure it for her
>>> environment. On the upside, if you mark the TaskExecutor dead on the first
>>> connection loss (assuming you have a stable network environment), then it
>>> can now detect lost TaskExecutors as fast as the heartbeat interval.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-23209
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>>>
>>>> Thanks for sharing, Till and Yang.
>>>>
>>>> @Lu
>>>> Sorry, but I don't know how to explain the new test with the log.
>>>> Let's wait for others' reply.
>>>>
>>>> @Till
>>>> It would be nice if the JIRAs could be fixed. Thanks again for proposing
>>>> them.
>>>>
>>>> In addition, I was tracking an issue where the RM keeps allocating and
>>>> freeing slots after a TM is lost, until its heartbeat times out, when I
>>>> found the recovery taking as long as the heartbeat timeout. That should be
>>>> a minor bug introduced by declarative resource management. I have created
>>>> a JIRA about the problem [1] and we can discuss it there if necessary.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-23216
>>>>
>>>> Lu Niu wrote on Fri, Jul 2, 2021 at 3:13 AM:
>>>>
>>>>> Another side question: shall we add a metric to cover the complete
>>>>> restarting time (phase 1 + phase 2)? The current metric
>>>>> jm.restartingTime only covers phase 1. Thanks!
>>>>>
>>>>> Best,
>>>>> Lu
>>>>>
>>>>> On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
>>>>>
>>>>>> Thanks Till and Yang for the help! Also thanks Till for the quick fix!
>>>>>>
>>>>>> I did another test yesterday. In this test, I intentionally throw an
>>>>>> exception from the source operator:
>>>>>> ```
>>>>>> if (runtimeContext.getIndexOfThisSubtask() == 1
>>>>>>     && errorFrenquecyInMin > 0
>>>>>>     && System.currentTimeMillis() - lastStartTime >=
>>>>>> err
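The "double check" idea discussed in this thread (treat the first connection loss as suspicion only, and declare a TaskExecutor dead only if a second, independent probe also fails) can be sketched as a small state machine. This is an illustration with invented names, not Flink's actual JobMaster or heartbeat code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a two-strike failure detector: first failure -> SUSPECT,
// second consecutive failure -> DEAD; a successful probe in between
// clears the suspicion. Hypothetical, not a Flink API.
public class DoubleCheckDetector {
    enum State { HEALTHY, SUSPECT, DEAD }

    private final Map<String, State> tms = new HashMap<>();

    public void register(String tmId) {
        tms.put(tmId, State.HEALTHY);
    }

    // Report one failed probe/connection loss. Returns the new state.
    public State reportFailure(String tmId) {
        State s = tms.getOrDefault(tmId, State.HEALTHY);
        State next = (s == State.HEALTHY) ? State.SUSPECT : State.DEAD;
        tms.put(tmId, next);
        return next;
    }

    // A successful probe clears the suspicion (unless already declared dead).
    public void reportSuccess(String tmId) {
        if (tms.get(tmId) != State.DEAD) {
            tms.put(tmId, State.HEALTHY);
        }
    }

    public State state(String tmId) {
        return tms.get(tmId);
    }

    public static void main(String[] args) {
        DoubleCheckDetector d = new DoubleCheckDetector();
        d.register("tm-1");
        System.out.println(d.reportFailure("tm-1")); // SUSPECT
        d.reportSuccess("tm-1");                     // flaky blip, recovered
        System.out.println(d.reportFailure("tm-1")); // SUSPECT again
        System.out.println(d.reportFailure("tm-1")); // DEAD
    }
}
```

A transient blip therefore never kills a TM by itself, while a genuine loss is confirmed after roughly one extra probe interval, which is the trade-off between reaction time and stability raised in the thread.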
[jira] [Created] (FLINK-23264) Fine Grained Resource Management Phase 2
Yangze Guo created FLINK-23264: -- Summary: Fine Grained Resource Management Phase 2 Key: FLINK-23264 URL: https://issues.apache.org/jira/browse/FLINK-23264 Project: Flink Issue Type: New Feature Components: Runtime / Coordination Reporter: Yangze Guo -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23265) Support evenly spread out in fine-grained resource management
Yangze Guo created FLINK-23265: -- Summary: Support evenly spread out in fine-grained resource management Key: FLINK-23265 URL: https://issues.apache.org/jira/browse/FLINK-23265 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.14.0 Reporter: Yangze Guo -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23266) HA per-job cluster (rocks, non-incremental) hangs on Azure
Xintong Song created FLINK-23266: Summary: HA per-job cluster (rocks, non-incremental) hangs on Azure Key: FLINK-23266 URL: https://issues.apache.org/jira/browse/FLINK-23266 Project: Flink Issue Type: Bug Affects Versions: 1.12.4 Reporter: Xintong Song Fix For: 1.12.5

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19943&view=logs&j=6caf31d6-847a-526e-9624-468e053467d6&t=0b23652f-b18b-5b6e-6eb6-a11070364610&l=1858

{code}
Jul 05 21:56:00 ==
Jul 05 21:56:00 Running 'Running HA per-job cluster (rocks, non-incremental) end-to-end test'
Jul 05 21:56:00 ==
Jul 05 21:56:00 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-00772599944
Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
Jul 05 21:56:01 Starting zookeeper daemon on host fv-az43-4.
Jul 05 21:56:01 Running on HA mode: parallelism=4, backend=rocks, asyncSnapshots=true, incremSnapshots=false and zk=3.4.
Jul 05 21:56:03 Starting standalonejob daemon on host fv-az43-4.
Jul 05 21:56:03 Start 1 more task managers
Jul 05 21:56:04 Starting taskexecutor daemon on host fv-az43-4.
Jul 05 21:56:10 Job () is not yet running.
Jul 05 21:56:18 Job () is running.
Jul 05 21:56:18 Running JM watchdog @ 266158
Jul 05 21:56:18 Running TM watchdog @ 266159
Jul 05 21:56:18 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
Jul 05 21:56:22 Killed JM @ 264313
Jul 05 21:56:22 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:56:26 Killed TM @ 264571
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:56:26 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
Jul 05 21:57:12 Killed JM @ 267798
Jul 05 21:57:12 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
Jul 05 21:57:15 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_ha.sh: line 151: [: 58)\n\tat org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java: integer expression expected
Jul 05 21:58:07 Killed JM @ 271440
Jul 05 21:58:07 Waiting for text Completed checkpoint [1-9]* for job to appear 2 of times in logs...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
Jul 05 21:58:09 Killed TM @ 267660
Jul 05 21:58:09 Starting standalonejob daemon on host fv-az43-4.
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
grep: /
[jira] [Created] (FLINK-23267) Activate Java code splitter for all generated classes
Caizhi Weng created FLINK-23267: --- Summary: Activate Java code splitter for all generated classes Key: FLINK-23267 URL: https://issues.apache.org/jira/browse/FLINK-23267 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: Caizhi Weng In the second step we activate Java code splitter introduced in the first step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23268) [DOCS]The link on page docs/dev/table/sql/queries/match_recognize/ is failed and 404 is returned
wuguihu created FLINK-23268: --- Summary: [DOCS]The link on page docs/dev/table/sql/queries/match_recognize/ is failed and 404 is returned Key: FLINK-23268 URL: https://issues.apache.org/jira/browse/FLINK-23268 Project: Flink Issue Type: Bug Components: Documentation Reporter: wuguihu Attachments: image-20210706134442433.png

Some of the links on the page are written incorrectly, resulting in a 404 page.

The page url: [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/match_recognize/]
The markdown file: [https://github.com/apache/flink/blob/master/docs/content/docs/dev/table/sql/queries/match_recognize.md]

When I click the link `[append table](dynamic_tables.html#update-and-append-queries)`, I get a 404 page. The corresponding address is: [https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/match_recognize/dynamic_tables.html#update-and-append-queries]

Refer to the document [https://github.com/apache/flink/blob/master/docs/content.zh/docs/dev/table/sql/queries/match_recognize.md] for the correct link information.

The links shown below return 404:

{code:java}
//1 [append table](dynamic_tables.html#update-and-append-queries)
//2 [processing time or event time](time_attributes.html)
//3 [time attributes](time_attributes.html)
//4 rowtime attribute
//5 proctime attribute
//6 [state retention time](query_configuration.html#idle-state-retention-time)
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23269) json format decodes number to varchar incorrectly when a Decimal type is defined in the ddl
silence created FLINK-23269: --- Summary: json format decodes number to varchar incorrectly when a Decimal type is defined in the ddl Key: FLINK-23269 URL: https://issues.apache.org/jira/browse/FLINK-23269 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: silence

When using the json format and a decimal is defined:

json: {"c1":50.0,"c2":50.0}

ddl:
create table(
  c1 varchar,
  c2 decimal
) with (
  'format'='json'
)

output: {"c1":"5E+1","c2":50.0}

The following unit test produces this result:

{"double1":50.0,"double2":50.0,"double3":"50.0","float1":20.0,"float2":20.0,"float3":"20.0"}
java.lang.AssertionError:
Expected :+I[50.0, 50.0, 50.0, 20.0, 20.0, 20.0]
Actual   :+I[5E+1, 50.0, 50.0, 2E+1, 20.0, 20.0]

{code:java}
@Test
public void testDeserialization() throws Exception {
    double doubleValue = 50.0;
    float floatValue = 20.0f;

    ObjectMapper objectMapper = new ObjectMapper();
    ObjectNode root = objectMapper.createObjectNode();
    root.put("double1", doubleValue);
    root.put("double2", doubleValue);
    root.put("double3", String.valueOf(doubleValue));
    root.put("float1", floatValue);
    root.put("float2", floatValue);
    root.put("float3", String.valueOf(floatValue));
    byte[] serializedJson = objectMapper.writeValueAsBytes(root);
    System.out.println(new String(serializedJson));

    DataType dataType = ROW(
        FIELD("double1", STRING()),
        FIELD("double2", DECIMAL(10, 1)),
        FIELD("double3", DOUBLE()),
        FIELD("float1", STRING()),
        FIELD("float2", DECIMAL(10, 1)),
        FIELD("float3", FLOAT()));
    RowType rowType = (RowType) dataType.getLogicalType();
    JsonRowDataDeserializationSchema deserializationSchema =
        new JsonRowDataDeserializationSchema(
            rowType,
            InternalTypeInfo.of(rowType),
            false,
            false,
            TimestampFormat.ISO_8601);

    Row expected = new Row(6);
    expected.setField(0, String.valueOf(doubleValue));
    expected.setField(1, String.valueOf(doubleValue));
    expected.setField(2, doubleValue);
    expected.setField(3, String.valueOf(floatValue));
    expected.setField(4, String.valueOf(floatValue));
    expected.setField(5, floatValue);

    RowData rowData = deserializationSchema.deserialize(serializedJson);
    Row actual = convertToExternal(rowData, dataType);
    assertEquals(expected, actual);
}
{code}

When a DecimalType is defined, the ObjectMapper enables USE_BIG_DECIMAL_FOR_FLOATS, and jsonNode.asText() calls the BigDecimal toString method:

{code:java}
boolean hasDecimalType =
    LogicalTypeChecks.hasNested(rowType, t -> t instanceof DecimalType);
if (hasDecimalType) {
    objectMapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
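As background (my own reading, not stated in the ticket): the "5E+1" output is consistent with plain BigDecimal behavior once trailing zeros are stripped (which Jackson's default node factory does for decimal nodes), because BigDecimal.toString() switches to scientific notation when the scale becomes negative. A minimal sketch using only java.math.BigDecimal:

```java
import java.math.BigDecimal;

public class DecimalToStringDemo {
    public static void main(String[] args) {
        BigDecimal parsed = new BigDecimal("50.0");

        // With the original scale of 1, toString() prints the plain form.
        System.out.println(parsed); // 50.0

        // Stripping trailing zeros leaves unscaled value 5 with scale -1,
        // and toString() then uses scientific notation.
        BigDecimal stripped = parsed.stripTrailingZeros();
        System.out.println(stripped); // 5E+1

        // toPlainString() never uses scientific notation.
        System.out.println(stripped.toPlainString()); // 50
    }
}
```

This suggests why only the varchar columns are affected: converting the parsed BigDecimal to text goes through toString() on the normalized value, whereas the decimal columns keep the numeric representation.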