Big +1 for this feature. Some use cases:
1. Reset Kafka offsets in certain cases.
2. Stop checkpointing in certain cases.
3. Change the log level for debugging.
On Fri, Jun 11, 2021 at 12:17 PM 刘建刚 wrote:
> Thanks for all the discussions and suggestions. Since the topic has
> been discussed for about a week, it is time to draw a conclusion. New
> ideas are still welcome at the same time.
> First, the topic started with use cases for a RESTful interface. A
> RESTful interface supports many useful interactions with users, for
> example the ones below. It is an easier way to control a job than the
> broadcast API.
>
>1. Change the data processing logic through dynamic configs, such as a
>filter condition.
>2. Define tools to control the job, such as QPS limiting, sampling,
>changing the log level and so on.
>
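
The dynamic filter case above can be sketched roughly as follows. This is only a minimal illustration in plain Java, not a Flink API: `DynamicFilter`, `onControlMessage` and the prefix-based condition are all hypothetical names and assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical operator-side holder for a filter condition that a
// control message can replace while the job keeps running.
final class DynamicFilter {
    // Default condition: let every record through.
    private final AtomicReference<Predicate<String>> condition =
            new AtomicReference<>(record -> true);

    // Assumed entry point: invoked when a control message (for example one
    // triggered by a REST request) carries a new filter config.
    void onControlMessage(String requiredPrefix) {
        condition.set(record -> record.startsWith(requiredPrefix));
    }

    // Invoked for every record on the data path.
    boolean accept(String record) {
        return condition.get().test(record);
    }
}
```

The condition is swapped atomically, so the data path never sees a half-updated config. How the change is aligned with checkpoints is a separate question that this sketch does not address.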
> Second, we broadened the topic to control flow in order to support all
> kinds of control events beyond the above use cases. There is a strong
> demand to support custom (broadcast) events for iteration, SQL control
> events and so on. As Xintong Song said, the key questions for the control
> flow are as follows:
>
>1. Who (which component) is responsible for generating the control
>messages? It may be the JobMaster in some way, an internal operator and so
>on.
>2. Who (which component) is responsible for reacting to the messages?
>3. How do the messages propagate? Flink should support sending control
>messages through channels.
>4. When it comes to affecting the computation logic, how should the
>control flow work together with exactly-once consistency? To use the
>checkpoint mechanism, having control messages flow from the sources to
>downstream tasks may be a good idea.
>
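
For point 4, one way to picture control messages flowing from the sources through the channels is to align them per operator, loosely in the spirit of checkpoint barrier alignment. The sketch below is an assumption for illustration only, not Flink's actual implementation; `ControlMessageAligner` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: an operator applies a control message only after
// the same message id has arrived on every one of its input channels.
final class ControlMessageAligner {
    private final int numInputChannels;
    // message id -> input channels that have already delivered it
    private final Map<Long, Set<Integer>> channelsSeen = new HashMap<>();

    ControlMessageAligner(int numInputChannels) {
        this.numInputChannels = numInputChannels;
    }

    // Returns true when the last missing input channel delivers the message,
    // i.e. when the operator may apply it and forward it downstream.
    // Repeated deliveries on the same channel are counted only once.
    boolean onControlMessage(long messageId, int channel) {
        Set<Integer> seen =
                channelsSeen.computeIfAbsent(messageId, id -> new HashSet<>());
        seen.add(channel);
        if (seen.size() == numInputChannels) {
            channelsSeen.remove(messageId); // alignment for this id is done
            return true;
        }
        return false;
    }
}
```

For example, with two input channels, a message seen on channel 0 is held back until channel 1 delivers the same id. Whether records arriving in between are buffered or processed is left open here.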
> Third, a common and flexible control flow requires a good design
> and implementation as its base. Both future and existing features should
> be considered. For future features, a common RESTful interface is
> needed first to support dynamic configs. As for existing features, there
> are checkpoint barriers, watermarks and latency markers. They have some
> special behaviors but also share a lot in common. The common logic should
> be considered, but maybe they should remain unchanged until the control
> flow is stable.
> Some other open problems are as follows:
>
>1. How to persist the control signals when the JobMaster fails? One
>idea is to persist control signals in HighAvailabilityServices and replay
>them after failover. The REST request should be non-blocking.
>2. Should all the operators receive the control messages? All
>operators should be able to receive control messages from upstream
>operators, but need not process them. If we want to persist the control
>message state, all the subtasks belonging to one operator should have the
>same control events so that rescaling is easy.
>
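
The persist-and-replay idea in point 1 can be sketched as below. This is a simplified illustration under stated assumptions: the in-memory list merely stands in for an HA-backed store, the real `HighAvailabilityServices` interface is not used, and `ControlSignalLog`, `submit` and `replay` are hypothetical names. The non-blocking REST side is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical log of control signals kept by the JobMaster side.
final class ControlSignalLog {
    // Stand-in for durable storage; a real design would write to an
    // HA-backed store so that the signals survive a JobMaster failure.
    private final List<String> persisted = new ArrayList<>();

    // Persist the signal first, then hand it to the dispatcher, so that a
    // failure after persisting can be repaired by replaying the log.
    synchronized void submit(String signal, Consumer<String> dispatcher) {
        persisted.add(signal);
        dispatcher.accept(signal);
    }

    // After failover: re-dispatch all persisted signals in original order.
    synchronized void replay(Consumer<String> dispatcher) {
        for (String signal : persisted) {
            dispatcher.accept(signal);
        }
    }
}
```

Replaying in the original order matters when later signals override earlier ones, for example two successive log level changes.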
> As the next step, I will draft a FLIP scoped to a common
> control flow framework. More discussions, ideas and problems are still
> welcome.
>
> Thank you~
>
> Jiangang Liu
>
> On Wed, Jun 9, 2021 at 12:01 PM Xintong Song wrote:
>
>> >
>> > 2. There are two kinds of existing special elements: special stream
>> > records (e.g. watermarks) and events (e.g. checkpoint barriers). They
>> > all flow through the whole DAG, but events need to be acknowledged by
>> > downstream and can overtake records, while stream records need no
>> > acknowledgement and cannot overtake. So I'm wondering if we plan to
>> > unify the two approaches in the new control flow (as Xintong mentioned
>> > both in the previous mails)?
>> >
>>
>> TBH, I don't really know yet. We feel that the control flow is a
>> non-trivial topic and it would be better to bring it up publicly as early
>> as possible, while the concrete plan is still on the way.
>>
>> Personally, I'm leaning towards not touching the existing watermarks and
>> checkpoint barriers in the first step.
>> - I'd expect the control flow to be introduced as an experimental feature
>> that takes time to stabilize. It would be better if existing important
>> features like checkpointing and watermarks stay unaffected.
>> - Checkpoint barriers are a little different, as other control messages
>> somehow rely on them to achieve exactly-once consistency. Without a
>> concrete design, I'm not entirely sure whether they can be properly
>> modeled as a special case of general control messages.
>> - Watermarks are probably similar to the other control messages. However,
>> they are already exposed to users as public APIs. If we want to migrate
>> them to the new control flow, we'd need to be very careful not to break
>> compatibility.
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jun 9, 2021 at 11:30 AM Steven Wu wrote:
>>
>> > > producing control events from JobMaster is similar to triggering a
>> > > savepoint.
>> >
>> > Paul, here is the difference as I see it. Upon job or JobManager
>> > recovery, we don't need to recover and replay the savepoint trigger
>> > signal.
>> >
>> > On Tue, Jun 8, 2021 at 8:20 PM Pa