[ https://issues.apache.org/jira/browse/FLINK-27559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Tang closed FLINK-27559. ---------------------------- Resolution: Information Provided > Some question about flink operator state > ---------------------------------------- > > Key: FLINK-27559 > URL: https://issues.apache.org/jira/browse/FLINK-27559 > Project: Flink > Issue Type: New Feature > Environment: Flink 1.14.4 > Reporter: Underwood > Priority: Major > > I hope to get two answers to Flink's maintenance status: > > 1. Does custompartition support saving status? In my usage scenario, the > partition strategy is dynamically adjusted, which depends on the data in > datastream. I hope to make different partition strategies according to > different data conditions. > > For a simple example, I want the first 100 pieces of data in datastream to be > range partitioned and the rest of the data to be hash partitioned. At this > time, I may need a count to identify the number of pieces of data that have > been processed. However, in custompartition, this is only a local variable, > so there seem to be two problems: declaring variables in this way can only be > used in single concurrency, and it seems that they cannot be counted across > slots; In this way, the count data will be lost during fault recovery. > > Although Flink already has operator state and key value state, > custompartition is not an operator, so I don't think it can solve this > problem through state. I've considered introducing a zookeeper to save the > state, but the introduction of new components will make the system bloated. I > don't know whether there is a better way to solve this problem. > > 2. How to make multiple operators share the same state, and even all parallel > subtasks of different operators share the same state? > > For a simple example, my stream processing is divided into four stages: > source - > mapa - > mapb - > sink. I hope to have a status count to count the > total amount of data processed by all operators. For example, if the source > receives one piece of data, then count + 1 when mapa is processed and count + > 1 when mapb is processed. Finally, after this piece of data is processed, the > value of count is 2. > > I don't know if there is such a state saving mechanism in Flink, which can > meet my scenario and recover from failure at the same time. At present, we > can still think of using zookeeper. I don't know if there is a better way. -- This message was sent by Atlassian Jira (v8.20.7#820007)