[ https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056695#comment-17056695 ]
Caizhi Weng commented on FLINK-14807:
-------------------------------------

Hi dear Flink community.

After a long offline discussion with [~godfreyhe], we agree that although {{GlobalAggregateManager}} is currently the easiest (and might be the only) way to implement {{Table#collect}}, it has a bad impact on the user interface. As the client can only communicate with the cluster via {{JobClient}}, we would have to add an {{updateGlobalAggregate}} method to that interface, and it is hard to explain to users what this method is for.

To avoid the impact on the user interface, we have come up with two options.

# *Hide {{updateGlobalAggregate}} in an internal interface*. We would like to add a new internal interface like {{InternalJobClient}} which extends {{JobClient}} and contains the method {{updateGlobalAggregate}}. All current implementations of {{JobClient}} will implement {{InternalJobClient}} instead. This is the quickest way to support {{Table#collect}} without impacting the user interface, but it seems to be something of a hack.
# *Extend the ability of {{OperatorCoordinators}}*. From our understanding this is the best way in the long run. However, the original design of {{OperatorCoordinators}} only supports communication and coordination between the sub-partitions of the same operator, not between the operator and the client. Also, the original design communicates through events and listeners, and it is impossible for clients to register a listener on JMs because JMs cannot initiate communication with clients. So for this option to work, we still need at least two extensions to {{OperatorCoordinators}}:
## *A way for the client to talk to the coordinator*. Currently operator coordinators are identified by {{OperationID}}, which is unknown to the client. We would like a method to register the coordinators with a name and use this name to identify them, so that clients can talk to a specific coordinator via the REST API using its name.
## *Requests instead of events*. We would like the client to post a request to the coordinator instead of sending an event. The difference is that for a request we expect a response from the coordinator in the same REST API call, while for an event we expect nothing to be returned.

What do you think? Is either of these options feasible? Or do you have any other suggestions?

> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
>                 Key: FLINK-14807
>                 URL: https://issues.apache.org/jira/browse/FLINK-14807
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.9.1
>            Reporter: Jeff Zhang
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>         Attachments: table-collect-draft.patch, table-collect.png
>
>
> Currently, it is very inconvenient for users to fetch the data of a Flink job unless
> they specify a sink explicitly and then fetch the data from that sink via its API (e.g.
> write to an HDFS sink, then read the data from HDFS). However, most of the time users
> just want to get the data and do whatever processing they want. So it is
> necessary for Flink to provide a Table#collect API for this purpose.
>
> Other APIs such as Table#head and Table#print would also be helpful.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
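
As a rough illustration of the two options in the comment above, the following self-contained Java sketch may help. It does not use Flink's real classes: {{JobClient}} here is a simplified stand-in, and {{NamedCoordinators}}, {{InMemoryJobClient}}, the method signatures, and the coordinator name {{collect-sink}} are all hypothetical names invented for this example. It only shows the shape of the idea: an internal interface that hides {{updateGlobalAggregate}} from users (option 1), and coordinators looked up by name that answer a request with a response rather than receiving a fire-and-forget event (option 2).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CollectSketch {

    /** Hypothetical stand-in for the public, user-facing client interface (not Flink's real one). */
    interface JobClient {
        CompletableFuture<String> getJobStatus();
    }

    /** Option 1: an internal extension that keeps updateGlobalAggregate out of the public API. */
    interface InternalJobClient extends JobClient {
        CompletableFuture<Long> updateGlobalAggregate(String aggregateName, long delta);
    }

    /** Option 2: coordinators are registered under a name and answer requests with a response. */
    static final class NamedCoordinators {
        private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();

        void register(String name, Function<String, String> handler) {
            handlers.put(name, handler);
        }

        /** Unlike a fire-and-forget event, a request completes with the coordinator's reply. */
        CompletableFuture<String> request(String name, String payload) {
            Function<String, String> handler = handlers.get(name);
            if (handler == null) {
                return CompletableFuture.failedFuture(
                        new IllegalArgumentException("No coordinator named " + name));
            }
            return CompletableFuture.completedFuture(handler.apply(payload));
        }
    }

    /** In-memory implementation, used only to exercise the interfaces above. */
    static final class InMemoryJobClient implements InternalJobClient {
        private final Map<String, Long> aggregates = new ConcurrentHashMap<>();

        @Override
        public CompletableFuture<String> getJobStatus() {
            return CompletableFuture.completedFuture("RUNNING");
        }

        @Override
        public CompletableFuture<Long> updateGlobalAggregate(String aggregateName, long delta) {
            // Adds delta to the named aggregate and returns the new total.
            return CompletableFuture.completedFuture(
                    aggregates.merge(aggregateName, delta, Long::sum));
        }
    }

    public static void main(String[] args) {
        // Users only ever see the JobClient type; internal code narrows it when needed.
        JobClient userView = new InMemoryJobClient();
        System.out.println(userView.getJobStatus().join());

        InternalJobClient internal = (InternalJobClient) userView;
        internal.updateGlobalAggregate("rows", 2).join();
        System.out.println(internal.updateGlobalAggregate("rows", 3).join());

        // The client addresses a coordinator by name and gets a reply in the same call.
        NamedCoordinators coordinators = new NamedCoordinators();
        coordinators.register("collect-sink", offset -> "rows from offset " + offset);
        System.out.println(coordinators.request("collect-sink", "0").join());
    }
}
```

In this sketch the cast from {{JobClient}} to {{InternalJobClient}} is what makes option 1 feel like a hack: it works only because every implementation is assumed to implement the internal interface. The name-based lookup in option 2 avoids that assumption, at the cost of a new registration step.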