[jira] [Commented] (FLINK-14807) Add Table#collect api for fetching data to client

Caizhi Weng (Jira) Tue, 10 Mar 2020 23:37:23 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056695#comment-17056695
 ]


Caizhi Weng commented on FLINK-14807:
-------------------------------------

Hi dear Flink community.

After a long offline discussion with [~godfreyhe], we agree that although 
{{GlobalAggregateManager}} is currently the easiest (and might be the only) way 
to implement {{Table#collect}}, it has a bad impact on the user interface. As the 
client can only communicate with the cluster via {{JobClient}}, we would have to 
add an {{updateGlobalAggregate}} method to that interface, and it is hard to 
explain to users what this method is for. To avoid this impact on the user 
interface, we came up with two options.
 # *Hide {{updateGlobalAggregate}} in an internal interface*. We would like to 
add a new internal interface, such as {{InternalJobClient}}, which extends 
{{JobClient}} and contains the method {{updateGlobalAggregate}}. All current 
implementations of {{JobClient}} will implement {{InternalJobClient}} instead. 
This is the quickest way to support {{Table#collect}} without impacting the 
user interface, but it is something of a hack.
 # *Extend the ability of {{OperatorCoordinators}}*. From our understanding 
this is the best way in the long run. However, the original design of 
{{OperatorCoordinators}} only supports communication and coordination 
between the sub-partitions of the same operator, not between the operator and 
the client. Also, the original design communicates through events and 
listeners, and clients cannot register a listener on JMs because JMs cannot 
initiate communication with clients. So for this option to work, we 
still need at least two extensions to {{OperatorCoordinators}}:
 ## *A way for the client to talk to the coordinator*. Currently operator 
coordinators are identified by {{OperationID}}, which is unknown to the client. 
We would like a method to register coordinators under a name and use this 
name to identify them, so that clients can talk to a specific 
coordinator by name via the REST API.
 ## *Requests instead of Events*. We would like the client to post a request to 
the coordinator instead of sending an event. The difference is that for a 
request we expect a response from the coordinator in the same REST API 
call, while for an event we expect nothing to be returned.
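
To make the two options easier to compare, here is a minimal, self-contained Java sketch. It is not Flink code: the {{JobClient}} stand-in, the simplified {{updateGlobalAggregate}} signature, {{CoordinatorRegistry}}, {{RequestHandlingCoordinator}}, {{CollectCoordinator}} and the name "collect-sink" are all hypothetical and only illustrate the shape of each proposal. The REST layer is replaced by a plain method call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CollectSketch {

    // Stand-in for the public JobClient interface (assumption: heavily simplified).
    public interface JobClient {
        String getJobId();
    }

    // Option 1: an internal sub-interface keeps updateGlobalAggregate
    // out of the user-facing JobClient. The signature is an assumption.
    public interface InternalJobClient extends JobClient {
        Object updateGlobalAggregate(String aggregateName, Object value);
    }

    // Option 2, second extension: a request yields a response in the same
    // call, unlike an event, for which nothing is returned.
    public interface RequestHandlingCoordinator {
        String handleRequest(String request);
    }

    // Option 2, first extension: coordinators are registered under a name
    // that the client knows, instead of an ID the client cannot see.
    public static final class CoordinatorRegistry {
        private final Map<String, RequestHandlingCoordinator> byName = new HashMap<>();

        public void register(String name, RequestHandlingCoordinator coordinator) {
            if (byName.putIfAbsent(name, coordinator) != null) {
                throw new IllegalStateException("Name already registered: " + name);
            }
        }

        // Stands in for the REST endpoint: look up by name, return the response.
        public Optional<String> sendRequest(String name, String request) {
            RequestHandlingCoordinator coordinator = byName.get(name);
            return coordinator == null
                    ? Optional.empty()
                    : Optional.of(coordinator.handleRequest(request));
        }
    }

    // Toy coordinator: buffers rows and hands them over on a "fetch" request.
    public static final class CollectCoordinator implements RequestHandlingCoordinator {
        private final List<String> buffered = new ArrayList<>();

        public void addRow(String row) {
            buffered.add(row);
        }

        @Override
        public String handleRequest(String request) {
            if (!"fetch".equals(request)) {
                return "unknown request: " + request;
            }
            String rows = String.join(",", buffered);
            buffered.clear(); // rows are delivered once, then dropped
            return rows;
        }
    }

    public static void main(String[] args) {
        CoordinatorRegistry registry = new CoordinatorRegistry();
        CollectCoordinator coordinator = new CollectCoordinator();
        registry.register("collect-sink", coordinator);
        coordinator.addRow("row-1");
        coordinator.addRow("row-2");
        // prints "row-1,row-2"
        System.out.println(
                registry.sendRequest("collect-sink", "fetch").orElse("<no such coordinator>"));
    }
}
```

In this sketch the client always initiates the call and gets its answer in the return value, which matches the constraint that JMs cannot initiate communication with clients.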

What do you think? Is either of these options feasible? Or do you have any 
other suggestions?

> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
>                 Key: FLINK-14807
>                 URL: https://issues.apache.org/jira/browse/FLINK-14807
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.9.1
>            Reporter: Jeff Zhang
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>         Attachments: table-collect-draft.patch, table-collect.png
>
>
> Currently, it is very inconvenient for users to fetch the data of a Flink job 
> unless they specify a sink explicitly and then fetch the data from this sink 
> via its API (e.g. write to an HDFS sink, then read the data from HDFS). 
> However, most of the time users just want to get the data and do whatever 
> processing they want. So it is very necessary for Flink to provide the API 
> Table#collect for this purpose. 
>  
> Other APIs such as Table#head and Table#print would also be helpful.  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
