[ 
https://issues.apache.org/jira/browse/FLINK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485648#comment-14485648
 ] 

ASF GitHub Bot commented on FLINK-1670:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/581#issuecomment-90987769
  
    It is an interesting idea to collect back a data stream. The solution here, 
however, has quite a few limitations and implications (I assume it was only 
tested locally?):
    
      - It supports only `java.io.Serializable` types. This is a bit 
inconsistent with the current type handling and serialization in Flink. Some 
types that work in all other parts do not work here.
    
      - It does not work in a cluster. It sends "localhost" as the hostname to 
the worker that is supposed to send the data back. In any non-local setup, this 
cannot work.
    
      - It requires the worker to be able to connect to the client. This may be 
tricky when the client and the workers do not both run in the cluster.
    
      - Selecting the proper interface that opens the port for data 
communication is actually quite tricky. The TaskManagers put quite a bit of 
work into selecting that interface. Otherwise many installations do not work, 
since in most cases certain interfaces or hostnames are only accessible from 
certain networks (for example, cloud-internal versus external network 
interfaces).
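    
    To make the first two points concrete, here is a minimal sketch in plain 
Java. It is not Flink code and not the code from the pull request: 
`CollectLimitations`, `Event`, `javaSerializable` and `nonLoopbackAddresses` 
are hypothetical names used only for illustration. It shows that a plain data 
class is rejected by `java.io` serialization unless it implements 
`java.io.Serializable`, and that an address a remote worker could use has to be 
picked from the non-loopback interfaces rather than hard-coded as "localhost". 
Which of those addresses is actually reachable from the workers is exactly the 
tricky part and is not solved here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper class for illustration; not part of Flink.
class CollectLimitations {

    // A plain data class that does NOT implement java.io.Serializable.
    public static class Event {
        public String name;
        public int count;

        public Event() {}
    }

    // Returns true if plain Java serialization accepts the value,
    // false if it is rejected with NotSerializableException.
    static boolean javaSerializable(Object value) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists the addresses of all interfaces that are up and are not loopback.
    // These are only candidates: this method does not check whether a remote
    // worker can actually reach any of them.
    static List<InetAddress> nonLoopbackAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces =
                    NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) {
                return result;
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue;
                }
                for (InetAddress address : Collections.list(nif.getInetAddresses())) {
                    if (!address.isLoopbackAddress()) {
                        result.add(address);
                    }
                }
            }
        } catch (IOException e) {
            // SocketException is an IOException.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("String serializable: " + javaSerializable("text"));
        System.out.println("Event serializable:  " + javaSerializable(new Event()));
        System.out.println("Candidate addresses: " + nonLoopbackAddresses());
    }
}
```

    On a machine with several interfaces the candidate list will usually 
contain more than one address, which is why simply taking the first one is not 
a safe choice either.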
    
    I think this is a very tricky thing to realize. It has implications for the 
distributed process and communication model. It starts to extend streaming to 
mixed local/remote runtimes. It affects all assumptions we make for fault 
tolerance. What happens to the stream in case of a failure? There is no notion 
of restarting the driver.
    
    That is something that needs a bit more consideration and design, for the 
sake of building something consistent where the concepts and implications play 
together well. I hope you do not take it the wrong way, but without clarifying 
these points, this addition is a bit premature. 
    



> Collect method for streaming
> ----------------------------
>
>                 Key: FLINK-1670
>                 URL: https://issues.apache.org/jira/browse/FLINK-1670
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Márton Balassi
>            Assignee: Gabor Gevay
>            Priority: Minor
>
> A convenience method for streaming back the results of a job to the client.
> As the client itself is a bottleneck anyway, an easy solution would be to 
> provide a socket sink with degree of parallelism 1, from which a client 
> utility can read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
