[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600948#comment-14600948
 ] 

Sebastian Kruse commented on FLINK-2239:
----------------------------------------

I have used the {{RemoteCollectorOutputFormat}} for this purpose, and it has 
always worked well. Its downsides are that it relies on Java RMI, which does 
not use Flink's serialization stack, and that it sometimes requires setting 
the client address via {{-Djava.rmi.server.hostname}}.

Additionally, I would like to point out that there is a more general issue 
behind this: if one wants to ship larger job results to the driver (e.g., in 
order to write them to a local DB), {{collect()}} also falls flat. In such 
cases, something like an {{iterate()}} method that streams the result to the 
client without materializing it would help. The proposed change to 
{{print()}} would then be just a special case of such an {{iterate()}} method.
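
To make the idea more concrete, here is a minimal sketch in plain Java of what such an {{iterate()}} could look like from the client's point of view. It is not Flink API: the class {{StreamingResultSketch}}, the signature of {{iterate()}}, and the callback-style producer are all assumptions made up for illustration, and the producer thread merely stands in for whatever would actually fetch the records from the cluster. The point is only that the client holds at most a bounded number of records at a time, instead of the full result as with {{collect()}}.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch, not Flink API: hands records to the client through a bounded buffer. */
public class StreamingResultSketch {

    /** Marker object signalling that the producer has finished. */
    private static final Object END = new Object();

    /**
     * Returns an iterator over the records emitted by {@code producer}.
     * At most {@code bufferSize} records are buffered at any time, so the
     * full result is never materialized. Records must be non-null.
     * Propagating producer failures to the consumer is omitted for brevity.
     */
    public static <T> Iterator<T> iterate(Consumer<Consumer<T>> producer, int bufferSize) {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(bufferSize);

        // Stand-in for the component that would fetch results from the cluster.
        Thread worker = new Thread(() -> {
            try {
                producer.accept(record -> put(buffer, record)); // blocks while the buffer is full
            } finally {
                put(buffer, END);
            }
        });
        worker.setDaemon(true);
        worker.start();

        return new Iterator<T>() {
            private Object pending; // next record, or END, or null if nothing has been fetched yet

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    pending = take(buffer); // blocks until a record or END arrives
                }
                return pending != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T result = (T) pending;
                pending = null;
                return result;
            }
        };
    }

    private static void put(BlockingQueue<Object> buffer, Object item) {
        try {
            buffer.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while buffering a record", e);
        }
    }

    private static Object take(BlockingQueue<Object> buffer) {
        try {
            return buffer.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a record", e);
        }
    }

    public static void main(String[] args) {
        // The "job" emits 1000 records, but only 16 are ever buffered on the client.
        Iterator<Integer> results = StreamingResultSketch.<Integer>iterate(out -> {
            for (int i = 0; i < 1000; i++) {
                out.accept(i);
            }
        }, 16);

        long sum = 0;
        while (results.hasNext()) {
            sum += results.next(); // e.g., write each record to a local DB instead
        }
        System.out.println(sum);
    }
}
```

With something along these lines, an incremental {{print()}} would simply be a loop over the iterator that prints each record as it arrives.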

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
