GitHub user ggevay opened a pull request:

    https://github.com/apache/flink/pull/581

    [FLINK-1670] Made DataStream iterable

    I created a DataStreamIterator class and added an iterator() method to 
DataStream that returns an instance of it. The iterator opens a TCP server on 
which it receives the data. The other end of the TCP connection is a 
SocketClientSink, which the iterator() method attaches to the DataStream via 
writeToSocket.
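
    A minimal sketch of the mechanism described above, not the code in this 
pull request: an iterator that owns a TCP server socket and yields whatever the 
other end of the connection writes. It does not use Flink at all. The names 
SocketIteratorSketch, SocketStringIterator and collect are hypothetical, and a 
plain thread writing UTF strings stands in for the job and its SocketClientSink.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SocketIteratorSketch {

    /** Hypothetical iterator: accepts one TCP connection and yields the strings sent over it. */
    static final class SocketStringIterator implements Iterator<String>, AutoCloseable {
        private final ServerSocket server;
        private DataInputStream in;
        private String next;
        private boolean done;

        SocketStringIterator() throws IOException {
            // Port 0 lets the OS pick a free port; bind to loopback only.
            this.server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        }

        int getPort() {
            return server.getLocalPort();
        }

        @Override
        public boolean hasNext() {
            if (next != null) return true;
            if (done) return false;
            try {
                if (in == null) {
                    // Block until the single writer (parallelism 1) connects.
                    in = new DataInputStream(server.accept().getInputStream());
                }
                next = in.readUTF();
                return true;
            } catch (EOFException e) {
                // The writer closed the connection: the stream has ended.
                done = true;
                return false;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            String result = next;
            next = null;
            return result;
        }

        @Override
        public void close() throws IOException {
            if (in != null) in.close();
            server.close();
        }
    }

    /** Stand-in for the job: a separate thread connects to the port and writes the records. */
    public static List<String> collect(List<String> records) throws Exception {
        try (SocketStringIterator it = new SocketStringIterator()) {
            int port = it.getPort();
            Thread job = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
                    for (String record : records) out.writeUTF(record);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            job.start();
            List<String> result = new ArrayList<>();
            while (it.hasNext()) result.add(it.next());
            job.join();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(Arrays.asList("to", "be", "or", "not")));
    }
}
```

    The writer runs on its own thread because the reading side blocks in 
accept() and readUTF(); with a single writer, the end of the stream is simply 
the end of the connection.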
    
    The iterator() method also calls execute() itself. execute() has to run on 
a separate thread, and requiring the user to set that up would be awkward.
    
    I set the parallelism of the DataStreamSink created by writeToSocket() to 
1, because multiple sink instances connecting to the same port cannot be 
handled conveniently.
    
    For testing, I modified the WordCount example to use the iterator instead 
of the print method.
    
    Serialization and deserialization could be made faster if the 
SerializationSchema in SocketClientSink did not return a byte[] but instead 
wrote directly to a stream. That change would also require modifying the 
schemas in KafkaSink, FlumeSink, and RMQSink, so that all the serialization 
schemas expected from the user keep the same interface.
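
    To illustrate the trade-off, here is a sketch that contrasts the two schema 
shapes. Both interfaces are hypothetical and are not Flink's SerializationSchema: 
ByteArraySchema allocates a byte[] per record that the sink then copies, while 
StreamSchema writes into the sink's stream directly. Integers are used as 
records only to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class StreamSchemaSketch {

    /** Hypothetical schema shape that returns a fresh byte[] for every record. */
    interface ByteArraySchema<T> {
        byte[] serialize(T element);
    }

    /** Hypothetical alternative: the schema writes straight into the sink's output. */
    interface StreamSchema<T> {
        void serialize(T element, DataOutput out) throws IOException;
    }

    /** Sink loop for the byte[] shape: one allocation and one copy per record. */
    public static byte[] viaByteArray(int[] records) throws IOException {
        ByteArraySchema<Integer> schema = v -> ByteBuffer.allocate(4).putInt(v).array();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int record : records) {
            sink.write(schema.serialize(record));
        }
        return sink.toByteArray();
    }

    /** Sink loop for the stream shape: no intermediate array per record. */
    public static byte[] viaStream(int[] records) throws IOException {
        StreamSchema<Integer> schema = (v, out) -> out.writeInt(v);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        for (int record : records) {
            schema.serialize(record, out);
        }
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int[] records = {1, 2, 3};
        // Both shapes produce the same big-endian bytes; only the allocation pattern differs.
        System.out.println(java.util.Arrays.equals(viaByteArray(records), viaStream(records)));
    }
}
```

    In a real sink the ByteArrayOutputStream would be the socket's output 
stream; it is used here only so that the two results can be compared.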

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ggevay/flink collect

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #581
    
----
commit d5900364537b678cf48b68aa8b0124f54aa7ca10
Author: Gabor Gevay <gga...@gmail.com>
Date:   2015-04-08T14:51:32Z

    [FLINK-1670] Made DataStream iterable: the results are streamed back from 
the job to the client by TCP.

----


