[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117788#comment-17117788 ]

Robert Metzger commented on FLINK-16517:

I agree with Till that {{WindowJoin}} (and also {{TopSpeedWindowing.jar}}) are two examples of self-contained streaming jobs that are part of the examples collection. The review and maintenance overhead of making this example long-running as well is just not justified.

I propose to close this ticket as "Won't fix" and close the accompanying pull request.

> Add a long running WordCount example
>
>                 Key: FLINK-16517
>                 URL: https://issues.apache.org/jira/browse/FLINK-16517
>             Project: Flink
>          Issue Type: Improvement
>          Components: Examples
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I know, Flink doesn't have a long-running WordCount example for
> users to start with or to run simple tests.
> The closest one is SocketWindowWordCount, but it requires setting up a server
> (nc -l ), which is not hard but still tedious for simple use cases, and it
> requires human input for the job to actually run.
> I propose to add a new WordCount example, or modify the current one, to have
> a SourceFunction that randomly generates input data based on a set of
> sentences, so the WordCount job can run forever. The generation rate will be
> configurable.
> This would be the easiest way to start a long-running Flink job and can be
> useful for new users to start using Flink quickly, or for developers to test
> Flink easily.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
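The proposal in the issue description (a source that emits randomly chosen sentences at a configurable rate until it is stopped) can be sketched roughly as below. This is a plain-Java illustration only, not the code from the pull request and not Flink API: the class name {{RandomSentenceSource}}, the sentence list, and the {{sentencesPerSecond}} parameter are all hypothetical. It only mirrors the {{run()}}/{{cancel()}} shape of Flink's {{SourceFunction}}, with a {{Consumer<String>}} standing in for the source context.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of an unbounded sentence generator. It has no Flink
 * dependency; a Consumer<String> stands in for the source context.
 */
class RandomSentenceSource {
    // Assumed sample data; the real example could use any set of sentences.
    static final String[] SENTENCES = {
        "to be or not to be that is the question",
        "the quick brown fox jumps over the lazy dog",
        "all that glitters is not gold"
    };

    private final Random random;
    private final long delayMillis;          // pause between two emitted sentences
    private volatile boolean running = true; // set to false by cancel()

    RandomSentenceSource(long seed, int sentencesPerSecond) {
        if (sentencesPerSecond <= 0) {
            throw new IllegalArgumentException("sentencesPerSecond must be positive");
        }
        this.random = new Random(seed);
        // Integer division: rates above 1000 per second give a delay of 0 ms.
        this.delayMillis = 1000L / sentencesPerSecond;
    }

    /** Picks one of the predefined sentences at random. */
    String nextSentence() {
        return SENTENCES[random.nextInt(SENTENCES.length)];
    }

    /** Emits sentences until cancel() is called or the thread is interrupted. */
    void run(Consumer<String> out) {
        while (running) {
            out.accept(nextSentence());
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return;
            }
        }
    }

    /** Asks run() to stop after the sentence currently being emitted. */
    void cancel() {
        running = false;
    }

    public static void main(String[] args) {
        RandomSentenceSource source = new RandomSentenceSource(42L, 50);
        Map<String, Integer> counts = new TreeMap<>();
        int[] emitted = {0};
        // Count words per sentence; stop after 20 sentences so the demo ends.
        source.run(sentence -> {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
            if (++emitted[0] == 20) {
                source.cancel();
            }
        });
        System.out.println(counts);
    }
}
```

In a real job the loop would simply never be cancelled, which is what would make the example long-running; the demo stops after 20 sentences only so that it terminates.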
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062489#comment-17062489 ]

Till Rohrmann commented on FLINK-16517:

Isn't {{WindowJoin}} already simple enough? It also comes with an infinite source.
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061327#comment-17061327 ]

Ethan Li commented on FLINK-16517:

Thanks, [~aljoscha]. I put up a pull request to add the new unbounded source.

I did not deal with {{FileProcessingMode}} in this PR because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085], and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so.
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056734#comment-17056734 ]

Aljoscha Krettek commented on FLINK-16517:

I think you might be right. We can change the streaming WordCount example to read files with {{FileProcessingMode.PROCESS_CONTINUOUSLY}}, which would make it more stream-y, and also to use an unbounded source for the built-in data when no input path is given.
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056571#comment-17056571 ]

Ethan Li commented on FLINK-16517:

[~aljoscha] Thanks for the link. I feel like it's still not simple enough for beginners. I am looking for a very simple example so beginners can focus on getting their first Flink job running. The WordCount example is like a hello-world program for streaming.
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055515#comment-17055515 ]

Ethan Li commented on FLINK-16517:

I will put up a pull request if you think this makes sense.