[
https://issues.apache.org/jira/browse/FLINK-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242876#comment-14242876
]
ASF GitHub Bot commented on FLINK-1081:
---------------------------------------
Github user chiwanpark commented on the pull request:
https://github.com/apache/incubator-flink/pull/226#issuecomment-66663423
I implemented this feature. As I said, there are two functions for this
feature: `FileMonitoringFunction` and `FileReadFunction` (I renamed
`FileMapFunction` to `FileReadFunction`).
If a user calls `readFileStream` on `StreamExecutionEnvironment`, the system
creates a `FileMonitoringFunction` as the primary source, sets its degree of
parallelism to 1 (because the `NonParallelInput` interface is ignored), and
connects it to `FileReadFunction` with the `flatMap` method.
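As a rough illustration of this two-stage design (this is not the code in the
pull request and it does not use any Flink API), the sketch below uses plain
`java.nio.file` calls. The names `FileStreamSketch`, `Monitor`, `poll`, and
`readLines` are hypothetical: a single monitor instance reports files it has not
seen before, and a separate flatMap-like step turns each reported file into its
lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-alone sketch; it only mirrors the monitor -> reader split.
public class FileStreamSketch {

    // Stage 1: one instance only, so every new file is reported exactly once.
    static final class Monitor {
        private final Path dir;
        private final Set<Path> seen = new HashSet<>();

        Monitor(Path dir) {
            this.dir = dir;
        }

        // Returns the regular files that appeared since the previous call.
        List<Path> poll() throws IOException {
            List<Path> fresh = new ArrayList<>();
            try (Stream<Path> entries = Files.list(dir)) {
                List<Path> files = entries
                        .filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
                for (Path p : files) {
                    if (seen.add(p)) {
                        fresh.add(p);
                    }
                }
            }
            return fresh;
        }
    }

    // Stage 2: flatMap-like step, one input path yields many output records.
    // Independent calls could run in parallel because they share no state.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("monitor-sketch");
        Monitor monitor = new Monitor(dir);

        Files.write(dir.resolve("a.txt"), Arrays.asList("one", "two"));
        for (Path p : monitor.poll()) {
            readLines(p).forEach(System.out::println);
        }

        // A second poll without new files reports nothing.
        System.out.println("new files: " + monitor.poll().size());
    }
}
```

Keeping the monitor single-instance avoids reporting the same file twice, while
the reading step holds no shared state and is the part that can be spread over
several parallel instances.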
I tested this in a local mini cluster and in an HDFS environment with the
degree of parallelism set to 5.
> Add HDFS file-stream source for streaming
> -----------------------------------------
>
> Key: FLINK-1081
> URL: https://issues.apache.org/jira/browse/FLINK-1081
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.7.0-incubating
> Reporter: Gyula Fora
> Assignee: Chiwan Park
> Labels: starter
>
> Add a data stream source that will monitor a selected directory on HDFS (or
> other filesystems as well) and will process all new files created.