Github user chiwanpark commented on the pull request:
https://github.com/apache/incubator-flink/pull/226#issuecomment-65885299
I suggest a new implementation of this feature. I hope for many feedback
about this idea. There are two functions for this feature.
1. `FileMonitoringFunction` emits a tuple with 3 parameters. (modified file
path, start offset, end offset) This function implements `NonParallelInput`.
2. `FileMapFunction` (I think that renaming of this function is required)
reads file that have the file path and emits contents in given range. This
function implements `FlatMapFunction` because there is no method to link
between two source functions.
When a user calls `readFileStream` in `StreamExecutionEnvironment`, the
system creates a `FileMonitoringFunction` and `FileMapFunction` and links them
and returns them.
With this implementation, we can fix the problem about parallelism with
monitoring instance. The user can set degree of parallelism of source. In fact,
the user set degree of parallelism of map function. There is only one instance
monitoring file system.
Additionally, we can reuse `FileMapFunction` to substitute
`FileSourceFunction`.
How about this implementation?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---