Github user chiwanpark commented on the pull request:

    https://github.com/apache/incubator-flink/pull/226#issuecomment-65885299
  
    I suggest a new implementation of this feature. I hope for many feedback 
about this idea. There are two functions for this feature.
    
    1. `FileMonitoringFunction` emits a tuple with 3 parameters. (modified file 
path, start offset, end offset) This function implements `NonParallelInput`.
    2. `FileMapFunction` (I think that renaming of this function is required) 
reads file that have the file path and emits contents in given range. This 
function implements `FlatMapFunction` because there is no method to link 
between two source functions.
    
    When a user calls `readFileStream` in `StreamExecutionEnvironment`, the 
system creates a `FileMonitoringFunction` and `FileMapFunction` and links them 
and returns them.
    
    With this implementation, we can fix the problem about parallelism with 
monitoring instance. The user can set degree of parallelism of source. In fact, 
the user set degree of parallelism of map function. There is only one instance 
monitoring file system.
    
    Additionally, we can reuse `FileMapFunction` to substitute 
`FileSourceFunction`.
    
    How about this implementation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to