GitHub user DT-Priyanka opened a pull request:

    https://github.com/apache/incubator-apex-malhar/pull/207

    APEXMALHAR-2008: Create HDFS File Reader module

    Adds an HDFS file reader module.
    1. The module reads a file or a list of files (a directory is also
       accepted) and emits the file blocks.
    2. The module can be configured to emit blocks in order or out of order.
    3. The module reads file blocks in parallel. The number of parallel
       readers is configurable; if it is not configured, the module increases
       or decreases the number of readers dynamically according to the input
       data rate.
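
As a rough illustration of what "emitting file blocks" means, here is a minimal sketch that cuts a file of known length into fixed-size block descriptors. The class `BlockSplitSketch`, its `Block` type, and the `split` method are hypothetical names chosen for this example; they are not the module's API, and the actual code in this pull request may derive block boundaries differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Malhar implementation.
class BlockSplitSketch {

    // Hypothetical descriptor for one block of a file.
    static final class Block {
        final String filePath;
        final long offset;   // byte offset of the block within the file
        final long length;   // number of bytes in the block
        final boolean last;  // true for the final block of the file

        Block(String filePath, long offset, long length, boolean last) {
            this.filePath = filePath;
            this.offset = offset;
            this.length = length;
            this.last = last;
        }
    }

    // Splits [0, fileLength) into consecutive ranges of at most blockSize bytes.
    // An empty file yields an empty list.
    static List<Block> split(String filePath, long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            long length = Math.min(blockSize, fileLength - offset);
            boolean last = offset + length == fileLength;
            blocks.add(new Block(filePath, offset, length, last));
        }
        return blocks;
    }
}
```

Because each descriptor carries its own offset and length, separate readers can fetch blocks independently, which is what makes both parallel reading and out-of-order emission possible in a design like this.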
    
    Also updates FileSplitterInput with some improvements:
    1. Tracks the last file reference times of each folder separately, to
       avoid duplicates (which could arise when multiple files or
       subdirectories share the same relative path).
    2. Small code improvements.
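
The per-folder tracking in item 1 can be pictured with the following sketch. It assumes reference times are kept in a map keyed first by input directory and then by relative path; the class `ReferenceTimesSketch` and the method `isNewOrModified` are hypothetical names for illustration and are not taken from FileSplitterInput.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the FileSplitterInput implementation.
class ReferenceTimesSketch {

    // input directory -> (relative path -> last seen modification time)
    private final Map<String, Map<String, Long>> referenceTimes = new HashMap<>();

    // Returns true if the file has not been seen under this input directory,
    // or has a newer modification time than the one recorded; records the time.
    boolean isNewOrModified(String inputDir, String relativePath, long modifiedTime) {
        Map<String, Long> perDir =
            referenceTimes.computeIfAbsent(inputDir, d -> new HashMap<>());
        Long previous = perDir.get(relativePath);
        if (previous != null && previous >= modifiedTime) {
            return false; // already seen and unchanged
        }
        perDir.put(relativePath, modifiedTime);
        return true;
    }
}
```

With a single flat map keyed only by relative path, `sub/f.txt` under two different input directories would share one entry. Keying by directory first keeps the two files independent.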

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DT-Priyanka/incubator-apex-malhar APEXMALHAR-2008-hdfs-input-module

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-apex-malhar/pull/207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #207
    
----
commit 8ffb34abe48f525d401c3932d79ada6c71214e88
Author: Priyanka Gugale <[email protected]>
Date:   2016-03-08T08:42:13Z

    APEXMALHAR-2008: Create HDFS File Reader module

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
