[ https://issues.apache.org/jira/browse/NIFI-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419325#comment-15419325 ]

ASF GitHub Bot commented on NIFI-2547:
--------------------------------------

GitHub user rickysaltzer opened a pull request:

    https://github.com/apache/nifi/pull/850

    NIFI-2547: Add DeleteHDFS Processor

    This processor adds the capability to delete files or
    directories inside of HDFS.
    
    Paths support both static and Expression Language values,
    as well as glob patterns (e.g. /data/for/2016/07/*).
    
    This processor can be used standalone or as a downstream
    processor that receives incoming FlowFiles.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rickysaltzer/nifi NIFI-2547

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #850
    
----
commit d42fe48779eefbdfe936f2b3745b7eed1fe31d6e
Author: ricky <ri...@cloudera.com>
Date:   2016-08-10T23:14:39Z

    NIFI-2547: Add DeleteHDFS Processor
    
    This processor adds the capability to delete files or
    directories inside of HDFS.
    
    Paths support both static and Expression Language values,
    as well as glob patterns (e.g. /data/for/2016/07/*).
    
    This processor can be used standalone or as a downstream
    processor that receives incoming FlowFiles.

----


> Add DeleteHDFS Processor 
> -------------------------
>
>                 Key: NIFI-2547
>                 URL: https://issues.apache.org/jira/browse/NIFI-2547
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Ricky Saltzer
>            Assignee: Ricky Saltzer
>
> There are times when a user may want to remove a file or directory from 
> HDFS. The reasons vary, but to provide some context, I currently 
> have a pipeline where I need to periodically delete files that my NiFi 
> pipeline produces. In my case, the rule is "delete files after they are 7 days 
> old". 
> Currently, I have to use the {{ExecuteStreamCommand}} processor and manually 
> call {{hdfs dfs -rm}}, which is awful when dealing with a large number of 
> files. First, an entire JVM is spun up for each delete. Second, when 
> deleting directories with thousands of files, the command can sometimes 
> hang indefinitely. 
> With that said, I am proposing that we add a {{DeleteHDFS}} processor that 
> meets the following criteria:
> * Can delete both directories and files
> * Can delete directories recursively
> * Supports NiFi Expression Language for dynamic paths
> * Supports using glob paths (e.g. /data/for/2017/08/*)
> * Capable of being a downstream processor as well as a standalone processor
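
The glob expansion and recursive deletion described in the criteria above can be sketched with the Java standard library. This is a hypothetical local-filesystem illustration under stated assumptions, not the DeleteHDFS code from the pull request: the class and method names (GlobDeleteSketch, expandGlob, deleteMatching) are made up, and java.nio.file glob matching stands in for HDFS glob matching. Against HDFS, the analogous Hadoop calls would be FileSystem#globStatus(Path) to expand the pattern and FileSystem#delete(Path, boolean) with recursive set to true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem sketch; NOT the DeleteHDFS implementation.
// On HDFS the analogous calls would be FileSystem#globStatus(Path) and
// FileSystem#delete(Path, boolean recursive).
public class GlobDeleteSketch {

    // Expands a glob such as "2016/07/*" relative to baseDir. With java.nio
    // globs, '*' matches within one path component and does not cross '/'.
    static List<Path> expandGlob(Path baseDir, String glob) throws IOException {
        PathMatcher matcher = baseDir.getFileSystem().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(baseDir)) {
            return walk.filter(p -> !p.equals(baseDir))
                       .filter(p -> matcher.matches(baseDir.relativize(p)))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    // Deletes a file or directory. A non-empty directory is only removed when
    // recursive is true; otherwise Files.delete throws DirectoryNotEmptyException.
    static void delete(Path target, boolean recursive) throws IOException {
        if (recursive && Files.isDirectory(target, LinkOption.NOFOLLOW_LINKS)) {
            List<Path> tree;
            try (Stream<Path> walk = Files.walk(target)) {
                // Reverse path order puts every child ahead of its parent.
                tree = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : tree) {
                Files.delete(p);
            }
        } else {
            Files.delete(target);
        }
    }

    // Expands the glob, deletes every match, and returns the paths removed.
    static List<Path> deleteMatching(Path baseDir, String glob, boolean recursive) throws IOException {
        List<Path> removed = new ArrayList<>();
        for (Path match : expandGlob(baseDir, glob)) {
            // Skip matches already removed by an earlier recursive delete of a parent.
            if (Files.exists(match, LinkOption.NOFOLLOW_LINKS)) {
                delete(match, recursive);
                removed.add(match);
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway tree resembling /data/for/2016/... in a temp directory.
        Path base = Files.createTempDirectory("glob-delete-demo");
        Files.createDirectories(base.resolve("2016/07/sub"));
        Files.createDirectories(base.resolve("2016/08"));
        Files.createFile(base.resolve("2016/07/a.txt"));
        Files.createFile(base.resolve("2016/07/sub/b.txt"));
        Files.createFile(base.resolve("2016/08/c.txt"));

        List<Path> removed = deleteMatching(base, "2016/07/*", true);
        for (Path p : removed) {
            System.out.println("deleted " + base.relativize(p));
        }
    }
}
```

Running main removes 2016/07/a.txt and the 2016/07/sub directory (together with sub/b.txt) but leaves 2016/07 itself and 2016/08/c.txt in place, because `*` matches only a single path component. Calling delete on a non-empty directory with recursive set to false fails instead of silently removing its contents.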



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
