[ https://issues.apache.org/jira/browse/NIFI-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419325#comment-15419325 ]
ASF GitHub Bot commented on NIFI-2547:
--------------------------------------

GitHub user rickysaltzer opened a pull request:

    https://github.com/apache/nifi/pull/850

    NIFI-2547: Add DeleteHDFS Processor

    This processor adds the capability to delete files or directories in HDFS. Paths support both static and Expression Language values, as well as globs (e.g. /data/for/2016/07/*). This processor may be used standalone or downstream of another processor.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rickysaltzer/nifi NIFI-2547

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #850

----
commit d42fe48779eefbdfe936f2b3745b7eed1fe31d6e
Author: ricky <ri...@cloudera.com>
Date:   2016-08-10T23:14:39Z

    NIFI-2547: Add DeleteHDFS Processor

    This processor adds the capability to delete files or directories in HDFS. Paths support both static and Expression Language values, as well as globs (e.g. /data/for/2016/07/*). This processor may be used standalone or downstream of another processor.

----

> Add DeleteHDFS Processor
> ------------------------
>
>                 Key: NIFI-2547
>                 URL: https://issues.apache.org/jira/browse/NIFI-2547
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Ricky Saltzer
>            Assignee: Ricky Saltzer
>
> There are times when a user may want to remove a file or directory from
> HDFS. The reasons for this vary, but to provide some context, I currently
> have a pipeline where I need to periodically delete files that my NiFi
> pipeline is producing. In my case, the policy is "delete files after they
> are 7 days old".
> Currently, I have to use the {{ExecuteStreamCommand}} processor and manually
> call {{hdfs dfs -rm}}, which is awful when dealing with a large number of
> files. First, an entire JVM is spun up for each delete; second, deleting
> directories with thousands of files can sometimes cause the command to hang
> indefinitely.
> With that said, I propose adding a {{DeleteHDFS}} processor that meets the
> following criteria:
> * Can delete both directories and files
> * Can delete directories recursively
> * Supports the dynamic expression language
> * Supports using glob paths (e.g. /data/for/2017/08/*)
> * Capable of being a downstream processor as well as a standalone processor
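The deletion behaviour requested in the issue (glob expansion, recursive directory removal, and an age-based "older than 7 days" policy) can be sketched in Java, NiFi's implementation language. This is a hypothetical, local-filesystem illustration and not the code in pull request #850: it uses java.nio (`PathMatcher`, `Files.walk`) as a stand-in so it runs without a cluster. An HDFS processor would instead go through Hadoop's `FileSystem` API, using `globStatus(Path)` to expand the pattern and `delete(Path, boolean recursive)` to remove each match. The class `GlobDeleteSketch` and its method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: java.nio on the local filesystem stands in for
// Hadoop's FileSystem API (globStatus / delete(path, recursive)).
public class GlobDeleteSketch {

    // Expands a glob such as "2016/07/*" relative to baseDir.
    // '*' does not cross directory boundaries, so nested entries are not matched.
    static List<Path> expandGlob(Path baseDir, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(baseDir)) {
            // Collect eagerly so later deletions never race the directory walk.
            return walk.filter(p -> !p.equals(baseDir))
                       .filter(p -> matcher.matches(baseDir.relativize(p)))
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a file, or a directory and everything beneath it.
    // Reverse lexicographic order puts every child before its parent directory.
    static void deleteRecursively(Path target) {
        try (Stream<Path> walk = Files.walk(target)) {
            List<Path> entries = walk.sorted(Comparator.reverseOrder())
                                     .collect(Collectors.toList());
            for (Path p : entries) {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes every glob match last modified before (now - maxAge), e.g. the
    // "delete files after they are 7 days old" policy. Returns the number deleted.
    static int deleteOlderThan(Path baseDir, String glob, Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        int deleted = 0;
        for (Path p : expandGlob(baseDir, glob)) {
            try {
                if (Files.notExists(p)) {
                    continue; // already removed together with a matched parent directory
                }
                if (Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                    deleteRecursively(p);
                    deleted++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("deletehdfs-sketch");
        Instant now = Instant.now();
        Path oldFile = Files.createDirectories(base.resolve("2016/07")).resolve("part-0000");
        Files.createFile(oldFile);
        Files.setLastModifiedTime(oldFile, FileTime.from(now.minus(Duration.ofDays(10))));
        Path newFile = Files.createFile(base.resolve("2016/07/part-0001"));
        int n = deleteOlderThan(base, "2016/07/*", Duration.ofDays(7), now);
        System.out.println(n + " deleted; part-0001 still exists: " + Files.exists(newFile));
        deleteRecursively(base); // clean up the temporary tree
    }
}
```

Running `main` prints `1 deleted; part-0001 still exists: true`: both files match the glob, but only the 10-day-old one falls before the 7-day cutoff. Collecting all glob matches before deleting anything mirrors the expand-then-delete order an HDFS implementation would likely follow with `globStatus`.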