[ https://issues.apache.org/jira/browse/NIFI-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980412#comment-14980412 ]

Mark Payne commented on NIFI-631:
---------------------------------

[~jskora] - Excellent! I went back and forth on writing that
AbstractListProcessor for a while, because it will be much easier once the
framework can manage state itself instead of relying on controller services,
but in the end decided it was worth the time. When the build fails on license
issues, it writes out a rat.txt file that contains the names of the files
without valid license headers. In Maven you'll see something like:

[ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.11:check 
(default) on project nifi-standard-processors: Too many files with unapproved 
license: 1 See RAT report in: 
/devel/nifi/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/target/rat.txt
 -> [Help 1]

Then, within the rat.txt you'll see something like:

1 Unknown Licenses

*******************************

Unapproved licenses:

  src/main/resources/hello

*******************************


This tells us that the file 'hello' under src/main/resources does not have a
valid license header.
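
When a report lists many files, a small helper can pull the paths out of the
"Unapproved licenses:" section. The following is a minimal Java sketch that
assumes the report layout shown above (a section header, indented paths, and a
closing line of asterisks); the class and method names (RatReportScanner,
unapprovedFiles) are hypothetical and not part of NiFi or Apache RAT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes the rat.txt layout shown in the example above.
public class RatReportScanner {

    /** Returns the paths listed under "Unapproved licenses:" in a RAT report. */
    static List<String> unapprovedFiles(List<String> reportLines) {
        List<String> files = new ArrayList<>();
        boolean inSection = false;
        for (String line : reportLines) {
            String trimmed = line.trim();
            if (trimmed.equals("Unapproved licenses:")) {
                inSection = true;            // start collecting paths
            } else if (inSection && trimmed.startsWith("*")) {
                break;                       // asterisk divider closes the section
            } else if (inSection && !trimmed.isEmpty()) {
                files.add(trimmed);          // indented path of an offending file
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to target/rat.txt relative to the module being built (an assumption).
        Path report = Paths.get(args.length > 0 ? args[0] : "target/rat.txt");
        for (String file : unapprovedFiles(Files.readAllLines(report))) {
            System.out.println(file);
        }
    }
}
```

Run against the report above, it would print src/main/resources/hello.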

Thanks
-Mark

> Create ListFile and FetchFile processors
> ----------------------------------------
>
>                 Key: NIFI-631
>                 URL: https://issues.apache.org/jira/browse/NIFI-631
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mark Payne
>            Assignee: Joe Skora
>         Attachments: 
> 0001-NIFI-631-Initial-implementation-of-FetchFile-process.patch
>
>
> This pair of Processors will provide several benefits over the existing 
> GetFile processor:
> 1. Currently, GetFile will continually pull the same files if the "Keep
> Source File" property is set to true; there is no way to pull a file and
> leave it in the directory without pulling it again on every run. We could
> implement state here, but it would either require a huge amount of state to
> remember everything pulled, or it would always have to pull the oldest file
> first so that we only need to keep the Last Modified Date of the last file
> pulled plus all already-pulled files that share that same Last Modified
> Date.
> 2. When pulling from network-attached storage such as NFS, this would allow a
> single processor to run ListFile and then distribute the resulting FlowFiles
> across the cluster, so that the cluster can share the work of pulling the data.
> 3. There are use cases where we may want to pull a specific file (for example,
> in conjunction with HandleHttpRequest/HandleHttpResponse) rather than pull
> all files in a directory. GetFile does not support this.
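
Item 1 describes keeping only the Last Modified Date of the newest file pulled
plus the already-pulled files that share that date. A minimal Java sketch of
that bookkeeping is below. It assumes in-memory state and a hypothetical
FileEntry record; it is not NiFi's actual ListFile or AbstractListProcessor
code, and because it uses a record it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "latest timestamp + names seen at that timestamp" state.
public class ListingState {

    /** Hypothetical stand-in for one entry of a directory listing. */
    record FileEntry(String path, long lastModified) {}

    private long latestTimestamp = Long.MIN_VALUE;       // newest Last Modified Date emitted
    private final Set<String> pathsAtLatest = new HashSet<>(); // paths emitted at that date

    /** Returns the entries not emitted before and advances the stored state. */
    public List<FileEntry> selectNew(List<FileEntry> listing) {
        List<FileEntry> fresh = new ArrayList<>();
        for (FileEntry e : listing) {
            boolean newer = e.lastModified() > latestTimestamp;
            boolean sameButUnseen = e.lastModified() == latestTimestamp
                    && !pathsAtLatest.contains(e.path());
            if (newer || sameButUnseen) {
                fresh.add(e);
            }
        }
        // Update state only after the whole listing has been compared.
        for (FileEntry e : fresh) {
            if (e.lastModified() > latestTimestamp) {
                latestTimestamp = e.lastModified();
                pathsAtLatest.clear();                   // older names are no longer needed
            }
            if (e.lastModified() == latestTimestamp) {
                pathsAtLatest.add(e.path());
            }
        }
        return fresh;
    }
}
```

Because each listing is compared against the stored state as a whole, the order
in which files come back within a single run does not matter. The trade-off is
that a file which shows up later with a Last Modified Date older than the
stored one is never picked up.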



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
