[ 
https://issues.apache.org/jira/browse/NIFI-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526802#comment-14526802
 ] 

Ricky Saltzer commented on NIFI-583:
------------------------------------

Hey Mark -

I'm under the impression (could be wrong) that ExecuteProcess does not stream 
the FlowFile's contents to STDIN. Instead, it appears that the mere presence of 
a FlowFile in the incoming connection is what triggers execution. The problem 
is that the FlowFile is never removed from the connection, because the 
processor never consumes it. ExecuteStreamCommand, by contrast, *does* stream 
the FlowFile's contents to STDIN.

It might be more intuitive to let ExecuteProcess route the FlowFile to a 
second relationship (e.g. passthrough) than to configure ExecuteStreamCommand 
not to stream STDIN. That way ExecuteProcess can consume the FlowFile and 
guarantee that the OS command is run only once per FlowFile.
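
To make the difference concrete, here is a toy Python model of the two 
behaviours. It is only a sketch and not NiFi code: the function names, the 
deque standing in for a connection, and the "passthrough" list are all 
assumptions made for illustration.

```python
from collections import deque


def trigger_without_consuming(queue, run_command, scheduling_passes):
    """Model of the behaviour as I understand it: a queued FlowFile triggers
    execution but is never removed, so every scheduling pass runs again."""
    runs = 0
    for _ in range(scheduling_passes):
        if queue:                      # FlowFile is present, but only peeked at
            run_command(queue[0])
            runs += 1
    return runs


def trigger_and_pass_through(queue, run_command, scheduling_passes):
    """Model of the proposal: take the FlowFile off the queue, run the command
    once, then route it to a hypothetical 'passthrough' relationship."""
    passthrough = []
    runs = 0
    for _ in range(scheduling_passes):
        if not queue:
            continue                   # nothing queued, nothing to run
        flowfile = queue.popleft()     # consuming it prevents re-triggering
        run_command(flowfile)
        runs += 1
        passthrough.append(flowfile)
    return runs, passthrough
```

In this model, one queued FlowFile and five scheduling passes give five 
executions in the first function, but exactly one in the second, with the 
FlowFile ending up in the passthrough list.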

Ricky 

> Allow ExecuteProcess to consume an incoming flowfile
> ----------------------------------------------------
>
>                 Key: NIFI-583
>                 URL: https://issues.apache.org/jira/browse/NIFI-583
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Ricky Saltzer
>
> In some cases it would be really nice to allow a FlowFile to trigger an OS 
> action. For instance, after a daily dump of data is written to an Impala 
> table in HDFS, I would like to execute a refresh on the table via the shell. 
> As it stands, the ExecuteProcess processor will allow a FlowFile in a 
> connection to trigger execution, but unless your connection has an expiration 
> set, the FlowFile will stay there indefinitely. The main issue here is that 
> the lingering FlowFile will re-trigger your ExecuteProcess processor over and 
> over. As far as I know, there are only two clear ways around this: (1) you 
> can use ExecuteStreamCommand instead, but *only* if the command can properly 
> handle STDIN; (2) you can set your ExecuteProcess processor to execute on a 
> schedule (e.g. once per minute) and expire the FlowFile before it can 
> re-execute (e.g. after 10 seconds). 
> It would be useful if the ExecuteProcess processor consumed the FlowFile and 
> passed it through a "passthrough" relationship of some kind. A second option 
> would be to make it configurable (false by default) whether to drop the 
> FlowFile or to pass it through a second relationship, so that it doesn't 
> break anyone's current pipelines. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
