Alan Jackoway created NIFI-1856:
-----------------------------------

             Summary: ExecuteStreamCommand Needs to Consume Standard Error
                 Key: NIFI-1856
                 URL: https://issues.apache.org/jira/browse/NIFI-1856
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Alan Jackoway


I was using ExecuteStreamCommand to run certain HDFS commands that are tricky 
to write in NiFi but easy in bash (e.g. {{hadoop fs -rm -r /data/*/2014/05/05}}).
However, my larger commands kept hanging, even though they finish quickly when 
I run them from the command line.
Based on 
http://www.javaworld.com/article/2071275/core-java/when-runtime-exec---won-t.html
I believe that ExecuteStreamCommand (and possibly other processors) needs to 
consume the standard error stream to prevent the child process from blocking 
when the standard error pipe buffer fills up.

To reproduce, create this as {{~/write.py}}:
{code:python}
import sys
count = int(sys.argv[1])
for x in range(count):
    sys.stderr.write("ERROR %d\n" % x)
    sys.stdout.write("OUTPUT %d\n" % x)
{code}

Create a flow that goes:
# GenerateFlowFile - run schedule 5 minutes, file size 0 bytes
# ExecuteStreamCommand - Command Arguments {{/Users/alanj/write.py;100}}, Command Path {{python}}
# PutFile - Directory {{/tmp/write/}}
routing the output stream relationship of ExecuteStreamCommand to PutFile.

When you turn everything on, you get 100 lines (not 200) in /tmp/write: only 
the standard output.

Next, change the command arguments to /Users/alanj/write.py;100000 and turn 
everything on again. The command will hang.

I believe that whenever you execute a process the way ExecuteStreamCommand 
does, you need to consume the standard error stream to keep the child from 
blocking. This may also affect processors such as ExecuteProcess and 
ExecuteScript.
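The shape of the fix can be sketched in Python (not NiFi's actual Java code): drain stderr on a background thread while reading stdout, so the OS pipe buffer never fills and stalls the child. The reproduction script is inlined below, and the helper name {{run_draining_stderr}} is mine, purely illustrative:

{code:python}
import subprocess
import sys
import threading

# Inline stand-in for ~/write.py from the reproduction steps above.
CHILD = r"""
import sys
for x in range(100000):
    sys.stderr.write("ERROR %d\n" % x)
    sys.stdout.write("OUTPUT %d\n" % x)
"""

def run_draining_stderr():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Consume stderr concurrently; without this thread, reading stdout
    # alone deadlocks once the child fills the stderr pipe buffer
    # (typically 64 KiB on Linux).
    drainer = threading.Thread(target=proc.stderr.read)
    drainer.start()
    out = proc.stdout.read()
    drainer.join()
    proc.wait()
    return out

if __name__ == "__main__":
    print(len(run_draining_stderr().splitlines()))  # prints 100000
{code}

In the processor itself, the analogous fix is to read (or redirect) the process's error stream concurrently with its output stream rather than only after the output stream is exhausted.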



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
