Alan Jackoway created NIFI-1856:
-----------------------------------

             Summary: ExecuteStreamCommand Needs to Consume Standard Error
                 Key: NIFI-1856
                 URL: https://issues.apache.org/jira/browse/NIFI-1856
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Alan Jackoway
I was using ExecuteStreamCommand to run certain HDFS commands that are tricky to write in NiFi but easy in bash (e.g. {{hadoop fs -rm -r /data/*/2014/05/05}}). However, my larger commands kept hanging, even though they finish quickly when I run them from the command line.

Based on http://www.javaworld.com/article/2071275/core-java/when-runtime-exec---won-t.html, I believe that ExecuteStreamCommand (and possibly other processors) needs to consume the standard error stream to prevent the child process from blocking when the standard error pipe buffer fills.

To reproduce, create this as ~/write.py:
{code:python}
import sys

count = int(sys.argv[1])
for x in range(count):
    sys.stderr.write("ERROR %d\n" % x)
    sys.stdout.write("OUTPUT %d\n" % x)
{code}

Then create a flow:
# GenerateFlowFile - 5 minute schedule, 0 byte file size
# ExecuteStreamCommand - Command Arguments {{/Users/alanj/write.py;100}}, Command Path {{python}}
# PutFile - Directory {{/tmp/write/}}, with the output stream relationship of ExecuteStreamCommand routed to PutFile

When you turn everything on, you get 100 lines (not 200) of just the standard output in /tmp/write. Next, change the command arguments to {{/Users/alanj/write.py;100000}} and turn everything on again. The command hangs.

I believe that whenever you execute a process the way ExecuteStreamCommand does, you need to consume the standard error stream to keep the process from blocking. This may also affect processors like ExecuteProcess and ExecuteScript.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
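To illustrate the failure mode described above: a child process blocks as soon as the OS pipe buffer for an unread stream fills, so the parent must drain stderr concurrently while it reads stdout. The sketch below is a hypothetical Python helper (not NiFi's Java code, and {{run_consuming_stderr}} is an invented name), showing the fix pattern of draining stderr on a background thread.

```python
import subprocess
import threading


def run_consuming_stderr(args):
    """Run a command, draining stderr on a separate thread so the child
    never blocks on a full stderr pipe buffer (illustrative helper only).
    Returns (stdout_bytes, stderr_bytes)."""
    proc = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stderr_chunks = []

    def drain():
        # Reading continuously keeps the stderr pipe buffer from filling,
        # which is what causes the hang described in this issue.
        for line in proc.stderr:
            stderr_chunks.append(line)

    t = threading.Thread(target=drain, daemon=True)
    t.start()
    # Safe to read stdout to EOF here: stderr is drained concurrently,
    # so the child can always make progress on both streams.
    stdout = proc.stdout.read()
    proc.wait()
    t.join()
    return stdout, b"".join(stderr_chunks)
```

Running the write.py reproduction through this helper with an argument of 100000 completes instead of hanging, because neither pipe ever backs up.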