Stream should allow to re-start the command if it failed in the middle of input
-------------------------------------------------------------------------------
Key: HADOOP-1477
URL: https://issues.apache.org/jira/browse/HADOOP-1477
Project: Hadoop
Issue Type: Improvement
Components: contrib/streaming
Reporter: arkady borkovsky
Sometimes we need to use imperfect programs to process data.
Recently, I used a public domain program that does what I need but crashes
after processing a few million records (in my case, more than half of the
mappers succeeded, and the rest failed at different percentages of progress).
It would be nice if it were possible to tell the Streaming framework:
if the streaming command fails at some input record (and you get "pipe
broken" from it),
restart the command, continue feeding it the remaining data,
and log the failing record.
In data mining, losing a few input records quite often makes no
difference at all.
Of course, this feature should be disabled by default and should have some
"are you really sure" provision (an expert feature).
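
To make the request concrete, here is a minimal sketch of the restart-and-skip
behavior, written in Python outside of Hadoop. It is not the Streaming
framework's API: PipeCommand, feed_with_restart, and max_restarts are
hypothetical names chosen for illustration. It also assumes that the record
being written when the pipe breaks is the one to log and skip; because pipes
are buffered, the record that actually crashed the command may be an earlier
one.

```python
import subprocess
import sys


class PipeCommand:
    """Hypothetical wrapper around one running instance of the command."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE)

    def write(self, record):
        # Raises BrokenPipeError once the command has exited.
        self.proc.stdin.write(record)
        self.proc.stdin.flush()

    def close(self):
        try:
            self.proc.stdin.close()
        except BrokenPipeError:
            pass  # the command died with unflushed input; nothing left to do
        return self.proc.wait()


def feed_with_restart(start, records, max_restarts=0, log=sys.stderr):
    """Feed records to a command, restarting it when its pipe breaks.

    start        -- zero-argument callable returning an object with
                    write(record) and close(), e.g. lambda: PipeCommand(argv)
    max_restarts -- 0 (the default) disables the feature: the first
                    broken pipe is re-raised to the caller
    Returns the list of records that were skipped.
    """
    skipped = []
    restarts = 0
    command = start()
    try:
        for record in records:
            try:
                command.write(record)
            except BrokenPipeError:
                if restarts >= max_restarts:
                    raise  # feature disabled or restart budget exhausted
                restarts += 1
                skipped.append(record)
                print("skipping record after broken pipe: %r" % (record,),
                      file=log)
                command.close()
                command = start()  # fresh command; continue with next record
    finally:
        command.close()
    return skipped
```

A caller would opt in explicitly, for example
feed_with_restart(lambda: PipeCommand(argv), records, max_restarts=100),
where argv and records are supplied by the caller. Keeping the default at
zero restarts mirrors the "disabled by default" requirement, and the
restart cap stops a command that fails on every record from looping forever.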