[
https://issues.apache.org/jira/browse/HADOOP-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
arkady borkovsky updated HADOOP-1477:
-------------------------------------
Description:
Sometimes we need to use imperfect programs to process data.
Recently, I used a public domain program that did what I needed, but crashed
after processing a few million records (in my case, more than half of the mappers
succeeded, with the rest failing at different percentages of completion).
It would be nice to be able to tell the Streaming framework:
if the streaming command fails at some input record (and you get "pipe
broken" from it),
restart the command and continue feeding it the data.
Please log the failing record.
In text mining, quite often, losing a few records of the input makes no
difference at all.
Of course, this feature should be disabled by default and should have some
"are you really sure" provision (an expert feature).
was:
Sometime, we need to use imperfect programs to process data.
Recently, I used a public domain program that does what I need, but crashes
after processing few million records (in my cases, more than half of the
mappers would succeed, with the rest failing at different %%).
It would be nice if it was possible to tell the Streaming Framework :
if the streaming command fails at some input record (and you get "pipe
broken" from it),
restart the command and continue feeding it the data.
Please log the failing record.
In datamining, quite often, loosing few record of the input makes no
difference at all.
Of course this feature should be disabled by default, and should some "are
really sure" provision. (an expert feature).
Summary: Streaming should allow to re-start the command if it failed in
the middle of input (was: Stream should allow to re-start the command if it
failed in the middle of input)
> Streaming should allow to re-start the command if it failed in the middle of
> input
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-1477
> URL: https://issues.apache.org/jira/browse/HADOOP-1477
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: arkady borkovsky
>
> Sometimes we need to use imperfect programs to process data.
> Recently, I used a public domain program that did what I needed, but crashed
> after processing a few million records (in my case, more than half of the
> mappers succeeded, with the rest failing at different percentages of completion).
> It would be nice to be able to tell the Streaming framework:
> if the streaming command fails at some input record (and you get "pipe
> broken" from it),
> restart the command and continue feeding it the data.
> Please log the failing record.
> In text mining, quite often, losing a few records of the input makes no
> difference at all.
> Of course, this feature should be disabled by default and should have some
> "are you really sure" provision (an expert feature).
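
To make the requested behavior concrete, here is a minimal sketch in plain
Python, outside Hadoop. It is not Streaming code and not a proposed patch. The
names run_with_restart and max_restarts are hypothetical. It also assumes the
command writes exactly one output line per input line, which real streaming
commands need not do; that assumption is what lets the sketch pin the failure
on a specific record. With ordinary buffered pipes the failing record can only
be identified approximately.

```python
import logging
import subprocess

log = logging.getLogger("streaming_restart")


def _start(command):
    # Line-buffered text pipes so each record is handed over immediately.
    return subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)


def _reap(proc):
    # Close both pipes (ignoring a broken pipe) and collect the exit status.
    for stream in (proc.stdin, proc.stdout):
        try:
            stream.close()
        except OSError:
            pass
    proc.wait()


def run_with_restart(command, records, max_restarts=10):
    """Feed records to command one at a time; restart it when it dies.

    Assumption (not from the issue): one output line per input line.
    Returns (outputs, skipped).
    """
    outputs, skipped = [], []
    restarts = 0
    proc = _start(command)
    try:
        for record in records:
            try:
                proc.stdin.write(record + "\n")
                proc.stdin.flush()
                line = proc.stdout.readline()
            except OSError:
                # Broken pipe: the command was already gone when we wrote.
                line = ""
            if line:
                outputs.append(line.rstrip("\n"))
                continue
            # End of file instead of a reply: treat this record as the
            # one that killed the command, log it, and move on.
            log.warning("command failed on record %r; skipping it", record)
            skipped.append(record)
            _reap(proc)
            if restarts >= max_restarts:
                raise RuntimeError("command failed too many times")
            restarts += 1
            proc = _start(command)
    finally:
        _reap(proc)
    return outputs, skipped
```

The max_restarts cap stands in for the "are you really sure" provision: a
command that dies on every record stops the run instead of silently dropping
the whole input.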