[jira] [Commented] (STORM-969) HDFS Bolt can end up in an unrecoverable state

ASF GitHub Bot (JIRA) Sun, 16 Aug 2015 18:16:16 -0700

    [ https://issues.apache.org/jira/browse/STORM-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698911#comment-14698911 ]


ASF GitHub Bot commented on STORM-969:
--------------------------------------

Github user dossett commented on the pull request:

    https://github.com/apache/storm/pull/664#issuecomment-131648086
  
    @arunmahadevan Thank you for the feedback!  I have added a tick tuple
    feature to address your first point (I was already using this locally but
    forgot to include it in this PR).

    You are right that I attempt to sync even when the write fails.  I agree
    that this is unlikely to succeed, but it seems worthwhile to try to "save"
    as many tuples as possible.

    With this tick tuple change I think the code is very close to your
    recommendation.  If the bolt is in a bad state, the tick tuples will
    periodically try to rotate the file, and all errors will be reported up
    through ```this.collector.reportError(e)```.
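
For readers unfamiliar with tick tuples, here is a minimal sketch of how a bolt might request and recognize them. It is not the code from the pull request. `TupleSource`, `TickSketch`, `tickConfig`, and `source` are hypothetical names used only for illustration; the string constants are the values behind Storm's `Constants.SYSTEM_COMPONENT_ID`, `Constants.SYSTEM_TICK_STREAM_ID`, and `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`, copied in so the example runs without Storm on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the two Tuple accessors needed to spot a tick. */
interface TupleSource {
    String getSourceComponent();
    String getSourceStreamId();
}

final class TickSketch {
    // String values behind Storm's Constants.SYSTEM_COMPONENT_ID and
    // Constants.SYSTEM_TICK_STREAM_ID.
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String SYSTEM_TICK_STREAM_ID = "__tick";
    // String value behind Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS.
    static final String TICK_FREQ_KEY = "topology.tick.tuple.freq.secs";

    /** What a bolt could return from getComponentConfiguration() to request ticks. */
    static Map<String, Object> tickConfig(int seconds) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(TICK_FREQ_KEY, seconds);
        return conf;
    }

    /** A tick tuple comes from the system component on the tick stream. */
    static boolean isTick(TupleSource t) {
        return SYSTEM_COMPONENT_ID.equals(t.getSourceComponent())
            && SYSTEM_TICK_STREAM_ID.equals(t.getSourceStreamId());
    }

    /** Helper (assumption, for the example only) that builds a fake tuple source. */
    static TupleSource source(final String component, final String stream) {
        return new TupleSource() {
            @Override public String getSourceComponent() { return component; }
            @Override public String getSourceStreamId() { return stream; }
        };
    }
}
```

In a real bolt, `execute()` would check for a tick first and, on a tick, attempt the sync or rotation instead of writing a record.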


> HDFS Bolt can end up in an unrecoverable state
> ----------------------------------------------
>
>                 Key: STORM-969
>                 URL: https://issues.apache.org/jira/browse/STORM-969
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-hdfs
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>
> The body of the HDFSBolt.execute() method is essentially one try-catch block.
> The catch block reports the error and fails the current tuple.  In some
> cases the bolt's FSDataOutputStream object (named 'out') is in an
> unrecoverable state and no subsequent calls to execute() can succeed.
> To reproduce this scenario:
> - process some tuples through the HDFS bolt
> - put the underlying HDFS system into safemode
> - process some more tuples and receive the expected ClosedChannelException
> - take the underlying HDFS system out of safemode
> - subsequent tuples continue to fail with the same exception
> The three fundamental operations that execute() performs (writing, sync'ing,
> rotating) need to be isolated so that errors from each are specifically
> handled.
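
The isolation described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the change made in the pull request: `OutputFile`, `IsolatedWriter`, and `FakeFile` are hypothetical types invented for the example, and the fake only imitates the safemode behaviour listed in the reproduction steps. Each of the three operations gets its own try-catch, the sync is still attempted after a failed write, and a failed write triggers a rotation so that a broken stream is replaced rather than reused.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the bolt's output stream; not a storm-hdfs type. */
interface OutputFile {
    void write(byte[] record) throws IOException;
    void sync() throws IOException;
    void rotate() throws IOException; // drop the current stream, open a fresh one
}

final class IsolatedWriter {
    private final OutputFile file;
    private final int syncEvery;
    private final List<Throwable> reported = new ArrayList<>(); // stands in for reportError
    private int unsynced = 0;

    IsolatedWriter(OutputFile file, int syncEvery) {
        this.file = file;
        this.syncEvery = syncEvery;
    }

    List<Throwable> errors() { return reported; }

    /** Returns true if the record was written; the caller would ack or fail accordingly. */
    boolean handle(byte[] record) {
        boolean written = false;
        try {
            file.write(record);
            written = true;
            unsynced++;
        } catch (IOException e) {
            reported.add(e);
        }
        // Sync on schedule, and also after a failed write to try to save earlier records.
        if (!written || unsynced >= syncEvery) {
            try {
                file.sync();
                unsynced = 0;
            } catch (IOException e) {
                reported.add(e);
            }
        }
        // A failed write may mean the stream is dead, so try to replace it.
        if (!written) {
            try {
                file.rotate();
            } catch (IOException e) {
                reported.add(e);
            }
        }
        return written;
    }
}

/** In-memory fake: once a write fails, the stream stays broken until rotate() succeeds. */
final class FakeFile implements OutputFile {
    boolean safeMode = false;
    boolean streamBroken = false;
    int written = 0;

    @Override public void write(byte[] record) throws IOException {
        if (safeMode) streamBroken = true;
        if (streamBroken) throw new IOException("stream closed");
        written++;
    }

    @Override public void sync() throws IOException {
        if (streamBroken) throw new IOException("stream closed");
    }

    @Override public void rotate() throws IOException {
        if (safeMode) throw new IOException("cannot create file in safemode");
        streamBroken = false;
    }
}
```

With the fake, a tuple handled during safemode fails, the first tuple after safemode ends also fails but its rotation succeeds, and the tuple after that is written normally. A tick tuple could drive the same rotation attempt so that recovery does not depend on incoming data.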



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
