[ 
https://issues.apache.org/jira/browse/CHUKWA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated CHUKWA-533:
-------------------------------

    Attachment: CHUKWA-533-2.patch

Thanks Eric.

Here's patch #2. It adds logic to handle the case where the previous output 
stream can't be closed before the move during {{rotate}}. This happens when 
HDFS goes down and comes back up: the file handle can't always be closed, but 
the file can still be moved. This patch is deployed on our system and seems 
to be working well.
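
For readers without the patch at hand, the close-then-move idea could be 
sketched roughly as below. This is not the code from CHUKWA-533-2.patch: the 
class name RotateSketch and the method closeQuietlyThenMove are hypothetical, 
and java.nio.file on a local filesystem stands in for the HDFS client API. 
The sketch only illustrates that a failed close is logged and does not stop 
the move.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; names and the local-filesystem calls are
// assumptions, not the contents of the attached patch.
class RotateSketch {

    /**
     * Tries to close the previous output stream, then moves the file
     * regardless of whether the close succeeded.
     *
     * @return true if the stream closed cleanly, false if close() failed
     * @throws IOException if the move itself fails
     */
    static boolean closeQuietlyThenMove(Closeable previousStream,
                                        Path currentFile,
                                        Path finalFile) throws IOException {
        boolean closedCleanly = true;
        try {
            if (previousStream != null) {
                previousStream.close();
            }
        } catch (IOException e) {
            // The handle may be stale after the filesystem went down and
            // came back up. Record the failure and carry on with the move.
            closedCleanly = false;
            System.err.println("Could not close previous stream: " + e.getMessage());
        }
        // The move is attempted even when the close failed.
        Files.move(currentFile, finalFile, StandardCopyOption.REPLACE_EXISTING);
        return closedCleanly;
    }
}
```

A caller can use the return value to decide whether to log a warning or 
count the event, while a failure of the move itself still surfaces as an 
exception that retry logic can handle.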



> Improve fault-tolerance of collectors.
> --------------------------------------
>
>                 Key: CHUKWA-533
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-533
>             Project: Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: CHUKWA-533-1.patch, CHUKWA-533-2.patch
>
>
> There are currently a number of ways that a collector can die, typically due 
> to errors on a DN or a NN that's being restarted. A collector should have 
> some combination of retry logic followed by failing back to the agent, but 
> the collector process should not die.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
