[ 
https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739795#action_12739795
 ] 

Jerome Boulon commented on CHUKWA-369:
--------------------------------------

Regarding the issue with .chukwa files, the new LocalWriter takes care of 
this: any file older than the rotation period + 1 min will be renamed and 
sent over to HDFS.
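As a rough sketch of that rename-and-ship rule (illustrative only -- class and method names here are hypothetical, not the actual LocalWriter code; the rotation period is a parameter):

```java
import java.io.File;

// Hypothetical sketch: sweep a local directory, renaming any .chukwa file
// older than the rotation period plus a one-minute grace window, so a
// separate thread can pick the renamed files up and copy them to HDFS.
public class StaleFileSweeper {
    static final long ONE_MINUTE_MS = 60_000L;

    // A .chukwa file is considered stale once it is older than the
    // rotation period + 1 min.
    public static boolean isStale(long lastModifiedMs, long nowMs,
                                  long rotationPeriodMs) {
        return nowMs - lastModifiedMs > rotationPeriodMs + ONE_MINUTE_MS;
    }

    // Rename every stale .chukwa file to .done; returns how many were renamed.
    public static int sweep(File dir, long rotationPeriodMs) {
        int renamed = 0;
        long now = System.currentTimeMillis();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".chukwa"));
        if (files == null) return 0;
        for (File f : files) {
            if (isStale(f.lastModified(), now, rotationPeriodMs)) {
                File done = new File(f.getParent(),
                        f.getName().replace(".chukwa", ".done"));
                if (f.renameTo(done)) renamed++;
            }
        }
        return renamed;
    }
}
```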

@Ari: there's one thing I don't understand. Since there's more than one client 
writing to the same SeqFile, how do you know that the 2 additional MBs that you 
are seeing on the file are coming from Client1 and not Client2? Also keep in 
mind that in order to improve performance, most of the time you will have to 
buffer data in memory first and then write it to disk in big chunks. 
This is what HDFS does, and as far as I know there's no easy way to figure 
out whether the data is still in memory or has been written to disk (at least 
for now).

So unless you are able to keep track of the last SeqID per RecordType/Agent on 
the collector side, and then figure out what has been pushed to disk and what 
is still in memory, I don't see a way to send the right information back to 
the Agent.
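The bookkeeping described above might look something like this (a hypothetical sketch, not existing Chukwa code -- all names are invented, and the hard part, actually detecting when buffered data has hit disk, is deliberately left outside the sketch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical collector-side tracker: remember, per RecordType/Agent pair,
// the highest sequence ID known to be durably on disk, and only acknowledge
// up to that point back to the Agent.
public class SeqIdTracker {
    private final Map<String, Long> lastDurableSeq = new ConcurrentHashMap<>();

    private static String key(String recordType, String agent) {
        return recordType + "/" + agent;
    }

    // Called only once we *know* data up to seqId reached disk
    // (e.g. after a file rotation), not merely after a buffered write.
    public void markDurable(String recordType, String agent, long seqId) {
        lastDurableSeq.merge(key(recordType, agent), seqId, Math::max);
    }

    // Highest seq ID we could safely acknowledge back to the Agent;
    // -1 means nothing durable yet for this pair.
    public long ackableSeq(String recordType, String agent) {
        return lastDurableSeq.getOrDefault(key(recordType, agent), -1L);
    }
}
```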

 


> proposed reliability mechanism
> ------------------------------
>
>                 Key: CHUKWA-369
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-369
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>             Fix For: 0.3.0
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't, 
> quite, since we don't handle collector crashes.  Here's a proposed 
> reliability mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.