As per the collector design, the collector accepts multiple chunks and writes 
each chunk to HDFS. Once all the chunks have been written, the collector sends 
a 200 status back to the agent. If an HDFS write fails partway through, the 
collector aborts the entire request and returns an exception, which means the 
data may be partially written to HDFS.
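To make sure I am reading the design correctly, here is a minimal sketch of 
the collector behavior I am describing. The class and method names below are 
mine, not the actual collector code:

import java.io.IOException;
import java.util.List;

// Hypothetical sketch of the collector behavior described above: every
// chunk in the request is written to HDFS, and only a fully written batch
// is acknowledged with 200. A failure partway through leaves the earlier
// chunks on HDFS but returns an error to the agent.
public class CollectorSketch {

    interface HdfsSink {
        void write(byte[] chunk) throws IOException;
    }

    static int handlePost(HdfsSink sink, List<byte[]> chunks) {
        try {
            for (byte[] chunk : chunks) {
                sink.write(chunk); // chunks written so far are not rolled back
            }
            return 200; // all chunks are on HDFS: acknowledge the batch
        } catch (IOException e) {
            return 500; // abort: data may be partially written
        }
    }
}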


I have a couple of questions:

1. The agent does not receive a 200 response. Does it resend the same data to 
another collector? How does checkpointing work in this case? (See the first 
sketch after this list.)

2. If the agent sends the same data to another collector and it gets written 
to HDFS, some records will be duplicated. Are those duplicates filtered out 
when the preprocessor runs? (See the second sketch after this list.)
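On question 1, this is the agent-side behavior I would expect under 
at-least-once delivery. Again, all the names in this sketch are hypothetical, 
not the real agent API:

import java.io.IOException;
import java.util.List;

// Hypothetical sketch of agent-side retry: if a collector does not return
// 200, the agent keeps its checkpoint at the last acknowledged chunk and
// replays the identical batch to the next collector.
public class AgentRetrySketch {

    interface Collector {
        // Returns an HTTP-style status; 200 means all chunks are on HDFS.
        int post(List<byte[]> chunks) throws IOException;
    }

    static long checkpointSeqId = 0; // last acknowledged sequence id

    static void send(List<Collector> collectors, List<byte[]> chunks,
                     long firstSeqId) {
        for (Collector c : collectors) {
            try {
                if (c.post(chunks) == 200) {
                    // Advance the checkpoint only after a full ack, so an
                    // unacknowledged batch is always replayed.
                    checkpointSeqId = firstSeqId + chunks.size();
                    return;
                }
            } catch (IOException e) {
                // Collector failed mid-write: fall through and retry the
                // same batch against the next collector.
            }
        }
        // No collector accepted the batch; the checkpoint stays put and
        // the batch will be retried later.
    }
}

The key assumption here is that the checkpoint only advances after a full 200, 
so an unacknowledged (possibly partially written) batch is always replayed.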
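On question 2, if duplicates do reach HDFS, I assume something like the 
following could drop them during preprocessing, keyed on a stable 
(source, sequence id) pair. This is only a sketch of the idea, not the actual 
preprocessor code:

import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical dedup pass over records read back from HDFS: a replayed
// batch carries the same (source, seqId) keys as the original, so keeping
// only the first occurrence of each key removes the duplicates that an
// agent retry introduced.
public class DedupSketch {

    static class ChunkRecord {
        final String source; // emitting agent/host
        final long seqId;    // per-source sequence number
        ChunkRecord(String source, long seqId) {
            this.source = source;
            this.seqId = seqId;
        }
    }

    static void dedup(List<ChunkRecord> records) {
        Set<String> seen = new HashSet<>();
        for (Iterator<ChunkRecord> it = records.iterator(); it.hasNext(); ) {
            ChunkRecord r = it.next();
            // add() returns false when the key was already present, i.e.
            // this record is a duplicate from a replayed batch.
            if (!seen.add(r.source + "/" + r.seqId)) {
                it.remove();
            }
        }
    }
}

This only works if a replayed batch carries exactly the same keys as the 
original, which is part of what I am trying to confirm.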

In summary, from the collector's perspective, what data loss can happen when 
HDFS goes down?

Thanks,
Jaydeep

Jaydeep Ayachit | Persistent Systems Ltd
Cell: +91 9822393963 | Desk: +91 712 3986747


