Sorry about that
FYI, about 1GB/day across 4 collectors at the moment.
On 3/16/11 6:55 PM, James Seigel wrote:
I believe, sir, there should be a Flume support group on Cloudera. I'm
guessing most of us here haven't used it and therefore aren't much
help.
This is vanilla Hadoop land. :)
Cheers and good luck!
James
On a side note, how much data are you pumping through it?
On 2011-03-16, at 7:53 PM, Mark <static.void....@gmail.com> wrote:
Sorry if this is not the correct list to post this on; it was the closest I
could find.
We are using a taildir('/var/log/foo/') source on all of our agents. If an
agent goes down and data cannot be sent to the collector for some time, what
happens when that agent becomes available again? Will the agent tail the whole
directory from the beginning of every file, thus adding duplicate data to our
sink?
I've read that I could set the startFromEnd parameter to true. In that case,
however, if an agent goes down, we would lose any data written to our files
until the agent comes back up. How do people handle this? It seems like you
have to accept either duplicate or missing data.
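One general way to avoid choosing between duplicates and gaps is to persist how far into each file the agent has already shipped, and resume from that saved offset after a restart instead of from the beginning or the end. The following is a minimal Python sketch of that idea under stated assumptions; it is not how Flume's taildir source is implemented, and the names tail_new_lines, send, and the JSON checkpoint file are hypothetical. Because offsets are saved only after lines are handed to send, a crash between sending and checkpointing can still resend a few lines (at-least-once delivery), but no lines are skipped.

```python
import json
import os


def read_checkpoint(path):
    """Return the saved {filename: byte_offset} map, or {} if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def write_checkpoint(path, offsets):
    """Persist offsets atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)


def tail_new_lines(log_dir, checkpoint_path, send):
    """Send every complete line written since the last checkpoint, then advance it.

    `send` is a hypothetical callable standing in for delivery to a collector.
    Offsets advance only after `send` returns, so a crash can cause re-sends
    (at-least-once) but never skips data.
    """
    offsets = read_checkpoint(checkpoint_path)
    for name in sorted(os.listdir(log_dir)):
        full = os.path.join(log_dir, name)
        if not os.path.isfile(full):
            continue
        start = offsets.get(name, 0)
        if os.path.getsize(full) < start:
            start = 0  # file shrank (truncated in place), so reread from the top
        with open(full, "rb") as f:
            f.seek(start)
            data = f.read()
        # Consume only complete lines; a partial trailing line waits for the next pass.
        end = data.rfind(b"\n") + 1
        for line in data[:end].splitlines():
            send(name, line.decode("utf-8", errors="replace"))
        offsets[name] = start + end
        write_checkpoint(checkpoint_path, offsets)
```

Keep the checkpoint file outside the watched directory, or it will be tailed like a log. Calling tail_new_lines periodically approximates a tail. This sketch keys offsets by filename, so rename-based log rotation would make a rotated file look new and get reread from the start; a fuller implementation would probably need to key on something like the inode instead.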
Thanks