I am new to Flume and just set up a distributed scenario. I have one agent, one collector, and one master, all on different hardware.
I currently have the agent running taildir on a log directory, and I am sinking the data into Cassandra. I am generating 1000 log entries per second, each 150 bytes in size. I have vnStat running on the agent box, and the Ethernet bandwidth leaving the agent for the collector never goes over 2mb/sec. My goal is to get the 1000 log entries per second written in near real time to the collector, which then inserts them into Cassandra. Right now Cassandra is yawning under the load, and I am not sure where the data is being throttled. If I create log entries for 1 minute and then turn off additional writes, the agent takes at least 10 minutes to work through all of those entries. Thoughts?

Trevor Francis
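For reference, here is a rough back-of-the-envelope sketch of the figures quoted above. It uses only the numbers stated in the post (1000 entries/sec, 150 bytes each, 1 minute of generation, at least 10 minutes to drain); the function and variable names are made up for illustration, and the byte counts cover log payload only, assuming no Flume or network framing overhead.

```python
# Back-of-the-envelope figures from the post; nothing here is measured.
ENTRIES_PER_SEC = 1000   # generation rate stated in the post
ENTRY_BYTES = 150        # entry size stated in the post
GENERATE_SECS = 60       # 1 minute of log generation
DRAIN_SECS = 10 * 60     # "at least 10 minutes" for the agent to catch up


def offered_bytes_per_sec(entries_per_sec, entry_bytes):
    """Payload bytes per second produced by the log generator."""
    return entries_per_sec * entry_bytes


def effective_entries_per_sec(entries_per_sec, generate_secs, drain_secs):
    """Average rate at which the backlog is actually processed."""
    total_entries = entries_per_sec * generate_secs
    return total_entries / drain_secs


offered = offered_bytes_per_sec(ENTRIES_PER_SEC, ENTRY_BYTES)
offered_mbit = offered * 8 / 1e6
effective = effective_entries_per_sec(ENTRIES_PER_SEC, GENERATE_SECS, DRAIN_SECS)

print(f"offered payload: {offered} bytes/sec ({offered_mbit:.1f} Mbit/sec)")
print(f"effective rate:  at most {effective:.0f} entries/sec")
```

Because the drain takes at least 10 minutes, the effective rate is an upper bound: the pipeline is moving roughly a tenth of the offered load or less, and the offered payload is small next to typical Ethernet capacity.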
