Hi,
I am trying to figure out the exact semantics of Flume with respect to
event ordering. I'm interested in both single-agent and multi-agent
setups.
The current documentation (http://flume.apache.org/FlumeUserGuide.html)
doesn't seem to cover this topic.
If everything is working well, Flume will give you approximately
in-order delivery. If transactions are being rolled back, that order
will be considerably mixed.
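To illustrate the serial-number idea mentioned below (this is a sketch, not part of Flume; `Event` here is a stand-in, not `org.apache.flume.Event`): stamp each event with a monotonically increasing sequence number at the source, and a downstream consumer can restore order after retries have mixed it up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;

public class SequenceOrdering {
    // Minimal stand-in for an event carrying a sequence header.
    static final class Event {
        final long seq;
        final String body;
        Event(long seq, String body) { this.seq = seq; this.body = body; }
    }

    private final AtomicLong counter = new AtomicLong();

    // Source side: attach a monotonically increasing sequence number.
    Event stamp(String body) {
        return new Event(counter.getAndIncrement(), body);
    }

    // Consumer side: reorder a batch by its sequence number.
    static List<String> reorder(List<Event> batch) {
        PriorityQueue<Event> pq =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.seq));
        pq.addAll(batch);
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll().body);
        return out;
    }

    public static void main(String[] args) {
        SequenceOrdering src = new SequenceOrdering();
        Event a = src.stamp("first");
        Event b = src.stamp("second");
        Event c = src.stamp("third");
        // Simulate a rolled-back transaction redelivering events out of order.
        System.out.println(reorder(List.of(c, a, b))); // prints [first, second, third]
    }
}
```

In a real deployment the sequence number would travel as an event header, and a single global counter only exists per source, which is why a truly global serial number is harder than this sketch suggests.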
I don't see us supporting delivery ordering anytime soon. Additionally,
for most Flume use cases a global serial number could be assigned at
Regarding the EOFException, my guess is that some nodes are acting
flaky. What version of Hadoop are you running?
Brock
On Mon, Nov 5, 2012 at 8:43 PM, Cameron Gandevia cgande...@gmail.com wrote:
Hi
I started noticing the following error on our Flume nodes and was wondering
if anyone had any
Took a thread dump when it hit that point. It looks like the issue might be
with some custom interceptor code we have written. Looking into it. Thanks
for pointing me in the right direction.
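For anyone searching the archives: custom interceptors are wired into an agent through the config file by the fully qualified name of their Builder class. A minimal sketch (the agent, source, and class names here are placeholders):

```properties
a1.sources.r1.interceptors = i1
# Custom interceptors are referenced by their Builder's fully qualified class name.
a1.sources.r1.interceptors.i1.type = com.example.MyInterceptor$Builder
```

Since interceptors run on the source's event-handling path, a blocking call inside one will show up in a thread dump exactly as described above.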
--- On Mon, 11/5/12, Brock Noland br...@cloudera.com wrote:
From: Brock Noland br...@cloudera.com
Thanks for the reply; it looks like the cause was our DataNodes throwing
the following exception:
java.io.IOException: xceiverCount 2050 exceeds the limit of concurrent
xcievers 2048
I increased this setting and now everything seems to run correctly.
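For anyone hitting the same error: the limit in question is the DataNode's `dfs.datanode.max.xcievers` setting (the property name really is misspelled in Hadoop; newer Hadoop versions call it `dfs.datanode.max.transfer.threads`). A typical `hdfs-site.xml` entry raising it might look like:

```xml
<property>
  <!-- Maximum number of concurrent transfer threads per DataNode. -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```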
On Tue, Nov 6, 2012 at 5:00 AM, Brock Noland
No specific reason. I was familiar with Jetty and we already had a dependence
on Jetty for the metrics stuff and I think some avro IPC stuff too, so I
decided to just use the same.
Thanks,
Hari
--
Hari Shreedharan
On Monday, November 5, 2012 at 2:09 PM, Harish Mandala wrote:
Just
Hi Nathaniel,
When I wrote this (and while it was being reviewed), we concluded it was not
a good idea to expose the response object to the custom handler that can be
plugged in, because the handler could flush the response object's buffer
(which can also be flushed just because it is
Hi Shara,
The .tmp file contains the actual data before it is renamed to the
final file. If the .tmp file is still open, then Flume is still
holding it open in order to write to it. If Flume somehow dies and is
unable to clean up the .tmp files, then once the client lease expires,
the NameNode
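For context, the `.tmp` suffix comes from the HDFS sink's in-use file settings and is configurable. A minimal sketch of the relevant config (the agent and sink names `a1`/`k1` are placeholders):

```properties
# Hypothetical agent a1 with an HDFS sink k1.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Files being written carry this suffix until they are closed and renamed.
a1.sinks.k1.hdfs.inUseSuffix = .tmp
```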
Hi,
I am very new to Flume and we are hoping to use it for our log aggregation
into HDFS. I have a few questions below:
FileChannel will double our disk IO, which will affect IO performance on
certain performance-sensitive machines. Hence, I was hoping to write a
custom Flume source which
You're still going to be writing out all events, no? So how would the file
channel do more IO than that?
On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran rahu...@yahoo.com wrote:
Hi,
I am very new to Flume and we are hoping to use it for our log
aggregation into HDFS. I have a few questions below:
But in your architecture you are going to write the contents of the
memory channel out? Or did I miss something?
The checkpoint will be updated each time we perform a successive
insertion into the memory channel.
On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote:
We have a
We will update the checkpoint each time (we may tune this to be periodic) but
the contents of the memory channel will be in the legacy logs which are
currently being generated.
Additionally, the sink for the memory channel will be an Avro source in another
machine.
Does that clear things up?
This use case sounds like a perfect fit for the Spooling Directory source,
which will be in the upcoming 1.3 release.
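For anyone searching the archives later, a minimal sketch of a Spooling Directory source config (the agent, source, and channel names and the directory path are placeholders):

```properties
# Hypothetical agent a1 reading completed log files from a spool directory.
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/app/spool
a1.sources.r1.channels = c1
```

Note the source expects files dropped into the directory to be complete and immutable; it reads each file once and will not pick up further writes to it.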
Brock
On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran rahu...@yahoo.com wrote:
We will update the checkpoint each time (we may tune this to be periodic)
but the contents of the