event ordering semantics

2012-11-06 Thread Jan Van Besien
Hi, I am trying to figure out the exact semantics of Flume with respect to event ordering. I'm interested in both single-agent and multi-agent setups. The current documentation (http://flume.apache.org/FlumeUserGuide.html) doesn't seem to mention anything about this topic. Older

Re: event ordering semantics

2012-11-06 Thread Brock Noland
If everything is working well, Flume will give you approximately in-order delivery. If transactions are being rolled back, that order will be considerably mixed. I don't see us supporting delivery order anytime soon. Additionally, for most Flume use cases a global serial number could be assigned at
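The serial-number idea mentioned above can be sketched roughly as follows. This is an illustration only, not Flume code: it assumes the number is attached where events are produced, and the helper names `stamp` and `reorder` are hypothetical.

```python
import itertools


def stamp(events, counter=None):
    """Attach an increasing serial number to each event where it is produced.

    Hypothetical helper: Flume itself does not provide this function.
    """
    counter = counter if counter is not None else itertools.count()
    return [(next(counter), body) for body in events]


def reorder(stamped_events):
    """Restore the original order downstream by sorting on the serial number."""
    return [body for _, body in sorted(stamped_events, key=lambda pair: pair[0])]


original = ["a", "b", "c", "d"]
stamped = stamp(original)

# Simulate a rolled-back and retried transaction delivering events out of order.
delivered = [stamped[2], stamped[3], stamped[0], stamped[1]]

restored = reorder(delivered)
```

The consumer only needs the serial number to re-sort; the transport in between is free to deliver in any order.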

Re: Error while writing to HDFS

2012-11-06 Thread Brock Noland
Regarding the EOFException, my guess is that some nodes are acting flaky. What version of Hadoop are you running? Brock On Mon, Nov 5, 2012 at 8:43 PM, Cameron Gandevia cgande...@gmail.com wrote: Hi I started noticing the following error on our flume nodes and was wondering if anyone had any

Re: ExecSource tail -F stops after a while

2012-11-06 Thread Parag Hukeri
Took a thread dump when it hit that point. Looks like the issue might be with some custom interceptor code that we have written. Looking into it. Thanks for pointing in the right direction. --- On Mon, 11/5/12, Brock Noland br...@cloudera.com wrote: From: Brock Noland br...@cloudera.com

Re: Error while writing to HDFS

2012-11-06 Thread Cameron Gandevia
Thanks for the reply. It looks like the cause was our DataNodes throwing the following exception: java.io.IOException: xceiverCount 2050 exceeds the limit of concurrent xcievers 2048. I raised this setting and now everything seems to run correctly. On Tue, Nov 6, 2012 at 5:00 AM, Brock Noland

Re: HTTP Source

2012-11-06 Thread Hari Shreedharan
No specific reason. I was familiar with Jetty, and we already had a dependency on Jetty for the metrics stuff and I think some Avro IPC stuff too, so I decided to just use the same. Thanks, Hari -- Hari Shreedharan On Monday, November 5, 2012 at 2:09 PM, Harish Mandala wrote: Just

Re: HTTP Source

2012-11-06 Thread Hari Shreedharan
Hi Nathaniel, When I wrote this (and while it was being reviewed), we concluded it was not a good idea to expose the response object to the custom handler which can be plugged in because the handler can flush the buffer of the response object (which also can be flushed just because it is

Re: Reply: HDFS sink leaves .tmp files

2012-11-06 Thread Kathleen Ting
Hi Shara, The .tmp file contains the actual data before it is renamed to the final file. If the .tmp file is still open, then Flume is still holding it open in order to write to it. If Flume somehow dies and is unable to clean up the .tmp files, then once the client lease expires, the NameNode

Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
Hi, I am very new to Flume and we are hoping to use it for our log aggregation into HDFS. I have a few questions below: FileChannel will double our disk IO, which will affect IO performance on certain performance-sensitive machines. Hence, I was hoping to write a custom Flume source which

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
You're still going to be writing out all events, no? So how would file channel do more IO than that? On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran rahu...@yahoo.com wrote: Hi, I am very new to Flume and we are hoping to use it for our log aggregation into HDFS. I have a few questions below:

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
But in your architecture you are going to write the contents of the memory channel out? Or did I miss something? The checkpoint will be updated each time we perform a successive insertion into the memory channel. On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote: We have a

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
We will update the checkpoint each time (we may tune this to be periodic) but the contents of the memory channel will be in the legacy logs which are currently being generated. Additionally, the sink for the memory channel will be an Avro source in another machine. Does that clear things up?
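The checkpoint-plus-legacy-log idea described above might look roughly like the sketch below. It is not the poster's code and not a Flume API: the function names, the checkpoint file format (a single byte offset), and the file names are all assumptions made for illustration.

```python
import os
import tempfile


def read_from_checkpoint(log_path, checkpoint_path):
    """Return (lines, new_offset) for data appended since the stored offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = int(f.read().strip() or 0)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partially written line is left for later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def commit_checkpoint(checkpoint_path, offset):
    """Persist the offset only after the events were handed off successfully."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, checkpoint_path)  # atomic rename, so a crash leaves the old offset


# Hypothetical paths for the demonstration.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
checkpoint_path = os.path.join(workdir, "app.checkpoint")

with open(log_path, "wb") as f:
    f.write(b"one\ntwo\n")
first, offset = read_from_checkpoint(log_path, checkpoint_path)
commit_checkpoint(checkpoint_path, offset)

with open(log_path, "ab") as f:
    f.write(b"three\n")
second, offset = read_from_checkpoint(log_path, checkpoint_path)
```

If the process dies before `commit_checkpoint` runs, the next read starts from the old offset, so events lost from memory are re-read from the log. The cost is possible duplicates rather than loss.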

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
This use case sounds like a perfect use of the Spooling Directory source, which will be in the upcoming 1.3 release. Brock On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran rahu...@yahoo.com wrote: We will update the checkpoint each time (we may tune this to be periodic) but the contents of the