Will just contributed an Interceptor to provide this out of the box: https://issues.apache.org/jira/browse/FLUME-1284

Regards,
Mike
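A minimal sketch of how that interceptor might be attached to the tail source on the originating hosts (the agent3 config quoted below), assuming it registers under the alias "host" and exposes the useIP option described in the Flume 1.x user guide:

    # stamp every event with the originating machine before it leaves Host A
    agent3.sources.tail.interceptors = host-int
    agent3.sources.tail.interceptors.host-int.type = host
    # record the hostname rather than the IP address (assumed option name)
    agent3.sources.tail.interceptors.host-int.useIP = false

Each event then carries a "host" header that the downstream collector can use to tell Host A's entries from Host B's.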
On Tuesday, June 19, 2012 at 2:54 PM, Mohammad Tariq wrote:

There is no problem on your side. Have a look at this:
http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201206.mbox/%3CCAGPLoJKLthyoecEYnJRscahe8q6i4kKH1ADsL3qoQCAQo=i...@mail.gmail.com%3E

Regards,
Mohammad Tariq

On Wed, Jun 20, 2012 at 1:42 AM, Bhaskar <bmar...@gmail.com> wrote:

Unfortunately, that part is not working as expected. It must be a mistake somewhere in my configuration. Here is my sink configuration:

agent4.sinks.svc_0_sink.type = FILE_ROLL
agent4.sinks.svc_0_sink.sink.directory = /var/logs/agent4/%{host}
agent4.sinks.svc_0_sink.rollInterval = 5400
agent4.sinks.svc_0_sink.channel = MemoryChannel-2

Any thoughts on how to define a host-specific directory/file?

Bhaskar
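As far as I can tell, the FILE_ROLL sink takes sink.directory literally and does not expand escape sequences; among the stock sinks it is the HDFS sink that substitutes headers such as %{host} into its path. A sketch, assuming an HDFS sink is an option and the host header is being set upstream (e.g. by the interceptor above):

    agent4.sinks.svc_0_sink.type = hdfs
    # %{host} is filled in per event from the "host" header
    agent4.sinks.svc_0_sink.hdfs.path = hdfs://<<namenode>>/flume/logs/%{host}
    agent4.sinks.svc_0_sink.channel = MemoryChannel-2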
On Tue, Jun 19, 2012 at 10:01 AM, Mohammad Tariq <donta...@gmail.com> wrote:

Hello Bhaskar,

That's great... So we can use "%{host}" as the escape sequence to prefix our filenames (am I getting you correctly?). And I am waiting anxiously for your guide, as I am still a newbie. :-)

Regards,
Mohammad Tariq

On Tue, Jun 19, 2012 at 7:16 PM, Bhaskar <bmar...@gmail.com> wrote:

Thank you guys for the responses. I was actually able to get around this problem by tinkering with my settings: I finally ended up with a capacity of 10000, commented out transactionCapacity (I originally had it set to 10), and it started working. Thanks for the insight. It took me a bit of time to figure out the inner workings of Avro and get it to send data in the correct format, so I am over that hump. :-) Here is my flow for the POC:

Host A agent --> source: tail exec --> Avro client sink --> jdbc channel
  (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
Host B agent --> source: tail exec --> Avro client sink --> jdbc channel
  (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
Host C agent --> avro-collector source --> memory channel --> file sink to a local directory

The issue I am running into is that I am unable to uniquely identify the source of the log in the sink: the log events from Host A and Host B are combined into the same log on disk and mixed up. Is there a way to attach a unique identifier at the source so that we can track the origin of each log event? I am hoping to see in my sink log:

Host A -- some log entry
Host B -- some log entry
etc.

Is this feasible, or is there an alternative mechanism to achieve it? I am putting together a newbie guide that might help answer some of these questions for others as I explore this architecture.

As always, thanks for your assistance,
Bhaskar

On Tue, Jun 19, 2012 at 2:59 AM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:

Hello Bhaskar,

Using Avro is generally the recommended method to handle multi-hop flows, so no concerns there.

Have you tried this setup using memory channels instead of jdbc? Last time I tested it, the JDBC channel had poor throughput, so you may be hitting a logjam somewhere. How much data is being written to your logfile? Try raising the capacity on your jdbc channel by a lot (10000?). With a capacity of 10, if the reading side (Host B) isn't polling frequently enough, there are going to be problems; this is probably why you get the "failed to persist event" error. As far as FLUME-1259 is concerned, that should only happen if bad data is being sent. You're not sending anything else to the same port, are you? Make sure that only the source and sink are set to that port and that nothing else is.

If the problem continues, please post a chunk of the logs leading up to the OOM error (the full trace for the cause should be enough).
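Concretely, the tuning Juhani suggests amounts to something like the following against the configs quoted further down (10000 is his illustrative figure, not a tuned value):

    # Host A: give the jdbc channel far more headroom than 10 events
    agent3.channels.jdbc-channel.maximum.capacity = 10000
    # Host B: likewise for the memory channel; leave transactionCapacity at its default
    agent4.channels.MemoryChannel-2.capacity = 10000

This matches what Bhaskar reports above: a capacity of 10000 with transactionCapacity commented out is what got the flow working.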
On 06/16/2012 12:01 AM, Bhaskar wrote:

Sorry to be a pest with this stream of questions. I think I am going two steps forward and four steps back. :-) After my first successful attempt, I tried running Flume with the following flow:

1. Host A
   -- source: tail of the web server log
   -- channel: jdbc
   -- sink: Avro, pointed at Host B

Configuration:

agent3.sources = tail
agent3.sinks = avro-forward-sink
agent3.channels = jdbc-channel

# Define the source
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -f /common/log/access.log
agent3.sources.tail.channels = jdbc-channel

# Define the flow
agent3.sinks.avro-forward-sink.channel = jdbc-channel

# Avro sink properties
agent3.sinks.avro-forward-sink.type = avro
agent3.sinks.avro-forward-sink.hostname = <<IP Address>>
agent3.sinks.avro-forward-sink.port = <<PORT>>

# Define the channel
agent3.channels.jdbc-channel.type = jdbc
agent3.channels.jdbc-channel.maximum.capacity = 10
agent3.channels.jdbc-channel.maximum.connections = 2

2. Host B
   -- source: Avro
   -- channel: memory
   -- sink: local file system

Configuration:

# List sources, sinks and channels in agent4
agent4.sources = avro-collection-source
agent4.sinks = svc_0_sink
agent4.channels = MemoryChannel-2

# Define the flow
agent4.sources.avro-collection-source.channels = MemoryChannel-2
agent4.sinks.svc_0_sink.channel = MemoryChannel-2

# Avro source properties
agent4.sources.avro-collection-source.type = avro
agent4.sources.avro-collection-source.bind = <<IP Address>>
agent4.sources.avro-collection-source.port = <<PORT>>

agent4.sinks.svc_0_sink.type = FILE_ROLL
agent4.sinks.svc_0_sink.sink.directory = /logs/agent4
agent4.sinks.svc_0_sink.rollInterval = 600

agent4.channels.MemoryChannel-2.type = memory
agent4.channels.MemoryChannel-2.capacity = 100
agent4.channels.MemoryChannel-2.transactionCapacity = 10

Basically, I am trying to tail a file on one host and stream it to the sink running on another host. During the trial run the configuration loads fine and I see the channels created. I first get an exception from the jdbc channel ("Failed to persist event"), and then a Java heap space OOM on Host B when Host A attempts to write:

2012-06-15 10:31:44,503 WARN ipc.NettyServer: Unexpected exception from downstream.
java.lang.OutOfMemoryError: Java heap space

This issue was already reported as https://issues.apache.org/jira/browse/FLUME-1259, but I am not sure whether there is a workaround. I have a couple of questions:

1. Am I force-fitting the wrong solution here by using Avro?
2. If so, what would be the right solution for streaming data from Host A to Host B (or through intermediaries)?

Thanks,
Bhaskar
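One knob worth checking for the heap OOM on Host B, independent of the channel tuning: the agent JVM's heap ceiling. Assuming the stock flume-ng launcher, which picks up JAVA_OPTS from conf/flume-env.sh, something along these lines:

    # conf/flume-env.sh on Host B -- illustrative sizing, tune to your event volume
    JAVA_OPTS="-Xms256m -Xmx1g"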
On Thu, Jun 14, 2012 at 4:31 PM, Mohammad Tariq <donta...@gmail.com> wrote:

Since you are thinking of using a multi-hop flow, I would suggest going for the JDBC channel: there is a higher chance of error than in a single-hop flow, and with the JDBC channel events are stored in persistent storage backed by a database. For detailed guidelines you can refer to the Flume 1.x User Guide at:

https://people.apache.org/~mpercy/flume/flume-1.2.0-incubating-SNAPSHOT/docs/FlumeUserGuide.html

Regards,
Mohammad Tariq

On Fri, Jun 15, 2012 at 12:46 AM, Bhaskar <bmar...@gmail.com> wrote:

Hi Mohammad,

Thanks for the pointer. Do you think a message queue (like RabbitMQ) would be a good choice of communication channel between each hop? I am struggling to get a handle on how to configure my sinks on the intermediary hops of a multi-hop flow. I'd appreciate any guidance/examples.

Thanks,
Bhaskar

On Thu, Jun 14, 2012 at 1:57 PM, Mohammad Tariq <donta...@gmail.com> wrote:

Hello Bhaskar,

That's great. The best approach to streaming logs depends on the type of source you want to watch. Looking at your use case, I would suggest going for "multi-hop" flows, where events travel through multiple agents before reaching the final destination.

Regards,
Mohammad Tariq

On Thu, Jun 14, 2012 at 10:48 PM, Bhaskar <bmar...@gmail.com> wrote:

I know what I was missing. :-) I had not connected the sink with the channel. My small POC works now and I am able to view the streamed logs. Thank you all for the guidance and patience in answering all my questions. So, what's the best approach to streaming logs from other hosts? Basically, my next task is to set up a collector (sort of) model: stream logs to an intermediary, then stream from the collector to a sink location. I'd appreciate any thoughts/guidance in this regard.

Bhaskar
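For anyone hitting the same symptom: judging by the working agent4 config earlier in the thread, the line missing from the configuration below is the sink-to-channel binding:

    agent1.sinks.svc_0_sink.channel = MemoryChannel-2

Without it the sink has no channel to drain, so the agent starts cleanly but never writes anything.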
On Thu, Jun 14, 2012 at 12:52 PM, Bhaskar <bmar...@gmail.com> wrote:

For testing purposes I tried the following configuration, without much luck. The process starts fine, but it just does not write anything to the sink. I guess I am missing something here. Can one of you gurus take a look and suggest what I am doing wrong?

Thanks,
Bhaskar

agent1.sources = tail
agent1.channels = MemoryChannel-2
agent1.sinks = svc_0_sink

agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -f /var/log/access.log
agent1.sources.tail.channels = MemoryChannel-2

agent1.sinks.svc_0_sink.type = FILE_ROLL
agent1.sinks.svc_0_sink.sink.directory = /flume_runtime/logs
agent1.sinks.svc_0_sink.rollInterval = 0

agent1.channels.MemoryChannel-2.type = memory

On Thu, Jun 14, 2012 at 4:26 AM, Guillaume Polaert <gpola...@cyres.fr> wrote:

Hi Bhaskar,

This is the flume.conf (http://pastebin.com/WULgUuaf) that I'm using. I have an Avro server on the hadoop-m host and one agent per node (the slave hosts). Each agent sends the output of an exec command to the Avro server:

Host1: exec -> memory -> avro (sink)
Host2: exec -> memory -> avro   >>>>>   MainHost: avro (source) -> memory -> rolling file (local FS)
...
Host3: exec -> memory -> avro

Use your own exec command to read the Apache logs.

Guillaume Polaert | Cyrès Conseil

From: Bhaskar [mailto:bmar...@gmail.com]
Sent: Wednesday, June 13, 2012 19:16
To: flume-user@incubator.apache.org
Subject: Newbee question about flume 1.2 set up

Good afternoon,

I am a newbie to Flume and have read through the limited documentation available. I would like to set up the following as a test:

1. Read Apache access logs (as the source)
2. Use a memory channel
3. Write to an NFS (or even local) file system

Can someone help me with the necessary configuration? I am having a difficult time gleaning that information from the available documentation. I am sure someone has run such a test before, and I would appreciate it if you could pass that information on. Secondly, I would also like to stream the logs to a remote server.
Is that a log4j configuration, or do I need to run an agent on each host to do so? Any configuration examples would be of great help.

Thanks,
Bhaskar
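On the closing log4j question: Flume NG ships a log4j appender (the flume-ng-log4jappender module) that can send events straight to an Avro source, so a per-host agent is not strictly required. A rough sketch, assuming that jar is on the application's classpath and an Avro source is listening on the collector:

    log4j.rootLogger = INFO, flume
    log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
    log4j.appender.flume.Hostname = <<collector host>>
    log4j.appender.flume.Port = <<port>>

The alternative shown throughout this thread -- a local agent tailing the log file and forwarding over Avro -- works as well, and keeps the application itself unaware of Flume.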