Will just contributed an Interceptor to provide this out of the box: https://issues.apache.org/jira/browse/FLUME-1284

Regards,
Mike
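A minimal sketch of how that interceptor might be attached to the tail source on the originating hosts (the agent3 config quoted below), assuming it registers under the alias "host" and exposes the useIP option described in the Flume 1.x user guide:

    # stamp every event with the originating machine before it leaves Host A
    agent3.sources.tail.interceptors = host-int
    agent3.sources.tail.interceptors.host-int.type = host
    # record the hostname rather than the IP address (assumed option name)
    agent3.sources.tail.interceptors.host-int.useIP = false

Each event then carries a "host" header that the downstream collector can use to tell Host A's entries from Host B's.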
On Tuesday, June 19, 2012 at 2:54 PM, Mohammad Tariq wrote:

There is no problem on your side. Have a look at this:
http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201206.mbox/%3CCAGPLoJKLthyoecEYnJRscahe8q6i4kKH1ADsL3qoQCAQo=i...@mail.gmail.com%3E

Regards,
Mohammad Tariq

On Wed, Jun 20, 2012 at 1:42 AM, Bhaskar <bmar...@gmail.com> wrote:

Unfortunately, that part is not working as expected. It must be a mistake somewhere in my configuration. Here is my sink configuration:

agent4.sinks.svc_0_sink.type = FILE_ROLL
agent4.sinks.svc_0_sink.sink.directory = /var/logs/agent4/%{host}
agent4.sinks.svc_0_sink.rollInterval = 5400
agent4.sinks.svc_0_sink.channel = MemoryChannel-2

Any thoughts on how to define a host-specific directory/file?

Bhaskar
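As far as I can tell, the FILE_ROLL sink takes sink.directory literally and does not expand escape sequences; among the stock sinks it is the HDFS sink that substitutes headers such as %{host} into its path. A sketch, assuming an HDFS sink is an option and the host header is being set upstream (e.g. by the interceptor above):

    agent4.sinks.svc_0_sink.type = hdfs
    # %{host} is filled in per event from the "host" header
    agent4.sinks.svc_0_sink.hdfs.path = hdfs://<<namenode>>/flume/logs/%{host}
    agent4.sinks.svc_0_sink.channel = MemoryChannel-2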
On Tue, Jun 19, 2012 at 10:01 AM, Mohammad Tariq <donta...@gmail.com> wrote:

Hello Bhaskar,

That's great... So we can use "%{host}" as the escape sequence to prefix our filenames (am I getting you correctly?). And I am waiting anxiously for your guide, as I am still a newbie. :-)

Regards,
Mohammad Tariq

On Tue, Jun 19, 2012 at 7:16 PM, Bhaskar <bmar...@gmail.com> wrote:

Thank you guys for the responses. I was actually able to get around this problem by tinkering with my settings: I finally ended up with a capacity of 10000, commented out transactionCapacity (I originally had it set to 10), and it started working. Thanks for the insight. It took me a bit of time to figure out the inner workings of Avro and get it to send data in the correct format, so I am over that hump. :-) Here is my flow for the POC:

Host A agent --> source: tail exec --> Avro client sink --> jdbc channel
  (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
Host B agent --> source: tail exec --> Avro client sink --> jdbc channel
  (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
Host C agent --> avro-collector source --> memory channel --> file sink to a local directory

The issue I am running into is that I am unable to uniquely identify the source of the log in the sink: the log events from Host A and Host B are combined into the same log on disk and mixed up. Is there a way to attach a unique identifier at the source so that we can track the origin of each log event? I am hoping to see in my sink log:

Host A -- some log entry
Host B -- some log entry
etc.

Is this feasible, or is there an alternative mechanism to achieve it? I am putting together a newbie guide that might help answer some of these questions for others as I explore this architecture.

As always, thanks for your assistance,
Bhaskar

On Tue, Jun 19, 2012 at 2:59 AM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:

Hello Bhaskar,

Using Avro is generally the recommended method to handle multi-hop flows, so no concerns there.

Have you tried this setup using memory channels instead of jdbc? Last time I tested it, the JDBC channel had poor throughput, so you may be hitting a logjam somewhere. How much data is being written to your logfile? Try raising the capacity on your jdbc channel by a lot (10000?). With a capacity of 10, if the reading side (Host B) isn't polling frequently enough, there are going to be problems; this is probably why you get the "failed to persist event" error. As far as FLUME-1259 is concerned, that should only happen if bad data is being sent. You're not sending anything else to the same port, are you? Make sure that only the source and sink are set to that port and that nothing else is.

If the problem continues, please post a chunk of the logs leading up to the OOM error (the full trace for the cause should be enough).
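Concretely, the tuning Juhani suggests amounts to something like the following against the configs quoted further down (10000 is his illustrative figure, not a tuned value):

    # Host A: give the jdbc channel far more headroom than 10 events
    agent3.channels.jdbc-channel.maximum.capacity = 10000
    # Host B: likewise for the memory channel; leave transactionCapacity at its default
    agent4.channels.MemoryChannel-2.capacity = 10000

This matches what Bhaskar reports above: a capacity of 10000 with transactionCapacity commented out is what got the flow working.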
On 06/16/2012 12:01 AM, Bhaskar wrote:

Sorry to be a pest with this stream of questions. I think I am going two steps forward and four steps back. :-) After my first successful attempt, I tried running Flume with the following flow:

1. Host A
   -- source: tail of the web server log
   -- channel: jdbc
   -- sink: Avro, pointed at Host B

Configuration:

agent3.sources = tail
agent3.sinks = avro-forward-sink
agent3.channels = jdbc-channel

# Define the source
agent3.sources.tail.type = exec
agent3.sources.tail.command = tail -f /common/log/access.log
agent3.sources.tail.channels = jdbc-channel

# Define the flow
agent3.sinks.avro-forward-sink.channel = jdbc-channel

# Avro sink properties
agent3.sinks.avro-forward-sink.type = avro
agent3.sinks.avro-forward-sink.hostname = <<IP Address>>
agent3.sinks.avro-forward-sink.port = <<PORT>>

# Define the channel
agent3.channels.jdbc-channel.type = jdbc
agent3.channels.jdbc-channel.maximum.capacity = 10
agent3.channels.jdbc-channel.maximum.connections = 2

2. Host B
   -- source: Avro
   -- channel: memory
   -- sink: local file system

Configuration:

# List sources, sinks and channels in agent4
agent4.sources = avro-collection-source
agent4.sinks = svc_0_sink
agent4.channels = MemoryChannel-2

# Define the flow
agent4.sources.avro-collection-source.channels = MemoryChannel-2
agent4.sinks.svc_0_sink.channel = MemoryChannel-2

# Avro source properties
agent4.sources.avro-collection-source.type = avro
agent4.sources.avro-collection-source.bind = <<IP Address>>
agent4.sources.avro-collection-source.port = <<PORT>>

agent4.sinks.svc_0_sink.type = FILE_ROLL
agent4.sinks.svc_0_sink.sink.directory = /logs/agent4
agent4.sinks.svc_0_sink.rollInterval = 600

agent4.channels.MemoryChannel-2.type = memory
agent4.channels.MemoryChannel-2.capacity = 100
agent4.channels.MemoryChannel-2.transactionCapacity = 10

Basically, I am trying to tail a file on one host and stream it to the sink running on another host. During the trial run the configuration loads fine and I see the channels created. I first get an exception from the jdbc channel ("Failed to persist event"), and then a Java heap space OOM on Host B when Host A attempts to write:

2012-06-15 10:31:44,503 WARN ipc.NettyServer: Unexpected exception from downstream.
java.lang.OutOfMemoryError: Java heap space

This issue was already reported as https://issues.apache.org/jira/browse/FLUME-1259, but I am not sure whether there is a workaround. I have a couple of questions:

1. Am I force-fitting the wrong solution here by using Avro?
2. If so, what would be the right solution for streaming data from Host A to Host B (or through intermediaries)?

Thanks,
Bhaskar
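One knob worth checking for the heap OOM on Host B, independent of the channel tuning: the agent JVM's heap ceiling. Assuming the stock flume-ng launcher, which picks up JAVA_OPTS from conf/flume-env.sh, something along these lines:

    # conf/flume-env.sh on Host B -- illustrative sizing, tune to your event volume
    JAVA_OPTS="-Xms256m -Xmx1g"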
On Thu, Jun 14, 2012 at 4:31 PM, Mohammad Tariq <donta...@gmail.com> wrote:

Since you are thinking of using a multi-hop flow, I would suggest going for the JDBC channel: there is a higher chance of error than in a single-hop flow, and with the JDBC channel events are stored in persistent storage backed by a database. For detailed guidelines you can refer to the Flume 1.x User Guide at:

https://people.apache.org/~mpercy/flume/flume-1.2.0-incubating-SNAPSHOT/docs/FlumeUserGuide.html

Regards,
Mohammad Tariq

On Fri, Jun 15, 2012 at 12:46 AM, Bhaskar <bmar...@gmail.com> wrote:

Hi Mohammad,

Thanks for the pointer. Do you think a message queue (like RabbitMQ) would be a good choice of communication channel between each hop? I am struggling to get a handle on how to configure my sinks on the intermediary hops of a multi-hop flow. I'd appreciate any guidance/examples.

Thanks,
Bhaskar

On Thu, Jun 14, 2012 at 1:57 PM, Mohammad Tariq <donta...@gmail.com> wrote:

Hello Bhaskar,

That's great. The best approach to streaming logs depends on the type of source you want to watch. Looking at your use case, I would suggest going for "multi-hop" flows, where events travel through multiple agents before reaching the final destination.

Regards,
Mohammad Tariq

On Thu, Jun 14, 2012 at 10:48 PM, Bhaskar <bmar...@gmail.com> wrote:

I know what I was missing. :-) I had not connected the sink with the channel. My small POC works now and I am able to view the streamed logs. Thank you all for the guidance and patience in answering all my questions. So, what's the best approach to streaming logs from other hosts? Basically, my next task is to set up a collector (sort of) model: stream logs to an intermediary, then stream from the collector to a sink location. I'd appreciate any thoughts/guidance in this regard.

Bhaskar
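For anyone hitting the same symptom: judging by the working agent4 config earlier in the thread, the line missing from the configuration below is the sink-to-channel binding:

    agent1.sinks.svc_0_sink.channel = MemoryChannel-2

Without it the sink has no channel to drain, so the agent starts cleanly but never writes anything.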
On Thu, Jun 14, 2012 at 12:52 PM, Bhaskar <bmar...@gmail.com> wrote:

For testing purposes I tried the following configuration, without much luck. The process starts fine, but it just does not write anything to the sink. I guess I am missing something here. Can one of you gurus take a look and suggest what I am doing wrong?

Thanks,
Bhaskar

agent1.sources = tail
agent1.channels = MemoryChannel-2
agent1.sinks = svc_0_sink

agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -f /var/log/access.log
agent1.sources.tail.channels = MemoryChannel-2

agent1.sinks.svc_0_sink.type = FILE_ROLL
agent1.sinks.svc_0_sink.sink.directory = /flume_runtime/logs
agent1.sinks.svc_0_sink.rollInterval = 0

agent1.channels.MemoryChannel-2.type = memory

On Thu, Jun 14, 2012 at 4:26 AM, Guillaume Polaert <gpola...@cyres.fr> wrote:

Hi Bhaskar,

This is the flume.conf (http://pastebin.com/WULgUuaf) that I'm using. I have an Avro server on the hadoop-m host and one agent per node (the slave hosts). Each agent sends the output of an exec command to the Avro server:

Host1: exec -> memory -> avro (sink)
Host2: exec -> memory -> avro   >>>>>   MainHost: avro (source) -> memory -> rolling file (local FS)
...
Host3: exec -> memory -> avro

Use your own exec command to read the Apache logs.

Guillaume Polaert | Cyrès Conseil

From: Bhaskar [mailto:bmar...@gmail.com]
Sent: Wednesday, June 13, 2012 19:16
To: flume-user@incubator.apache.org
Subject: Newbee question about flume 1.2 set up

Good afternoon,

I am a newbie to Flume and have read through the limited documentation available. I would like to set up the following as a test:

1. Read Apache access logs (as the source)
2. Use a memory channel
3. Write to an NFS (or even local) file system

Can someone help me with the necessary configuration? I am having a difficult time gleaning that information from the available documentation. I am sure someone has run such a test before, and I would appreciate it if you could pass that information on. Secondly, I would also like to stream the logs to a remote server.
Is that a log4j configuration, or do I need to run an agent on each host to do so? Any configuration examples would be of great help.

Thanks,
Bhaskar
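On the closing log4j question: Flume NG ships a log4j appender (the flume-ng-log4jappender module) that can send events straight to an Avro source, so a per-host agent is not strictly required. A rough sketch, assuming that jar is on the application's classpath and an Avro source is listening on the collector:

    log4j.rootLogger = INFO, flume
    log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
    log4j.appender.flume.Hostname = <<collector host>>
    log4j.appender.flume.Port = <<port>>

The alternative shown throughout this thread -- a local agent tailing the log file and forwarding over Avro -- works as well, and keeps the application itself unaware of Flume.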