I wish a similar patch were applied to RollingFileSink as well.  Currently
RollingFileSink does not reconstruct the path based on placeholder values;
that patch was applied only to HDFSEventSink.  Do you think I should request
it through Jira?
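
What I have in mind is something like this (hypothetical; FILE_ROLL does not
expand escape sequences in the directory today):

agent4.sinks.svc_0_sink.type = FILE_ROLL
agent4.sinks.svc_0_sink.sink.directory = /var/logs/agent4/%{host}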

Bhaskar

On Tue, Jun 19, 2012 at 10:59 PM, Mike Percy <mpe...@cloudera.com> wrote:

> Will just contributed an Interceptor to provide this out of the box:
>
> https://issues.apache.org/jira/browse/FLUME-1284
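>
> For anyone who wants to try it, wiring an interceptor into a source looks
> roughly like this (a sketch; the exact type alias / builder class for the
> host interceptor is in the FLUME-1284 patch and the user guide):
>
> agent3.sources.tail.interceptors = hostint
> agent3.sources.tail.interceptors.hostint.type = org.apache.flume.interceptor.HostInterceptor$Builder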
>
> Regards
> Mike
>
>
> On Tuesday, June 19, 2012 at 2:54 PM, Mohammad Tariq wrote:
>
> > There is no problem from your side. Have a look at this -
> > http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201206.mbox/%3CCAGPLoJKLthyoecEYnJRscahe8q6i4kKH1ADsL3qoQCAQo=i...@mail.gmail.com%3E
> >
> > Regards,
> > Mohammad Tariq
> >
> >
> > > On Wed, Jun 20, 2012 at 1:42 AM, Bhaskar <bmar...@gmail.com> wrote:
> > > Unfortunately, that part is not working as expected. Must be my mistake
> > > somewhere in the configuration. Here is my sink configuration.
> > >
> > > agent4.sinks.svc_0_sink.type = FILE_ROLL
> > > agent4.sinks.svc_0_sink.sink.directory=/var/logs/agent4/%{host}
> > > agent4.sinks.svc_0_sink.rollInterval=5400
> > > agent4.sinks.svc_0_sink.channel = MemoryChannel-2
> > >
> > > Any thoughts on how to define a host-specific directory/file?
> > >
> > > Bhaskar
> > >
> > > On Tue, Jun 19, 2012 at 10:01 AM, Mohammad Tariq <donta...@gmail.com> wrote:
> > > >
> > > > Hello Bhaskar,
> > > >
> > > > That's great. So we can use "%{host}" as the escape sequence to prefix
> > > > our filenames (am I getting you correctly?). And I am waiting anxiously
> > > > for your guide, as I am still a newbie. :-)
> > > >
> > > > Regards,
> > > > Mohammad Tariq
> > > >
> > > >
> > > > On Tue, Jun 19, 2012 at 7:16 PM, Bhaskar <bmar...@gmail.com> wrote:
> > > > > Thank you guys for the responses. I was actually able to get around this
> > > > > problem by tinkering with my settings. I finally ended up with a capacity
> > > > > of 10000 and commented out transactionCapacity (I originally set it to 10),
> > > > > and it started working. Thanks for the insight. It took me a bit of time
> > > > > to figure out the inner workings of AVRO to get it to send data in the
> > > > > correct format, so I got over that hump. :-) Here is my flow for the POC.
> > > > >
> > > > > Host A agent --> Source tail exec --> AVRO Client Sink --> jdbc channel
> > > > > (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
> > > > > Host B agent --> Source tail exec --> AVRO Client Sink --> jdbc channel
> > > > > (flume-ng avro-client -H <<Host>> -p <<port>> -F <<file to read>> --conf ../conf/)
> > > > > Host C agent --> avro-collector source --> file sink to local directory -- Memory channel
> > > > >
> > > > > The issue I am running into is that I am unable to uniquely identify the
> > > > > source of the log in the sink (the log events from Host A and Host B are
> > > > > combined into the same log on disk and mixed up). Is there a way to provide
> > > > > a unique identifier from the source so that we can track the origin of
> > > > > each log event? I am hoping to see in my sink log:
> > > > >
> > > > > Host A -- some log entry
> > > > > Host B -- some log entry, etc.
> > > > >
> > > > > Is this feasible, or are there alternative mechanisms to achieve this? I
> > > > > am putting together a newbie guide that might help answer some of these
> > > > > questions for others as I explore this architecture.
> > > > >
> > > > > As always, thanks for your assistance,
> > > > > Bhaskar
> > > > >
> > > > >
> > > > > On Tue, Jun 19, 2012 at 2:59 AM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:
> > > > > >
> > > > > > Hello Bhaskar,
> > > > > >
> > > > > > Using Avro is generally the recommended method for handling multi-hop
> > > > > > flows, so no concerns there.
> > > > > >
> > > > > > Have you tried this setup using memory channels instead of jdbc? Last
> > > > > > time I tested it, the JDBC channel had poor throughput, so you may be
> > > > > > getting a logjam somewhere. How much data is being written to your
> > > > > > logfile? Try raising the capacity on your jdbc channel by a lot (10000?).
> > > > > > With a capacity of 10, if the reading side (host B) isn't polling
> > > > > > frequently enough, there are going to be problems. This is probably why
> > > > > > you get the "failed to persist event". As far as FLUME-1259 is concerned,
> > > > > > that should only be happening if bad data is being sent. You're not
> > > > > > sending anything else to the same port, are you? Make sure that only the
> > > > > > source and sink are set to that port and that nothing else is.
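> > > > > >
> > > > > > For example, something like this (a sketch against the config you posted):
> > > > > >
> > > > > > agent3.channels.jdbc-channel.maximum.capacity = 10000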
> > > > > >
> > > > > > If the problem continues, please post a chunk of the logs leading up to
> > > > > > the OOM error (the full trace for the cause should be enough).
> > > > > >
> > > > > > On 06/16/2012 12:01 AM, Bhaskar wrote:
> > > > > >
> > > > > > Sorry to be a pest with this stream of questions. I think I am going two
> > > > > > steps forward and four steps back. :-) After my first successful attempt,
> > > > > > I tried running flume with the following flow:
> > > > > >
> > > > > > 1. HostA
> > > > > > -- Source is tail of the web server log
> > > > > > -- channel is jdbc
> > > > > > -- sink is AVRO collector on Host B
> > > > > > Configuration:
> > > > > > agent3.sources = tail
> > > > > > agent3.sinks = avro-forward-sink
> > > > > > agent3.channels = jdbc-channel
> > > > > >
> > > > > > # Define source flow
> > > > > > agent3.sources.tail.type = exec
> > > > > > agent3.sources.tail.command = tail -f /common/log/access.log
> > > > > > agent3.sources.tail.channels = jdbc-channel
> > > > > >
> > > > > > # define the flow
> > > > > > agent3.sinks.avro-forward-sink.channel = jdbc-channel
> > > > > >
> > > > > > # avro sink properties
> > > > > > agent3.sinks.avro-forward-sink.type = avro
> > > > > > agent3.sinks.avro-forward-sink.hostname = <<IP Address>>
> > > > > > agent3.sinks.avro-forward-sink.port = <<PORT>>
> > > > > >
> > > > > > # Define channels
> > > > > > agent3.channels.jdbc-channel.type = jdbc
> > > > > > agent3.channels.jdbc-channel.maximum.capacity = 10
> > > > > > agent3.channels.jdbc-channel.maximum.connections = 2
> > > > > >
> > > > > >
> > > > > > 2. HostB
> > > > > > -- Source is AVRO collection
> > > > > > -- channel is memory
> > > > > > -- sink is local file system
> > > > > >
> > > > > > Configuration:
> > > > > > # list sources, sinks and channels in the agent4
> > > > > > agent4.sources = avro-collection-source
> > > > > > agent4.sinks = svc_0_sink
> > > > > > agent4.channels = MemoryChannel-2
> > > > > >
> > > > > > # define the flow
> > > > > > agent4.sources.avro-collection-source.channels = MemoryChannel-2
> > > > > > agent4.sinks.svc_0_sink.channel = MemoryChannel-2
> > > > > >
> > > > > > # avro sink properties
> > > > > > agent4.sources.avro-collection-source.type = avro
> > > > > > agent4.sources.avro-collection-source.bind = <<IP Address>>
> > > > > > agent4.sources.avro-collection-source.port = <<PORT>>
> > > > > >
> > > > > > agent4.sinks.svc_0_sink.type = FILE_ROLL
> > > > > > agent4.sinks.svc_0_sink.sink.directory=/logs/agent4
> > > > > > agent4.sinks.svc_0_sink.rollInterval=600
> > > > > >
> > > > > > agent4.channels.MemoryChannel-2.type = memory
> > > > > > agent4.channels.MemoryChannel-2.capacity = 100
> > > > > > agent4.channels.MemoryChannel-2.transactionCapacity = 10
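> > > > > >
> > > > > > (For completeness, an agent with this configuration would be started
> > > > > > along these lines; the conf file name here is a placeholder:
> > > > > > flume-ng agent --conf ../conf --conf-file agent4.conf --name agent4)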
> > > > > >
> > > > > >
> > > > > > Basically I am trying to tail a file on one host and stream it to
> > > > > > another host running the sink. During the trial run, the configuration
> > > > > > is loaded fine and I see the channels created fine. I see an exception
> > > > > > from the jdbc channel first ("Failed to persist event"), and I am getting
> > > > > > a Java heap space OOM exception on Host B when Host A attempts to write.
> > > > > >
> > > > > > 2012-06-15 10:31:44,503 WARN ipc.NettyServer: Unexpected exception from downstream.
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > >
> > > > > > This issue was already reported (https://issues.apache.org/jira/browse/FLUME-1259),
> > > > > > but I am not sure if there is a workaround for this problem. I have a
> > > > > > couple of questions:
> > > > > >
> > > > > > 1. Am I force-fitting a wrong solution here by using AVRO?
> > > > > > 2. If so, what would be the right solution for streaming data from Host A
> > > > > > to Host B (or through intermediaries)?
> > > > > >
> > > > > > Thanks,
> > > > > > Bhaskar
> > > > > >
> > > > > > On Thu, Jun 14, 2012 at 4:31 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> > > > > > >
> > > > > > > Since you are thinking of using a multi-hop flow, I would suggest going
> > > > > > > with the "JDBC Channel": there is a higher chance of error than in a
> > > > > > > single-hop flow, and in the JDBC Channel events are stored in persistent
> > > > > > > storage backed by a database. For detailed guidelines you can refer to
> > > > > > > the Flume 1.x User Guide at -
> > > > > > >
> > > > > > > https://people.apache.org/~mpercy/flume/flume-1.2.0-incubating-SNAPSHOT/docs/FlumeUserGuide.html
> > > > > > >
> > > > > > > Regards,
> > > > > > > Mohammad Tariq
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 15, 2012 at 12:46 AM, Bhaskar <bmar...@gmail.com> wrote:
> > > > > > > > Hi Mohammad,
> > > > > > > > Thanks for the pointer there. Do you think using a message queue (like
> > > > > > > > RabbitMQ) would be a good choice of communication channel between
> > > > > > > > hops? I am struggling to get a handle on how to configure my sink in
> > > > > > > > the intermediary hops of a multi-hop flow. I'd appreciate any
> > > > > > > > guidance/examples.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Bhaskar
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 14, 2012 at 1:57 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hello Bhaskar,
> > > > > > > > >
> > > > > > > > > That's great. The best approach to streaming logs depends on the
> > > > > > > > > type of source you want to watch. Looking at your use case, I would
> > > > > > > > > suggest going for "multi-hop" flows, where events travel through
> > > > > > > > > multiple agents before reaching the final destination.
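> > > > > > > > > In Flume NG terms that is, roughly (a sketch):
> > > > > > > > > host agent: exec source -> channel -> avro sink ===> collector agent: avro source -> channel -> file sink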
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Mohammad Tariq
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 14, 2012 at 10:48 PM, Bhaskar <bmar...@gmail.com> wrote:
> > > > > > > > > > I know what I was missing. :-) I missed connecting the sink with
> > > > > > > > > > the channel. My small POC works now, and I am able to view the
> > > > > > > > > > streamed logs. Thank you all for the guidance and patience in
> > > > > > > > > > answering all my questions. So, what's the best approach to stream
> > > > > > > > > > logs from other hosts? Basically, my next task would be to set up a
> > > > > > > > > > collector (sort of) model to stream logs to an intermediary and
> > > > > > > > > > then stream from the collector to a sink location. I'd appreciate
> > > > > > > > > > any thoughts/guidance in this regard.
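> > > > > > > > > >
> > > > > > > > > > (For reference, the missing line was the sink-to-channel binding in
> > > > > > > > > > my config below:
> > > > > > > > > > agent1.sinks.svc_0_sink.channel = MemoryChannel-2)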
> > > > > > > > > >
> > > > > > > > > > Bhaskar
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 14, 2012 at 12:52 PM, Bhaskar <bmar...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > For testing purposes, I tried the following configuration without
> > > > > > > > > > > much luck. I see that the process starts fine, but it just does
> > > > > > > > > > > not write anything to the sink. I guess I am missing something
> > > > > > > > > > > here. Can one of you gurus take a look and suggest what I am
> > > > > > > > > > > doing wrong?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Bhaskar
> > > > > > > > > > >
> > > > > > > > > > > agent1.sources = tail
> > > > > > > > > > > agent1.channels = MemoryChannel-2
> > > > > > > > > > > agent1.sinks = svc_0_sink
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > agent1.sources.tail.type = exec
> > > > > > > > > > > agent1.sources.tail.command = tail -f /var/log/access.log
> > > > > > > > > > > agent1.sources.tail.channels = MemoryChannel-2
> > > > > > > > > > >
> > > > > > > > > > > agent1.sinks.svc_0_sink.type = FILE_ROLL
> > > > > > > > > > > agent1.sinks.svc_0_sink.sink.directory = /flume_runtime/logs
> > > > > > > > > > > agent1.sinks.svc_0_sink.rollInterval=0
> > > > > > > > > > >
> > > > > > > > > > > agent1.channels.MemoryChannel-2.type = memory
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2012 at 4:26 AM, Guillaume Polaert <gpola...@cyres.fr> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bhaskar,
> > > > > > > > > > > >
> > > > > > > > > > > > This is the flume.conf (http://pastebin.com/WULgUuaf) that I'm
> > > > > > > > > > > > using. I have an avro server on the hadoop-m host and one agent
> > > > > > > > > > > > per node (the slave hosts). Each agent sends the output of an
> > > > > > > > > > > > exec command to the avro server.
> > > > > > > > > > > >
> > > > > > > > > > > > Host1 : exec -> memory -> avro (sink)
> > > > > > > > > > > > Host2 : exec -> memory -> avro   >>>>>   MainHost : avro (source) -> memory -> rolling file (local FS)
> > > > > > > > > > > > ...
> > > > > > > > > > > > Host3 : exec -> memory -> avro
> > > > > > > > > > > >
> > > > > > > > > > > > Use your own exec command to read the Apache log.
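> > > > > > > > > > > > For example (a sketch; the agent name and log path are placeholders):
> > > > > > > > > > > >
> > > > > > > > > > > > agentX.sources.apacheLog.type = exec
> > > > > > > > > > > > agentX.sources.apacheLog.command = tail -F /var/log/httpd/access_log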
> > > > > > > > > > > >
> > > > > > > > > > > > Guillaume Polaert | Cyrès Conseil
> > > > > > > > > > > >
> > > > > > > > > > > > From: Bhaskar [mailto:bmar...@gmail.com]
> > > > > > > > > > > > Sent: Wednesday, June 13, 2012 19:16
> > > > > > > > > > > > To: flume-user@incubator.apache.org
> > > > > > > > > > > > Subject: Newbie question about flume 1.2 set up
> > > > > > > > > > > >
> > > > > > > > > > > > Good Afternoon,
> > > > > > > > > > > > I am a newbie to flume and have read through the limited
> > > > > > > > > > > > documentation available. I would like to set up the following to
> > > > > > > > > > > > test:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Read apache access logs (as source)
> > > > > > > > > > > > 2. Use memory channel
> > > > > > > > > > > > 3. Write to an NFS (or even local) file system
> > > > > > > > > > > >
> > > > > > > > > > > > Can someone help me with the necessary configuration? I am having
> > > > > > > > > > > > a difficult time gleaning that information from the available
> > > > > > > > > > > > documentation. I am sure someone has done such a test before, and
> > > > > > > > > > > > I would appreciate it if you could pass on that information.
> > > > > > > > > > > > Secondly, I would also like to stream the logs to a remote
> > > > > > > > > > > > server. Is that a log4j configuration, or do I need to run an
> > > > > > > > > > > > agent on each host to do so? Any configuration examples would be
> > > > > > > > > > > > of great help.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Bhaskar