Hari,
I was able to move forward on this with a config inspired by the one you
suggested below.
I also patched avroCLIClient to accept headers, which makes it easy to test
Flume out of the box.
I'd appreciate your feedback and review comments on this:
https://issues.apache.org/jira/browse/FLUME-1096
Open questions
--------------
1. I want to create files in HDFS under
/flume-data/<streamName>/YYYY/MM/DD/HH/MN/<filename>, rolling every minute.
2. With the following configuration, if we run multiple collectors with
the same configuration, they might all try to write to the same file (within
the same minute). What is the recommended way to support this use case?
Sample configuration for multiplexing
--------------------------------------------------
host2.sources = src1
host2.sinks = sink1 sink2 sink3
host2.channels = ch3 ch4 ch5
host2.channels.ch3.type = memory
host2.channels.ch4.type = memory
host2.channels.ch5.type = memory
host2.sources.src1.type = avro
host2.sources.src1.bind = 0.0.0.0
host2.sources.src1.port = 41415
host2.sources.src1.channels = ch3 ch4 ch5
host2.sources.src1.selector.type = multiplexing
host2.sources.src1.selector.header = streamName
host2.sources.src1.selector.mapping.rr = ch3
host2.sources.src1.selector.mapping.billing = ch4
host2.sources.src1.selector.default = ch5
host2.sinks.sink1.type = hdfs
host2.sinks.sink1.hdfs.path = hdfs://localhost
host2.sinks.sink1.hdfs.filePrefix = flume-data/%{streamName}/%D
host2.sinks.sink1.channel = ch3
host2.sinks.sink2.type = hdfs
host2.sinks.sink2.hdfs.path = hdfs://localhost
host2.sinks.sink2.hdfs.filePrefix = flume-data/%{streamName}/%D
host2.sinks.sink2.channel = ch4
host2.sinks.sink3.type = null
host2.sinks.sink3.channel = ch5
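
For the two open items above, something along these lines might work. This is
an untested sketch, not a verified setup. It assumes every event carries a
"timestamp" header (the time escapes in hdfs.path depend on it), and
"collector1" is a made-up prefix that would have to differ on each collector
so that no two of them produce the same file name.

```properties
# Sketch only: sink1 reworked for per-minute directories (sink2 would be analogous).
host2.sinks.sink1.type = hdfs
host2.sinks.sink1.channel = ch3
# Directory layout: /flume-data/<streamName>/YYYY/MM/DD/HH/MN
# %{streamName} is taken from the event header; %Y %m %d %H %M come from the
# event's "timestamp" header (assumed to be set by the sender).
host2.sinks.sink1.hdfs.path = hdfs://localhost/flume-data/%{streamName}/%Y/%m/%d/%H/%M
# Hypothetical per-collector prefix; use a different value on every collector.
host2.sinks.sink1.hdfs.filePrefix = collector1
# Roll on time only: every 60 seconds; 0 disables size- and count-based rolling.
host2.sinks.sink1.hdfs.rollInterval = 60
host2.sinks.sink1.hdfs.rollSize = 0
host2.sinks.sink1.hdfs.rollCount = 0
```

Note that rollInterval counts from the moment a file is opened, so rolls would
not necessarily line up with minute boundaries; the per-minute directories come
from the event timestamps instead.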
- Inder
On Tue, Apr 10, 2012 at 12:45 PM, Hari Shreedharan <[email protected]> wrote:
> Hi Inder,
>
> Do you mean headers in an event, or file headers? You can use the
> Multiplexing Channel Selector to select a channel wired to your HDFS Sink.
> So if you have a configuration like this:
>
> host2.sources = src1
> host2.sinks = sink1 sink2
> host2.channels = ch1 ch2
> host2.sources.src1.type = seq
> host2.sources.src1.channels = ch1 ch2
> host2.sources.src1.selector.type = multiplexing
> host2.sources.src1.selector.header = streamName
> host2.sources.src1.selector.mapping.tohdfs = ch1
> host2.sources.src1.selector.mapping.tonull = ch2
>
> host2.sources.src1.selector.default = ch1
>
> host2.sinks.sink1.type = hdfs
> host2.sinks.sink1.channel = ch1
> ……<hdfs sink configuration>
> host2.sinks.sink2.type = null
> host2.sinks.sink2.channel = ch2
>
> ---
>
> Note that you need to make sure the hdfs sink is configured correctly. In
> this case, when an event comes in with the header named "streamName", the
> following happens:
> if the value of "streamName" is "tohdfs", the event goes to channel ch1,
> which is in turn wired to sink1; if the value of the same header is
> "tonull", the event is routed to ch2 and on to sink2.
>
> Similarly, you can have multiple HDFS sinks configured to write to
> different sets of files. The only thing you need to do is make sure the
> required mappings are present in the channel selector configuration.
>
> If you need to do this on the sink side, using data within the event
> rather than event headers, there is no built-in solution; you
> will need to add support for it.
>
> One solution would be to pre-process the events, and create event headers
> based on the routing you want to do, insert the headers into the events and
> use the above method.
> What source are you using? If you are using the AvroSource to write data
> into Flume, you can insert the headers without a problem.
>
> Hope this helps.
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Monday, April 9, 2012 at 11:21 PM, Inder Pall wrote:
>
> > On Tue, Apr 10, 2012 at 11:33 AM, Inder Pall <[email protected]> wrote:
> >
> > > Sending to the Flume devs, as there was no response from the Flume user community.
> > >
> > > - inder
> > >
> > > On 04/09/2012 01:23 PM, Inder Pall wrote:
> > > > >
> > > > > Hello Flume User Community,
> > > > >
> > > > > I want messages to go to files in HDFS based on a header like
> > > > > "streamName". Is it supported?
> > > > > Has anyone tried this before? If so, how?
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > - Inder
> > > > > Tech Platforms @Inmobi
> > > > > Linkedin - http://goo.gl/eR4Ub
> > > > >
> > > > >
> > > > > --
> > > > > Marcos Luis Ortíz Valmaseda (@marcosluis2186)
> > > > > Data Engineer at UCI
> > > > > http://marcosluis2186.posterous.com
> > > > >
> > > > >
> > > > > <http://www.uci.cu/>
--
Thanks,
- Inder
Tech Platforms @Inmobi
Linkedin - http://goo.gl/eR4Ub