Hello All,

I'm wondering if you could provide some guidance. One of the inputs
I'm working with batches several entries into a single event. My real
data is more complex, but here is a simplified example:

timestamp - 5,4,3,2,1
timestamp - 9,7,5,5,6

If I tail the file, this produces 2 events, even though the example
contains the data for 10 events.
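
To make the splitting concrete, here is a rough sketch in plain Java
(no Flume classes) of the 1-to-n expansion I have in mind. The class
name BatchSplitter, the method split, and the "timestamp - v1,v2,..."
layout are only assumptions based on the simplified example above,
not my real format:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {

    // Splits one batched line of the assumed form "timestamp - v1,v2,..."
    // into one record per value, each carrying the original timestamp.
    public static List<String> split(String line) {
        List<String> records = new ArrayList<>();
        int sep = line.indexOf(" - ");
        if (sep < 0) {
            // Not in the assumed format: pass the line through unchanged.
            records.add(line);
            return records;
        }
        String timestamp = line.substring(0, sep);
        for (String value : line.substring(sep + 3).split(",")) {
            records.add(timestamp + " - " + value.trim());
        }
        return records;
    }

    public static void main(String[] args) {
        // Each of the two sample lines expands to five records.
        for (String record : split("timestamp - 5,4,3,2,1")) {
            System.out.println(record);
        }
        for (String record : split("timestamp - 9,7,5,5,6")) {
            System.out.println(record);
        }
    }
}
```

Whichever component ends up doing the split would wrap each returned
record in its own event.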

Here is, at a high level, what I want to accomplish:
(web server - agent 1)
  exec source tail -f /<some file path>
  collector-client to (agent 2)

(collector - agent 2)
  collector-server
  Custom Interceptor (input 1 event, output n events)
  Multiplex to
    hdfs
    hbase

An interceptor looked like the most logical spot for me to add this.
Is there a better place to add this functionality? Has anyone run into
a similar case?

The docs for Interceptor.intercept(List<Event> events) say: "Output
list of events. The size of output list MUST NOT BE GREATER than the
size of the input list (i.e. transformation and removal ONLY)." That
tells me not to emit more events than I was given.
intercept(Event event) returns only a single event, so I can't use it
either. Why is an interceptor not allowed to return more events than
it receives?

For now I'm implementing a custom source that will handle generating
multiple events from the events coming in on the web server. My
preference was to do this transformation on the collector agent before
I hand off to hdfs and hbase. I know another alternative would be to
implement custom RPC, but I would rather rely on what is currently
available.

Thanks!
j
