Hello all,

I'm wondering if you could give me some guidance. One of the inputs I'm working with batches several entries into a single event. My real data is more complex, but this simplified version makes an easy example:
timestamp - 5,4,3,2,1
timestamp - 9,7,5,5,6

If I tail the file, this produces 2 events, even though the example contains the data for 10 events.

Here is, at a high level, what I want to accomplish:

(web server - agent 1)
  exec source: tail -f /<some file path>
  collector-client to agent 2

(collector - agent 2)
  collector-server
  custom interceptor (input 1 event, output n events)
  multiplex to HDFS and HBase

An interceptor looked like the most logical place to add this. Is there a better place for this functionality? Has anyone run into a similar case?

The Interceptor docs for intercept(List<Event> events) say: "Output list of events. The size of output list MUST NOT BE GREATER than the size of the input list (i.e. transformation and removal ONLY)." That tells me not to emit more events than I'm given. intercept(Event event) returns only a single event, so I can't use it either. Why is an interceptor restricted to returning at most one event per input event?

For now I'm implementing a custom source that generates multiple events from the batched lines on the web server. My preference was to do this transformation on the collector agent, before handing off to HDFS and HBase. I know another alternative would be to implement a custom RPC, but I'd rather not; I'd prefer to rely on what is currently available.

Thanks!
j
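For reference, the 1-to-n split described in the post can be sketched on its own, independent of whether it ends up in a custom source or somewhere else. This is a minimal, self-contained sketch under stated assumptions: `SimpleEvent` is a hypothetical stand-in for Flume's `Event` (only the `getHeaders()`/`getBody()` shape is borrowed), the `"<timestamp> - v1,v2,...,vn"` body format is taken from the toy example, and the `batchTimestamp` header name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchSplitter {

    // Hypothetical stand-in for org.apache.flume.Event: a header map plus a byte[] body.
    // Only the getHeaders()/getBody() shape is borrowed; real Flume code would use Event.
    public static final class SimpleEvent {
        private final Map<String, String> headers;
        private final byte[] body;

        public SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }

        public Map<String, String> getHeaders() { return headers; }
        public byte[] getBody() { return body; }
    }

    // Hypothetical header name that carries the shared prefix onto each split event.
    static final String BATCH_HEADER = "batchTimestamp";

    // Splits one batched event whose body looks like "<timestamp> - v1,v2,...,vn"
    // into n events: each output body is one value, and the shared timestamp is
    // copied into a header. The " - " and "," delimiters are assumed from the toy example.
    public static List<SimpleEvent> split(SimpleEvent batched) {
        String line = new String(batched.getBody(), StandardCharsets.UTF_8);
        int sep = line.indexOf(" - ");
        if (sep < 0) {
            // Not in the expected batched form: pass it through unchanged.
            return Collections.singletonList(batched);
        }
        String timestamp = line.substring(0, sep).trim();
        String[] values = line.substring(sep + 3).split(",");

        List<SimpleEvent> out = new ArrayList<>(values.length);
        for (String value : values) {
            // Copy the original headers so every output event has its own map.
            Map<String, String> headers = new HashMap<>(batched.getHeaders());
            headers.put(BATCH_HEADER, timestamp);
            out.add(new SimpleEvent(headers, value.trim().getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        // The two tailed lines from the example in the post, i.e. 2 incoming events.
        String[] tailed = {"timestamp - 5,4,3,2,1", "timestamp - 9,7,5,5,6"};
        int total = 0;
        for (String line : tailed) {
            SimpleEvent batched = new SimpleEvent(Collections.<String, String>emptyMap(),
                    line.getBytes(StandardCharsets.UTF_8));
            total += split(batched).size();
        }
        System.out.println(total + " events"); // prints "10 events"
    }
}
```

Because this split grows the event count, it fits a custom source (the route taken in the post) or another stage that is allowed to emit more events than it receives; dropped unchanged into intercept(List<Event>), it would violate the output-size contract quoted above.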
