Hi Jeremy,

On Mon, Aug 13, 2012 at 9:55 AM, Jeremy Custenborder <[email protected]> wrote:
> > I believe you are just trying to work around a limitation of the exec
> > source, since it appears you're describing a serialization issue.
> > Alternatively, one could use an HBase serializer to generate multiple
> > increment / decrement operations, and just log the original line in HDFS
> > (or use an EventSerializer).
>
> This is what I'm working towards. I want a 1 for 1 entry in hdfs but
> increment counters in hbase.
The HBase serializer can generate multiple operations per Event, and the
HDFS serializer could generate whatever output Hive expects as well.

> Given this I was just planning on emitting an event with the body I
> was going to use in hive early in the pipeline. Send the same data to
> hdfs and hbase. Then use a serializer on the hbase side to increment
> the counters. This would allow me to add data to hdfs in the format
> I'm planning on consuming it with, without managing two serializers. My
> plan for the hbase serializer was literally to generate a key and
> increment per record based on the input. So only a couple lines of code.

Yeah, if you are doing much parsing in your serializers it's going to be a
bit more complex.

> > I pondered this a bit over the last day or so and I'm kind of lukewarm
> > on adding precondition checks at this time. The reason I didn't do it
> > initially is that while I wanted a particular contract for that
> > component, in order to make Interceptors viable to maintain and
> > understand with the current design of the Flume core, I wasn't sure if
> > it would be sufficient for all future use cases. So if someone wants to
> > do something that breaks that contract, then they are "on their own",
> > doing stuff that may break in future implementations. If they're
> > willing to accept that risk then they have the freedom to maybe do
> > something novel and awesome, which might prompt us to add a different
> > kind of extension mechanism in the future to support whatever that use
> > case is.
>
> I think there should be an approved method for this case. A different
> extension that could perform processing like this could be helpful. To
> me, when I looked at an interceptor, I thought of using it as a
> replacement for a decorator in the old version of Flume. We have a lot
> of code that will take a log entry and replace the body with a
> protocol buffer representation. I prefer to run this code on an
> upstream tier from the web server.
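For what it's worth, the "generate key, increment per record" idea really can be a couple of lines. Below is a minimal, self-contained sketch of just the parsing step, with no Flume or HBase dependency: the `Inc` triple stands in for an HBase Increment, the tab-separated input format and row-key scheme are made up for illustration, and in a real `HbaseEventSerializer` each triple would become an Increment op while the raw line goes untouched to the HDFS sink.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterIncrements {

    // A (rowKey, column, delta) triple standing in for an HBase Increment.
    public static final class Inc {
        public final String rowKey, column;
        public final long delta;
        Inc(String rowKey, String column, long delta) {
            this.rowKey = rowKey; this.column = column; this.delta = delta;
        }
    }

    // Assumed input format (illustrative only): "date<TAB>page<TAB>status",
    // e.g. "2012-08-13\t/index.html\t200".
    public static List<Inc> toIncrements(String logLine) {
        String[] f = logLine.split("\t");
        String day = f[0], page = f[1], status = f[2];
        List<Inc> incs = new ArrayList<>();
        // One hit counter per page per day, one per status code per day.
        incs.add(new Inc(day + ":" + page, "hits", 1L));
        incs.add(new Inc(day + ":status", status, 1L));
        return incs;
    }

    public static void main(String[] args) {
        for (Inc i : toIncrements("2012-08-13\t/index.html\t200")) {
            System.out.println(i.rowKey + " " + i.column + " +" + i.delta);
        }
    }
}
```

The point is that the serializer only decides *which* counters to bump; the sink handles the actual HBase round trip, so the per-record logic stays tiny.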
> Interceptors would work fine for the one in one out case.

Have you considered using an Interceptor or a custom source to generate a
single event that has a series of timestamps within it? You could use
protobufs for serialization of that data structure. Since you have multiple
timestamps / timings on the same log line, I wonder if it isn't a single
"event" with multiple facets, and this isn't just a semantics thing.

Regards,
Mike
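To make the "one event, many facets" suggestion concrete, here is a hedged sketch of collapsing several named timings from one log line into a single event body. The ad-hoc key=value encoding below (with a \u0001 field separator) is purely a stand-in so the sketch stays dependency-free; in practice a protobuf message would define the fields, and the field names and timestamp values here are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimingsEvent {

    // Encode named timestamps (millis) into one event body.
    public static byte[] encode(Map<String, Long> timings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            if (sb.length() > 0) sb.append('\u0001'); // field separator
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Decode the body back into the same map, preserving field order.
    public static Map<String, Long> decode(byte[] body) {
        Map<String, Long> out = new LinkedHashMap<>();
        String s = new String(body, StandardCharsets.UTF_8);
        for (String field : s.split("\u0001")) {
            String[] kv = field.split("=", 2);
            out.put(kv[0], Long.parseLong(kv[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("request_start", 1344873300000L);
        t.put("backend_done", 1344873300125L);
        t.put("response_sent", 1344873300140L);
        System.out.println(decode(encode(t)));
    }
}
```

Downstream, one serializer can then fan the single event out into whatever per-timestamp counters or records each sink needs, which avoids splitting one logical occurrence into several Flume events upstream.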
