We're using the SocketTeeWriter to watch web server logs, and we use that stream for near-real-time provisioning. The data rate is quite small, and some error is OK. We could have done it without Chukwa, but we like having a durable copy for later analysis.
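A consumer like this runs outside the collector. Below is a minimal sketch of its reading loop. It assumes that the tee's RAW output frames each chunk as a 4-byte big-endian length followed by the chunk bytes, so check that against the SocketTeeWriter source for your Chukwa version. TeeChunkReader is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for a tee client running in its own process.
 * ASSUMPTION: each chunk arrives as a 4-byte big-endian length
 * followed by that many bytes of raw log data.
 */
final class TeeChunkReader {
    private final DataInputStream in;

    TeeChunkReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    /** Returns the next chunk, or null if the stream ends before a complete length prefix. */
    byte[] next() throws IOException {
        int len;
        try {
            len = in.readInt(); // DataInputStream reads ints big-endian
        } catch (EOFException e) {
            return null;
        }
        if (len < 0) {
            throw new IOException("negative chunk length: " + len);
        }
        byte[] data = new byte[len];
        in.readFully(data); // throws EOFException if the chunk is truncated
        return data;
    }

    /** Splits one chunk into non-empty log lines (a chunk may hold several). */
    static List<String> lines(byte[] chunk) {
        List<String> out = new ArrayList<>();
        for (String s : new String(chunk, StandardCharsets.UTF_8).split("\n")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }
}
```

In practice, the InputStream would come from a java.net.Socket connected to the collector's tee port (9094 in this thread), after whatever handshake the tee expects. Each line from lines() would then go to the provisioning logic. This keeps that logic's CPU and memory use off the collector.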
--Ari

On Tue, Nov 3, 2009 at 7:17 PM, Jerome Boulon <[email protected]> wrote:
> Hi,
> I agree with Ari: the post-processing should be in another process or on
> another machine, since we don't want to take more time/CPU/memory on the
> collector side.
>
> Ari, could you give us some details on how you're using the SocketTeeWriter?
> Thanks,
> /Jerome.
>
> On 11/3/09 5:27 PM, "Thushara Wijeratna" <[email protected]> wrote:
>
>> Yeah, that makes sense. I don't have a strong argument, except that it
>> might be a tad easier to integrate alerting into the system.
>> Swatch is pretty good; however, for custom processing, a separate
>> process needs to be run for each pattern matched. If alerts are
>> rare, as is generally the case, that is not a big problem.
>> One reason I'm considering Chukwa instead of Swatch is that it
>> centralizes the input logs at the collector; Swatch, AFAIK, doesn't
>> perform any centralization of logs.
>>
>> Thanks,
>> Thushara
>>
>> On Tue, Nov 3, 2009 at 3:51 PM, Ariel Rabkin <[email protected]> wrote:
>>> What you describe is certainly doable. I'm not sure what the use case
>>> is, though.
>>>
>>> The core goal for Chukwa is to facilitate MapReduce processing of
>>> logs. The idea of the SocketTeeWriter is to get a "sneak peek" at
>>> data before it gets stored to HDFS. If collectors crash or get
>>> overloaded, data can get processed more than once by collectors, so
>>> there's a real cost to the real-time path.
>>>
>>> One of the main benefits of SocketTee is that the processing can
>>> happen in a separate process, or even on a separate machine.
>>> Integrating the pattern matching into the pipeline is certainly doable,
>>> but it's not clear to me that that's an architecture we want to
>>> encourage or commit to.
>>>
>>> If people want Swatch, they know where to find it. What's the argument
>>> for needing to emulate it, in real time, in Chukwa?
>>>
>>> --Ari
>>>
>>> On Tue, Nov 3, 2009 at 3:48 PM, Thushara Wijeratna <[email protected]> wrote:
>>>> Would it be useful to provide something similar to Swatch log
>>>> monitoring for Chukwa?
>>>> http://www.linuxjournal.com/article/4776
>>>>
>>>> Currently, we can connect to port 9094 (after running a
>>>> SocketTeeWriter) and handle each input line.
>>>> I'm wondering whether there would be value in adding more
>>>> infrastructure code to Chukwa that would:
>>>>
>>>> 1. do some regular-expression parsing and filter the lines that match
>>>>    the alert condition(s)
>>>> 2. perform some standard actions, such as sending email
>>>> 3. provide an interface for user-defined custom handling
>>>>
>>>> The basic core would be something like this:
>>>>
>>>> interface AlertCallback {
>>>>
>>>>   boolean handle(String alertExp, String line);
>>>>
>>>> }
>>>>
>>>> // abstract here because the writer methods are omitted from this sketch
>>>> abstract class AlertWriter extends PipelineableWriter {
>>>>   private String[] alertExps;
>>>>   private AlertCallback alertCB;
>>>>
>>>>   public AlertWriter(String[] alertExps, AlertCallback alertCB) {
>>>>     this.alertExps = alertExps;
>>>>     this.alertCB = alertCB;
>>>>   }
>>>> }
>>>>
>>>> It seems like most of the plumbing is already there, exposed in the
>>>> SocketTeeWriter class (for example, the Filter class).
>>>> If you all think it is a good idea, I can help with this.
>>>>
>>>> Thanks,
>>>> Thushara
>>>>
>>>
>>>
>>>
>>> --
>>> Ari Rabkin [email protected]
>>> UC Berkeley Computer Science Department
>>>
>
>

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
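Thushara proposes three pieces: regex filtering, standard actions, and a custom callback. These can live outside the collector, which is where Ari and Jerome suggest post-processing belongs. Below is a minimal sketch that assumes the AlertCallback signature from the proposal and uses java.util.regex syntax for the alert expressions. AlertMatcher is a hypothetical name, and it deliberately extends no Chukwa class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Callback shape taken from the proposal above. */
interface AlertCallback {
    /** Called once per (pattern, line) match; the boolean's meaning is left to implementers. */
    boolean handle(String alertExp, String line);
}

/**
 * Hypothetical standalone matcher: it extends no Chukwa class, so it can run
 * in a separate process that is fed lines read from the tee.
 */
final class AlertMatcher {
    private final List<Pattern> patterns = new ArrayList<>();
    private final AlertCallback callback;

    AlertMatcher(String[] alertExps, AlertCallback callback) {
        for (String exp : alertExps) {
            patterns.add(Pattern.compile(exp)); // compile each expression once
        }
        this.callback = callback;
    }

    /** Tests one line against every pattern, invoking the callback per match; returns the match count. */
    int process(String line) {
        int matched = 0;
        for (Pattern p : patterns) {
            if (p.matcher(line).find()) { // find(): a match anywhere in the line counts
                matched++;
                callback.handle(p.pattern(), line); // return value ignored in this sketch
            }
        }
        return matched;
    }
}
```

A "standard action" such as sending email would be one more AlertCallback implementation. Compiling the patterns once in the constructor avoids recompiling them for every log line.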
