We use Chukwa for near-real-time trending, which is conceptually similar to near-real-time anomaly detection.
We use Chukwa agents, collectors, and Demux to collect log data in 5-minute increments, which we then run MR jobs on, as Ari describes. It works well for us.

On Sun, May 29, 2011 at 10:02 AM, Ariel Rabkin <[email protected]> wrote:
> My impression is that web log analysis is the main use that people are
> putting Chukwa to.
> The idea is that you scoop up web logs, throw them into HDFS, and then
> run Pig jobs.
>
> --Ari
>
> On Sun, May 29, 2011 at 4:39 AM, Amos Shapira <[email protected]> wrote:
> > In case this interests anyone - I'm following Chukwa for such purposes too.
> > Not just Google Analytics-like but also hoping to use it for near real time
> > anomaly detection...
> >
> > On 29 May 2011 18:19, Nikola Veber <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> I have just discovered Chukwa, and after the initial feeling that it
> >> would be a great tool to process large quantities of web-logs and
> >> generate statistics like google analytics and co, I started searching
> >> the web for hints - but I couldn't find any clue regarding this.
> >>
> >> Has anyone tried using Chukwa for Web-Analytics, or do you know any
> >> a-priori limitations which speak against using it in this manner?
> >>
> >>
> >> Thanks,
> >> Nikola
> >
>
> --
> Ari Rabkin [email protected]
> UC Berkeley Computer Science Department
