Re: Mahout Recommendation for MySQL Data (Network Trending)

Tim Bass Wed, 01 Apr 2009 13:09:12 -0700

This is exciting.  To clarify, experimentation is perfectly OK.  We have a
production web server for which we collect and store time-series data
(nearly 350 metrics) in MySQL on a separate server, so we are not looking
for production code, but for an experimental, collaborative, open effort.


I like the idea of clustering mixture models, and I think that with a bit
of effort it would not be too difficult to create initial first-order
behavioral models.  The good news is that we collect the data with Zabbix,
which is open source and stores what it collects in MySQL.  Zabbix also has
good time-series charting features, so it is easy to visualize the data,
spot anomalies, etc.
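The anomaly-detection idea Ted describes in the quoted reply (fit a mixture to training data, then flag new events that the mixture describes poorly) can be illustrated with a small sketch. This is a hypothetical, standard-library Python illustration, not Mahout's Dirichlet clustering code: it assumes one metric at a time, a fixed number k of Gaussian components fitted with plain EM (whereas the Mahout algorithm is non-parametric), and a threshold at the 1% quantile of the training log-likelihoods. The names `fit_gmm` and `find_anomalies`, the threshold, and the synthetic "idle vs. busy" data are all illustrative assumptions; real input would come from the Zabbix data in MySQL.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (weight, mean, variance) tuples. Hypothetical helper with a
    fixed k; it is not Mahout's non-parametric Dirichlet clustering.
    """
    n = len(data)
    ordered = sorted(data)
    # Start the means at evenly spaced quantiles so the components begin apart.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the weight, mean and variance of every component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor stops a component collapsing
    return list(zip(weights, means, variances))


def log_likelihood(x, model):
    """Log of the mixture density at x."""
    density = sum(w * normal_pdf(x, m, v) for w, m, v in model)
    return math.log(max(density, 1e-300))


def find_anomalies(model, train, new, quantile=0.01):
    """Flag new values scoring below a low quantile of the training log-likelihoods."""
    scores = sorted(log_likelihood(x, model) for x in train)
    threshold = scores[int(quantile * (len(scores) - 1))]
    return [x for x in new if log_likelihood(x, model) < threshold]


# Synthetic stand-in for one metric: two normal regimes (e.g. idle and busy).
rng = random.Random(42)
train = [rng.gauss(10, 1) for _ in range(200)] + [rng.gauss(30, 2) for _ in range(200)]
model = fit_gmm(train, k=2)
flagged = find_anomalies(model, train, [9.5, 31.0, 20.0, 80.0])
print(flagged)
```

With this synthetic data the sketch should flag 20.0 (between the two regimes) and 80.0, while 9.5 and 31.0 are well described by one of the two "normal" components.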

Thanks so much for the interest.  We run the MySQL database with all the
data on Ubuntu Linux, so what might be the next steps?

Yours faithfully, Tim



On Wed, Apr 1, 2009 at 11:29 PM, Jeff Eastman
<j...@windwardsolutions.com> wrote:
> Actually, the Dirichlet implementation in 0.1 *is* a parallel Hadoop version,
> but it has a packaging issue that needs to be resolved before it can run
> there. See my earlier thread "Dirichlet Example Class Not Found Exception".
> Your application is exactly the kind of problem I was investigating when Ted
> suggested the Dirichlet approach. I would be very interested in seeing
> whether we could apply it to your problem.
>
> Jeff
>
> Ted Dunning wrote:
>>
>> The short answer is no.
>>
>> The slightly longer answer is a highly qualified yes.
>>
>> There is an algorithm, added to Mahout very recently, that does
>> non-parametric clustering of mixture models.  If you can define a
>> reasonable form of models for your time series, then this algorithm would
>> plausibly give you a set of models that reasonably describe your original
>> data.  Some of the models in this mixture would represent various classes
>> of normal events, and some would represent abnormal events in your
>> training data (if there are any).  There would also be a set of
>> probability scores for each time series that tell you how well the
>> mixture model describes that particular time series.  Anomaly detection
>> could be done by finding new events that are poorly described by the
>> mixture model defined by the training data.
>>
>> That said, the software is NOT production code.  Getting it to work on new
>> kinds of data with new kinds of models, and learning to interpret the
>> output, is something of a research project.  The code itself is relatively
>> new and you would be one of the first users.  Don't expect to go into
>> production next week.  Or next month.  The current code is also sequential
>> in nature rather than a parallel implementation.  A parallel
>> implementation is being worked on, and Mahout contributors and committers
>> (including me!) would be happy to help you accelerate that development.
>>
>> The counter-caveat is that this could give you a state of the art anomaly
>> detector with contributions from you.
>>
>> On Wed, Apr 1, 2009 at 12:02 AM, Tim Bass <tim.silkr...@gmail.com> wrote:
>>
>>
>>>
>>> ... full of time series data about network and server events... mine the
>>> data .. for "normal" and "abnormal" patterns.
>>>
>>> Is there an algorithm or method in Mahout ready to test in this type
>>> of NMS environment?
>>>
>>>
>>
>>
>>
>>
>
>
