Re: Some questions for starters

werner mueller Mon, 06 Oct 2008 02:27:26 -0700

Hello

Thanks for the reply!

I'll try to be more specific (which is a bit difficult at the moment).

The contract example you gave comes quite close. For example:
if a contract's cost this month is not within a 5% range of what it was
last month, raise an alarm.
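
A minimal sketch of that rule in Java, to make the idea concrete. The class name, method names and the handling of a zero baseline are assumptions made for illustration; the 5% tolerance is just the example value from above.

```java
final class MonthlyChangeAlarm {

    /** Relative change from previous to current, e.g. 0.10 for +10%. */
    static double relativeChange(double previous, double current) {
        if (previous == 0.0) {
            // Assumption: without a baseline, any non-zero value counts as
            // an unbounded change, and zero-to-zero counts as no change.
            return current == 0.0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return (current - previous) / Math.abs(previous);
    }

    /** True if this month's value is outside the tolerance band around last month's. */
    static boolean isAlarm(double previous, double current, double tolerance) {
        return Math.abs(relativeChange(previous, current)) > tolerance;
    }

    public static void main(String[] args) {
        System.out.println(isAlarm(200.0, 206.0, 0.05)); // 3% change  -> false
        System.out.println(isAlarm(200.0, 230.0, 0.05)); // 15% change -> true
    }
}
```

The same check would work for phone numbers or connection counts; only the input values change.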

similar things could be done on phone numbers or on numbers of connections.

the thing is, no one knows the right number: is it 5% or 25%?
these numbers only become clear after at least a year, and i don't assume
anyone will tweak the system continuously.
on the other hand: from december to january a jump of 25% may be
absolutely normal (some companies 'stop' working in december as the whole
staff must go on vacation). so besides the change in cost, the time of
year should also be reflected in the alarm limits.
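
One way to picture the seasonal part, as a sketch only: compare the observed change with the change that is typical for that calendar month, and alarm on the difference. The map of expected changes per month is hypothetical (it would have to come from at least one earlier year of data), and months without an entry fall back to an expected change of zero.

```java
import java.time.Month;
import java.util.EnumMap;
import java.util.Map;

final class SeasonalChangeAlarm {

    /**
     * expectedChange maps the month being evaluated to the relative change
     * that is considered normal when entering that month (assumed input).
     */
    static boolean isAlarm(double previous, double current, Month month,
                           Map<Month, Double> expectedChange, double tolerance) {
        if (previous <= 0.0) {
            // Assumption: no usable baseline, so alarm only if there is any cost now.
            return current > 0.0;
        }
        double change = (current - previous) / previous;
        double expected = expectedChange.getOrDefault(month, 0.0);
        // Alarm on the deviation from the seasonal expectation, not on the raw change.
        return Math.abs(change - expected) > tolerance;
    }

    public static void main(String[] args) {
        Map<Month, Double> expected = new EnumMap<>(Month.class);
        expected.put(Month.JANUARY, 0.25); // december -> january jump of 25% is normal

        System.out.println(isAlarm(100.0, 125.0, Month.JANUARY, expected, 0.05)); // false
        System.out.println(isAlarm(100.0, 125.0, Month.MARCH, expected, 0.05));   // true
    }
}
```

Until a full year is stored the map stays mostly empty, so this degrades to the plain month-over-month check.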

looking at a single contract in isolation seems like a bad idea. So I could
look at all contracts of the same company and try to figure out an average
change in costs. If 80% of the contracts show an increase, the alarm limit
could be higher. But here again: why 80% (it's an arbitrary number right now)?
This would get easier once a full 12-month data set is stored. until
then... trial and error. but that's hard to sell (especially since the time
required for tweaking is hard to estimate).
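
A possible way around a fixed "80%" is to compare each contract with its peers directly. The sketch below swaps in a different technique than the one described above: it takes the median change of all contracts of one company and flags the contracts that lie more than k median absolute deviations (MAD) away from it. The names, the factor k and the minimum spread are assumptions; k is still a parameter, but it is relative to the spread of the data instead of being an absolute percentage.

```java
import java.util.Arrays;

final class PeerGroupAlarm {

    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * changes: relative cost change per contract of one company.
     * Returns one flag per contract; true means "unusual compared to its peers".
     */
    static boolean[] flagOutliers(double[] changes, double k, double minSpread) {
        double med = median(changes);
        double[] deviation = new double[changes.length];
        for (int i = 0; i < changes.length; i++) {
            deviation[i] = Math.abs(changes[i] - med);
        }
        // minSpread avoids flagging everything when all contracts move identically.
        double mad = Math.max(median(deviation), minSpread);
        boolean[] flags = new boolean[changes.length];
        for (int i = 0; i < changes.length; i++) {
            flags[i] = deviation[i] > k * mad;
        }
        return flags;
    }

    public static void main(String[] args) {
        // Five contracts rise by 20-25%, one rises by 90%.
        double[] changes = {0.20, 0.22, 0.25, 0.24, 0.21, 0.90};
        System.out.println(Arrays.toString(flagOutliers(changes, 3.0, 0.01)));
    }
}
```

If most contracts of a company rise together, the median rises with them, so only the contract that breaks the common pattern is flagged.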

besides the price of the contract there is information like: number of
connections, time of the individual connections (people work between 8-17h),
number of products used (sms, calls to foreign countries, data
options, etc.)

right now i don't know how to figure out the numbers to declare behaviour
as 'unusual' based on limits that differ from company to company (a
plumber may have different limits than a consulting company).
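
As an illustration of limits that differ per company, here is a sketch that derives them from each company's own history: it keeps a running mean and standard deviation of past changes per company (Welford's online algorithm) and calls a new value unusual if it is more than k standard deviations from that company's mean. The company ids, k and the minimum number of samples are hypothetical, and this is a plain statistical baseline, not a learning algorithm.

```java
import java.util.HashMap;
import java.util.Map;

final class PerCompanyLimits {

    /** Running mean and variance (Welford), so no history has to be kept in memory. */
    private static final class Stats {
        long n;
        double mean;
        double m2;

        void add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        double stdDev() {
            return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
        }
    }

    private final Map<String, Stats> byCompany = new HashMap<>();
    private final double k;
    private final int minSamples;

    PerCompanyLimits(double k, int minSamples) {
        this.k = k;
        this.minSamples = minSamples;
    }

    /** Record one observed monthly change for a company. */
    void observe(String companyId, double change) {
        byCompany.computeIfAbsent(companyId, id -> new Stats()).add(change);
    }

    /** True if the change is far outside what this company has shown so far. */
    boolean isUnusual(String companyId, double change) {
        Stats s = byCompany.get(companyId);
        if (s == null || s.n < minSamples) {
            return false; // assumption: stay silent until there is enough history
        }
        double sd = s.stdDev();
        if (sd == 0.0) {
            return change != s.mean;
        }
        return Math.abs(change - s.mean) > k * sd;
    }

    public static void main(String[] args) {
        PerCompanyLimits limits = new PerCompanyLimits(3.0, 3);
        for (double c : new double[] {0.01, 0.02, 0.03, 0.02}) {
            limits.observe("plumber", c);
        }
        System.out.println(limits.isUnusual("plumber", 0.03)); // false
        System.out.println(limits.isUnusual("plumber", 0.20)); // true
    }
}
```

The same structure could be keyed by contract or phone number, or fed with connection counts instead of costs.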

of course there is the option: let the user choose the limits. this has
two drawbacks:
 - how would the user know what the limits should be? and
 - some users have to look at thousands of contracts.
so i would prefer the system to work on its own (as much as possible).

i don't know whether there are some 'weighting' algorithms out there
that do similar things. so any hint may help a lot.

The whole alarm feature will be part of the web application. so
whatever we need to do on the cluster must be done via JDBC (anything from
do-it-yourself queries to calling a stored procedure). i don't know if we are
allowed to run java applications inside the DB directly. I know we
execute some data import tools that run on the db hosts, but in a
different JVM.
But actually my system requirements question is more like this: do i
need a computing cloud and MapReduce, or are algorithms that learn from
data independent of such things?

Thanks :)
kind regards

werner

Grant Ingersoll wrote:
> 
> On Sep 28, 2008, at 6:26 PM, werner mueller wrote:
> 
>> Hello all
>>
>> finally i find some time to ask boring questions :)
>>
>> I sort of stumbled across the mahout project at apachecon08 in
>> amsterdam. But i haven't found the time to look into it deeply.
>>
>> I would like to ask for some hints / links / directions for a
>> 'predictions' feature. i read through the mahout wiki and found some
>> interesting links. but since i come more from the applications side and i
>> am not that much into databases i need some help getting started.
>>
>> we develop a reporting application for a telecommunication company.
>> mainly we store data in an oracle cluster. it consists of a star-schema.
>> the application mainly offers to create reports on two data sources:
>> costs and traffic. the data amount is about 1-2 terabytes.
>>
>> the idea came up to implement some 'alarming' features. so customers
>> could set up some limits for contracts, phone numbers etc to get
>> notified once the limits are reached or the data 'behaves strange' (too
>> strong increases for a period, other ideas to come...).
> 
> Can you give an example?  It sounds like you simply want the user to say
> "if contracts > X, then alarm", but I gather not, since you are asking
> here.  Or are you looking for the user
> to not be involved in setting the thresholds, but instead to learn from
> past examples where there was a problem?  For instance, you have
> failures from before, but you don't particularly know why it failed
> (i.e. what features caused the problem).
> 
>>
>>
>> i would like to ask if there is something of use in mahout or whether
>> you would recommend to keep such features 'simple' on a statistical
>> basis and not use learning techniques at all?
> 
> Well, simple is usually better, if it solves your problem.
> 
>>
>>
>> on the other hand the more boring questions: do i need a hadoop cluster
>> for your implementations or could i run them on oracle based clusters as
>> well?
> 
> I don't know enough about Oracle clusters to render an opinion.  If you're
> asking if Mahout will run inside the Oracle JVM, I'm guessing that would
> be a stretch at this point, but I don't have anything to base that on.
> 
> -Grant
> 
> 
>
