Re: [jira] [Created] (MAHOUT-1425) SGD classifier example with bank marketing dataset

Ted Dunning Sun, 23 Feb 2014 15:43:20 -0800

On Sun, Feb 23, 2014 at 10:51 AM, Frank Scholten (JIRA) <j...@apache.org> wrote:


> Ted: can you tell a bit more about the log transforms? Some of them are
> just Math.log while others are more complex expressions.


The increased complexity comes up when there are zero or small negative
values.
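
As a rough illustration of why the expressions get more complicated, here is
a minimal sketch of two common ways to guard a log against zero or small
negative inputs.  The class name, the method names and the choice of floor
are assumptions made up for this example; they are not taken from the
MAHOUT-1425 patch.

```java
final class GuardedLog {
    // Hypothetical helper: clamp the input at a small positive floor so that
    // zero and negative values do not produce -Infinity or NaN.
    static double guardedLog(double x, double floor) {
        if (floor <= 0) {
            throw new IllegalArgumentException("floor must be positive");
        }
        return Math.log(Math.max(x, floor));
    }

    // Hypothetical alternative: log(1 + x), with negative inputs treated as 0.
    static double shiftedLog(double x) {
        return Math.log1p(Math.max(x, 0.0));
    }

    public static void main(String[] args) {
        System.out.println(guardedLog(250.0, 1.0)); // ordinary positive value
        System.out.println(guardedLog(0.0, 1.0));   // clamped to log(1)
        System.out.println(guardedLog(-3.0, 1.0));  // clamped to log(1)
        System.out.println(shiftedLog(0.0));        // log1p(0)
    }
}
```

Which guard is appropriate depends on the feature; the clamp discards all
information below the floor, while the shift compresses small values.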

In general, monetary values are commonly transformed with a log during
training of a logistic regression model.  Often you retain the original as
well.

The motivation for the log is that it is common for the structure of the
problem to depend on relative differences rather than absolute
differences.  Thus, $80 is different from $100 in about the same way that
$800 is different from $1000.  This makes sense if you are talking about
what makes a material difference.
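
The dollar example can be checked directly.  This is only a small sketch with
made-up names: after the log, the gap between 80 and 100 is the same as the
gap between 800 and 1000, even though the raw gaps differ by a factor of ten.

```java
final class RelativeDifference {
    // Difference of two amounts on the log scale; equals log(a / b).
    static double logGap(double a, double b) {
        return Math.log(a) - Math.log(b);
    }

    public static void main(String[] args) {
        // Raw gaps: 20 versus 200.
        System.out.println(100.0 - 80.0);
        System.out.println(1000.0 - 800.0);
        // Log gaps: both are log(1.25), up to floating-point rounding.
        System.out.println(logGap(100.0, 80.0));
        System.out.println(logGap(1000.0, 800.0));
    }
}
```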

Of course, if you are talking about net profits, then you may want features
that look like log(a-b) instead.  What happens when that goes negative is a
bit of a can of worms in terms of feature design.  Sometimes, a small
reference value \gamma is defined and a value like w(a-b) log w(a-b) is used,
where w(x) = x-\gamma if x > \gamma, x+\gamma if x < -\gamma, and 0 otherwise.
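
A minimal sketch of the dead-zone function w follows.  The function w itself
is as defined above.  The signedLog helper is an assumption added for this
example: a plain log is not defined when w(a-b) is negative, so the sketch
uses sign(w) * log(1 + |w|) as one possible sign-preserving variant.  It is
not necessarily the expression meant above.

```java
final class DeadZone {
    // w(x) = x - gamma if x > gamma, x + gamma if x < -gamma, and 0 otherwise.
    static double w(double x, double gamma) {
        if (x > gamma) {
            return x - gamma;
        }
        if (x < -gamma) {
            return x + gamma;
        }
        return 0.0;
    }

    // ASSUMPTION (hypothetical helper): a sign-preserving log of w(a - b).
    // Chosen here only so that negative net values stay well defined.
    static double signedLog(double a, double b, double gamma) {
        double v = w(a - b, gamma);
        return Math.signum(v) * Math.log1p(Math.abs(v));
    }

    public static void main(String[] args) {
        System.out.println(w(5.0, 2.0));   // above the dead zone
        System.out.println(w(-5.0, 2.0));  // below the dead zone
        System.out.println(w(1.0, 2.0));   // inside the dead zone
        System.out.println(signedLog(10.0, 3.0, 1.0));
        System.out.println(signedLog(3.0, 10.0, 1.0));
    }
}
```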
