On Sun, Feb 23, 2014 at 10:51 AM, Frank Scholten (JIRA) <j...@apache.org> wrote:

> Ted: can you tell a bit more about the log transforms? Some of them are
> just Math.log while others are more complex expressions.

The increased complexity comes up when there are zero or small negative values.

In general, monetary values are commonly transformed with a log when training a logistic regression model. Often you retain the original value as well. The motivation for the log is that the structure of the problem commonly depends at least as much on relative differences as on absolute differences. Thus, $80 is different from $100 in about the same way that $800 is different from $1000. This makes sense if you are talking about what makes a material difference.

Of course, if you are talking about net profits, then you may want features that look like log(a-b) instead. What happens when that goes negative is a bit of a can of worms in terms of feature design. Sometimes a small reference value \gamma is defined and a value like w(a-b) log w(a-b) is used, where

    w(x) = x - \gamma    if x > \gamma
    w(x) = x + \gamma    if x < -\gamma
    w(x) = 0             otherwise
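
To make this concrete, here is a rough Python sketch (not the Mahout code). It checks that the $80/$100 and $800/$1000 gaps are equal in log space and implements w(x) as defined above. The function names are made up for illustration. The signed_log helper is my own assumption about one way to keep a log-like feature defined for negative differences; it is not the w(a-b) log w(a-b) expression itself.

```python
import math


def soft_threshold(x, gamma):
    """w(x) from the message: shift x toward zero by gamma,
    and return 0 when x lies inside [-gamma, gamma]."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def signed_log(x, gamma):
    """Hypothetical feature (an assumption, not the formula in the message):
    sign(w) * log(1 + |w|), which stays defined for negative and zero w."""
    w = soft_threshold(x, gamma)
    return math.copysign(math.log1p(abs(w)), w)


# Relative differences: both pairs are the same distance apart in log space,
# because log(100) - log(80) = log(1000) - log(800) = log(1.25).
small_gap = math.log(100) - math.log(80)
large_gap = math.log(1000) - math.log(800)
print(math.isclose(small_gap, large_gap))

# A net-profit style difference a - b that goes negative.
print(soft_threshold(-5.0, 1.0), signed_log(-5.0, 1.0))
```

Whether a signed log like this is a good feature depends on the data; the point is only that the thresholded value w(x) is zero in a band around zero, so small negative or near-zero differences do not blow up.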