On 01/01/16 14:35, Andras Farkas wrote:

> Dear All,
>
> I wonder if you have a thought on the following: if I have a simple
> model like model <- lm(log(y) ~ log(x) + log(z), data = data), where
> both the dependent and the independent variables are log transformed,
> is it OK just to use ypred <- predict(model, type = "response") to
> get the predictions, then transform ypred with exp(ypred) back to
> y's original scale, so as to compare the observed or known data (y)
> with the model predictions (ypred) on the original scale?
>
> I appreciate your thoughts...

Is it OK? In what sense? It's a free country --- at least the country that I live in is still free (more or less), so you can do whatever you feel like.

However, it helps to do a little elementary mathematics rather than just hammering and hoping. When in doubt, do the maths.

You are fitting the model

  log(y) = beta_0 + beta_1 * log(x) + beta_2 * log(z) + E

where it is tacitly assumed that E ~ N(0,sigma^2) and that the E's corresponding to different observations are independent. Using predict() will get you

   log(y).hat = b_0 + b_1 * log(x) + b_2 * log(z)

where the b_i are the *estimates* of the beta_i. Under the given assumptions the b_i are unbiased, so the expected value of log(y).hat is equal to the expected value of log(y), and the universe is in harmony. OMMMMMMMMM!
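
Purely for illustration, here is a minimal sketch in R with simulated data (the coefficients, sigma and sample size are made up, not taken from your post):

    ## minimal sketch with simulated data; all numbers are made up
    set.seed(42)
    n <- 200
    x <- runif(n, 1, 10)
    z <- runif(n, 1, 10)
    y <- exp(1 + 0.7*log(x) - 0.3*log(z) + rnorm(n, 0, 0.5))
    dat   <- data.frame(y = y, x = x, z = z)
    model <- lm(log(y) ~ log(x) + log(z), data = dat)
    log.y.hat <- predict(model)   # predictions, still on the log scale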

However, when you exponentiate everything, things change; expected values are not preserved, since exponentiation is *not* a linear function!

If you set y.hat = exp(log(y).hat) you get

   y.hat = exp(b_0) * x^{b_1}* z^{b_2}

It is not obvious what the expected value of this expression is, but it is almost certainly not the same as the expected value of y.

Note that the expected value of y is *not* equal to:

    exp(beta_0) * x^{beta_1} * z^{beta_2}  (1)

Rather, it is equal to this expression multiplied by exp(sigma^2/2) (under the stated assumptions): y = exp(beta_0) * x^{beta_1} * z^{beta_2} * exp(E), and for E ~ N(0,sigma^2) the expected value of exp(E) is exp(sigma^2/2) --- the mean of a lognormal random variable.
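
If you want to convince yourself of that lognormal-mean fact numerically, a quick simulation (sigma = 0.5 chosen arbitrarily) does the trick:

    ## E[exp(E)] for E ~ N(0, 0.5^2): simulated vs. exp(sigma^2/2)
    mean(exp(rnorm(1e6, 0, 0.5)))   # close to 1.1331
    exp(0.5^2/2)                    # 1.133148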

I *think* (I have not checked this) that the expected value of y.hat will be equal "asymptotically" to (1) --- that is, to the expected value of y divided by exp(sigma^2/2).

Other younger and wiser heads who read this list might like to chime in here and possibly correct me.

So, bottom line, my guess is that if your sample size is "large" then you won't go too far wrong by predicting y by

    exp(log(y).hat)*exp(s^2/2)

i.e. by exp(predict(model))*exp(s^2/2)

where s^2 is the estimated variance from the model that you fitted on the log scale using lm().
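
In R, continuing from a fit like the sketch above, that would be (s^2 being the square of the "residual standard error"):

    s2    <- summary(model)$sigma^2           # estimated variance, log scale
    y.hat <- exp(predict(model)) * exp(s2/2)  # bias-adjusted predictions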

I would advise doing some fairly thorough simulations to get a feel for the size of the bias. (There will always be *some* bias.)
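
Something along the following lines would do (again just a sketch, with made-up parameter values); under these settings the naive predictor should come out biased low, and the adjusted one much closer to unbiased:

    ## compare naive and bias-adjusted back-transformed predictors
    set.seed(101)
    nsim <- 1000
    bias.naive <- bias.adj <- numeric(nsim)
    for (i in seq_len(nsim)) {
        x   <- runif(50, 1, 10)
        z   <- runif(50, 1, 10)
        y   <- exp(1 + 0.7*log(x) - 0.3*log(z) + rnorm(50, 0, 0.5))
        fit <- lm(log(y) ~ log(x) + log(z))
        s2  <- summary(fit)$sigma^2
        mu  <- exp(1 + 0.7*log(x) - 0.3*log(z)) * exp(0.5^2/2)  # true E[y]
        bias.naive[i] <- mean(exp(fitted(fit)) - mu)
        bias.adj[i]   <- mean(exp(fitted(fit)) * exp(s2/2) - mu)
    }
    mean(bias.naive)   # typically clearly negative
    mean(bias.adj)     # typically much closer to zero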

You might also think about fitting the non-linear model

    y = gamma_0 * x^{gamma_1} * z^{gamma_2} + E

if you think that additive errors make more sense than multiplicative errors for your setting.

You can do this "fairly easily" using the function nls(), if you choose to go this way.
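
A sketch, using the simulated dat from above and taking the starting values (which nls() needs) from the log-scale fit:

    ## start nls() from the coefficients of the log-scale lm() fit
    cf <- coef(model)
    fit.nls <- nls(y ~ g0 * x^g1 * z^g2, data = dat,
                   start = list(g0 = exp(unname(cf[1])),
                                g1 = unname(cf[2]),
                                g2 = unname(cf[3])))
    summary(fit.nls)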

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
