Re: Interpreting mutliple regression Beta is only way?

2002-02-05 Thread Rich Ulrich

On 4 Feb 2002 16:14:11 -0800, [EMAIL PROTECTED] (Wuzzy) wrote:

  
  In biostatistical studies, either version of beta is pretty worthless.
  Generally speaking.
 
 If I may be permitted to infer a reason:
 if you have  
 
 bodyweight= -a(drug) - b(exercise) + food
 
 Then the standardized coefficients will affect bodyweight but they
 will also affect each other.  They would only be useful if drug intake
 was perfectly independent of exercise and food in the population.

Okay, that is a bit of a reason, which applies widely;  but that 
is not what I was pointing at when I wrote that line.

I was over-stating.  But many of the results in clinical research
are 'barely statistically significant'  and when that is so, then
(in my opinion), the 95% Confidence Interval does not add much 
to the statement of test result, and the point estimate 
of the effect (beta, or a mean-difference)  is pretty loose, too.

I want to beat a 5% alpha.  And I want a result to have a
50% interval (say)  that is actually interesting, and well-above
the measurement jitters. -- That is an *essential*  requirement
for being coherent, if you want to describe 
observational studies.  And it is a pretty good idea for
writing about randomized-controlled studies, too.

 
 If they are not independent but partially collinear (0.5) using linear
 regression is it possible to know whether the drug is strong enough
 (colloquially speaking) to recommend?
 I assume that it would be impossible as a change in drug cannot be
 separated from a change in exercise in the population.  Ie. people are
 exercising and taking the drug so it is impossible to distinguish
 which one is beneficial.

You can look at the zero-order effect (raw correlation) as well as
the partial effect after controlling for other potential predictors:
one at a time, or in sets.  If your Drug always shows the same
prediction, regardless of how you test it, that is a pretty good sign.
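
A minimal sketch of that comparison, for one predictor controlled by one
other.  The correlations below are hypothetical (only the 0.5 between drug
and exercise comes from the question), and partial_r is an illustrative
helper name; the formula is the usual first-order partial correlation.

```python
import math

def partial_r(r_y1, r_y2, r_12):
    """Correlation of y with x1 after removing x2 from both (first-order partial r)."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

# Hypothetical correlations, for illustration only.
r_weight_drug = -0.40      # zero-order: bodyweight vs drug
r_weight_exercise = -0.30  # zero-order: bodyweight vs exercise
r_drug_exercise = 0.50     # the partial collinearity from the question

r_partial = partial_r(r_weight_drug, r_weight_exercise, r_drug_exercise)
print(f"zero-order r = {r_weight_drug:.3f}, partial r = {r_partial:.3f}")
```

With these made-up numbers the sign holds and the size shrinks only a
little after controlling for exercise; a change of sign would be the warning.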

 I've heard of ridge regression; I will try to investigate this area
 more..
 will probably figure it out with time hehe..

Ridge regression can be decomposed into a combination of the
p-variate regression, averaged with the 1-variable regressions, 
and all the i-variable regressions that lie in-between.  There is
not much gain using Ridge if you avoid suppressor relationships 
from the start -- all those cases where a beta is the opposite sign
from its zero-order beta.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: Interpreting mutliple regression Beta is only way?

2002-02-04 Thread Wuzzy

 
 In biostatistical studies, either version of beta is pretty worthless.
 Generally speaking.

If I may be permitted to infer a reason:
if you have  

bodyweight= -a(drug) - b(exercise) + food

Then the standardized coefficients will affect bodyweight but they
will also affect each other.  They would only be useful if drug intake
was perfectly independent of exercise and food in the population.

If they are not independent but partially collinear (0.5) using linear
regression is it possible to know whether the drug is strong enough
(colloquially speaking) to recommend?
I assume that it would be impossible as a change in drug cannot be
separated from a change in exercise in the population.  Ie. people are
exercising and taking the drug so it is impossible to distinguish
which one is beneficial.

I've heard of ridge regression; I will try to investigate this area
more..
will probably figure it out with time hehe..





Re: Interpreting mutliple regression Beta is only way?

2002-01-25 Thread Rich Ulrich

On 18 Jan 2002 16:55:11 -0800, [EMAIL PROTECTED] (Wuzzy) wrote:

 Rich Ulrich [EMAIL PROTECTED] wrote in message 
 
 Thanks Rich, most informative, I am trying to determine a method of
 comparing apples to oranges - it seems an important thing to try to
 do, perhaps it is impossible .

Well, I do think there is a narrow, numerical answer about two
particular tests, in a particular sample.  But there are a lot
of side-issues before you get to an answer that is very useful.
See below.

 
 I am trying to
 determine which is better, glycemic index or carbohydrate total in
 predicting glycemic load (Glycemic load=glycemic index*carbohydrate).
 
 my results as a matrix:
 
           GI load   GI      Carb
 GI load   1.000
 GI         .533     1.000
 Carb       .858      .124   1.000
 
 So it seems that carb affects GI load more than does GI.. but this is
 on ALL foods.. (nobody eats ALL foods so cannot extrapolate to human
 diet) but I don't think you're allowed to do this kind of comparison
 as Carb and GI are totally different values:
 
 I suspected that you would be allowed to make the comparisons if you
 use Betas, ie. measure how many standard deviation
 changes of GI and  Carb it requires..  If it takes a bigger standard
 deviation of Carb then you could say that it is more likely that carb
 has a bigger effect on glycemic load.
 
 you seem to suggest that even using standard deviation changes, you
 cannot compare  apples to oranges.  Which sounds right but is
 disappointing..

There is a narrow, numerical thing with an answer.  For instance,
if you are adding A+B=C,  then two *independent*  components 
of C  affect   C  in proportion to their variances.  Your two 
components don't have much correlation (.13) -- that is, 
they are nearly independent -- so that would work out.
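
As a rough check on "nearly independent", the standardized betas for a
two-predictor linear fit can be worked out from the three correlations in
the matrix quoted above.  This is only a sketch: standardized_betas is an
illustrative helper name, and the formula is the standard two-predictor
result.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors, from correlations."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

# Correlations taken from the quoted matrix.
r_load_gi, r_load_carb, r_gi_carb = 0.533, 0.858, 0.124

beta_gi, beta_carb = standardized_betas(r_load_gi, r_load_carb, r_gi_carb)
r_squared = beta_gi * r_load_gi + beta_carb * r_load_carb

print(f"GI:   r = {r_load_gi:.3f}, beta = {beta_gi:.3f}")
print(f"Carb: r = {r_load_carb:.3f}, beta = {beta_carb:.3f}")
print(f"R^2 of the linear fit = {r_squared:.3f}")
```

The betas come out near .43 and .80, not far from the zero-order
correlations of .533 and .858, which is what one expects when the two
predictors are only weakly correlated.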

But you actually have an *exact* relationship, as a product.
The one that matters in this case is the one that has the 
higher coefficient of variation -- the larger standard 
deviation (and variance) when expressed as logs.  
Taking logs turns the product into a sum, log(C) = log(A) + log(B), 
and in a sum of two numbers, the one that is more varying 
will contribute more.
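
A small sketch of the log argument, on synthetic data.  The two components
are invented lognormal numbers (not food values), chosen so that B has the
larger spread in logs; all the variable names are illustrative only.

```python
import math
import random
import statistics

random.seed(0)  # reproducible synthetic data; these are not real measurements

# Hypothetical positive components: A varies little in logs, B varies a lot.
a = [random.lognormvariate(4.0, 0.2) for _ in range(500)]
b = [random.lognormvariate(3.0, 0.8) for _ in range(500)]
c = [x * y for x, y in zip(a, b)]  # exact product, C = A * B

log_a = [math.log(x) for x in a]
log_b = [math.log(x) for x in b]
log_c = [math.log(x) for x in c]

var_a = statistics.variance(log_a)
var_b = statistics.variance(log_b)
var_c = statistics.variance(log_c)

# Sample covariance of log(A) and log(B), computed directly.
mean_a, mean_b = statistics.fmean(log_a), statistics.fmean(log_b)
cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(log_a, log_b)) / (len(a) - 1)

# var(log C) = var(log A) + var(log B) + 2 * cov(log A, log B)
print(f"var(log A) = {var_a:.3f}, var(log B) = {var_b:.3f}, 2*cov = {2 * cov_ab:.3f}")
print(f"var(log C) = {var_c:.3f}, share from B alone = {var_b / var_c:.2f}")
```

When the covariance term is small, the shares of var(log C) are just the
two log-variances, so the component with the larger coefficient of
variation dominates.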

You can state that A  or B  is bigger, for *this* 
particular sample.  Now, I imagine a sample could be 
  a) all healthy; 
  b) all with a 'definite' diagnosis of some relevant disease; 
  c) all in the category of needing a diagnosis.
Or you could have some mixture of the above; for 
some specified ages, etc.  

Now, What is it you are trying to decide? - that will 
help determine a relevant sampling. 

When you say that a number is more important, are 
you trying to say that the measure you have says
a lot?  - or that *improving*  that particular number
would gain you more, because the number you have
is a lousy one?  -- I can point out that if the two variables
matter exactly the same in some physiological sense, 
then you  can have two opposite conclusions about
which is important, if one of them is *measured*  much
more poorly (much more inherent error; measurement 
error)  than the other.

And, you are reporting on tests, and a number.
You don't get to conclude that glucose matters more 
than carbohydrates  if you only know that your 50 mg
glucose test is more pertinent than your 50mg c.  test;
as collected in your  particular sample.


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Interpreting mutliple regression Beta is only way?

2002-01-18 Thread Wuzzy

Rich Ulrich [EMAIL PROTECTED] wrote in message 

Thanks Rich, most informative, I am trying to determine a method of
comparing apples to oranges - it seems an important thing to try to
do, perhaps it is impossible .

I am trying to
determine which is better, glycemic index or carbohydrate total in
predicting glycemic load (Glycemic load=glycemic index*carbohydrate).

my results as a matrix:

          GI load   GI      Carb
GI load   1.000
GI         .533     1.000
Carb       .858      .124   1.000

So it seems that carb affects GI load more than does GI.. but this is
on ALL foods.. (nobody eats ALL foods so cannot extrapolate to human
diet) but I don't think you're allowed to do this kind of comparison
as Carb and GI are totally different values:

I suspected that you would be allowed to make the comparisons if you
use Betas, ie. measure how many standard deviation
changes of GI and  Carb it requires..  If it takes a bigger standard
deviation of Carb then you could say that it is more likely that carb
has a bigger effect on glycemic load.

you seem to suggest that even using standard deviation changes, you
cannot compare  apples to oranges.  Which sounds right but is
disappointing..





Re: Interpreting mutliple regression Beta is only way?

2002-01-18 Thread Jim Snow


Wuzzy [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
 Rich Ulrich [EMAIL PROTECTED] wrote in message

 Thanks Rich, most informative, I am trying to determine a method of
 comparing apples to oranges - it seems an important thing to try to
 do, perhaps it is impossible .

 I am trying to
 determine which is better, glycemic index or carbohydrate total in
 predicting glycemic load (Glycemic load=glycemic index*carbohydrate).

 my results as a matrix:

           GI load   GI      Carb
 GI load   1.000
 GI         .533     1.000
 Carb       .858      .124   1.000

 So it seems that carb affects GI load more than does GI.. but this is
 on ALL foods.. (nobody eats ALL foods so cannot extrapolate to human
 diet) but I don't think you're allowed to do this kind of comparison
 as Carb and GI are totally different values:

 I suspected that you would be allowed to make the comparisons if you
 use Betas, ie. measure how many standard deviation
 changes of GI and  Carb it requires..  If it takes a bigger standard
 deviation of Carb then you could say that it is more likely that carb
 has a bigger effect on glycemic load.

 you seem to suggest that even using standard deviation changes, you
 cannot compare  apples to oranges.  Which sounds right but is
 disappointing..

The glycaemic index is calculated as the area under the blood
glucose curve for the two hours (or 3 hours for diabetics) after ingesting
enough of a food to include 50 grams of carbohydrate, divided by the same
area after ingesting 50 grams of pure glucose, expressed as a percentage.
In some cases a reference food other than glucose is used.

If the area under the curve is the glycaemic load you are studying, I
would expect the model
 Glycemic load=glycemic index*carbohydrate
to fit the data very well when the carbohydrate content is near 50 gm,
provided all the glycaemic indices have been calculated on the same basis.
Using correlations or beta coefficients as you are doing is appropriate
when linear relationships are involved, but not to test for goodness of fit
to this model.

What would be of interest would be a plot of the difference between
the predicted glycaemic load and the observed value, against carbohydrate,
especially for carbohydrate values far from 50 gm. If I have a meal
mainly of eggs or meat, the total carbohydrate content is very low, so the
glycaemic load calculated from the formula may be wrong.

One difficulty with the whole Glycaemic Index approach is that there
is not, as far as I know, any way of calculating the glycaemic load from
foods like cheese, eggs and meat. If the body needs glucose, it will be made
from fat and protein foods.

It is not surprising that it could be hard to persuade volunteers to
ingest 8500 grams of processed cheese, containing 50gm of carbohydrate, in
order to determine its glycaemic index  :-)

I would like to see another index constructed giving the glycaemic load
produced by 100 gm of each food, rather than the load produced by that
amount of food which contains 50gm of carbohydrate.











Interpreting mutliple regression Beta is only way?

2002-01-16 Thread Wuzzy

If your beta coefficients are on different scales: like
you want to know whether temperature or pressure are affecting
your bread baking more,

Is the way to do this using Beta coefficients calculated
as Beta=beta*SDx/SDy

(SDx=standard deviation of each x)
(SDy=standard deviation of the dependent variable)

It seems like the Beta coefficients are rarely cited in studies
and it seems to me worthless to know beta (small b) as you are
not allowed to compare them as they are on different scales.

For example, a standardized regression equation becomes:
Bread Making Success=0.5Temperature+0.8Pressure

would mean that a one standard deviation change in pressure will increase
your success by 0.8 standard deviations.
Is there a way of converting this standardized coefficient to a
correlation coefficient (on a scale of -1 to +1)?
It would be useful to do this as you want to know the correlation
coefficient of temperature after factoring out pressure.





Re: Interpreting mutliple regression Beta is only way?

2002-01-16 Thread Rich Ulrich

On 16 Jan 2002 11:33:15 -0800, [EMAIL PROTECTED] (Wuzzy) wrote:

 If your beta coefficients are on different scales: like
 you want to know whether temperature or pressure are affecting
 your bread baking more,
 
 Is the way to do this using Beta coefficients calculated
 as Beta=beta*SDx/SDy

Something like that ... is called the standardized beta, 
and every OLS  regression program gives them.
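
A minimal sketch of that conversion for the single-predictor case.  The
baking numbers are made up for illustration, and ols_slope and
standardized_beta are illustrative helper names, not the output of any
particular regression program.

```python
import statistics

def ols_slope(x, y):
    """Raw (unstandardized) least-squares slope b of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardized_beta(b, sd_x, sd_y):
    """Beta = b * SDx / SDy, as in the question."""
    return b * sd_x / sd_y

# Made-up baking data, for illustration only.
temperature = [180.0, 190.0, 200.0, 210.0, 220.0, 230.0]  # oven setting
success = [3.1, 3.9, 4.2, 5.4, 5.1, 6.3]                   # arbitrary score

b = ols_slope(temperature, success)
beta = standardized_beta(b, statistics.stdev(temperature), statistics.stdev(success))
print(f"raw b = {b:.4f} per degree, standardized Beta = {beta:.3f}")
```

With a single predictor the standardized Beta is the same number as the
ordinary correlation r.  With several predictors each coefficient gets the
same b*SDx/SDy conversion, but it is then a partial coefficient and need
not match the zero-order r.
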
 ...
 
 It seems like the Beta coefficients are rarely cited in studies
 and it seems to me worthless to know beta (small b) as you are
 not allowed to compare them as they are on different scales.

In biostatistical studies, either version of beta is pretty worthless.
Generally speaking.

What you have is prediction that is barely better than chance.
The p-values tell you which is more powerful  within this one
equation.  The zero-order correlation tells you how they are
related, alone.  -- If these two indicators are not similar, then
you have something complicated going on, with confounding
taking place, or joint-prediction, and no single number will show
it all.
 - When prediction is enough better-than-chance to be 
really interesting, then the raw units are probably interesting, too.

 [ ... ]
 Is there a way of converting this standardized coefficient to a
 correlation coefficient (on a scale of -1 to +1)?
 It would be useful to do this as you want to know the correlation
 coefficient of temperature after factoring out pressure.

I think you are looking for simple answers that can't exist, 
even though there *is*  a partial-r, and the beta in regression
*is*  a partial-beta.

The main use I have found for the standardized (partial) beta
is the simple check against confounding, etc.  If 'beta' is similar
to the zero-order r,  for all variables, then there must be pretty
good independence among the predictors, and interpretation
doesn't hide any big surprises.  If it is half-size, I look for shared
prediction.  If it is in the wrong direction or far too big (these 
conditions happen at the same time, for pairs of variables), 
then  gross confounding exists.
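
A sketch of that check for two predictors, working from correlations alone.
The three sets of correlations are hypothetical, picked to show the three
patterns; standardized_betas is an illustrative helper using the standard
two-predictor formula.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors, from correlations."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

# Hypothetical cases: (r of y with x1, r of y with x2, r between x1 and x2)
cases = {
    "nearly independent": (0.50, 0.40, 0.05),
    "shared prediction": (0.50, 0.50, 0.90),
    "gross confounding": (0.30, 0.60, 0.80),
}

results = {}
for name, (r_y1, r_y2, r_12) in cases.items():
    beta_1, beta_2 = standardized_betas(r_y1, r_y2, r_12)
    results[name] = (beta_1, beta_2)
    print(f"{name:20s} r = ({r_y1:.2f}, {r_y2:.2f})  beta = ({beta_1:.2f}, {beta_2:.2f})")
```

In the first case the betas sit close to the zero-order r; in the second
each is about half its r; in the third one beta changes sign while its
partner grows well past its own r, the paired pattern described above.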

Hope this helps.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

