What is an experiment?

2002-02-20 Thread Wuzzy

I think also that an experiment is the human attempt to make sense
of a chaotic world.

The method is that you assume chaos as your null hypothesis, H0, and
then try to disprove it.

So you don't always need controls, because an experiment can be run to
show, for example, that the equation for velocity is valid (a
validation experiment); i.e., you can disprove the null hypothesis of
chaos by showing that something always occurs.

I had an argument with a colleague: I said that B may not cause
disease, because there was no proof that it does.  They said that B
did cause disease, because there was no proof against it.  Basically I
think I'm right, in that you always start by assuming chaos, with no
relation between disease and exposure, and then you do your experiment.
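
As a concrete sketch of that starting point, here is a minimal,
hypothetical Python example; the exposure/disease counts are invented,
and an ordinary chi-square test assesses the null hypothesis of no
association:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Invented 2x2 table of counts: rows = exposed / unexposed,
    # columns = diseased / healthy.
    table = np.array([[30, 70],
                      [20, 80]])

    # H0 ("chaos"): exposure and disease are unrelated.
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

    # A small p lets us reject H0; a large p just leaves us where we
    # started (assuming no relation) -- it is not proof of no effect.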

No controls are needed, I think, because sometimes you are just
describing, as Jay pointed out, like with testing the velocity
equation, or discovering another relationship (an equation, etc.).





Re: can multicollinearity force a correlation?

2002-02-20 Thread Wuzzy

> My tentative conclusion is that your 2%  effect  really
> is a small one; it should be difficult to discern among 
> likely artifacts; and therefore, it is hardly worth mentioning

I agree; it makes sense to me as well: fasting insulin should have
more to do with measurement error and genetics than with food and
exercise.  I'm not giving up, though.  I've tried transforming insulin,
since I noticed odd error behavior in my residuals, but it only
improved R^2 marginally.

Also, I don't know if the fact that my population is so large is
making a difference.  I note that most published studies compare
percentiles of serum levels.  This makes more sense, I think, as maybe
10,000 people will have "normal" serum levels whereas 400 might have
abnormal ones; would this have an effect on R^2?

I think I am breaking the regression assumption that you can't repeat
the same points over and over.  I will try to consolidate people into
groups and then re-run the data.  I'm not sure if this will make a
difference, but this is how I see it done in the literature.
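
Here is a minimal sketch of that consolidation step, assuming a
hypothetical data set with columns named exercise and insulin (the
names, numbers, and bins are all my invention):

    import numpy as np
    import pandas as pd
    from scipy.stats import linregress

    rng = np.random.default_rng(0)
    n = 5000

    # Invented cohort: a weak real effect buried in noise.
    exercise = rng.uniform(0, 10, n)
    insulin = 12 - 0.2 * exercise + rng.normal(scale=4.0, size=n)
    df = pd.DataFrame({"exercise": exercise, "insulin": insulin})

    # Consolidate people into exposure groups and average within each.
    df["group"] = pd.cut(df["exercise"], bins=10)
    means = df.groupby("group", observed=True)[["exercise", "insulin"]].mean()

    # Re-run the regression on the group means.  Caveat: R^2 on group
    # means is inflated, because averaging removes within-group noise.
    fit = linregress(means["exercise"], means["insulin"])
    print(f"slope = {fit.slope:.3f}, R^2 = {fit.rvalue ** 2:.3f}")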

Statistics is interesting: it is hard to find information on the
problems you come across, and they can often only be tackled by
running more queries from different angles.

One exception: I asked a while ago whether standardized beta
coefficients are valid, and the answer given was "no".  Curiously, I
then came across a journal article on this very topic, if anyone was
following: "A heuristic method for estimating the relative weight of
predictor variables in multiple regression" (Multivariate Behavioral
Research, 35(1), 1-19, 2000).  This article is very interesting to
read...  much to comment on.





Re: Correlations-statistics

2002-02-20 Thread Wuzzy

Holger Boehm wrote:
> Hi,
> 
> I have calculated correlation coefficients between sets of parameters
> (A) and (B) and between (A) and (C).
> Now I would like to determine the correlation between (A) and (B
> combined with C). How can I combine the two parameters (B) and (C),
> what kind of statistical method has to be applied?
> 
> Thanks for your tips,
> 
> Holger Boehm

If B and C are not correlated at all, then your coefficients should be about the same.  More generally, you can regress A on B and C together; the multiple correlation coefficient from that regression measures how well B and C combined relate to A.
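
A minimal sketch of that combined regression, with made-up arrays
standing in for the parameter sets A, B, and C:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Made-up stand-ins for the three parameter sets.
    B = rng.normal(size=200)
    C = rng.normal(size=200)
    A = 0.6 * B + 0.3 * C + rng.normal(scale=0.5, size=200)

    # Regress A on B and C together; R^2 from this fit is the squared
    # multiple correlation of A with (B, C) combined.
    X = np.column_stack([B, C])
    r2 = LinearRegression().fit(X, A).score(X, A)
    print(f"multiple R = {np.sqrt(r2):.3f}")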





Re: can multicollinearity force a correlation?

2002-02-18 Thread Wuzzy

> You should take note that R^2  is *not*  a very good measure
> of 'effect size.' 

Hi Rich, you asked to see my data; I've posted a visual at the
following location: http://www.accessv.com/~joemende/insulin2.gif
Note that the R^2 is low despite the fact that the picture agrees with
common sense: insulin levels are shown here to decrease with
increasing exercise as well as with decreasing food intake.  My R^2 is
low, but I think it is clear that the above is true.

I've included several different views.  "Rating" is in MET values; I
forgot to multiply by body weight in kg to get kcal spent per day.
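
For reference, that conversion is the usual MET approximation (1 MET
is roughly 1 kcal per kg of body weight per hour); a tiny sketch with
hypothetical numbers:

    # Standard approximation: 1 MET is about 1 kcal per kg of body
    # weight per hour, so kcal/day = MET x weight (kg) x hours/day.
    def met_to_kcal_per_day(met_rating, weight_kg, hours_per_day):
        return met_rating * weight_kg * hours_per_day

    # Hypothetical subject: a 4-MET activity, 70 kg, one hour per day.
    print(met_to_kcal_per_day(4.0, 70.0, 1.0))  # 280.0 kcal/day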





Re: can multicollinearity force a correlation?

2002-02-18 Thread Wuzzy

http://www.accessv.com/~joemende/insulin2.gif

Apologies: I also forgot to divide the KCAL in food by the 31, as this
represents kcal.  It still seems logical to me to advise decreasing
food intake and increasing physical activity to improve insulin
sensitivity.  I would probably avoid reporting the R^2, or try a
different (non-linear) model.





Re: can multicollinearity force a correlation?

2002-02-12 Thread Wuzzy

> low-fat vegan diet" would be close).  However, the incidence of heterozygous
> familial hypercholesterolemia is only 1:500,000, so this exposure contributes
> little to the variance in serum cholesterol in the population; its r^2 would
> be small.
> 
> -Jay

Thanks,

This is similar to a problem I have come across: the measurement of a
serum value against an exposure.  My theory is that they are
correlated, but the data give an R^2 of 0.02, even though the p-value
for the beta is p = 1E-40 (i.e., essentially zero).

As you explain, this is possible.  My reasoning is that the exposure
happens many hours before the measurement of the serum, and that is
why R^2 is low.  Nonetheless, the strong beta might suggest a strong
effect of the exposure on the serum marker.  I've inserted time since
exposure into the equation, and it barely explained the difference;
the reason is that not enough people had their serum measured 2 hrs
after exposure.

Basically the data are inadequate, but I'm crossing my fingers that
the low p-value is useful.

Anyway, what I've learned is that R^2 does not measure slope.  I knew
this, but it hadn't sunk in.  R^2 is very useful, though: for example,
if you want to know the biggest source of fat in the American diet,
you would use R^2 on the food frequencies, not the beta coefficient,
because R^2 tells you which food best *predicts* fat intake rather
than the "strength" of the food's effect; i.e., low-fat foods may be
the main source of fat in the diet.
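
The slope-versus-R^2 point is easy to see in a simulation: with
n = 5,000 and lots of unexplained noise, a real effect gives a tiny
p-value yet a very small R^2.  A sketch with invented numbers:

    import numpy as np
    from scipy.stats import linregress

    rng = np.random.default_rng(1)
    n = 5000

    exposure = rng.normal(size=n)
    # A real but noisy effect: most of the variance is noise.
    serum = 0.15 * exposure + rng.normal(scale=1.0, size=n)

    fit = linregress(exposure, serum)
    print(f"slope = {fit.slope:.3f}")
    print(f"R^2   = {fit.rvalue ** 2:.3f}")  # small, about 0.02
    print(f"p     = {fit.pvalue:.1e}")       # tiny, thanks to n = 5000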

Just thinking out loud, hehe.





Re: can multicollinearity force a correlation?

2002-02-09 Thread Wuzzy

"Jay Tanzman" <[EMAIL PROTECTED]> wrote in message 
news:<a42e88$1bthp5$[EMAIL PROTECTED]>...
> Wuzzy <[EMAIL PROTECTED]> wrote in message
> [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> > It is because I am validating a 24hr dietary recall questionnaire
> > using
> > a food frequency questionnaire:
> 

It was just an experiment.  My theory was that if you select only the
people whose 24hr questionnaire agreed with the FFQ, and take only the
20% of the 24hr variance that can be explained by the FFQ, you might
come up with a more accurate picture.

Much like the "gold standard" method of deattenuation.  It didn't work.

It is interesting, though, to re-assign food frequencies to people
using what is predicted by the 24hr recall.

Anyway, it was fun to try.





Re: can multicollinearity force a correlation?

2002-02-09 Thread Wuzzy

> And that sounds impossible.  I suspect a programming error.
> 
> -Jay

You're right; I had programmed a food database incorrectly, but I've
redone it, and yep, the correlation was only about 0.20 for kcal.
It is hard to program one database *into* another database; it is easy
to make errors.

I've made many errors in my trials.  Dumbest mistake: I coded people
who left one question blank with a dummy value, "", but I forgot to
filter those subjects out, and it altered my correlation coefficient,
because people who leave one question blank will also leave another
blank.  I got very spurious correlations, hehe.
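
That mistake is easy to reproduce in a simulation: if the same
subjects skip both questions and non-response is coded as a sentinel
value instead of being filtered out, two genuinely unrelated answers
become correlated.  A made-up sketch:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(2)
    n = 1000

    # Two genuinely unrelated survey answers.
    q1 = rng.normal(50, 10, n)
    q2 = rng.normal(50, 10, n)

    # The same 10% of subjects skip both questions; non-response is
    # (wrongly) coded as a sentinel 0 instead of being dropped.
    skipped = rng.random(n) < 0.10
    q1[skipped] = 0
    q2[skipped] = 0

    print(f"with sentinels: r = {pearsonr(q1, q2)[0]:.2f}")  # spuriously high
    keep = ~skipped
    print(f"filtered:       r = {pearsonr(q1[keep], q2[keep])[0]:.2f}")  # ~0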


One of the things I have been unable to figure out is whether you are
allowed to draw conclusions from equations with a very low R^2.  If
only 1% of the variance is predicted by your equation, but the p-value
is very small and the coefficient is very large, does that mean this
variable has a huge effect on the dependent variable?

As an example, carbohydrate has a positive effect on fasting insulin,
but I found this with an R^2 of about 0.02, while the p-value was
close to zero (something like 1E-12) and the coefficient was very
large compared to that of kcal, which I also included in the model.
I'll probably figure it out with time.





Re: can multicollinearity force a correlation?

2002-02-07 Thread Wuzzy

Hi Rich, okay, I'll post the reason why I ask:

It is because I am validating a 24hr dietary recall questionnaire
using a food frequency questionnaire.

As someone else pointed out, I got an error: amazingly, a perfect
Pearson correlation between the two.  You would think the 24hr recall
would be at least a bit attenuated, but I got a perfect correlation,
or rather an "error".

It is much more complicated than this, but that is the scoop.





Re: can multicollinearity force a correlation?

2002-02-05 Thread Wuzzy

In my own defense, I was asking a simple question: will highly
correlated predictors cause an irregularly high R^2?

My answer to my own question is "no", they can't.  No one here was
able to give me this answer, and I believe it is correct: if your
sample is large enough (as mine is), then multicollinearity cannot
inflate your R^2; it will only affect the coefficients, their signs,
and their standard errors.
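
That is easy to check with a little simulation: regressing on one
predictor versus the same predictor repeated in four different
(hypothetical) unit scales gives essentially the same R^2, even though
the individual coefficients become arbitrary.  A sketch with invented
data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n = 5000

    exposure = rng.normal(10, 2, n)
    disease = 0.9 * exposure + rng.normal(scale=2.0, size=n)

    # The same exposure expressed in four hypothetical unit scales.
    X1 = exposure.reshape(-1, 1)
    X4 = np.column_stack([exposure, exposure * 1000,
                          exposure / 100, exposure / 386.65])

    r2_one = LinearRegression().fit(X1, disease).score(X1, disease)
    r2_four = LinearRegression().fit(X4, disease).score(X4, disease)
    print(f"R^2, one predictor:        {r2_one:.3f}")
    print(f"R^2, four collinear units: {r2_four:.3f}")  # essentially identical
    # Only the combined effect is identified; the four individual
    # coefficients are arbitrary.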





Re: can multicollinearity force a correlation?

2002-02-05 Thread Wuzzy

> You made a model with the "exact same exposure in different units",
> which is something that no one would do, 

Hehe; translation: don't post messages until you've thought them
through.

Anyway, it turns out that the answer to my question is "no":
multicollinearity cannot force a correlation.  It turns out that ONE
of the variables *was* correlated, with R^2 = 0.45, and so
multicollinearity had no effect on the overall R^2.

I'm sure no one is interested in my data, as it has nothing to do with
statistics; my subject of interest is not statistics, but I need to
learn it as a tool.





can multicollinearity force a correlation?

2002-02-05 Thread Wuzzy

Is it possible that multicollinearity can force a correlation that
does not exist?

I have a very large sample of n=5,000
and have found that

disease = exposure + exposure + exposure + exposure,   R^2 = 0.45

where all four exposures are the exact same exposure in different
units, like ug/dL, mg/dL, or molar units.

Nonetheless, when I do a simple (Pearson) correlation, I found that
the exposure in ug/dL did not affect the disease.

This seems hard to believe, as my sample is relatively large.  I don't
believe the 0.45 R^2 is real, and I was shocked by it.  I'll try to
rerun it in other, more realistic models.





Re: Interpreting multiple regression: is Beta the only way?

2002-02-04 Thread Wuzzy

> 
> In biostatistical studies, either version of beta is pretty worthless.
> Generally speaking.

If I may be permitted to infer a reason: if you have

bodyweight = -a(drug) - b(exercise) + c(food)

then the standardized coefficients each relate to bodyweight, but the
predictors also affect each other.  The coefficients would only be
cleanly interpretable if drug intake were perfectly independent of
exercise and food in the population.

If they are not independent but partially collinear (say r = 0.5), is
it possible, using linear regression, to know whether the drug is
strong enough (colloquially speaking) to recommend?  I assume it would
be impossible, as a change in drug cannot be separated from a change
in exercise in the population; i.e., people are exercising and taking
the drug at the same time, so it is impossible to distinguish which
one is beneficial.

I've heard of "ridge regression" and will try to investigate this area
more.  I will probably figure it out with time, hehe.
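
For anyone curious, here is a minimal ridge regression sketch on
collinear predictors using scikit-learn; the data and effect sizes are
invented.  Ridge shrinks and stabilizes the coefficients, but it
cannot separate predictors that always move together:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(4)
    n = 500

    # Invented data in which drug use and exercise move together (r ~ 0.5).
    exercise = rng.normal(size=n)
    drug = 0.5 * exercise + rng.normal(scale=np.sqrt(0.75), size=n)
    food = rng.normal(size=n)

    # True (invented) effects: both drug and exercise lower bodyweight.
    bodyweight = (-1.0 * drug - 2.0 * exercise + 1.5 * food
                  + rng.normal(scale=1.0, size=n))

    X = np.column_stack([drug, exercise, food])
    print("OLS:  ", LinearRegression().fit(X, bodyweight).coef_.round(2))
    # Ridge trades a little bias for lower variance in the coefficients.
    print("ridge:", Ridge(alpha=10.0).fit(X, bodyweight).coef_.round(2))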





Re: how to adjust for variables

2002-01-30 Thread Wuzzy

> Walter Willett has a whole chapter on this subject in his book Nutritional
> Epidemiology.  It should be considered required reading before attempting to
> model anything that has to do with diet.


Thanks, this is a really good book (a simple guide), not just for
people wanting to study nutrition, but for surveys in general, as well
as for confounding and effect modification with multiple variables.
He has some really earth-moving examples of errors committed in the
past.  As an example, one group tried to find a correlation between
weight and disease as

disease = weight + blood pressure + heart rate + blood cholesterol

and Willett points out that they found no association, because the
implications of "weight" cannot be separated from its effects on heart
rate, etc.


Anyway, I'm currently going by the following definition of a variable
"adjusted" for variables 1, 2, and 3:

adjusted variable = variable - variable-hat

(where variable-hat represents the variable as predicted from 1, 2,
and 3 in a multivariate equation, and "variable" is just the actual
observed variable; the adjusted variable is the residual).
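
To make that concrete, here is a minimal sketch with invented data,
where a single made-up confounder stands in for variables 1, 2, and 3:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 300

    confounder = rng.normal(size=n)            # stands in for 1, 2, 3
    variable = 2.0 * confounder + rng.normal(size=n)

    # Predict the variable from the confounder, then keep the residual.
    slope, intercept = np.polyfit(confounder, variable, 1)
    variable_hat = intercept + slope * confounder
    adjusted = variable - variable_hat         # the "adjusted" variable

    # By construction the adjusted variable is uncorrelated with the
    # confounder (up to floating-point error).
    print(np.corrcoef(adjusted, confounder)[0, 1])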





Re: how to adjust for variables

2002-01-24 Thread Wuzzy

> [ ... ]
> > Is doing a univariate regression between the variable you want to
> > adjust for and your predictor the only way to adjust for values as
> 
> Univariate?  Absolutely not.  *Multiple*  regression gives 
> "partial regression coefficients."   Those  "adjust."
> 


I find it extremely difficult to interpret multivariate equations.
Are there any good books on conceptualizing the equation?

For instance, if you are assessing whether protein, fat, or
carbohydrate is important in obesity, independent of calories, do you
fit the following model:

disease = carb + protein + fat + calories

and if so, isn't the term "calories" meaningless, since it is (up to
the energy conversion factors) an exact linear combination of the
other three?  Perhaps it should not be included in the model.

I have read of studies where they use everything except "carb", as
follows:

disease = protein + fat + calories

and from here you can determine what effect substituting carb with
protein or fat will have on the disease.
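
A sketch of that substitution model with made-up intakes: in the data
below, only fat truly affects the (invented) outcome, and because carb
is the variable left out, each coefficient is read as the effect of
swapping carb kcal for that nutrient's kcal at fixed total calories:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(6)
    n = 1000

    # Made-up intakes, all in kcal.
    protein = rng.normal(300, 50, n)
    fat = rng.normal(600, 100, n)
    carb = rng.normal(1200, 200, n)
    calories = protein + fat + carb        # total energy

    # Hypothetical outcome in which only fat matters, per kcal.
    disease = 0.01 * fat + rng.normal(scale=1.0, size=n)

    # Substitution model: carb is left out, so each coefficient is the
    # effect of replacing carb kcal with that nutrient's kcal, holding
    # total calories fixed.
    X = np.column_stack([protein, fat, calories])
    coefs = LinearRegression().fit(X, disease).coef_
    print(dict(zip(["protein", "fat", "calories"], coefs.round(4))))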

It is very difficult to conceptualize, and very difficult to
understand what the word "calories" even means any more in a
multivariate model.

It seems easier to model if you use univariate-adjusted values.  I
have very little experience in statistics, as everyone can tell.

Just commenting, no real question here.  I will probably understand it
with time.





how to adjust for variables

2002-01-21 Thread Wuzzy

Pretend you want to see how fat relates to cancer risk:

fat   Kcal   cancer
 1      2      100
 2      4      120
 3      6      130
 4      8      140
 5     10      150
 6     12      160
 7     14      170
 8     16      180
 9     18      190
10     20      200

You have to adjust for KCal, but how is this done?  Is the following
the BEST way?

Method: regress fat on KCal and take the residuals.  The fit here
gives:

            Coefficient   Standard Error
Intercept        0              0
fat              2              0

so calories = 2*fat, and therefore the adjusted fat intake of person
#1 is 1*0.5 = 0.5.

Is doing a univariate regression between the variable you want to
adjust for and your predictor the only way to adjust for values, as
above?  Studies often cite that they have "adjusted" for KCal; is this
how they do it?  They usually do not specify the method.





how to adjust for variables

2002-01-21 Thread Wuzzy

Also, if you adjust by using residuals, do you still have to include
KCal in your final regression equation?

It would seem to me that you should if you have other variables that
might be confounded by KCal, but otherwise you wouldn't.





Re: Interpreting multiple regression: is Beta the only way?

2002-01-18 Thread Wuzzy

Rich Ulrich wrote in message ...

Thanks Rich, most informative.  I am trying to work out a method of
comparing apples to oranges; it seems an important thing to try to do,
though perhaps it is impossible.

I am trying to determine which is better, glycemic index or total
carbohydrate, at predicting glycemic load
(glycemic load = glycemic index * carbohydrate).

My results as a correlation matrix:

          GI load    GI      Carb
GI load    1.000
GI          .533    1.000
Carb        .858     .124   1.000

So it seems that carb affects GI load more than GI does, but this is
across ALL foods (nobody eats ALL foods, so one cannot extrapolate to
the human diet).  Still, I don't think you're allowed to make this
kind of comparison, as carb and GI are on totally different scales.

I suspected that you would be allowed to make the comparison if you
used Betas, i.e., measured how many standard-deviation changes of GI
and carb it requires.  If it takes a bigger standard-deviation change
of carb, then you could say that carb more likely has the bigger
effect on glycemic load.

You seem to suggest that even using standard-deviation changes, you
cannot compare apples to oranges.  That sounds right, but it is
disappointing.
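
For what it's worth, the pattern in the matrix is easy to reproduce
with simulated foods.  The distributions below are invented, but they
show how the multiplicative definition lets the component with the
larger relative spread (here carb) dominate the correlation with GI
load:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 2000

    # Invented food distributions: GI varies modestly, carb varies a lot.
    gi = rng.uniform(40, 100, n)        # glycemic index
    carb = rng.uniform(0, 80, n)        # grams of carbohydrate

    gi_load = gi * carb                 # definitional relationship

    corr = np.corrcoef(np.column_stack([gi_load, gi, carb]), rowvar=False)
    for label, row in zip(["GI load", "GI", "Carb"], corr):
        print(label, np.round(row, 3))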





Interpreting multiple regression: is Beta the only way?

2002-01-16 Thread Wuzzy

Suppose your beta coefficients are on different scales: say you want
to know whether temperature or pressure affects your bread baking
more.  Is the way to do this to use standardized Beta coefficients,
calculated as

Beta = beta * SDx / SDy

(SDx = standard deviation of each x; SDy = standard deviation of the
dependent variable)?

It seems that Beta coefficients are rarely cited in studies, and it
seems to me nearly worthless to know only beta (small "b"), as the raw
coefficients are on different scales and cannot be compared.

For example, a standardized regression equation such as

bread-making success = 0.5*temperature + 0.8*pressure

would mean that a one-standard-deviation change in pressure increases
your success by 0.8 standard deviations.  Is there a way of converting
this standardized coefficient to a "correlation coefficient" on a
scale of -1 to +1?  It would be useful to do this, as you want to know
the correlation coefficient of temperature after factoring out
pressure.
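
A minimal sketch of both quantities with invented bread data: the
standardized coefficients computed as beta*SDx/SDy (checked against a
regression on z-scores), plus the partial correlation of temperature
with success after factoring out pressure:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(8)
    n = 400

    temperature = rng.normal(200, 15, n)    # invented units
    pressure = rng.normal(30, 5, n)
    success = (0.02 * temperature + 0.10 * pressure
               + rng.normal(scale=1.0, size=n))

    X = np.column_stack([temperature, pressure])
    b = LinearRegression().fit(X, success).coef_

    # Standardized coefficients: Beta = beta * SDx / SDy.
    Beta = b * X.std(axis=0) / success.std()

    # The same thing via a regression on z-scores.
    z = lambda a: (a - a.mean()) / a.std()
    Beta_check = LinearRegression().fit(
        np.column_stack([z(temperature), z(pressure)]), z(success)).coef_

    # Partial correlation of temperature with success, pressure factored
    # out: correlate the residuals after regressing each on pressure.
    def residual(y, x):
        slope, intercept = np.polyfit(x, y, 1)
        return y - (intercept + slope * x)

    partial_r = np.corrcoef(residual(success, pressure),
                            residual(temperature, pressure))[0, 1]
    print(Beta.round(3), Beta_check.round(3), round(partial_r, 3))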

