What is an experiment?
I also think that an experiment is the human attempt to make sense of a chaotic world: the method is to assume chaos (H0) and then try to disprove it. So you don't always need controls, because an experiment can be run simply to show that, say, the equation for velocity is valid (a validation experiment); you disprove the null hypothesis of chaos by showing that something always occurs.

I had an argument with a colleague: I said that B may not cause disease because there was no proof that it did. They said that B did cause disease because there was no proof against it. Basically, I think I'm right in that you always start by assuming chaos, i.e. no relation between disease and exposure, and then you do your experiment. No controls are needed, I think, because sometimes you are just describing, as Jay pointed out, as with testing the velocity equation, or discovering another relationship (equation, etc.).
Re: can multicollinearity force a correlation?
> My tentative conclusion is that your 2% effect really
> is a small one; it should be difficult to discern among
> likely artifacts; and therefore, it is hardly worth mentioning.

I agree; it makes sense to me as well: fasting insulin should have more to do with error and genetics than with food and exercise. I'm not giving up, though. I tried transforming insulin after noticing odd behavior in my residuals, but it only improved R^2 marginally.

Also, I don't know whether the size of my population is making a difference. I note that most published studies look at percentiles of serum levels. That makes more sense to me: perhaps 10,000 people will have "normal" serum levels while 400 have abnormal levels, and this could affect R^2. I may also be breaking the regression assumption that you can't repeat the same points over and over, so I will try consolidating people into groups and re-running the data. I'm not sure this will make a difference, but it is how I see it done in the literature.

Statistics is interesting; it is hard to find information on the problems you come across, and they can only be tackled by running more queries from different angles. One exception: I asked a while ago whether standardized beta coefficients are valid, and the answer was shown to be "no". Curiously, I then came across a journal article on this very topic; for anyone who was following, it is "A heuristic method for estimating the relative weight of predictor variables in multiple regression" (Multivariate Behavioral Research, 35(1), 1-19, 2000). It is very interesting to read; much to comment on.
Re: Correlations-statistics
[EMAIL PROTECTED] (Holger Boehm) wrote in message news:<[EMAIL PROTECTED]>...
> Hi,
>
> I have calculated correlation coefficients between sets of parameters
> (A) and (B) and between (A) and (C).
> Now I would like to determine the correlation between (A) and (B
> combined with C). How can I combine the two parameters (B) and (C),
> what kind of statistical method has to be applied?
>
> Thanks for your tips,
>
> Holger Boehm

If B and C are not correlated at all, then your coefficients should be the same.
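If B and C are correlated, one standard option is the multiple correlation coefficient: regress A on B and C together and take the square root of the R^2. A minimal sketch in Python (the arrays a, b, c and the effect sizes are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    b = rng.normal(size=200)
    c = rng.normal(size=200)
    a = 0.6 * b + 0.3 * c + rng.normal(size=200)   # toy data

    # Regress a on b and c jointly (intercept column included).
    X = np.column_stack([np.ones_like(a), b, c])
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    fitted = X @ coef

    # The multiple correlation is the correlation between a and its
    # fitted values, i.e. the square root of the regression R^2.
    print(np.corrcoef(a, fitted)[0, 1])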
Re: can multicollinearity force a correlation?
> You should take note that R^2 is *not* a very good measure
> of 'effect size.'

Hi Rich, you asked to see my data; I've posted a visual at http://www.accessv.com/~joemende/insulin2.gif . Note that the R^2 is low even though the result agrees with common sense: insulin levels are shown here to decrease with increasing exercise as well as with decreasing food intake. My R^2 is low, but I think it is clear that the above is true. I've included several different views. "Rating" is in MET values; I forgot to multiply by body weight in kg to get kcal spent per day.
Re: can multicollinearity force a correlation?
http://www.accessv.com/~joemende/insulin2.gif

Apologies, I also forgot to divide the kcal in food by 31, as the figure represents kcal. It seems logical to me to advise decreasing food intake and increasing physical activity to improve insulin sensitivity. I would probably avoid reporting the R^2, or try a different (non-linear) model.
Re: can multicollinearity force a correlation?
> low-fat vegan diet" would be close). However, the incidence of heterozygous
> familial hypercholesterolemia is only 1:500,000, so this exposure contributes
> little to the variance in serum cholesterol in the population; its r^2 would
> be small.
>
> -Jay

Thanks. This is similar to a problem I have come across: the measurement of a serum value against an exposure. My theory is that they are correlated, but the data give an R^2 of 0.02 even though the p-value for the beta is p = 1E-40 (i.e., essentially zero). As you explain, this is possible. My reasoning is that the exposure happens many hours before the serum measurement, and that is why R^2 is low; nonetheless, the strong beta might suggest a strong effect of the exposure on the serum marker. I added the time between exposure and measurement to the equation, and it barely explained the difference; the reason is that not enough people had their serum measured 2 hours after exposure. Basically the data are inadequate, but I'm crossing my fingers that the low p-value is useful.

Anyway, what I've learned is that R^2 does not measure slope. I knew this, but it hadn't sunk in. R^2 is still very useful: for example, if you want to know the highest source of fat in the American population's diet, you would use R^2 on the food frequencies, not the beta coefficient, because R^2 tells you which food most predicts fat intake rather than the "strength" of the food's effect; a low-fat food may still be the main source of fat in the diet. Just thinking out loud.
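A quick simulation makes the R^2-versus-slope point concrete: with a large sample, a slope can be steep and astronomically significant while the predictor explains almost none of the outcome's variance. A minimal sketch (all numbers invented to mimic the R^2 = 0.02, tiny-p situation):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 10_000                              # large sample, as in the serum data
    exposure = rng.normal(size=n)
    # Steep slope, but noise dominates the variance of the outcome.
    serum = 5.0 * exposure + rng.normal(scale=35.0, size=n)

    fit = stats.linregress(exposure, serum)
    print(f"slope = {fit.slope:.2f}")
    print(f"R^2   = {fit.rvalue**2:.3f}")   # around 0.02
    print(f"p     = {fit.pvalue:.1e}")      # astronomically small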
Re: can multicollinearity force a correlation?
"Jay Tanzman" <[EMAIL PROTECTED]> wrote in message news:<a42e88$1bthp5$[EMAIL PROTECTED]>... > Wuzzy <[EMAIL PROTECTED]> wrote in message > [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > > It is because I am validating a 24hr dietary recall questionnaire > > using > > a food frequency questionnaire: > It was just an experiment. My theory was that you can select only people whose 24hr questionnaire was accurate with the ffq and only take that 20% of 24hr that can be explained by ffq then you might come up with a more accurate picture.. Much like "golden standard" method of deattenuation.. It didn't work. it is interesting to re-assign food frequencies to people by using that whichi is predicted by 24hr.. anyway it was fun to try.. = Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: can multicollinearity force a correlation?
> And that sounds impossible. I suspect a programming error.
>
> -Jay

You're right; I had programmed a food database incorrectly, but I've redone it, and the correlation was only about 0.20 for kcal. It is hard to program one database *into* another database; it is easy to make errors, and I've made many in my trials. My dumbest mistake: I coded people who left one question blank with a dummy value, "", but forgot to filter those subjects out, and it altered my correlation coefficients, because people who leave one question blank will also leave another blank. I got very spurious correlations.

One thing I have been unable to figure out is whether you are allowed to draw conclusions from equations with very low R^2. If only 1% of the variance is predicted by your equation, but the p-value is very small and the coefficient is very large, does that mean the variable has a huge effect on the dependent variable? As an example, carbohydrate had a positive effect on fasting insulin, but I found this with an R^2 of about 0.02, a p-value close to zero (around 1E-12), and a coefficient that was very large compared to kcal, which I also included in the model. I'll probably figure it out with time.
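The blank-answer mistake is worth a small demonstration, because it is so easy to make: if non-response is coded as a numeric placeholder and the same people skip several questions, the placeholders line up across variables and manufacture a correlation that isn't there. A minimal sketch (made-up data; the 10% skip rate is arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 2_000
    q1 = rng.normal(loc=50, scale=10, size=n)   # two unrelated questions
    q2 = rng.normal(loc=50, scale=10, size=n)

    # The same 10% of respondents skip both questions;
    # their answers get coded as 0 instead of being dropped.
    skipped = rng.random(n) < 0.10
    q1[skipped] = 0.0
    q2[skipped] = 0.0

    print(np.corrcoef(q1, q2)[0, 1])                      # spuriously large
    print(np.corrcoef(q1[~skipped], q2[~skipped])[0, 1])  # near zero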
Re: can multicollinearity force a correlation?
Hi Rich, okay, I'll post the reason why I asked: it is because I am validating a 24hr dietary recall questionnaire against a food frequency questionnaire. As someone else pointed out, I had an error; it gave a perfect Pearson correlation. It is much more complicated than this, but that is the scoop. Amazingly, I got a perfect correlation between the two; you would think the 24hr recall would be at least somewhat attenuated, but I got a perfect correlation, i.e., an error.
Re: can multicollinearity force a correlation?
In my own defense, I was asking a simple question: will highly correlated predictors cause an irregularly high R^2? My answer to my own question is "no", it can't. No one here gave me this answer directly, but I believe it is correct: if your sample is large enough (as mine is), multicollinearity cannot affect your R^2; it will only affect the coefficients, their signs, and their standard errors.
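One way to check this claim directly is to fit the same outcome twice: once on a single exposure, and once on that exposure plus rescaled copies of itself (the same exposure in different units). A minimal sketch, using a least-squares solver that tolerates the rank deficiency; the effect size and n are made up:

    import numpy as np

    def r_squared(X, y):
        """R^2 of the OLS fit of y on X (intercept included)."""
        Xi = np.column_stack([np.ones(len(y)), X])
        coef, *_ = np.linalg.lstsq(Xi, y, rcond=None)
        resid = y - Xi @ coef
        return 1 - resid.var() / y.var()

    rng = np.random.default_rng(3)
    n = 5_000
    exposure = rng.normal(size=n)
    disease = 0.7 * exposure + rng.normal(size=n)

    X1 = exposure.reshape(-1, 1)
    # The same exposure in three more "units" (rescaled copies).
    X4 = np.column_stack([exposure, exposure * 10, exposure * 0.1, exposure * 2])

    print(r_squared(X1, disease))   # ~0.33
    print(r_squared(X4, disease))   # identical: collinear copies add nothing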
Re: can multicollinearity force a correlation?
> You made a model with the "exact same exposure in different units",
> which is something that no one would do,

Heh, translation: don't post messages until you've thought them through. Anyway, it turns out the answer to my question is "no": multicollinearity cannot force a correlation. It turns out that ONE of the variables *was* correlated, with R^2 = 0.45, so multicollinearity had no effect on the overall R^2. I'm sure no one is interested in my data, as it has nothing to do with statistics; my subject of interest is not statistics, but I need to learn it as a tool.
can multicollinearity force a correlation?
Is it possible that multicollinearity can force a correlation that does not exist? I have a very large sample of n = 5,000 and have found that

disease = exposure + exposure + exposure + exposure,  R^2 = 0.45

where all 4 exposures are the exact same exposure in different units, like ug/dL, mg/dL, or molar units. Nonetheless, when I do a simple Pearson correlation, I find that the exposure in ug/dL does not affect the disease. This seems hard to believe, as my sample is relatively large. I don't believe the 0.45 R^2 is possible and was shocked by it. I'll try to re-run it in other, more realistic models.
Re: Interpreting multiple regression: is Beta the only way?
> > In biostatistical studies, either version of beta is pretty worthless.
> Generally speaking.

If I may be permitted to infer a reason: if you have

bodyweight = -a(drug) - b(exercise) + c(food)

then the standardized coefficients will affect bodyweight, but the predictors also affect each other. They would only be cleanly interpretable if drug intake were perfectly independent of exercise and food in the population. If they are not independent but partially collinear (say 0.5), is it possible, using linear regression, to know whether the drug is strong enough (colloquially speaking) to recommend? I assume it would be impossible, as a change in drug cannot be separated from a change in exercise in the population; i.e., people are exercising and taking the drug, so it is impossible to distinguish which one is beneficial. I've heard of "ridge regression" and will try to investigate this area more. I'll probably figure it out with time.
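For anyone else curious about ridge regression: the idea is to add a penalty lam to the diagonal of X'X, trading a little bias for more stable coefficients when predictors are collinear. A minimal sketch of the closed form on toy data (the 0.5 drug-exercise correlation and all effect sizes are invented):

    import numpy as np

    def ridge(X, y, lam):
        """Ridge coefficients: (X'X + lam*I)^-1 X'y on centered data."""
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        p = X.shape[1]
        return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

    rng = np.random.default_rng(4)
    n = 500
    exercise = rng.normal(size=n)
    # drug use correlated ~0.5 with exercise, as in the example above
    drug = 0.5 * exercise + np.sqrt(1 - 0.25) * rng.normal(size=n)
    food = rng.normal(size=n)
    bodyweight = -1.0 * drug - 1.0 * exercise + 1.0 * food + rng.normal(size=n)

    X = np.column_stack([drug, exercise, food])
    print(ridge(X, bodyweight, lam=0.0))    # ordinary least squares
    print(ridge(X, bodyweight, lam=50.0))   # shrunken, more stable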
Re: how to adjust for variables
> Walter Willett has a whole chapter on this subject in his book Nutritional
> Epidemiology. It should be considered required reading before attempting to
> model anything that has to do with diet.

Thanks; this is a really good book, not just for people wanting to study nutrition, but for surveys in general, as well as confounding and effect modification with multiple variables (a simple guide). He has some really earth-moving examples of errors committed in the past. As an example, one group tried to find a correlation between weight and disease as

disease = weight + blood pressure + heart rate + blood cholesterol

and Willett points out that they found no association, because the implications of "weight" cannot be separated from its effects on heart rate, etc.

Anyway, I'm currently going by the following definition of a variable "adjusted" for variables 1, 2, and 3:

adjusted variable = variable - variable-hat

where variable-hat is the value predicted by 1, 2, and 3 in a multivariable equation, and "variable" is the actual observed value.
Re: how to adjust for variables
> [ ... ]
> > Is doing a univariate regression between the variable you want to
> > adjust for and your predictor the only way to adjust for values as
>
> Univariate? Absolutely not. *Multiple* regression gives
> "partial regression coefficients." Those "adjust."

I find it extremely difficult to interpret multivariable equations. Are there any good books on conceptualizing the equation? For instance, if you are assessing whether protein, fat, or carbohydrate is important in obesity independent of calories, do you fit the following model:

disease = carb + protein + fat + calories

and if so, isn't the term "calories" meaningless, as it is equal to the sum of the other three? Perhaps it should not be included in the model. I have read of studies where they use everything except "carb", as follows:

disease = protein + fat + calories

and from there you can determine the effect on the disease of substituting carb with protein or fat (see the sketch below). It is very difficult to conceptualize, and very difficult to understand what "calories" means anymore in a multivariable model. It seems easier to model with univariately adjusted values. I have very little experience in statistics, as everyone can tell; just commenting, no real question here. I will probably understand it with time.
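The suspicion that "calories" becomes meaningless can be made concrete: if calories is an exact linear combination of the macronutrients (using the usual 4/4/9 kcal-per-gram Atwater factors as an assumption), the design matrix is rank deficient and the coefficients are not uniquely determined. A minimal sketch:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 1_000
    carb = rng.uniform(100, 400, n)      # grams/day, made-up ranges
    protein = rng.uniform(40, 150, n)
    fat = rng.uniform(30, 120, n)
    calories = 4 * carb + 4 * protein + 9 * fat    # exact Atwater sum

    # Design matrix with intercept, all three macros, and calories:
    X = np.column_stack([np.ones(n), carb, protein, fat, calories])
    print(X.shape[1], np.linalg.matrix_rank(X))    # 5 columns, rank 4

    # Dropping one macro (e.g. carb) restores full rank; its coefficient
    # is then read as "substituting for carb" at constant calories.
    X_sub = np.column_stack([np.ones(n), protein, fat, calories])
    print(X_sub.shape[1], np.linalg.matrix_rank(X_sub))   # 4 and 4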
how to adjust for variables
Pretend you want to see how fat relates to cancer risk:

    fat   Kcal   cancer
      1      2      100
      2      4      120
      3      6      130
      4      8      140
      5     10      150
      6     12      160
      7     14      170
      8     16      180
      9     18      190
     10     20      200

You have to adjust for Kcal, but how is this done? Is the following the BEST way?

Method: regress fat on Kcal and take the residuals. The fit gives:

                 Coefficient   Standard Error
    Intercept        0               0
    fat              2               0

so Kcal = 2*fat, equivalently fat = 0.5*Kcal, and the adjusted fat intake of person #1 is its residual: 1 - (0.5 * 2) = 0. (With these made-up numbers, fat is an exact multiple of Kcal, so every residual comes out 0.)

Is doing a univariate regression between the variable you want to adjust for and your predictor the only way to adjust for values as above? Studies often state that they have "adjusted" for Kcal; is this how they do it? They usually do not specify the method.
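For what it's worth, here is a minimal sketch of the residual method being asked about, on simulated data where fat is only loosely tied to Kcal (unlike the toy table above, where every residual is zero); all effect sizes are made up:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n = 300
    kcal = rng.normal(2000, 400, size=n)
    fat = 0.03 * kcal + rng.normal(0, 8, size=n)   # fat tracks energy, noisily
    cancer = 0.5 * fat + 0.01 * kcal + rng.normal(0, 5, size=n)

    # Step 1: regress fat on kcal; the residuals are "energy-adjusted fat".
    f = stats.linregress(kcal, fat)
    fat_adj = fat - (f.intercept + f.slope * kcal)

    # Step 2: relate the outcome to the adjusted exposure.
    print(stats.linregress(fat_adj, cancer))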
how to adjust for variables
Also, if you adjust by using residuals, do you still have to include Kcal in your final regression equation? It seems to me that you should if you have other variables that might be confounded by Kcal, but otherwise you wouldn't.
Re: Interpreting multiple regression: is Beta the only way?
Rich Ulrich <[EMAIL PROTECTED]> wrote in message ...

Thanks Rich, most informative. I am trying to find a method of comparing apples to oranges; it seems an important thing to try to do, though perhaps it is impossible. I am trying to determine which is better, glycemic index or total carbohydrate, at predicting glycemic load (glycemic load = glycemic index * carbohydrate). My results as a correlation matrix:

               GI load     GI    Carb
    GI load     1.000
    GI           .533    1.000
    Carb         .858     .124   1.000

So it seems that Carb affects GI load more than GI does, but this is across ALL foods (nobody eats ALL foods, so this cannot be extrapolated to the human diet). But I don't think you're allowed to make this kind of comparison, as Carb and GI are on totally different scales. I suspected the comparison would be allowed if you use Betas, i.e., measure how many standard-deviation changes of GI and of Carb it requires: if it takes a bigger standard-deviation change of Carb, then you could say that Carb more likely has the bigger effect on glycemic load. You seem to suggest that even using standard-deviation changes, you cannot compare apples to oranges; which sounds right, but is disappointing.
Interpreting multiple regression: is Beta the only way?
If your beta coefficients are on different scales (say you want to know whether temperature or pressure affects your bread baking more), is the way to do this to use Beta coefficients calculated as

Beta = beta * SDx / SDy

(SDx = standard deviation of each x; SDy = standard deviation of the dependent variable)?

It seems that Beta coefficients are rarely cited in studies, and it seems to me worthless to know beta (small "b") alone, as you are not allowed to compare raw coefficients on different scales. For example, a standardized regression equation

bread-making success = 0.5*temperature + 0.8*pressure

would mean that a one-standard-deviation change in pressure increases your success by 0.8 standard deviations. Is there a way of converting this standardized coefficient to a "correlation coefficient" on a scale of -1 to +1? It would be useful, as you want to know the correlation coefficient of temperature after factoring out pressure.
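A minimal sketch of both quantities on made-up bread-baking data: standardized Betas (equivalently, ordinary least squares on z-scored variables, which matches Beta = beta*SDx/SDy), and the partial correlation of temperature with success after factoring out pressure, which does live on the -1 to +1 scale:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n = 200
    temp = rng.normal(180, 15, size=n)       # made-up units
    pressure = rng.normal(100, 20, size=n)
    success = 0.4 * temp + 0.3 * pressure + rng.normal(0, 10, size=n)

    def zscore(v):
        return (v - v.mean()) / v.std()

    # Standardized Betas: OLS on z-scored variables.
    X = np.column_stack([np.ones(n), zscore(temp), zscore(pressure)])
    betas, *_ = np.linalg.lstsq(X, zscore(success), rcond=None)
    print("standardized Betas:", betas[1:])

    # Partial correlation of temp with success, factoring out pressure:
    # correlate the residuals of each after regressing on pressure.
    def resid(y, x):
        f = stats.linregress(x, y)
        return y - (f.intercept + f.slope * x)

    print("partial r:", np.corrcoef(resid(success, pressure),
                                    resid(temp, pressure))[0, 1])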