Dear all,

I will try to be as clear as I can. Say I have a dataset with 10 variables: 4 of them represent a phenomenon that I call Y, and the other 6 represent another phenomenon that I call X.
Each of the 10 variables contains 37 units, which are simply the respondents to my survey. Since all the questions use a Likert scale, the variables are qualitative. The scale runs from 0 to 7 for all of them, but missing answers are coded as "-1" and "-2", so the recorded values actually range from -2 to 7.

What I want to do is compute the regression of my Y (4 variables, 37 answers each) on my X (6 variables, same respondents). I know that for qualitative analyses I should use ANOVA rather than regression, although I have read somewhere that regression is also possible.

So far I have tried the following:
__________________________________________________________________________________________________________
> apply(Y, 1, function(Y) mean(Y[Y>0]))   # mean per row (respondent), keeping only values > 0: this drops the -1/-2 codes (and any 0 answers)
> Y.reg<- c(apply(Y, 1, function(Y) mean(Y[Y>0])))   # store it as the vector Y.reg: 1 variable with 37 numbers
> apply(X, 1, function(X) mean(X[X>0]))
> X.reg<- c(apply(X, 1, function(X) mean(X[X>0])))   # same for X: 1 variable with 37 numbers
> reg1<- lm(Y.reg~ X.reg)   # fit the first regression
> summary(reg1)   # see the results

Call:
lm(formula = Y.reg ~ X.reg)

Residuals:
     Min       1Q   Median       3Q      Max
-2.26183 -0.49434 -0.02658  0.37260  2.08899

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.2577     0.4986   8.539 4.46e-10 ***
X.reg         0.1008     0.1282   0.786    0.437
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7827 on 35 degrees of freedom
Multiple R-squared:  0.01736,   Adjusted R-squared:  -0.01072
F-statistic: 0.6182 on 1 and 35 DF,  p-value: 0.437

> layout(matrix(1:4,2,2))   # graphical approach: 2 x 2 grid for the diagnostic plots
> plot(reg1)

Please see the attached pdf() output.
________________________________________________________________________________________________________

But as you can see, even though I collapse Y (4 variables) and X (6 variables) into one averaged variable each, and leave out the negative values as well, I get a very low R^2.

If I try ANOVA instead, I run into this problem:
________________________________________________________________________________________________________
> Ymatrix<- as.matrix(Y)
> Xmatrix<- as.matrix(X)
# here Y and X are both in their original form, i.e. made up of several variables (4 and 6)
# and with the negative values included

Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class "c('matrix', 'integer', 'numeric')"
________________________________________________________________________________________________________

To be honest, a few days ago I did manage to run anova, but unfortunately I do not remember how, and I did not save the command anywhere.

What I would like to know is:
- First of all, is my approach to the problem wrong?
- What do you think of the regression output?
- Finally, how can I run the ANOVA, if I need to run it at all?

I really hope I have been clear. Thank you all for any kind of help.

Best,
Andrea
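P.S. In case it helps, here is a minimal, self-contained sketch of the averaging and regression steps described above. It uses simulated answers, not my real survey data, so its output will not match the summary I posted. The seed, the simulated values and the object name `codes` are placeholders for illustration only. One deliberate difference from my commands: recoding the negative codes to NA keeps valid 0 answers, which a `> 0` filter would drop.

```r
set.seed(1)  # arbitrary seed, only so the simulated example is repeatable

# Simulated answers (NOT the real survey): 37 respondents,
# 4 items for Y and 6 items for X, values 0..7 plus the missing codes -1 and -2
n <- 37
codes <- c(-2, -1, 0:7)
Y <- as.data.frame(matrix(sample(codes, n * 4, replace = TRUE), nrow = n))
X <- as.data.frame(matrix(sample(codes, n * 6, replace = TRUE), nrow = n))

# Recode the negative missing-answer codes as NA; valid 0 answers are kept
Y[Y < 0] <- NA
X[X < 0] <- NA

# One score per respondent: the mean of the items that were answered
Y.reg <- rowMeans(Y, na.rm = TRUE)
X.reg <- rowMeans(X, na.rm = TRUE)

# Same simple regression as in my post, on the averaged scores;
# lm() drops respondents whose score is missing (default na.action)
reg1 <- lm(Y.reg ~ X.reg)
summary(reg1)
```

A respondent with no valid answer in a block gets NaN as the score, and lm() leaves that respondent out of the fit.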
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.