complete.cases() returns a logical vector that selects the cases with no missing values, but subsetting with it does not change the rownames of the data.frame. Those rownames are carried through to the factor scores, so you could link the scores back to the original rows that way. Alternatively, you could use na.exclude as the na.action in the call to factanal() instead of subsetting with complete.cases(). That pads the output with NAs for the cases that have missing data.
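The rowname route can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame dat, its columns v1 to v4, and the new column f1 are made up for the example and are not from the original data.

```r
# Hypothetical toy data: four indicators driven by one common factor.
set.seed(1)
n <- 50
f <- rnorm(n)
dat <- data.frame(v1 = f + rnorm(n, sd = 0.5),
                  v2 = f + rnorm(n, sd = 0.5),
                  v3 = f + rnorm(n, sd = 0.5),
                  v4 = f + rnorm(n, sd = 0.5))
dat$v1[c(3, 8)] <- NA
dat$v3[15] <- NA

keep <- complete.cases(dat)      # TRUE for rows without missing values
fa <- factanal(dat[keep, ], factors = 1, scores = "regression")

# The subset keeps the original rownames ("1", "2", "4", ...), and so do
# the scores, so match() finds the row each score belongs to.
idx <- match(rownames(fa$scores), rownames(dat))
dat$f1 <- NA_real_
dat$f1[idx] <- fa$scores[, 1]
```

Rows 3, 8 and 15 end up with NA in f1, and every other row gets its own score in the original order.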
To use na.exclude, you have to use the formula version of factanal():

> set.seed(42)
> example <- data.frame(x=rnorm(15), y=rnorm(15), z=rnorm(15))
> to.na <- cbind(sample.int(15, 3), sample.int(3, 3))
> example[to.na] <- NA
> out <- factanal(~x+y+z, 1, data=example, na.action=na.omit, scores="regression")
> out$scores   # With na.omit the cases with missing values are gone as indicated
       Factor1 # by the missing row numbers
2  -0.92604879
4   0.10731539
5  -0.24370504
6   0.07357697
7   0.69905895
8  -0.17646575
9   1.58430095
10 -0.35934769
12  1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682
> out <- factanal(~x+y+z, 1, data=example, na.action=na.exclude, scores="regression")
> out$scores   # With na.exclude, the cases are kept out of the analysis but the rows are
       Factor1 # preserved in the factor scores output
1           NA
2  -0.92604879
3           NA
4   0.10731539
5  -0.24370504
6   0.07357697
7   0.69905895
8  -0.17646575
9   1.58430095
10 -0.35934769
11          NA
12  1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL Petr
Sent: Friday, July 26, 2013 9:06 AM
To: s00123...@myacu.edu.au; 'Justin Delahunty'; r-help@r-project.org
Subject: Re: [R] Maintaining data order in factanal with missing data

Hi

There are probably better options but

merge(data.frame(x=1:154), data.frame(x=names(ab.1.fa[[1]]), y=ab.1.fa[[1]]), all.x=T)

gives you a data frame with NA wherever there was a missing value in the first data.frame. You can probably automate the process a bit with the nrow function.
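A minimal sketch of this merge idea on made-up numbers, assuming the names of the score vector coincide with the case ID (dmid in the thread). The vectors all.ids and scores are hypothetical stand-ins for 1:154 and ab.1.fa[[1]]; converting the names to integer keeps the key column numeric so the result sorts in case order.

```r
# Hypothetical stand-in for a named score vector: 10 cases in total,
# cases 3 and 7 were dropped because of missing values.
all.ids <- 1:10
scores <- c(0.4, -1.2, 0.3, 0.9, -0.5, 1.1, -0.7, 0.2)
names(scores) <- setdiff(all.ids, c(3, 7))

# all.x = TRUE keeps every ID from the first data frame;
# IDs without a score get NA.
full <- merge(data.frame(dmid = all.ids),
              data.frame(dmid = as.integer(names(scores)), score = scores),
              all.x = TRUE)
```

Using nrow() of the original data frame in place of the literal 10 (or 154) would avoid hard-coding the number of cases.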
Regards
Petr

> -----Original Message-----
> From: Justin Delahunty [mailto:a...@genius.net.au]
> Sent: Friday, July 26, 2013 3:34 PM
> To: PIKAL Petr; 'Justin Delahunty'; 'Justin Delahunty'; r-help@r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
>
> Hi Petr,
>
> So sorry, I accidentally attached the complete data set rather than the
> one with missing values. I've attached the correct file to this email.
> RE: init.dfs() being local, I hadn't even thought of that. I've been away
> from OOP for close to 15 years now, so it might be time to revise!
>
> The problem I have is that with missing values the list of factor
> scores returned (ab.w1.fa$factor.scores) does not map onto the
> originating data frame (ab.w1.df) as it no longer includes the cases
> which had missing values. So while the original data set for ab.w1.df
> contains 154 ordered cases, the factor analysis contains only 150.
>
> I am seeking a way to map the values derived from the factor analysis
> (ab.w1.fa$factor.scores) back to their original ordered position, so
> that these factor score variables may be merged back into the master
> data frame (ab.df). A unique ID for each case is available ($dmid)
> which I had thought to use when merging the new variables, however I
> don't know how to implement this.
>
>
> Thanks for your help,
>
> Justin
>
>
> -----Original Message-----
> From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
> Sent: Friday, 26 July 2013 10:59 PM
> To: Justin Delahunty; Justin Delahunty; r-help@r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
>
> Hi
>
> Well, the function init.dfs does nothing, as the data frames created
> inside it do not propagate to the global environment and the function
> returns nothing.
>
> The last line (when used outside a function) gives warnings but there
> is no sign of error.
>
> When
>
> > head(ab.1.df)
>   dmid   g5oab2      g53      g54      g55   g5ovb1
> 1    1 1.418932 1.805227 2.791152 3.624116 3.425586
> 2    2 2.293907 1.187830 1.611237 1.748526 3.816533
> 3    3 2.836536 2.679523 1.279639 2.674986 2.452395
> 4    4 1.872259 3.278359 1.785872 2.458315 1.146480
> 5    5 1.467195 1.180747 3.564127 3.007682 2.109506
> 6    6 3.098512 3.151974 3.969379 3.750571 1.497358
> > head(ab.2.df)
>   dmid   w2oab3      w22      w23      w24   w2ovb1
> 1    1 4.831362 5.522764 7.809366 6.969172 7.398385
> 2    2 6.706346 4.101742 1.434697 5.266775 5.357641
> 3    3 3.653806 2.666885 1.209326 5.125556 4.963374
> 4    4 7.221255 7.649152 6.540398 6.648506 2.576081
> 5    5 1.848023 5.044314 2.761881 3.307220 1.454234
> 6    6 7.606429 4.911766 2.034813 2.638573 2.818834
> > head(ab.3.df)
>   dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
> 1    1 5.835609 6.108220 6.587721 2.451461 2.785467
> 2    2 4.973198 1.196815 6.388056 1.110877 4.226463
> 3    3 3.800367 6.697287 5.235345 6.666829 6.319073
> 4    4 1.093141 1.477773 2.269252 3.194978 4.916342
> 5    5 1.975060 7.204516 4.825435 1.775874 3.484027
> 6    6 3.273361 2.243805 5.326547 5.720892 6.118723
>
>
> > str(ab.1.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
>   ..- attr(*, "names")= chr [1:5] "g5oab2" "g53" "g54" "g55" ...
> > str(ab.2.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
>   ..- attr(*, "names")= chr [1:5] "w2oab3" "w22" "w23" "w24" ...
> > str(ab.3.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
>   ..- attr(*, "names")= chr [1:5] "w3oab3" "w3oab4" "w3oab7" "w3oab8" ...
>
> Anyway I have no idea what you consider wrong?
>
> Regards
> Petr
>
>
> > -----Original Message-----
> > From: Justin Delahunty [mailto:a...@genius.net.au]
> > Sent: Friday, July 26, 2013 2:22 PM
> > To: PIKAL Petr; 'Justin Delahunty'; r-help@r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with missing data
> >
> > Hi Petr,
> >
> > Thanks for the quick response. Unfortunately I cannot share the data I
> > am working with, however please find attached a suitable R workspace
> > with generated data. It has the appropriate variable names, only the
> > data has been changed.
> >
> > The last function in the list (init.dfs()) I call to subset the
> > overall data set into the three waves, then conduct the factor
> > analysis on each (1 factor CFA); it's just in a function to ease
> > re-typing in a new workspace.
> >
> >
> > Thanks,
> >
> > Justin
> >
> > -----Original Message-----
> > From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
> > Sent: Friday, 26 July 2013 7:35 PM
> > To: Justin Delahunty; r-help@r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with missing data
> >
> > Hi
> >
> > You provided functions, so far so good. But without data it would be
> > quite difficult to understand what the functions do and where could be
> > the issue.
> >
> > I suspect combination of complete cases selection together with subset
> > and factor behaviour. But I can be completely out of target too.
> >
> > Petr
> >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of s00123...@myacu.edu.au
> > > Sent: Friday, July 26, 2013 9:35 AM
> > > To: r-help@r-project.org
> > > Subject: [R] Maintaining data order in factanal with missing data
> > >
> > > Hi,
> > >
> > > I'm new to R, so sorry if this is a simple answer. I'm currently
> > > trying to collapse some ordinal variables into a composite; the
> > > program ideally should take a data frame as input, perform a factor
> > > analysis, compute factor scores, sds, etc., and return the rescaled
> > > scores and loadings. The difficulty I'm having is that my data set
> > > contains a number of NA, which I am excluding from the analysis
> > > using complete.cases(), and thus the incomplete cases are "skipped".
> > > These functions are for a longitudinal data set with repeated waves
> > > of data, so the final rescaled scores from each wave need to be saved
> > > as variables grouped by a unique ID (DMID).
> > > The functions I'm trying to implement are as follows:
> > >
> > > weighted.sd<-function(x,w){
> > >     sum.w<-sum(w)
> > >     sum.w2<-sum(w^2)
> > >     mean.w<-sum(x*w)/sum(w)
> > >     x.sd.w<-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
> > >     return(x.sd.w)
> > > }
> > >
> > > re.scale<-function(f.scores, raw.data, loadings){
> > >     fz.scores<-(f.scores+mean(f.scores))/(sd(f.scores))
> > >     means<-apply(raw.data,1,weighted.mean,w=loadings)
> > >     sds<-apply(raw.data,1,weighted.sd,w=loadings)
> > >     grand.mean<-mean(means)
> > >     grand.sd<-mean(sds)
> > >     final.scores<-((fz.scores*grand.sd)+grand.mean)
> > >     return(final.scores)
> > > }
> > >
> > > get.scores<-function(data){
> > >     fact<-factanal(data[complete.cases(data),],factors=1,scores="regression")
> > >     f.scores<-fact$scores[,1]
> > >     f.loads<-fact$loadings[,1]
> > >     rescaled.scores<-re.scale(f.scores, data[complete.cases(data),], f.loads)
> > >     output.list<-list(rescaled.scores, f.loads)
> > >     names(output.list)<- c("rescaled.scores", "factor.loadings")
> > >     return(output.list)
> > > }
> > >
> > > init.dfs<-function(){
> > >     ab.1.df<-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))
> > >     ab.2.df<-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))
> > >     ab.3.df<-subset(ab.df,,select=c(dmid, w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
> > >
> > >     ab.1.fa<-get.scores(ab.1.df[-1])
> > >     ab.2.fa<-get.scores(ab.2.df[-1])
> > >     ab.3.fa<-get.scores(ab.3.df[-1])
> > > }
> > >
> > > Thanks for your help,
> > >
> > > Justin
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
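Two points from the thread, that a function must return what it computes and that na.exclude keeps the scores aligned with the input rows, can be combined in a rough sketch like the one below. Everything here is an assumption for illustration: get.scores.na(), the simulated ab.df, its columns v1 to v4, and wave1.score are hypothetical names, and the rescaling step from the original functions is left out.

```r
# Sketch only: a variant of get.scores() that uses the formula interface
# so that na.action = na.exclude is honoured by factanal().
get.scores.na <- function(data) {
  fml <- reformulate(names(data))     # builds ~ v1 + v2 + ... from the columns
  fact <- factanal(fml, factors = 1, data = data,
                   na.action = na.exclude, scores = "regression")
  # Return the results; objects created inside a function are otherwise lost.
  list(scores = fact$scores[, 1], loadings = fact$loadings[, 1])
}

# Hypothetical master data frame with an ID column and one wave of items.
set.seed(7)
n <- 40
f <- rnorm(n)
ab.df <- data.frame(dmid = seq_len(n),
                    v1 = f + rnorm(n, sd = 0.5),
                    v2 = f + rnorm(n, sd = 0.5),
                    v3 = f + rnorm(n, sd = 0.5),
                    v4 = f + rnorm(n, sd = 0.5))
ab.df$v2[c(5, 12)] <- NA

wave1 <- get.scores.na(ab.df[-1])   # drop dmid before the factor analysis
ab.df$wave1.score <- wave1$scores   # one score per row, NA for cases 5 and 12
```

Because the padded scores have one entry per input row, they can be assigned straight back to the master data frame without a merge; dmid stays available if a merge by ID is preferred.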