Input: a data frame with 300+ columns for a regression. It consists of sets of
factors whose names share the same structure. For example, aaa1, aaa2, aaa3
could be one set of factors.
After reading in the data frame, I would like to compute the density
(% of nonzero entries) for certain groups of factors and delete the factors
which fall below a density threshold. I would like to use regular expressions
to specify the factor names.
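density.factor <- c("^aaa", "^bbb")   # regular expressions naming the factor groups
density.faccol <- c()                 # column indices matched so far
for (fac in density.factor) {
  # append the indices of the columns whose names match this pattern
  density.faccol <- c(density.faccol, grep(fac, names(data.df)))
}
# drop the matched columns (the density test itself is omitted here);
# guard the empty case, since data.df[, -integer(0)] would select no columns
if (length(density.faccol) > 0) data.df <- data.df[, -density.faccol, drop = FALSE]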
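A minimal sketch of the density step that the loop above leaves out. The toy
data frame, the 0.5 threshold, and the helper name drop.sparse are assumptions
made up for illustration; density is taken to be the fraction of nonzero
entries in a column, and the columns are assumed to contain no NA values.

```r
# Toy data (hypothetical): two "aaa" columns, one "bbb" column, one unrelated column
data.df <- data.frame(
  aaa1 = c(0, 0, 0, 1),   # density 0.25
  aaa2 = c(1, 2, 0, 3),   # density 0.75
  bbb1 = c(0, 0, 0, 0),   # density 0.00
  y    = c(0, 0, 0, 0)    # matched by no pattern, so never tested
)
density.factor <- c("^aaa", "^bbb")
density.threshold <- 0.5   # assumed cutoff

# drop.sparse is a hypothetical helper: among the columns whose names match
# any pattern, remove those whose fraction of nonzero entries is below threshold
drop.sparse <- function(df, patterns, threshold) {
  idx <- c()
  for (p in patterns) idx <- c(idx, grep(p, names(df)))
  idx <- unique(idx)                      # a column may match several patterns
  if (length(idx) == 0) return(df)        # nothing matched, nothing to drop
  dens <- colMeans(df[, idx, drop = FALSE] != 0)   # fraction of nonzeros per column
  drop <- idx[dens < threshold]
  if (length(drop) > 0) df <- df[, -drop, drop = FALSE]
  df
}

result.df <- drop.sparse(data.df, density.factor, density.threshold)
names(result.df)   # "aaa2" "y"
```

Only columns matched by one of the patterns are candidates for removal, so y
survives here even though it is all zeros.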
Is there a way to avoid the for loop? The following seems to work:
lapply(density.factor, grep, names(data.df))
However, that produces a list of index vectors, one per regular expression,
which need to be merged. In the above example there are 2 regular expressions,
so the list has two elements, but in the general case there will be many more.
Questions: (i) how do I merge the list elements into a single vector of column
indices, and (ii) is there a better way to achieve the "vectorized" grep?
Thanks.
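One possible answer to both questions, offered as a sketch rather than a
definitive solution: unlist() flattens the list returned by lapply() into a
single integer vector, and pasting the patterns together with "|" lets one
grep() call do all the matching. The column names below are made up for the
example and stand in for names(data.df).

```r
# Hypothetical column names standing in for names(data.df)
col.names <- c("aaa1", "aaa2", "bbb1", "ccc1", "y")
density.factor <- c("^aaa", "^bbb")

# (i) flatten the per-pattern index vectors into one vector; unique() guards
#     against a column that matches more than one pattern
idx.list <- unique(unlist(lapply(density.factor, grep, col.names)))

# (ii) combine the patterns into one alternation, "^aaa|^bbb", and grep once
idx.once <- grep(paste(density.factor, collapse = "|"), col.names)

idx.list   # 1 2 3
idx.once   # 1 2 3
```

Both approaches select the same set of columns. The unlist() version returns
indices in pattern order, while the single grep() returns them in column order,
which makes no difference when the result is only used to drop columns.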