Hello, I have a binary matrix of 80k sets (each set is a combination of cities) by 885 cities (dimension = 80k x 885). In the matrix, 1 means the city is part of the set and 0 means it is not.
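For anyone reproducing this, here is a minimal sketch of how a 0/1 set-by-city matrix like this can be built in base R, assuming the sets start out as a list of city names. The set and city names are hypothetical toy data, not the real 80k x 885 matrix.

```r
# Hypothetical toy data: 4 sets over 5 cities (names made up for illustration)
cities <- c("A", "B", "C", "D", "E")
sets <- list(
  set1 = c("A", "B"),
  set2 = c("B", "C", "D"),
  set3 = c("A", "E"),
  set4 = c("C", "D")
)

# One row per set, one column per city: 1 if the city belongs to the set, else 0.
# vapply returns one column per set, so transpose to get sets as rows.
city.toy <- t(vapply(sets,
                     function(members) as.integer(cities %in% members),
                     integer(length(cities))))
colnames(city.toy) <- cities

print(city.toy)
#      A B C D E
# set1 1 1 0 0 0
# set2 0 1 1 1 0
# set3 1 0 0 0 1
# set4 0 0 1 1 0

# Row sums give the number of cities in each set (2, 3, 2, 2 here)
rowSums(city.toy)
```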
Sets are rows and cities are columns (the matrix is called city.test). I want to do feature reduction so that I keep only the important sets (most likely 2 to 10 sets of city combinations) and their associated cities. I chose SVD and followed the steps below, but I am not sure how to go about the next step. Could anyone help with this?

s <- svd(city.test)
D <- diag(s$d)
d2 <- (s$d)^2
ratio <- cumsum(d2 / sum(d2))  # cumulative proportion of total variance over the 885 PCs

Looking at the plots, the first ~10 to 20 PCs seem to explain most of the variation (please see the attached plot). How do I use this to extract the most relevant sets from my original matrix? Could you please help?

A friend of mine recommended plotting rowSums(abs(s$u*s$d)) and keeping only the sets with the highest magnitude. I didn't understand the significance of this. My guess is that it reflects that the first PC contributes the most, so we only care about rowSums(abs(u*d)). Is this correct?

Thanks.
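A minimal sketch of these steps on simulated data, offered as one possible heuristic rather than a definitive way to pick sets. Everything here is an assumption for illustration: the stand-in matrix city.sim (500 sets x 40 cities, about 10% ones), the seed, the 80% variance cutoff used to choose k, and keeping the top 10 sets and top 5 cities. Two points the sketch tries to show. First, s$u * s$d does not scale column j of u by d[j]: R stores matrices column by column, so the length-885 vector d is recycled down the columns; sweep(s$u, 2, s$d, "*") (or s$u %*% diag(s$d)) does the intended scaling. Second, the columns are centred here, which is my assumption and differs from the original post: on the raw 0/1 matrix, d^2/sum(d^2) is a share of the total sum of squares rather than of variance. Also note that over all components the Euclidean length of each row of U D equals that of the corresponding data row (U D = X V with V orthogonal), so an untruncated score mostly reflects how many cities a set contains; restricting it to the first k columns is what ties it to the dominant components.

```r
set.seed(1)
# Simulated stand-in for city.test (size and density are assumptions)
n.sets <- 500; n.cities <- 40
city.sim <- matrix(rbinom(n.sets * n.cities, 1, 0.1), nrow = n.sets,
                   dimnames = list(paste0("set", 1:n.sets),
                                   paste0("city", 1:n.cities)))

# Centre columns so d^2 / sum(d^2) is a proportion of variance, as in PCA
X <- scale(city.sim, center = TRUE, scale = FALSE)
s <- svd(X)
d2 <- s$d^2
ratio <- cumsum(d2 / sum(d2))  # cumulative proportion of variance

# Smallest k reaching 80% of the variance (arbitrary threshold)
k <- which(ratio >= 0.8)[1]

# Scale column j of U by d[j]; s$u * s$d would recycle d down the columns
ud <- sweep(s$u, 2, s$d, "*")

# Score each set by its absolute coordinates on the first k components only
score <- rowSums(abs(ud[, 1:k, drop = FALSE]))
names(score) <- rownames(city.sim)
top.sets <- names(sort(score, decreasing = TRUE))[1:10]

# Cities with the largest absolute loadings on the first component (columns of V)
top.cities <- colnames(city.sim)[order(abs(s$v[, 1]), decreasing = TRUE)[1:5]]
```

For the real 80k x 885 matrix, svd(X, nu = k, nv = k) avoids keeping all 885 left singular vectors once k has been chosen.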
variance-cities.pdf
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.