Hello,

I have a binary matrix of 80k sets (each set is a combination of cities)
by 885 cities (dimension = 80k x 885). In the matrix, 1 means the city is
part of the set and 0 means it is not.

Sets are rows and cities are columns (city.test).
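
To make the layout concrete, here is a minimal sketch of the same kind of
matrix on a toy scale. The set names, city names and memberships are made
up for illustration; only the rows = sets, columns = cities layout matches
city.test.

```r
# Toy stand-in for city.test: 4 sets (rows) by 4 cities (columns).
# All names and memberships below are hypothetical.
sets <- list(
  set1 = c("A", "B"),
  set2 = c("B", "C", "D"),
  set3 = c("A"),
  set4 = c("A", "C", "D")
)
cities <- sort(unique(unlist(sets)))

# One row per set, one column per city; 1 = city is in the set, 0 = it is not.
toy <- t(sapply(sets, function(s) as.integer(cities %in% s)))
colnames(toy) <- cities
toy
```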

I want to do feature reduction to keep only the important sets (most likely
2-10 sets of city combinations) and their associated cities. I chose SVD and
am following the steps below, but I am not sure what the next step should be.
Could anyone help with this?

s <- svd(city.test)
D <- diag(s$d)
d2 <- (s$d)^2
ratio <- cumsum(d2/sum(d2))   # cumulative proportion of total variance across the 885 PCs

Looking at the plot, I see that roughly the first 10 to 20 PCs explain most
of the variation (please see the attached plot). How do I use this to extract
the most relevant sets from my original matrix? Could you please help?
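
For reference, the cumulative-proportion step can be sketched on a small
random matrix as follows. The toy matrix and the 0.90 cut-off are
assumptions chosen only for illustration. Also, because the matrix is not
column-centered here, the squared singular values are shares of the total
sum of squares, which matches variance only after centering.

```r
set.seed(1)
# Hypothetical stand-in for city.test: 200 sets by 30 cities, random 0/1.
toy <- matrix(rbinom(200 * 30, 1, 0.2), nrow = 200, ncol = 30)

s     <- svd(toy)
d2    <- s$d^2
ratio <- cumsum(d2 / sum(d2))   # cumulative share carried by components 1..j

# Smallest number of components whose cumulative share reaches the cut-off.
# The 0.90 threshold is an arbitrary assumption, not a recommendation.
k <- which(ratio >= 0.90)[1]
k
```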

A friend of mine recommended plotting rowSums(abs(s$u*s$d)) and choosing
only the highest-magnitude sets. I don't understand the significance of
this. Most probably it reflects that the first PC contributes the most,
hence we only care about rowSums(abs(u*d)). Is this correct?
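
As I understand the suggestion, it amounts to something like the sketch
below, which is my own reading and may not be what was intended. It scores
each set by the sum of its absolute coordinates on the first k components.
The toy matrix, k = 5 and the choice of the top 10 rows are all assumptions.
One thing I noticed while trying it: s$u * s$d recycles d down the rows of
u, so it does not scale column j by d[j]; sweep() (or s$u %*% diag(s$d))
does.

```r
set.seed(1)
# Hypothetical stand-in for city.test: 200 sets by 30 cities, random 0/1.
toy <- matrix(rbinom(200 * 30, 1, 0.2), nrow = 200, ncol = 30)
s   <- svd(toy)

k <- 5   # number of leading components to keep (assumed for illustration)

# Scale column j of U by d[j]; this equals toy %*% s$v[, 1:k].
scores <- sweep(s$u[, 1:k, drop = FALSE], 2, s$d[1:k], `*`)

# One score per set: total absolute coordinate on the first k components.
row_score <- rowSums(abs(scores))

# Row indices of the 10 highest-scoring sets.
top_sets <- order(row_score, decreasing = TRUE)[1:10]
toy[top_sets, ]
```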

Thanks.

Attachment: variance-cities.pdf
Description: Adobe PDF document

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
