[R] Sparse Matrices in R
I have data in i,j,r format, where r is the value at location A[i,j] of some matrix A. I need to build this matrix A, but given the sizes of i and j, I believe that a sparse format would be most appropriate. Hopefully this will still allow me to perform basic matrix manipulation such as multiplication, addition, row sums, transposition, subsetting, etc. Is there any way to achieve this goal in R?

Thanks,
Danny

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
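One way to do this is with the Matrix package, which builds a sparse matrix directly from triplet (i, j, r) data and supports the usual matrix operations. A minimal sketch (the i, j, r vectors here are made-up illustrations of the triplet format):

```r
# Sketch: build a sparse matrix from (i, j, r) triplets with the Matrix
# package. Zero entries are never stored, so large sparse A stays small.
library(Matrix)

i <- c(1, 3, 2, 5)     # illustrative row indices
j <- c(2, 1, 4, 3)     # illustrative column indices
r <- c(10, 20, 30, 40) # illustrative values

A <- sparseMatrix(i = i, j = j, x = r)  # dims inferred from max(i), max(j)

# The usual operations work on the sparse representation:
rowSums(A)      # row sums
B <- t(A)       # transposition
P <- A %*% B    # multiplication (result is also sparse)
A[1:2, ]        # subsetting
A + A           # addition
```

The triplet form matches the i,j,r input exactly, so reading the three columns from a file and passing them straight to sparseMatrix() should be all that is needed.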
[R] Calculating sum of squares deviation between 2 similar matrices
Hi all,

I've got clusters and would like to match individual records to each cluster based on a sum of squares deviation. For each cluster and individual I have the same 50 variables, measured in the same way. Matrix 1 holds the individuals and is 25000x50; matrix 2 holds the cluster centroids and is 100x50, with the variables in the same order in both.

I'd like to calculate the 'distance' from each row of matrix 1 to every row of matrix 2, and get matrix 2's distances (with row IDs 1 to 100) ranked in ascending order. I tried the rdist and dist functions, but they return true (Euclidean) distances, and all I want is the sum of squares deviation across the 50 variables. I don't know how to program the sum of squares deviation across the 50 variables and do it efficiently. Because of the size of the data I'm not sure that apply would work well here, which is why I was using a for loop. The (highly inefficient) code I was using is below, if that helps at all. I give you permission to laugh if you want; I'm not remotely close to a programmer. Are there any suggestions from the general readership?

I'm using R 1.9.0 on Windows XP with 1GB of RAM.

Thanks for your attention,
Danny

---
# Calculate distances between two sets of matrices.
library(foreign)
library(fields)

# centroid is a small file with the 100x50 centroids
centroid <- as.data.frame(read.spss("C:\\centroid.sav"))
# in_data is 25000x50
in_data <- as.data.frame(read.spss("C:\\in_vars.sav"))

# Loop through the in_data records, calculate distances to the 100 centroids,
# sort the distances in ascending order, and write out the centroid number
# and distance for all 100.
for(i in 1:nrow(in_data)) {
  # first column is the centroid number; columns 2 through 51 hold the data
  aa <- as.matrix(centroid[, 2:51])
  # first column is a unique identifier; columns 2 through 51 hold the data
  bb <- as.matrix(in_data[i, 2:51])
  # stack the in_data row on top of the 100 centroids and compute distances
  cc <- rdist(rbind(bb, aa))
  # cc is 101x101; row 1, columns 2:101 hold the distances from the
  # in_data row to all 100 centroids
  dd <- as.matrix(cc[1, 2:101])
  # sort dd on distance and attach the centroid number
  ee <- c(t(cbind(sort.list(dd), sort(dd))))
  # write the sorted distances to file
  write(ee, file = "C:\\cluster_distances.txt", ncol = 300, append = TRUE)
}
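The loop can be replaced entirely: the sum of squares deviation is just the squared Euclidean distance, and all 25000 x 100 of them can be computed at once with the identity ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c. A sketch with small stand-in matrices (X and C below are random placeholders for the real 25000x50 and 100x50 data):

```r
# Sketch: squared Euclidean (sum of squares) deviation between every row of
# X (individuals) and every row of C (centroids), fully vectorized.
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)  # stand-in for the 25000 x 50 data
C <- matrix(rnorm(10 * 50),  nrow = 10)   # stand-in for the 100 x 50 centroids

# ||x||^2 + ||c||^2 - 2 x.c for all pairs, giving an n x k matrix:
ss <- outer(rowSums(X^2), rowSums(C^2), "+") - 2 * X %*% t(C)

# For each individual, the centroid row IDs ranked by increasing deviation:
ranking <- t(apply(ss, 1, order))
```

Two matrix products and a couple of rowSums calls replace the 25000-iteration loop, so this should be dramatically faster, at the cost of holding the 25000x100 result in memory (about 20 MB of doubles, which fits in 1GB).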
Re: [R] R-crash using read.shape (maptools)
Hi Herry,

On Thu, 29 Apr 2004 12:20:44 +1000, you wrote:

>Hi List,
>
>I am trying to read a large shapefile (~37,000 polys) using read.shape
>(WinXP, 1GB RAM, Dell box). I receive the following error:
>
>AppName: rgui.exe  AppVer: 1.90.30412.0  ModName: maptools.dll
>ModVer: 1.90.30412.0  Offset: 309d
>
>getinfo.shape returns info, and the shapefile is readable in ArcMap.
>
>Any ideas on how to overcome this?
>
>Thanks Herry
>
>---
>Alexander Herr - Herry

I've had difficulty when there is too much detail in the polygon definitions (i.e. too many nodes). Try thinning the polygons and try again.

Danny
[R] How to improve this code?
Hi all,

I've got some functioning code that literally took hours to write. My R coding is getting better... it used to take days :)

I know I've done a poor job of optimizing the code. In addition, I'm missing an important step and don't know where to put it. So, three questions:

1) I'd like the resulting output to be sorted on distance (ascending) and to have the 'rank' column represent the sort order, so that rank 1 is the nearest customer and rank 10 is the 10th nearest. Where do I do this?

2) Can someone suggest ways of optimizing or improving the code? It's the only way I'm going to learn better ways of approaching R.

3) If there are no customers in a store's Trade Area, I'd like nothing written to the output file for that store. How can I do that?

All help is appreciated.

Thanks,
Danny

*
library(fields)

# Format of input files: ID, LONGITUDE, LATITUDE

# Generate store list
storelist <- cbind(1:100,
                   matrix(rnorm(100, mean = -60, sd = 3), ncol = 1),
                   matrix(rnorm(100, mean = 50, sd = 3), ncol = 1))

# Generate customer list
customerlist <- cbind(1:1,
                      matrix(rnorm(1, mean = -60, sd = 20), ncol = 1),
                      matrix(rnorm(1, mean = 50, sd = 10), ncol = 1))

# Output file
outfile <- "c:\\output.txt"
outfilecolnames <- c("rank", "storeid", "custid", "distance")
write.table(t(outfilecolnames), file = outfile, append = TRUE, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Trade Area size
TAsize <- c(100)

custlatlon <- customerlist[, 2:3]

for (i in 1:length(TAsize)) {
  for (j in 1:nrow(storelist)) {
    cat("Store: ", storelist[j, 1], " TA Size = ", TAsize[i], "\n")
    storelatlon <- storelist[j, 2:3]
    whichval <- which(rdist.earth(t(as.matrix(storelatlon)),
                                  as.matrix(custlatlon), miles = F) <= TAsize[i])
    dist <- as.data.frame(rdist.earth(t(as.matrix(storelatlon)),
                                      as.matrix(custlatlon), miles = F)[whichval])
    storetag <- as.data.frame(cbind(1:nrow(dist), storelist[j, 1]))
    fincalc <- as.data.frame(cbind(1:nrow(dist),
                                   customerlist[whichval, 1],
                                   rdist.earth(t(as.matrix(storelatlon)),
                                               as.matrix(custlatlon), miles = F)[whichval]))
    combinedata <- data.frame(storetag, fincalc)
    combinefinal <- subset(combinedata, select = c(-1, -3))
    flush.console()
    write.table(combinefinal, file = outfile, append = TRUE, sep = ",",
                col.names = FALSE)
  }
}
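One possible cleanup that addresses all three questions, as a sketch (untested against the real data; a small haversine helper stands in for fields::rdist.earth so the example needs no extra packages, and the store/customer matrices are shrunken stand-ins):

```r
# Sketch: per store, compute all customer distances once, skip stores with
# no customers in the Trade Area, sort ascending, and let the rank column
# follow the sort order. earth_km is a stand-in for fields::rdist.earth.
earth_km <- function(lonlat1, lonlat2) {
  # great-circle distance (km) from one (lon, lat) point to each row of lonlat2
  rad  <- pi / 180
  dlat <- (lonlat2[, 2] - lonlat1[2]) * rad / 2
  dlon <- (lonlat2[, 1] - lonlat1[1]) * rad / 2
  a <- sin(dlat)^2 + cos(lonlat1[2] * rad) * cos(lonlat2[, 2] * rad) * sin(dlon)^2
  6378.388 * 2 * asin(pmin(1, sqrt(a)))
}

set.seed(42)
storelist    <- cbind(1:5,  rnorm(5,  -60, 3),  rnorm(5,  50, 3))   # ID, lon, lat
customerlist <- cbind(1:50, rnorm(50, -60, 20), rnorm(50, 50, 10))  # ID, lon, lat
TAsize  <- 100
outfile <- tempfile(fileext = ".txt")  # illustrative output path

results <- list()  # kept in memory for inspection
for (j in 1:nrow(storelist)) {
  # distances from this store to every customer, computed exactly once
  d    <- earth_km(storelist[j, 2:3], customerlist[, 2:3, drop = FALSE])
  inTA <- which(d <= TAsize)
  if (length(inTA) == 0) next              # question 3: write nothing
  ord  <- inTA[order(d[inTA])]             # question 1: sort on distance
  out  <- data.frame(rank     = seq_along(ord),  # rank = sort order
                     storeid  = storelist[j, 1],
                     custid   = customerlist[ord, 1],
                     distance = d[ord])
  write.table(out, file = outfile, append = TRUE, sep = ",",
              row.names = FALSE, col.names = FALSE)
  results[[j]] <- out
}
```

The key changes: the distance matrix is computed once per store instead of three times, `next` skips empty Trade Areas before anything is written, and building the output as a data.frame with named columns removes the cbind/subset juggling.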
[R] Cluster Analysis with minimum cluster size?
Hi all,

Is it possible to run kmeans, pam or clara with a constraint such that no resulting cluster has fewer than X cases? These kmeans algorithms often find clusters that are too small for my use; there are usually a few clusters with 1-10 cases (generally substantial outliers), and I then have to manually assign the small ones to other sizable clusters. If this doesn't exist, is there an algorithm that does this?

Thanks,
Danny
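One workaround that automates the manual step described above, as a sketch (this is a post-hoc reassignment, not a true size-constrained clustering algorithm; the data, cluster count, and minsize threshold are all illustrative):

```r
# Sketch: run plain kmeans, then dissolve any cluster smaller than minsize
# by reassigning its cases to the nearest surviving centroid.
set.seed(1)
x <- rbind(matrix(rnorm(200, sd = 0.5), ncol = 2),
           matrix(rnorm(200, mean = 3, sd = 0.5), ncol = 2),
           c(20, 20))                      # one substantial outlier
km <- kmeans(x, centers = 5, nstart = 10)

minsize <- 10
small  <- which(table(km$cluster) < minsize)          # undersized clusters
keep   <- setdiff(seq_len(nrow(km$centers)), small)   # surviving clusters
assign <- km$cluster
for (i in which(assign %in% small)) {
  # squared distance from case i to each surviving centroid
  d2 <- rowSums((km$centers[keep, , drop = FALSE] -
                 matrix(x[i, ], length(keep), ncol(x), byrow = TRUE))^2)
  assign[i] <- keep[which.min(d2)]
}
```

This mirrors the manual process: small clusters (typically the outliers) are emptied and their members absorbed by the nearest sizable cluster. It does not re-optimize the centroids afterwards; repeating kmeans with the merged assignment as a starting point would tighten the result if needed.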
[R] Distance and Aggregate Data - Again...
I appreciate the help I've been given so far. The issue I face is that the data I'm working with has 53000 rows, so calculating the distances, finding all recids that fall within 2 km and summing the population, etc. (a) takes too long and (b) gives no sense of progress.

Below is a loop that reads each recid one at a time, calculates the distances and identifies the recids that fall within 2 km. It iterates through all records successfully. Where I'm stuck is how to get the sum of population and dwellings and the mean age for the records that are selected. Also, the desired output should have the following fields: recid, sum(pop), sum(dwell), mean(age). I don't know how to write only those fields out to the file. Any suggestions?

Thank you for your help,
Danny

#
library(fields)

d <- as.matrix(read.csv("filein.csv"))

for(i in 1:nrow(d)){
  # rdist.earth expects (lon, lat); the columns here are (lat, long)
  lonlat1 <- d[i, c(3, 2)]
  lonlat2 <- d[, c(3, 2)]
  # recids of all records within 2 km of record i
  distval <- d[, 1][which(rdist.earth(t(as.matrix(lonlat1)),
                                      as.matrix(lonlat2), miles = F) < 2)]
  write(distval, file = "C:\\outfile.out", ncol = 1, append = TRUE)
}

# -- Sample Input Data --
recid,lat,long,pop,dwell,age
10010265,47.5971174,-52.7039227,584,219,38
10010260,47.5971574,-52.7039147,488,188,34
10010263,47.5936538,-52.7037037,605,232,43
10010287,47.5739426,-52.7035365,548,256,29
10010290,47.570,-52.703182,559,336,36
10010284,47.5958782,-52.7013245,394,261,61
10010191,47.5322617,-52.7037037,892,323,23
10010291,47.5700412,-52.7009,0,0,0
10010289,47.5714152,-52.70023,0,0,0
10010285,47.5832183,-52.6995828,469,239,44
10010273,47.5800199,-52.6984875,855,283,28
10010190,47.472353,-52.697991,0,0,0
10010274,47.6018197,-52.6978362,344,117,51
10010288,47.5755249,-52.6978207,33,0,19
10010275,47.6005037,-52.697991,232,93,43
10010279,47.5915368,-52.6954916,983,437,33
10010276,47.5993086,-52.6954808,329,131,28
10010278,47.5958782,-52.6934253,251,107,27
10010354,47.5991086,-52.6934037,27,14,47
10010277,47.5968782,-52.6914148,515,194,37
10010293,47.5778754,-52.6954808,58,0,40
10010292,47.5722183,-52.6899332,1112,523,28
10010353,47.6356972,-52.6896838,1387,471,32
10010283,47.5958439,-52.6884621,531,296,41
10010281,47.5983891,-52.6880528,307,113,52
10010280,47.5958439,-52.6878177,374,129,18
10010282,47.5999645,-52.6880528,637,226,22
10010286,47.5797909,-52.6872042,446,280,32
10010355,47.5797609,-52.6872055,197,72,39
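A sketch of the missing aggregation step (untested on the full 53000-row file; a haversine helper stands in for fields::rdist.earth so the example is self-contained, and only the first four sample rows above are used):

```r
# Sketch: for each recid, aggregate pop/dwell/age over all records within
# 2 km and keep exactly the fields recid, sum(pop), sum(dwell), mean(age).
km_dist <- function(lat1, lon1, lat, lon) {
  # great-circle distance (km) from one point to each (lat, lon) pair
  rad <- pi / 180
  a <- sin((lat - lat1) * rad / 2)^2 +
       cos(lat1 * rad) * cos(lat * rad) * sin((lon - lon1) * rad / 2)^2
  6378.388 * 2 * asin(pmin(1, sqrt(a)))
}

# Stand-in for read.csv("filein.csv"): first four sample rows from the post.
d <- read.csv(text = "recid,lat,long,pop,dwell,age
10010265,47.5971174,-52.7039227,584,219,38
10010260,47.5971574,-52.7039147,488,188,34
10010263,47.5936538,-52.7037037,605,232,43
10010191,47.5322617,-52.7037037,892,323,23")

out <- data.frame(recid = d$recid, sumpop = NA, sumdwell = NA, meanage = NA)
for (i in 1:nrow(d)) {
  sel <- km_dist(d$lat[i], d$long[i], d$lat, d$long) < 2
  out$sumpop[i]   <- sum(d$pop[sel])
  out$sumdwell[i] <- sum(d$dwell[sel])
  out$meanage[i]  <- mean(d$age[sel])
  cat(i, "of", nrow(d), "\n")   # progress indicator
}
outpath <- tempfile(fileext = ".csv")  # illustrative output path
write.csv(out, outpath, row.names = FALSE)
```

Building the whole row as a data.frame and writing it with write.csv keeps exactly the four desired fields in the output, and the cat() call gives the sense of progress that was missing. For the speed problem, note that each distance vector needs the full 53000-row comparison; dividing the data into lat/long tiles and only comparing against neighbouring tiles is the usual way to cut that down.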