Re: [R] memory problem [cluster]
Roger == Roger Bivand [EMAIL PROTECTED]
    on Sat, 2 Dec 2006 22:11:12 +0100 (CET) writes:

    Roger> On Sat, 2 Dec 2006, Dylan Beaudette wrote:
    >> Hi Stephano,
    Roger> Looks like you used my example verbatim
    Roger> (http://casoilresource.lawr.ucdavis.edu/drupal/node/221)
    Roger> :)

    Roger> From exchanges on R-sig-geo, I believe the original questioner is feeding
    Roger> NAs to clara, and the error message in clara() is overrunning the buffer
    Roger> in sprintf(), so the memory problem isn't correctly identified. Using
    Roger> scripts out of context without checking whether the input data frame
    Roger> satisfies the conditions of the functions being used is asking for trouble.

    Roger> The error message:

    Roger>   traceback()
    Roger>   2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
    Roger>          sprintf("Observations %s have", paste(i, collapse = ","))),
    Roger>          "*only* NAs -- omit for clustering")
    Roger>   1: clara(morph, k = 5, stand = F)

    Roger> is coming from lines:

    Roger>   i[1]), sprintf("Observations %s have", paste(i,
    Roger>   collapse = ","))), "*only* NAs -- omit for clustering")

    Roger> in clara(). I have suggested dropping those rows from the data frame in a
    Roger> reply on R-sig-geo, but maybe clara() could be patched to count the # of
    Roger> completely missing rows, and if # is more than a modest number, not print
    Roger> the obs. numbers, just the total?

Yes, thanks Roger, for the hint; I have now done that (will be in cluster_1.11.4):

    data(xclara)
    xclara[sample(nrow(xclara), 50),] <- NA
    clara(xclara, k = 3)
    Error in clara(xclara, k = 3) :
      50 observations (6,95,106,191,258,294,295,321,432,601,662,702 ...)
      have *only* NAs -- na.omit() them for clustering!

Lessons to be learned (I have learned it earlier; but not scrutinized all my code to see if it's obeyed :-):

- Inside stop(..) be careful not to produce another error; particularly not a
  memory-related one, since this will give the user error messages that are not
  at all helpful.
- All non-beginner R users should be trained to routinely say 'traceback()'
  after they've seen an error.

Regards,
Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
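
The idea discussed in this message (count the completely missing rows and list only a few of them, so that the message itself cannot overflow a buffer) could be sketched roughly as follows. This is only an illustration, not the code that went into the cluster package; the function name `all_na_message` and its `max_show` argument are hypothetical.

```r
# Hypothetical helper, NOT the actual patch in the cluster package:
# describe rows that contain only NAs, listing at most `max_show` of them
# so the message length stays bounded however many rows are affected.
all_na_message <- function(x, max_show = 12) {
  i <- which(rowSums(!is.na(x)) == 0)   # rows without a single observed value
  if (length(i) == 0) return(NULL)      # nothing to report
  shown <- paste(i[seq_len(min(length(i), max_show))], collapse = ",")
  if (length(i) > max_show) shown <- paste(shown, "...")
  sprintf("%d observation(s) (%s) have *only* NAs", length(i), shown)
}

# Small synthetic example: 43 of 100 rows are completely missing.
m <- matrix(1, nrow = 100, ncol = 2)
m[c(3, 7, 20:60), ] <- NA
msg <- all_na_message(m, max_show = 5)
print(msg)
```

A caller would pass the returned string to stop() only when it is not NULL; because the list of row numbers is capped, the text stays short even when millions of rows are affected.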
Re: [R] memory problem [cluster]
Hi Stephano,

Looks like you used my example verbatim
(http://casoilresource.lawr.ucdavis.edu/drupal/node/221) :)

While my approach has not *yet* been published, the original source [4] by Roger Bivand certainly has. Just a reminder.

That said, I would highly recommend reading up on the background literature associated with both the cluster package [1] and terrain classification, i.e. [2] and [3].

Note that although the clara() function was created to work on massive datasets, it is still possible to overwhelm the available memory with multiple gridded objects; recall that all R objects are held in memory.

I have asked the maintainer of the cluster package, Martin Maechler, about integrating a known-medoid option into the clara() function, which would be extremely useful in adding some 'supervision' to landscape classification with clara(). Hopefully there will be enough requests for the feature that Martin will kindly add it :)

1. Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, 2005.
2. Blaszczynski, J. Landform characterization with geographical information systems. Photogrammetric Engineering and Remote Sensing, 1997, 63, 183-191.
3. Wood, W.F. & Snell, J.B. A Quantitative System for Classifying Landforms. U.S. Quartermaster Research & Engineering Center, 1960.
4. Bivand, R. Integrating GRASS 5.0 and R: GIS and modern statistics. Computers & Geosciences, 2000, 26, 1043-1052.

On Friday 01 December 2006 14:04, Massimo Di Stefano wrote:
> Hi to all,
>
> Frustrated by this error, today I bought a 1 GB memory module for my laptop.
> It now has 1.28 GB instead of the old 512 MB, but I get the same error :-(
> Damn! What can I do?
> To repeat: for a small area (about 20 x 20 km at res = 20 m) it works fine!
> Do you have any suggestions?
> Is there a way to check whether this error depends on my RAM or on something else?
> Thanks for any suggestions! I need your help.
> Thanks.
> Massimo
>
> On 01 Dec 2006, at 16:05, massimodisasha wrote:
>> Hi,
>> I'm trying to perform a clustering on a big data frame.
>> The code is this:
>>
>>   print("load required R packages")
>>   require(spgrass6)
>>   require(cluster)
>>   gmeta6 <- gmeta6()
>>   print("read in our 7 raster files from GRASS")
>>   x <- readFLOAT6sp(c("er","crosc","longc","slope","profc","minic","maxic"))
>>   print("assemble a matrix of our terrain variables")
>>   morph <- data.frame(cbind(x$er, x$crosc, x$longc, x$slope, x$profc, x$minic, x$maxic))
>>   print("normalize slope by dividing by max(slope)")
>>   morph <- data.frame(cbind(x$er, x$crosc, x$longc, x$slope/max(x$slope), x$profc, x$minic, x$maxic))
>>   names(morph) <- c("er","crosc","longc","slope_n","profc","minic","maxic")
>>   print("perform the clustering")
>>   morph.clara <- clara(morph, k=5, stand=F)
>>   x$morph_class <- morph.clara$clustering
>>   print("send result back to GRASS")
>>   rast.put6(x, "morph", zcol="morph_class")
>>
>> During the step "perform the clustering", after a long time, I get this error:
>>
>>   Error in sprintf(fmt, ...) :
>>     string length exceeds buffer size of 8192
>>   In addition: Warning messages:
>>   1: perl = TRUE is only implemented in UTF-8 locales
>>   2: perl = TRUE is only implemented in UTF-8 locales
>>   3: perl = TRUE is only implemented in UTF-8 locales
>>   4: perl = TRUE is only implemented in UTF-8 locales
>>   5: perl = TRUE is only implemented in UTF-8 locales
>>   6: perl = TRUE is only implemented in UTF-8 locales
>>   7: perl = TRUE is only implemented in UTF-8 locales
>>   8: character string will probably be truncated
>>   Execution halted
>>
>> If I try the same code on a subregion of my data, it works fine!
>> But for a large region I get this error :-(
>> Obviously I think that it is a memory problem, right?
>> (I'm working with a notebook: PPC, 1.33 GHz, 512 MB RAM.)
>> My data are 7 raster maps on a region of about 50 x 40 km at a resolution of 20 m.
>> Is there some workaround for the memory problems?
>> Another question: what is this?
>>
>>   Warning messages:
>>   1: perl = TRUE is only implemented in UTF-8 locales
>>   2: perl = TRUE is only implemented in UTF-8 locales
>>   3: perl = TRUE is only implemented in UTF-8 locales
>>   4: perl = TRUE is only implemented in UTF-8 locales
>>   5: perl = TRUE is only implemented in UTF-8 locales
>>   6: perl = TRUE is only implemented in UTF-8 locales
>>   7: perl = TRUE is only implemented in UTF-8 locales
>>
>> Is it about this line of the code, where I have an F (false)?
>>
>>   morph.clara <- clara(morph, k=5, stand=F)
>>
>> Thanks for any suggestion,
>> Massimo

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341
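
The remark that all R objects are held in memory can be made concrete with a rough size estimate for the grid described in the quoted post (7 layers, about 50 x 40 km at 20 m resolution). This is a back-of-the-envelope sketch only: it assumes every cell is stored as an 8-byte double and ignores object overhead, and the variable names are illustrative.

```r
# Rough memory estimate for the terrain grids described in the thread.
# Assumption: each cell value is stored as an 8-byte double; overhead is ignored.
extent_km <- c(50, 40)   # region size, from the original post
res_m     <- 20          # cell resolution in metres
n_layers  <- 7           # number of raster maps read from GRASS

cells_per_layer <- prod(extent_km * 1000 / res_m)   # 2500 x 2000 cells
bytes_per_copy  <- cells_per_layer * n_layers * 8   # one full copy of all layers
mb_per_copy     <- bytes_per_copy / 2^20

cat(sprintf("%.0f cells per layer, about %.0f MB per copy of the data\n",
            cells_per_layer, mb_per_copy))
```

Under these assumptions one copy of the seven layers is about 267 MB. The script holds the data in `x` and again in `morph`, and cbind() and data.frame() create temporary copies along the way, so a 512 MB machine is plausibly under pressure even though the error reported in this thread has a different cause.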
Re: [R] memory problem [cluster]
On Sat, 2 Dec 2006, Dylan Beaudette wrote:

> Hi Stephano,
>
> Looks like you used my example verbatim
> (http://casoilresource.lawr.ucdavis.edu/drupal/node/221) :)

From exchanges on R-sig-geo, I believe the original questioner is feeding NAs to clara, and the error message in clara() is overrunning the buffer in sprintf(), so the memory problem isn't correctly identified. Using scripts out of context without checking whether the input data frame satisfies the conditions of the functions being used is asking for trouble.

The error message:

    traceback()
    2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
           sprintf("Observations %s have", paste(i, collapse = ","))),
           "*only* NAs -- omit for clustering")
    1: clara(morph, k = 5, stand = F)

is coming from lines:

    i[1]), sprintf("Observations %s have", paste(i,
    collapse = ","))), "*only* NAs -- omit for clustering")

in clara(). I have suggested dropping those rows from the data frame in a reply on R-sig-geo, but maybe clara() could be patched to count the # of completely missing rows, and if # is more than a modest number, not print the obs. numbers, just the total?

Roger
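
Dropping the completely missing rows before clustering, as suggested in this message, might look like the sketch below. It is a minimal illustration on synthetic data, assuming the cluster package is installed; `morph` here is a small stand-in for the questioner's data frame, and `keep` and `cls` are illustrative names.

```r
library(cluster)

# Synthetic stand-in for the terrain data frame: 200 rows, 20 of them all NA
# (for example, grid cells outside the mapped area).
set.seed(1)
morph <- data.frame(a = rnorm(200), b = rnorm(200))
morph[sample(nrow(morph), 20), ] <- NA

# Keep only rows that have at least one observed value.
keep <- rowSums(!is.na(morph)) > 0

# Cluster the retained rows only.
fit <- clara(morph[keep, ], k = 2)

# Put the labels back at their original row positions; rows that were
# dropped get NA, so the result still lines up with the full grid.
cls <- rep(NA_integer_, nrow(morph))
cls[keep] <- fit$clustering

table(cls, useNA = "ifany")
```

Keeping `cls` the same length as the original data frame matters when the result is written back to a gridded object, as in the quoted script's `x$morph_class` assignment, because the cell order has to be preserved.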