Re: [R] memory problem [cluster]

2006-12-05 Thread Martin Maechler
>>>>> "Roger" == Roger Bivand [EMAIL PROTECTED]
>>>>>     on Sat, 2 Dec 2006 22:11:12 +0100 (CET) writes:

    Roger> On Sat, 2 Dec 2006, Dylan Beaudette wrote:
    >> Hi Stephano,
    >>
    >> Looks like you used my example verbatim
    >> (http://casoilresource.lawr.ucdavis.edu/drupal/node/221)
    >>
    >> :)

    Roger> From exchanges on R-sig-geo, I believe the original questioner is
    Roger> feeding NAs to clara(), and the error message in clara() is
    Roger> overrunning the buffer in sprintf(), so the memory problem isn't
    Roger> correctly identified. Using scripts out of context without checking
    Roger> whether the input data frame satisfies the conditions of the
    Roger> functions being used is asking for trouble.
    Roger> The error message:

    Roger> > traceback()
    Roger> 2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
    Roger>        sprintf("Observations %s have", paste(i, collapse = ","))),
    Roger>        " *only* NAs -- omit for clustering")
    Roger> 1: clara(morph, k = 5, stand = F)

    Roger> is coming from lines:

    Roger> i[1]), sprintf("Observations %s have", paste(i,
    Roger> collapse = ","))), " *only* NAs -- omit for clustering")

    Roger> in clara(). I have suggested dropping those rows from the data
    Roger> frame in a reply on R-sig-geo, but maybe clara() could be patched
    Roger> to count the # of completely missing rows, and if # is more than a
    Roger> modest number, not print the obs. numbers, just the total?

Yes, thanks Roger, for the hint; I have now done that
(will be in cluster_1.11.4):

   library(cluster)
   data(xclara)
   xclara[sample(nrow(xclara), 50),] <- NA
   clara(xclara, k = 3)
  Error in clara(xclara, k = 3) : 50 observations
  (6,95,106,191,258,294,295,321,432,601,662,702 ...)
  have *only* NAs -- na.omit() them for clustering!


Lessons to be learned (I have learned it earlier; but not
scrutinized all my code to see if it's obeyed :-):  

- Inside stop(..) be careful not to produce another error;
  particularly not a memory-related one, since this will give
  the user error messages that are not at all helpful.

- All non-beginner R users should be trained to routinely say
  'traceback()' after they've seen an error.
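
The first lesson can be illustrated with a small sketch. The helper below is
hypothetical (its name and the max_show parameter are assumptions for
illustration, and it is not the code that went into the cluster package). It
shows one way to keep an error message bounded however many rows are
affected, so that building the message cannot itself fail.

```r
# Hypothetical helper, not the cluster package's code: build a bounded
# message for observations that contain only NAs.
na_rows_message <- function(i, max_show = 12) {
  n <- length(i)
  if (n == 1) {
    return(sprintf("Observation %d has *only* NAs", i))
  }
  # Show at most max_show indices, then mark the rest as elided.
  shown <- paste(head(i, max_show), collapse = ",")
  if (n > max_show) shown <- paste(shown, "...")
  sprintf("%d observations (%s) have *only* NAs", n, shown)
}

# Even with thousands of offending rows the message stays short.
msg <- na_rows_message(seq_len(5000))
nchar(msg)
```

A caller would pass the result to stop(); after the error, traceback() still
shows the call that raised it.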

Regards,
Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problem [cluster]

2006-12-02 Thread Dylan Beaudette
Hi Stephano,

Looks like you used my example verbatim 
(http://casoilresource.lawr.ucdavis.edu/drupal/node/221)

:)

While my approach has not *yet* been published, the original source [4] by 
Roger Bivand certainly has. Just a reminder.

That said, I would highly recommend reading up on the background literature
associated with both the cluster package [1] and terrain classification, i.e.
[2] and [3]. Note that although the clara() function was created to work on
massive datasets, it is still possible to overwhelm the available memory with
multiple gridded objects; recall that all R objects are held in memory.
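
As a rough back-of-the-envelope check of the memory point above, the sketch
below estimates the size of the seven rasters described in the original
question. It assumes the region is exactly 50 x 40 km at 20 m resolution and
that every cell is stored as an 8-byte double; both are approximations, not
measurements.

```r
# Rough size estimate; the region size and 8-byte doubles are assumptions.
res_m   <- 20
cells   <- (50000 / res_m) * (40000 / res_m)   # cells per raster layer
layers  <- 7
bytes   <- cells * layers * 8                  # one numeric copy of all layers
size_mb <- bytes / 2^20

cat(sprintf("%.0f cells per layer, about %.0f MB per copy of the data\n",
            cells, size_mb))
```

The quoted script keeps the rasters in x and builds morph twice through
cbind() and data.frame(), so several copies of this size may exist at the
same time.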

I have asked the maintainer of the cluster package, Martin Maechler, about
integrating a known-medoid option into the clara() function, which would be
extremely useful in adding some 'supervision' to landscape classification
with clara(). Hopefully there will be enough requests for the feature that
Martin will kindly add it :) .

1. Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data: An Introduction to
Cluster Analysis. Wiley-Interscience, 2005

2. Blaszczynski, J. Landform characterization with geographical information
systems. Photogrammetric Engineering and Remote Sensing, 1997, 63, 183-191

3. Wood, W.F. & Snell, J.B. A Quantitative System for Classifying Landforms.
U.S. Quartermaster Research & Engineering Center, 1960

4. Bivand, R. Integrating GRASS 5.0 and R: GIS and modern statistics. Computers
& Geosciences, 2000, 26, 1043–1052


On Friday 01 December 2006 14:04, Massimo Di Stefano wrote:
> hi to all,
> frustrated by this error, today I bought a 1 GB memory
> module for my laptop.
> now it has 1.28 GB instead of the old 512 MB, but I get the
> same error :-(
> damn! damn! what can I do?
> repeating it for a small area (about 20x20 km, res=20m)
> works fine!
> do you have any suggestions?
> is there a way to check whether this error depends on my
> RAM or on something else?
> thanks for any suggestion!
> I need your help.
> thanks.
> Massimo
>
>
> On 01 Dec 2006, at 16:05, massimodisasha wrote:
>> hi,
>> I'm trying to perform a clustering on a big data frame.
>> the code is this:


>> print("load required R packages")
>> require(spgrass6)
>> require(cluster)
>> gmeta6 <- gmeta6()
>>
>> print("read in our 7 raster files from GRASS")
>> x <- readFLOAT6sp(c("er","crosc","longc","slope","profc","minic","maxic"))
>>
>> print("assemble a matrix of our terrain variables")
>> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
>> x$slope, x$profc, x$minic, x$maxic))
>>
>> print("normalize slope by dividing by max(slope)")
>> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
>> x$slope/max(x$slope), x$profc, x$minic, x$maxic))
>> names(morph) <- c("er","crosc","longc","slope_n","profc","minic","maxic")
>>
>> print("perform the clustering")
>> morph.clara <- clara(morph, k=5, stand=F)
>> x$morph_class <- morph.clara$clustering
>>
>> print("send result back to GRASS")
>> rast.put6(x, "morph", zcol="morph_class")



>> during the step "perform the clustering",
>> after a long time,
>> I get this error:
>>
>> Error in sprintf(fmt, ...) : string length exceeds
>> buffer size of 8192
>> In addition: Warning messages:
>> 1: perl = TRUE is only implemented in UTF-8 locales
>> 2: perl = TRUE is only implemented in UTF-8 locales
>> 3: perl = TRUE is only implemented in UTF-8 locales
>> 4: perl = TRUE is only implemented in UTF-8 locales
>> 5: perl = TRUE is only implemented in UTF-8 locales
>> 6: perl = TRUE is only implemented in UTF-8 locales
>> 7: perl = TRUE is only implemented in UTF-8 locales
>> 8: character string will probably be truncated
>> Execution halted
>>
>> if I try the same code on a subregion of my data, it
>> works fine!
>> but for a large region I get this error :-(
>>
>> obviously I think that it is a memory problem, right?
>> (I'm working with a notebook PPC-1.33-512ram)
>> my data are: 7 raster maps of a region of about 50x40
>> km at a resolution of 20m.
>> is there some workaround for the memory problems?
>>
>> another question is:
>> what is this:
>> Warning messages:
>> 1: perl = TRUE is only implemented in UTF-8 locales
>> 2: perl = TRUE is only implemented in UTF-8 locales
>> 3: perl = TRUE is only implemented in UTF-8 locales
>> 4: perl = TRUE is only implemented in UTF-8 locales
>> 5: perl = TRUE is only implemented in UTF-8 locales
>> 6: perl = TRUE is only implemented in UTF-8 locales
>> 7: perl = TRUE is only implemented in UTF-8 locales
>>
>> is it about this line of the code:
>>
>> morph.clara <- clara(morph, k=5, stand=F)
>> I have an F for FALSE
>>
>> thanks for any suggestions,
>>
>> Massimo


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341


Re: [R] memory problem [cluster]

2006-12-02 Thread Roger Bivand
On Sat, 2 Dec 2006, Dylan Beaudette wrote:

> Hi Stephano,
>
> Looks like you used my example verbatim
> (http://casoilresource.lawr.ucdavis.edu/drupal/node/221)
>
> :)

From exchanges on R-sig-geo, I believe the original questioner is feeding
NAs to clara(), and the error message in clara() is overrunning the buffer
in sprintf(), so the memory problem isn't correctly identified. Using
scripts out of context without checking whether the input data frame
satisfies the conditions of the functions being used is asking for trouble.
The error message:

> traceback()
2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
       sprintf("Observations %s have", paste(i, collapse = ","))),
       " *only* NAs -- omit for clustering")
1: clara(morph, k = 5, stand = F)

is coming from lines:

i[1]), sprintf("Observations %s have", paste(i,
collapse = ","))), " *only* NAs -- omit for clustering")

in clara(). I have suggested dropping those rows from the data frame in a
reply on R-sig-geo, but maybe clara() could be patched to count the # of
completely missing rows, and if # is more than a modest number, not print
the obs. numbers, just the total?

Roger
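
The suggestion to drop completely missing rows before clustering could look
roughly like the sketch below. It is only an illustration under assumptions:
it uses the xclara example data from the cluster package instead of the GRASS
rasters, blanks three rows by hand, and the names all_na, morph_ok and cls
are hypothetical.

```r
library(cluster)

data(xclara)
morph <- xclara
morph[c(3, 10, 25), ] <- NA          # simulate cells with no data at all

# Rows in which every variable is NA cannot be clustered.
all_na <- rowSums(!is.na(morph)) == 0
sum(all_na)                          # how many rows will be dropped

morph_ok <- morph[!all_na, , drop = FALSE]
fit <- clara(morph_ok, k = 3)

# Put the cluster labels back in their original row positions, leaving NA
# for the dropped rows, so the vector still lines up with the grid cells.
cls <- rep(NA_integer_, nrow(morph))
cls[!all_na] <- fit$clustering
```

Keeping cls the same length as the input matters when the result is written
back to a raster, because each row corresponds to one cell.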

