On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> Hi there:
> I have a question on random forest:
>
> recently i helped a friend with her random forest and i came with this
> problem:
> her dataset has 6 classes and since the sample size is pretty small:
> 264 and the class distr is like this (Diag is the response variable)
> sample.size <- lapply(1:6, function(i) sum(Diag==i))
> > sample.size
> [[1]]
> [1] 36
>
> [[2]]
> [1] 12
>
> [[3]]
> [1] 120
>
> [[4]]
> [1] 36
>
> [[5]]
> [1] 30
>
> [[6]]
> [1] 30
>
> I assigned this sample.size to sampsz for a stratified sampling
> purpose and i got the following error:
> Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
>
> if I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
> tell me why?

The sum() function knows what to do on a vector, but not on a list. You
can turn your sample.size variable into a vector using

    unlist(sample.size)

Duncan Murdoch

> btw, as to classification problem for this with uneven class number
> situation, do u have some suggestions to improve its accuracy? I
> tried to use c() way to make the sampsz works but the result is
> similar.
>
> Thanks,
>
> weiwei
>
> On 6/30/05, Liaw, Andy <[EMAIL PROTECTED]> wrote:
>> The limitation comes from the way categorical splits are represented in the
>> code: For a categorical variable with k categories, the split is
>> represented by k binary digits: 0=right, 1=left. So it takes k bits to
>> store each split on k categories. To save storage, this is `packed' into a
>> 4-byte integer (32-bit), thus the limit of 32 categories.
>>
>> The current Fortran code (version 5.x) by Breiman and Cutler gets around
>> this limitation by storing the split in an integer array. While this lifts
>> the 32-category limit, it takes much more memory to store the splits. I'm
>> still trying to figure out a more memory efficient way of storing the splits
>> without imposing the 32-category limit. If anyone has suggestions, I'm all
>> ears.
>>
>> Best,
>> Andy
>>
>> > From: [EMAIL PROTECTED]
>> >
>> > Hello,
>> >
>> > I'm using the random forest package. One of my factors in the
>> > data set contains 41 levels (I can't code this as a numeric
>> > value - in terms of linear models this would be a random
>> > factor). The randomForest call comes back with an error
>> > telling me that the limit is 32 categories.
>> >
>> > Is there any reason for this particular limit? Maybe it's
>> > possible to recompile the module with a different cutoff?
>> >
>> > thanks a lot for your help,
>> > kind regards,
>> >
>> >
>> > Arne
>> >
>> > ______________________________________________
>> > R-help@stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide!
>> > http://www.R-project.org/posting-guide.html
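
A minimal sketch of the list-versus-vector point from the reply at the top of this thread. The `Diag` vector here is made up to match the class counts quoted in the question; it is not the poster's data. The `sapply()` and `table()` alternatives are suggestions that do not appear in the thread.

```r
# Hypothetical response vector with the class counts quoted in the question
# (36, 12, 120, 36, 30, 30); made-up data, not the poster's.
counts <- c(36, 12, 120, 36, 30, 30)
Diag <- rep(1:6, times = counts)

# lapply() always returns a list, one element per class.
sample.size <- lapply(1:6, function(i) sum(Diag == i))
is.list(sample.size)

# sum() rejects a list; try() captures the error instead of stopping.
res <- try(sum(sample.size), silent = TRUE)
inherits(res, "try-error")

# unlist() flattens the list into a plain vector, as suggested in the reply.
sampsz <- unlist(sample.size)

# Alternatives that never create a list in the first place.
sampsz2 <- sapply(1:6, function(i) sum(Diag == i))
sampsz3 <- as.vector(table(Diag))
```

All three vectors hold the same six counts, so any of them should behave like the hand-typed `c(36, 12, 120, 36, 30, 30)` that the poster reports as working.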
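
The bit-packing idea in Andy Liaw's quoted message (one bit per category, 1 = left, 0 = right) can be illustrated with a small sketch. This is not the randomForest source: `pack_split()` and `goes_left()` are hypothetical helpers, and the sketch uses ordinary R arithmetic on doubles rather than a real 4-byte integer.

```r
# Illustration only: encode a categorical split as one number, where
# bit (i - 1) is set when category i goes left (1 = left, 0 = right).
# pack_split() and goes_left() are hypothetical names, not package code.
pack_split <- function(left_categories) {
  # Assumes the category indices are distinct positive integers.
  sum(2^(left_categories - 1))
}

goes_left <- function(mask, category) {
  # Shift the wanted bit down to position 0, then test whether it is set.
  (mask %/% 2^(category - 1)) %% 2 == 1
}

# A factor with 5 levels where levels 1, 3 and 4 are sent left.
mask <- pack_split(c(1, 3, 4))   # 1 + 4 + 8
directions <- sapply(1:5, function(i) goes_left(mask, i))

# A 4-byte integer offers 4 * 8 bits, so only categories 1..32 fit;
# a 33rd category would need a bit that does not exist.
bits_available <- 4 * 8
```

With k categories the mask needs k bits, which is why a factor with 41 levels, as in the original question, cannot be stored in a single 32-bit integer.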