Hi,

I just noticed that rpart behaves unexpectedly when performing classification learning with a user-specified loss matrix. If the response variable y is a factor and not all levels of the factor occur in the observations, rpart exits with an error:

> df=data.frame(attr=1:5,class=factor(c(2,3,1,5,3),levels=1:6))
> rpart(class~attr,df,parms=list(loss=matrix(0,6,6)))
Error in (get(paste("rpart", method, sep = ".")))(Y, offset, parms, wt) :
  Wrong length for loss matrix

Note that while the levels of the factor range over 1:6, only levels 1, 2, 3 and 5 occur in the concrete observation data. The error is caused by the following code in rpart.class:

    fy <- as.factor(y)
    y <- as.integer(fy)
    numclass <- max(y[!is.na(y)])
    ...
    temp2 <- parms$loss
    if (length(temp2) != numclass^2)
        stop("Wrong length for loss matrix")

For the example, numclass is set to 5 instead of 6, because it is taken from the largest level code that occurs in the data rather than from the number of levels of the factor.

For that small example it may be debatable whether or not numclass should be 6. But consider a data set whose response variable has a fixed range of levels. For some samples of such data, not all levels of the response variable will occur. At the same time, it is desirable to use the same loss matrix whenever a decision tree is trained from the data.

Having said that, I am very happy with the rpart package and its high configurability.

Best regards,
Lars

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
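
For illustration, the mismatch can be reproduced in base R without calling rpart at all. This is only a minimal sketch: it assumes numclass is computed exactly as in the excerpt quoted above, and the names loss, codes, numclass_seen, numclass_all, present and loss_sub are made up for this example. The last step is a possible workaround, not something checked against rpart itself.

```r
# Response factor from the example: six declared levels, four of them observed.
y <- factor(c(2, 3, 1, 5, 3), levels = 1:6)

# A 6 x 6 loss matrix (hypothetical values: zero diagonal, unit cost elsewhere).
loss <- 1 - diag(6)

# Same computation as in the quoted excerpt: the largest integer code seen in y.
codes <- as.integer(as.factor(y))
numclass_seen <- max(codes[!is.na(codes)])

# Alternative: count the declared levels, whether or not they occur in the data.
numclass_all <- nlevels(y)

# The length check from the excerpt trips for the first and not for the second.
print(length(loss) != numclass_seen^2)  # TRUE  (36 vs 25)
print(length(loss) != numclass_all^2)   # FALSE (36 vs 36)

# Possible workaround (an assumption, not verified against rpart):
# drop the unused levels and cut the loss matrix down to the observed ones.
present <- levels(y) %in% levels(droplevels(y))
loss_sub <- loss[present, present]
print(dim(loss_sub))                    # 4 4
```

With the unused levels dropped, the factor has four levels and the reduced matrix has 4^2 entries, so the two sizes agree again, at the cost of fitting a tree that knows nothing about the classes missing from the sample.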