Re: [R] predict function type class vs. prob

David Winsemius Sat, 23 Sep 2023 12:32:30 -0700

That's embarrassing. Apologies for the garbled HTML posting. I'll see if this is more readable:


On 9/23/23 05:30, Rui Barradas wrote:
> At 11:12 on 22/09/2023, Milbert, Sabine (LGL) wrote:
>> Dear R Help Team,
>>

>> My research group and I use R scripts for our multivariate data screening routines. During routine use, we encountered some inconsistencies within the predict() function of the R Stats Package.

In addition to Rui's correction to this misstatement, the caret package is really a meta package that attempts to implement an umbrella framework for a vast array of tools from a wide variety of sources. It is an immense effort but not really a part of the core R project. The correct place to file issues is found in the DESCRIPTION file:



URL: https://github.com/topepo/caret/
BugReports: https://github.com/topepo/caret/issues

If you use `str` on an object constructed with caret, you discover that the `predict` function is actually not in the main workspace but rather embedded in the fit object itself. I think this is a rather general statement regarding the caret universe, and so I expect that your fit objects can be examined for the code that predict.train will use with this approach. Your description of your analysis methods was rather incompletely specified, and I will put an appendix of "svm" methods that might be specified after my demonstration using code. (Note that I do not see a caret "weights" hyper-parameter for the "svmLinear" method, which is actually using code from pkg:kernlab.)
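The same embedded code can probably be read without fitting anything, by asking caret for the model definition directly. The following is only a sketch, assuming a reasonably current caret installation; the names `info` and `svm_methods` are mine and purely illustrative.

```r
library(caret)

# Model definition for "svmLinear", looked up by exact name (no regex matching)
info <- getModelInfo("svmLinear", regex = FALSE)[[1]]

info$parameters   # tuning parameters caret exposes for this method
info$predict      # code used for class predictions
info$prob         # code used for class probabilities

# Method codes known to caret that start with "svm"
svm_methods <- grep("^svm", names(getModelInfo()), value = TRUE)
svm_methods
```

Printing `info$parameters` is a quick way to check which hyper-parameters a given method accepts before passing them to `train`.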



library(caret)
svmFit <- train(Species ~ ., data = iris, method = "svmLinear",
                 trControl = trainControl(method = "cv"))

class(svmFit)
#[1] "train"         "train.formula"
str(predict(svmFit))
 Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
str(svmFit)
#---screen output-------------
List of 24
 $ method      : chr "svmLinear"
 $ modelInfo   :List of 13
  ..$ label     : chr "Support Vector Machines with Linear Kernel"
  ..$ library   : chr "kernlab"
  ..$ type      : chr [1:2] "Regression" "Classification"
  ..$ parameters:'data.frame':    1 obs. of  3 variables:
  .. ..$ parameter: chr "C"
  .. ..$ class    : chr "numeric"
  .. ..$ label    : chr "Cost"
  ..$ grid      :function (x, y, len = NULL, search = "grid")
  ..$ loop      : NULL
  ..$ fit       :function (x, y, wts, param, lev, last, classProbs, ...)
  ..$ predict   :function (modelFit, newdata, submodels = NULL)
  ..$ prob      :function (modelFit, newdata, submodels = NULL)
  ..$ predictors:function (x, ...)

..$ tags : chr [1:5] "Kernel Method" "Support Vector Machines""Linear Regression" "Linear Classifier" ...

  ..$ levels    :function (x)
  ..$ sort      :function (x)
 $ modelType   : chr "Classification"
#  ---- large amount of screen output omitted------

# note that the class of svmFit$modelInfo$predict is 'function'
# and here is its code, at least for this particular svm method, of which there are about 10!



svmFit$modelInfo$predict

#---- screen output ------
function (modelFit, newdata, submodels = NULL)
{
    svmPred <- function(obj, x) {
        hasPM <- !is.null(unlist(obj@prob.model))
        if (hasPM) {
            pred <- kernlab::lev(obj)[apply(kernlab::predict(obj,
                x, type = "probabilities"), 1, which.max)]
        }
        else pred <- kernlab::predict(obj, x)
        pred
    }
    out <- try(svmPred(modelFit, newdata), silent = TRUE)
    if (is.character(kernlab::lev(modelFit))) {
        if (class(out)[1] == "try-error") {

warning("kernlab class prediction calculations failed;returning NAs")

            out <- rep("", nrow(newdata))
            out[seq(along = out)] <- NA
        }
    }
    else {
        if (class(out)[1] == "try-error") {

warning("kernlab prediction calculations failed; returningNAs")

            out <- rep(NA, nrow(newdata))
        }
    }
    if (is.matrix(out))
        out <- out[, 1]
    out
}
<bytecode: 0x561277d4ec50>
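
To look for the kind of mismatch described in the original question, the class predictions can be compared row by row with the class that has the highest predicted probability. This is a rough sketch, not a reproduction of the reported problem: it assumes a two-class subset of iris as a stand-in for the unavailable data, an arbitrary seed, and `classProbs = TRUE` in `trainControl` so that probabilities are available. Note that predict.train takes type = "raw" (not "class") for class predictions.

```r
library(caret)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
# Two-class subset of iris, standing in for the original (unavailable) data
two <- droplevels(subset(iris, Species != "setosa"))

fit <- train(Species ~ ., data = two, method = "svmLinear",
             trControl = trainControl(method = "cv", classProbs = TRUE))

cls  <- predict(fit, newdata = two, type = "raw")   # predicted classes
prob <- predict(fit, newdata = two, type = "prob")  # one column per class

# Class with the highest predicted probability in each row
top <- factor(colnames(prob)[max.col(as.matrix(prob), ties.method = "first")],
              levels = levels(cls))

# Rows (if any) where the two kinds of prediction disagree
disagree <- which(cls != top)
data.frame(raw = cls, prob)[disagree, ]
```

Whether any rows show up will depend on the caret and kernlab versions in use and on the svm method chosen, so an empty result here says nothing about other methods.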

--
David

>> Through internal research, we were unable to find the reason for this and have decided to contact your help team with the following issue:
>>
>> The predict() function is used once to predict the class membership of a new sample (type = "class") on a trained linear SVM model for distinguishing two classes (using the caret package). It is then used to also examine the probability of class membership (type = "prob"). Both are then presented in an R shiny output. Within the routine, we noticed two samples (out of 100+) where the class prediction and probability prediction did not match. The prediction probabilities of one class (52%) did not match the class membership within the predict function. We use the same seed and the discrepancy is reproducible in this sample. The same problem did not occur in other trained models (lda, random forest, radial SVM...).

Support Vector Machines with Boundrange String Kernel (method = 'svmBoundrangeString')