Dear All,
I am trying to mine a small dataset.
Admittedly, it is a bit odd, since it is a multi-class classification
task with more than 300 different classes for about 600 observations.
Having said that, the problem is not the output of my script, but the
fact that it gets stuck, without any error message, when I use C5.0
through caret.
I recycled another script of mine that never gave me any headaches, so
I do not know what is going on.
The small training set can be downloaded from

https://www.dropbox.com/s/4yseukqqvssvh63/training.csv?dl=0

and my script is pasted at the end of this email.
C5.0 without caret completes in seconds, so I must be making a
mistake in how I use caret.
Any suggestion is appreciated.

Lorenzo

####################################################

library(caret)
library(readr)
library(C50)
library(doMC)
library(digest)


train <- read_csv("training.csv")

ncores <- 2


registerDoMC(cores = ncores)


set.seed(123)


shuffle <- sample(nrow(train))

train <- train[shuffle, ]


train$productid <- as.character(train$productid)

train$productid <- paste('fac', train$productid, sep='')

train$productid <- as.factor(train$productid)

train$State <- as.factor(train$State)

train$category <- as.factor(train$category)

train$unit <- as.factor(train$unit)

## replace each value of myname with its crc32 hash
train$myname <- vapply(train$myname, digest, character(1),
                       algo = 'crc32', USE.NAMES = FALSE)


train <- subset(train, select=-c(straincategory, description))


### this completes quickly
oneTree <- C5.0(productid ~ ., data = train, trials=10)




c50Grid <- expand.grid(trials = 10,
                       model = "tree",   ## "rules" is disabled
                       winnow = FALSE)   ## TRUE is disabled




tc <- trainControl(method = "repeatedcv", summaryFunction = mnLogLoss,
                   number = 5, repeats = 5, verboseIter = TRUE,
                   classProbs = TRUE)



### but this takes forever
model <- train(productid ~ ., data = train, method = "C5.0", trControl = tc,
               metric = "logLoss",
               ## strata = train$donation,
               ## sampsize = rep(nmin, length(levels(train$donation))),
               ## control = C5.0Control(fuzzyThreshold = T),
               maximize = FALSE,
               tuneGrid = c50Grid)
