Hi all,

For some odd reason, when running naive Bayes, k-NN, etc., I get slightly different results (e.g., error rates, classification probabilities) from run to run even though I am using the same random seed.
Nothing else (input-wise) is changing, yet my results differ somewhat from run to run. The only randomness should be in the partitioning, and I set the seed before that point. My question is simply: should the location of the set.seed command matter, provided that it comes before any commands that involve randomness (such as partitioning)?

If you need to see the code, it is below.

Thank you,
Gary

A. Separate the original (in-sample) data from the new (out-of-sample) data. Set a random seed.

> InvestTech <- as.data.frame(InvestTechRevised)
> outOfSample <- InvestTech[5001:nrow(InvestTech), ]
> InvestTech <- InvestTech[1:5000, ]
> set.seed(654321)

B. Install and load the caret, ggplot2 and e1071 packages.

> install.packages("caret")
> install.packages("ggplot2")
> install.packages("e1071")
> library(caret)
> library(ggplot2)
> library(e1071)

C. Bin the predictor variables with approximately equal counts using the cut_number function from the ggplot2 package. We will use 20 bins.

> InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
> InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
> outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
> outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)

D. Partition the original (in-sample) data into 60% training and 40% validation sets.

> n <- nrow(InvestTech)
> train <- sample(1:n, size = 0.6 * n, replace = FALSE)
> InvestTechTrain <- InvestTech[train, ]
> InvestTechVal <- InvestTech[-train, ]

E. Use the naiveBayes function in the e1071 package to fit the model.

> model <- naiveBayes(`Purchase (1=yes, 0=no)` ~ ., data = InvestTechTrain)
> prob <- predict(model, newdata = InvestTechVal, type = "raw")
> pred <- ifelse(prob[, 2] >= 0.3, 1, 0)

F. Use the confusionMatrix function in the caret package to output the confusion matrix.

> confMtr <- confusionMatrix(pred, unlist(InvestTechVal[, 3]), mode = "everything", positive = "1")
> accuracy <- confMtr$overall[1]
> valError <- 1 - accuracy
> confMtr

G. Classify the 18 new (out-of-sample) readers using the following code.

> prob <- predict(model, newdata = outOfSample, type = "raw")
> pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
> cbind(pred, prob, outOfSample[, -3])
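
To make the seed-placement question concrete, here is a minimal, self-contained sketch in base R. It is not my actual analysis: n = 100 is a made-up size, and the cut() and runif() calls are stand-ins I chose for illustration (one for work that draws no random numbers, one for work that does).

```r
# Hypothetical sample size, standing in for nrow(InvestTech).
n <- 100

# Case 1: seed set immediately before the partition.
set.seed(654321)
train_a <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Case 2: same seed, with deterministic work between seed and partition.
set.seed(654321)
bins <- cut(seq_len(n), breaks = 4)  # draws no random numbers
train_b <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Case 3: same seed, but one random draw happens before the partition.
set.seed(654321)
extra_draw <- runif(1)               # advances the random number stream
train_c <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Case 4: the partition is re-run without resetting the seed.
train_d <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Cases 1 and 2 start from the same generator state, so they match.
print(identical(train_a, train_b))

# Cases 3 and 4 start from a different generator state, so they will
# almost certainly differ from case 1.
print(identical(train_a, train_c))
print(identical(train_a, train_d))
```

If my understanding is right, the position of set.seed should not matter as long as nothing between it and sample() draws random numbers (case 2), whereas an intervening random draw (case 3) or re-running only the later steps without re-setting the seed (case 4) would change the partition. Whether something like that is happening in my code above is what I am unsure about.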