Hello everybody,

I am using the GA package [1] to optimize the hyperparameters of an SVM, as is done in this example:
http://stackoverflow.com/questions/32026436/how-to-optimize-parameters-using-genetic-algorithms
However, when I try to adapt the example to random forest, the optimization takes a very long time. This might be because the random forest hyperparameters (ntree, mtry, nodesize) are integers, but I don't know whether there is a way to specify that in the algorithm.

Any suggestions would be very much appreciated. Thank you!

The code:

library(GA)
library(randomForest)

data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Set up the data for cross-validation
K <- 5  # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data = Data[fold_inds == i, , drop = FALSE]))

# Given the values of parameters 'ntree', 'mtry' and 'nodesize', return the
# RMSE of the model over the test data
evalParamsRF <- function(train_data, test_data, ntree, mtry, nodesize) {
  # Train
  model <- randomForest(V4 ~ ., data = train_data, ntree = ntree, mtry = mtry,
                        nodesize = nodesize, proximity = TRUE)
  # Test (square root of the mean squared error, so the value really is an RMSE)
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
  return(rmse)
}

fitnessFuncRF <- function(x, Lst_CV_Data) {
  # Retrieve the RF parameters
  ntree_val <- x[1]
  mtry_val <- x[2]
  nodesize_val <- x[3]

  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParamsRF(train_data, test_data, ntree_val, mtry_val, nodesize_val)))

  # As fitness measure, return minus the average RMSE (over the
  # cross-validation folds), so that by maximizing fitness we are
  # minimizing the RMSE
  return(-mean(rmse_vals))
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFuncRF, lst_CV_data,
              names = names(theta_min), min = theta_min, max = theta_max,
              popSize = 50, maxiter = 10)

summary(results)
summary(results)$solution

Links:
------
[1]
https://cran.r-project.org/web/packages/GA/index.html

------
Aurora González Vidal
Ph.D. student in Data Analytics for Energy Efficiency
Faculty of Computer Sciences
University of Murcia
@. aurora.gonzal...@um.es
T. 868 88 7866
sae.saiblogs.inf.um.es

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
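
Regarding the integer parameters raised in the question, one possible approach, offered as a sketch rather than a definitive answer, is to keep type = "real-valued" and round the chromosome inside the fitness function, so that the model only ever sees whole numbers. The helper names decodeParamsRF and makeIntegerFitness are hypothetical, as is the caching of integer combinations that have already been scored. A toy objective stands in for the cross-validated RMSE so that the snippet runs with base R alone; the ga() call is only attempted if the GA package is installed, and it assumes a GA version that accepts the lower and upper arguments.

```r
# Hypothetical helper: map a real-valued GA chromosome to integer parameters.
decodeParamsRF <- function(x, lower, upper) {
  # Round to the nearest integer, then clamp to the search bounds
  vals <- pmin(pmax(round(x), lower), upper)
  as.list(setNames(as.integer(vals), names(lower)))
}

# Hypothetical wrapper: evaluate `objective` on the decoded integers and cache
# the result, so chromosomes that round to the same integers are scored once.
makeIntegerFitness <- function(objective, lower, upper) {
  cache <- new.env()
  function(x, ...) {
    p <- decodeParamsRF(x, lower, upper)
    key <- paste(unlist(p), collapse = "_")
    if (is.null(cache[[key]])) {
      cache[[key]] <- objective(p, ...)
    }
    cache[[key]]
  }
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Toy objective (an assumption, standing in for minus the cross-validated RMSE)
toyObjective <- function(p) {
  -((p$ntree - 500)^2 / 1e4 + (p$mtry - 4)^2 + (p$nodesize - 5)^2)
}

fitnessInt <- makeIntegerFitness(toyObjective, theta_min, theta_max)
fitnessInt(c(499.6, 4.2, 5.4))  # decoded as ntree = 500, mtry = 4, nodesize = 5
fitnessInt(c(500.3, 3.8, 4.6))  # same integers, so served from the cache

# With the GA package installed, the wrapper can be passed to ga() directly
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued", fitness = fitnessInt,
                lower = theta_min, upper = theta_max,
                popSize = 20, maxiter = 5, monitor = FALSE)
  print(decodeParamsRF(res@solution[1, ], theta_min, theta_max))
}
```

In the script above, the same wrapper could be applied to an objective that calls evalParamsRF with p$ntree, p$mtry and p$nodesize for each fold. Because the cross-validated error is noisy, the cache freezes one estimate per integer combination, which may or may not be desirable.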