Re: [R] Random Forest classification

2016-04-18 Thread Liaw, Andy
This is explained in the "Details" section of the help page for partialPlot. Best Andy > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jesús Para > Fernández > Sent: Tuesday, April 12, 2016 1:17 AM > To: r-help@r-project.org > Subject: [R] Random

Re: [R] rpart and randomforest results

2014-04-07 Thread Liaw, Andy
Hi Sonja, How did you build the rpart tree (i.e., what settings did you use in rpart.control)? Rpart by default will use cross validation to prune back the tree, whereas RF doesn't need that. There are other more subtle differences as well. If you want to compare single tree results, you

Re: [R] randomForest warning: The response has five or fewer unique values. Are you sure you want to do regression?

2014-03-24 Thread Liaw, Andy
If you are using the code, that's not really using randomForest directly. I don't understand the data structure you have (since you did not show anything) so can't really tell you much. In any case, that warning came from randomForest() when it is run in regression mode but the response has

Re: [R] Variable importance - ANN

2013-12-04 Thread Liaw, Andy
You can try something like this: http://pubs.acs.org/doi/abs/10.1021/ci050022a Basically similar idea to what is done in random forests: permute predictor variable one at a time and see how much that degrades prediction performance. Cheers, Andy -Original Message- From:

Re: [R] How do I extract Random Forest Terms and Probabilities?

2013-12-02 Thread Liaw, Andy
#2 can be done simply with predict(fmi, type="prob"). See the help page for predict.randomForest(). Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun Sent: Tuesday, November 26, 2013 6:57 PM To: R help Subject: Re:
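
A minimal sketch of the type="prob" call, assuming the randomForest package is installed. The iris data, the object name rf, and the three rows picked for prediction are illustrative choices, not the original poster's fmi model:

```r
library(randomForest)

set.seed(42)
# Fit a small classification forest on the built-in iris data.
rf <- randomForest(iris[1:4], iris$Species, ntree = 200)

# Class probabilities (fraction of tree votes per class) for a few rows.
new_rows <- iris[c(1, 51, 101), 1:4]
prob <- predict(rf, newdata = new_rows, type = "prob")
print(prob)
```

Each row of prob should sum to 1, with one column per class level.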

Re: [R] interpretation of MDS plot in random forest

2013-12-02 Thread Liaw, Andy
Yes, that's part of the intention anyway. One can also use them to do clustering. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Massimo Bressan Sent: Monday, December 02, 2013 6:34 AM To: r-help@r-project.org

Re: [R] Split type in the RandomForest package

2013-11-20 Thread Liaw, Andy
Classification trees use the Gini index, whereas the regression trees use sum of squared errors. They are hard-wired into the C/Fortran code, so not easily changeable. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of
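
For readers unfamiliar with the two criteria, here is a rough base-R sketch of the node-level quantities. The helper names gini() and sse() are made up for illustration; the package's compiled code uses its own internal form of these criteria, so treat this as a description of the idea only:

```r
# Gini impurity of a node: 1 - sum over classes of p_k^2.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Sum of squared errors around the node mean (regression criterion).
sse <- function(y) sum((y - mean(y))^2)

g_mixed <- gini(c("a", "a", "b", "b"))  # evenly mixed two-class node
g_pure  <- gini(c("a", "a", "a"))       # pure node
s_node  <- sse(c(1, 2, 3))
print(c(g_mixed = g_mixed, g_pure = g_pure, s_node = s_node))
```

A split is preferred when it reduces the chosen quantity the most, summed over the two child nodes.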

Re: [R] What is the difference between Mean Decrease Accuracy produced by importance(foo) vs foo$importance in a Random Forest Model?

2013-11-19 Thread Liaw, Andy
The difference is importance(..., scale=TRUE). See the help page for detail. If you extract the $importance component from a randomForest object, you do not get the scaling. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
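
A short sketch of the difference, assuming the randomForest package is installed. The airquality data and the object names are illustrative only:

```r
library(randomForest)

aq <- na.omit(airquality)
set.seed(131)
rf <- randomForest(Ozone ~ ., data = aq, importance = TRUE, ntree = 200)

raw      <- rf$importance[, 1]                           # stored, unscaled
scaled   <- importance(rf, type = 1)[, 1]                # scale = TRUE is the default
unscaled <- importance(rf, type = 1, scale = FALSE)[, 1] # matches the raw component

print(cbind(raw, unscaled, scaled))
```

With scale = FALSE the extractor should reproduce the stored component; the default divides the permutation measure by its "standard error", as the help page describes.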

Re: [R] FW: Nadaraya-Watson kernel

2013-11-07 Thread Liaw, Andy
Use KernSmooth (one of the recommended packages that are included in R distribution). E.g., library(KernSmooth) KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 x <- seq(0, 1, length=201) y <- 4 * cos(2*pi*x) + rnorm(x) f <- locpoly(x, y, degree=0, kernel="epan", bandwidth=.1) plot(x, y)
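
The same idea laid out as a script, as a sketch only. The seed and the simulated data are arbitrary, and the kernel argument is left at its default here:

```r
library(KernSmooth)

set.seed(1)
x <- seq(0, 1, length.out = 201)
y <- 4 * cos(2 * pi * x) + rnorm(length(x))

# degree = 0 is a local-constant fit, i.e. a Nadaraya-Watson type estimate.
f <- locpoly(x, y, degree = 0, bandwidth = 0.1)

# locpoly() returns a list with grid points f$x and fitted values f$y.
if (interactive()) {
  plot(x, y)
  lines(f)
}
```

The bandwidth of 0.1 is copied from the message above; in practice it would need to be chosen for the data at hand.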

Re: [R] Creating 3d partial dependence plots

2013-03-20 Thread Liaw, Andy
It needs to be done by hand, in that partialPlot() does not handle more than one variable at a time. You need to modify its code to do that (and be ready to wait even longer, as it can be slow). Andy -Original Message- From: r-help-boun...@r-project.org

Re: [R] How do I make R randomForest model size smaller?

2012-12-04 Thread Liaw, Andy
Try the following: set.seed(100) rf1 <- randomForest(Species ~ ., data=iris) set.seed(100) rf2 <- randomForest(iris[1:4], iris$Species) object.size(rf1) object.size(rf2) str(rf1) str(rf2) You can try it on your own data. That should give you some hints about why the formula interface should be

Re: [R] Different results from random.Forest with test option and using predict function

2012-12-04 Thread Liaw, Andy
Without data to reproduce what you saw, we can only guess. One possibility is due to tie-breaking. There are several places where ties can occur and are broken at random, including at the prediction step. One difference between the two ways of doing prediction is that when it's all done

Re: [R] Partial dependence plot in randomForest package (all flat responses)

2012-11-26 Thread Liaw, Andy
Not unless we have more information. Please read the Posting Guide to see how to make it easier for people to answer your question. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Oritteropus Sent: Thursday, November

Re: [R] Random Forest for multiple categorical variables

2012-10-17 Thread Liaw, Andy
How about taking the combination of the two? E.g., gamma = factor(paste(alpha, beta1, sep=":")) and use gamma as the response. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gyanendra Pokharel Sent: Tuesday, October
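
A small base-R illustration of combining two categorical responses into one and splitting them apart again afterwards. The toy values of alpha and beta1 are invented for the example:

```r
alpha <- factor(c("a1", "a2", "a1", "a2"))
beta1 <- factor(c("b1", "b1", "b2", "b2"))

# One combined response with a level per observed (alpha, beta1) pair.
gamma <- factor(paste(alpha, beta1, sep = ":"))
print(levels(gamma))

# After predicting gamma, the two parts can be recovered by splitting on ":".
parts <- do.call(rbind, strsplit(as.character(gamma), ":", fixed = TRUE))
print(parts)
```

This only works cleanly if ":" does not already occur in the original level names.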

Re: [R] Random Forest - Extract

2012-10-03 Thread Liaw, Andy
1. Not sure what you want. What details are you looking for exactly? If you call predict(trainset) without the newdata argument, you will get the (out-of-bag) prediction of the training set, which is exactly the predicted component of the RF object. 2. If you set type=votes and

Re: [R] interpret the importance output?

2012-08-29 Thread Liaw, Andy
The type=1 importance measure in RF compares the prediction error of each tree on the OOB data with the prediction error of the same tree on the OOB data with the values of one variable randomly shuffled. If the variable has no predictive power, then the two should be very close, and there's
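
The permutation idea can be tried on any model. The sketch below uses lm() on simulated data and a held-out set instead of per-tree OOB data, so it is an analogy to the RF measure, not its implementation. All names (d, fit, mse, perm_importance) are made up for the example:

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 3 * d$x1 + rnorm(n)          # x1 is informative, x2 is pure noise

train <- d[1:100, ]
test  <- d[101:200, ]
fit   <- lm(y ~ x1 + x2, data = train)

mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)
base_mse <- mse(fit, test)

# Shuffle one predictor at a time and record the increase in error.
perm_importance <- sapply(c("x1", "x2"), function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])
  mse(fit, shuffled) - base_mse
})
print(perm_importance)
```

Shuffling the noise variable should barely change the error, while shuffling x1 should degrade it substantially.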

Re: [R] Stratified Sampling with randomForest Regression

2012-06-01 Thread Liaw, Andy
Yes, you need to modify both the R and the underlying C code. It's in the source package on CRAN (the .tar.gz file). Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Josh Browning Sent: Friday, June 01, 2012 10:48 AM To:

Re: [R] Question about random Forest function in R

2012-05-29 Thread Liaw, Andy
Hi Kelly, The function has a limitation that it cannot handle any column in your x that is a categorical variable with more than 32 categories. One possibility is to see if you can bin some of the categories into one to get below 32 categories. Andy -Original Message- From:
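
One possible way to do the binning in base R, as a sketch: keep the most frequent categories and lump the rest into a single level. The simulated factor, the level name "other", and the cutoff variable max_levels are all assumptions for illustration:

```r
set.seed(1)
# A factor with 50 categories of unequal frequency.
x <- factor(sample(sprintf("cat%02d", 1:50), 1000, replace = TRUE, prob = 50:1))

max_levels <- 32
counts <- sort(table(x), decreasing = TRUE)

# Keep the most frequent levels, leaving one slot for the lumped level.
keep <- head(names(counts), max_levels - 1)
x_binned <- factor(ifelse(as.character(x) %in% keep, as.character(x), "other"))

print(nlevels(x_binned))
```

Whether lumping by frequency is sensible depends on the data; grouping by subject-matter meaning may be a better choice.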

Re: [R] Random Forest Classification_ForestCombination

2012-05-29 Thread Liaw, Andy
As long as you can remember that the summaries such as variable importance, OOB predictions, and OOB error rates are not applicable, I think that should be fine. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nikita Desai

Re: [R] Random forests prediction

2012-05-14 Thread Liaw, Andy
I don't think this is so hard to explain. If you evaluate AUC using either OOB prediction or on a test set (or something like CV or bootstrap), that would be what I expect for most data. When you add more variables (that are, say, less informative) to a model, the model has to look harder to

Re: [R] No Data in randomForest predict

2012-05-14 Thread Liaw, Andy
It doesn't: You just get an error if there are NAs in the data; e.g., R> rf1 = randomForest(iris[1:4], iris[[5]]) R> predict(rf1, newdata=data.frame(Sepal.Length=1, Sepal.Width=2, Petal.Length=3, Petal.Width=NA)) Error in predict.randomForest(rf1, newdata = data.frame(Sepal.Length = 1, :

Re: [R] Random forests prediction

2012-05-14 Thread Liaw, Andy
That's not how RF works at all. The setting of mtry is irrelevant to this. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of matt Sent: Monday, May 14, 2012 10:22 AM To: r-help@r-project.org Subject: Re: [R] Random forests

Re: [R] Partial Dependence and RandomForest

2012-04-17 Thread Liaw, Andy
Note that the partialPlot() function also returns the x-y pairs being plotted, so you can work from there if you wish. As to SD, my guess is you want some sort of confidence interval or band around the curve? I do not know of any theory to produce that, but that may well just be my ignorance.

Re: [R] loess function take

2012-04-13 Thread Liaw, Andy
Alternatively, use only a subset to run loess(), either a random sample or something like every other k-th (sorted) data value, or the quantiles. It's hard for me to imagine that that many data points are going to improve your model much at all (unless you use tiny span). Andy From:
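
A rough illustration of the subsampling suggestion using simulated data. The sample sizes, the span, and the true curve are arbitrary choices for the example:

```r
set.seed(1)
n <- 20000
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Fit loess() on a random subset instead of all n points.
idx <- sample(n, 2000)
fit_sub <- loess(y ~ x, data = data.frame(x = x[idx], y = y[idx]), span = 0.3)

# Compare the fitted curve with the known truth on a grid.
grid <- seq(0.05, 0.95, by = 0.05)
pred <- predict(fit_sub, newdata = data.frame(x = grid))
print(max(abs(pred - sin(2 * pi * grid))))
```

With a smooth underlying curve, a tenth of the data already recovers it closely in this toy setting.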

Re: [R] Partial Dependence and RandomForest

2012-04-13 Thread Liaw, Andy
Please read the help page for the partialPlot() function and make sure you learn about all its arguments (in particular, which.class). Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jmc Sent: Wednesday, April 11, 2012 2:44

Re: [R] Execution speed in randomForest

2012-04-13 Thread Liaw, Andy
Without seeing your code, it's hard to say much more, but do avoid using formula when you have large data. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jason Caroline Shaw Sent: Friday, April 06, 2012 1:20 PM To: jim

Re: [R] Imputing missing values using LSmeans (i.e., population marginal means) - advice in R?

2012-04-05 Thread Liaw, Andy
Don't know how you searched, but perhaps this might help: https://stat.ethz.ch/pipermail/r-help/2007-March/128064.html -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jenn Barrett Sent: Tuesday, April 03, 2012 1:23 AM To:

Re: [R] Question about randomForest

2012-04-04 Thread Liaw, Andy
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Saruman I dont see how this answered the original question of the poster. He was quite clear: the value of the predictions coming out of RF do not match what comes out of the predict function using

Re: [R] Memory limits for MDSplot in randomForest package

2012-03-30 Thread Liaw, Andy
Sam, As you've probably seen, all the MDSplot() function does is feed 1 - proximity to the cmdscale() function. Some suggestions and clarifications: 1. If all you want is the proximity matrix, you can run randomForest() with keep.forest=FALSE to save memory. You will likely want to run

Re: [R] fitted values with locfit

2012-03-28 Thread Liaw, Andy
I believe you are expecting the software to do what it did not claim being able to do. predict.locfit() does not have a type argument, nor can that take on terms. When you specify two variables in the smooth, a bivariate smooth is done, so you get one bivariate smooth function, not the sum of

[R] job opening at Merck Research Labs, NJ USA

2012-03-20 Thread Liaw, Andy
The Biometrics Research department at the Merck Research Laboratories has an open position to be located in Rahway, New Jersey, USA: This position will be responsible for imaging and bio-signal biomarkers projects including analysis of preclinical, early clinical, and experimental medicine

Re: [R] Using caegorical variables in package randomForest.

2012-03-13 Thread Liaw, Andy
The way to represent categorical variables is with factors. See ?factor. randomForest() will handle factors appropriately, as most modeling functions in R. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of abhishek

Re: [R] Help on reshape function

2012-03-06 Thread Liaw, Andy
Just using the reshape() function in base R: df.long = reshape(df, varying=list(names(df)[4:7]), direction="long") This also gives two extra columns (time and id) that can be dropped. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
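
A self-contained version of that call on a made-up data frame. The column names and values are invented; only the reshape() call itself comes from the message:

```r
df <- data.frame(subject = 1:3,
                 group   = c("a", "b", "a"),
                 age     = c(30, 41, 25),
                 t1 = c(1.1, 2.1, 3.1),
                 t2 = c(1.2, 2.2, 3.2),
                 t3 = c(1.3, 2.3, 3.3),
                 t4 = c(1.4, 2.4, 3.4))

# Stack columns 4 to 7 into a single value column.
df.long <- reshape(df, varying = list(names(df)[4:7]), direction = "long")
print(df.long)

# Drop the bookkeeping columns if they are not needed.
df.values <- df.long[, setdiff(names(df.long), c("time", "id"))]
```

The stacked column keeps the name of the first of the varying columns (t1 here) unless v.names is supplied.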

Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth?

2012-02-29 Thread Liaw, Andy
That's why I said you need the book. The details are all in the book. From: Michael [mailto:comtech@gmail.com] Sent: Thursday, February 23, 2012 1:49 PM To: Liaw, Andy Cc: r-help Subject: Re: [R] Good and modern Kernel Regression package in R with auto

Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth?

2012-02-23 Thread Liaw, Andy
to get most mileage out of it though. Andy From: Michael [mailto:comtech@gmail.com] Sent: Thursday, February 23, 2012 12:25 AM To: Liaw, Andy Cc: Bert Gunter; r-help Subject: Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth?

Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth?

2012-02-23 Thread Liaw, Andy
@gmail.com] Sent: Thursday, February 23, 2012 10:06 AM To: Liaw, Andy Cc: Bert Gunter; r-help Subject: Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth? Thank you Andy! I went thru KernSmooth package but I don't see a way to use the fitted function to do the predict

Re: [R] Good and modern Kernel Regression package in R with auto-bandwidth?

2012-02-22 Thread Liaw, Andy
Bert's question aside (I was going to ask about laundry, but that's much harder than taxes...), my understanding of the situation is that optimal is in the eye of the beholder. There were at least two schools of thought on which is the better way of automatically selecting bandwidth, using

Re: [R] Random Forest Package

2012-02-01 Thread Liaw, Andy
You should be able to use the Rgui menu to install packages. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Niratha Sent: Wednesday, February 01, 2012 5:16 AM To: r-help@r-project.org Subject: [R] Random Forest

Re: [R] randomForest: proximity for new objects using an existing rf

2012-02-01 Thread Liaw, Andy
There's an alternative, but it may not be any more efficient in time or memory... You can run predict() on the training set once, setting nodes=TRUE. That will give you a n by ntree matrix of which node of which tree the data point falls in. For any new data, you would run predict() with

Re: [R] indexing by empty string (was RE: Error in predict.randomForest ... subscript out of bounds with NULL name in X)

2012-02-01 Thread Liaw, Andy
, 2012 08:44:13 AM Liaw, Andy wrote: I'm not exactly sure if this is a problem with indexing by name; i.e., is the following behavior by design? The problem is that names or dimnames that are empty seem to be treated differently, and one can't index by them: R> junk = 1:3 R> names

Re: [R] Bivariate Partial Dependence Plots in Random Forests

2012-01-31 Thread Liaw, Andy
The reason that it's not implemented is because of computational cost. Some users had done it on their own using the same idea. It's just that it takes too much memory for even moderately sized data. It can be done much more efficiently in MART because computational shortcuts were used.

[R] indexing by empty string (was RE: Error in predict.randomForest ... subscript out of bounds with NULL name in X)

2012-01-31 Thread Liaw, Andy
I'm not exactly sure if this is a problem with indexing by name; i.e., is the following behavior by design? The problem is that names or dimnames that are empty seem to be treated differently, and one can't index by them: R> junk = 1:3 R> names(junk) = c("a", "b", "") R> junk a b 1 2 3 R> junk[""] <NA> NA
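
The behavior is easy to reproduce in base R. The last line shows one possible workaround (a logical index on the names), which is a suggestion added here and not part of the original message:

```r
junk <- 1:3
names(junk) <- c("a", "b", "")

by_name  <- junk["a"]                 # ordinary lookup by name works
by_empty <- junk[""]                  # the empty string never matches a name
by_logic <- junk[names(junk) == ""]   # workaround: index with a logical vector

print(by_name)
print(by_empty)
print(by_logic)
```

This matches the documented rule that neither empty nor NA character indices match any names.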

Re: [R] Variable selection based on both training and testing data

2012-01-30 Thread Liaw, Andy
Variable selection is part of the training process-- it chooses the model. By definition, test data is used only for testing (evaluating chosen model). If you find a package or function that does variable selection on test data, run from it! Best, Andy -Original Message- From:

Re: [R] What is the function for smoothing splines with the smoothing parameter selected by generalized maximum likelihood?

2012-01-09 Thread Liaw, Andy
See the gss package on CRAN. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ali_protocol Sent: Monday, January 09, 2012 7:13 AM To: r-help@r-project.org Subject: [R] What is the function for smoothing splines with

Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread Liaw, Andy
Tree based models (such as RF) are invariant to monotonic transformations in the predictor (x) variables, because they only use the ranks of the variables, not their actual values. More specifically, they look for splits that are at the mid-points of unique values. Thus the resulting trees are
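
A toy demonstration of the invariance with a single split, written in base R. The helper best_split() is a made-up, simplified stand-in for what a regression tree does at one node, not the package's code:

```r
# Find the mid-point split that minimizes the total sum of squared errors.
best_split <- function(x, y) {
  u <- sort(unique(x))
  cuts <- (head(u, -1) + tail(u, -1)) / 2   # mid-points between unique values
  sse <- sapply(cuts, function(s) {
    left  <- y[x <= s]
    right <- y[x > s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cuts[which.min(sse)]
}

set.seed(1)
x <- rexp(100) + 0.1
y <- as.numeric(x > 1) + rnorm(100, sd = 0.2)

s_raw <- best_split(x, y)
s_log <- best_split(log(x), y)

# The cut points differ, but they divide the training data identically.
same_partition <- identical(x <= s_raw, log(x) <= s_log)
print(same_partition)
```

Because the mid-points are taken on different scales, predictions for new points that fall between two training values can still differ slightly.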

Re: [R] explanation why RandomForest don't require a transformations (e.g. logarithmic) of variables

2011-12-05 Thread Liaw, Andy
You should see no differences beyond what you'd get by running RF a second time with a different random number seed. Best, Andy From: gianni lavaredo [mailto:gianni.lavar...@gmail.com] Sent: Monday, December 05, 2011 2:19 PM To: Liaw, Andy Cc: r-help@r

Re: [R] Random Forests in R

2011-12-01 Thread Liaw, Andy
The first version of the package was created by re-writing the main program in the original Fortran as C, and calls other Fortran subroutines that were mostly untouched, so dynamic memory allocation can be done. Later versions have most of the Fortran code translated/re-written in C.

Re: [R] Question about randomForest

2011-11-28 Thread Liaw, Andy
Not only that, but in the same help page, same Value section, it says: predicted the predicted values of the input data based on out-of-bag samples so people really should read the help pages instead of speculating... If the error rates were not based on OOB samples, they would drop to

Re: [R] tuning random forest. An unexpected result

2011-11-23 Thread Liaw, Andy
Gianni, You should not tune ntree in cross-validation or other validation methods, and especially should not be using OOB MSE to do so. 1. At ntree=1, you are using only about 36% of the data to assess the performance of a single random tree. This number can vary wildly. I'd say don't
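
The "about 36%" figure can be checked with a little arithmetic: the chance that a given row is left out of one bootstrap sample of size n is (1 - 1/n)^n, which approaches exp(-1). A base-R sketch, with n and the number of replicates chosen arbitrarily:

```r
n <- 1000
p_oob <- (1 - 1 / n)^n   # probability that a given row is out-of-bag for one tree

# Simulation: fraction of rows missing from a bootstrap sample, averaged.
set.seed(1)
sim <- mean(replicate(200, {
  in_bag <- sample(n, replace = TRUE)
  length(setdiff(seq_len(n), in_bag)) / n
}))

print(c(formula = p_oob, limit = exp(-1), simulated = sim))
```

So with ntree=1, roughly a third of the rows are all that is available for the OOB error estimate.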

Re: [R] gsDesign

2011-11-15 Thread Liaw, Andy
Hi Dongli, Questions about usage of specific contributed packages are best directed toward the package maintainer/author first, as they are likely the best sources of information, and they don't necessarily subscribe to or keep up with the daily deluge of R-help messages. (In this particular

Re: [R] randomForest - NaN in %IncMSE

2011-09-23 Thread Liaw, Andy
You are not giving anyone much to go on. Please read the posting guide and see how to ask your question in a way that's easier for others to answer. At the _very_ least, show what commands you used, what your data looks like, etc. Andy -Original Message- From:

Re: [R] class weights with Random Forest

2011-09-13 Thread Liaw, Andy
The current classwt option in the randomForest package has been there since the beginning, and is different from how the official Fortran code (version 4 and later) implements class weights. It simply accounts for the class weights in the Gini index calculation when splitting nodes, exactly as

Re: [R] randomForest memory footprint

2011-09-08 Thread Liaw, Andy
It looks like you are building a regression model. With such a large number of rows, you should try to limit the size of the trees by setting nodesize to something larger than the default (5). The issue, I suspect, is the fact that the size of the largest possible tree has about 2*nodesize

Re: [R] randomForest partial dependence plot variable names

2011-08-09 Thread Liaw, Andy
See if the following is close to what you're looking for. If not, please give more detail on what you want to do. data(airquality) airquality <- na.omit(airquality) set.seed(131) ozone.rf <- randomForest(Ozone ~ ., airquality, importance=TRUE) imp <- importance(ozone.rf) # get the importance

Re: [R] convert a splus randomforest object to R

2011-08-09 Thread Liaw, Andy
You really need to follow the suggestions in the posting guide to get the best help from this list. Which versions of randomForest are you using in S-PLUS and R? Which version of R are you using? When you restore the object into R, what does str(object) say? Have you also tried

Re: [R] squared pie chart - is there such a thing?

2011-07-25 Thread Liaw, Andy
Has anyone suggested mosaic displays? That's the closest I can think of as a square pie chart... -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Naomi Robbins Sent: Sunday, July 24, 2011 7:09 AM To: Thomas Levine Cc:

Re: [R] *not* using attach() *but* in one case ....

2011-05-19 Thread Liaw, Andy
From: Prof Brian Ripley Hmm, load() does have an 'envir' argument. So you could simply use that and with() (which is pretty much what attach() does internally). If people really wanted a lazy approach, with() could be extended to allow file names (as attach does). I'm not sure if

Re: [R] Rotation Forest in R

2011-04-12 Thread Liaw, Andy
I don't have access to that article, but just reading the abstract, it should be quite easy to do by writing a wrapper function that calls randomForest(). I've done so with random projections before. One limitation to methods like these is that they only apply to all numeric data. Andy

Re: [R] Difference in mixture normals and one density

2011-04-04 Thread Liaw, Andy
Is something like this what you're looking for? R> library(nor1mix) R> nmix2 <- norMix(c(2, 3), sig2=c(25, 4), w=c(.2, .8)) R> dnorMix(1, nmix2) - dnorm(1, 2, 5) [1] 0.03422146 Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of
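
The same number can be cross-checked without nor1mix by writing the mixture density out by hand. The helper name dmix is made up for this example:

```r
w     <- c(0.2, 0.8)       # mixture weights
mu    <- c(2, 3)           # component means
sigma <- sqrt(c(25, 4))    # component standard deviations (variances 25 and 4)

# Density of the two-component normal mixture at a single point x.
dmix <- function(x) sum(w * dnorm(x, mean = mu, sd = sigma))

d <- dmix(1) - dnorm(1, 2, 5)
print(d)
```

This should agree with the 0.03422146 shown above.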

Re: [R] ok to use glht() when interaction is NOT significant?

2011-03-08 Thread Liaw, Andy
Just to add my ever depreciating $0.02 USD: Keep in mind that the significance testing paradigm puts a constraint on the false positive rate, and lets the false negative rate float. What you should consider is whether that makes sense in your situation. All too often this is not carefully considered,

Re: [R] Coefficient of Determination for nonlinear function

2011-03-04 Thread Liaw, Andy
As far as I can tell, Uwe is not even fitting a model, but instead just solving a nonlinear equation, so I don't know why he wants an R^2. I don't see a statistical model here, so I don't know why one would want a statistical measure. Andy -Original Message- From:

Re: [R] lm - log(variable) - skip log(0)

2011-02-25 Thread Liaw, Andy
You need to use == instead of = for testing equality. While you're at it, you should check for positive values, not just screening out 0s. This works for me: R> mydata = data.frame(x=0:10, y=runif(11)) R> fm = lm(y ~ log(x), mydata, subset=x>0) Andy -Original Message- From:

Re: [R] Random Forest Cross Validation

2011-02-24 Thread Liaw, Andy
Exactly as Max said. See the rfcv() function in the latest version of randomForest, as well as the reference in the help page for that function. OOB estimate is as accurate as CV estimate _if_ you run straight RF. Most other methods do not have this feature. However, if you start adding

Re: [R] tri-cube and gaussian weights in loess

2011-02-07 Thread Liaw, Andy
Locfit() in the locfit package has a slightly more modern implementation of loess, and is much more flexible in that it has a lot of options to tweak. One such option is the kernel. There are seven to choose from. Andy From: wisdomtooth From what I understand, loess in R uses the

Re: [R] How to measure/rank variable importance when using rpart?

2011-01-24 Thread Liaw, Andy
Check out caret::varImp.rpart(). It's described in the original CART book. Andy From: Tal Galili Hello all, When building a CART model (specifically classification tree) using rpart, it is sometimes interesting to know what is the importance of the various variables introduced to

Re: [R] randomForest: too many elements specified?

2011-01-21 Thread Liaw, Andy
specified? Liaw, Andy Mon, 17 Jan 2005 05:56:28 -0800 From: luk When I run randonForest with a 169453x5 matrix, I got the following message. Error in matrix(0, n, n) : matrix: too many elements specified Can you please advise me how to solve this problem? Thanks, Lu 1. When

Re: [R] Where is a package NEWS.Rd located?

2011-01-06 Thread Liaw, Andy
I was communicating with Kevin off-list. The problem seems to be run time, not install time. News() calls tools:::.build_news_db(), and the 2nd line of that function is: nfile <- file.path(dir, "inst", "NEWS.Rd") and that's the problem: an installed package shouldn't have an inst/ subdirectory,

Re: [R] randomForest speed improvements

2011-01-05 Thread Liaw, Andy
Note that that isn't exactly what I recommended. If you look at the example in the help page for combine(), you'll see that it is combining RF objects trained on the same data; i.e., instead of having one RF with 500 trees, you can combine five RFs trained on the same data with 100 trees each

Re: [R] randomForest speed improvements

2011-01-05 Thread Liaw, Andy
From: Liaw, Andy Note that that isn't exactly what I recommended. If you look at the example in the help page for combine(), you'll see that it is combining RF objects trained on the same data; i.e., instead of having one RF with 500 trees, you can combine five RFs trained on the same

Re: [R] randomForest speed improvements

2011-01-04 Thread Liaw, Andy
If you have multiple cores, one poor man's solution is to run separate forests in different R sessions, save the RF objects, load them into the same session and combine() them. You can do this less clumsily if you use things like Rmpi or other distributed computing packages. Another
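
A sketch of the combine() step, assuming the randomForest package is installed. For simplicity the partial forests are grown one after another in a single session here; in the setup described above each would be grown in its own R session (or worker) and saved. The helper fit_part() and the seeds are made up:

```r
library(randomForest)

# Grow one partial forest of 100 trees on the same training data.
fit_part <- function(seed) {
  set.seed(seed)
  randomForest(iris[1:4], iris$Species, ntree = 100)
}

parts <- lapply(1:3, fit_part)

# Merge the partial forests into a single forest.
rf_all <- do.call(randomForest::combine, parts)
print(rf_all$ntree)
```

Each partial forest needs a different seed; otherwise the same trees are grown several times.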

Re: [R] randomForest: help with combine() function

2010-12-11 Thread Liaw, Andy
combine() is meant to be used on randomForest objects that were built from identical training data. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Duro Sent: Friday, December 10, 2010 11:59 PM To:

Re: [R] randomForest: How to append ID column along with predictions

2010-12-07 Thread Liaw, Andy
The order in the output corresponds to the order of the input. I will patch the code so that it grabs the row names of the input (if they exist). If you specify type="prob", it already labels the rows by the input row names. -Original Message- From: r-help-boun...@r-project.org

Re: [R] randomForest parameters for image classification

2010-11-18 Thread Liaw, Andy
by the data you want to predict, not the other way around. Andy -Original Message- From: Deschamps, Benjamin [mailto:benjamin.descha...@agr.gc.ca] Sent: Tuesday, November 16, 2010 11:16 AM To: r-help@r-project.org Cc: Liaw, Andy Subject: RE: [R] randomForest parameters for image

Re: [R] randomForest parameters for image classification

2010-11-11 Thread Liaw, Andy
Please show us the code you used to run randomForest, the output, as well as what you get with other algorithms (on the same random subset for comparison). I have yet to see a dataset where randomForest does _far_ worse than other methods. Andy -Original Message- From:

[R] Contract programming position at Merck (NJ, USA)

2010-10-29 Thread Liaw, Andy
Job: Scientific programmer at Merck, Biostatistics, Rahway, NJ, USA [Job Description] This position works closely with statisticians to process and analyze ultrasound, MRI, and radiotelemetry longitudinal studies using a series of programs developed in R and Mathworks/Matlab. This position

Re: [R] to determine the variable importance in svm

2010-10-26 Thread Liaw, Andy
The caret package has answers to all your questions. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Neeti Sent: Tuesday, October 26, 2010 10:42 AM To: r-help@r-project.org Subject: [R] to determine the variable importance

Re: [R] Random Forest AUC

2010-10-24 Thread Liaw, Andy
The OOB error estimate in RF is one really nifty feature that alleviates the need for additional cross-validation or resampling. I've done some empirical comparison between OOB estimates and 10-fold CV estimates, and they are basically the same. Andy -Original Message- From:

Re: [R] Random Forest AUC

2010-10-23 Thread Liaw, Andy
What Breiman meant is that as the model gets more complex (i.e., as the number of trees tends to infinity) the generalization error (test set error) does not increase. This does not hold for boosting, for example; i.e., you can't boost forever, which necessitates the need to find the optimal

Re: [R] Random Forest AUC

2010-10-22 Thread Liaw, Andy
Let me expand on what Max showed. For the most part, performance on training set is meaningless. (That's the case for most algorithms, but especially so for RF.) In the default (and recommended) setting, the trees are grown to the maximum size, which means that quite likely there's only one

Re: [R] RandomForest Proximity Matrix

2010-10-21 Thread Liaw, Andy
From: Michael Lindgren Greetings R Users! I am posting to inquire about the proximity matrix in the randomForest R-package. I am having difficulty pushing very large data through the algorithm and it appears to hang on the building of the prox matrix. I have read on Dr. Breiman's

Re: [R] Force evaluation of variable when calling partialPlot

2010-10-04 Thread Liaw, Andy
The plot titles aren't pretty, but the following works for me: R> library(randomForest) randomForest 4.5-37 Type rfNews() to see new features/changes/bug fixes. R> set.seed(1004) R> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1001) R> par(mfrow=c(2,2)) R> for (i in 1:4) partialPlot(iris.rf,

Re: [R] randomForest - PartialPlot - reg

2010-09-24 Thread Liaw, Andy
In a partial dependence plot, only the relative scale, not absolute scale, of the y-axis is meaningful. I.e., you can compare the range of the curves between partial dependence plots of two different variables, but not the actual numbers on the axis. The range is compressed compared to the

Re: [R] randomForest - partialPlot - Reg

2010-09-22 Thread Liaw, Andy
From: Vijayan Padmanabhan Dear R Group I had an observation that in some cases, when I use the randomForest model to create partialPlot in R using the package randomForest the y-axis displays values that are more than -1! It is a classification problem that i was trying to address. Any

Re: [R] Passing a function as a parameter...

2010-09-22 Thread Liaw, Andy
One possibility: R> f = function(x, f) eval(as.call(list(as.name(f), x))) R> f(1:10, "mean") [1] 5.5 R> f(1:10, "max") [1] 10 Andy From: Jonathan Greenberg R-helpers: If I want to pass a character name of a function TO a function, and then have that function executed, how would I do this? I
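
Two other base-R ways to get the same effect, offered as alternatives to the eval() approach. The wrapper name call_by_name is made up for the example:

```r
# Look the function up from its name, then call it.
call_by_name <- function(x, fname) {
  fun <- match.fun(fname)
  fun(x)
}

m1 <- call_by_name(1:10, "mean")

# do.call() also accepts the function name as a character string.
m2 <- do.call("max", list(1:10))

print(c(mean = m1, max = m2))
```

match.fun() has the side benefit of failing with a clear error when the name does not refer to a function.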

Re: [R] OT: Is randomization for targeted cancer therapies ethical?

2010-09-21 Thread Liaw, Andy
From: jlu...@ria.buffalo.edu Clearly inferior treatments are unethical. The Big Question is: What constitutes "clearly"? Who decides what is "clearly", and how? I'm sure there are plenty of people who don't understand much Statistics and are perfectly willing to say the results on the two

Re: [R] Decision Tree in Python or C++?

2010-09-08 Thread Liaw, Andy
For Python, check out the project orange: http://www.ailab.si/orange/doc/catalog/Classify/ClassificationTree.htm Not sure about C++, but OpenDT is in C: http://opendt.sourceforge.net/ Looks like OpenCV has both Python and C++ interfaces (didn't see Python interface to decision tree, though):

[R] Open position at Merck (NJ, USA)

2010-09-07 Thread Liaw, Andy
Job description: Computational statistician/biometrician The Biometrics Research Department at Merck Research Laboratories, Merck & Co., Inc. in Rahway, NJ, is seeking a highly motivated statistician/data analyst to work in its basic research, drug discovery, preclinical and early clinical

Re: [R] RandomForests Limitations? Work Arounds?

2010-09-07 Thread Liaw, Andy
You're not giving us much to go on, so the info I can give is correspondingly vague. I take it you are using RF in unsupervised mode. What RF does in this case is simply generate a second part of the data that have the same marginal distribution as the data you have, but the variables are
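A rough base-R sketch of the idea described above, not the randomForest package's own code: each column is resampled on its own, so the synthetic half keeps the marginal distributions while the relationships between variables are broken. The toy data and all object names here are assumptions for illustration.

```r
set.seed(1)

# Toy data (an assumption): two strongly correlated columns.
x1 <- rnorm(500)
real <- data.frame(x1 = x1, x2 = x1 + rnorm(500, sd = 0.1))

# Resample every column separately, with replacement. Each column keeps its
# own marginal distribution, but the link between columns is destroyed.
synthetic <- as.data.frame(lapply(real, function(col) sample(col, replace = TRUE)))

# Stack both halves and label them; a classifier trained to tell "real" from
# "synthetic" can only succeed by using the dependence structure.
combined <- rbind(real, synthetic)
label <- factor(rep(c("real", "synthetic"), each = nrow(real)))

cor_real  <- cor(real$x1, real$x2)            # close to 1 by construction
cor_synth <- cor(synthetic$x1, synthetic$x2)  # close to 0
```

If the original variables carry little dependence to begin with, the two halves look alike and such a classifier has nothing to learn.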

Re: [R] predict.loess and NA/NaN values

2010-08-27 Thread Liaw, Andy
From: Philipp Pagel In a current project, I am fitting loess models to subsets of data in order to use the loess predictions for normalization (similar to what is done in many microarray analyses). While working on this I ran into a problem when I tried to predict from the loess models and

Re: [R] Learning ANOVA

2010-08-16 Thread Liaw, Andy
From: Stephen Liu Hi JesperHybel, Thanks for your advice. If you're trying to follow the youtube video you have a typing mistake here: InsectSprays.aov <- (test01$count ~ test01$spray) I think this should be: InsectSprays.aov <- aov(test01$count ~ test01$spray) Your advice
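To make the correction above concrete, here is a small sketch using the built-in InsectSprays data directly (the thread's test01 object is assumed to be a copy of it). Without the call to aov(), the assignment stores only a formula, not a fitted model.

```r
data(InsectSprays)

# Fit the one-way ANOVA; passing `data =` avoids repeating the data frame name.
InsectSprays.aov <- aov(count ~ spray, data = InsectSprays)
summary(InsectSprays.aov)

# Without aov(), the right-hand side is just a formula object.
not_a_model <- (InsectSprays$count ~ InsectSprays$spray)
class(not_a_model)
```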

Re: [R] Learning ANOVA

2010-08-13 Thread Liaw, Andy
From: Stephen Liu Hi folks, R on Ubuntu 10.04 64 bit. Performed following steps on R:- ### to access to the object data(InsectSprays) ### create a .csv file write.csv(InsectSprays, "InsectSpraysCopy.csv") On another terminal $ sudo updatedb $ locate InsectSpraysCopy.csv
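A short sketch of how the file can be located from within R instead of with locate. A relative file name passed to write.csv() is resolved against the working directory reported by getwd(). The sketch writes to tempdir() rather than the working directory, which is a choice made here to avoid leaving files behind, not something from the original post.

```r
data(InsectSprays)

# Build an explicit path so there is no doubt where the file ends up.
out_file <- file.path(tempdir(), "InsectSpraysCopy.csv")
write.csv(InsectSprays, out_file, row.names = FALSE)

getwd()                  # where a bare "InsectSpraysCopy.csv" would have gone
normalizePath(out_file)  # full path of the file just written
file.exists(out_file)

# Read it back to confirm the round trip.
copy <- read.csv(out_file)
```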

Re: [R] Error on random forest variable importance estimates

2010-08-06 Thread Liaw, Andy
From: Pierre Dubath Hello, I am using the R randomForest package to classify variable stars. I have a training set of 1755 stars described by (too) many variables. Some of these variables are highly correlated. I believe that I understand how randomForest works and how the

Re: [R] Collinearity in Moderated Multiple Regression

2010-08-04 Thread Liaw, Andy
Seems to me it may be worth stating what may be elementary to some on this list: - If all relevant variables are included in the model and the true model is indeed linear, then all least squares estimated coefficients are unbiased. [ David Ruppert once said about the three kinds of lies:

Re: [R] Problems with normality req. for ANOVA

2010-08-03 Thread Liaw, Andy
As a matter of fact, I would say both Bert and I encounter designed experiments a lot more than observational studies, yet we speak from experience that those things that Bert mentioned happen on a daily basis. When you talk to experimenters, ask your questions carefully and you'll see these

Re: [R] Collinearity in Moderated Multiple Regression

2010-08-03 Thread Liaw, Andy
If the collinearity you're seeing arose from the addition of a product (interaction) term, I do not think penalization is the best answer. What is the goal of your analysis? If it's prediction, then I wouldn't worry about this type of collinearity. If you're interested in inference, I'd try some

Re: [R] randomForest outlier return NA

2010-07-15 Thread Liaw, Andy
There's a bug in the code. If you add row names to the X matrix before you call randomForest(), you'd get:
R> summary(outlier(mdl.rf))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.0580 -0.5957  0.      0.6406  1.2650  9.5200
I'll fix this in the next release. Thanks for reporting.

Re: [R] anyone know why package RandomForest na.roughfix is so slow??

2010-07-02 Thread Liaw, Andy
I'll incorporate some of these ideas into the next release. Thanks! Best, Andy -Original Message- From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley Wickham Sent: Thursday, July 01, 2010 8:08 PM To: Mike Williamson Cc: Liaw, Andy; r-help Subject: Re: [R] anyone

Re: [R] anyone know why package RandomForest na.roughfix is so slow??

2010-07-01 Thread Liaw, Andy
You have not shown any code on exactly how you use na.roughfix(), so I can only guess. If you are doing something like: randomForest(y ~ ., mybigdata, na.action=na.roughfix, ...) I would not be surprised that it's taking very long on large datasets. Most likely it's caused by the formula

Re: [R] anyone know why package RandomForest na.roughfix is so slow??

2010-07-01 Thread Liaw, Andy
8.85 R 2.11.1, randomForest 4.5-35, Windows XP (32-bit), Thinkpad T61 with 2GB ram. Andy From: Mike Williamson [mailto:this.is@gmail.com] Sent: Thursday, July 01, 2010 12:48 PM To: Liaw, Andy Cc: r-help Subject: Re: [R] anyone know why package

Re: [R] Linear Discriminant Analysis in R

2010-05-28 Thread Liaw, Andy
cobler_squad needs more basic help than doing lda. The data input just doesn't make sense. If vowel_feature is a data frame, then G <- vowel_feature[15] creates another data frame containing the 15th variable in vowel_feature, so G is the name of a data frame, not a variable in a data frame.
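The distinction can be checked directly. In this sketch vowel_feature is a made-up stand-in with 15 numeric columns, since the poster's data are not shown: single brackets return a one-column data frame, while double brackets return the column itself.

```r
set.seed(42)

# Hypothetical stand-in for the poster's data: 20 rows, 15 numeric columns.
vowel_feature <- data.frame(matrix(rnorm(20 * 15), ncol = 15))

G_df  <- vowel_feature[15]    # single bracket: a data frame with one column
G_vec <- vowel_feature[[15]]  # double bracket: the 15th column as a vector

class(G_df)   # "data.frame"
class(G_vec)  # "numeric"
```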
