Re: [R] Random Forest: OOB performance = test set performance?
Thanks Peter. Indeed, after setting a seed the two results are similar. I am self-studying and wanted to make sure I understood the concept of OOB samples and how reliable the performance metrics calculated on them are. It seems I did get it. That's good :)

On 4/11/21 6:34 AM, Peter Langfelder wrote:
> I think the only thing you are doing wrong is not setting the random seed (set.seed()), so your results are not reproducible. Depending on the random sample used to select the training and test sets, you get slightly varying accuracy for both; sometimes one is better and sometimes the other.
>
> HTH,
> Peter
>
> On Sat, Apr 10, 2021 at 8:49 PM wrote:
>> Hi ML,
>>
>> For random forest, I thought that the out-of-bag performance should be the same as (or at least very similar to) the performance calculated on a separate test set. But this does not seem to be the case.
>>
>> In the following code, the accuracy computed on the out-of-bag sample is 77.81%, while the one computed on a separate test set is 81%. Can you please check what I am doing wrong?
>>
>> Thanks in advance and best regards.
>>
>> library(randomForest)
>> library(ISLR)
>>
>> Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
>> Carseats$High <- as.factor(Carseats$High)
>>
>> train = sample(1:nrow(Carseats), 200)
>>
>> rf = randomForest(High~.-Sales, data=Carseats, subset=train, mtry=6, importance=T)
>> acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion)
>> print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))
>>
>> yhat <- predict(rf, newdata=Carseats[-train,])
>> y <- Carseats[-train,]$High
>> conftest <- table(y, yhat)
>> acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
>> print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
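A side note on the OOB accuracy calculation in the quoted code, as a minimal base-R sketch. For classification forests, the `confusion` component returned by randomForest carries a `class.error` column next to the class counts, so `sum(rf$confusion)` adds those error rates to the denominator and slightly understates the OOB accuracy. The counts below are made up for illustration and do not come from the Carseats run; the variable names are hypothetical.

```r
# Hypothetical confusion matrix shaped like rf$confusion for a two-class forest:
# rows = true class, columns = predicted class, plus a class.error column.
counts <- matrix(c(100, 18,
                    26, 56),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("No", "Yes"), c("No", "Yes")))
class.error <- 1 - diag(counts) / rowSums(counts)
conf <- cbind(counts, class.error)

# Denominator as in the quoted code: it includes the class.error column.
acc_with_error_col <- (conf[1, 1] + conf[2, 2]) / sum(conf)

# Denominator restricted to the two count columns.
acc_counts_only <- sum(diag(conf[, 1:2])) / sum(conf[, 1:2])

print(round(100 * c(with_error_col = acc_with_error_col,
                    counts_only    = acc_counts_only), 2))
```

The difference is small (a fraction of a percentage point with 200 training rows), so it does not change the conclusion of the thread: most of the gap between the OOB and test-set accuracy comes from the random split, which is why calling set.seed() before sample() matters for reproducibility.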
Re: [R] evil attributes
Terry wrote:
> I confess to being puzzled WHY the R core has decided on this definition [of vector] ...

I believe that "R core" followed S's definition of "vector". From the beginning (at least when I first saw it in 1981) an S vector was the basic unit of an S object - it had a type and a length and no more. This has little to do with the mathematician's or physicist's notion of a vector. It is more like what Techopedia (https://www.techopedia.com/definition/22817/vector-programming) says is a programmer's notion of a vector:

    What Does Vector Mean?

    A vector, in programming, is a type of array that is one dimensional. Vectors are a logical element in programming languages that are used for storing data. Vectors are similar to arrays but their actual implementation and operation differs.

    Techopedia Explains Vector

    Vectors are primarily used within the programming context of most programming languages and serve as data structure containers. Being a data structure, vectors are used for storing objects and collections of objects in an organized structure. The major difference between an array and a vector is that, unlike typical arrays, the container size of a vector can be easily increased and decreased to complement different data storage types. Vectors have a dynamic structure and provide the ability to assign container size up front and enable allocation of memory space quickly. Vectors can be thought of as dynamic arrays.

-Bill

On Sun, Apr 11, 2021 at 8:04 AM Therneau, Terry M., Ph.D. via R-help <r-help@r-project.org> wrote:
> I wrote: "I confess to being puzzled WHY the R core has decided on this definition..."
> After just a little more thought let me answer my own question.
>
> a. The as.vector() function is designed to strip off everything extraneous and leave just the core. (I have a mental image of Jack Webb saying "Just the facts ma'am"). I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.
>
> b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"
>
> Nevertheless I understand Roger's confusion.
>
> --
> Terry M Therneau, PhD
> Department of Quantitative Health Sciences
> Mayo Clinic
> thern...@mayo.edu
>
> "TERR-ree THUR-noh"
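The two points in Terry's quoted message (as.vector() strips an object down to its core, and is.vector() asks whether an object already looks like that) can be seen in a small base-R sketch. The object and the "units" attribute are made up for illustration.

```r
x <- c(a = 1, b = 2, c = 3)
named_ok <- is.vector(x)          # names are the one attribute is.vector() tolerates

attr(x, "units") <- "cm"          # hypothetical extra attribute
with_attr <- is.vector(x)         # any attribute other than names gives FALSE

y <- as.vector(x)                 # for an atomic vector this drops all attributes, names included
stripped_attrs <- attributes(y)
stripped_ok <- is.vector(y)

list_ok   <- is.vector(list(a = 1, b = 2))   # lists count as vectors here
matrix_ok <- is.vector(matrix(1:4, 2))       # the dim attribute rules a matrix out

print(c(named = named_ok, with_attr = with_attr, stripped = stripped_ok,
        list = list_ok, matrix = matrix_ok))
```

This is the "type and length and no more" notion Bill describes: the check is about attributes, not about whether the data are one-dimensional in a mathematical sense.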
Re: [R] evil attributes
On 11/04/2021 2:46 p.m., Viechtbauer, Wolfgang (SP) wrote:
> The is.vector() thing has also bitten me in the behind on a few occasions. When I want to check if something is a vector, allow for it to possibly have some additional attributes (besides names) that would make is.vector() evaluate to FALSE, but evaluate to FALSE for lists (since is.vector(list(a=1, b=2)) is TRUE -- which also wasn't what I had initially expected before reading the documentation), I use:
>
> .is.vector <- function(x) is.atomic(x) && !is.matrix(x) && !is.null(x)
>
> This might also work:
>
> .is.vector <- function(x) is(x, "vector") && !is.list(x)
>
> I am sure there are all kinds of edge (and probably also not so edge) cases where these also fail to work properly. Kinda curious if there are better approaches out there.

Sorry, but nobody has said what "properly" would be here. How can an approach be better at something if you don't say what you want it to do?

The base::is.vector() definition looks fairly useless, and I can't remember ever using that function. But at least it's quite well documented what it is supposed to do. What claims are you making about your .is.vector() definitions?

Duncan Murdoch
Re: [R] updating OpenMx failed
On Sun, 11 Apr 2021, Martin Møller Skarbiniks Pedersen wrote:

> You should contact the maintainers of the package. According to this page:
> https://cran.r-project.org/web/packages/OpenMx/index.html
> you can get help from http://openmx.ssri.psu.edu/forums

Martin,

You're correct. I should have looked up the maintainer and contacted them directly. I'll do so.

Regards,

Rich
Re: [R] updating OpenMx failed
On Sun, 11 Apr 2021 at 18:10, Rich Shepard wrote:
>
> I'm running Slackware-14.2/x86_64 and R-4.0.2-x86_64-1_SBo.
>
> Updating OpenMx failed:
> omxState.cpp:1230:82: required from here
> omxState.cpp:1229:17: error: cannot call member function ‘void ConstraintVec::eval(FitContext*, double*, double*)’ without object
> eval(fc2, result.data(), 0);
> ^
[...]
> Please advise,

You should contact the maintainers of the package. According to this page:
https://cran.r-project.org/web/packages/OpenMx/index.html
you can get help from http://openmx.ssri.psu.edu/forums

Regards
Martin
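For the advice to contact the maintainers, the contact details of an installed package can also be read from within R. A minimal sketch, using "stats" as a stand-in so that it runs anywhere; "OpenMx" would be substituted on a machine where a version of it is still installed (as in this thread, where the previous version was restored after the failed update).

```r
pkg <- "stats"   # stand-in package name; replace with "OpenMx" where it is installed

# Maintainer string as recorded in the package's DESCRIPTION file.
m <- utils::maintainer(pkg)
print(m)

# BugReports field, if the package declares one (NA otherwise).
bug_url <- utils::packageDescription(pkg, fields = "BugReports")
print(bug_url)
```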
Re: [R] evil attributes
On 4/11/21 11:46 AM, Viechtbauer, Wolfgang (SP) wrote:
> The is.vector() thing has also bitten me in the behind on a few occasions. When I want to check if something is a vector, allow for it to possibly have some additional attributes (besides names) that would make is.vector() evaluate to FALSE, but evaluate to FALSE for lists (since is.vector(list(a=1, b=2)) is TRUE -- which also wasn't what I had initially expected before reading the documentation), I use:
>
> .is.vector <- function(x) is.atomic(x) && !is.matrix(x) && !is.null(x)
>
> This might also work:
>
> .is.vector <- function(x) is(x, "vector") && !is.list(x)

That will allow expression vectors to return TRUE, but they are not atomic so they would be excluded by your current version.

--
David.

> I am sure there are all kinds of edge (and probably also not so edge) cases where these also fail to work properly. Kinda curious if there are better approaches out there.

You might want to exclude expression vectors as well.

> Best,
> Wolfgang
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Therneau, Terry M., Ph.D. via R-help
> Sent: Saturday, 10 April, 2021 16:12
> To: R-help
> Subject: Re: [R] evil attributes
>
> I wrote: "I confess to being puzzled WHY the R core has decided on this definition..."
> After just a little more thought let me answer my own question.
>
> a. The as.vector() function is designed to strip off everything extraneous and leave just the core. (I have a mental image of Jack Webb saying "Just the facts ma'am"). I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.
>
> b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"
>
> Nevertheless I understand Roger's confusion.
>
> --
> Terry M Therneau, PhD
> Department of Quantitative Health Sciences
> Mayo Clinic
> thern...@mayo.edu
>
> "TERR-ree THUR-noh"
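David's point about expression vectors can be checked by running both of Wolfgang's definitions side by side on a few objects. A sketch: the helper names is_vec_atomic and is_vec_s4 and the test cases are hypothetical, and is() is called as methods::is() so the example does not depend on the methods package being attached.

```r
# Wolfgang's two definitions, renamed so both can coexist.
is_vec_atomic <- function(x) is.atomic(x) && !is.matrix(x) && !is.null(x)
is_vec_s4     <- function(x) methods::is(x, "vector") && !is.list(x)

cases <- list(
  plain      = 1:3,
  with_attr  = structure(1:3, units = "cm"),   # made-up extra attribute
  list       = list(a = 1, b = 2),
  expression = expression(a + b)
)

res <- data.frame(
  is.vector = vapply(cases, is.vector,     logical(1)),
  atomic    = vapply(cases, is_vec_atomic, logical(1)),
  s4        = vapply(cases, is_vec_s4,     logical(1)),
  row.names = names(cases)
)
print(res)
```

The two definitions agree on the plain vector, the vector with an extra attribute, and the list, and they part ways on the expression vector. Which answer is "right" there depends on what the check is meant to guarantee, which is the question Duncan raises elsewhere in the thread.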
Re: [R] evil attributes
The is.vector() thing has also bitten me in the behind on a few occasions. When I want to check if something is a vector, allow for it to possibly have some additional attributes (besides names) that would make is.vector() evaluate to FALSE, but evaluate to FALSE for lists (since is.vector(list(a=1, b=2)) is TRUE -- which also wasn't what I had initially expected before reading the documentation), I use:

.is.vector <- function(x) is.atomic(x) && !is.matrix(x) && !is.null(x)

This might also work:

.is.vector <- function(x) is(x, "vector") && !is.list(x)

I am sure there are all kinds of edge (and probably also not so edge) cases where these also fail to work properly. Kinda curious if there are better approaches out there.

Best,
Wolfgang

>-----Original Message-----
>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Therneau, Terry M., Ph.D. via R-help
>Sent: Saturday, 10 April, 2021 16:12
>To: R-help
>Subject: Re: [R] evil attributes
>
>I wrote: "I confess to being puzzled WHY the R core has decided on this definition..."
>After just a little more thought let me answer my own question.
>
>a. The as.vector() function is designed to strip off everything extraneous and leave just the core. (I have a mental image of Jack Webb saying "Just the facts ma'am"). I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.
>
>b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"
>
>Nevertheless I understand Roger's confusion.
>
>--
>Terry M Therneau, PhD
>Department of Quantitative Health Sciences
>Mayo Clinic
>thern...@mayo.edu
>
>"TERR-ree THUR-noh"
[R] updating OpenMx failed
I'm running Slackware-14.2/x86_64 and R-4.0.2-x86_64-1_SBo.

Updating OpenMx failed:

omxState.cpp:1230:82: required from here
omxState.cpp:1229:17: error: cannot call member function ‘void ConstraintVec::eval(FitContext*, double*, double*)’ without object
eval(fc2, result.data(), 0);
^
/usr/lib64/R/etc/Makeconf:174: recipe for target 'omxState.o' failed
make: *** [omxState.o] Error 1
ERROR: compilation failed for package ‘OpenMx’
* removing ‘/usr/lib64/R/library/OpenMx’
* restoring previous ‘/usr/lib64/R/library/OpenMx’

The downloaded source packages are in
‘/tmp/Rtmp9jMe61/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("OpenMx") :
installation of package ‘OpenMx’ had non-zero exit status

Please advise,

Rich
Re: [R] evil attributes
I wrote: "I confess to being puzzled WHY the R core has decided on this definition..."
After just a little more thought let me answer my own question.

a. The as.vector() function is designed to strip off everything extraneous and leave just the core. (I have a mental image of Jack Webb saying "Just the facts ma'am"). I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.

b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"

Nevertheless I understand Roger's confusion.

--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
Re: [R] Identifying column type
Thanks. Great idea!

Sent from my iPhone
Beware: My autocorrect is crazy

> On Apr 10, 2021, at 1:37 PM, Rui Barradas wrote:
>
> Hello,
>
> Maybe something like
>
> ok <- sapply(mydata, is.numeric)
> mydata <- mydata[ok]
>
> to keep the numeric columns only.
>
> Hope this helps,
>
> Rui Barradas
>
> At 04:25 on 10/04/21, Steven Yen wrote:
>> I have data of mixed types in a data frame - date and numeric, as shown in the summary below. How do I identify the column(s) that is/are not numeric, in this case the first? All I want is to identify the column(s) so that I can remove it/them from the data frame. Thanks.
>>
>>> summary(mydata)
>>       Date                          Spot            Futures
>>  Min.   :1997-09-01 00:00:00   Min.   : 735.1   Min.   : 734.2
>>  1st Qu.:2002-10-16 12:00:00   1st Qu.:1120.7   1st Qu.:1122.6
>>  Median :2007-12-01 00:00:00   Median :1301.8   Median :1303.2
>>  Mean   :2007-12-01 06:01:27   Mean   :1423.1   Mean   :1423.6
>>  3rd Qu.:2013-01-16 12:00:00   3rd Qu.:1540.0   3rd Qu.:1546.5
>>  Max.   :2018-03-01 00:00:00   Max.   :2823.8   Max.   :2825.8
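Rui's suggestion also answers the "identify" part of the question, since the same logical vector names the offending columns. A small self-contained sketch: the three-row data frame below only mimics the column layout from the summary, and its values are made up.

```r
# Toy stand-in for mydata: one date-time column and two numeric columns.
mydata <- data.frame(
  Date    = as.POSIXct(c("1997-09-01", "1997-10-01", "1997-11-01"), tz = "UTC"),
  Spot    = c(735.1, 750.2, 761.9),
  Futures = c(734.2, 751.0, 760.5)
)

ok <- sapply(mydata, is.numeric)     # TRUE for numeric columns, FALSE otherwise

non_numeric <- names(mydata)[!ok]    # columns that would be dropped
mydata_num  <- mydata[ok]            # data frame with numeric columns only

print(non_numeric)
str(mydata_num)
```

Date and date-time columns are stored as numbers internally, but is.numeric() reports FALSE for them, which is why this filter removes the Date column here.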