Re: [R] test logistic regression model
Agreed on the ranking of (1) vs. (2).

On Sun, Nov 20, 2022 at 1:30 PM Ebert,Timothy Aaron wrote:
> I like option 1. Option 2 may cause problems if you are pooling groups
> that do not go together.
Re: [R] test logistic regression model
I like option 1. Option 2 may cause problems if you are pooling groups that do not go together. This is especially a problem if you know that the data are missing some groups. I would consider dropping rare groups, or comparing the results of the pooling and dropping options. If the answer is the same in both cases, use whichever approach makes your life easier with reviewers/clients. If the answer is different, I would go with dropping the rare categories, or present both and highlight the difference in outcome. A third option is to gather more data.

Tim

-----Original Message-----
From: R-help On Behalf Of Bert Gunter
Sent: Sunday, November 20, 2022 1:06 PM
To: Mitchell Maltenfort
Cc: R-help
Subject: Re: [R] test logistic regression model

I think (2) might be a bad idea if one of the "sparse" categories has high predictive power. You'll lose it when you pool, will you not? Also, there is the problem of subjectively defining "sparse."

However, 1) seems quite sensible to me. But IANAE.
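Tim's "compare pooling vs. dropping" check can be sketched as follows. Everything here is an assumption for illustration: the simulated data, the extra covariate `x` (with a single factor predictor the two options give identical predictions for the common levels, so a second term is needed for the comparison to be informative), and the cut-off of 10 rows for "rare".

```r
# Minimal sketch with simulated data: compare "pool rare levels" vs "drop rare levels".
# All data, names and the threshold of 10 below are hypothetical.
set.seed(1)
n_per <- c(a = 60, b = 60, c = 60, d = 4, e = 3)
train <- data.frame(
  grp = factor(rep(names(n_per), times = n_per)),
  x   = rnorm(sum(n_per))
)
train$y <- rbinom(nrow(train), 1, plogis(-0.5 + 0.8 * train$x))

# Levels with fewer than 10 training rows count as "rare" (an arbitrary cut-off)
rare <- names(which(table(train$grp) < 10))

# Option A: pool the rare levels into a single "other" level
# (assigning a duplicated level name merges those levels)
train_pool <- train
levels(train_pool$grp)[levels(train_pool$grp) %in% rare] <- "other"
fit_pool <- glm(y ~ grp + x, data = train_pool, family = binomial)

# Option B: drop the rows that belong to rare levels
train_drop <- droplevels(train[!(train$grp %in% rare), ])
fit_drop <- glm(y ~ grp + x, data = train_drop, family = binomial)

# Does the answer of interest (here, the effect of x) change between the options?
cmp <- rbind(
  pooled  = coef(summary(fit_pool))["x", c("Estimate", "Std. Error")],
  dropped = coef(summary(fit_drop))["x", c("Estimate", "Std. Error")]
)
print(cmp)
```

If the two rows of `cmp` lead to the same conclusion, either option can be reported; if not, Tim's advice is to prefer dropping or to present both.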
Re: [R] test logistic regression model
I think (2) might be a bad idea if one of the "sparse" categories has high predictive power. You'll lose it when you pool, will you not? Also, there is the problem of subjectively defining "sparse."

However, 1) seems quite sensible to me. But IANAE (I am not an expert).

-- Bert

On Sun, Nov 20, 2022 at 9:49 AM Mitchell Maltenfort wrote:
> Two possible fixes occur to me
>
> 1) Redo the test/training split but within levels of factor - so you have the
> same split within each level and each level accounted for in training and
> testing
>
> 2) if you have a lot of levels, and perhaps sparse representation in a few,
> consider recoding levels to pool the rare ones into an "other" category

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
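Both of Bert's objections to pooling are visible in a small helper. The sketch below is hypothetical (`pool_levels` is not a function from any package or from the thread): the "sparse" threshold is an explicit argument `min_n`, because its value is a subjective choice, and a rare level that happens to be highly predictive loses its own coefficient once it is merged into "other". The same mapping is applied to the test data, so test levels never seen in training also become "other".

```r
# Hypothetical helper (not from the thread): pool training levels with fewer than
# `min_n` rows, plus any test level never seen in training, into one "other" level.
# `min_n` is exactly the subjective "sparse" cut-off, so it is left as an argument.
pool_levels <- function(train_f, test_f, min_n = 10, other = "other") {
  counts <- table(train_f)
  keep   <- names(counts)[counts >= min_n]
  recode <- function(f) {
    f <- as.character(f)
    f[!is.na(f) & !(f %in% keep)] <- other   # NAs stay NA
    factor(f, levels = c(keep, other))
  }
  list(train = recode(train_f), test = recode(test_f))
}

# Usage with made-up data: "c" is rare in training and "z" never occurs there
train_f <- factor(c(rep("a", 30), rep("b", 25), rep("c", 3)))
test_f  <- factor(c("a", "b", "c", "z"))
pooled  <- pool_levels(train_f, test_f, min_n = 10)
table(pooled$train)   # counts: a = 30, b = 25, other = 3
pooled$test           # values: a, b, other, other
```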
Re: [R] test logistic regression model
Two possible fixes occur to me:

1) Redo the test/training split, but within the levels of the factor, so that you have the same split within each level and every level is represented in both training and testing.

2) If you have a lot of levels, and perhaps sparse representation in a few, consider recoding the levels to pool the rare ones into an "other" category.

On Sun, Nov 20, 2022 at 11:41 AM Bert Gunter wrote:
> small reprex:
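Option 1 (a split done separately within each factor level) could look like the sketch below. The data, the 70/30 proportion, and the helper name `stratified_split` are assumptions for illustration; packages such as caret offer stratified partitioning too, but this version uses base R only.

```r
# Hypothetical data: factor `grp` with a sparse level "d" and a binary outcome `y`
set.seed(42)
dat <- data.frame(
  grp = factor(rep(c("a", "b", "c", "d"), times = c(40, 30, 20, 4))),
  y   = rbinom(94, 1, 0.5)
)

# Split the row indices separately within each level of `f`, putting about
# `prop` of each level into training. Levels with at least 2 rows end up in
# both sets; a level with a single row can only go to training.
stratified_split <- function(f, prop = 0.7) {
  idx_by_level <- split(seq_along(f), f, drop = TRUE)
  train_idx <- lapply(idx_by_level, function(idx) {
    n_train <- max(1, min(length(idx) - 1, round(prop * length(idx))))
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(train_idx, use.names = FALSE))
}

train_idx <- stratified_split(dat$grp)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Every level in `test` was also seen in `train`, so predict() does not fail
fit <- glm(y ~ grp, data = train, family = binomial)
p   <- predict(fit, newdata = test, type = "response")
table(train$grp)   # counts: a = 28, b = 21, c = 14, d = 3
```

Indexing with `idx[sample.int(length(idx), n)]` rather than `sample(idx, n)` avoids R's surprise when a level has exactly one row (`sample(5, 1)` samples from `1:5`).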
Re: [R] test logistic regression model
small reprex:

set.seed(5)
dat <- data.frame(f = rep(c('r','g'), 4), y = runif(8))
newdat <- data.frame(f = rep(c('r','g','b'), 2))
## convert values in newdat not seen in dat to NA
is.na(newdat$f) <- !(newdat$f %in% dat$f)
lmfit <- lm(y ~ f, data = dat)

## Result:
> predict(lmfit, newdat)
        1         2         3         4         5         6
0.4374251 0.6196527        NA 0.4374251 0.6196527        NA

If this does not suffice, as Rui said, we need details of what you did.
(predict.glm works like predict.lm)

-- Bert

On Sun, Nov 20, 2022 at 7:46 AM Rui Barradas wrote:
> What exactly didn't work? You say you have tried the solutions found in
> stackoverflow but without a link, we don't know which answers to which
> questions you are talking about.
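Since predict.glm works like predict.lm, the same fix applies to a logistic model. The sketch below reproduces the "new levels" error with a binomial `glm()` and then applies Bert's NA conversion; the 0/1 outcome values are made up so that the fitted probabilities are easy to check by hand.

```r
# Bert's reprex adapted to a binomial glm(); the 0/1 outcome values are made up.
train <- data.frame(f = factor(rep(c("r", "g"), 4)),
                    y = c(1, 0, 1, 1, 0, 0, 1, 0))
test  <- data.frame(f = factor(rep(c("r", "g", "b"), 2)))

fit <- glm(y ~ f, data = train, family = binomial)

# Predicting directly fails: level "b" never occurred in the training data
err <- tryCatch(predict(fit, newdata = test, type = "response"),
                error = function(e) conditionMessage(e))
print(err)

# Set the unseen values to NA; those rows are then predicted as NA
is.na(test$f) <- !(test$f %in% train$f)
p <- predict(fit, newdata = test, type = "response")
print(p)   # r -> 0.75, g -> 0.25, rows 3 and 6 -> NA
```

The unused level "b" can stay in `levels(test$f)`; `model.frame()` drops unused levels before comparing them with the levels stored in the model.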
Re: [R] test logistic regression model
On 20/11/2022 at 15:29, Gábor Malomsoki wrote:
> Dear Bert,
>
> Yes, was trying to fill the not existing categories with NAs, but the
> suggested solutions in stackoverflow.com unfortunately did not work.

Hello,

What exactly didn't work? You say you have tried the solutions found on Stack Overflow, but without a link we don't know which answers to which questions you are talking about.
Like Bert said, if you assign NA to the new levels, present only in test, it should work.

Can you post links to what you have tried?

Hope this helps,

Rui Barradas
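Another way to do the NA assignment Rui describes is to rebuild the test column as a factor whose levels are exactly those the fitted model stored in `$xlevels`; any value outside that set becomes NA automatically. The sketch below borrows the variable names from the original question, but the data are hypothetical.

```r
# Hypothetical stand-ins for the poster's data (names borrowed from the question)
train <- data.frame(TG_KraftF5 = factor(c("x", "y", "x", "y", "x", "y")),
                    book_state = c(1, 0, 1, 1, 0, 0))
test  <- data.frame(TG_KraftF5 = c("x", "y", "z"))   # "z" was never seen in training

mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")

# Rebuild the test column using exactly the levels stored in the fitted model;
# values outside those levels (here "z") become NA instead of a new level.
test$TG_KraftF5 <- factor(test$TG_KraftF5, levels = mymodel1$xlevels[["TG_KraftF5"]])
Predict <- predict(mymodel1, newdata = test, type = "response")
print(Predict)   # 2/3, 1/3, NA
```

Using `mymodel1$xlevels` rather than `levels(train$TG_KraftF5)` matters if the training factor carries unused levels, since the model only knows the levels that actually occurred when it was fitted.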
Re: [R] test logistic regression model
Dear Bert,

Yes, I was trying to fill the nonexistent categories with NAs, but the solutions suggested on stackoverflow.com unfortunately did not work.

Best regards
Gabor

Bert Gunter wrote on Sun, 20 Nov 2022, 16:20:
> You can't predict results for categories that you've not seen before
> (think about it). You will need to remove those cases from your test set
> (or convert them to NA and predict them as NA).
Re: [R] test logistic regression model
You can't predict results for categories that you've not seen before (think about it). You will need to remove those cases from your test set (or convert them to NA and predict them as NA).

-- Bert

On Sun, Nov 20, 2022 at 7:02 AM Gábor Malomsoki wrote:
> Dear all,
>
> I have created a logistic regression model on the train df:
> mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
>
> then I try to predict with the test df:
> Predict <- predict(mymodel1, newdata = test, type = "response")
> and I get this error message:
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
> object$xlevels)
> Factor "TG_KraftF5" has new levels
>
> I have tried different proposals from Stack Overflow, but unfortunately they
> did not solve the problem.
> Do you have any idea how to test a logistic regression model when you have
> different levels in the train and test dfs?
>
> Thank you in advance
> Regards,
> Gabor
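Bert's first option, removing the test cases whose level the model has never seen, might look like this. The data are hypothetical stand-ins that reuse the names from Gábor's question; counting the dropped rows is worth reporting alongside any test-set accuracy.

```r
# Hypothetical stand-ins for the original poster's `train` and `test` data frames
train <- data.frame(
  TG_KraftF5 = factor(c("low", "mid", "high", "low", "mid", "high", "low", "mid")),
  book_state = c(0, 1, 1, 0, 0, 0, 1, 1)
)
test <- data.frame(
  TG_KraftF5 = factor(c("low", "mid", "extreme", "high")),
  book_state = c(0, 1, 1, 1)
)
mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")

# Keep only the test rows whose level was seen when the model was fitted
seen    <- test$TG_KraftF5 %in% mymodel1$xlevels[["TG_KraftF5"]]
test_ok <- droplevels(test[seen, ])
cat("dropped", sum(!seen), "test row(s) with unseen levels\n")

Predict <- predict(mymodel1, newdata = test_ok, type = "response")
print(Predict)   # low -> 1/3, mid -> 2/3, high -> 1/2
```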
Re: [R] test if something was plotted on pdf device
Dear Duncan

Thank you for the code; I will test it, or at least check what it does.

I finally found what is probably an easier solution. I stay with my original code:

if (dev.cur() == 1) plot(ecdf(velik[, "ecd"]), main = ufil[j], col = i) else plot(ecdf(velik[, "ecd"]), add = TRUE, col = i)

After the plot is finished and the cycle ends, I copy the result to a pdf device:

dev.copy(pdf, paste(gsub(".xls", "", ufil, fixed = TRUE)[j], ".pdf", sep = ""))
dev.off()

Using this approach I could stay with my original code (almost), check with dev.cur() whether the plot was initialised, and save it to pdf after it is finished. The only obstacle is that my code flashes while plotting to the basic (screen) device, but I can live with that.

Thank you again and best regards
Petr

> -----Original Message-----
> From: Duncan Murdoch
> Sent: Thursday, September 12, 2019 2:29 PM
> To: PIKAL Petr; r-help mailing list
> Subject: Re: [R] test if something was plotted on pdf device

Personal data: Information about the processing and protection of the personal data of PRECHEZA a.s. business partners is published at https://www.precheza.cz/en/personal-data-protection-principles/ (Czech version: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/).
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ (Czech version: https://www.precheza.cz/01-dovetek/).
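The screen flashing could probably be avoided by opening the pdf device first and keeping track of whether a curve has already been drawn with a flag, instead of asking the device. This is a swapped-in technique, not Petr's or Duncan's code; the data, file name, and plot title below are placeholders for his `velik[, "ecd"]` loop.

```r
# Hypothetical stand-ins for Petr's per-file data (`velik[, "ecd"]` in his loop)
set.seed(1)
samples <- list(a = rnorm(100), b = rnorm(100, mean = 1), c = runif(100))
out <- file.path(tempdir(), "ecdf_overlay.pdf")

pdf(out)                        # open the pdf device first; nothing is drawn yet
xr <- range(unlist(samples))    # common x range so later curves are not clipped
first <- TRUE                   # TRUE until the first curve has been drawn
for (i in seq_along(samples)) {
  if (first) {
    plot(ecdf(samples[[i]]), main = "ECDF overlay", xlim = xr, col = i)
    first <- FALSE
  } else {
    plot(ecdf(samples[[i]]), add = TRUE, col = i)
  }
}
invisible(dev.off())
```

A flag rather than `i == 1` keeps working if some iterations are skipped (for example, files with no usable data) before the first curve is drawn.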
Re: [R] test if something was plotted on pdf device
On 12/09/2019 7:10 a.m., PIKAL Petr wrote:
> Dear all
>
> Is there any simple way of checking whether something was plotted into a
> pdf device after calling it?
>
> In an interactive session I used
>
> if (dev.cur()==1) plot(ecdf(rnorm(100))) else plot(ecdf(rnorm(100)), add=T, col=i)
>
> which enabled me to test whether a plot is open.
>
> But when I call e.g. pdf("test.pdf") before the cycle, dev.cur()==1 is FALSE
> even when no plot has been drawn, and a plot.new error comes:
>
>> pdf("test.pdf")
>> if (dev.cur()==1) plot(ecdf(rnorm(100))) else plot(ecdf(rnorm(100)), add=T, col=i)
> Error in segments(ti.l, y, ti.r, y, col = col.hor, lty = lty, lwd = lwd, :
>   plot.new has not been called yet

I don't know if this is reliable or not, but you could use code like this:

  f <- tempfile()
  pdf(f)
  blankPlot <- recordPlot()
  dev.off()
  unlink(f)

  pdf("test.pdf")

  ... unknown operations ...

  if (dev.cur() == 1 || identical(recordPlot(), blankPlot)) {
    plot(ecdf(rnorm(100)))
  } else {
    plot(ecdf(rnorm(100)), add = TRUE, col = i)
  }

Duncan Murdoch
Re: [R] test of independence
The basic test of independence for a table, based on the chi-squared distribution, can be done with the `chisq.test` function. It is in the stats package, which is installed and loaded by default, so you don't need to do anything additional. There is also the `fisher.test` function for Fisher's exact test (similar hypotheses, different methodology and assumptions; it may be really slow on your table). If you need more than the basics provided by those functions, a search of CRAN may be helpful, or give us more detail so that we can help.

On Thu, Dec 20, 2018 at 12:08 AM km wrote:
> Dear All,
>
> How do I do a test of independence with a 16x16 table of counts?
> Please suggest.
>
> Regards,
> KM

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
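A minimal sketch of both tests on a 16 x 16 table follows. The counts are simulated and only stand in for the real table. For a table this size an exact Fisher test is usually impractical, so the sketch uses the Monte Carlo p-value that `fisher.test` offers through `simulate.p.value` (`chisq.test` has the same option).

```r
# Hypothetical 16 x 16 table of counts (simulated; replace with the real table)
set.seed(123)
tab <- matrix(rpois(16 * 16, lambda = 20), nrow = 16, ncol = 16)

# Pearson's chi-squared test of independence (asymptotic p-value)
chi <- chisq.test(tab)
chi$parameter        # df = (16 - 1) * (16 - 1) = 225
chi$p.value

# The chi-squared approximation is doubtful when expected counts are small;
# chisq.test() warns if any expected count is below 5
min(chi$expected)

# Fisher's exact test with a Monte Carlo p-value instead of the exact computation
fis <- fisher.test(tab, simulate.p.value = TRUE, B = 2000)
fis$p.value
```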
Re: [R] test of independence
Hi

Did you search CRAN? I got **many** results for "test of independence", which may or may not provide you with suitable procedures.

Cheers
Petr

> -----Original Message-----
> From: R-help On Behalf Of km
> Sent: Thursday, December 20, 2018 8:07 AM
> To: r-help@r-project.org
> Subject: [R] test of independence
Re: [R] Test if data uniformly distributed (newbie)
Dear Mr. Savicky, I am currently working on a project in which I want to test a random number generator that is supposed to produce 10,000 continuous, uniformly distributed random numbers between 0 and 1. I am now wondering whether I can use the chi-squared test for this problem or whether the Kolmogorov-Smirnov test would be a better fit. I came across one of your threads on the internet where you answered a similar question and thought I'd reach out to you. Thanks in advance, Florian Huber
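The thread does not include Savicky's answer, so the following is only a sketch of how both tests can be run in base R on 10,000 values. `runif()` stands in for the generator under test, and the 20 equal-width bins for the chi-squared test are an arbitrary assumption. The Kolmogorov-Smirnov test uses the unbinned values directly, while the chi-squared goodness-of-fit result depends on how the data are binned.

```r
set.seed(42)
x <- runif(10000)   # stand-in for the output of the generator being tested

# Kolmogorov-Smirnov test against Uniform(0, 1); min/max are passed on to punif()
ks_res <- ks.test(x, "punif", min = 0, max = 1)
ks_res$p.value

# Chi-squared goodness-of-fit: bin into k equal-width cells, then compare the
# observed cell counts with equal expected proportions 1/k
k <- 20
counts <- table(cut(x, breaks = seq(0, 1, length.out = k + 1), include.lowest = TRUE))
cs_res <- chisq.test(as.vector(counts), p = rep(1 / k, k))
cs_res$p.value
```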
Re: [R] test for proportion or concordance
This list is about R programming, not statistics, although admittedly there is a nonempty intersection. However, I think you would do better posting this on a statistics list like stats.stackexchange.com. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip) On Thu, Aug 3, 2017 at 7:19 AM, Adrian Johnson wrote: > Hello group, > > My question is deciding what test would be appropriate for the following question. > > An experiment 'A' yielded 3200 observations, of which 431 are significant. Similarly, using the same method, another experiment 'B' on a different population yielded 2541 observations, of which 260 are significant. > > There are 180 observations that are common between the significant observations of A and B (180 are common between 431 and 260). > > 251 observations are specific to A. > 80 observations are specific to B. > > The question is: are the 180 observations that are common between A and B occurring by chance? > > What test would be appropriate for this scenario? (If my total observations were fixed between the two experiments A and B, I could use Cohen's kappa for concordance, or chi-square, etc.) Since the total observations differ between experiments A and B, I don't know what test would be appropriate. I appreciate your help. > > thanks
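Bert's advice to ask on a statistics forum stands, but one common way to frame "is the overlap larger than chance?" is a hypergeometric (one-sided Fisher's exact) test. It assumes both sets of significant observations are drawn from a single common universe of N observations, which is exactly what is unclear here because A and B have different totals. The value N = 2541 below is a hypothetical placeholder (for example, the number of observations measured in both experiments), and the p-value is sensitive to it.

```r
# Counts from the question
n_A    <- 431   # significant in A
n_B    <- 260   # significant in B
n_both <- 180   # significant in both

# Hypothetical size of a common universe of observations (an assumption)
N <- 2541

# Overlap expected if the two significant sets were independent
expected_overlap <- n_A * n_B / N

# P(overlap >= 180) under the hypergeometric model
p_hyper <- phyper(n_both - 1, n_A, N - n_A, n_B, lower.tail = FALSE)

# Equivalent one-sided Fisher's exact test on the 2x2 table
# (rows: significant in B or not; columns: significant in A or not)
tab <- matrix(c(n_both,       n_B - n_both,
                n_A - n_both, N - n_A - n_B + n_both),
              nrow = 2, byrow = TRUE,
              dimnames = list(B = c("sig", "not sig"), A = c("sig", "not sig")))
ft <- fisher.test(tab, alternative = "greater", conf.int = FALSE)

c(expected = expected_overlap, p_hyper = p_hyper, p_fisher = ft$p.value)
```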
Re: [R] Test individual slope for each factor level in ANCOVA
Hi John. Thanks much for your help. It is great to know this. Hanna
Re: [R] Test individual slope for each factor level in ANCOVA
Dear Hanna,

You can test the slope in each non-reference group as a linear hypothesis. You didn't make the data available for your example, so here's an example using the linearHypothesis() function in the car package with the Moore data set in the same package:

- - - snip - - -

> library(car)
> mod <- lm(conformity ~ fscore*partner.status, data=Moore)
> summary(mod)

Call:
lm(formula = conformity ~ fscore * partner.status, data = Moore)

Residuals:
    Min      1Q  Median      3Q     Max
-7.5296 -2.5984 -0.4473  2.0994 12.4704

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              20.79348    3.26273   6.373 1.27e-07 ***
fscore                   -0.15110    0.07171  -2.107  0.04127 *
partner.statuslow       -15.53408    4.40045  -3.530  0.00104 **
fscore:partner.statuslow  0.26110    0.09700   2.692  0.01024 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.562 on 41 degrees of freedom
Multiple R-squared:  0.2942,  Adjusted R-squared:  0.2426
F-statistic: 5.698 on 3 and 41 DF,  p-value: 0.002347

> linearHypothesis(mod, "fscore + fscore:partner.statuslow")
Linear hypothesis test

Hypothesis:
fscore  + fscore:partner.statuslow = 0

Model 1: restricted model
Model 2: conformity ~ fscore * partner.status

  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1     42 912.45
2     41 853.42  1    59.037 2.8363 0.09976 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

- - - snip - - -

In this case there are just two levels for partner.status, but for a multi-level factor you can simply perform more than one test.

I hope this helps,
 John

John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/

On 2017-03-15, 9:43 PM, "R-help on behalf of li li" wrote:

>Hi all,
>  Consider a data set with a continuous response variable, a continuous predictor "weeks", and a categorical variable "region" with five levels "a", "b", "c", "d", "e".
>  I fit the ANCOVA model as follows. Here the reference level is region "a" and there are 4 dummy variables. The interaction terms (in red below) represent the slope difference between each region and the baseline region "a", and the corresponding p-value is for testing whether this slope difference is zero.
>Is there a way to directly test whether the slope corresponding to each individual factor level is 0 or not, instead of testing the slope difference from the baseline level?
>  Thanks very much.
>  Hanna
>
>> mod <- lm(response ~ weeks*region, data)
>> summary(mod)
>
>Call:
>lm(formula = response ~ weeks * region, data = data)
>
>Residuals:
>     Min       1Q   Median       3Q      Max
>-0.19228 -0.07433 -0.01283  0.04439  0.24544
>
>Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
>(Intercept)    1.2105556  0.0954567  12.682  1.2e-14 ***
>weeks         -0.021      0.0147293  -1.448    0.156
>regionb       -0.0257778  0.1349962  -0.191    0.850
>regionc       -0.034      0.1349962  -0.255    0.800
>regiond       -0.075      0.1349962  -0.559    0.580
>regione       -0.148      0.1349962  -1.098    0.280
>weeks:regionb -0.0007222  0.0208304  -0.035    0.973
>weeks:regionc -0.0017778  0.0208304  -0.085    0.932
>weeks:regiond  0.003      0.0208304   0.144    0.886
>weeks:regione  0.0301667  0.0208304   1.448    0.156
>---
>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>Residual standard error: 0.1082 on 35 degrees of freedom
>Multiple R-squared:  0.2678,  Adjusted R-squared:  0.07946
>F-statistic: 1.422 on 9 and 35 DF,  p-value: 0.2165
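An alternative to the linearHypothesis() approach that needs only base R is to refit the same model with one intercept and one slope per level, so that each slope's t-test directly tests H0: slope = 0 within that region. Hanna's data were not posted, so the sketch below simulates a stand-in data set with the same layout (five regions, 45 rows); all values are invented.

```r
set.seed(123)
# Simulated stand-in with the same layout as Hanna's data (values are invented)
dat <- data.frame(
  region = factor(rep(c("a", "b", "c", "d", "e"), each = 9)),
  weeks  = rep(0:8, times = 5)
)
dat$response <- 1.2 - 0.02 * dat$weeks + rnorm(nrow(dat), sd = 0.1)

# Treatment-coded fit: interaction terms are slope *differences* from region "a"
mod1 <- lm(response ~ weeks * region, data = dat)

# Same fitted model, reparameterized: "- 1" removes the common intercept and,
# with no "weeks" main effect, the interaction gives each region its own slope
mod2 <- lm(response ~ region - 1 + weeks:region, data = dat)

# One row per region: estimate, SE, t value and p-value for H0: slope = 0
slopes <- coef(summary(mod2))[grepl(":", names(coef(mod2))), ]
slopes
```

For region "a" this reproduces the `weeks` row of the treatment-coded summary; for the other regions it tests sums such as weeks + weeks:regionb, which is what linearHypothesis() tests one level at a time.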
Re: [R] Test for Homoscedesticity in R Without BP Test
I have tried it and got the result. Thank you, everyone.
Re: [R] Test for Homoscedesticity in R Without BP Test
On Mon, 4 Apr 2016, varin sacha via R-help wrote:

> Hi Deepak,
>
> In econometrics there is another very commonly used test: the White test. The White test is based on the comparison of the estimated variances of residuals when the model is estimated by OLS under the assumption of homoscedasticity and when the model is estimated by OLS under the assumption of heteroscedasticity.

The White test is a special case of the Breusch-Pagan test using a particular specification of the auxiliary regressors: namely all regressors, their squares, and their cross-products. As this specification only makes sense if all regressors are continuous, many implementations have problems if there are already dummy variables, interactions, etc. in the regressor matrix. This is also the reason why bptest() from "lmtest" uses a different specification by default. However, you can use the function to carry out the White test as illustrated in:

example("CigarettesB", package = "AER")

(Of course, the AER package needs to be installed first.)

> The White test with R
>
> install.packages("bstats")
> library(bstats)
> white.test(LinearModel)

That package is no longer on CRAN, as it took the code from bptest() without crediting its original authors and released it in a package that conflicted with the original license. Also, the implementation did not check for the potential problems with dummy variables or interactions mentioned above.

So the bptest() implementation from "lmtest" is really recommended, or alternatively ncvTest() from package "car".

> Hope this helps.
>
> Sacha
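To make the relationship Achim describes concrete, here is a base-R sketch of the studentized (Koenker) Breusch-Pagan statistic, n times the R-squared from regressing the squared OLS residuals on the regressors, and of White's variant, which adds the squares and the cross-product to that auxiliary regression. The data are simulated with an error standard deviation that grows with x1. As far as I understand lmtest, `bptest(fit)` computes this studentized statistic by default and the extended auxiliary formula can be supplied as its second argument; the hand computation is only meant to show what those calls do.

```r
set.seed(2016)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
# Simulated heteroscedastic errors: standard deviation proportional to x1
y   <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5 * x1)
fit <- lm(y ~ x1 + x2)

u2 <- residuals(fit)^2

# Studentized Breusch-Pagan: n * R^2 of squared residuals on the regressors
aux     <- lm(u2 ~ x1 + x2)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# White's version: regressors, their squares and their cross-product
aux_w      <- lm(u2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
white_stat <- n * summary(aux_w)$r.squared
white_p    <- pchisq(white_stat, df = 5, lower.tail = FALSE)

c(BP = bp_stat, BP_p = bp_p, White = white_stat, White_p = white_p)

# With lmtest installed, these are expected to correspond to (not run here):
#   lmtest::bptest(fit)
#   lmtest::bptest(fit, ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
```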
Re: [R] Test for Homoscedesticity in R Without BP Test
On Mon, 4 Apr 2016, Deepak Singh wrote:

> Respected Sir,
> I am doing a project on multiple linear model fitting, and in that project I have to test the homoscedasticity of the errors. I googled for this and found bptest, but in R version 3.2.4 bptest is not available.

The function is called bptest() and is implemented in package "lmtest", which is available for current versions of R, see https://CRAN.R-project.org/package=lmtest

To install it, run:

install.packages("lmtest")

And then to load the package and try the function:

library("lmtest")
example("bptest")

> So please suggest a test for homoscedasticity ASAP, as we have to submit our report on 7-04-2016.
>
> P.S.: I have plotted residuals against fitted values and they look more or less random.
>
> Thank you!
Re: [R] Test for Homoscedesticity in R Without BP Test
Hi Deepak,

In econometrics there is another very commonly used test: the White test. The White test is based on the comparison of the estimated variances of residuals when the model is estimated by OLS under the assumption of homoscedasticity and when the model is estimated by OLS under the assumption of heteroscedasticity.

The White test with R:

install.packages("bstats")
library(bstats)
white.test(LinearModel)

Hope this helps.

Sacha

From: Deepak Singh
To: r-help@r-project.org
Sent: Monday, 4 April 2016 10:40
Subject: [R] Test for Homoscedesticity in R Without BP Test

Respected Sir,
I am doing a project on multiple linear model fitting, and in that project I have to test the homoscedasticity of the errors. I googled for this and found bptest, but in R version 3.2.4 bptest is not available. So please suggest a test for homoscedasticity ASAP, as we have to submit our report on 7-04-2016.

P.S.: I have plotted residuals against fitted values and they look more or less random.

Thank you!
Re: [R] Test for Homoscedesticity in R Without BP Test
You might "google Breusch Pagan test r" and find that the test is implemented in the lmtest package.
Re: [R] test hypothesis in R
> On Mar 23, 2016, at 1:44 PM, ruipbarra...@sapo.pt wrote:
>
> Hello,
>
> Try
>
> ?t.test
> t.test(mA, mB, alternative = "greater")
>
> Hope this helps,
>
> Rui Barradas
>
> Quoting Eliza Botto:
>
>> Dear All,
>> I want to test a hypothesis in R by using Student's t-test (p-values).
>> The hypothesis is that model A produces lesser error than model B at
>> ten stations. Obviously, Null Hypothesis (H0) is that the error
>> produces by model A is not lower than model B.

NOT "obviously". You only get to do one-sided tests when the scientific question would not allow the possibility of a departure to "the other side". Two-sided tests are the norm in the scientific literature, often to the experimenter's distress when they haven't done a thoughtful (non-optimistic) power analysis and their results are inconclusive as a result. Your hypothesis _should_ have been constructed _before_ you saw the data. That is, if you want to be an ethical scientist.

>> The error magnitudes are
>>
>> #model A
>>> dput(mA)
>>
>> c(36.1956086452583, 34.9996207622861, 36.435733025221,
>> 37.2003157636202, 36.1318687775115, 37.164132533536,
>> 35.2028759357069, 36.7719835944373, 38.3861425339751,
>> 37.4174132119744)
>> #model B
>>> dput(mB)
>>
>> c(39.7655211768704, 40.1730916643841, 39.3699055738618,
>> 39.401619831763, 41.1218634441457, 39.1968630742826,
>> 40.5265825061639, 40.4674956975404, 40.5954427072364,
>> 41.4875529130543)

Those are not models. They are just vectors of numbers. And they seem unlikely to be residual errors of a linear model, since they are not centered on zero. I doubt there is enough in your presentation for a sensible comment on the proper analysis.

-- 
David.

>>
>> Now can I test my hypothesis in R?
>> Thank you very much in advance,
>> Eliza

David Winsemius
Alameda, CA, USA
Re: [R] test hypothesis in R
Sorry, but in your original post you said that "Null Hypothesis (H0) is that the error produces by model A is not lower than model B". If it is now that model A produces less error, change to alternative = "less". The relevant part of the help page ?t.test is: alternative = "greater" is the alternative that x has a larger mean than y.

Rui Barradas

Quoting Eliza Botto <eliza_bo...@outlook.com>:

> Thnx Rui,
> Just one point though.
>
> Should it be alternative="greater" or "less", since the alternative hypothesis is that model A produced less error?
>
> regards,
>
> Eliza
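Using the vectors Eliza posted, the sketch below runs the one-sided test Rui ends up recommending (`alternative = "less"`, because the alternative hypothesis is that model A's errors are smaller and `mA` is passed as `x`) next to the two-sided test David describes as the norm. The paired version assumes, without confirmation in the thread, that the i-th values of `mA` and `mB` come from the same station.

```r
mA <- c(36.1956086452583, 34.9996207622861, 36.435733025221,
        37.2003157636202, 36.1318687775115, 37.164132533536,
        35.2028759357069, 36.7719835944373, 38.3861425339751,
        37.4174132119744)
mB <- c(39.7655211768704, 40.1730916643841, 39.3699055738618,
        39.401619831763, 41.1218634441457, 39.1968630742826,
        40.5265825061639, 40.4674956975404, 40.5954427072364,
        41.4875529130543)

# H1: the mean error of model A (x) is less than that of model B (y)
t_less <- t.test(mA, mB, alternative = "less")

# Two-sided Welch test: H1 is simply "the means differ"
t_two <- t.test(mA, mB)

# If the values are paired by station (an assumption), test the differences
t_paired <- t.test(mA, mB, paired = TRUE, alternative = "less")

c(one_sided = t_less$p.value, two_sided = t_two$p.value, paired = t_paired$p.value)
```

Because the observed difference lies in the hypothesized direction, the two-sided Welch p-value here is exactly twice the one-sided one.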
Re: [R] test hypothesis in R
Thnx Rui, Just one point though: should it be alternative="greater" or "less", since the alternative hypothesis is that model A produced less error? regards, Eliza
Re: [R] test hypothesis in R
Hello,

Try

?t.test
t.test(mA, mB, alternative = "greater")

Hope this helps,

Rui Barradas

Quoting Eliza Botto:

> Dear All,
> I want to test a hypothesis in R by using Student's t-test (p-values).
> The hypothesis is that model A produces lesser error than model B at
> ten stations. Obviously, Null Hypothesis (H0) is that the error
> produces by model A is not lower than model B.
> The error magnitudes are
>
> #model A
>> dput(mA)
>
> c(36.1956086452583, 34.9996207622861, 36.435733025221,
> 37.2003157636202, 36.1318687775115, 37.164132533536,
> 35.2028759357069, 36.7719835944373, 38.3861425339751,
> 37.4174132119744)
> #model B
>> dput(mB)
>
> c(39.7655211768704, 40.1730916643841, 39.3699055738618,
> 39.401619831763, 41.1218634441457, 39.1968630742826,
> 40.5265825061639, 40.4674956975404, 40.5954427072364,
> 41.4875529130543)
>
> Now can I test my hypothesis in R?
> Thank you very much in advance,
> Eliza
Re: [R] test if a url exists
On 29/06/2014, 7:12 AM, Hui Du wrote:

> Hi all,
>
> I need to test if a url exists. I used url.exists() in the RCurl package
>
> library(RCurl)
>
> however, the test result is kind of weird. For example,
>
> url.exists("http://www.amazon.com")
> [1] FALSE
>
> although www.amazon.com is a valid url. Does anybody know how to use that function correctly, or another way to test url existence?

You can use the .header = TRUE option to that call to see the error 405 that it gives.

Duncan Murdoch
Re: [R] test the return from grep or agrep
On 01/03/2014 23:32, Hui Du wrote:

> Hi All,
>
> My sample code looks like
>
> options(stringsAsFactors = FALSE);
> clean = function(x)
> {
>     loc = agrep("ABC", x$name);
>     x[loc,]$new_name <- "NEW";
>     x;
> }
>
> name = c("12", "dad", "dfd");
> y = data.frame(name = as.character(name), idx = 1:3);
> y$new_name = y$name;
> z <- clean(y)
>
> The snippet does not work because I forgot to test the return value of agrep. If no pattern is found, it returns 0 and the following x[loc, ]$new_name does not like it. I know how to fix that part. However, my code has many places like that, say over 100 calls to agrep or grep for different patterns and substitutions. Is there any smart way to fix them all rather than line by line?

That is not true: it returns integer(0). (If it returned 0 it would work.)

For grep() I would recommend using grepl() instead. Otherwise

if(length(loc)) x[loc,]$new_name <- "NEW"

or

x[loc,]$new_name <- rep_len("NEW", length(loc))

Your code is full of pointless empty statements (between ; and NL): R is not C and ; is a separator, not a terminator.

> Many thanks.
>
> HXD

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
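Following Ripley's grepl() suggestion, here is a sketch of the poster's clean() rewritten with agrepl(), the logical counterpart of agrep(). A logical index that is all FALSE simply selects nothing, so no length check is needed. Assigning into the column (`x$new_name[hit] <- ...`) instead of into a row subset is a swapped-in idiom, not Ripley's code.

```r
clean <- function(x) {
  # TRUE where "ABC" approximately matches; all FALSE if nothing matches
  hit <- agrepl("ABC", x$name)
  x$new_name[hit] <- "NEW"
  x
}

# stringsAsFactors = FALSE keeps the columns character in all R versions
y <- data.frame(name = c("12", "dad", "dfd"), idx = 1:3, stringsAsFactors = FALSE)
y$new_name <- y$name
z <- clean(y)   # no match, so z$new_name is left unchanged
z
```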
Re: [R] Test to determine if there is a difference between two means
Inline below. Cheers, Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. H. Gilbert Welch On Tue, Dec 24, 2013 at 7:38 AM, wesley bell wesleybel...@yahoo.com wrote: Hi, I have a data set where there are 20 experiments which each ran for 10 minutes. In each experiment an insect had a choice to spend time in one of two chambers. Each experiment therefore has number of seconds spent in each chamber. I want to know whether there is a difference in the mean time spent in each chamber. Yes, there is. Always. I was going to do a t-test but was advised that there was a better way, something about introducing random numbers? I was hoping someone could help? This list is about R, not statistics, although they certainly overlap. I suggest you post on stats.stackexchange.com instead for statistics help. Better yet, you might do well to talk with a local expert about statistical issues, as you are obviously weak here. Thanks Wes
Re: [R] Test ADF differences in R and Eviews
On Dec 5, 2013, at 3:18 PM, nooldor wrote: Hi, Attached you can find the source data on which I ran adf.test(), and a screenshot with the results in R and EViews. The results are very different. Did I miss something? Yes. You missed the list of acceptable file types for r-help. -- David Winsemius Alameda, CA, USA
Re: [R] test wilcoxon sur R help!
Hi, Try:

fun1 <- function(dat){
  mat1 <- combn(colnames(dat1),2)
  res <- sapply(seq_len(ncol(mat1)),function(i) {x1 <- dat[,mat1[,i]]; wilcox.test(x1[,1],x1[,2])$p.value})
  names(res) <- apply(mat1,2,paste,collapse="_")
  res
}
set.seed(432)
dat1 <- as.data.frame(matrix(sample(18*10,18*10,replace=FALSE),ncol=18))
fun1(dat1) #gives the p-value for each pair of columns

Hi, I want to run a Wilcoxon test. I have 18 columns, each corresponding to a different sample, and I want to compare each one to every other with a Wilcoxon test. Is this possible in one step, or do I have to compare them two by two? Is there code to automate this test, so that I don't have to type the code for each pair? Thanks! denisse
Re: [R] test wilcoxon sur R help!
Hello, There's a bug in your function: it should be 'dat', not 'dat1', in the line marked below.

fun1 <- function(dat){
  mat1 <- combn(colnames(dat),2) # Here, 'dat' not 'dat1'
  res <- sapply(seq_len(ncol(mat1)),function(i) {x1 <- dat[,mat1[,i]]; wilcox.test(x1[,1],x1[,2])$p.value})
  names(res) <- apply(mat1,2,paste,collapse="_")
  res
}

Hope this helps, Rui Barradas
Re: [R] test wilcoxon sur R help!
Hi, Check out this function:- pairwise.wilcox.test {package=stats}. example(pairwise.wilcox.test)
Re: [R] test wilcoxon sur R help!
It looks much better than mine. With p-value adjustment:

p.adjust(fun1(dat1), method = "holm", n = 153)
#
dat1$id <- 1:10
library(reshape2)
dat2 <- melt(dat1,id.var="id")
with(dat2,pairwise.wilcox.test(value,variable))
with(dat2,pairwise.wilcox.test(value,variable,p.adj="none"))

A.K.
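A compact sketch of the same computation, assuming the simulated data from the thread. pairwise_wilcox_p() is a hypothetical rewrite of fun1 (with the 'dat' fix applied) that indexes columns by name; the raw p-values are cross-checked against stats::pairwise.wilcox.test() on long-format data built with base R's stack() instead of reshape2::melt().

```r
# Hypothetical rewrite of fun1 (with the 'dat' fix): one Wilcoxon test per pair of columns
pairwise_wilcox_p <- function(dat) {
  pairs <- combn(colnames(dat), 2)   # 2 x choose(ncol(dat), 2) matrix of column names
  res <- apply(pairs, 2, function(p) wilcox.test(dat[[p[1]]], dat[[p[2]]])$p.value)
  names(res) <- apply(pairs, 2, paste, collapse = "_")
  res
}

# Simulated data as in the thread: 18 samples of 10 values, no ties
set.seed(432)
dat1 <- as.data.frame(matrix(sample(18 * 10, 18 * 10, replace = FALSE), ncol = 18))

p_raw  <- pairwise_wilcox_p(dat1)            # choose(18, 2) = 153 unadjusted p-values
p_holm <- p.adjust(p_raw, method = "holm")   # Holm adjustment over all 153 tests

# Same tests via stats::pairwise.wilcox.test() on long-format data
long <- stack(dat1)                           # columns: values, ind (column name as factor)
pw <- pairwise.wilcox.test(long$values, long$ind, p.adjust.method = "none")
head(sort(p_holm))
```

pairwise.wilcox.test() returns the p-values as a lower-triangular matrix rather than a named vector, so the two outputs agree in their values but not in layout.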
Re: [R] Test if 2 samples differ if they have autocorrelation
I imagine that most readers of this list will put your question in the "too hard" basket. That being so, here is my inexpert take on the question. The issue is to estimate the uncertainty in the estimated difference of the means. This uncertainty depends on the nature of the serial dependence of the series. Therefore in order to get anywhere you need to *model* this dependence. Different models could yield very different values for the variance of the estimated difference of the means. If the series are observed at the same times I would suggest taking the pointwise difference of the two series: D_t = X_t - Y_t, say. Fit the best arima model that you can to D_t. Then the standard error of what is incorrectly labelled "intercept" (it is actually the estimate of the series *mean*) is the appropriate estimate of the uncertainty. The ratio of the "intercept" value to its standard error is the test statistic you are looking for. If the series are *not* observed at the same times but can be assumed to be independent, then model *each* series as well as you can (different models for each series) and obtain the standard error of the "intercept" for each series. Your test statistic is then the difference of the "intercept" estimates divided by sqrt(se_X^2 + se_Y^2), in what I hope is an obvious notation. If the series are not observed at the same times and cannot be assumed to be independent, then you probably haven't got sufficient information to answer the question that you wish to answer. I hope that there is some value in the foregoing. cheers, Rolf Turner

On 18/07/13 21:50, Eric Jaeger wrote: Dear all, I have one question to which I struggle to find an answer: Let's assume I have 2 time series of daily PnL data over 2 years coming from 2 different trading strategies. I want to find out if strategy A is better than strategy B. The problem is that the two series have serial correlation, hence I cannot just do a simple t-test. I tried something like this:
1. create cumulative time series of PnL_A = C_A and of PnL_B = C_B
2. take the difference of both: C_A - C_B = DiffPnL (to see how the difference evolves over time)
3. do a regression: DiffPnL = beta * time + error (I thought that if beta is significantly different from 0, then the two time series are different)
4. estimate beta not with OLS, but with the Newey-West method (HAC estimator); this corrects the statistical tests and the standard errors of beta for heteroskedasticity and autocorrelation
BUT: I read that the tests are biased when the time series are unit-root non-stationary (which is the case here, because I take cumulative time series). I am lost! This should be fairly simple: how do I test whether two samples differ if they have autocorrelation? Probably my approach above is completely wrong...
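A minimal sketch of both recipes in Rolf Turner's reply, under stated assumptions: the two daily PnL series are simulated AR(1) stand-ins (hypothetical data), the AR(1) order is assumed rather than chosen by diagnostics, and the two-sided p-value uses a normal reference distribution. stats::arima() reports the mean of a stationary series under the name "intercept".

```r
set.seed(1)
n <- 500
# Hypothetical daily PnL series: AR(1) noise, strategy A with a small mean shift
x <- 0.10 + arima.sim(list(ar = 0.5), n = n)
y <- arima.sim(list(ar = 0.5), n = n)

# Recipe 1 (same observation times): model the pointwise difference D_t = X_t - Y_t
d   <- x - y
fit <- arima(d, order = c(1, 0, 0))                     # assumed AR(1) model for D_t
est <- coef(fit)[["intercept"]]                         # estimated mean of D_t
se  <- sqrt(fit$var.coef["intercept", "intercept"])     # its standard error
z   <- est / se                                         # test statistic for mean(D_t) = 0
p   <- 2 * pnorm(-abs(z))                               # approximate two-sided p-value

# Recipe 2 (independent series): model each series separately
fit_x <- arima(x, order = c(1, 0, 0))
fit_y <- arima(y, order = c(1, 0, 0))
se_x  <- sqrt(fit_x$var.coef["intercept", "intercept"])
se_y  <- sqrt(fit_y$var.coef["intercept", "intercept"])
z2 <- (coef(fit_x)[["intercept"]] - coef(fit_y)[["intercept"]]) / sqrt(se_x^2 + se_y^2)

c(z = z, p = p, z2 = z2)
```

Working with the daily PnL itself, not the cumulative series, keeps the modelled series stationary, which sidesteps the unit-root problem raised in the question.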
Re: [R] Test for column equality across matrices
Dear William, thanks a lot. I've found another nice alternative:

A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
B <- combn(16, 3)
B.n <- B[, -(which(duplicated(t(cbind(A, B)))) - ncol(A))]

Best wishes, Alrik
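A sketch of the duplicated() idea as a small helper. drop_cols_in() is a hypothetical name. It uses a logical index rather than negative positions, because when nothing matches, which() returns integer(0), and B[, -integer(0)] selects no columns at all instead of all of them. cbind() also puts A and B into one common storage mode before the comparison. The sketch assumes B has no repeated columns of its own (true for combn() output), since repeats within B would also be flagged.

```r
drop_cols_in <- function(B, A) {
  # Stack A's columns first, then B's, and flag columns already seen earlier
  dup <- duplicated(t(cbind(A, B)))
  # Keep only the flags that belong to B's columns
  in_A <- dup[-seq_len(ncol(A))]
  B[, !in_A, drop = FALSE]
}

A <- matrix(t(expand.grid(c(1, 2, 3, 4, 5), 15, 16)), nrow = 3)
B <- combn(16, 3)
B.n <- drop_cols_in(B, A)
dim(B.n)

# The pitfall the logical index avoids: an empty negative index selects nothing
ncol(B[, -integer(0), drop = FALSE])
```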
Re: [R] Test for column equality across matrices
It looks like match() (and relatives like %in% and is.element) act a bit unpredictably on lists when the list elements are vectors of numbers of different types. If you match integers to integers or doubles to doubles it works as expected, but when the types don't match the results vary. I would expect the following to give either c(1,2) or c(NA,NA) but not c(1,NA):

match( list( c(13L,15L,16L), c(14L,15L,16L)), list( c(13.,15.,16.), c(14.,15.,16.) ))
[1] 1 NA

It works when the list elements have the same type:

match( list( c(13L,15L,16L), c(14L,15L,16L)), list( c(13L,15L,16L), c(14L,15L,16L) ))
[1] 1 2
match( list( c(13.,15.,16.), c(14.,15.,16.)), list( c(13.,15.,16.), c(14.,15.,16.) ))
[1] 1 2
match( list( c(13.,15.,16.), c(14L,15L,16L)), list( c(13.,15.,16.), c(14L,15L,16L) ))
[1] 1 2

So A and B should be coerced to have a common type ('storage.mode') before comparing them. By the way, the discrepancy might happen because match() applied to lists might be implemented by calling deparse on each element of each list and then using the character method of match. For sequential integers deparse uses colon notation; e.g., c(14L,15L,16L) becomes the string "14:16". But usually deparse puts an 'L' after integers, so they would never match with a double of the same value.
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
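A sketch of the fix suggested above: give both matrices the same storage mode before matching their columns as list elements. The smaller matrices (1:20 instead of 1:90) are an assumption made to keep the example quick. The combinations c(k, 15, 16) for k = 1, ..., 14 should all be found, including c(14, 15, 16), which is the column that the mixed integer/double comparison missed in the larger example.

```r
columnsOf <- function(mat) split(mat, col(mat))   # as posted in the thread

A <- matrix(t(expand.grid(1:20, 15, 16)), nrow = 3)   # double
B <- combn(20, 3)                                      # integer

# Coerce both to one storage mode before matching columns as list elements
storage.mode(A) <- "double"
storage.mode(B) <- "double"

hit  <- is.element(columnsOf(B), columnsOf(A))
newB <- B[, !hit, drop = FALSE]
sum(hit)
```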
Re: [R] Test for column equality across matrices
Try

columnsOf <- function(mat) split(mat, col(mat))
newB <- B[ , !is.element(columnsOf(B), columnsOf(A))]

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Thiem Alrik Sent: Saturday, July 13, 2013 6:45 AM To: mailman, r-help Subject: [R] Test for column equality across matrices Dear list, I have two matrices

A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
B <- combn(16, 3)

Now I would like to exclude all columns from the 560 columns in B which are identical to any 1 of the 6 columns in A. How could I do this? Many thanks and best wishes, Alrik
Re: [R] Test for column equality across matrices
I tried it on a slightly bigger dataset:

A1 <- matrix(t(expand.grid(1:90, 15, 16)), nrow = 3)
B1 <- combn(90, 3)
which(is.element(columnsOf(B1), columnsOf(A1)))
# [1] 1067 4895 8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481
which(apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse=""))
# [1] 1067 4895 8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481 44331
B1[,44331]
#[1] 14 15 16
which(apply(t(A1),1,paste,collapse="")=="141516")
#[1] 14
B1New <- B1[,!apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse="")]
newB <- B1[ , !is.element(columnsOf(B1), columnsOf(A1))]
identical(B1New,newB)
#[1] FALSE
is.element(B1[,44331],A1[,14])
#[1] TRUE TRUE TRUE
B1Sp <- columnsOf(B1)
B1Sp[[44331]]
#[1] 14 15 16
A1Sp <- columnsOf(A1)
A1Sp[[14]]
#[1] 14 15 16
is.element(B1Sp[[44331]],A1Sp[[14]])
#[1] TRUE TRUE TRUE

A.K.
Re: [R] Test for column equality across matrices
Hi, One way would be:

which(apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse=""))
#[1] 105 196 274 340 395
B[,105]
#[1] 1 15 16
B[,196]
#[1] 2 15 16
B1 <- B[,!apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse="")]
dim(B1)
#[1] 3 555
dim(B)
#[1] 3 560
#or
B2 <- B[,is.na(match(interaction(as.data.frame(t(B))),interaction(as.data.frame(t(A)))))]
identical(B1,B2)
#[1] TRUE

A.K.

- Original Message - From: Thiem Alrik th...@sipo.gess.ethz.ch To: mailman, r-help r-help@r-project.org Cc: Sent: Saturday, July 13, 2013 9:45 AM Subject: [R] Test for column equality across matrices Dear list, I have two matrices

A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
B <- combn(16, 3)

Now I would like to exclude all columns from the 560 columns in B which are identical to any 1 of the 6 columns in A. How could I do this? Many thanks and best wishes, Alrik
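One caveat with paste(..., collapse = "") keys is that different columns can produce the same string. For example, c(1, 11, 5) and c(11, 1, 5) both become "1115". Adding a separator makes the key unambiguous. The following is a minimal sketch assuming integer-valued entries; col_keys() is a hypothetical helper. Because paste() formats 15 and 15L identically, the integer/double mismatch discussed in this thread does not affect it.

```r
# Hypothetical helper: one string key per column, with a separator between entries
col_keys <- function(m, sep = " ") apply(m, 2, paste, collapse = sep)

A <- matrix(t(expand.grid(c(1, 2, 3, 4, 5), 15, 16)), nrow = 3)
B <- combn(16, 3)

keep <- !(col_keys(B) %in% col_keys(A))   # TRUE for columns of B not found in A
B1 <- B[, keep, drop = FALSE]
dim(B1)
```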
Re: [R] Test of Parallel Regression Assumption in R
Dear Heather, You can make this test using the ordinal package. Here the function clm fits cumulative link models, of which the ordinal logistic regression model is a special case (using the logit link). Let me illustrate how to test the parallel regression assumption for a particular variable using clm in the ordinal package. I am using the wine dataset from the same package; I fit a model with two explanatory variables, temp and contact, and I test the parallel regression assumption for the contact variable in a likelihood ratio test:

> library(ordinal)
Loading required package: MASS
Loading required package: ucminf
Loading required package: Matrix
Loading required package: lattice
> head(wine)
  response rating temp contact bottle judge
1       36      2 cold      no      1     1
2       48      3 cold      no      2     1
3       47      3 cold     yes      3     1
4       67      4 cold     yes      4     1
5       77      4 warm      no      5     1
6       60      4 warm      no      6     1
> fm1 <- clm(rating ~ temp + contact, data=wine)
> fm2 <- clm(rating ~ temp, nominal=~ contact, data=wine)
> anova(fm1, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                nominal: link: threshold:
fm1 rating ~ temp + contact ~1       logit flexible
fm2 rating ~ temp           ~contact logit flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm1      6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904

The idea is to fit the model under the null hypothesis (parallel effects: fm1) and under the alternative hypothesis (non-parallel effects for contact: fm2) and compare these models with anova(), which performs the LR test. From the high p-value we see that the null cannot be rejected and there is no evidence of non-parallel slopes in this case. For additional information, I suggest that you take a look at the following package vignette (http://cran.r-project.org/web/packages/ordinal/vignettes/clm_tutorial.pdf), where these kinds of tests are described more thoroughly, starting on page 6. I think you can also make similar tests with the VGAM package, but I am not as well versed in that package.
Hope this helps, Rune Rune Haubo Bojesen Christensen Postdoc DTU Compute - Section for Statistics --- Technical University of Denmark Department of Applied Mathematics and Computer Science Richard Petersens Plads Building 324, Room 220 2800 Lyngby Direct +45 45253363 Mobile +45 30264554 http://www.imm.dtu.dk
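As a check on the printed table, the LR statistic is twice the difference in log-likelihoods. It is compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters (9 - 6 = 3). A minimal sketch using the rounded log-likelihoods printed above (so the statistic comes out as 0.566 rather than 0.5667):

```r
ll_fm1 <- -86.492   # fm1: parallel (common) effect of contact, 6 parameters
ll_fm2 <- -86.209   # fm2: nominal (non-parallel) effect of contact, 9 parameters

lr_stat <- 2 * (ll_fm2 - ll_fm1)            # likelihood ratio statistic
lr_df   <- 9 - 6                            # extra parameters in fm2
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

round(c(LR = lr_stat, df = lr_df, p = p_value), 3)
```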
Re: [R] Test of Parallel Regression Assumption in R
Heather: You are at Vanderbilt, whose statistics department under Frank Harrell is a veritable bastion of R and statistical wisdom. I strongly recommend that you take a stroll over there in the lovely spring weather and seek their help. I can't imagine how you could do better than that! Cheers, Bert On Mon, Mar 11, 2013 at 2:02 PM, Heather Kettrey heather.h.kett...@vanderbilt.edu wrote: Hi, I am running an analysis with an ordinal outcome and I need to run a test of the parallel regression assumption to determine if ordinal logistic regression is appropriate. I cannot find a function to conduct such a test. From searching various message boards I have seen a few useRs ask this same question without a definitive answer - and I came across a thread that indicated there is no such function available in any R packages. I hope this is incorrect. Does anyone know how to test the parallel regression assumption in R? Thanks for your help! -- Heather Hensman Kettrey PhD Candidate Department of Sociology Vanderbilt University -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Test of Parallel Regression Assumption in R
Perhaps you should be asking whether such an algorithm exists, regardless of whether it is already implemented in R. However, this is the wrong place to ask such theory questions... your local statistics expert might know, or you could ask on a statistics theory forum such as stats.stackexchange.com. With the answer to that question you could use the RSiteSearch function to search for references to that algorithm, or even implement it yourself. --- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us, Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity. Heather Kettrey heather.h.kett...@vanderbilt.edu wrote: Hi, I am running an analysis with an ordinal outcome and I need to run a test of the parallel regression assumption to determine if ordinal logistic regression is appropriate. I cannot find a function to conduct such a test. From searching various message boards I have seen a few useRs ask this same question without a definitive answer - and I came across a thread that indicated there is no such function available in any R packages. I hope this is incorrect. Does anyone know how to test the parallel regression assumption in R? Thanks for your help!
Re: [R] Test of Parallel Regression Assumption in R
Here's some code as an example; hope it helps!

library(MASS)  # polr()
## ordered logit (proportional odds) model
mod <- polr(vote ~ age + demsat + eusup + lrself + male + retnat + union + urban, data = dat)
summary(mod)

## one data set per cumulative split of the outcome
tmpdat <- list()
for (i in 1:(nlevels(dat$vote) - 1)) {
  tmpdat[[i]] <- dat
  ## z = 1 if vote falls in one of the first i categories
  tmpdat[[i]]$z <- as.numeric(as.numeric(tmpdat[[i]]$vote) <= i)
}
form <- as.formula(z ~ age + demsat + eusup + lrself + male + retnat + union + urban)
mods <- lapply(tmpdat, function(x) glm(form, data = x, family = binomial))
probs <- sapply(mods, predict, type = "response")  # column i: P(vote <= i)
## category probabilities implied by the separate binary logits
p.logits <- cbind(probs[, 1], t(apply(probs, 1, diff)), 1 - probs[, ncol(probs)])
p.ologit <- predict(mod, type = "probs")
n <- nrow(p.logits)
## predicted probability of the observed category under each model
bin.ll <- p.logits[cbind(1:n, dat$vote)]
ologit.ll <- p.ologit[cbind(1:n, dat$vote)]
## how often do the binary logits fit the observed category better than the ordered logit?
binom.test(sum(bin.ll > ologit.ll), n)

dat$vote.fac <- factor(dat$vote, levels = 1:6)
mod <- polr(vote.fac ~ age + demsat + eusup + lrself + male + retnat + union + urban, data = dat)
source("http://www.quantoid.net/cat_pre.R")
catpre(mod)

install.packages("rms")
library(rms)
olprobs <- predict(mod, type = "probs")
pred.cat <- apply(olprobs, 1, which.max)
table(pred.cat, dat$vote)
round(prop.table(table(pred.cat, dat$vote), 2), 3)

On Mar 11, 2013, at 5:02 PM, Heather Kettrey wrote: [..]
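The idea behind comparing separate binary logits with the ordered logit can be sketched on simulated data. This is an informal check, not a formal test (the Brant test is a formal version of the same comparison): fit the proportional-odds model with MASS::polr(), fit one ordinary logistic regression per cut point of the outcome, and compare slopes. Under the parallel regression assumption, every cut-point-specific slope should be close to the single polr() slope. The simulated data, the slope 0.8, and the cut points are assumptions for illustration only.

```r
library(MASS)  # polr()

set.seed(1)
n <- 2000
x <- rnorm(n)

# Simulate from a latent-variable proportional-odds model:
# y* = 0.8 * x + logistic noise, cut at -1, 0 and 1 into 4 ordered categories
ystar <- 0.8 * x + rlogis(n)
y <- cut(ystar, breaks = c(-Inf, -1, 0, 1, Inf),
         labels = 1:4, ordered_result = TRUE)

# Proportional-odds fit: logit P(y <= k) = zeta_k - beta * x
fit_po <- polr(y ~ x)

# One binary logit per cut point, modelling P(y > k),
# so each slope has the same sign convention as beta
cuts <- seq_len(nlevels(y) - 1)
slopes <- sapply(cuts, function(k) {
  z <- as.integer(as.integer(y) > k)
  coef(glm(z ~ x, family = binomial))[["x"]]
})
names(slopes) <- paste0("y > ", levels(y)[cuts])

# Under the parallel regression assumption these should all be similar
print(round(c(polr = coef(fit_po)[["x"]], slopes), 3))
```

With several predictors the same comparison is made coefficient by coefficient; large, systematic drift of a slope across cut points suggests the assumption fails for that predictor.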
Re: [R] test for a condition in a vector for loop not working
Once again, thanks!
MVS
-- Matthew Van Scoyoc, Graduate Research Assistant, Ecology, Wildland Resources Department, Ecology Center, Quinney College of Natural Resources, Utah State University, Logan, UT. Think SNOW!
Re: [R] Test for Random Points on a Sphere
Hi Lorenzo,
Just a quick thought: the uniform probability density on a unit sphere is 1/(4*pi). What about binning those random points according to their directions and doing a chi-square test?
Regards,
Guo

On Sun, Oct 7, 2012 at 2:16 AM, cbe...@tajo.ucsd.edu wrote: [..]
Re: [R] Test for Random Points on a Sphere
Lorenzo Isella lorenzo.ise...@gmail.com writes:
Dear All, I implemented an algorithm for (uniform) random rotations. In order to test it, I can apply it to a unit vector (0,0,1) in Cartesian coordinates. The result is supposed to be a set of random, uniformly distributed points on a sphere (not the point of the algorithm, but a way to test it). This is what the points look like when I plot them, but other than eyeballing them, can anyone suggest a test to ensure that I am really generating uniform random points on a sphere? Many thanks Lorenzo

There is a substantial literature on this topic and more than one (metaphorical?) direction you could follow. I suggest you Google 'directional statistics' and start reading. Visit http://www.rseek.org, enter 'directional statistics' in the search box, and click on the search button to see if there is something in R to meet your needs. A post to r-sig-geo might get more helpful responses once you can focus the question a bit more.
HTH,
Chuck
-- Charles C. Berry Dept of Family/Preventive Medicine cberry at ucsd edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
Re: [R] Test for Random Points on a Sphere
On Fri, Oct 5, 2012 at 5:39 PM, Lorenzo Isella lorenzo.ise...@gmail.com wrote: [..] other than eyeballing them, can anyone suggest a test to ensure that I am really generating uniform random points on a sphere? [..]

Gut says to divide the surface into n bits of equal area and see if the points appear uniformly in those using something chi-squared-ish, but I'm not aware of a canonical way to do so.
Cheers,
Michael
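The equal-area idea can be made concrete with Archimedes' hat-box theorem: for points uniform on the unit sphere, the z coordinate is uniform on [-1, 1] and the azimuth is uniform on [-pi, pi], independently, so equal-width bands in z crossed with equal-width sectors in azimuth give cells of equal area. A chi-square goodness-of-fit test against equal cell probabilities then checks uniformity. This is a minimal sketch, not a canonical test: the helper names rsphere() and sphere_chisq() and the 4 x 8 grid are hypothetical choices, and a coarse grid can miss small-scale departures.

```r
# Points on the unit sphere from normalized i.i.d. standard normal 3-vectors
rsphere <- function(n) {
  m <- matrix(rnorm(3 * n), ncol = 3)
  m / sqrt(rowSums(m^2))  # divide each row by its Euclidean norm
}

# Chi-square goodness-of-fit test over n_z * n_phi cells of equal area
sphere_chisq <- function(p, n_z = 4, n_phi = 8) {
  z   <- pmin(pmax(p[, 3], -1), 1)  # clamp tiny floating-point overshoot
  phi <- atan2(p[, 2], p[, 1])      # azimuth in [-pi, pi]
  z_bin   <- cut(z,   seq(-1, 1, length.out = n_z + 1),    include.lowest = TRUE)
  phi_bin <- cut(phi, seq(-pi, pi, length.out = n_phi + 1), include.lowest = TRUE)
  counts <- table(z_bin, phi_bin)   # keeps empty cells, since cut() returns factors
  chisq.test(as.vector(counts))     # null hypothesis: all cells equally likely
}

set.seed(42)
uniform_pts <- rsphere(5000)
res_uniform <- sphere_chisq(uniform_pts)

# A deliberately non-uniform sample: reflect every point into the upper hemisphere
upper_pts <- uniform_pts
upper_pts[, 3] <- abs(upper_pts[, 3])
res_upper <- sphere_chisq(upper_pts)

c(uniform = res_uniform$p.value, upper = res_upper$p.value)
```

The points from a random-rotation routine applied to (0,0,1) can be passed to sphere_chisq() in place of uniform_pts; repeating the test with a few different grid sizes gives a somewhat more thorough check.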
Re: [R] Test for Random Points on a Sphere
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of R. Michael Weylandt
Sent: Friday, October 05, 2012 11:17 AM
To: Lorenzo Isella
Cc: r-help@r-project.org
Subject: Re: [R] Test for Random Points on a Sphere
[..] Gut says to divide the surface into n bits of equal area and see if the points appear uniformly in those using something chi-squared-ish, but I'm not aware of a canonical way to do so. [..]

I would be more inclined to use a method which is known to produce points uniformly distributed on the surface of a sphere and not worry about testing your results. You might find the discussion at the following link useful: http://mathworld.wolfram.com/SpherePointPicking.html
Hope this is helpful,
Dan
Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
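One of the methods described on that MathWorld page draws z uniformly on [-1, 1] and the azimuth uniformly on [0, 2*pi), then maps them to Cartesian coordinates; by Archimedes' theorem the resulting points are uniform on the unit sphere. A minimal sketch follows; the function name runif_sphere() is a hypothetical choice. A sample from a known-good generator like this can also serve as a reference when checking a home-grown random-rotation routine.

```r
# Uniform points on the unit sphere via the z / azimuth (cylindrical) construction
runif_sphere <- function(n) {
  z   <- runif(n, min = -1, max = 1)      # height: uniform on [-1, 1]
  phi <- runif(n, min = 0, max = 2 * pi)  # azimuth: uniform on [0, 2*pi)
  r   <- sqrt(1 - z^2)                    # radius of the horizontal circle at height z
  cbind(x = r * cos(phi), y = r * sin(phi), z = z)
}

set.seed(7)
pts <- runif_sphere(10000)
summary(rowSums(pts^2))  # every point has squared norm 1, up to rounding
colMeans(pts)            # each coordinate averages close to 0
```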
Re: [R] test Breslow-Day for svytable??
Suggestion: You need to send us more information, i.e. the code that generated daty, or a listing of the daty structure, and a copy of the listing produced by epi.2by2.
John
John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Diana Marcela Martinez Ruiz dianamm...@hotmail.com 8/31/2012 10:20 AM:
Hi all, I want to know how to perform the Breslow-Day test for homogeneity of odds ratios (OR) stratified for svytable. This test is obtained with the following code:
epi.2by2 (dat = daty, method = "case.control" conf.level = 0.95, units = 100, homogeneity = "breslow.day", verbose = TRUE)
where daty is the table object produced by svytable, but when I run the code it does not return the homogeneity test. Thanks.
Re: [R] test Breslow-Day for svytable??
On Aug 31, 2012, at 7:20 AM, Diana Marcela Martinez Ruiz wrote: [..] This test is obtained with the following code:
epi.2by2 (dat = daty, method = "case.control" conf.level = 0.95,
                       missing comma here ...^
units = 100, homogeneity = "breslow.day", verbose = TRUE)
[..]

You are asked in the Posting guide to copy all errors and warnings when asking about unexpected behavior. When I run epi.2by2 on the output of a svytable object I get no errors, but I do get warnings, which I think are due to non-integer entries in the weighted table. I also get from svytable(), using its first example on the help page, an object that is NOT a set of 2 x 2 tables in an array of the structure expected by epi.2by2(). The fact that epi.2by2() will report numbers with labels for a 2 x 3 table means that its error checking is weak.

This is the output of str(dat) from one of the examples on epi.2by2's help page:

> str(dat)
 table [1:2, 1:2, 1:3] 41 13 6 53 66 37 25 83 23 37 ...
 - attr(*, "dimnames")=List of 3
  ..$ Exposure: chr [1:2] "+" "-"
  ..$ Disease : chr [1:2] "+" "-"
  ..$ Strata  : chr [1:3] "20-29 yrs" "30-39 yrs" "40+ yrs"

Notice that it is a 2 x 2 x n array. (Caveat: from here on out I am simply reading the help pages and using str() to look at the objects created to get an idea regarding success or failure. I am not an experienced user of either package.) I doubt that what you got from svytable is a 2 x 2 table. As another example, you can build a 2 x 2 x n table from the built-in dataset UCBAdmissions:

DF <- as.data.frame(UCBAdmissions)
## Now 'DF' is a data frame with a grid of the factors and the counts
## in variable 'Freq'.
dat2 <- xtabs(Freq ~ Gender + Admit + Dept, DF)
epiR::epi.2by2(dat = dat2, method = "case.control", conf.level = 0.95, units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog
#---
  test.statistic df    p.value
1       18.82551  5 0.00207139

Using svydesign and svytable I _think_ this is how one would go about constructing a 2 x 2 table:

library(survey)
tbl2 <- svydesign( ~ Gender + Admit + Dept, weights = ~Freq, data = DF)
summary(tbl2)
(tbl2by2 <- svytable(~ Gender + Admit + Dept, tbl2))
epiR::epi.2by2(dat = tbl2by2, method = "case.control", conf.level = 0.95, units = 100, homogeneity = "breslow.day", verbose = TRUE)$OR.homog
#---
  test.statistic df    p.value
1       18.82551  5 0.00207139

(At least I got internal consistency. I see you copied Thomas Lumley, which is a good idea. I'll be happy to get corrected on any point. I'm adding the maintainer of epiR to the recipients.)

-- David Winsemius, MD Alameda, CA, USA
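For an ordinary (not survey-weighted) 2 x 2 x K table of counts such as UCBAdmissions, a homogeneity-of-odds-ratios test can also be computed in base R without epiR. The sketch below swaps in Woolf's test, which is a different test from Breslow-Day: it compares the stratum log odds ratios with their inverse-variance weighted mean. The function name woolf_test() and the 0.5 continuity correction are illustrative choices. It treats the table entries as simple counts, so it does not account for a complex survey design.

```r
# Woolf's test of homogeneity of odds ratios across the strata of a 2 x 2 x K table
woolf_test <- function(tab) {
  tab <- tab + 0.5  # continuity correction guards against zero cells
  k <- dim(tab)[3]
  log_or <- apply(tab, 3, function(s) log((s[1, 1] * s[2, 2]) / (s[1, 2] * s[2, 1])))
  w <- apply(tab, 3, function(s) 1 / sum(1 / s))  # inverse of the Woolf variance of log OR
  stat <- sum(w * (log_or - weighted.mean(log_or, w))^2)
  c(statistic = stat, df = k - 1,
    p.value = pchisq(stat, df = k - 1, lower.tail = FALSE))
}

# UCBAdmissions is Admit x Gender x Dept, so the strata are the six departments
res <- woolf_test(UCBAdmissions)
res
```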
Re: [R] test Breslow-Day for svytable??
On Sat, Sep 1, 2012 at 4:27 AM, David Winsemius dwinsem...@comcast.net wrote: [..]
Using svydesign and svytable I _think_ this is how one would go about constructing a 2 x 2 table: [..]
(At least I got internal consistency. [..])

Yes, that will give internal consistency from a data structure point of view. It won't give a valid test in real examples, though -- epi.2by2 doesn't know about complex sampling, and what you're passing it is just an estimate of the population 2x2xK table.

What would work, though it's not quite the same as the Breslow-Day test, is to use svyloglin() and do a Rao-Scott test comparing the model with all two-way interactions ~(Gender+Dept+Admit)^2 to the saturated model ~Gender*Dept*Admit.

-thomas
-- Thomas Lumley Professor of Biostatistics University of Auckland
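Thomas's suggestion can be sketched with svyloglin() from the survey package. Because the original poster's data are not available, the example simulates a hypothetical weighted sample with a binary exposure, a binary outcome, and three strata, generated with the same odds ratio in every stratum. Dropping the three-way interaction from the saturated model asks whether the exposure by disease association is the same in every stratum, the question Breslow-Day addresses for unweighted counts; anova() on the two nested svyloglin() fits reports a Rao-Scott adjusted test. The variable names, weights, and effect sizes are assumptions for illustration.

```r
library(survey)

# Hypothetical survey data: binary exposure and outcome in three strata,
# with unequal sampling weights; the odds ratio is the same in every stratum
set.seed(2012)
n <- 3000
sim <- data.frame(
  stratum  = factor(sample(c("s1", "s2", "s3"), n, replace = TRUE)),
  exposure = factor(sample(c("no", "yes"), n, replace = TRUE)),
  w        = runif(n, min = 1, max = 5)
)
p_disease <- plogis(-0.5 + 0.7 * (sim$exposure == "yes"))
sim$disease <- factor(ifelse(runif(n) < p_disease, "yes", "no"))

des <- svydesign(ids = ~1, weights = ~w, data = sim)

# Loglinear models for the weighted exposure x disease x stratum table
main   <- svyloglin(~exposure + disease + stratum, des)
twoway <- update(main, ~ .^2)  # all two-way interactions: common odds ratio
sat    <- update(main, ~ .^3)  # saturated: odds ratio may differ by stratum

# Rao-Scott test of the three-way interaction (heterogeneity of odds ratios)
rs <- anova(twoway, sat)
rs
```

With a real design object, des would be replaced by the design that produced the svytable, and the three factors by the exposure, outcome, and stratifying variables.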
Re: [R] test if elements of a character vector contain letters
On Tue, Aug 7, 2012 at 10:26 PM, Marc Schwartz marc_schwa...@me.com wrote: [..] since there are alpha-numerics present, whereas the first option will:
> grepl("[^[:alnum:]]", "ab%")
[1] TRUE
So, use the first option.

And I should start reading more carefully. The above works fine for me. I ended up defining the following wrappers:

is_alpha <- function(x) {grepl("[[:alpha:]]", x)}     ## contains an alphabetic character
is_digit <- function(x) {grepl("[[:digit:]]", x)}     ## contains a digit
is_alnum <- function(x) {grepl("[[:alnum:]]", x)}     ## contains an alphanumeric character
is_punct <- function(x) {grepl("[[:punct:]]", x)}     ## contains a punctuation character
is_notalnum <- function(x) {grepl("[^[:alnum:]]", x)} ## contains a non-alphanumeric character

Thanks again
Liviu
Re: [R] test if elements of a character vector contain letters
On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz marc_schwa...@me.com wrote:
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Quick follow-up question. I'm always reluctant to create functions that would resemble a method of a function (here, is()), but would in fact not be a genuine method. So would there be any incompatibility between is() and is.letter(), given that the latter is not a method of the former? Is it good (or acceptable) practice to define is.letter() as above? Would is_letter() be better?
Regards
Liviu
Re: [R] test if elements of a character vector contain letters
On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz marc_schwa...@me.com wrote: [..]

Another follow-up. To test for (non-)alphanumeric characters one would do the following:

> x <- c(letters, 1:26, '+', '-', '%^')
> x[1:10] <- paste(x[1:10], 1:10, sep='')
> x
 [1] "a1"  "b2"  "c3"  "d4"  "e5"  "f6"  "g7"  "h8"  "i9"  "j10" "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "+"   "-"   "%^"
> xb <- grepl("[[:alnum:]]", x)  ##test for alphanumeric chars
> x[xb]
 [1] "a1"  "b2"  "c3"  "d4"  "e5"  "f6"  "g7"  "h8"  "i9"  "j10" "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
> xb <- grepl("[[:punct:]]", x)  ##test for non-alphanumeric chars
> x[xb]
[1] "+"  "-"  "%^"

More regex rules are available on Wikipedia [1].
Regards
Liviu
[1] http://en.wikipedia.org/wiki/Regular_expression
Re: [R] test if elements of a character vector contain letters
On Tue, Aug 7, 2012 at 4:28 AM, Liviu Andronic landronim...@gmail.com wrote: [..] So would there be any incompatibility between is() and is.letter(), given that the latter is not a method of the former? Is it good (or acceptable) practice to define is.letter() as above? Would is_letter() be better?

It certainly won't cause problems if you never define anything of class "letter" or "number".
Re: [R] test if elements of a character vector contain letters
On Aug 7, 2012, at 3:02 PM, Liviu Andronic landronim...@gmail.com wrote: [..]
> xb <- grepl("[[:punct:]]", x)  ##test for non-alphanumeric chars
> x[xb]
[1] "+"  "-"  "%^"

That will get you values where punctuation characters are used, but there may be other non-alphanumeric characters in the vector. There may be ASCII control codes, tabs, newlines, CR, LF, spaces, etc., which would not be found by using [:punct:]. For example:

> grepl("[[:punct:]]", " ")
[1] FALSE

If you want to explicitly look for non-alphanumeric characters, you would be better off using a negation of [:alnum:] such as:

grepl("[^[:alnum:]]", x)

or

!grepl("[[:alnum:]]", x)

Regards,
Marc
Re: [R] test if elements of a character vector contain letters
On Aug 7, 2012, at 3:18 PM, Marc Schwartz marc_schwa...@me.com wrote: [..] If you want to explicitly look for non-alphanumeric characters, you would be better off using a negation of [:alnum:] such as: grepl("[^[:alnum:]]", x) or !grepl("[[:alnum:]]", x)

Actually (for the second time in two days) I need to correct myself. The second option would not work correctly in cases where there is a mix of alpha-numerics and other characters:

> !grepl("[[:alnum:]]", "ab%")
[1] FALSE

since there are alpha-numerics present, whereas the first option will:

> grepl("[^[:alnum:]]", "ab%")
[1] TRUE

So, use the first option.

Regards,
Marc
who is heading to the coffee machine...
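The difference between the three patterns can be checked on a few hypothetical strings chosen to exercise each case: [[:punct:]] misses whitespace such as a space or a tab, !grepl("[[:alnum:]]", x) flags only strings with no alphanumeric character at all, and grepl("[^[:alnum:]]", x) flags any string containing at least one non-alphanumeric character.

```r
x <- c("abc", "ab%", "a b", "+", "", "a\tb")

has_punct    <- grepl("[[:punct:]]", x)   # contains a punctuation character
has_nonalnum <- grepl("[^[:alnum:]]", x)  # contains any non-alphanumeric character
no_alnum     <- !grepl("[[:alnum:]]", x)  # contains no alphanumeric character at all

# encodeString() shows the tab as \t so the table is readable
data.frame(x = encodeString(x), has_punct, has_nonalnum, no_alnum)
```

Note that the empty string has no non-alphanumeric character either, so only the third test flags it; whether that matters depends on how empty strings should be treated.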
Re: [R] test if elements of a character vector contain letters
On Tue, Aug 7, 2012 at 10:18 PM, Marc Schwartz marc_schwa...@me.com wrote: [..] If you want to explicitly look for non-alphanumeric characters, you would be better off using a negation of [:alnum:] such as: [..] !grepl("[[:alnum:]]", x)

Good point! Thanks.
Liviu
Re: [R] test if elements of a character vector contain letters
nzchar(x) & !is.na(x)
No?
-- Bert

On Mon, Aug 6, 2012 at 9:25 AM, Liviu Andronic landronim...@gmail.com wrote:
Dear all
I'm pretty sure that I'm approaching the problem in a wrong way. Suppose the following character vector:

> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"

How do you test whether the elements of the vector contain at least one letter (or at least one digit) and obtain a logical vector of the same length? I came up with the following awkward function:

is_letter <- function(x, pattern=c(letters, LETTERS)){
  sapply(x, function(y){
    any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
  })
}

> is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> is_letter(x, 0:9)  ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Is there a nicer way to do this?
Regards
Liviu
-- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] test if elements of a character vector contain letters
Hello,
Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
Gave it up? Ok, here it is.

is_letter <- function(x, pattern=c(letters, LETTERS)){
  sapply(x, function(y){
    any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
  })
}

# test ascii codes, just one loop.
has_letter <- function(x){
  sapply(x, function(y){
    y <- as.integer(charToRaw(y))
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  })
}

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)

t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA

On 06-08-2012 17:25, Liviu Andronic wrote: [..]
Re: [R] test if elements of a character vector contain letters
On 08/06/2012 09:51 AM, Rui Barradas wrote:
> Hello,
>
> Fun as an exercise in vectorization. 30 times faster. Don't look, guess.

system.time(res0 <- grepl("[[:alpha:]]", x))
   user  system elapsed
  0.060   0.000   0.061
system.time(res1 <- has_letter(x))
   user  system elapsed
  3.728   0.008   3.747
all.equal(res0, res1, check.attributes = FALSE)
[1] TRUE

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
Re: [R] test if elements of a character vector contain letters
Perhaps I am missing something, but why use sapply() when grepl() is already vectorized?

is.letter <- function(x) grepl("[:alpha:]", x)
is.number <- function(x) grepl("[:digit:]", x)

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep = '')
x <- rep(x, 1e3)

str(x)
 chr [1:52000] a2 b10 c8 d3 e6 f1 g5 ...

system.time(is.letter(x))
   user  system elapsed
  0.011   0.000   0.010

system.time(is.number(x))
   user  system elapsed
  0.010   0.000   0.011

Regards,

Marc Schwartz

On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
Re: [R] test if elements of a character vector contain letters
Hi,

Not sure whether this is what you wanted.

x <- letters
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep = ''))
x1 <- c(x, 1:26)
x1
 [1] a4  b3  c5  d2  e9  f6  g1  h8  i10 j7  k   l
[13] m   n   o   p   q   r   s   t   u   v   w   x
[25] y   z   1   2   3   4   5   6   7   8   9   10
[37] 11  12  13  14  15  16  17  18  19  20  21  22
[49] 23  24  25  26

grepl("^[[:alpha:]][[:digit:]]", x1)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

A.K.

----- Original Message -----
From: Liviu Andronic <landronim...@gmail.com>
To: r-help@r-project.org Help <r-help@r-project.org>
Sent: Monday, August 6, 2012 12:25 PM
Subject: [R] test if elements of a character vector contain letters
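A.K.'s pattern is anchored with `^` and requires a letter immediately followed by a digit, so it answers a narrower question than the one asked. For example, single letters such as "k" come out FALSE. The comparison below is a minimal sketch on a small made-up vector. It contrasts the anchored pattern with an unanchored bracket expression that only asks whether at least one letter appears anywhere.

```r
# Small hypothetical test vector: letter+digit, letter only, digits only, digit+letter
x1 <- c("a4", "k", "12", "1b")

# Anchored: a letter at the very start, immediately followed by a digit
anchored <- grepl("^[[:alpha:]][[:digit:]]", x1)

# Unanchored: at least one letter anywhere in the string
any_letter <- grepl("[[:alpha:]]", x1)

print(anchored)    # TRUE FALSE FALSE FALSE
print(any_letter)  # TRUE  TRUE FALSE  TRUE
```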
Re: [R] test if elements of a character vector contain letters
On Aug 6, 2012, at 12:06 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
> Perhaps I am missing something, but why use sapply() when grepl() is already vectorized?
>
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)

Sorry, typos in the above from my copy and paste. Should be:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Marc
Re: [R] test if elements of a character vector contain letters
Only an extra set of brackets:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Without them, the functions are fast, but wrong.

x
 [1] a8  b5  c10 d1  e6  f2  g4  h3  i7  j9  k   l
[13] m   n   o   p   q   r   s   t   u   v   w   x
[25] y   z   1   2   3   4   5   6   7   8   9   10
[37] 11  12  13  14  15  16  17  18  19  20  21  22
[49] 23  24  25  26

is.letter <- function(x) grepl("[:alpha:]", x)
is.letter(x)
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.letter(x)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Marc Schwartz
Sent: Monday, August 06, 2012 12:07 PM
To: Rui Barradas
Cc: r-help
Subject: Re: [R] test if elements of a character vector contain letters
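The single-bracket version is "fast, but wrong" for a specific reason. Written without the outer brackets, "[:alpha:]" is read as an ordinary bracket expression. It matches any one of the literal characters `:`, `a`, `l`, `p` and `h`. That fits the output above, where only "a8", "h3", "l" and "p" came out TRUE. This is a minimal sketch, and it assumes R's default regex engine (`perl = FALSE`).

```r
x <- c("a8", "b5", "h3", "l", "p", "k", "12")

# Single brackets: a set of the literal characters ':', 'a', 'l', 'p', 'h'
wrong <- grepl("[:alpha:]", x)

# Double brackets: the POSIX class [:alpha:] inside a bracket expression
right <- grepl("[[:alpha:]]", x)

# Side-by-side view (rbind coerces everything to character for display)
rbind(x, wrong, right)
```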
Re: [R] test if elements of a character vector contain letters
On Mon, Aug 6, 2012 at 6:42 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> nzchar(x) & !is.na(x)
>
> No?

It doesn't work for what I need:

x
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  k   l   m   n
[15] o   p   q   r   s   t   u   v   w   x   y   z   1   2
[29] 3   4   5   6   7   8   9   10  11  12  13  14  15  16
[43] 17  18  19  20  21  22  23  24  25  26

nzchar(x) & !is.na(x)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[18] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[35] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[52] TRUE

I need to have TRUE when an element contains a letter, and FALSE when an element contains only numbers. The above returns TRUE for the entire vector.

Regards
Liviu

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
Re: [R] test if elements of a character vector contain letters
You probably mean

grepl('[a-zA-Z]', x)

Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

On Mon, Aug 6, 2012 at 3:29 PM, Liviu Andronic <landronim...@gmail.com> wrote:
> I need to have TRUE when an element contains a letter, and FALSE when an
> element contains only numbers. The above returns TRUE for the entire vector.
Re: [R] test if elements of a character vector contain letters
On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
> is.letter <- function(x) grepl("[[:alpha:]]", x)
> is.number <- function(x) grepl("[[:digit:]]", x)

This does exactly what I wanted:

x
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  k   l   m   n
[15] o   p   q   r   s   t   u   v   w   x   y   z   1   2
[29] 3   4   5   6   7   8   9   10  11  12  13  14  15  16
[43] 17  18  19  20  21  22  23  24  25  26

xb <- grepl("[[:alpha:]]", x)
x[xb]  ##extract all vector elements that contain a letter
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  k   l   m   n
[15] o   p   q   r   s   t   u   v   w   x   y   z

xb <- grepl("[[:digit:]]", x)
x[xb]  ##extract all vector elements that contain a digit
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  1   2   3   4
[15] 5   6   7   8   9   10  11  12  13  14  15  16  17  18
[29] 19  20  21  22  23  24  25  26

Thanks all for the suggestions!

Regards
Liviu
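Liviu's requirement can also be checked from the other side. He wants TRUE when an element contains a letter and FALSE when it consists only of digits, so one could test for "digits only" and negate it. The sketch below uses hypothetical helper names that do not come from the thread. It puts the negated anchored "digits only" pattern next to the `[[:alpha:]]` test. The two agree on vectors like the one above. They differ for strings that have no letters but are not pure digits either, such as "" or "-1". Which one is right depends on how such strings should be treated.

```r
# Hypothetical helpers for illustration
contains_letter <- function(x) grepl("[[:alpha:]]", x)
only_digits     <- function(x) grepl("^[[:digit:]]+$", x)

x <- c("a10", "k", "12", "", "-1")
data.frame(x,
           contains_letter = contains_letter(x),
           not_only_digits = !only_digits(x))
```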
Re: [R] test parallel slopes with svyolr
On Sun, Jul 8, 2012 at 2:32 AM, Diana Marcela Martinez Ruiz <dianamm...@hotmail.com> wrote:
> Hello, I would like to know how to test the assumption of proportional odds or parallel lines or slopes for an ordinal logistic regression with svyolr

I wouldn't, but if someone finds a clear reference I'd be prepared to implement it anyway.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] Test Binary File
As an alternative to the hexView package, an external hex editor may help you investigate how the data is organised.
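For a first look without leaving R, base R's `readBin()` can read the raw bytes of a file, and a raw vector prints as hexadecimal pairs. The sketch below is minimal. The temporary file and its contents (two 4-byte little-endian integers) are made up for illustration. In practice you would point `path` at your own binary file and adjust `n`, `size` and `endian` once the layout is known.

```r
# Write a small binary file: two 4-byte little-endian integers (1 and 256)
path <- tempfile(fileext = ".bin")
writeBin(c(1L, 256L), path, size = 4, endian = "little")

# Read up to 16 raw bytes; fewer are returned if the file is shorter
bytes <- readBin(path, what = "raw", n = 16)
print(bytes)

# Once the layout is known, read the same bytes back as typed values
values <- readBin(path, what = "integer", n = 2, size = 4, endian = "little")
print(values)
```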
Re: [R] Test if a sample mean of integers with range -inf; inf is different from zero
mean(c) != 0

But if you mean in a statistical sense... t.test() is one possibility.

Michael

On Fri, May 4, 2012 at 5:29 AM, Kay Cichini <kay.cich...@gmail.com> wrote:
> Hi all,
>
> how would you test if a sample mean of integers with range -inf;inf is different from zero:
>
> # my sample of integers:
> c <- c(-3, -1, 0, 1, 0, 3, 4, 10, 12)
>
> # is mean of c != 0?:
> mean(c)
>
> Thanks,
> Kay
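In R, `t.test()` with `mu = 0` is a one-sample test of whether the population mean differs from zero. The sketch below runs it on Kay's numbers. It relies on the observations being independent and roughly normal, or on the sample being large enough. The nine values alone cannot confirm that assumption. The vector is named `x` rather than `c` to avoid confusion with the `c()` function.

```r
x <- c(-3, -1, 0, 1, 0, 3, 4, 10, 12)

res <- t.test(x, mu = 0)   # two-sided one-sample t-test
res$estimate               # sample mean
res$parameter              # degrees of freedom (n - 1)
res$p.value
```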
Re: [R] Test if a sample mean of integers with range -inf; inf is different from zero
On Fri, May 04, 2012 at 11:29:51AM +0200, Kay Cichini wrote:
> Hi all,
>
> how would you test if a sample mean of integers with range -inf;inf is different from zero:
>
> # my sample of integers:
> c <- c(-3, -1, 0, 1, 0, 3, 4, 10, 12)
>
> # is mean of c != 0?:
> mean(c)

Hi.

It is better to give the vector a name other than c, since c is a function, which you also use.

Testing whether the sample mean is zero is simple, since one can use

  mean(c) == 0

or

  sum(c) == 0

which are equivalent even in inexact computer arithmetic.

So, I think, you are asking for a statistical test of whether the true distribution mean is zero on the basis of a sample. Testing this requires some additional information on the distribution. If we do not know anything about the distribution except that the values are integers, then the sample mean can be arbitrarily large even if the distribution mean is zero. Consider, for example, a uniform distribution on {-M, M} for some very large integer M. Observing a large sample mean does not allow us to reject the null hypothesis at any level, since a large mean may have large probability even if the null hypothesis is true.

If there is no bound on the values, then testing anything concerning the mean may not be possible, since the expected value may not exist. Do you have a reason to think that the true distribution has an expected value? An example of an integer random variable without an expected value is s*X, where s is uniform on {-1, 1} and X has value 2^i with probability 2^(-i) for i a positive integer.

Hope this helps.

Petr Savicky.
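Petr's last example can be simulated directly. If P(i = k) = 2^(-k) for k = 1, 2, ..., then `rgeom(n, 0.5) + 1` has exactly that distribution. E|s*X| = sum over k of 2^k * 2^(-k) diverges, so no expected value exists. The sketch below uses a hypothetical function name and draws several samples of the same size. Their means typically jump around by orders of magnitude instead of settling near a common value. This illustrates why a test about "the mean" has nothing to hold on to here.

```r
# Hypothetical generator for Y = s * X, s uniform on {-1, 1},
# X = 2^i with probability 2^(-i), i = 1, 2, ...
r_no_mean <- function(n) {
  i <- rgeom(n, prob = 0.5) + 1             # P(i = k) = 2^(-k), k >= 1
  s <- sample(c(-1, 1), n, replace = TRUE)  # random sign
  s * 2^i
}

set.seed(1)
# Means of five independent samples of size 10,000
sample_means <- replicate(5, mean(r_no_mean(1e4)))
print(sample_means)
```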
Re: [R] Test-Predict R survival analysis
On 04/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:
> Hi,
>
> I'm trying to use the R survival analysis on a Windows 7 system. The input data format is described at the end of this mail.
>
> 1/ I tried to perform a survival analysis including stratified variables using the following formula.
>
> cox.xtab_miR = coxph(Surv(time, status) ~ miR + strata(sex, nbligne, age), data = matrix)
>
> and obtained the following error message
>
> Warning message:
> In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>   Ran out of iterations and did not converge
>
> Is this due to the model (error in formula) or is the number of stratified variables fixed?

The Cox model compares the deaths to the non-deaths, separately within each stratum, then adds up the result. Your data set and model combination puts each subject into their own stratum, so there is no one to compare them to. The fit has no data to use and so must fail. (I admit the error message is misleading, but I hadn't ever seen someone make this particular mistake before.)

The following model works much better:

coxph(Surv(time, status) ~ miR + age + nbligne + strata(sex))

             coef exp(coef) se(coef)     z      p
miR      2.75e-05      1.00 9.35e-06 2.941 0.0033
age      3.39e-03      1.00 1.01e-02 0.334 0.7400
nbligne  7.14e-02      1.07 1.32e-01 0.542 0.5900

Likelihood ratio test=5.87  on 3 df, p=0.118
n= 70, number of events= 59
   (1 observation deleted due to missingness)

Terry Therneau
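Before fitting, you can count how many subjects fall into each stratum defined by the `strata()` variables. If nearly every count is 1, each subject forms its own stratum, which is the situation Terry describes. The sketch below uses only base R and simulated data. The data frame is made up and only its column names mirror the poster's. The same `interaction()`/`table()` check can be run on real data before calling `coxph()`.

```r
set.seed(1)
n <- 70
# Simulated stand-in for the poster's data (column names only)
d <- data.frame(
  sex     = sample(c("F", "M"), n, replace = TRUE),
  nbligne = sample(1:5, n, replace = TRUE),
  age     = round(runif(n, 30, 80))
)

# One stratum per observed combination of the stratifying variables
stratum <- interaction(d$sex, d$nbligne, d$age, drop = TRUE)
sizes <- table(stratum)

length(sizes)      # number of non-empty strata
mean(sizes == 1)   # share of strata that hold a single subject

# Stratifying on sex alone leaves two large strata to compare within
table(d$sex)
```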
Re: [R] Test Normality
Hi Sindy,

you might try Snow's penultimate normality test from the TeachingDemos package. But read the help file carefully.

http://www.inside-r.org/packages/cran/TeachingDemos/docs/SnowsPenultimateNormalityTest

cheers.

On 28.03.2012 02:32, Sindy Carolina Lizarazo wrote:
> Good night,
>
> I ran different tests to check normality and multivariate normality in my dataset, but I don't know which test is better.
>
> To verify univariate normality I checked: shapiro.test, cvm.test, ad.test, lillie.test, sf.test or jarque.bera.test, and to verify multivariate normality I used mardia, mvShapiro.Test, mvsf, mshapiro.test, mvnorm.e.
>
> I have a dataset with almost 1000 observations and 9 variables; in both cases the result is non-normality. For this reason, I transformed the data with the bcPower function and I want to check normality again.
>
> I really appreciate your help. Thanks.

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Mandatory information under the German Act on Electronic Commercial Registers, Cooperative Registers and the Company Register (EHUG): Universitätsklinikum Hamburg-Eppendorf; public-law corporation; place of jurisdiction: Hamburg; Board members: Prof. Dr. Guido Sauter (representing the chair), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus
Re: [R] Test Normality
On 3/27/2012 8:32 PM, Sindy Carolina Lizarazo wrote:
> I ran different tests to check normality and multivariate normality in my dataset, but I don't know which test is better.

Univariate tests of normality are subsumed within the multivariate tests, so there is no real need for the former. That being said, many of the tests are quite sensitive to mild or small departures from multivariate normality that would have little real impact on the validity of an analysis.

You may find it more useful to carry out a graphical analysis, such as with normal QQ plots, or the multivariate generalization, which is a plot of the Mahalanobis squared distances of all observations from their centroid vs. the corresponding quantiles of the chi-square distribution with p = 9 df.

[As a courtesy to readers, you might cite the packages from which you've used these functions.]

-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
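Michael's graphical check can be done in base R. Use `qqnorm()` for a single variable, and `mahalanobis()` together with `qchisq()` for the multivariate version. The sketch below uses simulated multivariate-normal stand-in data with n = 1000 and p = 9, not Sindy's data. With real data, replace `X` with your numeric matrix. Points close to the reference line suggest approximate multivariate normality, and points bending away in the upper tail suggest heavier tails or outliers.

```r
set.seed(42)
n <- 1000
p <- 9
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # simulated stand-in data

# Univariate: normal QQ plot for one variable
qqnorm(X[, 1]); qqline(X[, 1])

# Multivariate: squared Mahalanobis distances from the centroid
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Compare with chi-square quantiles on p df
qqplot(qchisq(ppoints(n), df = p), d2,
       xlab = "Chi-square quantiles (df = 9)",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)
```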
Re: [R] test if text is part of vector
Hi

> Hello,
> this is a very simple question: How can I find out if a word is part of a list of words like:
> a <- "word1"
> b <- "word4"
> vector <- c("word1", "word2", "word3")
> I tried it with match(a, vector) but this gives the position of the word.

Perhaps

a %in% vector

Regards
Petr

> I am not sure if and how that can be done with a logical operator like if:
> IF text is part of vector THEN print "is part"
> Probably a very easy thing to do, but I am missing the logical operator... and help(if) is not working.
> best regards,
> johannes
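A small sketch of Petr's suggestion, using the objects from the question: %in% gives TRUE/FALSE, which can go straight into if(), whereas match() gives a position or NA. (As an aside, help(if) fails because if is a reserved word; quoting it, as in help("if") or ?`if`, works.)

```r
a <- "word1"
b <- "word4"
vector <- c("word1", "word2", "word3")

# %in% answers "is this element present?" with TRUE/FALSE
a %in% vector  # TRUE
b %in% vector  # FALSE

# ...so it can be used directly as the condition of if()
if (a %in% vector) print("is part")

# match() instead returns the position, or NA when there is no match
match(a, vector)  # 1
match(b, vector)  # NA
```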
Re: [R] test if text is part of vector
On 20/01/12 12:50, Johannes Radinger wrote:
> Hello,
> this is a very simple question: How can I find out if a word is part of a list of words like:
> a <- "word1"
> b <- "word4"
> vector <- c("word1", "word2", "word3")
> I tried it with match(a, vector) but this gives the position of the word. [...]

check out %in%

help: ?`%in%`

Cheers,

Rainer

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44

email: rai...@krugs.de
Skype: RMkrug
Re: [R] test if text is part of vector
Hi,

thank you very much... %in% is the operator I was looking for.

cheers,
johannes

-------- Original Message --------
Date: Fri, 20 Jan 2012 13:01:54 +0100
From: Rainer M Krug r.m.k...@gmail.com
To: Johannes Radinger jradin...@gmx.at
CC: R-help@r-project.org
Subject: Re: [R] test if text is part of vector

> check out %in%
> help: ?`%in%`
> [...]
Re: [R] test if text is part of vector
You also might look at grepl() if you have time: it allows regular expressions and will be a little (a lot?) more flexible in how you define a match, e.g. if you want to ignore capitalization. (Mnemonic: the "l" in grepl indicates it's like grep but returns logicals instead of positions.)

Michael

On Jan 20, 2012, at 7:42 AM, Johannes Radinger jradin...@gmx.at wrote:

> Hi,
> thank you very much... %in% is the operator I was looking for.
> cheers,
> johannes
> [...]
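A sketch of the grepl() route Michael mentions. The vector here is a hypothetical variant of the one in the question, with one capitalized element to show ignore.case. Note that a pattern matches anywhere inside an element, so anchoring with ^ and $ is needed when only whole-element matches should count.

```r
# Hypothetical variant of the question's vector with one capitalized entry
vector <- c("word1", "Word2", "word3")

# grepl() returns one TRUE/FALSE per element
grepl("word2", vector)                      # FALSE FALSE FALSE
grepl("word2", vector, ignore.case = TRUE)  # FALSE  TRUE FALSE

# any() collapses that into a single TRUE/FALSE for use in if()
if (any(grepl("word2", vector, ignore.case = TRUE))) print("is part")

# A pattern also matches inside an element ...
grepl("word", vector, ignore.case = TRUE)    # TRUE TRUE TRUE
# ... so anchor it with ^ and $ to require a whole-element match
grepl("^word$", vector, ignore.case = TRUE)  # FALSE FALSE FALSE
```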
Re: [R] Test Case for Package
On 19.10.2011 10:13, Vikram Bahure wrote:
> Hi,
> I have a query about writing a test case for a package. If we are testing a function, do we need to call library(mypackage) in the test to load it? Is there any circular logic in that?
> For example, if I create a package mypackage, can I have a file mypackagetest in the tests directory containing the line library(mypackage)?

Yes, actually it won't work without such a call to library() or require() ...

Uwe Ligges

> It would be helpful if I could get some input on this.
> Regards
> Vikram
Re: [R] Test for Random Walk and Makov Process
For a random walk, there are entropy-based tests (Robinson 1991), or you could test the hypothesis empirically by generating random normal data with the same mean and standard deviation and looking at the distribution of your quantiles. You could also make general statements about whether or not the data show autocorrelation function values that are not significant and do not appear to have a trend. Further, in a random walk the increments are independent, so a binary variable for whether each increment is above or below its mean should follow a binomial distribution of size 1 with a probability of 0.5 (for symmetric increments); there are tests that do this but also take magnitude into account.

I mean to say there are a lot of ways to approach that problem; it depends on the application and how strong you want your conclusions to be.

What kind of Markov process?

On Sep 3, 2554 BE (2011 CE), at 9:59 PM, Jumlong Vongprasert jumlong.u...@gmail.com wrote:
> Dear All,
> I want to test whether my data follow a random walk or a Markov process. How can I do this?
> Many thanks
> -- 
> Jumlong Vongprasert
> Assist. Prof.
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani
> THAILAND 34000
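One way to sketch the autocorrelation idea above in base R, assuming a hypothetical numeric series `x` (simulated here as a Gaussian random walk): under a random walk the first differences should behave like white noise, so their sample ACF, a Ljung-Box test (Box.test) and a sign test on the increments are natural checks. This is only a screening sketch for the random-walk part; it does not address the Markov-process question.

```r
# Hypothetical series: a simulated Gaussian random walk of length 500
set.seed(42)
x <- cumsum(rnorm(500))

# Under a random walk the first differences should be white noise
dx <- diff(x)

# Sample autocorrelations of the differences up to lag 20 (no plot)
r <- acf(dx, lag.max = 20, plot = FALSE)

# Ljung-Box portmanteau test: H0 = no autocorrelation up to lag 20
lb <- Box.test(dx, lag = 20, type = "Ljung-Box")
lb$p.value

# Sign test on the increments: for a symmetric random walk about half
# of the steps should be upward
binom.test(sum(dx > 0), length(dx), p = 0.5)$p.value
```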
Re: [R] test if vector contains elements of another vector (disregarding the position)
%in%

Here,

i %in% j

Hope this helps,

Michael

On Mon, Aug 22, 2011 at 11:51 AM, Martin Batholdy batho...@googlemail.com wrote:
> Hi,
> I have the following problem: I have two vectors:
> i <- c('a','c','g','h','b','d','f','k','l','e','i')
> j <- c('a', 'b', 'c')
> Now I would like to generate a vector with the length of i that has zeros where i[x] != any element of j and 1 where i[x] == any element of j.
> So for the example above the vector would look like this:
> c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
> Can someone help me with this?
Re: [R] test if vector contains elements of another vector (disregarding the position)
Try this:

i %in% j * 1

On Mon, Aug 22, 2011 at 12:51 PM, Martin Batholdy batho...@googlemail.com wrote:
> Hi,
> I have the following problem: I have two vectors:
> i <- c('a','c','g','h','b','d','f','k','l','e','i')
> j <- c('a', 'b', 'c')
> Now I would like to generate a vector with the length of i that has zeros where i[x] != any element of j and 1 where i[x] == any element of j. [...]

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O
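Putting the two answers together with the vectors from the question: %in% gives the logical vector, and multiplying by 1 (or calling as.integer()) turns it into 0/1. Since %in% binds more tightly than *, i %in% j * 1 is parsed as (i %in% j) * 1.

```r
i <- c('a','c','g','h','b','d','f','k','l','e','i')
j <- c('a', 'b', 'c')

# Logical vector: TRUE where the element of i occurs anywhere in j
in_j <- i %in% j

# %in% binds more tightly than *, so this is (i %in% j) * 1: numeric 0/1
i %in% j * 1   # 1 1 0 0 1 0 0 0 0 0 0

# Integer 0/1 alternative
as.integer(in_j)
```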
Re: [R] Test if data uniformly distributed (newbie)
On Fri, Jun 10, 2011 at 10:15:36PM +0200, Kairavi Bhakta wrote:
> Thanks for your answer. The reason I want the data to be uniform: it's the first step in a machine learning project I am working on. If I know the data isn't uniformly distributed, then there is probably something wrong, and the following steps will be biased by the non-uniform input data. I'm not checking an assumption for another statistical test.
> Actually, the data has been normalized because it is supposed to represent a probability distribution. That's why it sums to 1. My assumption is that, for a vector of 5, the data at that point should look like 0.20 0.20 0.20 0.20 0.20, but of course there is variation, and I would like to test whether the data comes close enough or not.

As others told you, this is not the right format for the KS test. The words "testing uniformity" can mean different things, and the meaning depends on which statistical model you assume.

If we have a random variable with values in [0, 1], then testing uniformity means testing to what extent its distribution is close to the uniform distribution on [0, 1]. Numbers which concentrate around 0.2 will not satisfy this.

If we have a discrete variable with k values, for which we have m independent observations, and the number of observations of value i is m_i, then it is possible to test whether the variable has the uniform distribution on {1, ..., k} using the chi-squared test. Note that this test needs the original counts, not their normalized values, which sum up to 1.

For example, if we have 20 observations and the counts (m_1, ..., m_5) are (4, 3, 5, 2, 6), then this is quite consistent with the assumption of uniform distribution.
On the other hand, if we have 200 observations and the counts are (40, 30, 50, 20, 60), then the null hypothesis of uniform distribution may be rejected (the uniform distribution is the default; see argument p in ?chisq.test):

  x <- c(40, 30, 50, 20, 60)
  chisq.test(x)

          Chi-squared test for given probabilities

  data:  x
  X-squared = 25, df = 4, p-value = 5.031e-05

It is not clear whether this is suitable for your application. If you generate the values in a different way, then another test may be needed. Can you specify in more detail how the numbers are generated?

Petr Savicky.
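A sketch that reproduces both of Petr's examples side by side: the two count vectors have exactly the same proportions, yet only the larger sample gives evidence against uniformity, which is why chisq.test() needs the raw counts rather than values normalized to sum to 1.

```r
small <- c(4, 3, 5, 2, 6)       # 20 observations
large <- c(40, 30, 50, 20, 60)  # 200 observations, same proportions

# Without a p argument, chisq.test() tests equal probabilities 1/k.
# For `small` R warns that the approximation may be incorrect, because
# every expected count (20 / 5 = 4) is below 5; the warning is silenced here.
res_small <- suppressWarnings(chisq.test(small))
res_large <- chisq.test(large)

res_small$statistic  # 2.5
res_large$statistic  # 25

# The normalized proportions are identical, so on their own they cannot
# tell the two situations apart
all.equal(small / sum(small), large / sum(large))  # TRUE
```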
Re: [R] Test if data uniformly distributed (newbie)
Yes, punif is the function to use; however, the KS test (and the others) are based on an assumption of independence, and if you know that your data points sum to 1, then they are not independent (and not uniform if there are more than 2).

Also note that these tests only rule out distributions (with a given type I error rate); they cannot confirm that the data come from a given distribution (just that either they do, or there is not enough power to distinguish between the actual and the test distributions).

What is your ultimate question/goal? Why do you care whether the data are uniform or not?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Kairavi Bhakta
> Sent: Friday, June 10, 2011 11:24 AM
> To: r-help@r-project.org
> Subject: [R] Test if data uniformly distributed (newbie)
>
> Hello,
> I have a bunch of files containing 300 data points each, with values from 0 to 1 which also sum to 1 (I don't think the last element is relevant though). In addition, each data point is annotated as an a or a b. I would like to know in which files (if any) the data is uniformly distributed.
> I used Google and found out that a Kolmogorov-Smirnov or a chi-square goodness-of-fit test could be used. Then I looked up ?kolmogorov and found ks.test, but the example there is for the normal distribution and I am not sure how to adapt it for the uniform distribution. I did ?runif and read about the uniform distribution, but it doesn't say what the cumulative distribution is. Is it punif, like pnorm? I thought of that because I found a message on this list where someone was told to use pnorm instead of dnorm. But the help page on the uniform distribution says punif is the distribution function. Are the cumulative distribution and the distribution function the same thing?
> Having several names for the same thing has always confused me very much in statistics.
> Also, I am not sure whether I need to specify any parameters for the distribution, and which. I thought maybe I should specify min=0 and max=1, but those appear to be the defaults. Do I need to specify q, the vector of quantiles? So is ks.test(x, punif) correct or not for what I am attempting to do?
> After this I will also need to find out whether the a's and b's are distributed randomly in each file. I would be grateful for any pointers, although I have not researched this issue yet.
> Kairavi.
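To answer the mechanical part of the question: in R, punif is the uniform cumulative distribution function (the "distribution function" on its help page), and ks.test() can be given its name plus any parameters. The sketch below uses hypothetical simulated data and only illustrates the call; as Greg points out, values forced to sum to 1 violate the test's independence assumption, and the contrast case shows how far such values are from uniform on [0, 1].

```r
set.seed(1)
# Hypothetical i.i.d. draws on [0, 1]: the situation ks.test() is built for
u <- runif(300)

# punif is the cumulative distribution function (CDF) of the uniform.
# ks.test() accepts the CDF by name; extra arguments are passed on to it.
# min = 0 and max = 1 are already punif's defaults.
ks.test(u, "punif", min = 0, max = 1)

# For contrast: 300 values rescaled to sum to 1 all lie near 1/300,
# nowhere near uniform on [0, 1], so the test rejects overwhelmingly.
# (After rescaling they are also no longer independent.)
w <- u / sum(u)
ks.test(w, "punif")$p.value
```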
Re: [R] Test if data uniformly distributed (newbie)
OK, that is not the correct format for the KS test (which expects data ranging from 0 to 1 with a fairly flat histogram). You could possibly test this with a chi-squared test. Can you tell us more about how the numbers you are looking at are generated?

The chi-squared test could be used on counts of 1-5, compared to the assumption that each is equally likely, but there is still the question of power and of how close to uniform is uniform enough. You would need huge samples to detect a difference if the true distribution is only slightly non-uniform.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: kairavibha...@googlemail.com [mailto:kairavibha...@googlemail.com] On Behalf Of Kairavi Bhakta
Sent: Friday, June 10, 2011 2:16 PM
To: Greg Snow; r-help@r-project.org
Subject: RE: [R] Test if data uniformly distributed (newbie)

> Thanks for your answer. The reason I want the data to be uniform: it's the first step in a machine learning project I am working on. [...]
> Actually, the data has been normalized because it is supposed to represent a probability distribution. That's why it sums to 1. My assumption is that, for a vector of 5, the data at that point should look like 0.20 0.20 0.20 0.20 0.20, but of course there is variation, and I would like to test whether the data comes close enough or not.
> [...]
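Greg's point about power can be illustrated with a small simulation, assuming a hypothetical, slightly non-uniform set of category probabilities: the share of simulated samples in which chisq.test() rejects uniformity at the 5% level estimates the power for each sample size. The probabilities and sample sizes are illustrative choices, not taken from the thread.

```r
set.seed(123)
# Hypothetical "slightly non-uniform" probabilities over 5 categories
p_true <- c(0.22, 0.20, 0.20, 0.19, 0.19)

# Estimated power: share of simulated samples of size n in which
# chisq.test() rejects equal probabilities at the 5% level
power_at <- function(n, reps = 500) {
  rejected <- replicate(reps, {
    counts <- rmultinom(1, size = n, prob = p_true)[, 1]
    chisq.test(counts)$p.value < 0.05
  })
  mean(rejected)
}

pw <- sapply(c(100, 1000, 10000), power_at)
pw
```

With 100 observations such a small departure is rarely detected, while with 10000 it almost always is, which is the "how close to uniform is uniform enough" question in numbers.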
Re: [R] Test if data uniformly distributed (newbie)
Thanks for your answer. The reason I want the data to be uniform: it's the first step in a machine learning project I am working on. If I know the data isn't uniformly distributed, then there is probably something wrong, and the following steps will be biased by the non-uniform input data. I'm not checking an assumption for another statistical test.

Actually, the data has been normalized because it is supposed to represent a probability distribution. That's why it sums to 1. My assumption is that, for a vector of 5, the data at that point should look like 0.20 0.20 0.20 0.20 0.20, but of course there is variation, and I would like to test whether the data comes close enough or not.

At the moment I am only testing whether there are more a's than b's in the top and bottom portions of each file (with a Wilcoxon test; I have 8 reps of the model I am trying to build). But that felt like a very ad hoc solution, and I figured testing for uniformity might be better, or at least an important addition. I've also been looking into testing the randomness of the sequence of a's and b's instead of using the Wilcoxon test, although that may or may not involve R.

Kairavi.

> Yes, punif is the function to use, however the KS test (and the others) are based on an assumption of independence, and if you know that your data points sum to 1, then they are not independent [...]
> What is your ultimate question/goal? Why do you care if the data is uniform or not?
> [...]
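For the question of whether the sequence of a's and b's is random, one classical option is the Wald-Wolfowitz runs test. The sketch below is a hand-rolled version using the usual normal approximation (the function name runs_test is hypothetical; packages such as tseries also provide a runs.test() function), applied to a simulated annotation sequence.

```r
# Wald-Wolfowitz runs test for a two-valued sequence (normal approximation)
runs_test <- function(x) {
  x <- as.character(x)
  lev <- unique(x)
  stopifnot(length(lev) == 2)                 # exactly two symbols, e.g. "a"/"b"
  n1 <- sum(x == lev[1])
  n2 <- sum(x == lev[2])
  n <- n1 + n2
  runs <- length(rle(x)$lengths)              # observed number of runs
  mu <- 2 * n1 * n2 / n + 1                   # expected runs under randomness
  v <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))  # variance of runs
  z <- (runs - mu) / sqrt(v)
  list(runs = runs, z = z, p.value = 2 * pnorm(-abs(z)))  # two-sided
}

# Example on a hypothetical annotation sequence of length 300
set.seed(2)
ab <- sample(c("a", "b"), 300, replace = TRUE)
runs_test(ab)
```

Too few runs (negative z) suggests clustering of a's and b's; too many runs (positive z) suggests systematic alternation.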
Re: [R] Test for list membership
Try this:

list(c(1,2,3), c(4,5,6)) %in% list(c(1,2,3))

On Mon, May 30, 2011 at 10:36 AM, Marcin Wlodarczak mwlodarc...@uni-bielefeld.de wrote:
> Hi,
> I need some help with this one: how do I check whether a vector is already present in a list of vectors? I have seen %in% recommended in a similar case, but that obviously does not work here.
> c(1,2,3) %in% list(c(1,2,3), c(4,5,6))
> returns
> [1] FALSE FALSE FALSE
> which makes sense since 1, 2 or 3 are not elements of that list. I don't really know how to move on from there, though.
> Best wishes,
> Marcin

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O
Re: [R] Test for list membership
On 30.05.2011 15:36, Marcin Wlodarczak wrote:
> Hi,
> I need some help with this one: how do I check whether a vector is already present in a list of vectors? [...]
> c(1,2,3) %in% list(c(1,2,3), c(4,5,6))

You said it yourself, almost:

list(c(1,2,3)) %in% list(c(1,2,3), c(4,5,6))

Uwe Ligges

> returns
> [1] FALSE FALSE FALSE
> which makes sense since 1, 2 or 3 are not elements of that list. [...]
Re: [R] Test for list membership
You almost solved your own problem with that last statement. Instead of comparing apples and oranges, you need to compare oranges and oranges:

  list(c(1,2,3)) %in% list(c(1,2,3), c(4,5,6))
  [1] TRUE
  list(c(1,2,3)) %in% list(c(1,2,9), c(4,5,6))
  [1] FALSE

Sarah

On Mon, May 30, 2011 at 9:36 AM, Marcin Wlodarczak mwlodarc...@uni-bielefeld.de wrote:
> Hi,
> I need some help with this one: how do I check whether a vector is already present in a list of vectors? [...]
> c(1,2,3) %in% list(c(1,2,3), c(4,5,6))
> returns
> [1] FALSE FALSE FALSE
> which makes sense since 1, 2 or 3 are not elements of that list. I don't really know how to move on from there, though.

-- 
Sarah Goslee
http://www.functionaldiversity.org
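A sketch combining the answers: wrapping the candidate in list() lets %in% compare whole vectors. According to ?match, lists are converted to character vectors before matching, so the comparison is effectively on a text representation; an explicit identical() check (contains_vector is a hypothetical helper name) is one stricter alternative. Both can be sensitive to storage type, for example integer 1:3 versus double c(1, 2, 3).

```r
lst <- list(c(1, 2, 3), c(4, 5, 6))

# Wrap the candidate in list() so %in% compares whole elements
list(c(1, 2, 3)) %in% lst  # TRUE
list(c(1, 2, 9)) %in% lst  # FALSE

# Hypothetical helper: compare the candidate against each element in turn
contains_vector <- function(v, l) any(vapply(l, identical, logical(1), v))

contains_vector(c(1, 2, 3), lst)  # TRUE
contains_vector(c(1, 2, 9), lst)  # FALSE
```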
Re: [R] Test for list membership
On 05/30/2011 04:14 PM, Uwe Ligges wrote:
> On 30.05.2011 15:36, Marcin Wlodarczak wrote:
>> Hi,
>> I need some help with this one: how do I check whether a vector is already present in a list of vectors? [...]
>> c(1,2,3) %in% list(c(1,2,3), c(4,5,6))
>
> You said it yourself, almost:
>
> list(c(1,2,3)) %in% list(c(1,2,3), c(4,5,6))

Brilliant! Thanks to everyone.

Marcin
Re: [R] Test for equivalence
Reading the original post, it is fairly clear that the original poster's question does not match the traditional test of equivalence; rather, it is trying to determine "distinguishable" or "indistinguishable". If the test in my suggestion is statistically significant (and note I did not suggest only testing the interaction), then that meets one possible interpretation of "distinguishable"; a non-significant result could mean either equivalence or low power, the combination of which could be an interpretation of "indistinguishable".

I phrased my response as a question in hopes that the original poster would think through what they really wanted to test and get back to us with further details. It could very well be that my response is very different from what they were thinking, but explaining how it does not fit will help us better understand the real problem.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Albyn Jones [mailto:jo...@reed.edu]
> Sent: Sunday, February 13, 2011 9:53 PM
> To: Greg Snow
> Cc: syrvn; r-help@r-project.org
> Subject: Re: [R] Test for equivalence
>
> Testing the null hypothesis of no interaction is not the same as a test of equivalence for the two differences. There is a literature on tests of equivalence. First you must develop a definition of equivalence, for example that the difference is in the interval (-a, a). Then, for example, you test the null hypothesis that the difference is in [a, inf) or (-inf, -a] (a TOST, or two one-sided tests). One simple procedure: check whether the 90% CI for the difference (the difference of the differences, or the interaction effect) is contained in the interval (-a, a).
>
> albyn
>
> Quoting Greg Snow greg.s...@imail.org:
>
>> Does it make sense for you to combine the 2 data sets and do a 2-way anova with treatment vs. control as one factor and experiment number as the other factor?
Then you could test the interaction and treatment number factor to see if they make a difference. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of syrvn Sent: Saturday, February 12, 2011 7:30 AM To: r-help@r-project.org Subject: [R] Test for equivalence Hi! is there a way in R to check whether the outcome of two different experiments is statistically distinguishable or indistinguishable? More preciously, I used the wilcoxon test to determine the differences between controls and treated subjects for two different experiments. Now I would like to check whether the two lists of analytes obtained are statistically distinguishable or indistinguishable I tried to use a equivalence test from the 'equivalence' package in R but it seems that this test is not applicable to my problem. The test in the 'equivalence' package just determines similarity between two conditions but I need to compare the outcome of two different experiments. My experiments are constructed as follows: Exp1: 8 control samples 8 treated samples - determine significantly changes (List A) Exp2: 8 control samples 8 treated samples - determine significantly changes (List B) Now i would like to check whether List A and List B are distinguishable or indistinguishable. Any advice is very much appreciated! Best, beginner -- View this message in context: http://r.789695.n4.nabble.com/Test- for- equivalence-tp3302739p3302739.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Test for equivalence
Reading the original post it was clear to me that the poster was looking for a test of equivalence, but obviously there was room for interpretation!

albyn

On Mon, Feb 14, 2011 at 09:46:13AM -0700, Greg Snow wrote:
> Reading the original post it is fairly clear that the original poster's question does not match with the traditional test of equivalence, but rather is trying to determine distinguishable or indistinguishable.

--
Albyn Jones
Reed College
jo...@reed.edu
Re: [R] Test for equivalence
Hi!

First of all, thank you all very much for your input. I am sorry, but I haven't yet had the time to reply to all of your messages. I will give you a more detailed description of my problem within the next 2 days!

Many thanks again.

Best,
syrvn
Re: [R] Test for equivalence
> From: greg.s...@imail.org
> To: ment...@gmx.net; r-help@r-project.org
> Date: Sat, 12 Feb 2011 18:04:34 -0700
> Subject: Re: [R] Test for equivalence
>
> Does it make sense for you to combine the 2 data sets and do a 2-way ANOVA with treatment vs. control as one factor and experiment number as the other factor? Then you could test the interaction and treatment number factor to see if they make a difference.

I'm not a statistician and don't play one on TV, but I'm not sure the OP has a specific approach or hypothesis in mind. I guess it could be a question about an equivalence or non-inferiority trial, or about some notion of stationary statistics between the two control and treatment groups (do lists A and B have the same E(x^n), for example?). More likely, it sounds like a question of whether A and B appear to be drawn from the same populations in terms of the statistics I care about.

So I guess I'd first just rerun whatever analyses you did with lists A and B, but also run control vs. control and treatment vs. treatment, pool the results (A+B combined), etc. See what that returns and do sensitivity tests: delete points, move them a bit, etc. Any ANOVA, Cox, AFT, etc. probably wouldn't hurt, but it's hard to know without knowing the real issue.

FWIW, this issue was raised at a recent review of a drug where part of the FDA discussion concerned differences in placebo survival between two studies. Someone also mentioned earlier that the FDA doesn't accept such and such. In this review of Provenge, which was linked to later threats against some oncologists (presumably from disgruntled DNDN stock speculators LOL), many types of information were considered, including post hoc analysis. Now, "doesn't accept" doesn't mean they will refuse to file a BLA that uses some analysis, but this panel, anyway, was quite open to considering all the information they had, probably more so than the public (stockholders LOL), who were often just quoting some isolated statistics:

http://www.fda.gov/ohrms/dockets/ac/07/transcripts/2007-4291T1.pdf

(You can find the info presented by DNDN in their briefing and the responses; this is just a transcript of the meeting.)

This was probably so bizarre to a lot of people because the panel voted solidly that they thought the drug was effective, but the FDA rejected it largely due to efficacy concerns. Their vote was on a question that forced a somewhat unfortunate choice, and it was easy to see how the discrepancy occurred. And in the final analysis, the FDA question is: should the sponsor be allowed to collect money for claiming they have a drug to treat this condition? I'm not citing this as a case of how statistics should be done, by any means, just as an interesting recent case of how it is done in real life.
Re: [R] Test for equivalence
Testing the null hypothesis of no interaction is not the same as a test of equivalence for the two differences. There is a literature on tests of equivalence. First you must develop a definition of equivalence, for example, that the difference is in the interval (-a, a). Then, for example, you test the null hypothesis that the difference is in [a, inf) or (-inf, -a] (a TOST, or two one-sided tests). One simple procedure: check whether the 90% CI for the difference (the difference of the differences, or the interaction effect) is contained in the interval (-a, a).

albyn

Quoting Greg Snow greg.s...@imail.org:
> Does it make sense for you to combine the 2 data sets and do a 2-way ANOVA with treatment vs. control as one factor and experiment number as the other factor? Then you could test the interaction and treatment number factor to see if they make a difference.
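Albyn's 90% CI procedure can be sketched as follows for a single analyte. The data are simulated and hypothetical (8 control and 8 treated samples per experiment, as in the original post), the margin `a` is an arbitrary assumption that would have to come from subject-matter knowledge, and a normal linear model is assumed, unlike the Wilcoxon tests the poster used.

```r
# Hypothetical simulated data: one analyte, 8 control and 8 treated samples
# in each of two experiments, with the same treatment effect in both.
set.seed(42)
d <- expand.grid(rep = 1:8,
                 group = c("control", "treated"),
                 experiment = c("exp1", "exp2"))
d$y <- rnorm(nrow(d), mean = ifelse(d$group == "treated", 1, 0))

# Equivalence margin (assumed): differences inside (-a, a) count as equivalent.
a <- 1

fit <- lm(y ~ group * experiment, data = d)

# The interaction coefficient is the difference of the two treatment effects.
term <- "grouptreated:experimentexp2"
ci90 <- confint(fit, term, level = 0.90)

# TOST at alpha = 0.05 declares equivalence when the 90% CI lies inside (-a, a).
equivalent <- ci90[1, 1] > -a && ci90[1, 2] < a

# The same decision from the two one-sided t tests:
est <- coef(fit)[[term]]
se  <- sqrt(vcov(fit)[term, term])
rdf <- fit$df.residual
p_lower <- pt((est + a) / se, rdf, lower.tail = FALSE)  # H0: difference <= -a
p_upper <- pt((est - a) / se, rdf)                      # H0: difference >= a

ci90
equivalent
c(p_lower = p_lower, p_upper = p_upper)
```

Both one-sided p-values must be below 0.05 for equivalence, which is the same condition as the 90% interval fitting inside the margin. With 8 samples per cell the interaction's standard error is about 0.71 residual SDs, so the margin has to be fairly wide before equivalence can be shown at all.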
Re: [R] Test for equivalence
Does it make sense for you to combine the 2 data sets and do a 2-way ANOVA with treatment vs. control as one factor and experiment number as the other factor? Then you could test the interaction and treatment number factor to see if they make a difference.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
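Greg's suggestion, sketched for a single analyte with hypothetical simulated data (8 control and 8 treated samples in each of two experiments, as described in the original post). It assumes roughly normal errors, which the poster's choice of the Wilcoxon test may not support; for a whole list of analytes the model would be refitted once per analyte.

```r
# Hypothetical simulated data for one analyte.
set.seed(1)
d <- expand.grid(rep = 1:8,
                 group = c("control", "treated"),
                 experiment = c("exp1", "exp2"))
d$y <- rnorm(nrow(d), mean = ifelse(d$group == "treated", 1, 0))

# Treatment vs. control as one factor, experiment number as the other.
# The group:experiment row tests whether the treatment effect differs
# between the two experiments.
fit <- aov(y ~ group * experiment, data = d)
summary(fit)
```

A significant interaction suggests the treatment effects are distinguishable; a non-significant one does not by itself establish equivalence, since it may simply reflect low power.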
Re: [R] test statistic in anova.glm when quasi family is used
Eiiti Kasuya <ekasuscb at kyushu-u.org> writes:

> When the quasi family (not quasipoisson or quasibinomial) is used in glm, what is the appropriate test statistic in anova.glm? The help page for anova.glm says: "For models with known dispersion (e.g., binomial and Poisson fits) the chi-squared test is most appropriate, and for those with dispersion estimated by moments (e.g., gaussian, quasibinomial and quasipoisson fits) the F test is most appropriate". I assume that F is appropriate in the case of quasi (not quasipoisson or quasibinomial). Is this correct?
>
> Ei Kasuya

Yes. References are Venables and Ripley (i.e. MASS) and Crawley's Statistical Data Analysis book.

Ben Bolker
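A minimal sketch of the F test for a quasi fit, using hypothetical simulated positive data and a quasi family with a log link and variance "mu^2" (a choice made only for illustration). Because the dispersion is estimated rather than fixed, test = "F" is requested explicitly.

```r
# Hypothetical data: positive responses whose spread grows with the mean
# (gamma draws with mean exp(0.1 * x)).
set.seed(1)
d <- data.frame(x = rep(1:10, each = 5))
d$y <- rgamma(nrow(d), shape = 4, rate = 4 / exp(0.1 * d$x))

fit <- glm(y ~ x, family = quasi(link = "log", variance = "mu^2"), data = d)

# The dispersion is estimated from the data rather than fixed at 1 ...
summary(fit)$dispersion
# ... so anova.glm is asked for an F test instead of a chi-squared test.
anova(fit, test = "F")
```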
Re: [R] test
Thank you all for the precious help. I have finally started writing the first part of my script, but now I have a new question for you. Do I need to repeat the write.table portion for all 15 lines, or can I use a shortcut? Example:

file.open <- "C:\\test.txt"
file.save <- "C:\\results.txt"
my.data <- read.table(file.open, header = TRUE)
library(plyr)

# Differences from the first row of each Thesis/Day group
write.table(ddply(my.data, .(Thesis, Day), function(x) {
  Baseline <- unlist(x[1, c("A", "B", "C")])
  data.frame(t(apply(x[-1, c("A", "B", "C")], 1, function(z) z - Baseline)))
}), file = file.save, row.names = FALSE)

# Differences from the second row, appended to the same file
write.table(ddply(my.data, .(Thesis, Day), function(x) {
  Baseline <- unlist(x[2, c("A", "B", "C")])
  data.frame(t(apply(x[-1:-2, c("A", "B", "C")], 1, function(z) z - Baseline)))
}), file = file.save, append = TRUE, row.names = FALSE, col.names = FALSE)

etc. etc.

Thanks again for the help.

Best regards,
Roberto.
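One way to avoid copying the block 15 times is to loop over the baseline row and append each block to the same file. This is a base R sketch rather than the plyr code above; `diffs_from_baseline` and `write_all_baselines` are hypothetical helper names, and it assumes the Thesis/Day/A/B/C layout from the thread with at least two rows per group (the loop stops at the size of the smallest group).

```r
# Hypothetical helper: subtract row k of one Thesis/Day group from every later row.
diffs_from_baseline <- function(x, k, cols = c("A", "B", "C")) {
  baseline <- unlist(x[k, cols])
  rest <- x[-seq_len(k), cols, drop = FALSE]
  # sweep() subtracts each column's baseline value from that column (z - Baseline)
  out <- sweep(as.matrix(rest), 2, baseline)
  data.frame(Thesis = x$Thesis[1], Day = x$Day[1], out, row.names = NULL)
}

# Hypothetical helper: one block per baseline row k, all written to one file.
write_all_baselines <- function(my.data, file.save, cols = c("A", "B", "C")) {
  groups <- split(my.data, list(my.data$Thesis, my.data$Day), drop = TRUE)
  n <- min(vapply(groups, nrow, integer(1)))
  for (k in seq_len(n - 1)) {
    block <- do.call(rbind, lapply(groups, diffs_from_baseline, k = k, cols = cols))
    # The first block writes the header; later blocks are appended without it.
    write.table(block, file = file.save, row.names = FALSE,
                append = k > 1, col.names = k == 1)
  }
  invisible(file.save)
}
```

With the objects from the script above, the whole series of write.table calls would become write_all_baselines(my.data, file.save).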
Re: [R] test
Hi:

If I understood you correctly, the following may work:

# Utility function to compute all pairwise differences of a vector.
# u[i, j] is x_i - x_j, so the upper triangle holds x_i - x_j for j > i;
# if you want it the other way around, use lower.tri() instead of upper.tri().
# Coercion to a vector is column-major, so the differences come out in the
# order x_1 - x_2, x_1 - x_3, x_2 - x_3, x_1 - x_4, x_2 - x_4, ...
subtfun <- function(x) {
  u <- outer(x, x, '-')
  as.vector(u[upper.tri(u)])
}

# To apply it to each of A, B, C in each Thesis:Day subgroup,
# here's one way with function ddply() in the plyr package:
library(plyr)
v <- ddply(df, .(Thesis, Day), numcolwise(subtfun))
head(v)

HTH,
Dennis

On Fri, Jan 14, 2011 at 1:17 AM, romzero <romz...@yahoo.it> wrote:
> Hi, I have this table:
>
> Thesis Day     A     B     C
>      1   0 83.43 90.15 22.97
>      1   0 85.50 94.97 16.62
>      1   0 83.36 95.38 20.70
>      1   0 84.47 92.16 23.58
>      1   0 83.98 95.33 19.39
>      1   0 82.86 93.78 24.55
>      1   0 83.39 92.67 19.56
>      1   0 85.17 95.24 17.95
>      1   0 81.62 93.32 28.49
>      1   0 82.99 92.85 19.73
>      1   0 81.11 95.67 27.20
>      1   0 83.39 94.69 16.51
>      1   0 79.56 89.87 30.39
>      1   0 80.54 93.32 21.76
>      1   0 82.11 92.58 22.17
>      1  14 85.65 94.00 19.19
>      1  14 85.06 92.44 20.44
>      1  14 83.97 91.39 24.38
>      1  14 84.61 91.97 19.44
>      1  14 85.13 90.59 25.30
>      1  14 84.81 91.01 19.80
>      1  14 84.52 94.06 18.77
>      1  14 84.30 94.49 24.90
>      1  14 84.74 91.32 20.35
>      1  14 84.08 94.12 22.96
>      1  14 84.50 94.25 19.95
>      1  14 84.02 94.74 20.35
>      1  14 85.30 92.82 21.12
>      1  14 85.08 91.14 24.16
>      1  14 85.21 95.69 18.17
>    etc etc   etc   etc   etc
>      2   0 83.43 90.15 22.97
>      2   0 85.50 94.97 16.62
>      2   0 83.36 95.38 20.70
>      2   0 84.47 92.16 23.58
>      2   0 83.98 95.33 19.39
>      2   0 82.86 93.78 24.55
>      2   0 83.39 92.67 19.56
>      2   0 85.17 95.24 17.95
>      2   0 81.62 93.32 28.49
>      2   0 82.99 92.85 19.73
>      2   0 81.11 95.67 27.20
>      2   0 83.39 94.69 16.51
>      2   0 79.56 89.87 30.39
>      2   0 80.54 93.32 21.76
>      2   0 82.11 92.58 22.17
>      2  14 84.48 91.23 20.44
>      2  14 85.22 93.08 22.54
>      2  14 83.89 92.74 25.11
>    etc etc   etc   etc   etc
>
> I need to subtract from every number the other numbers of the same thesis and same day. Example:
>
> A(row1) - A(row2) (same for B and C)
> A(row1) - A(row3)
> etc. until the last row of Thesis 1, Day 0
> A(row2) - A(row3)
> etc. etc. until the last row of Thesis 1, Day 0
>
> Same for the other theses and days. How can I do that? Sorry for my English.
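For the exact ordering asked for (row 1 against every later row, then row 2 against every later row, and so on), combn() already enumerates pairs in that order. A base R sketch on a small hypothetical subset of the table; `pair_diffs` is a hypothetical helper name, and every Thesis/Day group is assumed to have at least two rows (combn() treats a single positive number as a count, so a one-row group would be misread).

```r
# Hypothetical subset of the table: 3 rows of Thesis 1/Day 0, 2 rows of Thesis 2/Day 14.
dat <- data.frame(
  Thesis = c(1, 1, 1, 2, 2),
  Day    = c(0, 0, 0, 14, 14),
  A = c(83.43, 85.50, 83.36, 84.48, 85.22),
  B = c(90.15, 94.97, 95.38, 91.23, 93.08),
  C = c(22.97, 16.62, 20.70, 20.44, 22.54)
)

# x[i] - x[j] for every i < j, in the order (1,2), (1,3), ..., (1,n), (2,3), ...
pair_diffs <- function(x) combn(x, 2, FUN = function(p) p[1] - p[2])

# Split by Thesis and Day, take pairwise differences of A, B and C within
# each group, then stack the per-group results into one data frame.
groups <- split(dat, list(dat$Thesis, dat$Day), drop = TRUE)
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(Thesis = g$Thesis[1], Day = g$Day[1],
             A = pair_diffs(g$A), B = pair_diffs(g$B), C = pair_diffs(g$C))
}))
rownames(res) <- NULL
res
```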