Re: [R] Should there be an R-beginners list?
StackOverflow certainly has its merits, although I miss a bit the good ol' Oxford sarcasm gems you find on this list. That said: a beginners' list. Bad, bad idea. First rule in my classes is: RTFI (Read The Fucking Internetzz). Anybody using R should be able to do a basic Google search, and a beginners' list is not going to help them learn that. If beginners make the effort of following the posting guide, netiquette, or any other guide to getting help on the internet, they can safely use this list.

Cheers
Joris

On Wed, Nov 27, 2013 at 9:47 AM, Rolf Turner wrote:
> On 11/25/13 09:04, Rich Shepard wrote:
>> On Sun, 24 Nov 2013, Yihui Xie wrote:
>>> Mailing lists are good for a smaller group of people, and especially
>>> good when more focused on discussions on development (including bug
>>> reports). The better place for questions is a web forum.
>>
>> I disagree. Mailing lists push messages to subscribers, while web fora
>> require one to use a browser, log in, then pull messages. Not nearly as
>> convenient.
>
> Well expressed Rich. I agree with you completely.
>
> cheers,
>
> Rolf Turner

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tinn-R user guide (latex sources) available on GitHub
Thank you for doing this!

On Wed, Nov 27, 2013 at 11:22 AM, Jose Claudio Faria <joseclaudio.fa...@gmail.com> wrote:
> Dear users,
>
> The Tinn-R User Guide is completely written in LaTeX, and the idea
> behind making it available on GitHub is that it can take contributions
> from multiple users.
>
> If you find something that you would like to include or improve:
> please, feel free to make it better.
>
> This User Guide has been developed under both OSes: Windows and Linux.
>
> Under Windows: we have been using Tinn-R as editor and MiKTeX as compiler.
>
> Under Linux: we have been using Vim (with the LaTeX-Box plugin) as editor
> and TeX Live as compiler.
>
> Link: https://github.com/jcfaria/Tinn-R-User-Guide
>
> Regards,
> --
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
> Jose Claudio Faria
> Estatistica
> UESC/DCET/Brasil
> joseclaudio.faria at gmail.com
> Telefones:
> 55(73)3680.5545 - UESC
> 55(73)9100.7351 - TIM
> 55(73)8817.6159 - OI
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
Re: [R] Problems when Apply a script to a list
Where exactly did you put the sink() statement? I tried it with 1000 dataframes and had no problem whatsoever.

Cheers
Joris

On Fri, Aug 27, 2010 at 6:56 AM, wrote:
> Joris,
> thank you very much for your help. It is very helpful for me.
> I still have a problem with the sink stack, although I put sink() after
> my last printed result.
> I want to ask you one more question. Do you know how I can have, in the
> sink output file, a message saying which set of the list the results
> belong to? When I used for instead of apply I had
>
> cat("\n***\nEstimation \n\nDataset Sim : ", i)
>
> Thank you in advance
> Evgenia
>
> Joris Meys writes:
>>
>> Answers below.
>> On Thu, Aug 26, 2010 at 11:20 AM, Evgenia wrote:
>>>
>>> Dear users,
>>>
>>> *** I have a function f to simulate data from a model (the example
>>> below is used only to show my problems):
>>>
>>> f <- function(n, mean1){
>>>   a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
>>>   b <- matrix(runif(n), ncol = 5)
>>>   data <- rbind(a, b)
>>>   out <- data
>>>   out}
>>>
>>> * I want to simulate 1000 datasets (here only 5), so I use
>>>
>>> S <- list()
>>> for (i in 1:5){
>>>   S[[i]] <- f(n = 10, mean1 = 0)}
>>>
>>> ** I have a very complicated function for estimation of a model which
>>> I want to apply to each one of the above simulated datasets:
>>>
>>> fun <- function(data){data <- as.matrix(data)
>>>   sink('Example.txt', append = TRUE)
>>>   cat("\n***\nEstimation \n\nDataset Sim : ", i)
>>>   d <- data %*% t(data)
>>>   s <- solve(d)
>>>   print(s)
>>>   out <- s
>>>   out
>>> }
>>> results <- list()
>>> for (i in 1:5){results[[i]] <- fun(data = S[[i]])}
>>>
>>> My questions are:
>>> 1) For some datasets the system is computationally singular, and this
>>> causes execution of the for loop to stop. This way I only have results
>>> up to where the problem happens. How can I skip this step and still
>>> have results for all other datasets for which fun is applicable?
>>
>> see ?try, or ?tryCatch.
>> I'd do something along the lines of
>>
>> for(i in 1:5){
>>   tmp <- try(fun(data = S[[i]]))
>>   results[[i]] <- if (inherits(tmp, "try-error")) NA else tmp
>> }
>>
>> Alternatively, you could also use lapply:
>>
>> results <- lapply(S, function(x){
>>   tmp <- try(fun(data = x))
>>   if (inherits(tmp, "try-error")) NA else tmp
>> })
>>>
>>> 2) After some runs of my program, I receive this error message:
>>> Error in sink("output.txt") : sink stack is full. How can I solve
>>> this problem, as I want to have results of my program for 1000
>>> datasets?
>>
>> That is because you never empty the sink. Add sink() after the last
>> line you want in the file. This will empty the sink buffer to the
>> file. Otherwise R keeps everything in memory, and that gets too
>> full after a while.
>>>
>>> 3) Is using for the correct way to run my program for a list?
>>
>> See the lapply solution.
>>>
>>> Thanks a lot
>>> Evgenia
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
>>> Sent from the R help mailing list archive at Nabble.com.
>> >> >> >> -- >> Joris Meys >> Statistical consultant >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Passing arguments between S4 methods fails within a function: bug? Example with raster package.
Dear all,

This problem came up initially while debugging a function, but it seems to be a more general problem in R. I hope I'm wrong, but I can't find another explanation. Let me illustrate with the raster package.

For an object "RasterLayer" (which inherits from Raster), there is a method xyValues defined with the signature (object="RasterLayer", xy="matrix"). There is also a method with signature (object="Raster", xy="vector"). The only thing this method does is change xy into a matrix and then pass on to the next method using callGeneric again. Arguments are passed along.

Now this all works smoothly, as long as you stay in the global environment:

require(raster)
a <- raster()
a[] <- 1:ncell(a)
origin <- c(-80, 50)
eff.dist <- 10
unlist(xyValues(a, xy=origin, buffer=eff.dist))
[1] 14140 14141 14500 14501

Now let's make a very basic test function:

test <- function(x, orig.point){
  eff.distance <- 10
  p <- unlist(xyValues(x, xy=orig.point, buffer=eff.distance))
  return(p)
}

This gives the following result:

> test(a, origin)
Error in .local(object, xy, ...) : object 'eff.distance' not found

Huh? Apparently, eff.distance got lost somewhere in the parse tree (am I saying this correctly?). The funny thing is, when we change origin to a matrix:

> origin <- matrix(origin, ncol=2)
> unlist(xyValues(a, xy=origin, buffer=eff.dist))
[1] 14140 14141 14500 14501
> test(a, origin)
[1] 14140 14141 14500 14501

It all works again! So something goes wrong with passing the arguments from one method to another using callGeneric. Is this a bug in R, or am I missing something obvious?

The relevant code from the raster package:

setMethod("xyValues", signature(object='Raster', xy='vector'),
  function(object, xy, ...) {
    if (length(xy) == 2) {
      callGeneric(object, matrix(xy, ncol=2), ...)
    } else {
      stop('xy coordinates should be a two-column matrix or data.frame, or a vector of two numbers.')
    }
  }
)

setMethod("xyValues", signature(object='RasterLayer', xy='matrix'),
  function(object, xy, method='simple', buffer=NULL, fun=NULL, na.rm=TRUE) {
    if (dim(xy)[2] != 2) {
      stop('xy has wrong dimensions; it should have 2 columns')
    }
    if (!is.null(buffer)) {
      return(.xyvBuf(object, xy, buffer, fun, na.rm=na.rm))
    }
    if (method=='bilinear') {
      return(.bilinearValue(object, xy))
    } else if (method=='simple') {
      cells <- cellFromXY(object, xy)
      return(.readCells(object, cells))
    } else {
      stop('invalid method argument. Should be simple or bilinear.')
    }
  }
)
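For readers who want to poke at this dispatch pattern without installing raster, it can be reduced to a self-contained sketch. The generic name `xyv`, the toy signatures, and the `buffer` arithmetic are made up for illustration; only the structure (vector method reshaping `xy` and re-dispatching via `callGeneric()`, forwarding extras through `...`) mirrors the raster code above.

```r
library(methods)

# Minimal stand-in for the raster pattern: the "vector" method reshapes
# xy into a matrix and re-dispatches via callGeneric(), forwarding any
# extra arguments through ... to the "matrix" method.
setGeneric("xyv", function(object, xy, ...) standardGeneric("xyv"))

setMethod("xyv", signature(object = "numeric", xy = "numeric"),
  function(object, xy, ...) {
    callGeneric(object, matrix(xy, ncol = 2), ...)
  })

setMethod("xyv", signature(object = "numeric", xy = "matrix"),
  function(object, xy, buffer = NULL) {
    if (is.null(buffer)) xy else xy + buffer
  })

xyv(1, c(10, 20), buffer = 5)   # vector method hands over to matrix method
```

Calling `xyv()` from inside another function, with `buffer` bound to a local variable, is the scenario the post describes failing in raster.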
Re: [R] Problems when Apply a script to a list
Answers below.

On Thu, Aug 26, 2010 at 11:20 AM, Evgenia wrote:
>
> Dear users,
>
> *** I have a function f to simulate data from a model (the example below
> is used only to show my problems):
>
> f <- function(n, mean1){
>   a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
>   b <- matrix(runif(n), ncol = 5)
>   data <- rbind(a, b)
>   out <- data
>   out}
>
> * I want to simulate 1000 datasets (here only 5), so I use
>
> S <- list()
> for (i in 1:5){
>   S[[i]] <- f(n = 10, mean1 = 0)}
>
> ** I have a very complicated function for estimation of a model which I
> want to apply to each one of the above simulated datasets:
>
> fun <- function(data){data <- as.matrix(data)
>   sink('Example.txt', append = TRUE)
>   cat("\n***\nEstimation \n\nDataset Sim : ", i)
>   d <- data %*% t(data)
>   s <- solve(d)
>   print(s)
>   out <- s
>   out
> }
> results <- list()
> for (i in 1:5){results[[i]] <- fun(data = S[[i]])}
>
> My questions are:
> 1) For some datasets the system is computationally singular, and this
> causes execution of the for loop to stop. This way I only have results up
> to where the problem happens. How can I skip this step and still have
> results for all other datasets for which fun is applicable?

see ?try, or ?tryCatch. I'd do something along the lines of

for(i in 1:5){
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- if (inherits(tmp, "try-error")) NA else tmp
}

Alternatively, you could also use lapply:

results <- lapply(S, function(x){
  tmp <- try(fun(data = x))
  if (inherits(tmp, "try-error")) NA else tmp
})

> 2) After some runs of my program, I receive this error message:
> Error in sink("output.txt") : sink stack is full. How can I solve this
> problem, as I want to have results of my program for 1000 datasets?

That is because you never empty the sink. Add sink() after the last line you want in the file. This will empty the sink buffer to the file. Otherwise R keeps everything in memory, and that gets too full after a while.

> 3) Is using for the correct way to run my program for a list?

See the lapply solution.
> Thanks a lot
>
> Evgenia
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] Change value of a slot of an S4 object within a method.
OK, I admit it will never win a beauty prize, but I won't either, so we go pretty well together. Seriously, I am aware this is about the worst way to solve such a problem. But as I said, rewriting the class definitions wasn't an option any more. I'll definitely take your advice for the next project.

Cheers
Joris

On Wed, Aug 25, 2010 at 9:36 PM, Steve Lianoglou wrote:
> Howdy,
> My eyes start to gloss over on their first encounter of `substitute`
> ... add enough `eval`s and (even) an `expression` (!) to that, and
> you'll probably see me running for the hills ... but hey, if it makes
> sense to you, more power to you ;-)
>
> Glad you found a fix that works,
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Change value of a slot of an S4 object within a method.
Hi Steve,

thanks for the tip. I'll definitely take a closer look at your solution for future use. But right now I don't have the time to start rewriting my class definitions.

Luckily, I found where exactly things were going wrong. After reading the documentation about evaluation in R, I figured out I have to explicitly specify the environment where substitute should look, as parent.frame(1). I still don't completely understand why exactly, but it does the job. Thus:

eval(eval(substitute(expression(object@extra[[name]] <<- value))))

should become:

eval(
  eval(
    substitute(
      expression(object@extra[[name]] <<- value)
      , env = parent.frame(1)
    )
  )
)

Tried it out and it works.

Cheers
Joris

On Wed, Aug 25, 2010 at 6:21 PM, Steve Lianoglou wrote:
> Hi Joris,
>
> On Wed, Aug 25, 2010 at 11:56 AM, Joris Meys wrote:
>> Dear all,
>>
>> I have an S4 class with a slot "extra" which is a list. I want to be
>> able to add an element called "name" to that list, containing the
>> object "value" (which can be a vector, a dataframe or another S4
>> object).
>>
>> Obviously
>>
>> setMethod("add.extra", signature=c("PM10Data","character","vector"),
>>   function(object, name, value){
>>     object@extra[[name]] <- value
>>   }
>> )
>>
>> Contrary to what I would expect, the line:
>>
>> eval(eval(substitute(expression(object@extra[[name]] <<- value))))
>>
>> gives the error:
>> Error in object@extra[[name]] <<- value : object 'object' not found
>>
>> Substitute apparently doesn't work any more in S4 methods...
>>
>> I found a work-around by calling the initializer every time, but this
>> affects performance. Returning the object is also not an option, as
>> I'd have to remember to assign the result each time to the original
>> name.
>>
>> Basically I'm trying to do some call by reference with S4, but don't
>> see how I should do that. How would I be able to tackle this problem
>> in an efficient and elegant way?
> In lots of my own S4 classes I define a slot called ".cache", which is
> an environment, for this exact purpose.
>
> Using this solution for your scenario might look something like this:
>
> setMethod("add.extra", signature=c("PM10Data","character","vector"),
>   function(object, name, value) {
>     object@.cache$extra[[name]] <- value
>   })
>
> I'm not sure what your entire problem looks like, but to "get" your
> extra list, or a value from it, you could:
>
> setMethod("extra", signature="PM10Data",
>   function(object, name=NULL) {
>     if (!is.null(name)) {
>       object@.cache$extra[[name]]
>     } else {
>       object@.cache$extra
>     }
>   })
>
> ... or something like that.
>
> The last thing you have to be careful of is that you have to make sure
> that each new("PM10Data") object you have initializes its *own* cache:
>
> setClass("PM10Data", representation(..., .cache='environment'))
> setMethod("initialize", "PM10Data",
>   function(.Object, ..., .cache=new.env()) {
>     callNextMethod(.Object, .cache=.cache, ...)
>   })
>
> Does that make sense?
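Steve's cache-slot pattern can be assembled into one runnable sketch. The class and generic names follow the thread, but the details are assumptions: the `value` signature is widened to "ANY", and the `extra` list is seeded inside `initialize` so every object gets its own environment.

```r
library(methods)

# The environment in slot .cache gives the S4 object reference
# semantics: add.extra() can modify it in place, no reassignment needed.
setClass("PM10Data", representation(.cache = "environment"))

setMethod("initialize", "PM10Data",
  function(.Object, ..., .cache = new.env()) {
    .cache$extra <- list()            # each object gets its *own* cache
    callNextMethod(.Object, .cache = .cache, ...)
  })

setGeneric("add.extra", function(object, name, value) standardGeneric("add.extra"))

setMethod("add.extra", signature("PM10Data", "character", "ANY"),
  function(object, name, value) {
    object@.cache$extra[[name]] <- value
    invisible(object)
  })

d <- new("PM10Data")
add.extra(d, "scores", 1:3)
d@.cache$extra$scores   # d was modified without reassigning it
```

Because environments are not copied on assignment, `add.extra(d, ...)` mutates the cache seen through `d` itself, which is exactly the call-by-reference behaviour the original post was after.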
[R] Change value of a slot of an S4 object within a method.
Dear all,

I have an S4 class with a slot "extra" which is a list. I want to be able to add an element called "name" to that list, containing the object "value" (which can be a vector, a dataframe or another S4 object).

Obviously

setMethod("add.extra", signature=c("PM10Data","character","vector"),
  function(object, name, value){
    object@extra[[name]] <- value
  }
)

Contrary to what I would expect, the line:

eval(eval(substitute(expression(object@extra[[name]] <<- value))))

gives the error:

Error in object@extra[[name]] <<- value : object 'object' not found

Substitute apparently doesn't work any more in S4 methods...

I found a work-around by calling the initializer every time, but this affects performance. Returning the object is also not an option, as I'd have to remember to assign the result each time to the original name.

Basically I'm trying to do some call by reference with S4, but don't see how I should do that. How would I be able to tackle this problem in an efficient and elegant way?

Thank you in advance

Cheers
Joris
Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Follow up: I finally succeeded in more or less reproducing the error. The origin lay in the fact that I had accidentally loaded a function while being in browser mode for debugging that function. So something went very much wrong with the namespaces. Teaches me right...

Cheers
Joris

On Wed, Jul 28, 2010 at 2:27 PM, Joris Meys wrote:
> Dear all,
>
> it gets even weirder. After restarting R, the code I used works just
> fine. The call is generated in a function that I debugged using
> browser(). The problem is solved, but I have no clue whatsoever how
> that error came about. It must have something to do with namespaces,
> but the origin is unclear. I tried to regenerate the error, but didn't
> succeed.
>
> Does somebody have an idea as to where I have to look for a cause?
>
> Cheers
> Joris
>
> On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys wrote:
>> Dear all,
>>
>> I run a gamm with the following call:
>>
>> result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF +
>>   weekend + s(day) + s(julday),
>>   correlation = corCAR1(form = ~ day|month), data = tmp))
>>
>> with mgcv version 1.6.2.
>>
>> No stress about the data; the error is not data-related. I get:
>>
>> Error in isS4(x) : object 'VM' not found
>>
>> How so? I did define the data frame to be used, and the data frame
>> contains a variable VM:
>>
>>> str(tmp)
>> 'data.frame': 4014 obs. of 12 variables:
>>  $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
>>  $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
>>  $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
>>  $ day    : num -178 -177 -176 -175 -174 ...
>>  $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
>>  $ julday : num -2241 -2240 -2239 -2238 -2237 ...
>>  $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
>>  $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
>>  $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
>>  $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
>>  $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
>>  $ RF     : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
>>  - attr(*, "means")= num
>>
>> Any idea what I'm missing here?
>>
>> Cheers
>> Joris
Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Dear all,

it gets even weirder. After restarting R, the code I used works just fine. The call is generated in a function that I debugged using browser(). The problem is solved, but I have no clue whatsoever how that error came about. It must have something to do with namespaces, but the origin is unclear. I tried to regenerate the error, but didn't succeed.

Does somebody have an idea as to where I have to look for a cause?

Cheers
Joris

On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys wrote:
> Dear all,
>
> I run a gamm with the following call:
>
> result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF +
>   weekend + s(day) + s(julday),
>   correlation = corCAR1(form = ~ day|month), data = tmp))
>
> with mgcv version 1.6.2.
>
> No stress about the data; the error is not data-related. I get:
>
> Error in isS4(x) : object 'VM' not found
>
> How so? I did define the data frame to be used, and the data frame
> contains a variable VM:
>
>> str(tmp)
> 'data.frame': 4014 obs. of 12 variables:
>  $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
>  $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
>  $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
>  $ day    : num -178 -177 -176 -175 -174 ...
>  $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
>  $ julday : num -2241 -2240 -2239 -2238 -2237 ...
>  $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
>  $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
>  $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
>  $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
>  $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
>  $ RF     : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "means")= num
>
> Any idea what I'm missing here?
> Cheers
> Joris
[R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Dear all,

I run a gamm with the following call:

result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF +
  weekend + s(day) + s(julday),
  correlation = corCAR1(form = ~ day|month), data = tmp))

with mgcv version 1.6.2.

No stress about the data; the error is not data-related. I get:

Error in isS4(x) : object 'VM' not found

How so? I did define the data frame to be used, and the data frame contains a variable VM:

> str(tmp)
'data.frame': 4014 obs. of 12 variables:
 $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day    : num -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ julday : num -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
 $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF     : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "means")= num

Any idea what I'm missing here?

Cheers
Joris
Re: [R] random sample from arrays
Could you elaborate? Both

x <- 1:4
set <- matrix(nrow = 50, ncol = 11)
for(i in c(1:11)){
  set[,i] <- sample(x, 50)
  print(c(i, "->", set), quote = FALSE)
}

and

x <- 1:4
set <- matrix(nrow = 50, ncol = 11)
for(i in c(1:50)){
  set[i,] <- sample(x, 11)
  print(c(i, "->", set), quote = FALSE)
}

run perfectly fine on my computer.

Cheers

On Fri, Jul 9, 2010 at 3:10 PM, Assa Yeroslaviz wrote:
> Hi Joris,
> I guess I did it wrong again.
> But your example didn't work either. I still get the error message.
>
> replicate, however, works just fine. I can even replicate the whole
> array lines.
>
> THX
>
> Assa
>
> On Thu, Jul 8, 2010 at 15:20, Joris Meys wrote:
>>
>> I don't know what exactly you're trying to do, but you make a matrix
>> with 11 columns and 50 rows, then treat it as a vector. On top of
>> that, you try to fill 50 rows/columns with 50 values. Of course that
>> doesn't work. Did you check the warning messages when running the
>> code?
>>
>> Either do:
>>
>> for(i in c(1:11)){
>>   set[,i] <- sample(x, 50)
>>   print(c(i, "->", set), quote = FALSE)
>> }
>>
>> or
>>
>> for(i in c(1:50)){
>>   set[i,] <- sample(x, 11)
>>   print(c(i, "->", set), quote = FALSE)
>> }
>>
>> Or just forget about the loop altogether and do:
>>
>> set <- replicate(11, sample(x, 50))
>> or
>> set <- t(replicate(50, sample(x, 11)))
>>
>> cheers
>>
>> On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz wrote:
>>> Hello R users,
>>>
>>> I'm trying to extract random samples from a big array I have.
>>>
>>> I have a data frame of over 40k lines and would like to produce
>>> around 50 random samples of around 200 lines each from this array.
>>> This is the matrix:
>>>
>>> ID xxx_1c xxx_2c xxx_3c xxx_4c xxx_5T xxx_6T xxx_7T xxx_8T yyy_1c yyy_2c
>>> 1 A_512 2.150295 2.681759 2.177138 2.142790 2.115344 2.013047 2.115634 2.189372 1.643328 1.563523
>>> 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322
>>> 3 A_152 2.063276 2.160961 2.067549 2.059732 2.656416 2.075775 2.033982 2.111937 1.606340 1.548940
>>> 4 A_163 9.570761 10.448615 9.432859 9.732615 10.354234 10.993279 9.160038 9.104121 10.079177 9.828757
>>> 5 A_184 3.574271 4.680859 4.517047 4.047096 3.623668 3.021356 3.559434 3.156093 4.308437 4.045098
>>> 6 A_199 7.593952 7.454087 7.513013 7.449552 7.345718 7.367068 7.410085 7.022582 7.668616 7.953706
>>> ...
>>>
>>> I tried to do it with a for loop:
>>>
>>> genelist <- read.delim("/user/R/raw_data.txt")
>>> rownames(genelist) <- genelist[,1]
>>> genes <- rownames(genelist)
>>>
>>> x <- 1:4
>>> set <- matrix(nrow = 50, ncol = 11)
>>>
>>> for(i in c(1:50)){
>>>   set[i] <- sample(x, 50)
>>>   print(c(i, "->", set), quote = FALSE)
>>> }
>>>
>>> which basically does the trick, but I just can't save the results
>>> outside the loop. After having the random sets of lines, it wasn't a
>>> problem to extract the lines from the arrays using subset:
>>>
>>> genSet1 <- sample(x, 50)
>>> random1 <- genes %in% genSet1
>>> subsetGenelist <- subset(genelist, random1)
>>>
>>> Is there a different way of creating these random vectors, or of
>>> saving the loop results outside the loop so I can work with them?
>>>
>>> Thanks a lot
>>>
>>> Assa
>> > >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] split with list
One solution is to set these unwanted entries to "": repor[["9853312"]][1:2, 2:3] <- "" (note that $ does not work directly with names starting with a digit; use [[ ]] or backticks). Cheers Joris On Fri, Jul 9, 2010 at 12:18 PM, n.via...@libero.it wrote: > > Dear List I would like to ask you something concerning a better print of the > R output: > I have a big data frame which has the following structure: > CFISCALE RAGSOCB ANNO VAR1 VAR2 > 9853312 astra 2005 6 45 > 9853312 astra 2006 78 45 > 9853312 astra 2007 55 76 > 9653421 geox 2005 35 89 > 9653421 geox 2006 24 33 > 9653421 geox 2007 54 55 > > The first thing I did is to split my data frame by CFISCALE. The result is > that R has transformed my data frame into a list. The second step was to > transpose each element of my list. > repo=split(rep,rep$CFISCALE) > repor=lapply(repo,function(x){ > t(x)}) > > > When I print my list the format is the following > $9853312 > 1 2 3 > CFISCALE "9853312" "9853312" "9853312" > RAGSOCB "astra" "astra" "astra" > ANNO "2005" "2006" "2007" > VAR1 6 78 55 > VAR2 45 45 76 > > Is there a way to remove the first row (I mean 1, 2, 3) and to have just one > CFISCALE and RAGSOCB? > For the second problem I tried to use unique but it seems that it doesn't work > for lists. So what I would like to get is: > $9853312 > CFISCALE "9853312" > RAGSOCB "astra" > ANNO "2005" "2006" "2007" > VAR1 6 78 55 > VAR2 45 45 76 > > This is because I next run xtable on my list in order to get a table in > LaTeX, which I would like to be in a nice format. > Thanks a lot for your attention! > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
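A fuller sketch of the blanking approach, applied to every element of the list at once with lapply(). The data frame below is a toy stand-in shaped like the one in the question (the values are made up):

```r
# Toy data frame mimicking the structure in the question
rep_df <- data.frame(
  CFISCALE = rep(c("9853312", "9653421"), each = 3),
  RAGSOCB  = rep(c("astra", "geox"), each = 3),
  ANNO     = rep(2005:2007, 2),
  VAR1     = c(6, 78, 55, 35, 24, 54),
  VAR2     = c(45, 45, 76, 89, 33, 55),
  stringsAsFactors = FALSE
)

repo  <- split(rep_df, rep_df$CFISCALE)
repor <- lapply(repo, t)   # each element is now a character matrix

# Blank the repeated CFISCALE/RAGSOCB entries and drop the column
# names (the 1 2 3 header), so only the first column keeps them
repor <- lapply(repor, function(m) {
  m[1:2, -1] <- ""
  colnames(m) <- NULL
  m
})

repor[["9853312"]]   # [[ ]] needed: the name starts with a digit
```

From here each cleaned matrix can be handed to xtable() as before.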
Re: [R] Non-parametric regression
Just to be correct : gam is mentioned on the page Tal linked to, but is a semi-parametric approach using maximum likelihood. It stays valid though. Another thing : you detect non-normality. But can you use a Poisson distribution for example? The framework of generalized linear models and generalized additive models allows you to deal with non-normality of your data. In any case, I suggest you contact a statistician nearby for guidance. Cheers Joris On Fri, Jul 9, 2010 at 10:26 AM, Tal Galili wrote: > >From reviewing the first google page result for "Non-parametric regression > R", I hope this link will prove useful: > > http://socserv.mcmaster.ca/jfox/Courses/Oxford-2005/R-nonparametric-regression.html > > > > Contact > Details:--- > Contact me: tal.gal...@gmail.com | 972-52-7275845 > Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | > www.r-statistics.com (English) > -- > > > > > On Fri, Jul 9, 2010 at 11:01 AM, Ralf B wrote: > >> I have two data sets, each a vector of 1000 numbers, each vector >> representing a distribution (i.e. 1000 numbers each of which >> representing a frequency at one point on a scale between 1 and 1000). >> For similfication, here an short version with only 5 points. >> >> >> a <- c(8,10,8,12,4) >> b <- c(7,11,8,10,5) >> >> Leaving the obvious discussion about causality aside fro a moment, I >> would like to see how well i can predict b from a using a regression. >> Since I do not know anything about the distribution type and already >> discovered non-normality I cannot use parametric regression or >> anything GLM for that matter. >> >> How should I proceed in using non-parametric regression to model >> vector a and see how well it predicts b? Perhaps you could extend the >> given lines into a short example script to give me an idea? Are there >> any other options? 
>> >> Best, >> Ralf >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
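As a hedged illustration of the GLM/GAM route suggested above (simulated Poisson counts, not Ralf's data; mgcv ships with R):

```r
library(mgcv)

set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- rpois(200, lambda = exp(0.3 * x))   # simulated counts: clearly non-normal

# Generalized additive model with a Poisson error distribution;
# the smooth s(x) is estimated from the data, no linearity assumed
fit <- gam(y ~ s(x), family = poisson)
summary(fit)

# Predictions on the response (count) scale
pred <- predict(fit, newdata = data.frame(x = x), type = "response")
```

The point is that non-normality alone does not force you out of the (generalized) likelihood framework.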
Re: [R] R^2 in loess and predict?
I do not agree with your interpretation of the adjusted R^2. The R^2 is no more than the ratio of the explained variance to the total variance, expressed in sums of squares. The adjusted R^2 is adjusted for the degrees of freedom, and can only be used for selection purposes. Its interpretation for the final model is hard, and it is definitely not a measure of how well the model describes the population. For a loess regression this can be calculated as well. But loess is a local regression technique, highly flexible, and highly dependent on the window you use. In these cases, R^2 (or any other goodness-of-fit measure) tells you even less. You can get an R^2 of 1 if you choose the window small enough. If you want to do inference on nonlinear regression techniques, I strongly suggest you use Generalized Additive Models, e.g. from the package mgcv. There you can use the framework of likelihood ratio tests to assess goodness of fit by comparing models. Cheers Joris On Fri, Jul 9, 2010 at 10:42 AM, Ralf B wrote: > Parametric regression produces R^2 as a measure of how well the model > predicts the sample and adjusted R^2 as a measure of how well it > models the population. What is the equivalent for non-parametric > regression (e.g. loess function) ? > > Ralf > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
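The model-comparison idea can be sketched as follows (simulated data; two nested mgcv fits compared with anova(), and the "deviance explained" reported by summary() playing the role an R^2 plays for a linear model):

```r
library(mgcv)

set.seed(42)
x <- runif(300, 0, 10)
y <- sin(x) + rnorm(300, sd = 0.3)

m0 <- gam(y ~ x)        # purely linear fit
m1 <- gam(y ~ s(x))     # smooth fit

# Likelihood-ratio-style comparison of the nested models
anova(m0, m1, test = "F")

# Proportion of deviance explained by the smooth model
summary(m1)$dev.expl
```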
Re: [R] package installation for Windows 7
On Thu, Jul 8, 2010 at 3:39 PM, Duncan Murdoch wrote: > On 08/07/2010 9:26 AM, David Bickel wrote: >> >> Yes, the User into which I logged in before launching RGui is an >> Administrator. Correct, the problem is not limited to Bioconductor packages. >> > > On Windows 7, it's not enough to have the user be an administrator, you need > to run programs specifically with administrative rights. You do this by > right clicking on the icon, and choosing "Run as administrator". The reason for this is that "Program Files" is a protected directory: changes can only be made by programs that are granted administrator rights (which is not the same as running a program from an administrator account). Without those administrator rights, R cannot write to the default folder for the packages, as that folder also lives inside Program Files. Changing Rprofile.site becomes a hassle too, as do so many other tweaks. In practice, about the only programs at home there are Microsoft-related applications. If there's no strict need to place a program there, I stay away from that folder as far as possible. Vista has the same problem by the way, and obviously the same solutions. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
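One way around the Program Files restriction, sketched below, is to install packages into a user-writable library instead of the default one. The path used here is only an example:

```r
# Where is R looking for packages right now?
.libPaths()

# Point R at a user-writable library for this session
# (create the directory first; this path is just an example)
mylib <- file.path(Sys.getenv("HOME"), "R", "library")
dir.create(mylib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(mylib, .libPaths()))

# Packages now install without administrator rights, e.g.:
# install.packages("somePackage", lib = mylib)
```

Setting the R_LIBS_USER environment variable makes the same choice permanent across sessions.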
Re: [R] Using nlm or optim
Without data I can't check, but try : mle(nll,start=list(c=0.01,z=2.1,s=200),fixed=list(V=Var,M=Mean)) With a random dataset I get : > Mean <- rnorm(136) > Var <- 1 + rnorm(136)^2 > mle(nll,start=list(c=0.01,z=2.1,s=200),fixed=list(V=Var,M=Mean)) Error in optim(start, f, method = method, hessian = TRUE, ...) : initial value in 'vmmin' is not finite This might be just a data problem, but again, I'm not sure. Cheers Joris On Thu, Jul 8, 2010 at 3:11 AM, Anita Narwani wrote: > Hello, > I am trying to use nlm to estimate the parameters that minimize the > following function: > > Predict<-function(M,c,z){ > + v = c*M^z > + return(v) > + } > > M is a variable and c and z are parameters to be estimated. > > I then write the negative loglikelihood function assuming normal errors: > > nll<-function(M,V,c,z,s){ > n<-length(Mean) > logl<- -.5*n*log(2*pi) -.5*n*log(s) - (1/(2*s))*sum((V-Predict(Mean,c,z))^2) > return(-logl) > } > > When I put the Mean and Variance (variables with 136 observations) into this > function, and estimates for c,z, and s, it outputs the estimate for the > normal negative loglikelihood given the data, so I know that this works. > > However, I am unable to use mle to estimate the parameters c, z, and s. I do > not know how or where the data i.e. Mean (M) and Variance (V) should enter > into the mle function. I have tried variations on > > mle(nll,start=list(c=0.01,z=2.1,s=200)) including > mle(nll,start=list(M=Mean,V=Var, c=0.01,z=2.1,s=200)) > > I keep getting errors and am quite certain that I just have a syntax error > in the script because I don't know how to enter the variables into MLE. > > Thanks for your help, > Anita. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
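An alternative to passing the data through fixed= is to bind it into the likelihood via the enclosing environment, so that mle() only ever sees the parameters. This is a sketch on simulated stand-ins for Mean and Var (not Anita's data); L-BFGS-B with a lower bound keeps the variance parameter s away from zero, which avoids the non-finite-value error shown above:

```r
library(stats4)

set.seed(7)
# Simulated stand-ins for the Mean and Variance vectors
Mean <- runif(136, 1, 10)
Var  <- 0.5 * Mean^2.1 + rnorm(136, sd = 5)

# The data enter through the enclosing environment;
# mle() only sees the parameters c, z and s (s = error variance)
nll <- function(c, z, s) {
  n <- length(Mean)
  -(-0.5 * n * log(2 * pi) - 0.5 * n * log(s) -
      (1 / (2 * s)) * sum((Var - c * Mean^z)^2))
}

fit <- mle(nll, start = list(c = 0.5, z = 2, s = 20),
           method = "L-BFGS-B", lower = c(1e-3, 0.1, 1e-3))
coef(fit)
```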
Re: [R] random sample from arrays
Don't know exactly what you're trying to do, but you make a matrix with 11 columns and 50 rows, then treat it as a vector. On top of that, you try to fill 50 rows/columns with 50 values. Of course that doesn't work. (Also note that with x <- 1:4, sample(x, 50) needs replace = TRUE, since you can't draw 50 unique values from a population of 4.) Did you check the warning messages when running the code? Either do: for(i in c(1:11)){ set[,i] <-sample(x,50) print(c(i,"->", set), quote = FALSE) } or for(i in c(1:50)){ set[i,] <-sample(x,11) print(c(i,"->", set), quote = FALSE) } Or just forget about the loop altogether and do: set <- replicate(11,sample(x,50)) or set <- t(replicate(50,sample(x,11))) cheers On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz wrote: > Hello R users, > > I'm trying to extract random samples from a big array I have. > > I have a data frame of over 40k lines and would like to produce around 50 > random sample of around 200 lines each from this array. > > this is the matrix > ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T xxx__8T > yyy_1c yyy_1c _2c > 1 A_512 2.150295 2.681759 2.177138 2.142790 2.115344 2.013047 > 2.115634 2.189372 1.643328 1.563523 > 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 > 11.650336 11.995504 13.024494 12.776322 > 3 A_152 2.063276 2.160961 2.067549 2.059732 2.656416 2.075775 > 2.033982 2.111937 1.606340 1.548940 > 4 A_163 9.570761 10.448615 9.432859 9.732615 10.354234 10.993279 > 9.160038 9.104121 10.079177 9.828757 > 5 A_184 3.574271 4.680859 4.517047 4.047096 3.623668 3.021356 > 3.559434 3.156093 4.308437 4.045098 > 6 A_199 7.593952 7.454087 7.513013 7.449552 7.345718 7.367068 > 7.410085 7.022582 7.668616 7.953706 > ... > > I tried to do it with a for loop: > > genelist <- read.delim("/user/R/raw_data.txt") > rownames(genelist) <- genelist[,1] > genes <- rownames(genelist) > > x <- 1:4 > set <- matrix(nrow = 50, ncol = 11) > > for(i in c(1:50)){ > set[i] <-sample(x,50) > print(c(i,"->", set), quote = FALSE) > } > > which basically does the trick, but I just can't save the results outside the > loop. 
> After having the random sets of lines it wasn't a problem to extract the > line from the arrays using subset. > > genSet1 <-sample(x,50) > random1 <- genes %in% genSet1 > subsetGenelist <- subset(genelist, random1) > > > is there a different way of creating these random vectors or saving the loop > results outside tjhe loop so I cn work with them? > > Thanks a lot > > Assa > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
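Applied to the original question (50 random subsets of ~200 rows from a ~40k-row data frame), the sampling can also be done directly on row indices and the results kept in a list, which sidesteps the save-outside-the-loop problem entirely. Sketch with simulated stand-in data:

```r
set.seed(123)
# Stand-in for the real 40k-row gene list
genelist <- data.frame(ID = paste0("A_", 1:40000),
                       value = rnorm(40000))

# 50 random subsets of 200 rows each, collected in a list
subsets <- lapply(1:50, function(i) {
  genelist[sample(nrow(genelist), 200), ]
})

nrow(subsets[[1]])   # each element is a 200-row data frame
length(subsets)      # 50 of them
```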
Re: [R] package installation for Windows 7
ttp://bioconductor.org/biocLite.R";) >> BioC_mirror = http://www.bioconductor.org >> Change using chooseBioCmirror(). >> Warning messages: >> 1: In safeSource() : Redefining ‘biocinstall’ >> 2: In safeSource() : Redefining ‘biocinstallPkgGroups’ >> 3: In safeSource() : Redefining ‘biocinstallRepos’ >> > biocLite("Biobase") >> Using R version 2.11.1, biocinstall version 2.6.7. >> Installing Bioconductor version 2.6 packages: >> [1] "Biobase" >> Please wait... >> >> Warning in install.packages(pkgs = pkgs, repos = repos, ...) : >> argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' >> Error in if (ok) { : missing value where TRUE/FALSE needed >> >> > utils:::menuInstallLocal() # "Install package(s) from local zip >> files..." >> Error in if (ok) { : missing value where TRUE/FALSE needed >> >> > utils:::menuInstallPkgs() # "Install package(s)..." >> --- Please select a CRAN mirror for use in this session --- >> Error in if (ok) { : missing value where TRUE/FALSE needed >> >> >> I would appreciate any assistance. >> >> David >> >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strsplit("dia ma", "\\b") splits characterwise
I guess this is expected behaviour, although counterintuitive. \b represents an empty (zero-width) string indicating a word boundary, and strsplit splits at every zero-length match, which amounts to splitting between every character. This means the output you get is the same as > strsplit("dia ma", "",perl=T) [[1]] [1] "d" "i" "a" " " "m" "a" I'd use the separating character as split in strsplit, e.g. > strsplit("dia ma", "\\s") [[1]] [1] "dia" "ma" If you need the space in the list as well, you'll have to work around it, I guess. > test <- as.vector(gregexpr("\\b", "dia ma", perl=TRUE)[[1]]) > test [1] 1 4 5 7 > apply(embed(test,2),1,function(x) substr("dia ma",x[2],x[1]-1)) [1] "dia" " " "ma" It would be nice if special characters like \b would be recognized by strsplit as well though. Cheers Joris On Thu, Jul 8, 2010 at 10:15 AM, Suharto Anggono Suharto Anggono wrote: > \b is word boundary. > But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. > >> strsplit("dia ma", "\\b") > [[1]] > [1] "d" "i" "a" " " "m" "a" > >> strsplit("dia ma", "\\b", perl=TRUE) > [[1]] > [1] "d" "i" "a" " " "m" "a" > > > How can that be? > > This is the output of 'gregexpr'. > >> gregexpr("\\b", "dia ma") > [[1]] > [1] 1 2 3 4 5 6 > attr(,"match.length") > [1] 0 0 0 0 0 0 > >> gregexpr("\\b", "dia ma", perl=TRUE) > [[1]] > [1] 1 4 5 7 > attr(,"match.length") > [1] 0 0 0 0 > > > The output from gregexpr("\\b", "dia ma", perl=TRUE) is what I expect. I expect 'strsplit' to split at those points. > > This is in Windows. R was installed from binary. > >> sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > > > R 2.8.1 shows the same 'strsplit' behavior, but the behavior of default 'gregexpr' (i.e. 
perl=FALSE) is different. > >> strsplit("dia ma", "\\b") > [[1]] > [1] "d" "i" "a" " " "m" "a" > >> strsplit("dia ma", "\\b", perl=TRUE) > [[1]] > [1] "d" "i" "a" " " "m" "a" > >> gregexpr("\\b", "dia ma") > [[1]] > [1] 1 4 5 7 > attr(,"match.length") > [1] 0 0 0 0 > >> gregexpr("\\b", "dia ma", perl=TRUE) > [[1]] > [1] 1 4 5 7 > attr(,"match.length") > [1] 0 0 0 0 > >> sessionInfo() > R version 2.8.1 (2008-12-22) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MON > ETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United > States.1252 > > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
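The embed()/substr() idea above can be wrapped in a small helper. This is a sketch, assuming gregexpr() reports the boundary positions as shown in the thread; the name split_at_boundaries is hypothetical:

```r
# Split a string at perl word boundaries, keeping the separators as pieces
split_at_boundaries <- function(x) {
  pos <- as.vector(gregexpr("\\b", x, perl = TRUE)[[1]])
  # Make sure the start and the tail of the string are covered
  pos <- unique(c(1, pos, nchar(x) + 1))
  apply(embed(pos, 2), 1, function(p) substr(x, p[2], p[1] - 1))
}

split_at_boundaries("dia ma")   # "dia" " "  "ma"
```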
Re: [R] how to determine order of arima bt acf and pacf
Did you read these already? http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf http://www.barigozzi.eu/ARIMA.pdf Cheers Joris On Thu, Jul 8, 2010 at 11:45 AM, vijaysheegi wrote: > > Hi R community, > I am new to R.Have some doubts on ACF and pacf.Appying acf and pacf to > dataframe or table.How to determine ARIMA degrees which suits our example . > Please assist me on this and also please give me link for the same so that i > will also try understand the solution. > > Thanks in advance > vijaysheegi > student > > > -- > View this message in context: > http://r.789695.n4.nabble.com/how-to-determine-order-of-arima-bt-acf-and-pacf-tp2282053p2282053.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
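A minimal illustration of reading ARIMA orders off acf()/pacf(), as those references explain (simulated AR(2) series; the classic rule of thumb is that a pure AR(p) shows a PACF cutting off after lag p, while a pure MA(q) shows an ACF cutting off after lag q):

```r
set.seed(11)
# Simulate a stationary AR(2) process
x <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)

acf(x)    # tails off gradually, as expected for an AR process
pacf(x)   # cuts off after lag 2, suggesting AR(2)

# Fit the suggested model and inspect the coefficients
fit <- arima(x, order = c(2, 0, 0))
fit
```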
Re: [R] package installation for Windows 7
N mirror for use in this session --- > Error in if (ok) { : missing value where TRUE/FALSE needed > > > I would appreciate any assistance. > > David > > -- > David R. Bickel, PhD > Associate Professor > Ottawa Institute of Systems Biology > Biochem., Micro. and I. Department > Mathematics and Statistics Department > University of Ottawa > 451 Smyth Road > Ottawa, Ontario K1H 8M5 > > http://www.statomics.com > > Office Tel: (613) 562-5800 ext. 8670 > Office Fax: (613) 562-5185 > Office Room: RGN 4510F (Follow the signs to the elevator, and take it to the > fourth floor. Turn left and go all the way to the end of the hall, and enter > the door to the OISB area.) > Lab Tel.: (613) 562-5800 ext. 8304 > Lab Room: RGN 4501T > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] relation in aggregated data
Depending on the data and the research question, a meta-analytic approach might be appropriate. You can see every campaign as a "study". See the package metafor for example. You can only draw very general conclusions, but at least your inference will be closer to correct. Cheers Joris On Thu, Jul 8, 2010 at 10:03 AM, Petr PIKAL wrote: > Thank you > > Actually when I do this myself I always try to make day or week averages > if possible. However this was done by one of my colleagues and basically > the aggregation was done on basis of campaigns. There is 4 to 6 campaigns > per year and sometimes there is apparent relationship in aggregated data > sometimes is not. My opinion is that I can not say much about exact > relations until I have other clues or ways like expected underlaying laws > of physics. > > Thanks again > > Best regards > Petr > > > > Joris Meys napsal dne 07.07.2010 17:33:55: > >> You examples are pretty extreme... Combining 120 data points in 4 >> points is off course never going to give a result. Try : >> >> fac <- rep(1:8,each=15) >> xprum <- tapply(x, fac, mean) >> yprum <- tapply(y, fac, mean) >> plot(xprum, yprum) >> >> Relation is not obvious, but visible. >> >> Yes, you lose information. Yes, your hypothesis changes. But in the >> case you describe, averaging the x-values for every day (so you get an >> average linked to 1 y value) seems like a possibility, given you take >> that into account when formulating the hypothesis. Optimally, you >> should take the standard error on the average into account for the >> analysis, but this is complicated, often not done and in most cases >> ignoring this issue is not influencing the results to that extent it >> becomes important. >> >> YMMV >> >> Cheers >> >> On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL > wrote: >> > Dear all >> > >> > My question is more on statistics than on R, however it can be >> > demonstrated by R. It is about pros and cons trying to find a > relationship >> > by aggregated data. 
I can have two variables which can be related and > I >> > measure them regularly during some time (let say a year) but I can not >> > measure them in a same time - (e.g. I can not measure x and respective >> > value of y, usually I have 3 or more values of x and only one value of > y >> > per day). >> > >> > I can make a aggregated values (let say quarterly). My questions are: >> > >> > 1. Is such approach sound? Can I use it? >> > 2. What could be the problems >> > 3. Is there any other method to inspect variables which can be >> > related but you can not directly measure them in a same time? >> > >> > My opinion is, that it is not much sound to inspect aggregated values > and >> > there can be many traps especially if there are only few aggregated >> > values. Below you can see my examples. >> > >> > If you have some opinion on this issue, please let me know. >> > >> > Best regards >> > Petr >> > >> > Let us have a relation x/y >> > >> > set.seed(555) >> > x <- rnorm(120) >> > y <- 5*x+3+rnorm(120) >> > plot(x, y) >> > >> > As you can see there is clear relation which can be seen from plot. > Now I >> > make a factor for aggregation. >> > >> > fac <- rep(1:4,each=30) >> > >> > xprum <- tapply(x, fac, mean) >> > yprum <- tapply(y, fac, mean) >> > plot(xprum, yprum) >> > >> > Relationship is completely gone. Now let us make other fake data >> > >> > xn <- runif(120)*rep(1:4, each=30) >> > yn <- runif(120)*rep(1:4, each=30) >> > plot(xn,yn) >> > >> > There is no visible relation, xn and yn are independent but related to >> > aggregation factor. >> > >> > xprumn <- tapply(xn, fac, mean) >> > yprumn <- tapply(yn, fac, mean) >> > plot(xprumn, yprumn) >> > >> > Here you can see perfect relation which is only due to aggregation > factor. 
>> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
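The core of the meta-analytic suggestion above, inverse-variance pooling of per-campaign estimates, can be sketched in base R (the numbers below are hypothetical per-campaign effects and sampling variances; the metafor package does this properly, including random effects):

```r
# Hypothetical per-campaign effect estimates and their sampling variances
yi <- c(0.42, 0.35, 0.51, 0.29, 0.47)
vi <- c(0.02, 0.03, 0.02, 0.04, 0.03)

# Fixed-effect (inverse-variance) pooled estimate and its standard error
w      <- 1 / vi
pooled <- sum(w * yi) / sum(w)
se     <- sqrt(1 / sum(w))
c(estimate = pooled, se = se)
```

With metafor this would be rma(yi, vi), treating each campaign as a "study".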
Re: [R] when all I have is a contingency table....
See Ted's answer for the histogram (I'd go with a barplot). For most statistical procedures there is a weighted version (e.g. weighted.mean() for the mean). Your counts are valid weights for most procedures. Cheers Joris On Wed, Jul 7, 2010 at 10:39 PM, Andrei Zorine wrote: > Hello, > I just need a hint here: > Suppose I have no raw data, but only a frequency table, and I > want to run basic statistical procedures with it, like histogram, > descriptive statistics, etc. How do I do this with R? > For example, how do I plot a histogram for this table for a sample of size 60? > > Value Count > 1 10 > 2 8 > 3 12 > 4 9 > 5 14 > 6 7 > > > Thanks, > A.Z. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
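A sketch of both suggestions, using the table from the question: a barplot of the counts, a weighted mean, and (alternatively) expanding the counts back into raw data with rep():

```r
value <- 1:6
count <- c(10, 8, 12, 9, 14, 7)   # n = 60

# A bar plot is the honest "histogram" for tabulated data
barplot(count, names.arg = value, xlab = "Value", ylab = "Count")

# Weighted descriptive statistics: counts as weights
m <- weighted.mean(value, count)

# Or reconstruct the raw sample and use ordinary functions
x <- rep(value, times = count)
mean(x)     # identical to the weighted mean
var(x)
median(x)
```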
Re: [R] problems with write.table, involving loops & paste statement
To be correct, everything is written to all folders according to my testing. There is absolutely no need whatsoever to use l_ply. And in any case, take as much as possible outside the loop, like the library statement and the max.col. Following is _a_ solution, not the most optimal, but as close as feasible to your code : A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)), from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1)) # I make some data and make sure I can adjust the number of dirs and the steps Dchr1 <-matrix(rep(1:100,each=10),ncol=100) dirs <- 20 max.col <- ncol(Dchr1) steps = ceiling(max.col/dirs) cols <- seq(1, max.col, by=steps) for(i in 1:length(A_var_df[,1])) { k <- cols[i] print(as.data.frame(Dchr1[,k:min(k+steps, max.col)])) print(paste("./A_",A_var_df[i,1], "/k.csv", sep="")) # I use print here just to show which dataframe is going to which directory, # you can reconstruct the write.table yourself. } Cheers Joris On Wed, Jul 7, 2010 at 9:32 PM, CC wrote: > Hi! > > I want to write portions of my data (3573 columns at a time) to twenty > folders I have available titled "A_1" to "A_20" such that the first 3573 > columns will go to folder A_1, next 3573 to folder A_2 and so on. > > This code below ensures that the data is written into all 20 folders, but > only the last iteration of the loop (last 3573 columns) is being written > into ALL of the folders (A_1 to A_20) rather than the sequential order that > I would like. > > How can I fix this? > > Thank you! 
> > * > > Code: > > A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)), > from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1)) > > for(i in 1:length(A_var_df[,1])) > { > library(plyr) > max.col <- ncol(Dchr1) > l_ply(seq(1, max.col, by=3573), function(k) > write.table(as.data.frame(Dchr1[,k:min(k+3572, max.col)]), paste("./A_", > A_var_df[i,1], "/k.csv", sep=""), sep=",", row.names=F, quote=F) ) > } > > ** > > -- > Thanks, > CC > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
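An alternative sketch of the same task without an explicit index loop: split the column indices into as many consecutive groups as there are folders, and write each chunk with Map(). Toy matrix, with the A_1 ... A_20 folders created under tempdir() so the example is self-contained:

```r
set.seed(1)
Dchr1 <- matrix(rnorm(10 * 100), ncol = 100)   # toy stand-in for the real data
dirs  <- 20

# Consecutive, roughly equal groups of column indices
steps  <- ceiling(ncol(Dchr1) / dirs)
groups <- split(seq_len(ncol(Dchr1)), ceiling(seq_len(ncol(Dchr1)) / steps))

# One file per folder, folders A_1 ... A_20 under a temporary directory
paths <- file.path(tempdir(), paste0("A_", seq_len(dirs)))
invisible(lapply(paths, dir.create, showWarnings = FALSE))

# Pair chunk i with folder i, so nothing gets overwritten
invisible(Map(function(cols, p) {
  write.table(as.data.frame(Dchr1[, cols, drop = FALSE]),
              file.path(p, "chunk.csv"),
              sep = ",", row.names = FALSE, quote = FALSE)
}, groups, paths))

list.files(paths[1])
```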
Re: [R] relation in aggregated data
Your examples are pretty extreme... Collapsing 120 data points into 4 points is of course never going to give a result. Try:

fac <- rep(1:8, each = 15)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)

The relation is not obvious, but it is visible. Yes, you lose information. Yes, your hypothesis changes. But in the case you describe, averaging the x-values for every day (so that each average is linked to one y value) seems like a possibility, provided you take that into account when formulating the hypothesis. Ideally you should also take the standard error of the averages into account in the analysis, but this is complicated, often not done, and in most cases ignoring the issue does not influence the results to the extent that it becomes important. YMMV Cheers

On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL wrote: > Dear all > > My question is more about statistics than about R, but it can be > demonstrated in R. It is about the pros and cons of trying to find a relationship > in aggregated data. I have two variables which may be related, and I > measure them regularly during some time (say a year), but I cannot > measure them at the same time (e.g. I cannot measure x and the respective > value of y; usually I have 3 or more values of x and only one value of y > per day). > > I can make aggregated values (say quarterly). My questions are: > > 1. Is such an approach sound? Can I use it? > 2. What could the problems be? > 3. Is there any other method to inspect variables which may be > related but which you cannot measure directly at the same time? > > My opinion is that it is not very sound to inspect aggregated values, and > there can be many traps, especially if there are only a few aggregated > values. Below you can see my examples. > > If you have an opinion on this issue, please let me know. 
> > Best regards > Petr > > Let us have a relation x/y > > set.seed(555) > x <- rnorm(120) > y <- 5*x+3+rnorm(120) > plot(x, y) > > As you can see there is clear relation which can be seen from plot. Now I > make a factor for aggregation. > > fac <- rep(1:4,each=30) > > xprum <- tapply(x, fac, mean) > yprum <- tapply(y, fac, mean) > plot(xprum, yprum) > > Relationship is completely gone. Now let us make other fake data > > xn <- runif(120)*rep(1:4, each=30) > yn <- runif(120)*rep(1:4, each=30) > plot(xn,yn) > > There is no visible relation, xn and yn are independent but related to > aggregation factor. > > xprumn <- tapply(xn, fac, mean) > yprumn <- tapply(yn, fac, mean) > plot(xprumn, yprumn) > > Here you can see perfect relation which is only due to aggregation factor. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
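Both effects discussed in this thread can be condensed into one self-contained script. The seed and group sizes follow the quoted example; the correlations are printed instead of plotted so the effect is visible in numbers.

```r
# Aggregation can hide a real relation and manufacture a fake one.
set.seed(555)
x <- rnorm(120)
y <- 5 * x + 3 + rnorm(120)

# True relation, collapsed into only 4 means: signal mostly lost.
fac4 <- rep(1:4, each = 30)
cor(tapply(x, fac4, mean), tapply(y, fac4, mean))

# Finer aggregation (8 groups of 15) keeps the relation visible.
fac8 <- rep(1:8, each = 15)
cor(tapply(x, fac8, mean), tapply(y, fac8, mean))

# Independent data that share a trend with the grouping factor:
# aggregation manufactures a spurious, near-perfect relation.
xn <- runif(120) * rep(1:4, each = 30)
yn <- runif(120) * rep(1:4, each = 30)
cor(tapply(xn, fac4, mean), tapply(yn, fac4, mean))
```

The last correlation is high even though xn and yn are independent, which is exactly the trap Petr describes.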
Re: [R] Different goodness of fit tests leads to contradictory conclusions
The first two options are GOF tests for ungrouped data; the latter two can only be used for grouped data. In my experience, the G^2 test is more affected by this than the X^2 test (it more easily gives - wrongly - significant deviations when used on ungrouped data). If you started from binary data, you can only use the Hosmer tests. Cheers Joris

On Wed, Jul 7, 2010 at 4:00 PM, Kiyoshi Sasaki wrote: > I am trying to test the goodness of fit of my logistic regression using > several options, as shown below: the Hosmer-Lemeshow test (whose function I > borrowed from a previous post), the Hosmer–le Cessie omnibus lack-of-fit test > (also borrowed from a previous post), the Pearson chi-square test, and the deviance > test. All the tests except the deviance test produced p-values well above > 0.05. Would anyone please teach me why the deviance test produces such a > small p-value (0.001687886), suggesting lack of fit, while the other tests suggest > good fit? Did I do something wrong? > > Thank you for your time and help! 
> > Kiyoshi > > > mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = > logit), data = no.NA, na.action = na.exclude, control = list(epsilon = > 0.0001, maxit = 50, trace = F)) > >> # Option 1: Hosmer-Lemeshow test >> mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = >> logit), data = no.NA, na.action = na.exclude, control = list(epsilon = >> 0.0001, maxit = 50, trace = F)) > > > hosmerlem <- function (y, yhat, g = 10) > { > cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), > include.lowest = T) > obs <- xtabs(cbind(1 - y, y) ~ cutyhat) > expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat) > chisq <- sum((obs - expect)^2/expect) > P <- 1 - pchisq(chisq, g - 2) > c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P) > } > >> hosmerlem(no.NA$repcnd, fitted(mod.fit)) > X^2 Df P(>Chi) > 7.8320107 8.000 0.4500497 > > >> # Option 2 - Hosmer–le Cessie omnibus lack of fit test: >> library(Design) >> lrm.GOF <- lrm(formula = no.NA$repcnd ~ no.NA$svl, data = no.NA, y = T, x >> = T) >> resid(lrm.GOF,type = "gof") > Sum of squared errors Expected value|H0 SD > Z P > 48.3487115 48.3017399 > 0.1060826 0.4427829 0.6579228 > >> # Option 3 - Pearson chi-square p-value: >> pp <- sum(resid(lrm.GOF,typ="pearson")^2) >> 1-pchisq(pp, mod.fit$df.resid) > [1] 0.4623282 > > >> # Option 4 - Deviance (G^2) test: >> 1-pchisq(mod.fit$deviance, mod.fit$df.resid) > [1] 0.001687886 > > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
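The grouped-versus-ungrouped point above can be illustrated with simulated data in which the fitted model is exactly true by construction, so any "lack of fit" is an artifact of the test. Variable names here are made up for the sketch.

```r
# For ungrouped 0/1 responses, the residual deviance does not follow a
# chi-squared distribution, so the deviance "GOF test" is unreliable.
set.seed(1)
x <- rnorm(200)
p <- plogis(-0.5 + x)               # the model we fit is exactly true
yb  <- rbinom(200, 1, p)            # ungrouped binary data
fit <- glm(yb ~ x, family = binomial)
1 - pchisq(deviance(fit), df.residual(fit))   # not trustworthy here

# Grouping the same process restores the chi-squared approximation.
n   <- rep(25, 200)                 # 25 trials per covariate value
yg  <- rbinom(200, n, p)
fitg <- glm(cbind(yg, n - yg) ~ x, family = binomial)
1 - pchisq(deviance(fitg), df.residual(fitg)) # meaningful GOF p-value
```

With grouped counts the deviance test behaves as intended; with 0/1 data only the Hosmer-style tests remain applicable, which is the point of the reply.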
Re: [R] PCA and Regression
PCA components are orthogonal by definition, so no, that doesn't make sense at all. Do yourself a favor and get a book on multivariate data analysis. Two books come to mind: the obvious one by Hair and colleagues, called "Multivariate Data Analysis", easily found in a university library (or on the internet); and "An R and S-Plus Companion to Multivariate Analysis" by Everitt, which you are less likely to find in a library, but which is for sale online. This is a nice introduction as well: https://netfiles.uiuc.edu/miguez/www/Teaching/MultivariateRGGobi.pdf Never use a chainsaw without reading the manual. Cheers Joris

On Tue, Jul 6, 2010 at 9:07 PM, Marino Taussig De Bodonia, Agnese wrote: > Hello, > > I am currently analyzing responses to questionnaires about general attitudes. > I have performed a PCA on my data, and have retained two principal > components. Now I would like to use the scores of both principal > components in a multiple regression. I would like to know if it makes sense to > use the scores of one principal component to explain the variance in the > scores of another principal component: > > lm(scores of principal component 1 ~ scores of principal component 2 + age, > gender, etc..) > > Please could you let me know if this is statistically sound? > > Thank you in advance for your time, > > Agnese > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
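A quick way to convince yourself of the orthogonality point made above, with simulated data (the exact numbers are irrelevant, only the zero correlations matter):

```r
# Principal-component scores are uncorrelated by construction, so
# regressing one on another explains essentially nothing.
set.seed(42)
dat <- matrix(rnorm(100 * 4), ncol = 4)
dat[, 2] <- dat[, 1] + rnorm(100, sd = 0.3)   # induce correlation in raw data

pc <- prcomp(dat, scale. = TRUE)
round(cor(pc$x), 10)                          # off-diagonals are all 0
summary(lm(pc$x[, 1] ~ pc$x[, 2]))$r.squared  # essentially 0
```

Regressing PC scores on external covariates (age, gender, ...) is fine; regressing one PC on another is not, which the zero R-squared makes concrete.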
Re: [R] Help With ANOVA
I still can't reproduce your example. The aov output gives me the following : > anova(aov(Intensity ~ Group, data = zzzanova)) Analysis of Variance Table Response: Intensity Df Sum Sq Mean Sq F value Pr(>F) Group 5 98.85 19.771 2.1469 0.07576 . Residuals 48 442.03 9.209 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Next to that I noticed you have truncated data, which has implications for the analysis as well. If you use a Kruskal-Wallis test, the p-value becomes larger : > kruskal.test(Intensity ~ Group, data = zzzanova) Kruskal-Wallis rank sum test data: Intensity by Group Kruskal-Wallis chi-squared = 6.6955, df = 5, p-value = 0.2443 Which is to be expected, as you have almost 50% truncated data. So a p-value of 0.005 seems very wrong to me. Cheers Joris On Tue, Jul 6, 2010 at 7:11 PM, Amit Patel wrote: > Hi Joris > > Sorry i had a misprint in the appendix code in the last email > > datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, > -4.60517, 3.003749, -4.60517, > 2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, > -4.60517, > -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, > -4.60517, 2.39127, > 2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, > -4.60517, 2.121427, 1.973118, > -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, > 0.023703, -4.60517, > 2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, > 1.371895, 1.533227) > > "zzzanova" <- > structure(list(Intensity = datalist, > Group = structure(c(1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4,4,4,4, > 5,5,5,5,5,5,5,5,5, > 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", > "Group4", "Group5", "Group6"), class = "factor"), > Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, > 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > 
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) > )) > , .Names = c("Intensity", > "Group", "Sample"), row.names = > c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", > "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", > "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", > "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", > "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", > "51", "52", "53", "54"),class = "data.frame") > > > Thanks for your reply > > > > > > - Original Message > From: Joris Meys > To: Amit Patel > Cc: r-help@r-project.org > Sent: Tue, 6 July, 2010 17:04:40 > Subject: Re: [R] Help With ANOVA > > We're missing the samp1 etc. in order to be able to test the code. > Where did you get the other p-value? > Cheers > Joris > > On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel wrote: >> Hi I needed some help with ANOVA >> >> I have a problem with My ANOVA >> analysis. I have a dataset with a known ANOVA p-value, however I can >> not seem to re-create it in R. >> >> I have created a list (zzzanova) which contains >> 1)Intensity Values >> 2)Group Number (6 Different Groups) >> 3)Sample Number (54 different samples) >> this is created by the script in Appendix 1 >> >> I then conduct ANOVA with the command >>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova) >> >> I get a p-value of >> Pr(>F)1 >> 0.9483218 >> >> The >> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly >> or have put in a wrong formula. I am trying to do an ANOVA analysis >> across all 6 Groups. Is there something wrong with my formula. But I think I >> have made a mistake in the formula rather than anything else. >> >> >> >> >> APPENDIX 1 >> >> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, >> -4.60517, 3.003749, -4.60517
Re: [R] Help With ANOVA
We're missing the samp1 etc. in order to be able to test the code. Where did you get the other p-value? Cheers Joris On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel wrote: > Hi I needed some help with ANOVA > > I have a problem with My ANOVA > analysis. I have a dataset with a known ANOVA p-value, however I can > not seem to re-create it in R. > > I have created a list (zzzanova) which contains > 1)Intensity Values > 2)Group Number (6 Different Groups) > 3)Sample Number (54 different samples) > this is created by the script in Appendix 1 > > I then conduct ANOVA with the command >> zzz.aov <- aov(Intensity ~ Group, data = zzzanova) > > I get a p-value of > Pr(>F)1 > 0.9483218 > > The > expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly > or have put in a wrong formula. I am trying to do an ANOVA analysis > across all 6 Groups. Is there something wrong with my formula. But I think I > have made a mistake in the formula rather than anything else. > > > > > APPENDIX 1 > > datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, > -4.60517, 3.003749, -4.60517, > 2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, > -4.60517, > -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, > -4.60517, 2.39127, > 2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, > -4.60517, 2.121427, 1.973118, > -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, > 0.023703, -4.60517, > 2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, > 1.371895, 1.533227) > > "zzzanova" <- > structure(list(Intensity = c(t(Samp1), t(Samp2), t(Samp3), t(Samp4)), > Group = structure(c(1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4,4,4,4, > 5,5,5,5,5,5,5,5,5, > 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", > "Group4", "Group5", "Group6"), class = "factor"), > Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > 20, 
21, 22, 23, 24, 25, 26, 27, 28, 29, > 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) > )) > , .Names = c("Intensity", > "Group", "Sample"), row.names = > c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", > "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", > "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", > "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", > "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", > "51", "52", "53", "54"),class = "data.frame") > > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
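For reference, the comparison made in this thread, aov() versus kruskal.test() on one grouped response, follows the general pattern below. The intensity values here are simulated placeholders; only the group sizes (9, 8, 9, 10, 9, 9) follow the quoted data set.

```r
# One-way ANOVA versus the rank-based Kruskal-Wallis test on the
# same grouped data; the latter is more robust to truncated values.
set.seed(7)
group     <- factor(rep(paste0("Group", 1:6),
                        times = c(9, 8, 9, 10, 9, 9)))
intensity <- rnorm(length(group), mean = as.integer(group))

anova(aov(intensity ~ group))    # classical F test
kruskal.test(intensity ~ group)  # rank-based alternative
```

With roughly half the observations truncated at a floor value, as in Amit's data, the two p-values can differ substantially, which is why the reply recommends the rank-based test there.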
Re: [R] calculation on series with different time-steps
Look at ?ifelse and ?abs, e.g.:

stage$new_column <- ifelse(
    stage$julian_day == baro$julian_day & abs(stage$time - baro$hour) <= 30,
    stage$stage.cm - baro$pressure,
    NA)

Cheers Joris

On Thu, Jul 1, 2010 at 10:09 PM, Jeana Lee wrote: > Hello, > > I have two series, one with stream stage measurements every 5 minutes, and > the other with barometric pressure measurements every hour. I want to > subtract each barometric pressure measurement from the 12 stage measurements > closest in time to it (6 stage measurements on either side of the hour). > > I want to do something like the following, but I don't know the syntax. > > "If the Julian day of the stage measurement is equal to the Julian day of > the pressure measurement, AND the absolute value of the difference between > the time of the stage measurement and the hour of the pressure measurement > is less than or equal to 30 minutes, then subtract the pressure measurement > from the stage measurement (and put it in a new column in the stage data > frame)." > > if ( stage$julian_day = baro$julian_day & |stage$time - > baro$hour| <= 30 ) > then (stage$stage.cm - baro$pressure) > > Can you help me? > > Thanks, > JL > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
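A toy run of the ifelse/abs suggestion above, with made-up values for the columns mentioned in the post. Note this assumes both data frames line up row for row; if they don't, a merge on julian_day is needed first.

```r
# Subtract barometric pressure from stage where day matches and the
# time difference is at most 30 minutes; otherwise leave NA.
stage <- data.frame(julian_day = c(1, 1, 2),
                    time       = c(100, 160, 100),
                    stage.cm   = c(50, 52, 55))
baro  <- data.frame(julian_day = c(1, 1, 2),
                    hour       = c(120, 120, 90),
                    pressure   = c(10, 10, 12))

stage$corrected <- ifelse(
  stage$julian_day == baro$julian_day & abs(stage$time - baro$hour) <= 30,
  stage$stage.cm - baro$pressure,
  NA)
stage$corrected   # 40 NA 43: row 2 is more than 30 minutes away
```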
Re: [R] timeseries
Difficult to guess why, but I reckon you should use ts() instead of as.ts(). Otherwise, set the tsp attribute correctly. E.g.:

> x <- cumsum(1 + round(rnorm(20), 2))
> as.ts(x)
Time Series:
Start = 1
End = 20
Frequency = 1
[1] 0.87 3.51 4.08 4.20 3.25 4.63 6.30 6.89 9.28 9.93 10.19 9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47
> tseries <- ts(x, frequency = 12, start = c(2004, 3))
> tseries
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004               0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47
> tsp(x) <- c(2004 + 2/12, 2005.75, 12)
> as.ts(x)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004               0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

See ?ts to get the options right. I suggest using the function ts() instead of assigning the tsp attribute.

Cheers Joris

On Mon, Jul 5, 2010 at 9:35 AM, nuncio m wrote: > Dear useRs, > I am trying to construct a time series using the as.ts function; surprisingly, > when I plot > the data the x axis does not show the time in years. However, if I use > ts(data), years are shown on the > x axis. Why such a difference between the results of the two commands? > Thanks > nuncio > > > -- > Nuncio.M > Research Scientist > National Center for Antarctic and Ocean research > Head land Sada > Vasco da Gamma > Goa-403804 > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
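The difference between the two calls above can be condensed to a few lines (random data, so the printed values will vary between runs):

```r
# as.ts() keeps the default time base of 1, 2, ... unless the object
# already carries a tsp attribute; ts() lets you declare start and
# frequency directly, which is what gives years on the plot axis.
x <- cumsum(1 + round(rnorm(20), 2))

as.ts(x)                                   # Start = 1, Frequency = 1
ts(x, frequency = 12, start = c(2004, 3))  # monthly, starting March 2004
```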
Re: [R] acf
Your question is a bit confusing. "acfresidfit" is an object of which we don't know the origin. With your test file, I arrive at the first set of correlations (but with integer lag headings):

> residfit <- read.table("fileree2_test_out.txt")
> acf(residfit)
> acfresid <- acf(residfit)
> acfresid

Autocorrelations of series ‘residfit’, by lag

     0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16
 1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042  0.078 -0.029  0.028 -0.016 -0.021 -0.109
    17     18     19     20     21     22     23     24     25
 0.000 -0.038 -0.006  0.015 -0.032 -0.002  0.014 -0.226 -0.030

Could you please check where the object acfresidfit is coming from and how you generated it?

Cheers Joris

On Tue, Jul 6, 2010 at 9:47 AM, nuncio m wrote: > Hi list, > I have the following code to compute the acf of a time series: > acfresid <- acf(residfit), where residfit is the series. > When I type acfresid at the prompt, the following is displayed: > > Autocorrelations of series ‘residfit’, by lag > > 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 > > 1.000 -0.015 0.010 0.099 0.048 -0.014 -0.039 -0.019 0.040 0.018 0.042 > > 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500 > > 0.078 -0.029 0.028 -0.016 -0.021 -0.109 0.000 -0.038 -0.006 0.015 -0.032 > > 1.8333 1.9167 2.0000 2.0833 > -0.002 0.014 -0.226 -0.030 > residfit is a time-series object at monthly interval (0.0833). Here I > understand R computed the correlation at lags 0 to 2 years. > > What is surprising to me is that > if I type acfresidfit at the prompt, the following is displayed: > > Autocorrelations of series ‘residfit’, by lag > > 0 1 2 3 4 5 6 7 8 9 10 > > 1.000 -0.004 0.011 0.041 -0.056 0.019 -0.052 -0.027 -0.008 -0.012 -0.034 > > 11 12 13 14 15 16 17 18 19 20 21 > > 0.024 -0.005 0.006 -0.045 0.031 -0.035 -0.011 -0.021 -0.020 -0.010 -0.007 > > 22 23 24 25 > -0.038 0.017 0.051 0.038 > > From the header I understand both are autocorrelations computed at the same > lags. 
but the correlations are different > > where am I going wrong and which is the correct one. > > file residfit is also attached(filename-fileree2_test_out.txt) > Thanks > nuncio > -- > Nuncio.M > Research Scientist > National Center for Antarctic and Ocean research > Head land Sada > Vasco da Gamma > Goa-403804 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
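If two objects really contain the same series, acf() returns identical correlation values; only the lag labels change with the frequency of the series. A small check with simulated residuals (so differing values, as in the question above, mean the two inputs are different objects):

```r
# Same series as a plain vector and as a monthly ts: identical
# autocorrelations, only the lag axis is scaled by the frequency.
set.seed(3)
resid.vec <- rnorm(48)
resid.ts  <- ts(resid.vec, frequency = 12)

a1 <- acf(resid.vec, plot = FALSE)  # lags labelled 0, 1, 2, ...
a2 <- acf(resid.ts,  plot = FALSE)  # lags labelled 0, 0.0833, ... (years)

all.equal(as.numeric(a1$acf), as.numeric(a2$acf))  # TRUE
head(a1$lag); head(a2$lag)                         # only labels differ
```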
Re: [R] S4 classes and debugging - Is there a summary?
On Fri, Jul 2, 2010 at 4:01 PM, Martin Morgan wrote: > > selectMethod(xyValues, c('RasterLayer', 'matrix')) > > would be my choice. > Thanks, that's a discovery. I guess I misunderstood the help pages on that one. > > I don't really have the right experience, but Chambers' 2008 Software > for Data Analysis... and Gentleman's 2008 R Programming for > Bioinformatics... books would be where I'd start. ?Methods and ?Classes > are, I think, under-used. I did read the posting guide ;-) but I have to admit the help pages are not always that clear. Which is one of the reasons why S4 frustrates me so easily. I also have Chambers' book, and Gentleman's book is here at the department as well. I'll go through the relevant chapters again. Thanks Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] S4 classes and debugging - Is there a summary?
Correction: trace of course works when specifying the signature. So, e.g.:

trace(xyValues, tracer = browser,
      signature = c(object = "RasterLayer", xy = "matrix"))

allows me to browse through the function. I should have specified the signature; I overlooked that. Still, if there's a manual on S4 that anybody would like to mention, please go ahead. I find bits and pieces on the internet, but I haven't yet found an overview dealing with all the peculiarities of these classes. Cheers Joris

On Fri, Jul 2, 2010 at 2:05 PM, Joris Meys wrote: > Dear all, > > I'm getting more and more frustrated with the whole S4 thing and I'm > looking for a more advanced summary on how to deal with them. Often I > get error messages that don't make sense at all, or the code is not > doing what I think it would do. Far too often inspecting the code > requires me to go to the source, which doesn't really help in easily > finding the bit of code you're interested in. > > Getting the code with getAnywhere() doesn't always work. Debug() > doesn't work. Using trace() and browser() is not an option, as I can't > find the correct method. > > eg : > library(raster) >> getAnywhere(xyValues) > A single object matching ‘xyValues’ was found > It was found in the following places > package:raster > namespace:raster > with value > > standardGeneric for "xyValues" defined from package "raster" > > function (object, xy, ...) > standardGeneric("xyValues") > > Methods may be defined for arguments: object, xy > Use showMethods("xyValues") for currently available ones. >> showMethods("xyValues") > Function: xyValues (package raster) > object="Raster", xy="data.frame" > object="Raster", xy="SpatialPoints" > object="Raster", xy="vector" > object="RasterLayer", xy="matrix" > object="RasterStackBrick", xy="matrix" > > And now...? > > Is there an overview that actually explains how you get the > information you're looking for without strolling through the complete > source? 
Sorry if I sound frustrated, but this is costing me huge > amounts of time, to that extent that I rather write a custom function > than use one in an S4 package if I'm not absolutely sure about what it > does and how it achieves it. > > Cheers > Joris > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Applied mathematics, biometrics and process control > > tel : +32 9 264 59 87 > joris.m...@ugent.be > --- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
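The pattern described in this correction can be reproduced without the raster package, using a throwaway S4 class. All names below (Raster2, xyVals) are invented for the example.

```r
# Inspect and trace a specific S4 method by giving its signature.
library(methods)

setClass("Raster2", representation(v = "numeric"))
setGeneric("xyVals", function(object, xy, ...) standardGeneric("xyVals"))
setMethod("xyVals", signature(object = "Raster2", xy = "matrix"),
          function(object, xy, ...) object@v[xy[, 1]])

# look at the method body without digging through source files:
body(selectMethod("xyVals", c("Raster2", "matrix")))

# step through it interactively the next time it is dispatched:
trace("xyVals", tracer = browser,
      signature = c(object = "Raster2", xy = "matrix"))
untrace("xyVals", signature = c(object = "Raster2", xy = "matrix"))
```

selectMethod() plus body() answers the "and now...?" in the quoted post; trace() with a signature is the S4 replacement for debug() on an ordinary function.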
[R] S4 classes and debugging - Is there a summary?
Dear all, I'm getting more and more frustrated with the whole S4 thing and I'm looking for a more advanced summary on how to deal with them. Often I get error messages that don't make sense at all, or the code is not doing what I think it would do. Far too often inspecting the code requires me to go to the source, which doesn't really help in easily finding the bit of code you're interested in. Getting the code with getAnywhere() doesn't always work. Debug() doesn't work. Using trace() and browser() is not an option, as I can't find the correct method. eg : library(raster) > getAnywhere(xyValues) A single object matching ‘xyValues’ was found It was found in the following places package:raster namespace:raster with value standardGeneric for "xyValues" defined from package "raster" function (object, xy, ...) standardGeneric("xyValues") Methods may be defined for arguments: object, xy Use showMethods("xyValues") for currently available ones. > showMethods("xyValues") Function: xyValues (package raster) object="Raster", xy="data.frame" object="Raster", xy="SpatialPoints" object="Raster", xy="vector" object="RasterLayer", xy="matrix" object="RasterStackBrick", xy="matrix" And now...? Is there an overview that actually explains how you get the information you're looking for without strolling through the complete source? Sorry if I sound frustrated, but this is costing me huge amounts of time, to that extent that I rather write a custom function than use one in an S4 package if I'm not absolutely sure about what it does and how it achieves it. 
Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] left end or right end
First of all, read the posting guide carefully: http://www.R-project.org/posting-guide.html Your question is far from clear. When you say that the lengths of P and Q are different, do you mean the length of the data or the difference between start and end? That makes a world of difference. Regarding the statistical test: that depends on what your data represent. Is it possible for P to fall close to both the left and the right end, for example:

P   -
Q  ---

You should also specify which test you want to use; then people on the list will be able to tell you whether it is available in R. You can of course construct your own test with the tools R provides, but again, this requires a lot more information. Next to that, the list is actually not intended for statistical advice, but for advice regarding R code. Maybe somebody will join in with some statistical guidance, but if you don't know what to do, you had better consult a statistician at your department. Cheers Joris

On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar wrote: > Dear all, > I am a biologist. I have two sets of distances, P(start1, end1) and Q(start2, > end2). > The distance will be like this: > P > Q > > I want to know whether P falls closely to the right end or the left end of Q. > P and Q are of different lengths for each data point. There is more than > 1 pair of P and Q. > Is there any test or function in R to reach a statistically significant > conclusion? > > Thanks for all, > Suku > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] run R
Welcome to a new world. You're using the _programming language_ R, with thousands of packages, so first of all you should read the introductory material thoroughly: http://cran.r-project.org/doc/manuals/R-intro.pdf http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf Regarding your question: check the help file of source: ?source If you have a script, say foo.R, in a folder c:\bar, do: source("c:/bar/foo.R") Mind the slashes instead of the backslashes. I assume you're on a Windows system... Cheers Joris On Wed, Jun 30, 2010 at 3:08 PM, wrote: > > > Hi, > > > I'm starting to use the R package and I have some .R scripts. How > can I run these .R scripts? > > > Conrado > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
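A self-contained demonstration of source(), using a throwaway script created on the fly (the file name foo.R is an example only):

```r
# Create a toy script, run it with source(), and clean up.
writeLines('message("hello from foo.R")', "foo.R")

source("foo.R")               # runs the script in the current workspace
source("foo.R", echo = TRUE)  # also echoes each statement as it runs

unlink("foo.R")
```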
Re: [R] Use of processor by R 32bit on a 64bit machine
*slaps forehead* Thanks. So out it goes, that hyperthreading. Who invented hyperthreading on a quad-core anyway?

Cheers
Joris

2010/6/29 Uwe Ligges :
> On 29.06.2010 15:30, Joris Meys wrote:
>> Dear all,
>>
>> I've recently purchased a new 64bit system with an Intel i7 quad-core processor. As I understood (maybe wrongly) that to date the 32bit version of R is more stable than the 64bit, I installed the 32bit version and have been happily using it ever since. Now I'm running a whole lot of models, which goes smoothly, and out of curiosity I thought to check how much processor I'm using. I would have thought I used 25% (being one core), as on my old dual core R uses 50% of the total processor capacity. Funny enough, it turns out that R is currently using only 12-13% of my CPU, which is about half of what I expected.
>
> An Intel Core i7 quad-core has 8 virtual cores since it supports hyperthreading. R uses one of these virtual cores. Note that 2 virtual cores won't be twice as fast, since they are running on the same physical core. Hence this is expected.
>
> Uwe Ligges
>
>> Did I miss something somewhere? Should I change some settings? I'm running on Windows 7 Enterprise. I looked around already, but I have the feeling I overlooked something.
>> Cheers
>> Joris
>>
>> sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C  LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grDevices datasets splines graphics stats tcltk utils methods base
>>
>> other attached packages:
>> [1] svSocket_0.9-48 TinnR_1.0.3 R2HTML_2.0.0 Hmisc_3.7-0 survival_2.35-7
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.3 grid_2.10.1 lattice_0.18-3 svMisc_0.9-57 tools_2.10.1
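A quick sketch of how to check this from within R: the parallel package (shipped with R since 2.14) reports logical vs physical cores, which explains why one single-threaded R session sits at roughly 12-13% CPU on a hyperthreaded quad-core (1 of 8 logical cores).

```r
# Compare logical and physical core counts; a single-threaded R session
# occupies one logical core, i.e. about 100/logical_cores percent of CPU.
library(parallel)
logical_cores  <- detectCores(logical = TRUE)    # e.g. 8 on an i7 quad-core
physical_cores <- detectCores(logical = FALSE)   # e.g. 4; may be NA on some platforms
round(100 / logical_cores)                       # expected %CPU of one R session
```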
[R] Use of processor by R 32bit on a 64bit machine
Dear all,

I've recently purchased a new 64bit system with an Intel i7 quad-core processor. As I understood (maybe wrongly) that to date the 32bit version of R is more stable than the 64bit, I installed the 32bit version and have been happily using it ever since. Now I'm running a whole lot of models, which goes smoothly, and out of curiosity I thought to check how much processor I'm using. I would have thought I used 25% (being one core), as on my old dual core R uses 50% of the total processor capacity. Funny enough, it turns out that R is currently using only 12-13% of my CPU, which is about half of what I expected.

Did I miss something somewhere? Should I change some settings? I'm running on Windows 7 Enterprise. I looked around already, but I have the feeling I overlooked something.

Cheers
Joris

sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils methods base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3 R2HTML_2.0.0 Hmisc_3.7-0 survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.10.1 lattice_0.18-3 svMisc_0.9-57 tools_2.10.1
Re: [R] (New) Popularity of R, SAS, SPSS, Stata...
Dear Robert,

I've tried to access that link, but to no avail. It seems the server r4stats.com is down, as it doesn't respond. This link got me to the site: http://sites.google.com/site/r4statistics/popularity

Cheers
Joris

On Mon, Jun 28, 2010 at 3:52 PM, Muenchen, Robert A (Bob) wrote:
> Greetings Listserv Readers,
>
> At http://r4stats.com/popularity I have added plots, data, and/or discussion of:
>
> 1. Scholarly impact of each package across the years
> 2. The number of subscribers to some of the listservs
> 3. How popular each package is among Google searches across the years
> 4. Survey results from a Rexer Analytics poll
> 5. Survey results from a KDnuggets poll
> 6. A rudimentary analysis of the software skills that employers are seeking
>
> Thanks very much to all the folks who helped on this project, including: John Fox, Marc Schwartz, Duncan Murdoch, Martin Weiss, John (Jiangtang) HU, Andre Wielki, Kjetil Halvorsen, Dario Solari, Joris Meys, Keo Ormsby, Karl Rexer, and Gregory Piatetsky-Shapiro.
>
> If anyone can think of other angles, please let me know.
>
> Cheers,
> Bob
>
> =
> Bob Muenchen (pronounced Min'-chen), Manager
> Research Computing Support
> Voice: (865) 974-5230
> Email: muenc...@utk.edu
> Web: http://oit.utk.edu/research
> News: http://oit.utk.edu/research/news.php
> Feedback: http://oit.utk.edu/feedback/
> =
Re: [R] Different standard errors from R and other software
If I understand correctly from their website, "discrete choice" models are mostly generalized linear models with the common link functions for discrete data? Apart from a few names I didn't recognize, all the analyses seem quite "standard" to me. So I wonder why you would write the log-likelihood yourself for techniques that are implemented in R. Unless I missed something pretty important, or you want to do a specific analysis that wasn't clear to me, you should take a closer look at the possibilities in R for generalized linear (mixed) modelling and so on.

Binary choice translates to a simple glm with a logit link. Multinomial choice can be done with e.g. multinom() from the nnet package. Ordered choice can be done with polr() from the MASS package. A nice one to look at is the package mgcv, or gamm4 in the case of big datasets. They offer very flexible models that can include random terms, specific variance-covariance structures and non-linear relations in the form of splines.

Apologies if this is all obvious and known to you. In that case you might want to specify what exactly it is you are comparing and how exactly you calculated it yourself.

Cheers
Joris

On Fri, Jun 25, 2010 at 11:47 PM, Min Chen wrote:
> Hi all,
>
> Sorry to bother you. I'm estimating a discrete choice model in R using the maxBFGS command. Since I wrote the log-likelihood myself, in order to double check, I ran the same model in Limdep. It turns out that the coefficient estimates are quite close; however, the standard errors are very different. I also computed the hessian and outer product of the gradients in R using the numDeriv package, but the results are still very different from those in Limdep. Is it the routine to compute the inverse hessian that causes the difference? Thank you very much!
>
> Best wishes.
>
> Min
>
> --
> Min Chen
> Ph.D. Candidate
> Department of Agricultural, Food, and Resource Economics
> 125 Cook Hall
> Michigan State University
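A sketch of the standard calls mentioned above, on simulated data (all variable names are invented for illustration). The point of comparison with hand-rolled likelihood code is vcov(), whose diagonal gives the squared standard errors from the inverse Hessian.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Binary choice: logistic regression via glm()
y_bin   <- rbinom(n, 1, plogis(0.5 * x))
fit_bin <- glm(y_bin ~ x, family = binomial(link = "logit"))
sqrt(diag(vcov(fit_bin)))        # standard errors of the coefficients

# Ordered choice: proportional-odds model via MASS::polr()
library(MASS)
y_ord   <- cut(x + rnorm(n), breaks = 3,
               labels = c("low", "mid", "high"), ordered_result = TRUE)
fit_ord <- polr(y_ord ~ x, Hess = TRUE)   # Hess = TRUE keeps the Hessian for SEs
summary(fit_ord)$coefficients
```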
Re: [R] Popularity of R, SAS, SPSS, Stata...
>> I had taken the opposite tack with Google Trends by subtracting keywords like:
>> SAS -shoes -airlines -sonar...
>> but never got as good results as that beautiful "X code for" search. When you see the end-of-semester panic bumps in traffic, you know you're nailing it!
>
> I have to eat those words already. The "R code for" search that showed a peak every December did not have quotes around it, so it was searching for those three words, not the complete phrase. When you add the quotes, the peaks vanish.

Don't swallow! You're looking through search terms, not through web pages. "R code for regression", "regression code R" etc. are all valid searches, no quotation marks needed.

http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlab&cmpt=q

This one is nice too. You can see that the bump in the autumn semester for R is replacing the one for Matlab. Then in the spring semester Matlab stays high but R drops. And both the US and India always have a very large search index, whereas the rest of the world barely registers. Which leads me to the conclusion that: 1) the results are probably coming from google.com, excluding local versions, and 2) in the US (and India), statistics is mainly taught in the autumn semester. Given the fact that daylight has a beneficial effect on emotional well-being, the unpopularity of statistics is likely caused by unfortunate scheduling.

Forget Excel. Google rocks! ;-)

Cheers
Joris

> Once you go the phrase route, you gain precision but end up with zero counts on various phrases. I avoided that by combining them with "+" to get enough to plot.
> The resulting graph shows SAS dominant until mid-2006, when SPSS takes the top position, followed by R, SAS, Stata in order:
>
> http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22&cmpt=q
>
> This might be a good one to add to http://r4stats.com/popularity
>
> Bob
>
>> I see that there's a car, the R Code Mustang, that adding "for" gets rid of.
>>
>> Thanks for getting me back on a topic that I had given up on!
>>
>> Bob
>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys
>>> Sent: Thursday, June 24, 2010 7:56 PM
>>> To: Dario Solari
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>>>
>>> Nice idea, but quite sensitive to search terms, if you compare your result on "... code" with "... code for":
>>> http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q
>>>
>>> On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari wrote:
>>>> First: excuse for my english
>>>>
>>>> My opinion: a useful font for measuring "popoularity" can be Google Insights for Search - http://www.google.com/insights/search/#
>>>>
>>>> Every person using a software like R, SAS, SPSS needs first to learn it. So probably he make a web-search for a manual, a tutorial, a guide. One can measure the share of this kind of serach query. This kind of results can be useful to determine trends of "popularity".
>>>>
>>>> Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide", "SPSS tutorial/manual/guide"
>>>> http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q
>>>>
>>>> Example 2: "R software", "SAS software", "SPSS software"
>>>> http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20
Re: [R] All a column to a data frame with a specific condition
merge(sum_plan_a, data, by="plan_a")

Cheers
Joris

On Sat, Jun 26, 2010 at 2:27 AM, Yi wrote:
> Hi, folks,
>
> Please first look at the code:
>
> plan_a=c('apple','orange','apple','apple','pear','bread')
> plan_b=c('bread','bread','orange','bread','bread','yogurt')
> value=1:6
> data=data.frame(plan_a,plan_b,value)
> library(plyr)
> library(reshape)
> mm=melt(data, id=c('plan_a','plan_b'))
> sum_plan_a=cast(mm,plan_a~variable,sum)
>
> ### I would like to add a new column to the data.frame named 'data', with the same sum of value for the same type of plan_a
> ### The result should come up like this:
>
>   plan_a plan_b value sum_plan_a
> 1 apple  bread  1     8
> 2 orange bread  2     2
> 3 apple  orange 3     8
> 4 apple  bread  4     8
> 5 pear   bread  5     5
> 6 bread  yogurt 6     6
>
> Any tips?
>
> Thank you.
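A self-contained sketch of the merge() answer above, using the poster's own data but computing the per-group sum with base R's aggregate() instead of reshape's melt/cast, so no extra packages are needed.

```r
plan_a <- c('apple','orange','apple','apple','pear','bread')
plan_b <- c('bread','bread','orange','bread','bread','yogurt')
dat    <- data.frame(plan_a, plan_b, value = 1:6)

# Per-group sums of value, then merge them back onto the original rows
sums        <- aggregate(value ~ plan_a, data = dat, FUN = sum)
names(sums) <- c("plan_a", "sum_plan_a")
merged      <- merge(dat, sums, by = "plan_a")
merged
# every apple row gets sum_plan_a = 8 (1 + 3 + 4), pear gets 5, and so on
```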
Re: [R] Average 2 Columns when possible, or return available value
Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. It can be handy with big datasets.

I got curious, so I just ran the code given here on a big dataset, before and after removing the NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself.

DF <- data.frame( A=rep(DF$A,1), B=rep(DF$B,1) )

> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46
> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03
> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+   na.rm=TRUE)[,-1])))
Timing stopped at: 227.84 1.03 249.31   -- I got impatient and pressed escape

> DF <- DF[complete.cases(DF),]
> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39
> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02
> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+   na.rm=TRUE)[,-1])))
   user  system elapsed
  10.01    0.07   13.40

Cheers
Joris

On Sat, Jun 26, 2010 at 1:08 AM, emorway wrote:
> Forum,
>
> Using the following data:
>
> DF<-read.table(textConnection("A B
> 22.60 NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> 102.00 NA
> 19.20 NA
> 19.20 NA
> NA NA
> NA NA
> NA NA
> 11.80 NA
> 7.62 NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> 75.00 NA
> NA NA
> 18.30 18.2
> NA NA
> NA NA
> 8.44 NA
> 18.00 NA
> NA NA
> 12.90 NA"),header=T)
> closeAllConnections()
>
> The second column is a duplicate reading of the first column, and when two values are available, I would like to average columns 1 and 2 (example code below). But if there is only one reading, I would like to retain it; I haven't found a good way to exclude NA's using the following code:
>
> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>
> Currently, row 24 is the only row with a returned value. I'd like the result to return column "A" if it is the only available value, and average where possible. Of course, if both columns are NA, NA is the only possible result.
>
> The result I'm after would look like this (row 24 is an avg):
>
> 22.60
> NA
> NA
> NA
> NA
> NA
> NA
> NA
> 102.00
> 19.20
> 19.20
> NA
> NA
> NA
> 11.80
> 7.62
> NA
> NA
> NA
> NA
> NA
> 75.00
> NA
> 18.25
> NA
> NA
> 8.44
> 18.00
> NA
> 12.90
>
> This is a small example from a much larger data frame, so if you're wondering what the deal is with list(), that will come into play for the larger problem I'm trying to solve.
>
> Respectfully,
> Eric
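A minimal sketch of both points above on a cut-down version of the poster's data: rowMeans() averages the duplicate readings quickly and keeps a single reading where only one exists, and complete.cases() drops the rows containing NA.

```r
DF  <- data.frame(A = c(22.60, NA, 102.00, 18.30),
                  B = c(NA,    NA, NA,     18.2))

# Average the two readings; rows that are all NA give NaN, so recode those
avg <- rowMeans(DF, na.rm = TRUE)
avg[is.nan(avg)] <- NA
avg                          # 22.60  NA  102.00  18.25

DF[complete.cases(DF), ]     # only the fully observed rows remain
```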
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
Thanks for the link, very interesting book. Yet, I couldn't find the part about EDMA. It would have surprised me anyway, as the input of multidimensional scaling is one matrix with euclidean distances between your observations, whereas in EDMA the data consist of a number of distance matrices. Quite a different thing if you ask me. Neither cmdscale nor isoMDS or its derived functions (e.g. metaMDS in the vegan package) is going to be of any help. Now that I come to think of it, vegan has a procrustes function, but I'm not sure whether it is general enough to be of use in EDMA.

Cheers
Joris

On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen wrote:
> There is a freely downloadable and very relevant (& readable) book at https://ccrma.stanford.edu/~dattorro/mybook.html, "Convex Optimization and Euclidean Distance Geometry", and it indeed names EDMA as a form of multidimensional scaling (or maybe the other way around). You should have a look at the code for multidimensional scaling in R.
>
> Kjetil
>
> On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu wrote:
>> thanks for your interest Joris
>>
>> Gokhan OCAKOGLU
>> Uludag University
>> Faculty of Medicine
>> Department of Biostatistics
>> http://www20.uludag.edu.tr/~biostat/ocakoglui.htm
Re: [R] Wilcoxon signed rank test and its requirements
2010/6/25 Frank E Harrell Jr :
> The central limit theorem doesn't help. It just addresses type I error, not power.
>
> Frank

I don't think I stated otherwise. I am aware of the fact that the Wilcoxon has an asymptotic relative efficiency greater than 1 compared to the t-test in the case of skewed distributions. Apologies if I caused more confusion.

The "problem" with the Wilcoxon is twofold, as far as I understood this data correctly:
- there are quite some ties
- the Wilcoxon assumes under the null that the distributions are the same, not only the location. The influence of unequal variances and/or shapes of the distribution is enhanced in the case of unequal sample sizes.

The central limit theorem means that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary.

Cheers
Joris

> On 06/25/2010 04:29 AM, Joris Meys wrote:
>> As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:
>>
>> x<-rnorm(250)
>> hist(x,freq=F)
>> lines(density(x),col="red")
>>
>> See also this ppt, a very nice and short introduction to graphics in R:
>> http://csg.sph.umich.edu/docs/R/graphics-1.pdf
>>
>> 2010/6/25 Atte Tenkanen:
>>> Is there anything for me?
>>>
>>> There is a lot of data, n=2418, but there are also a lot of ties.
>>> My sample n≈250-300
>>
>> You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.
>>
>>> I would like to test whether the mean of the sample differs significantly from the population mean.
>>
>> According to probability theory, this will be the case in 5% of the samples if you repeat your sampling infinitely. But as David asked: why on earth do you want to test that?
>>
>> cheers
>> Joris

--
Frank E Harrell Jr, Professor and Chairman, School of Medicine
Department of Biostatistics, Vanderbilt University
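A sketch of the two comparisons discussed above, on simulated skewed data with ties (all numbers invented for illustration). t.test() defaults to the Welch version, which allows unequal variances; wilcox.test() is the Wilcoxon rank-sum test, which with many ties falls back on a normal approximation.

```r
set.seed(42)
x <- round(rexp(250, rate = 1.0), 1)   # skewed; rounding creates ties
y <- round(rexp(300, rate = 1.2), 1)   # different sample size and scale

t.test(x, y)        # Welch two-sample t-test (unequal variances allowed)
wilcox.test(x, y)   # Wilcoxon rank-sum, normal approximation with ties
```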
Re: [R] i want create script
Please read the posting guide: http://www.R-project.org/posting-guide.html
Your question is very vague. One could assume you're completely new to R and want the commands to read a csv file (see ?read.csv) and to write away a table (e.g. ?write.table to write your predicted data in a text format). My guess is that you want to run this script in the shell without having to open R, similar to a perl script. For this, take a look at:
http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R
http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing

Cheers
Joris

On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi wrote:
> Hi R community,
> I want to create a script which will take a .csv table as input, do some prediction, and write the output to a file. The input is an Excel sheet containing some tables of data; the output should be a table of predicted data. Will someone help me in this regard...
> Thanks in advance.
>
> I am using R on Windows. Please advise on the procedure to create an Rscript.
>
> Regards
> Vijay
> Research student
> Bangalore
> India
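A sketch of such a command-line script: save something like the main() part below as, say, predict.R and run it from the shell as `Rscript predict.R input.csv output.txt`. The file names and the "prediction" (a plain lm fit) are placeholders; the demo at the bottom exercises the same function with temporary files so the snippet is self-contained.

```r
# Command-line entry point: args[1] = input csv, args[2] = output file
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  dat <- read.csv(args[1])
  dat$predicted <- fitted(lm(y ~ x, data = dat))   # placeholder model
  write.table(dat, file = args[2], row.names = FALSE, sep = "\t")
}

# Self-contained demo using temporary files instead of real shell arguments
tmp_in  <- tempfile(fileext = ".csv")
tmp_out <- tempfile(fileext = ".txt")
write.csv(data.frame(x = 1:10, y = (1:10) + rnorm(10)),
          tmp_in, row.names = FALSE)
main(c(tmp_in, tmp_out))
file.exists(tmp_out)   # TRUE
```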
Re: [R] Optimizing given two vectors of data
optim() uses vectors of _parameters_, not of data. You supply a (likelihood) function, give initial values for the parameters, and get the optimized parameters back. Extra data, such as your two vectors, can be passed on to your function through the ... argument of optim. See ?optim and the examples therein; they include an optimization using multiple data columns.

Cheers
Joris

On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul wrote:
> I am trying to estimate an Arrhenius-exponential model in R. I have one vector of data containing failure times, and another containing corresponding temperatures. I am trying to optimize a maximum likelihood function given BOTH these vectors. However, the optim command takes only one such vector parameter.
>
> How can I pass both vectors into the function?
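A sketch of the mechanism: optim() forwards any extra named arguments to the objective function, so both data vectors simply become arguments of the log-likelihood. The model below is a toy stand-in with a temperature-dependent exponential rate, not the poster's actual Arrhenius-exponential likelihood; all numbers are invented.

```r
# Negative log-likelihood with two data vectors passed in as arguments
negll <- function(par, times, temp) {
  rate <- exp(par[1] + par[2] * temp)          # toy temperature-dependent rate
  -sum(dexp(times, rate = rate, log = TRUE))   # negative log-likelihood
}

set.seed(7)
temp  <- runif(100, 300, 400)
times <- rexp(100, rate = exp(-3 + 0.005 * temp))

# The named arguments 'times' and 'temp' are passed through optim's '...'
fit <- optim(par = c(0, 0), fn = negll, times = times, temp = temp)
fit$par   # estimates of the two parameters
```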
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
I've been looking around myself, but I couldn't find any. Maybe somebody will chime in to direct you to the correct places. I also checked the papers, and it seems not too hard to implement. If I find some time, I'll take a look at it next week.

For the other two gentlemen, check:
http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf
http://www.getahead.psu.edu/PDF/no.1.pdf

Cheers
Joris

On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu wrote:
> In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate-free approach to the analysis of form using landmark data, which was developed by Subhash Lele and Joan Richtsmeier. They also developed a computer program (http://www.getahead.psu.edu/comment/edma.asp) that allows one to perform several techniques, including EDMA I-II, but I wonder whether there is a package or code available in R to perform EDMA...
>
> thanks
Re: [R] Wilcoxon signed rank test and its requirements
As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:

x<-rnorm(250)
hist(x,freq=F)
lines(density(x),col="red")

See also this ppt, a very nice and short introduction to graphics in R:
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300

You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed.

> I would like to test whether the mean of the sample differs significantly from the population mean.

According to probability theory, this will be the case in 5% of the samples if you repeat your sampling infinitely. But as David asked: why on earth do you want to test that?

cheers
Joris
Re: [R] Simple qqplot question
Sorry, missed the two-variable thing. Go with the lm solution then, and you can tweak the plot yourself (the prediction intervals are easily obtained via predict(lm.object, interval="prediction")). The function qq.plot uses robust regression, but in your case normal regression will do.

Regarding the shapes: this just indicates that both tails are shorter than expected, so you have a kurtosis smaller than 3 (or a negative excess kurtosis, depending on whether you apply the correction or not).

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B wrote:
> Short rep: I have two distributions, data and data2, each built from about 3 million data points; they appear similar when looking at densities and histograms. I plotted qqplots for further eye-balling:
>
> qqplot(data, data2, xlab = "1", ylab = "2")
>
> and get an almost perfect diagonal line, which means they are in fact very alike. Now I tried to check normality using qqnorm -- and I think I am doing something wrong here:
>
> qqnorm(data, main = "Q-Q normality plot for 1")
> qqnorm(data2, main = "Q-Q normality plot for 2")
>
> I am getting perfect S-shaped curves (??) for both distributions. Am I missing something here?
>
> [ASCII sketch of the S-shaped Q-Q curve]
>
> Thanks, Ralf
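A sketch illustrating this shape diagnosis: a light-tailed sample (uniform) gives an S-shaped normal Q-Q plot and a kurtosis below the normal distribution's value of 3. The plot goes to a throwaway PDF device so the snippet runs quietly anywhere.

```r
set.seed(1)
x <- runif(3000)            # tails shorter than the normal's
z <- scale(x)
kurt <- mean(z^4)           # roughly 1.8 for a uniform, 3 for a normal
kurt < 3                    # TRUE: platykurtic, i.e. short tails

pdf(tempfile(fileext = ".pdf"))   # throwaway graphics device
qqnorm(x); qqline(x)              # shows the S-shaped curve
invisible(dev.off())
```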
Re: [R] write a loop for tallies
It would help if you placed r <- 0, s <- 0 etc. outside the loop. The same goes for cat(...). And get rid of the sum(r), sum(s) and so on; that does nothing (r, s, ... are single numbers).

That said: see Peter Langfelder's response.

Cheers
Joris

> # see ?table for a better approach
> r<-0
> s<-0
> t<-0
> u<-0
> v<-0
>
> a<- for (i in 1:length(n)) {
> ifelse(n[i] == "3000", r <- r+1,
> ifelse(n[i] == "4000", s <- r+1,
> ifelse(n[i] == "5000", t <- r+1,
> ifelse(n[i] == "6000", u <- r+1,
> ifelse(n[i] == "7000", v <- r+1, NA)))))
> }
> cat("r = ", r, "\n")
> cat("s = ", s, "\n")
> cat("t = ", t, "\n")
> cat("u = ", u, "\n")
> cat("v = ", v, "\n")

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
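The ?table approach hinted at in the first comment of the quoted code could look like this (a sketch with made-up data):

```r
n <- c(3000, 4000, 4000, 5000, 7000, 3000)   # illustrative data
# factor() with explicit levels keeps zero counts for values that
# never occur (here 6000), matching the r/s/t/u/v tallies.
counts <- table(factor(n, levels = c(3000, 4000, 5000, 6000, 7000)))
counts
```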
Re: [R] Popularity of R, SAS, SPSS, Stata...
dangit, tab in the way... On Fri, Jun 25, 2010 at 1:56 AM, Joris Meys wrote: > Nice idea, but quite sensitive to search terms, if you compare your > result on "... code" with "... code for": > http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q This one actually shows quite nicely when students start panicking for their projects. Cheers Joris > On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari wrote: >> First: excuse for my english >> >> My opinion: a useful font for measuring "popoularity" can be Google >> Insights for Search - http://www.google.com/insights/search/# >> >> Every person using a software like R, SAS, SPSS needs first to learn >> it. So probably he make a web-search for a manual, a tutorial, a >> guide. One can measure the share of this kind of serach query. >> This kind of results can be useful to determine trends of >> "popularity". >> >> Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide", >> "SPSS tutorial/manual/guide" >> http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q >> >> Example 2: "R software", "SAS software", "SPSS software" >> http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q >> >> Example 3: "R code", "SAS code", "SPSS code" >> http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q >> >> Example 4: "R graph", "SAS graph", "SPSS graph" >> http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q >> >> Example 5: "R regression", "SAS regression", "SPSS regression" >> http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q >> >> Some example are 
cross-software (learning needs - Example 1), others can
>> be biased by the traditional use of that software (in SPSS usually you
>> don't manipulate graphs, I think)
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Popularity of R, SAS, SPSS, Stata...
Nice idea, but quite sensitive to search terms, if you compare your result on "... code" with "... code for": http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari wrote: > First: excuse for my english > > My opinion: a useful font for measuring "popoularity" can be Google > Insights for Search - http://www.google.com/insights/search/# > > Every person using a software like R, SAS, SPSS needs first to learn > it. So probably he make a web-search for a manual, a tutorial, a > guide. One can measure the share of this kind of serach query. > This kind of results can be useful to determine trends of > "popularity". > > Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide", > "SPSS tutorial/manual/guide" > http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q > > Example 2: "R software", "SAS software", "SPSS software" > http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q > > Example 3: "R code", "SAS code", "SPSS code" > http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q > > Example 4: "R graph", "SAS graph", "SPSS graph" > http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q > > Example 5: "R regression", "SAS regression", "SPSS regression" > http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q > > Some example are cross-software (learning needs - Example1), other can > be biased by the tarditional use of that software (in SPSS usually you > don't manipulate graph, i think) > > __ > R-help@r-project.org mailing list > 
https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Install package automatically if not there?
If you want to make changes more permanent, you should take a look at the "Rprofile.site" file. It is loaded when the R console starts up. You can also set your default CRAN mirror there.

Cheers
Joris

On Fri, Jun 25, 2010 at 1:32 AM, Joshua Wiley wrote:
> Hello Ralf,
>
> Glad it works for you. As far as avoiding any prompting if packages
> are out-of-date; I am not sure. It honestly seems like a risky idea
> to me to have old packages being overwritten without a user knowing.
> However, I added a few lines of code here to set the CRAN mirror, and
> revert it back to whatever it was when the function ends to make it as
> unobtrusive as possible.
>
>
> load.fun <- function(x) {
>   old.repos <- getOption("repos")
>   on.exit(options(repos = old.repos)) # this resets the repos option
>                                       # when the function exits
>   new.repos <- old.repos
>   new.repos["CRAN"] <- "http://cran.stat.ucla.edu" # set your favorite
>                                                    # CRAN mirror here
>   options(repos = new.repos)
>   x <- as.character(substitute(x))
>   if(isTRUE(x %in% .packages(all.available=TRUE))) {
>     eval(parse(text=paste("require(", x, ")", sep="")))
>   } else {
>     update.packages() # recommended before installing so that
>                       # dependencies are the latest version
>     eval(parse(text=paste("install.packages('", x, "')", sep="")))
>     eval(parse(text=paste("require(", x, ")", sep="")))
>   }
> }
>
> Best regards,
>
> Josh
>
> On Thu, Jun 24, 2010 at 3:34 PM, Ralf B wrote:
>> Thanks, works great.
>>
>> By the way, is there a way to go even further, and avoid prompting
>> (e.g. picking a particular server by default and updating without
>> further prompt)? I could understand if such a feature does not exist for
>> security reasons...
>>
>> Thanks again,
>> Ralf
>>
>> On Thu, Jun 24, 2010 at 5:12 PM, Joshua Wiley wrote:
>>> On Thu, Jun 24, 2010 at 1:51 PM, Joshua Wiley
>>> wrote:
>>>> Hello Ralf,
>>>>
>>>> This is a little function that you may find helpful.
If the package >>>> is already installed, it just loads it, otherwise it updates the >>>> existing packages and then installs the required package. As in >>>> require(), 'x' does not need to be quoted. >>>> >>>> load.fun <- function(x) { >>>> x <- as.character(substitute(x)) >>>> if(isTRUE(x %in% .packages(all.available=TRUE))) { >>>> eval(parse(text=paste("require(", x, ")", sep=""))) >>>> } else { >>>> update.packages() # recommended before installing so that >>>> dependencies are the latest version >>>> eval(parse(text=paste("install.packages('", x, "')", sep=""))) >>>> } >>>> } >>> >>> I miscopied the last line; it should be >>> >>> ### >>> load.fun <- function(x) { >>> x <- as.character(substitute(x)) >>> if(isTRUE(x %in% .packages(all.available=TRUE))) { >>> eval(parse(text=paste("require(", x, ")", sep=""))) >>> } else { >>> update.packages() # recommended before installing so that >>> dependencies are the latest version >>> eval(parse(text=paste("install.packages('", x, "')", sep=""))) >>> eval(parse(text=paste("require(", x, ")", sep=""))) >>> } >>> } >>> ### >>> >>>> >>>> HTH, >>>> >>>> Josh >>>> >>>> On Thu, Jun 24, 2010 at 12:25 PM, Ralf B wrote: >>>>> Hi fans, >>>>> >>>>> is it possible for a script to check if a library has been installed? >>>>> I want to automatically install it if it is missing to avoid scripts >>>>> to crash when running on a new machine... >>>>> >>>>> Ralf >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>>> >>>> -- >>>> Joshua Wiley >>>> Ph.D. Student >>>> Health Psychology >>>> University of California, Los Angeles >>>> >>> >>> >>> >>> -- >>> Joshua Wiley >>> Ph.D. Student >>> Health Psychology >>> University of California, Los Angeles >>> >> > > > > -- > Joshua Wiley > Ph.D. 
Student > Health Psychology > University of California, Los Angeles > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
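A more compact variant of the pattern discussed in this thread (a sketch; unlike load.fun it takes the package name as a quoted string and skips the update.packages() step, and the function name is illustrative):

```r
# Install a package only if it is missing, then attach it.
ensure.package <- function(pkg) {
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# ensure.package("MASS")
```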
Re: [R] Install package automatically if not there?
I've never seen R been Perl'd this nice before. On Thu, Jun 24, 2010 at 10:32 PM, Albert-Jan Roskam wrote: > require(pkg) || install.packages(pkg) > > Cheers!! > > Albert-Jan > > > > ~~ > > All right, but apart from the sanitation, the medicine, education, wine, > public order, irrigation, roads, a fresh water system, and public health, > what have the Romans ever done for us? > > ~~ > > --- On Thu, 6/24/10, Ralf B wrote: > > From: Ralf B > Subject: [R] Install package automatically if not there? > To: "r-help@r-project.org" > Date: Thursday, June 24, 2010, 9:25 PM > > Hi fans, > > is it possible for a script to check if a library has been installed? > I want to automatically install it if it is missing to avoid scripts > to crash when running on a new machine... > > Ralf > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple qqplot question
Also take a look at qq.plot in the package "car". Gives you exactly what you want. Cheers Joris On Fri, Jun 25, 2010 at 12:55 AM, Ralf B wrote: > More details... > > I have two distributions which are very similar. I have plotted > density plots already from the two distributions. In addition, > I created a qqplot that show an almost straight line. What I want is a > line that represents the ideal case in which the two > distributions match perfectly. I would use this line to see how much > the errors divert at different stages of the plot. > > Ralf > > > > On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick wrote: >> You are going to have to define the question a little better. Also, >> please provide a reproducible example. >> >> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B wrote: >>> I am a beginner in R, so please don't step on me if this is too >>> simple. I have two data sets datax and datay for which I created a >>> qqplot >>> >>> qqplot(datax,datay) >>> >>> but now I want a line that indicates the perfect match so that I can >>> see how much the plot diverts from the ideal. This ideal however is >>> not normal, so I think qqnorm and qqline cannot be applied. >>> >>> Perhaps you can help? >>> >>> Ralf >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> Stephen Sefick >> >> | Auburn University | >> | Department of Biological Sciences | >> | 331 Funchess Hall | >> | Auburn, Alabama | >> | 36849 | >> |___| >> | sas0...@auburn.edu | >> | http://www.auburn.edu/~sas0025 | >> |___| >> >> Let's not spend our time and resources thinking about things that are >> so little or so large that all they really do for us is puff us up and >> make us feel like gods. We are mammals, and have not exhausted the >> annoying little problems of being mammals. >> >> -K. 
Mullis >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
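For the reference line itself, independent of the car package, a minimal sketch (simulated vectors stand in for the poster's datax and datay):

```r
# Two samples standing in for the poster's data:
datax <- rnorm(1000)
datay <- rnorm(1000)

qqplot(datax, datay)
abline(0, 1, col = "red")  # the y = x line: where the points would fall
                           # if the two distributions matched perfectly
```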
Re: [R] help, bifurcation diagram efficiency
Basically, don't write loops. Think vectors, matrices,... The R Inferno of Patrick Burns contains a lot of valuable information on optimizing code : http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf Cheers Joris On Thu, Jun 24, 2010 at 7:51 PM, Tyler Massaro wrote: > Hello all - > > This code will run, but it bogs down my computer when I run it for finer and > finer time increments and more generations. I was wondering if there is a > better way to write my loops so that this wouldn't happen. Thanks! > > -Tyler > > # > > # Bifurcation diagram > > # Using Braaksma system of equations > > # We have however used a Fourier analysis > > # to get a forcing function similar to > > # cardiac action potential... > > # > > require(odesolve) > > > > # We get this s_of_t function from Maple ws > > s_of_t = function(t) > > { > > (1/10) * (( (1/2) + (1/2) * (sin((1/4)*pi*t))/(abs(sin((1/4)*pi*t * ( > 6.588315815*sin((1/4)*pi*t) - 1.697435362*sin((1/2)*pi*t) - 1.570296922*sin > ((3/4)*pi*t) + 0.3247901958*sin(pi*t) + 0.7962749105*sin((5/4)*pi*t) + > 0.07812230515*sin((3/2)*pi*t) - 0.3424877143*sin((7/4)*pi*t) - 0.1148306748* > sin(2*pi*t) + 0.1063696962*sin((9/4)*pi*t) + 0.02812403009*sin((5/2)*pi*t))) > > } > > > ModBraaksma = function(t, n, p) > > { > > > dx.dt = (1/0.01)*(n[2]-((1/2)*n[1]^2+(1/3)*n[1]^3)) > > dy.dt = -(n[1]+p["alpha"]) + 0.032 * s_of_t(t) > > list(c(dx.dt, dy.dt)) > > } > > > initial.values = c(0.1, -0.02) > > > alphamin = 0.01 > > alphamax = 0.02 > > > alphas = seq(alphamin, alphamax, by = 0.1) > > > TimeInterval = 100 > > > times = seq(0.001, TimeInterval, by = 0.001) > > > plot(1, xlim = c(alphamin, alphamax), ylim = c(0,0.3), type = "n",xlab > = "Values > of alpha", ylab = "Approximate loop size for a limit cycle", main = > "Bifurcation > Diagram") > > > for (i in 1:length(alphas)){ > > params = c(alpha=alphas[i]) > > out = lsoda(initial.values, times, ModBraaksma, params) > > X = out[,2] > > Y = out[,3] > > for(j in 200:length(times)){ > > if 
(abs(X[j]) < 0.001) { > > points(alphas[i], Y[j], pch = ".") > > } > > } > > } > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
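As an illustration of the "think vectors" advice, the innermost loop of the quoted script can be replaced by a single vectorized selection (a sketch reusing the names X, Y, times, alphas and i from the code above, so it is meant as a drop-in body for the outer loop, not a standalone script):

```r
# Instead of testing each j in 200:length(times) one at a time:
j   <- 200:length(times)
sel <- abs(X[j]) < 0.001                 # all near-zero X values at once
points(rep(alphas[i], sum(sel)), Y[j][sel], pch = ".")
```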
Re: [R] Wilcoxon signed rank test and its requirements
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius wrote:
>
> On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
>
>> I do agree that one should not rely solely on sources like Wikipedia
>> and Graphpad, although they contain a lot of valuable information.
>>
>> That said, it is not too difficult to illustrate why, in the case of
>> the one-sample signed rank test,
>
> That is a key point. I was assuming that you were using the paired sample
> version of the WSRT and I may have been misleading the OP. For the
> one-sample situation, the assumption of symmetry is needed but for the
> paired sampling version of the test, the location shift becomes the tested
> hypothesis, and no assumptions about the form of the hypothesis are made
> except that they be the same.

I believe you mean the form of the distributions. The assumption that the distributions of both samples are the same (or similar, it is a robust test) implies that the differences x_i - y_i are more or less symmetrically distributed. The key point here is that we're not talking about the distribution of the populations/samples (as done in the OP) but about the distribution of the differences. I may not have been clear enough on that one.

Cheers
Joris

> Any consideration of median or mean (which
> will be the same in the case of symmetric distributions) gets lost in the
> paired test case.
>
> --
> David.
>
>
>> the differences should be not too far
>> away from symmetrical. It just needs some reflection on how the
>> statistic is calculated. If you have an asymmetrical distribution, you
>> have a lot of small differences with a negative sign and a lot of
>> large differences with a positive sign if you test against the median
>> or mean. Hence the sum of ranks for one side will be higher than for
>> the other, eventually leading to a significant result.
>> >> An extreme example : >> >>> set.seed(100) >>> y <- rnorm(100,1,2)^2 >>> wilcox.test(y,mu=median(y)) >> >> Wilcoxon signed rank test with continuity correction >> >> data: y >> V = 3240.5, p-value = 0.01396 >> alternative hypothesis: true location is not equal to 1.829867 >> >>> wilcox.test(y,mu=mean(y)) >> >> Wilcoxon signed rank test with continuity correction >> >> data: y >> V = 1763, p-value = 0.008837 >> alternative hypothesis: true location is not equal to 5.137409 >> >> Which brings us to the question what location is actually tested in >> the wilcoxon test. For the measure of location to be the mean (or >> median), one has to assume that the distribution of the differences is >> rather symmetrical, which implies your data has to be distributed >> somewhat symmetrical. The test is robust against violations of this >> -implicit- assumption, but in more extreme cases skewness does matter. >> >> Cheers >> Joris >> >> On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius >> wrote: >>> >>> >>> You are being misled. Simply finding a statement on a statistics software >>> website, even one as reputable as Graphpad (???), does not mean that it >>> is >>> necessarily true. My understanding (confirmed reviewing "Nonparametric >>> statistical methods for complete and censored data" by M. M. Desu, >>> Damaraju >>> Raghavarao, is that the Wilcoxon signed-rank test does not require that >>> the >>> underlying distributions be symmetric. The above quotation is highly >>> inaccurate. >>> >>> -- >>> David. 
>>> >>>> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
I do agree that one should not rely solely on sources like Wikipedia and Graphpad, although they contain a lot of valuable information.

That said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test, the differences should be not too far away from symmetrical. It just needs some reflection on how the statistic is calculated. If you have an asymmetrical distribution, you have a lot of small differences with a negative sign and a lot of large differences with a positive sign if you test against the median or mean. Hence the sum of ranks for one side will be higher than for the other, eventually leading to a significant result.

An extreme example:

> set.seed(100)
> y <- rnorm(100,1,2)^2
> wilcox.test(y,mu=median(y))

 Wilcoxon signed rank test with continuity correction

data: y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

> wilcox.test(y,mu=mean(y))

 Wilcoxon signed rank test with continuity correction

data: y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in the Wilcoxon test. For the measure of location to be the mean (or median), one has to assume that the distribution of the differences is rather symmetrical, which implies your data have to be distributed somewhat symmetrically. The test is robust against violations of this -implicit- assumption, but in more extreme cases skewness does matter.

Cheers
Joris

On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius wrote:
>
>
> You are being misled. Simply finding a statement on a statistics software
> website, even one as reputable as Graphpad (???), does not mean that it is
> necessarily true. My understanding (confirmed reviewing "Nonparametric
> statistical methods for complete and censored data" by M. M.
Desu, Damaraju > Raghavarao, is that the Wilcoxon signed-rank test does not require that the > underlying distributions be symmetric. The above quotation is highly > inaccurate. > > -- > David. > >> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
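For contrast with the skewed example above (an extra illustration, not part of the original mail): with a symmetric sample, the same test centred on the sample mean behaves as expected.

```r
set.seed(100)
x <- rnorm(100, 1, 2)          # symmetric, unlike the squared sample above
wilcox.test(x, mu = mean(x))   # typically a large, non-significant p-value
```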
Re: [R] PD: ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2
Same trick:

c0<-rbind( 1, 2 , 3, 4, 5, 6, 7, 8, 9,10,11, 12,13,14,15,16,17 )
c0
c1<-rbind(10, 20 ,30,40, 50,10,60,20,30,40,50, 30,10, 0,NA,20,10.3444)
c1
c2<-rbind(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
c2
C.df<-data.frame(c0,c1,c2)
C.df

pos <- which(!is.na(C.df$c2))
idx <- sapply(2:length(pos),function(i) pos[i-1]:(pos[i]-1))
names(idx) <- sapply(2:length(pos),
    function(i) paste(C.df$c2[pos[i-1]],"-",C.df$c2[pos[i]]))
out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out

2010/6/24 Eugeniusz Kałuża :
> Dear useRs,
>
> Thanks for the advice from Joris Meys.
> Now I will try to think how to make it work for a less specific case,
> to make the problem more general.
> Then the result should be displayed for every group between non-empty
> strings in c2, i.e. not only the result for:
> #mean:
> c1        c3     c4    c5
> 20        Start1 Stop1 Start1-Stop1
> 25.48585  Start2 Stop2 Start2-Stop2
>
> but also for every group created by the space between the two closest
> strings in c2, i.e. groups that contain only series of NA, NA, NA,
> separated from time to time by one string, i.e.:
> #mean:
> c1        c3     c4     c5
> 20        Start1 Stop1  Start1-Stop1
> ..        Stop1  Start2 Stop1-Start2
> 25.48585  Start2 Stop2  Start2-Stop2
>
> i.e. to rewrite this, maybe as another, simpler version of the command,
> but also for every group created by the space between the two closest
> strings in c2, containing only series of NA, NA, NA, separated from time
> to time by one string A, NA, NA, NA, NA, B, NA, NA, NA, C, NA,NA,NA,NA,D, NA,NA
> i.e.:
> #mean:
> c1        c3 c4 c5
> 20        A  B  A-B
> ..        B  C  B-C
> 25.48585  C  D  C-D
> ...
>
>
> Looking for a more general method (function), grouping between these
> letters in c2, I will now try to study the solution proposed by Joris Meys.
> Thanks for the immediate answer
> Kaluza
>
>
>
>
> -----Original message-----
> From: Joris Meys [mailto:jorism...@gmail.com]
> Sent: Thu 2010-06-24 15:14
> To: Eugeniusz Kałuża
> Cc: r-help@r-project.org
> Subject: Re: [R] ?to calculate sth for groups defined between points in one
> variable (string), / value separating/ spliting variable into groups by i.e.
> between start, NA, NA, stop1, start2, NA, stop2
>
> On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
> wrote:
>>
>> Dear useRs,
>>
>> Thanks for any advice
>>
>> # I do not know where the examples are of how to mark groups
>> # based on signal occurrence in the additional variable: cf. variable c2,
>> # How to perform different calculations for groups defined by (split by
>> # occurrence of) c2 characteristic data
>>
>>
>> #First example of simple data
>> #mexample 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
>> c0<-rbind( 1, 2 , 3, 4, 5, 6, 7, 8, 9,10,11, 12,13,14,15,16,17 )
>> c0
>> c1<-rbind(10, 20 ,30,40, 50,10,60,20,30,40,50, 30,10, 0,NA,20,10.3444)
>> c1
>> c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
>> c2
>> C.df<-data.frame(cbind(c0,c1,c2))
>> colnames(C.df)<-c("c0","c1","c2")
>> C.df
>>
>> # preparation of form for explaining the further needed result (the next 3
>> # lines are not needed indeed, they are only to explain how to obtain the
>> # final result)
>> c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
>> c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2")
>> C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
>> colnames(C.df)<-c("c0","c1","c2","c3","c4")
>> C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
>> C.df
>>
> Now this is something
I don't get. The list "Start2-Stop2" starts way > before Start2, actually at Stop1. Sure that's what you want? >
Re: [R] How to say "if error"
You could do that using the options, eg : set.seed(1) x <- rnorm(1:10) y <- letters[1:10] z <- rnorm(1:10) warn <-getOption("warn") options(warn=2) for (i in list(x,y,z)){ cc <- try(mean(i), silent=T) if(is(cc,"try-error")) {next} print(cc) } options(warn=warn) see ?options under "warn" Cheers Joris On Thu, Jun 24, 2010 at 5:12 PM, Paul Chatfield wrote: > > On a similar issue, how can you detect a warning in a loop - e.g. the > following gives a warning, so I'd like to set up code to recognise that and > then carry on in a loop > > x<-rnorm(2);y<-c(1,0) > ff<-glm(y/23~x, family=binomial) > > so this would be incorporated into a loop that might be > > x<-rnorm(10);y<-rep(c(1,0),5) > for (i in 1:10) > {ee<-glm(y~x, family=binomial) > ff<-glm(y/23~x, family=binomial)} > > from which I would recognise the warning in ff and not those in ee, saving > results from ee and not from ff. The last bit would be easy adding a line > if(there_is_a_warning_message) {newvector<-NA} else {use results} but how do > you detect the warning message? > > Thanks all for your feedback so far, > > Paul > -- > View this message in context: > http://r.789695.n4.nabble.com/How-to-say-if-error-tp2266619p2267140.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
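An alternative sketch for the warning question, using tryCatch() directly instead of promoting warnings to errors via options(warn=2). The wrapper function name is illustrative:

```r
# Return the result of expr, or NA if evaluating it raises a warning.
fit.or.na <- function(expr) {
  tryCatch(expr, warning = function(w) {
    message("caught warning: ", conditionMessage(w))
    NA
  })
}

x <- rnorm(10); y <- rep(c(1, 0), 5)
ee <- fit.or.na(glm(y ~ x, family = binomial))     # usually no warning: kept
ff <- fit.or.na(glm(y/23 ~ x, family = binomial))  # warns, so ff becomes NA
```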
Re: [R] Averaging half hourly data to hourly
See ?aggregate Cheers J On Thu, Jun 24, 2010 at 10:45 AM, Jennifer Wright wrote: > Hi all, > > I have some time-series data in half hourly time steps and I want to be able > to either average or sum the two half hours together into hours. Any help > greatly appreciated. > > Thanks, > > Jenny > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
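For the archives, a minimal sketch of the ?aggregate route: truncate each POSIXct timestamp to the hour and average within hours (column names and timestamps are illustrative):

```r
# Six half-hourly readings spanning three hours.
ts  <- seq(as.POSIXct("2010-06-24 00:00", tz = "UTC"),
           by = "30 min", length.out = 6)
dat <- data.frame(time = ts, value = c(1, 3, 2, 4, 5, 7))

# Group by the timestamp truncated to the hour, then take the mean.
hourly <- aggregate(value ~ format(time, "%Y-%m-%d %H"), data = dat, FUN = mean)
names(hourly) <- c("hour", "mean_value")
hourly$mean_value  # 2 3 6
```

Replacing mean with sum gives hourly totals instead of hourly means.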
Re: [R] "If-error-resume-next"
http://r.789695.n4.nabble.com/How-to-say-if-error-td2266619.html#a2266619 Cheers On Thu, Jun 24, 2010 at 4:30 PM, Christofer Bogaso wrote: > In VBA there is a syntax "if error resume next" which sometimes acts as very > beneficial on ignoring error. Is there any similar kind of function/syntax > here in R as well? > > Thanks for your attention > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to say "if error"
An old-fashioned and I guess also advised-against method would be to use try() itself, eg : set.seed(1) x <- rnorm(1:10) y <- letters[1:10] z <- rnorm(1:10) for (i in list(x,y,z)){ cc <- try(sum(i), silent=T) if(is(cc,"try-error")) {next} print(cc) } Put silent=F if you want to see the error methods. See also ?try (and ?is ) Cheers Joris On Thu, Jun 24, 2010 at 3:34 PM, Duncan Murdoch wrote: > On 24/06/2010 7:06 AM, Paul Chatfield wrote: >> >> I've had a look at the conditions in base and I can't get the ones to work >> I've looked at but it is all new to me. >> For example, I can work the examples for tryCatch, but it won't print a >> finally message for me when I apply it to my model. Even if I could get >> this to work, I think it would still cause a break e.g. >> for (j in 1:10) >> {tryCatch(ifelse(j==5, stop(j), j), finally=print("oh dear"))} >> >> Thanks for the suggestion though - any others? >> > > I think you don't want to use finally, which is just code that's guaranteed > to be executed at the end. You want to catch the errors and continue. For > example, > > for (j in 1:10) > { tryCatch(ifelse(j==5, stop(j), print(j)), error=function(e) {print("caught > error"); print(e)}) } > > Duncan Murdoch > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on WLS (gls vs lm)
Thanks! I was getting confused as well, Wolf really had a point. I have to admit that this is all a bit counterintuitive. I would expect those weights to be the inverse of the variances, as GLS uses the inverse of the variance-covariance matrix. I finally found in the help files ?nlme::varWeights the needed information, after you pointed out the problem and after strolling through quite a part of the help files... Does this mean that the GLS algorithm uses 1/sd as weights (can't imagine that...), or is this one of the things in R that just are like that? Cheers Joris On Thu, Jun 24, 2010 at 3:20 PM, Viechtbauer Wolfgang (STAT) wrote: > The weights in 'aa' are the inverse standard deviations. But you want to use > the inverse variances as the weights: > > aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2 > > And then the results are essentially identical. > > Best, > > -- > Wolfgang Viechtbauer http://www.wvbauer.com/ > Department of Methodology and Statistics Tel: +31 (0)43 388-2277 > School for Public Health and Primary Care Office Location: > Maastricht University, P.O. Box 616 Room B2.01 (second floor) > 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck) > > > Original Message > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent: > Thursday, June 24, 2010 15:00 To: Joris Meys > Cc: r-help@r-project.org > Subject: Re: [R] Question on WLS (gls vs lm) > >> Indeed, they should give the same results, and hence I was worried to >> see that the results were not that same. Suffice it to look at >> standard errors and p-values: they do differ, and the differences are >> not really that small. >> >> >> Thanks, >> Stats Wolf >> >> >> >> On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys >> wrote: >>> Indeed, WLS is a special case of GLS, where the error covariance >>> matrix is a diagonal matrix. OLS is a special case of GLS, where the >>> error is considered homoscedastic and all weights are equal to 1. 
And >>> I now realized that the varIdent() indeed makes a diagonal covariance >>> matrix, so the results should be the same in fact. Sorry for missing >>> that one. >>> >>> A closer inspection shows that the results don't differ too much. The >>> fitting method differs between both functions; lm.wfit uses the QR >>> decomposition, whereas gls() uses restricted maximum likelihood. In >>> Asymptopia, they should give the same result. >>> >>> Cheers >>> Joris >>> >>> >>> On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf >>> wrote: >>>> Thanks for reply. >>>> >>>> Yes, they do differ, but does not gls() with the weights argument >>>> (correlation being unchanged) make the special version of GLS, as >>>> this sentence from the page you provided says: "The method leading >>>> to this result is called Generalized Least Squares estimation >>>> (GLS), of which WLS is just a special case"? >>>> >>>> Best, >>>> Stats Wolf >>>> >>>> >>>> >>>> On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys >>>> wrote: >>>>> Isn't that exactly what you would expect when using a _generalized_ >>>>> least squares compared to a normal least squares? GLS is not the >>>>> same as WLS. >>>>> >>>>> http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_square >>>>> s_generalized.htm >>>>> >>>>> Cheers >>>>> Joris >>>>> >>>>> On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf >>>>> wrote: >>>>>> Hi all, >>>>>> >>>>>> I understand that gls() uses generalized least squares, but I >>>>>> thought that maybe optimum weights from gls might be used as >>>>>> weights in lm (as shown below), but apparently this is not the >>>>>> case. See: >>>>>> >>>>>> library(nlme) >>>>>> f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris, >>>>>> weights = varIdent(form = ~ 1 | Species)) aa <- >>>>>> attributes(summary(f1)$modelStruct$varStruct)$weights >>>>>> f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, >>>>>> weights = aa) >>>>>> >>&
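To make Wolfgang's point concrete: the varStruct weights are 1/sd, and squaring them before handing them to lm() makes the two fits line up. A sketch (requires the nlme package used in the thread):

```r
library(nlme)

f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))

# The stored weights are inverse standard deviations; lm() expects
# inverse *variances*, so square them first.
aa <- attr(summary(f1)$modelStruct$varStruct, "weights")^2
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

# Coefficients now agree closely; any remaining difference in standard
# errors comes from REML versus weighted least squares.
max(abs(coef(f1) - coef(f2)))
```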
Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2
On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza wrote: > > Dear useRs, > > Thanks for any advices > > # I do not know where are the examples how to mark groups > # based on signal occurence in the additional variable: cf. variable c2, > # How to calculate different calculations for groups defined by (split by > occurence of c2 characteristic data) > > > #First example of simple data > #mexample 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 > c0<-rbind( 1, 2 , 3, 4, 5, 6, 7, 8, 9,10,11, 12,13,14,15,16,17 > ) > c0 > c1<-rbind(10, 20 ,30,40, 50,10,60,20,30,40,50, 30,10, > 0,NA,20,10.3444) > c1 > c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2") > c2 > C.df<-data.frame(cbind(c0,c1,c2)) > colnames(C.df)<-c("c0","c1","c2") > C.df > > # preparation of form for explaining further needed result (next 3 lines are > not needed indeed, they are only to explain how to obtain final result > c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2") > c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", > "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", > "Stop2") > C.df<-data.frame(cbind(c0,c1,c2,c3,c4)) > colnames(C.df)<-c("c0","c1","c2","c3","c4") > C.df$c5<-paste(C.df$c3,C.df$c4,sep="-") > C.df > Now this is something I don't get. The list "Start2-Stop2" starts way before Start2, actually at Stop1. Sure that's what you want? I took the liberty of showing how to get the data between start and stop for every entry, and how to apply functions to it. If you don't get the code, look at ?lapply ?apply ?grep I also adjusted your example, as you caused all variables to be factors by using the cbind in the data.frame function. Never do this unless you're really sure you have to. But I can't think of a case where that would be beneficial... ... 
C.df <- data.frame(c0,c1,c2)
C.df

# find positions
Start <- grep("Start",C.df$c2)
Stop <- grep("Stop",C.df$c2)

# create indices
idx <- apply(cbind(Start,Stop),1,function(i) i[1]:i[2])
names(idx) <- paste("Start",1:length(Start),"-Stop",1:length(Start),sep="")

# Apply the function summary and get a list back named by the interval.
out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out

If you really need to start Start2 right after Stop1, you can use a similar approach. Cheers Joris > # NEEDED RESULTS > # needed result > # for Stat1-Stop1: mean(20,30,40,50) > # for Stat2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) > #mean: > c1 c3 c4 c5 > 20 Start1 Stop1 Start1-Stop1 > 25.48585 Start2 Stop2 Start2-Stop2 > > #sum > # for Stat1-Stop1: sum(20,30,40,50) > # for Stat2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) > #sum: > c1 c3 c4 c5 > 140 Start1 Stop1 Start1-Stop1 > 280.3444 Start2 Stop2 Start2-Stop2 > > # for Stat1-Stop1: max(20,30,40,50) > # for Stat2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) > #max: > c1 c3 c4 c5 > 50 Start1 Stop1 Start1-Stop1 > 60 Start2 Stop2 Start2-Stop2 > > # place of max (in Start1-Stop1: 4 th element in gruop Start1-Stop1 > # place of max (in Start1-Stop1: 2 nd element in gruop Start1-Stop1 > > c0 c3 c4 c5 > 4 Start1 Stop1 Start1-Stop1 > 2 Start2 Stop2 Start2-Stop2 > > > Thanks for any suggestion, > Kaluza > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
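Assembling the grep/index approach from this thread into a self-contained sketch with the example data (interval definition as in the reply: each Start up to its matching Stop):

```r
c0 <- 1:17
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "Start1", NA, NA, "Stop1", NA, NA, NA, NA, NA, NA,
        "Start2", NA, NA, NA, NA, "Stop2")
C.df <- data.frame(c0, c1, c2)  # no cbind(): c1 stays numeric

start <- grep("Start", C.df$c2)
stop  <- grep("Stop",  C.df$c2)
idx <- mapply(seq, start, stop, SIMPLIFY = FALSE)  # one index vector per group
names(idx) <- paste0("Start", seq_along(idx), "-Stop", seq_along(idx))

# Any per-group statistic is then one sapply away:
grp_mean <- sapply(idx, function(i) mean(C.df$c1[i], na.rm = TRUE))
grp_sum  <- sapply(idx, function(i) sum(C.df$c1[i],  na.rm = TRUE))
grp_mean  # 35 and 14.06888
grp_sum   # 140 and 70.3444
```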
Re: [R] Question on WLS (gls vs lm)
Indeed, WLS is a special case of GLS, where the error covariance matrix is a diagonal matrix. OLS is a special case of GLS, where the error is considered homoscedastic and all weights are equal to 1. And I now realized that the varIdent() indeed makes a diagonal covariance matrix, so the results should be the same in fact. Sorry for missing that one. A closer inspection shows that the results don't differ too much. The fitting method differs between both functions; lm.wfit uses the QR decomposition, whereas gls() uses restricted maximum likelihood. In Asymptopia, they should give the same result. Cheers Joris On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf wrote: > Thanks for reply. > > Yes, they do differ, but does not gls() with the weights argument > (correlation being unchanged) make the special version of GLS, as this > sentence from the page you provided says: "The method leading to this > result is called Generalized Least Squares estimation (GLS), of which > WLS is just a special case"? > > Best, > Stats Wolf > > > > On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys wrote: >> Isn't that exactly what you would expect when using a _generalized_ >> least squares compared to a normal least squares? GLS is not the same >> as WLS. >> >> http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm >> >> Cheers >> Joris >> >> On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf wrote: >>> Hi all, >>> >>> I understand that gls() uses generalized least squares, but I thought >>> that maybe optimum weights from gls might be used as weights in lm (as >>> shown below), but apparently this is not the case. 
See: >>> >>> library(nlme) >>> f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris, weights >>> = varIdent(form = ~ 1 | Species)) >>> aa <- attributes(summary(f1)$modelStruct$varStruct)$weights >>> f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa) >>> >>> summary(f1)$tTable; summary(f2) >>> >>> So, the two models with the very same weights do differ (in terms of >>> standard errors). Could you please explain why? Are these different >>> types of weights? >>> >>> Many thanks in advance, >>> Stats Wolf >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
One way of looking at it is doing a sign test after subtraction of the mean. For symmetrical data sets, E[X-mean(X)] = 0, so you expect to have about as many values above as below zero. There is a sign test somewhere in one of the packages, but it's easily done using the binom.test as well :

> set.seed(12345)
> x1 <- rnorm(100)
> x2 <- rpois(100,2)
> binom.test((sum(x1-mean(x1)>0)),length(x1))

        Exact binomial test

data:  (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4571875 0.6591640
sample estimates:
probability of success
                  0.56

> binom.test((sum(x2-mean(x2)>0)),length(x2))

        Exact binomial test

data:  (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2755666 0.4723516
sample estimates:
probability of success
                  0.37

Cheers Joris On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen wrote: > PS. > > Mayby I can somehow try to transform data and check it, for example, using > the skewness-function of timeDate-package? > >> Thanks. What I have had to ask is that >> >> how do you test that the data is symmetric enough? >> If it is not, is it ok to use some data transformation? >> >> when it is said: >> >> "The Wilcoxon signed rank test does not assume that the data are >> sampled from a Gaussian distribution. However it does assume that the >> data are distributed symmetrically around the median. If the >> distribution is asymmetrical, the P value will not tell you much about >> whether the median is different than the hypothetical value." >> >> > On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen wrote: >> > > Hi all, >> > > >> > > I have a distribution, and take a sample of it. 
Then I compare >> that >> > sample with the mean of the population like here in "Wilcoxon signed >> >> > rank test with continuity correction": >> > > >> > >> wilcox.test(Sample,mu=mean(All), alt="two.sided") >> > > >> > > Wilcoxon signed rank test with continuity correction >> > > >> > > data: AlphaNoteOnsetDists >> > > V = 63855, p-value = 0.0002093 >> > > alternative hypothesis: true location is not equal to 0.4115136 >> > > >> > >> wilcox.test(Sample,mu=mean(All), alt = "greater") >> > > >> > > Wilcoxon signed rank test with continuity correction >> > > >> > > data: AlphaNoteOnsetDists >> > > V = 63855, p-value = 0.0001047 >> > > alternative hypothesis: true location is greater than 0.4115136 >> > > >> > > What assumptions are needed for the population? >> > >> > wikipedia says: >> > "The Wilcoxon signed-rank test is a _non-parametric_ statistical >> > hypothesis test for... " >> > it also talks about the assumptions. >> > >> > > What can we say according these results? >> > > p-value for the "less" is 0.999. >> > >> > That the p-value for less and greater seem to sum up to one, and that >> > the p-value of greater is half of that for two-sided. You shouldn't >> > ask what we can say. You should ask yourself "What was the question >> > and is this test giving me an answer on that question?" 
>> > >> > Cheers >> > Joris >> > >> > -- >> > Joris Meys >> > Statistical consultant >> > >> > Ghent University >> > Faculty of Bioscience Engineering >> > Department of Applied mathematics, biometrics and process control >> > >> > tel : +32 9 264 59 87 >> > joris.m...@ugent.be >> > --- >> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on WLS (gls vs lm)
Isn't that exactly what you would expect when using a _generalized_ least squares compared to a normal least squares? GLS is not the same as WLS. http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm Cheers Joris On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf wrote: > Hi all, > > I understand that gls() uses generalized least squares, but I thought > that maybe optimum weights from gls might be used as weights in lm (as > shown below), but apparently this is not the case. See: > > library(nlme) > f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris, weights > = varIdent(form = ~ 1 | Species)) > aa <- attributes(summary(f1)$modelStruct$varStruct)$weights > f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa) > > summary(f1)$tTable; summary(f2) > > So, the two models with the very same weights do differ (in terms of > standard errors). Could you please explain why? Are these different > types of weights? > > Many thanks in advance, > Stats Wolf > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count data with a specific range
see ?levels eg: x <- rnorm(10) y <- cut(x,c(-10,0,10)) levels(y)<-c("-10-0","0-10") cheers Joris On Thu, Jun 24, 2010 at 4:14 AM, Yi wrote: > Yeap. It works. Just to make the result more beautiful. > > One more question. > > The interval is showns as (0,10]. > > Is there a way to change it into the format 0-10? > Thanks. > > On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys wrote: >> >> see ?cut >> >> Cheers >> Joris >> >> On Thu, Jun 24, 2010 at 2:57 AM, Yi wrote: >> > I would like to prepare the data for barplot. But I only have the data >> > frame >> > now. >> > >> > x1=rnorm(10,mean=2) >> > x2=rnorm(20,mean=-1) >> > x3=rnorm(15,mean=3) >> > data=data.frame(x1,x2,x3) >> > >> > If there a way to put data within a specific range? The expected result >> > is >> > as follows: >> > range x1 x2 x3 >> > -10-0 2 5 1 (# points >> > in >> > this range) >> > 0-10 7 9 6 >> > ... >> > >> > I know the table function but I do not know how to deal with the range >> > issue. >> > >> > Thanks in advance. >> > >> > Yi >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. 
>> > >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
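Combining the two hints (?cut and ?levels) into the count table the question asked for — the bin edges here are illustrative:

```r
set.seed(42)
x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)

breaks <- c(-10, 0, 10)
# table(cut(...)) counts per interval; sapply stacks the columns.
counts <- sapply(list(x1 = x1, x2 = x2, x3 = x3),
                 function(x) table(cut(x, breaks)))
rownames(counts) <- c("-10-0", "0-10")  # relabel the (a,b] intervals
counts
```

Each column sums to its sample size as long as every value falls inside the outer breaks.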
Re: [R] count data with a specific range
see ?cut Cheers Joris On Thu, Jun 24, 2010 at 2:57 AM, Yi wrote: > I would like to prepare the data for barplot. But I only have the data frame > now. > > x1=rnorm(10,mean=2) > x2=rnorm(20,mean=-1) > x3=rnorm(15,mean=3) > data=data.frame(x1,x2,x3) > > If there a way to put data within a specific range? The expected result is > as follows: > range x1 x2 x3 > -10-0 2 5 1 (# points in > this range) > 0-10 7 9 6 > ... > > I know the table function but I do not know how to deal with the range > issue. > > Thanks in advance. > > Yi > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem to building R (datasets)
Why compile from source? 2.11.1 installs fine on XP from the binary, so that's the more obvious solution. Cheers Joris On Thu, Jun 24, 2010 at 12:39 AM, Geun Seop Lee wrote: >> >> Dear all, >>> >>> While I was trying to build R source, I found an error at datasets package >>> (there was no error before that) >>> >>> ../../../library/datasets/R/datasets is unchanged >>> Error in dir.create(Rdatadir, showWarnings = FALSE) : >>> file name conversion problem >>> Calls: -> -> dir.create >>> Execution halted >>> make[2]: *** [all] Error 1 >>> make[1]: *** [R] Error 1 >>> make: *** [all] Error 2 >>> >>> And it was caused by >>> >>> @$(ECHO) "tools:::data2LazyLoadDB(\"$(pkg)\", compress=3)" | \ >>> R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null >>> at the Makefile.win in the src/datasets directory >>> I am using Window XP and tried to compile 2.11.1 version. >>> >>> I can't imagine how I can solve this problem. Any hints or suggestions >>> will be appreciated. >>> >>> Thank you. >>> >>> Lee. >>> >> > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginning Eigen System question.
On Thu, Jun 24, 2010 at 12:41 AM, Joris Meys wrote: > Which other Linear Algebra system, and which function did you use in R? > Cheers > Joris Never mind, of course you used "eigen()"... Eigenvectors are only determined up to a constant. If I'm not mistaken (but check the help files on it), R normalizes the eigenvectors (and so does your other Linear Algebra system). This makes the eigenvectors defined up to the sign. Cheers Joris > > On Thu, Jun 24, 2010 at 12:32 AM, wrote: >> Forgive me if I missunderstand a basic Eigensystem but when I present the >> following matrix to most any other LinearAlgebra system: >> >> 1 3 1 >> 1 2 2 >> 1 1 3 >> >> I get an answer like: >> >> //$values >> //[1] 5.00e+00 1.00e+00 -5.536207e-16 >> >> //$vectors >> // [,1] [,2] [,3] >> //[1,] 0.5773503 -0.8451543 -0.9428090 >> //[2,] 0.5773503 -0.1690309 0.2357023 >> //[3,] 0.5773503 0.5070926 0.2357023 >> >> But R gives me: >> >> //$values >> //[1] 5.00e+00 1.00e+00 -5.536207e-16 >> >> //$vectors >> // [,1] [,2] [,3] >> //[1,] -0.5773503 -0.8451543 -0.9428090 >> //[2,] -0.5773503 -0.1690309 0.2357023 >> //[3,] -0.5773503 0.5070926 0.2357023 >> >> The only difference seems to be the sign on the first eigen vector. What am >> I missing? >> >> Kevin >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> > > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Applied mathematics, biometrics and process control > > tel : +32 9 264 59 87 > joris.m...@ugent.be > --- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginning Eigen System question.
Which other Linear Algebra system, and which function did you use in R? Cheers Joris On Thu, Jun 24, 2010 at 12:32 AM, wrote: > Forgive me if I missunderstand a basic Eigensystem but when I present the > following matrix to most any other LinearAlgebra system: > > 1 3 1 > 1 2 2 > 1 1 3 > > I get an answer like: > > //$values > //[1] 5.00e+00 1.00e+00 -5.536207e-16 > > //$vectors > // [,1] [,2] [,3] > //[1,] 0.5773503 -0.8451543 -0.9428090 > //[2,] 0.5773503 -0.1690309 0.2357023 > //[3,] 0.5773503 0.5070926 0.2357023 > > But R gives me: > > //$values > //[1] 5.00e+00 1.00e+00 -5.536207e-16 > > //$vectors > // [,1] [,2] [,3] > //[1,] -0.5773503 -0.8451543 -0.9428090 > //[2,] -0.5773503 -0.1690309 0.2357023 > //[3,] -0.5773503 0.5070926 0.2357023 > > The only difference seems to be the sign on the first eigen vector. What am I > missing? > > Kevin > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
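The sign ambiguity is easy to check directly: eigen() returns unit-length eigenvectors, and either sign satisfies the defining equation. A sketch with the matrix from the post:

```r
A <- matrix(c(1, 1, 1,
              3, 2, 1,
              1, 2, 3), nrow = 3)  # columns of the matrix shown above

e <- eigen(A)
v <- e$vectors[, 1]  # eigenvector for the largest eigenvalue (5)

sqrt(sum(v^2))                        # 1: columns are normalized to unit length
max(abs(A %*% v - e$values[1] * v))   # ~0: A v = lambda v, whichever sign is returned
```

Flipping the sign of v leaves both checks unchanged, which is why two routines can disagree on it.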
Re: [R] list operation
Another variation on the same theme : lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc')) set <- c('a','c') f <-function(lst,set) sapply(lst,function(x) sum(set %in% x)==length(set) ) i <- f(lst,set) names(i[i]) Doesn't serve anybody but keeps my mind fresh. For long lists, you might benefit from first calculating the length of set, to avoid having to do that n times for a list of length n. Cheers Joris On Wed, Jun 23, 2010 at 11:01 PM, Peter Alspach wrote: > Tena koe Yu > > One possibility: > > lst[sapply(lst, function(x) length(x[x%in% c('a','c')])==2)] > > HTH ... > > Peter Alspach > >> -Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- >> project.org] On Behalf Of Yuan Jian >> Sent: Thursday, 24 June 2010 1:35 a.m. >> To: r-help@r-project.org >> Subject: [R] list operation >> >> Hi, >> >> it seems a simple problem, but I can not find a clear way. >> I have a list: >> lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc')) >> > lst >> $m >> [1] "a" "b" "c" >> $n >> [1] "c" "a" >> $l >> [1] "a" "bc" >> >> how can I get list elements that include a given subset? for example, >> for given subset {'a','c'}, the answer should be 'm' and 'n'. >> >> thanks >> Yu >> >> >> >> [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
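One more variation for the same thread: all(set %in% x) states the condition directly and avoids the explicit length bookkeeping (sketch):

```r
lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
set <- c('a','c')

# Keep only the elements that contain every member of set.
names(Filter(function(x) all(set %in% x), lst))  # "m" "n"
```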
Re: [R] Probabilities from survfit.coxph:
On Wed, Jun 23, 2010 at 9:03 PM, Parminder Mankoo wrote: > Hello: > > In the example below (or for a censored data) using survfit.coxph, can > anyone point me to a link or a pdf as to how the probabilities appearing in > bold under "summary(pred$surv)" are calculated? Do these represent > acumulative probability distribution in time (not including censored time)? predicted survival. Hence the column name "survival"... Cheers Joris > > Thanks very much, > parmee > > *fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)* > *pred <- survfit(fit, newdata=data.frame(age=60))* > *summary(pred)* > > time n.risk n.event survival std.err lower 95% CI upper 95% CI > 59 26 1 *0.978* 0.0240 0.932 1.000 > 115 25 1 *0.952* 0.0390 0.878 1.000 > 156 24 1 *0.917* 0.0556 0.814 1.000 > 268 23 1 *0.880* 0.0704 0.752 1.000 > 329 22 1 *0.818* 0.0884 0.662 1.000 > 353 21 1 *0.760* 0.0991 0.588 0.981 > 365 20 1 *0.698* 0.1079 0.516 0.945 > 431 17 1 *0.623* 0.1187 0.429 0.905 > 464 15 1 *0.549* 0.1248 0.352 0.858 > 475 14 1 *0.480* 0.1267 0.286 0.805 > 563 12 1 *0.382* 0.1332 0.193 0.757 > 638 11 1 *0.297* 0.1292 0.127 0.697 > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen wrote: > Hi all, > > I have a distribution, and take a sample of it. Then I compare that sample > with the mean of the population like here in "Wilcoxon signed rank test with > continuity correction": > >> wilcox.test(Sample,mu=mean(All), alt="two.sided") > > Wilcoxon signed rank test with continuity correction > > data: AlphaNoteOnsetDists > V = 63855, p-value = 0.0002093 > alternative hypothesis: true location is not equal to 0.4115136 > >> wilcox.test(Sample,mu=mean(All), alt = "greater") > > Wilcoxon signed rank test with continuity correction > > data: AlphaNoteOnsetDists > V = 63855, p-value = 0.0001047 > alternative hypothesis: true location is greater than 0.4115136 > > What assumptions are needed for the population? wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for... " it also talks about the assumptions. > What can we say according these results? > p-value for the "less" is 0.999. That the p-value for less and greater seem to sum up to one, and that the p-value of greater is half of that for two-sided. You shouldn't ask what we can say. You should ask yourself "What was the question and is this test giving me an answer on that question?" Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
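The relation between the one-sided and two-sided p-values that the reply points out can be checked on simulated data. This is only a sketch; the exact numbers depend on the seed, and with the continuity correction the identities hold only approximately.

```r
set.seed(42)
x <- rnorm(50, mean = 0.3)

p.two <- wilcox.test(x, mu = 0, alternative = "two.sided")$p.value
p.gt  <- wilcox.test(x, mu = 0, alternative = "greater")$p.value
p.lt  <- wilcox.test(x, mu = 0, alternative = "less")$p.value

# the one-sided p-values sum to roughly 1, and the two-sided p-value
# is roughly twice the smaller one-sided value
p.gt + p.lt
p.two / min(p.gt, p.lt)
```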
Re: [R] Comparing distributions
A qqplot would indeed help. ?ks.test for more formal testing, but be aware: You should also think about what you call similar distributions. See following example : set.seed(12345) x1 <- c(rnorm(100),rnorm(150,3.3,0.7)) x2 <- c(rnorm(140,1,1.2),rnorm(110,3.3,0.6)) x3 <- c(rnorm(140,2,1.2),rnorm(110,4.3,0.6)) d1 <- density(x1) d2 <- density(x2) d3 <- density(x3) xlim <- 1.2*c(min(x1,x2,x3),max(x1,x2,x3)) ylim <- 1.2*c(0,max(d1$y,d2$y,d3$y)) op <- par(mfrow=c(1,3)) plot(d1,xlim=xlim,ylim=ylim) lines(d2,col="red") lines(d3,col="blue") qqplot(x1,x2) qqplot(x2,x3) par(op) # formal testing ks.test(x1,x2) ks.test(x2,x3) # relocate x3 x3b <- x3 - mean(x3-x2) x3c <- x3 - mean(x3-x1) # formal testing ks.test(x2,x3b) ks.test(x1,x3c) # test location t.test(x2-x1) t.test(x3-x2) t.test(x3-x1) Cheers Joris On Wed, Jun 23, 2010 at 9:33 PM, Ralf B wrote: > I am trying to do something in R and would appreciate a push into the > right direction. I hope some of you experts can help. > > I have two distributions obtained from 1 datapoints each (about > 1 datapoints each, non-normal with multi-modal shape (when > eye-balling densities) but other than that I know little about its > distribution). When plotting the two distributions together I can see > that the two densities are alike with a certain distance to each other > (e.g. 50 units on the X axis). I tried to plot a simplified picture of > the density plot below: > > > > > | > | * > | * * > | * + * > | * + + * > | * + * + + * > | * +* + * + + * > | * + * + +* > | * + +* > | * + +* > | * + + * > | * + + * > |___ > > > What I would like to do is to formally test their similarity or > otherwise measure it more reliably than just showing and discussing a > plot. Is there a general approach other than using a Mann-Whitney test > which is very strict and seems to assume a perfect match. Is there a > test that takes in a certain 'band' (e.g. 
50,100, 150 units on X) or > are there any other similarity measures that could give me a statistic > about how close these two distributions are to each other ? All I can > say from eye-balling is that they seem to follow each other and it > appears that one distribution is shifted by a amount from the other. > Any ideas? > > Ralf > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about a program
Most of the computation time is in the function qmvnorm. You can win a little bit by optimizing the code, but the gain is relatively small. You can also decrease the interval used to evaluate qmvnorm to win some speed there. As you look for the upper tail, no need to evaluate the function in negative numbers. Eg : pair_kFWER2=function(m,alpha,rho,k,pvaluessort){ library(mvtnorm) cc_z <- numeric(m) Var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T) lpart <- 1:k # first part hpart <- (k+1):m # second part # make first part of the cc_z cc_z[lpart] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper", interval=c(0,4),sigma=Var)$quantile # make second part of the cc_z cc_z[hpart] <- sapply(hpart,function(i){ qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,interval=c(0,4), tail="upper", sigma=Var)$quantile }) crit_cons <- 1-pnorm(cc_z) # does the test whether values of crit_cons[i] are smaller than # values pvaluessort[i] and returns a vector # which says at which positions that does not occur # tail takes the last position. I guess that's what you wanted out <- tail(which(!(crit_cons < pvaluessort)),1) return(out) } > system.time(replicate(100,pair_kFWER(m,alpha,rho,k,pvaluessort))) user system elapsed 5.97 0.04 6.04 > system.time(replicate(100,pair_kFWER2(m,alpha,rho,k,pvaluessort))) user system elapsed 4.43 0.00 4.44 Check whether the function works correctly. It gives the same result as the other one in my tests. Only in the case where your function returns 0, the modified one returns integer(0). Cheers Joris On Wed, Jun 23, 2010 at 2:21 PM, li li wrote: > Dear all, > I have the following program for a multiple comparison procedure. > There are two functions for the two steps. First step is to calculate the > critical values, > while the second step is the actual procedure [see below: program with two > functions]. > This works fine. However, I want to put them into one function > for the convenience > of later use [see below: program with one function]. 
Somehow the big > function works extremely > slow. For example I chose > m <- 10 > rho <- 0.1 > k <- 2 > alpha <- 0.05 > pvaluessort <- sort(1-pnorm(rnorm(10))) > > Is there anything that I did wrong? > Thank you! > Hannah > > > ##Program with two functions > ## first step > library(mvtnorm) > cc_f <- function(m, rho, k, alpha) { > > cc_z <- numeric(m) > var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T) > for (i in 1:m){ > if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper", > sigma=var)$quantile} else > {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, > tail="upper", sigma=var)$quantile} > } > cc <- 1-pnorm(cc_z) > return(cc) > } > ## second step > pair_kFWER=function(m,crit_cons,pvaluessort){ > k=0 > while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){ > k=k+1 > } > return(m-k) > } > ### > ##Program with one function ## > > pair_kFWER=function(m,alpha,rho,k,pvaluessort){ > library(mvtnorm) > cc_z <- numeric(m) > var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T) > for (i in 1:m){ > if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper", > sigma=var)$quantile} else > {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, > tail="upper", sigma=var)$quantile} > } > crit_cons <- 1-pnorm(cc_z) > > k=0 > while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){ > k=k+1 > } > return(m-k) > } > ##### > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
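The speed-up pattern used in the rewrite above — fill the constant first part of the vector once, then vectorise the rest instead of looping with if/else — can be illustrated without the mvtnorm package. Here qnorm is only a cheap stand-in for the qmvnorm() call, not a replacement for it.

```r
m <- 10; k <- 2; alpha <- 0.05
qfun <- function(p) qnorm(1 - p)   # stand-in for the expensive qmvnorm() call

# the original structure: if/else decided inside a for-loop
cc_loop <- numeric(m)
for (i in 1:m) {
  if (i <= k) cc_loop[i] <- qfun(k*(k-1)/(m*(m-1)) * alpha)
  else        cc_loop[i] <- qfun(k*(k-1)/((m-i+k)*(m-i+k-1)) * alpha)
}

# the restructured version: the first k values are identical, so compute
# them once; the remaining indices are handled in one vectorised call
cc_vec <- numeric(m)
cc_vec[1:k] <- qfun(k*(k-1)/(m*(m-1)) * alpha)
i <- (k+1):m
cc_vec[i] <- qfun(k*(k-1)/((m-i+k)*(m-i+k-1)) * alpha)

stopifnot(all.equal(cc_loop, cc_vec))   # same result, fewer calls
```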
Re: [R] (help) This is an R workspace memory processing question
http://lmgtfy.com/?q=R+large+data+sets Cheers Joris On Wed, Jun 23, 2010 at 6:38 AM, 나여나 wrote: > > Hi all! > > > This is an R workspace memory processing question. > > > There is a method which from the R will control 10GB data at 500MB units? > > > my work environment : > > R version : 2.11.1 > OS : WinXP Pro sp3 > > > Thanks and best regards. > > > Park, Young-Ju > > from Korea. > > [1][rKWLzcpt.zNp8gmPEwGJCA00] > > > [...@from=dllmain&rcpt=r%2Dhelp%40r%2Dproject%2Eorg&msgid=%3C20100623133828%2EH > M%2E0Zq%40dllmain%2Ewwl737%2Ehanmail%2Enet%3E] > > References > > 1. mailto:dllm...@hanmail.net > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
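For completeness, one standard answer to the underlying question — handling a file too big for memory by processing it in fixed-size units — is to read from an open connection: read.table() then resumes where the previous call stopped. This is only a sketch with a small temporary file standing in for the 10GB one; the chunk size (nrows) is the knob corresponding to the "500MB units".

```r
# build a small stand-in for the big file
tmp <- tempfile()
write.table(matrix(rnorm(10000), ncol = 4), tmp,
            row.names = FALSE, col.names = FALSE)

con <- file(tmp, open = "r")
totals <- NULL
repeat {
  # reading from an open connection continues after the previous chunk;
  # read.table() errors when no lines are left, which ends the loop
  chunk <- tryCatch(read.table(con, nrows = 1000), error = function(e) NULL)
  if (is.null(chunk)) break
  totals <- rbind(totals, colSums(chunk))  # keep only a small per-chunk summary
}
close(con)
nrow(totals)   # 3 chunks: 1000 + 1000 + 500 rows
```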
Re: [R] "save scores" from sem
I should have specified: lavaan is not familiar to me. I'm also not familiar enough with the sem package to tell you how to obtain the scores, but all information regarding the fit is in the object. With that, it shouldn't be too difficult to get the scores you want. This paper might give you some more information, in case you didn't know it yet : http://socserv.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf On a side note, sem with a single latent variable might be seen as a factor analysis with one component, but definitely not as a PCA. A PCA is constructed based on the total variance, rendering an orthogonal space with as many dimensions as there are variables. Not so for a FA, as the matrix used to calculate the eigenvectors and eigenvalues is a reduced matrix, in essence only taking into account part of the variation in the data for calculation of the loadings. This makes PCA absolutely defined, but FA only up to a rotation. On a second side note, using the saved scores in some subsequent analysis is in my view completely against the idea behind sem. Structural equation modelling combines those observed variables exactly to be able to take the variation on the combined latent variable into account. If you use those latent variables as input in a second analysis, you lose the information regarding the variation. Cheers Joris On Wed, Jun 23, 2010 at 9:53 AM, Steve Powell wrote: > Dear Joris, > thanks for your reply - it is the sem case which interests me. Suppose > for example I use sem to construct a CFA for a set of variables, with > a single latent variable, then this could be equivalent to a PCA with > a single component, couldn't it? From the PCA I could "save" the > scores as new variables; is there an equivalent with sem? This would > be particularly useful if e.g. in sem I let some of the errors covary > and then wanted to use the "saved scores" in some subsequent analysis. 
> > By the way, lavaan is at cran.r-project.org/web/packages/lavaan/index.html > > Best Wishes > Steve > > www.promente.org | skype stevepowell99 | +387 61 215 997 > > > > > On Tue, Jun 22, 2010 at 7:08 PM, Joris Meys wrote: >> PCA and factor analysis is implemented in the core R distribution, no >> extra packages needed. When using princomp, they're in the object. >> >> pr.c <- princomp(USArrests,scale=T) >> pr.c$scores # gives you the scores >> >> see ?princomp >> >> When using factanal, you can ask for regression scores or Bartlett >> scorse. See ?factanal. >> Mind you, you will get different -i.e. more correct- results than >> those obtained by SPSS. >> >> I don't understand what you mean with scores in the context of >> structural equation modelling. Lavaan is unknown to me. >> >> Cheers >> Joris >> >> On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell wrote: >>> Dear expeRts, >>> sorry for such a newbie question - >>> in PCA/factor analysis e.g. in SPSS it is possible to save scores from the >>> factors. Is it analogously possible to "save" the implied scores from the >>> latent variables in a measurement model or structural model e.g. using the >>> sem or lavaan packages, to use in further analyses? >>> Best wishes >>> Steve Powell >>> >>> www.promente.org | skype stevepowell99 | +387 61 215 997 >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
>>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
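The factanal() route mentioned in the quoted reply can be made concrete. The dataset and the choice of two factors here are only an illustration, not part of the original exchange.

```r
# factor scores are returned directly when 'scores' is requested in the call
vars <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
fa <- factanal(vars, factors = 2, scores = "regression")

head(fa$scores)   # one row per observation, one column per factor
dim(fa$scores)    # 32 x 2
```

scores = "Bartlett" gives the weighted least-squares alternative; see ?factanal for the difference.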
Re: [R] Mahalanobis distance
Say X is your data matrix with the variables, then you could do : X <- matrix(rnorm(2100),300,7) S <- var(X) dist <- as.dist( apply(X,1,function(i){ mahalanobis(X,i,S) } ) ) Cheers Joris On Tue, Jun 22, 2010 at 11:41 PM, yoo hoo wrote: > I am a new R user. i have a question about Mahalanobis distance. Actually i > have 300 rows and 7 columns. columns are different measurements, 300 rows are > genes. since genes can > be classified into 4 categories. i used dist() with euclidean distance and > cmdscale to do MDS plot. but find out Mahalanobis distance may be > better. how do i use Mahalanobis() to generate similar dist object which i > can use MDS plot? second question is if should i calculate mean for > every categories for every measurement first and do 4*4 distance matrix, or i > should calculate pairwise distance first and then find category > means since i only care about relative position of 4 categories in MDS > plot. Thank you very much. > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
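One caveat worth adding to the snippet above: mahalanobis() returns *squared* distances. If the goal is a classical MDS on Mahalanobis distances proper, take the square root before calling cmdscale() (a sketch):

```r
set.seed(1)
X <- matrix(rnorm(2100), 300, 7)
S <- var(X)

# apply() gives a 300 x 300 matrix of squared Mahalanobis distances
D2 <- apply(X, 1, function(i) mahalanobis(X, i, S))
D  <- as.dist(sqrt(D2))      # distances, not squared distances

mds <- cmdscale(D, k = 2)    # 2-D configuration for plotting
dim(mds)                     # 300 x 2
```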
Re: [R] How to 'understand' R functions besides reading R codes
There is: 1) read the help files 2) read the vignette (see ?vignette ) 3) look it up at www.rseek.org 4) check google 5) look at the references mentioned in the help files (or do this first if you're not familiar with the method) 6) ask here If still nothing : use another function/method. Cheers Joris On Wed, Jun 23, 2010 at 2:29 AM, Yi wrote: > Apologize for not being clearer earlier. I would like to ask again. Thank > Joris and Markleeds for response. > > Two examples: > > 1. Function 'var'. In R, it is the sum of square divided by (n-1) but not by > n. (I know this in R class) > > 2. Function 'lm'. In R, it is the residual sum of square divided by (n-2) not > by n, the same as in the least squares estimate. But the assumption > following that in the method of maximum likelihood. (I know this by looking > up the book 'Introductory Statistics with R'). > > But not all functions are explained in the books. > > I am thinking whether there is a general way to figure out how an R function > works and what is the underlying theory besides looking at the codes > (type var or lm at the R prompt). Because R codes are too difficult to > understand. > > Thanks > > Yi > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
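A few concrete base-R tools for inspecting what a function actually computes, beyond simply typing its name at the prompt:

```r
var                              # typing the name prints the function body
stats::var                       # same, for any exported function
methods("residuals")             # list the S3 methods behind a generic
getS3method("residuals", "lm")   # body of a specific, possibly hidden, method
getAnywhere("residuals.lm")      # search all loaded namespaces for a name
```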
Re: [R] Verify the linear regression model used in R ( fundamental theory)
It's normally always specified, unless in the case of least squares linear regression (lm), where it is considered obvious. Cheers Joris On Wed, Jun 23, 2010 at 1:46 AM, Yi wrote: > Hi, folks, > > As I understand, Least-squares Estimate (second-moment assumption) and the > Method of Maximum Likelihood (full distribtuion assumption) are used for > linear regression. > > I do >?lm, but the help file does not tell me the model employed in R. But > in the book 'Introductory Statistics with R', it indicates R estimate the > parameters using the method of Least-squares. However it assumes the error > is iid N(o,sigma^2). > > Am I correct? Is there any general way (like RSiteSearch() ) to find out > what the model (theory) is for specific function? Let's say how to find out > the assumption and the model used for rlm. > > Thanks > > Yi > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
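The n - 2 versus n distinction from the quoted book can be verified directly; the built-in cars dataset is used here only as an example.

```r
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)
rss <- sum(resid(fit)^2)

sqrt(rss / (n - 2))   # least-squares residual standard error ...
summary(fit)$sigma    # ... which is exactly what summary() reports
sqrt(rss / n)         # the (biased) maximum likelihood estimate is smaller
```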
Re: [R] glm
On Tue, Jun 22, 2010 at 1:00 AM, Samuel Okoye wrote: > Hi, > > I have the following data > > data1 <- data.frame(count = c(0,1,1,2,4,5,13,16,14), weeks = 1:9, > treat=c(rep("1mg",3),rep("5mg",3),rep("10mg",3))) > and I am using > > library(splines) > > to fit > > glm.m <- glm(count~bs(weeks)+as.factor(treat),family=poisson,data=data1) > > and I am interested in predicting the count variable for the weeks 10, 11 and > 12 with treat 10mg and 15mg. bad luck for you. newdat <-data.frame( weeks=rep(10:12,each=2), treat=rep(c("5mg","10mg"),times=3) ) preds <- predict(glm.m,type="response",newdata=newdat,se.fit=T) cbind(newdat,preds) gives as expected : Warning message: In bs(weeks, degree = 3L, knots = numeric(0), Boundary.knots = c(1L, : some 'x' values beyond boundary knots may cause ill-conditioned bases weeks treat fit se.fit residual.scale 1 10 5mg 5.934881 5.205426 1 2 10 10mg 12.041639 9.514347 1 3 11 5mg 4.345165 6.924663 1 4 11 10mg 8.816168 15.805171 1 5 12 5mg 2.781063 8.123436 1 6 12 10mg 5.642667 18.221007 1 Watch the standard errors on the predicted values. No, you shouldn't predict outside your data space, especially when using splines. And when interested in 15mg, well, you shouldn't put treatment as a factor to start with. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] "save scores" from sem
PCA and factor analysis are implemented in the core R distribution, no extra packages needed. When using princomp, the scores are stored in the fitted object. pr.c <- princomp(USArrests, cor=TRUE) pr.c$scores # gives you the scores see ?princomp When using factanal, you can ask for regression scores or Bartlett scores. See ?factanal. Mind you, you will get different -i.e. more correct- results than those obtained by SPSS. I don't understand what you mean with scores in the context of structural equation modelling. Lavaan is unknown to me. Cheers Joris On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell wrote: > Dear expeRts, > sorry for such a newbie question - > in PCA/factor analysis e.g. in SPSS it is possible to save scores from the > factors. Is it analogously possible to "save" the implied scores from the > latent variables in a measurement model or structural model e.g. using the > sem or lavaan packages, to use in further analyses? > Best wishes > Steve Powell > > www.promente.org | skype stevepowell99 | +387 61 215 997 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Duplicate dates in zoo objects
Oops, I was too fast. I meant : i <- length(x)- match(unique(index(x)),rev(index(x)))+1 (one has to reverse the indices again) But in any case, the aggregate-way is indeed far cleaner. Thx for pointing that out. Cheers Joris On Tue, Jun 22, 2010 at 7:24 PM, Achim Zeileis wrote: > On Tue, 22 Jun 2010, Joris Meys wrote: > >> Try this : >>> >>> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1 >> >>> x <- zoo(1:5, x.Date) >> >>> x >> >> 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 >> 1 2 3 4 5 >> >>> i <- match(unique(index(x)),rev(index(x))) >> >>> x[i] >> >> 2003-02-01 2003-02-03 2003-02-07 2003-02-14 >> 1 2 4 5 > > Note that this happens to do the right thing in this particular toy example > but not more generally. Simply adding a single observation will make it > fail. The aggregate() solution I posted previously is preferred: > > R> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1 > R> x <- zoo(1:6, x.Date) > Warning message: > In zoo(1:6, x.Date) : > some methods for "zoo" objects do not work if the index entries in > 'order.by' are not unique > R> x > 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 2003-02-15 > 1 2 3 4 5 6 > R> aggregate(x, time(x), tail, 1) > 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15 > 1 2 4 5 6 > R> i <- match(unique(index(x)),rev(index(x))) > R> x[i] > 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15 > 1 2 3 5 6 > > Best, > Z > >> Cheers >> Joris >> >> >> On Tue, Jun 22, 2010 at 4:35 PM, Research >> wrote: >>> >>> Hello, >>> >>> I have a zoo time series read from an excel file which has some dates the >>> same, such as the following example: >>> >>> 02/10/1995 4925.5 >>> 30/10/1995 4915.9 >>> 23/01/1996 4963.5 >>> 23/01/1996 5009.2 >>> 04/03/1996 5031.9 # here >>> 04/03/1996 5006.5 # here >>> 03/04/1996 5069.2 >>> 03/05/1996 5103.7 >>> 31/05/1996 5107.1 >>> 01/07/1996 5153.1 >>> 02/08/1996 5151.7 >>> >>> Is there a simple way to keep the last price of the ones that have the >>> same >>> 
dates? >>> >>> 04/03/1996 5031.9 >>> 04/03/1996 5006.5 >>> >>> i.e., keep only the "04/03/1996 5006.5" price and discard the >>> previous >>> one... Is there an implicit function that does that or do I need some >>> sort >>> of recursive algorithm? >>> >>> You can try a solution on this example (for convenience): >>> >>> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1 >>> x <- zoo(rnorm(5), x.Date) >>> >>> Zoo object has 2 prices with same dates. >>> >>> Many thanks in advance, >>> Costas >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
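Assuming the zoo package is available, there is also a one-step variant of "keep the last price per date" using duplicated() with fromLast = TRUE, which needs neither the index arithmetic nor the aggregate() call:

```r
library(zoo)

x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
x <- zoo(1:6, x.Date)

# flag every observation whose date occurs again later, keep the rest
x[!duplicated(index(x), fromLast = TRUE)]
# 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
#          1          2          4          5          6
```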
Re: [R] Duplicate dates in zoo objects
Try this : > x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1 > x <- zoo(1:5, x.Date) > x 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 1 2 3 4 5 > i <- match(unique(index(x)),rev(index(x))) > x[i] 2003-02-01 2003-02-03 2003-02-07 2003-02-14 1 2 4 5 Cheers Joris On Tue, Jun 22, 2010 at 4:35 PM, Research wrote: > Hello, > > I have a zoo time series read from an excel file which has some dates the > same, such as the following example: > > 02/10/1995 4925.5 > 30/10/1995 4915.9 > 23/01/1996 4963.5 > 23/01/1996 5009.2 > 04/03/1996 5031.9 # here > 04/03/1996 5006.5 # here > 03/04/1996 5069.2 > 03/05/1996 5103.7 > 31/05/1996 5107.1 > 01/07/1996 5153.1 > 02/08/1996 5151.7 > > Is there a simple way to keep the last price of the ones that have the same > dates? > > 04/03/1996 5031.9 > 04/03/1996 5006.5 > > i.e., keep only the "04/03/1996 5006.5" price and discard the previous > one... Is there an implicit function that does that or do I need some sort > of recursive algorithm? > > You can try a solution on this example (for convenience): > > x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1 > x <- zoo(rnorm(5), x.Date) > > Zoo object has 2 prices with same dates. > > Many thanks in advance, > Costas > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] replication time series using permutations of the each "y" values
Read the posting guide please. What's your data structure? That's quite important. As I see it, I can easily get a matrix with what you want by : x1 <- rep(c("a","b"),each=3) x2 <- rep(c("c","d","f"),times=2) x3 <- rep("g",6) x4 <- rep("h",6) result <- rbind(x1,x2,x3,x4) But that's not what you want probably. Cheers Joris 2010/6/22 Josué Polanco : > Dear All, > > I'm trying to create this: > I've these data (a, b,c, etc. are numbers) : > > 1 a b > 2 c d f > 3 g > 4 h > ... > N > > I'm trying to create the all "time series permutations" (length = N) > taking account > the possibilities for each value. For example (N=4), > > 1 a > 2 c > 3 g > 4 h > > next, > 1 b > 2 c > 3 g > 4 h > > next > 1 a > 2 d > 3 g > 4 h > > and so on. For this example, there are 2*3 different (possibilities) > time series, but > I need to do, v. gr., N = 100, and 2^14 different (possibilities) > time series. I am wondering > if somebody could give me a hint (suggestion) to make this using a > function in R. > > thank you so much > > best regards > > -- > Josue Polanco > > ______ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
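The combinatorial part of the question — one value per position, all possible combinations — is exactly what expand.grid() produces; each row below is one candidate series (a sketch, with the letters standing in for the numbers):

```r
# the ragged structure from the question as a list: one vector per position
choices <- list(pos1 = c("a", "b"),
                pos2 = c("c", "d", "f"),
                pos3 = "g",
                pos4 = "h")

series <- expand.grid(choices, stringsAsFactors = FALSE)
nrow(series)    # 2 * 3 * 1 * 1 = 6 combinations
series[1, ]     # first candidate series: one value per position
```

For the 2^14 case this is ~16k rows, which is still harmless; the row count is the product of the per-position choice counts, so it grows quickly.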
Re: [R] How do you install R package?
open the R console. type: ?install.packages. press enter. read. say "Doh, that's easy." do what you read. cheers Joris On Tue, Jun 22, 2010 at 5:38 PM, Amy Li wrote: > Hi I'm a new user. I need to install some new packages. Can someone show me? > > Amy > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
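For the archive, the workflow that `?install.packages` describes boils down to this hedged sketch ("MASS" is only an example package name, and the install line needs a working CRAN mirror):

```r
pkg <- "MASS"  # example package name -- substitute the one you need

# Install only if missing (downloads from the configured CRAN mirror),
# then attach it for the current session.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}
library(pkg, character.only = TRUE)
```

Installation happens once; `library()` must be called in every new session.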
Re: [R] Project
Presuming you're talking about Perl, I might be able to help with specific issues, but read the posting guide before you actually call upon our help. http://www.R-project.org/posting-guide.html One book I definitely can recommend is Phil Spector's Data Manipulation with R http://www.springer.com/statistics/computanional+statistics/book/978-0-387-74730-9 It covers most of what you need to know about data structures in R, as they're quite different from Perl's. Pay attention to vectorization and the use of indices in R, as both concepts differ substantially from Perl's. In essence, avoid for-loops as much as you can. One other source you might look at is The R Inferno by Patrick Burns : http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf Cheers Joris On Tue, Jun 22, 2010 at 4:35 AM, Colton Hunt wrote: > I am an intern in Virginia Beach and I am currently working on a project > that involves converting a script of Pearl to R. The code takes one input > text file and outputs six text files. If you think you can help, I will be > able to provide more detailed information. Thank you for your time! > > Colton Hunt > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
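To illustrate the vectorization point for the archive (a generic sketch, not from the original thread): the element-by-element loop a Perl programmer might write first is usually replaced by a single vectorized expression in R.

```r
x <- c(2, 4, 6, 8)

# Loop style, as one might translate Perl line by line:
y <- numeric(length(x))
for (i in seq_along(x)) {
  y[i] <- x[i]^2
}

# Idiomatic R: operate on the whole vector at once.
y2 <- x^2

identical(y, y2)   # TRUE
```

The vectorized form is both shorter and, for large vectors, much faster, because the looping happens in compiled code.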
Re: [R] {Spam?} RE: {Spam?} RE: {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?
Hi Carlo, there is no possible way you can compare both models using a classical statistical framework, be it ML, REML or otherwise. The assumptions are violated. Regarding the df, see my previous mail. In your case, I'd resort to the AIC/BIC criteria, and if prediction is the main focus, compare the predictive power of both models using a cross-validation approach. Wood also suggests in his book an MCMC approach for more difficult comparisons. Cheers Joris On Tue, Jun 22, 2010 at 1:31 AM, Carlo Fezzi wrote: > Hi Christos, > > thanks for your kind reply, I agree entirely with your interpretation. > > In the first model comparison, however, "anova" does seem to work > according to our interpretation, i.e. the "df" are equal in the two models. > My intuition is that the "anova" command does a fixed-effect test rather > than a random-effect one. These are the results I get: > > anova(f1$lme,f2$lme) >
>        Model df      AIC      BIC    logLik
> f1$lme     1  5 466.6232 479.6491 -228.3116
> f2$lme     2  5 347.6293 360.6551 -168.8146
>
> Hence I was not sure our interpretation was correct. > > On your second point regarding the model, I totally agree about the appeal of > GAMs... however, I am working in a specific application where the quadratic > function is the established benchmark and I think that testing against it > will show even more strongly the appeal of a gamm approach. Any idea of > which bases could work? > > Finally thanks for the tip regarding gamm4; unfortunately I need to fit a > bivariate smooth so I cannot use it. 
> > Best wishes, > > Carlo > > > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
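The cross-validation comparison suggested above can be sketched in base R (a hedged illustration on simulated data with lm() stand-ins; substitute the actual gamm()/lme() fits from the thread for a real comparison):

```r
set.seed(42)
n <- 100
x <- runif(n, 1, 10)
y <- x - 0.1 * x^3 + rnorm(n)   # cubic truth, as in the thread's example
dat <- data.frame(x = x, y = y)

folds <- sample(rep(1:5, length.out = n))   # random 5-fold assignment

# Out-of-fold root mean squared prediction error for a model formula.
cv_rmse <- function(form) {
  sq_err <- lapply(1:5, function(k) {
    fit  <- lm(form, data = dat[folds != k, ])
    pred <- predict(fit, newdata = dat[folds == k, ])
    (dat$y[folds == k] - pred)^2
  })
  sqrt(mean(unlist(sq_err)))
}

cv_rmse(y ~ x + I(x^2))   # the quadratic benchmark
cv_rmse(y ~ poly(x, 3))   # a richer fit, closer to the true curve
```

The model with the lower out-of-fold RMSE predicts better; the same loop applies with any fitting function that supports predict() on new data.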
Re: [R] Popularity of R, SAS, SPSS, Stata...
Hehe, You do have a point in not calling R a statistical language. It is indeed far more than that; Yet, I don't agree that statistics is done by stuffy professors. Wished it was so, but alas, last time I looked at my paycheck I had to conclude that I might be stuffy, but I'm far from being paid as a professor... Cheers Joris On Tue, Jun 22, 2010 at 11:34 AM, Patrick Burns wrote: > I'll expand my statement slightly. > > Yes, Peter, you are the archetypical > stuffy professor. The truth hurts. > > By any reasonable metric that I've > thought of my company name is at least > one-third "statistics", from which a > common (and I think correct) inference > would be that I'm not anti-statistics. > > > There are two aspects of why I think > that R should not be called a statistical > program: marketing and reality. > > Marketing > > Identifying with the most dreaded experience > in university is not so good for "sales". > (Reducing stuffiness might reduce the root > problem here.) > > Reality > > R really is used for more than statistics. > Almost all of my use of R is outside the > realm of statistics. Maybe the field of > statistics should have claim on a lot of > that, but as of now that isn't the case. > > A Fusion > > R's real competition is not SAS or SPSS, but > Excel. As Brian has pointed out before, > the vast majority of statistics is actually > done in Excel. Is Excel a statistics program? > I don't think many people think that -- neither > statisticians nor non-statisticians. > > Pat > > > On 21/06/2010 10:32, Joris Meys wrote: >> >> On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns >> wrote: >>> >>> (Statistics is what stuffy professors >>> do, I just look at my data and try to >>> figure out what it means.) >> >> Often those stuffy professors have a reason to do so. When they want >> an objective view on the data for example, or an objective measure of >> the significance of a hypothesis. But you're right, who cares about >> objectiveness these days? 
It doesn't sell you a paper, does it? >> >> Cheers >> Joris >> >> > > -- > Patrick Burns > pbu...@pburns.seanet.com > http://www.burns-stat.com > (home of 'Some hints for the R beginner' > and 'The R Inferno') > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?
On Mon, Jun 21, 2010 at 10:01 PM, Bert Gunter wrote: > > 1. This discussion probably belongs on the sig-mixed-models list. Correct, but I'll keep it here now for archive purposes. > > 2. Your claim is incorrect, I think. The structure of the random errors = model covariance can be parameterized in various ways, and one can try to > test significance of nested parameterizations (for a particular fixed > effects parameterization). Whether you can do so meaningfully -- especially in > the gamm context -- is another issue, but if you check e.g. Bates and > Pinheiro, anova for different random effects parameterizations is advocated, > and is implemented in the anova.lme() nlme function. > It is a matter of debate and interpretation. Mind you, I never said it can't be done. I just wanted to point out that in most cases, it shouldn't be done. As somebody else noted, both Wood and Pinheiro and Bates suggest testing the random components _if the fixed effects are the same_ . Yet, even then it gets difficult to formulate the correct hypothesis. In the example of Carlo, H0 is actually not correct : 1) The "7 extra random components" are not really 7 extra random components, as they are definitely not independent, but actually all part of the same spline fit. 2) According to my understanding, the degrees of freedom for a likelihood are determined by the number of parameters fitted, not the number of variables in the model. The parameters linked to the random effect are part of the variance structure, and the number of parameters to be fitted in this variance structure is not dependent on the number of knots, but on the number of smooths. Indeed, in the gamm model the variance of the parameters b for the random effect is considered to be equal for all b's related to the same smooth. 
3) it appears to me that you violate the restriction both Pinheiro/Bates and Wood put on the testing of models with LR : you can only do so if none of the variance parameters are restricted to the edge of the feasible parameter space. Again, as I see it, the model with fewer knots implies the assumption that some variance components are zero. Hence, you cannot use an LR test to compare both models. The anova.lme won't carry out the test anyway :

library(mgcv)
library(nlme)
set.seed(123)
x <- runif(100,1,10)  # regressor
b0 <- rep(rnorm(10,mean=1,sd=2),each=10)  # random intercept
id <- rep(1:10, each=10)  # identifier
y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
f1 <- gamm(y ~ s(x, k=3, bs="cr"), random = list(id=~1), method="ML" )
f2 <- gamm(y ~ s(x, k=10, bs="cr"), random = list(id=~1), method="ML" )

> anova(f1$lme,f2$lme)
       Model df      AIC      BIC    logLik
f1$lme     1  5 466.6232 479.6491 -228.3116
f2$lme     2  5 347.6293 360.6551 -168.8146

If you change the random term not related to the splines, it does carry out the test. In this case, you can test the random effects as explained by Pinheiro/Bates.

> f3 <- gamm(y ~ s(x, k=10, bs="cr"), method="ML" )
> anova(f2$lme,f3$lme)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
f2$lme     1  5 347.6293 360.6551 -168.8146
f3$lme     2  4 446.2704 456.6910 -219.1352 1 vs 2 100.6411  <.0001

As an illustration, the variance of the random effects :

> f1$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
            Xr.1
StdDev: 163.8058

 Formula: ~1 | id %in% g.1
        (Intercept) Residual
StdDev:    1.686341 2.063269

Number of Observations: 100
Number of Groups:
        g.1 id %in% g.1
          1          10

> f2$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
 Structure: pdIdnot
           Xr.11    Xr.12    Xr.13    Xr.14    Xr.15    Xr.16    Xr.17    Xr.18
StdDev: 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349
...

Clearly, the SD for all parameters b is exactly the same, and hence there is only 1 parameter in the likelihood function. So both models won't differ in df when using the ML estimation. 
This does not mean that the b-coefficients for the random effects cannot be obtained. They are just not part of the minimization in the likelihood function. Or, as Wood said about random effects : you don't estimate them, although you might want to predict them. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?
>>>>>>> Dear all, >>>>>>> >>>>>>> I am using the "mgcv" package by Simon Wood to estimate an additive >>>>>>> mixed >>>>>>> model in which I assume normal distribution for the residuals. I >>>>>>> would >>>>>>> like to test this model vs a standard parametric mixed model, such >>>>>>> as >>>>>>> the >>>>>>> ones which are possible to estimate with "lme". >>>>>>> >>>>>>> Since the smoothing splines can be written as random effects, is it >>>>>>> correct to use an (approximate) likelihood ratio test for this? >>>>>> -- yes this is ok (subject to the usual caveats about testing on the >>>>>> boundary >>>>>> of the parameter space) but your 2 example models below will have >>>>>> the >>>>>> same >>>>>> number of degrees of freedom! >>>>>> >>>>>>> If so, >>>>>>> which is the correct number of degrees of freedom? >>>>>> --- The edf from the lme object, if you are testing using the log >>>>>> likelihood >>>>>> returned by the lme representation of the model. >>>>>> >>>>>>> Sometimes the function >>>>>>> logLik() seems to provide strange results regarding the number of >>>>>>> degrees >>>>>>> of freedom (df) for the gam, for instance in the example I copied >>>>>>> below >>>>>>> the df for the "gamm" are equal to the ones for the "lme", but the >>>>>>> summary(model.gam) seems to indicate a much higher edf for the gamm. >>>>>> --- the edf for the lme representation of the model counts only the >>>>>> fixed >>>>>> effects + the variance parameters (which includes smoothing >>>>>> parameters). >>>>>> Each >>>>>> smooth typically contributes only one or two fixed effect parameters, >>>>>> with >>>>>> the rest of the coefficients for the smooth treated as random >>>>>> effects. 
>>>>>> >>>>>> --- the edf for the gam representation of the same model differs in >>>>>> that >>>>>> it >>>>>> also counts the *effective* number of parameters used to represent >>>>>> each >>>>>> smooth: this includes contributions from all those coefficients that >>>>>> the >>>>>> lme >>>>>> representation treated as strictly random. >>>>>> >>>>>> best, >>>>>> Simon >>>>>> >>>>>> >>>>>>> I would be very grateful to anybody who could point out a solution, >>>>>>> >>>>>>> Best wishes, >>>>>>> >>>>>>> Carlo >>>>>>> >>>>>>> Example below: >>>>>>> >>>>>>> >>>>>>> >>>>>>> rm(list = ls()) >>>>>>> library(mgcv) >>>>>>> library(nlme) >>>>>>> >>>>>>> set.seed(123) >>>>>>> >>>>>>> x <- runif(100,1,10) # regressor >>>>>>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10) # random intercept >>>>>>> id <- rep(1:10, each=10) # identifier >>>>>>> >>>>>>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1) # dependent variable >>>>>>> >>>>>>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" ) # >>>>>>> lme >>>>>>> model >>>>>>> >>>>>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" ) # gamm >>>>>>> >>>>>>> ## same number of "df" according to logLik: >>>>>>> logLik(f1) >>>>>>> logLik(f2$lme) >>>>>>> >>>>>>> ## much higher edf according to summary: >>>>>>> summary(f2$gam) >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> __ >>>>>>> R-help@r-project.org mailing list >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>>> PLEASE do read the posting guide >>>>>>> http://www.R-project.org/posting-guide.html and provide commented, >>>>>>> minimal, >>>>>>> self-contained, reproducible code. 
>>>>>> >>>>>> -- >>>>>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY >>>>>>> UK >>>>>>> +44 1225 386603 www.maths.bath.ac.uk/~sw283 >>>>>> >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>>> >>>> -- >>>> Joris Meys >>>> Statistical consultant >>>> >>>> Ghent University >>>> Faculty of Bioscience Engineering >>>> Department of Applied mathematics, biometrics and process control >>>> >>>> tel : +32 9 264 59 87 >>>> joris.m...@ugent.be >>>> --- >>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >>>> >>> >>> >>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Applied mathematics, biometrics and process control >> >> tel : +32 9 264 59 87 >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> > > > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ctree
You can't get them out of the plot function, but you can calculate them from the fit. This code returns a matrix with the appropriate rownames and colnames.

x1 <- treeresponse(iris.ct)
p <- matrix( unlist(unique(x1)), ncol=3, byrow=TRUE )
colnames(p) <- levels(iris$Species)
rownames(p) <- unique(where(iris.ct))
p

On Mon, Jun 21, 2010 at 3:28 PM, wrote: > > Hello, > > This is a re-submittal of a question I submitted last week, but I haven't received > any responses. > > I need to extract the probabilities used to construct the barplots > displayed as part of the graph produced by plot("ctree"). > > For example, > > library(party) > > iris.ct <- ctree(Species ~ . , data = iris) > plot(iris.ct) > > Instead of a simple example with only 4 terminal nodes, my analysis > produces 55 terminal nodes, so unless someone is really interested I'm only > using the iris data as an example. > > This is the model I am sending to ctree. The graph produced is very > informative, but I need the information from plot(Marshct). > > Marshct <- ctree(Physiogomy ~ meanAnnualDepthAve + meanAnnualDepthWetAve + > medAnnualDepthAve + medianAnnualDepthWetAve + medianAnnualDepthDryAve + > continHydroWetAve + DCHperiodAverage + DCHydroDryAve + > threeDayWaterDepthMinAve + threeDayWaterDepthMaxAve + sevenDayDryFreqAve + > sevenDayWaterDepthMaxAve + sevenDayWaterDepthMinAve + > seventeenDayWaterDepMaxAve + seventeenDayWaterDepMinAve + > thirtyoneDayWaterDepthMaxAve + wetIntensityAve + BulkDensity + LOI + TP + > TN + TC + Total_Mg, data = Marsh) > > plot(Marshct) > > > Working in Windows XP with R2.11.1 (2010-05-31) > > Thanks > Steve > > Steve Friedman Ph. D. 
> Spatial Statistical Analyst > Everglades and Dry Tortugas National Park > 950 N Krome Ave (3rd Floor) > Homestead, Florida 33034 > > steve_fried...@nps.gov > Office (305) 224 - 4282 > Fax (305) 224 - 4147 > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Return value associated with a factor
The code is not runnable, so I can't check why it goes wrong, but have you already tried as.character(save.tract$...)? Cheers Joris On Mon, Jun 21, 2010 at 3:15 PM, GL wrote: > > I am using the code below to extract census tract information. > save.tract$state, save.tract$county and save.tract$tract are returned as > factors. In the last three statements, I need to save the actual value of > the factor, but, instead, the code is yielding the position of the factor. > How do I instead return the value of the factor? > > By way of example, for Lon=-82.49574 and Lat=29.71495, the code returns > state = 1, county = 1 and tract = 161. The desired results are state = 12, > county = 001, tract = 002201. > > > #set libraries > library(UScensus2000tract) > library(gdata) > data(florida.tract) > > > #read input > dbs.in = read.delim("addresses_coded_new.txt", header = TRUE, sep = "\t", > quote="\"", dec=".") > > #instantiate output > more.columns <- data.frame( state=double(0), > county=double(0), > tract=double(0)) > > dbs.in <- cbind(dbs.in,more.columns) > > #figure out how many times to loop > j <- nrow(dbs.in) > > #loop through each lat/long and assign census tract > > for (i in 1:j) { > > index<-overlay(SpatialPoints(cbind(dbs.in$Lon[i],dbs.in$Lat[i])),florida.tract) > save.tract<-florida.tract[index,] > dbs.in$state[i] <- save.tract$state #this is returning the position > in the list instead of the value > dbs.in$county[i] <- save.tract$county #this is returning the > position in the list instead of the value > dbs.in$tract[i] <- save.tract$tract #this is returning the position > in the list instead of the value > } > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Return-value-associated-with-a-factor-tp2262605p2262605.html > Sent from the R help mailing list archive at Nabble.com. 
> > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Replacing elements of a list over a certain threshold
You shouldn't use sapply/lapply for this; use the indices instead:

> set.seed(1)
> r <- round(runif(10,1,10))
> threshold <- 5
> head(r)
[1] 3 4 6 9 3 9
> system.time( r[ r>threshold ] <- threshold )
   user  system elapsed
      0       0       0
> head(r)
[1] 3 4 5 5 3 5

On Mon, Jun 21, 2010 at 2:03 PM, Christos Argyropoulos wrote: > > Hi, > You should use sapply/lapply for such operations. > >> r<-round(runif(10,1,10)) >> head(r) > [1] 3 7 6 3 2 8 >> filt<-function(x,thres) ifelse(x < thres, x, thres) >> system.time(r2<-sapply(r,filt,thres=5)) > user system elapsed > 3.36 0.00 3.66 >> head(r2) > [1] 3 5 5 3 2 5 > > > To return a list, replace "sapply" with "lapply" > > Christos > > > >> Date: Mon, 21 Jun 2010 11:34:01 +0100 >> From: ja...@ipec.co.uk >> To: r-help@r-project.org >> Subject: [R] Replacing elements of a list over a certain threshold >> >> Dear List, >> >> I have a list of length ~1000 filled with numerics. I need to replace >> the elements of this list that are above a certain numerical threshold >> with the value of the threshold. >> >> e.g. >> example=list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1) >> threshold=5 >> >> example=(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1). >> >> I have written a crude script that achieves this but it's very slow. Is >> there a way to do this using some R function? >> >> Crude script: http://pastebin.com/3KSfi8nD >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > _ > Hotmail: Free, trusted and rich email service. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
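For the archives: base R also has a dedicated vectorized function for this capping operation, pmin(), which avoids even the explicit index assignment from the thread above:

```r
r <- c(3, 4, 6, 9, 3, 9)

# pmin() takes the element-wise minimum of its arguments, recycling
# the scalar threshold across the vector, so values above 5 become 5.
capped <- pmin(r, 5)
capped   # 3 4 5 5 3 5
```

For a list rather than a vector, `unlist()` it first (or use `lapply(example, pmin, 5)` to keep the list shape).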