Re: [R] R help on loops
Please, please do not use html formatted mail. It is clearly requested by the mailing list The code of your last mail is a mess and when replying it becomes even more of a mess. I told you to do siml[g,] - optm(perm=20) See the comma after g. and not what you are now doing with siml[g]-optm(perm=20)[g] I'm giving up. Berend On 07-06-2013, at 12:15, Laz lmra...@ufl.edu wrote: Hi, I am almost getting there, but still have errors. Thanks for your help. I have tried improving but I get the following errors below: itn-function(it){ + siml-matrix(NA,ncol=5,nrow=it) + for(g in 1:it){ + siml[g]-optm(perm=20)[g] + } + siml + } itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8873775898 NA NA NA NA [2,] 0.0015584824 NA NA NA NA [3,] 0.0001414317 NA NA NA NA itn-function(it){ + siml-matrix(NA,ncol=5,nrow=it) + for(g in 1:it){ + siml[g]-optm(perm=20) + } + siml + } itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8880941 NA NA NA NA [2,] 0.8869727 NA NA NA NA [3,] 0.8877045 NA NA NA NA Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length I expect something close to average sd se min max 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167 0.884659 0.0004215379 0.000410769 0.2342 0.676307 0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337 Thanks fpr you help. On 6/7/2013 5:24 AM, Berend Hasselman wrote: On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote: Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 1. Not reproducible code. Where does function Rspatswap come from? 2. You have several errors in function itn: Argument it is a scalar: length(it) is 1. You need to do siml - matrix(NA,ncol=5,nrow=it) Next in the g-loop you want to fill row g so do: siml[g,] - ….. Finally why are you doing siml - cbind(siml=siml)? Seems superfluous to me. Delete the line. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help on loops
Dear R Users, I created a function which returns a value every time it's run (A simplified toy example is attached on this mail). For example spat(a,b,c,d) # run the first time and it gives you ans1= 10, perm=1 ; run the second time and gives you ans2= 7, perm=2 etc I store both the result and the iteration on a matrix called vector with columns:1==ans, column2==permutation The rule I want to implement is: compare between ans1 and ans2. If ans1ans2, keep both ans1 and ans2. if ans1ans2, drop the whole row of the second permutation (that is drop both ans2 and perm2 but continue counting all permutations). Re-run the function for the 3rd time and repeat comparison between the value of the last run and the current value obtained. Return matrix ans with the saved ans and their permutations and another full matrix with all results including the dropped ans and their permutation numbers. I need to repeat this process 1000 times but I am getting stuck below. See attached R code. The code below works only for the first 6 permutations. suppose I want 1000 permutations, where do I keep the loop? Example: Not a perfect code though it somehow works: testx-function(perm){ optA-c() #while(perm=2){ for (k in 1:perm){ #repeat { optA[k]-sample(1:1000,1,replace=TRUE) perm[k]-k } mat2-as.matrix(cbind(optA=optA,perm=perm)) result-mat2 lenm-nrow(result) if(result[1,1]=result[2,1]) result-result[1,] return(list(mat2=mat2,result=result)) #} if(perm2){ mat2-as.matrix(cbind(optA=optA,perm=perm)) result-mat2 lenm-nrow(result) if(result[1,1]=result[2,1]) result-result[1,] if(result[lenm-1,1]=result[lenm,1]) result-result[-lenm,] if(result[(lenm-2),1]=result[(lenm-1),1]) result-result[-(lenm-1),] if(result[(lenm-3),1]=result[(lenm-2),1]) result-result[-(lenm-2),] if(result[(lenm-4),1]=result[(lenm-3),1]) result-result[-(lenm-3),] if(result[(lenm-5),1]=result[(lenm-4),1]) result-result[-(lenm-4),] return(list(mat2=mat2,result=result)) } } ## Now calling my function below: testx(perm=6) $mat2 optA perm [1,] 2721 [2,] 8582 [3,] 8343 [4,] 8624 [5,] 6505 [6,] 4056 $result optA perm 2721 testx(perm=2) $mat2 optA perm [1,] 3981 [2,] 8162 $result optA perm 3981 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
Is this what you want? I was not clear on your algorithm, but is looks like you wanted descending values: testx - + function(n, verbose = FALSE) + { + mat - cbind(optA = sample(n, n, TRUE), perm = seq(n)) + if (verbose){ + cat(***starting matrix\n) + print(mat) + } + # iterate till done + while(nrow(mat) 1){ + high - diff(mat[, 'optA']) 0 + if (!any(high)) break # done + # find which one to delete + delete - which.max(high) + 1L + mat - mat[-delete, ] + } + mat + } testx(20, verbose = TRUE) ***starting matrix optA perm [1,] 131 [2,] 122 [3,]73 [4,] 104 [5,] 115 [6,]46 [7,] 117 [8,]28 [9,]69 [10,]5 10 [11,]6 11 [12,] 18 12 [13,]9 13 [14,] 16 14 [15,] 18 15 [16,]9 16 [17,]2 17 [18,]7 18 [19,] 15 19 [20,]7 20 optA perm [1,] 131 [2,] 122 [3,]73 [4,]46 [5,]28 [6,]2 17 On Fri, May 31, 2013 at 2:02 PM, Laz lmra...@ufl.edu wrote: Dear R Users, I created a function which returns a value every time it's run (A simplified toy example is attached on this mail). For example spat(a,b,c,d) # run the first time and it gives you ans1= 10, perm=1 ; run the second time and gives you ans2= 7, perm=2 etc I store both the result and the iteration on a matrix called vector with columns:1==ans, column2==permutation The rule I want to implement is: compare between ans1 and ans2. If ans1ans2, keep both ans1 and ans2. if ans1ans2, drop the whole row of the second permutation (that is drop both ans2 and perm2 but continue counting all permutations). Re-run the function for the 3rd time and repeat comparison between the value of the last run and the current value obtained. Return matrix ans with the saved ans and their permutations and another full matrix with all results including the dropped ans and their permutation numbers. I need to repeat this process 1000 times but I am getting stuck below. See attached R code. The code below works only for the first 6 permutations. suppose I want 1000 permutations, where do I keep the loop? Example: Not a perfect code though it somehow works: testx-function(perm){ optA-c() #while(perm=2){ for (k in 1:perm){ #repeat { optA[k]-sample(1:1000,1,replace=TRUE) perm[k]-k } mat2-as.matrix(cbind(optA=optA,perm=perm)) result-mat2 lenm-nrow(result) if(result[1,1]=result[2,1]) result-result[1,] return(list(mat2=mat2,result=result)) #} if(perm2){ mat2-as.matrix(cbind(optA=optA,perm=perm)) result-mat2 lenm-nrow(result) if(result[1,1]=result[2,1]) result-result[1,] if(result[lenm-1,1]=result[lenm,1]) result-result[-lenm,] if(result[(lenm-2),1]=result[(lenm-1),1]) result-result[-(lenm-1),] if(result[(lenm-3),1]=result[(lenm-2),1]) result-result[-(lenm-2),] if(result[(lenm-4),1]=result[(lenm-3),1]) result-result[-(lenm-3),] if(result[(lenm-5),1]=result[(lenm-4),1]) result-result[-(lenm-4),] return(list(mat2=mat2,result=result)) } } ## Now calling my function below: testx(perm=6) $mat2 optA perm [1,] 2721 [2,] 8582 [3,] 8343 [4,] 8624 [5,] 6505 [6,] 4056 $result optA perm 2721 testx(perm=2) $mat2 optA perm [1,] 3981 [2,] 8162 $result optA perm 3981 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 123, Issue 30
Hi all are there any R packages that include circular stats similar to Oriana (http://www.kovcomp.co.uk/oriana/newver4.html)? I am interested in looking at annual patterns of bat activity where data will have date/times and relative abundance values for each Date. I would like to have a circular plot with the circumference axis the 12 months of the year and then a value of relative abundance and likely with ggplot2 this can be set to color= species. Tnx Bruce __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 123, Issue 30
On 05/27/2013 10:28 AM, Neotropical bat risk assessments wrote: Hi all are there any R packages that include circular stats similar to Oriana (http://www.kovcomp.co.uk/oriana/newver4.html)? I am interested in looking at annual patterns of bat activity where data will have date/times and relative abundance values for each Date. I would like to have a circular plot with the circumference axis the 12 months of the year and then a value of relative abundance and likely with ggplot2 this can be set to color= species. Tnx Bruce Hi Bruce, Here is a possibility: library(plotrix) batact-matrix(c(sin(seq(0,1.833*pi,length=12))+2+rnorm(36)/4), nrow=3,byrow=TRUE) batpos-seq(0,1.833*pi,length=12) radial.plot(batact,batpos,rp.type=ps,main=Bat activity by month, line.col=2:4,radial.lim=0:4,label.pos=batpos,labels=month.abb, point.symbols=16:18,point.col=2:4,label.prop=1.1,start=pi/2, clockwise=TRUE) legend(-3.5,0.5,paste(Species,1:3),pch=16:18,col=2:4) Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-Help: nparLD Package Non-parametric Repeated Measures
Hello, James, see my comments inline. ... Main issue/question: In R the nparLD ANOVA-type Test showed a significant p-value for diel period, no effect of season, and no interaction between diel period and season. But a post-hoc Wilcoxon Signed-Rank Test did NOT find a significant difference (p = 0.054) for diel period (day vs night) body temperature. How is it possible to have a significant effect for day vs night, based on the nparLD package, but NO significant difference between day and night for the post-hoc Wilcoxon test? Those tests -- in general -- test different hypotheses and use different test statistics, the first one, in addition, using a distributional approximation to obtain a p-value. You may want to look at the references cited in the respective online help pages for technical details. Also, if I only have two levels of the fixed effect (day vs night), do I need to run a post-hoc test or just look at the mean values after the ANOVA-type test? Hm, actually no (of course depending on your scientific question) since the ANOVA-table -- if you believe its p-value -- already tells you that there is a (marginally) significant main Diel-period-effect, i.e., a difference between day and night. However, I recommend that you consult with a local statistician since this is actually not an R-problem and since I have the impression that there exist some uncertainties in your statistical expertise. Hth -- Gerrit Data info: The repeated measurements on the 7 subjects had 2 fixed effects: 1. Diel period (day or night) 2. Season (Spring, summer, and fall)(Subplot Factor) Mean values for body temperature and for diel period are below. Diel column (D=Day, N = Night). State column (RT=Spring, RF = Summer, PT = Fall). Subject, N=7. NA = missing value. All comments (good and bad) are greatly appreciated! Thanks, James -- output of sessionInfo(): [code] data=read.csv(file.choose(), header=TRUE) attach(data) data stp diel state subject 1 26.2DRT 1 2 26.4NRT 1 3 24.1DRT 2 4NANRT 2 5NADRT 3 6 25.2NRT 3 7 27.1DRT 4 8 26.5NRT 4 9 26.9DRT 5 10 27.1NRT 5 11 26.2DRT 6 12 26.0NRT 6 13 26.3DRT 7 14 26.7NRT 7 15 26.0DRF 1 16 26.6NRF 1 17 24.2DRF 2 18 25.6NRF 2 19 25.6DRF 3 20 26.6NRF 3 21 26.1DRF 4 22 26.9NRF 4 23 27.2DRF 5 24 27.4NRF 5 25 26.2DRF 6 26 26.7NRF 6 27 27.2DRF 7 28 27.5NRF 7 29 25.0DPT 1 30 24.8NPT 1 31 NADPT 2 32 NANPT 2 33 NADPT 3 34 NANPT 3 35 26.7DPT 4 36 26.9NPT 4 37 27.6DPT 5 38 27.5NPT 5 39 25.2DPT 6 40 24.9NPT 6 41 27.1DPT 7 42 27.0NPT 7 ex.f2-ld.f2(y=stp, time1=diel, time2=state, subject=subject, time1.name=Diel, time2.name=State, description=FALSE) ex.f2$ANOVA.test Statistic dfp-value Diel 4.9028447 1.00 0.02681249 State 0.2332795 1.374320 0.70586274 Diel:State 2.1937783 1.062943 0.13717393 [/code] [code] detach(data) data=read.csv(file.choose(), header=TRUE) attach(data) data day night 1 26.2 26.4 2 26.0 26.6 3 25.0 24.8 4 24.2 25.6 5 25.6 26.6 6 27.1 26.5 7 26.1 26.9 8 26.7 26.9 9 26.9 27.1 10 27.2 27.4 11 27.6 27.5 12 26.2 26.0 13 26.2 26.7 14 25.2 24.9 15 26.3 26.7 16 27.2 27.5 17 27.1 27.0 library(coin) wilcoxsign_test(day ~ night, distribution=exact) Exact Wilcoxon-Signed-Rank Test data: y by x (neg, pos) stratified by block Z = -1.9234, p-value = 0.05482 alternative hypothesis: true mu is not equal to 0 [/code] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help: Batch read files based on names in a list
HI, You could use: (# with 3 files in my folder) lfiles-list.files(pattern=.txt) lfiles #[1] file1.txt file2.txt file3.txt lst1-lapply(lfiles,function(x) read.table(x,header=TRUE,sep=,stringsAsFactors=FALSE)) lst1 #[[1]] # col1 col2 #1 1 0.5 #2 2 0.2 #3 3 0.3 #4 4 0.3 #5 5 0.1 #6 6 0.2 # #[[2]] # col1 col3 #1 1 A #2 2 B #3 3 C # #[[3]] # col1 col4 #1 1 0.1 #2 2 0.5 #3 4 0.9 library(plyr) join_all(lst1,by=col1) # col1 col2 col3 col4 #1 1 0.5 A 0.1 #2 2 0.2 B 0.5 #3 3 0.3 C NA #4 4 0.3 NA 0.9 #5 5 0.1 NA NA #6 6 0.2 NA NA join_all(lst1,by=col1,type=inner) # col1 col2 col3 col4 #1 1 0.5 A 0.1 #2 2 0.2 B 0.5 #or Reduce(function(...) merge(...,all=TRUE),lst1) # col1 col2 col3 col4 #1 1 0.5 A 0.1 #2 2 0.2 B 0.5 #3 3 0.3 C NA #4 4 0.3 NA 0.9 #5 5 0.1 NA NA #6 6 0.2 NA NA #Suppose you don't want to read file3.txt lfilesSub-lfiles[!lfiles%in% file3.txt] lfilesSub #[1] file1.txt file2.txt A.K. - Original Message - From: Jonathan Dry dry...@gmail.com To: r-help@r-project.org Cc: Sent: Wednesday, May 15, 2013 1:51 PM Subject: [R] R help: Batch read files based on names in a list * I am currently reading in a series of files, applying the same functions to them one at a time, and then merging the resulting data frames e.g.: MyRows - c(RowA, RowB, RowC)File1_DF - read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, stringsAsFactors=FALSE, check.names=FALSE)File1_DF - as.data.frame(t(File1_DF[MyRows,]))File1_DF - as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = Row.names, by.y=row.names)File2_DF - read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, stringsAsFactors=FALSE, check.names=FALSE)File2_DF - as.data.frame(t(File2_DF[MyRows,]))File2_DF - as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = Row.names, by.y=row.names) ...etc I want to know if I can use a list of the filenames c(File1, File2, File2) etc. and apply a function to do this in a more automated fasion? This would involve using the list value in the directory path to read in the file i.e. *MyFilesValue*_DF - read.delim(DirectoryToFolders\\*MyFilesValue*_Folder\\*MyFilesValue*.txt, stringsAsFactors=FALSE, check.names=FALSE) Any help appreciated * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help: Batch read files based on names in a list
* I am currently reading in a series of files, applying the same functions to them one at a time, and then merging the resulting data frames e.g.: MyRows - c(RowA, RowB, RowC)File1_DF - read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, stringsAsFactors=FALSE, check.names=FALSE)File1_DF - as.data.frame(t(File1_DF[MyRows,]))File1_DF - as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = Row.names, by.y=row.names)File2_DF - read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, stringsAsFactors=FALSE, check.names=FALSE)File2_DF - as.data.frame(t(File2_DF[MyRows,]))File2_DF - as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = Row.names, by.y=row.names) ...etc I want to know if I can use a list of the filenames c(File1, File2, File2) etc. and apply a function to do this in a more automated fasion? This would involve using the list value in the directory path to read in the file i.e. *MyFilesValue*_DF - read.delim(DirectoryToFolders\\*MyFilesValue*_Folder\\*MyFilesValue*.txt, stringsAsFactors=FALSE, check.names=FALSE) Any help appreciated * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help: Batch read files based on names in a list
On Wed, 15 May 2013, Jonathan Dry dry...@gmail.com writes: * I am currently reading in a series of files, applying the same functions to them one at a time, and then merging the resulting data frames e.g.: MyRows - c(RowA, RowB, RowC)File1_DF - read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, stringsAsFactors=FALSE, check.names=FALSE)File1_DF - as.data.frame(t(File1_DF[MyRows,]))File1_DF - as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = Row.names, by.y=row.names)File2_DF - read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, stringsAsFactors=FALSE, check.names=FALSE)File2_DF - as.data.frame(t(File2_DF[MyRows,]))File2_DF - as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = Row.names, by.y=row.names) ...etc I want to know if I can use a list of the filenames c(File1, File2, File2) etc. and apply a function to do this in a more automated fasion? This would involve using the list value in the directory path to read in the file i.e. Something like this? files - dir(my_directory) for (f in files) { ## do something with file 'f' } -- Enrico Schumann Lucerne, Switzerland http://enricoschumann.net __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-Help: nparLD Package Non-parametric Repeated Measures
Hi, I'm trying to analyze repeated measurements of body temperature data collected from 7 randomly chosen subjects (e.g. turtles). I am using R, along with the nparLD package to test for an effect of diel period (fixed factor: day or night) and season (sub-plot fixed factor: spring, summer, fall) on body temperature. Based on this set-up (LD-F2), I am using the non-parametric nparLD pacakge([url]http://www.inside-r.org/packages/cran/nparLD/docs/ld.f2[/url]) because data transformations were unsuccessful and I am randomly missing some paired values. Main issue/question: In R the nparLD ANOVA-type Test showed a significant p-value for diel period, no effect of season, and no interaction between diel period and season. But a post-hoc Wilcoxon Signed-Rank Test did NOT find a significant difference (p = 0.054) for diel period (day vs night) body temperature. How is it possible to have a significant effect for day vs night, based on the nparLD package, but NO significant difference between day and night for the post-hoc Wilcoxon test? Also, if I only have two levels of the fixed effect (day vs night), do I need to run a post-hoc test or just look at the mean values after the ANOVA-type test? Data info: The repeated measurements on the 7 subjects had 2 fixed effects: 1. Diel period (day or night) 2. Season (Spring, summer, and fall)(Subplot Factor) Mean values for body temperature and for diel period are below. Diel column (D=Day, N = Night). State column (RT=Spring, RF = Summer, PT = Fall). Subject, N=7. NA = missing value. All comments (good and bad) are greatly appreciated! Thanks, James -- output of sessionInfo(): [code] data=read.csv(file.choose(), header=TRUE) attach(data) data stp diel state subject 1 26.2DRT 1 2 26.4NRT 1 3 24.1DRT 2 4NANRT 2 5NADRT 3 6 25.2NRT 3 7 27.1DRT 4 8 26.5NRT 4 9 26.9DRT 5 10 27.1NRT 5 11 26.2DRT 6 12 26.0NRT 6 13 26.3DRT 7 14 26.7NRT 7 15 26.0DRF 1 16 26.6NRF 1 17 24.2DRF 2 18 25.6NRF 2 19 25.6DRF 3 20 26.6NRF 3 21 26.1DRF 4 22 26.9NRF 4 23 27.2DRF 5 24 27.4NRF 5 25 26.2DRF 6 26 26.7NRF 6 27 27.2DRF 7 28 27.5NRF 7 29 25.0DPT 1 30 24.8NPT 1 31 NADPT 2 32 NANPT 2 33 NADPT 3 34 NANPT 3 35 26.7DPT 4 36 26.9NPT 4 37 27.6DPT 5 38 27.5NPT 5 39 25.2DPT 6 40 24.9NPT 6 41 27.1DPT 7 42 27.0NPT 7 ex.f2-ld.f2(y=stp, time1=diel, time2=state, subject=subject, time1.name=Diel, time2.name=State, description=FALSE) ex.f2$ANOVA.test Statistic dfp-value Diel 4.9028447 1.00 0.02681249 State 0.2332795 1.374320 0.70586274 Diel:State 2.1937783 1.062943 0.13717393 [/code] [code] detach(data) data=read.csv(file.choose(), header=TRUE) attach(data) data day night 1 26.2 26.4 2 26.0 26.6 3 25.0 24.8 4 24.2 25.6 5 25.6 26.6 6 27.1 26.5 7 26.1 26.9 8 26.7 26.9 9 26.9 27.1 10 27.2 27.4 11 27.6 27.5 12 26.2 26.0 13 26.2 26.7 14 25.2 24.9 15 26.3 26.7 16 27.2 27.5 17 27.1 27.0 library(coin) wilcoxsign_test(day ~ night, distribution=exact) Exact Wilcoxon-Signed-Rank Test data: y by x (neg, pos) stratified by block Z = -1.9234, p-value = 0.05482 alternative hypothesis: true mu is not equal to 0 [/code] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help for creating expression data of Differentially expressed genes
Hi Arun, I am still facing trouble as I can see the data output is identical for all rows when I am using this merge function. It seems that since in my data2 which I have provided I have not given you the exact genes I have. There are likely to be repeatations of ID in both the files but the problem now is that when the loop is running for the merge on basis of ID it is printing on the rows for which the ID's are repeated more than once and which does not have any repeatations in the data.txt they are being ignored. Say I had a locus XLOC_002126 which is the ID and its repeated for 10 times in the data1.txt so when my merging is taking place its only working on those ID's which are repeated more than once in data1.txt and merging them with their respective attributes. And this is happening for the number of times it is being repeated in data1.txt and the next data on which it merges is also the same for which in data1.txt we have more repeats for ID. Here is an example of the output I am getting below. ID test_id gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78 76.5147 4.67875 468.667 XLOC_002126 XLOC_002126 MPZ
Re: [R] R help for creating expression data of Differentially expressed genes
Hi Vivek, May be this helps: set.seed(35) dat1- cbind(ID=1:8, as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7))) set.seed(38) dat2- cbind(ID= sample(1:20,8,replace=FALSE), as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33))) colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1]) merge(dat1[,1:2],dat2[,1:31],by=ID) # ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 #1 1 43 44 4 33 47 29 43 31 15 2 34 42 5 18 22 36 34 44 3 45 9 #2 3 28 4 18 45 24 5 20 30 16 49 34 33 5 24 49 31 10 45 21 26 20 #3 6 5 16 1 5 2 26 6 40 16 15 50 26 37 22 25 39 16 24 29 50 42 #4 7 25 26 39 16 29 5 40 15 27 46 16 38 36 42 8 3 29 7 13 18 38 #5 8 30 3 41 25 38 24 41 44 23 2 45 33 10 18 20 49 19 23 42 25 5 # X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 #1 14 27 3 21 6 44 33 42 10 29 #2 48 13 8 47 18 9 23 9 44 3 #3 25 14 31 19 14 6 26 13 6 49 #4 43 28 15 6 9 19 43 21 41 21 #5 1 27 18 3 42 5 16 39 46 47 A.K. - Original Message - From: Vivek Das vd4mm...@gmail.com To: arun smartpink...@yahoo.com Cc: Sent: Tuesday, May 7, 2013 3:45 PM Subject: R help for creating expression data of Differentially expressed genes Hi Arun, I need some help regarding R scripting. I have two data file one containing seven columns and the other containing 33. Both files have unique identifier as ID. I want to create another file which should have the first two columns of the first file and and the 31 columns of the second file matched on the basis of ID. The first file is having gene I'd and gene names of around 500 and I want the output file which is having all of those and other attributes as well. I want to get the output file having all attributes matching with the I'd of the first file. So that I get output of 500 rows with all the attributes of second file. I am new to R but having trouble with merge function in R. If you can help it will be great. Regards, Vivek Sent from my iPad On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote: HI Ye, For the NA in ID column, Hi dat1- read.table(text= ObsNumber ID Weight 1 0001 12 2 0001 13 3 0001 14 4 0002 16 5 0002 17 6 N/A 18 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A) unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) #[1] 0001_1 0001_2 0001_3 0002_1 0002_2 A.K. From: Ye Lin ye...@lbl.gov To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 2:54 PM Subject: Re: [R] create unique ID for each group Thanks A.K. But I have NA in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like N/A_1 in order as well? On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote: H, Sorry, a mistake: dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) dat1 # ObsNumber ID Weight UniqueID #1 1 0001 12 0001_1 #2 2 0001 13 0001_2 #3 3 0001 14 0001_3 #4 4 0002 16 0002_1 #5 5 0002 17 0002_2 dat2$UniqueID-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) A.K. - Original Message - From: arun smartpink...@yahoo.com To: Ye Lin ye...@lbl.gov Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 2:10 PM Subject: Re: [R] create unique ID for each group Hi, Try this: dat1- read.table(text= ObsNumber ID Weight 1 0001 12 2 0001 13 3 0001 14 4 0002 16 5 0002 17 ,sep=,header=TRUE,colClass=c(numeric,character,numeric)) dat2- read.table(text= ID Height 0001 3.2 0001 2.6 0001 3.2 0002 2.2 0002 2.6 ,sep=,header=TRUE,colClass=c(character,numeric)) dat1$UniqueID-with(dat1,as.character(interaction(ID,ObsNumber,sep=_))) dat2$UniqueID-with(dat2,as.character(interaction(ID,rownames(dat2),sep=_))) dat2 # ID Height UniqueID #1 0001 3.2 0001_1 #2 0001 2.6 0001_2 #3 0001 3.2 0001_3 #4 0002 2.2 0002_4 #5 0002 2.6 0002_5 A.K. - Original Message - From: Ye Lin ye...@lbl.gov To: R help r-help@r-project.org Cc: Sent: Tuesday, May 7,
Re: [R] R help for creating expression data of Differentially expressed genes
HI Arun, My data sets are as in the provided files. I am providing the sample files. I guess this will give a better idea to the type of working I want to do with the two files and the kind or script am trying to write. Hope you can give me some suggestions regarding this. I am new to R so having trouble to use different functions to use this for my working. Anyone who can help me out with this can be of great help. -- Vivek Das PhD Student in Computational Biology Giuseppe Testa's Lab European School of Molecular Medicine IFOM-IEO Campus Via Adamello, 16 Milan, Italy emails: vivek@ieo.eu vchris...@yahoo.co.in vd4mm...@gmail.com On Tue, May 7, 2013 at 10:36 PM, arun smartpink...@yahoo.com wrote: Hi Vivek, May be this helps: set.seed(35) dat1- cbind(ID=1:8, as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7))) set.seed(38) dat2- cbind(ID= sample(1:20,8,replace=FALSE), as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33))) colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1]) merge(dat1[,1:2],dat2[,1:31],by=ID) # ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 #1 1 43 44 4 33 47 29 43 31 15 2 34 42 5 18 22 36 34 44 3 45 9 #2 3 28 4 18 45 24 5 20 30 16 49 34 33 5 24 49 31 10 45 21 26 20 #3 6 5 16 1 5 2 26 6 40 16 15 50 26 37 22 25 39 16 24 29 50 42 #4 7 25 26 39 16 29 5 40 15 27 46 16 38 36 42 8 3 29 7 13 18 38 #5 8 30 3 41 25 38 24 41 44 23 2 45 33 10 18 20 49 19 23 42 25 5 # X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 #1 14 27 3 21 6 44 33 42 10 29 #2 48 13 8 47 18 9 23 9 44 3 #3 25 14 31 19 14 6 26 13 6 49 #4 43 28 15 6 9 19 43 21 41 21 #5 1 27 18 3 42 5 16 39 46 47 A.K. - Original Message - From: Vivek Das vd4mm...@gmail.com To: arun smartpink...@yahoo.com Cc: Sent: Tuesday, May 7, 2013 3:45 PM Subject: R help for creating expression data of Differentially expressed genes Hi Arun, I need some help regarding R scripting. I have two data file one containing seven columns and the other containing 33. Both files have unique identifier as ID. I want to create another file which should have the first two columns of the first file and and the 31 columns of the second file matched on the basis of ID. The first file is having gene I'd and gene names of around 500 and I want the output file which is having all of those and other attributes as well. I want to get the output file having all attributes matching with the I'd of the first file. So that I get output of 500 rows with all the attributes of second file. I am new to R but having trouble with merge function in R. If you can help it will be great. Regards, Vivek Sent from my iPad On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote: HI Ye, For the NA in ID column, Hi dat1- read.table(text= ObsNumber ID Weight 1 0001 12 2 0001 13 3 0001 14 4 0002 16 5 0002 17 6 N/A 18 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A) unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) #[1] 0001_1 0001_2 0001_3 0002_1 0002_2 A.K. From: Ye Lin ye...@lbl.gov To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 2:54 PM Subject: Re: [R] create unique ID for each group Thanks A.K. But I have NA in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like N/A_1 in order as well? On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote: H, Sorry, a mistake: dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) dat1 # ObsNumber ID Weight UniqueID #1 1 0001 12 0001_1 #2 2 0001 13 0001_2 #3 3 0001 14 0001_3 #4 4 0002 16 0002_1 #5 5 0002 17 0002_2 dat2$UniqueID-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) A.K. - Original Message - From: arun smartpink...@yahoo.com To: Ye Lin ye...@lbl.gov Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 2:10 PM Subject: Re: [R] create unique ID for each group Hi, Try this: dat1- read.table(text= ObsNumber ID Weight 1 0001 12
Re: [R] R help for creating expression data of Differentially expressed genes
HI, Assuming that out_dat.txt is the output you expected. dat1- read.table(data1.txt,header=TRUE,stringsAsFactors=FALSE) dat2- read.table(data2.txt,header=TRUE,stringsAsFactors=FALSE) out_dat- read.table(out_data.txt,header=TRUE,stringsAsFactors=FALSE) out_dat2-merge(dat1[,1:4],dat2,by=ID) identical(out_dat,out_dat2) #[1] TRUE A.K. From: Vivek Das vd4mm...@gmail.com To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 6:07 PM Subject: Re: R help for creating expression data of Differentially expressed genes HI Arun, My data sets are as in the provided files. I am providing the sample files. I guess this will give a better idea to the type of working I want to do with the two files and the kind or script am trying to write. Hope you can give me some suggestions regarding this. I am new to R so having trouble to use different functions to use this for my working. Anyone who can help me out with this can be of great help. -- Vivek Das PhD Student in Computational Biology Giuseppe Testa's Lab European School of Molecular Medicine IFOM-IEO Campus Via Adamello, 16 Milan, Italy emails: vivek@ieo.eu vchris...@yahoo.co.in vd4mm...@gmail.com On Tue, May 7, 2013 at 10:36 PM, arun smartpink...@yahoo.com wrote: Hi Vivek, May be this helps: set.seed(35) dat1- cbind(ID=1:8, as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7))) set.seed(38) dat2- cbind(ID= sample(1:20,8,replace=FALSE), as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33))) colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1]) merge(dat1[,1:2],dat2[,1:31],by=ID) # ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 #1 1 43 44 4 33 47 29 43 31 15 2 34 42 5 18 22 36 34 44 3 45 9 #2 3 28 4 18 45 24 5 20 30 16 49 34 33 5 24 49 31 10 45 21 26 20 #3 6 5 16 1 5 2 26 6 40 16 15 50 26 37 22 25 39 16 24 29 50 42 #4 7 25 26 39 16 29 5 40 15 27 46 16 38 36 42 8 3 29 7 13 18 38 #5 8 30 3 41 25 38 24 41 44 23 2 45 33 10 18 20 49 19 23 42 25 5 # X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 #1 14 27 3 21 6 44 33 42 10 29 #2 48 13 8 47 18 9 23 9 44 3 #3 25 14 31 19 14 6 26 13 6 49 #4 43 28 15 6 9 19 43 21 41 21 #5 1 27 18 3 42 5 16 39 46 47 A.K. - Original Message - From: Vivek Das vd4mm...@gmail.com To: arun smartpink...@yahoo.com Cc: Sent: Tuesday, May 7, 2013 3:45 PM Subject: R help for creating expression data of Differentially expressed genes Hi Arun, I need some help regarding R scripting. I have two data file one containing seven columns and the other containing 33. Both files have unique identifier as ID. I want to create another file which should have the first two columns of the first file and and the 31 columns of the second file matched on the basis of ID. The first file is having gene I'd and gene names of around 500 and I want the output file which is having all of those and other attributes as well. I want to get the output file having all attributes matching with the I'd of the first file. So that I get output of 500 rows with all the attributes of second file. I am new to R but having trouble with merge function in R. If you can help it will be great. Regards, Vivek Sent from my iPad On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote: HI Ye, For the NA in ID column, Hi dat1- read.table(text= ObsNumber ID Weight 1 0001 12 2 0001 13 3 0001 14 4 0002 16 5 0002 17 6 N/A 18 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A) unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) #[1] 0001_1 0001_2 0001_3 0002_1 0002_2 A.K. From: Ye Lin ye...@lbl.gov To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Tuesday, May 7, 2013 2:54 PM Subject: Re: [R] create unique ID for each group Thanks A.K. But I have NA in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like N/A_1 in order as well? On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote: H, Sorry, a mistake: dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE) dat1 # ObsNumber ID Weight UniqueID #1 1 0001 12 0001_1 #2 2 0001 13 0001_2 #3 3 0001 14 0001_3 #4 4 0002 16 0002_1 #5 5
Re: [R] R help - bootstrap with survival analysis
This comes up regularly. Type ?print.survfit and look at the comments there under value. Terry T. - begin included message Hi, I'm not sure if this is the proper way to ask questions, sorry if not. But here's my problem: I'm trying to do a bootstrap estimate of the mean for some survival data. Is there a way to specifically call upon the rmean value, in order to store it in an object? I've used print(...,print.rmean=T) to print the summary of survfit, but I'm not sure how to access only rmean because it does not show up under attributes for survfit. Thanks for any help in advance! Fayaaz Khatri __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help - bootstrap with survival analysis
Hi, I'm not sure if this is the proper way to ask questions, sorry if not. But here's my problem: I'm trying to do a bootstrap estimate of the mean for some survival data. Is there a way to specifically call upon the rmean value, in order to store it in an object? I've used print(...,print.rmean=T) to print the summary of survfit, but I'm not sure how to access only rmean because it does not show up under attributes for survfit. Thanks for any help in advance! Fayaaz Khatri [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 121, Issue 5
Thank you Jason! actually, there have been two solutions and one is yours setting row.names=F works great, additionally what Ive been having problems with is the European version of Word 2010. Apparently it sets delimiters of ; instead of , as in the English/USA version. This is something that messes not only with R exporting txt or csv files, but will effect Office products, ArcGIS and several more. So having to go into the Region and Language settings of the computer and changing the defaults has cleared up quite a bit of the issues I was having with R. ~K Not all who wander are lost K. L. Nicholson PhD Associate Wildlife Biologist® From: Law, Jason jason@portlandoregon.gov Sent: Tuesday, March 5, 2013 5:31 PM Subject: RE: R-help Digest, Vol 121, Issue 5 On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to follow the particulars here: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data For example: write.table(test, '***.tab', sep = '\t', row.names = F) The extension .tab and sep = '\t' are required for text files. Didn't test row.names=T but I wouldn't count on that working either. Jason Law Statistician City of Portland, Bureau of Environmental Services Water Pollution Control Laboratory 6543 N Burlington Avenue Portland, OR 97203-5452 503-823-1038 jason@portlandoregon.gov -Original Message- Date: Mon, 04 Mar 2013 10:48:39 -0500 From: Duncan Murdoch murdoch.dun...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Subject: Re: [R] Mysterious issues with reading text files from R in ArcGIS and Excel Message-ID: 5134c257.6020...@gmail.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 04/03/2013 10:09 AM, Kerry wrote: It seems within the last ~3 months Ive been having issues with writing text or csv files from a R data frame. The problem is multifold and it is hard to filter out what is going on and where the problem is. So, Im hoping someone else has come across this and may provide insight. I think you need to provide a simple example for us to try, either by putting a small example of one of your files online for us to download, or (better) by giving us self-contained code to duplicate the problem. You might also get better help (especially about ArcGIS) on the R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo. Duncan Murdoch My current settings for R: R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C [5] LC_TIME=Swedish_Sweden.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] adehabitat_1.8.11 shapefiles_0.6 foreign_0.8-51 tkrplot_0.0-23 ade4_1.5-1 loaded via a namespace (and not attached): [1] tools_2.15.2 I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop Basically, no matter what data frame I am working on, when I export it to a text file to be use in Excel or ArcGIS problems arise. Im not sure if it is R or these other programs, maybe forums for ArcGIS might be more appropriate, but this problem only occurs when I use tables that have been produced from an R session. When I try to open a text file in Excel, either I get an error message stating The file you are trying to open is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source. Followed by Excel has detected that 'file.txt' is a SYLK file, but cannot load it. Either the file has errors or is not a SYLK file format. Click OK to open the file in a different format Then the file opens Otherwise, the file opens fine the first time through - and looks ok. I can't figure out what Im doing different between the two commands of write.table as they are always written the same: write.csv(file, file = D:/mylocations/fileofinterest.csv) or write.table(file, file = D:/mylocations/fileofinterest.txt) Sometimes I will try to add sep = , or sep = ; but these don't make a difference (which I didn't figure they would). The other program I use is ArcGIS and bringing in a txt file from R is really messing things up as 2 new columns of information are typically added and date/time data is usually lost with txt files, but not with csv files. For instance - a text file that looks like this in Excel: id x y date R1dmed R1dmean R1error R2error 1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973 2498.2973 FALSE FALSE 2 F07001 1481274 6619628 2007-03-05 12:00:41 657.1029 657.1029 FALSE FALSE 3 F07001 1481279 6619630 2007-03-05 14:01:12 660.3569 660.3569 FALSE FALSE 4 F07001 1481271 6619700
Re: [R] R-help Digest, Vol 121, Issue 5
On 3/26/2013 7:21 AM, Kerry wrote: Thank you Jason! actually, there have been two solutions and one is yours setting row.names=F works great, additionally what Ive been having problems with is the European version of Word 2010. Apparently it sets delimiters of ; instead of , as in the English/USA version. This is something that messes not only with R exporting txt or csv files, but will effect Office products, ArcGIS and several more. So having to go into the Region and Language settings of the computer and changing the defaults has cleared up quite a bit of the issues I was having with R. ~K ?read.csv2 This is a wrapper to read.table(). read.csv2 expects the delimiter to be '; by default and the decimal to be , by default. Not all who wander are lost K. L. Nicholson PhD Associate Wildlife Biologist® From: Law, Jason jason@portlandoregon.gov Sent: Tuesday, March 5, 2013 5:31 PM Subject: RE: R-help Digest, Vol 121, Issue 5 On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to follow the particulars here: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data For example: write.table(test, '***.tab', sep = '\t', row.names = F) The extension .tab and sep = '\t' are required for text files. Didn't test row.names=T but I wouldn't count on that working either. Jason Law Statistician City of Portland, Bureau of Environmental Services Water Pollution Control Laboratory 6543 N Burlington Avenue Portland, OR 97203-5452 503-823-1038 jason@portlandoregon.gov -Original Message- Date: Mon, 04 Mar 2013 10:48:39 -0500 From: Duncan Murdoch murdoch.dun...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Subject: Re: [R] Mysterious issues with reading text files from R in ArcGIS and Excel Message-ID: 5134c257.6020...@gmail.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 04/03/2013 10:09 AM, Kerry wrote: It seems within the last ~3 months Ive been having issues with writing text or csv files from a R data frame. The problem is multifold and it is hard to filter out what is going on and where the problem is. So, Im hoping someone else has come across this and may provide insight. I think you need to provide a simple example for us to try, either by putting a small example of one of your files online for us to download, or (better) by giving us self-contained code to duplicate the problem. You might also get better help (especially about ArcGIS) on the R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo. Duncan Murdoch My current settings for R: R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C [5] LC_TIME=Swedish_Sweden.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] adehabitat_1.8.11 shapefiles_0.6foreign_0.8-51tkrplot_0.0-23 ade4_1.5-1 loaded via a namespace (and not attached): [1] tools_2.15.2 I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop Basically, no matter what data frame I am working on, when I export it to a text file to be use in Excel or ArcGIS problems arise. Im not sure if it is R or these other programs, maybe forums for ArcGIS might be more appropriate, but this problem only occurs when I use tables that have been produced from an R session. When I try to open a text file in Excel, either I get an error message stating The file you are trying to open is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source. Followed by Excel has detected that 'file.txt' is a SYLK file, but cannot load it. Either the file has errors or is not a SYLK file format. Click OK to open the file in a different format Then the file opens Otherwise, the file opens fine the first time through - and looks ok. I can't figure out what Im doing different between the two commands of write.table as they are always written the same: write.csv(file, file = D:/mylocations/fileofinterest.csv) or write.table(file, file = D:/mylocations/fileofinterest.txt) Sometimes I will try to add sep = , or sep = ; but these don't make a difference (which I didn't figure they would). The other program I use is ArcGIS and bringing in a txt file from R is really messing things up as 2 new columns of information are typically added and date/time data is usually lost with txt files, but not with csv files. For instance - a text file that looks like this in Excel: id x ydateR1dmedR1dmean R1error R2error 1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973
Re: [R] R-help Digest, Vol 121, Issue 5
On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to follow the particulars here: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data For example: write.table(test, '***.tab', sep = '\t', row.names = F) The extension .tab and sep = '\t' are required for text files. Didn't test row.names=T but I wouldn't count on that working either. Jason Law Statistician City of Portland, Bureau of Environmental Services Water Pollution Control Laboratory 6543 N Burlington Avenue Portland, OR 97203-5452 503-823-1038 jason@portlandoregon.gov -Original Message- Date: Mon, 04 Mar 2013 10:48:39 -0500 From: Duncan Murdoch murdoch.dun...@gmail.com To: Kerry kernichol...@yahoo.com Cc: r-help@r-project.org r-help@r-project.org Subject: Re: [R] Mysterious issues with reading text files from R in ArcGIS and Excel Message-ID: 5134c257.6020...@gmail.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 04/03/2013 10:09 AM, Kerry wrote: It seems within the last ~3 months Ive been having issues with writing text or csv files from a R data frame. The problem is multifold and it is hard to filter out what is going on and where the problem is. So, Im hoping someone else has come across this and may provide insight. I think you need to provide a simple example for us to try, either by putting a small example of one of your files online for us to download, or (better) by giving us self-contained code to duplicate the problem. You might also get better help (especially about ArcGIS) on the R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo. Duncan Murdoch My current settings for R: R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C [5] LC_TIME=Swedish_Sweden.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] adehabitat_1.8.11 shapefiles_0.6foreign_0.8-51tkrplot_0.0-23 ade4_1.5-1 loaded via a namespace (and not attached): [1] tools_2.15.2 I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop Basically, no matter what data frame I am working on, when I export it to a text file to be use in Excel or ArcGIS problems arise. Im not sure if it is R or these other programs, maybe forums for ArcGIS might be more appropriate, but this problem only occurs when I use tables that have been produced from an R session. When I try to open a text file in Excel, either I get an error message stating The file you are trying to open is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source. Followed by Excel has detected that 'file.txt' is a SYLK file, but cannot load it. Either the file has errors or is not a SYLK file format. Click OK to open the file in a different format Then the file opens Otherwise, the file opens fine the first time through - and looks ok. I can't figure out what Im doing different between the two commands of write.table as they are always written the same: write.csv(file, file = D:/mylocations/fileofinterest.csv) or write.table(file, file = D:/mylocations/fileofinterest.txt) Sometimes I will try to add sep = , or sep = ; but these don't make a difference (which I didn't figure they would). The other program I use is ArcGIS and bringing in a txt file from R is really messing things up as 2 new columns of information are typically added and date/time data is usually lost with txt files, but not with csv files. For instance - a text file that looks like this in Excel: id x ydateR1dmedR1dmean R1error R2error 1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973 2498.2973 FALSE FALSE 2 F07001 1481274 6619628 2007-03-05 12:00:41 657.1029 657.1029FALSE FALSE 3 F07001 1481279 6619630 2007-03-05 14:01:12 660.3569 660.3569FALSE FALSE 4 F07001 1481271 6619700 2007-03-05 16:00:39 620.1397 620.1397FALSE FALSE in ArcGIS now looks like this: Field1idid_Xid_YxydateR1dmedR1dmean R1errorR2errorOBJECTID * 1F07001118.0818119.485541e+01514824456621768NA2498.297272498.29727FALSEFALSE1 2F07001118.0818119.485541e+01514812746619628NA657.102922657.102922FALSEFALSE2 3F07001118.0818119.485541e+01514812796619630NA660.356911660.356911FALSEFALSE3 4F07001118.0818119.485541e+01514812716619700NA620.139702620.139702FALSEFALSE4 5F07001118.0818119.485541e+01514808496620321NA378.186792378.186792FALSEFALSE5 Where did id_X and id_Y come from?? What are they?? What happened to the Date column??? Why does the date column show up when I use write.csv but not write.table? Thank you for your help. ~K [[alternative HTML version deleted]]
[R] R -HELP REQUEST
Good morning to you all, Sorry for taking your time from your research and teaching schedules. If you have a non-stationary univariate time Series data that has the transformation: Say; l.dat-log (series) d.ldat-diff (l.dat, differences=1) and you fit say arima model. predit.arima-predict (fit.series, n.ahead=10, xregnew= (n+1) :( n+10)) How could I re-transform prediction$pred to the level data since it has been differenced once? I know exp (prediction$pred) will bring the inverse of the log transform but what about the differenced transform? This is my question. I would be very grateful if you could help me with this.Thank you very much in anticipation Mr. Mahmoud Coker Senior Manager Bank of Sierra Leone (Sam-Bangura Building) Freetown-Sierra Leone West Africa Email: cokiest...@yahoo.com Phone: +232 78 625967 / +232 77 440143 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R -HELP REQUEST
If you just want point forecasts, it's simple: Let your original series be X_t, t=1, ..., N. Let Y_t = log(X_t). Let Z_t = Y_t - Y_{t-1}, t = 2, ..., N. Fit your model and forecast, obtaining Z-hat__1, ..., Z-hat_10. Then Y-hat_{N+1} = Y_N + Z-hat_1, Y-hat_{N+2} = Y-hat_{N+1} + Z-hat_2, ., Y-hat_{N+10} = Y-hat_{N+9} + Z-hat_10. In R, let your forecast values be the vector Zhat (a vector of length 10). Then do: Yhat - cumsum(c(Y[N],Zhat))[-1] Xhat - exp(Yhat) Get error bounds on the forecasts is more problematic. cheers, Rolf Turner On 02/05/2013 11:49 PM, Mahmoud Coker wrote: Good morning to you all, Sorry for taking your time from your research and teaching schedules. If you have a non-stationary univariate time Series data that has the transformation: Say; l.dat-log (series) d.ldat-diff (l.dat, differences=1) and you fit say arima model. predit.arima-predict (fit.series, n.ahead=10, xregnew= (n+1) :( n+10)) How could I re-transform prediction$pred to the level data since it has been differenced once? I know exp (prediction$pred) will bring the inverse of the log transform but what about the differenced transform? This is my question. I would be very grateful if you could help me with this.Thank you very much in anticipation. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help archives --- are they up-to-date?
I just saw a message from David Winsemius, responding to an inquiry from Carol White: On Jan 28, 2013, at 9:06 PM, carol white wrote: Should I understand that this message was received? It's always possible to check the Archives for this question. This prompted me to ask about a problem that has been bothering me for a while: When I go to Search on the R web page and then click on Searchable mail archives http://tolstoy.newcastle.edu.au/%7Erking/R/ of the three mailing lists are provided by Robert King at the University of Newcastle, Australia. I find that the archives appear to end at 31 January 2012. If I click on, say 2012: April to June http://tolstoy.newcastle.edu.au/R/e18/help/ I get the response The future isn't here yet. Well, yes, it isn't. But in my limited understanding April to June 2012 is in the past, not the future. Am I doing something wrong, or is something broken in my system, or does this happen to others as well? Just in case it's of any relevance: sessionInfo() R version 2.15.2 Patched (2013-01-10 r61627) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] misc_0.0-15 loaded via a namespace (and not attached): [1] grid_2.15.2 lattice_0.20-13 lme4_0.99-0 Matrix_1.0-10 [5] nlme_3.1-107stats4_2.15.2 Thanks for any insight. cheers, Rolf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help archives --- are they up-to-date?
The searchable archives may lag, and apparently do. The main list archive is here: https://stat.ethz.ch/pipermail/r-help/ and is complete. That's the one to check if you wish to know whether something made it to the list. If you go to the r-help listinfo link in the mailing list footer, you are directed to the link I posted, and the searchable archive is also mentioned. Sarah On Tue, Jan 29, 2013 at 3:28 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote: I just saw a message from David Winsemius, responding to an inquiry from Carol White: On Jan 28, 2013, at 9:06 PM, carol white wrote: Should I understand that this message was received? It's always possible to check the Archives for this question. This prompted me to ask about a problem that has been bothering me for a while: When I go to Search on the R web page and then click on Searchable mail archives http://tolstoy.newcastle.edu.au/%7Erking/R/ of the three mailing lists are provided by Robert King at the University of Newcastle, Australia. I find that the archives appear to end at 31 January 2012. If I click on, say 2012: April to June http://tolstoy.newcastle.edu.au/R/e18/help/ I get the response The future isn't here yet. Well, yes, it isn't. But in my limited understanding April to June 2012 is in the past, not the future. Am I doing something wrong, or is something broken in my system, or does this happen to others as well? Just in case it's of any relevance: sessionInfo() R version 2.15.2 Patched (2013-01-10 r61627) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] misc_0.0-15 loaded via a namespace (and not attached): [1] grid_2.15.2 lattice_0.20-13 lme4_0.99-0 Matrix_1.0-10 [5] nlme_3.1-107stats4_2.15.2 Thanks for any insight. cheers, Rolf [[alternative HTML version deleted]] -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help archives --- are they up-to-date?
On 01/30/2013 10:28 AM, Sarah Goslee wrote: The searchable archives may lag, and apparently do. The main list archive is here: https://stat.ethz.ch/pipermail/r-help/ and is complete. That's the one to check if you wish to know whether something made it to the list. If you go to the r-help listinfo link in the mailing list footer, you are directed to the link I posted, and the searchable archive is also mentioned. Gottit!!! Bewdy, ta. cheers, Rolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] r-help
-- ___ Paul K. Musingila University of Hasselt, I-Biostat, Diepenbeek, Belgium Mobile: +254-724-423532, +32-48-637-4558 E-mail: paul.musing...@student.uhasselt.be, pmusing...@gmail.com Skype: pmusingila When darkness overtakes the godly, light will come bursting in Psalm 112:4 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help post Bayesian CART
Hi, I have explored many of the R packages that construct Bayesian trees including the tgp, bart, BMA and maptree packages. I have also searched through some other packages but they do not seem to be suitable for the type of analysis I need to do. I need to construct Bayesian CART that have terminal nodes which have bivariate regressions (not multiple regressions like most of the packages do). Does anyone know of a package that has this capability? Thanks, Rachel M.Sc Student [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 118, Issue 20
I don't know of one. If building your own you could use rpart with the maxdepth=1 as the tool to find the best split at each node. Terry Therneau On 12/20/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hi, I've searched R-help and haven't found an answer. I have a set of data from which I can create a classification tree using rpart. However, what I'd like to do is predefine the blank structure of the binary tree (i.e., which nodes to include) and then use a package like rpart to fit for the optimal splitting criteria at each of the predefined nodes. Does such a package exist? Thanks, Lee __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help..subsetting data frame that meeting multiple criteria
Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 value class percent 15528 36 70.3125 data frame: class36a value class percent 15526 36 4.6875 data frame: class62 15527 62 85.9375 data frame: class62a 15528 62 9.375 data frame: class82 15528 82 74.6875 data frame: class82a 15527 82 32.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help..subsetting data frame that meeting multiple criteria
On Nov 23, 2012, at 1:14 PM, prasmas wrote: Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 value class percent 15528 36 70.3125 data frame: class36a value class percent 15526 36 4.6875 dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ] dat.a value class percent 1 1552636 4.6875 2 1552762 85.9375 3 1552782 32.4564 4 1552836 70.3125 5 1552862 9.3750 6 1552882 74.6875 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) paste0(x, letters[1:length(x)]))) dat.a value class percent 36a 1552636 4.6875 36b 1552762 85.9375 62a 1552782 32.4564 62b 1552836 70.3125 82a 1552862 9.3750 82b 1552882 74.6875 You can split by the NROW of dat.a if you want. -- David. data frame: class62 15527 62 85.9375 data frame: class62a 15528 62 9.375 data frame: class82 15528 82 74.6875 data frame: class82a 15527 82 32.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help..subsetting data frame that meeting multiple criteria
Hello, Like the following? dat - read.table(text= value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 , header = TRUE) per70 - dat$percent 70 split(dat, list(dat$class, per70)) Hope this helps, Rui Barradas Em 23-11-2012 21:14, prasmas escreveu: Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data valueclass percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 value class percent 15528 36 70.3125 data frame: class36a value class percent 15526 36 4.6875 data frame: class62 15527 62 85.9375 data frame: class62a 15528 62 9.375 data frame: class82 15528 82 74.6875 data frame: class82a 15527 82 32.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help..subsetting data frame that meeting multiple criteria
On Nov 23, 2012, at 6:32 PM, arun wrote: Hi David, Tried the solution on a slightly different data: dat - read.table(text= value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 15529 72 50. 15530 72 50. , header = TRUE) dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ] row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) paste0(x, letters[1:length(x)]))) dat.a #value class percent #36a 1552636 4.6875 #36b 1552762 85.9375 #62a 1552782 32.4564 #62b 1552836 70.3125 #82a 1552862 9.3750 #82b 1552882 74.6875 Great. Seems to be working as requested. Er, What's your point? -- David. - Original Message - From: David Winsemius dwinsem...@comcast.net To: prasmas prasad...@gmail.com Cc: r-help@r-project.org Sent: Friday, November 23, 2012 7:40 PM Subject: Re: [R] R help..subsetting data frame that meeting multiple criteria On Nov 23, 2012, at 1:14 PM, prasmas wrote: Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data valueclasspercent 15526364.6875 155276285.9375 155278232.4564 155283670.3125 15528629.375 155288274.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 valueclasspercent 155283670.3125 data frame: class36a valueclasspercent 15526364.6875 dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ] dat.a value class percent 1 1552636 4.6875 2 1552762 85.9375 3 1552782 32.4564 4 1552836 70.3125 5 1552862 9.3750 6 1552882 74.6875 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) paste0(x, letters[1:length(x)]))) dat.a value class percent 36a 1552636 4.6875 36b 1552762 85.9375 62a 1552782 32.4564 62b 1552836 70.3125 82a 1552862 9.3750 82b 1552882 74.6875 You can split by the NROW of dat.a if you want. --David. data frame: class62 155276285.9375 data frame: class62a 15528629.375 data frame: class82 155288274.6875 data frame: class82a 155278232.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help..subsetting data frame that meeting multiple criteria
HI, You could also try this: dat - read.table(text= value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 15529 72 50. 15530 72 50. , header = TRUE) mat1-as.matrix(dat) row.names(mat1)-paste(dat$class,ave(dat$percent,dat$class,FUN=function(x) x70),sep=_) mat1 # value class percent #36_0 15526 36 4.6875 #62_1 15527 62 85.9375 #82_0 15527 82 32.4564 #36_1 15528 36 70.3125 #62_0 15528 62 9.3750 #82_1 15528 82 74.6875 #72_0 15529 72 50. #72_0 15530 72 50. If you want to change 0 and 1 to a and b. row.names(mat1)-gsub(1,b,gsub(0,a,row.names(mat1))) row.names(mat1) #[1] 36_a 62_b 82_a 36_b 62_a 82_b 72_a 72_a Hope it helps. A.K. - Original Message - From: prasmas prasad...@gmail.com To: r-help@r-project.org Cc: Sent: Friday, November 23, 2012 4:14 PM Subject: [R] R help..subsetting data frame that meeting multiple criteria Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 value class percent 15528 36 70.3125 data frame: class36a value class percent 15526 36 4.6875 data frame: class62 15527 62 85.9375 data frame: class62a 15528 62 9.375 data frame: class82 15528 82 74.6875 data frame: class82a 15527 82 32.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help..subsetting data frame that meeting multiple criteria
Hi David, Tried the solution on a slightly different data: dat - read.table(text= value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 15529 72 50. 15530 72 50. , header = TRUE) dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ] row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) paste0(x, letters[1:length(x)]))) dat.a # value class percent #36a 15526 36 4.6875 #36b 15527 62 85.9375 #62a 15527 82 32.4564 #62b 15528 36 70.3125 #82a 15528 62 9.3750 #82b 15528 82 74.6875 A.K. - Original Message - From: David Winsemius dwinsem...@comcast.net To: prasmas prasad...@gmail.com Cc: r-help@r-project.org Sent: Friday, November 23, 2012 7:40 PM Subject: Re: [R] R help..subsetting data frame that meeting multiple criteria On Nov 23, 2012, at 1:14 PM, prasmas wrote: Hi, I am new to R. I am trying to regroup data frame using multiple constrains. for example data frame: data value class percent 15526 36 4.6875 15527 62 85.9375 15527 82 32.4564 15528 36 70.3125 15528 62 9.375 15528 82 74.6875 I need to regroup each class that have greater than or equal to 70 percent into new group. Similarly, I also need to regroup each class that have less than 70 percent into new group. I can do this by using following syntax for each class class36- data[data$class==36data$percent70,] class36a- data[data$class==36data$percent=70,] but I have 100 different classes. In order to do this for all 100 classes, I have write that syntax 100 times. There would be some way to do dynamically to regroup for 100 classes (may be using for loop) but I dont know. Can you please help in this. Output should be like data frame: class36 value class percent 15528 36 70.3125 data frame: class36a value class percent 15526 36 4.6875 dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ] dat.a value class percent 1 15526 36 4.6875 2 15527 62 85.9375 3 15527 82 32.4564 4 15528 36 70.3125 5 15528 62 9.3750 6 15528 82 74.6875 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) paste0(x, letters[1:length(x)]))) dat.a value class percent 36a 15526 36 4.6875 36b 15527 62 85.9375 62a 15527 82 32.4564 62b 15528 36 70.3125 82a 15528 62 9.3750 82b 15528 82 74.6875 You can split by the NROW of dat.a if you want. --David. data frame: class62 15527 62 85.9375 data frame: class62a 15528 62 9.375 data frame: class82 15528 82 74.6875 data frame: class82a 15527 82 32.4564 Thank you very much your help.. P. -- View this message in context: http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] r-help or r-devel ?
Hi the list, I have some basic questions about writing a package. On which list shall I post them? Theoretically, I am supposed to post them on r-devel, but all the questions on this list are very advance questions, not basic ones... So I don't know what to do. Christophe -- Christophe Genolini Maître de conférences en bio-statistique Vice président Communication interne et animation du campus Université Paris Ouest Nanterre La Défense [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] r-help or r-devel ?
On Nov 5, 2012, at 11:24 AM, Christophe Genolini cgeno...@u-paris10.fr wrote: Hi the list, I have some basic questions about writing a package. On which list shall I post them? Theoretically, I am supposed to post them on r-devel, but all the questions on this list are very advance questions, not basic ones... So I don't know what to do. Christophe In general, queries pertaining to creating R packages should go to R-Devel, especially if it involves the use of C/C++/FORTRAN. That being said, if your queries are basic, please be sure to read through Writing R Extensions: http://cran.r-project.org/doc/manuals/R-exts.pdf and consider searching the list archives via: http://www.rseek.org as there is a reasonable chance that your query has already been posted and answered in the past. If you have already done those things, then post to R-Devel, but please do subscribe first: https://stat.ethz.ch/mailman/listinfo/r-devel as it will save the R-Devel list moderators from having to manually approve each of your posts, therefore making your posting more expedient. Thanks and regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help - Adding a column in a data frame with multiple conditions
Hi, I am trying to add a column of numbers to a data frame in R with multiple conditions. Here is a simplified example df: [A] [B] [C] [D] [E] [1] 1 X 90 88 [2] 1 Y 72 70 [3] 1 Z 67 41 [4] 2 X 74 49 [5] 2 Y 42 50 [6] 2 Z 81 56 [7] 3 X 92 59 [8] 3 Y 94 80 [9] 3 Z 80 82 I would like column [E] to have a certain value (found either in [C] or [D]) based on conditions in columns [A] *and* [B]. E.g. : if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90) if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70) if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67) if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74) if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50) if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81) and so on. ATTEMPT TO RESOLVE: The following code allowed me to add values for column [E] when [A] ==1, but I can't figure out how to keep the code going in order to get a value for column [E] based on all of the numbers in column [A] and the secondary condition for [B] ([A] goes from 1:48). df$[E] - ifelse((df$A == 1) (df$B == X), df$C[df$A == 1], ifelse((df$A == 1) (df$B == Y), df$D[df$A == 1], ifelse((df$A == 1) (df$B == Z), df$C[df$A == 1], NA Thank you for any advice you can give. Libby G libbymg[at]utexas.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help - Adding a column in a data frame with multiple conditions
Hi Libby, You had an accumulation of small errors, from an extra ) to an unclear understanding of how indexing works in R. Also, you shouldn't call your dataframe df, or use square brackets in column names. That said, what about: sampledata - structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), B = c(X, Y, Z, X, Y, Z, X, Y, Z), C = c(90L, 72L, 67L, 74L, 42L, 81L, 92L, 94L, 80L), D = c(88L, 70L, 41L, 49L, 50L, 56L, 59L, 80L, 82L)), .Names = c(A, B, C, D), class = data.frame, row.names = c(NA, -9L)) E - ifelse((sampledata$A == 1) (sampledata$B == X), sampledata$C, ifelse((sampledata$A == 1) (sampledata$B == Y), sampledata$D, ifelse((sampledata$A == 1) (sampledata$B == Z), sampledata$C, NA))) sampledata - data.frame(sampledata, E) Sarah On Thu, Oct 4, 2012 at 2:47 PM, Libby M Gertken libb...@utexas.edu wrote: Hi, I am trying to add a column of numbers to a data frame in R with multiple conditions. Here is a simplified example df: [A] [B] [C] [D] [E] [1] 1 X 90 88 [2] 1 Y 72 70 [3] 1 Z 67 41 [4] 2 X 74 49 [5] 2 Y 42 50 [6] 2 Z 81 56 [7] 3 X 92 59 [8] 3 Y 94 80 [9] 3 Z 80 82 I would like column [E] to have a certain value (found either in [C] or [D]) based on conditions in columns [A] *and* [B]. E.g. : if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90) if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70) if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67) if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74) if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50) if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81) and so on. ATTEMPT TO RESOLVE: The following code allowed me to add values for column [E] when [A] ==1, but I can't figure out how to keep the code going in order to get a value for column [E] based on all of the numbers in column [A] and the secondary condition for [B] ([A] goes from 1:48). df$[E] - ifelse((df$A == 1) (df$B == X), df$C[df$A == 1], ifelse((df$A == 1) (df$B == Y), df$D[df$A == 1], ifelse((df$A == 1) (df$B == Z), df$C[df$A == 1], NA Thank you for any advice you can give. Libby G libbymg[at]utexas.edu [[alternative HTML version deleted]] -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help - Adding a column in a data frame with multiple conditions
Hi, By extending Sarah's solution to the whole dataset: dat1-read.table(text= A B C D 1 X 90 88 1 Y 72 70 1 Z 67 41 2 X 74 49 2 Y 42 50 2 Z 81 56 3 X 92 59 3 Y 94 80 3 Z 80 82 ,sep=,header=TRUE,stringsAsFactors=FALSE) dat1$E-unlist(lapply(split(dat1,dat1$A),function(x) ifelse(x$B==X|x$B==Z,x$C,ifelse(x$B==Y,x$D,NA dat1 # A B C D E #1 1 X 90 88 90 #2 1 Y 72 70 70 #3 1 Z 67 41 67 #4 2 X 74 49 74 #5 2 Y 42 50 50 #6 2 Z 81 56 81 #7 3 X 92 59 92 #8 3 Y 94 80 80 #9 3 Z 80 82 80 A.K. - Original Message - From: Sarah Goslee sarah.gos...@gmail.com To: Libby M Gertken libb...@utexas.edu Cc: r-help@r-project.org Sent: Thursday, October 4, 2012 4:03 PM Subject: Re: [R] R help - Adding a column in a data frame with multiple conditions Hi Libby, You had an accumulation of small errors, from an extra ) to an unclear understanding of how indexing works in R. Also, you shouldn't call your dataframe df, or use square brackets in column names. That said, what about: sampledata - structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), B = c(X, Y, Z, X, Y, Z, X, Y, Z), C = c(90L, 72L, 67L, 74L, 42L, 81L, 92L, 94L, 80L), D = c(88L, 70L, 41L, 49L, 50L, 56L, 59L, 80L, 82L)), .Names = c(A, B, C, D), class = data.frame, row.names = c(NA, -9L)) E - ifelse((sampledata$A == 1) (sampledata$B == X), sampledata$C, ifelse((sampledata$A == 1) (sampledata$B == Y), sampledata$D, ifelse((sampledata$A == 1) (sampledata$B == Z), sampledata$C, NA))) sampledata - data.frame(sampledata, E) Sarah On Thu, Oct 4, 2012 at 2:47 PM, Libby M Gertken libb...@utexas.edu wrote: Hi, I am trying to add a column of numbers to a data frame in R with multiple conditions. Here is a simplified example df: [A] [B] [C] [D] [E] [1] 1 X 90 88 [2] 1 Y 72 70 [3] 1 Z 67 41 [4] 2 X 74 49 [5] 2 Y 42 50 [6] 2 Z 81 56 [7] 3 X 92 59 [8] 3 Y 94 80 [9] 3 Z 80 82 I would like column [E] to have a certain value (found either in [C] or [D]) based on conditions in columns [A] *and* [B]. E.g. : if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90) if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70) if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67) if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74) if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50) if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81) and so on. ATTEMPT TO RESOLVE: The following code allowed me to add values for column [E] when [A] ==1, but I can't figure out how to keep the code going in order to get a value for column [E] based on all of the numbers in column [A] and the secondary condition for [B] ([A] goes from 1:48). df$[E] - ifelse((df$A == 1) (df$B == X), df$C[df$A == 1], ifelse((df$A == 1) (df$B == Y), df$D[df$A == 1], ifelse((df$A == 1) (df$B == Z), df$C[df$A == 1], NA Thank you for any advice you can give. Libby G libbymg[at]utexas.edu [[alternative HTML version deleted]] -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 115, Issue 12
Hello Amelie, I don't have an answer to your question, but I just wanted to point out this page I noticed recently ( http://hlplab.wordpress.com/2009/05/07/multinomial-random-effects-models-in-r/), which might be helpful. I'm also interested in figuring out how to do a multinomial glmm, so if you find out anything I'd be happy to hear more about it! Based on what I've found so far it seems like doing a multinomial glmm is not very straightforward, unfortunately... Best, Steve Politzer-Ahles Message: 94 Date: Wed, 12 Sep 2012 09:47:12 +0200 From: Vaniscotte vaname...@gmail.com To: r-help@r-project.org Subject: [R] multinomial MCMCglmm Message-ID: 50503e00.4060...@gmail.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dear all, I would like to add mixed effects in a multinomial model and I am trying to use MCMCglmm for that. The main problem I face: my data set consits of a trapping data set, where the observation at eah trap (1 or 0 for each species) have been aggregated per traplines. Therefore we have a proportion of presence/absence for each species per trapline. ex: ID_line mesh habitat Apsy Mygl Crle Crru Miag Miar Mimi Mumu Misu Soar Somi 11 028S6A 28 copse200000000 00 12 028S6B 28 copse110000000 00 13 028S6C 28 hedge200400000 00 14 028S6D 28 hedge100700001 00 15 028S6E 28 hedge700100000 00 empty 1128 1228 1324 1421 1522 When I run the following: test1 - MCMCglmm(fixed=cbind(Apsy,Mygl,Crle,Crru,Miag,Miar,Mimi,Mumu,Misu,Soar,Somi,empty)~habitat,random=~mesh,family=multinomial12,data=metalSmA[,c(2,9,23:34)],rcov=~us(trait):units) I got some error when running regarding the variance structure: ill-conditioned G/R structure: use proper priors if you haven't or rescale data if you have I guess that the problem comes from the nature of my observation whih are frequencies rather than 0/1 per unit Does someone know if a multinomial model fitted with MCMCglmm can handdle those frequencies table and how to specify the good G/R variance structures? Regards Am?lie Vaniscotte [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help question
Hi there, I have subscribed to R-help but am not sure how to view or post questions? I think this is the right way. I am planning on doing a multivariate regression investigating the relationship between depression (a continuous variable) and social support variables (mostly continuous, some categorical) among older people. I have a number of demographic and health-related variables that I am including as control variables. I have a large dataset from nearly 4,000 individuals. I need to check whether my data is 1) Missing at Random (MAR) and 2) Missing Completely At Random (MCAR). Here are three questions that I have related to this: 1) To check whether the data is MAR, I dichotomised a variable into missing and not missing, and checked for any significant differences in means (for continuous) or proportions (for categorical) of the other variables. I did this for each of the variables in my analysis. Is this correct? 2) Because of the size of my dataset, relationships for my MAR analysis are coming up as significant when, practically, the differences in means or proportions are not meaningful. Is it acceptable for me to argue as such, and say that the data is effectively MAR despite statistical significance? Sorry this is not a question specifically to R (more of a stats question) so no problem if no-one can help, though it would be greatly appreciated. 3) I have no idea how to check whether the data is Missing Completely At Random in R. I think this involves seeing whether those who had missing data for one variable were more likely to have missing data in other variables? If so, I don't know how to do this. Or, I need to do an overall test like Little's test of missing completely at random. I have spent ages looking online and at packages and can't find anything. Please help! I don't want to use SPSS! Cheers, Louise Louise Cowpertwait louisecowpertw...@gmail.com 021 258 9795 Auckland, NZ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help question
On Sun, Aug 12, 2012 at 10:58 PM, Louise Cowpertwait louisecowpertw...@gmail.com wrote: Hi there, I have subscribed to R-help but am not sure how to view or post questions? I think this is the right way. Indeed! I am planning on doing a multivariate regression investigating the relationship between depression (a continuous variable) and social support variables (mostly continuous, some categorical) among older people. I have a number of demographic and health-related variables that I am including as control variables. I have a large dataset from nearly 4,000 individuals. I need to check whether my data is 1) Missing at Random (MAR) and 2) Missing Completely At Random (MCAR). Here are three questions that I have related to this: 1) To check whether the data is MAR, I dichotomised a variable into missing and not missing, and checked for any significant differences in means (for continuous) or proportions (for categorical) of the other variables. I did this for each of the variables in my analysis. Is this correct? Something like a classic chi-sq test or small sample analogue? Seems somewhat reasonable, though you'll have to worry about making sure you don't get tripped up by doing too many tests. See ?p.adjust.methods for some references. 2) Because of the size of my dataset, relationships for my MAR analysis are coming up as significant when, practically, the differences in means or proportions are not meaningful. Is it acceptable for me to argue as such, and say that the data is effectively MAR despite statistical significance? I'm not sure there is a statistical answer to that: it's going to depend much more on the nature of your data set. Let your meta knowledge of the source of missingness guide things here. Sorry this is not a question specifically to R (more of a stats question) so no problem if no-one can help, though it would be greatly appreciated. 3) I have no idea how to check whether the data is Missing Completely At Random in R. I think this involves seeing whether those who had missing data for one variable were more likely to have missing data in other variables? If so, I don't know how to do this. Or, I need to do an overall test like Little's test of missing completely at random. I have spent ages looking online and at packages and can't find anything. You might want to check the rms package and the accompanying book (Regression Modeling Strategies) by Frank Harrell. It has the best coverage of MAR/MCAR/Imputation/etc that I've read on a practical basis. Cheers, Michael Please help! I don't want to use SPSS! Cheers, Louise Louise Cowpertwait louisecowpertw...@gmail.com 021 258 9795 Auckland, NZ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help
Hi I'm trying to increase the memory limit but Im facing this problem Error: unexpected symbol in C:\\Program Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M Hope you can help me out Thank You [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
There seem to be too many quotes in your error message... the path and name of the program should be quoted, the argument should not (or should be separately quoted). 500M isn't especially large these days... be sure to check your existing (default) value using the memory.limit() function before changing it this way. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. chong hui qi imgreee...@gmail.com wrote: Hi I'm trying to increase the memory limit but Im facing this problem Error: unexpected symbol in C:\\Program Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M Hope you can help me out Thank You [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
Please continue to include the r-help list in the conversation... you will most likely get your solution quicker. Why do you have double backslashes? That is needed within R, but not at the command line or in a shortcut. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Hui Qi imgreee...@gmail.com wrote: Hi Thanks for ur help Error: unexpected symbol in C:\\Program Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=1014M I tried agn but still obtain this On 9 Aug, 2012, at 12:37 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: Error: unexpected symbol in C:\\Program Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R: Help xts object Subset Date by Day of the Week
On Sun, Aug 5, 2012 at 4:49 PM, Douglas Karabasz doug...@sigmamonster.com wrote: I have a xts object made of daily closing prices I have acquired using quantmod. Here is my code: library(xts) library(quantmod) library(lubridate) # Gets SPY data getSymbols(SPY) # Subset Prices to just closing price SP500 - Cl(SPY) # Show day of the week for each date using 2-6 for monday-friday SP500wd - wday(SP500) # Add Price and days of week together SP500wd - cbind(SP500, SP500wd) # subset Monday into one xts object SPmon - subset(SP500wd, SP500wd$..2==2) I then used the package lubridate to show the days of the week. Due to the requirement of an xts objects to be numeric you will see each day is represented as a number so that Monday is =2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7. Since this is a financial index you will only see the numbers 2-6 or Monday-Friday. I want to subset the data by using the day column. I would like some help to figure out the best way to accomplish a few objectives. 1. Subset the data so that I only show Monday in sequence. However, I do want to make sure that it shows the date, price and the ..2 colum(which is the day of week) after Sub setting the data (I have it done but not sure if it is the best way) I think what you do works, this might also be a one liner: SPY[format(index(SPY), %a) == Mon, ] Alternatively split.default(SPY, format(index(SPY), %a)) creates a list of xts objects split by day of the week (Note you need split.default here because split.xts does something different) 2. Rearrange the object (hopefully without destroying the xts object) so that my data lines up like a weekly calendar. So it would look like the follow. Unfortunately, your formatting got all chewed up by the R-help server, which doesn't like HTML so I'm not quite sure what you want here. Possibly some black magic like this? SPY.CL - Cl(SPY) length(SPY.CL) - 7*floor(length(SPY.CL)/7) dim(SPY.CL) - c(length(SPY.CL)/7, 7) But note that this looses time stamps because each row can only have a single time stamp. You might also try to.weekly() Cheers, Michael Long Date Monday Monday Price Monday Day Index Long Date Tuesday Tuesday Price Tuesday Day Index Long Date Wednesday Wednesday Price Wednesday Index Long Date Thursday Thursday Price Thursday Index Friday Friday Price Friday Index 1/5/2009 92.85 2 1/6/2009 93.47 3 1/7/2009 90.67 4 1/8/2009 84.4 5 1/9/2009 89.09 6 1/12/2009 86.95 2 1/13/2009 87.11 3 1/14/2009 84.37 4 1/15/2009 91.04 5 1/16/2009 85.06 6 MLK Mondy MLK Monday MLK Monday 1/20/2009 80.57 3 1/21/2009 84.05 4 1/22/2009 82.75 5 1/23/2009 83.11 6 1/26/2009 83.68 2 1/27/2009 84.53 3 1/28/2009 87.39 4 1/29/2009 84.55 5 1/30/2009 82.83 6 Thank you, Douglas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R: Help xts object Subset Date by Day of the Week
On Mon, Aug 6, 2012 at 4:30 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: On Sun, Aug 5, 2012 at 4:49 PM, Douglas Karabasz doug...@sigmamonster.com wrote: I have a xts object made of daily closing prices I have acquired using quantmod. Here is my code: library(xts) library(quantmod) library(lubridate) # Gets SPY data getSymbols(SPY) # Subset Prices to just closing price SP500 - Cl(SPY) # Show day of the week for each date using 2-6 for monday-friday SP500wd - wday(SP500) # Add Price and days of week together SP500wd - cbind(SP500, SP500wd) # subset Monday into one xts object SPmon - subset(SP500wd, SP500wd$..2==2) I then used the package lubridate to show the days of the week. Due to the requirement of an xts objects to be numeric you will see each day is represented as a number so that Monday is =2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7. Since this is a financial index you will only see the numbers 2-6 or Monday-Friday. I want to subset the data by using the day column. I would like some help to figure out the best way to accomplish a few objectives. 1. Subset the data so that I only show Monday in sequence. However, I do want to make sure that it shows the date, price and the ..2 colum(which is the day of week) after Sub setting the data (I have it done but not sure if it is the best way) I think what you do works, this might also be a one liner: SPY[format(index(SPY), %a) == Mon, ] Alternatively split.default(SPY, format(index(SPY), %a)) creates a list of xts objects split by day of the week (Note you need split.default here because split.xts does something different) 2. Rearrange the object (hopefully without destroying the xts object) so that my data lines up like a weekly calendar. So it would look like the follow. Unfortunately, your formatting got all chewed up by the R-help server, which doesn't like HTML so I'm not quite sure what you want here. Possibly some black magic like this? SPY.CL - Cl(SPY) length(SPY.CL) - 7*floor(length(SPY.CL)/7) dim(SPY.CL) - c(length(SPY.CL)/7, 7) But note that this looses time stamps because each row can only have a single time stamp. To clarify that's not _why_ that looses the time-stamps (and xts-ness) but just that it does happen. Technically, it's because dim-.xts doesn't exist; the reason it doesn't (I'd imagine) is because of the time stamp thing. M You might also try to.weekly() Cheers, Michael Long Date Monday Monday Price Monday Day Index Long Date Tuesday Tuesday Price Tuesday Day Index Long Date Wednesday Wednesday Price Wednesday Index Long Date Thursday Thursday Price Thursday Index Friday Friday Price Friday Index 1/5/2009 92.85 2 1/6/2009 93.47 3 1/7/2009 90.67 4 1/8/2009 84.4 5 1/9/2009 89.09 6 1/12/2009 86.95 2 1/13/2009 87.11 3 1/14/2009 84.37 4 1/15/2009 91.04 5 1/16/2009 85.06 6 MLK Mondy MLK Monday MLK Monday 1/20/2009 80.57 3 1/21/2009 84.05 4 1/22/2009 82.75 5 1/23/2009 83.11 6 1/26/2009 83.68 2 1/27/2009 84.53 3 1/28/2009 87.39 4 1/29/2009 84.55 5 1/30/2009 82.83 6 Thank you, Douglas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R: Help xts object Subset Date by Day of the Week
I have a xts object made of daily closing prices I have acquired using quantmod. Here is my code: library(xts) library(quantmod) library(lubridate) # Gets SPY data getSymbols(SPY) # Subset Prices to just closing price SP500 - Cl(SPY) # Show day of the week for each date using 2-6 for monday-friday SP500wd - wday(SP500) # Add Price and days of week together SP500wd - cbind(SP500, SP500wd) # subset Monday into one xts object SPmon - subset(SP500wd, SP500wd$..2==2) I then used the package lubridate to show the days of the week. Due to the requirement of an xts objects to be numeric you will see each day is represented as a number so that Monday is =2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7. Since this is a financial index you will only see the numbers 2-6 or Monday-Friday. I want to subset the data by using the day column. I would like some help to figure out the best way to accomplish a few objectives. 1. Subset the data so that I only show Monday in sequence. However, I do want to make sure that it shows the date, price and the ..2 colum(which is the day of week) after Sub setting the data (I have it done but not sure if it is the best way) 2. Rearrange the object (hopefully without destroying the xts object) so that my data lines up like a weekly calendar. So it would look like the follow. Long Date Monday Monday Price Monday Day Index Long Date Tuesday Tuesday Price Tuesday Day Index Long Date Wednesday Wednesday Price Wednesday Index Long Date Thursday Thursday Price Thursday Index Friday Friday Price Friday Index 1/5/2009 92.85 2 1/6/2009 93.47 3 1/7/2009 90.67 4 1/8/2009 84.4 5 1/9/2009 89.09 6 1/12/2009 86.95 2 1/13/2009 87.11 3 1/14/2009 84.37 4 1/15/2009 91.04 5 1/16/2009 85.06 6 MLK Mondy MLK Monday MLK Monday 1/20/2009 80.57 3 1/21/2009 84.05 4 1/22/2009 82.75 5 1/23/2009 83.11 6 1/26/2009 83.68 2 1/27/2009 84.53 3 1/28/2009 87.39 4 1/29/2009 84.55 5 1/30/2009 82.83 6 Thank you, Douglas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R- Help (looping)
Hello, Inline Em 31-07-2012 02:59, Wellington Silva escreveu: Ok, This really helped. You've probably noticed, I'm a begginer using R. And when you said that you tried with rnorm and the counts were not zero, where did you use the rnorm? Where the runif is, in its stead use matrix( rnorm(nc*dds), ...etc... ) But note that this is fake data, what is the distribution of your dataset? Or is the purpose of this simulations? If so, there must also be a distribution to replicate, no? Rui Barradas The point is, I need this count to be different from zero, so I can use the ARL later on. I need how many values are out of control (for each column /variable), store these values in a vector, calculate this vector's mean, and then calculate the ARL. I've got to the same point as you did, but I'm stuck in the same problem. I need this count do be different from zero. I'm still learning, haha, please be patient. Best Regards. 2012/7/30 Rui Barradas ruipbarra...@sapo.pt Hello, Try the following. # make up some data dds - 1e3 nc - 10 xss - data.frame(matrix(runif(nc***dds, min=-1, max=1), ncol=nc)) names(xss) - paste0(xs, 1:10) # two functions with descriptive names getControlLimits - function(x, L = 3){ mu - mean(x) sigma - sd(x) c(lcl = mu - L*sigma, ucl = mu + L*sigma) } countOutOfControl - function(x, L = 3){ cl - getControlLimits(x, L = L) sum( x cl[lcl] | x cl[ucl] ) } sapply(xss, getControlLimits)# To know why the next returns all zeros sapply(xss, countOutOfControl)# No values outside control limits The data comes from an uniform in the interval (-1, 1) therefore the sd is sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below and above -1 and 1, respectively. (I've tried rnorm and the counts were not zero.) But this is just a data example to see if the code works, and it does. Hope this helps, Rui Barradas Em 30-07-2012 22:04, Wellington Silva escreveu: Ok Rui, I'll try to explain myself a little bit better. I have 10 columns, with 1000 rows each. (Each column represents a variable of the process) these columns were simulated using runif, with values between 4 and 6. I need to calculate the control limits for each column (variable), and verify if they are within the bounds accepted. For now, I'm reading these columns separately, I'm creating variables to read only one of the each column, and then check if the values in this column is within the limits or not. PS. All these columns originally come from a dataframe. something like this: (dds = 1000) xs1-runif(dds, min=-1, max=1) xs2-runif(dds, min=-1, max=1) xs3-runif(dds, min=-1, max=1) xs4-runif(dds, min=-1, max=1) xs5-runif(dds, min=-1, max=1) ... xs10-runif(dds, min=-1,max=1) xs-data.frame(xs1,xs2,xs3,**xs4,xs5,..xs10) __**___ #Reading columns (first one) c.n1 - 'xs1' c1 - with(xss, get(c.n1)) #Mean mc1 - mean (c1) #Standard deviation sdc1 - sd(c1) #Control Limits L=3 uclc1 = mc1 + L*sdc1 lclc1 = mc1 - L*sdc1 I need an if inside a loop to do the following: for (each column) if (value in c1 lclc1 | value in c1 uclc1) {Count how many values are outside the bounds} And I need this process to be repeated to all columns. 2012/7/30 Rui Barradas ruipbarra...@sapo.pt Hello, Your code example doesn't make much sense, it needs decrypting. Let's see, for(k in 1:length(c1)){ if(c1[k] lcl | c1[k] ucl) {do something} } If this is it, then you can completely avoid the loop: i1 - c1 lcl | c1 ucl # create an index vector out.of.control - c1[ i1 ]# save the values When you say you have a column, is it a matrix column? data.frame? Give a data example with dput( head(myData, 20) ) # paste the output of this in a post Hope this helps, Rui Barradas Em 29-07-2012 21:59, Wellington G. Silva escreveu: Hi, I'm Wellington from Brazil and I have the following issue: I've been working on a project a for a while, and I'm having trouble in using the loop (for) I need to read a column (c1), and for each value of this column, I need to check if it's within the control limits So, I was trying to do this: For (k in 1: c1) If (c1 lcl1 | c1 ucl1) {here I need to store the values outside the limits) I have 5 columns, need to do the same process in each one of them. And later on I'm gonna concatenate these 5 vectors and calculate its mean for an ARL project. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
Re: [R] R- Help (looping)
Hello, Glad it helped. E fico à espera do relatório. Rui Barradas Em 31-07-2012 14:40, Wellington Silva escreveu: Rui, This is a cientific initiation program. The idea is to develop a code which can simulate data and calculate the ARL later. *So, a little bit later yesterday night, after sending the email, I've figuered how to use the rnorm and then my problems were gone. Ok, we were working with normal distribution, that's why I needed to use rnorm. These simulations have just one purpose: calculate ARL at the end of the project. And, I want to thank you for the help, thank you so much, you've really helped me. As soon as I'm done with the final report, at the special thanks page, I'm gonna put your name Haha. Because, your help was enormous. Thanks again, I'll send you a copy of the final report as soon as I finish it if you wish. 2012/7/31 Rui Barradas ruipbarra...@sapo.pt Hello, Inline Em 31-07-2012 02:59, Wellington Silva escreveu: Ok, This really helped. You've probably noticed, I'm a begginer using R. And when you said that you tried with rnorm and the counts were not zero, where did you use the rnorm? Where the runif is, in its stead use matrix( rnorm(nc*dds), ...etc... ) But note that this is fake data, what is the distribution of your dataset? Or is the purpose of this simulations? If so, there must also be a distribution to replicate, no? Rui Barradas The point is, I need this count to be different from zero, so I can use the ARL later on. I need how many values are out of control (for each column /variable), store these values in a vector, calculate this vector's mean, and then calculate the ARL. I've got to the same point as you did, but I'm stuck in the same problem. I need this count do be different from zero. I'm still learning, haha, please be patient. Best Regards. 2012/7/30 Rui Barradas ruipbarra...@sapo.pt Hello, Try the following. # make up some data dds - 1e3 nc - 10 xss - data.frame(matrix(runif(nc*dds, min=-1, max=1), ncol=nc)) names(xss) - paste0(xs, 1:10) # two functions with descriptive names getControlLimits - function(x, L = 3){ mu - mean(x) sigma - sd(x) c(lcl = mu - L*sigma, ucl = mu + L*sigma) } countOutOfControl - function(x, L = 3){ cl - getControlLimits(x, L = L) sum( x cl[lcl] | x cl[ucl] ) } sapply(xss, getControlLimits)# To know why the next returns all zeros sapply(xss, countOutOfControl)# No values outside control limits The data comes from an uniform in the interval (-1, 1) therefore the sd is sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below and above -1 and 1, respectively. (I've tried rnorm and the counts were not zero.) But this is just a data example to see if the code works, and it does. Hope this helps, Rui Barradas Em 30-07-2012 22:04, Wellington Silva escreveu: Ok Rui, I'll try to explain myself a little bit better. I have 10 columns, with 1000 rows each. (Each column represents a variable of the process) these columns were simulated using runif, with values between 4 and 6. I need to calculate the control limits for each column (variable), and verify if they are within the bounds accepted. For now, I'm reading these columns separately, I'm creating variables to read only one of the each column, and then check if the values in this column is within the limits or not. PS. All these columns originally come from a dataframe. something like this: (dds = 1000) xs1-runif(dds, min=-1, max=1) xs2-runif(dds, min=-1, max=1) xs3-runif(dds, min=-1, max=1) xs4-runif(dds, min=-1, max=1) xs5-runif(dds, min=-1, max=1) ... xs10-runif(dds, min=-1,max=1) xs-data.frame(xs1,xs2,xs3,xs4,xs5,..xs10) _____ #Reading columns (first one) c.n1 - 'xs1' c1 - with(xss, get(c.n1)) #Mean mc1 - mean (c1) #Standard deviation sdc1 - sd(c1) #Control Limits L=3 uclc1 = mc1 + L*sdc1 lclc1 = mc1 - L*sdc1 I need an if inside a loop to do the following: for (each column) if (value in c1 lclc1 | value in c1 uclc1) {Count how many values are outside the bounds} And I need this process to be repeated to all columns. 2012/7/30 Rui Barradas ruipbarra...@sapo.pt Hello, Your code example doesn't make much sense, it needs decrypting. Let's see, for(k in 1:length(c1)){ if(c1[k] lcl | c1[k] ucl) {do something} } If this is it, then you can completely avoid the loop: i1 - c1 lcl | c1 ucl # create an index vector out.of.control - c1[ i1 ]# save the values When you say you have a column, is it a matrix column? data.frame? Give a data example with dput( head(myData, 20) ) # paste the output of this in a post Hope this helps, Rui Barradas Em 29-07-2012 21:59, Wellington G. Silva escreveu: Hi, I'm Wellington from Brazil and I have the following issue: I've been working on a project a for a while, and I'm having trouble in using the loop (for) I need to read a column (c1), and for each
[R] R- Help (looping)
Hi, I'm Wellington from Brazil and I have the following issue: I've been working on a project a for a while, and I'm having trouble in using the loop (for) I need to read a column (c1), and for each value of this column, I need to check if it's within the control limits So, I was trying to do this: For (k in 1: c1) If (c1 lcl1 | c1 ucl1) {here I need to store the values outside the limits) I have 5 columns, need to do the same process in each one of them. And later on I'm gonna concatenate these 5 vectors and calculate its mean for an ARL project. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R- Help (looping)
Hello, Your code example doesn't make much sense, it needs decrypting. Let's see, for(k in 1:length(c1)){ if(c1[k] lcl | c1[k] ucl) {do something} } If this is it, then you can completely avoid the loop: i1 - c1 lcl | c1 ucl # create an index vector out.of.control - c1[ i1 ]# save the values When you say you have a column, is it a matrix column? data.frame? Give a data example with dput( head(myData, 20) ) # paste the output of this in a post Hope this helps, Rui Barradas Em 29-07-2012 21:59, Wellington G. Silva escreveu: Hi, I'm Wellington from Brazil and I have the following issue: I've been working on a project a for a while, and I'm having trouble in using the loop (for) I need to read a column (c1), and for each value of this column, I need to check if it's within the control limits So, I was trying to do this: For (k in 1: c1) If (c1 lcl1 | c1 ucl1) {here I need to store the values outside the limits) I have 5 columns, need to do the same process in each one of them. And later on I'm gonna concatenate these 5 vectors and calculate its mean for an ARL project. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R- Help (looping)
Hello, Try the following. # make up some data dds - 1e3 nc - 10 xss - data.frame(matrix(runif(nc*dds, min=-1, max=1), ncol=nc)) names(xss) - paste0(xs, 1:10) # two functions with descriptive names getControlLimits - function(x, L = 3){ mu - mean(x) sigma - sd(x) c(lcl = mu - L*sigma, ucl = mu + L*sigma) } countOutOfControl - function(x, L = 3){ cl - getControlLimits(x, L = L) sum( x cl[lcl] | x cl[ucl] ) } sapply(xss, getControlLimits)# To know why the next returns all zeros sapply(xss, countOutOfControl)# No values outside control limits The data comes from an uniform in the interval (-1, 1) therefore the sd is sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below and above -1 and 1, respectively. (I've tried rnorm and the counts were not zero.) But this is just a data example to see if the code works, and it does. Hope this helps, Rui Barradas Em 30-07-2012 22:04, Wellington Silva escreveu: Ok Rui, I'll try to explain myself a little bit better. I have 10 columns, with 1000 rows each. (Each column represents a variable of the process) these columns were simulated using runif, with values between 4 and 6. I need to calculate the control limits for each column (variable), and verify if they are within the bounds accepted. For now, I'm reading these columns separately, I'm creating variables to read only one of the each column, and then check if the values in this column is within the limits or not. PS. All these columns originally come from a dataframe. something like this: (dds = 1000) xs1-runif(dds, min=-1, max=1) xs2-runif(dds, min=-1, max=1) xs3-runif(dds, min=-1, max=1) xs4-runif(dds, min=-1, max=1) xs5-runif(dds, min=-1, max=1) ... xs10-runif(dds, min=-1,max=1) xs-data.frame(xs1,xs2,xs3,xs4,xs5,..xs10) _ #Reading columns (first one) c.n1 - 'xs1' c1 - with(xss, get(c.n1)) #Mean mc1 - mean (c1) #Standard deviation sdc1 - sd(c1) #Control Limits L=3 uclc1 = mc1 + L*sdc1 lclc1 = mc1 - L*sdc1 I need an if inside a loop to do the following: for (each column) if (value in c1 lclc1 | value in c1 uclc1) {Count how many values are outside the bounds} And I need this process to be repeated to all columns. 2012/7/30 Rui Barradas ruipbarra...@sapo.pt Hello, Your code example doesn't make much sense, it needs decrypting. Let's see, for(k in 1:length(c1)){ if(c1[k] lcl | c1[k] ucl) {do something} } If this is it, then you can completely avoid the loop: i1 - c1 lcl | c1 ucl # create an index vector out.of.control - c1[ i1 ]# save the values When you say you have a column, is it a matrix column? data.frame? Give a data example with dput( head(myData, 20) ) # paste the output of this in a post Hope this helps, Rui Barradas Em 29-07-2012 21:59, Wellington G. Silva escreveu: Hi, I'm Wellington from Brazil and I have the following issue: I've been working on a project a for a while, and I'm having trouble in using the loop (for) I need to read a column (c1), and for each value of this column, I need to check if it's within the control limits So, I was trying to do this: For (k in 1: c1) If (c1 lcl1 | c1 ucl1) {here I need to store the values outside the limits) I have 5 columns, need to do the same process in each one of them. And later on I'm gonna concatenate these 5 vectors and calculate its mean for an ARL project. [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 113, Issue 20
Hi, I'm using write.table() to export dataframe to Excel. All works fine except that I want to export the variable labels instead of variable names. I can see the labels in the R consol using attr(), but I just don't know how to use the labels instead of the names. Thanks, François Maurice [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help
hi would you please remove my email from the mailing list. thanks in advance Sebastien [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
https://stat.ethz.ch/mailman/listinfo/r-help at the bottom… On 16.07.2012, at 10:52, Sébastien Morant wrote: hi would you please remove my email from the mailing list. thanks in advance Sebastien [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 113, Issue 19
On 16/07/2012, r-help-requ...@r-project.org r-help-requ...@r-project.org wrote: -- Message: 77 Date: Mon, 16 Jul 2012 10:48:39 +0100 From: Rui Barradas ruipbarra...@sapo.pt To: e-letter inp...@gmail.com Cc: r-help@r-project.org Subject: Re: [R] histogram of time-stamp data Message-ID: 5003e377.3000...@sapo.pt Content-Type: text/plain; charset=ISO-8859-1; format=flowed timestamps - as.POSIXct(Sys.Date()) + sample(24*60*60, 1e3, TRUE) Why is it necessary to apply the sample to the current date? Looking at the dataframe, I noticed that values have been changed: file 'test.txt': 12:19:00 09:30:00 16:56:00 01:56:00 10:44:00 10:31:00 02:14:00 05:05:00 12:52:00 21:50:00 R command terminal input: timestamps-read.table(test.txt) timestamps V1 1 12:19:00 2 09:30:00 3 16:56:00 4 01:56:00 5 10:44:00 6 10:31:00 7 02:14:00 8 05:05:00 9 12:52:00 10 21:50:00 timestamps - as.POSIXct(Sys.Date()) + sample(24*60*60, 1e3, TRUE) write.csv(timestamps,file=test1.txt) test1.txt: ,x 1,2012-07-16 04:52:48 2,2012-07-16 21:21:28 3,2012-07-16 18:58:27 4,2012-07-16 22:17:25 5,2012-07-16 11:13:52 6,2012-07-16 03:17:35 7,2012-07-16 02:14:17 8,2012-07-16 14:18:27 9,2012-07-16 14:39:16 Why is this happening? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 113, Issue 13
http://www.ncbi.nlm.nih.gov/pubmed/21418051 for the full reference. I don't have an electronic copy, but I do have that issue of Biometrics in my office. I'll have a copy sent over. Terry On 07/10/2012 04:08 PM, r-help-requ...@r-project.org wrote: Send R-help mailing list submissions to r-help@r-project.org To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/r-help or, via email, send a message with subject or body 'help' to r-help-requ...@r-project.org You can reach the person managing the list at r-help-ow...@r-project.org When replying, please edit your Subject line so it is more specific than Re: Contents of R-help digest... Today's Topics: 1. how can I show the xlab and ylab information while using layout (Jie Tang) 2. Re: how can I show the xlab and ylab information while using layout (Sarah Goslee) 3. Re: Extracting arithmetic mean for specific values from multiple .txt-files (Rui Barradas) 4. Re: How to use external image with R plot? (Michael Sumner) 5. Re: is it possible to insert a figure into into another new figure by r script (Thomas Adams) 6. image.plot transparent? (Chris82) 7. Re: How to add marker in Stacked bar plot? (Jorge I Velez) 8. Re: Predicted values for zero-inflated Poisson (Laura Lee) 9. Re: how can I show the xlab and ylab information while using layout (Sarah Goslee) 10. Re: image.plot transparent? (Sarah Goslee) 11. RGB components of plot() colours ( (Ted Harding)) 12. Re: number of decimal places in a number? (Martin Ivanov) 13. define stuff to be only usable in the same file (Jessica Streicher) 14. fill 0-row data.frame with 1 line of NAs (Liviu Andronic) 15. Re: RGB components of plot() colours (Duncan Murdoch) 16. Re: define stuff to be only usable in the same file (Duncan Murdoch) 17. Re: RGB components of plot() colours (Sarah Goslee) 18. Re: Predicted values for zero-inflated Poisson (Achim Zeileis) 19. Re: fill 0-row data.frame with 1 line of NAs (Rui Barradas) 20. Re: Plotting rpart trees with long list of class members (Jean V Adams) 21. Re: image.plot transparent? (Prof Brian Ripley) 22. customize packages' help index ( 00index.html file ) (Damien Georges) 23. identify.hclust() doesn't cut tree at the vertical position of the mouse pointer (WATSON Mick) 24. Use of Sappy and Tappy for Mathematical Calculation (Rantony) 25. fitting power growth (Thomas Hoffmann) 26. Mac OS X R uninstallation question (Alastair) 27. Count of elements in coulmns of a matrix (Rantony) 28. Re: Questions about doing analysis based on time (APOCooter) 29. gdata: Problem reading excel document containing non-US characters (=?iso-8859-1?Q?Rolf_Marvin_B=F8e_Lindgren?=) 30. Re: Count of elements in coulmns of a matrix (Sarah Goslee) 31. Re: define stuff to be only usable in the same file (Jessica Streicher) 32. Re: Mac OS X R uninstallation question (Prof Brian Ripley) 33. Re: fill 0-row data.frame with 1 line of NAs (Peter Ehlers) 34. Re: define stuff to be only usable in the same file (Jessica Streicher) 35. estimation of NA by predict command (eliza botto) 36. Re: fill 0-row data.frame with 1 line of NAs (Brian Diggs) 37. Help with vectors and rollapply (Raghuraman Ramachandran) 38. Re: multiple comparisons with generalised least squares (Ariel) 39. RGL 3D curvilinear shapes (PatGauthier) 40. Re: estimation of NA by predict command (arun) 41. Re: Count of elements in coulmns of a matrix (arun) 42. Re: Help with vectors and rollapply (William Dunlap) 43. Re: Predicted values for zero-inflated Poisson (Laura Lee) 44. Re: Use of Sappy and Tappy for Mathematical Calculation (Nordlund, Dan (DSHS/RDA)) 45. Re: Questions about doing analysis based on time (Rui Barradas) 46. Re: Extracting arithmetic mean for specific values from multiple .txt-files (vimmster) 47. Revolutions blog: June Roundup (David Smith) 48. calculating the difference between days? (C W) 49. Re: R to winbugs interface (Uwe Ligges) 50. Re: Predicted values for zero-inflated Poisson (Highland Statistics Ltd) 51. Re: Extracting arithmetic mean for specific values from multiple .txt-files (Rui Barradas) 52. Re: fill 0-row data.frame with 1 line of NAs (Peter Ehlers) 53. Re: fill 0-row data.frame with 1 line of NAs (Rui Barradas) 54. Re: Predicted values for zero-inflated Poisson (Laura Lee) 55. Re: Specify model with polynomial interaction terms up to degree n (YTP) 56. Re: Use of Sappy and Tappy for Mathematical Calculation (arun) 57. problem for installing rgdal (stanislas rebaudet) 58. HELP me please with import of csv to R (F86) 59. R code help to change table format (peziza) 60. Re: Skipping lines and incomplete rows (arun)
[R] R help, using R to build choropleth
Hi guys i need some help to build choropleth. Basically i am trying to colour regions on the map by population. I possess the shape file of the country, and also the population data, however, i am having trouble to create the plot, below is my code: population=read.csv(nz2.csv) population names(population) nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir))) nz$land_distr mapnames = nz$land_distr mapnamesnew=as.character(mapnames) popnames =as.character(mapnamesnew) pop =population$People pop2=as.numeric(pop) popdich = ifelse(pop2 10, red, blue) popnames.lower = tolower(popnames) cols=popdich[match(mapnames,popnames.lower)] plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000) thanks in advance for any assistance Kind regards Iverson -- View this message in context: http://r.789695.n4.nabble.com/R-help-using-R-to-build-choropleth-tp4634686.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help, using R to build choropleth
Hello, You can find some hints there: http://geography.uoregon.edu/geogr/topics/index.html Regards Le 12/06/28 14:26, iverson a écrit : Hi guys i need some help to build choropleth. Basically i am trying to colour regions on the map by population. I possess the shape file of the country, and also the population data, however, i am having trouble to create the plot, below is my code: population=read.csv(nz2.csv) population names(population) nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir))) nz$land_distr mapnames = nz$land_distr mapnamesnew=as.character(mapnames) popnames =as.character(mapnamesnew) pop =population$People pop2=as.numeric(pop) popdich = ifelse(pop2 10, red, blue) popnames.lower = tolower(popnames) cols=popdich[match(mapnames,popnames.lower)] plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000) thanks in advance for any assistance Kind regards Iverson -- View this message in context: http://r.789695.n4.nabble.com/R-help-using-R-to-build-choropleth-tp4634686.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help, using R to build choropleth
On 06/28/2012 03:26 PM, iverson wrote: Hi guys i need some help to build choropleth. Basically i am trying to colour regions on the map by population. I possess the shape file of the country, and also the population data, however, i am having trouble to create the plot, below is my code: population=read.csv(nz2.csv) population names(population) nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir))) nz$land_distr mapnames = nz$land_distr mapnamesnew=as.character(mapnames) popnames =as.character(mapnamesnew) pop =population$People pop2=as.numeric(pop) popdich = ifelse(pop2 10, red, blue) popnames.lower = tolower(popnames) cols=popdich[match(mapnames,popnames.lower)] plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000) Hi Iverson, Here is some code that I have used to draw a choropleth map of NSW where the colors represent the average Index of Relative Social Disadvantage for Statistical Local Areas in New South Wales. I've cut out the bits that map the locations of low speed vehicle runovers as you won't need that code. # load the RGDAL package require(rgdal) # read in the SLA boundaries for Australia SLAmap-readOGR(.,SLA10aAust) # pick out the bits in NSW NSWmap-SLAmap[1:199,] # read in the SEIFA (including IRSD) values AU_SEIFA-read.csv(AU_SEIFA_SLA_2006.csv) require(plotrix) # generate the colors for the IRSD scores NSW_SEIFAcol-color.scale(AU_SEIFA$SEIFA1dec[1:199], c(1,0.9,0.8,0.8),c(0.8,0.9,0.9,0.8),c(0.8,0.8,0.9,1),xrange=c(0,10)) # open the graphics device png(lsvro_NSW_96-10.png,width=700,height=700) par(mar=c(4,0,3,1)) # plot the map with the colors plot(NSWmap,xlim=c(140,max(lsvro$longitude,na.rm=TRUE)), col=NSW_SEIFAcol) legend_values-c(0:10) # generate the colors for the legend SEIFAlegendcol-color.scale(legend_values, c(1,0.9,0.8,0.8),c(0.8,0.9,0.9,0.8),c(0.8,0.8,0.9,1)) # stick a legend on the side color.legend(151.8,-37.5,152.3,-34.5,as.character(legend_values),SEIFAlegendcol, align=rb,gradient=y) title(main=Low speed vehicle run overs NSW - 1996-2010) par(xpd=TRUE) # put a title on the color legend text(151.8,-34.3,SEIFA decile,adj=0) par(xpd=FALSE) dev.off() Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 112, Issue 25
While lm() is a linear modeling, the constraints make it easier to solve with a nonlinear tool. Both my packages Rvmmin and Rcgmin (I recommend the R-forge versions as more up-to-date) have bounds constraints and masks i.e., fixed parameters. I am actually looking for example problems of this type that are more recent than the ones that got me into this 30 years ago. Do contact me off-list if you have something that could be shared. I'd also welcome discussion on appropriate tools for such constrained linear modeling problems. They are, I believe, more or less present in most linear modeling situations, but we rarely impose the constraints explicitly, and tend to use lm() and (hopefully) check if the solution obeys the conditions. Best, John Nash On 06/25/2012 06:00 AM, r-help-requ...@r-project.org wrote: Message: 5 Date: Sun, 24 Jun 2012 03:34:10 -0700 (PDT) From: rgoodman rosa.good...@gmail.com To: r-help@r-project.org Subject: Re: [R] Constrained coefficients in lm (correction) Message-ID: 1340534050627-4634321.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii Hi Jorge, Did you ever figure this out? I want to do the same thing with the additional constraint of the coef for x1 = 2. lm(Y~offset(2*x1)+x2+x3,data=mydata) where b= coeff for x2, c = coeff for x3, b+c=1 and b and c0. I've loaded the systemfit package, but the suggestion R*beta0 = q, where R is R.restr and q is q.restr in the function call makes no sense to me. Cheers, Rosie __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 112, Issue 6
Rich, The documentation for cenboxplot states that the second argument must be logical and not integer. the function cenboxplot substitutes synthetic values for censored values using ros, hence the error message from the ros method. I also do not understand how you expect group = 'SO4' to work. It is not clear that the function will replicate 'SO4' to make a single group. Dave Date: Tue, 5 Jun 2012 09:58:16 -0700 (PDT) From: Rich Shepard rshep...@appl-ecosys.com To: r-help@r-project.org Subject: [R] NADA Applied to my Data Message-ID: alpine.lnx.2.00.1206050931300.27...@salmo.appl-ecosys.com Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII I need a nudge in the right direction to get started using NADA. I bought Helsel's second addition and am currently reading it; NADA is installed in R. My data has been restructured with a couple of awk scripts. The data frame structure now has a flag if the quantity is censored (ceneq1 column) as well as a lower and upper limit for censored data. For present purposes, interval censoring can be ignored. The data frame structure is now: str(waterchem) 'data.frame': 46551 obs. of 7 variables: $ site: Factor w/ 126 levels BC-0.5,BC-1,..: 22 22 22 13 3 13 ... $ sampdate: Date, format: 1996-05-22 1996-07-19 ... $ param : Factor w/ 58 levels -0.100,AGP,..: 47 58 10 16 16 26 ... $ quant : num 0.01 7.69 0.02 63.8 120 0.02 399 439 2 433 ... $ ceneq1 : int 1 0 0 0 0 0 0 0 0 0 ... $ low : num 0 7.69 0.02 63.8 120 0.02 399 439 2 433 ... $ high: num 0.01 7.69 0.02 63.8 120 0.02 399 439 2 433 ... What I want to first learn is how to specify a box plot (and whether I can use the lattice package) for specific chemicals. ?cenboxplot shows me the arguments, but I'm not entering them correctly, or there's a prerequisite step I need to take: cenboxplot(waterchem$quant, waterchem$ceneq1, group='SO4', log=T, range=1.5) Error in function (classes, fdef, mtable) : unable to find an inherited method for function ros, for signature numeric, integer Perhaps cenboxplot is looking for a separate data set and not a data frame? Or, perhaps I need to melt and re-cast the data frame to the wide format from the current narrow format? Pointers appreciated. Rich [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
So long as x is a character vector, I tend to use the following for this problem. x - c(12/31/11 23:45, 1/1/12 2:15) x.split - strsplit(x, ) x.date - sapply(x.split, function(y) return(y[1])) x.time - sapply(x.split, function(y) if (length(y) 1) return(y[2]) else NA) x.date [1] 12/31/11 1/1/12 x.time [1] 23:45 2:15 Benjamin Nutter | Biostatistician | Quantitative Health Sciences Cleveland Clinic | 9500 Euclid Ave. | Cleveland, OH 44195 | (216) 445-1365 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roth Sent: Wednesday, May 02, 2012 4:01 PM To: r-help@r-project.org Subject: [R] R help! Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Thanks so much! Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. === Please consider the environment before printing this e-mail Cleveland Clinic is ranked one of the top hospitals in America by U.S.News World Report (2010). Visit us online at http://www.clevelandclinic.org for a complete listing of our services, staff and locations. Confidentiality Note: This message is intended for use\...{{dropped:13}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
On May 3, 2012, at 14:05 , Nutter, Benjamin wrote: So long as x is a character vector, I tend to use the following for this problem. x - c(12/31/11 23:45, 1/1/12 2:15) x.split - strsplit(x, ) x.date - sapply(x.split, function(y) return(y[1])) x.time - sapply(x.split, function(y) if (length(y) 1) return(y[2]) else NA) x.date [1] 12/31/11 1/1/12 x.time [1] 23:45 2:15 Benjamin Nutter | Biostatistician | Quantitative Health Sciences Cleveland Clinic| 9500 Euclid Ave. | Cleveland, OH 44195 | (216) 445-1365 I think this can be simplified to sapply(x.split,`[`,1) [1] 12/31/11 1/1/12 sapply(x.split,`[`,2) [1] 23:45 2:15 It's a bit inefficient, though. Other ideas: sub( .*$, , x) [1] 12/31/11 1/1/12 sub(^.* , , x) [1] 23:45 2:15 read.table(text=x) V1V2 1 12/31/11 23:45 2 1/1/12 2:15 -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
Hi I would convert it to propper date format and then you can extract anything. dat-strptime(12/31/11 23:45, format=%m/%d/%y %H:%M) as.Date(dat) [1] 2011-12-31 format(dat, %H:%M) [1] 23:45 Regards Petr Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Thanks so much! Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help!
Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Thanks so much! Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
Hi Alex, Here is one way: s - 12/31/11 23:45 strsplit(s, )[[1]] # [1] 12/31/11 23:45 * HTH, Jorge.- * On Wed, May 2, 2012 at 4:00 PM, Alex Roth wrote: Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Thanks so much! Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
On 02-05-2012, at 22:00, Alex Roth wrote: Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Have a look at the help for strsplit. You seem to have the arguments of strsplit the wrong way around. strsplit(z$dt, ) Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help!
Hard to say.. your example is not reproducible. (Study the Posting Guide mentioned at the end of every message on this list.) A stab in the dark might be that z$dt is a factor and needs to be converted to character first. Use the str function to study your data. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Alex Roth alexsro...@gmail.com wrote: Hello there, I was wondering if you could help me with a quick R issue. I have a data set where one of the columns has both date and time in it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this column into two new columns: date and time. One of the problems with splitting here is that when the dates go into single digits there are no 0's in front of months January-September (e.g., January is represented by 1 as opposed to 01), so every entry is a different length. Therefore, splitting by the space is the only option, I think. Here's the coding I've developed thus far: z$dt - z$Date#time and date is all under z$Date foo - strsplit( , z$dt) #attempted split based on the space And then if that were to work, I would proceed use the coding: foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - foo[ ,2] However, foo - strsplit( , z$dt) isn't working. Do you know what the problem is? If you could respond soon, that would be greatly appreciated! Thanks so much! Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 110, Issue 28
I will be out of the office Apr 26-27, 2012 with limited or no email access. For immediate help, please call the JPSM main number, 301-314-7911. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 110, Issue 23
Yes, the (start, stop] formalism is the easiest way to deal with time dependent data. Each individual only needs to have sufficient data to describe them, so for if id number 4 is in house 1, their housemate #1 was eaten at time 2, and the were eaten at time 10, the following is sufficient data for that subject: id house time1 time2 status discovered 4 10 20 false 4 12 10 1 true We don't need observations for each intermediate time, only that from 0-2 they were not yet discovered and that from 2-10 they were. The status variable tells whether an interval ended in disaster. Use Surv((time1, time2, status) on the left side of the equation. Since the time scale is discrete you should technically use method='exact' in a Cox model, but the default Efron approximation will be very close. Interval censoring isn't necessary. You will have a model of time to discovery instead of time to eaten, but with a fixed examination schedule such as you have there is no information in the data to help you move from one to the other. The standard interval approach would just assume deaths happened at the midpoint between examinations. Terry T. On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote: Dear R users, I fear this is terribly trivial but I'm struggling to get my head around it. First of all, I'm using the survival package in R 2.12.2 on Windows Vista with the RExcel plugin. You probably only need to know that I'm using survival for this. I have data collected from 180 or so individuals that were checked 7 times throughout a trial with set start and end times. Once the event happens (death by predator) there are no more checks for that individual. This means that I check on each individual up to 7 times with either an event recorded or the final time being censored. At the moment, I have a data sheet with one observation per individual; that is either the event time (the observation time when the individual had had an event) or the censored time. However, I'd like to add a time dependent factor and I also wonder if this data should be treated as interval censored. The time dependent factor is like this. The individuals are grouped in houses and once one individual in a group has an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered the others in the house as well (the predator is able to consume many individuals). At the moment I'm coding this as a normal two level factor (discovered) where all individuals alive after the first event in that house are TRUE and the first individuals in a house to be eaten are FALSE. All individuals in houses that were not discovered at al are also FALSEl. Obviously, all individuals that were eaten, were first discovered, then eaten. However, the first individuals in a house to be eaten, had not been previously discovered by the predator (not observably so, anyway). Should I write up this data set with a start and stop time for every check I made so each individual has up to 7 records, one for each time I checked? Is there a quick and easy way to do this in R or would I have to go through the data set manually? Does coding the discovered factor the way I have, make statistical sense? Should I worry about proportional hazards of the discovered factor? It seems to me that it would often turn out not proportional because of its nature. Sorry, lots of stats questions. I don't mind if you don't answer all of these. Just knowing how to best feed this data into R would help me no end. The rest I can probably glean from the millions of survival analysis books I have lying about. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R: Help; error in optim
Hello, When i run the code below from Weibull distribution with 30% censoring by using optim i get an error form R, which states that Error in optim(start, fn = z, data = q, hessian = T) : objective function in optim evaluates to length 25 not 1 can somebody help me remove this error. Is my censoring approach correct. n=25;rr=1000 p=1.5;b=1.2 for (i in 1:rr){ q-c(t,cen) t-rweibull(25,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue d-meantrue/0.30 cen- runif(25,min=0,max=d) cen s-ifelse(t=cen,1,0) z-function(data,p){ beta-p[1] eta-p[2] log1-(n*cen*log(p[1])-n*cen*(p[1])*log(p[2])+cen*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1]))) return(-log1) } start -c(0.5,0.5) zz-optim(start,fn=z,data=q,hessian=T) m1-zz$par[2] p-zz$par[1] } m1 p Thank you Chris Guure Researcher Institute for Mathematical Research UPM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R: Help; error in optim
On 16-04-2012, at 07:04, Christopher Kelvin wrote: Hello, When i run the code below from Weibull distribution with 30% censoring by using optim i get an error form R, which states that Error in optim(start, fn = z, data = q, hessian = T) : objective function in optim evaluates to length 25 not 1 can somebody help me remove this error. Is my censoring approach correct. n=25;rr=1000 p=1.5;b=1.2 for (i in 1:rr){ q-c(t,cen) t-rweibull(25,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue d-meantrue/0.30 cen- runif(25,min=0,max=d) cen s-ifelse(t=cen,1,0) z-function(data,p){ beta-p[1] eta-p[2] log1-(n*cen*log(p[1])-n*cen*(p[1])*log(p[2])+cen*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1]))) return(-log1) } start -c(0.5,0.5) zz-optim(start,fn=z,data=q,hessian=T) m1-zz$par[2] p-zz$par[1] } m1 p The example as given doesn't run. The first assignment after the start of the for loop ( q - ...) gives an error message: object 'cen' not found. The assignment needs to moved to after the line with ifelse. In function z object 'cen' (with length 25) is used in the calculation of log1, which becomes a vector of length 25. You need to review the definition of log1 in function z. Finally: why are you assigning p[1] and p[2] to beta and eta and nor using these variables? Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help: Censoring data (actually an optim issue
Your function is giving NaN's during the optimization. The R-forge version of optimx() has functionality specifically intended to deal with this. NOTE: the CRAN version does not, and the R-forge version still has some glitches! However, I easily ran the code you supplied by changing optim to optimx in the penultimate line. Here's the final output. KKT condition testing Number of parameters = 2 max abs gradient element = 0.004032794 test tol = 0.1018794 KKT1 result = TRUE Hessian eigenvalues: [1] 7.138974e+02 9.931285e-04 KKT2 result = TRUE KKT results: gmax= 0.004032794 evratio= 1.391136e-06 KKT1 2: TRUE TRUE [1] 7.138974e+02 9.931285e-04 Save results from method BFGS zz method par fvalues fns grs hes rs conv KKT1 KKT2 2 BFGS 0.2832468, 52.6096161 100.8794 10 1 0 -23 TRUE TRUE 1 NM 0.2827563, 54.3551491 100.8779 53 0 0 10 TRUE TRUE mtilt xtimes meths 2 NA 0.11 BFGS 1 0.0005130808 0.57NM There were 50 or more warnings (use warnings() to see the first 50) Note the Hessian eigenvalues are pretty bad. And the warnings are the NaN's produced. This is with just the 2 default methods. When I tried all.methods, one (uobyqa) seemed to lock up. This is a fairly ill-conditioned problem. Best, JN On 04/14/2012 06:00 AM, r-help-requ...@r-project.org wrote: Message: 13 Date: Fri, 13 Apr 2012 03:54:43 -0700 (PDT) From: Christopher Kelvin chris_kelvin2...@yahoo.com To: r-help@r-project.org r-help@r-project.org Subject: [R] R-Help: Censoring data Message-ID: 1334314483.27693.yahoomail...@web65408.mail.ac4.yahoo.com Content-Type: text/plain; charset=iso-8859-1 Hello, ?I want to estimate weibull parameters with 30% censored data. I have below the code for the censoring ?but how it must be put into the likelihood equation to obtain the desire estimate is where i have a problem with, ?can some body help? ?My likelihood equation is for a random type-I censoring where time for the censored units is different for each unit. ? n=50;r=35 p=0.8;b=1.5 t-rweibull(50,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue d-meantrue/0.30 cen- runif(50,min=0,max=d) cen s-ifelse(t=cen,1,0) s z-function(p){? shape-p[1] scale-p[2] log1-(r*log(p[1])-r*(p[1])*log(p[2])+(p[1]-1)*sum(log(t))-sum((t/(p[2]))^(p[1]) )-((n-r)*(sum(cen)/(p[2]))^(p[1]))) return(-log1) } start - c(1,1) zz-optim(start,fn=z,hessian=T) zz Thanks in anticipation Chris Guure Researcher Institute for Mathematical Research UPM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-Help: Censoring data
Hello, I want to estimate weibull parameters with 30% censored data. I have below the code for the censoring but how it must be put into the likelihood equation to obtain the desire estimate is where i have a problem with, can some body help? My likelihood equation is for a random type-I censoring where time for the censored units is different for each unit. n=50;r=35 p=0.8;b=1.5 t-rweibull(50,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue d-meantrue/0.30 cen- runif(50,min=0,max=d) cen s-ifelse(t=cen,1,0) s z-function(p){ shape-p[1] scale-p[2] log1-(r*log(p[1])-r*(p[1])*log(p[2])+(p[1]-1)*sum(log(t))-sum((t/(p[2]))^(p[1]) )-((n-r)*(sum(cen)/(p[2]))^(p[1]))) return(-log1) } start - c(1,1) zz-optim(start,fn=z,hessian=T) zz Thanks in anticipation Chris Guure Researcher Institute for Mathematical Research UPM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help; generating censored data
Hello, can i implement this as 10% censored data where t gives me failure and x censored. Thank you p=2;b=120 n=50 set.seed(132); r-sample(1:50,45) t-rweibull(r,shape=p,scale=b) t set.seed(123); cens - sample(1:50, 5) x-runif(cens,shape=p,scale=b) x Chris Guure Researcher, Institute for Mathematical Research UPM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help; generating censored data
On 11-Apr-2012 16:28:31 Christopher Kelvin wrote: Hello, can i implement this as 10% censored data where t gives me failure and x censored. Thank you p=2;b=120 n=50 set.seed(132); r-sample(1:50,45) t-rweibull(r,shape=p,scale=b) t set.seed(123);_ cens - sample(1:50, 5)_ x-runif(cens,shape=p,scale=b)_ x Chris Guure Researcher, Institute for Mathematical Research UPM This query is obscure! First, its approach does not seem to conform to the standard notion of censored data. This refers to a situation where, for each item observed, either (a) it value falls within a certain range (which may itself depend on the item), in which case its value is recorded as a value in the data; or (b) its value falls outside that range, in which case that fact is recorded but the value is not recorded (thus being censored). Eaxmple: Patients who have been admitted to hospital for a particular disease are subsequently monitored for a period of time (days/months/years) which may vary from patient to patient. The reason for the time limitation may be that the design of the investigation set a limit, or may be haphazard as a result of the patient moving away and no longer being accessible. The value recorded (if available) is the time from admission to death. If not available, then all that can be recorded is that the event occurred later than the upper time limit for thaqt patient. As far as I can see, no element of your code above corresponds to this notion of censored. Next, your r-sample(1:50,45) selects 45 different values from (1:50), and then your t-rweibull(r,shape=p,scale=b) generates 45 values sampled from the Weibull distribution, ** regardless of the 45 values from (1:50) in r ** -- See under '?rweibull' where it says: n: number of observations. If 'length(n) 1', the length is taken to be the number required. So it would seem that your r-sample(1:50,45) is superfluous, and you could simply have written t-rweibull(45,shape=p,scale=b). Similar comments apply to your cens - sample(1:50, 5) x-runif(cens,shape=p,scale=b) where you could have equivalently written x-runif(5,shape=p,scale=b). Also, the parameters shape and scale would not be recognised by runif(), whose parameters are as in runif(n, min=..., max=...). Maybe you meant to write x-rweibull(cens,shape=p,scale=b), but then you would simply be sampling a further 5 values from the same Weibull distribution, along with your original 45. So how does censoring come into this? If you would explain, in plain words, what you are seeking to do, it would help to remove this obscurity and confusion! Hoping this helps, Ted. - E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 11-Apr-2012 Time: 23:23:36 This message was sent by XFMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help; Censoring
Hello, I wish to censor 10% of my sample units of 50 from a Weibull distribution. Below is the code for it. I will need to know whether what i have done is correct and if not, can i have any suggestion to improve it? Thank you p=2;b=120 n=50 r=45 t-rweibull(r,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue cen- runif(n-r,min=0,max=meantrue) cen Chris Guure Researcher, Institute for Mathematical Research UPM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help; Censoring
On Apr 10, 2012, at 11:48 PM, Christopher Kelvin wrote: Hello, I wish to censor 10% of my sample units of 50 from a Weibull distribution. Below is the code for it. I will need to know whether what i have done is correct and if not, can i have any suggestion to improve it? Thank you p=2;b=120 n=50 r=45 t-rweibull(r,shape=p,scale=b) meantrue-gamma(1+(1/p))*b meantrue cen- runif(n-r,min=0,max=meantrue) cen I don't see how that works. You have five random draws from a uniform distribution with support on some interval, not necessarily related to the t vector. How are you deciding which of the variates should be considered censored? Why wouldn't you just use set.seed(n); cens - sample(1:50, 5)) and use as an index? -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help
I have a 11*1562 set of data. I want to regress using ols all combination of the ten variables on the eleventh. Store the results of the t test of each regression and the r2 to compare which combination is better. the combination can use any amount o variables. How can I do that in R? -- View this message in context: http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
This is, more often that not, statistically speaking a bad idea: you're likely to get a false positive (https://xkcd.com/882/) But it sounds like something along these lines should work for you to get the models : x - data.frame(rnorm(50), rnorm(50), rnorm(50), rnorm(50)) m - vector(list, ncol(x)) # Make a list to hold all the resulting models for(i in seq_along(x)) m[[i]] - lm(as.formula(paste(names(x)[i], ~.)), data = x) To parse that, loop over the columns of x (that's what seq_along will give you if x is a data frame) -- take the i-th name and model is based on everything else (~.) -- coerce that thing to a formula and pass it to lm() to do the linear model. Hope this helps, Michael On Sat, Apr 7, 2012 at 11:27 AM, kebrab67 selamgetac...@gmail.com wrote: I have a 11*1562 set of data. I want to regress using ols all combination of the ten variables on the eleventh. Store the results of the t test of each regression and the r2 to compare which combination is better. the combination can use any amount o variables. How can I do that in R? -- View this message in context: http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On 2012-04-07 08:27, kebrab67 wrote: I have a 11*1562 set of data. I want to regress using ols all combination of the ten variables on the eleventh. Store the results of the t test of each regression and the r2 to compare which combination is better. the combination can use any amount o variables. How can I do that in R? Personally, I think that what you propose is scientific nonsense. But if you must do this, you might check out the leaps package. Peter Ehlers -- View this message in context: http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R Help
Hi, I have a polynomial of 2n^2-5n+3 and I have my n values going up in powers of 2 i.e. n=2,4,8,16….. I wanted to fit this curve to the function A*n*log2(n) +B*n where A and B are to be found. How would i do this? Thank you Jaymin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Help
This sounds a little bit like homework (since you know the actual model), so I'll just point you in the right direction: I'd do something like: n - 2^(1:10) y - 2*n^2 - 5*n + 3 dtf - data.frame(y = y, n=n) lm( **, data = dtf) The * should be replaced by a formula object: to find the syntax for those, just google around or read ?formula and the examples of ?lm. The help pages are a little opaque (very good, just not super beginner friendly) but the easiest way to set up formulas is + includes a term : includes just a cross term * includes a cross term and the individual terms (i.e., A*B = A + B + A:B) - removes a term. That should help you set things up. Hope this helps, Michael On Sat, Mar 31, 2012 at 10:03 AM, Jaymin Shah jayminsh...@hotmail.com wrote: Hi, I have a polynomial of 2n^2-5n+3 and I have my n values going up in powers of 2 i.e. n=2,4,8,16….. I wanted to fit this curve to the function A*n*log2(n) +B*n where A and B are to be found. How would i do this? Thank you Jaymin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 109, Issue 16
Q #1: yesterday: A: 1. Restructure your database with NoSQL. See MongoDB.org. 2. Choose application language with which to work (and get Driver) and write your code. 3. R package - rmongodb. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-help-es have reached 500 members!
This posting is only to celebrate that R-help-es (R-help for Spanish Speakers) have reached 500 members! (and to thank Patricia for doing the bulk of admin work). Kjetil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help-es have reached 500 members!
:) congratulations! On Tue, Mar 13, 2012 at 08:59:56AM -0600, Kjetil Halvorsen wrote: This posting is only to celebrate that R-help-es (R-help for Spanish Speakers) have reached 500 members! (and to thank Patricia for doing the bulk of admin work). Kjetil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- :: Igor Sosa Mayor :: joseleopoldo1...@gmail.com :: :: GnuPG: 0x69804897 :: http://www.gnupg.org/ :: __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 109, Issue 9
On Fri, 2012-03-09 at 12:00 +0100, r-help-requ...@r-project.org wrote: A note on standard errors: ?S(t) +- std is a terrible confidence interval. ?You will be much more accurate if you use log scale. ?(Some argue for logit or log-log, in truth they all work well.) ? If n is large enough, however, you should be ok. Very true, but if one really wants a confidence interval for S_1(t)-S_2(t) (not for S_1(t)/S_2(t)) then one is pretty much forced to use the raw probability scale. -thomas I agree, and stand corrected. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m - 10 # can be 500 rho - 0.5 # single vector x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho # check the sample covariance matrix if m is not too large sigma - cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) - NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n - 1 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of a is n times m and it should fit into the memory. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
Petr, Thanks for the help. That certainly makes sense and solves my current problem. But, in general, for arbitrary covariance matrix (instead of the special equi-correlation case), it there a way to generate numbers from multivariate normal when the dimension is very high? Thanks. Hannah ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czдµÀ£º On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m - 10 # can be 500 rho - 0.5 # single vector x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho # check the sample covariance matrix if m is not too large sigma - cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) - NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n - 1 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of a is n times m and it should fit into the memory. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
Actually I still get an error for the case of equal correlation. Below is the code m - 10 n - 500 m1 - 0.5*m mu - c(rep(2, m1), rep(0, m-m1)) rho - 0.5 x.mod1 - matrix(rnorm(n*m, sd=sqrt(1-rho)), nrow=n, ncol=m)+rnorm(n, sd=sqrt(rho))+t(replicate(n,mu)) Error: cannot allocate vector of size 381.5 Mb R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç10:53£¬li li hannah@gmail.comдµÀ£º Petr, Thanks for the help. That certainly makes sense and solves my current problem. But, in general, for arbitrary covariance matrix (instead of the special equi-correlation case), it there a way to generate numbers from multivariate normal when the dimension is very high? Thanks. Hannah ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czдµÀ£º On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m - 10 # can be 500 rho - 0.5 # single vector x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho # check the sample covariance matrix if m is not too large sigma - cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) - NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n - 1 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of a is n times m and it should fit into the memory. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On Feb 19, 2012, at 16:53 , li li wrote: Petr, Thanks for the help. That certainly makes sense and solves my current problem. But, in general, for arbitrary covariance matrix (instead of the special equi-correlation case), it there a way to generate numbers from multivariate normal when the dimension is very high? In the general case, there is really no alternative to an algorithm on the order of p^2 in space and p^2 per replication in time. The covariance matrix is of size p*(p+1)/2 and you will have to multiply by its square root for each replicate. If you run out of space, you are out of luck. Any potential speedups will have to rely on special structure of the covariance. Thanks. Hannah ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.cz‹´µÀ£º On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m - 10 # can be 500 rho - 0.5 # single vector x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho # check the sample covariance matrix if m is not too large sigma - cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) - NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n - 1 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of a is n times m and it should fit into the memory. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On Feb 19, 2012, at 11:46 AM, li li wrote: Actually I still get an error for the case of equal correlation. No. You need to read the error message for meaning. Below is the code m - 10 n - 500 m1 - 0.5*m mu - c(rep(2, m1), rep(0, m-m1)) rho - 0.5 x.mod1 - matrix(rnorm(n*m, sd=sqrt(1-rho)), nrow=n, ncol=m)+rnorm(n, sd=sqrt(rho))+t(replicate(n,mu)) Error: cannot allocate vector of size 381.5 Mb You don't have enough space in your machine (whatever it might be). -- David. R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç10:53£¬li li hannah@gmail.comдµÀ£º Petr, Thanks for the help. That certainly makes sense and solves my current problem. But, in general, for arbitrary covariance matrix (instead of the special equi-correlation case), it there a way to generate numbers from multivariate normal when the dimension is very high? Thanks. Hannah ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czдµÀ£º On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m - 10 # can be 500 rho - 0.5 # single vector x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho # check the sample covariance matrix if m is not too large sigma - cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) - NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n - 1 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of a is n times m and it should fit into the memory. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Hi. I am replying to your first email, since the other did not arrive to my folder, possibly filtered out by a spam filter. I see them at the web interface. 1. Error: cannot allocate vector of size 381.5 Mb The error message makes sense. The matrix requires m*n*8/2^20 MB, which is in your case m - 10 n - 500 m*n*8/2^20 [1] 381.4697 May be, you already have other large objects in the memory. Try to minimize the number and size of objects, which you need simultaneously in an R session. 2. Generating a multivariate normal distribution. As Peter Dalgaard pointed out, a speed up is possible only for special types of the covariance matrix Sigma. A general way is to find a matrix A such that A A^t = Sigma. Then, the vector A X, where X is from N(0,I) and I is an identity matrix of an appropriate dimension, has covariance Sigma. This is also the way, how mvtnorm package works. A speed up is possible, if computing the product A X does not require to have matrix A explicitly represented in memory. The matrix A need not be a square matrix. In particular, the previous case may be understood as using the matrix A, which for a small m is as follows. m - 5 rho - 0.5 A - cbind(sqrt(rho), sqrt(1 - rho)*diag(m)) A [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.7071068 0.7071068 0.000 0.000 0.000 0.000 [2,] 0.7071068 0.000 0.7071068 0.000 0.000 0.000 [3,] 0.7071068 0.000 0.000 0.7071068 0.000 0.000 [4,] 0.7071068 0.000 0.000 0.000 0.7071068 0.000 [5,] 0.7071068 0.000 0.000 0.000 0.000 0.7071068 A %*% t(A) [,1] [,2] [,3] [,4] [,5] [1,] 1.0 0.5 0.5 0.5 0.5 [2,] 0.5 1.0 0.5 0.5 0.5 [3,] 0.5 0.5 1.0 0.5 0.5 [4,] 0.5 0.5 0.5 1.0 0.5 [5,] 0.5 0.5 0.5 0.5 1.0 This construction is conceptually possible also for a large m because the structure of A allows to compute A X by simpler operations with the vector X than an explicit matrix product. Namely, the expression rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) or, more clearly, sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m) is equivalent to the required A X, where X consists of rnorm(1) and rnorm(m) together. If you have a specific Sigma, describe it and we can discuss, whether an appropriate A can be found. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
Folks: Perhaps I am naive, ignorant, or foolish, but this whole discussion seems rather ridiculous. What possible relation to reality could a multivariate normal of the size requested have? Even for simulation, it seems like nonsense. Cheers, Bert On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky savi...@cs.cas.cz wrote: On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Hi. I am replying to your first email, since the other did not arrive to my folder, possibly filtered out by a spam filter. I see them at the web interface. 1. Error: cannot allocate vector of size 381.5 Mb The error message makes sense. The matrix requires m*n*8/2^20 MB, which is in your case m - 10 n - 500 m*n*8/2^20 [1] 381.4697 May be, you already have other large objects in the memory. Try to minimize the number and size of objects, which you need simultaneously in an R session. 2. Generating a multivariate normal distribution. As Peter Dalgaard pointed out, a speed up is possible only for special types of the covariance matrix Sigma. A general way is to find a matrix A such that A A^t = Sigma. Then, the vector A X, where X is from N(0,I) and I is an identity matrix of an appropriate dimension, has covariance Sigma. This is also the way, how mvtnorm package works. A speed up is possible, if computing the product A X does not require to have matrix A explicitly represented in memory. The matrix A need not be a square matrix. In particular, the previous case may be understood as using the matrix A, which for a small m is as follows. m - 5 rho - 0.5 A - cbind(sqrt(rho), sqrt(1 - rho)*diag(m)) A [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.7071068 0.7071068 0.000 0.000 0.000 0.000 [2,] 0.7071068 0.000 0.7071068 0.000 0.000 0.000 [3,] 0.7071068 0.000 0.000 0.7071068 0.000 0.000 [4,] 0.7071068 0.000 0.000 0.000 0.7071068 0.000 [5,] 0.7071068 0.000 0.000 0.000 0.000 0.7071068 A %*% t(A) [,1] [,2] [,3] [,4] [,5] [1,] 1.0 0.5 0.5 0.5 0.5 [2,] 0.5 1.0 0.5 0.5 0.5 [3,] 0.5 0.5 1.0 0.5 0.5 [4,] 0.5 0.5 0.5 1.0 0.5 [5,] 0.5 0.5 0.5 0.5 1.0 This construction is conceptually possible also for a large m because the structure of A allows to compute A X by simpler operations with the vector X than an explicit matrix product. Namely, the expression rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) or, more clearly, sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m) is equivalent to the required A X, where X consists of rnorm(1) and rnorm(m) together. If you have a specific Sigma, describe it and we can discuss, whether an appropriate A can be found. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On 20/02/12 09:55, Bert Gunter wrote: Folks: Perhaps I am naive, ignorant, or foolish, but this whole discussion seems rather ridiculous. What possible relation to reality could a multivariate normal of the size requested have? Even for simulation, it seems like nonsense. Right on, bro'!!! cheers, Rolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R help
Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah m - 500 m1 - 0.5*m rho - 0.5 Sigma - rho* matrix(1, m, m)+diag(1-rho, m) Error in matrix(1, m, m) : too many elements specified mu - c(rep(2, m1), rep(0, m-m1)) x.mod1 - mvrnorm(n = 1, mu, Sigma, tol = 1e-6, empirical = FALSE) Error in mvrnorm(n = 1, mu, Sigma, tol = 1e-06, empirical = FALSE) : incompatible arguments [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 108, Issue 16
Hi Petr, You advice options(scipen=20) gave me the expected result and fix the problem. Thanks a lot! Gian [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 108, Issue 13
dear: i want to know how to get a survival curve of the Cox proportional risk regression, thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help Digest, Vol 108, Issue 13
Le lundi 13 février 2012 à 19:48 +0800, 丁飞 a écrit : dear: i want to know how to get a survival curve of the Cox proportional risk regression, thank you. See the relevant part of the Survival Task View here: http://cran.r-project.org/web/views/Survival.html And more specifically, see the coxph() function in the survival package. A tutorial is available from: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf (But RSiteSearch(coxph) or a simple Google search will give you many more results.) Cheers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.