Re: [R] Intersecting two matrices
I would appreciate it if you would follow the Posting Guide, give a reproducible example, and post all messages in plain text.

Try

    m1 <- matrix(sample(0:999, 2*1057837, TRUE), ncol=2)
    m2 <- matrix(sample(0:999, 2*951980, TRUE), ncol=2)
    df1 <- as.data.frame(m1)
    df2 <- as.data.frame(m2)
    library(sqldf)
    system.time(df3 <- sqldf("SELECT DISTINCT df1.V1, df1.V2 FROM df1 INNER JOIN df2 ON df1.V1=df2.V1 AND df1.V2=df2.V2"))

The speed seems heavily dependent on how many rows are duplicated within the input data frames, so if the range of values is small then the query runs slower. Note also that moving the data from R to the database and back takes time; you may be able to import the data directly from your source data into the database and save some time. Read the examples in ?sqldf and ?read.csv.sql for more info.

Jeff Newmiller

c char <charlie.hsia...@gmail.com> wrote:

I am not familiar with R's sort and SQL libraries. I would appreciate it if you could post a code snippet when you have time. Thanks a lot!

On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

In that case, you should be looking at a relational inner join, perhaps with SQLite (see package sqldf).

c char <charlie.hsia...@gmail.com> wrote:

Thanks a lot. I am still looking for a super fast and memory-efficient solution, as the matrix I have in the real world has billions of rows.
On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com> wrote:

I haven't looked at the size-time relationship, but im2 (below) is faster than your function on at least one example:

    intersectMat <- function(mat1, mat2) {
        # mat1 and mat2 are both deduplicated
        nr1 <- nrow(mat1)
        nr2 <- nrow(mat2)
        mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
    }

    im2 <- function(mat1, mat2) {
        stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
        # paste the two columns into one key per row, then match the keys
        toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
        mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
    }

    > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
    > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
    > system.time(r1 <- intersectMat(m1,m2))
       user  system elapsed
     430.37    1.96  433.98
    > system.time(r2 <- im2(m1,m2))
       user  system elapsed
      27.89    0.20   28.13
    > identical(r1, r2)
    [1] TRUE
    > dim(r1)
    [1] 500 2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
Sent: Monday, July 29, 2013 4:04 PM
To: r-help@r-project.org
Subject: [R] Intersecting two matrices

Dear all,

I am interested in a faster matrix intersection package for R that handles the intersection of two integer matrices with ncol=2. Currently I am using my homemade code, adapted from a previous thread:

    intersectMat <- function(mat1, mat2){
        # mat1 and mat2 are both deduplicated
        nr1 <- nrow(mat1)
        nr2 <- nrow(mat2)
        mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
    }

which handles:

    size A= 10578373
    size B= 9519807
    expected intersecting time= 251.2272
    intersecting for corssing MPRs took 409.602 seconds.

It scales a little worse than linearly, but the atomic operation is not good. I wonder if a super fast C/C++ extension exists for this task. Your ideas are appreciated. Thanks!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
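For small inputs, the inner join suggested in this thread can also be sketched in base R with merge(), with no database round trip. The toy matrices below are made up for illustration; this is not code from the thread and not a benchmark, and whether it is fast enough for millions of rows would need to be measured.

```r
# Toy two-column integer matrices (made-up data for illustration only).
m1 <- cbind(c(1L, 2L, 3L, 3L), c(10L, 20L, 30L, 30L))
m2 <- cbind(c(3L, 1L, 5L), c(30L, 11L, 50L))

# as.data.frame() names the columns V1 and V2; unique() drops duplicate rows.
df1 <- unique(as.data.frame(m1))
df2 <- unique(as.data.frame(m2))

# With no 'by' argument, merge() joins on all shared column names (V1, V2),
# which amounts to an inner join on whole rows, i.e. the row intersection.
common <- merge(df1, df2)
print(common)
```

The unique() calls play the role of SELECT DISTINCT in the sqldf query above.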
Re: [R] Plot a series of plots without using a loop
Hello,

There's a bug in the line

    for (i in 1:length(dim(somdata.xyf$codes$X)[2]))

dim(...)[2] is a single number, so its length() is always 1 and the loop runs only once. You can simply use 1:dim(...)[2] or, even simpler,

    for (i in 1:ncol(somdata.xyf$codes$X))

As for a way without a loop, you could use ?sapply:

    sapply(1:ncol(somdata.xyf$codes$X), function(i) plot(...))

But I believe the loop is far more readable, and preferable.

Rui Barradas

On 31-07-2013 00:25, Ben Harrison wrote:

> On 30 July 2013 21:35, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> Hello,
> Maybe the following does it.
>
>     op <- par(mfrow=c(2, 3))
>     for(i in 1:6){
>         plot(somdata.xyf, type="property",
>              property=somdata.xyf$codes$X[, i],
>              main=colnames(somdata.xyf$codes$X)[i])
>     }
>     par(op)
>
> Hope this helps,
> Rui Barradas

Thanks Rui, that does it for sure. I had come to that solution, but just realised by looking at it again that I could replace

    for (i in 1:6)

with

    for (i in 1:length(dim(somdata.xyf$codes$X)[2]))

I was also wondering if there was a way to do it without a for loop, but in this case it's a very small number of iterations, so probably not worth it.

Ben
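The difference between the buggy and the corrected loop index can be checked on a small stand-in matrix. The matrix X below is hypothetical (the thread uses somdata.xyf$codes$X), and seq_len() is used here as a common, slightly safer variant of 1:ncol(...).

```r
# A small stand-in matrix (hypothetical; the thread uses somdata.xyf$codes$X).
X <- matrix(rnorm(12), nrow = 3, ncol = 4)

# dim(X)[2] is a single number, so its length is 1 and the index is just 1.
buggy_index <- 1:length(dim(X)[2])

# seq_len(ncol(X)) yields one index per column (and is empty for 0 columns).
fixed_index <- seq_len(ncol(X))

print(buggy_index)
print(fixed_index)
```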
[R] Using If loop in R how to extract even and odd ids
I have 500 ids; I want to extract the even and odd ids separately and store them in separate data files. How can this be done in R using *if and a for loop*?
Re: [R] List of lists
Hi Jim,

    close(filedescriptors$cpufiledescriptors[[1]])
    close(filedescriptors$cpufiledescriptors[[2]])
    close(filedescriptors$cpufiledescriptors[[3]])

I might be doing something wrong. The error is

    Error in UseMethod("close") :
      no applicable method for 'close' applied to an object of class "c('integer', 'numeric')"

Thanks,
Mohan

On 31-07-2013 03:05 AM, Jim Lemon wrote:

> On 07/30/2013 10:05 PM, mohan.radhakrish...@polarisft.com wrote:
>
> Hi,
> I am creating a list of 2 lists, one containing filenames and the other file descriptors. When I retrieve them I am unable to close the file descriptor. I am getting this error when I try to call close(filedescriptors[[2]][[1]]):
>
>     Error in UseMethod("close") :
>       no applicable method for 'close' applied to an object of class "c('integer', 'numeric')"
>
> print(filedescriptors[[2]][[1]]) seems to be printing individual elements.
>
> Thanks,
> Mohan
>
>     filelist.array <- function(n){
>         cpufile <- list()
>         cpufiledescriptors <- list()
>         length(cpufile) <- n
>         for (i in 1:n) {
>             cpufile[[i]] <- paste("output", i, ".txt", sep = "")
>             cpufiledescriptors[[i]] <- file(cpufile[[i]], "a")
>         }
>         listoffiles <- list(cpufile=cpufile, cpufiledescriptors=cpufiledescriptors)
>         return (listoffiles)
>     }
>
>     # Test function
>     test.filelist.array <- function() {
>         filedescriptors <- filelist.array(3)
>         print(filedescriptors[[2]][[1]])
>         print(filedescriptors[[2]][[2]])
>         print(filedescriptors[[2]][[3]])
>     }

Hi Mohan,

When you have opened connections as above, you need to pass the connection, not just one element, to close:

    close(listoffiles$cpufiledescriptors[[1]])

Jim
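A minimal sketch of keeping several open connections in a list and closing them again is given below. It is not the poster's code: the temporary file names and the text written are made up, and the point is only that each list element stays a whole connection object (combining connections with c() or unlist() would, as far as I know, strip the class that close() dispatches on).

```r
# Hypothetical output files in the session's temporary directory.
paths <- tempfile(pattern = paste0("output", 1:3, "_"), fileext = ".txt")

# lapply() returns a list whose elements are complete connection objects.
cons <- lapply(paths, file, open = "a")

# Write one line through each connection.
for (i in seq_along(cons)) {
  writeLines(paste("line for file", i), cons[[i]])
}

# Each list element is still a connection, so close() can dispatch on it.
is_connection <- vapply(cons, inherits, logical(1), what = "connection")
invisible(lapply(cons, close))
```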
[R] Add a column to a data frame with value based on the percentile of the row
Hi all,

I think this should be an easy question for the gurus out here. I have a large data frame (2.500.000 rows, 15 columns) and I want to add a column named SEGMENT to it.

The first 5% of the rows (the first 125.000 rows) should have the value "Top 5%" in the SEGMENT column.
Then the rows from 5% to 20% should have the value "5 to 20".
Then 20-50% should have the value "20 to 50".
And the last 50% of the rows should have the value "Bottom 50".

What is the easiest way of doing this? I was thinking of using quantile, but then I would need some row number column.

Regards
Derk
[R] comparing real set vs sampled sets
Dear R helper,

I have a statistics question. I have a vector of 500 values for which I need to assess the statistical significance of occurrence:

    real.dist <- realValues

For that, I sampled 1000 other vectors of 500 values each from my large data pool. I then ran ks.test with my real vector against each of the sampled vectors.

    ks.res <- unlist(lapply(l.sampled, function(x){
        ks <- ks.test(real.dist, x$dist)
        as.numeric(ks[["statistic"]])
    }))

I now have 1000 D values with their corresponding p-values. How can I get a general p-value saying that my real data differ from the sampled data, and are thus significant? Any suggestions?

Many thanks,
[R] parfm frailty model and post hoc testing
Dear all,

I'm running a model with one fixed factor called species, which has four groups, and a clustering factor called nest. My dependent variable (time-to) is ttm (time to moult), which is the number of days per individual, and the status variable is called moulted_final. The code and its results are as follows.

    library(parfm)
    Moult=read.table(file="HSBS R moult2.txt",header=T)
    modelMoult=parfm(Surv(ttm,moulted_final)~species,cluster="nest",data=Moult,dist="weibull",frailty="possta")

    Execution time: 12.72 second(s)

    anova(modelMoult)
    Analysis of Deviance Table
    Parametric frailty model: response is Surv(ttm, moulted_final)
    Terms added sequentially (first to last)

             loglik  Chisq Df Pr(>|Chi|)
    NULL    -346.61
    species -341.35 10.514  1   0.001184 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, there are significant differences among species, and I would like to know how to obtain these pairwise differences. I'm used to linear models, in which post hoc testing gives you pairwise p-values, but I'm not sure if that is how parfm works.

On a side note, all my samples have moulted, so moulted_final has the same state (1) for all samples.

Thanks in advance,
Raoul
[R] detect multivariate outliers with aq.plot {mvoutliers} high dimensions
Hi,

I have a species abundance data set, CommData, with n (samples) = 40 and p (species) = 107.

    Sample    Species A  Species B  Species C  Species D  ...
    411_2010         40         20          0          0
    412_2010         30         20          0          0
    413_2010          0          0          0          0
    414_2010          0         10          0          0
    415_2010         20          0          0          0
    418_2010          0          0          0          0
    419_2010          0          0          0          0
    421_2010        160         40          0         10
    ...

I am trying to find outliers based on the Mahalanobis distance with the package {mvoutlier}. I get an error using aq.plot(CommData):

    Error in covMcd(x, alpha = quan) : n <= p -- you can't be serious!

So I tried pcout(CommData), which is supposed to work for high dimensions, but I get the error

    More than 50% equal values in one or more variables!

Can this be fixed? Any idea how I can find outliers in my multidimensional data?

Thanks a lot for any help!
Re: [R] Using If loop in R how to extract even and odd ids
Hello,

Who told you you need a loop or an if?

    even <- function(x) x %% 2 == 0
    x <- 1:50
    idx <- even(x)
    x[idx]     # the even values
    x[!idx]    # the odd values

Hope this helps,
Rui Barradas
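For comparison, here is a sketch of the literal for/if version the question asks for, checked against the vectorized test with %%. The ids vector is a stand-in for the poster's 500 ids; growing vectors inside a loop is slow for large inputs, which is one reason the vectorized form is usually preferred.

```r
ids <- 1:500  # stand-in for the poster's ids

even_ids <- integer(0)
odd_ids <- integer(0)
for (id in ids) {
  if (id %% 2 == 0) {
    even_ids <- c(even_ids, id)  # remainder 0: even
  } else {
    odd_ids <- c(odd_ids, id)    # remainder 1: odd
  }
}

# The vectorized equivalents give the same result without a loop.
even_vec <- ids[ids %% 2 == 0]
odd_vec <- ids[ids %% 2 == 1]
```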
Re: [R] Add a column to a data frame with value based on the percentile of the row
Hello,

Combine quantile() with findInterval(). Something like the following.

    # sample data
    x <- rnorm(100)
    val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
    # cut points: bottom 50%, next 30%, next 15%, top 5%
    qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))
    # rightmost.closed = TRUE keeps max(x) in the last interval
    idx <- findInterval(x, qq, rightmost.closed = TRUE)
    val[idx]

Hope this helps,
Rui Barradas
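The same value-based segmentation can be sketched with cut(), which attaches the labels directly. The data are simulated and the variable names are illustrative only; note that this splits by the values of x, not by row position.

```r
set.seed(1)
x <- rnorm(1000)  # simulated values, for illustration only

# Quantile cut points: bottom 50%, next 30%, next 15%, top 5%.
breaks <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

# include.lowest = TRUE keeps min(x) inside the first interval.
segment <- cut(x, breaks = breaks,
               labels = c("Bottom 50", "20 to 50", "5 to 20", "Top 5%"),
               include.lowest = TRUE)

print(table(segment))
```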
Re: [R] Using If loop in R how to extract even and odd ids
Hi,

Maybe this helps:

    set.seed(24)
    dat1 <- data.frame(ID=1:500, value=rnorm(500))
    res <- split(dat1, dat1$ID %% 2)

A.K.
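Since the original question also asks for the two groups to be stored in separate data files, the split can be followed by a write step. The sketch below uses write.csv() and made-up file names in the session's temporary directory; the file format and locations are assumptions, not something stated in the thread.

```r
set.seed(24)
dat1 <- data.frame(ID = 1:500, value = rnorm(500))

# split() returns a list with elements "0" (even IDs) and "1" (odd IDs).
res <- split(dat1, dat1$ID %% 2)

# Hypothetical output locations.
even_file <- file.path(tempdir(), "even_ids.csv")
odd_file <- file.path(tempdir(), "odd_ids.csv")

write.csv(res[["0"]], even_file, row.names = FALSE)
write.csv(res[["1"]], odd_file, row.names = FALSE)
```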
Re: [R] Add a column to a data frame with value based on the percentile of the row
Hi,

Maybe this helps:

    set.seed(24)
    dat1 <- data.frame(ID=1:500, value=rnorm(500))
    indx <- round(quantile(as.numeric(row.names(dat1)), probs=c(0.05,0.20,0.50,1)))
    indx1 <- findInterval(row.names(dat1), indx, rightmost.closed=TRUE)
    dat1$SEGMENT <- as.character(factor(indx1, labels=c("Top 5%","5 to 20","20 to 50","Bottom 50")))

    head(dat1)
    #  ID      value SEGMENT
    #1  1 -0.7859574  Top 5%
    #2  2  1.0117428  Top 5%
    #3  3 -2.1558035  Top 5%
    #4  4  1.7803880  Top 5%
    #5  5  0.4192816  Top 5%
    #6  6 -1.0142512  Top 5%

    tail(dat1)
    #     ID      value   SEGMENT
    #495 495  0.3571848 Bottom 50
    #496 496 -1.1971854 Bottom 50
    #497 497  0.3544896 Bottom 50
    #498 498 -0.1562356 Bottom 50
    #499 499 -0.2994321 Bottom 50
    #500 500 -0.4170319 Bottom 50

A.K.
[R] heatmap scale parameter question
Would one of the more experienced R users explain to me the behaviour of the scale parameter in the heatmap function?

The different options for scale (R 3.0.1) change only the colors but do not affect the dendrograms. Please see for yourself by executing the following code:

    d <- matrix(rnorm(100), nrow=20)
    stats::heatmap(d)
    X11()
    heatmap(d, scale="column")
    X11()
    heatmap(d, scale="row")
    X11()
    heatmap(d, scale="none")

In all four cases above the dendrograms look exactly the same. However, scaling clearly affects clustering. See:

    d <- scale(d)
    heatmap(d, scale="none")

best regards

R version 3.0.1 (2013-05-16) -- "Good Sport"

ciao
--
Witold Eryk Wolski
[R] Please take me out of the mailing list
Re: [R] Please take me out of the mailing list
Please follow the instructions on the mailing list page. The link is given at the bottom of every mail from the list.
Re: [R] Add a column to a data frame with value based on the percentile of the row
Hi,

    set.seed(24)
    dat1 <- data.frame(ID=1:500, value=rnorm(500))
    dat1 <- dat1[order(-dat1$value),]
    # renumber the rows after sorting so that row names follow the new order
    row.names(dat1) <- 1:nrow(dat1)
    indx <- round(quantile(as.numeric(row.names(dat1)), probs=c(0.05,0.20,0.50,1)))
    indx1 <- findInterval(row.names(dat1), indx, rightmost.closed=TRUE)
    dat1$SEGMENT <- as.character(factor(indx1, labels=c("Top 5%","5 to 20","20 to 50","Bottom 50")))

A.K.

Dark wrote:

Hi Arun Kirshna,

I have tested your method and it will work for me. I only run into one problem: before this operation I have sorted my data frame, so my row numbers are not consecutive. You can see this if you first order your example data frame like:

    dat1 <- dat1[order(-dat1$value),]
    head(dat1)
         ID    value   SEGMENT
    237 237 3.538552  20 to 50
    21   21 3.376149    Top 5%
    421 421 3.015634 Bottom 50
    339 339 2.855991 Bottom 50
    119 119 2.589574  20 to 50
    12   12 2.512276    Top 5%

Do you have a solution for this?
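A variant that does not depend on row names at all is to label rows by their position after sorting. The sketch below is an alternative to the code in this thread, not the poster's method; it assumes the data frame has already been sorted as desired and uses cut() on the row positions, with cut-offs rounded to whole rows.

```r
set.seed(24)
dat <- data.frame(ID = 1:500, value = rnorm(500))
dat <- dat[order(-dat$value), ]  # sort first; row names are now scrambled

n <- nrow(dat)
# Cumulative cut-offs in whole rows: 5%, 20%, 50% and 100% of n.
cutoffs <- c(0, round(c(0.05, 0.20, 0.50, 1) * n))

# seq_len(n) is the row position after sorting, independent of row names.
dat$SEGMENT <- as.character(cut(seq_len(n), breaks = cutoffs,
                                labels = c("Top 5%", "5 to 20",
                                           "20 to 50", "Bottom 50")))

print(table(dat$SEGMENT))
```

For n = 500 this puts 25, 75, 150 and 250 rows into the four segments, largest values first.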
[R] merge matrix row data
Dear list,

I have a matrix showing species presence-absence on a map. Its rows are map locations, represented by GridCellID, such as GID 1 and GID 5. Its columns are species IDs, such as D0989, D9820, and D5629. The matrix is shown below.

Now I want to merge the GridCellIDs according to the map location of each island. For instance, Island A consists of GID 1 and 5. Island B consists of GID 2, 4, and 7. In GID 1 and 5, species D0989 is 1 in both. So I want to merge GID 1 and 5 into Island A, with species D0989 as 1. The original matrix and the resulting matrix are listed below.

Please kindly advise how to code the calculation in R. Please do not hesitate to ask if anything is unclear. Thank you in advance.

Elaine

Original matrix

          D0989 D9820 D5629 D4327 D2134
    GID 1     1     0     0     1     0
    GID 2     0     1     1     0     0
    GID 4     0     0     1     0     0
    GID 5     1     1     0     0     0
    GID 7     0     1     0     0     1

Resulting matrix

             D0989 D9820 D5629 D4327 D2134
    Island A     1     1     0     1     0
    Island B     0     1     1     0     1
Re: [R] merge matrix row data
Hi,

Please use ?dput() to post your data.

    mat1 <- as.matrix(read.table(text="
    D0989 D9820 D5629 D4327 D2134
    GID_1 1 0 0 1 0
    GID_2 0 1 1 0 0
    GID_4 0 0 1 0 0
    GID_5 1 1 0 0 0
    GID_7 0 1 0 0 1
    ", sep="", header=TRUE))
    row.names(mat1) <- gsub("[_]", " ", row.names(mat1))

    IslandA <- c("GID 1", "GID 5")
    IslandB <- c("GID 2", "GID 4", "GID 7")

    # for each island, pick its rows and flag species present in any of them
    res <- t(sapply(c("IslandA","IslandB"), function(x) {
        x1 <- mat1[match(get(x), row.names(mat1)),]
        (!!colSums(x1))*1
    }))
    res
    #        D0989 D9820 D5629 D4327 D2134
    #IslandA     1     1     0     1     0
    #IslandB     0     1     1     0     1

A.K.
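The same merge can be sketched with base R's rowsum(), which adds up rows by group; any count above zero then means the species is present somewhere on the island. The lookup vector island below is an assumed way of recording which grid cell belongs to which island and is not part of the thread.

```r
mat1 <- matrix(c(1, 0, 0, 1, 0,
                 0, 1, 1, 0, 0,
                 0, 0, 1, 0, 0,
                 1, 1, 0, 0, 0,
                 0, 1, 0, 0, 1),
               nrow = 5, byrow = TRUE,
               dimnames = list(c("GID 1", "GID 2", "GID 4", "GID 5", "GID 7"),
                               c("D0989", "D9820", "D5629", "D4327", "D2134")))

# Assumed lookup table: grid cell -> island.
island <- c("GID 1" = "Island A", "GID 5" = "Island A",
            "GID 2" = "Island B", "GID 4" = "Island B", "GID 7" = "Island B")

# Sum presences per island, then turn counts back into 0/1 flags.
groups <- unname(island[rownames(mat1)])
res <- (rowsum(mat1, group = groups) > 0) * 1
print(res)
```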
[R] Highlight selected bar in barplot
Hi All,

I am new to R, so any help would be appreciated. Below is my current R script:

    initial.dir <- getwd()
    setwd('/Users/jurgens/VirtualEnv/venv/Projects/QTLS/Resaved_Results')
    dataset <- read.table("LWxANNA_FinalReport_resaved_spwc.csv", header=TRUE, sep="\t")
    n <- length(dataset$X..No.Call)
    x <- sort(dataset$X..No.Call, partial = n)[n]
    outlier <- dataset[dataset$X..No.Call > quantile(dataset$X..No.Call, 0.25) + (IQR(dataset$X..No.Call) * 1.5),]
    par(las=2, cex.axis=0.5, cex.lab=1, cex.main=2, cex.sub=1)
    barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample, cex.names=0.5, space=0.5, ylim=c(0, x*1.5))
    setwd(initial.dir)

I would like to highlight the samples in outlier on the barplot that is created. Would this be possible?

Thanks

--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/distinti saluti/siong/duì yú/привет

Jurgens de Bruin
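One way to highlight selected bars is to pass barplot() a vector of colors, one per bar. The sketch below uses a small made-up data frame with the same column names as the question and the poster's own threshold (first quartile plus 1.5 times the IQR); the colors are arbitrary choices.

```r
# Made-up data with the same column names as in the question.
dataset <- data.frame(
  Individual.Sample = paste0("S", 1:10),
  X..No.Call = c(12, 15, 11, 14, 13, 16, 12, 15, 90, 13)
)

# The poster's rule: above the first quartile plus 1.5 times the IQR.
threshold <- quantile(dataset$X..No.Call, 0.25) + 1.5 * IQR(dataset$X..No.Call)
is_outlier <- dataset$X..No.Call > threshold

# One color per bar: red for outliers, grey for the rest.
bar_cols <- ifelse(is_outlier, "red", "grey80")

barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample,
        col = bar_cols, las = 2, cex.names = 0.5)
```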
[R] R number format with Hmisc and knitr
Dear R-Users and R-Devels,

I have a problem when using knitr in combination with Hmisc. I generate a data.frame which contains a mix of numbers in scientific and non-scientific notation. In my LaTeX table I want only the non-scientific format, so I call

    latex(myDataFrame, file = '', cdec = c(0, rep(4, NROW(myDataFrame) - 1)))

Usually this works, but in this case it doesn't. I do not know why, but I suspect the mixed number format to be the culprit. What could I do? Calling format(, scientific = FALSE) or options(scipen = 4) beforehand has no influence.

Best
Simon
Re: [R] R number format with Hmisc and knitr
Erratum: it should read

    latex(myDataFrame, file = '', cdec = c(0, rep(4, NCOL(myDataFrame) - 1)))

But this does not work either. Scientific notation is very robust :)

Apologies
Simon
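One possible workaround, not taken from this thread and untested against Hmisc itself, is to convert the numeric columns to fixed-notation strings with formatC() before building the table, so that no later formatting step can switch to scientific notation. The data frame below is made up; only its name matches the question.

```r
# Made-up data frame mixing very small and ordinary numbers.
myDataFrame <- data.frame(n = c(10, 2000),
                          est = c(1.5e-07, 123.456789),
                          se = c(2.25e-05, 0.5))

fixed <- myDataFrame
# First column: no decimals; remaining columns: four decimals, never "e" notation.
fixed[[1]] <- formatC(myDataFrame[[1]], format = "f", digits = 0)
fixed[-1] <- lapply(myDataFrame[-1], formatC, format = "f", digits = 4)

print(fixed)
```

The character data frame fixed could then be passed to the table-generating call in place of the numeric one.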
Re: [R] Correlation Loops in time series
Hi,

Maybe this helps:

    set.seed(25)
    mt1 <- matrix(sample(c(NA,1:40), 20*200, replace=TRUE), ncol=200)
    set.seed(487)
    mt2 <- matrix(sample(c(NA,1:80), 20*200, replace=TRUE), ncol=200)
    res <- sapply(seq_len(ncol(mt1)), function(i) cor(mt1[,i], mt2[,i], use="complete.obs", method="pearson"))

A.K.

Tom wrote:

Hello,

I've got the following problem. I have two matrices, each containing 200 time series. Now I want to calculate the correlation between the first time series of each of the matrices. I use the following commands:

    cor(mts1[,1], mts2[,1], use="complete.obs", method=c("pearson"))
    cor(mts1[,2], mts2[,2], use="complete.obs", method=c("pearson"))
    cor(mts1[,3], mts2[,3], use="complete.obs", method=c("pearson"))

and so on. I would like to repeat this for each of the 200 time series. As it is quite painful to change the command 200 times, I wanted to ask if there's a loop function that can cover these series quickly.

Thanks in advance for your help

Best
Tom
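As a cross-check on the column-by-column approach, the same 200 correlations sit on the diagonal of the full cross-correlation matrix when pairwise-complete observations are used. The sketch below reuses the simulated matrices from the reply; computing the whole 200 x 200 matrix does more work than needed, so it is meant as a sanity check under these assumptions, not as the faster option.

```r
set.seed(25)
mt1 <- matrix(sample(c(NA, 1:40), 20 * 200, replace = TRUE), ncol = 200)
set.seed(487)
mt2 <- matrix(sample(c(NA, 1:80), 20 * 200, replace = TRUE), ncol = 200)

# Column i of mt1 against column i of mt2, one pair at a time.
res_loop <- vapply(seq_len(ncol(mt1)),
                   function(i) cor(mt1[, i], mt2[, i], use = "complete.obs"),
                   numeric(1))

# All column pairs at once; the diagonal holds the matching pairs.
res_diag <- diag(cor(mt1, mt2, use = "pairwise.complete.obs"))
```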
Re: [R] heatmap scale parameter question
In your example all of the values are drawn from the same distribution, so there will not be substantial differences (row means/variances and column means/variances will be approximately the same).

set.seed(42)
d <- matrix(rnorm(100), nrow=20)
# Start with your example and modify the row/col means
rows <- sample(15:25, 20, replace=TRUE)
cols <- sample(5:15, 5, replace=TRUE)
d2 <- sweep(d, 2, cols, "+")
d2 <- sweep(d2, 1, rows, "+")
heatmap(d2, scale="none")
heatmap(d2, scale="row")
heatmap(d2, scale="col")

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Witold E Wolski
Sent: Wednesday, July 31, 2013 7:04 AM
To: r-help@r-project.org
Subject: [R] heatmap scale parameter question

> Would one of the more experienced R users explain to me the behaviour of the scale parameter in the heatmap function?
>
> The different options for scale (R 3.0.1) change only the colors but do not affect the dendrograms. Please see for yourself by executing the following code:
>
> d <- matrix(rnorm(100), nrow=20)
> stats::heatmap(d)
> X11()
> heatmap(d, scale="column")
> X11()
> heatmap(d, scale="row")
> X11()
> heatmap(d, scale="none")
>
> In all four cases above the dendrograms look exactly the same. However, scaling clearly affects clustering. See:
>
> d <- scale(d)
> heatmap(d, scale="none")
>
> best regards
>
> R version 3.0.1 (2013-05-16) -- "Good Sport"
>
> ciao
> --
> Witold Eryk Wolski
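As far as I can tell from the source of stats::heatmap, the dendrograms are computed from the data as passed in, and the scale argument is applied afterwards, only to the values that are coloured. That would explain the observation in the question. The sketch below checks this on invented data by comparing the row ordering that heatmap() returns invisibly; scaling before the call, as in the last lines of the question, is the variant that can change the clustering.

```r
set.seed(42)
d <- matrix(rnorm(100), nrow = 20)
d[, 1] <- d[, 1] * 50   # put one column on a much larger scale

pdf(NULL)  # null device, so the sketch also runs non-interactively
h_none <- heatmap(d, scale = "none")
h_col  <- heatmap(d, scale = "column")
h_pre  <- heatmap(scale(d), scale = "none")  # scale first, then cluster
dev.off()

# Same row ordering regardless of the scale argument
same_tree <- identical(h_none$rowInd, h_col$rowInd)
print(same_tree)
# Ordering after scaling the input beforehand, for comparison
print(h_pre$rowInd)
```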
Re: [R] Add a column to a data frame with value based on the percentile of the row
Hello,

Sorry, that should be 0.80, not 0.70.

qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

Rui Barradas

On 31-07-2013 12:22, Rui Barradas wrote:
> Hello,
>
> Combine quantile() with findInterval(). Something like the following.
>
> # sample data
> x <- rnorm(100)
> val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
> qq <- quantile(x, probs = c(0, 0.50, 0.70, 0.95, 1))
> idx <- findInterval(x, qq)
> val[idx]
>
> Hope this helps,
> Rui Barradas
>
> On 31-07-2013 10:37, Dark wrote:
>> Hi all,
>>
>> I think this should be an easy question for the gurus out here. I have this large data frame (2.500.000 rows, 15 columns) and I want to add a column named SEGMENT to it.
>>
>> The first 5% of rows (first 125.000 rows) should have the value "Top 5%" in the SEGMENT column.
>> Then the rows from 5% to 20% should have the value "5 to 20".
>> Then 20-50% should have the value "20 to 50".
>> And the last 50% of the rows should have the value "Bottom 50".
>>
>> What is the easiest way of doing this? I was thinking of using quantile, but then I should have some row number column.
>>
>> Regards
>> Derk
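A self-contained sketch of the quantile() plus findInterval() approach with the corrected 0.80 break. One detail is my own addition and not part of the posted code: with the default arguments, findInterval() puts the maximum of x in a fifth interval, so val[idx] is NA for that one element. Passing rightmost.closed = TRUE keeps it in the top group.

```r
set.seed(1)
x <- rnorm(100)   # sample data, as in the post
val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

# rightmost.closed = TRUE keeps max(x) in interval 4 instead of index 5
idx <- findInterval(x, qq, rightmost.closed = TRUE)
segment <- val[idx]
print(table(segment))
```

For a data frame, the same vector can be assigned directly as a new column (for example dat$SEGMENT <- segment, with dat a placeholder name). It follows the original row order, so sorting the data frame beforehand does not matter.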
Re: [R] Greek symbols in study labels and custom summary lines in forest plot (meta)
On Jul 29, 2013, at 11:52 AM, Rapsomaniki, Eleni wrote:

> Dear R helpers,
>
> Is there a way to display mathematical notation (e.g. Greek characters, subscripts) properly in study (studlab) and group (byvar) labels in a forest plot created using the meta package?
>
> # Example:
> library(meta)
> logHR <- log(runif(10,0.5,2))
> selogHR <- log(runif(10,0.05,0.2))
> study=c(0.1,.2,.3,.4,.5,0.1,.2,.3,.4,.5)
> group=c(rep('alpha',5),rep('beta',5))
> meta1=metagen(logHR, selogHR, sm="HR", studlab=paste("Fixed", expression(beta[w]), study), byvar=group)
> forest(meta1, print.byvar=F)

I tried a variety of plotmath and substitute strategies, but the arguments to studlab are first processed with 'as.character' and then put into a data.frame before printing. Data frames do not accept language objects, so R expressions could not be processed.

dftest <- data.frame(a = expression(a,b,c))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class "expression" to a data.frame

The best I could do was:

 ... , studlab=paste('Fixed ß[w]=', study),

> Question 2
> Is there a way to add a line to this plot at my preferred location? For example, I want to add a within-group combined estimate line (the default here is just an overall group line by random or fixed effects). I know I need to use grid.lines, e.g.
>
> grid.lines(x = 3, y = c(0.5,1), gp = gpar(col = 5))
>
> But for the life of me I can't work out the coordinate system in grid graphics!

Unfortunately, all of the printing to the device is handled inside the 'forest' function, and no list representation is returned as a value that could be augmented and printed later. So the input data would need to be entered in a manner that gets processed as text, or you would need to modify the code. I don't have the knowledge of the meta package needed to get there.

--
David Winsemius
Alameda, CA, USA
Re: [R] Highlight selected bar in barplot
It's a bit difficult to know what you are doing without any data. Would you supply some data, please? See ?dput for the easiest way to supply it.

Also have a look at https://github.com/hadley/devtools/wiki/Reproducibility and/or http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for some suggestions on asking questions and code formatting.

John Kane
Kingston ON Canada

-----Original Message-----
From: debrui...@gmail.com
Sent: Wed, 31 Jul 2013 16:57:55 +0200
To: r-help@r-project.org
Subject: [R] Highlight selected bar in barplot

> Hi All,
>
> I am new to R, so any help would be appreciated. Below is my current R script:
>
> initial.dir <- getwd()
> setwd('/Users/jurgens/VirtualEnv/venv/Projects/QTLS/Resaved_Results')
> dataset <- read.table("LWxANNA_FinalReport_resaved_spwc.csv", header=TRUE, sep="\t")
> n <- length(dataset$X..No.Call)
> x <- sort(dataset$X..No.Call, partial = n)[n]
> outlier <- dataset[dataset$X..No.Call > quantile(dataset$X..No.Call, 0.25) + (IQR(dataset$X..No.Call) * 1.5), ]
> par(las=2, cex.axis=0.5, cex.lab=1, cex.main=2, cex.sub=1)
> barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample, cex.names=0.5, space=0.5, ylim=c(0, x*1.5))
> setwd(initial.dir)
>
> I would like to highlight the samples in outlier on the barplot that is created. Would this be possible?
>
> Thanks
> --
> Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/distinti saluti/siong
>
> Jurgens de Bruin
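For the question quoted above, a minimal sketch of one common approach: barplot() accepts a vector of colours, one per bar, so the outlier bars can be given a different colour. The data here are invented stand-ins for dataset$X..No.Call and dataset$Individual.Sample, and the outlier rule is the one written in the question (first quartile plus 1.5 times the IQR).

```r
# Invented data standing in for the poster's columns
no_call <- c(12, 15, 11, 48, 14, 13, 52, 10)
samples <- paste0("S", seq_along(no_call))

# Outlier rule as written in the question: above Q1 + 1.5 * IQR
threshold  <- unname(quantile(no_call, 0.25)) + 1.5 * IQR(no_call)
is_outlier <- no_call > threshold

# One colour per bar: red for outliers, grey otherwise
bar_cols <- ifelse(is_outlier, "red", "grey70")

pdf(NULL)  # null device; drop this line and dev.off() to see the plot
barplot(no_call, names.arg = samples, col = bar_cols,
        las = 2, space = 0.5, ylim = c(0, max(no_call) * 1.5))
dev.off()
```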
Re: [R] xmlToDataFrame very slow
Hi Stavros,

xmlToDataFrame() is very generic and so doesn't know anything about the particulars of the XML it is processing. If you know something about the structure of the XML, you should be able to leverage that for performance.

xmlToDataFrame() is also not optimized, as it is just a convenience routine for people who want to work with XML without much effort.

If you send me the file and the code you are using to read the file, I'll take a look at it.

D.

On 7/30/13 11:10 AM, Stavros Macrakis wrote:
> I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame (package XML).
>
> I have successfully read it into R by splitting the file 10 ways, then running xmlToDataFrame on each part, then rbind.fill (package plyr) on the result. This takes about 530 s total and results in a data.frame with 71k rows and an object.size of 21MB.
>
> But trying to run xmlToDataFrame on the whole file takes forever ( 1 s so far). xmlParse of this file takes only 0.8 s.
>
> I tried running xmlToDataFrame on the first 10% of the file, then the first 10% repeated twice, then three times (with the outer tags adjusted, of course). Timings:
>
> 1 copy: 111 s = 111 per copy
> 2 copy: 311 s = 155
> 3 copy: 626 s = 209
>
> The runtime is superlinear. What is going on here? Is there a better approach?
>
> Thanks,
> -s
[R] Does a general latex table-making function exist?
Our Hmisc package's summary.formula function and its latex methods can make some fairly advanced tables. But the tables have to be regular. For example, all rows of the tables are based on the same data frame.

I'm thinking that what is needed is a ggplot2-like set of functions for building a table row-by-row or row-by-block of rows. Different row blocks could have different denominators, e.g., the first part of the table might be on everyone and a later block of rows might be for females, with different summary statistics computed.

Has anyone already written functions creating LaTeX markup with such functionality?

Thanks
Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Does a general latex table-making function exist?
On 13-07-31 4:03 PM, Frank Harrell wrote:
> Our Hmisc package summary.formula function and its latex methods can make some fairly advanced tables. But the tables have to be regular. For example, all rows of the tables are based on the same data frame.
>
> I'm thinking that what is needed is a ggplot2-like set of functions for building a table row-by-row or row-by-block of rows. Different row blocks could have different denominators, e.g., the first part of the table might be on everyone and a latter block of rows be for females, with different summary statistics computed.
>
> Has anyone already written functions creating LaTeX markup with such functionality?

My tables package does some of what you are asking for; I'm not sure if it does everything.

Duncan Murdoch
[R] qgraph: how to create legend (scale) for edge thickness?
Hello R community,

I am creating some network representations using the qgraph package (big thanks to Sacha Epskamp for developing it!). The package is very well documented, but I am unable to find how to create a legend (scale) for edge thickness.

In one of his qgraph examples, Sacha shows this type of scale (fifth graph at http://sachaepskamp.com/qgraph/examples, a scale for edge thickness relative to p-values). I have searched the documentation, and it seems that a legend relates to the definition of node groups, so I am uncertain which option or command I need to use to achieve this. I would also like to be able to select the values for which the scale is created.

In case it is unclear, what I am looking for is to display this next to the network graph:

probability   edge thickness
1.0           (line with thickness for 1.0)
0.8           (line with thickness for 0.8)
0.6           (line with thickness for 0.6)
0.4           (line with thickness for 0.4)

My line of code for generating the network is the following:

qgraph(Edges2, esize=7, nsize=12, gray=TRUE, layout="circular", filetype="pdf", width=5, height=5, vsize=11, label.prop=1.2, arrows=FALSE, border.color=c("red","red","blue","green","purple"), border.width=4, maximum=1.4, cut=0.0001)

I would appreciate it if somebody could help me out.

Thanks,
Maria Antonieta
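The kind of scale described above can at least be drawn with base graphics, independently of qgraph. The sketch below is only that: it does not use any qgraph legend facility, and the mapping from probability to line width (esize * p / maximum, using the esize and maximum values from the call above) is a hypothetical linear mapping that would have to be checked against how qgraph actually scales its edges.

```r
probs   <- c(1.0, 0.8, 0.6, 0.4)
esize   <- 7      # taken from the qgraph call in the post
maximum <- 1.4    # taken from the qgraph call in the post

# Hypothetical linear mapping from probability to line width
lwd_for <- esize * probs / maximum

pdf(NULL)  # null device; replace with a real device to see the result
plot.new()
legend("right", legend = format(probs, nsmall = 1), lwd = lwd_for,
       title = "probability", bty = "n")
dev.off()
print(lwd_for)
```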
[R] geocoding using the Google API with a key
Hello,

I am trying to geocode an address using the Google API and R. So far, I have used the following code:

location <- c('120 Avenue de la Republique, 92120 Montrouge, France')
location <- gsub(' ', '+', location)
sensor <- c('FALSE')
sensor4url <- paste('sensor=', tolower(as.character(sensor)), sep = '')
posturl <- paste(location, sensor4url, sep = '&')
url_string <- paste('http://maps.googleapis.com/maps/api/geocode/json?address=', posturl, sep = '')
url_string <- URLencode(url_string)
gc <- fromJSON(paste(readLines(url(url_string)), collapse = ''))
gc

The above code has worked just fine for up to 2500 Google queries per day (which are free).

My question: how can I modify the above code to insert a Google client ID and/or crypto key so as to run more than 2500 queries? Adding a 'client=...' and/or 'key=...' in the posturl line above does not seem to do the trick.

Thanks in advance,
fv
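A hedged sketch of how a request URL with a key parameter could be assembled. As far as I know, key-based requests to the Geocoding API have to go over https rather than http, which may be why appending 'key=...' to the http URL above had no effect; client-ID requests additionally need a signature and are not covered here. build_geocode_url is a hypothetical helper name, YOUR_API_KEY is a placeholder, and no request is actually sent.

```r
# Hypothetical helper: build (but do not send) a geocoding request URL.
build_geocode_url <- function(address, key) {
  base <- "https://maps.googleapis.com/maps/api/geocode/json"
  paste0(base,
         "?address=", URLencode(address, reserved = TRUE),
         "&key=", URLencode(key, reserved = TRUE))
}

u <- build_geocode_url("120 Avenue de la Republique, 92120 Montrouge, France",
                       "YOUR_API_KEY")
print(u)
```

Encoding each parameter value separately with reserved = TRUE avoids having to replace spaces by hand and keeps the "&" separators between parameters intact.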
[R] resampling
Could anyone tell me how, from a pool of 1002 observations (one variable), I can draw 1000 samples of 20 observations each? And then calculate the mean and standard deviation between 2, 3, 4, ..., 1000 samples and plot them?

Thank you!

Rita Gamito
Centro de Oceanografia
Faculdade de Ciências, Universidade de Lisboa
Campo Grande, 1749-016 Lisboa, Portugal
e-mail: rgam...@fc.ul.pt
Tel: + 351 21 750 00 00 - ext. 22575
Fax: + 351 21 750 02 07
www.co.fc.ul.pt
[R] R help
Hi,

First of all, thanks for this service; it is very useful for me. I am new to R, so I have a lot of questions.

I have to do imputation on a data set. This is a sample of my data set:

   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5

The problem is that some cells have to be empty, depending on Data1 and Data2.

For instance, in the third row you can see that Data1 is 16/10/2009, so there cannot be any information before 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 and IE.2008 have to be completely empty, but this doesn't mean that they are missing values; in fact they are not. I don't want any imputation in these cells. IE.2009 and IE.2010 have to be filled and they are not, so these cells are missing values and I want imputed values for them. (I would delete this row, because it is impossible to get any information about it, but it is fine for this example.)

On the other hand, in the last row NA is a real missing value.

How can I specify that these cells are empty, so that they do not get imputed values? I have tried putting NaN there, but that causes problems in some functions that I need to run before the imputation.

Thanks a lot
Best regards,
Teresa
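One possible way to keep the two kinds of NA apart, sketched under my own assumptions, is to leave NA in the data and build a separate logical mask from Data1 and Data2 that marks which cells are expected to hold a value. Only cells that are NA and expected would then be handed to the imputation step. The sketch uses two rows and three year columns from the sample above; treating the whole calendar year of Data1 and of Data2 as "expected" is an assumption that may need adjusting.

```r
# Two rows from the sample (NUMERO 146 and 177), three year columns only
dat <- data.frame(
  NUMERO  = c(146, 177),
  Data1   = as.Date(c("16/10/2009", "20/02/1996"), format = "%d/%m/%Y"),
  Data2   = as.Date(c("18/06/2013", "18/06/2013"), format = "%d/%m/%Y"),
  IE.2008 = c(NA_real_, NA_real_),
  IE.2009 = c(NA, 5),
  IE.2010 = c(35, 5)
)

ie_cols <- grep("^IE\\.", names(dat), value = TRUE)
years   <- as.integer(sub("^IE\\.", "", ie_cols))
y1 <- as.integer(format(dat$Data1, "%Y"))
y2 <- as.integer(format(dat$Data2, "%Y"))

# TRUE where a value is expected: year within [year(Data1), year(Data2)]
expected <- outer(y1, years, "<=") & outer(y2, years, ">=")

vals <- as.matrix(dat[ie_cols])
to_impute <- is.na(vals) & expected   # genuine missing values only
print(to_impute)
```

Cells that are NA but not expected stay NA and are simply skipped, so no special value such as NaN has to be stored in the data.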
[R] Split in blocks
Hello, I am a little bit lost on my search for a solution and idea. I would like to split my time serie in blocks of night. V1 indicates if its night or not. How can i split this kind of cases? Best regards, str(ou[,c(1,3,8)]) 'data.frame': 863 obs. of 3 variables: $ Fecha: POSIXct, format: 2013-07-04 00:10:00 ... $ Ta : num 22.6 22.2 22.2 22.2 22.2 ... $ V1 : num 1 1 1 1 1 1 1 1 1 1 ... dput(ou[,c(1,3,8)]) structure(list(Fecha = structure(c(1372889400, 137289, 1372890600, 1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200, 1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800, 1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400, 1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000, 1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600, 1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200, 1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800, 1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400, 137292, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000, 1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600, 1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200, 1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800, 1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400, 1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000, 1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600, 1372945200, 1372945800, 1372946400, 1372947000, 1372947600, 1372948200, 1372948800, 1372949400, 137295, 1372950600, 1372951200, 1372951800, 1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400, 1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000, 1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600, 1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200, 1372966800, 
1372967400, 1372968000, 1372968600, 1372969200, 1372969800, 1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400, 1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000, 1372977600, 1372978200, 1372978800, 1372979400, 137298, 1372980600, 1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200, 1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800, 1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400, 1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000, 1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600, 1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200, 1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800, 1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400, 137301, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000, 1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600, 1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200, 1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800, 1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400, 1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000, 1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600, 1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200, 1373038800, 1373039400, 137304, 1373040600, 1373041200, 1373041800, 1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400, 1373046000, 1373046600, 1373047200, 1373047800, 1373048400, 1373049000, 1373049600, 1373050200, 1373050800, 1373051400, 1373052000, 1373052600, 1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200, 1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800, 1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400, 1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000, 
1373067600, 1373068200, 1373068800, 1373069400, 137307, 1373070600, 1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200, 1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800, 1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400, 1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000, 1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600, 1373089200, 1373089800, 1373090400, 1373091000, 1373091600, 1373092200, 1373092800, 1373093400, 1373094000, 1373094600, 1373095200, 1373095800, 1373096400, 1373097000, 1373097600, 1373098200, 1373098800, 1373099400, 137310, 1373100600, 1373101200, 1373101800, 1373102400, 1373103000, 1373103600, 1373104200, 1373104800, 1373105400, 1373106000, 1373106600, 1373107200, 1373107800, 1373108400, 1373109000, 1373109600, 1373110200, 1373110800, 1373111400, 1373112000, 1373112600, 1373113200, 1373113800, 1373114400, 1373115000,
Re: [R] Add a column to a data frame with value based on the percentile of the row
Works like a charm, thanks a lot!
[R] problem about mean function in ffbase package
Hi all,

I got mismatched results from the mean function in the ffbase package and cannot figure out what's wrong. I have a simulated ff vector with 10 numbers inside and want to calculate its mean, but the results are quite different. With the mean() function in the ffbase package, the mean is 152.6858. With R's own mean(), or by adding up the sums from chunks directly, I get 667.5595.

Any idea? Thank you in advance!

Bayes Chen

# F1 is an ffdf, F1$X1 is an ff vector
length(F1$X1)
[1] 10

# Use the mean() function in the ffbase package
mean(F1$X1)
[1] 152.6858

X2 = F1$X1[]   # X2 is now a non-ff vector
length(X2)
[1] 10
mean(X2)   # R's original mean function for ordinary vectors
[1] 667.5595

# calculate the sum and then the mean by chunks
chunks = chunk(F1$X1, by=500)
sumx = 0
for (i in chunks) {
  sumx = sumx + sum(F1$X1[i])
}
sumx/length(F1$X1)
[1] 667.5595

--- below are some other trials

X2 = F1$X1[1:100]
mean(X2)
[1] 59.43149
mean(as.ff(X2))
[1] 59.43149

X2 = F1$X1[1:1]
mean(X2)
[1] 59.41978
mean(as.ff(X2))
[1] 59.42128

X2 = F1$X1[1:5]
mean(X2)
[1] 60.53615
mean(as.ff(X2))
[1] 57.72168

X2 = F1$X1[1:75000]
mean(X2)
[1] 59.37562
mean(as.ff(X2))
[1] 57.81179

X2 = F1$X1[1:9]
mean(X2)
[1] 57.0867
mean(as.ff(X2))
[1] 57.44862

X3 = F1$X1[9:10]
mean(X3)
[1] 6161.814
mean(as.ff(X3))
[1] 6161.797
[R] Correlation Loops in time series
Hello,

I've got the following problem. I have two matrices, each containing 200 time series. Now I want to calculate the correlation of the first time series of each of the matrices. I use the following command:

cor(mts1[,1], mts2[,1], use="complete.obs", method=c("pearson"))
cor(mts1[,2], mts2[,2], use="complete.obs", method=c("pearson"))
cor(mts1[,3], mts2[,3], use="complete.obs", method=c("pearson"))

and so on. I would like to repeat this for each of the 200 time series. As it is quite painful to change the command 200 times, I wanted to ask if there's a loop function that can cover these series in a fast way?

Thanks in advance for your help
Best
Tom
Re: [R] Add a column to a data frame with value based on the percentile of the row
Hi Arun Kirshna,

I have tested your method and it will work for me. I only run into one problem: before this operation I sort my data frame, so my row numbers are not consecutive.

You can see this if you first order your example data frame like this:

dat1 <- dat1[order(-dat1$value),]
head(dat1)
     ID    value   SEGMENT
237 237 3.538552  20 to 50
21   21 3.376149    Top 5%
421 421 3.015634 Bottom 50
339 339 2.855991 Bottom 50
119 119 2.589574  20 to 50
12   12 2.512276    Top 5%

Do you have a solution for this?
Re: [R] resampling
Hi.

See ?sample, ?replicate, ?colMeans, ?plot. Here is a simple example:

sample(1:1000, 20)
replicate(5, sample(1:1000, 20))
colMeans(replicate(5, sample(1:1000, 20)))

Andrija

On Wed, Jul 31, 2013 at 1:23 PM, Rita Gamito rslo...@fc.ul.pt wrote:
> Could anyone tell me how, from a pool of 1002 observations (one variable), can I resample 1000 samples of 20 observations? And then calculate the mean and standard deviation between 2, 3, 4, ..., 1000 samples and plot them?
>
> Thank you!
>
> Rita Gamito
> Centro de Oceanografia
> Faculdade de Ciências, Universidade de Lisboa
> Campo Grande, 1749-016 Lisboa, Portugal
> e-mail: rgam...@fc.ul.pt
> Tel: + 351 21 750 00 00 - ext. 22575
> Fax: + 351 21 750 02 07
> www.co.fc.ul.pt
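Putting the functions named above together, here is a sketch of the whole task. The pool is simulated, since the real 1002 observations are not in the thread, and I am assuming that "mean and standard deviation between 2, 3, 4, ..., 1000 samples" means the mean and standard deviation of the first k sample means for k = 2 to 1000; the question may intend something else.

```r
set.seed(123)
pool <- rnorm(1002, mean = 10, sd = 2)   # simulated stand-in for the data

# 20 x 1000 matrix: each column is one sample of 20, drawn without replacement
samples <- replicate(1000, sample(pool, 20))
sample_means <- colMeans(samples)

# Mean and sd of the first k sample means, for k = 2..1000
k <- 2:1000
running_mean <- cumsum(sample_means)[k] / k
running_sd   <- vapply(k, function(i) sd(sample_means[seq_len(i)]), numeric(1))

pdf(NULL)  # null device; drop this line and dev.off() to see the plots
plot(k, running_mean, type = "l",
     xlab = "number of samples", ylab = "mean of sample means")
plot(k, running_sd, type = "l",
     xlab = "number of samples", ylab = "sd of sample means")
dev.off()
```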
[R] Canadian common CV: how to cite R packages?
A question for Canadians who have filled out the new Canadian Common CV for grant applications: is there any way to cite research contributions of software such as R packages, aside from published journal articles? If so, where and how in the online application can they be entered?

For example, under Publications, they list Reports and Manuals, but the required fields there seem to apply only to things like printed technical reports and printed manuals.

If the answer is that these cannot be listed, OK, but the online app is extremely Byzantine and maybe there was something I missed.

TIA
-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
Re: [R] Correlation Loops in time series
sapply(1:200, function(x) cor(mts1[,x], mts2[,x], use="complete.obs", method=c("pearson")))

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of TMiller
Sent: Wednesday, July 31, 2013 8:16 AM
To: r-help@r-project.org
Subject: [R] Correlation Loops in time series

> Hello,
>
> I've got the following problem. I have two matrices, each containing 200 time series. Now I want to calculate the correlation of the first time series of each of the matrices. I use the following command:
>
> cor(mts1[,1], mts2[,1], use="complete.obs", method=c("pearson"))
> cor(mts1[,2], mts2[,2], use="complete.obs", method=c("pearson"))
> cor(mts1[,3], mts2[,3], use="complete.obs", method=c("pearson"))
>
> and so on. I would like to repeat this for each of the 200 time series. As it is quite painful to change the command 200 times, I wanted to ask if there's a loop function that can cover these series in a fast way?
>
> Thanks in advance for your help
> Best
> Tom
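A self-contained variant of the one-liner above, on simulated matrices because mts1 and mts2 are not available in the thread. It uses seq_len(ncol(...)) instead of a hard-coded 1:200 and vapply() so that the result is guaranteed to be a numeric vector; both are small stylistic choices of mine, not something the thread requires.

```r
set.seed(1)
mts1 <- matrix(rnorm(20 * 200), ncol = 200)   # simulated series, one per column
mts2 <- matrix(rnorm(20 * 200), ncol = 200)
mts1[sample(length(mts1), 50)] <- NA          # a few missing values

# Column-wise Pearson correlation between matching series
res <- vapply(seq_len(ncol(mts1)),
              function(i) cor(mts1[, i], mts2[, i],
                              use = "complete.obs", method = "pearson"),
              numeric(1))
print(head(res))
```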
Re: [R] merge matrix row data
Dear Arun,

Thank you for the very useful help. However, please kindly explain the code below.

row.names(mat1) <- gsub("[_]", " ", row.names(mat1))

1. What does "[_]" mean?
2. What does " " mean?
3. What does row.names(mat1) mean?

I checked ?gsub but still did not get the idea.

Thank you again
Elaine

On Wed, Jul 31, 2013 at 9:35 PM, arun smartpink...@yahoo.com wrote:

HI, Please use ?dput()

mat1 <- as.matrix(read.table(text = "
D0989 D9820 D5629 D4327 D2134
GID_1 1 0 0 1 0
GID_2 0 1 1 0 0
GID_4 0 0 1 0 0
GID_5 1 1 0 0 0
GID_7 0 1 0 0 1
", sep = "", header = TRUE))
row.names(mat1) <- gsub("[_]", " ", row.names(mat1))
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(get(x), row.names(mat1)), ]; (!!colSums(x1)) * 1}))
res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1

A.K.

- Original Message -
From: Elaine Kuo elaine.kuo...@gmail.com
To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch
Cc:
Sent: Wednesday, July 31, 2013 9:03 AM
Subject: [R] merge matrix row data

Dear list,

I have a matrix showing species presence-absence on a map. Its rows are map locations, represented by GridCellID, such as GID 1 and GID 5. Its columns are species IDs, such as D0989, D9820, and D5629. The matrix is shown below.

Now I want to merge the GridCellIDs according to the map location of each island. For instance, Island A consists of GID 1 and 5. Island B consists of GID 2, 4, and 7. In GID 1 and 5, species D0989 is 1 in both. So I want to merge GID 1 and 5 into Island A, with species D0989 as 1. The original matrix and the resulting matrix are listed below.

Please kindly advise how to code the calculation in R. Please do not hesitate to ask if anything is unclear. Thank you in advance.

Elaine

Original matrix
      D0989 D9820 D5629 D4327 D2134
GID 1     1     0     0     1     0
GID 2     0     1     1     0     0
GID 4     0     0     1     0     0
GID 5     1     1     0     0     0
GID 7     0     1     0     0     1

Resulting matrix
         D0989 D9820 D5629 D4327 D2134
Island A     1     1     0     1     0
Island B     0     1     1     0     1
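The island merge in the quoted code can also be sketched with base R's rowsum(), which adds up rows by group. This is an alternative illustration, not the approach posted in the thread; the lookup vector island is a hypothetical helper introduced here, and the matrix is typed in from the original post.

```r
# Presence-absence matrix from the original post.
mat1 <- matrix(
  c(1, 0, 0, 1, 0,
    0, 1, 1, 0, 0,
    0, 0, 1, 0, 0,
    1, 1, 0, 0, 0,
    0, 1, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("GID 1", "GID 2", "GID 4", "GID 5", "GID 7"),
    c("D0989", "D9820", "D5629", "D4327", "D2134")
  )
)

# Hypothetical lookup: which island each grid cell belongs to.
island <- c("GID 1" = "Island A", "GID 5" = "Island A",
            "GID 2" = "Island B", "GID 4" = "Island B", "GID 7" = "Island B")

# Sum the rows within each island, then turn counts back into 0/1:
# a species is present on an island if it is present in any of its cells.
counts <- rowsum(mat1, group = island[rownames(mat1)])
res <- (counts > 0) * 1
res
```

The `> 0` comparison plays the same role as `!!colSums(x1)` in the quoted code: any non-zero count becomes TRUE, and multiplying by 1 converts TRUE/FALSE to 1/0.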
Re: [R] Split in blocks
Enter ?help at the prompt to learn how to use R's (extensive) Help system to answer questions like this. For this question: ?split ## what else? Also ?tapply, ?ave, ?aggregate, ?by may be relevant. Also, read AN Introduction to R if you haven't already done so to start learning about R's many data manipulation and analysis features. Cheers, Bert On Wed, Jul 31, 2013 at 7:39 AM, Dominic Roye dominic.r...@gmail.com wrote: Hello, I am a little bit lost on my search for a solution and idea. I would like to split my time serie in blocks of night. V1 indicates if its night or not. How can i split this kind of cases? Best regards, str(ou[,c(1,3,8)]) 'data.frame': 863 obs. of 3 variables: $ Fecha: POSIXct, format: 2013-07-04 00:10:00 ... $ Ta : num 22.6 22.2 22.2 22.2 22.2 ... $ V1 : num 1 1 1 1 1 1 1 1 1 1 ... dput(ou[,c(1,3,8)]) structure(list(Fecha = structure(c(1372889400, 137289, 1372890600, 1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200, 1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800, 1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400, 1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000, 1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600, 1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200, 1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800, 1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400, 137292, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000, 1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600, 1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200, 1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800, 1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400, 1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000, 1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600, 1372945200, 
1372945800, 1372946400, 1372947000, 1372947600, 1372948200, 1372948800, 1372949400, 137295, 1372950600, 1372951200, 1372951800, 1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400, 1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000, 1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600, 1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200, 1372966800, 1372967400, 1372968000, 1372968600, 1372969200, 1372969800, 1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400, 1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000, 1372977600, 1372978200, 1372978800, 1372979400, 137298, 1372980600, 1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200, 1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800, 1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400, 1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000, 1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600, 1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200, 1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800, 1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400, 137301, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000, 1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600, 1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200, 1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800, 1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400, 1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000, 1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600, 1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200, 1373038800, 1373039400, 137304, 1373040600, 1373041200, 1373041800, 1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400, 1373046000, 
1373046600, 1373047200, 1373047800, 1373048400, 1373049000, 1373049600, 1373050200, 1373050800, 1373051400, 1373052000, 1373052600, 1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200, 1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800, 1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400, 1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000, 1373067600, 1373068200, 1373068800, 1373069400, 137307, 1373070600, 1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200, 1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800, 1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400, 1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000, 1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600, 1373089200, 1373089800,
Re: [R] merge matrix row data
Time to do some homework, Elaine:

?regexp

There are also numerous online tutorials on regular expressions that you can use to educate yourself.

Cheers, Bert

On Wed, Jul 31, 2013 at 2:07 PM, Elaine Kuo elaine.kuo...@gmail.com wrote:

Dear Arun,

Thank you for the very useful help. However, please kindly explain the code below.

row.names(mat1) <- gsub("[_]", " ", row.names(mat1))

1. What does "[_]" mean?
2. What does " " mean?
3. What does row.names(mat1) mean?

I checked ?gsub but still did not get the idea.

Thank you again
Elaine

On Wed, Jul 31, 2013 at 9:35 PM, arun smartpink...@yahoo.com wrote:

HI, Please use ?dput()

mat1 <- as.matrix(read.table(text = "
D0989 D9820 D5629 D4327 D2134
GID_1 1 0 0 1 0
GID_2 0 1 1 0 0
GID_4 0 0 1 0 0
GID_5 1 1 0 0 0
GID_7 0 1 0 0 1
", sep = "", header = TRUE))
row.names(mat1) <- gsub("[_]", " ", row.names(mat1))
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(get(x), row.names(mat1)), ]; (!!colSums(x1)) * 1}))
res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1

A.K.

- Original Message -
From: Elaine Kuo elaine.kuo...@gmail.com
To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch
Cc:
Sent: Wednesday, July 31, 2013 9:03 AM
Subject: [R] merge matrix row data

Dear list,

I have a matrix showing species presence-absence on a map. Its rows are map locations, represented by GridCellID, such as GID 1 and GID 5. Its columns are species IDs, such as D0989, D9820, and D5629. The matrix is shown below.

Now I want to merge the GridCellIDs according to the map location of each island. For instance, Island A consists of GID 1 and 5. Island B consists of GID 2, 4, and 7. In GID 1 and 5, species D0989 is 1 in both. So I want to merge GID 1 and 5 into Island A, with species D0989 as 1. The original matrix and the resulting matrix are listed below.

Please kindly advise how to code the calculation in R. Please do not hesitate to ask if anything is unclear. Thank you in advance.

Elaine

Original matrix
      D0989 D9820 D5629 D4327 D2134
GID 1     1     0     0     1     0
GID 2     0     1     1     0     0
GID 4     0     0     1     0     0
GID 5     1     1     0     0     0
GID 7     0     1     0     0     1

Resulting matrix
         D0989 D9820 D5629 D4327 D2134
Island A     1     1     0     1     0
Island B     0     1     1     0     1

-- Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
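For readers following the gsub() question above, a small sketch of what the call does. The vector ids is made up for illustration; the patterns are the ones that appear in the thread, plus a fixed-string variant added here as an assumption about what is easiest to read.

```r
# Hypothetical row names in the style used in the thread.
ids <- c("GID_1", "GID_2", "GID_4")

# "[_]" is a regular-expression character class that matches one underscore;
# " " is the replacement text (a single space).
with_space <- gsub("[_]", " ", ids)

# The same substitution without regular expressions: match the literal string "_".
with_space_fixed <- gsub("_", " ", ids, fixed = TRUE)

# ".*_" matches everything up to and including the underscore,
# so replacing it with "" leaves only the number.
number_only <- gsub(".*_", "", ids)

with_space
number_only
```

gsub() replaces every match in each string, while sub() would replace only the first; with a single underscore per name the two give the same result here.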
[R] double matrix?
Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.

-- Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715, Room 1-195 Moos Tower
420 Delaware St. SE, Minneapolis MN 55455
Phone: (612)626-3281 email: bruce...@umn.edu
Re: [R] Does a general latex table-making function exist?
Duncan,

I had read your excellent tables package vignette at http://cran.r-project.org/web/packages/tables/vignettes/tables.pdf when it first came out. It is extremely impressive. I'm glad to be reminded to give it another look. Is there a way to make the special symbols n and 1 refer to the number of non-missing observations rather than the length of a vector?

Do you feel like taking on this challenge? An example of an irregular table I'm thinking of is the following:

                      Females                   Males
                 Q1  Med  Q3    (n)       Q1  Med  Q3    (n)
Age              25   49  63  (1016)      26   50  64  (1767)
Canadians
  Weight (kg)    57   63  74  ( 243)      67   73  90  ( 401)

Canadians could mean country=='Canada'.

Thanks!
Frank

-- Frank E Harrell Jr
Professor and Chairman
School of Medicine, Department of Biostatistics
Vanderbilt University
Re: [R] double matrix?
What are the entries in your matrix? If they are something that won't coerce to numeric, you need to backtrack. Note how R distinguishes types of characters.

> as.numeric("a")
[1] NA
Warning message:
NAs introduced by coercion
> as.character(2)
[1] "2"
> as.numeric("2")
[1] 2

On Jul 31, 2013, at 1:47 PM, bruce...@umn.edu wrote:

Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.

-- Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715, Room 1-195 Moos Tower
420 Delaware St. SE, Minneapolis MN 55455
Phone: (612)626-3281 email: bruce...@umn.edu

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab, US Forest Service
Affiliate Professor, School of Forest Resources, College of the Environment
CSES Climate Impacts Group, University of Washington
phone: 206-732-7824 d...@uw.edu
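One way to act on the advice above is to convert the matrix and then look for cells that became NA only because of the conversion. A minimal sketch on a made-up 2 x 2 character matrix (the data and the names m, num and bad are assumptions for illustration, not from the thread):

```r
# Hypothetical character matrix with one entry that is not a number.
m <- matrix(c("1.5", "2", "abc", "4"), nrow = 2)

# Convert to double while keeping the matrix shape.
# as.numeric() drops the dim attribute, so the matrix is rebuilt explicitly.
num <- matrix(suppressWarnings(as.numeric(m)),
              nrow = nrow(m), dimnames = dimnames(m))

# Cells that are NA after conversion but were not NA before
# are the entries that could not be coerced.
bad <- which(is.na(num) & !is.na(m), arr.ind = TRUE)

bad      # row and column of each offending cell
m[bad]   # the offending strings themselves
```

If bad has zero rows, every entry converted cleanly and the numeric matrix can be used as is.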
Re: [R] double matrix?
In R, double is a synonym for numeric. Please see

?numeric

The Details section of ?numeric begins with:

Details: 'numeric' is identical to 'double' (and 'real'). It creates a double-precision vector of the specified length with each element equal to '0'.

Rich

On Wed, Jul 31, 2013 at 4:47 PM, bruce...@umn.edu wrote:

Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.

-- Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715, Room 1-195 Moos Tower
420 Delaware St. SE, Minneapolis MN 55455
Phone: (612)626-3281 email: bruce...@umn.edu
Re: [R] double matrix?
Hello,

double and numeric are the same. From the help page ?double, section "Note on names":

It is a historical anomaly that R has two names for its floating-point vectors, double and numeric (and formerly had real).

Apparently you are successfully converting characters to double-precision floating-point numbers.

Hope this helps,
Rui Barradas

On 31-07-2013 21:47, bruce...@umn.edu wrote:

Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.
Re: [R] double matrix?
It is hard to understand why your R code would not work with a double matrix, since "double" is just short for double-precision floating point. Your only alternative would be integer. From ?numeric:

It is a historical anomaly that R has two names for its floating-point vectors, double and numeric (and formerly had real). double is the name of the type. numeric is the name of the mode and also of the implicit class.

- David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bruce...@umn.edu
Sent: Wednesday, July 31, 2013 3:48 PM
To: r-help@r-project.org
Subject: [R] double matrix?

Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.

-- Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715, Room 1-195 Moos Tower
420 Delaware St. SE, Minneapolis MN 55455
Phone: (612)626-3281 email: bruce...@umn.edu
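The distinction between type and mode described above can be checked directly. A minimal sketch on a small made-up character matrix (the data are an assumption for illustration); storage.mode<- is used because it changes the type in place and keeps the dimensions:

```r
# Hypothetical character matrix standing in for the 37 x 473971 one.
m <- matrix(c("1", "2", "3", "4"), nrow = 2)

# Change the storage type from character to double; dim is preserved.
storage.mode(m) <- "double"

typeof(m)       # the type: "double"
mode(m)         # the mode: "numeric"
is.numeric(m)   # TRUE, so code that tests for a numeric matrix accepts it
is.double(m)    # TRUE as well: same object, two names
dim(m)          # still 2 x 2
```

If downstream code really needs whole numbers stored as integers, `storage.mode(m) <- "integer"` is the analogous call, but that is a different requirement from "numeric".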
[R] Convert rbind of lists to data.frame
I'm trying to build a data.frame row-by-row like so:

df <- data.frame(rbind(list('a', 1), list('b', 2), list('c', 3)))

I was surprised to see that the columns of the resulting data.frame are stored in lists rather than vectors.

> str(df)
'data.frame': 3 obs. of 2 variables:
 $ X1:List of 3
  ..$ : chr "a"
  ..$ : chr "b"
  ..$ : chr "c"
 $ X2:List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

The desired result is:

> str(df)
'data.frame': 3 obs. of 2 variables:
 $ X1: chr "a" "b" "c"
 $ X2: num 1 2 3

The following works, but is rather ugly:

df <- data.frame(lapply(data.frame(rbind(list('a', 1), list('b', 2), list('c', 3))), unlist))

Thanks,
Shaun
Re: [R] double matrix?
In R, double and numeric mean essentially the same thing. I think you are fine. (What called the result a double matrix?)

> z <- cbind(c("11", "12"), c("3.14", "2.718"))
> str(z)
 chr [1:2, 1:2] "11" "12" "3.14" "2.718"
> class(z)
[1] "matrix"
> class(z) <- "numeric"
> str(z)
 num [1:2, 1:2] 11 12 3.14 2.72
> class(z)
[1] "matrix"
> z
     [,1]  [,2]
[1,]   11 3.140
[2,]   12 2.718
> log(z)
         [,1]      [,2]
[1,] 2.397895 1.1442228
[2,] 2.484907 0.9998963

R numeric vectors consist of C double or Fortran double precision or real*8 values: 8-byte double-precision floating point numbers with 53 binary digits of precision (52 of them stored explicitly). S supported 4-byte single-precision vectors, which it also considered numeric.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bruce...@umn.edu
Sent: Wednesday, July 31, 2013 1:48 PM
To: r-help@r-project.org
Subject: [R] double matrix?

Hi-

I have a 37 X 473971 character matrix that I am trying to convert into a numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double matrix" whose dimensions are still 37 X 473971. I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing. The analysis code I am ultimately attempting to run on this data requires that it be in a numerical matrix, and it is really not okay with a double matrix. Does anyone know how to fix this? Thanks.

-- Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715, Room 1-195 Moos Tower
420 Delaware St. SE, Minneapolis MN 55455
Phone: (612)626-3281 email: bruce...@umn.edu
Re: [R] resampling
Hello,

The best way seems to be ?replicate.

set.seed(3997)    # make it reproducible
x <- rnorm(1002)  # make up some data
sim <- replicate(1000, sample(x, 20))
colSds <- function(x, na.rm = FALSE) apply(x, 2, sd, na.rm = na.rm)
mu <- colMeans(sim)
sigma <- colSds(sim)

Hope this helps,
Rui Barradas

On 31-07-2013 12:23, Rita Gamito wrote:

Could anyone tell me how, from a pool of 1002 observations (one variable), I can resample 1000 samples of 20 observations? And then calculate the mean and standard deviation between 2, 3, 4, ..., 1000 samples and plot them? Thank you!

_ Rita Gamito
Centro de Oceanografia
Faculdade de Ciências, Universidade de Lisboa
Campo Grande, 1749-016 Lisboa, Portugal
e-mail: rgam...@fc.ul.pt
Tel: + 351 21 750 00 00 - ext. 22575
Fax: + 351 21 750 02 07
www.co.fc.ul.pt
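The reply above stops at the per-sample means and standard deviations. A possible continuation, assuming that "between 2, 3, 4, ..., 1000 samples" means the mean and standard deviation of the first k sample means for each k; that reading is a guess, and the names k, run_mean and run_sd are introduced here for illustration.

```r
set.seed(3997)                          # make it reproducible
x <- rnorm(1002)                        # made-up pool of 1002 observations
sim <- replicate(1000, sample(x, 20))   # 20 x 1000 matrix: one sample per column
mu <- colMeans(sim)                     # mean of each sample

k <- 2:1000
# Mean of the first k sample means, for every k at once.
run_mean <- cumsum(mu)[k] / k
# Standard deviation of the first k sample means.
run_sd <- vapply(k, function(i) sd(mu[seq_len(i)]), numeric(1))

# Plot only in an interactive session, so the script also runs unattended.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(k, run_mean, type = "l", xlab = "number of samples", ylab = "mean of sample means")
  plot(k, run_sd, type = "l", xlab = "number of samples", ylab = "sd of sample means")
  par(op)
}
```

Both curves should settle down as k grows, which gives a rough picture of how many resamples are needed before the summary stabilises.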
Re: [R] Convert rbind of lists to data.frame
Maybe this helps:

l1 <- list('a', 1)
l2 <- list('b', 2)
l3 <- list('c', 3)
df1 <- data.frame(mapply(`c`, l1, l2, l3, SIMPLIFY = FALSE), stringsAsFactors = FALSE)
colnames(df1) <- paste0("X", 1:2)
str(df1)
#'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3

A.K.

- Original Message -
From: Shaun Jackman sjack...@gmail.com
To: R help r-help@r-project.org
Cc:
Sent: Wednesday, July 31, 2013 5:58 PM
Subject: [R] Convert rbind of lists to data.frame

I'm trying to build a data.frame row-by-row like so:

df <- data.frame(rbind(list('a', 1), list('b', 2), list('c', 3)))

I was surprised to see that the columns of the resulting data.frame are stored in lists rather than vectors.

> str(df)
'data.frame': 3 obs. of 2 variables:
 $ X1:List of 3
  ..$ : chr "a"
  ..$ : chr "b"
  ..$ : chr "c"
 $ X2:List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

The desired result is:

> str(df)
'data.frame': 3 obs. of 2 variables:
 $ X1: chr "a" "b" "c"
 $ X2: num 1 2 3

The following works, but is rather ugly:

df <- data.frame(lapply(data.frame(rbind(list('a', 1), list('b', 2), list('c', 3))), unlist))

Thanks,
Shaun
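Another way to get plain vector columns, sketched here as an alternative to the mapply() answer above, is to turn each row into a one-row data.frame and bind those. The list rows and the column names X1 and X2 are assumptions chosen to match the desired output in the question.

```r
# The rows to assemble, kept as a list of lists as in the question.
rows <- list(list("a", 1), list("b", 2), list("c", 3))

# Build a one-row data.frame per row, then stack them.
# rbind on data.frames combines column by column, so each column stays an atomic vector.
df <- do.call(
  rbind,
  lapply(rows, function(r) {
    data.frame(X1 = r[[1]], X2 = r[[2]], stringsAsFactors = FALSE)
  })
)

str(df)
```

For a handful of rows this is fine; for many rows, collecting each column in its own vector and calling data.frame() once is usually faster than repeated binding.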
Re: [R] merge matrix row data
Hi Elaine,

In that case: do you still have "GID" in IslandA and IslandB?

IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")

If there is no change in the two islands, then using the same dataset:

mat1 <- as.matrix(read.table(text = "
D0989 D9820 D5629 D4327 D2134
GID_1 1 0 0 1 0
GID_2 0 1 1 0 0
GID_4 0 0 1 0 0
GID_5 1 1 0 0 0
GID_7 0 1 0 0 1
", sep = "", header = TRUE))
row.names(mat1) <- gsub(".*\\_", "", row.names(mat1))  # removes the "GID_" prefix from the row names
mat1
#  D0989 D9820 D5629 D4327 D2134
#1     1     0     0     1     0
#2     0     1     1     0     0
#4     0     0     1     0     0
#5     1     1     0     0     0
#7     0     1     0     0     1
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(gsub(".*\\s+", "", get(x)), row.names(mat1)), ]; (!!colSums(x1)) * 1}))
res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1

Regarding the use of !!colSums(), you can check these:

t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(gsub(".*\\s+", "", get(x)), row.names(mat1)), ]; !colSums(x1)}))
#        D0989 D9820 D5629 D4327 D2134
#IslandA FALSE FALSE  TRUE FALSE  TRUE
#IslandB  TRUE FALSE FALSE  TRUE FALSE

t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(gsub(".*\\s+", "", get(x)), row.names(mat1)), ]; !!colSums(x1)}))
#        D0989 D9820 D5629 D4327 D2134
#IslandA  TRUE  TRUE FALSE  TRUE FALSE
#IslandB FALSE  TRUE  TRUE FALSE  TRUE

# *1 will replace TRUE with 1 and FALSE with 0.

A.K.

From: Elaine Kuo elaine.kuo...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Wednesday, July 31, 2013 6:58 PM
Subject: Re: [R] merge matrix row data

Dear Arun,

Thank you for the clear explanation. The row.names question was a typo; I did not get enough sleep last night. Two more questions:

1. If the row names are 1, 2, 4, etc. (numbers) instead of GID 1, GID 2, and GID 3, does the code need any modification?
2. Please kindly explain the code (!!colSums(x1))*1}. It is the critical part for merging the row data.

Thanks again.
Elaine

On Thu, Aug 1, 2013 at 6:45 AM, arun smartpink...@yahoo.com wrote:

Dear Elaine,

I used that line only because you didn't provide the data using dput(). So I needed to either use "," as the delimiter, or keep whitespace as the delimiter by first joining "GID" and the numbers with "_". I chose the latter as I didn't have much time to spend putting "," between the entries. After that, I removed the "_" using ?gsub(). As Bert pointed out, there are many online resources for understanding regular expressions. In this particular case, what I did was to single out the "_" in the first pair of quotes and replace it with the space in the second pair of quotes, " ". Therefore "GID_1" becomes "GID 1", which is what your original dataset looks like. If you type row.names(mat1) at the R console and press enter, you will see the output.

Hope it helps.
Arun

From: Elaine Kuo elaine.kuo...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Wednesday, July 31, 2013 5:07 PM
Subject: Re: [R] merge matrix row data

Dear Arun,

Thank you for the very useful help. However, please kindly explain the code below.

row.names(mat1) <- gsub("[_]", " ", row.names(mat1))

1. What does "[_]" mean?
2. What does " " mean?
3. What does row.names(mat1) mean?

I checked ?gsub but still did not get the idea.

Thank you again
Elaine

On Wed, Jul 31, 2013 at 9:35 PM, arun smartpink...@yahoo.com wrote:

HI, Please use ?dput()

mat1 <- as.matrix(read.table(text = "
D0989 D9820 D5629 D4327 D2134
GID_1 1 0 0 1 0
GID_2 0 1 1 0 0
GID_4 0 0 1 0 0
GID_5 1 1 0 0 0
GID_7 0 1 0 0 1
", sep = "", header = TRUE))
row.names(mat1) <- gsub("[_]", " ", row.names(mat1))
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA", "IslandB"), function(x) {x1 <- mat1[match(get(x), row.names(mat1)), ]; (!!colSums(x1)) * 1}))
res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1

A.K.
- Original Message - From: Elaine Kuo elaine.kuo...@gmail.com To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch Cc: Sent: Wednesday, July 31, 2013 9:03 AM Subject: [R] merge matrix row data Dear list, I have a matrix showing the species presence-absence on a map. Its rows are map locations, represented by GridCellID, such as GID1 and GID 5. Its columns are species ID, such as D0989, D9820, and D5629. The matrix is as followed. Now I want to merge the GridCellID according to the map location of each island. For instance, Island A consist of GID 1 and
Re: [R] Split in blocks
Hi, Not clear about your desired output. source(ou.txt) split(ou,ou$V1) #split based on values of V1 (1 and 0) #or #may be you wanted 1 followed by 0 in one block, again 1 followed by 0 in second block etc.. #In that case: lst1-split(ou,cumsum(c(TRUE,diff(ou$V1)==1))) A.K. On Wed, Jul 31, 2013 at 7:39 AM, Dominic Roye dominic.r...@gmail.com wrote: Hello, I am a little bit lost on my search for a solution and idea. I would like to split my time serie in blocks of night. V1 indicates if its night or not. How can i split this kind of cases? Best regards, str(ou[,c(1,3,8)]) 'data.frame': 863 obs. of 3 variables: $ Fecha: POSIXct, format: 2013-07-04 00:10:00 ... $ Ta : num 22.6 22.2 22.2 22.2 22.2 ... $ V1 : num 1 1 1 1 1 1 1 1 1 1 ... dput(ou[,c(1,3,8)]) structure(list(Fecha = structure(c(1372889400, 137289, 1372890600, 1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200, 1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800, 1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400, 1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000, 1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600, 1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200, 1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800, 1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400, 137292, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000, 1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600, 1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200, 1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800, 1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400, 1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000, 1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600, 1372945200, 1372945800, 1372946400, 1372947000, 1372947600, 1372948200, 1372948800, 1372949400, 
137295, 1372950600, 1372951200, 1372951800, 1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400, 1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000, 1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600, 1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200, 1372966800, 1372967400, 1372968000, 1372968600, 1372969200, 1372969800, 1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400, 1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000, 1372977600, 1372978200, 1372978800, 1372979400, 137298, 1372980600, 1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200, 1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800, 1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400, 1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000, 1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600, 1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200, 1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800, 1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400, 137301, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000, 1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600, 1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200, 1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800, 1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400, 1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000, 1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600, 1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200, 1373038800, 1373039400, 137304, 1373040600, 1373041200, 1373041800, 1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400, 1373046000, 1373046600, 1373047200, 1373047800, 1373048400, 1373049000, 1373049600, 1373050200, 
1373050800, 1373051400, 1373052000, 1373052600, 1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200, 1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800, 1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400, 1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000, 1373067600, 1373068200, 1373068800, 1373069400, 137307, 1373070600, 1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200, 1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800, 1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400, 1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000, 1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600, 1373089200, 1373089800, 1373090400, 1373091000, 1373091600, 1373092200, 1373092800, 1373093400,
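The run-based split suggested in the reply above can be tried on a small made-up series, since the posted data are cut off in the archive. The data frame below is a hypothetical stand-in that reuses only the column names Fecha and V1 from the thread; the variant shown starts a new block at every change of V1 and then keeps the night rows (V1 == 1), which is one reading of what was asked.

```r
# Hypothetical 10-minute series: 1 = night, 0 = day.
ou <- data.frame(
  Fecha = as.POSIXct("2013-07-04 00:10:00", tz = "UTC") + 600 * (0:11),
  V1    = c(1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1)
)

# Give every run of identical V1 values its own id:
# the id goes up by one each time V1 changes.
block <- cumsum(c(TRUE, diff(ou$V1) != 0))

# Keep only the night rows and split them by run id,
# so each list element is one uninterrupted night.
is_night <- ou$V1 == 1
nights <- split(ou[is_night, ], block[is_night])

length(nights)        # number of separate nights
lapply(nights, nrow)  # number of records in each night
```

Because the run id depends only on changes in V1, a night that crosses midnight stays in one block, which is the problem that splitting by date would run into.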
Re: [R] Split in blocks
Hi,

In that case:

  lst1 <- split(ou, cumsum(c(TRUE, diff(ou$V1) == 1)))
  lst2 <- lapply(lst1, function(x) x[x$V1 == 1, ])

A.K.

From: Dominic Roye dominic.r...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Wednesday, July 31, 2013 7:17 PM
Subject: Re: [R] Split in blocks

Hi,

Because the day changes at 00:00, I cannot split by date. I created V1 to separate the daytime from the night time (sunset till sunrise). Now I need to separate each night from the daytime and from the other nights. The aim is to obtain something like the following, either as a list or as an index identifying each night in the data.frame. I hope my explanation is clearer now. Thank you.

night 1:
278 2013-07-05 22:20:00 2.42 27.61 61 0 05.07.2013 22:20 1
279 2013-07-05 22:30:00 2.35 27.39 62 0 05.07.2013 22:30 1
280 2013-07-05 22:40:00 2.18 27.07 63 0 05.07.2013 22:40 1
281 2013-07-05 22:50:00 2.21 26.80 64 0 05.07.2013 22:50 1
282 2013-07-05 23:00:00 2.30 26.42 65 0 05.07.2013 23:00 1
283 2013-07-05 23:10:00 1.91 26.03 66 0 05.07.2013 23:10 1
284 2013-07-05 23:20:00 2.54 25.61 67 0 05.07.2013 23:20 1
285 2013-07-05 23:30:00 2.79 25.15 68 0 05.07.2013 23:30 1
286 2013-07-05 23:40:00 2.66 24.83 70 0 05.07.2013 23:40 1
287 2013-07-05 23:50:00 2.35 24.55 70 0 05.07.2013 23:50 1
288 2013-07-06 00:00:00 2.05 24.34 71 0 06.07.2013 00:00 1
289 2013-07-06 00:10:00 1.88 24.12 71 0 06.07.2013 00:10 1
290 2013-07-06 00:20:00 2.25 23.87 72 0 06.07.2013 00:20 1
291 2013-07-06 00:30:00 1.82 23.57 73 0 06.07.2013 00:30 1
292 2013-07-06 00:40:00 2.06 23.30 74 0 06.07.2013 00:40 1
293 2013-07-06 00:50:00 2.21 23.08 74 0 06.07.2013 00:50 1
294 2013-07-06 01:00:00 2.78 22.78 74 0 06.07.2013 01:00 1
295 2013-07-06 01:10:00 2.70 22.66 75 0 06.07.2013 01:10 1
296 2013-07-06 01:20:00 2.42 22.36 77 0 06.07.2013 01:20 1
297 2013-07-06 01:30:00 2.48 22.18 76 0 06.07.2013 01:30 1
298 2013-07-06 01:40:00 2.88 22.11 77 0 06.07.2013 01:40 1
299 2013-07-06 01:50:00 1.39 22.01 78 0 06.07.2013 01:50 1
300 2013-07-06 02:00:00 1.05 21.61 80 0 06.07.2013 02:00 1
301 2013-07-06 02:10:00 1.07 21.79 78 0 06.07.2013 02:10 1
302 2013-07-06 02:20:00 1.89 21.50 79 0 06.07.2013 02:20 1
303 2013-07-06 02:30:00 1.83 21.15 81 0 06.07.2013 02:30 1
304 2013-07-06 02:40:00 2.34 20.83 81 0 06.07.2013 02:40 1
305 2013-07-06 02:50:00 2.28 20.60 81 0 06.07.2013 02:50 1
306 2013-07-06 03:00:00 1.85 20.58 82 0 06.07.2013 03:00 1
307 2013-07-06 03:10:00 1.39 20.51 82 0 06.07.2013 03:10 1
308 2013-07-06 03:20:00 1.30 20.19 84 0 06.07.2013 03:20 1
309 2013-07-06 03:30:00 1.87 20.16 83 0 06.07.2013 03:30 1
310 2013-07-06 03:40:00 2.28 20.07 83 0 06.07.2013 03:40 1
311 2013-07-06 03:50:00 2.12 20.09 83 0 06.07.2013 03:50 1
312 2013-07-06 04:00:00 1.72 19.97 84 0 06.07.2013 04:00 1
313 2013-07-06 04:10:00 1.37 19.67 85 0 06.07.2013 04:10 1
314 2013-07-06 04:20:00 0.84 19.37 87 0 06.07.2013 04:20 1
315 2013-07-06 04:30:00 0.30 19.36 87 0 06.07.2013 04:30 1
316 2013-07-06 04:40:00 1.76 19.39 86 0 06.07.2013 04:40 1
317 2013-07-06 04:50:00 2.00 19.09 87 0 06.07.2013 04:50 1
318 2013-07-06 05:00:00 1.00 18.82 89 0 06.07.2013 05:00 1
319 2013-07-06 05:10:00 1.60 19.00 87 4 06.07.2013 05:10 1
320 2013-07-06 05:20:00 1.85 19.06 87 9 06.07.2013 05:20 1
321 2013-07-06 05:30:00 1.44 19.06 86 14 06.07.2013 05:30 1
322 2013-07-06 05:40:00 1.38 18.83 87 26 06.07.2013 05:40 1
323 2013-07-06 05:50:00 1.87 18.74 88 57 06.07.2013 05:50 1
324 2013-07-06 06:00:00 1.91 19.42 84 78 06.07.2013 06:00 1
325 2013-07-06 06:10:00 0.85 19.78 83 100 06.07.2013 06:10 1
326 2013-07-06 06:20:00 0.80 20.22 81 124 06.07.2013 06:20 1
327 2013-07-06 06:30:00 0.67 20.86 79 150 06.07.2013 06:30 1
328 2013-07-06 06:40:00 1.03 20.86 79 179 06.07.2013 06:40 1
329 2013-07-06 06:50:00 1.20 20.63 80 209 06.07.2013 06:50 1
330 2013-07-06 07:00:00 1.03 20.97 79 238 06.07.2013 07:00 1

night 2:
421 2013-07-06 22:10:00 2.63 28.16 60 0 06.07.2013 22:10 1
422 2013-07-06 22:20:00 3.19 27.88 61 0 06.07.2013 22:20 1
423 2013-07-06 22:30:00 3.77 27.55 62 0 06.07.2013 22:30 1
424 2013-07-06 22:40:00 3.37 27.21 64 0 06.07.2013 22:40 1
425 2013-07-06 22:50:00 2.32 26.88 65 0 06.07.2013 22:50 1
426 2013-07-06 23:00:00 2.43 26.56 66 0 06.07.2013 23:00 1
427 2013-07-06 23:10:00 2.96 26.31 66 0 06.07.2013 23:10 1
428 2013-07-06 23:20:00 3.23 26.08 67 0 06.07.2013 23:20 1
429 2013-07-06 23:30:00 4.00 25.79 68 0 06.07.2013 23:30 1
430 2013-07-06 23:40:00 3.55 25.47 69 0 06.07.2013 23:40 1
431 2013-07-06
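
For readers following the thread, here is a minimal sketch of how the split suggested in the reply behaves on a toy data frame. The data frame `ou` below is a hypothetical stand-in with an invented `id` column; only the `V1` flag (1 = night, 0 = day) comes from the original post. The extra `Filter` step and the per-row `night` index are additions for illustration, not part of the reply.

```r
# Hypothetical stand-in for the poster's data frame `ou`:
# V1 is 1 during the night and 0 during the day.
ou <- data.frame(
  id = 1:10,
  V1 = c(0, 1, 1, 1, 0, 0, 1, 1, 0, 0)
)

# A new block starts wherever V1 rises from 0 to 1 (diff == 1).
block <- cumsum(c(TRUE, diff(ou$V1) == 1))

# Split into blocks, keep only the night rows of each block,
# and drop any block left empty (e.g. leading daytime rows).
nights <- lapply(split(ou, block), function(x) x[x$V1 == 1, ])
nights <- Filter(function(x) nrow(x) > 0, nights)

# Alternative: a per-row night index (NA during the day).
ou$night <- ifelse(ou$V1 == 1,
                   cumsum(c(ou$V1[1] == 1, diff(ou$V1) == 1)),
                   NA)
```

Each element of `nights` then holds one uninterrupted night, even when it spans midnight, because the grouping depends only on the transitions in `V1` and not on the calendar date.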
[R] R and S+ Courses: Brisbane, Melbourne, Sydney in Aug and Sep.
R and S+ Courses: Brisbane, Melbourne & Sydney
http://www.solutionmetrics.com.au/

Hi,

Apologies for cross-posting.

SolutionMetrics is presenting R and S+ courses in Brisbane, Melbourne & Sydney - August & September, 2013.

To book, please email enquir...@solutionmetrics.com.au or call +61 2 9233 6888.

Getting Started with R (2 Day)
Day 1: Introduction to R, Data objects & Classes, Data Import/Export, Data Manipulation, Graphics, Basic Statistical models, avoiding repetitive typing/clicking & file management
Day 2: Writing your own simple functions, efficient programming, Advanced Visualisations, Data Mining - Logistic Regression/Tree models & Working with Time-Series objects.
More Info: http://bit.ly/11qFxpO
Date:
12-13 Aug, 2013 - Sydney (Mon-Tue)
19-20 Aug, 2013 - Melbourne (Mon-Tue)
26-27 Aug, 2013 - Brisbane (Mon-Tue)

Getting Started with S+ (2 Day)
Day 1: Course provides users with the knowledge to perform all day to day data analysis & graphics tasks with just a click of a mouse (No Programming Required).
Day 2: Introduction to the S Language, Data objects & Classes, Data Import/Export, Data Manipulation, Graphics, Basic Statistical models - Regression, avoiding repetitive typing/clicking & file management.
Course Outline: http://bit.ly/16fKTFY
Date:
9-10 Sep, 2013 - Sydney (Mon-Tue)
19-20 Sep, 2013 - Melbourne (Thu-Fri)

Intermediate R (1 Day)
Efficient use of R language functions & objects, Big Data, Advanced Graphics, Data Mining - Logistic Regression/Tree models & Working with Time-Series objects.
More Info: http://bit.ly/YBsT5b
Date:
29 Aug, 2013 - Sydney (Thu)
2 Sep, 2013 - Sydney (Mon)

For more information, please email enquir...@solutionmetrics.com.au or call +61 2 9233 6888.

Cheers
Kris Angelovski | Director | SolutionMetrics
T +61 2 9233 6888 | F +61 2 9233 4099
Suite 44, Level 9, 88 Pitt Street, Sydney NSW 2000
http://www.solutionmetrics.com.au/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help
On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:

Hi

First of all, thanks for this service, it is being very useful for me. I am new to R, so I have a lot of questions. I have to do imputation on a data set. This is a sample of my data set:

NUMERO Data1 Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20133 30/09/2002 18/06/2013 153 279 289 370 412 262 115 75
21138 11/07/2002 13/05/2009546078638365 12009 16763 NA NA NA
22146 16/10/2009 18/06/2013 NA NA NA NA NA NA NA 35
23152 27/05/1999 18/06/2013 NA 80 77 60 89 137 144 146
24154 21/12/2004 18/06/2013 NA NA 148 186 302 233 194 204
25166 8/02/2008 18/06/2013 NA NA NA NA NA NA 98 160
26177 20/02/1996 18/06/2013 16 4 NA 3 3 NA 5 5

The problem is that I have cells which have to be empty, depending on Data1 and Data2. For instance, in the third row Data1 is equal to 16/10/2009, so there should be no information before the year 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 and IE.2008 have to be totally empty, but this doesn't mean that they are missing values; in fact they are not. I don't want to get any imputation in these cells. IE.2009 and IE.2010 should be filled and they are not, so these cells are missing values and I want to get imputed values for them. (I would delete this row, because it is impossible to get any information about it, but it is fine for this example.) On the other hand, in the last row NA is a real missing value.

How can I specify that these cells are empty so that they don't get imputed values? I have tried to put NaN, but I have problems with some functions that I need to run before the imputation.

Hi Teresa,

I didn't see an answer to this, so I'll offer a couple of suggestions. First, NA is probably the best thing to have in your empty cells. If you change the NA cells to "", the columns will become factors, and if you then change the values back to numeric, the blanks will become NAs again.

I would build a set of index vectors that indicate which cells you _don't_ want to impute (say your data frame is tmsdf):

  dontimpute2003 <- which(
    as.numeric(unlist(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))) > 2003 &
    is.na(tmsdf$IE.2003))
  dontimpute2004 <- which(
    as.numeric(unlist(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))) > 2004 &
    is.na(tmsdf$IE.2004))
  ...

then do your imputation on the entire data frame and reset the ones you don't want imputed to NA:

  tmsdf$IE.2003[dontimpute2003] <- NA
  ...

Jim
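
The suggestion above (mark the cells that should stay empty, impute, then reset them) can be sketched for all years at once with a logical mask. This is a minimal illustration, not the method used in the thread: the small data frame below is invented, and the column-mean fill is only a placeholder standing in for whatever imputation routine is actually used.

```r
# Hypothetical miniature of the poster's data frame (values chosen for illustration).
tmsdf <- data.frame(
  Data1   = c("30/09/2002", "16/10/2009", "21/12/2004"),
  IE.2003 = c(153, NA, NA),
  IE.2004 = c(279, NA, NA),
  IE.2005 = c(289, NA, 148),
  stringsAsFactors = FALSE
)

# Year in which each record starts, taken from the dd/mm/yyyy string.
start_year <- as.numeric(sapply(strsplit(tmsdf$Data1, "/"), "[", 3))

years <- 2003:2005
# Logical mask: TRUE where the cell predates the record and is NA,
# i.e. it is structurally empty and must not be imputed.
dont_impute <- sapply(years, function(y) {
  start_year > y & is.na(tmsdf[[paste0("IE.", y)]])
})
colnames(dont_impute) <- paste0("IE.", years)

# Placeholder imputation (column means) standing in for the real method.
imputed <- tmsdf
for (col in colnames(dont_impute)) {
  miss <- is.na(imputed[[col]])
  imputed[[col]][miss] <- mean(tmsdf[[col]], na.rm = TRUE)
  # Reset the structurally empty cells back to NA.
  imputed[[col]][dont_impute[, col]] <- NA
}
```

Keeping the mask as a matrix with one column per year avoids writing a separate `dontimpute` vector for each year, and the same mask can be reapplied after any imputation method that fills the whole data frame.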
[R] ggplot2: color histograms by quintile
Hello,

I have a basic panel of histograms as follows, whose current colors don't matter:

  binsize=diff(range(thing$Rate))/64
  ggplot(thing, aes(x=Rate, fill=Series)) +
    geom_histogram(binwidth=binsize) +
    facet_grid(Series~., scales="free") +
    labs(fill="Index") +
    xlab("Growth Rate (%)") +
    theme(axis.title.y=element_blank(), legend.position=c(1,.64),
          legend.justification=c(1,1), strip.text.y=element_blank()) +
    scale_x_continuous(breaks=c(-10,-5,-2:10,15,20)) +
    geom_vline(xintercept=0, linetype="dotted")
  rm(binsize)

What I would like to do is color each of the four histograms by its own deciles. Essentially

  quantile(trim.index$Rate,c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1))

would give me the overall deciles, but I would like them broken down by the levels of the Series variable and then applied to the histograms as shading or coloring. Does this make sense? I've dumped the first 100 rows of data below. Thanks in advance for any help you're able to provide.

David

structure(list(Trials = 1:100, Year = c(2005L, 2008L, 2006L, 2007L, 2006L, 2004L, 2004L, 2003L, 2007L, 2005L, 2008L, 2006L, 2011L, 2005L, 2004L, 2003L, 2010L, 2002L, 2008L, 2005L, 2005L, 2004L, 2006L, 2011L, 2011L, 2008L, 2006L, 2004L, 2002L, 2003L, 2009L, 2004L, 2003L, 2011L, 2006L, 2002L, 2007L, 2010L, 2005L, 2008L, 2011L, 2008L, 2010L, 2005L, 2004L, 2009L, 2002L, 2008L, 2002L, 2006L, 2003L, 2007L, 2006L, 2006L, 2002L, 2002L, 2010L, 2008L, 2008L, 2003L, 2003L, 2009L, 2007L, 2009L, 2004L, 2005L, 2011L, 2010L, 2005L, 2008L, 2008L, 2008L, 2007L, 2008L, 2008L, 2007L, 2002L, 2009L, 2011L, 2002L, 2002L, 2006L, 2007L, 2007L, 2002L, 2009L, 2007L, 2003L, 2010L, 2010L, 2009L, 2003L, 2010L, 2003L, 2007L, 2006L, 2010L, 2005L, 2004L, 2010L), Month = c(12L, 4L, 5L, 3L, 9L, 12L, 4L, 3L, 6L, 10L, 6L, 11L, 1L, 6L, 9L, 10L, 5L, 3L, 11L, 10L, 2L, 8L, 9L, 7L, 8L, 8L, 7L, 1L, 9L, 1L, 11L, 3L, 12L, 1L, 6L, 7L, 6L, 8L, 12L, 8L, 11L, 11L, 5L, 7L, 2L, 6L, 9L, 9L, 11L, 6L, 11L, 5L, 5L, 3L, 10L, 6L, 7L, 8L, 9L, 2L, 3L, 11L, 8L, 4L, 12L, 6L, 10L, 10L,
12L, 9L, 4L, 12L, 12L, 12L, 6L, 6L, 11L, 1L, 5L, 6L, 2L, 4L, 7L, 10L, 12L, 4L, 5L, 8L, 7L, 2L, 6L, 10L, 10L, 10L, 10L, 2L, 6L, 6L, 9L, 9L), Core.CPI.Weighting = c(2L, 3L, 2L, 5L, 1L, 5L, 4L, 5L, 4L, 1L, 3L, 4L, 5L, 2L, 4L, 5L, 1L, 5L, 1L, 3L, 2L, 1L, 5L, 2L, 1L, 2L, 5L, 5L, 4L, 2L, 4L, 4L, 5L, 5L, 2L, 5L, 3L, 4L, 5L, 1L, 2L, 2L, 5L, 3L, 2L, 2L, 5L, 3L, 2L, 4L, 2L, 4L, 1L, 3L, 1L, 4L, 1L, 3L, 1L, 1L, 4L, 2L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 4L, 2L, 5L, 2L, 5L, 1L, 2L, 5L, 5L, 5L, 2L, 5L, 3L, 3L, 1L, 5L, 2L, 2L, 2L, 1L, 3L, 5L, 3L, 4L, 3L, 3L, 1L, 1L, 2L, 2L, 3L), CPI.Food = c(0.023474768, 0.043433814, 0.029315923, 0.042208873, 0.035479323, 0.024429485, 0.028537661, 0.027623773, 0.045546671, 0.023973579, 0.045546671, 0.038421672, 0.037161108, 0.023102181, 0.032765694, 0.032962625, 0.008051879, 0.028741685, 0.053639179, 0.025192645, 0.025077433, 0.032806764, 0.023605006, 0.025644434, 0.029584922, 0.031756778, 0.032450724, 0.026035343, 0.020656969, 0.026035343, 0.010684754, 0.029551194, 0.02442531, 0.012348667, 0.030959528, 0.023781539, 0.045546671, 0.008345359, 0.024429485, 0.031756778, 0.034731773, 0.053639179, 0.008051879, 0.023005118, 0.030315091, 0.04149634, 0.019373857, 0.051078725, 0.022406708, 0.030959528, 0.022406708, 0.044396055, 0.023304811, 0.025196539, 0.020831987, 0.016599861, 0.008044572, 0.049349997, 0.026691689, 0.01612059, 0.015903088, 0.010684754, 0.033979886, 0.048496522, 0.024429485, 0.023102181, 0.033084021, 0.033084021, 0.024429485, 0.026691689, 0.048496522, 0.054246603, 0.039956669, 0.054246603, 0.045546671, 0.045546671, 0.018027105, 0.053917666, 0.034171337, 0.025416646, 0.029331567, 0.02412957, 0.032450724, 0.052641267, 0.017531941, 0.048496522, 0.044396055, 0.018367312, 0.008044572, 0.013896245, 0.04149634, 0.032962625, 0.009905593, 0.032962625, 0.052641267, 0.025077433, 0.023022986, 0.023102181, 0.032765694, 0.00916168), PPI.Farm = c(0.009730106, 0.204892729, 0.138453455, 0.210017271, 0.178801715, -0.017104315, 0.168632738, 0.157512456, 
0.208907609, -0.007879949, 0.208907609, 0.187585976, 0.171910952, -0.0471555, 0.144318736, 0.111713247, -0.000515726, 3.019e-05, 0.120566043, -0.027737238, -0.021152479, 0.168890071, -0.020784628, 0.231482252, 0.050380553, -0.141247793, 0.16832412, 0.140014634, -0.04669922, 0.140014634, 0.095775695, 0.02684028, 0.142349938, 0.125929795, 0.154898078, -0.043965946, 0.208907609, 0.033632438, -0.017104315, -0.141247793, 0.215496099, 0.120566043, -0.000515726, -0.041130752, 0.041959105, -0.105460885, 0.070882919, 0.189171764, 0.122047713, 0.154898078, 0.122047713, 0.202255379, -0.048781928, -0.033167528, 0.099171658, 0.041330169, 0.015714896, 0.204083122, -0.154114835, -0.008444152, -0.007718043, 0.095775695, 0.170931281, -0.067437568, -0.017104315, -0.0471555, 0.228225921, 0.228225921, -0.017104315, -0.154114835, -0.067437568, 0.072354298, 0.198873427, 0.072354298, 0.208907609, 0.208907609,