Re: [R] "Best" way to merge 300+ .5MB dataframes?
On 12/08/2014 07:07, David Winsemius wrote:
> [...]
> On the Mac the awk equivalent is gawk. Within R you would use `system()`, possibly using paste0() to construct a string to send.
For historical reasons this is actually part of R's configuration: see the AWK entry in R_HOME/etc/Makeconf. (There is an SED entry too: not all sed's in current OSes are POSIX-compliant.) Using system2() rather than system() is recommended for new code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford 1 South Parks Road, Oxford OX1 3TG, UK __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
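A sketch of the system2() style recommended above (the command and its arguments here are illustrative assumptions, not from the thread): unlike system(), system2() takes the command and its arguments separately, which avoids pasting together a single shell string.

```r
## Illustrative only: "echo" and its arguments are made-up examples.
## system2() separates the command from its arguments, sidestepping the
## quoting pitfalls of building one big string with paste0() for system().
out <- system2("echo", args = c("hello", "world"), stdout = TRUE)
out
```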
Re: [R] "Best" way to merge 300+ .5MB dataframes?
On Aug 11, 2014, at 8:01 PM, John McKown wrote:
> [...]

On the Mac the awk equivalent is gawk. Within R you would use `system()`, possibly using paste0() to construct a string to send.
-- David Winsemius
Alameda, CA, USA
Re: [R] "Best" way to merge 300+ .5MB dataframes?
On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams wrote:
> [...]

Using sed hadn't occurred to me. I guess I'm just "awk-ward". A slightly different way would be:

for file in *.txt; do
  sed '1d' "${file}"
done > newfilename.txt

That way the original files are not modified. But it strips out the header on the 1st file as well. Not a big deal, but the read.table call will need to be changed to accommodate that. Also, it creates an otherwise unnecessary intermediate file "newfilename.txt". To get the 1st file's header, the script could do (note that head needs an input file, e.g. the first one):

head -n 1 file1.txt > newfilename.txt
for file in *.txt; do
  sed '1d' "${file}"
done >> newfilename.txt

I really like having multiple answers to a given problem. Especially since I have a poorly implemented version of "awk" on one of my systems. It is the vendor's "awk" and conforms exactly to the POSIX definition with no additions, so I don't have the FNR built-in variable. Your implementation would work well on that system. Well, if there were a version of R for it. It is a branded UNIX system which was designed to be totally __and only__ POSIX compliant, with few (maybe no) extensions at all. IOW, it stinks. No, it can't be replaced. It is the z/OS system from IBM, which is EBCDIC based and runs on the "big iron" mainframe, System z.

--
There is nothing more pleasant than traveling and meeting new people! Genghis Khan

Maranatha!
<>< John McKown
Re: [R] "Best" way to merge 300+ .5MB dataframes?
Grant,

Assuming all your filenames are something like file1.txt, file2.txt, file3.txt... and using the Mac OS X terminal app (after you cd to the directory where your files are located)...

This will strip off the 1st lines, that is, your header lines (note: on OS X, sed -i requires an explicit, possibly empty, backup suffix):

for file in *.txt; do
  sed -i '' '1d' "${file}"
done

Then, do this:

cat *.txt > newfilename.txt

Doing both should only take a few seconds, depending on your file sizes.

Cheers!
Tom

On Mon, Aug 11, 2014 at 12:01 PM, Grant Rettke wrote:
> [...]
Re: [R] "Best" way to merge 300+ .5MB dataframes?
On Sun, Aug 10, 2014 at 6:50 PM, John McKown wrote:

> OK, I assume this results in a vector of file names in a variable, like you'd get from list.files();

Yes.

> Why? Do you need them in separate data frames?

I do not.

> The meat of the question. If you don't need the files in separate data frames, and the files do _NOT_ have headers, then I would just load them all into a single frame. I used Linux and so my solution may not work on Windows. Something like:

Excellent point. All of the files do have the same header. I'm on OS X, so there must be a nice one-liner to concatenate all of the individual files, dropping the first line for all but the first. Danke!
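For completeness, the whole job can also stay inside R, with no shell one-liner at all. A minimal sketch, not from the thread (the helper name and the file pattern are invented): read every file with its header row, then stack the pieces into one data frame.

```r
## A minimal pure-R sketch (helper name and file pattern are made up):
## read each file keeping its header, then stack them into one data frame.
read_all <- function(dir = ".", pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  dfs <- lapply(files, read.table, header = TRUE)  # one data frame per file
  do.call(rbind, dfs)                              # all rows in one data frame
}
```

This trades a little speed for not having to modify or concatenate the files on disk.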
Re: [R] Reading chunks of data from a file more efficiently
Scott,

There is a package called ff that "... provides data structures that are stored on disk but behave (almost) as if they were in RAM ...". I hope it helps.

Peter

On Sat, Aug 9, 2014 at 6:31 PM, Waichler, Scott R wrote:
> Hi,
>
> I have some very large (~1.1 GB) output files from a groundwater model called STOMP that I want to read as efficiently as possible. For each variable there are over 1 million values to read. Variables are not organized in columns; instead they are written out in sections in the file, like this:
>
> X-Direction Node Positions, m
> 5.93145E+05 5.93155E+05 5.93165E+05 5.93175E+05
> 5.93245E+05 5.93255E+05 5.93265E+05 5.93275E+05
> . . .
> 5.94695E+05 5.94705E+05 5.94715E+05 5.94725E+05
> 5.94795E+05 5.94805E+05 5.94815E+05 5.94825E+05
>
> Y-Direction Node Positions, m
> 1.14805E+05 1.14805E+05 1.14805E+05 1.14805E+05
> 1.14805E+05 1.14805E+05 1.14805E+05 1.14805E+05
> . . .
> 1.17195E+05 1.17195E+05 1.17195E+05 1.17195E+05
> 1.17195E+05 1.17195E+05 1.17195E+05 1.17195E+05
>
> Z-Direction Node Positions, m
> 9.55000E+01 9.55000E+01 9.55000E+01 9.55000E+01
> 9.55000E+01 9.55000E+01 9.55000E+01 9.55000E+01
> . . .
>
> I want to read and use only a subset of the variables. I wrote the function below to find the line where each target variable begins and then scan the values, but it still seems rather slow, perhaps because I am opening and closing the file for each variable. Can anyone suggest a faster way?
>
> # Reads original STOMP plot file (plot.*) directly. Should be useful when the plot files are
> # very large with lots of variables, and you just want to retrieve a few of them.
> # Arguments: 1) plot filename, 2) number of nodes,
> # 3) character vector of names of target variables you want to return.
> # Returns a list with the selected plot output.
> READ.PLOT.OUTPUT6 <- function(plt.file, num.nodes, var.names) {
>   lines <- readLines(plt.file)
>   num.vars <- length(var.names)
>   tmp <- list()
>   for(i in 1:num.vars) {
>     ind <- grep(var.names[i], lines, fixed=T, useBytes=T)
>     if(length(ind) != 1) stop("Not one line in the plot file with matching variable name.\n")
>     tmp[[i]] <- scan(plt.file, skip=ind, nmax=num.nodes, quiet=T)
>   }
>   return(tmp)
> } # end READ.PLOT.OUTPUT6()
>
> Regards,
> Scott Waichler
> Pacific Northwest National Laboratory
> Richland, WA, USA
> scott.waich...@pnnl.gov

-- Peter Salzman, PhD
Department of Biostatistics and Computational Biology
University of Rochester
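One likely speed-up, sketched here as an untested variant of the function above (the name is invented): the function already reads the whole file with readLines(), so each variable's values can be parsed from that in-memory text with scan(text = ...) instead of re-reading the file from disk with scan(plt.file, skip = ...) for every variable.

```r
## Untested sketch: read the file from disk once, then parse each variable's
## block out of the in-memory lines. The n= argument stops scan() before it
## reaches the next section's text header.
read_plot_vars <- function(plt.file, num.nodes, var.names) {
  lines <- readLines(plt.file)                       # single disk read
  out <- setNames(vector("list", length(var.names)), var.names)
  for (v in var.names) {
    ind <- grep(v, lines, fixed = TRUE, useBytes = TRUE)
    if (length(ind) != 1) stop("not exactly one match for: ", v)
    out[[v]] <- scan(text = lines[(ind + 1):length(lines)],
                     n = num.nodes, quiet = TRUE)
  }
  out
}
```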
Re: [R] Superimposing graphs
Hi

If you want a 1-package, 1-function approach, try this:

xyplot(conc ~ time | factor(subject, levels = c(2,1,3)), data = data.d,
       par.settings = list(strip.background = list(col = "transparent")),
       layout = c(3,1),
       aspect = 1,
       type = c("b","g"),
       scales = list(alternating = FALSE),
       panel = function(x,y,...){
         panel.xyplot(x,y,...)
         # f1 <- function(x,v,cl,t) (x/v)*exp(-(cl/v)*t)
         # i.e. the curve below is f1(0.5, 0.5, 0.06, t)
         panel.curve((0.5/0.5)*exp(-(0.06/0.5)*x), 0, 30)
       }
)

# par.settings ... if you are publishing, shows text better
# with factor: if you want 1:3, omit the levels
# has the advantage of doing more things than groupedData, as Doug Bates has said

Regards
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Naser Jamil
Sent: Monday, 11 August 2014 19:06
To: R help
Subject: [R] Superimposing graphs

Dear R-user,
May I seek your help to sort out a little problem. I have the following code to draw two graphs. I want to superimpose the second one on each panel of the first one.

library(nlme)
subject <- c(1,1,1,2,2,2,3,3,3)
time <- c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
con.cohort <- c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
                0.10593292,1.20808375,0.47638394,0.02808967)

data.d = data.frame(subject=subject, time=time, conc=con.cohort)
grouped.data <- groupedData(formula = conc ~ time | subject, data = data.d)

plot(grouped.data)

##

f1 <- function(x,v,cl,t) {
  (x/v)*exp(-(cl/v)*t)
}
t <- seq(0, 30, .01)
plot(t, f1(0.5,0.5,0.06,t), type="l", pch=18, ylim=c(), xlab="time", ylab="conc")

###

Any suggestion will really be helpful.

Regards,
Jamil.
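A condensed, self-contained version of the panel-function idea above, using the data from the question (only lattice is needed; panel.xyplot() draws the points and panel.curve() overlays the same decay curve in every panel):

```r
library(lattice)

# Data copied from the question above
data.d <- data.frame(subject = rep(1:3, each = 3),
                     time = rep(c(0, 5.4, 21), 3),
                     conc = c(1.10971703, 0.54535512, 0.07176724,
                              0.75912539, 0.47825282, 0.10593292,
                              1.20808375, 0.47638394, 0.02808967))

p <- xyplot(conc ~ time | factor(subject), data = data.d,
            layout = c(3, 1), type = c("b", "g"),
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                # the observed points
              # f1(x=0.5, v=0.5, cl=0.06, t) = (x/v) * exp(-(cl/v) * t)
              panel.curve((0.5/0.5) * exp(-(0.06/0.5) * x), from = 0, to = 30)
            })
p   # printing the trellis object draws the plot
```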
Re: [R] efficient way to replace a range of numeric with a integer in a matrix
On Aug 11, 2014, at 3:27 PM, Jinsong Zhao wrote:
> On 2014/8/11 14:50, William Dunlap wrote:
>> You can use
>> m[m > 0 & m <= 1.0] <- 1
>> m[m > 1] <- 2
>> or, if you have lots of intervals, something based on findInterval(). E.g.,
>> m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1

OR, if you have irregularly spaced intervals or particular values to match to the intervals, you can use findInterval to define categories and select with "[":

> set.seed(42); m <- matrix( rnorm(100, 10, 5), 10)
> round( m, 2)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 16.85 16.52  8.47 12.28 11.03 11.61  8.16  4.78 17.56 16.96
 [2,]  7.18 21.43  1.09 13.52  8.19  6.08 10.93  9.55 11.29  7.62
 [3,] 11.82  3.06  9.14 15.18 13.79 17.88 12.91 13.12 10.44 13.25
 [4,] 13.16  8.61 16.07  6.96  6.37 13.21 17.00  5.23  9.40 16.96
 [5,] 12.02  9.33 19.48 12.52  3.16 10.45  6.36  7.29  4.03  4.45
 [6,]  9.47 13.18  7.85  1.41 12.16 11.38 16.51 12.90 13.06  5.70
 [7,] 17.56  8.58  8.71  6.08  5.94 13.40 11.68 13.84  8.91  4.34
 [8,]  9.53 -3.28  1.18  5.75 17.22 10.45 15.19 12.32  9.09  2.70
 [9,] 20.09 -2.20 12.30 -2.07  7.84 -4.97 14.60  5.57 14.67 10.40
[10,]  9.69 16.60  6.80 10.18 13.28 11.42 13.60  4.50 14.11 13.27
> m[] <- c(1,2,4,8,16,32)[ findInterval(m, c(-Inf, 2, 5, 10, 15, 18, Inf)) ]
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   16   16    4    8    8    8    4    2   16    16
 [2,]    4   32    1    8    4    4    8    4    8     4
 [3,]    8    2    4   16    8   16    8    8    8     8
 [4,]    8    4   16    4    4    8   16    4    4    16
 [5,]    8    4   32    8    2    8    4    4    2     2
 [6,]    4    8    4    1    8    8   16    8    8     4
 [7,]   16    4    4    4    4    8    8    8    4     2
 [8,]    4    1    1    4   16    8   16    8    4     2
 [9,]   32    1    8    1    4    1    8    4    8     8
[10,]    4   16    4    8    8    8    8    2    8     8

-- David.

>> (What do you want to do with non-positive numbers?)
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>
> Thank you very much.
> I think findInterval() is what I want.
>
> Regards,
> Jinsong
>
>>> [...]

David Winsemius
Alameda, CA, USA
Re: [R] efficient way to replace a range of numeric with a integer in a matrix
On 2014/8/11 14:50, William Dunlap wrote:
> [...]

Thank you very much. I think findInterval() is what I want.

Regards,
Jinsong
Re: [R] efficient way to replace a range of numeric with a integer in a matrix
You can use

m[m > 0 & m <= 1.0] <- 1
m[m > 1] <- 2

or, if you have lots of intervals, something based on findInterval(). E.g.,

m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1

(What do you want to do with non-positive numbers?)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Aug 11, 2014 at 2:40 PM, Jinsong Zhao wrote:
> [...]
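A tiny demonstration of the findInterval() line above, with made-up values chosen to sit strictly inside the intervals: values in (0, 1) map to 1 and values above 1 map to 2, with no loop or apply() needed.

```r
## Made-up values strictly inside the intervals. findInterval() returns the
## interval index of each element; subtracting 1 gives 0/1/2 for
## non-positive / (0,1) / above-1 values respectively.
m <- matrix(c(0.7, 0.9, 1.5, 0.4, 0.55, 0.03, 1.9, 1.7), nrow = 4)
m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1  # m keeps its dim attributes
m
```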
Re: [R] efficient way to replace a range of numeric with a integer in a matrix
(m>1)+1

On Mon, Aug 11, 2014 at 5:40 PM, Jinsong Zhao wrote:
> [...]
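The one-liner above, spelled out on a toy matrix (values made up): a logical comparison coerces to 0/1 under arithmetic, so (m > 1) + 1 gives 1 where m <= 1 and 2 where m > 1, and NA entries stay NA.

```r
## Toy matrix with made-up values. (m > 1) is a logical matrix; adding 1
## coerces FALSE/TRUE to 0/1, giving 1 for values <= 1 and 2 for values > 1.
m <- matrix(c(0.7, 1.5, 0.9, 1.9), nrow = 2)
(m > 1) + 1
```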
[R] efficient way to replace a range of numeric with a integer in a matrix
Hi there,

I hope to replace ranges of numeric values in a matrix with an integer. For example, in the following matrix, I want to use 1 to replace the elements ranging from 0.0 to 1.0, and replace all elements larger than 1.0 with 2.

> (m <- matrix(runif(16, 0, 2), nrow = 4))
          [,1]       [,2]      [,3]     [,4]
[1,] 0.7115088 0.55370418 0.1586146 1.882931
[2,] 0.9068198 0.38081423 0.9172629 1.713592
[3,] 1.5210150 0.93900649 1.2609942 1.744456
[4,] 0.3779058 0.03130103 0.1893477 1.601181

so I want to get something like:

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    1    1    1    2
[3,]    2    1    2    2
[4,]    1    1    1    2

I wrote a function to do such a thing:

fun <- function(x) {
  if (is.na(x)) {
    NA
  } else if (x > 0.0 && x <= 1.0) {
    1
  } else if (x > 1.0) {
    2
  } else {
    x
  }
}

Then run it as:

> apply(m, 2, function(i) sapply(i, fun))

However, it seems that this method is not efficient when the dimension is large, e.g., a 5000x5000 matrix.

Any suggestions? Thanks in advance!

Best regards,
Jinsong
Re: [R] how to process multiple data files using R loop
In addition to the solution and comments that you have already received, here are a couple of additional comments: This is a variant on FAQ 7.21, if you had found that FAQ then it would have told you about the get function. The most important part of the answer in FAQ 7.21 is the last part where it says that it is better to use a list. If all the objects of interest are related and you want to do the same or similar things to each one, then having them all stored in a single list can simplify things for the future. You can collect all the objects into a single list using the mget command, e.g.: P_objects <- mget( ls(pattern='P_')) Now that they are in a list you can do the equivalent of your loop, but simpler with the lapply function, e.g.: lapply( P_objects, head, 2 ) And if you want to do other things with all these objects, such as save them, plot them, do a regression analysis on them, delete them, etc. then you can do that using lapply/sapply as well in a simpler way than looping. On Fri, Aug 8, 2014 at 12:25 PM, Fix Ace wrote: > I have 16 files and would like to check the information of their first two > lines, what I did: > > >> ls(pattern="P_") > [1] "P_3_utr_source_data" "P_5_utr_source_data" > [3] "P_exon_per_gene_cds_source_data" "P_exon_per_gene_source_data" > [5] "P_exon_source_data""P_first_exon_oncds_source_data" > [7] "P_first_intron_oncds_source_data" "P_first_intron_ongene_source_data" > [9] "P_firt_exon_ongene_source_data""P_gene_cds_source_data" > [11] "P_gene_source_data""P_intron_source_data" > [13] "P_last_exon_oncds_source_data" "P_last_exon_ongene_source_data" > [15] "P_last_intron_oncds_source_data" "P_last_intron_ongene_source_data" > > > >>for(i in ls(pattern="P_")){head(i, 2)} > > It obviously does not work since nothing came out > > What I would like to see for the output is : > >> head(P_3_utr_source_data,2) > V1 > 1 1 > 2 1 >> head(P_5_utr_source_data,2) > V1 > 1 1 > 2 1 >> > . > > . > . > > > > Could anybody help me with this? 
> Thank you very much for your time :)

-- Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
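A toy run of the mget()/lapply() pattern described above. The object names here are invented stand-ins for the P_*_source_data data frames in the question:

```r
## Invented example objects standing in for the P_*_source_data data frames.
P_one <- data.frame(V1 = 1:5)
P_two <- data.frame(V1 = 6:10)

P_objects <- mget(ls(pattern = "^P_"))  # named list of all matching objects
heads <- lapply(P_objects, head, 2)     # first two rows of every element
```

From here the same list supports any whole-collection operation: lapply(P_objects, nrow), lapply(P_objects, summary), and so on.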
Re: [R] loops with assign() and get()
That code will not work. get() and assign() are troublesome for a variety of reasons. E.g.:

* Adding made-up names to the current environment is dangerous. They may clobber fixed names in the environment, and you may be confused about what the current environment is (especially when refactoring code). You can avoid this by using

    dataEnv <- new.env()

  to make an environment for your related objects, and using the envir=dataEnv argument to get() and assign() to put the objects in there. However, once you go this route, you may as well use the syntax dataEnv[[name]] to refer to your objects instead of get(name, envir=dataEnv) and assign(name, value, envir=dataEnv).

* Replacement syntax like

    names(get(someName)) <- c("One", "Two")

  will not work. You have to use kludgy code like

    tmp <- get(someName)
    names(tmp) <- c("One", "Two")
    assign(someName, tmp)

  If you use the dataEnv[[name]] syntax then you can use the more normal looking

    names(dataEnv[[name]]) <- c("One", "Two")

By the way, I do not think your suggested code will work - you call assign() before making a bunch of changes to dfi instead of after making the changes. I have not measured the memory implications of your method vs. using lapply on lists, but I don't think there is much of a difference in this case. (There can be a big difference when you are replacing the inputs by the outputs.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Aug 10, 2014 at 8:22 PM, PO SU wrote:
>
> It's a great method, but there is a memory problem: DFs would occupy a large memory. So from this point of view, I prefer the loop.
>
>>> for (i in 1 : nrow(unique)){
>>>   tmp=get(past0("DF",i))[1,]
>>>   assign(paste0("df",i),tmp)
>>>   dfi=dfi[,1:3]
>>>   names(dfi)=names(tmp[c(1,4,5)])
>>>   dfi=rbind(dfi,tmp[c(1,4,5)])
>>>   names(dfi)=c("UID","Date","Location")
>>> }
>
> NB: The code above without any test!
> -- PO SU
> mail: desolato...@163.com
> Majored in Statistics from SJTU
>
> At 2014-08-10 06:32:38, "William Dunlap" wrote:
>>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using the assign() in a loop.
>>
>> The first step to making things easier to do is to put those data.frames into a list. I'll call it DFs and your data.frames will now be DFs[[1]], DFs[[2]], ..., DFs[[length(DFs)]].
>>
>> DFs <- lapply(paste0("DFs", 1:102), get)
>>
>> In the future, I think it would be easier if you skipped the 'assign()' and just put the data into a list from the start.
>>
>> Now use lapply to process that list, creating a new list called 'df', where df[[i]] is the result of processing DFs[[i]]:
>>
>> df <- lapply(DFs, FUN=function(DFsi) {
>>   # your code from the for loop you supplied
>>   dfi=DFsi[1,]
>>   dfi=dfi[,1:3]
>>   names(dfi)=names(DFsi[c(1,4,5)])
>>   dfi=rbind(dfi,DFsi[c(1,4,5)])
>>   names(dfi)=c("UID","Date","Location")
>>   dfi # return this to put in list that lapply is making
>> })
>>
>> (You didn't supply sample data so I did not run this - there may be typos.)
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Sat, Aug 9, 2014 at 1:39 PM, Laura Villegas Ortiz wrote:
>>> Dear all,
>>>
>>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using the assign() in a loop.
>>> Now, I would like to perform the following transformation for each one of these dataframes:
>>>
>>> df1=DFs1[1,]
>>> df1=df1[,1:3]
>>> names(df1)=names(DFs1[c(1,4,5)])
>>> df1=rbind(df1,DFs1[c(1,4,5)])
>>> names(df1)=c("UID","Date","Location")
>>>
>>> something like this:
>>>
>>> for (i in 1 : nrow(unique)){
>>>   dfi=DFsi[1,]
>>>   dfi=dfi[,1:3]
>>>   names(dfi)=names(DFsi[c(1,4,5)])
>>>   dfi=rbind(dfi,DFsi[c(1,4,5)])
>>>   names(dfi)=c("UID","Date","Location")
>>> }
>>>
>>> I thought it could be straightforward but it has proven the opposite.
>>>
>>> Many thanks
>>>
>>> Laura
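A minimal sketch of the dataEnv idea described above (the object name and values are invented): objects live in their own environment and are indexed with [[ ]], so replacement functions such as names<- work directly, with no get()/assign() round-trip.

```r
## Invented example: a dedicated environment as a container for related
## objects, indexed with [[ ]] like a list.
dataEnv <- new.env()
dataEnv[["df1"]] <- data.frame(One = 1:3, Two = 4:6)

## Replacement syntax works directly on dataEnv[["df1"]] -- no tmp/get/assign.
names(dataEnv[["df1"]]) <- c("UID", "Date")
```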
Re: [R] Superimposing graphs
Whoops:

P1 <- plot(grouped.data)

Sent from my iPhone

> On Aug 11, 2014, at 5:06, Naser Jamil wrote:
> [...]
Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley
The book is absolutely helpful to me. Any new R users should read it. I am now reading the section on Rcpp.

--
PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU

At 2014-08-11 09:02:53, "Mitchell Maltenfort" wrote:
> [earlier messages in this thread quoted in full; see the replies below]
>> >> >> >> -- >> http://had.co.nz/ >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > >__ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with assignment 1 part 1
On Mon, Aug 11, 2014 at 9:00 AM, michelle maurin wrote:
> see code below
>
> pollutantmean <- function(directory, pollutant, id = 1:332) {
>   files_list <- list.files(directory, full.names=TRUE) # creates a list of files
>   dat <- data.frame() # creates an empty data frame
>   for (i in 1:332) {
>     dat <- rbind(dat, read.csv(files_list[i])) # loops through the files, rbinding them together
>   }
>   # subsets the rows that match the 'pollutant' argument
>   median(dat_subset$pollutant, na.rm=TRUE) # identifies the median of the subset
> }
>
> ## I highlighted the area that I think has the problem. I helped myself using the tutorial found on the forum, for assignment 1.

I really think you're not where you believe you are. This is an email list for general questions on the R language; I am not aware of any "tutorial found on the forum". But I do think I have an idea of what your problem is. Basically you want to find all the rows in "dat" which have a pollutant (dat$pollutant) of either "sulfate" or "nitrate". The which() function isn't going to do that for you. The which() function takes a logical vector of TRUE and FALSE values and returns an integer vector containing the index values of the TRUE entries. For example:

> which(c(TRUE,FALSE,FALSE,TRUE,FALSE,TRUE))
[1] 1 4 6

I realise how this can be thought of as the way to do it, and it could work, but it is unnecessary in this case. The real problem is the segment:

dat["suflate","nitrate"] == pollutant

If you try this (I can't, because I don't have the data files), you will see that it is not asking the right question. You want to see if dat$pollutant is either "suflate" or "nitrate". Or, expanding a bit, you want to ask: 'is dat$pollutant equal to "suflate"? If not, is it equal to "nitrate"?'. The answer to this question will be the proper logical vector that you can either use in the which() function, or directly as a row selector.
The hint on how to ask this question is to use the ifelse() function properly. So your line (with the critical part being the proper use of ifelse) should look something like:

dat_subset <- dat[which(ifelse(???)),] # or, equivalently
dat_subset <- dat[ifelse(???),]

The latter is valid because the R language will accept a logical vector as a "selector" and return only the data where the logical value is TRUE. I am deliberately leaving the challenge of how to use ifelse() to you. Remember, from the documentation, that the form of ifelse() is:

ifelse(condition, result-if-condition-true, result-if-condition-false)

Hopefully this is a sufficient clue to get you going. I won't comment on the rest of the code because I don't know the problem, or what "forum" you're talking about.

> Best regards
> Michelle
>
> Date: Sun, 10 Aug 2014 22:06:38 -0500
> Subject: Re: [R] Problem with assignment 1 part 1
> From: john.archie.mck...@gmail.com
> To: michimau...@hotmail.com
> CC: r-help@r-project.org
>
> What code?
>
> Also, the forum has a "no homework" policy. Your subject implies this is homework, so you might not get any answers. You might get a hint or two though.
>
> On Aug 10, 2014 10:00 PM, "michelle maurin" wrote:
> I think my code is very close, but I can't seem to be able to debug it. It might be something very simple. I know the problem is on the last 3 lines of code; can you please help?
> Thanks
> Michelle

--
There is nothing more pleasant than traveling and meeting new people! Genghis Khan

Maranatha!
<>< John McKown
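As an aside to the hint above: John deliberately leaves the ifelse() exercise to the reader, but the logical vector he describes can also be built with plain comparisons. A minimal, self-contained sketch (the tiny data frame is invented for illustration; the real data come from the OP's CSV files):

```r
# Toy stand-in for the OP's combined data: a pollutant column plus values.
dat <- data.frame(pollutant = c("sulfate", "nitrate", "ozone", "sulfate"),
                  value     = c(1.5, 2.0, 3.5, 4.0))

# The question "is dat$pollutant 'sulfate' or 'nitrate'?" as a logical vector:
keep <- dat$pollutant == "sulfate" | dat$pollutant == "nitrate"
# (equivalently: keep <- dat$pollutant %in% c("sulfate", "nitrate"))

dat_subset  <- dat[keep, ]          # logical vector used directly as a row selector
dat_subset2 <- dat[which(keep), ]   # same rows selected via which()

nrow(dat_subset)  # 3
```

Both forms select the same rows; the direct logical selector is the more idiomatic of the two.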
Re: [R] Superimposing graphs
I think this is what you are looking for.

library(latticeExtra)
t.tmp <- seq(0, 30, .01)
P1 + layer(panel.xyplot(y=f1(0.5,0.5,0.06, t.tmp), x=t.tmp, type="l", col="black"))

Notice that t is a very bad name for your variable, as it is the name of a function. I used t.tmp instead.

Rich

On Mon, Aug 11, 2014 at 5:06 AM, Naser Jamil wrote:
> [original message quoted in full; see "[R] Superimposing graphs" below]
[R] building a BIGLM model from three tables (related)
Hi all, I wonder if you can help me.

THE PROBLEM
I want to train and test a GLM with some large datasets. I am running into problems when I flatten my tables together to feed into the GLM model, as it produces a very large table which is far too big for the memory on my computer.

THREE TABLES - Pipes, Weekly Weather data, Bursts
I have three tables, which are all related to each other.
(1) Pipe cohorts (114,000 rows) with a range of explanatory variables. (Linking fields: (A) pipe cohort ID, (B) weathercell_ID)
(2) Explanatory weekly weather data for 12 years (i.e. 624 weeks for each pipe cohort). (Linking fields: (C) week, (B) weathercell_ID)
(3) Bursts (40,000 bursts). (Linking fields: (A) pipe cohort ID, (C) week)
Effectively, the combination of tables (1) and (2) makes up the population. Table (3) holds the events, or failures.

JOINING THE THREE TABLES
I have previously had far fewer pipe cohort rows. What I have been doing until now is joining the (1) pipe cohort data to the (2) weekly weather data. This repeats the pipe cohort data, each week, for the 12 years, which now makes a very long table, e.g. 624 x 114,000 rows = 71 million rows. I would then join the (3) burst data to that to see how many bursts there were that week, on that pipe cohort. This made a large, flat file which I could feed into GLM. This worked OK when there were not so many pipe cohorts, but now that there are 114,000 rows, joining the data tables produces a MASSIVE table (many, many GB) which kills my computers.

RELATIONAL DATABASE APPROACH?
I am thinking it would be better to have a relational database structure where, for each data point (row) being brought into the BIGLM model, it takes the three tables and looks up the appropriate values each time, using the defined join fields (A, B and C), feeds that into the model, then goes back and looks up the next point.

ADVICE?
How would you approach this problem? I have the data prepared in the three tables.
I need to fit lots of models to see which variables give me the best AIC (output: lots of model fits), then predict bursts using the best model and the available (1) pipe and (2) weather data.

Would you use the package biglm, linking to an SQLite database? (Or do something completely different?)

Many thanks,
Tim
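The three-table join Tim describes can be prototyped in base R with merge() on the linking fields. A minimal sketch with invented toy tables (the column names stand in for the real fields A, B and C; the real job would do these joins chunk-by-chunk or inside a database rather than all at once):

```r
# Toy versions of the three tables, keyed as in the post:
# (A) cohort_id, (B) weathercell_id, (C) week
pipes   <- data.frame(cohort_id = 1:2, weathercell_id = c(10, 20), diameter = c(100, 150))
weather <- data.frame(weathercell_id = rep(c(10, 20), each = 2),
                      week = rep(1:2, times = 2), rainfall = c(5, 7, 3, 9))
bursts  <- data.frame(cohort_id = c(1, 2), week = c(2, 1), n_bursts = c(1, 3))

# Join (1) pipes to (2) weather on weathercell_id: one row per cohort-week.
pop <- merge(pipes, weather, by = "weathercell_id")

# Left-join (3) bursts on (cohort_id, week); cohort-weeks with no burst get NA -> 0.
full <- merge(pop, bursts, by = c("cohort_id", "week"), all.x = TRUE)
full$n_bursts[is.na(full$n_bursts)] <- 0

nrow(full)  # 4 = 2 cohorts x 2 weeks
```

The memory blow-up in the post comes from materialising this `full` table for 71 million rows; the same join logic expressed as SQL against SQLite lets biglm pull one chunk at a time instead.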
[R] [vegan]Envfit, pvalues and ggplot2
Good morning, first let me thank you very much for answering my first two questions on this list. Currently, I apply vegan's envfit to simple PCA ordinations. When drawing the biplot, one can set a cutoff to fit only the parameters with significant p-values (via p.max=0.05 in the plot command). There is already sufficient coverage on the net for biplotting this kind of data with ggplot2 (with the problem being the arrow length):

http://stackoverflow.com/questions/14711470/plotting-envfit-vectors-vegan-package-in-ggplot2

However, what that solution does not cover is the exclusion of insignificant environmental parameters, as the score extraction process described in the link only works with display="vectors":

Envfit_scores <- as.data.frame(scores(list_from_envfit, display="vectors"))

envfit creates output like this:

             PC1       PC2      r2    Pr(>r)
param1  -0.70882   0.70539  0.0994  0.000999 ***
param2  -0.60122   0.79908  0.0593  0.000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P values based on 999 permutations.

The object contains a vector called $pval holding the p-values. So I need to reduce the object created by envfit to the rows meeting a criterion on $pval (via "unlist" and "which", I suppose). However, I have difficulties working out the correct code. Any help is much appreciated!

--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate
International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069
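The filtering asked for above only needs base subsetting once the scores and p-values are extracted. A minimal sketch with invented numbers shaped like the envfit output shown (the real inputs would come from scores(fit, display="vectors") and the fitted object's p-value vector; in vegan that vector sits in fit$vectors$pvals, named here as an assumption):

```r
# Stand-ins shaped like envfit output: arrow coordinates plus p-values.
Envfit_scores <- data.frame(PC1 = c(-0.70882, -0.60122, 0.12000),
                            PC2 = c( 0.70539,  0.79908, 0.05000),
                            row.names = c("param1", "param2", "param3"))
pvals <- c(param1 = 0.000999, param2 = 0.000999, param3 = 0.412)

# Keep only rows whose p-value meets the cutoff, mirroring p.max = 0.05:
sig_scores <- Envfit_scores[pvals <= 0.05, ]

rownames(sig_scores)  # "param1" "param2"
```

The resulting data frame can be handed straight to the ggplot2 arrow-drawing code from the linked answer; no unlist()/which() gymnastics are needed, since a logical vector subsets the rows directly.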
[R] Superimposing graphs
Dear R-user,
May I seek your help to sort out a little problem? I have the following code to draw two graphs. I want to superimpose the second one on each panel of the first one.

library(nlme)
subject <- c(1,1,1,2,2,2,3,3,3)
time <- c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
con.cohort <- c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
                0.10593292,1.20808375,0.47638394,0.02808967)

data.d <- data.frame(subject=subject, time=time, conc=con.cohort)
grouped.data <- groupedData(formula=conc~time | subject, data=data.d)

plot(grouped.data)

##

f1 <- function(x,v,cl,t) {
  (x/v)*exp(-(cl/v)*t)
}
t <- seq(0,30, .01)
plot(t, f1(0.5,0.5,0.06,t), type="l", pch=18, ylim=c(), xlab="time", ylab="conc")

###

Any suggestion will really be helpful.

Regards,
Jamil.
Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley
Ah, what do you know anyway? -- as the book critic said to the author.

Ersatzistician and Chutzpahthologist

I can answer any question. "I don't know" is an answer. "I don't know yet" is a better answer.

"I can write better than anybody who can write faster, and I can write faster than anybody who can write better" -- A. J. Liebling

On Mon, Aug 11, 2014 at 8:38 AM, Hadley Wickham wrote:
> Or just go to http://adv-r.had.co.nz/ ...
>
> Hadley
>
> [earlier message quoted in full; see the reply below]
Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley
Or just go to http://adv-r.had.co.nz/ ...

Hadley

On Sun, Aug 10, 2014 at 9:34 PM, John McKown wrote:
> Well, it says that it's from Hadley Wickham.
>
> https://github.com/hadley/adv-r
>
> This is the code and text behind the Advanced R programming book.
>
> The site is built using jekyll, with a custom plugin to render .rmd files with knitr and pandoc. To create the site, you need:
>
> jekyll and s3_website gems: gem install jekyll s3_website
> pandoc
> knitr: install.packages("knitr")
>
> This contains an RStudio project file. I know because I've done a git clone on it and loaded it into RStudio, on Linux. If you don't have git, there is a "download zip" option on the site too.
>
> --
> There is nothing more pleasant than traveling and meeting new people! Genghis Khan
>
> Maranatha! <><
> John McKown

--
http://had.co.nz/
Re: [R] C.D.F
On 11/08/14 20:17, pari hesabi wrote:
> Hello everybody,
> Can anybody help me to write a program for the CDF of the sum of two independent gamma random variables (convolution of two gamma distributions) with different parameter values (the shape parameters are the same)?

Is this homework? The list has a no-homework policy.

cheers,
Rolf Turner

--
Rolf Turner
Technical Editor ANZJS
Re: [R] C.D.F
Dear Diba,
you could try the package distr; e.g.

library(distr)
G1 <- Gammad(scale = 0.7, shape = 0.5)
G2 <- Gammad(scale = 2.1, shape = 1.7)
G3 <- G1 + G2 # convolution
G3

For the convolution, exact formulas are applied if available; otherwise FFT is used. See also http://www.jstatsoft.org/v59/i04/ (will appear soon) resp. a previous version at http://arxiv.org/abs/1006.0764

hth
Matthias

On 11.08.2014 at 10:17, pari hesabi wrote:
> Hello everybody,
> Can anybody help me to write a program for the CDF of the sum of two independent gamma random variables (convolution of two gamma distributions) with different parameter values (the shape parameters are the same)?
> Thank you
> Diba

--
Prof. Dr. Matthias Kohl
www.stamats.de
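For readers without the distr package, the same CDF can also be computed numerically in base R by evaluating the convolution integral directly (a minimal sketch, not from the thread; the parameter values reuse those in the distr example above):

```r
# CDF of the sum X1 + X2 of two independent gamma variables, via the
# convolution integral:
#   P(X1 + X2 <= q) = integral_0^q dgamma(x; shape1, scale1) * pgamma(q - x; shape2, scale2) dx
pconv_gamma <- function(q, shape1, scale1, shape2, scale2) {
  if (q <= 0) return(0)
  integrate(function(x) dgamma(x, shape = shape1, scale = scale1) *
                        pgamma(q - x, shape = shape2, scale = scale2),
            lower = 0, upper = q)$value
}

# Sanity check against simulation
set.seed(1)
sim <- rgamma(1e5, shape = 0.5, scale = 0.7) + rgamma(1e5, shape = 1.7, scale = 2.1)
pconv_gamma(5, 0.5, 0.7, 1.7, 2.1)   # close to mean(sim <= 5)
```

Note that when both scale parameters are equal the sum is itself gamma distributed (shape1 + shape2, common scale), so pgamma() alone suffices in that special case; the numerical convolution is only needed for unequal scales.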
[R] Need help in using Rcpp
Dear Rcpp users,
I can't figure out what the following code does:

int f4(Function pred, List x) {
  int n = x.size();
  for(int i = 0; i < n; ++i) {
    LogicalVector res = pred(x[i]);
    if (res[0]) return i + 1;
  }
  return 0;
}

I investigated it, and I understand that it applies a function to every element of a list and gets back a LogicalVector, but I can't understand why, if the first element of the LogicalVector is true, it returns the index of that list element.

Tks!

--
PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU
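For context (an aside, not part of the original question): the C++ function above is a find-first loop, and base R already has an equivalent in Position(). A minimal sketch of the same behaviour in plain R, keeping f4's 0-for-not-found convention via the nomatch argument:

```r
# R equivalent of the Rcpp f4(): return the 1-based index of the first
# element of the list for which pred() returns TRUE, or 0 if none does.
f4_r <- function(pred, x) {
  Position(pred, x, nomatch = 0L)
}

f4_r(function(v) v > 2, list(1, 2, 3, 4))  # 3: first element greater than 2
f4_r(function(v) v > 9, list(1, 2, 3))     # 0: no match
```

This also answers the puzzlement in the question: `return i + 1` converts C++'s 0-based loop index into the 1-based index R users expect, and `return 0` signals "no element matched".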
Re: [R] loops with assign() and get()
It's a great method, but there is a memory problem: DFs would occupy a lot of memory. So from this point of view, I prefer the loop:

for (i in 1:nrow(unique)) {
  tmp = get(paste0("DFs", i))[1,]
  assign(paste0("df", i), tmp)
  dfi = dfi[,1:3]
  names(dfi) = names(tmp[c(1,4,5)])
  dfi = rbind(dfi, tmp[c(1,4,5)])
  names(dfi) = c("UID","Date","Location")
}

NB: the code above is untested!

--
PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU

At 2014-08-10 06:32:38, "William Dunlap" wrote:
>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using the assign() in a loop.
>
> The first step to making things easier is to put those data.frames into a list. I'll call it DFs, and your data.frames will now be DFs[[1]], DFs[[2]], ..., DFs[[length(DFs)]].
>
> DFs <- lapply(paste0("DFs", 1:102), get)
>
> In the future, I think it would be easier if you skipped the 'assign()' and just put the data into a list from the start.
>
> Now use lapply to process that list, creating a new list called 'df', where df[[i]] is the result of processing DFs[[i]]:
>
> df <- lapply(DFs, FUN=function(DFsi) {
>   # your code from the for loop you supplied
>   dfi = DFsi[1,]
>   dfi = dfi[,1:3]
>   names(dfi) = names(DFsi[c(1,4,5)])
>   dfi = rbind(dfi, DFsi[c(1,4,5)])
>   names(dfi) = c("UID","Date","Location")
>   dfi # return this to put in the list that lapply is making
> })
>
> (You didn't supply sample data so I did not run this - there may be typos.)
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Aug 9, 2014 at 1:39 PM, Laura Villegas Ortiz wrote:
>> Dear all,
>>
>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using the assign() in a loop.
>> Now, I would like to perform the following transformation for each one of these dataframes:
>>
>> df1=DFs1[1,]
>> df1=df1[,1:3]
>> names(df1)=names(DFs1[c(1,4,5)])
>> df1=rbind(df1,DFs1[c(1,4,5)])
>> names(df1)=c("UID","Date","Location")
>>
>> something like this:
>>
>> for (i in 1 : nrow(unique)){
>>   dfi=DFsi[1,]
>>   dfi=dfi[,1:3]
>>   names(dfi)=names(DFsi[c(1,4,5)])
>>   dfi=rbind(dfi,DFsi[c(1,4,5)])
>>   names(dfi)=c("UID","Date","Location")
>> }
>>
>> I thought it would be straightforward, but it has proven the opposite.
>>
>> Many thanks
>>
>> Laura
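The lapply pattern Dunlap recommends can be checked end-to-end with toy data. A minimal sketch (the two small data frames below are invented stand-ins for Laura's real DFs1, DFs2, ...; the body simplifies her transformation to its apparent intent of keeping columns 1, 4, 5 under new names, rather than reproducing the rbind of the first row):

```r
# Toy stand-ins for the real DFs1, DFs2, ...: 5 columns each, where
# columns 1, 4 and 5 hold the UID, date and location.
DFs <- list(
  data.frame(a = 1:3, b = 4:6, c = 7:9, d = letters[1:3], e = LETTERS[1:3]),
  data.frame(a = 4:5, b = 1:2, c = 3:4, d = letters[4:5], e = LETTERS[4:5])
)

# For each data frame, keep columns 1, 4, 5 and rename them.
df <- lapply(DFs, function(DFsi) {
  dfi <- DFsi[c(1, 4, 5)]
  names(dfi) <- c("UID", "Date", "Location")
  dfi
})

names(df[[1]])  # "UID" "Date" "Location"
nrow(df[[2]])   # 2
```

Keeping the data frames in a list from the start also answers the memory worry upthread: the list holds the same objects that assign() scattered through the workspace, so it costs no extra memory, and it makes the per-frame transformation a one-liner.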
[R] GSoC 2014 - an R package for working with RRD files
Hello,

I'm taking part in Google Summer of Code 2014 with Ganglia, and I spent the past few months implementing an R package that makes it possible to directly import and work with RRD (http://oss.oetiker.ch/rrdtool/) files in R. There are currently three ways to use the package:

- importRRD("filename", "cf", start, stop, step) - returns a data.frame containing the desired portion of an RRA
- importRRD("filename") - imports everything in the RRD file into a list of data.frame objects (one per RRA); the metadata is read and appropriate names are given to columns and list elements
- getVal("filename", "cf", step, timestamp) - optimized for getting the values at a specific timestamp; uses a cache to minimize the read frequency

Please feel free to install and test the package: https://github.com/pldimitrov/Rrd

I'm now getting close to finishing it, so any feedback is more than welcome! I'm especially worried about getting warnings like:

stack imbalance in '.Call', 28 then 29

Perhaps my PROTECTs are not matching the UNPROTECTs? Could you suggest a good way to debug this?

Thanks,
Plamen
[R] C.D.F
Hello everybody,
Can anybody help me to write a program for the CDF of the sum of two independent gamma random variables (convolution of two gamma distributions) with different parameter values (the shape parameters are the same)?
Thank you
Diba