Re: [R] ddply with mean and max...
That's the ticket! So mean is already set up to operate on columns but max and min are not? I guess it's not too important now I know... but what's going on in the background that makes that happen?

Basically, this:

> mean.data.frame
function (x, ...) sapply(x, mean, ...)
<environment: namespace:base>
> min.data.frame
Error: object 'min.data.frame' not found

There was some discussion on r-devel recently about removing mean.data.frame to be consistent with the other summary functions (plus the way it's currently written makes it prone to problems).

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
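A minimal sketch (not from the original thread, using a made-up two-column data frame) of the behaviour being discussed: mean() historically had a data.frame method that applied column-wise, but min() never did, so the workaround is to apply min over the columns yourself.

```r
# Hypothetical data frame to illustrate the point
df <- data.frame(a = c(1, 5), b = c(2, 10))

# min() has no data.frame method, so it collapses the whole frame:
min(df)          # 1

# To get column-wise minima, do what mean.data.frame did internally:
sapply(df, min)  # a = 1, b = 2
```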
Re: [R] Simple loop
Using paste(Site, Prof) when calling ave() is ugly, in that it forces you to consider implementation details that you expect ave() to take care of (how does paste convert various types to strings?). It also courts errors, since paste("A B", "C") and paste("A", "B C") give the same result but represent different Site/Prof combinations.

Well, ave() uses interaction(...), and interaction() has a drop argument, so:

with(x, ave(H, Site, Prof, drop = TRUE, FUN = function(y) y - min(y)))
[1]  8  0 51  0 33 22 21  0

I don't understand why this isn't the default.

Hadley
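A self-contained sketch of the trick above, on hypothetical data (not the original poster's): the drop = TRUE in ave()'s ... is forwarded to interaction(), so empty Site/Prof combinations are dropped before splitting.

```r
# Made-up data with two observed Site/Prof combinations
site_prof <- data.frame(
  Site = c("A", "A", "B", "B"),
  Prof = c(1, 1, 2, 2),
  H    = c(10, 2, 7, 3)
)

# drop = TRUE is passed through ave()'s ... to interaction(),
# so the unobserved combinations (A/2, B/1) are not split on.
res <- with(site_prof,
            ave(H, Site, Prof, drop = TRUE, FUN = function(y) y - min(y)))
res  # 8 0 4 0
```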
Re: [R] Empty Data Frame
On Wed, Apr 27, 2011 at 4:58 AM, Dennis Murphy djmu...@gmail.com wrote:
Hi: You could try something like

df <- data.frame(expand.grid(Week = 1:52, Year = 2002:2011))

expand.grid already returns a data frame, so the data.frame() wrapper is redundant. You might want KEEP.OUT.ATTRS = FALSE though, even if it feels like you are yelling at R.

Hadley
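A quick sketch of the suggestion above: expand.grid() alone already produces the data frame, and KEEP.OUT.ATTRS = FALSE suppresses the "out.attrs" attribute.

```r
# expand.grid returns a data frame directly; no data.frame() wrapper needed
wk <- expand.grid(Week = 1:52, Year = 2002:2011, KEEP.OUT.ATTRS = FALSE)

nrow(wk)   # 52 weeks x 10 years = 520 rows
names(wk)  # "Week" "Year"
```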
Re: [R] setting options only inside functions
This has the side effect of ignoring errors and even hiding the error messages. If you are concerned about multiple calls to on.exit() in one function, you could define a new function like:

withOptions <- function(optionList, expr) {
  oldOpts <- options(optionList)
  on.exit(options(oldOpts))
  expr # lazily evaluate
}

I wish R had more functions like this. This sort of behaviour is also useful when you open connections or change locales. Ruby's blocks provide nice syntactic sugar for this idea.

Hadley
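A sketch of how the withOptions() helper above would be used (the format(pi) call is my own illustrative choice, not from the thread): the option is only in force while expr is evaluated, and on.exit restores it even if expr throws an error.

```r
withOptions <- function(optionList, expr) {
  oldOpts <- options(optionList)
  on.exit(options(oldOpts))      # restored on exit, error or not
  expr                           # promise evaluated here, options in force
}

before <- getOption("digits")
shown  <- withOptions(list(digits = 3), format(pi))
shown                            # "3.14" - formatted with the temporary option
getOption("digits") == before    # TRUE - the old value is back
```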
Re: [R] MASS fitdistr with plyr or data.table?
On Wed, Apr 27, 2011 at 3:55 PM, Justin Haynes jto...@gmail.com wrote:
I am trying to extract the shape and scale parameters of a wind speed distribution for different sites. I can do this in a clunky way, but I was hoping to find a way using data.table or plyr. However, when I try I am met with the following:

library(data.table)
library(MASS)
set.seed(144)
weib.dist <- rweibull(1, shape = 3, scale = 8)
weib.test <- data.table(cbind(1:10, weib.dist))
names(weib.test) <- c('site', 'wind_speed')
fitted <- weib.test[, fitdistr(wind_speed, 'weibull'), by = site]
Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
  invalid to set the class to matrix unless the dimension attribute is of length 2 (was 0)
In addition: Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
...
10: In dweibull(x, shape, scale, log) : NaNs produced

(the warning messages are normal from what I can tell) or using plyr:

library(plyr)
set.seed(144)
weib.dist <- rweibull(1, shape = 3, scale = 8)
weib.test.too <- data.frame(cbind(1:10, weib.dist))
names(weib.test.too) <- c('site', 'wind_speed')
fitted <- ddply(weib.test.too, .(site), fitdistr, 'weibull')

Well, fitdistr doesn't return a data frame, so you need to do something to its output...

Hadley
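A base-R sketch of "do something to its output": wrap fitdistr so that each group yields a one-row data frame of estimates, then row-bind the pieces. The data here are made up (two sites with 50 observations each, so the fits actually converge); split/lapply/do.call stands in for the ddply call so the example needs only MASS, which ships with R.

```r
library(MASS)  # fitdistr

set.seed(144)
weib <- data.frame(
  site       = rep(1:2, each = 50),
  wind_speed = rweibull(100, shape = 3, scale = 8)
)

# fitdistr returns an object of class "fitdistr"; flatten its
# $estimate vector into a one-row data frame per group:
fit_one <- function(df) {
  fit <- fitdistr(df$wind_speed, "weibull")
  as.data.frame(as.list(fit$estimate))
}

fits <- do.call(rbind, lapply(split(weib, weib$site), fit_one))
fits  # one row per site, columns 'shape' and 'scale'
```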
Re: [R] setting options only inside functions
Put together a list and we can see what might make sense. If we did take this on it would be good to think about providing a reasonable mechanism for addressing the small flaw in this function as it is defined here.

In devtools, I have:

#' Evaluate code in specified locale.
with_locale <- function(locale, expr) {
  cur <- Sys.getlocale(category = "LC_COLLATE")
  on.exit(Sys.setlocale(category = "LC_COLLATE", locale = cur))
  Sys.setlocale(category = "LC_COLLATE", locale = locale)
  force(expr)
}

(Using force here just to be clear about what's going on.)

Bill discussed options(). Other ideas (mostly from skimming apropos("set")):
* graphics devices (i.e. automatic dev.off())
* working directory (as in the chdir argument to sys.source())
* environment variables (Sys.setenv()/Sys.getenv())
* time limits (as a replacement for transient = TRUE)

For connections it would be nice to have something like:

with_connection <- function(con, expr) {
  open(con)
  on.exit(close(con))
  force(expr)
}

but it's a little clumsy, because with_connection(file("myfile.txt"), {do stuff...}) isn't very useful: you have no way to reference the connection that you're using. Ruby's blocks have arguments, which would require big changes to R's syntax. One option would be to use pronouns:

with_connection <- function(con, expr) {
  open(con)
  on.exit(close(con))
  env <- new.env(parent = parent.frame())
  env$.it <- con
  eval(substitute(expr), env)
}

or anonymous functions:

with_connection <- function(con, f) {
  open(con)
  on.exit(close(con))
  f(con)
}

Neither of which seems particularly appealing to me. (I didn't test any of this code, so no guarantees that it works, but hopefully you see the ideas.)

Hadley
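Since the email above explicitly says the code is untested, here is a runnable version of the pronoun variant (on.exit spelled correctly, missing parenthesis restored), exercised against a temporary file of my own invention; inside the expression, .it names the managed connection.

```r
with_connection <- function(con, expr) {
  open(con)                                 # open in default "r" mode
  on.exit(close(con))                       # always closed, even on error
  env <- new.env(parent = parent.frame())
  env$.it <- con                            # the "pronoun" for the connection
  eval(substitute(expr), env)
}

# Hypothetical demo file, just for illustration
tmp <- tempfile()
writeLines(c("hello", "world"), tmp)

out <- with_connection(file(tmp), readLines(.it))
out  # "hello" "world"
```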
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
If you need plyr for other tasks you ought to use a different class for your date data (or wait until plyr can deal with POSIXlt objects).

How do you get POSIXlt objects into a data frame?

df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
str(df)
'data.frame': 1 obs. of 1 variable:
 $ x: POSIXct, format: "2008-01-01"

df <- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
str(df)
'data.frame': 1 obs. of 1 variable:
 $ x: AsIs, format: 0

Hadley
Re: [R] taking rows from data.frames in list to form new data.frame?
On Wed, Apr 20, 2011 at 6:36 PM, Dennis Murphy djmu...@gmail.com wrote:
Hi: Perhaps you're looking for subset()? I'm not sure I understand the problem completely, but is

do.call(rbind, lapply(database, function(df) subset(df, Symbol == 'IBM')))

or

library(plyr)
ldply(lapply(database, function(df) subset(df, Symbol == 'IBM')), rbind)

That's a bit redundant. All you need is:

ldply(database, function(df) subset(df, Symbol == 'IBM'))

Hadley
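A base-R sketch of the same operation on a hypothetical list of per-day data frames (the `database` contents here are invented for illustration); do.call(rbind, lapply(...)) is exactly what ldply packages up, minus the automatic .id column.

```r
# Invented stand-in for the poster's list of data frames
database <- list(
  day1 = data.frame(Symbol = c("IBM", "AAPL"), Close = c(100, 50)),
  day2 = data.frame(Symbol = c("IBM", "MSFT"), Close = c(101, 30))
)

# Pull the IBM row out of every data frame and stack the results
ibm <- do.call(rbind, lapply(database, function(df) subset(df, Symbol == "IBM")))
ibm  # two rows, one per list element
```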
Re: [R] (no subject)
Yes, it's fixed, and a new version of plyr has been pushed up to CRAN - hopefully it will be available for download soon. In the meantime, I think you can fix it by running library(stats) before library(ggplot2).

Hadley

On Sun, Apr 17, 2011 at 3:51 PM, Bryan Hanson han...@depauw.edu wrote:
Is there any news on this issue? I have the same problem, but on a Mac. I have upgraded R and updated the built packages. The console output and sessionInfo are below. The problem is triggered by library(ggplot2) in my .Rprofile. If I do library(ggplot2) after the aborted start-up, ggplot2 is loaded properly, and I can manually do everything in my .Rprofile and my configuration is as originally intended. Thanks, Bryan

Console Output:
Loading required package: reshape
Loading required package: plyr
Attaching package: 'reshape'
The following object(s) are masked from 'package:plyr': rename, round_any
Loading required package: grid
Loading required package: proto
Error in rename(x, .base_to_ggplot) : could not find function "setNames"
Error : unable to load R code in package 'ggplot2'
Error: package/namespace load failed for 'ggplot2'
[R.app GUI 1.40 (5751) x86_64-apple-darwin9.8.0]
[History restored from /Users/bryanhanson/.Rhistory]

and here is my session info after the aborted start-up:
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets grid methods base
other attached packages: [1] proto_0.3-9.1 reshape_0.8.4 plyr_1.5.1 lattice_0.19-23

* Original Post from Stephen Sefick
I have just upgraded to R 2.13 and have library(ggplot2) in my .Rprofile (among other things). When I start R I get an error message. Has something in the start-up scripts changed? Is there a better way to specify the library calls in .Rprofile? Thanks for all of the help in advance.
Error:
Loading required package: grid
Loading required package: proto
Error in rename(x, .base_to_ggplot) : could not find function "setNames"
Error : unable to load R code in package 'ggplot2'
Error: package/namespace load failed for 'ggplot2'
[Previously saved workspace restored]

Computer 1:
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets grid methods [8] base
other attached packages: [1] proto_0.3-9.1 reshape_0.8.4 plyr_1.5.1

Computer 2:
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets grid methods [8] base
other attached packages: [1] proto_0.3-9.1 reshape_0.8.4 plyr_1.5.1

--
Stephen Sefick | Auburn University | Biological Sciences | 331 Funchess Hall | Auburn, Alabama 36849 | sas0...@auburn.edu | http://www.auburn.edu/~sas0025
Re: [R] Is there a better way to parse strings than this?
I was trying strsplit(string, "\.\.\.") as per the suggestion in Venables and Ripley's book (use '\.' to match '.'), which is in the Regular expressions section. I noticed that in the suggestions sent to me people used:

strsplit(test, "\\.\\.\\.")

Could anyone please explain why I should have used "\\.\\.\\." rather than "\.\.\."?

Basically:
* you want to match .
* so the regular expression you need is \.
* and the way you represent that in a string in R is "\\."

Hadley
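The three bullets above in runnable form, on one of the strings from the neighbouring thread: the regex engine sees \. (a literal dot), which must be typed as "\\." inside an R string literal because the backslash itself needs escaping.

```r
test <- "A5.Brands.bought...Dulux"

# The pattern the regex engine receives is \.\.\.  (three literal dots);
# each backslash is doubled in the R string literal.
parts <- strsplit(test, "\\.\\.\\.")[[1]]
parts  # "A5.Brands.bought" "Dulux"

# An unescaped "." would match ANY character, splitting far too often:
length(strsplit(test, "...")[[1]])  # not 2
```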
Re: [R] Is there a better way to parse strings than this?
On Wed, Apr 13, 2011 at 5:18 AM, Dennis Murphy djmu...@gmail.com wrote:
Hi: Here's one approach:

strings <- c("A5.Brands.bought...Dulux", "A5.Brands.bought...Haymes",
  "A5.Brands.bought...Solver", "A5.Brands.bought...Taubmans.or.Bristol",
  "A5.Brands.bought...Wattyl", "A5.Brands.bought...Other")
slist <- strsplit(strings, '\\.\\.\\.')

Or with stringr:

library(stringr)
str_split_fixed(strings, fixed("..."), n = 2)
# or maybe
str_match(strings, "(..).*\\.\\.\\.(.*)")

Hadley
Re: [R] R plots pdf() does not allow spotcolors?
Even so, this would depend on what your publisher/printer requires in what you submit. It would be important to obtain from them a full and exact specification of what they require for colour printing in files submitted to them for printing.

No one else has mentioned this, but the publisher is trying to make money, not make your life easier. Sometimes the right thing to do is say "Hey, you guys are the experts at this; you convert my RGB PDFs to the correct format." It's worthwhile to push back a bit on publishers and get them to do their job.

Hadley
[R] Line plots in base graphics
Am I missing something obvious on how to draw multi-line plots in base graphics? In ggplot2, I can do:

data(Oxboys, package = "nlme")
library(ggplot2)
qplot(age, height, data = Oxboys, geom = "line", group = Subject)

But in base graphics, the best I can come up with is this:

with(Oxboys, plot(age, height, type = "n"))
lapply(split(Oxboys[c("age", "height")], Oxboys$Subject), lines)

Am I missing something obvious? Thanks!

Hadley
Re: [R] Line plots in base graphics
On Wed, Apr 13, 2011 at 2:58 PM, Ben Bolker bbol...@gmail.com wrote:
Hadley Wickham hadley at rice.edu writes:
Am I missing something obvious on how to draw multi-line plots in base graphics? In ggplot2, I can do:

data(Oxboys, package = "nlme")
library(ggplot2)
qplot(age, height, data = Oxboys, geom = "line", group = Subject)

But in base graphics, the best I can come up with is this:

with(Oxboys, plot(age, height, type = "n"))
lapply(split(Oxboys[c("age", "height")], Oxboys$Subject), lines)

[quoting removed to fool gmane]

Am I missing something obvious?

reshape to wide format and matplot()?

Hmmm, that doesn't work if your measurements are at different times, e.g.:

Oxboys2 <- transform(Oxboys, age = age + runif(234))

Hadley
Re: [R] Fwd: CRAN problem with plyr-1.4.1
Then, can we have the ERROR message, please? Otherwise the only explanation I can guess is that a mirror grabs the contents of a repository in exactly the second the repository is updated, and that is unlikely, particularly if more than one mirror is involved.

Isn't one possible explanation that PACKAGES.gz on the mirror was updated before the package directory was? That seems a plausible hypothesis to me, because rsync seems to send files in the top-level directory before files in the directories below.

Hadley
[R] [R-pkgs] plyr: version 1.5
# plyr

plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to:

* fit the same model to each patient subset of a data frame
* quickly calculate summary statistics for each group
* perform group-wise transformations like scaling or standardising

It's already possible to do this with base R functions (like split and the apply family of functions), but plyr makes it all a bit easier with:

* totally consistent names, arguments and outputs
* convenient parallelisation through the foreach package
* input from and output to data.frames, matrices and lists
* progress bars to keep track of long running operations
* built-in error recovery, and informative error messages
* labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory efficient, and in many cases plyr is as fast as, or faster than, the built-in equivalents.

A detailed introduction to plyr has been published in JSS: "The Split-Apply-Combine Strategy for Data Analysis", http://www.jstatsoft.org/v40/i01/. You can find out more at http://had.co.nz/plyr/, or track development at http://github.com/hadley/plyr. You can ask questions about plyr (and data manipulation in general) on the plyr mailing list. Sign up at http://groups.google.com/group/manipulatr.

Version 1.5
-----------

NEW FEATURES

* new `strip_splits` function removes splitting variables from the data frames returned by `ddply`.
* `rename` moved in from reshape, and rewritten.
* new `match_df` function makes it easy to subset a data frame to only contain values matching another data frame. Inspired by http://stackoverflow.com/questions/4693849.
BUG FIXES

* `**ply` now works when passed a list of functions
* `*dply` now correctly names output even when some output combinations are missing (NULL) (Thanks to bug report from Karl Ove Hufthammer)
* `*dply` preserves the class of many more object types.
* `a*ply` now correctly works with zero length margins, operating on the entire object (Thanks to bug report from Stavros Macrakis)
* `join` now implements joins in a more SQL-like way, returning all possible matches, not just the first one. It is still a (little) faster than merge. The previous behaviour is accessible with `match = "first"`.
* `join` is now more symmetric, so that `join(x, y, "left")` is closer to `join(y, x, "right")`, modulo column ordering
* `named.quoted` failed when quoted expressions were longer than 50 characters. (Thanks to bug report from Eric Goldlust)
* `rbind.fill` now correctly maintains POSIXct tzone attributes and preserves missing factor levels
* `split_labels` correctly preserves empty factor levels, which means that `drop = FALSE` should work in more places. Use `base::droplevels` to remove levels that don't occur in the data, and `drop = TRUE` to remove combinations of levels that don't occur.
* `vaggregate` now passes `...` to the aggregation function when working out the output type (thanks to bug report by Pavan Racherla)

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
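For readers new to the package, here is the split-apply-combine pattern the announcement describes, spelled out in base R on the built-in mtcars data (plyr's ddply wraps exactly these three steps into one call):

```r
# split: break the mpg column into one piece per cylinder count
pieces <- split(mtcars$mpg, mtcars$cyl)

# apply: compute a summary for each piece
applied <- lapply(pieces, mean)

# combine: stitch the results back into a data frame
combined <- data.frame(cyl = as.numeric(names(applied)),
                       mpg = unlist(applied))
combined  # one row each for 4, 6 and 8 cylinders
```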
Re: [R] R licence
If all you need is loess, I suspect it would be cheaper to re-write it in C# than to get a considered legal opinion on the matter.

Hadley

On Thu, Apr 7, 2011 at 2:45 AM, Stanislav Bek stanislav.pavel@gmail.com wrote:
Hi, is it possible to use some statistical computing by R in proprietary software? Our software is written in C#, and we intend to use http://rdotnet.codeplex.com/ to get R to work there. Especially we want to use the loess function. Thanks, Best regards, Stanislav
Re: [R] Windrose Percent Interval Frequencies Are Non Linear! Help!
Does anyone with specific windrose experience know how to adjust the graphic such that the data and the percent intervals are evenly spaced? Hopefully I am making sense here.

How about giving us a reproducible example? Code is better than mere description; code + description is best.

The problem is probably that A = pi * r^2, and the percent intervals are spaced evenly on the square-root scale to keep areas from being distorted.

Hadley
Re: [R] merging data list in to single data frame
filelist <- list.files(pattern = "K*cd.txt") # the file names are K1cd.txt to K200cd.txt

It's very easy:

names(filelist) <- basename(filelist)
data_list <- ldply(filelist, read.table, header = T, comment = ";", fill = T)

Hadley
Re: [R] subset and as.POSIXct / as.POSIXlt oddness
On Thu, Mar 24, 2011 at 8:29 AM, Michael Bach pha...@gmail.com wrote:
Dear R users, given this data:

x <- seq(1, 100, 1)
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00")
dfx <- data.frame(dx)

Now to play around, for example:

subset(dfx, dx < as.POSIXct("2007-06-01 16:00:00"))

Ok. Now for some reason I want to extract the datapoints between hours 10:00:00 and 14:00:00, so I thought well:

subset(dfx, dx < as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour > 10)
Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

As others have noted, you used a `,` instead of a `&`. I wanted to point out that this is a little easier to express with the lubridate package:

subset(dfx, dx > ymd("2007-06-01") & hour(dx) < 14 & hour(x) > 10)

but I presume you meant:

subset(dfx, dx > ymd("2007-06-01") & hour(dx) > 10 & hour(x) < 14)

Hadley
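A base-only sketch of the corrected query (no lubridate required), with the `,` replaced by `&` and the hour extracted once up front; the tz = "UTC" is my addition so the hours are reproducible regardless of the machine's locale.

```r
x   <- seq(1, 100, 1)
dx  <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# Extract the hour once, then combine the conditions with &, not ,
hrs     <- as.POSIXlt(dfx$dx)$hour
between <- subset(dfx, hrs > 10 & hrs < 14)

range(as.POSIXlt(between$dx)$hour)  # all hours fall strictly between 10 and 14
```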
Re: [R] How create vector that sums correct responses for multiple subjects?
On Thu, Mar 24, 2011 at 2:24 PM, Kevin Burnham kburn...@gmail.com wrote:
I have a data file that indicates pretest scores for a linguistics experiment. The data are in long form, so for each of 33 subjects there are 400 rows, one for each item on the test, and there is a column called 'Correct' that shows 'C' for a correct response and 'E' for an incorrect response. I am trying to write a formula that will create a vector that indicates the number of correct answers for each subject.

nrow(pretestdata[pretestdata$Subject == 1 & pretestdata$Correct == "C", ])

gives the number of correct responses for subject 1, but I would like a vector that indicates the number correct for each of the 33 subjects.

How about with(pretestdata, table(Subject, Correct))?

Hadley
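A sketch on a made-up miniature of the poster's data (3 subjects, 4 items each, instead of 33 and 400): table() gives the full C/E cross-tabulation, and tapply() extracts just the per-subject correct counts as a vector.

```r
# Hypothetical miniature of the pretest data
pretest <- data.frame(
  Subject = rep(1:3, each = 4),
  Correct = c("C", "C", "E", "C",   # subject 1: 3 correct
              "E", "E", "C", "C",   # subject 2: 2 correct
              "C", "E", "E", "E")   # subject 3: 1 correct
)

with(pretest, table(Subject, Correct))  # full cross-tabulation

# Just the vector of correct counts per subject:
correct_counts <- with(pretest, tapply(Correct == "C", Subject, sum))
correct_counts  # 3 2 1
```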
Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated
I don't doubt that R may be the most popular in terms of discussion group traffic, but you should be aware that the traffic for SAS comprises two separate lists that used to be mirrored, but are no longer linked:
Usenet -- news://comp.soft-sys.sas (what you counted)
Listserv -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

R programming challenge: create a script that parses those HTML pages to compute the total number of messages per week! (Maybe I'll use this in class.)

Hadley
Re: [R] assigning to list element within target environment
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey r.d.mo...@rug.nl wrote:
I would like to assign a value to an element of a list contained in an environment. The list will contain vectors and matrices. Here's a simple example:

# create toy environment
testEnv = new.env(parent = emptyenv())
# create list that will be in the environment, then assign() it
x = list(a = 1, b = 2)
assign("xList", x, testEnv)
# create new element, to be inserted into xList
c = 5:7

Now, what I'd like to do is something like this:

assign("xList[[3]]", c, testEnv)

testEnv$xList[[3]] <- c ?

Hadley
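The one-liner suggested above, in a runnable sketch: ordinary replacement syntax works through `$` on an environment, whereas assign("xList[[3]]", ...) would create a new binding literally named "xList[[3]]" rather than modify the list.

```r
testEnv <- new.env(parent = emptyenv())
assign("xList", list(a = 1, b = 2), envir = testEnv)

# Standard complex assignment works through $ on the environment:
testEnv$xList[[3]] <- 5:7

length(testEnv$xList)    # 3
testEnv$xList[[3]]       # 5 6 7

# By contrast, assign() treats its first argument as a plain name:
assign("xList[[3]]", 5:7, envir = testEnv)
exists("xList[[3]]", envir = testEnv)  # TRUE - a separate, oddly named object
```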
Re: [R] Strange R squared, possible error
2) I don't want to fit data with a linear model of zero intercept. 3) I don't know if I understand correctly. I'm 100% sure the model for my data should have zero intercept. It is the only coordinate which I'm 100% sure is correct. If I had measured quality Y of the same sample X0 a number of times, I would get E(Y(X0)) = 0.

Are points 2) and 3) not contradictory?

Hadley
Re: [R] Persistent storage between package invocations
No. First, please use path.expand("~") for this, and it does not necessarily mean the home directory (and in principle it might not expand at all). In practice I think it will always be *a* home directory, but on Windows there may be more than one (and watch out for local/roaming profile differences).

Ok - I did remember that something like path.expand() existed, I just couldn't find it. (And I always get confused by the difference between normalizePath() and path.expand().)

Second, it need not be writeable, and so many package authors write rubbish in my home directory that I usually arrange for it not to be writeable by R test processes.

So at a minimum I need to check if the home directory is writeable, and fail gracefully if not. What about using the registry on Windows? Does R provide any convenience functions for adding/accessing entries?

If you want something writeable across processes, use dirname(tempdir()).

I was really looking for options to be persistent between instances - i.e. so you decide once and don't need to be asked again. In a similar way, it would be nice if you could choose a CRAN mirror once and then not be asked again - without needing to know anything about how to set options during startup.

Hadley
Re: [R] File Save As...
No, defaults are evaluated in the evaluation frame of the function. That's why you can use local variables in them, e.g. the way rgamma uses 1/rate as a default for scale.

Oops, yes, I was getting confused with promises - non-missing arguments are promises evaluated in the parent frame.

But the point isn't evaluation here: the point is the parsing. A function gets its source attribute when it is parsed, so getSrcFilename needs to be passed something that was parsed in the script.

Still, it would be nice to have a function that, by default, would return the location of the calling script. You can also hack something together using sys.frames(), but it would be nice to have official R support for it.

Hadley
Re: [R] proportional symbol map ggplot
On Mon, Mar 14, 2011 at 9:41 AM, Strategische Analyse CSD Hasselt csd...@fedpolhasselt.be wrote: Hello, we want to plot a proportional symbol map with ggplot. The symbols' areas should be in the same proportions as the scaled variable. Here is an example we found on http://www.r-bloggers.com/bubble-chart-by-using-ggplot2/ . In this example we see that the proportions of the symbols' areas differ from the proportions of the scaled variable:

crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2008.csv", header = TRUE, sep = "\t")
p <- ggplot(crime, aes(murder, burglary, size = population, label = state))
p <- p + geom_point(colour = "red") + scale_area(to = c(1, 20)) + geom_text(size = 3)

Example: population ratio Pennsylvania/Tennessee = 2.003; symbol-area ratio Pennsylvania/Tennessee = +/- 2.50; population ratio California/Florida = 2.005; symbol-area ratio California/Florida = +/- 2.25. What we would like is for the ratio of the symbols' areas to also equal 2.0. To do that you need to make sure the lower limit extends to 0 and the size of the smallest circle is also 0. I think something like scale_area(to = c(0, 20), limits = c(0, 4e7), breaks = 1:4 * 1e7) should suffice. It would also be helpful if you stated how you calculated the areas. Hadley
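Putting the suggestion back into the original example gives a sketch like the following (using the scale_area() syntax of the ggplot2 version current in the thread; later releases changed this API, so treat it as illustrative):

```r
library(ggplot2)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2008.csv",
                  header = TRUE, sep = "\t")
# Anchor both the data limits and the output size range at zero so
# that symbol area stays proportional to population.
p <- ggplot(crime, aes(murder, burglary, size = population, label = state)) +
  geom_point(colour = "red") +
  scale_area(to = c(0, 20), limits = c(0, 4e7), breaks = 1:4 * 1e7) +
  geom_text(size = 3)
p
```

The key design point is that area proportionality only holds when a zero value maps to zero area, which is what the zero lower limit and zero minimum size enforce.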
Re: [R] Does R have a const object?
It's useful for being able to set defaults for arguments that do not have defaults. That cannot break existing programs. Until the next program decides to change those defaults and either can't, or does and you end up with incompatible assumptions. It also makes the code with the added defaults inconsistent with the documentation, which is not a good idea. It may seem convenient, but it isn't a good idea in production code that is intended to play well with other production code. I like the name the Ruby community has for this sort of change: monkey patching. It's an evocative term! Hadley
[R] Persistent storage between package invocations
Hi all, Does anyone have any advice on, or experience with, storing package settings between R runs? Can I rely on the user's home directory (e.g. tools::file_path_as_absolute("~")) being available and writeable across platforms? Hadley
Re: [R] Changing colour of continuous time-series in ggplot2
You need to specify the group aesthetic - that defines how observations are grouped into instances of a geom. Hadley On Tue, Mar 15, 2011 at 8:37 AM, joeP joseph.parr...@bt.com wrote: Hi, This seems like there should be a simple answer, but having spent most of the day trying to find it, I'm becoming less convinced, and as such am asking it here. Here's a sub-set of my data (a data.frame in R):

myDF
                  time value trial
1  2011-03-01 01:00:00 64092 FALSE
2  2011-03-01 02:00:00 47863 FALSE
3  2011-03-01 03:00:00 43685 FALSE
4  2011-03-01 04:00:00 44821  TRUE
5  2011-03-01 05:00:00 48610  TRUE
6  2011-03-01 06:00:00 44856  TRUE
7  2011-03-01 07:00:00 55199  TRUE
8  2011-03-01 08:00:00 69326 FALSE
9  2011-03-01 09:00:00 84048 FALSE
10 2011-03-01 10:00:00 81341 FALSE

From this, I can plot a simple time-series in ggplot: ggplot(myDF, aes(time, value)) + geom_line() but I'd like to change the colour of the line based on whether the trial value is TRUE or FALSE, so I try: ggplot(myDF, aes(time, value)) + geom_line(aes(colour = trial)) but this draws a line from the value on row 3 to that on row 8 (essentially plotting TRUE and FALSE as separate data-sets). I've tried using various other geometries (inc. geom_path()) but all have produced similar results. Is there a way I can plot the time-series in a continuous way (i.e. as one data-set) and change only the colour of the line? Thanks, Joe
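A minimal sketch of the suggested fix, using myDF from the quoted message: keep the colour mapping but force a single group so the line is drawn as one continuous series.

```r
library(ggplot2)
# group = 1 overrides the default grouping induced by the colour
# aesthetic, so one continuous line is drawn whose colour changes
# with trial, rather than separate TRUE/FALSE lines.
ggplot(myDF, aes(time, value)) +
  geom_line(aes(colour = trial, group = 1))
```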
Re: [R] File Save As...
The bigger issue is that R can't tell the location of an open script, which makes it harder to create new versions of existing work. But it can. If you open a script and choose Save, it will be saved to the same place. Or do you mean an executing script? There are indirect ways to find the name of the executing script. For example, in R-devel (to become 2.13.0 next month), you can do this: cat("This file is ", getSrcFilename(function() {}, full = TRUE), "\n") The getSrcFilename() function will be new in 2.13.0. You can do the same in earlier versions, but you need to program it yourself. Could getSrcFilename() gain a default argument so that getSrcFilename() would by default return the path of the executing script? Hadley
Re: [R] File Save As...
Could getSrcFilename() gain a default argument so that getSrcFilename() would by default return the path of the executing script? No, it needs to see a function defined in that script. But I thought default arguments were evaluated in the parent environment? Does that not follow for source attributes as well? Hadley
Re: [R] dataframe to a timeseries object
Well, I'd start by removing all explicit use of environments, which makes your code very hard to follow. Hadley On Monday, March 14, 2011, Daniele Amberti daniele.ambe...@ors.it wrote: I found that plyr:::daply is more efficient than base:::by (am I doing something wrong?); below is updated code for comparison (I also fixed a couple of things). The daply function from the plyr package also has a .parallel argument, and I wonder whether creating timeseries objects in parallel and then combining them would be faster (Windows XP platform); does someone have experience with this topic? I found only very simple examples of plyr and parallel computation, and I do not have a working example of such an implementation (daply that returns a list of timeseries objects). Thanks in advance, Daniele Amberti

set.seed(123)
N <- 1
X <- data.frame(
  ID = c(rep(1, N), rep(2, N), rep(3, N), rep(4, N)),
  DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N - 1), 4)),
  VALUE = runif(N * 4),
  stringsAsFactors = FALSE)
X <- X[sample(1:(N * 4), N * 4), ]
str(X)

library(timeSeries)
buildTimeSeriesFromDataFrame <- function(x, env) {
  if (exists("xx", envir = env))
    assign("xx", cbind(get("xx", env),
      timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
        zone = 'GMT', units = as.character(x$ID[1]))), envir = env)
  else
    assign("xx", timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
      zone = 'GMT', units = as.character(x$ID[1])), envir = env)
  return(TRUE)
}
tsBy <- function(...) {
  e1 <- new.env(parent = baseenv())
  res <- by(X, X$ID, buildTimeSeriesFromDataFrame, env = e1, simplify = TRUE)
  return(get("xx", e1))
}
Time01 <- replicate(100, system.time(tsBy(X, X$ID, simplify = TRUE))[[1]])
median(Time01)
hist(Time01)
ATS <- tsBy(X, X$ID, simplify = TRUE)

library(xts)
buildXtsFromDataFrame <- function(x, env) {
  if (exists("xx", envir = env))
    assign("xx", cbind(get("xx", env),
      xts(x$VALUE, as.POSIXct(x$DATE, tz = "GMT", format = '%Y-%m-%d %H:%M:%S'),
        tzone = 'GMT')), envir = env)
  else
    assign("xx", xts(x$VALUE, as.POSIXct(x$DATE, tz = "GMT",
      format = '%Y-%m-%d %H:%M:%S'), tzone = 'GMT'), envir = env)
  return(TRUE)
}
xtsBy <- function(...) {
  e1 <- new.env(parent = baseenv())
  res <- by(X, X$ID, buildXtsFromDataFrame, env = e1, simplify = TRUE)
  return(get("xx", e1))
}
Time02 <- replicate(100, system.time(xtsBy(X, X$ID, simplify = TRUE))[[1]])
median(Time02)
hist(Time02)
AXTS <- xtsBy(X, X$ID, simplify = TRUE)
plot(density(Time02), col = "red",
  xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
lines(density(Time01), col = "blue")
# check equal; still a problem with names
AXTS2 <- as.timeSeries(AXTS)
names(AXTS2) <- names(ATS)
identical(getDataPart(ATS), getDataPart(AXTS2))
identical(time(ATS), time(AXTS2))

# with plyr library and daply instead of by:
library(plyr)
tsDaply <- function(...) {
  e1 <- new.env(parent = baseenv())
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame, env = e1)
  return(get("xx", e1))
}
Time03 <- replicate(100, system.time(tsDaply(X, X$ID))[[1]])
median(Time03)
hist(Time03)
xtsDaply <- function(...) {
  e1 <- new.env(parent = baseenv())
  res <- daply(X, "ID", buildXtsFromDataFrame, env = e1)
  return(get("xx", e1))
}
Time04 <- replicate(100, system.time(xtsDaply(X, X$ID))[[1]])
median(Time04)
hist(Time04)
plot(density(Time04), col = "red", xlim = c(
  min(c(Time02, Time01, Time03, Time04)),
  max(c(Time02, Time01, Time03, Time04))), ylim = c(0, 100))
lines(density(Time03), col = "blue")
lines(density(Time02))
lines(density(Time01))

-Original Message- From: Daniele Amberti Sent: 11 March 2011 14:44 To: r-help@r-project.org Subject: dataframe to a timeseries object I'm wondering which is the most efficient (time, then memory usage) way to obtain a multivariate time series object from a data frame (the easiest data structure to get data from a database through RODBC). I have a starting point using the timeSeries or xts library (these libraries can handle time zones); below you can find code to test. Merging parallelization (cbind) is something I'm thinking about (suggestions from users with experience on this topic are highly appreciated); any suggestion is welcome. My platform is Windows XP, R 2.12.1, latest available packages on CRAN for timeSeries and xts.

set.seed(123)
N <- 9000
X <- data.frame(
  ID = c(rep(1, N), rep(2, N), rep(3, N), rep(4, N)),
  DATE = rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N - 1), 4),
  VALUE = runif(N * 4))
library(timeSeries)
buildTimeSeriesFromDataFrame <- function(x, env) {
  if (exists("xx", envir = env))
    assign("xx", cbind(get("xx", env),
      timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d
Re: [R] dataframe to a timeseries object
That's a bit better, but you're still creating an object in the global environment, when you should be returning it from your function. Hadley On Mon, Mar 14, 2011 at 8:54 AM, Daniele Amberti daniele.ambe...@ors.it wrote: Thanks Hadley for your interest; below is some code without the use of environments (using timeSeries). I also made some experiments with .parallel = TRUE in daply to create timeSeries objects and then bind them together, but I have some problems. Thank you in advance, Daniele Amberti

set.seed(123)
N <- 1
X <- data.frame(
  ID = c(rep(1, N), rep(2, N), rep(3, N), rep(4, N)),
  DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N - 1), 4)),
  VALUE = runif(N * 4),
  stringsAsFactors = FALSE)
X <- X[sample(1:(N * 4), N * 4), ]
str(X)
head(X)

# define a variable in global env
ATS <- NULL
buildTimeSeriesFromDataFrame <- function(x) {
  library(timeSeries)
  if (!is.null(ATS)) { # in global env
    # assign in global env
    ATS <<- cbind(ATS, timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
      zone = 'GMT', units = as.character(x$ID[1])))
  } else {
    # assign in global env
    ATS <<- timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
      zone = 'GMT', units = as.character(x$ID[1]))
  }
  return(TRUE)
}
tsDaply <- function(...) {
  # assign in global env, to clean previous run
  ATS <<- NULL
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame)
  return(res)
}
tsDaply(X, X$ID)
head(ATS)

# performance tests
Time <- replicate(100, system.time(tsDaply(X, X$ID))[[1]])
median(Time)
hist(Time)

###
# some multithread tests:
###
library(doSMP)
w <- startWorkers(workerCount = 2)
registerDoSMP(w)
# do not cbind ts, just create
buildTimeSeriesFromDataFrame2 <- function(x) {
  library(timeSeries)
  xx <- timeSeries:::timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
    zone = 'GMT', units = as.character(x$ID[1]))
  return(xx)
}
# tsDaply2 <- function(...)
# {
#   library(plyr)
#   res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = TRUE)
#   return(res)
# }
# tsDaply2 with .parallel = TRUE returns an error:
# Error in do.ply(i) : task 4 failed - subscript out of bounds
# In addition: Warning messages:
# 1: anonymous: ... may be used in an incorrect context: '.fun(piece, ...)'
# 2: anonymous: ... may be used in an incorrect context: '.fun(piece, ...)'
tsDaply2 <- function(...) {
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = FALSE)
  return(res)
}
# tsDaply2 with .parallel = FALSE works, but the list discards the timeSeries class
# bind after ts creation
res <- tsDaply2(X, X$ID)
# list is not a timeSeries object
str(cbind(t(res)))
res <- as.timeSeries(cbind(t(res)))
stopWorkers(w)

-Original Message- From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley Wickham Sent: 14 March 2011 12:48 To: Daniele Amberti Cc: r-help@r-project.org Subject: Re: [R] dataframe to a timeseries object
Re: [R] increase a value by each group?
On Mon, Mar 14, 2011 at 9:59 AM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote: Something like this?

my_data <- read.table("clipboard", header = TRUE)
my_data$s_name <- factor(my_data$s_name)
library(plyr)
ddply(my_data, .(s_name), function(x) {
  x$Im_looking <- x$Depth + as.numeric(x$s_name) / 100
  x
})

I think you need factor in there:

ddply(my_data, .(s_name), function(x) {
  x$Im_looking <- x$Depth + as.numeric(factor(x$s_name)) / 100
  x
})

or with transform:

ddply(my_data, "s_name", transform,
  Im_looking = Depth + as.numeric(factor(s_name)) / 100)

Hadley
Re: [R] Need Assistance in Stacked Area plot
You might try sending a reproducible example (https://github.com/hadley/devtools/wiki/Reproducibility) to the ggplot2 mailing list. Hadley On Wed, Feb 16, 2011 at 8:41 AM, Kishorenalluri kishorenalluri...@gmail.com wrote: Dear All, I need the assistance to plot the staked area plot using ggplot2 What i am expecting is to plot in X-axis the time(Shown in column1) from range of 0 to 100 seconds, and in the y axis the stable increment in area in both directions (less than and greater than zero) for columns 3, 4, 5 ( Column1, Column2, column3). The example as follows TIME concentration Column1 Column2 Column3 0. 0.E+00 0.E+00 0.E+00 0.E+00 0. 0.E+00 1.06339151E-16 -1.45858050E-21 -5.91725566E-19 0. 5.38792107E-16 1.02157781E-17 -1.64419026E-20 -7.66233765E-19 1. 2.59545931E-15 3.42126227E-18 -1.98776066E-20 -3.72669548E-19 1. 2.91310885E-15 2.81003039E-18 -1.91286265E-20 -2.38608440E-19 2. 3.07852570E-15 2.50631096E-18 -1.81194864E-20 -1.86739453E-19 3. 3.23261641E-15 -5.56403736E-16 -1.77552840E-20 -1.70484122E-19 3. 3.35382008E-15 1.54158070E-17 -2.34217089E-20 -2.07658923E-19 4. 3.82413183E-15 -9.70815457E-13 0.E+00 -6.79571364E-19 4. 5.95542983E-15 6.80013097E-16 -4.50874919E-19 -2.66777428E-19 5. 4.39175250E-14 2.47867332E-15 -1.01608288E-18 -9.76030255E-19 6. 1.38894685E-13 5.61681417E-15 -3.93770327E-18 -3.49248490E-18 6. 3.68692195E-13 1.16035253E-14 -1.41445363E-17 -1.22159013E-17 7. 8.73269040E-13 1.79082686E-14 -3.66076039E-17 -2.95735768E-17 7. 1.76597984E-12 2.12332352E-14 -6.62184833E-17 -4.70513477E-17 8. 2.95030081E-12 2.25656057E-14 -9.60506242E-17 -5.97356578E-17 9. 4.26108735E-12 1.34254419E-14 -1.25090653E-16 -6.95633425E-17 9. 5.63056757E-12 2.32612717E-14 -1.53348074E-16 -8.54969359E-17 10. 7.01928856E-12 2.31851800E-14 -1.82546609E-16 -1.02043519E-16 10. 8.39638583E-12 2.31701226E-14 -2.11755800E-16 -1.18810066E-16 11. 9.76834605E-12 -5.45647674E-13 -2.40796726E-16 -1.35796249E-16 12. 
1.11376775E-11 -9.78639513E-13 -2.69731656E-16 -1.53027433E-16 12. 1.25059438E-11 2.32773795E-14 -2.98461854E-16 -1.70614303E-16 13. 1.38742114E-11 2.33364991E-14 -3.27081730E-16 -1.88394094E-16 13. 1.52430947E-11 2.33958407E-14 -3.55573728E-16 -2.06350128E-16 14. 1.66127832E-11 2.34487161E-14 -3.83935678E-16 -2.24396442E-16 15. 1.79830524E-11 2.34906687E-14 -4.12155560E-16 -2.42441983E-16 15. 1.93533963E-11 2.26377072E-14 -4.39950727E-16 -2.90534988E-16 16. 2.06980593E-11 2.17171727E-14 -4.70446135E-16 -3.56228328E-16 16. 2.19820310E-11 1.87207801E-14 -4.98853596E-16 -4.46861939E-16 17. 2.32023004E-11 1.94966974E-14 -5.22695187E-16 -5.78943073E-16 18. 2.43484124E-11 1.78879137E-14 -5.38143362E-16 -7.87629728E-16 18. 2.54003017E-11 3.09723082E-11 -5.36126990E-16 -1.16329436E-15 19. 2.63149519E-11 9.87861573E-15 -4.79610661E-16 -2.06287085E-15 19. 2.69770178E-11 -8.01983206E-14 -6.65787469E-17 -3.62969106E-15 20. 2.50814500E-11 1.40265746E-09 -4.91364111E-17 -3.37932814E-15 21. 2.23899790E-11 -1.55489960E-14 -4.45522315E-17 -3.35597818E-15 21. 2.09705026E-11 -1.20672358E-14 -5.22849137E-17 -3.35854728E-15 22. 1.99397498E-11 -1.67958460E-14 -6.01502858E-17 -3.24838081E-15 22. 1.91117367E-11 -1.08397245E-14 -6.67337987E-17 -3.01352774E-15 23. 1.84409696E-11 -3.92677717E-15 -7.10576594E-17 -2.67668939E-15 24. 1.78951028E-11 -2.97739441E-15 -7.26367637E-17 -2.28204667E-15 24. 1.74505303E-11 -2.59241020E-15 -7.15967013E-17 -1.88057466E-15 25. 1.70886909E-11 -2.22676101E-11 -6.90344927E-17 -1.51430500E-15 25. 1.67938004E-11 -2.43300435E-15 -6.45959094E-17 -1.20480757E-15 26. 1.65527898E-11 -2.00735751E-15 -6.04828745E-17 -9.62120866E-16 27. 1.63547265E-11 -1.67859253E-15 -5.68950146E-17 -7.80733120E-16 27. 1.61904094E-11 -1.41262089E-15 -5.41786960E-17 -6.51168814E-16 28. 1.60527139E-11 -2.49439678E-12 -5.23952425E-17 -5.62375603E-16 28. 1.59360400E-11 2.62881626E-10 -5.15606096E-17 -5.03612687E-16 29.
Re: [R] Vector of weekly dates starting with a given date
On Wed, Mar 9, 2011 at 3:04 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Hello! I have a date (a Monday): date <- 20081229; mydates <- as.Date(as.character(date), "%Y%m%d"). What package would allow me to create a vector that starts with that date (mydates) and contains dates for the 51 Mondays that follow it (so, basically, 51 dates separated by a week)? library(lubridate); mydates <- ymd(date) + weeks(0:51) Hadley
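For completeness, the same result needs no extra package: seq() on Date objects can step by weeks directly (a base-R sketch of the lubridate answer above).

```r
# 52 Mondays, one week apart, starting 2008-12-29
mydates <- seq(as.Date("2008-12-29"), by = "week", length.out = 52)
head(mydates)
```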
Re: [R] R usage survey
Ok, I am very interested in what methods you plan to use that would fit the description "suitably analyzed" for voluntary response data. From my training and experience, the only suitable thing to do with voluntary response data is to put it through the shredder, into the recycle bin, or use it as an example of what not to do in introductory textbooks. Treating voluntary response data (especially given the responses to your post you have seen so far) as if it came from a proper random probability sample does not fit the idea of suitable analysis. Come on, that's a bit strong. In real life, it's not always possible to take a perfectly random sample and assume (at best) that missing responses are missing completely at random. Even descriptive analysis on a flawed sample is better than nothing at all. Of course you need to be extremely careful about making inferences about the wider population, but it's not true that the only thing you can do with survey data is to throw it in the trash. Hadley
Re: [R] The L Word
Note however that I've never seen evidence for a *practical* difference in simple cases, and also of such cases as part of a larger computation. But I'm happy to see one if anyone has an interesting example. E.g., I would typically never use 0L:100L instead of 0:100 in an R script because I think code readability (and self-explainability) is of considerable importance too. But : casts to integer anyway: str(0:100) gives int [1:101] 0 1 2 3 4 5 6 7 8 9 ... And performance in this case is (obviously) negligible: library(microbenchmark); microbenchmark(as.integer(c(0, 100)), times = 1000) Unit: nanoseconds min lq median uq max as.integer(c(0, 100)) 712 791 813 896 15840 (mainly included as an opportunity to try out microbenchmark) So you save ~800 ns, but typing two letters probably takes 0.2 s (100 wpm, ~5 letters per word + space = 0.1 s per letter), so it only saves you time if you're going to be calling it more than 125000 times ;) Hadley
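The point about `:` can be checked directly:

```r
# ":" already returns an integer vector when both endpoints are
# whole numbers, so the L suffix is redundant here.
typeof(0:100)              # "integer"
identical(0:100, 0L:100L)  # TRUE
```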
Re: [R] monitor variable change
You can replace the previous line by: browser(expr = (a != old.a)) - see ?browser for details. I don't understand why you'd want to do that - using if is much more readable to me (and is much more general!) Hadley
Re: [R] monitor variable change
One way to implement this functionality is with a task manager callback:

watch <- function(varname) {
  old <- get(varname)
  changed <- function(...) {
    new <- get(varname)
    if (!identical(old, new)) {
      message(varname, " is now ", new)
      old <<- new
    }
    TRUE
  }
  invisible(addTaskCallback(changed))
}
a <- 1
watch("a")
a <- 2

Hadley On Wed, Feb 16, 2011 at 9:38 AM, Alaios ala...@yahoo.com wrote: Dear all, I would like to ask you if there is a way in R to monitor when a value changes. Right now I use sprintf("my variable is %d \n", j) to print the value of the variable. Is it possible, when a 'big' for loop executes, to open a new window to dynamically check only the variable I want? If I put all the sprintf statements inside my loop then I get flooded with so many messages that it becomes useless. Best Regards, Alex
Re: [R] Error when modifying names of the object returned by get()
You can probably do this by constructing a call to the `names<-` replacement function, but it's really bad style. Don't write R code that has external side effects if you can avoid it. In this case, you'll almost certainly get more maintainable code by writing your function to return a copy of x with new names, rather than trying to modify the original. And for that task, you might find setNames() useful: "It is most useful at the end of a function definition where one is creating the object to be returned and would prefer not to store it under a name just so the names can be assigned." Hadley
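A small illustration of the setNames() pattern described above:

```r
# setNames() returns the renamed copy directly, so a function can
# name and return its result in one expression instead of assigning
# names to a temporary variable first.
rename_demo <- function() {
  setNames(1:3, c("a", "b", "c"))
}
rename_demo()
```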
Re: [R] how to order POSIXt objects ?
It's a bit better to use xtfrm(). Hadley On Monday, February 14, 2011, jim holtman jholt...@gmail.com wrote: 'unclass' it first (assuming that it is POSIXct): -unclass(mytime) On Mon, Feb 14, 2011 at 3:55 AM, JonC jon_d_co...@yahoo.co.uk wrote: I have a problem ordering a POSIXt object by descending magnitude. Can someone help please and let me know how to work around this? My goal is to be able to order my data by DATE and then by descending TIME. I have tried to include as much info as possible below. The problem stems from trying to read in times from a CSV file. I have converted the character time values to a POSIXt object using the strptime function. I would ideally like to sort using the order function as below: test.sort <- order(test$DATE, -test$mytime) However, when I try this I receive the error below: Error in `-.POSIXt`(test2$mytime) : unary '-' is not defined for POSIXt objects To make this easier to understand I have pasted my example data below with a list of R commands I have used. Any help or assistance would be appreciated.
test2 <- read.csv("C:/Documents and Settings/Jonathan Cooke/My Documents/Downloads/test2.csv", sep = ",")
test2
        DATE     TIME
1 18/01/2011 08:00:01
2 18/01/2011 08:10:01
3 18/01/2011 08:20:01
4 18/01/2011 08:30:01
5 19/01/2011 08:00:01
6 19/01/2011 08:10:01
7 19/01/2011 08:20:01
8 19/01/2011 08:30:01
test2$mytime <- strptime(test2$TIME, "%H:%M:%S")
test2$mytime
[1] "2011-02-14 08:00:01" "2011-02-14 08:10:01" "2011-02-14 08:20:01" "2011-02-14 08:30:01" "2011-02-14 08:00:01"
[6] "2011-02-14 08:10:01" "2011-02-14 08:20:01" "2011-02-14 08:30:01"
test2
        DATE     TIME              mytime
1 18/01/2011 08:00:01 2011-02-14 08:00:01
2 18/01/2011 08:10:01 2011-02-14 08:10:01
3 18/01/2011 08:20:01 2011-02-14 08:20:01
4 18/01/2011 08:30:01 2011-02-14 08:30:01
5 19/01/2011 08:00:01 2011-02-14 08:00:01
6 19/01/2011 08:10:01 2011-02-14 08:10:01
7 19/01/2011 08:20:01 2011-02-14 08:20:01
8 19/01/2011 08:30:01 2011-02-14 08:30:01
test2.sort <- order(test2$DATE, -test2$mytime)
Error in `-.POSIXt`(test2$mytime) : unary '-' is not defined for POSIXt objects
It's at this stage that I have got stuck, as I'm new to R and don't yet know a way of getting around this error. Thanks in advance. JonC -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
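A sketch of the xtfrm() approach for the quoted problem: xtfrm() turns the POSIXt column into a numeric sort proxy that *can* be negated, giving date ascending and time descending.

```r
# order() accepts the negated numeric proxy even though unary minus
# is not defined for POSIXt objects themselves.
test2.sort <- test2[order(test2$DATE, -xtfrm(test2$mytime)), ]
```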
Re: [R] aggregate function - na.action
On Mon, Feb 7, 2011 at 5:54 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: Looking at the timings for each stage may help:

system.time(dt <- data.table(dat))
   user  system elapsed
   1.20    0.28    1.48
system.time(setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8))  # sort by the 8 columns (one-off)
   user  system elapsed
   4.72    0.94    5.67
system.time(udt <- dt[, list(y = sum(y, na.rm = TRUE)), by = 'x1, x2, x3, x4, x5, x6, x7, x8'])
   user  system elapsed
   2.00    0.21    2.20
# compared to 11.07s

data.table doesn't have a custom data structure, so it can't be that. data.table's structure is the same as data.frame, i.e. a list of vectors. data.table inherits from data.frame. It *is* a data.frame, too. The reasons it is faster in this example include: 1. Memory is only allocated for the largest group. 2. That memory is re-used for each group. 3. Since the data is ordered contiguously in RAM, the memory is copied over in bulk for each group using memcpy in C, which is faster than a for loop in C. Page fetches are expensive; they are minimised. But this is exactly what I mean by a custom data structure - you're not using the usual data frame API. Wouldn't it be better to implement these changes in data frame so that everyone can benefit? Or is it just too specialised to this particular case (where I guess you're relying on the return data structure of the summary function being consistent)? Hadley
Re: [R] aggregate function - na.action
Does FAQ 1.8 answer that OK? Ok, I'm starting to see what data.table is about, but why didn't you enhance data.frame in R? Why does it have to be a new package? http://datatable.r-forge.r-project.org/datatable-faq.pdf Kind of. I think there are two sets of features data.table provides: * a compact syntax for expressing many common data manipulations * high-performance data manipulation FAQ 1.8 answers the question for the syntax, but not for the performance-related features. Basically, I'd love to be able to use the high-performance components of data.table in plyr, but keep using my existing syntax. Currently the only way to do that is for me to dig into your C code to understand why it's fast, and then implement those ideas in plyr. Hadley
Re: [R] aggregate function - na.action
There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data.table did very well and the summaryBy() function in the doBy package did OK: Well, if you use the right plyr function, it works just fine: system.time(count(dat, c("x1", "x2", "x3", "x4", "x4", "x5", "x6", "x7", "x8"), "y")) # user system elapsed # 9.754 1.314 11.073 Which illustrates something that I've believed for a while about data.table - it's not the indexing that speeds things up, it's the custom data structure. If you use ddply with data frames, it's slow because data frames are slow. I think the right way to resolve this is to make data frames more efficient, perhaps using some kind of mutable interface where necessary for high-performance operations. Hadley
Re: [R] Counting number of rows with two criteria in dataframe
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy djmu...@gmail.com wrote: Hi: Here are two more candidates, using the plyr and data.table packages: library(plyr) ddply(X, .(x, y), function(d) length(unique(d$z))) x y V1 1 1 1 2 2 1 2 2 3 2 3 2 4 2 4 2 5 3 5 2 6 3 6 2 The function counts the number of unique z values in each sub-data frame with the same x and y values. The argument d in the anonymous function is a data frame object. Another approach is to use the much faster count function: count(unique(X)) Hadley
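A minimal sketch of the count-based approach with made-up data (not the poster's): I read Hadley's count(unique(X)) as grouping by x and y, so that is made explicit here. unique() drops duplicate rows first, and count() then tallies the surviving rows per group, which equals the number of distinct z values per (x, y) pair.

```r
# Assumes X has columns x, y, z (toy data for illustration only).
library(plyr)
X <- data.frame(x = c(1, 1, 2, 2), y = c(1, 1, 2, 2), z = c(5, 5, 7, 8))
# unique(X) keeps one copy of each distinct (x, y, z) row; counting those
# rows by (x, y) gives the number of distinct z values in each group.
count(unique(X), c("x", "y"))
#   x y freq
# 1 1 1    1
# 2 2 2    2
```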
Re: [R] How to reshape wide format data.frame to long format?
I think I should be able to do this using the reshape function, but I cannot get it to work. I think I need some help to understand this... (If I could split the variable into three separate columns splitting by ., that would be even better.) Use strsplit and [. Or colsplit, from reshape, which does this for you. Hadley
Re: [R] ggplot2, geom_hline and facet_grid
Ok, that's a known bug: https://github.com/hadley/ggplot2/issues/labels/facet#issue/96 Thanks for the reproducible example though! Hadley On Thu, Jan 20, 2011 at 3:46 AM, Sandy Small sandy.sm...@nhs.net wrote: Thank you. That seems to work - also on my much larger data set. I'm not sure I understand why it has to be defined as a factor, but if it works... Sandy Dennis Murphy wrote: Hi Sandy: I can reproduce your problem given the data provided. When I change ecd_rhythm from character to factor, it works as you intended. str(lvefeg) List of 4 ### Interesting... $ cvd_basestudy: chr [1:10] CBP05J02 CBP05J02 CBP05J02 CBP05J02 ... $ ecd_rhythm : chr [1:10] AF AF AF AF ... $ fixed_time : num [1:10] 30.9 33.2 32.6 32.1 30.9 ... $ variable_time: num [1:10] 29.4 32 30.3 33.7 28.3 ... - attr(*, row.names)= int [1:10] 1 2 3 4 5 6 7 9 10 11 class(lvefeg) [1] cast_df data.frame lvefeg$ecd_rhythm - factor(lvefeg$ecd_rhythm) p - qplot((variable_time + fixed_time) /2 , variable_time - fixed_time, data = lvefeg, geom='point') p p + facet_grid(ecd_rhythm ~ .) + geom_hline(yintercept=0) Does that work on your end? (And thank you for the reproducible example. Using dput() allows us to see what you see, which is very helpful.) HTH, Dennis On Wed, Jan 19, 2011 at 1:30 PM, Small Sandy (NHS Greater Glasgow Clyde) [1]sandy.sm...@nhs.net wrote: Hi Still having problems in that when I use geom_hline and facet_grid together I get two extra empty panels A reproducible example can be found at: [2]https://gist.github.com/786894 Sandy Small From: [3]h.wick...@gmail.com [[4]h.wick...@gmail.com] On Behalf Of Hadley Wickham [[5]had...@rice.edu] Sent: 19 January 2011 15:11 To: Small Sandy (NHS Greater Glasgow Clyde) Cc: [6]r-help@r-project.org Subject: Re: [R] ggplot2, geom_hline and facet_grid Hi Sandy, It's difficult to know what's going wrong without a small reproducible example ([7]https://github.com/hadley/devtools/wiki/Reproducibility) - could you please provide one? 
You might also have better luck with an email directly to the ggplot2 mailing list. Hadley On Wed, Jan 19, 2011 at 2:57 AM, Sandy Small [8]sandy.sm...@nhs.net wrote: Having upgraded to R version 2.12.1 I still have the same problem: The combination of facet_grid and geom_hline produce (for me) 4 panels of which two are empty of any data or lines (labelled 1 and 2). Removing either the facet_grid or the geom_hline gives me the result I would then expect. I have tried forcing the rhythm to be a factor Anyone have any ideas? Sandy Dennis Murphy wrote: Hi: The attached plot comes from the following code: g - ggplot(data =f, aes(x = (variable_time + fixed_time)/2, y variable_time - fixed_time)) g + geom_point() + geom_hline(yintercept =) + facet_grid(ecd_rhythm ~ .) Is this what you were expecting? sessionInfo() R version 2.12.1 Patched (2010-12-18 r53869) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=glish_United States.1252 [2] LC_CTYPE=glish_United States.1252 [3] LC_MONETARY=glish_United States.1252 [4] LC_NUMERIC=nbsp; [5] LC_TIME=glish_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets grid [8] methods base other attached packages: [1] data.table_1.5.1 doBy_4.2.2 R2HTML_2.2 contrast_0.13 [5] Design_2.3-0 Hmisc_3.8-3 survival_2.36-2 sos_1.3-0 [9] brew_1.0-4 lattice_0.19-17 ggplot2_0.8.9 proto_0.3-8 [13] reshape_0.8.3 plyr_1.4 loaded via a namespace (and not attached): [1] cluster_1.13.2 digest_0.4.2 Matrix_0.999375-46 reshape2_1.1 [5] stringr_0.4 tools_2.12.1 HTH, Dennis On Tue, Jan 18, 2011 at 1:46 AM, Small Sandy (NHS Greater Glasgow Clyde) [9]sandy.sm...@nhs.net [10]ailto:sandy.sm...@nhs.net%22 wrote: Hi I have a long data set on which I want to do Bland-Altman style plots for each rhythm type Using ggplot2, when I use geom_hline with facet_grid I get an extra set of empty panels. 
I can't get it to do it with the Diamonds data supplied with the package so here is a (much abbreviated) example: lvexs cvd_basestudy ecd_rhythm fixed_time variable_time 1 CBP05J02 AF 30.9000 29.4225 2 CBP05J02 AF 33.1700 32.0350 3 CBP05J02 AF 32.5700 30.2775 4 CBP05J02
Re: [R] ggplot2, geom_hline and facet_grid
Hi Sandy, It's difficult to know what's going wrong without a small reproducible example (https://github.com/hadley/devtools/wiki/Reproducibility) - could you please provide one? You might also have better luck with an email directly to the ggplot2 mailing list. Hadley On Wed, Jan 19, 2011 at 2:57 AM, Sandy Small sandy.sm...@nhs.net wrote: Having upgraded to R version 2.12.1 I still have the same problem: The combination of facet_grid and geom_hline produce (for me) 4 panels of which two are empty of any data or lines (labelled 1 and 2). Removing either the facet_grid or the geom_hline gives me the result I would then expect. I have tried forcing the rhythm to be a factor Anyone have any ideas? Sandy Dennis Murphy wrote: Hi: The attached plot comes from the following code: g - ggplot(data =f, aes(x = (variable_time + fixed_time)/2, y variable_time - fixed_time)) g + geom_point() + geom_hline(yintercept =) + facet_grid(ecd_rhythm ~ .) Is this what you were expecting? sessionInfo() R version 2.12.1 Patched (2010-12-18 r53869) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=glish_United States.1252 [2] LC_CTYPE=glish_United States.1252 [3] LC_MONETARY=glish_United States.1252 [4] LC_NUMERIC=nbsp; [5] LC_TIME=glish_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets grid [8] methods base other attached packages: [1] data.table_1.5.1 doBy_4.2.2 R2HTML_2.2 contrast_0.13 [5] Design_2.3-0 Hmisc_3.8-3 survival_2.36-2 sos_1.3-0 [9] brew_1.0-4 lattice_0.19-17 ggplot2_0.8.9 proto_0.3-8 [13] reshape_0.8.3 plyr_1.4 loaded via a namespace (and not attached): [1] cluster_1.13.2 digest_0.4.2 Matrix_0.999375-46 reshape2_1.1 [5] stringr_0.4 tools_2.12.1 HTH, Dennis On Tue, Jan 18, 2011 at 1:46 AM, Small Sandy (NHS Greater Glasgow Clyde) sandy.sm...@nhs.net ailto:sandy.sm...@nhs.net%22 wrote: Hi I have a long data set on which I want to do Bland-Altman style plots for each rhythm type Using ggplot2, when I use geom_hline with 
facet_grid I get an extra set of empty panels. I can't get it to do it with the Diamonds data supplied with the package so here is a (much abbreviated) example: lvexs cvd_basestudy ecd_rhythm fixed_time variable_time 1 CBP05J02 AF 30.9000 29.4225 2 CBP05J02 AF 33.1700 32.0350 3 CBP05J02 AF 32.5700 30.2775 4 CBP05J02 AF 32.0550 33.7275 5 CBP05J02 SINUS 30.9175 28.3475 6 CBP05J02 SINUS 30.5725 29.7450 7 CBP05J02 SINUS 33. 31.1550 9 CBP05J02 SINUS 31.8350 30.7000 10 CBP05J02 SINUS 34.0450 33.4800 11 CBP05J02 SINUS 31.3975 29.8150 qplot((variable_time + fixed_time)/2, variable_time - fixed_time, data=exs) + facet_grid(ecd_rhythm ~ .) + geom_hline(yintercept=0) If I take out the geom_hline I get the plots I would expect. It doesn't seem to make any difference if I get the mean and difference separately. Can anyone explain this and tell me how to avoid it (and why does it work with the Diamonds data set? Any help much appreciated - thanks. Sandy Sandy Small Clinical Physicist NHS Forth Valley and NHS Greater Glasgow and Clyde This message may contain confidential information. If yo...{{dropped:21}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dataframe: string operations on columns
how can I perform a string operation like strsplit(x, ) on a column of a dataframe, and put the first or the second item of the split into a new dataframe column? (so that on each row it is consistent) Have a look at str_split_fixed in the stringr package. Hadley
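A minimal sketch of str_split_fixed() with invented data and column names: it splits each string into a fixed number of pieces and returns a character matrix, whose columns can be assigned straight back onto the data frame.

```r
library(stringr)

d <- data.frame(id = c("a.1", "b.2"), stringsAsFactors = FALSE)
# fixed(".") treats the dot literally rather than as a regex metacharacter;
# the 2 asks for exactly two pieces per string.
parts <- str_split_fixed(d$id, fixed("."), 2)
d$letter <- parts[, 1]   # "a", "b"
d$number <- parts[, 2]   # "1", "2"
```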
Re: [R] how to cut a multidimensional array along a chosen dimension and store each piece into a list
On Mon, Jan 17, 2011 at 2:20 PM, Sean Zhang seane...@gmail.com wrote: Dear R-Helpers, I wonder whether there is a function which cuts a multi-dimensional array along a chosen dimension and then stores each piece (still an array of one dimension less) into a list. For example, arr <- array(seq(1*2*3*4), dim = c(1,2,3,4)) # I made a point to set the length of the first dimension to be 1 to test whether I need to worry about the drop=F option. brkArrIntoListAlong <- function(arr, alongWhichDim){ return(outlist) } I have tried splitter_a in the plyr package but it does not give what I want. library(plyr) plyr:::splitter_a(arr, 3) Well, you're really not supposed to call internal functions - you probably want: alply(arr, 3) but you don't say what is wrong with the output. Hadley
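A sketch of the alply() suggestion on the poster's own array: alply() splits along the given margin and returns a list, without requiring the internal splitter_a(). The exact dimensions of each piece (in particular whether the length-1 first dimension survives) are left to inspection rather than asserted here.

```r
library(plyr)

arr <- array(seq_len(1 * 2 * 3 * 4), dim = c(1, 2, 3, 4))
# Split along the third dimension: one list element per slice arr[, , i, ].
pieces <- alply(arr, 3)
length(pieces)   # 3, one piece per index of dimension 3
str(pieces[[1]]) # inspect the shape of a single piece
```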
Re: [R] Summing data frame columns on identical data
library(plyr) # Function to sum y by A-B combinations for a generic data frame dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y)) See count in plyr 1.4 for a much, much faster way of doing this. Hadley
Re: [R] rootogram for normal distributions
The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions. I don't think that's true - rootograms are useful for both continuous and discrete distributions. See (e.g.) p 314 at http://www.edwardtufte.com/tufte/tukey, where Tukey himself uses a rootogram with a normal distribution. Hadley
Re: [R] data prep question
On Sun, Jan 16, 2011 at 5:48 AM, bill.venab...@csiro.au wrote: Here is one way: con <- textConnection( + ID TIME OBS + 001 2200 23 + 001 2400 11 + 001 3200 10 + 001 4500 22 + 003 3900 45 + 003 5605 32 + 005 1800 56 + 005 1900 34 + 005 2300 23) dat <- read.table(con, header = TRUE, + colClasses = c("factor", "numeric", "numeric")) closeAllConnections() tmp <- lapply(split(dat, dat$ID), + function(x) within(x, TIME <- TIME - min(TIME))) split(dat, dat$ID) <- tmp Or, in one line with ddply: library(plyr) ddply(dat, "ID", transform, TIME = TIME - min(TIME)) Hadley
Re: [R] median by geometric mean
exp(median(log(x))) ? Hadley On Sat, Jan 15, 2011 at 10:26 AM, Skull Crossbones witch.of.agne...@gmail.com wrote: Hi All, I need to calculate the median for an even number of data points. However, instead of calculating the arithmetic mean of the two middle values, I need to calculate their geometric mean. Though I can code this in R, possibly in a few lines, I am wondering if there is already some built-in function. Can somebody give a hint? Thanks in advance
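Wrapping the one-liner as a function (the name is mine, and it assumes strictly positive x): for an even-length vector, median() averages the two middle log values, and exponentiating that average yields their geometric mean.

```r
# Median where the two middle values are combined by geometric mean.
# Requires all(x > 0), since log() of non-positive values is undefined.
geo_median <- function(x) exp(median(log(x)))

# logs are 0, log(4), log(9), log(16); the middle two average to log(6).
geo_median(c(1, 4, 9, 16))  # sqrt(4 * 9) = 6
```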
Re: [R] Help with Data Transformation
The data is initially extracted from an SQL database into Excel, then saved as a tab-delimited text file for use in R. You might also want to look at the SQL packages for R so you can skip this manual step. I'd recommend starting with http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases Hadley
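A minimal sketch of querying a database directly from R using the DBI interface with the RSQLite backend (any DBI backend works the same way; the manual linked above covers the alternatives). The table and column names here are invented for illustration.

```r
library(DBI)

# An in-memory SQLite database stands in for the real SQL server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "measurements", data.frame(site = "A", value = 1.5))

# Pull the query result straight into a data frame, skipping Excel entirely.
dat <- dbGetQuery(con, "SELECT site, value FROM measurements")

dbDisconnect(con)
```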
[R] [R-pkgs] plyr 1.4
# plyr plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to: * fit the same model to each patient subset of a data frame * quickly calculate summary statistics for each group * perform group-wise transformations like scaling or standardising It's already possible to do this with base R functions (like split and the apply family of functions), but plyr makes it all a bit easier with: * totally consistent names, arguments and outputs * convenient parallelisation through the foreach package * input from and output to data.frames, matrices and lists * progress bars to keep track of long-running operations * built-in error recovery, and informative error messages * labels that are maintained across all transformations Considerable effort has been put into making plyr fast and memory efficient, and in many cases plyr is as fast as, or faster than, the built-in functions. You can find out more at http://had.co.nz/plyr/, including a 20-page introductory guide, http://had.co.nz/plyr/plyr-intro.pdf. You can ask questions about plyr (and data-manipulation in general) on the plyr mailing list. Sign up at http://groups.google.com/group/manipulatr Version 1.4 (2011-01-03) -- * `count` now takes an additional parameter `wt_var` which allows you to compute weighted sums. This is as fast, or faster than, `tapply` or `xtabs`. * Really fix bug in `names.quoted` * `.` now captures the environment in which it was evaluated. This should fix an esoteric class of bugs which no-one probably ever encountered, but will form the basis for an improved version of `ggplot2::aes`.
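A sketch of the new `wt_var` argument from the 1.4 changelog above, with toy data: instead of counting rows per group, count() sums the named weight column within each group.

```r
library(plyr)

df <- data.frame(g = c("a", "a", "b"), w = c(2, 3, 4))
# With wt_var, freq holds the group-wise sum of w rather than a row count.
count(df, "g", wt_var = "w")
#   g freq
# 1 a    5
# 2 b    4
```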
Version 1.3.1 (2010-12-30) -- * Fix bug in `names.quoted` that interfered with ggplot2 Version 1.3 (2010-12-28) -- NEW FEATURES * new function `mutate` that works like transform to add new columns or overwrite existing columns, but computes new columns iteratively so later transformations can use columns created by earlier transformations. (It's also about 10x faster) (Fixes #21) BUG FIXES * split column names are no longer coerced to valid R names. * `quickdf` now adds names if missing * `summarise` preserves variable names if explicit names not provided (Fixes #17) * `arrays` with names should be sorted correctly once again (also fixed a bug in the test case that prevented me from catching this automatically) * `m_ply` no longer possesses .parallel argument (mistakenly added) * `ldply` (and hence `adply` and `ddply`) now correctly passes on .parallel argument (Fixes #16) * `id` uses a better strategy for converting to integers, making it possible to use for cases with larger potential numbers of combinations ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
[R] [R-pkgs] reshape2 1.1
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much more focussed and much much faster. This version improves speed at the cost of functionality, so I have renamed it to `reshape2` to avoid causing problems for existing users. Based on user feedback I may reintroduce some of these features. What's new in `reshape2`: * considerably faster and more memory efficient thanks to a much better underlying algorithm that uses the power and speed of subsetting to the fullest extent, in most cases only making a single copy of the data. * cast is replaced by two functions depending on the output type: `dcast` produces data frames, and `acast` produces matrices/arrays. * multidimensional margins are now possible: `grand_row` and `grand_col` have been dropped: now the name of the margin refers to the variable that has its value set to (all). * some features have been removed such as the `|` cast operator, and the ability to return multiple values from an aggregation function. I'm reasonably sure both these operations are better performed by plyr. * a new cast syntax which allows you to reshape based on functions of variables (based on the same underlying syntax as plyr): * better development practices like namespaces and tests. Initial benchmarking has shown `melt` to be up to 10x faster, pure reshaping `cast` up to 100x faster, and aggregating `cast()` up to 10x faster. This work has been generously supported by BD (Becton Dickinson). 
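The melt/cast workflow described above can be sketched as a round trip on a toy data frame: melt() to long form, then the new dcast() back to a wide data frame.

```r
library(reshape2)

wide <- data.frame(id = c(1, 2), a = c(10, 20), b = c(30, 40))

# Long form: one row per (id, variable) pair, with columns id, variable, value.
long <- melt(wide, id.vars = "id")

# dcast returns a data frame (acast would return a matrix/array).
dcast(long, id ~ variable)   # back to the original wide layout
```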
Version 1.1 --- * `melt.data.frame` no longer turns characters into factors * All melt methods gain `na.rm` and `value.name` arguments - these were previously only available in `melt.data.frame` (Fixes #5)
Re: [R] packagename:::functionname vs. importFrom
Hi Frank, I think you mean packagename::functionname? The three colon form is for accessing non-exported objects. Otherwise, I think using :: vs importFrom is functionally identical - either approach delays package loading until necessary. Hadley On Mon, Jan 3, 2011 at 9:45 PM, Frank Harrell f.harr...@vanderbilt.edu wrote: In my rms package I use the packagename:::functionname construct in a number of places. If I instead use the importFrom declaration in the NAMESPACE file would that require the package to be available, and does it load the package when my package loads? If so I would keep using packagename::: to avoid up-front loading of other packages that are not always used. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] packagename:::functionname vs. importFrom
I think you mean packagename::functionname? The three colon form is for accessing non-exported objects. Normally two colons suffice, but within a package you need three to access exported but un-imported objects :) Are you sure? Note that it is typically a design mistake to use ':::' in your code since the corresponding object has probably been kept internal for a good reason. Consider contacting the package maintainer if you feel the need to access the object for anything but mere inspection. Hadley
Re: [R] packagename:::functionname vs. importFrom
Correct. I'm doing this because of non-exported functions in other packages, so I need ::: But you really really shouldn't be doing that. Is there a reason that the package authors won't export the functions? I'd still appreciate any insight about whether importFrom in NAMESPACE defers package loading so that if the package is not actually used (and is not installed) there will be no problem. Imported packages need to be installed - but it's the Imports vs. Suggests vs. Depends field in DESCRIPTION that controls this behaviour, not the namespace. Hadley
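A sketch of the two files involved in the distinction discussed above (package and function names are placeholders): DESCRIPTION decides whether the other package must be installed, while NAMESPACE decides how you refer to its exported objects.

```
# In DESCRIPTION -- the package must be installed for yours to load:
Imports: stringr

# In NAMESPACE -- makes str_trim available without the stringr:: prefix:
importFrom(stringr, str_trim)
```

Neither mechanism reaches non-exported objects; only ::: does that, with the caveats quoted from the R documentation above.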
Re: [R] Writing a single output file
It looks like you have csv files, so use read.csv instead of read.table. Hadley On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano milano_...@yahoo.com wrote: Dear sir, At the outset I sincerely apologize for reverting back bit late as I was out of office. I thank you for your guidance extended by you in response to my earlier mail regarding Writing a single output file where I was trying to read multiple output files and create a single output date.frame. However, I think things are not working as I am mentioning below - # Your code setwd('/temp') fileNames - list.files(pattern = file.*.csv) input - do.call(rbind, lapply(fileNames, function(.name) { .data - read.table(.name, header = TRUE, as.is = TRUE) .data$file - .name .data })) # This produces following output containing only two columns and moreover date and yield_rates are clubbed together. date.yield_rate file 1 12/23/10,5.25 file1.csv 2 12/22/10,5.19 file1.csv 3 12/23/10,4.16 file2.csv 4 12/22/10,4.59 file2.csv 5 12/23/10,6.15 file3.csv 6 12/22/10,6.41 file3.csv 7 12/23/10,8.15 file4.csv 8 12/22/10,8.68 file4.csv # and NOT the kind of output given below where date and yield_rates are different. input date yield_rate file 1 12/23/2010 5.25 file1.csv 2 12/22/2010 5.19 file1.csv 3 12/23/2010 5.25 file2.csv 4 12/22/2010 5.19 file2.csv 5 12/23/2010 5.25 file3.csv 6 12/22/2010 5.19 file3.csv 7 12/23/2010 5.25 file4.csv 8 12/22/2010 5.19 file4.csv So when I tried following code to produce the required result, it throws me an error. require(reshape) in.melt - melt(input, measure = 'yield_rate') in.melt - melt(input, measure = 'yield_rate') Error: measure variables not found in data: yield_rate # So I tried in.melt - melt(input, measure = 'date.yield_rate') cast(in.melt, date.yield_rate ~ file) cast(in.melt, date ~ file) Error: Casting formula contains variables not found in molten data: date # If I try to change it as cast(in.melt, date.yield_rate ~ file) # Gives following error. 
Error: Casting formula contains variables not found in molten data: date.yield_rate Sir, it will be a great help if you can guide me and once again sinserely apologize for reverting so late. Regards Amy --- On Thu, 12/23/10, jim holtman jholt...@gmail.com wrote: From: jim holtman jholt...@gmail.com Subject: Re: [R] Writing a single output file To: Amy Milano milano_...@yahoo.com Cc: r-help@r-project.org Date: Thursday, December 23, 2010, 1:39 PM This should get you close: # get file names setwd('/temp') fileNames - list.files(pattern = file.*.csv) fileNames [1] file1.csv file2.csv file3.csv file4.csv input - do.call(rbind, lapply(fileNames, function(.name){ + .data - read.table(.name, header = TRUE, as.is = TRUE) + # add file name to the data + .data$file - .name + .data + })) input date yield_rate file 1 12/23/2010 5.25 file1.csv 2 12/22/2010 5.19 file1.csv 3 12/23/2010 5.25 file2.csv 4 12/22/2010 5.19 file2.csv 5 12/23/2010 5.25 file3.csv 6 12/22/2010 5.19 file3.csv 7 12/23/2010 5.25 file4.csv 8 12/22/2010 5.19 file4.csv require(reshape) in.melt - melt(input, measure = 'yield_rate') cast(in.melt, date ~ file) date file1.csv file2.csv file3.csv file4.csv 1 12/22/2010 5.19 5.19 5.19 5.19 2 12/23/2010 5.25 5.25 5.25 5.25 On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano milano_...@yahoo.com wrote: Dear R helpers! Let me first wish all of you Merry Christmas and Very Happy New year 2011 Christmas day is a day of Joy and Charity, May God make you rich in both - Phillips Brooks ## I have a process which generates number of outputs. The R code for the same is as given below. for(i in 1:n) { write.csv(output[i], file = paste(output, i, .csv, sep = ), row.names = FALSE) } Depending on value of 'n', I get different output files. Suppose n = 3, that means I am having three output csv files viz. 'output1.csv', 'output2.csv' and 'output3.csv' output1.csv date yield_rate 12/23/2010 5.25 12/22/2010 5.19 . . output2.csv date yield_rate 12/23/2010 4.16 12/22/2010 4.59 . . 
output3.csv date yield_rate 12/23/2010 6.15 12/22/2010 6.41 . . Thus all the output files have same column names viz. Date and yield_rate. Also, I do need these files individually too. My further requirement is to have
Re: [R] pdf() Export Problem: Circles Interpreted as Fonts from ggplot2 Graphics
The Inkscape user asked if there was any way that R could be coerced to use actual circles or paths for the points. I am not aware of a way to do this, so any input from anyone here would be greatly appreciated. pdf(..., useDingbats = F) Hadley
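Spelled out, a minimal sketch of the suggestion (file name invented): with useDingbats disabled, pdf() draws points as circle paths rather than Dingbats font glyphs, which vector editors like Inkscape can then manipulate as real shapes.

```r
# useDingbats = FALSE forces points to be drawn as paths in the PDF.
pdf("points.pdf", useDingbats = FALSE)
plot(1:10, pch = 19)
dev.off()
```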
[R] [R-pkgs] ggplot2 0.8.9 - Merry Christmas version
ggplot2

ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends), as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.

To install or update, run:

install.packages(c("ggplot2", "plyr"))

Find out more at http://had.co.nz/ggplot2, and check out the nearly 500 examples of ggplot in use. If you're interested, you can also sign up to the ggplot2 mailing list at http://groups.google.com/group/ggplot2, or track development at http://github.com/hadley/ggplot2

ggplot2 0.8.9 (2010-12-24)

A big thanks to Kohske Takahashi, who supplied the majority of improvements in this release!

GUIDE IMPROVEMENTS

* key size: can specify width and height separately
* axis: can partially handle text rotation (issue #149)
* legend: can now specify the direction of elements with opts(legend.direction = "vertical") or opts(legend.direction = "horizontal"), and the legend box is centre aligned if horizontal
* legend: can now override the alignment of the legend box with opts(legend.box = "vertical") or opts(legend.box = "horizontal")
* legend: can now override legend title alignment with opts(legend.title.align = 0) or opts(legend.title.align = 1)
* legend: can override legend text alignment with opts(legend.text.align = 0) or opts(legend.text.align = 1)

BUG FIXES

* theme_*: can specify font family for all text elements other than geom_text
* facet_grid: fixed horizontal spacing when the horizontal strip has 2 rows
* facet_grid: can now manually specify the relative size of each row and column
* is.zero: now works correctly
* +: adding NULL to a plot returns the plot (idempotent under addition) (thanks to suggestion by Matthew O'Meara)
* +: meaningful error message if + doesn't know how to deal with an object type
* coord_cartesian and coord_flip: can now wisely zoom when wise = TRUE
* coord_polar: fixed point division bugs
* facet_grid: labels are now correctly aligned when the number of factors is more than one (fixes #87 and #65)
* geom_hex: now correctly applies alpha to fill colour, not outline colour (thanks to bug report from Ian Fellows)
* geom_polygon: specifying linetype now works (thanks to fix from Kohske Takahashi)
* hcl: can now set c and l, and preserves names (thanks to suggestion by Richard Cotton)
* mean_se: a new summary function to work with stat_summary that calculates the mean and one standard error on either side (thanks to contribution from Kohske Takahashi)
* pos_stack: now works with NAs in x
* scale_alpha: setting limits to a range inside the data now works (thanks to report by Dr Proteome)
* scale_colour_continuous: works correctly with a single continuous value (fixes #73)
* scale_identity: now shows legends (fixes #119)
* stat_function: now works without y values
* stat_smooth: draws a line if there are only 2 unique x values, not three as previously
* guides: fixed #126
* stat_smooth: once again works if n > 1000 and se = FALSE (thanks to bug report from Thierry Onkelinx and fix from Kohske Takahashi)
* stat_smooth: works with locfit (fixes #129)
* theme_text: handles alignment better when angle = 90

-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Writing a single output file
input <- do.call(rbind, lapply(fileNames, function(.name){
    .data <- read.table(.name, header = TRUE, as.is = TRUE)
    # add file name to the data
    .data$file <- .name
    .data
}))

You can simplify this a little with plyr:

fileNames <- list.files(pattern = "file.*.csv")
names(fileNames) <- fileNames
input <- ldply(fileNames, read.table, header = TRUE, as.is = TRUE)

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Coding a new variable based on criteria in a dataset
It isn't quite convenient to read the data posted below into R (if it was originally tab-separated, that formatting got lost), but ddply from the plyr package is good for this: something like (untested)

d <- with(data, ddply(data, interaction(UniqueID, Reason),
  function(x) {
    ## make sure x is sorted by date/time here
    x$F_R <- c("F", rep("R", nrow(x) - 1))
    x
  }))

Or a little more succinctly:

d <- ddply(data, c("UniqueID", "Reason"), transform,
  F_R = c("F", rep("R", length(UniqueID) - 1)))

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
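If plyr isn't available, the same first/repeat flagging can be sketched in base R with interaction() and ave(); the toy data below is invented for illustration, with column names taken from the thread:

```r
# Toy data (hypothetical): flag the first row of each UniqueID/Reason
# group as "F" and the remaining rows as "R".
d <- data.frame(
  UniqueID = c(1, 1, 1, 2, 2),
  Reason   = c("a", "a", "b", "a", "a")
)
d <- d[order(d$UniqueID, d$Reason), ]            # so "first" is well defined
grp <- interaction(d$UniqueID, d$Reason, drop = TRUE)
# ave() applies FUN within each group and puts results back in row order
d$F_R <- ave(seq_len(nrow(d)), grp,
             FUN = function(i) c("F", rep("R", length(i) - 1)))
d$F_R  # "F" for the first row of each group, "R" otherwise
```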
Re: [R] How to change the default location of x-axis in ggplot2?
In ggplot2, by default the x-axis is at the bottom of the graph and the y-axis is at the left. I wonder if it is possible to: 1. put the x axis at the top, or put the y axis at the right? 2. display the x axis at both the top and bottom?

These are on the to do list.

3. display the x axis on both sides, each with its own scale?

ggplot2 will never support this because I think it's a really, really bad idea.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] ggplot2 histograms
Hi Sandy,

The way I'd describe it is that you expected the width parameter of the position adjustment to be relative to the binwidth of the histogram - but it's actually absolute, and it has to be this way because there's currently no way for the position adjustment to know about the parameters of the geom.

Hadley

On Wed, Dec 1, 2010 at 10:07 AM, Small Sandy (NHS Greater Glasgow Clyde) sandy.sm...@nhs.net wrote:
Sorry, this should have been sent to the whole list:

Hadley, I think I've sorted it out in my head, but for the record, and just to be sure... I guess what I was expecting was that the width parameter would be independent of binwidth. Thus a width parameter of 0.5 would always indicate an overlap of half the bar. In fact the width is determined as a fraction of the binwidth, so if width is greater than binwidth the overlap will be with adjacent bins, not the bin it actually corresponds to. So in my example you can completely separate the data by putting

ggplot(data = dafr, aes(x = d1, fill = d2)) +
  geom_histogram(binwidth = 2, position = position_dodge(width = 7))

Obviously this isn't helpful. I think the rules are:
1. the width of each bar equals binwidth divided by the number of fill factors (in my case two)
2. the total width of the visible bars is centred on the centre of the bin
3. the overlap of the visible bars is governed by the width parameter of position_dodge, with 0 being complete overlap and binwidth being complete (but touching) separation. (More than binwidth would then mean space between the bars - and presumably overlap with adjacent bars - I don't think this would ever be useful.)

Hope this makes sense.

Sandy Small
Clinical Physicist
NHS Forth Valley (Tel: 01324567002) and NHS Greater Glasgow and Clyde (Tel: 01412114592)

From: h.wick...@gmail.com [h.wick...@gmail.com] On Behalf Of Hadley Wickham [had...@rice.edu]
Sent: 01 December 2010 14:27
To: Small Sandy (NHS Greater Glasgow Clyde)
Cc: ONKELINX, Thierry; r-help@r-project.org
Subject: Re: [R] ggplot2 histograms

However if you do:

ggplot(data = dafr, aes(x = d1, fill = d2)) +
  geom_histogram(binwidth = 1, position = position_dodge(width = 0.99))

The position of the first bin, which goes from 0-2, appears to start at about 0.2 (I accept that there is some white space to the left of this), while the position of the last bin (16-18) appears to start at about 15.8, so the whole histogram seems to be wrongly compressed into the scale. In my real data, which has potentially 250 bins, the problem becomes much more pronounced. Has anyone else noticed this? Is there a workaround?

What do you expect this to do? The bars are one unit wide, but you've told position_dodge to treat them like they're only 0.99 units wide.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava research.b...@gmail.com wrote:
Dear R-Helpers: I am trying to use *ddply* to extract the min and max of a particular column in a data.frame. I am using two different forms of the function:

## var_name_to_split is a string -- something like "var1", the name of a column in the data.frame
ddply(df, .(as.name(var_name_to_split)),
      function(x) c(min(x[, 3]), max(x[, 3])))  ## fails with an error - case 1
ddply(df, var_name_to_split,
      function(x) c(min(x[, 3]), max(x[, 3])))  ## works fine - case 2

I can't understand why I get the error in case 1. Can someone help me please?

Why do you expect case 1 to work?

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
It's easiest to see what's going on if you use eval.quoted directly:

eval.quoted(.(cyl), mtcars)
eval.quoted(.("cyl"), mtcars)
eval.quoted(.(as.name("cyl")), mtcars)

But you shouldn't need to do any syntactic hackery because the default method automatically parses the string for you:

eval.quoted(as.quoted("cyl"), mtcars)

Hadley

On Mon, Dec 6, 2010 at 6:22 PM, Sunny Srivastava research.b...@gmail.com wrote:
Hi Hadley: I was trying to use ddply with the format .(var1) for splitting. I thought .(as.name("grp")) would do the same thing. But it does not. I was just trying to understand my mistake. I am sorry if it is a basic question. Thank you and others for your reply. Best Regards, S.

On Mon, Dec 6, 2010 at 5:28 PM, Hadley Wickham had...@rice.edu wrote:
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava research.b...@gmail.com wrote:
Dear R-Helpers: I am trying to use *ddply* to extract the min and max of a particular column in a data.frame. I am using two different forms of the function:

## var_name_to_split is a string -- something like "var1", the name of a column in the data.frame
ddply(df, .(as.name(var_name_to_split)),
      function(x) c(min(x[, 3]), max(x[, 3])))  ## fails with an error - case 1
ddply(df, var_name_to_split,
      function(x) c(min(x[, 3]), max(x[, 3])))  ## works fine - case 2

I can't understand why I get the error in case 1. Can someone help me please?

Why do you expect case 1 to work?

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] ggplot2 histograms
However if you do:

ggplot(data = dafr, aes(x = d1, fill = d2)) +
  geom_histogram(binwidth = 1, position = position_dodge(width = 0.99))

The position of the first bin, which goes from 0-2, appears to start at about 0.2 (I accept that there is some white space to the left of this), while the position of the last bin (16-18) appears to start at about 15.8, so the whole histogram seems to be wrongly compressed into the scale. In my real data, which has potentially 250 bins, the problem becomes much more pronounced. Has anyone else noticed this? Is there a workaround?

What do you expect this to do? The bars are one unit wide, but you've told position_dodge to treat them like they're only 0.99 units wide.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] ggplot2 histograms
You may find it easier to use a frequency polygon, geom = "freqpoly".

Hadley

On Tue, Nov 30, 2010 at 2:36 PM, Small Sandy (NHS Greater Glasgow Clyde) sandy.sm...@nhs.net wrote:
Hi

With ggplot2 I can very easily create beautiful histograms, but I would like to put two histograms on the same plot. The histograms may be overlapping. When they are overlapping, the bars are shown on top of each other (so that the overall height is the sum of the two). Is there any way to get them to display overlapping (with the smaller value in front, the larger value behind) so that the overall height is equal to the height of the largest value?

The following demonstrates the problem (there is probably a simple way to generate the sequence in d1 but I don't know it and just threw this together quickly):

d1 <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7,7,8,8,9,6,7,7,8,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,12,12,12,13,13,14,15,15,16,16,16,17,17,17,17,18,18,18,18,18)
d2 <- c(rep("a", 25), rep("b", 39))
dafr <- data.frame(d1, d2)
library(ggplot2)
qplot(d1, data = dafr, fill = d2, geom = 'histogram', binwidth = 1)

Many thanks for any help

Sandy Small
Clinical Physicist
NHS Forth Valley and NHS Greater Glasgow and Clyde

-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Help on running regression by grouping firms
res <- function(x) resid(x)
ds_test$u <- do.call(c, llply(mods, res))

I'd be a little careful with this, because there's no guarantee the results will be ordered in the same way as the input (and I'd also prefer ds_test$u <- unlist(llply(mods, res)) or ds_test$u <- laply(mods, res)). In your case, where you have multiple grouping factors, you may have to be a little more careful, but the strategy is the same. You could possibly reduce it to a one-liner (untested):

ds_test$u <- do.call(c, dlply(ds_test, .(individual),
  function(x) resid(lm(size ~ time, data = x))))

Or:

ds_test <- ddply(ds_test, .(individual), transform,
  u = resid(lm(size ~ time)))

which will guarantee the correct ordering.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
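A base-R sketch of the same order-safe idea, using unsplit() to put per-group residuals back in the original row order; the data frame and column names below are invented to match the thread:

```r
# Hypothetical data shaped like the thread's ds_test.
ds <- data.frame(
  individual = rep(c("A", "B"), each = 4),
  time       = rep(1:4, 2),
  size       = c(1, 2.1, 2.9, 4.2, 2, 3.9, 6.1, 8)
)
# Fit lm() within each group; unsplit() restores the original ordering.
ds$u <- unsplit(
  lapply(split(ds, ds$individual),
         function(x) resid(lm(size ~ time, data = x))),
  ds$individual
)
# With an intercept, residuals sum to ~0 within each group.
tapply(ds$u, ds$individual, sum)
```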
Re: [R] Go (back) from Rd to roxygen
Since roxygen is a great help in documenting R packages, I am wondering if there exists an approach to go back from the raw Rd files to roxygen documentation, e.g. turn \author{Somebody} into @author Somebody. This sounds ridiculous, but I believe it will help me maintain R packages in the long term.

Have a look at https://gist.github.com/d1bbd44894a99a2e1d1b for a start.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] sum in vector
rowsum(value, paste(factor1, factor2, factor3))

That is dangerous in general, and always inefficient. Imagine factor1 is c("a", "a b") and factor2 is c("b c", "c"). Use interaction() with drop = TRUE.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
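To make the collision concrete - a small sketch, not from the original thread:

```r
# Two different factor combinations paste() to the same key...
f1 <- c("a", "a b")
f2 <- c("b c", "c")
paste(f1, f2)               # both elements collapse to "a b c"
# ...but interaction() keeps them distinct; drop = TRUE drops the
# unused level combinations so only observed groups remain.
g <- interaction(f1, f2, drop = TRUE)
nlevels(g)                  # 2 distinct groups
rowsum(c(10, 20), g)        # grouped sums, no ambiguity
```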
Re: [R] Extending the accuracy of exp(1) in R
Where the value of exp(1) as computed by R is concerned, you have been deceived by what R displays (prints) on screen. The default is to display any number to 7 digits of accuracy, but that is not the accuracy of the number held internally by R:

exp(1)
# [1] 2.718282
exp(1) - 2.718282
# [1] -1.715410e-07

I encourage anyone confused about this issue to study http://en.wikipedia.org/wiki/The_Treachery_of_Images and to watch http://www.youtube.com/watch?v=ejweI0EQpX8

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
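A quick way to see the distinction between the printed representation and the stored value:

```r
# A double holds ~15-16 significant digits; print() merely rounds the
# display to 7 significant digits by default.
print(exp(1))               # displays 2.718282
print(exp(1), digits = 15)  # shows many more digits of the same value
stored    <- exp(1)
displayed <- 2.718282
identical(stored, displayed)  # FALSE: the printed value != stored value
stored - displayed            # about -1.7e-07
```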
[R] How to detect if a vector is FP constant?
Hi all,

What's the equivalent of length(unique(x)) == 1 if I want to ignore small floating point differences? Should I look at diff(range(x)) or sd(x) or something else? What cut-off should I use?

If it helps to be explicit, I'm interested in detecting when a vector is constant for the purpose of visual display. In other words, if I rescale x to [0, 1], do I have enough precision to get at least 100 unique values?

Thanks!

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] How to detect if a vector is FP constant?
I think this does what you want (borrowing from all.equal.numeric):

all(abs(x - mean(x)) < .Machine$double.eps^0.5)

With a vector of length 1 million, it took 0.076 seconds on a fairly old system.

Hmmm, maybe I want: all.equal(min(x), max(x)) ?

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
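One possible helper along these lines - a sketch only; the relative-tolerance choice is an assumption, not a recommendation from the thread:

```r
# Treat x as "constant" when its total spread is below a tolerance
# scaled by the magnitude of the values (with a floor of 1).
is_fp_constant <- function(x, tol = .Machine$double.eps^0.5) {
  rng <- range(x, na.rm = TRUE)
  diff(rng) < tol * max(1, abs(rng))
}
is_fp_constant(rep(1, 5))        # TRUE
is_fp_constant(c(1, 1 + 1e-12))  # TRUE  (difference below tolerance)
is_fp_constant(c(1, 2))          # FALSE
```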
Re: [R] Heatmap construction problems
It's hard to know without a minimal reproducible example, but you probably want scale_fill_gradient or scale_fill_gradientn.

Hadley

On Thu, Oct 28, 2010 at 9:42 AM, Struchtemeyer, Chris stru...@okstate.edu wrote:
I am very new to R and don't have any computer program experience whatsoever. I am trying to generate a heatmap of the following data:

Phylum,AI,AJT,BY,GA,Grt,Sm
Acidobacteria,0.5,0.7,2.7,0.1,2.6,1.0
Actinobacteria,33.7,65.1,9.7,2.0,3.9,2.1
Bacteroidetes,9.7,5.6,0.7,13.2,41.1,21.6
CCM11b,0.0,0.0,0.0,0.0,0.0,0.1
Chlamydiae,0.1,0.1,0.0,0.0,1.0,0.2
Chlorobi,0.0,0.0,0.0,0.0,0.7,1.0
Chloroflexi,0.1,0.2,0.6,0.2,0.8,0.6
Cyanobacteria,18.7,0.0,1.0,1.5,0.9,0.3
Ellusimicrobiales,0.0,0.0,0.0,0.0,0.0,0.0
Firmicutes,1.0,7.6,8.3,31.9,2.1,6.9
Gemmatimonadetes,5.0,0.3,0.3,0.0,0.1,0.0
GN02,0.0,0.0,0.0,0.0,0.0,0.5
Nitrospirae,0.0,0.2,1.1,0.0,0.0,0.0
NKB19,0.0,0.0,0.9,0.0,0.0,0.0
OP8,0.0,0.1,0.0,0.0,0.0,0.0
OP10,0.6,0.2,0.5,0.0,0.6,0.6
Planctomycetes,0.9,0.5,6.5,2.2,2.0,2.3
Alphaproteobacteria,7.8,10.7,21.8,12.2,5.3,26.8
Betaproteobacteria,9.9,2.8,8.9,21.7,8.3,21.9
Deltaproteobacteria,0.5,0.2,1.8,2.0,1.2,7.1
Epsilonproteobacteria,0.0,0.0,0.0,0.1,0.0,0.2
Gammaproteobacteria,4.0,2.5,8.0,9.4,24.7,5.4
SC4,0.0,0.0,0.0,0.0,0.7,0.0
SM2F11,0.0,0.0,0.0,0.0,0.2,0.0
SPAM,0.0,0.1,0.0,0.0,0.1,0.1
Synergistes,0.0,0.0,0.0,0.1,0.0,0.0
Deinococcus-Thermus,0.0,0.0,0.0,0.0,0.0,0.0
TM6,0.1,0.0,0.0,0.0,0.1,0.0
TM7,0.0,0.1,0.4,0.0,0.4,0.1
Verrucomicrobia,3.8,2.1,23.2,2.9,1.3,0.5
WPS-2,0.0,0.0,0.1,0.0,0.0,0.0
WS3,0.0,0.0,0.0,0.0,0.0,0.1
Uncl Bacteria,3.7,1.2,3.7,0.4,1.9,0.8

I am a microbiologist. What I want to do is construct a heatmap showing the relative abundance of each phylum. The far left column of my table contains all of the phylum names I observed in a set of 6 water samples, and each of the columns to the right contains the relative abundance (%) of each phylum in each water sample.
I have tried constructing a heatmap using the ggplot guidelines listed at the following site: http://learnr.wordpress.com/2010/01/26/ggplot2-quick-heatmap-plotting/

I can generate a heatmap using this method, but would like to alter the scale. I would like a somewhat more complex gradient ranging from 0% to the highest relative abundance that I observe in the above table (65.1%). The default scale I get using the link above is just a relative intensity scale ranging from 1 to 5 (where white represents low percentages and steelblue represents high percentages). This is alright, but phyla that are present at a relative abundance of less than 5% all appear to be white (or non-existent). Is there any way to fix this? Any help would be greatly appreciated.

Thanks, Chris

-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] ggplot2: facet_grid with only one level does not display the graph with the facet_grid level in title
This is on my to do list: https://github.com/hadley/ggplot2/issues/labels/facet#issue/107

Hadley

On Thu, Oct 28, 2010 at 11:51 AM, Matthew Pettis matthew.pet...@gmail.com wrote:
Hi All,

Here is the code that I'll be referring to:

p <- ggplot(wastran.data, aes(PER_KEY, EVENTS))
(p <- p + facet_grid(pool.short ~ .) +
   stat_summary(aes(y = EVENTS), fun.y = sum, geom = "line") +
   opts(axis.text.x = theme_text(angle = 90, hjust = 1),
        title = "Events (15min.) vs. Time: Facet pool",
        strip.text.y = theme_text()))

Now, depending on preceding parameters, the 'pool.short' factor variable in 'wastran.data' can have one distinct factor level or it can have more than one. When 'pool.short' has more than one factor level, the graph performs as I expect, with multiple rows of graphs and the value of the 'pool.short' variable displayed on the right-hand side of the graph. When 'pool.short' has only one factor level, the value is NOT displayed on the right-hand side. However, I'd still like it displayed, even though it has only one value. Can someone tell me how to tweak this code to make it still display when there is only 1 factor level? If this code is unclear, I will be happy to take some time and generate an artificial but reproducible self-contained example. I left the stat_summary layer in this code in case it is interfering with the desired output (I suspect it is superfluous, but I am not confident enough to say that with absolute certainty).

Thanks, Matt
-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] avoiding too many loops - reshaping data
Beware of facile comparisons of this sort -- they may be apples and nematodes. And they also imply that the main time sink is the computation. In my experience, figuring out how to solve the problem takes considerably more time than 18 / 1000 seconds, and so investing your energy in learning idioms that apply in a wide range of situations is far more useful than figuring out the fastest solution to a single problem.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] overloading the generic primitive functions + and [
Note how S3 methods are dispatched only by reference to the first argument (on the left of the operator). I think S4 beats this by having signatures that can dispatch depending on both arguments.

That's somewhat of a simplification for primitive binary operators. R actually looks up the method for both input classes, and warns about incompatible methods (falling back to the internal operator) if they are different.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
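You can see this double lookup in a small sketch (the class and method names below are invented for illustration):

```r
# Two classes, each with its own "+" method.
"+.foo" <- function(e1, e2) "foo method"
"+.bar" <- function(e1, e2) "bar method"
x <- structure(1, class = "foo")
y <- structure(2, class = "bar")
x + x  # both operands agree, so this dispatches to "+.foo"
# Mixing the two: R looks up methods for BOTH operands, sees that they
# differ, warns ("Incompatible methods"), and falls back to the
# internal +, so the result is numeric, not either string.
suppressWarnings(x + y)
```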
Re: [R] Which version control system to learn for managing R projects?
git is where the world is headed. This video is a little old: http://www.youtube.com/watch?v=4XpnKHJAok8, but does a good job of getting the point across. And lots of R users are using github already: http://github.com/languages/R/created

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Forcing results from lm into datframe
On Tue, Oct 26, 2010 at 11:55 AM, Dennis Murphy djmu...@gmail.com wrote:
Hi: When it comes to split, apply, combine, think plyr.

library(plyr)
ldply(split(afvtprelvefs, afvtprelvefs$basestudy),
      function(x) coef(lm(ef ~ quartile, data = x, weights = 1/ef_std)))

Or do it in two steps:

models <- dlply(afvtprelvefs, "basestudy",
  function(x) lm(ef ~ quartile, data = x, weights = 1/ef_std))
coefs <- ldply(models, coef)

That way you can easily pull out other info:

rsq <- function(x) summary(x)$r.squared
ldply(models, rsq)

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Which version control system to learn for managing R projects?
1. What is everyone else using? The network effect is important, since you want people to be able to access your repository and you want to leverage your knowledge of the version control system for other projects' repositories. To that extent Subversion is the clear choice, since it's used on R-Forge, by R itself, and on Google Code (Google Code also supports Mercurial).

There's a bit of a complication in that you can use git (and Mercurial, I assume) to work with svn repositories, but not vice versa.

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Find index of a string inside a string?
Or str_locate:

library(stringr)
str_locate("aabcd", "bcd")

Hadley

On Mon, Oct 25, 2010 at 5:53 AM, jim holtman jholt...@gmail.com wrote:
I think what you want is 'regexpr':

regexpr("bcd", "aabcd")
[1] 3
attr(,"match.length")
[1] 3

On Mon, Oct 25, 2010 at 7:27 AM, yoav baranan ybara...@hotmail.com wrote:
Hi, I am searching for the equivalent of the function index from SAS. In SAS, index("abcd", "bcd") will return 2 because "bcd" is located at the 2nd position of the "abcd" string. The equivalent in R should do this: myIndex <- foo("abcd", "bcd") # returns 2. What is the function that I am looking for? I want to use the return value in substr, like I do in SAS. Thanks, y. baranan.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?

-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] Query on save.image()
On Thu, Oct 14, 2010 at 11:56 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
Hi, I do not believe you can use the save.image() function in this case. save.image() is a wrapper for save() with defaults for the global environment (your workspace). Try this instead; I believe it does what you are after:

myfun <- function(x) {
  y <- 5 * x + x^2
  save(list = ls(envir = environment(), all.names = TRUE),
       file = "myfile.RData", envir = environment())
}

Notice that for both save() and ls() I used the environment() function to grab the current environment. This should mean that even if y was defined globally, it would save a copy of the version inside your function.

I think the defaults are actually ok in this case:

myfun <- function(x) {
  y <- 5 * x + x^2
  save(list = ls(all.names = TRUE), file = "myfile.RData")
}
print(load("myfile.RData"))
[1] "x" "y"

Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] can't find and install reshape2??
My guess is that you are using an outdated version of R, for which the rather new reshape2 package has not been compiled. I wonder if install.packages() could detect this case (e.g. by also checking whether the source version is available) and offer a more informative error message. Hadley
Re: [R] Looking for a book/tutorial with the following context:
Do you also know of more references about variables? Unfortunately this was a little bit short, so I do not feel 100% sure I completely got it.

Try here: http://github.com/hadley/devtools/wiki/Scoping It's a work in progress. Hadley
Re: [R] R: Tools for thinking about data analysis and graphics
On Wed, Oct 6, 2010 at 4:05 PM, Michael Friendly frien...@yorku.ca wrote:
I'm giving a talk about some aspects of language and conceptual tools for thinking about how to solve problems in several programming languages for statistical computing and graphics. I'm particularly interested in language features that relate to:

o expressive power: ease of translating what you want to do into the results you want
o elegance: how well does the code provide a simple human-readable description of what is done?
o extensibility: ease of generalizing a method to wider scope
o learnability: your learning curve (rate, asymptote)

For R, some things to cite are (a) data and function objects, (b) object-oriented methods (S3, S4); (c) function mapping over data with *apply methods and plyr. What other language features of R should be on this list? I would welcome suggestions (and brief illustrative examples).

* missing values
* subsetting
* lexical scope and closures (goes along with first-class functions)
* built-in documentation
* CRAN (not exactly a language feature, but an important part of the ecosystem)
* thoughtful interactive features, e.g. a <- 10 doesn't print 10

Hadley
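One bullet above, lexical scope and closures, is easy to demonstrate in a few lines; a small sketch (the function names are made up for illustration):

```r
# make_counter() returns a function that remembers its own i between calls,
# because the inner function closes over the environment where i lives.
make_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1  # <<- assigns into the enclosing environment, not a new local
    i
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2 -- state persists inside the closure, invisible to the caller
```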
Re: [R] plyr: a*ply with functions that return matrices-- possible bug in aaply?
That is, I want to define something like the following using an a*ply method, but aaply gives a result in which the applied .margin(s) do not appear last in the result, contrary to the documentation for ?aaply. I think this is a bug, either in the function or the documentation, but perhaps there's something I misunderstand about this case.

Maybe the documentation isn't clear, but I think this is behaving as expected:

* the margin you split on comes first in the output,
* followed by the dimensions created by the applied function.

Hadley
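To make that ordering concrete, here is a small sketch (assuming plyr is installed; the array is made up): splitting a 2 x 3 x 4 array on margin 1 and applying a function that returns a length-4 vector should give a 2 x 4 result, split margin first.

```r
library(plyr)

x <- array(1:24, dim = c(2, 3, 4))

# Each slice x[i, , ] is a 3 x 4 matrix; colMeans() collapses it to length 4.
res <- aaply(x, 1, colMeans)

dim(res)  # expected c(2, 4): split margin first, then the function's dimension
```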
Re: [R] Issue with match.call
RFF <- function(qtype, qOpt, ...) {}

i.e., I have two arguments that are compulsory and the rest are optional. Now when my user calls the function, I need to see which optional arguments are defined and process accordingly. What I have so far is:

RFF <- function(qtype, qOpt, ...) {
  mc <- match.call(expand.dots = TRUE)
}

I need to see which of the arguments in vec <- c("flag", "sep", "dec") have been sent, and define if-else conditions based on whether they have been defined. How do I do this?

I think you'd be much better off defining those as arguments and using missing(), rather than messing around with match.call() (unless there is a specific reason you need the unevaluated expressions). Hadley
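A minimal sketch of the missing() approach (the argument names come from the post; the body is illustrative, not the poster's real logic):

```r
RFF <- function(qtype, qOpt, flag, sep, dec) {
  # With named arguments, each optional one can be tested directly with
  # missing() -- no need to inspect the call object.
  used <- c(flag = !missing(flag),
            sep  = !missing(sep),
            dec  = !missing(dec))
  names(used)[used]  # the optional arguments the caller actually supplied
}

RFF("q1", "opt", sep = ",")  # "sep"
```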
Re: [R] Script auto-detecting its own path
I'm not sure this will solve the issue, because if I move the script I would still have to go into the script and edit the /path/to/my/script.r, or do I misunderstand your workaround? I'm looking for something like:

file.path.is.here("myscript.r")

which would return something like:

[1] "c:/user/Desktop/"

so that regardless of where the script is, as long as the accompanying scripts are in the same directory, they can be easily sourced with something like:

dirX <- file.path.is.here("MasterScript.r")
source(paste(dirX, "AuxillaryFile.r", sep = ""))

If you use relative paths like so:

# master.r
source("AuxillaryFile.r")

then

source("path/to/master.r", chdir = TRUE)

will work. Mastering working directories is a much better idea than coming up with your own workarounds. Hadley
Re: [R] function which can apply a function by a grouping variable and also hand over an additional variable, e.g. a weight
You might want to check out the plyr package. Hadley

On Fri, Oct 1, 2010 at 6:05 AM, Werner W. pensterfuz...@yahoo.de wrote:
Hi, I was wondering if there is an easy way to accomplish the following in R: often I want to apply a function, e.g. weighted.quantile from the Hmisc package, to grouped subsets of a data.frame (grouping variable), but then I also need to hand over the weights, which seems not possible with summaryBy or aggregate or the like. Is there a function to do this? Currently I do this with loops, but it is very slow. I would be very grateful for any hints. Thanks, Werner
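For instance, a grouped weighted statistic becomes a one-liner with ddply(), since every column of each group's sub-data-frame (including the weights) is available to the summary expression. A sketch, assuming plyr is installed; the data frame is made up:

```r
library(plyr)

df <- data.frame(g = rep(c("a", "b"), each = 3),
                 y = 1:6,
                 w = c(1, 2, 1, 3, 1, 1))

# ddply() splits df by g and hands each piece to summarise, so the
# weight column w travels along with the values automatically.
ddply(df, .(g), summarise, wmean = weighted.mean(y, w))
```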
Re: [R] Script auto-detecting its own path
Forgive me if this question has been addressed, but I was unable to find anything in the r-help list or in cyberspace. My question is this: is there a function, or set of functions, that will enable a script to detect its own path? I have tried file.path() but that was not what I was looking for. It would be nice to be able to put all the related scripts I use in the same folder as a master script and then source() them in that master script. The problem is, the master script must first know where it is (without me having to open it and retype the path every time I move it).

Instead of trying to work out where your script is located, just make sure the working directory is set correctly when you source it in:

source("/path/to/my/script.r", chdir = TRUE)

chdir is a very useful, but under-advertised, argument to source(). Hadley
Re: [R] Problem with ggplot2 - Boxplot
That implies you need to update your version of plyr. Hadley

On Wed, Sep 22, 2010 at 4:10 AM, RaoulD raoul.t.dso...@gmail.com wrote:
Hi, I am using ggplot2 to create a boxplot that summarizes a continuous variable. This code works fine for me on one PC; however, when I use it on another it doesn't. The structure of the dataset AHT_TopCD is: SubReason = categorical variable, AHT = continuous variable. The code for the boxplot:

require(ggplot2)
qplot(SubReason, AHT, data = AHT_TopCD, geom = "boxplot",
      main = "AHT Spread - By Sub-Reason", xlab = "AHT",
      colour = SubReason, alpha = I(1 / 5)) +
  coord_flip() + scale_x_discrete(breaks = NA)

The error I get is:

Error in get("make_aesthetics", env = x, inherits = TRUE)(x, ...) :
  could not find function "empty"

I do not understand this error. Can anyone help me with this please? Also, let me know if you have any questions or require clarification on anything here. Regards, Raoul

-- View this message in context: http://r.789695.n4.nabble.com/Problem-with-ggplot2-Boxplot-tp2549970p2549970.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] parallel computation with plyr 1.2.1
Yes, this was a little bug that will be fixed in the next release. Hadley

On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette debeaude...@ucdavis.edu wrote:
Hi, I have been trying to use the new .parallel argument with the most recent version of plyr [1] to speed up some tasks. I can run the example in the NEWS file [1], and it seems to be working correctly. However, R will only use a single core when I try to apply this same approach with ddply().

1. http://cran.r-project.org/web/packages/plyr/NEWS

Watching my CPUs I see that in both cases only a single core is used, and they take about the same amount of time. Is there a limitation in how ddply() dispatches parallel jobs, or is this task not suitable for parallel computing? Cheers, Dylan

Here is an example:

library(plyr)
library(doMC)
registerDoMC(cores = 2)

# example data
d <- data.frame(y = rnorm(1000), id = rep(letters[1:4], each = 500))

# function that wastes some time
f <- function(x) {
  m <- vector(length = 1)
  for (i in 1:1) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

system.time(ddply(d, .(id), .fun = f, .parallel = FALSE))
#  user  system elapsed
# 2.740   0.016   2.766

system.time(ddply(d, .(id), .fun = f, .parallel = TRUE))
#  user  system elapsed
# 2.720   0.000   2.726

-- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
Re: [R] Problems with reshape2 on Mac
Hi Uwe, the problem is most likely that the original poster doesn't have the latest version of plyr. I correctly declare this dependency in the DESCRIPTION (http://cran.r-project.org/web/packages/reshape2/index.html), but unfortunately R doesn't seem to use this information at run time, which generates many error reports of this nature. Hadley

2010/9/13 Uwe Ligges lig...@statistik.tu-dortmund.de:
Is this a recent version of R? If so, please report it to the maintainer. Otherwise, please also report that it does not work with your version of R so that the maintainer can add a version dependency. Best, Uwe Ligges

On 13.09.2010 17:42, Paul Metzner wrote:
Hi! I updated to reshape2 yesterday and tried to make it work. Unfortunately, it mainly throws error messages at me (good thing it's reshape2 1.0 and not reshape 2.0). The most recent is:

Error in match.fun(FUN) : object 'id' not found

When I manually create an object 'id', it says:

Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'id' of mode 'function' was not found

I assume that dcast is looking for a function by the name 'id' which is not present. I tried both Rdaemon within TextMate and R in the Terminal. I also tried both my own code and the airquality example. reshape is still working flawlessly. I also needed to load plyr manually to make another error message go away that asked for 'as.quoted'. Best, Paul

--- Paul Metzner Humboldt-Universität zu Berlin Philosophische Fakultät II Institut für deutsche Sprache und Linguistik Post: Unter den Linden 6 | 10099 Berlin | Deutschland Besuch: Dorotheenstraße 24 | 10117 Berlin | Deutschland +49-(0)30-2093-9726 paul.metz...@rz.hu-berlin.de http://amor.rz.hu-berlin.de/~metznerp/
Re: [R] post
Have a look at: Computing Thousands of Test Statistics Simultaneously in R by Holger Schwender and Tina Müller, in http://stat-computing.org/newsletter/issues/scgn-18-1.pdf Hadley

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush usha...@yahoo.com wrote:
Hello, I have a question regarding how to speed up t.test on a large dataset. For example, I have a table tab which looks like: a b c d e f g h 1 2 3 4 5 ... 10 dim(tab) is 10 x 100. I need to do the t.test for each row on two subsets of columns, i.e. to compare the a b d group against the e f g group at each row. Subset 1: a b d 1 2 3 4 5 ... 10. Subset 2: e f g 1 2 3 4 5 ... 10. 10 t.tests for each row for these two subsets will take around 1 min. The problem is that I have around 1 different combinations of such subsets, therefore 1min*1 =1min in the case that I use a for loop like this:

n1 = 1  # number of subset combinations
for (i1 in 1:n1) {
  n2 = 10  # number of rows
  i2 = 1
  for (i2 in 1:n1) {
    t.test(tab[i2, v5], tab[i2, v6])$p.value
    # v5 and v6 are vectors containing the variable names for the
    # two subsets (they are different for each loop)
  }
}

My question: is there a more efficient way to do these computations in a short period of time? Any packages, like plyr? Maybe direct calculations instead of using the t.test function? Thank you.
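The speedups in that article come from vectorising across rows instead of calling t.test() once per row. A base-R sketch of the idea (the function name is made up; column index vectors g1/g2 play the role of the poster's v5/v6):

```r
# Welch t statistic for every row of a matrix at once: a handful of
# whole-matrix operations replace thousands of t.test() calls.
row_t <- function(tab, g1, g2) {
  n1 <- length(g1); n2 <- length(g2)
  m1 <- rowMeans(tab[, g1]); m2 <- rowMeans(tab[, g2])
  # Row-wise sample variances: subtracting m1 recycles it down each column.
  v1 <- rowSums((tab[, g1] - m1)^2) / (n1 - 1)
  v2 <- rowSums((tab[, g2] - m2)^2) / (n2 - 1)
  (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
}

set.seed(42)
tab <- matrix(rnorm(40), nrow = 4)
row_t(tab, 1:5, 6:10)  # matches unname(t.test(...)$statistic) row by row
```

P-values can then be obtained from pt() on the resulting statistics, which is likewise vectorised.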