Re: [R] ddply with mean and max...

2011-05-11 Thread Hadley Wickham
 That's the ticket!  So mean is already set up to operate on columns but max and
 min are not?  I guess it's not too important now I know ... but what's going on
 in the background that makes that happen?

Basically, this:

> mean.data.frame
function (x, ...)
sapply(x, mean, ...)
<environment: namespace:base>
> min.data.frame
Error: object 'min.data.frame' not found

There was some discussion on r-devel recently about removing
mean.data.frame to be consistent with the other summary functions
(plus the way it's currently written makes it prone to problems).
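In the meantime, a minimal sketch of getting column-wise summaries without relying on mean.data.frame (base R only; the toy data frame is invented here):

```r
# mean.data.frame is just sapply(x, mean, ...); min and max have no
# data.frame methods, so call sapply on the columns yourself.
df <- data.frame(a = c(1, 5, 3), b = c(10, 2, 8))

sapply(df, min)  # a = 1, b = 2
sapply(df, max)  # a = 5, b = 10
```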

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple loop

2011-05-07 Thread Hadley Wickham
 Using paste(Site, Prof) when calling ave() is ugly, in that it
 forces you to consider implementation details that you expect
 ave() to take care of (how does paste convert various types
 to strings?).  It also courts errors, since paste("A B", "C")
 and paste("A", "B C") give the same result but represent different
 Site/Prof combinations.

 Well, ave() uses interaction(...) and interaction() has a drop argument, so

 with(x, ave(H, Site, Prof, drop = TRUE, FUN = function(y) y - min(y)))
 [1]  8  0 51  0 33 22 21  0

I don't understand why this isn't the default.
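To make the behaviour concrete, a tiny self-contained example (the Site/Prof data here are invented, not the original poster's):

```r
# ave() passes drop = TRUE through ... to interaction(), so unused
# Site/Prof combinations are dropped before the groupwise FUN runs.
x <- data.frame(Site = c("A", "A", "B"),
                Prof = c(1, 1, 2),
                H    = c(10, 2, 7))

with(x, ave(H, Site, Prof, drop = TRUE, FUN = function(y) y - min(y)))
# 8 0 0: within A/1 the group minimum (2) is subtracted; B/2 is its own minimum
```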

Hadley



Re: [R] Empty Data Frame

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 4:58 AM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 You could try something like

 df <- data.frame(expand.grid(Week = 1:52, Year = 2002:2011))

expand.grid already returns a data frame...  You might want
KEEP.OUT.ATTRS = FALSE though, even if it feels like you are yelling at R.
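So the outer data.frame() call can simply be dropped; a minimal sketch with the same Week/Year grid as the quoted code:

```r
# expand.grid already returns a data frame; KEEP.OUT.ATTRS = FALSE just
# skips attaching the bulky "out.attrs" attribute to the result.
df <- expand.grid(Week = 1:52, Year = 2002:2011, KEEP.OUT.ATTRS = FALSE)

is.data.frame(df)      # TRUE
nrow(df)               # 52 weeks * 10 years = 520
attr(df, "out.attrs")  # NULL
```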

Hadley




Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
 This has the side effect of ignoring errors
 and even hiding the error messages.  If you
 are concerned about multiple calls to on.exit()
 in one function you could define a new function
 like
  withOptions <- function(optionList, expr) {
   oldOpts <- options(optionList)
   on.exit(options(oldOpts))
   expr # lazily evaluate
  }

I wish R had more functions like this.  This sort of behaviour is also
useful when you open connections or change locales.  Ruby's blocks
provide nice syntactic sugar for this idea.
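For concreteness, here is the quoted withOptions() in use (the definition is repeated so the snippet is self-contained):

```r
withOptions <- function(optionList, expr) {
  oldOpts <- options(optionList)  # set options, remembering old values
  on.exit(options(oldOpts))       # restore on exit, even after an error
  expr                            # lazily evaluated with the options in place
}

old <- getOption("digits")
withOptions(list(digits = 3), getOption("digits"))  # 3, inside the call
getOption("digits") == old                          # TRUE: restored after
```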

Hadley



Re: [R] MASS fitdistr with plyr or data.table?

2011-04-27 Thread Hadley Wickham
On Wed, Apr 27, 2011 at 3:55 PM, Justin Haynes jto...@gmail.com wrote:
 I am trying to extract the shape and scale parameters of a wind speed
 distribution for different sites.  I can do this in a clunky way, but
 I was hoping to find a way using data.table or plyr.  However, when I
 try I am met with the following:

 set.seed(144)
 weib.dist <- rweibull(1, shape = 3, scale = 8)
 weib.test <- data.table(cbind(1:10, weib.dist))
 names(weib.test) <- c('site', 'wind_speed')

 fitted <- weib.test[, fitdistr(wind_speed, 'weibull'), by = site]

 Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
  invalid to set the class to matrix unless the dimension attribute is
 of length 2 (was 0)
 In addition: Warning messages:
 1: In dweibull(x, shape, scale, log) : NaNs produced
 ...
 10: In dweibull(x, shape, scale, log) : NaNs produced

 (the warning messages are normal from what I can tell)

 or using plyr:

 set.seed(144)
 weib.dist <- rweibull(1, shape = 3, scale = 8)
 weib.test.too <- data.frame(cbind(1:10, weib.dist))
 names(weib.test.too) <- c('site', 'wind_speed')

 fitted <- ddply(weib.test.too, .(site), fitdistr, 'weibull')

Well fitdistr doesn't return a data frame, so you need to do something
to its output...
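One way to do that, sketched and untested, in the spirit of the thread (fitdistr is from MASS; for a Weibull fit its estimate vector is named shape and scale):

```r
library(MASS)
library(plyr)

# fitdistr() returns a "fitdistr" object, not a data frame, so wrap it:
# extract the named estimates and lay them out as a one-row data frame.
fitted <- ddply(weib.test.too, .(site), function(df) {
  fit <- fitdistr(df$wind_speed, "weibull")
  as.data.frame(t(fit$estimate))  # columns: shape, scale
})
```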

Hadley




Re: [R] setting options only inside functions

2011-04-27 Thread Hadley Wickham
 Put together a list and we can see what might make sense.  If we did
 take this on it would be good to think about providing a reasonable
 mechanism for addressing the small flaw in this function as it is
 defined here.

In devtools, I have:

#' Evaluate code in specified locale.
with_locale <- function(locale, expr) {
  cur <- Sys.getlocale(category = "LC_COLLATE")
  on.exit(Sys.setlocale(category = "LC_COLLATE", locale = cur))

  Sys.setlocale(category = "LC_COLLATE", locale = locale)
  force(expr)
}

(Using force here just to be clear about what's going on)

Bill discussed options().  Other ideas (mostly from skimming apropos(set)):

 * graphics devices (i.e. automatic dev.off)
 * working directory (as in the chdir argument to sys.source)
 * environment variables (Sys.setenv/Sys.getenv)
 * time limits (as replacement for transient = T)

For connections it would be nice to have something like:

with_connection <- function(con, expr) {
  open(con)
  on.exit(close(con))

  force(expr)
}

but it's a little clumsy, because

with_connection(file("myfile.txt"), {do stuff...})

isn't very useful because you have no way to reference the connection
that you're using. Ruby's blocks have arguments, which would require
big changes to R's syntax.  One option would be to use pronouns:

with_connection <- function(con, expr) {
  open(con)
  on.exit(close(con))

  env <- new.env(parent = parent.frame())
  env$.it <- con

  eval(substitute(expr), env)
}

or anonymous functions:

with_connection <- function(con, f) {
  open(con)
  on.exit(close(con))

  f(con)
}

Neither of which seems particularly appealing to me.

(I didn't test any of this code, so no guarantees that it works, but
hopefully you see the ideas)
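For what it's worth, the anonymous-function variant already works once on.exit is spelled correctly; a small base-R check (tempfile contents invented here):

```r
with_connection <- function(con, f) {
  open(con)            # default mode "r"
  on.exit(close(con))  # guaranteed cleanup, even if f() errors
  f(con)
}

tmp <- tempfile()
writeLines(c("alpha", "beta"), tmp)
with_connection(file(tmp), readLines)  # c("alpha", "beta")
```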

Hadley




Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Hadley Wickham
 If you need plyr for other tasks you ought to use a different
 class for your date data (or wait until plyr can deal with
 POSIXlt objects).

How do you get POSIXlt objects into a data frame?

> df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
> str(df)
'data.frame':   1 obs. of  1 variable:
 $ x: POSIXct, format: "2008-01-01"

> df <- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
> str(df)
'data.frame':   1 obs. of  1 variable:
 $ x: AsIs, format: 0

Hadley



Re: [R] taking rows from data.frames in list to form new data.frame?

2011-04-21 Thread Hadley Wickham
On Wed, Apr 20, 2011 at 6:36 PM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Perhaps you're looking for subset()? I'm not sure I understand the
 problem completely, but is

 do.call(rbind, lapply(database, function(df) subset(df, Symbol == 'IBM')))

 or

 library(plyr)
 ldply(lapply(database, function(df) subset(df, Symbol == 'IBM')), rbind)

That's a bit redundant.  All you need is:

ldply(database, function(df) subset(df, Symbol == 'IBM'))
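With a toy stand-in for the poster's database list (invented here), that one call returns a single data frame, with the list names carried along in an .id column:

```r
library(plyr)

# Invented stand-in for the poster's list of per-period data frames.
database <- list(
  day1 = data.frame(Symbol = c("IBM", "AAPL"), Close = c(100, 200)),
  day2 = data.frame(Symbol = c("IBM", "MSFT"), Close = c(101, 50))
)

ldply(database, function(df) subset(df, Symbol == "IBM"))
```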

Hadley



Re: [R] (no subject)

2011-04-18 Thread Hadley Wickham
Yes, it's fixed and a new version of plyr has been pushed up to cran -
hopefully will be available for download soon.  In the meantime, I
think you can fix it by running library(stats) before
library(ggplot2).

Hadley

On Sun, Apr 17, 2011 at 3:51 PM, Bryan Hanson han...@depauw.edu wrote:
 Is there any news on this issue?  I have the same problem but on a Mac.  I
 have upgraded R and updated the built packages.  The console output and
 sessionInfo are below.  The problem is triggered by library(ggplot2) my
 .Rprofile  If I do library(ggplot2) after the aborted start up ggplot2 is
 loaded properly, and I can manually do everything in my .Rprofile and my
 configuration is as originally intended.  Thanks, Bryan

 Console Output:

 Loading required package: reshape
 Loading required package: plyr

 Attaching package: 'reshape'

 The following object(s) are masked from 'package:plyr':

    rename, round_any

 Loading required package: grid
 Loading required package: proto
 Error in rename(x, .base_to_ggplot) : could not find function "setNames"
 Error : unable to load R code in package 'ggplot2'
 Error: package/namespace load failed for 'ggplot2'
 [R.app GUI 1.40 (5751) x86_64-apple-darwin9.8.0]

 [History restored from /Users/bryanhanson/.Rhistory]

 and here is my session info after the aborted start up:

 R version 2.13.0 (2011-04-13)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  grid      methods
 base

 other attached packages:
 [1] proto_0.3-9.1   reshape_0.8.4   plyr_1.5.1      lattice_0.19-23

 * Original Post from Stephen Sefick


 I have just upgraded to R 2.13 and have library(ggplot2) in my
 .Rprofile (among other things).  when i start R I get an error
 message.  Has something in the start up scripts changed?  Is there a
 better way to specify the library calls in .Rprofile?  Thanks for all
 of the help in advance.

 Error:

 Loading required package: grid
 Loading required package: proto
 Error in rename(x, .base_to_ggplot) : could not find function "setNames"
 Error : unable to load R code in package 'ggplot2'
 Error: package/namespace load failed for 'ggplot2'
 [Previously saved workspace restored]


 Computer 1:

 R version 2.13.0 (2011-04-13)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  grid      methods
 [8] base

 other attached packages:
 [1] proto_0.3-9.1 reshape_0.8.4 plyr_1.5.1

 Computer 2

 R version 2.13.0 (2011-04-13)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  grid      methods
 [8] base

 other attached packages:
 [1] proto_0.3-9.1 reshape_0.8.4 plyr_1.5.1

 --
 Stephen Sefick
 
 | Auburn University                                         |
 | Biological Sciences                                      |
 | 331 Funchess Hall                                       |
 | Auburn, Alabama                                         |
 | 36849                                                           |
 |___|
 | sas0...@auburn.edu                                  |
 | http://www.auburn.edu/~sas0025                 |
 |___|








Re: [R] Is there a better way to parse strings than this?

2011-04-14 Thread Hadley Wickham
 I was trying strsplit(string, "\.\.\.") as per the suggestion in Venables
 and Ripley's book (use '\.' to match '.'), which is in the Regular
 expressions section.

 I noticed that in the suggestions sent to me people used:
 strsplit(test, "\\.\\.\\.")


 Could anyone please explain why I should have used "\\.\\.\\." rather than
 "\.\.\."?

Basically,

 * you want to match a literal .
 * so the regular expression you need is \.
 * and the way you write that regular expression as a string in R is "\\."
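Checking that in a session:

```r
# The R string "\\.\\.\\." contains the six characters \.\.\. ,
# which the regex engine reads as three literal dots.
strsplit("A5.Brands.bought...Dulux", "\\.\\.\\.")[[1]]
# "A5.Brands.bought" "Dulux"

nchar("\\.\\.\\.")  # 6: each "\\" is a single backslash character
```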

Hadley



Re: [R] Is there a better way to parse strings than this?

2011-04-13 Thread Hadley Wickham
On Wed, Apr 13, 2011 at 5:18 AM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Here's one approach:

 strings <- c(
   "A5.Brands.bought...Dulux",
   "A5.Brands.bought...Haymes",
   "A5.Brands.bought...Solver",
   "A5.Brands.bought...Taubmans.or.Bristol",
   "A5.Brands.bought...Wattyl",
   "A5.Brands.bought...Other")

 slist <- strsplit(strings, '\\.\\.\\.')

Or with stringr:

library(stringr)
str_split_fixed(strings, fixed("..."), n = 2)

# or maybe
str_match(strings, "(..).*\\.\\.\\.(.*)")

Hadley



Re: [R] R plots pdf() does not allow spotcolors?

2011-04-13 Thread Hadley Wickham
 Even so, this would depend on what your publisher/printer
 requires in what you submit. It would be important to obtain
 from them a full and exact specification of what they require
 for colour printing in files submitted to them for printing.

No one else has mentioned this, but the publisher is trying to make
money, not make your life easier.  Sometimes the right thing to do is
to say "Hey, you guys are the experts at this; you convert my RGB pdfs
to the correct format."  It's worthwhile to push back a bit on
publishers and get them to do their job.

Hadley



[R] Line plots in base graphics

2011-04-13 Thread Hadley Wickham
Am I missing something obvious on how to draw multi-line plots in base graphics?

In ggplot2, I can do:

data(Oxboys, package = "nlme")
library(ggplot2)

qplot(age, height, data = Oxboys, geom = "line", group = Subject)

But in base graphics, the best I can come up with is this:

with(Oxboys, plot(age, height, type = "n"))
lapply(split(Oxboys[c("age", "height")], Oxboys$Subject), lines)

Am I missing something obvious?

Thanks!

Hadley



Re: [R] Line plots in base graphics

2011-04-13 Thread Hadley Wickham
On Wed, Apr 13, 2011 at 2:58 PM, Ben Bolker bbol...@gmail.com wrote:
 Hadley Wickham hadley at rice.edu writes:


 Am I missing something obvious on how to draw multi-line plots in
 base graphics?

 In ggplot2, I can do:

 data(Oxboys, package = "nlme")
 library(ggplot2)

 qplot(age, height, data = Oxboys, geom = "line", group = Subject)

 But in base graphics, the best I can come up with is this:

 with(Oxboys, plot(age, height, type = "n"))
 lapply(split(Oxboys[c("age", "height")], Oxboys$Subject), lines)

 [quoting removed to fool gmane]
 Am I missing something obvious?


  reshape to wide format and matplot()?

Hmmm, that doesn't work if your measurements are at different times, e.g.:

Oxboys2 - transform(Oxboys, age = age + runif(234))

Hadley




Re: [R] Fwd: CRAN problem with plyr-1.4.1

2011-04-12 Thread Hadley Wickham
 Then, can we have the ERROR message, please?
 Otherwise the only explanation I can guess is that a mirror grabs the
 contents of a repository exactly in the second the repository is updated and
 that is unlikely, particularly if more than one mirror is involved.

Isn't one possible explanation that PACKAGES.gz on the mirror was
updated before the package directory was?  That seems a plausible
hypothesis to me, because rsync seems to send files in the
top-level directory before files in directories below.

Hadley



[R] [R-pkgs] plyr: version 1.5

2011-04-11 Thread Hadley Wickham
# plyr

plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:

  * fit the same model to each patient subset of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising

It's already possible to do this with base R functions (like split and
the apply family of functions), but plyr makes it all a bit easier
with:

  * totally consistent names, arguments and outputs
  * convenient parallelisation through the foreach package
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages
  * labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in equivalents.

A detailed introduction to plyr has been published in JSS: The
Split-Apply-Combine Strategy for Data Analysis,
http://www.jstatsoft.org/v40/i01/. You can find out more at
http://had.co.nz/plyr/, or track development at
http://github.com/hadley/plyr. You can ask questions about plyr (and
data manipulation in general) on the plyr mailing list. Sign up at
http://groups.google.com/group/manipulatr.

Version 1.5
--

NEW FEATURES

* new `strip_splits` function removes splitting variables from the data frames
  returned by `ddply`.

* `rename` moved in from reshape, and rewritten.

* new `match_df` function makes it easy to subset a data frame to only contain
  values matching another data frame. Inspired by
  http://stackoverflow.com/questions/4693849.

BUG FIXES

* `**ply` now works when passed a list of functions

* `*dply` now correctly names output even when some output combinations are
  missing (NULL) (Thanks to bug report from Karl Ove Hufthammer)

* `*dply` preserves the class of many more object types.

* `a*ply` now correctly works with zero length margins, operating on the
  entire object (Thanks to bug report from Stavros Macrakis)

* `join` now implements joins in a more SQL-like way, returning all possible
  matches, not just the first one. It is still a little faster than merge.
  The previous behaviour is accessible with `match = "first"`.

* `join` is now more symmetric, so that `join(x, y, "left")` is closer to
  `join(y, x, "right")`, modulo column ordering.

* `named.quoted` failed when quoted expressions were longer than 50
  characters. (Thanks to bug report from Eric Goldlust)

* `rbind.fill` now correctly maintains POSIXct tzone attributes and preserves
  missing factor levels

* `split_labels` correctly preserves empty factor levels, which means that
  `drop = FALSE` should work in more places. Use `base::droplevels` to remove
  levels that don't occur in the data, and `drop = T` to remove combinations
  of levels that don't occur.

* `vaggregate` now passes `...` to the aggregation function when working out
  the output type (thanks to bug report by Pavan Racherla)



___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



Re: [R] R licence

2011-04-07 Thread Hadley Wickham
If all you need is loess, I suspect it would be cheaper to re-write it
in C# than to get a considered legal opinion on the matter.

Hadley

On Thu, Apr 7, 2011 at 2:45 AM, Stanislav Bek
stanislav.pavel@gmail.com wrote:
 Hi,

 is it possible to use some statistical computing by R in proprietary software?
 Our software is written in c#, and we intend to use
 http://rdotnet.codeplex.com/
 to get R work there. Especially we want to use loess function.

 Thanks,

 Best regards,
 Stanislav







Re: [R] Windrose Percent Interval Frequencies Are Non Linear! Help!

2011-04-07 Thread Hadley Wickham
 Does anyone with specific windrose experience know how to adjust the
 graphic such that the data and the percent intervals are evenly spaced?

 Hopefully I am making sense here

 How about giving us a reproducible example?
 Code is better than mere description;
 code + description is best.

The problem is probably that A = pi * r^2, and the percent intervals
are spaced evenly on the square root scale to keep areas from being
distorted.
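The arithmetic, as a quick sketch: for ring areas to stay proportional to percentage, radii must grow with the square root of the cumulative percentage, so equal percentage intervals get visibly unequal radial spacing.

```r
# Radii giving equal-area rings for equal 20% intervals.
pct   <- c(20, 40, 60, 80, 100)
radii <- sqrt(pct / 100)

diff(c(0, radii))
# ring widths shrink outward (about 0.45, 0.19, 0.14, 0.12, 0.11)
```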

Hadley



Re: [R] merging data list in to single data frame

2011-04-04 Thread Hadley Wickham
 filelist <- list.files(pattern = "K*cd.txt") # the file names are K1cd.txt
 ... to K200cd.txt

It's very easy:

names(filelist) <- basename(filelist)
data_list <- ldply(filelist, read.table, header = TRUE, comment = ";", fill = TRUE)

Hadley




Re: [R] subset and as.POSIXct / as.POSIXlt oddness

2011-03-24 Thread Hadley Wickham
On Thu, Mar 24, 2011 at 8:29 AM, Michael Bach pha...@gmail.com wrote:
 Dear R users,

 Given this data:

 x <- seq(1, 100, 1)
 dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00")
 dfx <- data.frame(dx)

 Now to play around, for example:

 subset(dfx, dx < as.POSIXct("2007-06-01 16:00:00"))

 Ok. Now for some reason I want to extract the datapoints between hours
 10:00:00 and 14:00:00, so I thought well:

 subset(dfx, dx < as.POSIXct("2007-06-01 16:00:00"), 14 < as.POSIXlt(dx)$hour
  & as.POSIXlt(dx)$hour < 10)
 Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied

As others have noted, you used a ',' instead of an '&'.  I wanted to point out
that this is a little easier to express with the lubridate package:

subset(dfx, dx < ymd("2007-06-01") & hour(dx) > 14 & hour(dx) < 10)

but I presume you meant:

subset(dfx, dx < ymd("2007-06-01") & hour(dx) > 10 & hour(dx) < 14)

Hadley




Re: [R] How create vector that sums correct responses for multiple subjects?

2011-03-24 Thread Hadley Wickham
On Thu, Mar 24, 2011 at 2:24 PM, Kevin Burnham kburn...@gmail.com wrote:
 I have a data file with indicates pretest scores for a linguistics
 experiment.  The data are in long form so for each of 33 subjects there are
 400 rows, one for each item on the test, and there is a column called
 ‘Correct’ that shows ‘C’ for a correct response and ‘E’ for an incorrect
 response.  I am trying to write a formula that will create a vector that
 indicates the number of correct answers for each subject.

 nrow(pretestdata[(pretestdata$Subject == 1 & pretestdata$Correct == "C"), ])



 gives the number of correct responses for subject 1, but I would like a
 vector that indicates the number correct for each of 33 subjects.

How about

with(pretestdata, table(Subject, Correct))

?
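On invented toy data, pulling the per-subject count of correct answers out of that table:

```r
# Invented miniature version of the pretest data: 3 subjects, 4 items each.
pretestdata <- data.frame(
  Subject = rep(1:3, each = 4),
  Correct = c("C", "E", "C", "C",   # subject 1: 3 correct
              "E", "E", "C", "E",   # subject 2: 1 correct
              "C", "C", "C", "C")   # subject 3: 4 correct
)

tab <- with(pretestdata, table(Subject, Correct))
tab[, "C"]  # named vector of correct counts: 3, 1, 4
```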

Hadley



Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

2011-03-22 Thread Hadley Wickham
 I don't doubt that R may be the most popular in terms of discussion group
 traffic, but you should be aware that the traffic for SAS comprises two
 separate lists that used to be mirrored, but are no longer linked
 Usenet --  news://comp.soft-sys.sas  (what you counted)
 listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

R programming challenge: create a script that parses those html pages
to compute the total number of messages per week!  (Maybe I'll use
this in class)

Hadley



Re: [R] assigning to list element within target environment

2011-03-17 Thread Hadley Wickham
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey r.d.mo...@rug.nl wrote:
 I would like to assign an value to an element of a list contained in an
 environment. The list will contain vectors and matrices. Here's a simple
 example:

 # create toy environment
 testEnv = new.env(parent = emptyenv())

 # create list that will be in the environment, then assign() it
 x = list(a=1,b=2)
 assign("xList", x, testEnv)

 # create new element, to be inserted into xList
 c = 5:7

 Now, what I'd like to do is something like this:

 assign("xList[[3]]", c, testEnv)

testEnv$xList[[3]] <- c

?
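Spelled out, with no assign() needed:

```r
testEnv <- new.env(parent = emptyenv())
testEnv$xList <- list(a = 1, b = 2)

# Complex assignment: R fetches xList from testEnv, replaces the
# third element, and stores the modified list back in testEnv.
testEnv$xList[[3]] <- 5:7

length(testEnv$xList)  # 3
```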

Hadley



Re: [R] Strange R squared, possible error

2011-03-17 Thread Hadley Wickham
 2) I don't want to fit data with a linear model of zero intercept.
 3) I don't know if I understand correctly. I'm 100% sure the model for my data
 should have zero intercept.
 The only coordinate which I'm 100% sure is correct. If I had measured quality
 Y of a same sample X0 number of times I would get E(Y(X0)) = 0.

Are points 2) and 3) not contradictory?

Hadley



Re: [R] Persistent storage between package invocations

2011-03-16 Thread Hadley Wickham
 No.  First, please use path.expand("~") for this, and it does not
 necessarily mean the home directory (and in principle it might not expand at
 all).  In practice I think it will always be *a* home directory, but on
 Windows there may be more than one (and watch out for local/roaming profile
 differences).

Ok - I did remember that something like path.expand existed, I just
couldn't find it.  (And I always get confused by the difference
between normalizePath and path.expand).

 Second, it need not be writeable, and so many package authors write rubbish
 in my home directory that I usually arrange it not be writeable to R test
 processes.

So at a minimum I need to check if the home directory is writeable,
and fail gracefully if not.

What about using the registry on windows?  Does R provide any
convenience functions for adding/accessing entries?

 If you want something writeable across processes, use dirname(tempdir()) .

I was really looking for options to be persistent between instances -
i.e. so you decide once, and not need to be asked again. In a similar
way, it would be nice if you could choose a CRAN mirror once and then
not be asked again - and not need to know anything about how to set
options during startup.
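
For what it's worth, a minimal sketch of the kind of helper I have in mind (the file name and helper names are invented for illustration, not an existing API):

```r
# read/write a settings list under the user's home directory,
# failing gracefully when the directory is not writeable
settings_file <- function(dir = path.expand("~")) {
  file.path(dir, ".mypkg-settings.rds")
}

save_settings <- function(x, path = settings_file()) {
  if (file.access(dirname(path), mode = 2) == 0) {  # mode 2 = write permission
    saveRDS(x, path)
    TRUE
  } else {
    FALSE
  }
}

load_settings <- function(path = settings_file()) {
  if (file.exists(path)) readRDS(path) else list()
}
```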

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] File Save As...

2011-03-16 Thread Hadley Wickham
 No, defaults are evaluated in the evaluation frame of the function. That's
 why you can use local variables in them, e.g. the way rgamma uses 1/rate as
 a default for scale.

Oops, yes, I was getting confused with promises - non-missing
arguments are promises evaluated in the parent frame.

 But the point isn't evaluation here:  the point is the parsing.  A function
 gets its source attribute when it is parsed, so getSrcFilename needs to be
 passed something that was parsed in the script.

Still, it would be nice to have a function that, by default, would
return the location of the calling script.  You can also hack
something together using sys.frames(), but it would be nice to have
official R support for it.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] proportional symbol map ggplot

2011-03-16 Thread Hadley Wickham
On Mon, Mar 14, 2011 at 9:41 AM, Strategische Analyse CSD Hasselt
csd...@fedpolhasselt.be wrote:
 Hello,

 we want to plot a proportional symbol map with ggplot. Symbols' area should
 have the same proportions as the scaled variable.
 Hereby an example we found on
 http://www.r-bloggers.com/bubble-chart-by-using-ggplot2/ . In this example
 we see the proportions of the symbols' area are different from the
 proportions of the scaled variable:

 crime <-
 read.csv("http://datasets.flowingdata.com/crimeRatesByState2008.csv",
 header=TRUE, sep="\t")
 p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
 p <- p+geom_point(colour="red") +scale_area(to=c(1,20))+geom_text(size=3)

 Example:
 proportion population Pennsylvania/Tennessee= 2.003
 proportion symbols' area Pennsylvania/Tennessee= +/- 2.50

 proportion population California/Florida= 2.005
 proportion symbols' area California/Florida= +/-2.25

 What we would like is that the proportion of the symbols' area is also equal
 to 2.0.

To do that you need to make sure the lower limit extends to 0 and the
size of the smallest circle is also 0. I think something like
scale_area(to=c(0, 20), limits = c(0, 4e7), breaks = 1:4 * 1e7) should
suffice.

It would also be helpful if you stated how you calculated the areas.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Does R have a const object?

2011-03-16 Thread Hadley Wickham
 Its useful for being able to set defaults for arguments that do not
 have defaults.  That cannot break existing programs.

 Until the next program decides to change those defaults and either
 can't or does and you end up with incompatible assumptions.  It also
 makes the code with the added defaults inconsistent with the
 documentation though, which is not a good idea.  It may seem
 convenient but it isn't a good idea in production code that is
 intended to play well with other production code.

I like the name the ruby community has for these sort of changes:
monkey patching.  It's an evocative term!

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] Persistent storage between package invocations

2011-03-15 Thread Hadley Wickham
Hi all,

Does anyone have any advice or experience storing package settings
between R runs?  Can I rely on the user's home directory (e.g.
tools::file_path_as_absolute("~")) to be available and writeable
across platforms?

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Changing colour of continuous time-series in ggplot2

2011-03-15 Thread Hadley Wickham
You need to specify the group aesthetic - that defines how
observations are grouped into instances of a geom.
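
Concretely, something like this (an untested sketch against the data in the quoted message; the constant group is the key):

```r
# one continuous line: colour is still mapped to trial, but a constant
# group keeps ggplot2 from splitting the data into separate lines
ggplot(myDF, aes(time, value)) +
  geom_line(aes(colour = trial, group = 1))
```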

Hadley

On Tue, Mar 15, 2011 at 8:37 AM, joeP joseph.parr...@bt.com wrote:
 Hi,

 This seems like there should be a simple answer, but having spent most of
 the day trying to find it, I'm becoming less convinced and as such am asking
 it here.

 Here's a sub-set of my data (a data.frame in R):

 myDF
                                time     value      trial
 1   2011-03-01 01:00:00  64092  FALSE
 2   2011-03-01 02:00:00  47863  FALSE
 3   2011-03-01 03:00:00  43685  FALSE
 4   2011-03-01 04:00:00  44821   TRUE
 5   2011-03-01 05:00:00  48610   TRUE
 6   2011-03-01 06:00:00  44856   TRUE
 7   2011-03-01 07:00:00  55199   TRUE
 8   2011-03-01 08:00:00  69326  FALSE
 9   2011-03-01 09:00:00  84048  FALSE
 10 2011-03-01 10:00:00  81341  FALSE

 From this, I can plot a simple time-series in ggplot:

 ggplot(myDF, aes(time,value)) + geom_line()

 but I'd like to change the colour of the line based on whether the trial
 value is TRUE or FALSE, so I try:

 ggplot(myDF, aes(time,value)) + geom_line(aes(colour=trial))

 but this draws a line from the value on row 3 to that on row 8 (essentially
 plotting TRUE and FALSE as separate data-sets).  I've tried using various
 other geometries (inc. geom_path()) but all have produced similar events.
 Is there a way I can plot the time-series in a continuous way (i.e. as one
 data-set) and change only the colour of the line?

 Thanks,
 Joe


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Changing-colour-of-continuous-time-series-in-ggplot2-tp3356582p3356582.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] File Save As...

2011-03-15 Thread Hadley Wickham
 The bigger issue is that R can't tell the location of an open script,
 which makes it harder to create new versions of existing work

 But it can.  If you open a script and choose save, it will be saved to the
 same place.  Or do you mean an executing script?  There are indirect ways to
 find the name of the executing script.  For example,
 in R-devel (to become 2.13.0 next month), you can do this:


 cat("This file is ", getSrcFilename(function(){}, full=TRUE), "\n")

 The getSrcFilename() function will be new in 2.13.0.  You can do the same in
 earlier versions, but you need to program it yourself.

Could getSrcFilename() gain a default argument so that
getSrcFilename() would by default return the path of the executing
script?

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] File Save As...

2011-03-15 Thread Hadley Wickham
 Could getSrcFilename() gain a default argument so that
 getSrcFilename() would by default return the path of the executing
 script?

 No, it needs to see a function defined in that script.

But I thought default arguments were evaluated in the parent
environment?  Does that not follow for source attributes as well?

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] dataframe to a timeseries object

2011-03-14 Thread Hadley Wickham
Well, I'd start by removing all explicit use of environments, which
makes your code very hard to follow.

Hadley

On Monday, March 14, 2011, Daniele Amberti daniele.ambe...@ors.it wrote:
 I found that plyr:::daply is more efficient than base:::by (am I doing 
 something wrong?), below updated code for comparison (I also fixed a couple 
 things).
 Function daply from plyr package has also a .parallel argument and I wonder 
 if creating timeseries objects in parallel and then combining them would be 
 faster (Windows XP platform); does someone have experience with this topic? I 
 found only very simple examples about plyr and parallel computations and I do 
 not have a working example for such kind of implementation (daply that return 
 a list of timeseries objects).

 Thanks in advance,
 Daniele Amberti


 set.seed(123)

 N <- 1
 X <- data.frame(
   ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
   DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4)),
   VALUE = runif(N*4), stringsAsFactors = FALSE)
 X <- X[sample(1:(N*4), N*4),]
 str(X)

 library(timeSeries)
 buildTimeSeriesFromDataFrame <- function(x, env)
 {
   {
     if(exists("xx", envir = env))
       assign("xx",
         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
           format = '%Y-%m-%d %H:%M:%S',
           zone = 'GMT', units = as.character(x$ID[1]))),
         envir = env)
     else
       assign("xx",
         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
           zone = 'GMT', units = as.character(x$ID[1])),
         envir = env)

     return(TRUE)
   }
 }

 tsBy <- function(...)
 {
   e1 <- new.env(parent = baseenv())
   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
       env = e1, simplify = TRUE)
   return(get("xx", e1))
 }

 Time01 <- replicate(100,
   system.time(tsBy(X, X$ID, simplify = TRUE))[[1]])
 median(Time01)
 hist(Time01)
 ATS <- tsBy(X, X$ID, simplify = TRUE)


 library(xts)
 buildXtsFromDataFrame <- function(x, env)
 {
   {
     if(exists("xx", envir = env))
       assign("xx",
         cbind(get("xx", env), xts(x$VALUE,
           as.POSIXct(x$DATE, tz = "GMT",
             format = '%Y-%m-%d %H:%M:%S'),
           tzone = 'GMT')),
         envir = env)
     else
       assign("xx",
         xts(x$VALUE, as.POSIXct(x$DATE, tz = "GMT",
             format = '%Y-%m-%d %H:%M:%S'),
           tzone = 'GMT'),
         envir = env)

     return(TRUE)
   }
 }

 xtsBy <- function(...)
 {
   e1 <- new.env(parent = baseenv())
   res <- by(X, X$ID, buildXtsFromDataFrame,
       env = e1, simplify = TRUE)
   return(get("xx", e1))
 }

 Time02 <- replicate(100,
   system.time(xtsBy(X, X$ID, simplify = TRUE))[[1]])
 median(Time02)
 hist(Time02)
 AXTS <- xtsBy(X, X$ID, simplify = TRUE)

 plot(density(Time02), col = "red",
   xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
 lines(density(Time01), col = "blue")
 # check equal, still a problem with names
 AXTS2 <- as.timeSeries(AXTS)
 names(AXTS2) <- names(ATS)
 identical(getDataPart(ATS), getDataPart(AXTS2))
 identical(time(ATS), time(AXTS2))

 # with plyr library and daply instead of by:
 library(plyr)

 tsDaply <- function(...)
 {
   e1 <- new.env(parent = baseenv())
   res <- daply(X, "ID", buildTimeSeriesFromDataFrame,
       env = e1)
   return(get("xx", e1))
 }

 Time03 <- replicate(100,
   system.time(tsDaply(X, X$ID))[[1]])
 median(Time03)
 hist(Time03)

 xtsDaply <- function(...)
 {
   e1 <- new.env(parent = baseenv())
   res <- daply(X, "ID", buildXtsFromDataFrame,
       env = e1)
   return(get("xx", e1))
 }

 Time04 <- replicate(100,
   system.time(xtsDaply(X, X$ID))[[1]])

 median(Time04)
 hist(Time04)

 plot(density(Time04), col = "red",
   xlim = c(
     min(c(Time02, Time01, Time03, Time04)),
     max(c(Time02, Time01, Time03, Time04))),
   ylim = c(0,100))
 lines(density(Time03), col = "blue")
 lines(density(Time02))
 lines(density(Time01))





 -Original Message-
 From: Daniele Amberti
 Sent: 11 March 2011 14:44
 To: r-help@r-project.org
 Subject: dataframe to a timeseries object

 I’m wondering which is the most efficient (time, than memory usage) way to 
 obtain a multivariate time series object from a data frame (the easiest data 
 structure to get data from a database through RODBC).
 I have a starting point using timeSeries or xts library (these libraries can 
 handle time zones), below you can find code to test.
 Merging parallelization (cbind) is something I’m thinking at (suggestions 
 from users with experience on this topic is highly appreciated), any 
 suggestion is welcome.
 My platform is Windows XP, R 2.12.1, latest available packages on CRAN for 
 timeSeries and xts.


 set.seed(123)

 N <- 9000
 X <- data.frame(
   ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
   DATE = rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4),
   VALUE = runif(N*4))

 library(timeSeries)
 buildTimeSeriesFromDataFrame <- function(x, env)
 {
   {
     if(exists("xx", envir = env))
       assign("xx",
         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d 
 

Re: [R] dataframe to a timeseries object

2011-03-14 Thread Hadley Wickham
That's a bit better, but you're still creating an object in the global
environment, when you should be returning it from your function.
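
To make that concrete, a package-free sketch of the return-and-combine pattern (setNames() stands in for the timeSeries() constructor in the quoted code):

```r
# toy data in the same shape as the quoted X
X <- data.frame(ID = rep(1:2, each = 2),
                DATE = rep(c("2000-01-01", "2000-01-02"), 2),
                VALUE = 1:4)

# build one object per ID and *return* it...
build_piece <- function(x) setNames(x$VALUE, x$DATE)
pieces <- lapply(split(X, X$ID), build_piece)

# ...then combine the returned pieces in one step
result <- do.call(cbind, pieces)
result
```

No shared environment, no assign()/get(), and each function can be tested on its own.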

Hadley

On Mon, Mar 14, 2011 at 8:54 AM, Daniele Amberti daniele.ambe...@ors.it wrote:
 Thanks Hadley for your interest; below is some code without using environments 
 (using timeSeries). I also made some experiments with .parallel = TRUE in 
 daply to create timeSeries objects and then bind them together, but I have some 
 problems.

 Thank You in advance,
 Daniele Amberti

 set.seed(123)
 N <- 1
 X <- data.frame(
  ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
  DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4)),
  VALUE = runif(N*4), stringsAsFactors = FALSE)
 X <- X[sample(1:(N*4), N*4),]
 str(X)
 head(X)

 #define a variable in global env
 ATS <- NULL

 buildTimeSeriesFromDataFrame <- function(x)
 {
  library(timeSeries)
  if(!is.null(ATS)) # in global env
  {
    # assign in global env
    ATS <<- cbind(ATS,
      timeSeries(x$VALUE, x$DATE,
        format = '%Y-%m-%d %H:%M:%S',
        zone = 'GMT', units = as.character(x$ID[1])))
  } else
  {
    # assign in global env
    ATS <<- timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
      zone = 'GMT', units = as.character(x$ID[1]))
  }
  return(TRUE)
 }

 tsDaply <- function(...)
 {
  # assign in global env, to clean previous run
  ATS <<- NULL
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame)
  return(res)
 }

 tsDaply(X, X$ID)
 head(ATS)

 #performance tests
 Time <- replicate(100,
  system.time(tsDaply(X, X$ID))[[1]])
 median(Time)
 hist(Time)

 ###
 #some multithread tests:
 ###

 library(doSMP)
 w <- startWorkers(workerCount = 2)
 registerDoSMP(w)

 # do not cbind ts, just create
 buildTimeSeriesFromDataFrame2 <- function(x)
 {
  library(timeSeries)
  xx <- timeSeries:::timeSeries(x$VALUE, x$DATE,
    format = '%Y-%m-%d %H:%M:%S',
    zone = 'GMT', units = as.character(x$ID[1]))
  return(xx)
 }

 #tsDaply2 <- function(...)
 #{
 #  library(plyr)
 #  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = TRUE)
 #  return(res)
 #}

 # tsDaply2 .parallel = TRUE returns an error:
 #Error in do.ply(i) : task 4 failed - subscript out of bounds
 #In addition: Warning messages:
 #1: anonymous: ... may be used in an incorrect context: '.fun(piece, ...)'
 #2: anonymous: ... may be used in an incorrect context: '.fun(piece, ...)'


 tsDaply2 <- function(...)
 {
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = FALSE)
  return(res)
 }
 # tsDaply2 .parallel = FALSE works, but the list discards the timeSeries class

 # bind after ts creation
 res <- tsDaply2(X, X$ID)
 # list is not a timeSeries object
 str(cbind(t(res)))
 res <- as.timeSeries(cbind(t(res)))

 stopWorkers(w)


 -Original Message-
 From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley 
 Wickham
 Sent: 14 March 2011 12:48
 To: Daniele Amberti
 Cc: r-help@r-project.org
 Subject: Re: [R] dataframe to a timeseries object - [ ] Message is from an 
 unknown sender

 Well, I'd start by removing all explicit use of environments, which
 makes your code very hard to follow.

 Hadley

 On Monday, March 14, 2011, Daniele Amberti daniele.ambe...@ors.it wrote:
 I found that plyr:::daply is more efficient than base:::by (am I doing 
 something wrong?), below updated code for comparison (I also fixed a couple 
 things).
 Function daply from plyr package has also a .parallel argument and I wonder 
 if creating timeseries objects in parallel and then combining them would be 
 faster (Windows XP platform); does someone has experience with this topic? I 
 found only very simple examples about plyr and parallel computations and I 
 do not have a working example for such kind of implementation (daply that 
 return a list of timeseries objects).

 Thanks in advance,
 Daniele Amberti


 set.seed(123)

 N <- 1
 X <- data.frame(
   ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
   DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4)),
   VALUE = runif(N*4), stringsAsFactors = FALSE)
 X <- X[sample(1:(N*4), N*4),]
 str(X)

 library(timeSeries)
 buildTimeSeriesFromDataFrame <- function(x, env)
 {
   {
     if(exists("xx", envir = env))
       assign("xx",
         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
           format = '%Y-%m-%d %H:%M:%S',
           zone = 'GMT', units = as.character(x$ID[1]))),
         envir = env)
     else
       assign("xx",
         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
           zone = 'GMT', units = as.character(x$ID[1])),
         envir = env)

     return(TRUE)
   }
 }

 tsBy <- function(...)
 {
   e1 <- new.env(parent = baseenv())
   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
       env = e1, simplify = TRUE)
   return(get("xx", e1))
 }

 Time01 <- replicate(100,
   system.time(tsBy(X, X$ID, simplify = TRUE))[[1]])
 median(Time01)
 hist(Time01)
 ATS <- tsBy(X, X$ID, simplify = TRUE)


 library(xts)
 buildXtsFromDataFrame <- function(x, env

Re: [R] increase a value by each group?

2011-03-14 Thread Hadley Wickham
On Mon, Mar 14, 2011 at 9:59 AM, ONKELINX, Thierry
thierry.onkel...@inbo.be wrote:
 Something like this?

 my_data=read.table("clipboard", header=TRUE)
 my_data$s_name <- factor(my_data$s_name)
 library(plyr)
 ddply(my_data, .(s_name), function(x){
        x$Im_looking <- x$Depth + as.numeric(x$s_name) / 100
        x
 })


I think you need factor in there:

ddply(my_data, .(s_name), function(x){
        x$Im_looking <- x$Depth + as.numeric(factor(x$s_name)) / 100
        x
})

or with transform:

ddply(my_data, "s_name", transform,
  Im_looking = Depth + as.numeric(factor(s_name)) / 100)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Need Assistance in Stacked Area plot

2011-03-13 Thread Hadley Wickham
You might try sending a reproducible example
(https://github.com/hadley/devtools/wiki/Reproducibility) to the
ggplot2 mailing list.

Hadley

On Wed, Feb 16, 2011 at 8:41 AM, Kishorenalluri
kishorenalluri...@gmail.com wrote:

 Dear All,

 I need the assistance to plot the staked area plot using ggplot2

 What i am expecting is to plot in X-axis the time(Shown in column1) from
 range of 0 to 100 seconds, and in the y axis the stable increment in area in
 both directions (less than and greater than zero) for columns 3, 4, 5 (
 Column1, Column2, column3). The example as follows
        TIME              concentration           Column1
 Column2               Column3
        0.     0.E+00    0.E+00    0.E+00
 0.E+00
        0.     0.E+00    1.06339151E-16 -1.45858050E-21
 -5.91725566E-19
        0.     5.38792107E-16 1.02157781E-17 -1.64419026E-20
 -7.66233765E-19
        1.     2.59545931E-15 3.42126227E-18 -1.98776066E-20
 -3.72669548E-19
        1.     2.91310885E-15    2.81003039E-18   -1.91286265E-20
 -2.38608440E-19
        2.     3.07852570E-15    2.50631096E-18   -1.81194864E-20
 -1.86739453E-19
        3.     3.23261641E-15   -5.56403736E-16 -1.77552840E-20
 -1.70484122E-19
        3.     3.35382008E-15 1.54158070E-17   -2.34217089E-20
 -2.07658923E-19
        4.     3.82413183E-15   -9.70815457E-13    0.E+00
 -6.79571364E-19
        4.     5.95542983E-15    6.80013097E-16   -4.50874919E-19
 -2.66777428E-19
        5.     4.39175250E-14 2.47867332E-15   -1.01608288E-18
 -9.76030255E-19
        6.     1.38894685E-13    5.61681417E-15   -3.93770327E-18
 -3.49248490E-18
        6.     3.68692195E-13    1.16035253E-14   -1.41445363E-17
 -1.22159013E-17
        7.     8.73269040E-13 1.79082686E-14   -3.66076039E-17
 -2.95735768E-17
        7.     1.76597984E-12    2.12332352E-14   -6.62184833E-17
 -4.70513477E-17
        8.     2.95030081E-12    2.25656057E-14   -9.60506242E-17
 -5.97356578E-17
        9.     4.26108735E-12 1.34254419E-14 -1.25090653E-16
 -6.95633425E-17
        9.     5.63056757E-12    2.32612717E-14 -1.53348074E-16
 -8.54969359E-17
       10.     7.01928856E-12    2.31851800E-14 -1.82546609E-16
 -1.02043519E-16
       10.     8.39638583E-12    2.31701226E-14   -2.11755800E-16
 -1.18810066E-16
       11.     9.76834605E-12 -5.45647674E-13   -2.40796726E-16
 -1.35796249E-16
       12.     1.11376775E-11   -9.78639513E-13   -2.69731656E-16
 -1.53027433E-16
       12.     1.25059438E-11    2.32773795E-14   -2.98461854E-16
 -1.70614303E-16
       13.     1.38742114E-11    2.33364991E-14 -3.27081730E-16
 -1.88394094E-16
       13.     1.52430947E-11    2.33958407E-14 -3.55573728E-16
 -2.06350128E-16
       14.     1.66127832E-11    2.34487161E-14 -3.83935678E-16
 -2.24396442E-16
       15.     1.79830524E-11    2.34906687E-14   -4.12155560E-16
 -2.42441983E-16
       15.     1.93533963E-11    2.26377072E-14 -4.39950727E-16
 -2.90534988E-16
       16.     2.06980593E-11    2.17171727E-14   -4.70446135E-16
 -3.56228328E-16
       16.     2.19820310E-11    1.87207801E-14   -4.98853596E-16
 -4.46861939E-16
       17.     2.32023004E-11    1.94966974E-14   -5.22695187E-16
 -5.78943073E-16
       18.     2.43484124E-11    1.78879137E-14   -5.38143362E-16
 -7.87629728E-16
       18.     2.54003017E-11    3.09723082E-11   -5.36126990E-16
 -1.16329436E-15
       19.     2.63149519E-11    9.87861573E-15   -4.79610661E-16
 -2.06287085E-15
       19.     2.69770178E-11   -8.01983206E-14   -6.65787469E-17
 -3.62969106E-15
       20.     2.50814500E-11    1.40265746E-09   -4.91364111E-17
 -3.37932814E-15
       21.     2.23899790E-11   -1.55489960E-14   -4.45522315E-17
 -3.35597818E-15
       21.     2.09705026E-11   -1.20672358E-14   -5.22849137E-17
 -3.35854728E-15
       22.     1.99397498E-11   -1.67958460E-14   -6.01502858E-17
 -3.24838081E-15
       22.     1.91117367E-11   -1.08397245E-14   -6.67337987E-17
 -3.01352774E-15
       23.     1.84409696E-11   -3.92677717E-15   -7.10576594E-17
 -2.67668939E-15
       24.     1.78951028E-11   -2.97739441E-15   -7.26367637E-17
 -2.28204667E-15
       24.     1.74505303E-11   -2.59241020E-15   -7.15967013E-17
 -1.88057466E-15
       25.     1.70886909E-11   -2.22676101E-11   -6.90344927E-17
 -1.51430500E-15
       25.     1.67938004E-11   -2.43300435E-15   -6.45959094E-17
 -1.20480757E-15
       26.     1.65527898E-11   -2.00735751E-15   -6.04828745E-17
 -9.62120866E-16
       27.     1.63547265E-11   -1.67859253E-15   -5.68950146E-17
 -7.80733120E-16
       27.     1.61904094E-11   -1.41262089E-15   -5.41786960E-17
 -6.51168814E-16
       28.     1.60527139E-11   -2.49439678E-12   -5.23952425E-17
 -5.62375603E-16
       28.     1.59360400E-11    2.62881626E-10   -5.15606096E-17
 -5.03612687E-16
       29.     

Re: [R] Vector of weekly dates starting with a given date

2011-03-09 Thread Hadley Wickham
On Wed, Mar 9, 2011 at 3:04 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
 Hello!

 I have a date (a Monday):

 date <- 20081229
 mydates <- as.Date(as.character(date), "%Y%m%d")

 What package would allow me to create a vector that starts with that
 date (mydates) and contains dates for 51 Mondays that follow it (so,
 basically, 51 dates separated by a week)?

library(lubridate)
mydates <- ymd(date) + weeks(0:51)
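
For comparison, the same vector can be built in base R without any extra packages:

```r
# 52 consecutive Mondays starting from 2008-12-29 (itself a Monday)
start <- as.Date("2008-12-29")
mondays <- seq(start, by = "week", length.out = 52)
head(mondays)
```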

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] R usage survey

2011-03-04 Thread Hadley Wickham
 Ok, I am very interested in what methods you plan to use that would be fit 
 under the description suitably analyzed for voluntary response data.  From 
 my training and experience the only suitable thing to do with voluntary 
 response data is to put it through the shredder, into the recycle bin, or use 
 as an example of what not to do in introductory textbooks.  Treating 
 voluntary response data (especially given the responses to your post you have 
 seen so far) as if it came from a proper random probability sample does not 
 fit the idea of suitable analysis.

Come on, that's a bit strong.  In real life, it's not always possible
to take a perfectly random sample and assume (at best) that missing
responses are completely at random. Even descriptive analysis on a
flawed sample is better than nothing at all.  Of course you need to be
extremely careful about making inferences about the wider population,
but it's not true that the only thing you can do with survey data is
to throw it in the trash.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] The L Word

2011-02-24 Thread Hadley Wickham
 Note however that I've never seen evidence for a *practical*
 difference in simple cases, and also of such cases as part of a
 larger computation.
 But I'm happy to see one if anyone has an interesting example.

 E.g., I would typically never use  0L:100L  instead of 0:100
 in an R script because I think code readability (and self
 explainability) is of considerable importance too.

But : casts to integer anyway:

 str(0:100)
 int [1:101] 0 1 2 3 4 5 6 7 8 9 ...

And performance in this case is (obviously) negligible:

 library(microbenchmark)
 microbenchmark(as.integer(c(0, 100)), times = 1000)
Unit: nanoseconds
                       min  lq median  uq   max
as.integer(c(0, 100))  712 791    813 896 15840

(mainly included as opportunity to try out microbenchmark)

So you save ~800 ns but typing two letters probably takes 0.2 s (100
wpm, ~ 5 letters per word + space = 0.1s per letter), so it only saves
you time if you're going to be calling it more than 125000 times ;)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] monitor variable change

2011-02-16 Thread Hadley Wickham
 You can replace the previous line by:

 browser(expr = (a != old.a))

 see ?browser for details.

I don't understand why you'd want to do that - using if is much more
readable to me (and is much more general!)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] monitor variable change

2011-02-16 Thread Hadley Wickham
One way to implement this functionality is with a task manager callback:

watch <- function(varname) {
  old <- get(varname)

  changed <- function(...) {
    new <- get(varname)
    if (!identical(old, new)) {
      message(varname, " is now ", new)
      old <<- new
    }
    TRUE
  }
  invisible(addTaskCallback(changed))
}

a <- 1
watch("a")
a <- 2


Hadley

On Wed, Feb 16, 2011 at 9:38 AM, Alaios ala...@yahoo.com wrote:
 Dear all I would like to ask you if there is a way in R to monitor in R when 
 a value changes.

 Right now I use the sprintf('my variables is %d \n', j) to print the value of 
 the variable.

 Is it possible when a 'big' for loop executes to open in a new window to 
 dynamically check only the variable I want to.

 If I put all the sprintf statements inside my loop then I get flooded with so 
 many messages that makes it useless.

 Best Regards
 Alex

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Error when modifying names of the object returned by get()

2011-02-15 Thread Hadley Wickham
 You can probably do this by constructing a call to the `names<-` replacement
 function, but it's really bad style.  Don't write R code that has external
 side effects if you can avoid it.  In this case, you'll almost certainly get
 more maintainable code by writing your function to return a copy of x with
 new names, rather than trying to modify the original.

And for that task, you might find setNames useful: It is most useful
at the end of a function definition where one is creating the object
to be returned and would prefer not to store it under a name just so
the names can be assigned.
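
For anyone unfamiliar with it, the pattern looks like this (the function and
data below are made up for illustration, not from the thread):

```r
# setNames() assigns names and returns the object in one step, so a
# function can return a named result without a temporary variable.
make_summary <- function(x) {
  setNames(c(mean(x), sd(x)), c("mean", "sd"))
}
make_summary(c(1, 2, 3))   # returns c(mean = 2, sd = 1)
```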

Hadley



Re: [R] how to order POSIXt objects ?

2011-02-14 Thread Hadley Wickham
It's a bit better to use xtfrm.
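
Applied to the problem in this thread, that would look something like the
sketch below (two illustrative times only; in Jon's code it would become
order(test2$DATE, -xtfrm(test2$mytime))):

```r
# xtfrm() maps a vector to numbers that sort the same way, so unary
# minus works even for classes (like POSIXt) that don't define it.
mytime <- as.POSIXct(c("2011-02-14 08:00:01", "2011-02-14 08:10:01"))
order(-xtfrm(mytime))   # descending: the later time comes first
```
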
Hadley

On Monday, February 14, 2011, jim holtman jholt...@gmail.com wrote:
 'unclass' it first(assuming that it is POSIXct)

 -unclass(mytime)

 On Mon, Feb 14, 2011 at 3:55 AM, JonC jon_d_co...@yahoo.co.uk wrote:

 I have a problem ordering by descending magnitude a POSIXt object. Can
 someone help please and let me know how to work around this. My goal is to
 be able to order my data by DATE and then by descending TIME.

 I have tried to include as much info as possible below. The problem stems
 from trying to read in times from a CSV file. I have converted the character
 time values to a POSIXt object using the STRPTIME function. I would like
 ideally to sort using the order function as below.

 test.sort <- order(test$DATE, -test$mytime)

 However, when I try this I receive the error as below :

 Error in `-.POSIXt`(test2$mytime) :
  unary '-' is not defined for POSIXt objects

 To make this easier to understand I have pasted my example data below with a
 list of R commands I have used. Any help or assistance would be appreciated.

 test2 <- read.csv("C:/Documents and Settings/Jonathan Cooke/My
 Documents/Downloads/test2.csv", sep=",")
 test2
        DATE     TIME
 1 18/01/2011 08:00:01
 2 18/01/2011 08:10:01
 3 18/01/2011 08:20:01
 4 18/01/2011 08:30:01
 5 19/01/2011 08:00:01
 6 19/01/2011 08:10:01
 7 19/01/2011 08:20:01
 8 19/01/2011 08:30:01

 test2$mytime <- strptime(test2$TIME, "%H:%M:%S")
 test2$mytime
 [1] 2011-02-14 08:00:01 2011-02-14 08:10:01 2011-02-14 08:20:01
 2011-02-14 08:30:01 2011-02-14 08:00:01
 [6] 2011-02-14 08:10:01 2011-02-14 08:20:01 2011-02-14 08:30:01

 test2
        DATE     TIME              mytime
 1 18/01/2011 08:00:01 2011-02-14 08:00:01
 2 18/01/2011 08:10:01 2011-02-14 08:10:01
 3 18/01/2011 08:20:01 2011-02-14 08:20:01
 4 18/01/2011 08:30:01 2011-02-14 08:30:01
 5 19/01/2011 08:00:01 2011-02-14 08:00:01
 6 19/01/2011 08:10:01 2011-02-14 08:10:01
 7 19/01/2011 08:20:01 2011-02-14 08:20:01
 8 19/01/2011 08:30:01 2011-02-14 08:30:01

 test2.sort <- order(test2$DATE, -test2$mytime)
 Error in `-.POSIXt`(test2$mytime) :
  unary '-' is not defined for POSIXt objects

 It's at this stage where I have got stuck as I'm new to R and don't yet know
 a way of getting around this error. Thanks in advance.

 JonC









 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-order-POSIXt-objects-tp3304609p3304609.html
 Sent from the R help mailing list archive at Nabble.com.





 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?





Re: [R] aggregate function - na.action

2011-02-07 Thread Hadley Wickham
On Mon, Feb 7, 2011 at 5:54 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
 Looking at the timings by each stage may help :

   system.time(dt <- data.table(dat))
    user  system elapsed
    1.20    0.28    1.48
   system.time(setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8))   # sort by the
 8 columns (one-off)
    user  system elapsed
    4.72    0.94    5.67
   system.time(udt <- dt[, list(y = sum(y, na.rm = TRUE)), by = 'x1, x2,
 x3, x4, x5, x6, x7, x8'])
    user  system elapsed
    2.00    0.21    2.20     # compared to 11.07s


 data.table doesn't have a custom data structure, so it can't be that.
 data.table's structure is the same as data.frame i.e. a list of vectors.
 data.table inherits from data.frame.  It *is* a data.frame, too.

 The reasons it is faster in this example include :
 1. Memory is only allocated for the largest group.
 2. That memory is re-used for each group.
 3. Since the data is ordered contiguously in RAM, the memory is copied over
 in bulk for each group using
 memcpy in C, which is faster than a for loop in C. Page fetches are
 expensive; they are minimised.

But this is exactly what I mean by a custom data structure - you're
not using the usual data frame API.

Wouldn't it be better to implement these changes to data frame so that
everyone can benefit? Or is it just too specialised to this particular
case (where I guess you're using that the return data structure of the
summary function is consistent)?

Hadley




Re: [R] aggregate function - na.action

2011-02-07 Thread Hadley Wickham
 Does FAQ 1.8 answer that ok ?
   Ok, I'm starting to see what data.table is about, but why didn't you
 enhance data.frame in R? Why does it have to be a new package?
   http://datatable.r-forge.r-project.org/datatable-faq.pdf

Kind of.  I think there are two sets of features data.table provides:

 * a compact syntax for expressing many common data manipulations
 * high performance data manipulation

FAQ 1.8 answers the question for the syntax, but not for the
performance related features.

Basically, I'd love to be able to use the high performance components
of data table in plyr, but keep using my existing syntax.  Currently
the only way to do that is for me to dig into your C code to
understand why it's fast, and then implement those ideas in plyr.

Hadley



Re: [R] aggregate function - na.action

2011-02-06 Thread Hadley Wickham
 There's definitely something amiss with aggregate() here since similar
 functions from other packages can reproduce your 'control' sum. I expect
 ddply() will have some timing issues because of all the subgrouping in your
 data frame, but data.table did very well and the summaryBy() function in the
 doBy package did OK:

Well, if you use the right plyr function, it works just fine:

system.time(count(dat, c("x1", "x2", "x3", "x4", "x5", "x6",
"x7", "x8"), "y"))
#   user  system elapsed
#  9.754   1.314  11.073

Which illustrates something that I've believed for a while about
data.table - it's not the indexing that speed things up, it's the
custom data structure.  If you use ddply with data frames, it's slow
because data frames are slow.  I think the right way to resolve this
is to to make data frames more efficient, perhaps using some kind of
mutable interface where necessary for high-performance operations.

Hadley



Re: [R] Counting number of rows with two criteria in dataframe

2011-01-26 Thread Hadley Wickham
On Wed, Jan 26, 2011 at 5:27 AM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Here are two more candidates, using the plyr and data.table packages:

 library(plyr)
 ddply(X, .(x, y), function(d) length(unique(d$z)))
  x y V1
 1 1 1  2
 2 1 2  2
 3 2 3  2
 4 2 4  2
 5 3 5  2
 6 3 6  2

 The function counts the number of unique z values in each sub-data frame
 with the same x and y values. The argument d in the anonymous function is a
 data frame object.

Another approach is to use the much faster count function:

count(unique(X))

Hadley



Re: [R] How to reshape wide format data.frame to long format?

2011-01-20 Thread Hadley Wickham
 I think I should be able to do this using the reshape function, but
 I cannot get it to work. I think I need some help to understand
 this...


 (If I could split the variable into three separate columns splitting
 by ".", that would be even better.)

 Use strsplit and [

Or colsplit, from reshape, that does this for you.
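
A sketch of how that looks, assuming the reshape2-style
colsplit(string, pattern, names) interface (the values here are invented):

```r
library(reshape2)
# Split "site.prof.depth"-style labels into three typed columns.
vars <- c("A.1.10", "A.2.20", "B.1.30")
wide <- colsplit(vars, "\\.", names = c("site", "prof", "depth"))
wide   # a data frame with columns site, prof, depth
```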

Hadley



Re: [R] ggplot2, geom_hline and facet_grid

2011-01-20 Thread Hadley Wickham
Ok, that's a known bug:
https://github.com/hadley/ggplot2/issues/labels/facet#issue/96

Thanks for the reproducible example though!

Hadley

On Thu, Jan 20, 2011 at 3:46 AM, Sandy Small sandy.sm...@nhs.net wrote:

   Thank you.
   That seems to work - also on my much larger data set.
   I'm not sure I understand why it has to be defined as a factor, but if it
   works...
   Sandy
   Dennis Murphy wrote:

     Hi Sandy:
     I can reproduce your problem given the data provided. When I change
     ecd_rhythm from character to factor, it works as you intended.
      str(lvefeg)
     List of 4     ### Interesting...
      $ cvd_basestudy: chr [1:10] CBP05J02 CBP05J02 CBP05J02 CBP05J02
     ...
      $ ecd_rhythm   : chr [1:10] AF AF AF AF ...
      $ fixed_time   : num [1:10] 30.9 33.2 32.6 32.1 30.9 ...
      $ variable_time: num [1:10] 29.4 32 30.3 33.7 28.3 ...
      - attr(*, row.names)= int [1:10] 1 2 3 4 5 6 7 9 10 11
      class(lvefeg)
      [1] "cast_df"    "data.frame"
      lvefeg$ecd_rhythm <- factor(lvefeg$ecd_rhythm)
      p <- qplot((variable_time + fixed_time) /2 , variable_time - fixed_time,
     data = lvefeg, geom='point')
     p
     p + facet_grid(ecd_rhythm ~ .) + geom_hline(yintercept=0)
     Does that work on your end?  (And thank you for the reproducible example.
     Using dput() allows us to see what you see, which is very helpful.)
     HTH,
     Dennis

    On Wed, Jan 19, 2011 at 1:30 PM, Small Sandy (NHS Greater Glasgow & Clyde)
    sandy.sm...@nhs.net wrote:

     Hi
     Still  having problems in that when I use geom_hline and facet_grid
     together I get two extra empty panels
     A reproducible example can be found at:
      https://gist.github.com/786894
     Sandy Small
     
      From: h.wick...@gmail.com [h.wick...@gmail.com] On Behalf Of Hadley
      Wickham [had...@rice.edu]
      Sent: 19 January 2011 15:11
      To: Small Sandy (NHS Greater Glasgow & Clyde)
      Cc: r-help@r-project.org

   Subject: Re: [R] ggplot2, geom_hline and facet_grid

   Hi Sandy,
   It's difficult to know what's going wrong without a small reproducible
    example (https://github.com/hadley/devtools/wiki/Reproducibility) -
   could you please provide one?  You might also have better luck with an
   email directly to the ggplot2 mailing list.
   Hadley
    On Wed, Jan 19, 2011 at 2:57 AM, Sandy Small sandy.sm...@nhs.net wrote:
    Having upgraded to R version 2.12.1 I still have the same problem:
   
    The combination of facet_grid and geom_hline produce (for me) 4 panels
    of which two are empty of any data or lines (labelled 1 and 2).
    Removing either the facet_grid or the geom_hline  gives me the result I
    would then expect.
   
    I have tried forcing the rhythm to be a factor
    Anyone have any ideas?
   
    Sandy
   
    Dennis Murphy wrote:
   
      Hi:
   
      The attached plot comes from the following code:
   
      g <- ggplot(data = lvexs, aes(x = (variable_time + fixed_time)/2, y =
      variable_time - fixed_time))
      g + geom_point() + geom_hline(yintercept = 0) + facet_grid(ecd_rhythm ~ .)
   
      Is this what you were expecting?
   
        sessionInfo()
      R version 2.12.1 Patched (2010-12-18 r53869)
      Platform: x86_64-pc-mingw32/x64 (64-bit)
   
      locale:
      [1] LC_COLLATE=English_United States.1252
      [2] LC_CTYPE=English_United States.1252
      [3] LC_MONETARY=English_United States.1252
      [4] LC_NUMERIC=C
      [5] LC_TIME=English_United States.1252
   
      attached base packages:
      [1] splines   stats     graphics  grDevices utils     datasets
      grid
      [8] methods   base
   
      other attached packages:
       [1] data.table_1.5.1 doBy_4.2.2       R2HTML_2.2       contrast_0.13
       [5] Design_2.3-0     Hmisc_3.8-3      survival_2.36-2  sos_1.3-0
       [9] brew_1.0-4       lattice_0.19-17  ggplot2_0.8.9    proto_0.3-8
      [13] reshape_0.8.3    plyr_1.4
   
      loaded via a namespace (and not attached):
      [1] cluster_1.13.2     digest_0.4.2       Matrix_0.999375-46
      reshape2_1.1
      [5] stringr_0.4        tools_2.12.1
   
      HTH,
      Dennis
   
      On Tue, Jan 18, 2011 at 1:46 AM, Small Sandy (NHS Greater Glasgow &
      Clyde) sandy.sm...@nhs.net
   wrote:
   
          Hi
   
          I have a long data set on which I want to do Bland-Altman style
          plots for each rhythm type
          Using ggplot2, when I use geom_hline with facet_grid I get an
          extra set of empty panels.
          I can't get it to do it with the Diamonds data supplied with
          the package so here is a (much abbreviated) example:
   
            lvexs
            cvd_basestudy ecd_rhythm fixed_time variable_time
          1       CBP05J02         AF    30.9000       29.4225
          2       CBP05J02         AF    33.1700       32.0350
          3       CBP05J02         AF    32.5700       30.2775
          4       CBP05J02

Re: [R] ggplot2, geom_hline and facet_grid

2011-01-19 Thread Hadley Wickham
Hi Sandy,

It's difficult to know what's going wrong without a small reproducible
example (https://github.com/hadley/devtools/wiki/Reproducibility) -
could you please provide one?  You might also have better luck with an
email directly to the ggplot2 mailing list.

Hadley

On Wed, Jan 19, 2011 at 2:57 AM, Sandy Small sandy.sm...@nhs.net wrote:
 Having upgraded to R version 2.12.1 I still have the same problem:

 The combination of facet_grid and geom_hline produce (for me) 4 panels
 of which two are empty of any data or lines (labelled 1 and 2).
 Removing either the facet_grid or the geom_hline  gives me the result I
 would then expect.

 I have tried forcing the rhythm to be a factor
 Anyone have any ideas?

 Sandy

 Dennis Murphy wrote:

   Hi:

   The attached plot comes from the following code:

    g <- ggplot(data = lvexs, aes(x = (variable_time + fixed_time)/2, y =
    variable_time - fixed_time))
    g + geom_point() + geom_hline(yintercept = 0) + facet_grid(ecd_rhythm ~ .)

   Is this what you were expecting?

     sessionInfo()
   R version 2.12.1 Patched (2010-12-18 r53869)
   Platform: x86_64-pc-mingw32/x64 (64-bit)

   locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

   attached base packages:
   [1] splines   stats     graphics  grDevices utils     datasets
   grid
   [8] methods   base

   other attached packages:
    [1] data.table_1.5.1 doBy_4.2.2       R2HTML_2.2       contrast_0.13
    [5] Design_2.3-0     Hmisc_3.8-3      survival_2.36-2  sos_1.3-0
    [9] brew_1.0-4       lattice_0.19-17  ggplot2_0.8.9    proto_0.3-8
   [13] reshape_0.8.3    plyr_1.4

   loaded via a namespace (and not attached):
   [1] cluster_1.13.2     digest_0.4.2       Matrix_0.999375-46
   reshape2_1.1
   [5] stringr_0.4        tools_2.12.1

   HTH,
   Dennis

    On Tue, Jan 18, 2011 at 1:46 AM, Small Sandy (NHS Greater Glasgow &
    Clyde) sandy.sm...@nhs.net wrote:

       Hi

       I have a long data set on which I want to do Bland-Altman style
       plots for each rhythm type
       Using ggplot2, when I use geom_hline with facet_grid I get an
       extra set of empty panels.
       I can't get it to do it with the Diamonds data supplied with
       the package so here is a (much abbreviated) example:

         lvexs
         cvd_basestudy ecd_rhythm fixed_time variable_time
       1       CBP05J02         AF    30.9000       29.4225
       2       CBP05J02         AF    33.1700       32.0350
       3       CBP05J02         AF    32.5700       30.2775
       4       CBP05J02         AF    32.0550       33.7275
       5       CBP05J02      SINUS    30.9175       28.3475
       6       CBP05J02      SINUS    30.5725       29.7450
       7       CBP05J02      SINUS    33.       31.1550
       9       CBP05J02      SINUS    31.8350       30.7000
       10      CBP05J02      SINUS    34.0450       33.4800
       11      CBP05J02      SINUS    31.3975       29.8150
          qplot((variable_time + fixed_time)/2, variable_time -
        fixed_time, data=lvexs) + facet_grid(ecd_rhythm ~ .) +
       geom_hline(yintercept=0)

       If I take out the geom_hline I get the plots I would expect.

       It doesn't seem to make any difference if I get the mean and
       difference separately.

       Can anyone explain this and tell me how to avoid it (and why
       does it work with the Diamonds data set?

       Any help much appreciated - thanks.

       Sandy

       Sandy Small
       Clinical Physicist
       NHS Forth Valley
       and
       NHS Greater Glasgow and Clyde



 

 This message may contain confidential information. If yo...{{dropped:21}}







Re: [R] dataframe: string operations on columns

2011-01-18 Thread Hadley Wickham
 how can I perform a string operation like strsplit(x, " ") on a column of a
 dataframe, and put the first or the second item of the split into a new
 dataframe column?
 (so that on each row it is consistent)

Have a look at str_split_fixed in the stringr package.
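
A minimal sketch (column values invented):

```r
library(stringr)
# str_split_fixed() returns a character matrix with a fixed number of
# pieces per row, so each piece maps cleanly onto a data frame column.
df <- data.frame(name = c("John Smith", "Jane Doe"), stringsAsFactors = FALSE)
parts <- str_split_fixed(df$name, " ", 2)
df$first <- parts[, 1]   # "John" "Jane"
df$last  <- parts[, 2]   # "Smith" "Doe"
```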

Hadley




Re: [R] how to cut a multidimensional array along a chosen dimension and store each piece into a list

2011-01-17 Thread Hadley Wickham
On Mon, Jan 17, 2011 at 2:20 PM, Sean Zhang seane...@gmail.com wrote:
 Dear R-Helpers,

 I wonder whether there is a function which cuts a multiple dimensional array
 along a chosen dimension and then store each piece (still an array of one
 dimension less) into a list.
 For example,

 arr <- array(seq(1*2*3*4), dim=c(1,2,3,4))  # I made a point to set the
 length of the first dimension to be 1, to test whether I need to worry about
 the drop=F option.

 brkArrIntoListAlong <- function(arr, alongWhichDim) {
 
 return(outlist)
 }

 I have tried splitter_a in plyr package but does not get what I want.

 library(plyr)
 plyr:::splitter_a(arr,3)

We'll you're really not supposed to call internal functions - you probably want:

alply(arr, 3)

but you don't say what is wrong with the output.
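
For reference, with the array from the question that call is:

```r
library(plyr)
arr <- array(seq(1 * 2 * 3 * 4), dim = c(1, 2, 3, 4))
# alply() splits the array along the chosen margin and returns a list
# with one element per slice.
pieces <- alply(arr, 3)
length(pieces)   # 3, one piece per level of the third dimension
```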

Hadley



Re: [R] Summing data frame columns on identical data

2011-01-17 Thread Hadley Wickham
 library(plyr)
 # Function to sum y by A-B combinations for a generic data frame
 dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y))

See count in plyr 1.4 for a much much faster way of doing this.
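
With plyr 1.4's count(df, vars, wt_var) interface, the weighted version of
dsum reduces to a single call (toy data invented here):

```r
library(plyr)
d <- data.frame(A = c("a", "a", "b"), B = c(1, 1, 2), y = c(2, 3, 5))
# wt_var sums y within each A-B combination; the sums come back in a
# column named "freq".
count(d, c("A", "B"), wt_var = "y")
```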

Hadley



Re: [R] rootogram for normal distributions

2011-01-16 Thread Hadley Wickham
 The normal distribution is a continuous distribution, i.e., the frequency
 for each observed value will essentially be 1/n and not converge to the
 density function. Hence, you would need to look at histogram or smoothed
 densities. Rootograms, on the other hand, are intended for discrete
 distributions.

I don't think that's true - rootograms are useful for both continuous
and discrete distributions.  See (e.g.) p 314 at
http://www.edwardtufte.com/tufte/tukey, where Tukey himself uses a
rootogram with a normal distribution.

Hadley



Re: [R] data prep question

2011-01-16 Thread Hadley Wickham
On Sun, Jan 16, 2011 at 5:48 AM,  bill.venab...@csiro.au wrote:
 Here is one way:

 con <- textConnection("
 + ID              TIME    OBS
 + 001             2200    23
 + 001             2400    11
 + 001             3200    10
 + 001             4500    22
 + 003             3900     45
 + 003             5605     32
 + 005             1800    56
 + 005             1900    34
 + 005             2300    23")
 dat <- read.table(con, header = TRUE,
 + colClasses = c("factor", "numeric", "numeric"))
 closeAllConnections()

 tmp <- lapply(split(dat, dat$ID),
 + function(x) within(x, TIME <- TIME - min(TIME)))
 split(dat, dat$ID) <- tmp

Or, in one line with ddply:

library(plyr)
ddply(dat, "ID", transform, TIME = TIME - min(TIME))

Hadley



Re: [R] median by geometric mean

2011-01-15 Thread Hadley Wickham
exp(median(log(x))) ?
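
Spelled out (gmedian is just an illustrative name):

```r
# For even-length input, median() averages the two middle values; doing
# that on the log scale and exponentiating gives their geometric mean.
gmedian <- function(x) exp(median(log(x)))
gmedian(c(1, 4, 9, 100))   # sqrt(4 * 9) = 6
median(c(1, 4, 9, 100))    # arithmetic version: (4 + 9) / 2 = 6.5
```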

Hadley

On Sat, Jan 15, 2011 at 10:26 AM, Skull Crossbones
witch.of.agne...@gmail.com wrote:
 Hi All,

 I need to calculate the median for even number of data points.However
 instead of calculating
 the arithmetic mean of the two middle values,I need to calculate their
 geometric mean.

 Though I can code this in R, possibly in a few lines, but wondering if there
 is
 already some built in function.

 Can somebody give a hint?

 Thanks in advance

        [[alternative HTML version deleted]]







Re: [R] Help with Data Transformation

2011-01-11 Thread Hadley Wickham
 The data is initially extracted from an SQL database into Excel, then saved 
 as a tab-delimited text file for use in R.

You might also want to look at the SQL packages for R so you can skip
this manual step. I'd recommend starting with
http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

Hadley



[R] [R-pkgs] plyr 1.4

2011-01-04 Thread Hadley Wickham
# plyr

plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:

  * fit the same model to each patient subset of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising

It's already possible to do this with base R functions (like split and
the apply family of functions), but plyr makes it all a bit easier
with:

  * totally consistent names, arguments and outputs
  * convenient parallelisation through the foreach package
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages
  * labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in functions.

You can find out more at http://had.co.nz/plyr/, including a 20 page
introductory guide, http://had.co.nz/plyr/plyr-intro.pdf.  You can ask
questions about plyr (and data-manipulation in general) on the plyr
mailing list. Sign up at http://groups.google.com/group/manipulatr

Version 1.4 (2011-01-03)
--

* `count` now takes an additional parameter `wt_var` which allows you to
  compute weighted sums. This is as fast, or faster than, `tapply` or `xtabs`.

* Really fix bug in `names.quoted`

* `.` now captures the environment in which it was evaluated. This should fix
  an esoteric class of bugs which no-one probably ever encountered, but will
  form the basis for an improved version of `ggplot2::aes`.

Version 1.3.1 (2010-12-30)
--

* Fix bug in `names.quoted` that interfered with ggplot2

Version 1.3 (2010-12-28)
--

NEW FEATURES

* new function `mutate` that works like transform to add new columns or
  overwrite existing columns, but computes new columns iteratively so later
  transformations can use columns created by earlier transformations. (It's
  also about 10x faster) (Fixes #21)
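
For instance:

```r
library(plyr)
df <- data.frame(x = 1:3)
# Unlike transform(), later arguments can use columns created by
# earlier arguments in the same call.
mutate(df, y = x * 2, z = y + 1)   # z is c(3, 5, 7)
```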

BUG FIXES

* split column names are no longer coerced to valid R names.

* `quickdf` now adds names if missing

* `summarise` preserves variable names if explicit names not provided (Fixes
  #17)

* `arrays` with names should be sorted correctly once again (also fixed a bug
  in the test case that prevented me from catching this automatically)

* `m_ply` no longer possesses .parallel argument (mistakenly added)

* `ldply` (and hence `adply` and `ddply`) now correctly passes on .parallel
  argument (Fixes #16)

* `id` uses a better strategy for converting to integers, making it possible
  to use for cases with larger potential numbers of combinations



___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] [R-pkgs] reshape2 1.1

2011-01-04 Thread Hadley Wickham
Reshape2 is a reboot of the reshape package. It's been over five years
since the first release of the package, and in that time I've learned
a tremendous amount about R programming, and how to work with data in
R. Reshape2 uses that knowledge to make a new package for reshaping
data that is much more focussed and much much faster.

This version improves speed at the cost of functionality, so I have
renamed it to `reshape2` to avoid causing problems for existing users.
 Based on user feedback I may reintroduce some of these features.

What's new in `reshape2`:

 * considerably faster and more memory efficient thanks to a much better
   underlying algorithm that uses the power and speed of subsetting to the
   fullest extent, in most cases only making a single copy of the data.

 * cast is replaced by two functions depending on the output type: `dcast`
   produces data frames, and `acast` produces matrices/arrays.

 * multidimensional margins are now possible: `grand_row` and `grand_col` have
   been dropped: now the name of the margin refers to the variable that has
   its value set to (all).

 * some features have been removed such as the `|` cast operator, and the
   ability to return multiple values from an aggregation function. I'm
   reasonably sure both these operations are better performed by plyr.

 * a new cast syntax which allows you to reshape based on functions
   of variables (based on the same underlying syntax as plyr):

 * better development practices like namespaces and tests.

Initial benchmarking has shown `melt` to be up to 10x faster, pure
reshaping `cast` up to 100x faster, and aggregating `cast()` up to 10x
faster.
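
A small example of the new interface (data invented):

```r
library(reshape2)
long <- data.frame(id = c(1, 1, 2, 2),
                   variable = c("a", "b", "a", "b"),
                   value = c(10, 20, 30, 40))
# dcast() returns a data frame: one row per id, one column per level of
# variable, filled with the matching values.
dcast(long, id ~ variable, value.var = "value")
```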

This work has been generously supported by BD (Becton Dickinson).

Version 1.1
---

* `melt.data.frame` no longer turns characters into factors

* All melt methods gain a `na.rm` and `value.name` arguments - these
  previously were only possessed by `melt.data.frame` (Fixes #5)



Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
Hi Frank,

I think you mean packagename::functionname?  The three colon form is
for accessing non-exported objects.  Otherwise, I think using :: vs
importFrom is functionally identical - either approach delays package
loading until necessary.
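The two styles can be sketched like this (the package and function names here are only illustrative):

```r
# Style 1: fully qualified call with '::' -- no NAMESPACE declaration
# needed, but the package must be installed when the call is evaluated
stats::quantile(1:100, probs = 0.5)

# Style 2: declare the import once in the package's NAMESPACE file,
#   importFrom(stats, quantile)
# after which a plain quantile(...) call resolves to stats::quantile
```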

Hadley

On Mon, Jan 3, 2011 at 9:45 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:

 In my rms package I use the packagename:::functionname construct in a number
 of places.  If I instead use the importFrom declaration in the NAMESPACE
 file would that require the package to be available, and does it load the
 package when my package loads?  If so I would keep using packagename::: to
 avoid up-front loading of other packages that are not always used.

 Thanks
 Frank


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/packagename-functionname-vs-importFrom-tp3172684p3172684.html
 Sent from the R help mailing list archive at Nabble.com.







Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
 I think you mean packagename::functionname?  The three colon form is
 for accessing non-exported objects.

 Normally two colons suffice, but within a package you need three to
 access exported but un-imported objects :)

Are you sure?

 Note that it is typically a design mistake to use ‘:::’ in your
 code since the corresponding object has probably been kept
 internal for a good reason.  Consider contacting the package
 maintainer if you feel the need to access the object for anything
 but mere inspection.

Hadley



Re: [R] packagename:::functionname vs. importFrom

2011-01-03 Thread Hadley Wickham
 Correct.  I'm doing this because of non-exported functions in other packages,
 so I need :::

But you really really shouldn't be doing that.  Is there a reason that
the package authors won't export the functions?

 I'd still appreciate any insight about whether importFrom in NAMESPACE
 defers package loading so that if the package is not actually used (and is
 not installed) there will be no problem.

Imported packages need to be installed - but it's the Imports vs.
Suggests vs. Depends field in DESCRIPTION that controls this
behaviour, not the namespace.
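A sketch of the three DESCRIPTION fields (the package names are illustrative): packages in Depends are attached along with yours, packages in Imports must be installed and are made available to your namespace, and packages in Suggests are optional and only needed when actually used.

```
Depends: R (>= 2.11.0)
Imports: plyr
Suggests: testthat
```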

Hadley




Re: [R] Writing a single output file

2010-12-30 Thread Hadley Wickham
It looks like you have csv files, so use read.csv instead of read.table.
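The difference is only in the argument defaults: read.csv presets header = TRUE and sep = ",". A minimal sketch of what goes wrong otherwise (the temporary file is illustrative):

```r
# write a two-line CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,yield_rate", "12/23/10,5.25"), tmp)

read.table(tmp)  # default sep = "" splits on whitespace: one lumped column
read.csv(tmp)    # sep = "," and header = TRUE: two proper columns
```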
Hadley

On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano milano_...@yahoo.com wrote:
 Dear sir,

 At the outset I sincerely apologize for reverting back a bit late as I was out 
 of office. I thank you for the guidance extended in response to my 
 earlier mail regarding Writing a single output file, where I was trying to 
 read multiple output files and create a single output data.frame. However, I 
 think things are not working, as I am mentioning below -


 # Your code

 setwd('/temp')
 fileNames <- list.files(pattern = "file.*.csv")

 input <- do.call(rbind, lapply(fileNames, function(.name)
 {
 .data <- read.table(.name, header = TRUE, as.is = TRUE)
 .data$file <- .name
 .data
 }))


 # This produces the following output containing only two columns; moreover the 
 date and yield_rates are clubbed together.



  date.yield_rate  file
 1   12/23/10,5.25 file1.csv
 2   12/22/10,5.19 file1.csv
 3   12/23/10,4.16 file2.csv
 4   12/22/10,4.59 file2.csv
 5   12/23/10,6.15 file3.csv
 6   12/22/10,6.41 file3.csv
 7   12/23/10,8.15 file4.csv
 8   12/22/10,8.68 file4.csv


 # and NOT the kind of output given below where date and yield_rates are 
 different.

 input
     date  yield_rate  file
 1 12/23/2010   5.25 file1.csv
 2 12/22/2010   5.19 file1.csv
 3 12/23/2010   5.25 file2.csv
 4 12/22/2010   5.19 file2.csv
 5 12/23/2010   5.25 file3.csv
 6 12/22/2010   5.19 file3.csv
 7 12/23/2010   5.25 file4.csv
 8 12/22/2010   5.19 file4.csv

 So when I tried following code to produce the required result, it throws me 
 an error.

 require(reshape)

 in.melt <- melt(input, measure = 'yield_rate')
 Error: measure variables not found in data: yield_rate

 # So I tried

 in.melt <- melt(input, measure = 'date.yield_rate')


 cast(in.melt, date.yield_rate ~ file)

 cast(in.melt, date ~ file)
 Error: Casting formula contains variables not found in molten data: date

 # If I try to change it as

 cast(in.melt, date.yield_rate ~ file)    # Gives following error.
 Error: Casting formula contains variables not found in molten data: 
 date.yield_rate

 Sir, it will be a great help if you can guide me, and once again I sincerely 
 apologize for reverting so late.

 Regards

 Amy


 --- On Thu, 12/23/10, jim holtman jholt...@gmail.com wrote:

 From: jim holtman jholt...@gmail.com
 Subject: Re: [R] Writing a single output file
 To: Amy Milano milano_...@yahoo.com
 Cc: r-help@r-project.org
 Date: Thursday, December 23, 2010, 1:39 PM

 This should get you close:

 # get file names
 setwd('/temp')
 fileNames <- list.files(pattern = "file.*.csv")
 fileNames
 [1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
 input <- do.call(rbind, lapply(fileNames, function(.name){
 +     .data <- read.table(.name, header = TRUE, as.is = TRUE)
 +     # add file name to the data
 +     .data$file <- .name
 +     .data
 + }))
 input
         date yield_rate      file
 1 12/23/2010       5.25 file1.csv
 2 12/22/2010       5.19 file1.csv
 3 12/23/2010       5.25 file2.csv
 4 12/22/2010       5.19 file2.csv
 5 12/23/2010       5.25 file3.csv
 6 12/22/2010       5.19 file3.csv
 7 12/23/2010       5.25 file4.csv
 8 12/22/2010       5.19 file4.csv
 require(reshape)
 in.melt <- melt(input, measure = 'yield_rate')
 cast(in.melt, date ~ file)
         date file1.csv file2.csv file3.csv file4.csv
 1 12/22/2010      5.19      5.19      5.19      5.19
 2 12/23/2010      5.25      5.25      5.25      5.25



 On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano milano_...@yahoo.com wrote:
 Dear R helpers!

 Let me first wish all of you Merry Christmas and Very Happy New year 2011

 Christmas day is a day of Joy and Charity,
 May God make you rich in both - Phillips Brooks

 ## 
 

 I have a process which generates a number of outputs. The R code for the same 
 is as given below.

 for(i in 1:n)
 {
 write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
 }

 Depending on value of 'n', I get different output files.

 Suppose n = 3, that means I am having three output csv files viz. 
 'output1.csv', 'output2.csv' and 'output3.csv'

 output1.csv
 date   yield_rate
 12/23/2010    5.25
 12/22/2010    5.19
 .
 .


 output2.csv
 date   yield_rate
 12/23/2010    4.16
 12/22/2010    4.59
 .
 .

 output3.csv
 date   yield_rate
 12/23/2010    6.15
 12/22/2010    6.41
 .
 .


 Thus all the output files have the same column names viz. date and yield_rate. 
 Also, I do need these files individually too.

 My further requirement is to have 

Re: [R] pdf() Export Problem: Circles Interpreted as Fonts from ggplot2 Graphics

2010-12-30 Thread Hadley Wickham
 The Inkscape user asked if there was any way that R could be coerced to use
 actual circles or paths for the points. I am not aware of a way to do this so
 any input from anyone here would be greatly appreciated.

pdf(..., useDingbats = F)
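Spelled out (a minimal sketch; the output path is illustrative):

```r
# useDingbats = FALSE makes pdf() draw point symbols as vector paths
# rather than as glyphs from the Dingbats font, which is what confuses
# Inkscape and some other PDF consumers
f <- tempfile(fileext = ".pdf")
pdf(f, useDingbats = FALSE)
plot(1:10, pch = 19)
dev.off()
```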

Hadley



[R] [R-pkgs] ggplot2 0.8.9 - Merry Christmas version

2010-12-24 Thread Hadley Wickham
ggplot2 

ggplot2 is a plotting system for R, based on the grammar of graphics,
which tries to take the good parts of base and lattice graphics and
avoid bad parts. It takes care of many of the fiddly details
that make plotting a hassle (like drawing legends) as well as
providing a powerful model of graphics that makes it easy to produce
complex multi-layered graphics.

To install or update, run:
install.packages(c("ggplot2", "plyr"))

Find out more at http://had.co.nz/ggplot2, and check out the nearly 500
examples of ggplot in use.  If you're interested, you can also sign up to
the ggplot2 mailing list at http://groups.google.com/group/ggplot2, or track
development at  http://github.com/hadley/ggplot2

ggplot2 0.8.9 (2010-12-24) 

A big thanks to Kohske Takahashi, who supplied the majority of
improvements in this release!

GUIDE IMPROVEMENTS

* key size: can specify width and height separately

* axis: can partially handle text rotation (issue #149)

* legend: now can specify the direction of elements by opts(legend.direction =
  "vertical") or opts(legend.direction = "horizontal"), and the legend box is
  center aligned if horizontal

* legend: now can override the alignment of the legend box by
  opts(legend.box = "vertical") or opts(legend.box = "horizontal")

* legend: now can override legend title alignment with opts(legend.title.align
  = 0) or opts(legend.title.align = 1)

* legend: can override legend text alignment with opts(legend.text.align = 0)
  or opts(legend.text.align = 1)

BUG FIXES

* theme_*: can specify font-family for all text elements other than geom_text

* facet_grid: fixed horizontal spacing when nrow of the horizontal strip >= 2

* facet_grid: now can manually specify the relative size of each row and column

* is.zero: now correctly works

* +: adding NULL to a plot returns the plot (idempotent under addition)
  (thanks to suggestion by Matthew O'Meara)

* +: meaningful error message if + doesn't know how to deal with an object
  type

* coord_cartesian and coord_flip: now can wisely zoom when wise = TRUE

* coord_polar: fix point division bugs

* facet_grid: now labels in facet_grid are correctly aligned when the number
  of factors is more than one (fixes #87 and #65)

* geom_hex: now correctly applies alpha to fill colour not outline colour
  (thanks to bug report from Ian Fellows)

* geom_polygon: specifying linetype now works (thanks to fix from Kohske
  Takahashi)

* hcl: can now set c and l, and preserves names (thanks to suggestion by
  Richard Cotton)

* mean_se: a new summary function to work with stat_summary that calculates
  mean and one standard error on either side (thanks to contribution from
  Kohske Takahashi)

* pos_stack: now works with NAs in x

* scale_alpha: setting limits to a range inside the data now works (thanks to
  report by Dr Proteome)

* scale_colour_continuous: works correctly with single continuous value (fixes
  #73)

* scale_identity: now show legends (fix #119)

* stat_function: now works without y values

* stat_smooth: draw line if only 2 unique x values, not three as previously

* guides: fixed #126

* stat_smooth: once again works if n > 1000 and se = FALSE (thanks to bug report
  from Thierry Onkelinx and fix from Kohske Takahashi)

* stat_smooth: works with locfit (fix #129)

* theme_text handles alignment better when angle = 90




Re: [R] Writing a single output file

2010-12-23 Thread Hadley Wickham
 input <- do.call(rbind, lapply(fileNames, function(.name){
 +     .data <- read.table(.name, header = TRUE, as.is = TRUE)
 +     # add file name to the data
 +     .data$file <- .name
 +     .data
 + }))

You can simplify this a little with plyr:

fileNames <- list.files(pattern = "file.*.csv")
names(fileNames) <- fileNames

input <- ldply(fileNames, read.table, header = TRUE, as.is = TRUE)

Hadley



Re: [R] Coding a new variable based on criteria in a dataset

2010-12-22 Thread Hadley Wickham
  It isn't quite convenient to read the data posted below into R
 (if it was originally tab-separated, that formatting got lost) but
 ddply from the plyr package is good for this: something like (untested)

  d <- with(data, ddply(data, interaction(UniqueID, Reason),
                    function(x) {
                          ## make sure x is sorted by date/time here
                          x$F_R <- c("F", rep("R", nrow(x) - 1))
                          x
                     }))

Or a little more succinctly:

d <- ddply(data, c("UniqueID", "Reason"), transform, F_R =
c("F", rep("R", nrow(x) - 1)))

Hadley



Re: [R] How to change the default location of x-axis in ggplot2?

2010-12-22 Thread Hadley Wickham
 In ggplot2, by default the x-axis is in the bottom of the graph and
 y-axis is in the left of the graph. I wonder if it is possible to:

 1. put the x axis in the top, or put the y axis in the right?
 2. display x axis in both the top and bottom?

These are on the to do list.

 3. display x axis in both sides, and each of them has individual scales?

ggplot2 will never support this because I think it's a really really bad idea.

Hadley




Re: [R] ggplot2 histograms

2010-12-13 Thread Hadley Wickham
Hi Sandy,

The way I'd describe it is that you expected the width parameter of
the position adjustment to be relative to the binwidth of the
histogram - but it's actually absolute, and it has to be this way
because there's currently no way for the position adjustment to know
about the parameters of the geom.

Hadley

On Wed, Dec 1, 2010 at 10:07 AM, Small Sandy (NHS Greater Glasgow 
Clyde) sandy.sm...@nhs.net wrote:
 Sorry, this should have gone to the whole list:

 Hadley

 I think I've sorted it out in my head but for the record, and just to be 
 sure...
 I guess what I was expecting was that the width parameter would be 
 independent of binwidth. Thus a width parameter of 0.5 would always indicate 
 an overlap of half the bar. In fact the width is determined as a fraction of 
 the binwidth, so if width is greater than binwidth the overlap will be with 
 adjacent bins not the bin it actually corresponds to.
 So in my example you can completely separate the data by putting
 ggplot(data=dafr, aes(x = d1, fill=d2)) + geom_histogram(binwidth = 2, 
 position = position_dodge(width=7))
 Obviously this isn't helpful.
 I think the rules are:
 1. the width of each bar equals binwidth divided by number of fill factors 
 (in my case two)
 2. total width of the visible bars would be centred on the centre of the bin
 3. overlap of the visible bars is governed by the width parameter of 
 position_dodge with 0 being complete overlap and binwidth being complete (but 
 touching) separation (More than binwidth would then mean space between the 
 bars - and presumably overlap with adjacent bars - I don't think this would 
 ever be useful).
 Hope this makes sense.
 Sandy

 Sandy Small
 Clinical Physicist
 NHS Forth Valley
 (Tel: 01324567002)
 and
 NHS Greater Glasgow and Clyde
 (Tel: 01412114592)
 
 From: h.wick...@gmail.com [h.wick...@gmail.com] On Behalf Of Hadley Wickham 
 [had...@rice.edu]
 Sent: 01 December 2010 14:27
 To: Small Sandy (NHS Greater Glasgow  Clyde)
 Cc: ONKELINX, Thierry; r-help@r-project.org
 Subject: Re: [R] ggplot2 histograms

 However if you do:
 ggplot(data=dafr, aes(x = d1, fill=d2)) + geom_histogram(binwidth = 1, 
 position = position_dodge(width=0.99))

 The position of first bin which goes from 0-2 appears to start at about 0.2 
 (I accept that there is some white space to the left of this) while the 
 position of the last bin (16-18) appears to start at about 15.8, so the 
 whole histogram seems to be wrongly compressed into the scale. In my real 
 data which has potentially 250 bins the problem becomes much more 
 pronounced. Has any one else noticed this? Is there a work around?

 What do you expect this to do?  The bars are one unit wide, but you've
 told position_dodge to treat them like they're only 0.99 units wide.

 Hadley


 

 This message may contain confidential information. If yo...{{dropped:21}}







Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

2010-12-06 Thread Hadley Wickham
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava
research.b...@gmail.com wrote:
 Dear R-Helpers:

 I am using trying to use *ddply* to extract min and max of a particular
 column in a data.frame. I am using two different forms of the function:


 ## var_name_to_split is a string -- something like "var1" which is the name
 of a column in data.frame

 ddply( df, .(as.name(var_name_to_split)), function(x) c(min(x[ , 3] , max(x[
 , 3]))) ## fails with an error - case 1
 ddply( df, var_name_to_split , function(x) c(min(x[ , 3] , max(x[ , 3])))
               ## works fine - case 2

 I can't understand why I get the error in case 1. Can someone help me
 please?

Why do you expect case 1 to work?

Hadley



Re: [R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

2010-12-06 Thread Hadley Wickham
It's easiest to see what's going on if you use eval.quoted directly:

eval.quoted(.(cyl), mtcars)
eval.quoted(.("cyl"), mtcars)
eval.quoted(.(as.name("cyl")), mtcars)

But you shouldn't need to do any syntactic hackery because the default
method automatically parses the string for you:

eval.quoted(as.quoted("cyl"), mtcars)

Hadley

On Mon, Dec 6, 2010 at 6:22 PM, Sunny Srivastava
research.b...@gmail.com wrote:
 Hi Hadley:
 I was trying to use ddply using the format . (var1) for splitting.
 I thought . ( as.name(grp) ) would do the same thing. But it does not. I was
 just trying to know my mistake. I am sorry if it is a basic question.
 Thank you and others for your reply.
 Best Regards,
 S.

 On Mon, Dec 6, 2010 at 5:28 PM, Hadley Wickham had...@rice.edu wrote:

 On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava
 research.b...@gmail.com wrote:
  Dear R-Helpers:
 
  I am using trying to use *ddply* to extract min and max of a particular
  column in a data.frame. I am using two different forms of the function:
 
 
  ## var_name_to_split is a string -- something like var1 which is the
  name
  of a column in data.frame
 
  ddply( df, .(as.name(var_name_to_split)), function(x) c(min(x[ , 3] ,
  max(x[
  , 3]))) ## fails with an error - case 1
  ddply( df, var_name_to_split , function(x) c(min(x[ , 3] , max(x[ ,
  3])))
                ## works fine - case 2
 
  I can't understand why I get the error in case 1. Can someone help me
  please?

 Why do you expect case 1 to work?

 Hadley








Re: [R] ggplot2 histograms

2010-12-01 Thread Hadley Wickham
 However if you do:
 ggplot(data=dafr, aes(x = d1, fill=d2)) + geom_histogram(binwidth = 1, 
 position = position_dodge(width=0.99))

 The position of first bin which goes from 0-2 appears to start at about 0.2 
 (I accept that there is some white space to the left of this) while the 
 position of the last bin (16-18) appears to start at about 15.8, so the whole 
 histogram seems to be wrongly compressed into the scale. In my real data 
 which has potentially 250 bins the problem becomes much more pronounced. Has 
 any one else noticed this? Is there a work around?

What do you expect this to do?  The bars are one unit wide, but you've
told position_dodge to treat them like they're only 0.99 units wide.

Hadley



Re: [R] ggplot2 histograms

2010-11-30 Thread Hadley Wickham
You may find it easier to use a frequency polygon, geom = "freqpoly".

Hadley

On Tue, Nov 30, 2010 at 2:36 PM, Small Sandy (NHS Greater Glasgow 
Clyde) sandy.sm...@nhs.net wrote:
 Hi

 With ggplot2 I can very easily create beautiful histograms but I would like 
 to put two histograms on the same plot. The histograms may be over-lapping.
 When they are overlapping the bars are shown on top of each other (so that 
 the overall height is the sum of the two). Is there any way to get them to 
 display overlapping (with smaller value in front, larger value behind) so 
 that the overall height is equal to the height of the largest value

 The following demonstrates the problem (there is probably a simple way to 
 generate the sequence in d1 but I don't know it and just threw this together 
 quickly)
 d1 <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7,7,8,8,9,6,7,7,8,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,12,12,12,13,13,14,15,15,16,16,16,17,17,17,17,18,18,18,18,18)

 d2 <- c(rep("a", 25), rep("b", 39))
 dafr <- data.frame(d1, d2)

 library(ggplot2)
 qplot(d1, data=dafr, fill=d2, geom='histogram', binwidth = 1)

 Many thanks for any help
 Sandy

 Sandy Small
 Clinical Physicist
 NHS Forth Valley
 and
 NHS Greater Glasgow and Clyde


 

 This message may contain confidential information. If yo...{{dropped:24}}







Re: [R] Help on running regression by grouping firms

2010-11-25 Thread Hadley Wickham
 res <- function(x) resid(x)
 ds_test$u <- do.call(c, llply(mods, res))

I'd be a little careful with this, because there's no guarantee the
results will be ordered in the same way as the input (and I'd also
prefer ds_test$u <- unlist(llply(mods, res)) or ds_test$u <-
laply(mods, res))

 In your case, where you have multiple grouping factors, you may have to be a
 little more careful, but the strategy is the same. You could possibly reduce
 it to a one-liner (untested):

 ds_test$u <- do.call(c, dlply(ds_test, .(individual), function(x)
 resid(lm(size ~ time, data = x))))

Or:

ds_test <- ddply(ds_test, .(individual), transform, u = resid(lm(size ~ time)))

which will guarantee the correct ordering.

Hadley



Re: [R] Go (back) from Rd to roxygen

2010-11-25 Thread Hadley Wickham
 Since roxygen is a great help to document R packages, I am wondering
 if there exists an approach to go back from the raw Rd files to
 roxygen-documentation? E.g. turn \author{Somebody} into @author
 Somebody. This sounds ridiculous, but I believe it helps in the long
 term for me to maintain R packages.

Have a look at https://gist.github.com/d1bbd44894a99a2e1d1b for a start.

Hadley



Re: [R] sum in vector

2010-11-17 Thread Hadley Wickham
 rowsum(value, paste(factor1, factor2, factor3))

That is dangerous in general, and always inefficient. Imagine factor1
is c("a", "a b") and factor2 is c("b c", "c").  Use interaction() with
drop = TRUE.
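The collision is easy to demonstrate (a minimal sketch):

```r
factor1 <- c("a", "a b")
factor2 <- c("b c", "c")

# both rows collapse to the same string, silently merging two groups
paste(factor1, factor2)

# interaction() keeps the two groups distinct; drop = TRUE discards
# the factor level combinations that never occur in the data
nlevels(interaction(factor1, factor2, drop = TRUE))
```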

Hadley




Re: [R] Extending the accuracy of exp(1) in R

2010-11-09 Thread Hadley Wickham
 Where the value of exp(1) as computed by R is concerned, you have
 been deceived by what R displays (prints) on screen. The default
 is to display any number to 7 digits of accuracy, but that is not
 the accuracy of the number held internally by R:

  exp(1)
  # [1] 2.718282
  exp(1) - 2.718282
  # [1] -1.715410e-07

I encourage anyone confused about this issue to study
http://en.wikipedia.org/wiki/The_Treachery_of_Images

And to watch
http://www.youtube.com/watch?v=ejweI0EQpX8
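For the record, the quoted point is easy to verify in base R:

```r
x <- exp(1)
format(x)               # "2.718282": 7 significant digits, the print default
format(x, digits = 15)  # many more digits of the stored double
x - 2.718282            # small but nonzero: the display is rounded, not the value
```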

Hadley



[R] How to detect if a vector is FP constant?

2010-11-08 Thread Hadley Wickham
Hi all,

What's the equivalent to length(unique(x)) == 1 if want to ignore
small floating point differences?  Should I look at diff(range(x)) or
sd(x) or something else?  What cut off should I use?

If it helps to be explicit, I'm interested in detecting when a vector
is constant for the purpose of visual display.  In other words, if I
rescale x to [0, 1] do I have enough precision to get at least 100
unique values.

Thanks!

Hadley



Re: [R] How to detect if a vector is FP constant?

2010-11-08 Thread Hadley Wickham
 I think this does what you want (borrowing from all.equal.numeric):

 all(abs(x - mean(x)) < .Machine$double.eps^0.5)

 with a vector of length 1 million, it took .076 seconds on a fairly old 
 system.

Hmmm, maybe I want:

all.equal(min(x), max(x))

?
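Comparing the candidates on a vector that is constant up to floating-point noise (a sketch):

```r
x <- c(1, 1 + 1e-12)  # "constant" except for tiny FP noise

length(unique(x)) == 1                           # FALSE: exact comparison
isTRUE(all.equal(min(x), max(x)))                # TRUE: tolerant comparison
all(abs(x - mean(x)) < .Machine$double.eps^0.5)  # TRUE: tolerant comparison
```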

Hadley



Re: [R] Heatmap construction problems

2010-11-07 Thread Hadley Wickham
It's hard to know without a minimal reproducible example, but you
probably want scale_fill_gradient or scale_fill_gradientn.

Hadley

On Thu, Oct 28, 2010 at 9:42 AM, Struchtemeyer, Chris
stru...@okstate.edu wrote:
 I am very new to R and don't have any computer programming experience
 whatsoever.  I am trying to generate a heatmap of the following data:



  Phylum,AI,AJT,BY,GA,Grt,Sm
  Acidobacteria,0.5,0.7,2.7,0.1,2.6,1.0
  Actinobacteria,33.7,65.1,9.7,2.0,3.9,2.1
  Bacteroidetes,9.7,5.6,0.7,13.2,41.1,21.6
  CCM11b,0.0,0.0,0.0,0.0,0.0,0.1
  Chlamydiae,0.1,0.1,0.0,0.0,1.0,0.2
  Chlorobi,0.0,0.0,0.0,0.0,0.7,1.0
  Chloroflexi,0.1,0.2,0.6,0.2,0.8,0.6
  Cyanobacteria,18.7,0.0,1.0,1.5,0.9,0.3
  Ellusimicrobiales,0.0,0.0,0.0,0.0,0.0,0.0
  Firmicutes,1.0,7.6,8.3,31.9,2.1,6.9
  Gemmatimonadetes,5.0,0.3,0.3,0.0,0.1,0.0
  GN02,0.0,0.0,0.0,0.0,0.0,0.5
  Nitrospirae,0.0,0.2,1.1,0.0,0.0,0.0
  NKB19,0.0,0.0,0.9,0.0,0.0,0.0
  OP8,0.0,0.1,0.0,0.0,0.0,0.0
  OP10,0.6,0.2,0.5,0.0,0.6,0.6
  Planctomycetes,0.9,0.5,6.5,2.2,2.0,2.3
  Alphaproteobacteria,7.8,10.7,21.8,12.2,5.3,26.8
  Betaproteobacteria,9.9,2.8,8.9,21.7,8.3,21.9
  Deltaproteobacteria,0.5,0.2,1.8,2.0,1.2,7.1
  Epsilonproteobacteria,0.0,0.0,0.0,0.1,0.0,0.2
  Gammaproteobacteria,4.0,2.5,8.0,9.4,24.7,5.4
  SC4,0.0,0.0,0.0,0.0,0.7,0.0
  SM2F11,0.0,0.0,0.0,0.0,0.2,0.0
  SPAM,0.0,0.1,0.0,0.0,0.1,0.1
  Synergistes,0.0,0.0,0.0,0.1,0.0,0.0
  Deinococcus-Thermus,0.0,0.0,0.0,0.0,0.0,0.0
  TM6,0.1,0.0,0.0,0.0,0.1,0.0
  TM7,0.0,0.1,0.4,0.0,0.4,0.1
  Verrucomicrobia,3.8,2.1,23.2,2.9,1.3,0.5
  WPS-2,0.0,0.0,0.1,0.0,0.0,0.0
  WS3,0.0,0.0,0.0,0.0,0.0,0.1
  Uncl Bacteria,3.7,1.2,3.7,0.4,1.9,0.8

 I am a microbiologist.  What I want to do is construct a heatmap showing the
 relative abundance of each phylum.  The far left column of my table contains
 all of the phylum names I observed in a set of 6 water samples and each of
 the columns to the right contains the relative abundance (%) of each phylum
 in each water sample.  I have tried constructing a heatmap using the ggplot
 guidelines listed at the following site:
 http://learnr.wordpress.com/2010/01/26/ggplot2-quick-heatmap-plotting/

 I can generate a heatmap using this method, but would like to alter the
 scale.  I would like it so that I can have a little more complex gradient
 ranging from 0% to the highest relative abundance that I observe in the
 above table (65.1%).  The default scale I get using the link above is just a
 relative intensity scale ranging from 1 to 5 (where white represents low
 percentages and steelblue represents high percentages).  This is alright
 but for phyla that are present at relative abundance of less than 5% all
 appear to be white (or non-existent).  Is there any way to fix this?  Any
 help would be greatly appreciated.

 Thanks,

 Chris





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] ggplot2: facet_grid with only one level does not display the graph with the facet_grid level in title

2010-11-07 Thread Hadley Wickham
This is on my to do list:
https://github.com/hadley/ggplot2/issues/labels/facet#issue/107

Hadley

On Thu, Oct 28, 2010 at 11:51 AM, Matthew Pettis
matthew.pet...@gmail.com wrote:
 Hi All,

 Here is the code that I'll be referring to:

 p <- ggplot(wastran.data, aes(PER_KEY, EVENTS))
 (p <- p +
    facet_grid( pool.short ~ .) +
    stat_summary(aes(y=EVENTS), fun.y = sum, geom="line") +
    opts(axis.text.x = theme_text(angle = 90, hjust=1), title="Events
 (15min.) vs. Time: Facet pool", strip.text.y = theme_text())
 )


 Now, depending on preceding parameters, the 'pool.short' factor variable in
 'wastran.data' can have one distinct factor level or it can have more than
 one.  When 'pool.short' has more than one factor level, the graph performs
 as I expect, with multiple rows of graphs with the value of the 'pool.short'
 variable displayed on the right-hand side of the graph.  When 'pool.short'
 has only one factor level, the value is NOT displayed on the right-hand
 side.  However, I'd still like it displayed, even though it has only one
 value.

 Can someone tell me how to tweak this code to make it still display when it
 has only 1 factor level?  If this code is unclear, I will be happy to take
 some time and generate an artificial but reproducible self-contained
 example.  I left in the stat_summary layer in this code in case it is
 interfering with the desired output (but I suspect it is superfluous, but I
 am not confident enough to say that with absolute certainty).

 Thanks,
 Matt

        [[alternative HTML version deleted]]





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] avoiding too many loops - reshaping data

2010-11-04 Thread Hadley Wickham
 Beware of facile comparisons of this sort -- they may be apples and nematodes.

And they also imply that the main time sink is the computation.  In my
experience, figuring out how to solve the problem takes
considerably more time than 18 / 1000 seconds, and so investing your
energy in learning idioms that apply in a wide range of situations is
far more useful than figuring out the fastest solution to a single
problem.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] overloading the generic primitive functions + and [

2010-10-28 Thread Hadley Wickham
 Note how S3 methods are dispatched only by reference to the first
 argument (on the left of the operator). I think S4 beats this by
 having signatures that can dispatch depending on both arguments.

That's somewhat of a simplification for primitive binary operators. R
actually looks up the method for both input classes and, if they differ,
warns ("Incompatible methods") and falls back to the internal default.
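
For example (hypothetical classes "foo" and "bar", each with its own "+"
method):

```r
"+.foo" <- function(e1, e2) "foo-plus"
"+.bar" <- function(e1, e2) "bar-plus"

a <- structure(1, class = "foo")
b <- structure(1, class = "foo")
z <- structure(1, class = "bar")

a + b  # "foo-plus": both arguments agree on the method, so it is used
# a + z warns "Incompatible methods" and falls back to the internal
# default, returning 1 + 1 = 2
```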

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Which version control system to learn for managing R projects?

2010-10-26 Thread Hadley Wickham
 git is where the world is headed.  This video is a little old:
 http://www.youtube.com/watch?v=4XpnKHJAok8, but does a good job
 getting the point across.

And lots of R users are using github already:

http://github.com/languages/R/created

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Forcing results from lm into datframe

2010-10-26 Thread Hadley Wickham
On Tue, Oct 26, 2010 at 11:55 AM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 When it comes to split, apply, combine, think plyr.

 library(plyr)
 ldply(split(afvtprelvefs, afvtprelvefs$basestudy),
         function(x) coef(lm (ef ~ quartile, data=x, weights=1/ef_std)))

Or do it in two steps:

models <- dlply(afvtprelvefs, "basestudy", function(x)
  lm(ef ~ quartile, data = x, weights = 1/ef_std))
coefs <- ldply(models, coef)

That way you can easily pull out other info:

rsq <- function(x) summary(x)$r.squared
ldply(models, rsq)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Which version control system to learn for managing R projects?

2010-10-26 Thread Hadley Wickham
 1. What is everyone else using?  The network effect is important since
 you want people to be able to access your repository and you want to
 leverage your knowledge of the version control system for other
 projects' repositories.  To that extent Subversion is the clear choice
 since its used on R-Forge, by R itself and on Google code (Google code
 also supports Mercurial).

There's a bit of a complication that you can use git (and mercurial I
assume) to work with svn repositories, but not vice versa.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Find index of a string inside a string?

2010-10-25 Thread Hadley Wickham
Or str_locate:

library(stringr)
str_locate("aabcd", "bcd")
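
The match positions can then feed straight into substr(), much like the
SAS index()/substr() pairing (assumes stringr is installed):

```r
library(stringr)

pos <- str_locate("aabcd", "bcd")   # one-row matrix: start = 3, end = 5
substr("aabcd", pos[1, "start"], pos[1, "end"])  # "bcd"
```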

Hadley

On Mon, Oct 25, 2010 at 5:53 AM, jim holtman jholt...@gmail.com wrote:
 I think what you want is 'regexpr':

 regexpr("bcd", "aabcd")
 [1] 3
 attr(,"match.length")
 [1] 3



 On Mon, Oct 25, 2010 at 7:27 AM, yoav baranan ybara...@hotmail.com wrote:

 Hi,
 I am searching for the equivalent of the function Index from SAS.

 In SAS: index("abcd", "bcd") will return 2 because "bcd" is located in the 2nd 
 cell of the "abcd" string.
 The equivalent in R should do this:
 myIndex <- foo("abcd", "bcd") # return 2.
 What is the function that I am looking for?

 I want to use the return value in substr, like I do in SAS.

 thanks, y. baranan.

        [[alternative HTML version deleted]]





 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Query on save.image()

2010-10-14 Thread Hadley Wickham
On Thu, Oct 14, 2010 at 11:56 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi,

 I do not believe you can use the save.image() function in this case.
 save.image() is a wrapper for save() with defaults for the global
 environment (your workspace).  Try this instead, I believe it does
 what you are after:

 myfun <- function(x) {
   y <- 5 * x + x^2
   save(list = ls(envir = environment(), all.names = TRUE),
        file = "myfile.RData", envir = environment())
 }

 Notice that for both save() and ls() I used the environment() function
 to grab the current environment.  This should mean that even if y
 was defined globally, it would save a copy of the version inside your
 function.

I think the defaults are actually ok in this case:

> myfun <- function(x) {
+   y <- 5 * x + x^2
+   save(list = ls(all.names = TRUE), file = "myfile.RData")
+ }
> print(load("myfile.RData"))
[1] "x" "y"

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] can't find and install reshape2??

2010-10-12 Thread Hadley Wickham
 My guess is you are using an outdated R version for which the rather new
 reshape2 package has not been compiled.

I wonder if install.packages() could detect this case (e.g. by also
checking if the source version is not available), and offer a more
informative error message.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Looking for a book/tutorial with the following context:

2010-10-08 Thread Hadley Wickham
 Do you also know more references about variables? Unfortunately this was a
 little bit short so I do not feel 100% sure I completely got it.

Try here:
http://github.com/hadley/devtools/wiki/Scoping

It's a work in progress.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] R: Tools for thinking about data analysis and graphics

2010-10-06 Thread Hadley Wickham
On Wed, Oct 6, 2010 at 4:05 PM, Michael Friendly frien...@yorku.ca wrote:
  I'm giving a talk about some aspects of language and conceptual tools for
 thinking about how
 to solve problems in several programming languages for statistical computing
 and graphics. I'm particularly
 interested in language features that relate to:

 o expressive power: ease of translating what you want to do into the results
 you want
 o elegance: how well does the code provide a simple human-readable
 description of what is done?
 o extensibility: ease of generalizing a method to wider scope
 o learnability: your learning curve (rate, asymptote)

 For R, some things to cite are (a) data and function objects, (b)
 object-oriented methods (S3  S4); (c) function mapping over data with
 *apply methods and plyr.

 What other language features of R should be on this list?  I would welcome
 suggestions (and brief illustrative examples).

 * missing values
 * subsetting
 * lexical scope and closures (goes along with first class functions)
 * built-in documentation
 * CRAN (not exactly a language feature, but important part of ecosystem)
 * thoughtful interactive features - e.g. a <- 10 doesn't print 10.
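
A small illustration of the lexical scope + closures bullet (base R):

```r
make_counter <- function() {
  i <- 0                # state captured by the closure below
  function() {
    i <<- i + 1         # updates the enclosing environment, not a global
    i
  }
}

cnt <- make_counter()
cnt()  # 1
cnt()  # 2
```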


Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] plyr: a*ply with functions that return matrices-- possible bug in aaply?

2010-10-04 Thread hadley wickham
  That is, I want to define something like the
 following using an a*ply method, but aaply gives a result in which the
 applied .margin(s) do not appear last in the
 result, contrary to the documentation for ?aaply.  I think this is a bug,
 either in the function or the documentation,
 but perhaps there's something I misunderstand for this case.

Maybe the documentation isn't clear but I think this is behaving as expected:

 * the margin you split on comes first in the output
 * followed by the dimensions created by the applied function.
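
A quick check of that ordering (assumes plyr is installed):

```r
library(plyr)

x <- array(1:24, c(2, 3, 4))
out <- aaply(x, 1, function(slice) slice * 10)  # each slice is 3 x 4
dim(out)  # split margin (length 2) first, then the 3 x 4 of the result
```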

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Issue with match.call

2010-10-04 Thread Hadley Wickham
 RFF <- function(qtype, qOpt, ...) {}
 i.e., I have two args that are compulsory and the rest are optional. Now when 
 my user passes the function call, I need to see what optional args are 
 defined and process accordingly... what I have so far is:

 RFF <- function(qtype, qOpt, ...) {
        mc <- match.call(expand.dots=TRUE)
  }

 I need to see which args have been sent out of
 vec <- c("flag", "sep", "dec") and define if-else conditions based on whether 
 they have been defined. How do I do this?

I think you'd be much better off defining those as arguments and using
missing(), rather than messing around with match.call (unless there is
a specific reason you need the unevaluated expressions).
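
A sketch of the missing()-based version (the argument names flag/sep/dec
come from the poster's vec; the bodies here are made up purely to show
the pattern):

```r
RFF <- function(qtype, qOpt, flag, sep, dec) {
  if (missing(sep)) sep <- ","                       # default when omitted
  if (!missing(flag) && isTRUE(flag)) qtype <- toupper(qtype)
  paste(qtype, qOpt, sep = sep)
}

RFF("a", "b")               # "a,b"
RFF("a", "b", flag = TRUE)  # "A,b"
```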

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Script auto-detecting its own path

2010-10-04 Thread Hadley Wickham
 I'm not sure this will solve the issue because if I move the script, I would
 still have to go into the script and edit the /path/to/my/script.r, or do
 I misunderstand your workaround?
 I'm looking for something like:
 file.path.is.here("myscript.r")
 and which would return something like:
 [1] "c:/user/Desktop/"
 so that regardless of where the script is, as long as the accompanying
 scripts are in the same directory, they can be easily sourced with something
 like:
 dirX <- file.path.is.here("MasterScript.r")
 source(paste(dirX, "AuxillaryFile.r", sep=""))

If you use relative paths like so:

# master.r
source("AuxillaryFile.r")

Then source("path/to/master.r", chdir = TRUE) will work.  Mastering
working directories is a much better idea than coming up with your own
workarounds.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] function which can apply a function by a grouping variable and also hand over an additional variable, e.g. a weight

2010-10-01 Thread Hadley Wickham
You might want to check out the plyr package.
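
ddply() passes the whole sub-data-frame to the function, so a weight
column travels along with the values. A sketch (plyr assumed installed;
the column names g/y/w are made up):

```r
library(plyr)

d <- data.frame(g = rep(c("a", "b"), each = 2),
                y = c(1, 3, 10, 30),
                w = c(1, 3, 3, 1))

res <- ddply(d, "g", summarise, wmean = weighted.mean(y, w))
res
#   g wmean
# 1 a   2.5
# 2 b  15.0
```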
Hadley

On Fri, Oct 1, 2010 at 6:05 AM, Werner W. pensterfuz...@yahoo.de wrote:
 Hi,

 I was wondering if there is an easy way to accomplish the following in R:
 Often I want to apply a function, e.g. weighted.quantile from the Hmisc 
 package
 to grouped subsets of a data.frame (grouping variable) but then I also need to
 hand over the weights which seems not possible with summaryBy or aggregate or
 the like.

 Is there a function to do this? Currently I do this with loops but it is very
 slow.

 I would be very grateful for any hints.

 Thanks,
  Werner








-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Script auto-detecting its own path

2010-09-29 Thread Hadley Wickham

 Forgive me if this question has been addressed, but I was unable to find 
 anything in the r-help list or in cyberspace. My question is this: is there a 
 function, or set of functions, that will enable a script to detect its own 
 path? I have tried file.path() but that was not what I was looking for. It 
 would be nice to be able to put all the related scripts I use in the same 
 folder with a master script and then source() them in that master script. 
 Problem is, the master script must first know where it is (without me 
 having to open it and retype the path every time I move it).

Instead of trying to work out where your script is located, when you
source it in, just make sure the working directory is set correctly:

source("/path/to/my/script.r", chdir = TRUE)

chdir is a very useful, but under-advertised, argument to source().
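
A runnable illustration, using tempdir() to stand in for wherever the
scripts happen to live:

```r
# set up a folder holding a master script and a helper it sources
dir <- file.path(tempdir(), "scripts")
dir.create(dir, showWarnings = FALSE)
writeLines('source("aux.r")', file.path(dir, "master.r"))
writeLines('msg <- "loaded"', file.path(dir, "aux.r"))

# chdir = TRUE makes 'dir' the working directory while sourcing,
# so the relative source("aux.r") inside master.r just works
source(file.path(dir, "master.r"), chdir = TRUE)
msg  # "loaded"
```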

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Problem with ggplot2 - Boxplot

2010-09-22 Thread Hadley Wickham
That implies you need to update your version of plyr.
Hadley

On Wed, Sep 22, 2010 at 4:10 AM, RaoulD raoul.t.dso...@gmail.com wrote:

 Hi,

 I am using ggplot2 to create a boxplot that summarizes a continuous
 variable. This code works fine for me on one PC; however, when I use it on
 another it doesn't.

 The structure of the dataset AHT_TopCD is SubReason=Categorical variable,
 AHT=Continuous variable.

 The code for the boxplot:
 require(ggplot2)
 qplot(SubReason,AHT,data=AHT_TopCD,geom="boxplot",main="AHT Spread - By
 Sub-Reason",xlab="AHT",colour=SubReason,alpha = I(1 / 5))+
 + coord_flip() + scale_x_discrete(breaks=NA)

 The error I get is  :
 Error in get("make_aesthetics", env = x, inherits = TRUE)(x, ...) :
  could not find function "empty"

 I do not understand this error. Can anyone help me with this please? Also,
 let me know if you have any questions or require clarification on anything
 here.

 Regards,
 Raoul
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Problem-with-ggplot2-Boxplot-tp2549970p2549970.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] parallel computation with plyr 1.2.1

2010-09-16 Thread Hadley Wickham
Yes, this was a little bug that will be fixed in the next release.
Hadley

On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette
debeaude...@ucdavis.edu wrote:
 Hi,

 I have been trying to use the new .parallel argument with the most recent
 version of plyr [1] to speed up some tasks. I can run the example in the NEWS
 file [1], and it seems to be working correctly. However, R will only use a
 single core when I try to apply this same approach with ddply().

 1. http://cran.r-project.org/web/packages/plyr/NEWS

 Watching my CPUs I see that in both cases only a single core is used, and they
 take about the same amount of time. Is there a limitation with how ddply()
 dispatches parallel jobs, or is this task not suitable for parallel
 computing?

 Cheers,
 Dylan


 Here is an example:

 library(plyr)
 library(doMC)
 registerDoMC(cores=2)

 # example data
 d <- data.frame(y=rnorm(1000), id=rep(letters[1:4], each=500))

 # function that wastes some time
 f <- function(x) {
   m <- vector(length=1)
   for(i in 1:1) {
     m[i] <- mean(sample(x$y, 100))
   }
   mean(m)
 }

 system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
 #  user  system elapsed
 #  2.740   0.016   2.766

 system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
 #  user  system elapsed
 #  2.720   0.000   2.726





 --
 Dylan Beaudette
 Soil Resource Laboratory
 http://casoilresource.lawr.ucdavis.edu/
 University of California at Davis
 530.754.7341





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Problems with reshape2 on Mac

2010-09-13 Thread Hadley Wickham
Hi Uwe,

The problem is most likely that the original poster doesn't have
the latest version of plyr.  I correctly declare this dependency in
the DESCRIPTION
(http://cran.r-project.org/web/packages/reshape2/index.html), but
unfortunately R doesn't seem to use this information at run time,
generally creating many error reports of this nature.

Hadley

2010/9/13 Uwe Ligges lig...@statistik.tu-dortmund.de:
 Is this a recent version of R? If so, please report to the maintainer.
 Otherwise, please also report that it does not work with your version of R
 so that the maintainer can add a version dependency.

 Best,
 Uwe Ligges

 On 13.09.2010 17:42, Paul Metzner wrote:

 Hi!

 I updated to reshape2 yesterday and tried to make it work. Unfortunately,
 it mainly throws error messages at me (good thing it's reshape2 1.0 and not
 reshape 2.0). The most recent is:

 Error in match.fun(FUN) : object 'id' not found

 When I manually create an object 'id', it says:

 Error in get(as.character(FUN), mode = "function", envir = envir) :
   object 'id' of mode 'function' was not found

 I assume that dcast is looking for a function by the name 'id' which is
 not present. I tried both Rdaemon within TextMate and R in the Terminal. I
 also tried both my own code and the airquality example. reshape is still
 working flawlessly. I also needed to load plyr manually to make another
 error message go away, that asked for 'as.quoted'.

 Best,
 Paul

 ---
 Paul Metzner

 Humboldt-Universität zu Berlin
 Philosophische Fakultät II
 Institut für deutsche Sprache und Linguistik

 Post: Unter den Linden 6 | 10099 Berlin | Deutschland
 Besuch: Dorotheenstraße 24 | 10117 Berlin | Deutschland

 +49-(0)30-2093-9726
 paul.metz...@rz.hu-berlin.de
 http://amor.rz.hu-berlin.de/~metznerp/






-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] post

2010-09-13 Thread Hadley Wickham
Have a look at:

Computing Thousands of Test Statistics Simultaneously in R by Holger
Schwender and Tina Müller, in
http://stat-computing.org/newsletter/issues/scgn-18-1.pdf
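
The direct-calculation route the poster asks about can be sketched in
base R. This reproduces t.test()'s default (two-sided Welch) p-value for
every row at once (assumptions: tab is a numeric matrix, and v5/v6 index
the two column groups):

```r
row_t_p <- function(tab, v5, v6) {
  x <- tab[, v5, drop = FALSE]
  y <- tab[, v6, drop = FALSE]
  nx <- ncol(x); ny <- ncol(y)
  mx <- rowMeans(x); my <- rowMeans(y)
  vx <- rowSums((x - mx)^2) / (nx - 1)        # row-wise sample variances
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2 <- vx / nx + vy / ny
  tstat <- (mx - my) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  2 * pt(-abs(tstat), df)                     # two-sided p-value per row
}

# quick check against t.test() on a small matrix
set.seed(1)
tab <- matrix(rnorm(30), nrow = 5)
p_fast <- row_t_p(tab, 1:3, 4:6)
p_slow <- sapply(1:5, function(i) t.test(tab[i, 1:3], tab[i, 4:6])$p.value)
all.equal(p_fast, p_slow)  # TRUE
```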

Hadley

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush usha...@yahoo.com wrote:
 Hello,

 I have a question regarding how to speed up the t.test on a large dataset. For 
 example, I have a table tab which looks like:

        a       b       c       d       e       f       g       h
 1
 2
 3
 4
 5

 ...

 10

 dim(tab) is 10 x 100



 I need to do the t.test for each row on the two subsets of columns, i.e. to 
 compare the a, b, d group against the e, f, g group at each row.


 subset 1:
        a       b       d
 1
 2
 3
 4
 5

 ...

 10


 subset 2:
        e       f       g
 1
 2
 3
 4
 5

 ...

 10

    10 t.tests for each row for these two subsets will take around 1 min. 
 The problem is that I have around 1 different combinations of such 
 subsets; therefore 1min*1
 =1min in the case if I use a for loop like this:

 n1=1 #number of subset combinations
 for (i1 in 1:n1) {

 n2=10 # number of rows
 i2=1
 for (i2 in 1:n2) {
        t.test(tab[i2,v5],tab[i2,v6])$p.value  #v5 and v6 are vectors 
 containing the variable names for the two subsets (they are different for 
 each loop)
        }

 }


 My question: is there a more efficient way to do these computations in a 
 short period of time? Any packages, like plyr? Maybe direct calculations 
 instead of using the t.test function?


 Thank you.







-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


