from:"Hadley Wickham"

Re: [R] ggplot2: geom_errorbarh()

2009-07-12 Thread hadley wickham

Hi Benoit,

What do you expect width to do?  You are already setting the left and
right extents with xmin and xmax.

Hadley

On Thu, Jul 9, 2009 at 10:37 AM, Benoit
Boulinguiez wrote:
> Hi all,
>
> quick question: is the optional argument "width" effective in the
> geom_errorbarh() layer of ggplot?
> Because I can't get it to work on this graph
> http://www.4shared.com/file/116919103/93488d88/iso_2PrsH.html
>
>
> pdf(file = "iso_2PrsH.pdf", width = 7, height = 7)
> NC60.iso.graph<-ggplot(
>  NC60.DATA
>  ,aes(Ce,Qe)) +
>  geom_point(col=MaCouleur1, size=4) +
>
>  geom_errorbar(
>  aes(ymax = NC60.DATA$Qe+NC60.DATA$sdQe
>   ,ymin=NC60.DATA$Qe-NC60.DATA$sdQe)
>   ,colour=alpha("black",0.4)
>   ,width=1) +
>
>  geom_errorbarh(
>  aes(xmax = NC60.DATA$Ce+NC60.DATA$sdCe
>   ,xmin=NC60.DATA$Ce-NC60.DATA$sdCe)
>   ,colour=alpha("black",0.4)
>   ,width=1) +
>
>  geom_line(data=NC60.Res4.curve
>  ,aes(x,y)
>  ,size=1
>  ,colour=alpha("black",0.5)) +
>  xlab(C[e]~(mmol/m^3)) +
>  ylab(q[e]~(mmol/m^3))
>
> print(NC60.iso.graph)
> dev.off()
>
> Regards/Cordialement
>
> -
> Benoit Boulinguiez
> Ph.D student
> Ecole de Chimie de Rennes (ENSCR) Bureau 1.20
> Equipe CIP UMR CNRS 6226 "Sciences Chimiques de Rennes"
> Avenue du Général Leclerc
> CS 50837
> 35708 Rennes CEDEX 7
> Tel 33 (0)2 23 23 80 83
> Fax 33 (0)2 23 23 81 20
>   http://www.ensc-rennes.fr/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ReShape chicks example - line plots

2009-07-06 Thread hadley wickham

On Mon, Jul 6, 2009 at 8:22 PM, Mark Knecht wrote:
> Hi,
>   In the examples from the ReShape package there is a simple example
> of using melt followed by cast that produces a smallish amount of
> output about the chicks database. Here's the code:
>
> library(reshape)
>
> names(ChickWeight) <- tolower(names(ChickWeight))
> chick_m <- melt(ChickWeight, id=2:4, na.rm=TRUE)
> DietResults <- cast(chick_m, diet + chick ~ time)
> DietResults
>
>   My challenge is to extract and plot only a portion of this data.
>
>   I would like to plot the data for each chick that participated in
> diet 1 only. Assume that the numbered column names (0,2,4, ...)
> represent time on the diet and will be the X axis. Y values on the
> plot will be the value in the table. (chick weight) Y maximum should
> be larger than the max value in the diet 1 portion of the table.
> Additionally if a chick's number is even I would like to plot its
> results in green, if it's odd then plot in red. The plot should use a
> line type so that in the general case I could trace an individual
> chick's progress on the diet. I don't care if I use plot vs any other
> command that would make a plot with colored lines. I would *prefer*
> that the code discovers where in DietResults the column entitled "0"
> is as I don't know where the beginning of the data will be based on
> how many variables I bin for in cast.

Generally, I think it's easier to work with longitudinal data with
time as its own column.  It makes plotting and analysis much easier:

library(ggplot2)
qplot(time, value, data = chick_m, group = chick,
  colour = as.numeric(as.character(chick)) %% 2, geom = "line")

It's far easier to see what's going on:

 * on the x-axis, time
 * on the y-axis, value (weight)
 * grouped by chick
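
A note on the as.numeric(as.character(chick)) step, as a small sketch with a made-up factor (the labels below are illustrative, not taken from ChickWeight): calling as.numeric() directly on a factor returns the internal level codes rather than the numbers written in the labels, so the even/odd colouring would be wrong without the as.character() conversion.

```r
# Hypothetical chick ids stored as a factor, as the chick column is
chick <- factor(c("10", "3", "7"))

# Level codes: the levels sort as strings ("10", "3", "7"),
# so the codes are 1, 2, 3 and say nothing about the ids themselves
codes <- as.numeric(chick)

# The numbers actually written in the labels
ids <- as.numeric(as.character(chick))

# Parity of the real ids: 0 for even, 1 for odd
parity <- ids %% 2
```

Mapping parity to a factor (for example factor(parity, labels = c("even", "odd"))) would probably give a discrete two-colour legend rather than a continuous scale.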

Hadley


-- 
http://had.co.nz/


Re: [R] NA values trimming

2009-07-06 Thread hadley wickham

On Mon, Jul 6, 2009 at 12:12 AM, nyk wrote:
>
> Thanks for your reply! This is what I was looking for!
> I'm using
> nas1 <- apply(data_matrix,1,function(x)sum(is.na(x))/nrow(data_matrix))
> nas2 <- apply(data_matrix,2,function(x)sum(is.na(x))/ncol(data_matrix))

You can simplify this a little:

perc_missing <- function(x) mean(is.na(x))

nas1 <- apply(data_matrix,1, perc_missing)
nas2 <- apply(data_matrix,2, perc_missing)

or if your matrix is really big the following should be faster:

nas1 <- rowMeans(is.na(data_matrix))
nas2 <- colMeans(is.na(data_matrix))
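
As a quick check on a toy matrix (made up for illustration), the approaches above can be compared side by side. One thing worth noting: mean(is.na(x)) divides by the length of the row or column being summarised, whereas the quoted code divides a row's count by nrow() and a column's count by ncol(), which only agrees when the matrix is square.

```r
# Toy 2 x 3 matrix with three missing values
m <- matrix(c(1,  NA, 3,
              NA, NA, 6), nrow = 2, byrow = TRUE)

perc_missing <- function(x) mean(is.na(x))

# apply() version: one proportion per row / per column
by_row_apply <- apply(m, 1, perc_missing)
by_col_apply <- apply(m, 2, perc_missing)

# Vectorised version: same numbers, no R-level loop
by_row_fast <- rowMeans(is.na(m))
by_col_fast <- colMeans(is.na(m))
```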

Hadley

-- 
http://had.co.nz/


Re: [R] ggplot2: Boxplot with given box size

2009-07-05 Thread hadley wickham

Hi Malcolm,

You need to tell geom_boxplot not to use stat_boxplot:
geom_boxplot(aes(lower=y_q1, upper=y_q3, middle=y_med, ymin=y_min,
ymax=y_max), stat = "identity")

Hadley

On Mon, Jul 6, 2009 at 6:55 AM, Malcolm Ryan wrote:
> Is there any way in ggplot2 to set the aesthetics for a geom_boxplot
> directly, rather than having them computed by an implicit stat_boxplot?
>
> If I try:
>
> ggplot(data = t, aes(x = factor(x))) + geom_boxplot(coef=NULL,
> aes(lower=y_q1, upper=y_q3, middle=y_med, ymin=y_min, ymax=y_max))
>
> I get the error:
>
> Error: stat_boxplot requires the following missing aesthetics: y
>
> Malcolm



-- 
http://had.co.nz/


Re: [R] OK - I got the data - now what? :-)

2009-07-05 Thread hadley wickham

>   I think the root cause of a number of my coding problems in R right
> now is my lack of skills in reading and grabbing portions of the data
> out of arrays. I'm new at this. (And not a programmer) I need to find
> some good examples to read and test on that subject. If I could locate
> which column was called C1, then read row 3 from C1 up to the last
> value before a 0, I'd have proper data to plot for one line. Repeat as
> necessary through the array and I get all the lines. Doing the lines
> one at a time should allow me the opportunity to apply color or not
> plot based on values in the first few columns.
>
> Thanks,
> Mark
>
> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> test<-round(test,2)
>
> #Make array ragged
> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
> test$C6[7]<-0
> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>
> #Print array
> test

Are the zeros always going to be arranged like this? i.e. for each
experiment, is there a point after which all later values are zero?  If
so, the following is a much simpler way of getting to the core of your
data, without fussing with overly complicated matrix indexing:

library(reshape)
testm <- melt(test, id = c("A", "B"))
subset(testm, value > 0)
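
If the reshape package is not to hand, roughly the same long form can be built with base R. This is only a sketch on a smaller made-up data frame; the column names variable and value are chosen here to mimic melt's output and are otherwise an assumption.

```r
# Smaller stand-in for the ragged `test` data frame
test <- data.frame(A = 1:4, B = 100,
                   C1 = c(0.5, 0.2, 0.9, 0.4),
                   C2 = c(0.3, 0.7, 0,   0.1),
                   C3 = c(0.8, 0,   0,   0.6))

value_cols <- c("C1", "C2", "C3")

# stack() puts the measured columns end to end: `values` plus an `ind` factor
stacked <- stack(test[value_cols])

# Repeat the id columns once per stacked column so the rows line up
long <- data.frame(A = rep(test$A, times = length(value_cols)),
                   B = rep(test$B, times = length(value_cols)),
                   variable = stacked$ind,
                   value = stacked$values)

# Drop the padding zeros, as in subset(testm, value > 0)
kept <- subset(long, value > 0)
```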

I suspect you will also find this form easier to plot and analyse.

Hadley

-- 
http://had.co.nz/


Re: [R] Skeleton Package to Flesh Out?

2009-07-05 Thread hadley wickham

Also make sure to check roxygen (from roxygen.org) - it makes package
documentation much much easier.  Ironically, the documentation for
roxygen currently leaves something to be desired but I think Peter and
Manuel are working on it.

Hadley

On Sat, Jul 4, 2009 at 3:59 PM, Jason Rupert wrote:
>
> By any chance is there a skeleton package to use as a template to develop an 
> R package?
>
> I downloaded "Writing R Extensions", which was evidently updated pretty 
> recently, but I did not see any references (and of course I may have totally 
> missed it) to a package template to use as a go by.
>
> Does such a skeleton package exist?
>
> Thanks again for all your help, as I've got the code, but I just need to 
> stick it into an R package.
>
> Thanks again.

-- 
http://had.co.nz/


Re: [R] What command lists everything in a package?

2009-07-05 Thread hadley wickham

> 2) Related to the above, how do I tell what packages are currently
> loaded at any given time so that I don't waste time loading things
> that are already loaded? search() tells me what's available, but
> what's loaded? The best I can find so far goes like this:

Loading something a second time takes hardly any time, so why worry about it?

Hadley

-- 
http://had.co.nz/


Re: [R] help with dealing with integer(0) returns from grep used within a conditional loop

2009-07-04 Thread hadley wickham

On Sat, Jul 4, 2009 at 7:56 PM, Mark Kimpel wrote:
> I am using grep to locate colnames to automate a report build and have
> run into a problem when a colname is not found. The use of integer(0)
> in a conditional statement seems to be a no no as it has length 0.
> Below is a self-contained trivial example. I would like to get
> something like "NA" or -1 for the position when it is not found OR
> learn a way to use integer(0) or some "cast" of it in a logical
> statement. Example, output, and sessionInfo follow. Thanks, Mark

You might also consider using grepl instead of grep.  grepl works just
like grep, but returns a logical vector.
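
A minimal sketch of the difference, using made-up column names: grep gives integer(0) when nothing matches, which cannot be used directly in if(), while grepl always gives one TRUE/FALSE per element, so any() reduces it to a single logical.

```r
cols <- c("id", "weight", "height")

# No column name contains "age", so grep returns a zero-length vector
hit <- grep("age", cols)

# grepl returns a logical vector the same length as cols
has_age <- any(grepl("age", cols))
has_weight <- any(grepl("weight", cols))

# If a position is still wanted, fall back to NA when nothing matches
pos <- if (has_age) which(grepl("age", cols))[1] else NA_integer_
```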

Hadley

-- 
http://had.co.nz/


Re: [R] Passing expression as argument to do.call

2009-07-02 Thread hadley wickham

On Thu, Jul 2, 2009 at 3:34 PM, Sebastien
Bihorel wrote:
> Dear R-users,
>
> I would like to know how expressions can be passed as arguments to do.call.
> As illustrated in the short example below, concatenating list
> objects and an expression creates an expression object, which is not an
> acceptable argument for do.call. Is there a way to avoid that?
>
> Thank you
>
> Sebastien
>
>
> foo <- list(x=1:10, y=1:10)
> mylist <- list(pch=6, col=2)
> title <- "1 microgram"
> title2 <- expression("1 " * mu * "g")
>
> do.call(plot, c(foo, mylist, main=title))
>
> class(c(foo, mylist, main=title2))
>
> do.call(plot, c(foo, mylist, main=title2))

do.call(plot, c(foo, mylist, list(main=title2)))

Both foo and mylist are already lists, but title2 isn't, so c() coerces
the whole result to an expression unless title2 is wrapped in list() first.
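
The coercion can be seen directly in a short sketch. Here title2 is written with * as the plotmath juxtaposition operator, which is an assumption about what was intended, and pdf(NULL) is used only so that the sketch runs without opening a window or writing a file.

```r
foo <- list(x = 1:10, y = 1:10)
mylist <- list(pch = 6, col = 2)
title2 <- expression("1 " * mu * "g")   # assumed form of the title

# c() promotes to the "highest" type present, and expression outranks list
bad_args <- c(foo, mylist, main = title2)

# Wrapping the expression in list() keeps everything a plain list
good_args <- c(foo, mylist, list(main = title2))

class(bad_args)
class(good_args)

pdf(NULL)                     # null device, nothing is written
do.call(plot, good_args)
invisible(dev.off())
```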

Hadley

-- 
http://had.co.nz/


Re: [R] Learning S3

2009-07-02 Thread Hadley Wickham

On Thu, Jun 18, 2009 at 12:08 PM, Dirk Eddelbuettel wrote:
>
> On 18 June 2009 at 09:36, Bert Gunter wrote:
> | -- or Chapter 4 in S PROGRAMMING? (you'll need to determine if it's "reader
> | friendly")
>
> +1
>
> It helped me a lot too back in the day.  But I am wondering if there are good
> current alternatives.   Hadley: if you could, please send a summary back to
> the list.

Unfortunately there doesn't seem to have been much change since then.
The resources I was pointed to include:

?UseMethod
?Math

Chapter 3.10 in MASS
Chapter 4 in S programming

Also see R.methodsS3 for some functions to make programming with S3
methods easier.
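
For anyone following along, the core of S3 fits in a few lines. The sketch below is only an illustration; the class and function names (circle, square, area) are made up and do not come from any of the resources listed.

```r
# Constructors: an S3 object is an ordinary value with a class attribute
new_circle <- function(r) structure(list(r = r), class = "circle")
new_square <- function(side) structure(list(side = side), class = "square")

# Generic: UseMethod() dispatches on the class of the first argument
area <- function(shape, ...) UseMethod("area")

# Methods are plain functions named generic.class
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# Fallback when no class-specific method exists
area.default <- function(shape, ...) {
  stop("don't know how to compute the area of this object")
}

# Methods can also be written for existing generics such as format()
format.circle <- function(x, ...) paste0("<circle r=", x$r, ">")
```

Calling area(new_circle(2)) dispatches to area.circle, and area(1) falls through to area.default.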

Hadley

-- 
http://had.co.nz/


Re: [R] Select values at random by id value

2009-07-02 Thread hadley wickham

On Thu, Jul 2, 2009 at 8:15 AM, James Martin wrote:
> Hadley, Sunil, and list,
>
> This is not quite doing what I wanted it to do (as far as I can tell). Perhaps
> I did not explain it thoroughly.  It seems to be sampling one value
> for each day, leaving ~200 observations. I need it to randomly choose one hab
> value for each bird if there is more than one value for a given day. I will
> try an example below.
>
> id,date,location2,hab
>
> 1,05/23/06,0,1
> 1,05/23/06,0,2
> 1,05/23/06,0,1
>
> So in this case the animal was located 3 times on may 23rd but I only want
> one of the locations and instead of arbitrarily choosing one I wanted to
> randomly sample one.

ddply(df, c("date", "location"), function(df) df[sample(nrow(df), 1), ])
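
The same idea in base R, as a sketch on made-up rows: it assumes the columns are named as in the header James quotes (id, date, location2, hab) and that "one per bird per day" means grouping by id and date, so adjust the grouping columns to whatever the real data frame uses.

```r
set.seed(1)   # only so the sketch is repeatable

# Made-up rows shaped like the quoted example
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2),
  date = c("05/23/06", "05/23/06", "05/23/06",
           "05/24/06", "05/23/06", "05/23/06"),
  location2 = 0,
  hab = c(1, 2, 1, 3, 2, 2)
)

# One piece per (id, date) combination that actually occurs
groups <- split(df, list(df$id, df$date), drop = TRUE)

# Pick one row at random from each piece, then stack the picks
picked <- lapply(groups, function(g) g[sample(nrow(g), 1), ])
sampled <- do.call(rbind, picked)
```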

Hadley

-- 
http://had.co.nz/


Re: [R] Select values at random by id value

2009-07-01 Thread hadley wickham

On Wed, Jul 1, 2009 at 2:10 PM, Sunil
Suchindran wrote:
> #Highlight the text below (without the header)
> # read the data in from clipboard
>
> df <- do.call(data.frame, scan("clipboard", what=list(id=0,
> date="",loctype=0 ,haptype=0)))
>
> # split the data by date, sample 1 observation from each split, and rbind
>
> sampled_df <- do.call(rbind, lapply(split(df,
> df$date),function(x)x[sample(1:nrow(x), 1),]))

ddply from the plyr package (http://had.co.nz/plyr), makes this sort
of operation a little simpler:

ddply(df, "date", function(df) df[sample(nrow(df), 1), ])

Hadley


-- 
http://had.co.nz/


Re: [R] How do I change which R Graphics Device is active?

2009-06-30 Thread hadley wickham

On Tue, Jun 30, 2009 at 2:12 PM, Barry
Rowlingson wrote:
> On Tue, Jun 30, 2009 at 8:05 PM, Mark Knecht wrote:
>
>> You could wrap it in a function of your own making, right?
>>
>> AddNewDev = function() {dev.new();AddNewDev=dev.cur()}
>>
>> histPlot=AddNewDev()
>>
>> Seems to work.
>
>  You leaRn fast :) Probably better style is:
>
>  newDev = function(){dev.new();return(dev.cur())}
>
>  - which returns the value explicitly with return().

R isn't C! ;)  I'd claim idiomatic R only uses return for special
cases (i.e. when you can terminate the function early)
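
A small sketch of what that looks like in practice (function names are made up): the value of the last evaluated expression is returned automatically, and return() is kept for leaving a function early.

```r
# Last expression is the return value; no return() needed
square <- function(x) {
  x^2
}

# return() is useful when bailing out before the end
safe_log <- function(x) {
  if (x <= 0) return(NA_real_)
  log(x)
}
```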

Hadley

-- 
http://had.co.nz/


Re: [R] ggplot2 x axis question

2009-06-29 Thread hadley wickham

In that case, try:

qplot(reorder(factor(model),delta),delta,data=growthm.bic)
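
What the factor() call buys can be seen without any plotting. The sketch below uses made-up delta values; the error quoted below came from an R version with no reorder method for plain numeric vectors, and converting to a factor first sidesteps that regardless of version.

```r
# Five hypothetical models and their delta-BIC values
model <- 1:5
delta <- c(3.2, 0, 7.5, 1.1, 4.8)

# reorder() sorts the factor levels by delta (mean per level by default)
ordered_model <- reorder(factor(model), delta)

# The level order is what the x axis follows
levels(ordered_model)
```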

Deepayan: do you think there should also be a numeric method for reorder?

Hadley

On Mon, Jun 29, 2009 at 10:39 AM, Christopher
Desjardins wrote:
> Hi Hadley,
> Thanks for the reply and the great graphing package. That code is giving me
> the following error:
>
>> qplot(reorder(model,delta),delta,data=growthm.bic)
> Error in UseMethod("reorder") : no applicable method for "reorder"
>
> Cheers,
> Chris
>
> On 6/28/09 8:21 PM, hadley wickham wrote:
>
> Hi Chris,
>
> Try this:
>
> qplot(reorder(model, delta), delta, data = growthm.bic)
>
> Hadley
>
> On Sun, Jun 28, 2009 at 9:53 AM, Christopher
> Desjardins wrote:
>
>
> Hi,
> I have 45 models that I have named: 1, 2, 3, ... , 45 and I am trying to
> plot them in order of ascending BIC values. I am however unclear as to how I
> can get the models to line up on the x-axis by BIC and not by numeric order.
> For example, if model 5 has a lower BIC than 1, I want it to be the first
> point on the left hand side of the curve. This seems to work in plot:
>
> plot(1:45, growthm.bic$delta, type="b", xaxt = "n", xlab="Model",
> ylab=expression(Delta[k]))   # where growthm.bic$delta is my BIC value
> axis(1, at=1:45, labels=growthm.bic$Model) #where model is the name of the
> 45 models examined, i.e 1:45
>
> Currently using qplot I have this, which does not work as it arranges the
> BIC values in order from 1:45.
>
> qplot(model,delta,data=growthm.bic)
>
> Thanks. Also please cc me as I am a digest subscriber.
> Chris



-- 
http://had.co.nz/


Re: [R] ggplot2 x axis question

2009-06-28 Thread hadley wickham

Hi Chris,

Try this:

qplot(reorder(model, delta), delta, data = growthm.bic)

Hadley

On Sun, Jun 28, 2009 at 9:53 AM, Christopher
Desjardins wrote:
> Hi,
> I have 45 models that I have named: 1, 2, 3, ... , 45 and I am trying to
> plot them in order of ascending BIC values. I am however unclear as to how I
> can get the models to line up on the x-axis by BIC and not by numeric order.
> For example, if model 5 has a lower BIC than 1, I want it to be the first
> point on the left hand side of the curve. This seems to work in plot:
>
> plot(1:45, growthm.bic$delta, type="b", xaxt = "n", xlab="Model",
> ylab=expression(Delta[k]))   # where growthm.bic$delta is my BIC value
> axis(1, at=1:45, labels=growthm.bic$Model) #where model is the name of the
> 45 models examined, i.e 1:45
>
> Currently using qplot I have this, which does not work as it arranges the
> BIC values in order from 1:45.
>
> qplot(model,delta,data=growthm.bic)
>
> Thanks. Also please cc me as I am a digest subscriber.
> Chris



-- 
http://had.co.nz/


Re: [R] simple loop

2009-06-28 Thread hadley wickham

> Also consider ddply in the plyr package (although that's overkill if
> you're only having two loops)

Maybe, but it sure is much simpler:

library(plyr)
ddply(data, c("industry","year"), summarise, avg = mean(X1))
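
For comparison, a base R sketch of the same per-group mean using aggregate(). The data frame is made up, with only the column names (industry, year, X1) taken from the call above.

```r
# Made-up data with the same column names as in the ddply call
data <- data.frame(
  industry = c("a", "a", "b", "b", "a"),
  year = c(2008, 2008, 2008, 2009, 2009),
  X1 = c(1, 3, 10, 20, 5)
)

# One row per industry/year combination, holding the mean of X1
avg <- aggregate(X1 ~ industry + year, data = data, FUN = mean)

# Rename the result column to match the ddply output
names(avg)[names(avg) == "X1"] <- "avg"
```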

Hadley

-- 
http://had.co.nz/


Re: [R] a plot of stacked boxes

2009-06-26 Thread hadley wickham

On Fri, Jun 26, 2009 at 10:27 PM, Osman Al-Radi wrote:
> Dear Richard and David,
>
> Thanks for this reference. I looked into vcd and mosaic plot, it is a nice
> plot for investigating associations between two or more variables. However,
> I just need to plot the frequency of a single variable as the area of the
> box. boxes are stacked to fill a larger box that represents the entire
> population. The axes are non-informative.
>
> I am trying to recreate the plot in the following website, used to represent
> the market capital of public companies. I would like to use a similar plot
> for a totally different application.
>
> The website: http://finviz.com/map.ashx

That plot is called a treemap, and as far as I know there is no
implementation in R.  If you wanted to implement it yourself, the
Java code from
http://www.cs.umd.edu/hcil/treemap-history/Treemaps-Java-Algorithms.zip
would probably be a good place to start.

Hadley

-- 
http://had.co.nz/


Re: [R] Using by() and stacking back sub-data frames to one data frame

2009-06-25 Thread hadley wickham

Have a look at ddply from the plyr package, http://had.co.nz/plyr.
It's made for exactly this type of operation.
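
On the narrower question of stacking the pieces from by() back together, a base R sketch using the example data from the quoted message: the result of by() is a list of data frames, so do.call(rbind, ...) passes every piece to a single rbind call. Rows come back grouped by month rather than in their original order.

```r
# Example data from the quoted message
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2,
                          9, 9, 9, 9, 2, 2, 2, 2))

# Split by month; the function here just returns each piece unchanged
pieces <- by(y, y$month, function(x) x)

# Stack all pieces back into one data frame
stacked <- do.call(rbind, pieces)
```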

Hadley

On Wed, Jun 24, 2009 at 10:34 PM, Stephan Lindner wrote:
> Dear all,
>
>
> I have code where I subset a data frame to match entries within
> levels of a factor (actually, the full script uses three different
> factors to do that). I'm very happy with the precision with which I can
> work with R, but since I loop over factor levels, and the data frame is
> big, the process is slow. So I've been trying to speed up the process
> using by(), but I got stuck at the point where I want to stack back
> the sub- data frames, and I was wondering whether someone could help me
> out.
>
> Here is an example:
>
> <--
>
>> y <- data.frame(suid  = c(rep(1074034,16),rep(1123003,4)),
>                 month = rep(c(12,1,2,3),5),
>                 esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))
>
>
>> by(y,y$month,function(x)return(x))
>
> y$month: 1
>      suid month esr
> 2  1074034     1   2
> 6  1074034     1   1
> 10 1074034     1   2
> 14 1074034     1   9
> 18 1123003     1   2
> 
> y$month: 2
>      suid month esr
> 3  1074034     2   2
> 7  1074034     2   1
> 11 1074034     2   2
> 15 1074034     2   9
> 19 1123003     2   2
> 
> y$month: 3
>      suid month esr
> 4  1074034     3   2
> 8  1074034     3   1
> 12 1074034     3   2
> 16 1074034     3   9
> 20 1123003     3   2
> 
> y$month: 12
>      suid month esr
> 1  1074034    12   6
> 5  1074034    12   1
> 9  1074034    12   2
> 13 1074034    12   9
> 17 1123003    12   2
>
> -->
>
> What I would like to do is stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them worked.
>
>
> Thanks a lot,
>
>
>
>        Stephan
>
>
>
>
>
>
>
>
>
>
> --
> ---
> Stephan Lindner
> University of Michigan



-- 
http://had.co.nz/


Re: [R] "by" question

2009-06-24 Thread hadley wickham

You might also want to look at the plyr package,
http://had.co.nz/plyr.  In particular, ddply + transform makes these
tasks very easy.

library(plyr)
ddply(mtcars, "cyl", transform, pos = seq_along(cyl), mpg_avg = mean(mpg))
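
For readers coming from the ave() discussion quoted below, the same two columns can be sketched in base R. This is just a comparison, not a replacement for the ddply call; the copy is named cars only to leave mtcars untouched.

```r
cars <- mtcars

# Position of each row within its cyl group
cars$pos <- ave(seq_len(nrow(cars)), cars$cyl, FUN = seq_along)

# Group mean of mpg, repeated on every row of the group
# (ave() uses mean when FUN is not given)
cars$mpg_avg <- ave(cars$mpg, cars$cyl)
```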

Hadley

On Wed, Jun 24, 2009 at 11:48 AM, David
Hugh-Jones wrote:
> That seems to work. I should add that to make "ave" work like "by" one can
> do:
>
> mydata$newvar <- ave(1:nrow(mydata), mydata$some_factor, FUN= function (x) {
>  x <- ds[x,]
> # ... etc...
> })
>
> Thanks!
> David

-- 
http://had.co.nz/


Re: [R] Apply as.factor (or as.numeric etc) to multiple columns

2009-06-23 Thread hadley wickham

Hi Mark,

Have a look at colwise (and numcolwise and catcolwise) in the plyr package.
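
If plyr is not an option, one line per data type is also possible in base R by assigning the result of lapply() back into the selected columns. A sketch with made-up column names and values:

```r
# Made-up data frame where everything was read in as character
data <- data.frame(
  Var1 = c("a", "b", "a"),
  Var2 = c("1.5", "2", "3.25"),
  Var3 = c("x", "y", "y"),
  Var4 = c("2009-06-01", "2009-06-02", "2009-06-03"),
  stringsAsFactors = FALSE
)

# Columns grouped by target type; they need not be contiguous
factor_cols <- c("Var1", "Var3")
numeric_cols <- "Var2"
date_cols <- "Var4"

# One line per data type
data[factor_cols] <- lapply(data[factor_cols], as.factor)
data[numeric_cols] <- lapply(data[numeric_cols], as.numeric)
data[date_cols] <- lapply(data[date_cols], as.Date)
```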

Hadley

On Tue, Jun 23, 2009 at 4:23 PM, Mark Na wrote:
> Hi R-helpers,
>
> I have a dataframe with 60 columns and I would like to convert several
> columns to factor, others to numeric, and yet others to dates. Rather
> than having 60 lines like this:
>
> data$Var1<-as.factor(data$Var1)
>
> I wonder if it's possible to write one line of code (per data type,
> e.g. factor) that would apply a function (e.g., as.factor) to several
> (non-contiguous) columns. So, I could then use 3 or 4 lines of code
> (for 3 or 4 data types) instead of 60.
>
> I have tried writing an apply function, but it failed.
>
> Thanks for any help you might be able to provide.
>
> Mark Na



-- 
http://had.co.nz/


[R] [R-pkgs] plyr 0.1.9

2009-06-23 Thread Hadley Wickham

plyr is a set of tools for a common set of problems: you need to break
down a big data structure into manageable pieces, operate on each
piece and then put all the pieces back together.  For example, you
might want to:

  * fit the same model to subsets of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising
  * eliminate for-loops in your code

It's already possible to do this with built-in functions (like split
and the apply functions), but plyr just makes it all a bit easier
with:

  * absolutely consistent names, arguments and outputs
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages
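
To make the pattern concrete, here is the first use case (fitting the same model to subsets) written with built-in functions only. It is a sketch on the mtcars data, not plyr code; plyr is meant to wrap these three steps in one call.

```r
# Split: one data frame per number of cylinders
pieces <- split(mtcars, mtcars$cyl)

# Apply: fit the same linear model to each piece and keep the coefficients
fits <- lapply(pieces, function(d) coef(lm(mpg ~ wt, data = d)))

# Combine: one row of coefficients per group
coefs <- do.call(rbind, fits)
```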

Some considerable effort has been put into making plyr fast and memory
efficient, and in most cases it is faster than the built-in functions.

You can find out more at http://had.co.nz/plyr/, including a 20 page
introductory guide, http://had.co.nz/plyr/plyr-intro.pdf.  You can ask
questions about plyr (and data-manipulation in general) on the plyr
mailing list.  Sign up at http://groups.google.com/group/manipulatr

plyr 0.1.9 (2009-06-23) ---

* fix bug in rbind.fill when NULLs present in list
* improve each to recognise when all elements are numeric
* fix labelling bug in d*ply when .drop = FALSE
* additional methods for quoted objects
* add summarise helper - this function is like transform, but creates
a new data frame rather than reusing the old (thanks to Brendan
O'Connor for the neat idea)


-- 
http://had.co.nz/

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages


Re: [R] Roxygen vs Sweave for S4 documentation

2009-06-21 Thread hadley wickham

> I have been using R for a while.  Recently, I have begun converting my
> package into S4 classes.  I was previously using Rdoc for documentation.
> Now, I am looking to use the best tool for S4 documentation.  It seems that
> the best choices for me are Roxygen and Sweave (I am fine with tex).
>
> Are there any users of Roxygen or Sweave who can comment on the strengths or
> weaknesses of one or the other?  Thanks in advance.

Sweave isn't used for writing Rdoc files.

Hadley

-- 
http://had.co.nz/


Re: [R] Dataset suggestion sought

2009-06-18 Thread hadley wickham

> In revising my book Regression Modeling Strategies for a second edition, I
> am seeking a dataset for exemplifying multiple regression using least
> squares.  Ideally the dataset would have 5-40 variables and 40-1
> independent observations, and would generate significant interest for a wide
> variety of readers.  For example, the topic could be political science,
> society, human suffering, sports, psychology, economics, entertainment,
> history, etc.  The dataset needs to be publicly available.

I have a few datasets that might be of interest:

* Movie rankings from imdb, https://github.com/hadley/data-movies/tree

* Prices of 50,000 round cut diamonds (included in ggplot2)

* Baby name popularity for the top 1000 names over the whole USA
1880-2008, and top 100 names per state 1960 to 2008,
https://github.com/hadley/data-baby-names/tree

 * EPA fuel economy measurements for all cars tested in the US,
https://github.com/hadley/data-fuel-economy/tree

 * Many datasets about the US housing crisis (work in progress),
https://github.com/hadley/data-housing-crisis

 * 500,000 house sales in the Bay Area, https://github.com/hadley/sfhousing/tree

If any of those sound of interest, I can provide more details.

Hadley

-- 
http://had.co.nz/


Re: [R] Learning S3

2009-06-18 Thread Hadley Wickham

I think I remember reading that some time back and finding it
confusing because it described the ideal implementation in S, rather
than the actual implementation in R.  I'll look at it again.

Hadley

On Thu, Jun 18, 2009 at 11:36 AM, Bert Gunter wrote:
> -- or Chapter 4 in S PROGRAMMING? (you'll need to determine if it's "reader
> friendly")
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Gabor Grothendieck
> Sent: Thursday, June 18, 2009 9:17 AM
> To: Hadley Wickham
> Cc: r-help
> Subject: Re: [R] Learning S3
>
> There is a section on Object Orientation in MASS (I have 2nd ed).
>
> On Thu, Jun 18, 2009 at 12:06 PM, Hadley Wickham wrote:
>> Hi all,
>>
>> Do you know of any good resources for learning how S3 works?  I've
>> somehow become familiar with it by reading many small pieces, but now
>> that I'm teaching it to students I'm wondering if there are any good
>> resources that describe it completely, especially in a reader-friendly
>> way.  So far I've found:
>>
>>  * http://cran.r-project.org/doc/manuals/R-lang.html#Object_002doriented-programming
>> - it has most of the theory (although some bits are missing), but no
>> examples
>>
>>  * page 33 of http://CRAN.R-project.org/doc/Rnews/Rnews_2004-1.pdf
>> shows how to create a simple object in both S3 and S4
>>
>> What has helped you learn S3?
>>
>> Regards,
>>
>> Hadley
>>
>> --
>> http://had.co.nz/
>>
>>
>
>
>
>



-- 
http://had.co.nz/


[R] Learning S3

2009-06-18 Thread Hadley Wickham

Hi all,

Do you know of any good resources for learning how S3 works?  I've
somehow become familiar with it by reading many small pieces, but now
that I'm teaching it to students I'm wondering if there are any good
resources that describe it completely, especially in a reader-friendly
way.  So far I've found:

 * http://cran.r-project.org/doc/manuals/R-lang.html#Object_002doriented-programming
- it has most of the theory (although some bits are missing), but no
examples

 * page 33 of http://CRAN.R-project.org/doc/Rnews/Rnews_2004-1.pdf
shows how to create a simple object in both S3 and S4

What has helped you learn S3?

Regards,

Hadley

-- 
http://had.co.nz/


[R] [OT] VBA to save excel as csv

2009-06-15 Thread Hadley Wickham

Hi all,

This is a little off-topic, but it is on the general topic of getting
data into R.  I'm looking for an Excel macro / VBA script that will
export all spreadsheets in a directory (with one file per tab) to
CSV.  Does anyone have anything like this?

Thanks,

Hadley

-- 
http://had.co.nz/


[R] Programmatically copying a graphic to the clipboard

2009-06-12 Thread Hadley Wickham

Hi all,

Is there a cross-platform way to do this?  On the Mac, I can do this by
saving an EPS file, and then using pbcopy. Is it possible on other
platforms?

Hadley

-- 
http://had.co.nz/


Re: [R] how to substitute missing values (NAs) by the group means

2009-06-08 Thread hadley wickham

On Mon, Jun 8, 2009 at 8:56 PM, Mao Jianfeng wrote:
> Dear Ruser's
>
> I would like help substituting missing values (NAs) with the mean of the
> group each one belongs to.
>
> my dummy dataframe is:
>
>> df
>       group traits
> 1  BSPy01-10     NA
> 2  BSPy01-10    7.3
> 3  BSPy01-10    7.3
> 4  BSPy01-11    5.3
> 5  BSPy01-11    5.4
> 6  BSPy01-11    5.6
> 7  BSPy01-11     NA
> 8  BSPy01-11     NA
> 9  BSPy01-11    4.8
> 10 BSPy01-12    8.1
> 11 BSPy01-12    6.0
> 12 BSPy01-12    6.0
> 13 BSPy01-13    6.1
>
>
> I want to replace each NA with the mean of the group it belongs to. For
> example, replace the first NA in traits with the mean of group
> "BSPy01-10".

Here's yet another way, using the plyr package, http://had.co.nz/

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
ddply(df, ~ group, transform, traits = impute.mean(traits))

Or if you wanted to make it a little more generic

impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}
ddply(df, ~ group, transform, traits = impute(traits, mean))
ddply(df, ~ group, transform, traits = impute(traits, median))
ddply(df, ~ group, transform, traits = impute(traits, min))
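
If plyr isn't available, roughly the same thing can be sketched in base R
with ave().  This is only an illustration: the small data frame is made up
(with shortened group labels), the traits_filled column name is arbitrary,
and impute() is the helper from above, repeated so the snippet runs on its
own.

```r
# Made-up example data: two groups, each with one missing value
df <- data.frame(
  group  = c("a", "a", "a", "b", "b"),
  traits = c(NA, 7.3, 7.3, 5.0, NA)
)

# Same helper as above: fill NAs with fun() of the non-missing values
impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}

# ave() applies FUN within each level of df$group and returns a vector
# in the original row order
df$traits_filled <- ave(df$traits, df$group,
                        FUN = function(x) impute(x, mean))
df
```

Swapping mean for median or min works the same way as in the ddply calls.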

Hadley


-- 
http://had.co.nz/


Re: [R] Looking for easy way to normalize data by groups

2009-06-08 Thread hadley wickham

On Mon, Jun 8, 2009 at 10:29 AM, Herbert
Jägle wrote:
> Hi,
>
> i do have a dataframe representing data from a repeated experiment. PID is a
> subject identifier, Time are timepoints in an experiment which was repeated
> twice. For each subject and all three timepoints there are 2 sets of four
> values.
>
> df <- data.frame(PID = c(rep("A", 12), rep("B", 12), rep("C", 12)),
>                Time = rep(c(0, 0, 0, 0, 30, 30, 30, 30, 60, 60, 60, 60), 3),
>                Dset = rep(c(1, 2),18),
>                Val1 = rnorm(36),
>                Val2 = rnorm(36),
>                Val3 = rnorm(36),
>                Val4 = rnorm(36))
>
> You can plot the data nicely with x=Time and y=Val1 by grouping PID and
> facetting for Dset.
>
> theme_set(theme_bw())
> p <- ggplot(df) +
>       geom_line(aes(x=Time,y=Val1,group=PID)) +
>       geom_point(aes(x=Time,y=Val1,colour=PID)) +
>       facet_grid(. ~ Dset)
> p
>
> I would now like to normalize these data to the mean of the two values at
> Time = 0 for each subject (so having plots in % of the mean Time=0 value
> rather than absolute values).

Maybe like this?

library(ggplot2)
library(plyr)

ggplot(df, aes(Time, Val1, colour = PID)) +
  geom_line(stat="summary", fun.y = mean) +
  geom_point() +
  facet_grid(. ~ Dset)

std <- ddply(df, c("PID", "Dset"), transform, Val1 = Val1 /
mean(Val1[Time == min(Time)]))

last_plot() %+% std

I modified the plot so it's a bit more informative.
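
The normalisation step on its own can also be sketched in base R, which may
make it easier to see what the ddply() call does.  The data below are made
up (not the rnorm() data from the question), and Val1_rel is just an
illustrative column name; like the call above, it divides by the mean at
the earliest time point within each PID/Dset combination.

```r
# Made-up data: two subjects, two data sets, two time points
df <- data.frame(
  PID  = rep(c("A", "B"), each = 4),
  Dset = rep(c(1, 2), times = 4),
  Time = rep(c(0, 0, 30, 30), times = 2),
  Val1 = c(2, 4, 6, 12, 10, 20, 5, 40)
)

# Keep only the values at the earliest time point; everything else is NA
at_start <- df$Time == min(df$Time)
start_values <- ifelse(at_start, df$Val1, NA)

# Mean of the starting values within each PID/Dset group,
# repeated for every row of that group
baseline <- ave(start_values, df$PID, df$Dset,
                FUN = function(v) mean(v, na.rm = TRUE))

# Values relative to the group baseline (multiply by 100 for percent)
df$Val1_rel <- df$Val1 / baseline
df
```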

Hadley

-- 
http://had.co.nz/


Re: [R] A very frustrating read.table error message

2009-06-06 Thread hadley wickham

On Sat, Jun 6, 2009 at 5:02 PM, Adam D. I. Kramer wrote:
> Dear Colleagues,
>
>        Occasionally I deal with computer-generated (i.e., websurvey) data
> files that haven't quite worked correctly. When I try to read the data into
> R, I get something like this:
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
>  line 26 did not have 648 elements
>
> ...is there any way to get R to tell me how many elements line 26 *did*
> have? That information would take this error message from frustrating to
> useful.  :)

?count.fields
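
A minimal sketch of how that might be used, assuming a comma-separated file
(the temporary file and its contents are invented for the example):

```r
# Write a small file in which line 3 is one field short
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8"), tmp)

# Number of fields found on each line
n <- count.fields(tmp, sep = ",")
n

# Lines whose field count differs from the first line's
bad <- which(n != n[1])
bad
```

For a real file, the sep and quote arguments would need to match whatever
was passed to read.table().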

Hadley

-- 
http://had.co.nz/


Re: [R] OT: Inference for R - Interview

2009-06-04 Thread hadley wickham

Is it really necessary to further advertise this company which already
spams R-help subscribers?
Hadley

On Thu, Jun 4, 2009 at 10:41 PM, Ajay ohri  wrote:
> Dear All,
>
> A slightly off-topic, non-technical post (but hey, it is Friday).
>
> Following last week's interview with REvolution Computing, which makes
> enterprise versions of R, here is another interview, this time with Paul
> van Eikeren, CEO of the rapidly growing company Blue Reference, at
>
> http://www.decisionstats.com/2009/06/04/interview-inference-for-r/
>
> Paul talks about his product, Inference for R, an add-on plugin that
> provides an R GUI within Office Excel for $199 a year (with a *separate
> academic* program as well), offering enhanced analytics as well as
> graphical capabilities.
>
>
> Best Regards,
>
> Ajay Ohri
>
> www.decisionstats.com
>
>        [[alternative HTML version deleted]]
>
>



-- 
http://had.co.nz/


Re: [R] Minor tick marks for date/time ggplot2 (this is better, but not exactly what I want)

2009-06-04 Thread hadley wickham

On Mon, Jun 1, 2009 at 2:18 PM, stephen sefick  wrote:
> library(ggplot2)
>
> melt.updn <- (structure(list(date = structure(c(11808, 11869, 11961, 11992,
> 12084, 12173, 12265, 12418, 12600, 12631, 12753, 12996, 13057,
> 13149, 11808, 11869, 11961, 11992, 12084, 12173, 12265, 12418,
> 12600, 12631, 12753, 12996, 13057, 13149), class = "Date"), site =
> structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("unrestored",
> "restored"), class = "factor"), value = c(1.10799962473684, 0.732152347032395,
> 0.410438861475827, 0.458941230025228, 0.429883166858706, 0.831083728521569,
> 0.601942073736539, 0.81855597155132, 1.12612228239269, 0.246006569972335,
> 0.940239233910111, 0.98645360143702, 0.291191536260016, 0.346271105079473,
> 1.36216149279675, 0.878585508942967, 0.525184260519839, 0.803247305232454,
> 1.08086182748669, 1.24915815325761, 0.971046497346528, 0.936835411801682,
> 1.26957337598606, 0.337691543740682, 0.90931142298893, 0.950891472223867,
> 0.290354002109368, 0.426509990013021)), .Names = c("date", "site",
> "value"), row.names = c(NA, -28L), class = "data.frame"))
>
> # I would also like to add tick marks to this graph, if possible, with no
> # label for the months in between the years
> qplot(date, value, data=melt.updn, shape=site, ylab="Distance"
> ,main="Euclidean Distances Time Series", xlim=c(as.Date("2002-1-1"),
> as.Date("2006-3-1")))+geom_line()+theme_bw()+geom_vline(x=as.numeric(as.Date("2002-11-01")))
> + opts(panel.grid.major = theme_line(colour="grey", size=0.75),
> panel.grid.minor=theme_line(colour="grey", size=0.25))

Unfortunately that's currently not possible - tick marks are always
associated with major grid lines, and more importantly currently
scale_date only lets you specify the time between ticks, not their
labels.  Are the minor monthly grid lines not good enough?

Hadley

-- 
http://had.co.nz/


Re: [R] Still can't find missing data - How do I get NA in xtabs with factors?

2009-06-02 Thread hadley wickham

>> Let's see if I understand this.  Do I iterate through
>>    x <- factor(x, levels = c(levels(x), NA), exclude = NULL)
>> for each of the few hundred variables (x) in my data frame?
>
>
> Yes, for all being factors.

Wouldn't addNA() be the preferred method?

To do it for all variables is pretty simple:
cat <- sapply(df, is.factor)
df[cat] <- lapply(df[cat], addNA, ifany = TRUE)
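
A small worked example of those two lines, on an invented data frame (the
logical index is called is_fac here only to avoid masking base::cat):

```r
# Invented data: f has a missing value, g does not, n is not a factor
df <- data.frame(
  f = factor(c("a", NA, "b")),
  g = factor(c("x", "y", "x")),
  n = c(1, 2, 3)
)

# Which columns are factors?
is_fac <- sapply(df, is.factor)

# Add NA as a level, but only to factors that actually contain NAs
df[is_fac] <- lapply(df[is_fac], addNA, ifany = TRUE)

levels(df$f)  # now includes NA as a level
levels(df$g)  # unchanged, because g has no missing values
table(df$f)
```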

Hadley

-- 
http://had.co.nz/


Re: [R] [ANN] ggplot2 + rggobi course. July 30-31, Washington DC

2009-06-02 Thread hadley wickham

Hi everyone,

We still have places available for our two-day looking-at-data course:

July 30-31
Washington DC

Day one: static graphics with ggplot2
Day two: interactive graphics with rggobi and GGobi.

You can attend one day (for $295) or both days (for $550).
Student discounts are available.

All proceeds go to the GGobi Foundation to support graphics research.

Find out more, and book your tickets online at
http://lookingatdata.com

Regards,

Hadley Wickham
Dianne Cook


Re: [R] ggplot2 and Date class

2009-06-01 Thread hadley wickham

You might have an out-of-date version of the plyr package - try
install.packages("plyr")

Hadley

On Mon, Jun 1, 2009 at 10:20 AM, Matt Frost  wrote:
> I'm trying to plot a time series in ggplot, but a date column in my
> data frame is causing errors. Rather than provide my own data, I'll
> just refer to the scale_date example at:
> http://had.co.nz/ggplot2/scale_date.html
> , which reproduces the error.
>
>> df <- data.frame( date = seq(Sys.Date(), len=100, by="1 day")[sample(100, 
>> 50)], price = runif(50) )
>> dt <- qplot(date, price, data=df, geom="line") + opts(aspect.ratio = 1/4)
>> dt + scale_x_date()
>
> Error in aesdefaults(data, .$geom$default_aes(), compact(.$mapping)) :
> could not find function "as_df"
>
> My ggplot2 package is up to date. Am I missing something else, maybe
> in my default aesthetics?
>
> Thanks,
> Matt Frost
>
>



-- 
http://had.co.nz/


Re: [R] ggplot2: annotating plot with mathematical formulae

2009-05-16 Thread hadley wickham

Hi Paul,

Unfortunately that's not something that's currently possible with
ggplot2, but I am thinking about how to make it possible.

Hadley

On Sat, May 16, 2009 at 7:48 AM, Paul Emberson  wrote:
> Hi Stephen,
>
> The problem is that the label on the graph doesn't get rendered with a
> superscript.  I want the label on the graph to be rendered the same way
> as the label you have put on the axis.
>
> I am plotting a piecewise function and I wanted to label each section of it.
>
> Paul
>
> stephen sefick wrote:
>> how about this
>>
>> a <- 1:10
>> b <- 1:10
>> d <- paste("x","^","{n-1}")
>> qplot(a,b, xlab=expression(x^{n-1}))+geom_text(aes(4,8, label=d))
>>
>> On Fri, May 15, 2009 at 10:02 PM, Paul Emberson  
>> wrote:
>>
>>> Hi,
>>>
>>> Is there a way of annotating a ggplot plot with mathematical formulae?
>>>
>>> I can do
>>>
>>> geom_text(aes(label="some text", ...
>>>
>>> but I can't do
>>>
>>> geom_text(aes(label=expression(x^{n-1}), ...
>>>
>>> It gives the error
>>>
>>> Error: geom_text requires the following missing aesthetics: label
>>>
>>> Is there a convenient equivalent?
>>>
>>> Cheers,
>>>
>>> Paul
>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
http://had.co.nz/


Re: [R] assign unique size of point in xyplot

2009-05-15 Thread hadley wickham

On Thu, May 14, 2009 at 2:14 PM, Garritt Page  wrote:
> Hello, I am using xyplot to try and create a conditional plot.  Below is a
> toy example of the type of data I am working with
>
> slevel <- rep(rep(c(0.5,0.9), each=2, times=2), times=2)
>
> tlevel <- rep(rep(c(0.5,0.9), each=4), times=2)
>
> noutliers <- rep(rep(c(2,4), times=4), times=2)
>
> analysis <- as.factor(rep(c('uv', 'mv'), each=8))
>
>
> cp <- c(0.9450,0.9525,0.9425,1.,0.9425,0.9410,0.900,0.800,0.9050,0.9020,
> 0.9040,0.9140,0.9400,0.9430,1.000,0.800 )
>
>
> area <- c(2.896485,4.952239,2.899030, 7.522729,2.827712, 4.950359,3.651156,
> 4.966610,2.85710, 6.649610 ,2.212295,2.778280,1.897921,  2.847249,1.777387,
> 2.418103)
>
>
>
> xyplot(cp ~ noutliers | slevel * tlevel, group = analysis, cex = area,
>        type = 'o', ylab = "", col = c("red", "blue"), xlab = '')
>
> This creates a trellis plot with four panels and four points in each panel.
>  I want the size of the points in each panel to be proportional to the value
> of area (simply putting cex=area in the xyplot function obviously isn't
> working).  I assume that I must create a customized panel function but
> haven't had any luck.  Any suggestions would be appreciated.

This is pretty easy to do in ggplot2:

library(ggplot2)
qplot(noutliers, cp, colour=analysis, size=area, facets = slevel ~ tlevel) +
  geom_line(size = 0.5)

Hadley


-- 
http://had.co.nz/


Re: [R] memory usage grows too fast

2009-05-14 Thread hadley wickham

On Thu, May 14, 2009 at 6:21 PM, Ping-Hsun Hsieh  wrote:
> Hi All,
>
> I have a 1000x100 matrix.
> The calculation I would like to do is actually very simple: for each row, 
> calculate the frequency of a given pattern. For example, a toy dataset is as 
> follows.
>
> Col1    Col2    Col3    Col4
> 01      02      02      00              => Freq of “02” is 0.5
> 02      02      02      01              => Freq of “02” is 0.75
> 00      02      01      01              …
>
> My code is quite simple as the following to find the pattern “02”.
>
> OccurrenceRate_Fun<-function(dataMatrix)
> {
>  tmp<-NULL
>  tmpMatrix<-apply(dataMatrix,1,match,"02")
>  for ( i in 1: ncol(tmpMatrix))
>  {
>    tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
>    tmp<-c(tmp,tmpRate)
>  }
>  rm(tmpMatrix)
>  rm(tmpRate)
>  gc()
>  return(tmp)
> }
>
> The problem is the memory usage grows very fast and hard to be handled on 
> machines with less RAM.
> Could anyone please give me some comments on how to reduce the space 
> complexity in this calculation?

rowMeans(dataMatrix == "02")  ?
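
To spell that out on the toy data from the question (assuming the matrix is
stored as character, as the "02" pattern suggests):

```r
# The three toy rows from the question, as a character matrix
dataMatrix <- matrix(
  c("01", "02", "02", "00",
    "02", "02", "02", "01",
    "00", "02", "01", "01"),
  nrow = 3, byrow = TRUE
)

# dataMatrix == "02" is a logical matrix; the mean of each row of
# TRUE/FALSE values is the proportion of "02" entries in that row
freq <- rowMeans(dataMatrix == "02")
freq  # 0.50 0.75 0.25
```

This avoids both the intermediate matrix from apply() and growing a vector
inside a loop.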

Hadley


-- 
http://had.co.nz/


Re: [R] Function to read a string as the variables as opposed to taking the string name as the variable

2009-05-14 Thread hadley wickham

On Thu, May 14, 2009 at 12:16 PM, Lori Simpson
 wrote:
> I am writing a custom function that uses an R-function from the
> reshape package: cast.  However, my question could be applicable to
> any R function.
>
> Normally one writes the arguments directly into a function, e.g.:
>
> result=cast(table1, column1 + column2 + column3   ~    column4,
> mean)      (1)
>
> I need to be able to write this statement as follows:
>
> result=cast(table1, string_with_columns   ~    column4, mean)    (2)
> string_with_columns = group of functions that ultimately outputs:
> "column1 + column2 + column3"

It's complex in general, but for cast you can just supply a string:

cast(table1, paste(string_with_columns, "~ column4"), mean)
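
For functions that insist on a formula object rather than a string, one
general approach is to build the formula with as.formula().  The sketch
below uses base R's aggregate() rather than cast(), and the data frame,
the group_cols vector and the value column are all invented for the
example.

```r
# Invented data with two grouping columns and one measured value
table1 <- data.frame(
  column1 = c("a", "a", "b", "b"),
  column4 = c("x", "y", "x", "y"),
  value   = c(1, 2, 3, 4)
)

# Column names held as strings, e.g. produced by other functions
group_cols <- c("column1", "column4")

# Paste the pieces together and convert the string into a formula
f <- as.formula(paste("value ~", paste(group_cols, collapse = " + ")))
f

# Use the formula exactly as if it had been typed directly
res <- aggregate(f, data = table1, FUN = mean)
res
```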

Hadley

-- 
http://had.co.nz/


Re: [R] Help with reshape/reShape and indexing

2009-05-13 Thread hadley wickham

> This does it more or less your way:
>
> ds <- split(df, df$Name)
> ds <- lapply(ds, function(x){x$Index <- seq_along(x[,1]); x})
> df2 <- unsplit(ds, df$Name)
> tapply(df2$X1, df2[,c("Name", "Index")], function(x) x)
>
> although there may exist much easier ways ...

Here's one way with the plyr and reshape packages:

library(plyr)
df.index <- ddply(df, .(Name), transform, Index = seq_along(X1))

library(reshape)
cast(df.index, Name ~ Index, value = "X1")
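
The same two steps (number the rows within each Name, then spread them
into columns) can be sketched in base R as well.  The data frame here is
made up, since the original one isn't shown in this message.

```r
# Made-up data: "a" has three observations, "b" has two
df <- data.frame(
  Name = c("a", "b", "a", "b", "a"),
  X1   = c(10, 20, 30, 40, 50)
)

# Step 1: running index within each Name, in order of appearance
df$Index <- ave(seq_along(df$Name), df$Name, FUN = seq_along)

# Step 2: one row per Name, one column per Index; cells with no
# observation are filled with NA
wide <- tapply(df$X1, list(df$Name, df$Index), function(v) v)
wide
```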

Hadley



-- 
http://had.co.nz/


Re: [R] ggplot2: recommended workaround for broken legend.position="top"

2009-05-10 Thread hadley wickham

On Sun, May 10, 2009 at 10:32 AM, Zeljko Vrba  wrote:
> Searching the mail archives I found that using legend.position as in
> p.ring.3 + opts(legend.position="top")
>
> is a known bug.  I tried doing
> p.ring.3 + opts(legend.position=c(0.8, 0.2))
>
> which works, but the legend background is transparent, i.e. I see the
> plot background through the legend.  Adding additional option
>
> opts(legend.background=theme_rect(fill=TRUE,colour="white"))
>
> fills the whole rectangle black(!), making text invisible, but leaves the
> shape symbols visible.
>
> So, how can I obtain a graph with legend positioned within the plot boundaries
> (that's OK, I don't even mind manually positioning the legend), but on a white
> background, i.e., so that the plot underneath is not visible?

opts(legend.background=theme_rect(fill="white"))

Hadley

-- 
http://had.co.nz/


Re: [R] by-group processing

2009-05-08 Thread hadley wickham

On Wed, May 6, 2009 at 8:12 PM, jim holtman  wrote:
> This should do it:
>
>> do.call(rbind, lapply(split(x, x$ID), tail, 1))
>         ID Type N
> 45900 45900    I 7
> 46550 46550    I 7
> 49270 49270    E 3

Or with plyr:

library(plyr)
ddply(x, "ID", tail, 1)

plyr encapsulates the common split-apply-combine strategy and takes
care of the details for you.  Read more about it on
http://had.co.nz/plyr
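
For the specific case of "last row per group", a base R shortcut is also
possible with duplicated().  A sketch, on invented data that reuses the ID
values shown above (the other rows and values are made up):

```r
# Invented data: several rows per ID, in the order they were recorded
x <- data.frame(
  ID   = c(45900, 45900, 46550, 49270, 49270),
  Type = c("E", "I", "I", "I", "E"),
  N    = c(1, 7, 7, 2, 3)
)

# duplicated(..., fromLast = TRUE) flags every row that has a later row
# with the same ID, so negating it keeps the last row of each ID
last_rows <- x[!duplicated(x$ID, fromLast = TRUE), ]
last_rows
```

This relies on the rows already being in the desired order within each ID.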

Hadley

-- 
http://had.co.nz/


Re: [R] Houston, TX Users Group

2009-05-06 Thread hadley wickham

Hi Robert,

I'm organising one - sign up to the mailing list,
http://groups.google.com/group/houston-r.  I'm hoping to organise our
first meeting this summer.

Hadley

On Wed, May 6, 2009 at 10:15 AM, Robert Sanford  wrote:
> I'm looking for a Users Group in or near Houston, TX.
>
> Many thanks!
>
> rjsjr
>
>

-- 
http://had.co.nz/


Re: [R] Do you use R for data manipulation?

2009-05-06 Thread hadley wickham

> Take a look at plyr and reshape packages (http://had.co.nz/), I have a hunch
> that they would have saved me a lot of headache had I found out about them
> earlier :)

As the author of these two packages, I'm admittedly biased, but I
think R is unparalleled for data preparation, manipulation, and
cleaning (with the small caveat that your data needs to fit in
memory).  The R data frame is a fantastic abstraction that most other
programming languages lack, and vectorised subscripting makes it
possible to express many transformations in an elegant and efficient
manner.  On top of the facilities provided by base R, there are a huge
number of packages available to load data from just about every data
format, as well as a number of packages (plyr, reshape, sqldf, doBy,
gdata, scope, ...) for data manipulation - just pick the metaphor that
is most natural to you.

Hadley

-- 
http://had.co.nz/


Re: [R] re shape package - use one cast() instead of many

2009-05-05 Thread hadley wickham

On Tue, May 5, 2009 at 3:55 PM, jwg20  wrote:
>
> Thanks for your help! I wasn't sure what the margins variable did, but I'm
> beginning to understand. I'm almost there, but with my data (and with ff_d)
> I tried to margin over two variable names, however it only does one of them.
> So with ff_d I set margins=c("treatment","variable"); however I only ever
> get 1_(all) 2_(all) and 3_(all)... never something like (all)_painty. (This
> also happens for margins=TRUE)

Ah ok.  Margins only work in one direction, so currently there's no
way to do what you want in a single step.

Hadley

-- 
http://had.co.nz/


Re: [R] re shape package - use one cast() instead of many

2009-05-05 Thread hadley wickham

On Tue, May 5, 2009 at 3:03 PM, jwg20  wrote:
>
> I have a data set that I'm trying to melt and cast in a specific way using
> the reshape package. (I'll use the ff_d dataset from reshape so I don't have
> to post a toy data set here. )
>
> Lets say I'm looking for the interaction of treatment with each type of
> "variable" in ff_d. Using the command below gets me this. Subject will get a
> column and each treatment type by each variable will also get a column with
> values for each.
>
> cast(ff_d, subject~treatment+variable)
>   subject 1_potato 1_buttery 1_grassy 1_rancid 1_painty 2_potato 2_buttery
>   3_painty
> 1        3       18        18       18       18       18       18        18
>    18
> ...
>
> Now, if I want to look at just the values for each variable by subject
> I can run the following command.
> cast(ff_d, subject~variable)
>   subject potato buttery grassy rancid painty
> 1        3     54      54     54     54     54
> ...
>
> What I'm wondering now is whether I can run one cast() call and get both of
> these in one data.frame: essentially, the values for each separate
> "condition" and the interactions between them. cast() doesn't let me repeat
> variable names, as that's what I first tried.  Right now, I'm just running
> two separate cast() calls and cbinding/merging them together. Is there a
> better way?

Have a look at the margins argument.

Hadley


-- 
http://had.co.nz/


Re: [R] quick square root axes

2009-05-05 Thread hadley wickham

> If you do write your own, the hardest part will be picking the nice tick
> marks.  They should be approximately evenly spaced, but at nice round values
> of the original variable:  that's hard to do in general.  R has the pretty()
> function for the linear scale, and doesn't do too badly on log axes, but
> you'll need to work out your own rules for the sqrt or other scales.

This seems like a nice (if smallish) research problem...
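
To illustrate why it isn't trivial, here is one naive rule, sketched in base
R: pick pretty() breaks on the square-root scale and square them back.  The
data values are invented, and this is only one possible heuristic, not
something any package is known to do.

```r
# Invented data on the original scale
x <- c(0, 3, 10, 45, 90)

# Breaks that are round and evenly spaced on the sqrt scale ...
sqrt_breaks <- pretty(sqrt(x))

# ... mapped back to the original scale for labelling
ticks <- sqrt_breaks^2
ticks

# Draw the axis: positions on the sqrt scale, labels in original units
plot(sqrt(x), seq_along(x), xaxt = "n", xlab = "x (sqrt scale)")
axis(1, at = sqrt_breaks, labels = ticks)
```

The ticks come out evenly spaced, but the labels are perfect squares rather
than round numbers on the original scale, which is exactly the tension
described above.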

Hadley


-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Overlaying graphs from different datasets with ggplot

2009-05-03 Thread hadley wickham

On Thu, Apr 30, 2009 at 2:03 PM, MUHC-Research
 wrote:
>
> Dear R-users,
>
> I recently began using the ggplot2 package and I am still in the process of
> getting used to it.
>
> My goal would be to plot on the same grid a number of curves derived from
> two distinct datasets. The first dataset (called molten.data) looks like
> this :
>
> Column names : Perc, Week, Weight
>
> P10   21  333.3554
> P90   21  486.0480
> P10   22  452.6347
> P90   22  563.8263
> P10   23  575.0960
> P90   23  661.6841
> P10   24  700.4449
> P90   24  779.4067
> P10   25  828.4966
> P90   25  917.1222
>
> The second dataset (called skj) looks like this:
>
> Column names : Week, Perc, Weight
>
> 21    1  317.5
> 22    1  392.5
> 23    1  467.5
> 24    1  542.5
> 25    1  617.5
> 26    1  697.5
> 21    2  535.0
> 22    2  632.5
> 23    2  737.5
> 24    2  855.0
> 25    2  980.0
> 26    2 1115.0
> 21    3  425.0
> 22    3  512.5
> 23    3  602.5
> 24    3  697.5
> 25    3  800.0
> 26    3  907.5
>
...

> So, what am I doing wrong in this situation?

The perc columns are different in the two data frames.  How do you
expect them to match up?

Hadley

-- 
http://had.co.nz/


Re: [R] Last month on the Revolutions blog

2009-05-02 Thread hadley wickham

Hi David,

I think the Revolutions blog is fantastic and a great service to the R
community.  Thanks for all your hard work!

Hadley

On Fri, May 1, 2009 at 4:54 PM, David M Smith
 wrote:
> I write about R every weekday at http://blog.revolution-computing.com
> . In case you missed them, here are some articles from the month of
> April of particular interest to r-help subscribers. Thanks to everyone
> who has been following the blog and sending me messages and/or leaving
> comments -- it always brightens my day to hear from readers!
>
> http://tinyurl.com/cy7x9a (from April 1) announced the new
> "lottopredictor" package, and showed how to make a poor-man's density
> chart with transparent dots.
>
> http://tinyurl.com/c3pj48 announced the series of courses from
> REvolution Computing: "An Introduction to R using Real-World Examples"
> and "High-Performance Computing with R".
>
> http://tinyurl.com/csqmlq linked to a chart of R color names, while
> http://tinyurl.com/darcho linked to a discussion about RColorBrewer, a
> better way to create ranges of colors for charts.
>
> http://tinyurl.com/c7v8nm linked to a useful and amusing list of R
> resources from Cerebral Mastication.
>
> http://tinyurl.com/cn64bf linked to solutions in R for some of the
> Project Euler programming puzzles.
>
> http://tinyurl.com/c9qqq4 showed how to speed up backtesting in R with
> parallel computing from ParallelR 2.0.
>
> http://tinyurl.com/cvtbmu discussed the animations package for
> creating animated graphs in R.
>
> http://tinyurl.com/dnxn7n reviews a web-based tool in R for
> visualizing performance of baseball pitchers.
>
> http://tinyurl.com/cl3u7u reviews another web-based tool built with R,
> linking Google Maps with environmental data.
>
> http://tinyurl.com/cuec9g announced REvolution R Enterprise 2.0, while
> http://tinyurl.com/czefgs gave a behind-the-scenes view of the process
> of porting R to 64-bit Windows.
>
> http://tinyurl.com/crp6t4 marveled at an amazing feat of productivity:
> creating a sales dashboard with only 6 weeks' experience in R.
>
> http://tinyurl.com/ctmyjq embedded a cute and popular professional
> video introducing newcomers to R.
>
> http://tinyurl.com/dyszxb announced the availability of R 2.9.0 on CRAN.
>
> http://tinyurl.com/c26pl5 showed how survival analysis in R can be
> applied to business systems.
>
> http://tinyurl.com/cvutfz announced that you can follow me on Twitter
> as @revodavid.
>
> http://tinyurl.com/cunen5 reviewed several of the talks from the
> outstanding R/Finance 2009 conference.
>
> http://tinyurl.com/dx3srx announced two events I'll be speaking at in
> San Francisco in May and June.
>
> (I've provided tinyurls above because many mailers break the long direct
> URLs.)
>
> Quite a lot of R-specific posts this month! There were also general
> (non-R) articles about statistics and probability, graphical displays
> of data, and swine flu. Also, the R Community Calendar has been
> updated with many forthcoming events related to R.
> http://blog.revolution-computing.com/calendar.html
>
> Comments and suggestions about the blog are welcome! (My email is
> da...@revolution-computing.com ).
>
> Regards to all,
> # David Smith
>
> --
> David M Smith 
> Director of Community, REvolution Computing www.revolution-computing.com
> Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)
>
> Check out our upcoming events schedule at www.revolution-computing.com/events
>



-- 
http://had.co.nz/


Re: [R] Reccomendation for graphics package

2009-05-01 Thread hadley wickham

On Fri, May 1, 2009 at 2:38 PM, Zeljko Vrba  wrote:
> On Fri, May 01, 2009 at 01:06:34PM -0500, hadley wickham wrote:
>>
>> It should be trivial with ggplot2 too, but it's hard to provide
>> concrete advice without a concrete problem.
>>
> Elementary problem:
>
> qplot(wg, v.realtime, data=df.best.medians$gv1, facets = . ~ n, colour=sp)
>
> produces a nice plot. Adding a geom="line" produces everything *except* the
> lines that show the dataset!  What am I doing wrong? (R-2.9.0, on 64-bit
> Vista; package freshly installed from CRAN)

If you have a categorical x axis, you need to specify the group
aesthetic which defines what group of points should form a line.  It's
hard to tell what that should be from your example, maybe sp?
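
A minimal sketch of the grouping issue, using made-up data and the ggplot()
interface rather than qplot(). The data frame `d` and its values are
assumptions for illustration, not the poster's data. With a factor on the x
axis every x level ends up in its own group by default, so geom_line() has
nothing to connect; mapping group to sp gives one line per sp:

```r
library(ggplot2)

# Hypothetical data: categorical x (wg), numeric y (v), two series (sp)
d <- data.frame(
  wg = factor(rep(c("a", "b", "c"), times = 2)),
  v  = c(1, 2, 3, 2, 3, 5),
  sp = rep(c("s1", "s2"), each = 3)
)

# group = sp tells geom_line() which points belong to the same line
p <- ggplot(d, aes(wg, v, colour = sp, group = sp)) +
  geom_point() +
  geom_line()
print(p)
```

Dropping `group = sp` from the aes() call should reproduce the symptom in the
question: points are drawn, lines are not.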

Hadley

-- 
http://had.co.nz/


Re: [R] MCO: Timing using model.matrix method

2009-05-01 Thread hadley wickham

> My issue is self-evident:  using this method resulted in a 30 fold
> increase in time.  My question is why?  If I time the individual
> components separately, nothing is unusual.  My hunch is the
> "interaction" between the model.matrix and nsga2 methods.
>
> Any ideas on how to speed this process up, or circumvent the issue altogether?

You might find some ideas in

COMPUTING THOUSANDS OF TEST
STATISTICS SIMULTANEOUSLY IN R
Holger Schwender and Tina Müller
University of  Dortmund, Germany
http://stat-computing.org/newsletter/v181.pdf

Hadley

-- 
http://had.co.nz/


Re: [R] Reccomendation for graphics package

2009-05-01 Thread hadley wickham

> Is the situation any better with ggplot2?  It seems rather easy to get e.g.
> line plots with error bars, provided that one feeds the data to some
> modeling/regression function and passes the result over for plotting, but
> what if I have generated my own error bar data?  This is almost trivial to
> accomplish with built-in plot functions (lines function), but that doesn't
> play nicely with composing plots.

It should be trivial with ggplot2 too, but it's hard to provide
concrete advice without a concrete problem.

Hadley

-- 
http://had.co.nz/


Re: [R] A beginner's question about ggplot

2009-05-01 Thread hadley wickham

On Fri, May 1, 2009 at 12:22 PM, MUHC-Research
 wrote:
>
> Dear R-users,
>
> I would have another question about the ggplot() function in the ggplot2
> package.
>
> All the examples I've read so far in the documentation make use of a single
> neatly formatted data.frame. However, sometimes, one may be interested in
> plotting on the same grid information or objects derived from two totally
> different datasets and customize both displays. I still cannot tell how this
> can be done using ggplot().
>
> Here's an example.
>
> ###
> ## A very simple data.frame;
>
> my.data = data.frame(X1 =
> as.factor(rep(1:2,c(4,4))),X2=c(4,3,5,2,6,2,3,5),X3=c(1:3,2,2:4,5)) ;
>
> ## Let's say I want to add the X^2 line to the plot;
>
> squared = data.frame(X=1:12,Y=((1:12)/2)^2) ;
>
> ## A scatterplot for my.data ;
>
> p = ggplot(my.data,aes(x=X2,y=X3,group=X1)) ;
> p = p+geom_point(aes(colour=X1)) ;
>
> #
>
> How can "squared" be added to the plot? At first, I used
>
> p+geom_line(data=squared,aes(x=X,y=Y,group=1,colour="green")) ;
>
> but the plotted line is always blue! In fact, I can replace colour by any
> character value and I will still get a blue line.

You have two alternatives:

p + geom_line(data=squared,aes(x=X,y=Y,group=1,colour="squared")) +
  labs(colour = "Dataset")

p + geom_line(data=squared,aes(x=X,y=Y,group=1), colour="green")

see section 4.5.2 of the book for details.

Hadley

-- 
http://had.co.nz/


Re: [R] gridding values in a data frame

2009-04-30 Thread hadley wickham

It's hard to check without a reproducible example, but the following
code should give you a 3d array of lat x long x time:

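library(reshape)

# Snap each coordinate to the nearest multiple of 5 degrees
df$lat <- round_any(df$LATITUDE, 5)
df$long <- round_any(df$LONGITUDE, 5)
# cast() aggregates the column named "value"
df$value <- df$TEMPERATURE

# 3d array of mean temperature: lat x long x TIME
cast(df, lat ~ long ~ TIME, mean)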
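
For comparison, a base R sketch of the same gridding idea. It assumes the
column names from the quoted data, labels each cell by its lower edge (an
assumption made here to get exactly 36 x 72 cells), and uses only the seven
rows shown in the question. Empty cells come out as NA:

```r
# Rows copied from the quoted question
obs <- data.frame(
  LATITUDE    = c(36.73, 50.95, -30.42, 15.81, -80.75, 90.00, 65.41),
  LONGITUDE   = c(-176.43, 90.00, 5.45, -109.31, -144.95, 100.55, -4.49),
  TEMPERATURE = c(58.32, 74.39, 23.26, 52.44, 66.19, 37.50, 29.83),
  TIME        = c(1, 1, 1, 1, 2, 2, 2)
)

# Lower edges of the 5-degree cells
lat_edges  <- seq(-90, 85, by = 5)    # 36 cells
long_edges <- seq(-180, 175, by = 5)  # 72 cells

# Cell each point falls in; pmin() keeps latitude 90 and longitude 180
# inside the last cell instead of creating an extra one
lat_cell  <- pmin(floor(obs$LATITUDE / 5) * 5, 85)
long_cell <- pmin(floor(obs$LONGITUDE / 5) * 5, 175)

# Mean temperature per cell and time; unused factor levels yield NA
grid <- tapply(
  obs$TEMPERATURE,
  list(
    lat  = factor(lat_cell, levels = lat_edges),
    long = factor(long_cell, levels = long_edges),
    time = factor(obs$TIME)
  ),
  mean
)
dim(grid)

# One text file per time value (written to a temporary directory here)
for (k in dimnames(grid)$time) {
  write.table(grid[, , k], file = file.path(tempdir(), paste0("grid_", k, ".txt")))
}
```

Note that round_any() snaps to the nearest multiple of 5, while this sketch
uses the lower cell edge, so the two approaches label cells differently.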


On Thu, Apr 30, 2009 at 10:55 AM, dxc13  wrote:
>
> Hi all,
> I have a data frame that looks like such:
> LATITUDE   LONGITUDE   TEMPERATURE   TIME
>    36.73     -176.43         58.32      1
>    50.95       90.00         74.39      1
>   -30.42        5.45         23.26      1
>    15.81     -109.31         52.44      1
>   -80.75     -144.95         66.19      2
>    90.00      100.55         37.50      2
>    65.41       -4.49         29.83      2
>
> and this goes on for A LOT more records, until time=1200
>
> I want to create a 5 degree by 5 degree grid of this data, with the value of
> temperature in the appropriate grid cell.  I want one grid for each time
> value.  For each time value, this works out to be a 36x72 grid with 2592
> cells because the longitude spans -180 to 180 and latitude spans 90 to -90
> and they would be in increments of 5 degrees.  If there are no temperatures
> available to be put into a grid cell, then that cell should get a missing
> value, NA, put into it.
> Also, could the gridded result for each time be written to a text file
> before processing the next time value?
>
> Hope this is clear.
> Thanks in advance.
>
> dxc13



-- 
http://had.co.nz/


Re: [R] Bumps chart in R

2009-04-27 Thread hadley wickham

> library(ggplot2)
> qplot(year,value, data=data,label=countries, geom=c("line","text"),
> group=countries, col=countries)
>
> But I would like to have the text labels show only once - e.g. at 1990
> - and also control the size of the text. In my crude qplot, setting
> size=2 e.g. changes not only the text, but also the lines etc. I guess
> I have to move from qplot to ggplot.

Or just add the text layer separately:

qplot(year, value, data = data, geom = "line", group = countries) +
  geom_text(aes(label = countries), subset = .(year == 1990),
    hjust = 1, size = 3, lineheight = 1)
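
The layer-level subset argument was experimental at the time and is not
available in later ggplot2 releases. A sketch of the same idea that passes a
subsetted data frame to the text layer instead; the long-format data frame
`d` is an assumption, with four values taken from the figures quoted in this
thread:

```r
library(ggplot2)

# Hypothetical long-format data: one row per country and year
d <- data.frame(
  countries = rep(c("Kina", "Brasilien"), each = 2),
  year      = rep(c(1990, 2004), times = 2),
  value     = c(33, 9.9, 14, 7.5)
)

# Lines use all rows; the text layer only sees the 1990 rows,
# so each label is drawn once, and size applies to the text alone
p <- ggplot(d, aes(year, value, group = countries)) +
  geom_line() +
  geom_text(
    data = subset(d, year == 1990),
    aes(label = countries),
    hjust = 1, size = 3
  )
print(p)
```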

Hadley

-- 
http://had.co.nz/


Re: [R] Bumps chart in R

2009-04-26 Thread hadley wickham

In statistics, a bumps chart is more commonly called a parallel
coordinates plot.

Hadley

On Sun, Apr 26, 2009 at 5:45 PM, Andreas Christoffersen
 wrote:
> Hi there,
>
> I would like to make a 'bumps chart' like the ones described e.g.
> here: http://junkcharts.typepad.com/junk_charts/bumps_chart/
>
> Purpose: I'd like to plot the proportion of people in select countries
> living on less than one USD per day in 1994 and 2004 respectively. I
> have already constructed a barplot, but I think a bumps chart would
> be better.
>
> # The barplot and data
> countries <- c("U-lande", "Afrika syd for sahara", "Europa og
> Centralasien", "Lantinamerika og Caribien","Mellemøstenog Nordafrika",
> "Sydasien","ØStasien og stillehaveet", "Kina", "Brasilien")
> poor_1990 <- c(28.7,46.7,0.5,10.2,2.3,43,29.8,33,14)
> poor_2004 <- c(18.1,41.1,0.9,8.6,1.5,30.8,9.1,9.9,7.5)
> poor <- cbind(poor_1990,poor_2004)
> rownames(poor) <- countries
> oldpar <- par(no.readonly=T)
> par <- par(mar=c(15,5,5,1))
> png("poor.png")
> par <- par(mar=c(15,5,5,1))
> barplot(t(poor[order(poor[,2]),]),beside=T,col=c(1,2),las=3,ylab="%
> poor",main="Percent living for < 1 USD per day (1993
> prices)",ylim=c(0,50))
> legend("topleft",c("1990","2004"),fill=c(1,2),bty="n")
> par(oldpar)
> dev.off()
>
> I guess I need to start with a normal plot? Something like the below,
> but there is a long way to go...
>
> # A meager start - how to finish my bumps chart
> plot(c(rep(1,9),rep(2,9)),c(poor_1990,poor_2004),type="b",ann=F)
>
> Thankful for any help.
>
> Cheers.
>
> Andreas
>



-- 
http://had.co.nz/


Re: [R] eager to learn how to use "sapply", "lapply", ...

2009-04-26 Thread hadley wickham

Have a look at the plyr package and associated documentation -
http://had.co.nz/plyr
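
For the particular loop quoted below, no apply-style function is needed at
all, since sum() and indexing are already vectorised. A sketch, where the
values of Nfour, Lev, N and Y are made up only to make it runnable:

```r
# Hypothetical inputs, chosen only so the example runs
set.seed(1)
Nfour <- 64
Lev   <- 2
N     <- 48
Y     <- rnorm(Nfour)

Nstart  <- Nfour / (2^Lev) + 1
Nfinish <- Nstart - 1 + Nfour / (2^Lev)
LengLev <- Nfinish - Nstart + 1
NW      <- floor(LengLev * N / Nfour)

# The original loop, accumulating squared magnitudes
Rnorm_loop <- 0
if (NW > 0) {
  for (j in Nstart:(Nstart + NW - 1)) {
    Rnorm_loop <- Rnorm_loop + abs(Y[j])^2
  }
}

# Vectorised version: seq_len(NW) is empty when NW is 0,
# so the if() guard is no longer needed
idx <- Nstart - 1 + seq_len(NW)
Rnorm_vec <- sum(abs(Y[idx])^2)

all.equal(Rnorm_loop, Rnorm_vec)
```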

Hadley

On Sun, Apr 26, 2009 at 12:42 PM,   wrote:
> After a year my R programming style is still very "C like".
> I am still writing a lot of "for loops" and finding it difficult to recognize
> where, in place of loops, I could just do the same with one line of code,
> using "sapply", "lapply", or the like.
> On-line examples for such high level functions do not help me.
> Even if, sooner or later, I am getting my R scripts to do what I expect, I
> would really like to shake my C programming style off.
> I am staring at my R script and thinking "how can I improve it?"
> For instance, I have a lot of loops similar to the following one and wonder
> whether I can replace them with a proper call to a high level R function that
> does the same:
>
>    Nstart <- Nfour/(2^Lev) + 1
>     Nfinish <- Nstart -1 + Nfour/(2^Lev)
>     LengLev <- Nfinish - Nstart + 1
>     NW <- floor(LengLev*N/Nfour)
>     if(NW > 0){
>       for(j in Nstart:(Nstart + NW -1)){
>          Dw <- abs(Y[j])
>          Rnorm <- Rnorm + Dw^2
>       }
>     }
>
>
> Thank you very much for helping me get better.
> Maura



-- 
http://had.co.nz/


Re: [R] 3 questions regarding matrix copy/shuffle/compares

2009-04-26 Thread hadley wickham

In that case, you would want a shallow copy, and you'd need to jump
through a lot of hoops to do that in R.
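
One of the hoops, as a sketch: environments are among the few R objects that
are not copied on assignment, so wrapping the matrix in one gives two names
that see the same data. The names `pop` and `alias` are illustrative only:

```r
# Store the matrix inside an environment
pop <- new.env()
pop$m <- matrix(1:4, nrow = 2)

# Assigning an environment does not copy it: both names refer to one object
alias <- pop

# A change made through one name is visible through the other
pop$m[1, 1] <- 99L
alias$m[1, 1]
```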

Hadley

On Sun, Apr 26, 2009 at 10:35 AM, David Winsemius
 wrote:
> My understanding of the OP's request was for some sort of copy which did
> change when entries in the original were changed; the sort of behavior that
> might be seen  in a spreadsheet that had a copy "by reference".
>
> On Apr 26, 2009, at 11:28 AM, hadley wickham wrote:
>
>>>> I want to (1) create a deep copy of pop,
>>>
>>> I have already said *I* do not know how to create a "deep copy" in R.
>>
>> Creating a deep copy is easy, because all copies are "deep" copies.
>> You need to try very hard to create a reference in R.
>>
>> Hadley
>
> --
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>

-- 
http://had.co.nz/


Re: [R] 3 questions regarding matrix copy/shuffle/compares

2009-04-26 Thread hadley wickham

>> I want to (1) create a deep copy of pop,
>
> I have already said *I* do not know how to create a "deep copy" in R.

Creating a deep copy is easy, because all copies are "deep" copies.
You need to try very hard to create a reference in R.
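
A small illustration of that point, using a made-up matrix: ordinary
assignment behaves like a deep copy, because modifying one name never changes
the value seen through the other.

```r
pop <- matrix(1:6, nrow = 2)

# Plain assignment; behaves as an independent copy
backup <- pop

# Modify the original
pop[1, 1] <- 100L

# The copy still holds the old value
backup[1, 1]
pop[1, 1]
```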

Hadley


-- 
http://had.co.nz/


Re: [R] omit empty cells in crosstab?

2009-04-24 Thread hadley wickham

On Fri, Apr 24, 2009 at 3:12 PM, sjaffe  wrote:
>
> small example:
>
> a<-c(1.1, 2.1, 9.1)
> b<-cut(a,0:10)
> c<-data.frame(b,b)
> d<-table(c)
> dim(d)
> ##result: c(10, 10)
>
> But only 3 of the 100 cells are non-zero.
> If there were 10 columns, the table would have 10 dimensions each of length 10,
> and so 10^10 elements, too many to fit in memory.

Here's one way with the plyr package:

library(plyr)
ddply(c, names(c), nrow)

Find more about plyr at http://had.co.nz/plyr
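
A base R sketch of the same count, for comparison. It relies on aggregate()
returning only the combinations that occur in the data; the helper column `n`
and the name `dat` are assumptions added for illustration:

```r
# Data from the quoted example
a <- c(1.1, 2.1, 9.1)
b <- cut(a, 0:10)
dat <- data.frame(b, b)   # columns are named b and b.1

# Add a column of ones and sum it within each observed combination
dat$n <- 1L
counts <- aggregate(n ~ ., data = dat, FUN = sum)
counts
```

The result has one row per combination that appears in the data, rather than
one cell per possible combination.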

Hadley

-- 
http://had.co.nz/


Re: [R] omit empty cells in crosstab?

2009-04-24 Thread hadley wickham

Hi Steve,

The general answer is yes, but the specific will depend on your
problem.  Could you provide a small reproducible example to illustrate
your problem?

Hadley

On Fri, Apr 24, 2009 at 1:19 PM, sjaffe  wrote:
>
> Perhaps this is a common question but I haven't been able to find the answer.
>
> I have data with many factors, each taking many values. However, only
> relatively few combinations appear in the data, ie have nonzero counts, in
> other words the resulting table is sparse. Say we have 10 factors each with
> 10 levels. The result of table() would exceed the memory space (on a 32bit
> machine). Is there any way to produce a table with empty cells omitted?
> (without first producing the whole table and then removing rows.)
>
> Thanks,
> Steve
>



-- 
http://had.co.nz/


Re: [R] Generalized 2D list/array/whatever?

2009-04-24 Thread hadley wickham

On Fri, Apr 24, 2009 at 5:50 AM, Duncan Murdoch  wrote:
> Toby wrote:
>>
>> I'm trying to figure out how I can get a generalized 2D
>> list/array/matrix/whatever
>> working.  Seems I can't figure out how to make the variables the right
>> type.  I
>> always seem to get some sort of error... out of bounds, wrong type, wrong
>> dim, etc.
>> Very confused... :)
>>
>> x[["some label", "some other index"]] <- 3
>> x[["some other label", "something else"]] <- 4
>>
>> I don't know the indexes/label ahead of time... they get generated...  Any
>> thoughts?
>>
>
> What you have there is not legal syntax, but this would be:

It isn't?

a <- as.list(1:4)
dim(a) <- c(2, 2)
rownames(a) <- c("a", "b")
colnames(a) <- c("c", "d")

a[["a", "d"]]

Hadley


-- 
http://had.co.nz/


Re: [R] conditional grouping of variables: ave or tapply or by or???

2009-04-23 Thread hadley wickham

On Thu, Apr 23, 2009 at 5:11 PM, ozan bakis  wrote:
> Dear R Users,
> I have the following data frame:
>
> v1 <- c(rep(10,3),rep(11,2))
> v2 <- sample(5:10, 5, replace = T)
> v3 <- c(0,1,2,0,2)
> df <- data.frame(v1,v2,v3)
>> df
>  v1 v2 v3
> 1 10  9  0
> 2 10  5  1
> 3 10  6  2
> 4 11  7  0
> 5 11  5  2
>
> I want to add a new column v4 such that its values are equal to the value
> of v2 conditional on v3=0 for each subgroup of v1. In the above example,
> the final result should be like
>
> df$v4 <- c(9,9,9,7,7)
>> df
>  v1 v2 v3 v4
> 1 10  9  0  9
> 2 10  5  1  9
> 3 10  6  2  9
> 4 11  7  0  7
> 5 11  5  2  7
>
>
> I tried the following commands without success.
>
> df$v4 <- ave(df$v2, df$v1, FUN=function(x) x[df$v3==0])
> tapply(df$v2, df$v1, FUN=function(x) x[df$v3==0])
> by(df$v2, df$v1, FUN=function(x) x[df$v3==0])
>
> Any help? Thanks in advance!

Here's one approach with the plyr package, http://had.co.nz/plyr

library(plyr)
ddply(df, .(v1), transform, v4 = v2[v3 == 0])
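
A base R sketch of the same lookup, assuming each v1 group has exactly one
row with v3 == 0 (if there were several, match() would silently use the
first). The v2 values are fixed to those printed in the question so the
result is reproducible:

```r
df <- data.frame(
  v1 = c(10, 10, 10, 11, 11),
  v2 = c(9, 5, 6, 7, 5),
  v3 = c(0, 1, 2, 0, 2)
)

# One reference row per group: the row where v3 is 0
ref <- df[df$v3 == 0, c("v1", "v2")]

# Look up each row's group in the reference table
df$v4 <- ref$v2[match(df$v1, ref$v1)]
df
```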

Hadley


-- 
http://had.co.nz/


Re: [R] bug when subtracting decimals?

2009-04-21 Thread hadley wickham

> "Have you read the posting guide and the FAQs? If you do not get a reply
> within two days, you may want to look at both and think about reformulating
> your query. Oh, and while you are at it, look through the archives, a lot of
> questions have already been asked and answered before."

As I say every time someone brings this up, there are currently ~130
printed pages of FAQs.  Reading all that seems a rather large burden
on the novice poster.

Hadley

-- 
http://had.co.nz/


[R] [R-pkgs] [ANN] ggplot2 version 0.8.3

2009-04-20 Thread hadley wickham

ggplot2 

ggplot2 is a plotting system for R, based on the grammar of graphics,
which tries to take the good parts of base and lattice graphics and
avoid bad parts. It takes care of many of the fiddly details
that make plotting a hassle (like drawing legends) as well as
providing a powerful model of graphics that makes it easy to produce
complex multi-layered graphics.

To install or update, run:
install.packages(c("ggplot2", "plyr"))

Find out more at http://had.co.nz/ggplot2, and check out the nearly 500
examples of ggplot in use.  If you're interested, you can also sign up to
the ggplot2 mailing list at http://groups.google.com/group/ggplot2, or track
development at  http://github.com/hadley/ggplot2

ggplot2 0.8.3  (2009-04-20)


New features

* alpha: new aesthetic, with scale alpha.  Where a geom has both fill
and colour, alpha affects the fill.
* annotate: new annotate function to make it easier to add annotations to plots
* facet_grid now takes strip label function from parameter labeller,
not theme setting
* facet_grid: gains as.table argument to control direction of horizontal facets
* fortify: full set of methods for turning data from the sp package
into data frames that can be plotted with ggplot2
* geom_errorbarh: new geom for horizontal error bars
* label_parsed and label_bquote functions to make it easier to
display expressions on facet labels
* scale_manual now supports breaks and limits
* subset: experimental new feature.  Layers now have a subset
argument, which takes subsets formatted like .(var1 < 5, var2 == 3)
etc.
* xlim and ylim now recognise Date and POSIXct classes to create
date and date_time scales respectively

Dealing with missing values

* facet_wrap: add drop argument to control whether or not panels for
non-existent combinations of facetting variables should be dropped or
not.  Defaults to TRUE
* scale_discrete: empty factor levels will be preserved, unless drop = TRUE

Bug fixes

* added presidents dataset from book to package
* American spelling of color accepted as a geom parameter, and all
colour scales have alias spelled color (e.g. scale_color_hue)
* facet_wrap: contents guaranteed to be clipped to panel
* facet_wrap: corrected labelling when facetting by multiple variables
(thanks to Charlotte Wickham for a clear test case)
* geom_histogram now works with negative weights (provided position =
"identity").  This is useful for creating back to back histograms.
* geom_step: improve legend
* geom_text: better legend
* geom_vline, geom_hline, geom_abline: should work in yet more situations
* resolution: fixed bug in computation of resolution that led to
(e.g.) incorrect boxplot widths when there was only a single x value
in a group.
* position_stack: fixed bug in detection of overlap for very large bins
* scale_discrete: factor levels no longer mistakenly reordered
* scale_hue: now spans full range of hue if it is less than 360 degrees
* scale_hue: rotated default hue range by 15 degrees to avoid
unfortunate red-green contrast in two colour case
* show now works with ggplot objects
* stat_sum: fixed bug which resulted in dropped aesthetics
* stat_summary: now warns when dropping records with missing values
* stat_summary: should be a little faster
* stat_summary: correctly passes ... arguments on fun.data
* theme_bw: corrected justification of axis.text.y
* trans: bug fixes to logistic transformation
* order aesthetic should work again

-- 
http://had.co.nz/




___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages


Re: [R] AICs from lmer different with summary and anova

2009-04-19 Thread hadley wickham

> Am I doing something wrong, here? If not, which are the real AIC and logLik
> values for the different models?

I don't think it's reasonable to expect that the log-likelihood
computed by different functions should be comparable.  Are the
constant terms included or dropped?
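
To illustrate what a dropped constant does, here is a sketch with a plain
Gaussian likelihood. This is a toy example with made-up data, not what lmer,
summary or anova actually compute. The two versions disagree on the value of
the log-likelihood but agree on differences between fits:

```r
set.seed(42)
y <- rnorm(20, mean = 1)
n <- length(y)

# Full Gaussian log-likelihood (unit variance) for a given mean
ll_full <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

# Same quantity with the constant -n/2 * log(2 * pi) dropped
ll_kernel <- function(mu) -0.5 * sum((y - mu)^2)

# The two differ by exactly the dropped constant ...
ll_full(0) - ll_kernel(0)
-n / 2 * log(2 * pi)

# ... but differences between two candidate means are identical
ll_full(mean(y)) - ll_full(0)
ll_kernel(mean(y)) - ll_kernel(0)
```

So values are only comparable when both come from the same function, or from
functions known to use the same convention.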

Hadley

-- 
http://had.co.nz/


Re: [R] Create histogram from data matrix

2009-04-18 Thread hadley wickham

On Fri, Apr 17, 2009 at 2:07 PM, Paul Warren Simonin
 wrote:
> Thank you all for your advice.
>  I have received some good tips, but it was suggested I write back with a
> small simulated data set to better illustrate my needs. So, currently my
> data frame looks something like:
>
> ID (date)  Temperature  Number of fish
> 200706183       5       456
> 200706183       5       765
> 200706183       4       567
> 200706183       3       876
> 200706183       3       888
> 200706183       2       111
> 200706184       8       2345
> 200706184       8       654
> 200706184       8       7786
> 200706184       7       345
> 200706184       6       234
> 200706184       6       123
>
>
> I need to create a plot for each ID (date) of the number of fish observed
> at each temperature. Obviously my data frame is much larger. These plots do
> not have to be in a specific histogram format, but it seems this may be
> appropriate.
> Thanks for any additional advice as to how this may be done, either using
> plot commands or reformatting my data.
>
> It seemed the ggplot2 options may be good but so far I have tried qplot with
> no success:
>
> my most recent code looks like:
>
> qplot(temp,number of fish, geom="histogram",binwidth=1)
>
> I have tried various tweaks of this, but no success.

The problem is that "number of fish" is not a valid R variable name because
it has spaces in it (and you didn't specify the data frame to look for
those variables in).
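
A sketch of how such column names can be handled, using three of the rows
quoted above. The data frame name `fish` and the replacement names are
assumptions for illustration:

```r
# check.names = FALSE keeps the headers exactly as typed
fish <- data.frame(
  "ID (date)"      = c(200706183, 200706183, 200706184),
  "Temperature"    = c(5, 4, 8),
  "Number of fish" = c(456, 567, 2345),
  check.names = FALSE
)

# Names with spaces must be quoted with backticks ...
fish$`Number of fish`

# ... which is what make.names() avoids by converting them
make.names(names(fish))

# Simplest fix: rename to syntactically valid names once
names(fish) <- c("id", "temp", "n_fish")
fish$n_fish
```

After renaming, the columns can be referred to by plain names, as long as the
data frame is passed via the data argument.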

Hadley

-- 
http://had.co.nz/


Re: [R] ColorRamp different from ColorRampPalette

2009-04-17 Thread hadley wickham

Look at the output of pal.cr((0:40)/40)
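
What that output shows, as a sketch: colorRamp() returns a matrix of RGB
values on a 0-255 scale rather than colour strings, so it needs to go through
rgb() before being used as col. The names `m` and `cols` are illustrative:

```r
pal.crp <- colorRampPalette(c("blue", "white", "red"), space = "rgb")
pal.cr  <- colorRamp(c("blue", "white", "red"), space = "rgb")

# colorRamp gives one row per input value, columns are red, green, blue
m <- pal.cr(seq(0, 1, length.out = 40))
dim(m)

# Convert the RGB matrix to colour strings
cols <- rgb(m, maxColorValue = 255)

# Same gradient as colorRampPalette
identical(cols, pal.crp(40))
plot(rep(0, 40), pch = 16, col = cols)
```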

Hadley

On Fri, Apr 17, 2009 at 2:42 PM, Etienne B. Racine  wrote:
>
> I try to use ColorRamp as ColorRampPalette (i.e. with the same gradient), but
> it seems there is a nuance that I've missed.
>
> pal.crp<-colorRampPalette( c("blue", "white", "red"), space = "rgb")
> plot(rep(0,40),pch=16,col=pal.crp(40))
> # is great
>
> But, using the same gradient with colorRamp is giving erratic colors.
>
> pal.cr<-colorRamp( c("blue", "white", "red"), space = "rgb")
> plot(rep(0,40),pch=16,col=pal.cr((0:40)/40))
> # is not great
>
> From the help: "colorRamp returns a function that maps values between 0 and
> 1 to colors" ...colors I guess taken from the gradient, but I don't get the
> gradient.
>
> Etienne
>



-- 
http://had.co.nz/


Re: [R] numbers loop in R

2009-04-17 Thread hadley wickham

On Fri, Apr 17, 2009 at 12:19 PM, jim holtman  wrote:
> try this:
>
>> matrixx<-function(A){
> +     B=matrix(NaN,nrow=(A+1),ncol=4)
> +     k <- 1
> +     for (i in 3:A){
> +         for (j in i:A) {
> +             B[k,] <- c(NaN, i-2, i-1, j)
> +             k <- k + 1
> +         }
> +     }
> +     B
> + }
>> matrixx(5)
>     [,1] [,2] [,3] [,4]
> [1,]  NaN    1    2    3
> [2,]  NaN    1    2    4
> [3,]  NaN    1    2    5
> [4,]  NaN    2    3    4
> [5,]  NaN    2    3    5
> [6,]  NaN    3    4    5

Here's a solution without the loop.  I think it illustrates the intent
of the algorithm more clearly.

candidates <- t(combn(5,3))
firstdiff <- candidates[,2] - candidates[, 1]
cbind(NaN, candidates[firstdiff == 1, ])
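
The same three lines wrapped in a function of A, as a sketch; the name
`matrixx2` is made up here. drop = FALSE keeps the result a matrix even when
only one row qualifies (A = 3):

```r
matrixx2 <- function(A) {
  # All increasing triples from 1..A, one per row
  candidates <- t(combn(A, 3))
  # Keep triples whose first two entries are consecutive
  firstdiff <- candidates[, 2] - candidates[, 1]
  cbind(NaN, candidates[firstdiff == 1, , drop = FALSE])
}

matrixx2(5)
nrow(matrixx2(6))
```

Unlike a preallocated matrix, the number of rows here follows from the data,
so there is no row count to get wrong for other values of A.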

Hadley

-- 
http://had.co.nz/


Re: [R] Create histogram from data matrix

2009-04-17 Thread hadley wickham

On Fri, Apr 17, 2009 at 9:59 AM, Paul Warren Simonin
 wrote:
> Hello!
>  Thanks for reading this request for assistance. I have a question regarding
> creating a histogram-like figure from data that are not currently in the
> correct format for the "hist" command.
>  Specifically, my data have been processed and are in a matrix with columns
> containing the variables of interest and separate columns containing the
> number of times this variable was observed (counts). This data frame/matrix
> is rather large (1600 rows), and there are multiple rows corresponding to
> the same variable level (e.g., "temperature=8, 5 observations" in one row,
> then the next: "temperature=8, 9 observations", and so on). In other words,
> the data are not one long vector R can read and plot as a histogram, nor are
> they condensed. My goal is to create a figure in which one axis is bins
> (e.g., temperature values) and the other is number of observations in this
> bin (e.g., number of organisms seen).
>  My question is: Is there a way R can be told to read my data to create a
> plot like that I desire? So far I have tried several options, including bar
> plots with no success.
>
>  If there is no way to do this with my data as they are currently arranged, is
> there an efficient way to re-arrange them?

In fact, this is very easy to do with ggplot2:

install.packages("ggplot2")
library(ggplot2)

qplot(value, weight = numofobservations, data = mydf, geom="histogram")
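
What the weight does can be sketched in base R: the counts are summed within
each value instead of counting rows. The names `mydf`, `value` and
`numofobservations` follow the call above; the `id` column and the rows
themselves are assumptions for illustration:

```r
# Hypothetical data: several rows per value, each with its own count
mydf <- data.frame(
  id    = rep(c(200706183, 200706184), each = 6),
  value = c(5, 5, 4, 3, 3, 2, 8, 8, 8, 7, 6, 6),
  numofobservations = c(456, 765, 567, 876, 888, 111,
                        2345, 654, 7786, 345, 234, 123)
)

# Sum the counts for each value, separately for each id
totals <- xtabs(numofobservations ~ value + id, data = mydf)
totals

# One bar chart per id, here for the first one
barplot(totals[, "200706183"], xlab = "value", ylab = "observations")
```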

Hadley

-- 
http://had.co.nz/


Re: [R] cast function in package reshape

2009-04-17 Thread hadley wickham

On Fri, Apr 17, 2009 at 8:38 AM, David Hajage  wrote:
> Hello R useRs,
>
> I have a function which returns a list of functions :
>
> freq1 <- function(x) {
>  lev <- unique(x[!is.na(x)])
>  nlev <- length(lev)
>  args <- alist(x=)
>
>  if (nlev == 1) {
>    body <- c("{", "sum(!is.na(x))", "}")
>    f <- function() {}
>    formals(f) <- as.pairlist(args)
>    body(f) <- parse(text = body)
>    namef <- paste("freq", as.character(nlev), sep = "_")
>    assign(namef, f)
>    res <- list(get(namef))
>    names(res) <- namef
>  }
>  if (nlev > 1) {
>    res <- NULL
>    namesf <- NULL
>    for (i in 1:nlev) {
>      body <- c("{", paste("sum(x[!is.na(x)] ==", as.character(lev[i]), ")",
> sep = " "), "}")
>      f <- function() {}
>      formals(f) <- as.pairlist(args)
>      body(f) <- parse(text = body)
>      namef <- paste("freq", as.character(lev[i]), sep = "_")
>      assign(namef, f)
>      namesf <- c(namesf, namef)
>      res <- c(res, get(namef))
>    }
>    names(res) <- namesf
>  }
>  return(res)
> }
>
> df <- data.frame(id = 1:50, x = sample(c(NA, 1), 50, T), y = sample(1:2, 50,
> T), z = sample(letters[1:2], 50, T))
>
>> freq1(df$x)
> $freq_1
> function (x)
> {
>    sum(!is.na(x))
> }
> 
>
>> freq1(df$y)
> $freq_2
> function (x)
> {
>    sum(x[!is.na(x)] == 2)
> }
> 
>
> $freq_1
> function (x)
> {
>    sum(x[!is.na(x)] == 1)
> }
> 
>
>
> I would like to use this list of functions with cast function (in package
> reshape by Hadley Wickham) :
>
>> cast(melt(df, id = c("id", "z"), measure = c("x", "y")), variable +
> result_variable ~ z, fun = function(x) freq1(x), margins = "grand_col")
> Erreur dans freq1(x) : objet "res" non trouvé
>
> Here the result I would like to have :
>
>  variable                  a  b (all)
> 1        x          freq_1 10 14    24
> 2        y          freq_1 18 32    50
> 3        y          freq_2  9 14    23
>
> I admit it is a bit far-fetched, but is this actually possible ?

Something like this?

df <- data.frame(
  id = 1:50,
  x = sample(c(NA, 1), 50, T),
  y = sample(1:2, 50, T),
  z = sample(letters[1:2], 50, T)
)
dfm <- melt(df, id = c("id", "z"))


f1 <- function(base)
  function(x) table(factor(x, levels = unique(base)))
cast(dfm, variable + result_variable ~ z, f1(dfm$value),
  margins = "grand_col")

I think f1 effectively does what your freq1 function does, but always
returns the same number of results, a requirement of the aggregation
function in cast.
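
To see why fixing the levels matters, here is a small sketch of the same closure run outside cast, on made-up vectors. Every call returns one count per level of base, even for levels that do not occur in x:

```r
f1 <- function(base) {
  function(x) table(factor(x, levels = unique(base)))
}

# Levels become "1" and "2"; factor() drops NA from the levels by default
count_levels <- f1(c(1, 2, NA, 2))

counts_a <- count_levels(c(1, 1, NA))  # only level 1 present
counts_b <- count_levels(2)            # only level 2 present
```

Both results have length 2, which is what lets the pieces line up when they are recombined.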

Hadley

-- 
http://had.co.nz/


Re: [R] Extending a vector to length n

2009-04-16 Thread hadley wickham

Great idea - that's a little faster than my previous approach of
setting length() and then re-adding the attributes.  Thanks!
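
A minimal sketch of the rep() approach wrapped in a helper. The name pad_na is hypothetical, and the sketch assumes the class of x has working rep() and `[<-` methods, as Date and factor do:

```r
pad_na <- function(x, n) {
  m <- length(x)
  stopifnot(m <= n)
  out <- rep(x, length.out = n)    # rep() keeps the Date or factor class
  if (n > m) out[(m + 1):n] <- NA  # overwrite the recycled values
  out
}

a <- pad_na(as.Date("2008-01-01"), 3)
b <- pad_na(factor("a"), 2)
```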

Hadley

On Thu, Apr 16, 2009 at 12:16 PM, Raubertas, Richard
 wrote:
> The following approach works for both of your examples:
>
> xx <- rep(x, length.out=n)
> xx[m:n] <- NA
>
> Thus:
>
>> n <- 2
>> aa <- rep(a, length.out=n)
>> aa[(length(a)+1):n] <- NA
>> aa
> [1] "2008-01-01" NA
>> bb <- rep(b, length.out=n)
>> bb[(length(b)+1):n] <- NA
>> bb
> [1] a    <NA>
> Levels: a
>>
>
> R. Raubertas
> Merck & Co
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of hadley wickham
>> Sent: Wednesday, April 15, 2009 10:55 AM
>> To: r-help
>> Subject: [R] Extending a vector to length n
>>
>> In general, how can I increase a vector of length m (< n) to length n
>> by padding it with n - m missing values, without losing attributes?
>> The two approaches I've tried, using length<- and adding missings with
>> c, do not work in general:
>>
>> > a <- as.Date("2008-01-01")
>> > c(a, NA)
>> [1] "2008-01-01" NA
>> > length(a) <- 2
>> > a
>> [1] 13879    NA
>>
>>
>> > b <- factor("a")
>> > c(b, NA)
>> [1]  1 NA
>> > length(b) <- 2
>> > b
>> [1] a    <NA>
>> Levels: a
>>
>> Hadley
>>
>>
>>
>> --
>> http://had.co.nz/
>>
>
>



-- 
http://had.co.nz/


[R] [R-pkgs] [ANN] plyr version 0.1.7

2009-04-15 Thread hadley wickham

plyr is a set of tools for a common set of problems: you need to break
down a big data structure into manageable pieces, operate on each
piece and then put all the pieces back together.  For example, you
might want to:

  * fit the same model to subsets of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising
  * eliminate for-loops in your code

It's already possible to do this with built-in functions (like split
and the apply functions), but plyr just makes it all a bit easier
with:

  * absolutely consistent names, arguments and outputs
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages

Some considerable effort has been put into making plyr fast and memory
efficient, and in most cases it is faster than the built-in functions.
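
For comparison, here is the split, apply and combine pattern written with built-in functions only. This is a sketch on made-up data, not plyr's own implementation:

```r
df <- data.frame(g = c("a", "a", "b", "b", "b"), x = c(1, 3, 2, 4, 6))

# Split: one data frame per group
pieces <- split(df, df$g)

# Apply: summarise each piece
summaries <- lapply(pieces, function(piece) {
  data.frame(g = piece$g[1], n = nrow(piece), mean_x = mean(piece$x))
})

# Combine: stack the per-group results back together
result <- do.call(rbind, summaries)
```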

You can find out more at http://had.co.nz/plyr/, including a 20 page
introductory guide, http://had.co.nz/plyr/plyr-intro.pdf.  You can ask
questions about plyr (and data-manipulation in general) on the plyr
mailing list.  Sign up at http://groups.google.com/group/manipulatr


plyr 0.1.7 (2008-04-15) ---

Ensure that rbind.fill preserves attributes.

plyr 0.1.6 (2008-04-15) ---

Improvements:

* all ply functions deal more elegantly when given function names: can
supply a vector of function names, and name is used as label in output
* failwith and each now work with function names as well as functions
(i.e. "nrow" instead of nrow)
* each now accepts a list of functions or a vector of function names
* l*ply will use list names where present
* if .inform is TRUE, error messages will give you information about
where errors occur within your data - hopefully this will make problems
easier to track down

Speed-ups:

* massive speed ups for splitting large arrays
* fixed typo that was causing a 50% speed penalty for d*ply
* rewritten rbind.fill is considerably (> 4x) faster for many data frames
* colwise about twice as fast

Bug fixes:

* daply: now works when the data frame is split by multiple variables
* aaply: now works with vectors
* ddply: first variable now varies slowest as you'd expect


plyr 0.1.5 (2008-02-23) ---

* colwise now accepts a quoted list as its second argument.  This
allows you to specify the names of columns to work on: colwise(mean,
.(lat, long))
* d_ply and a_ply now correctly pass ... to the function


-- 
http://had.co.nz/

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages


[R] Extending a vector to length n

2009-04-15 Thread hadley wickham

In general, how can I increase a vector of length m (< n) to length n
by padding it with n - m missing values, without losing attributes?
The two approaches I've tried, using length<- and adding missings with
c, do not work in general:

> a <- as.Date("2008-01-01")
> c(a, NA)
[1] "2008-01-01" NA
> length(a) <- 2
> a
[1] 13879    NA


> b <- factor("a")
> c(b, NA)
[1]  1 NA
> length(b) <- 2
> b
[1] a    <NA>
Levels: a

Hadley



-- 
http://had.co.nz/


Re: [R] Concatenation, was Re: Physical Units in Calculations

2009-04-13 Thread hadley wickham

On Mon, Apr 13, 2009 at 4:15 AM, Peter Dalgaard
 wrote:
> Stavros Macrakis wrote:
>
>> It would of course be nice if the existing difftime class could be fit
>> into this, as it is currently pretty much a second-class citizen.  For
>> example, c of two time differences is currently a numeric vector,
>> losing its units (hours, days, etc.) completely.
>
> That's actually a generic feature/issue of c(). We also have
>
>> c(factor(1),factor(3))
> [1] 1 1
>> library(survival)
> Loading required package: splines
>> c(Surv(1,T),Surv(2,F))
> [1] 1 1 2 0
>
> and similar issues apply to rbind() of data frames,
>
>> rbind(data.frame(s=Surv(1,T)),data.frame(s=Surv(2,F)))
>  s.time s.status
> 1      1        1
> 2      2        0
>
> There is some potential for redesigning this, using a concat() generic which
> should do the Right Thing for all classed vector-like objects. (There is
> such a function in Splus, but I don't think their data frame code is using it.)

It would also be nice to have a strip_attr function for those times
you want to be explicit about using c to remove attributes.
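
No such function exists in base R as far as this thread goes, so the following is only a sketch of what a strip_attr helper might look like (the name and behaviour are assumptions):

```r
# Hypothetical helper: drop every attribute, leaving the underlying vector
strip_attr <- function(x) {
  attributes(x) <- NULL
  x
}

codes <- strip_attr(factor(c("b", "a")))    # the integer codes
days  <- strip_attr(as.Date("2008-01-01"))  # days since 1970-01-01
```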

Hadley

-- 
http://had.co.nz/


Re: [R] Help with postscript (huge file size)

2009-04-12 Thread hadley wickham

> I'm generating some images in R to put into a document that I'm producing
> using Latex. This document in Latex is following a predefined model, which
> does not accept compilation with pdflatex, so I have to compile with latex
> -> dvi -> pdf. Because of that, I have to generate the images in R with
> postscript (I want a vector format to keep the quality). The problem is that
> the files of the images are very huge (10MB) and I have many images to put
> into the pdf document.
> I want to know if there is a way to reduce the size of those images
> generated by R using postscript.

Just use a high-resolution png or tiff.  At 300 dpi you won't be able
to tell the difference when it's printed.
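
A minimal sketch of writing a 300 dpi PNG. The 7 inch size and the plot itself are placeholders, and the check on capabilities("png") is only there so the sketch does not fail on a build of R without PNG support:

```r
out <- tempfile(fileext = ".png")

if (capabilities("png")) {
  # Size given in inches; res sets the pixels per inch
  png(out, width = 7, height = 7, units = "in", res = 300)
  plot(1:10)
  dev.off()
}
```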

Hadley

-- 
http://had.co.nz/


Re: [R] Display a very low p-value

2009-04-08 Thread hadley wickham

>  pnorm(37:39,lower.tail=FALSE)
> [1] 5.725571e-300  0.00e+00  0.00e+00
>
>  This is just a limitation of double precision floating-point arithmetic
> ...
>
>  curve(pnorm(x,lower.tail=FALSE),from=30,to=40,log="y")
> .Machine$double.xmin

But note

curve(pnorm(x, lower.tail=FALSE, log.p=TRUE), from=30, to=1000)
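
A short numeric sketch of the same point: far enough into the tail the probability underflows to zero, while its logarithm is still an ordinary number. The cut-off values 10 and 50 are picked only for illustration:

```r
z <- c(10, 50)

# On the probability scale the second value underflows to exactly 0
p_plain <- pnorm(z, lower.tail = FALSE)

# On the log scale both values remain finite
p_log <- pnorm(z, lower.tail = FALSE, log.p = TRUE)
```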

Hadley

-- 
http://had.co.nz/


Re: [R] newbie query: simple crosstabs

2009-04-07 Thread hadley wickham

On Tue, Apr 7, 2009 at 4:41 PM, Jorge Ivan Velez
 wrote:
> Hi Eik,
> You're absolutely right. My bad.
>
> Here is the correction of the code I sent:
>
> apply(mydata[,-1], 2, tapply, mydata[,1], function(x) sum(x)/length(x))

Or more simply:

apply(mydata[,-1], 2, tapply, mydata[,1], mean)
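
For example, on a made-up mydata whose first column is the grouping variable and whose remaining columns are 0/1 answers, the call gives one row per group and one column per variable:

```r
mydata <- data.frame(
  group = c("a", "a", "b", "b"),
  q1 = c(1, 0, 1, 1),
  q2 = c(0, 0, 1, 0)
)

# For each remaining column, take the mean within each level of column 1
res <- apply(mydata[, -1], 2, tapply, mydata[, 1], mean)
```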

Hadley

-- 
http://had.co.nz/


Re: [R] Using as.formula() with the reshape package cast

2009-04-07 Thread hadley wickham

On Tue, Apr 7, 2009 at 8:44 AM,   wrote:
>
> I am trying to use the "cast" function from the reshape package, where the
> formula is not passed in directly, but as the result of the as.formula()
> function.
>
> Using reshape v. 0.7.2
>
> I am able to properly melt() by data with:
>
>> molten <- melt(x, id=1:2)
>
> then I can properly cast with this:
>
>> cast(molten, days ~ variable)
>
> but if I try
>
>> cast(molten, as.function("days ~ variable"))

You're calling as.function ;)  Also, I think you should be able to
pass the string directly - reshape will convert it to a formula for
you.
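
A quick base-R check of what as.formula() gives you for that string:

```r
f <- as.formula("days ~ variable")

is_formula <- inherits(f, "formula")
vars <- all.vars(f)  # the variable names used in the formula
```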

Hadley

-- 
http://had.co.nz/


Re: [R] change inter-line spacing in grid graphics - how to?

2009-04-07 Thread hadley wickham

Have a look at ?gpar - it will tell you about lineheight.
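
A minimal sketch, assuming a lineheight of 0.8 is the kind of tightening wanted (the value is arbitrary). The grob is built with textGrob so nothing is drawn until you ask for it:

```r
library(grid)

# lineheight is a multiple of the font size
tg <- textGrob("The inter-line spacing\nis smaller now",
               gp = gpar(lineheight = 0.8))

# grid.draw(tg) would render it; grid.text() accepts the same gp argument
```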

Hadley

On Tue, Apr 7, 2009 at 3:28 AM, Mark Heckmann  wrote:
> I am trying to change the inter-line spacing in grid.text(), but I just
> don't find how to do it.
>
> pushViewport(viewport())
> grid.text("The inter-line spacing\n is too big")
> popViewport()
>
> Can anyone help?
> TIA, Mark
>
>

-- 
http://had.co.nz/


Re: [R] SUM,COUNT,AVG

2009-04-06 Thread hadley wickham

On Mon, Apr 6, 2009 at 5:31 PM, Jun Shen  wrote:
> This is a good example to compare different approaches. My understanding is
>
> aggregate() can apply one function to multiple columns
> summarize() can apply multiple functions to one column
> I am not sure if ddply() can actually apply multiple functions to multiple
> columns? This is what I would like to do. The syntax in the help is  a
> little confusing to me. Appreciate more comments. Thanks

In theory, you should be able to combine colwise and each:
colwise(each(min, median, max)).   That should return the min, median
and max for each column, but currently it doesn't return the values in
quite the right form for recombination with ddply.
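
Without grouping, the idea of several functions applied to several columns can be sketched in base R as follows. The data are made up and this is not what colwise and each do internally:

```r
dd <- data.frame(a = c(1, 5, 3), b = c(10, 20, 60))

# One column of results per input column, one row per summary function
stats <- sapply(dd, function(col) {
  c(min = min(col), median = median(col), max = max(col))
})
```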

Hadley

-- 
http://had.co.nz/


Re: [R] Collapse data matrix with extra info separated by commas

2009-04-06 Thread hadley wickham

On Mon, Apr 6, 2009 at 10:40 AM, baptiste auguie  wrote:
> Here's one attempt with plyr, hopefully Hadley will give you a better
> solution ( I could not get cast() to do it either)
>
> test <-
> data.frame(a=c("A","A","A","A","B","B","B"),b=c(1,1,2,2,1,1,1),c=sample(1:7))
> ddply(test,.(a,b),.fun=function(.) paste(.)[3])

This is a problem that currently isn't very easy to solve in plyr (but
I'm working on it).  About the best you can do is:

ddply(test, ~ a + b, colwise(paste, .(c)), collapse =",")

(this is basically equivalent to your suggestion)
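
A base-R sketch of the same collapse using aggregate(), with c fixed to 1:7 instead of a random sample so the result is predictable:

```r
test <- data.frame(
  a = c("A", "A", "A", "A", "B", "B", "B"),
  b = c(1, 1, 2, 2, 1, 1, 1),
  c = 1:7
)

# collapse is passed through to paste for each a/b group
collapsed <- aggregate(c ~ a + b, data = test, FUN = paste, collapse = ",")
```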

Hadley

-- 
http://had.co.nz/


Re: [R] SUM,COUNT,AVG

2009-04-06 Thread hadley wickham

On Mon, Apr 6, 2009 at 9:34 AM, Stavros Macrakis  wrote:
> There are various ways to do this in R.
>
> # sample data
> dd <- data.frame(a=1:10,b=sample(3,10,replace=T),c=sample(3,10,replace=T))
>
> Using the standard built-in functions, you can use:
>
> *** aggregate ***
>
> aggregate(dd,list(b=dd$b,c=dd$c),sum)
>  b c  a b c
> 1 1 1 10 2 2
> 2 2 1  3 2 1
> 
>
> *** tapply ***
>
> tapply(dd$a,interaction(dd$b,dd$c),sum)
>      1.1       2.1       3.1       1.2       2.2       3.2       1.3
> 2.3
>  5.00  3.00 10.00  5.00        NA        NA  5.00
> ...
>
> But the nicest way is probably to use the plyr package:
>
>> library(plyr)
>> ddply(dd,~b+c,sum)
>  b c V1
> 1 1 1 14
> 2 2 1  6
> 
>
> 
>
> Unfortunately, none of these approaches allows you to return more than one
> result from the function, so you'll need to write
>
>> ddply(dd,~b+c,length)   # count
>> ddply(dd,~b+c,sum)
>> ddply(dd,~b+c,mean)   # arithmetic average
>
> There is an 'each' function in plyr, but it doesn't seem to be compatible
> with ddply.

That's because ddply applies the function to the whole data frame, not
just the columns that aren't participating in the split.  One way
around it is:

ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

I haven't figured out a more elegant way to specify this yet.
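
The same thing spelled out with built-in functions, as a sketch on fixed data (dd here is made up rather than sampled, so the numbers are reproducible):

```r
dd <- data.frame(a = 1:6, b = c(1, 1, 2, 2, 2, 1), c = c(1, 1, 1, 2, 2, 1))

# Split on b and c, summarise column a in each piece, then stack the rows
pieces <- split(dd, list(dd$b, dd$c), drop = TRUE)
out <- do.call(rbind, lapply(pieces, function(df) {
  data.frame(b = df$b[1], c = df$c[1],
             length = length(df$a), sum = sum(df$a), mean = mean(df$a))
}))
```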

Hadley

-- 
http://had.co.nz/


Re: [R] Best way to turn a list into a data.frame

2009-04-06 Thread hadley wickham

On Mon, Apr 6, 2009 at 8:49 AM, Daniel Brewer  wrote:
> Hello,
>
> What is the best way to turn a list into a data.frame?
>
> I have a list with something like:
> $`3845`
>  [1] "04010" "04012" "04360"
>
> $`1029`
> [1] "04110" "04115"
>
> And I would like to get a data frame like the following:
>
> 3845 "04010"
> 3845 "04012"
> 3845 "04360"
> 1029 "04110"
> 1029 "04115"
>
> Any ideas?

l <- list("3845" = c("a", "b", "c"), "1029" = c("d", "e","f"))

library(reshape)
melt(l)
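
If you would rather stay in base R, stack() gives a similar long format for a named list of vectors. A sketch using the values from the question:

```r
l <- list("3845" = c("04010", "04012", "04360"),
          "1029" = c("04110", "04115"))

# One row per element; 'ind' holds the list name, 'values' the element
long <- stack(l)
```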

Hadley

-- 
http://had.co.nz/


Re: [R] package: maps and spatstat question

2009-04-06 Thread hadley wickham

Hi Laura,

You might find the map_data function from the ggplot2 package helpful:

library(ggplot2)
library(maps)
head(map_data("state", "iowa"))

It formats the output of the map command into a self-documenting data frame.
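
The NAs in the raw x and y vectors separate one polygon from the next. A base-R sketch of splitting on them, using made-up coordinates rather than real map output:

```r
# Two hypothetical polygons separated by an NA, as map() returns them
x <- c(0, 1, 1, NA, 2, 3, 3)
y <- c(0, 0, 1, NA, 2, 2, 3)

# Each NA starts a new group; then drop the NA rows themselves
group <- cumsum(is.na(x)) + 1
keep <- !is.na(x)
polys <- split(data.frame(x = x[keep], y = y[keep]), group[keep])
```

Each element of polys is then a plain x/y data frame for one polygon.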

Hadley

On Mon, Apr 6, 2009 at 7:00 AM, Laura Chihara  wrote:
>
> I would like to use the output from the map function
> in the package maps for use in, say, the spatstat
> package. I don't quite understand the coordinates
> for the border of the state:
> Example:
>
> library(maps)
> iowa<-map("region","iowa")
>
> x<-iowa$x
> y<-iowa$y
>
> There are NA's and duplicated coordinates.
> What would I need to do to use this in the
> spatstat owin command?
> owin(poly= ?)
>
> Thank you.
>
> -- Laura
>
> 
> Laura Chihara
> Professor of Mathematics  507-222-4065 (office)
> Dept of Mathematics       507-222-4312 (fax)
> Carleton College
> 1 North College Street
> Northfield MN 55057
>
>



-- 
http://had.co.nz/


Re: [R] data.frame, converting row data to columns

2009-04-04 Thread hadley wickham

On Sat, Apr 4, 2009 at 12:28 PM, jim holtman  wrote:
> Does this do what you want:
>
>> x <- read.table(textConnection("name         wrist nLevel            emot
> + 1                    4094          3.34                    1   frustrated
> + 2                    4094          3.94                    1  frustrated
> + 3                    4094            NA                    1   frustrated
> + 4                    4094          3.51                    1   frustrated
> + 5                    4094          3.81                    1   frustrated
> + 6                    4101          2.62                    4   excited
> + 7                    4094          2.65                    1   frustrated
> + 8                    4101            NA                    4   excited
> + 9                    4101          0.24                    4   excited
> + 10                   4101          0.23                    4
> excited"), header=TRUE)
>> # add index
>> x$indx <- ave(seq_along(x$emot), x$emot, FUN=function(z) seq(length(z)))
>> require(reshape)
>> y <- melt(x, measure='wrist')
>> cast(y, name+nLevel+emot~indx)

which you can abbreviate to :

cast(y, ... ~ indx)

... means all the other variables not explicitly mentioned.
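
For reference, a base-R sketch of the same long-to-wide step with stats::reshape(), on a shortened version of the data from the question:

```r
x <- data.frame(
  name = c(4094, 4094, 4101, 4094, 4101),
  wrist = c(3.34, 3.94, 2.62, 3.51, 0.24),
  nLevel = c(1, 1, 4, 1, 4),
  emot = c("frustrated", "frustrated", "excited", "frustrated", "excited")
)

# Running index of the measurement within each name
x$indx <- ave(seq_along(x$name), x$name, FUN = seq_along)

# One row per name, one wrist.<indx> column per measurement
wide <- reshape(x, idvar = c("name", "nLevel", "emot"),
                timevar = "indx", direction = "wide")
```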

Hadley

-- 
http://had.co.nz/

Re: [R] data.frame, converting row data to columns

2009-04-04 Thread hadley wickham

On Sat, Apr 4, 2009 at 12:09 PM, ds  wrote:
>
> I have a data frame something like:
>                      name         wrist
> nLevel            emot
> 1                    4094          3.34                    1
> frustrated
> 2                    4094          3.94                    1
> frustrated
> 3                    4094            NA                    1
> frustrated
> 4                    4094          3.51                    1
> frustrated
> 5                    4094          3.81                    1
> frustrated
> 6                    4101          2.62                    4
> excited
> 7                    4094          2.65                    1
> frustrated
> 8                    4101            NA                    4
> excited
> 9                    4101          0.24                    4
> excited
> 10                   4101          0.23                    4
> excited
>
> I am trying to change it to this:
>
>              name          nLevel           emot          w1
> w2       w3      w4      w5     w5
>                4094                   1      frustrated    3.34
> 3.94      NA    3.51     3.81    2.65
>                4101                   4      excited
> 2.62       NA     0.24    0.23      NA    NA
>
> The nLevel and emot will never vary with the name, so there can be one
> row per name.  But I need the wrist measurements to be in the same
> row.  The number of wrist measures are variable, so I could fill in
> with NAs .  But I really just need help with reshaping the data frame
>
> I think I had some success with the melt
>
> x
> =
> melt
> .data
> .frame(bsub,id.vars=c("name","nLevel","emot"),measure.vars=c("wrist"))
>
> But I can't figure out the cast to get the wrist values in the rows.

cast(x, ... ~ nLevel) ?

If that doesn't work, please provide a minimal reproducible example.

Hadley

-- 
http://had.co.nz/

Re: [R] data.frame to array?

2009-04-03 Thread hadley wickham

On Fri, Apr 3, 2009 at 1:45 PM,   wrote:
> I have a list of data.frames
>
>> str(bins)
>
> List of 19217
>  $ 100026:'data.frame': 1 obs. of  6 variables:
>  ..$ Sku  : chr "100026"
>  ..$ Bin  : chr "T149C"
>  ..$ Count: int 108
>  ..$ X    : int 20
>  ..$ Y    : int 149
>  ..$ Z    : chr "3"
>  $ 100030:'data.frame': 1 obs. of  6 variables:
> ...
> As you can see one 'column' is "Count". This list seems to contain 19217 
> data.frames. I would like to create an array of 19217 integers which hold the 
> values of the "Count" column. I have tried the obvious (to me):
>
> bins[[1:3]]$Count
>
> But that returns NULL instead of an array of length 3 that I was expecting. 
> Interestingly bins[[1]]$Count returns the first "Count" in the list of data 
> frames. How do I get all of the "Count"s?

Why not turn your list of data frames into a single data frame?

bindf <- do.call("rbind", bins)
bindf$Count
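
A small self-contained sketch with a two-element list. The first Count comes from the question, the second entry is made up:

```r
bins <- list(
  `100026` = data.frame(Sku = "100026", Count = 108L),
  `100030` = data.frame(Sku = "100030", Count = 12L)
)

# Stack the one-row data frames, then pull out the column
bindf <- do.call("rbind", bins)
counts <- bindf$Count

# Alternative that skips the rbind: extract Count from each element
counts2 <- vapply(bins, function(b) b$Count, integer(1))
```

Note that bins[[1:3]] indexes recursively (element 1, then 2 within it, then 3), which is why it does not return three data frames.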

Hadley

-- 
http://had.co.nz/


Re: [R] plyr and table question

2009-04-03 Thread hadley wickham

On Fri, Apr 3, 2009 at 8:43 AM, baptiste auguie  wrote:
> That makes sense, so I can do something like,
>
> count <- function(x){
>        as.integer(unclass(table(x)))
> }
>
> count(d$user_id)
>
> ddply(d, .(user_id), transform, count = count(user_id))
>
>>  user_id  website time count
>> 1      20   google  930     2
>> 2      20 facebook 1000     2
>> 3      21    yahoo  935     1
>> 4      25 facebook 1015     1
>> 5      61   google  940     1
>
> Have I missed a built-in function to obtain this result?

ddply(d, .(user_id), transform, count = length(user_id))

?
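
In base R, ave() is one way to attach a per-group count to every row. A sketch on the data from the question:

```r
d <- data.frame(
  user_id = c(20, 21, 20, 25, 61),
  website = c("google", "yahoo", "facebook", "facebook", "google")
)

# length() is applied within each user_id and recycled back to the rows
d$count <- ave(d$user_id, d$user_id, FUN = length)
```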

Hadley

-- 
http://had.co.nz/


Re: [R] plyr and table question

2009-04-03 Thread hadley wickham

On Fri, Apr 3, 2009 at 4:43 AM, baptiste auguie  wrote:
> Dear all,
>
> I'm puzzled by the following example inspired by a recent question on
> R-help,
>
>
> cc <- textConnection("user_id  website          time
> 20        google            0930
> 21        yahoo            0935
> 20        facebook        1000
> 25        facebook        1015
> 61        google            0940")
>
> d <- read.table(cc, head=T) ; close(cc)
>
> table(d$user_id) # count the occurrences
>
> # now I'd like to include these results in the original data.frame,
>
> ddply(d, .(website), transform, count = table(user_id)) # why two new
> columns?

Because ddply expects a data frame as output from your aggregation
function.  When the output isn't a data frame, it calls as.data.frame,
which in this case produces a data frame with two columns.
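
You can see the two columns by converting a table directly:

```r
tab <- table(c(20, 21, 20))

# as.data.frame() on a table gives one column of levels and one of counts
df <- as.data.frame(tab)
cols <- names(df)
```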

Hadley

-- 
http://had.co.nz/


Re: [R] Selecting all rows of factors which have at least one positive value?

2009-04-02 Thread hadley wickham

>   X1 X2
> 1  11  0
> 2  11  0
> 3  11  0
> 4  11  1
> 5  12  0
> 6  12  0
> 7  12  0
> 8  13  0
> 9  13  1
> 10 13  1
>
>
> and I want to select all rows pertaining to factor levels of X1 for
> which exists at least one "1" for X2. To be clear, I want rows 1:4
> (since there exists at least one observation for X1==11 for which
> X2==1) and rows 8:10 (likewise).
>
> It is easy to obtain the corresponding factor levels (i.e.,
> unique(x$X1[x$X2==1])), but I got stalled selecting the corresponding
> rows. I tried grep, but then I have to loop and concatenate the
> resulting vector. Any ideas?

Here's one way using plyr:

library(plyr)
ddply(x, "X1", subset, any(X2 == 1))

See http://had.co.nz/plyr for more details.
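
A base-R sketch of the same selection using ave(), on the data from the question:

```r
x <- data.frame(
  X1 = c(11, 11, 11, 11, 12, 12, 12, 13, 13, 13),
  X2 = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 1)
)

# TRUE for every row whose X1 group contains at least one X2 == 1
has_one <- as.logical(ave(x$X2 == 1, x$X1, FUN = any))
selected <- x[has_one, ]
```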

Hadley


-- 
http://had.co.nz/


Re: [R] Deleting rows based on identity variable

2009-04-02 Thread hadley wickham

On Thu, Apr 2, 2009 at 3:37 PM, Rowe, Brian Lee Yung (Portfolio
Analytics)  wrote:
> Is this what you want:
>> d1[which(id != 4),]

Or just

d1[id != 4, ]
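
The two forms only differ when id contains missing values. A sketch on a made-up d1:

```r
d1 <- data.frame(id = c(1, 4, NA, 2), v = c("a", "b", "c", "d"))

# Logical indexing keeps an all-NA row where the comparison is NA
by_logical <- d1[d1$id != 4, ]

# which() drops the NA comparison, so that row disappears
by_which <- d1[which(d1$id != 4), ]
```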

Hadley

-- 
http://had.co.nz/


Re: [R] Public R servers?

2009-04-01 Thread hadley wickham

> Earlier I posted a question about memory usage, and the community's input was 
> very helpful.  However, I'm now extending my dataset (which I use when 
> running a regression using lm).  As a result, I am continuing to run into 
> problems with memory usage, and I believe I need to shift to implementing the 
> analysis on a different system..

Have you looked at the biglm package?

Hadley

-- 
http://had.co.nz/


Re: [R] Calculating First Occurance by a factor

2009-04-01 Thread hadley wickham

On Wed, Apr 1, 2009 at 11:00 AM, hadley wickham  wrote:
>> I tried messing with the line df$FixTime[which.min(df$FixInx)] changing it
>> to df[which.min(df$FixInx)] or adding new lines with the additional columns
>> that I want to include, but nothing seemed to work. I'll admit I only have a
>> mild understanding of what is going on with the function .fun. :-)
>
> You probably want:
>
> df[which.min(df$FixInx), ]

Or alternatively:

ddply(data, .(Sub, Tr, IA), subset, FixInx == min(FixInx))

which might be a bit easier to understand.
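
A base-R sketch of the per-group version, on made-up data with a single grouping column Sub (the column names follow the thread, the values are invented):

```r
data <- data.frame(
  Sub = c(1, 1, 1, 2, 2),
  FixInx = c(3, 1, 2, 5, 4),
  FixTime = c(300, 100, 200, 500, 400)
)

# The comma matters: df[i, ] picks row i, df[i] would pick column i
first <- do.call(rbind, lapply(split(data, data$Sub), function(df) {
  df[which.min(df$FixInx), ]
}))
```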

Hadley

-- 
http://had.co.nz/


Re: [R] Calculating First Occurance by a factor

2009-04-01 Thread hadley wickham

> I tried messing with the line df$FixTime[which.min(df$FixInx)] changing it
> to df[which.min(df$FixInx)] or adding new lines with the additional columns
> that I want to include, but nothing seemed to work. I'll admit I only have a
> mild understanding of what is going on with the function .fun. :-)

You probably want:

df[which.min(df$FixInx), ]

Hadley

-- 
http://had.co.nz/


Re: [R] ggplot: order of numeric factor levels?

2009-03-31 Thread hadley wickham

On Tue, Mar 31, 2009 at 5:01 PM, Marianne Promberger
 wrote:
> Hi,
>
> I'm having problems with qplot and the order of numeric factor levels.
>
> Factors with numeric levels show up in the order in which they appear
> in the data, not in the order of the levels (as far as I understand
> factors!)
>
> Here is a minimal example:
>
> library(ggplot2)
> y <- c(-1,2,0,0,-2,-1)
> z <- factor(y,levels=c(-2,-1,0,1,2))
> qplot(z)
>
> For me, the resulting plot is ordered: -1,2,0,-2
>
> By contrast,
> plot(z) is neatly ordered -2,-1,0,1,2
>
> What am I not getting?

It's a bug in the current version.  You can work around it by explicitly
setting the limits for the x axis: + xlim("-2", "-1", "0", "1", "2").
It will be fixed in the next release, which I'm trying to get out soon.
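
A quick base-R check confirms that the factor itself is ordered as you intended, so the problem is in the plotting and not in your data:

```r
y <- c(-1, 2, 0, 0, -2, -1)
z <- factor(y, levels = c(-2, -1, 0, 1, 2))

lv <- levels(z)  # level order as stored in the factor
tab <- table(z)  # counts follow the level order, including the empty level 1
```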

Hadley

-- 
http://had.co.nz/


Re: [R] Reshape: 'melt' numerous objects

2009-03-31 Thread hadley wickham

On Tue, Mar 31, 2009 at 11:12 AM, Steve Murray  wrote:
>
> Dear R Users,
>
> I'm trying to use the reshape package to 'melt' my gridded data into column 
> format. I've done this before on individual files, but this time I'm trying 
> to do it on a directory of files (with variable file names) - therefore I 
> have to also use the 'assign' command. I have come up against a couple of 
> problems however and am therefore seeking advice...

I'd _strongly_ recommend you don't use assign.  Instead put everything
in a list:

paths <- dir("mydir", "\\.csv$", full.names = TRUE)
names(paths) <- basename(paths)

data <- lapply(paths, read.csv)
molten <- lapply(data, melt, id = "Latitude", na.rm=TRUE)
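A small base-R sketch of why the list is easier to work with than assign(). The directory name, file names and values below are invented for illustration; melt() is left out so that the example runs without the reshape package.

```r
# Hypothetical setup: write two small CSV files into a temporary directory.
tmp <- file.path(tempdir(), "grids")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(Latitude = c(10, 20), X1 = c(1, 2)),
          file.path(tmp, "jan.csv"), row.names = FALSE)
write.csv(data.frame(Latitude = c(10, 20), X1 = c(3, 4)),
          file.path(tmp, "feb.csv"), row.names = FALSE)

# Same pattern as above: a named vector of paths gives a named list of data frames.
paths <- dir(tmp, "\\.csv$", full.names = TRUE)
names(paths) <- basename(paths)
data <- lapply(paths, read.csv)

# Each file is reachable by name, and every element can be processed in one call.
jan <- data[["jan.csv"]]
rows_per_file <- sapply(data, nrow)
print(rows_per_file)
```

The list keeps the file name attached to each data frame, so a further lapply() (such as the melt call above) carries the names along with no need for get() or assign().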

Hadley

-- 
http://had.co.nz/


Re: [R] Using apply to get group means

2009-03-31 Thread hadley wickham

On Tue, Mar 31, 2009 at 11:31 AM, baptiste auguie  wrote:
> Not exactly the output you asked for, but perhaps you can consider,
>
> library(doBy)
>> summaryBy(x3~x2+x1,data=x,FUN=mean)
>>
>>  x2 x1 x3.mean
>> 1  1  A     1.5
>> 2  1  B     2.0
>> 3  1  C     3.5
>> 4  2  A     4.0
>> 5  2  B     5.5
>> 6  2  C     6.0
>
>
> the plyr package also provides similar functionality, as do the ?by, ?ave,
> and ?tapply base functions.

In plyr it would look like:

library(plyr)

x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
df <- data.frame(x1, x2, x3)

ddply(df, .(x1, x2), transform, x3.mean = mean(x3))

Note how I created the data frame - only use cbind if you want a
matrix (i.e. all the columns have the same type).
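A quick base-R illustration of the cbind point, using the same vectors as above. The final line uses ave(), one of the base functions mentioned in the quoted message, as a package-free way to get the per-group means.

```r
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)

# cbind() builds a matrix, so mixing character and numeric vectors
# coerces everything to character.
m <- cbind(x1, x2, x3)
print(is.character(m))

# data.frame() keeps each column's own type.
df <- data.frame(x1, x2, x3)
print(is.numeric(df$x3))

# Per-row group means of x3 within each x1/x2 combination.
df$x3.mean <- ave(df$x3, df$x1, df$x2)
print(df)
```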

Hadley

-- 
http://had.co.nz/


[R] Bug in col2rgb?

2009-03-31 Thread hadley wickham

> col2rgb("#0079", TRUE)
      [,1]
red      0
green    0
blue     0
alpha  121
> col2rgb("#0080", TRUE)
      [,1]
red    255
green  255
blue   255
alpha    0
> col2rgb("#0081", TRUE)
      [,1]
red      0
green    0
blue     0
alpha  129


Any ideas?

Thanks,
Hadley

-- 
http://had.co.nz/


Re: [R] Calculating First Occurance by a factor

2009-03-30 Thread hadley wickham

On Mon, Mar 30, 2009 at 2:58 PM, Mike Lawrence  wrote:
> I discovered Hadley Wickham's "plyr" package last week and have found
> it very useful in circumstances like this:
>
> library(plyr)
>
> firstfixtime = ddply(
>       .data = data
>       , .variables = c('Sub','Tr','IA')
>       , .fun <- function(df){
>               df$FixTime[which.min(df$FixInx)]
>       }
> )

Or to save a little typing:

ddply(data, .(Sub, Tr, IA), colwise(min, .(FixTime)))

Hadley

-- 
http://had.co.nz/
