[R] averaging rows on a data.frame according to a factor

2013-11-22 Thread john d
Dear all,

I apologize for the newbie question, but I'm stuck.

I have a data frame in the following form:

dat <- as.data.frame(cbind(c("a","a","a","b","b"), c(1,2,3,3,2), c(4,3,5,4,4)))

I need a way to generate a new dataframe with the average for each factor.
The result should look like:

res <- as.data.frame(cbind(c("a","b"), c(2,2.5), c(4,4)))

Any help would be greatly appreciated.

Jonathan
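
One possible approach, sketched here with base R's aggregate(): the column names g, v1 and v2 are made up for illustration, and the data frame is built with data.frame() rather than cbind() because cbind() on mixed input turns every column into character, which would prevent averaging.

```r
# Build the example directly with data.frame() so the numeric columns
# stay numeric (cbind() with a character vector coerces everything to text).
# Column names g, v1, v2 are hypothetical.
dat <- data.frame(g  = c("a", "a", "a", "b", "b"),
                  v1 = c(1, 2, 3, 3, 2),
                  v2 = c(4, 3, 5, 4, 4))

# Average every numeric column within each level of g.
res <- aggregate(cbind(v1, v2) ~ g, data = dat, FUN = mean)
print(res)   # v1 is 2 and 2.5, v2 is 4 and 4, matching the desired result
```

The formula interface keeps the grouping column in the output, so res already has the shape asked for.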

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] histogram with bars colored according to a vector of values

2013-07-25 Thread john d
Dear all,

Let's say I have the following data.frame:

dat <- data.frame(x=rnorm(100), y=rnorm(100,2))

and I plot a histogram of variable x, something like:
hist(dat$x, breaks=-5:5)

Now, I'd like to color each bar according to the mean of y for the cases
that fall in that bar. For instance, the color of the bar between -2 and -1
should reflect the mean of variable y for the corresponding cases. Any
suggestions?

John
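
A minimal sketch of one way to do this, assuming the bins are assigned with cut() using the same breaks as hist() and that a ten-step heat.colors() palette is an acceptable color scale (both are illustrative choices, not the only option):

```r
set.seed(1)                      # only so the illustration is repeatable
dat <- data.frame(x = rnorm(100), y = rnorm(100, 2))
breaks <- -5:5

# Assign each case to a histogram bin; include.lowest = TRUE matches
# the default binning of hist().
bin <- cut(dat$x, breaks = breaks, include.lowest = TRUE)

# Mean of y per bin; bins with no cases come back as NA.
bin_means <- tapply(dat$y, bin, mean)

# Map the bin means onto a palette (the palette choice is an assumption).
pal <- heat.colors(10)
bar_cols <- pal[cut(bin_means, breaks = 10, labels = FALSE)]

# hist() accepts one color per bar; NA leaves empty bins unfilled.
hist(dat$x, breaks = breaks, col = bar_cols)
```

Bars sharing a palette step get the same color, so a finer palette may be preferable when the bin means are close together.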




[R] help with PCA

2013-03-16 Thread john d
Dear all,

Suppose I do a PCA like this:

dat <- matrix(rnorm(30),ncol=3)
res <- prcomp(dat)

Now, imagine that I got new data that I want to project onto the
original PC axes. How do I do that?

Thanks!

John
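
A sketch of how the projection might be done with the predict() method for prcomp objects; newdat is a hypothetical matrix of new observations with the same three columns as dat.

```r
set.seed(1)                              # only so the illustration is repeatable
dat <- matrix(rnorm(30), ncol = 3)
res <- prcomp(dat)

# Hypothetical new observations, same number of columns as dat.
newdat <- matrix(rnorm(15), ncol = 3)

# predict() centers (and scales, if the PCA did) the new data with the
# original parameters, then rotates it onto the original PC axes.
proj <- predict(res, newdata = newdat)

# The same projection done by hand, for comparison.
manual <- scale(newdat, center = res$center, scale = res$scale) %*% res$rotation
```

The key point is that the new data must be centered with the means of the original data (res$center), not with its own means.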



[R] multiple values in one column

2012-04-06 Thread John D. Muccigrosso
I have some data files in which some fields have multiple values. For example

first  last   sex   major
John   Smith  M     ANTH
Jane   Doe    F     HIST,BIOL

What's the best R-like way to handle these data (Jane's major in my example), 
so that I can do things like summarize the other fields by them (e.g., sex by 
major)?

Right now I'm processing the files (in excel since they're spreadsheets) by 
duplicating lines with two values in the major field, eliminating one value per 
row. I suspect there's a nifty R way to do this.

Thanks in advance!

John Muccigrosso



Re: [R] multiple values in one column

2012-04-06 Thread John D. Muccigrosso
On Apr 6, 2012, at 9:09 AM, John D. Muccigrosso wrote:

> I have some data files in which some fields have multiple values. For example
>
> first  last   sex   major
> John   Smith  M     ANTH
> Jane   Doe    F     HIST,BIOL
>
> What's the best R-like way to handle these data (Jane's major in my example),
> so that I can do things like summarize the other fields by them (e.g., sex by
> major)?
>
> Right now I'm processing the files (in excel since they're spreadsheets) by
> duplicating lines with two values in the major field, eliminating one value
> per row. I suspect there's a nifty R way to do this.


I've gotten a few responses, for which I'm grateful, but either I don't quite 
see how they answer my question, or I didn't phrase my question well, both of 
which are equally possible. :-)

So, given the data as above, let's call it students, I have no problem 
turning it into:

first  last   sex   major
John   Smith  M     ANTH
Jane   Doe    F     HIST
Jane   Doe    F     BIOL

What I then do with this is things like 

table(students$sex, students$major)

So, three steps:

1. Get data with multiple values per field.
2. Turn it into a data frame with only one value per field (by duplicating 
lines).
3. Do things like table().

I'd like to be able to skip #2.

Thanks.

John Muccigrosso
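
One way this might be approached without rewriting the data frame, sketched under the assumption that the values in major are separated by plain commas: split the field on the fly and repeat the other variable to match, so the expansion happens only inside the call that builds the table.

```r
# The example data, with the multi-valued field left as it is.
students <- data.frame(first = c("John", "Jane"),
                       last  = c("Smith", "Doe"),
                       sex   = c("M", "F"),
                       major = c("ANTH", "HIST,BIOL"),
                       stringsAsFactors = FALSE)

# Split each major field into its individual values.
majors <- strsplit(students$major, ",", fixed = TRUE)

# Repeat sex once per major so the two vectors line up, then tabulate.
tab <- table(sex   = rep(students$sex, lengths(majors)),
             major = unlist(majors))
print(tab)
```

The students data frame itself is never changed, so other summaries that should count each person once still work on it directly.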



[R] circles()

2012-03-26 Thread John D. Muccigrosso
I cannot for the life of me figure this out:

What's the parameter for filling circles drawn with circles() with color? col 
changes the line color, but all I see in the help is a reference to additional 
graphic parameters, and I find no examples via Google.

Thanks!

John Muccigrosso
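
The post does not say which package circles() comes from, so the following is not an answer for that function. As a comparison only, this sketch swaps in base graphics' symbols(), where the outline color is set with fg and the fill color with bg; the coordinates, radii and colors are made up.

```r
# Hypothetical circle centers and radii.
x <- c(1, 2, 3)
y <- c(1, 2, 1)
r <- c(0.3, 0.5, 0.4)

# One fill color per circle.
fill <- c("tomato", "skyblue", "gold")

# In symbols(), fg is the outline color and bg is the fill color;
# inches = FALSE makes the radii use the units of the x axis.
symbols(x, y, circles = r, inches = FALSE, fg = "black", bg = fill)
```

Other circle-drawing functions may name the fill argument differently, so the help page of the function actually in use remains the place to check.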



[R] barplot and NA

2012-03-12 Thread John D. Muccigrosso
Am I wrong that barplot is supposed to just skip NAs, and continue with the 
rest of the data in a matrix column? That's how I read various posts on the 
subject.

But that's not what happens for me with R64.app (on a Mac, obviously). For 
example:

d0 <- as.matrix(c(2,3,4))
d1 <- as.matrix(c(2,3,NA))
d2 <- as.matrix(c(2,NA,4))
d3 <- as.matrix(c(NA,3,4))
barplot(d0)
barplot(d1)
barplot(d2)
barplot(d3)

generates four bar plots. The first has one bar with three visible bands, as 
expected. The second has two bands; still OK. But the third has only one band 
(at 2) and the fourth has none.

So it appears that barplot is barfing on those NAs and stopping its plot at 
those points.

Is that the expected behavior?

Thanks.

John



Re: [R] barplot and NA

2012-03-12 Thread John D. Muccigrosso
On 12 Mar 2012, at 12:47 , S Ellison wrote:

> Yes, to the extent that the default barplot plots the height of the bar so
> far as the sum of the values so far, starting at the first. For your first
> vector, no problem; for your second, the highest value is undefined; for the
> third, the sum is undefined after the second value (an NA), and so on.
>
> Try adding 'beside=TRUE' to the barplots, as in
> barplot(d3, beside=TRUE)
> and you will see all the known values plotted as you'd expect.

That makes sense, but since I do want a stacked bar plot, I'll need to change 
the NAs to 0 (which of course I've already done).

This should be made clear in the documentation, no? It's possible that barplot 
could do something like a na.rm=T internally and avoid this problem, but it 
doesn't, so NA is deadly in stacked plots. To be honest, if I hadn't scaled all 
my bars to 1 to show percentages, I wouldn't have noticed how some were leaving 
out a small category or two.

Thanks for the help.

John Muccigrosso
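
A short sketch of the behavior and the workaround discussed above, using the d2 example from the original post; d2_filled is a name introduced here for illustration. Using cumsum() to stand in for the running sum is an assumption based on the explanation quoted above, not a reading of barplot's source.

```r
d2 <- as.matrix(c(2, NA, 4))

# A running sum becomes NA from the first NA onward, which is why
# the stacked bar stops there.
running <- cumsum(d2[, 1])
print(running)

# Workaround for stacked bars: treat missing segments as zero height.
d2_filled <- d2
d2_filled[is.na(d2_filled)] <- 0
barplot(d2_filled)
```

Replacing NA with 0 is only appropriate when a missing value really should contribute nothing to the stack, as in the percentage plots described above.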
