[R] tapply

2005-06-20 Thread Weiwei Shi
hi,
i have another question on tapply:
i have a dataset z like this:
5540 389100307391  2600
5541 389100307391  2600
5542 389100307391  2600
5543 389100307391  2600
5544 389100307391  2600
5546 381300302513    NA
5547 387000307470    NA
5548 387000307470    NA
5549 387000307470    NA
5550 387000307470    NA
5551 387000307470    NA
5552 387000307470    NA

I want to sum the column 3 by column 2.
I removed NA by calling:
tapply(z[[3]], z[[2]], sum, na.rm=T)
but it does not work.

then, i used
z1<-z[!is.na(z[[3]]),]
and repeat
still doesn't work.

please help.

-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] tapply

2007-04-09 Thread crmontes
Hi,

I have a summary table for an experiment that looks like this

STUDY BLOCK TREATMENT MEASUREMENT RESPONSE
A     1     T-0       1           12
A     1     T-1       1           52
A     1     T-0       2           12
A     1     T-1       2           65

and so on...

there are 10 studies, 4 blocks, 10 treatments, and 5 measurements of
the response value.

I want to produce a table that looks like this:

STUDY BLOCK TREATMENT MEAS.1 MEAS.2 MEAS.3
A     1     T-1       15     54     65
A     1     T-2       54     65     45
A     2     T-1       12     12     23
A     2     T-2       65     54     65

and so on...

with tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)

I get very close, however, I get the results as a list!

if instead I use

ftable(tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean))

I get REALLY close, but then I get only one value for each class. However, I
need the whole table, because in the end what I really need is the
increment MEASUREMENT (n) - MEASUREMENT (n-1) for each TREATMENT,
BLOCK, and STUDY, to perform an ANOVA on the increment data.

Essentially, I want to move away from running a pivot table in ACCESS.

Any thoughts?

Cristian Montes
North Carolina State University



[R] tapply

2007-06-01 Thread livia

Hello, I want to run a normality test on a series of data and get the
p-value for each subset. I am using the following code, but it does not
work.

tapply(re, list(reg, ast), pvalue(shapiro.test))

Could anyone give me some advice? Many thanks.


[R] tapply

2007-07-19 Thread sigalit mangut-leiba
hello,
i want to compute the mean of a variable ("aps") for every class
(1,2, and 3).
every id has a few obs.; "aps" and class are constant within each id.
like this:
id  aps  class
1   11   2
1   11   2
1   11   2
1   11   2
1   11   2
2   8    3
2   8    3
2   8    3
3   12   2
3   12   2
.
.

i tried:

tapply(icu1$aps_st, icu1$hidclass, function(z) mean(unique(z)))

but it's counting every row and not every id.

thank you,

Sigalit.



[R] tapply

2007-07-19 Thread sigalit mangut-leiba
I'm sorry for the unfocused questions, I'm new here...
the output should be:
class  aps_mean
1      NA
2      11.5
3      8

the mean aps of every class, when every id counts *once*; for example: class
2, mean = (11+12)/2 = 11.5
hope it's clearer.
sigalit.
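
A minimal sketch of one way to get the table described above, assuming data laid out like the example in the post. The data frame name `icu` and its columns `id`, `aps` and `class` are made up for illustration; they are not the poster's `icu1`. The idea is to keep one row per id first and only then average within class. Declaring the class levels explicitly keeps class 1 in the result as NA.

```r
# Hypothetical data in the same shape as the example in the post
icu <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  aps   = c(11, 11, 11, 11, 11, 8, 8, 8, 12, 12),
  class = c(2, 2, 2, 2, 2, 3, 3, 3, 2, 2)
)

# Keep the first row of every id, so that each id counts once
one_per_id <- icu[!duplicated(icu$id), ]

# Mean aps per class; a class without any id comes back as NA
aps_mean <- tapply(one_per_id$aps,
                   factor(one_per_id$class, levels = 1:3),
                   mean)
print(aps_mean)
```

This relies on "aps" being constant within an id, as stated in the question; if it were not, the first row per id would be an arbitrary choice.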



[R] tapply

2004-03-18 Thread mike . campana
Dear all
I have a data frame containing hourly data for 3 parameters.
I would like to create a data frame containing daily mean values of these
parameters. Additionally, I want to keep the information about the time of
measurement ("year", "month", "day").
With the function tapply I can average over one column of the data frame.
I can repeat the function 2 more times and merge the vectors. In this way I
obtain my new data frame (see below). If I want to add the columns day,
month and year, I can repeat tapply another three times. This system works.


Question: is there a function that averages over the 3 columns in a single
step?

Thanks a lot for your answer!
Regards
Mike Campana   

### read the data
setwd("c:/R")
data <- read.table(file="Montreal.txt", header=FALSE, skip=15)
colnames(data) <- c("year","month","day","hour","min","temp","press","ozone")
### create mean values
### (year*10000 + month*100 + day gives one distinct key per calendar day)
daykey <- data$year*10000 + data$month*100 + data$day
temp_daily <- tapply(data$temp, daykey, FUN=mean)
press_daily <- tapply(data$press, daykey, FUN=mean)
ozone_daily <- tapply(data$ozone, daykey, FUN=mean)
### merge the data
newdata <- as.data.frame(cbind(temp_daily, press_daily, ozone_daily))
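
As a possible answer to the question, here is a minimal sketch using aggregate() with a formula, which averages several columns at once and keeps year, month and day as columns of the result. The data frame `hourly` and its values are made up for illustration; only the column names follow the post.

```r
# Hypothetical hourly data: two days with three hours each
hourly <- data.frame(
  year  = 2004,
  month = 3,
  day   = rep(c(17, 18), each = 3),
  hour  = rep(0:2, times = 2),
  temp  = c(1, 2, 3, 4, 5, 6),
  press = c(1000, 1002, 1004, 1010, 1010, 1010),
  ozone = c(30, 40, 50, 20, 20, 50)
)

# One call: daily means of all three parameters, grouped by date columns
daily <- aggregate(cbind(temp, press, ozone) ~ year + month + day,
                   data = hourly, FUN = mean)
print(daily)
```

Note that the formula interface drops rows containing NA by default (na.action = na.omit), so a missing value in one parameter removes that hour for all three.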



Re: [R] tapply

2005-06-20 Thread Jim Brennan
This may help
R>wei
 V1   V2   V3
1  5540 389100307391 2600
2  5541 389100307391 2600
3  5542 389100307391 2600
4  5543 389100307391 2600
5  5544 389100307391 2600
6  5546 381300302513   NA
7  5547 387000307470   NA
8  5548 387000307470   NA
9  5549 387000307470   NA
10 5550 387000307470   NA
11 5551 387000307470   NA
12 5552 387000307470   NA
R>ave(wei[,3],wei[,2],FUN=sum)
 [1] 13000 13000 13000 13000 13000    NA    NA    NA    NA    NA    NA    NA
R>



Re: [R] tapply

2005-06-20 Thread Marc Schwartz
On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
> 
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.
> 
> please help.


The index vector(s) in tapply() need to be a "list". See the description
of the INDEX argument in ?tapply:

> tapply(z[[3]],list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391 
           0            0        13000 


Note that the use of na.rm = TRUE here results in misleading values of 0
for the other two groups, which are all NA's and this is not
self-evident unless you know the data.

You may be better off with:

> tapply(z[[3]],list(z[[2]]), sum)
381300302513 387000307470 389100307391 
          NA           NA        13000 

unless your real data is a mix of NA's and measured values.

Also see ?complete.cases and ?na.omit for further approaches to dealing
with such data sets.

HTH,

Marc Schwartz



Re: [R] tapply

2005-06-20 Thread Douglas Bates
On 6/20/05, Weiwei Shi <[EMAIL PROTECTED]> wrote:
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
> 
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.

Can you be more explicit about "doesn't work"?



Re: [R] tapply

2005-06-20 Thread Gabor Grothendieck
On 6/20/05, Weiwei Shi <[EMAIL PROTECTED]> wrote:
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
> 
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.
> 
> please help.
> 

Depending on what you want you may be able to use rowsum:

- display only groups that have at least one non-NA with the sum
  being the sum of the non-NAs:

with(na.omit(z), rowsum(V3, V2))

- display all groups with the sum being NA if any member is NA:

rowsum(z$V3, z$V2)
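
A small sketch of the two rowsum() variants above on made-up data, to show how the groups differ. The data frame `z` below is hypothetical and only borrows the column names V1, V2, V3; the third call uses the na.rm argument of rowsum() as a further option not mentioned in the post.

```r
# Hypothetical data: group "b" is all NA, group "c" is partly NA
z <- data.frame(
  V1 = 1:6,
  V2 = c("a", "a", "b", "b", "c", "c"),
  V3 = c(10, 20, NA, NA, 5, NA)
)

# Only groups with at least one non-NA value, summing the non-NA values
only_observed <- with(na.omit(z), rowsum(V3, V2))

# All groups; the sum is NA as soon as any member is NA
all_groups <- rowsum(z$V3, z$V2)

# All groups, ignoring NAs (an all-NA group then sums to 0)
ignoring_na <- rowsum(z$V3, z$V2, na.rm = TRUE)

print(only_observed)
print(all_groups)
print(ignoring_na)
```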



Re: [R] tapply

2005-06-21 Thread Weiwei Shi
hi
i tried all the methods suggested above:
ave, and rowsum with the "with" function, work for my situation. I think
the problem might not be due to tapply.
My data z comes from
z<-y[y[[1]] %in% x[[2]], c(1,9)]

so z is supposed to have no entries for those non-matched between x and y.

however, when I run tapply, the result also includes those
non-matched entries. I used the is.na function to remove those entries from z
first and then ran tapply again, but the result is the same: those
NA's and those non-matched results are still there. That's what I mean
by "it doesn't work".

Is there something I missed here so that z "implicitly" has some
"trace" back to y dataset?

thanks,



Re: [R] tapply

2005-06-21 Thread Liaw, Andy
What does str(z) say?  I suspect the second column is a factor, which, after
the subsetting, has some empty levels.  If so, just drop those levels.

Andy
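
A minimal sketch of the effect described above, on made-up data (the names `y`, `claim` and `saved` are hypothetical, not the poster's columns). Subsetting a data frame keeps all levels of a factor column, so tapply() still reports the groups that no longer have any rows. The sketch uses droplevels(), which exists in current versions of R, as one way to drop them.

```r
# Hypothetical data: the grouping column is a factor with three levels
y <- data.frame(
  claim = factor(c("A", "A", "B", "C")),
  saved = c(10, 20, 30, 40)
)

# Subsetting removes the rows for "C" but not the level "C"
z <- y[y$claim %in% c("A", "B"), ]
print(levels(z$claim))

# tapply reports the empty level as NA
before <- tapply(z$saved, z$claim, sum)
print(before)

# After dropping unused levels, only the matched groups remain
z$claim <- droplevels(z$claim)
after <- tapply(z$saved, z$claim, sum)
print(after)
```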



Re: [R] tapply

2005-06-21 Thread Weiwei Shi
Even before I tried, I already realized it must be true when I read
this reply! Great job! thanks, Andy.

> str(z)
`data.frame':   235 obs. of  2 variables:
 $ CLAIMNUM : Factor w/ 1907 levels "0","1001849",..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
 $ SIU.SAVED: int  475 3000 3000 0 0 4352 0 0 4500 3000 ...

So, I have another general question: how do I avoid this when I do the matching?
In my case, claimnum does not have to be a factor.  I think I can use
as.integer on it to de-factor it. But I want to know how to do it while
keeping it as a factor. btw, what's your way to drop those levels?  :)

weiwei 




Re: [R] tapply

2005-06-21 Thread Liaw, Andy
Try:

> (x <- factor(1:2, levels=1:5))
[1] 1 2
Levels: 1 2 3 4 5
> (x <- x[, drop=TRUE])
[1] 1 2
Levels: 1 2

Andy



Re: [R] tapply

2005-06-22 Thread Martin Maechler
> "AndyL" == Liaw, Andy <[EMAIL PROTECTED]>
> on Tue, 21 Jun 2005 13:30:54 -0400 writes:

AndyL> Try:
>> (x <- factor(1:2, levels=1:5))
AndyL> [1] 1 2
AndyL> Levels: 1 2 3 4 5
>> (x <- x[, drop=TRUE])
AndyL> [1] 1 2
AndyL> Levels: 1 2

or  
(x <- factor(1:2, levels=1:5))
(x2 <- factor(x))

which also drops the level
Martin


[R] tapply t.test

2005-07-26 Thread mark salsburg
I cannot find in the literature a way to conduct the following t.test
on 2 objects, A and B

A                  B
col1 col2 col3     col1 col2 col3

where col(i)'s name is identical in both A and B (they are names of tissues).

How do I test (t.test) whether each tissue differs significantly between
the two objects? (I'm pretty sure I have to use tapply())


Also, is there a way to multi-plot all 89 tissues showing the A values
and the B values?

thank you
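
A rough sketch of one way to approach this without tapply(), assuming A and B are data frames whose rows are samples and whose columns are tissues. The tissue names and the random data below are invented for illustration. It runs one two-sample t.test per shared column and collects the p-values.

```r
set.seed(1)

# Hypothetical data: 10 samples per object, 3 tissues
A <- data.frame(liver = rnorm(10), lung = rnorm(10), skin = rnorm(10))
B <- data.frame(liver = rnorm(10, mean = 1), lung = rnorm(10), skin = rnorm(10))

# Tissues present in both objects
tissues <- intersect(names(A), names(B))

# One t.test per tissue, keeping only the p-value
p_values <- sapply(tissues, function(tissue) {
  t.test(A[[tissue]], B[[tissue]])$p.value
})
print(p_values)

# With many tissues, adjusting for multiple testing is worth considering
print(p.adjust(p_values, method = "holm"))
```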



[R] tapply question

2006-07-06 Thread markleeds
I think I understand tapply but i still
can't figure out how to do the following.

I have a dataframe where some of the column names are the same
and i want to make a new dataframe where columns
that have the same name are averaged by row.

so, if the data frame, DF, was 

AAA  BBB  CCC  AAA  DDD
1    0    7    11   13
2    0    8    12   14
3    0    6    0    15

then the resulting data frame would be exactly the same except
that the AAA column would be 

6   comes from  (11 + 1)/2
7   comes from  (12 + 2)/2
3   stays 3 because the element in the other AAA is zero,
so i don't want to average that one. it should just stay 3.

So, I do 

DF[DF == 0]<-NA
rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
revisedDF<-tapply(seq(DF),names(DF),rowaverage)

there are two problems with this :

1) i need to go through the rows of the same name, not the columns
so i don't think seq(DF) is right because that goes through 
the columns but i want to go through rows.

2) BBB will come back with ALL NA's (since
it was unique and there was nothing else to average), and I don't know how to
transform that BBB column back to all zeros.

thanks and i'm sorry for so many questions. i'm getting better with this stuff
and my questions will decrease soon.

my guess is that i no longer should be using tapply ?
and should be using some other version of apply.
thanks
 mark
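
A minimal sketch of one way to do this, using the example values from the post. It works on a matrix copy (the name `m` and the helper steps are illustrative choices), groups the column positions by column name with split(), and takes row means within each group while ignoring the zeros. Columns that were entirely zero come out as NaN and are set back to 0 at the end.

```r
# The example from the post; names are set afterwards so that the
# duplicated name "AAA" is kept as is
DF <- data.frame(c(1, 2, 3), c(0, 0, 0), c(7, 8, 6), c(11, 12, 0), c(13, 14, 15))
names(DF) <- c("AAA", "BBB", "CCC", "AAA", "DDD")

# Work on a matrix copy and treat zeros as missing
m <- as.matrix(DF)
m[m == 0] <- NA

# Column positions grouped by column name, then row means per group
groups <- split(seq_len(ncol(m)), colnames(m))
avg <- sapply(groups, function(j) rowMeans(m[, j, drop = FALSE], na.rm = TRUE))

# Rows where every value was zero give NaN; turn them back into 0
avg[is.nan(avg)] <- 0

revisedDF <- as.data.frame(avg)
print(revisedDF)
```

The result has one column per distinct name, in alphabetical order of the names rather than in the original column order.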



Re: [R] tapply

2007-04-10 Thread Petr PIKAL
Hallo

Seems to me that you can make a summary table using

aggregate(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)

and then, if you want, you can use the reshape function, or the melt/cast
functions from the reshape package, to get the wide form of your table.

Regards

Petr Pikal
[EMAIL PROTECTED]
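
A sketch of the aggregate-then-reshape route with base R only, on made-up data. The data frame `d`, its values, and the INC.* column names are invented for illustration; only the variable names STUDY, BLOCK, TREATMENT, MEASUREMENT and RESPONSE follow the thread. It computes the cell means, spreads the measurements into columns, and then takes the increments between consecutive measurements.

```r
# Hypothetical long-format data: 1 study, 2 blocks, 2 treatments, 3 measurements
d <- expand.grid(MEASUREMENT = 1:3,
                 TREATMENT   = c("T-0", "T-1"),
                 BLOCK       = 1:2,
                 STUDY       = "A")
d$RESPONSE <- seq_len(nrow(d))

# Mean response per STUDY / BLOCK / TREATMENT / MEASUREMENT cell
means <- aggregate(RESPONSE ~ STUDY + BLOCK + TREATMENT + MEASUREMENT,
                   data = d, FUN = mean)

# Wide form: one row per STUDY / BLOCK / TREATMENT, one column per measurement
wide <- reshape(means,
                idvar     = c("STUDY", "BLOCK", "TREATMENT"),
                timevar   = "MEASUREMENT",
                direction = "wide")

# Increments between consecutive measurements
wide$INC.2 <- wide$RESPONSE.2 - wide$RESPONSE.1
wide$INC.3 <- wide$RESPONSE.3 - wide$RESPONSE.2
print(wide)
```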

[EMAIL PROTECTED] wrote on 10.04.2007 00:14:15:



[R] tapply question

2005-12-16 Thread Frank Johannes
HI,
Suppose I have the following data structure.
   LRT  tp
1   1.50654010 522
2   0.51793929 522
3   0.90340299 522
4   1.20293325 522
5   1.05578774 523
6   0.01617942 523
7   0.68183543 523
8   0.43820244 523
9   1.14123995 524
10  0.05809550 524
11  0.93061597 524
12  1.39739700 524
13  1.05220953 525
14  0.03471461 525
15  0.63168798 525
16  1.40592603 525
17  1.41884492 526
18  0.23388479 526
19  0.21881064 526
20  0.99710830 526
21  2.02054187 527
22  1.99872887 527
23  1.04187450 527
24  1.31556807 527
25  2.5190 528
26  2.94778561 528
27  1.88800177 528
28  2.08249941 528


I have successfully used a command line such as the one below to get
the maxima for each 'tp' category:

data.out<-data[tapply(LRT,tp, function(x) which(LRT==max(x))),]

However, when I try it on the above data, it gives me the following
error message:
>Error in "[.data.frame"(data, tapply(LRT, tp, function(x) which(LRT ==  : 
invalid subscript type

I don't know what to do.
Thanks for your help

--



Re: [R] tapply

2007-06-01 Thread Bojanowski, M.J. \(Michal\)
I'm not sure what the 'pvalue' function is (it is not in the base or
stats packages), but this should give you what you want:

# some example
re <- rnorm(100)
reg <- rep(1:3, length=100)
ast <- rep(1:2, length=100)

tapply( re, list(reg, ast), function(v) shapiro.test(v)$p.value )

# or neater by defining a function
p.shapiro <- function(v) shapiro.test(v)$p.value
tapply( re, list(reg, ast), p.shapiro )



hth,

michal

> Hello, I want to conduct normality test to a series of data 
> and get the
> p-value for each subset. I am using the following codes, but 
> it does not
> work.
> 
> tapply(re, list(reg, ast), pvalue(shapiro.test))
> 
> Could anyone give me some advice? Many thanks.



Re: [R] tapply

2007-06-01 Thread Dimitris Rizopoulos
try this:

tapply(re, list(reg, ast), function(x) shapiro.test(x)$p.value)


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: "livia" <[EMAIL PROTECTED]>
To: 
Sent: Friday, June 01, 2007 1:00 PM
Subject: [R] tapply


>
> Hello, I want to conduct normality test to a series of data and get 
> the
> p-value for each subset. I am using the following codes, but it does 
> not
> work.
>
> tapply(re, list(reg, ast), pvalue(shapiro.test))
>
> Could anyone give me some advice? Many thanks.




[R] tapply histogram

2007-06-01 Thread livia

Dear members,

I would like to pass the histogram settings to each subset of the dataframe,
and generate a multiple figures graph.

First, can anyone tell me how to generate a multiple figures environment? I
am trying 

mfrow=c(2,4) and nothing appears.

Secondly, I want to pass the following function in tapply()

hist(x, freq=FALSE)
lines(density(x), col="red")
rug(x)

how can I manage it?

Many thanks



Re: [R] tapply

2007-07-19 Thread John Kane
I do not understand what you want.  If aps is constant
over each class then the mean for each class is equal
to any value of aps.  

Using your example you can do 

tapply(icu1$aps, icu1$d, mean)

but it does not give you anything new.  Can you
explain the problem a bit more? 


--- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:

> hello,
> i want to compute the mean of a variable ("aps") for
> every class
> (1,2, and 3).
> every id have a few obs., "aps" and class are
> constant over id.
> like this:
> id   aps  class
> 1    11   2
> 1    11   2
> 1    11   2
> 1    11   2
> 1    11   2
> 2     8   3
> 2     8   3
> 2     8   3
> 3    12   2
> 3    12   2
> .
> .
> 
> i tried:
> 
> tapply(icu1$aps_st, icu1$hidclass, function(z)
> mean(unique(z)))
> 
> but it's counting every row and not every id.
> 
> thank you,
> 
> Sigalit.



Re: [R] tapply

2007-07-19 Thread Henrique Dallazuanna
I also don't understand, but perhaps:

with(df, tapply(aps, list(class, id), mean))


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


On 19/07/07, sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:
> hello,
> i want to compute the mean of a variable ("aps") for every class
> (1,2, and 3).
> every id have a few obs., "aps" and class are constant over id.
> like this:
> id   aps  class
> 1    11   2
> 1    11   2
> 1    11   2
> 1    11   2
> 1    11   2
> 2     8   3
> 2     8   3
> 2     8   3
> 3    12   2
> 3    12   2
> .
> .
>
> i tried:
>
> tapply(icu1$aps_st, icu1$hidclass, function(z) mean(unique(z)))
>
> but it's counting every row and not every id.
>
> thank you,
>
> Sigalit.
>



Re: [R] tapply

2007-07-19 Thread Peter Dalgaard
sigalit mangut-leiba wrote:
> I'm sorry for the unfocused questions, i'm new here...
> the output should be:
> class   aps_mean
> 1       na
> 2       11.5
> 3       8
>
> the mean aps of every class, when every id count *once*,  for example: class
> 2, mean= (11+12)/2=11.5
> hope it's clearer.
>   
Much... Get the first record for each individual from (e.g.)

icu1.redux <- subset(icu1, !duplicated(id))

then use tapply as before using variables from icu1.redux. Or in one go

with(
  subset(icu1, !duplicated(id)),
  tapply(aps, class, mean, na.rm=TRUE)
)
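
A small self-contained check of this approach on the sample rows from the question. The data frame is typed in by hand here and named icu1 as in the question; storing class as a factor with levels 1 to 3 is an assumption made so that the empty class 1 shows up as NA:

```r
# Sample rows from the question; class is stored as a factor with levels 1:3
icu1 <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  aps   = c(11, 11, 11, 11, 11, 8, 8, 8, 12, 12),
  class = factor(c(2, 2, 2, 2, 2, 3, 3, 3, 2, 2), levels = 1:3)
)

# Keep the first record per id, then average aps within each class
res <- with(
  subset(icu1, !duplicated(id)),
  tapply(aps, class, mean, na.rm = TRUE)
)
res
```

Each id now counts once, so class 2 averages ids 1 and 3, (11 + 12)/2 = 11.5, class 3 is 8, and class 1 has no ids and comes back as NA.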


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



[R] tapply, levelinformation

2007-02-15 Thread Antje
Hello,

I have another question. I would like to plot something within a self 
written function (plotdensity) called by tapply

t <- tapply(mat, classes, plotdensity)

Now I would like to add the class/level to each plot as its title.
How can I do this?

Antje



RE: [R] tapply

2004-03-18 Thread Gabor Grothendieck



Try this (untested):

aggregate( data[,6:8], list(date = as.matrix(data[,1:3]) %*% c(10000,100,1)), mean )
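
A hedged variant of the same idea: grouping by the three date columns directly keeps year, month and day as separate columns in the result. The simulated data below merely stands in for the Montreal.txt file, which is not available here, and daily is a made-up object name:

```r
# Simulated hourly data standing in for the real file (two days, 24 hours each)
set.seed(1)
data <- data.frame(
  year = 2004, month = 3, day = rep(17:18, each = 24),
  hour = rep(0:23, times = 2), min = 0,
  temp = rnorm(48, mean = 5), press = rnorm(48, mean = 1013),
  ozone = rnorm(48, mean = 40)
)

# One call averages all three parameters per day and keeps the date columns
daily <- aggregate(data[, c("temp", "press", "ozone")],
                   by = list(year = data$year, month = data$month,
                             day = data$day),
                   FUN = mean)
daily
```

The result has one row per day, with the grouping columns first and the daily means after them.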

---
Date:   Thu, 18 Mar 2004 09:39:02 +0100 
From:   <[EMAIL PROTECTED]>
To:   <[EMAIL PROTECTED]> 
Subject:   [R] tapply 

 
Dear all
I have a dataframe containing hourly data of 3 parameters. 
I would like to create a dataframe containing daily mean values of these 
parameters. Additionally I want to keep information about the time of 
measurement ("year","month","day"). 
With the function tapply I can average over one column of the dataframe. 
I can repeat the function two more times and merge the vectors. In this way I 
obtain my new dataframe (see below). If I want to add the columns day, 
month and year I can repeat tapply another three times. This system works. 


Question: is there a function that averages in a single step over the 3 
columns?

Thanks a lot for your answer!
Regards
Mike Campana 

### read the data
setwd("c:/R")
data <- NULL
data <- as.data.frame(read.table(file="Montreal.txt",header=FALSE,skip=15))
colnames(data) <- c("year","month","day","hour","min","temp","press","ozone")
### create mean value
temp_daily <- tapply(data$temp,data$year*10000+data$month*100+data$day,FUN=mean)
press_daily <- tapply(data$press,data$year*10000+data$month*100+data$day,FUN=mean)
ozone_daily <- tapply(data$ozone,data$year*10000+data$month*100+data$day,FUN=mean)
### merge the data
newdata <- as.data.frame(cbind(temp_daily,press_daily,ozone_daily))



Re: [R] tapply

2004-03-18 Thread Thomas Petzoldt
[EMAIL PROTECTED] wrote:

> Question: is there a function that average in a single step over the 3 
> columns?

You may look for ?aggregate

Thomas P.



[R] tapply & hist

2004-05-13 Thread Vittorio
I'm learning how to use tapply. 
Now I'm having a go at the following code in which dati contains almost 600 
lines, Pot - numeric - are the capacities of power plants and SGruppo - text 
- the corresponding six technologies ("CCC", "CIC","TGC", "CSC","CPC", "TE").

dati=sqlQuery(canale,"select Id,SGruppo,Classe, NGruppo,ProdNetta,Pot from 
SintesiQuery")
attach(dati)
# Grouping by technology
tapply(Pot,SGruppo,sum)
...
# Histograms by technology
par(mfrow=c(2,3)) 
tapply(Pot,SGruppo,hist)
detach(dati)

It all works great but  tapply(Pot,SGruppo,hist) produces 6 histograms with 
the titles and the xlab labels in a generic form, something like integer[1], 
integer[2], ... while I'd like to have each graph indicating the 
mentioned technologies.
I've been trying issuing 
tech=c("CCC", "CIC","TGC", "CSC","CPC", "TE")
tapply(Pot,SGruppo,hist, main=tech)

but R prints in each histogram the six values in the title without cycling 
among them.

How can I obtain what I want?

Ciao
Vittorio 


to no avail



Re: [R] tapply question

2006-07-06 Thread Jacques VESLOT
I think you can't have columns with the same name.

 > data.frame(AAA=1:3, AAA=4:6)
   AAA AAA.1
1   1 4
2   2 5
3   3 6

but you could subset the data frame by names using substring():

sapply(unique(substring(names(data1), 1, 3)), function(x)
rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))


---
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
---


[EMAIL PROTECTED] a écrit :
> I think I understand tapply but i still
> can't figure out how to do the following.
> 
> I have a dataframe where some of the column names are the same
> and i want to make a new dataframe where columns
> that have the same name are averaged by row.
> 
> so, if the data frame, DF, was 
> 
> AAA  BBB  CCC  AAA  DDD
> 1    0    7    11   13
> 2    0    8    12   14
> 3    0    6    0    15
> 
> then the resulting data frame would be exactly the same except
> that the AAA column would be 
> 
> 6   comes from  (11 + 1)/2
> 7comes from  (12 + 2)/2
> 3   stays 3 because the element in the other AAA is zero
> so i don't want to average that one. it shoulsd just stay 3.
> 
> So, I do 
> 
> DF[DF == 0]<-NA
> rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowmeans)
> 
> there are two problems with this :
> 
> 1) i need to go through the rows of the same name, not the columns
> so i don't think seq(DF) is right because that goes through 
> the columns but i want to go through rows.
> 
> 2) BBB will come back with ALL NA's ( since
> it was unique and there was nothing else to average ( and I don't know how to 
> transform that BB column to all zero's.
> 
> thanks and i'm sorry for so many questions. i'm getting bettter with this 
> stuff and my questions will decrease soon.
> 
> my guess is that i no longer should be using tapply ?
> and should be using some other version of apply.
> thanks
>  mark
> 



Re: [R] tapply question

2006-07-06 Thread jim holtman
I think this does what you want:


> In <- "AAA BBB CCC AAA DDD
+ 1 0 7 11 13
+ 2 0 8 12 14
+ 3 0 6 0 15"
> DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowaverage)
> revisedDF
$AAA
1 2 3
6 7 3

$BBB
 1  2  3
NA NA NA

$CCC
1 2 3
7 8 6

$DDD
 1  2  3
13 14 15

> do.call('cbind', revisedDF)
  AAA BBB CCC DDD
1   6  NA   7  13
2   7  NA   8  14
3   3  NA   6  15
>
>



On 7/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> I think I understand tapply but i still
> can't figure out how to do the following.
>
> I have a dataframe where some of the column names are the same
> and i want to make a new dataframe where columns
> that have the same name are averaged by row.
>
> so, if the data frame, DF, was
>
> AAA  BBB  CCC  AAA  DDD
> 1    0    7    11   13
> 2    0    8    12   14
> 3    0    6    0    15
>
> then the resulting data frame would be exactly the same except
> that the AAA column would be
>
> 6   comes from  (11 + 1)/2
> 7comes from  (12 + 2)/2
> 3   stays 3 because the element in the other AAA is zero
> so i don't want to average that one. it shoulsd just stay 3.
>
> So, I do
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowmeans)
>
> there are two problems with this :
>
> 1) i need to go through the rows of the same name, not the columns
> so i don't think seq(DF) is right because that goes through
> the columns but i want to go through rows.
>
> 2) BBB will come back with ALL NA's ( since
> it was unique and there was nothing else to average ( and I don't know how
> to transform that BB column to all zero's.
>
> thanks and i'm sorry for so many questions. i'm getting bettter with this
> stuff and my questions will decrease soon.
>
> my guess is that i no longer should be using tapply ?
> and should be using some other version of apply.
> thanks
> mark
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?



[R] tapply and storage

2007-04-10 Thread Floris Van Ogtrop
Dear R-Users,

I have the following problem of which I have provided a simple example.
Using the tapply command I can efficiently run the function genflo for
all months and years. I am new to R and I do not understand how I can
store the results of f such that as the function loops through the
months, I can retrieve the tail value of f from the previous month and
use this as a condition for the current month iteration (note the
comments in the code).
Forgive me if I am not clear.

Thanks in advance

Floris   

year <- c(1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972,
1972, 1972)
month <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
discharge <- c(100921, 89885, 81493, 74876, 70579, 68305, 66337, 63095,
58446, 52674, 44028, 31956)
n11 <- c(1,1,1,1,1,1,1,1,1,1,1,1)
n00 <- c(0,0,0,0,0,0,0,0,0,0,0,0)
n01 <- c(0,0,0,0,0,0,0,0,0,0,0,0)
n10 <- c(0,0,0,0,0,0,0,0,0,0,0,0)
flow_data <- data.frame(year, month, discharge, n11, n00, n01, n10)

genflo <- function(X)
{
  n <- nrow(X)
  if ((sum(X$n11) + sum(X$n10)) > 0) {
    Pww <- sum(X$n11)/(sum(X$n11) + sum(X$n10))
  } else {
    Pww <- 0
  }
  if ((sum(X$n00) + sum(X$n01)) > 0) {
    Pdd <- sum(X$n00)/(sum(X$n00) + sum(X$n01))
  } else {
    Pdd <- 0
  }
  r <- vector(length = n)
  rand <- runif(r, 0, 1)
  f <- vector(length = n)
  for (i in 2:n) {
    # X$discharge needs to be replaced by the tail value of f
    # from the previous iteration (month)
    if (X$discharge[i-1] > 0) {
      if (rand[i] > Pww) {
        f[i] <- 0
      } else {
        f[i] <- 1
      }
    } else {
      if (rand[i] > Pdd) {
        f[i] <- 1
      } else {
        f[i] <- 0
      }
    }
  }
  return(f)
}

gen_flow_days <- by(flow_data, list(month = flow_data[,2], year =
flow_data[,1]), genflo)
gen_flow_days <- unlist(gen_flow_days)



Re: [R] tapply question

2005-12-16 Thread Uwe Ligges
Frank Johannes wrote:

> HI,
> Suppose I have the following data structure.
>LRT  tp
> 1   1.50654010 522
> 2   0.51793929 522
> 3   0.90340299 522
> 4   1.20293325 522
> 5   1.05578774 523
> 6   0.01617942 523
> 7   0.68183543 523
> 8   0.43820244 523
> 9   1.14123995 524
> 10  0.05809550 524
> 11  0.93061597 524
> 12  1.39739700 524
> 13  1.05220953 525
> 14  0.03471461 525
> 15  0.63168798 525
> 16  1.40592603 525
> 17  1.41884492 526
> 18  0.23388479 526
> 19  0.21881064 526
> 20  0.99710830 526
> 21  2.02054187 527
> 22  1.99872887 527
> 23  1.04187450 527
> 24  1.31556807 527
> 25  2.5190 528
> 26  2.94778561 528
> 27  1.88800177 528
> 28  2.08249941 528
> 
> 
> I have succesfully used a command line such as the one below to get
> maxima for each "tp-category'
> 
> data.out<-data[tapply(LRT,tp, function(x) which(LRT==max(x))),]
> 
> However, when I try it on the above data, it gives me the following
> error message:
> 
>>Error in "[.data.frame"(data, tapply(LRT, tp, function(x) which(LRT ==  : 
> 
> invalid subscript type


Works for me. Look at your data structures and check whether your data 
frame is OK.

Or, much easier:

   tapply(LRT, tp, max)
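
A short sketch of both variants on made-up numbers (not the data from the question). It also shows one possible cause of the "invalid subscript type" error, which is an assumption about the poster's situation: when a group maximum matches more than one row, the which() call returns several indices, tapply() then returns a list, and a list cannot be used to index a data frame. Using ave() to pick the rows is a swapped-in alternative, not the code from either post:

```r
# Made-up data with a tie for the maximum in group 523
data <- data.frame(
  LRT = c(1.5, 0.5, 0.9, 1.0, 0.2, 1.0, 2.0, 2.9),
  tp  = c(522, 522, 522, 523, 523, 523, 524, 524)
)

# Just the maxima per group
group_max <- tapply(data$LRT, data$tp, max)

# The original idiom: the tie in group 523 yields two indices, so the result is a list
idx <- tapply(data$LRT, data$tp, function(x) which(data$LRT == max(x)))
is.list(idx)

# Whole rows holding each group's maximum; ave() returns one value per row
data.out <- data[data$LRT == ave(data$LRT, data$tp, FUN = max), ]
data.out
```

The ave() version keeps every tied row, which may or may not be what is wanted.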

Uwe Ligges




> I don't know what to do.
> Thanks for your help
> 
> --
> 



Re: [R] tapply histogram

2007-06-01 Thread Aimin Yan
Use lattice graphics.
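
A minimal sketch of what that could look like, assuming the lattice package is installed and using the built-in iris data in place of the poster's data frame. The custom panel function is one way to get the density curve and the rug into every panel:

```r
library(lattice)

# One density-scaled histogram per Species, each with a kernel density
# curve in red and a rug of the observations
p <- histogram(~ Sepal.Length | Species, data = iris, type = "density",
               panel = function(x, ...) {
                 panel.histogram(x, ...)
                 panel.densityplot(x, col = "red", plot.points = FALSE)
                 panel.rug(x)
               })
print(p)
```

lattice arranges the panels itself, so no mfrow setting is needed.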


At 08:00 AM 6/1/2007, livia wrote:

>Dear members,
>
>I would like to pass the histogram settings to each subset of the dataframe,
>and generate a multiple figures graph.
>
>First, can anyone tell me how to generate a multiple figures environment? I
>am trying
>
>mfrow=c(2,4) and nothing appears.
>
>Secondly, I want to pass the following function in tapply()
>
>hist(x, freq=FALSE)
>lines(density(x), col="red")
>rug(x)
>
>how can I manage it?
>
>Many thanks
>



Re: [R] tapply histogram

2007-06-01 Thread Marc Schwartz
On Fri, 2007-06-01 at 06:00 -0700, livia wrote:
> Dear members,
> 
> I would like to pass the histogram settings to each subset of the dataframe,
> and generate a multiple figures graph.
> 
> First, can anyone tell me how to generate a multiple figures environment? I
> am trying 
> 
> mfrow=c(2,4) and nothing appears.
> 
> Secondly, I want to pass the following function in tapply()
> 
> hist(x, freq=FALSE)
> lines(density(x), col="red")
> rug(x)
> 
> how can I manage it?
> 
> Many thanks

In this case, you would not want to use one of the *apply() family of
functions. First, it does not save you anything and second, these
functions are designed to return some type of R object, which you don't
want here.

Better to use a for() loop and if you wish, encapsulate the loop in a
function. Something along the lines of the following, which actually
defines a new 'formula' method for hist() (though not fully tested):


hist.formula <- function(formula, data, cols, rows, ...)
{
  DF <- model.frame(formula, data = data, ...)
  DF.split <- split(DF[[1]], DF[[2]])

  # mfrow expects c(number of rows, number of columns)
  par(mfrow = c(rows, cols))

  for (i in names(DF.split))
  {
    Col <- DF.split[[i]]
    hist(Col, freq = FALSE, main = i, ...)
    lines(density(Col), col = "red")
    rug(Col)
  }
}



The function will take the formula, create a data frame comprised of the
formula terms and then loop over the list of vectors created by
split(), one per factor level.

So we call it as follows:


  hist(Sepal.Length ~ Species, data = iris, 2, 2)


Based upon the formula specification, you will then get a matrix of
histograms, where each will be titled with the factor level used to
split the original data frame.

You could further consolidate the function by implementing an automated
means to determine the number of rows and columns required in the plot
matrix, but I'll leave that for you.
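
One way to sketch that automation is n2mfrow() from grDevices, which suggests a rows-by-columns layout for a given number of plots. The helper name hist_by_group and its arguments are hypothetical, and it takes plain vectors rather than a formula to keep the example short:

```r
# Hypothetical helper: one histogram per group, layout chosen automatically
hist_by_group <- function(values, groups) {
  pieces <- split(values, groups)
  # n2mfrow() returns c(rows, columns) with enough cells for all groups
  layout <- grDevices::n2mfrow(length(pieces))
  old <- par(mfrow = layout)
  on.exit(par(old))          # restore the previous layout afterwards
  for (nm in names(pieces)) {
    hist(pieces[[nm]], freq = FALSE, main = nm, xlab = "")
    lines(density(pieces[[nm]]), col = "red")
    rug(pieces[[nm]])
  }
  invisible(layout)
}

lay <- hist_by_group(iris$Sepal.Length, iris$Species)
```

The layout is returned invisibly so the caller can check how many cells were reserved.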

See ?model.frame and ?split

HTH,

Marc Schwartz



[R] tapply grand mean

2007-08-08 Thread Lauri Nikkinen
Hi R-users,

I have a data.frame like this (modified from
https://stat.ethz.ch/pipermail/r-help/2007-August/138124.html).

y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20*1.7 + 1)
y3 <- rnorm(20) + (1:20*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:5,12)
f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
d <- data.frame(x=x,y=y, f=f)

and this is how I can calculate mean of these levels.

tapply(d$y, list(d$x, d$f), mean)

But how can I calculate the mean of d$x 1 and 2 and the grand mean of d$x 1,
2, 3, 4, 5 (within d$f) into a table?

Regards,
Lauri



[R] tapply, data.frame problem

2007-01-17 Thread Lauri Nikkinen
Hi R-users,

I'm quite new to R and trying to learn the basics. I have the following
problem concerning the conversion of an array object into a data frame. I have
made the following data sets
tmp1 <- rnorm(100)
tmp2 <- gl(10,2,length=100)
tmp3 <- as.data.frame(cbind(tmp1,tmp2))
tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
and I want the levels from tmp2 to be shown as a column in the data.frame, not
as row names as they are now. To put it another way, as a result, I want a
data frame with two columns: levels and the sums of those levels. Row names
can be, for example, numbers from 1 to 10.

-Lauri Nikkinen
Lahti, Finland



[R] "tapply" and "data.frame"?

2007-01-23 Thread Zhang Jian
I want to transform the result of "tapply" into a data frame, but I cannot get
it to work.
For example:
> tst=tapply(point,pp,length)
> tst[1:10]
  p1   p10 p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
  1   5   1   8   6   5   8   7   4   4
> res=as.data.frame(tst)  # I try to transform it
> res[1:10,]
  p1   p10 p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
  1   5   1   8   6   5   8   7   4   4
How can I transform it into the following?
>res
point ind
1   p1   1
2   p10   5
3 p100   1
4 p1000   8
5 p1001   6

Thanks!



Re: [R] tapply, levelinformation

2007-02-16 Thread niederlein-rstat
Hi Jim,

jim holtman schrieb:
> Here is one way:
>  
> t <- split(mat, classes)
> for (i in names(t)) plotdensity(t[[i]], main=i)
> 

But then I don't use the advantages of the tapply anymore...

> What is the problem you are trying to solve?

I have a set of data (multiple files), which belong to different 
conditions (one or more files per condition). I wanted to read the data 
set and a "description" of the conditions and then automatically create 
plots for data of the same condition.

Maybe the way I'm doing it is much too complicated...

Antje



Re: [R] tapply, levelinformation

2007-02-16 Thread Antje
Hi Jim,

jim holtman schrieb:
> Here is one way:
>  
> t <- split(mat, classes)
> for (i in names(t)) plotdensity(t[[i]], main=i)
> 

But then I don't use the advantages of the tapply anymore...

> What is the problem you are trying to solve?

I have a set of data (multiple files), which belong to different
conditions (one or more files per condition). I wanted to read the data
set and a "description" of the conditions and then automatically create
plots for data of the same condition.

Maybe the way I'm doing it is much too complicated...

Antje



Re: [R] tapply, levelinformation

2007-02-16 Thread jim holtman
But it does the same thing.  What 'advantage' of tapply do you think that
you are missing?  Performance is probably not impacted since most of the
time is in the plot.
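
For what it is worth, the split() result can also be walked over without an explicit for() loop. In this sketch plotdensity() is a hypothetical stand-in for the poster's own function, and mat and classes are simulated:

```r
# Hypothetical stand-in for the poster's plotting function
plotdensity <- function(x, main = "") {
  plot(density(x), main = main)
}

set.seed(42)
mat <- rnorm(60)
classes <- rep(c("control", "treated", "washout"), each = 20)

# split() keeps the level names, so each plot can be titled with its level
groups <- split(mat, classes)
res <- Map(function(x, nm) plotdensity(x, main = nm), groups, names(groups))
```

Map() pairs each group with its name, which is the piece of information that FUN never sees inside tapply().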

On 2/16/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Hi Jim,
>
> jim holtman schrieb:
> > Here is one way:
> >
> > t <- split(mat, classes)
> > for (i in names(t)) plotdensity(t[[i]], main=i)
> >
>
> But then I don't use the advantages of the tapply anymore...
>
> > What is the problem you are trying to solve?
>
> I have a set of data (multiple files), which belong to different
> conditions (one or more files per condition). I wanted to read the data
> set and a "description" of the conditions and then automatically create
> plots for data of the same condition.
>
> Maybe it's much to complicate the way I do...
>
> Antje
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] tapply and names

2005-01-25 Thread Göran Broström
I have a data frame containing children, with variables 'year' = birth
year, and 'm.id' = mother's id number. Let's assume that all the births of
each mother are represented in the data frame. 

Now I want to create a subset of this data frame containing all children,
whose mother's first birth was in the year 1816 or later. This seems to
work: 

mid <- tapply(dat$year, dat$m.id, min)
mid <- as.numeric(names(mid)[mid >= 1816])
dat <- dat[dat$m.id %in% mid, ]

but I'm worried about the second line, because the output from 'tapply'
isn't documented to have a 'dimnames' attribute (although it has one, at
least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on
m.id being numeric; I would have to change it if the type of m.id changes
to, eg, character.

So, question: Is there a better way of doing this?
-- 
 Göran Broströmtel: +46 90 786 5223
 Department of Statistics  fax: +46 90 786 6614
 Umeå University   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden e-mail: [EMAIL PROTECTED]



Re: [R] tapply & hist

2004-05-13 Thread Jason Turner
> ...
> # Histograms by technology
> par(mfrow=c(2,3))
> tapply(Pot,SGruppo,hist)
> detach(dati)
>
> It all works great but  tapply(Pot,SGruppo,hist) produces 6 histograms
> with
> the titles and the xlab labels in a generic form, something like
> integer[1],
> integer[2], ... while I'd like to have each graph indicating the

tapply takes atomic data (usually vectors).  You want to pass rows of a
data frame, so the Pot *and* SGruppo will be sent together; "by()" is very
good for this.  It might be possible (even easy?) to use tapply, but I
just use "by" for these things.

Since dati is your data frame, try this (untested!):

by(dati,dati$SGruppo, function(x,...){
  hist(x$Pot,main=as.character(x$SGruppo[1])) } )

Or, use Lattice:

library(lattice)
histogram( ~ Pot | SGruppo, data=dati)

Cheers

Jason



Re: [R] tapply & hist

2004-05-13 Thread Gabor Grothendieck

As another respondent already mentioned, Lattice is probably the way to
go on this one but if you do want to use tapply try this:

names(Pot) <- SGruppo
dummy <- tapply(Pot,SGruppo,function(x)hist(x,main=names(x)[1],xlab=NULL))


Vittorio  virgilio.it> writes:

: 
: I'm learning how to use tapply. 
: Now I'm having a go at the following code in which dati contains almost 600 
: lines, Pot - numeric - are the capacities of power plants and SGruppo - text 
: - the corresponding six technologies 
: ("CCC", "CIC","TGC", "CSC","CPC", "TE"). 
: .
: 
: dati=sqlQuery(canale,"select Id,SGruppo,Classe, NGruppo,ProdNetta,Pot from 
: SintesiQuery")
: attach(dati)
: # Grouping by technology
: tapply(Pot,SGruppo,sum)
: ...
: # Histograms by technology
: par(mfrow=c(2,3)) 
: tapply(Pot,SGruppo,hist)
: detach(dati)
: 
: It all works great but  tapply(Pot,SGruppo,hist) produces 6 histograms with 
: the titles and the xlab labels in a generic form, something like integer[1], 
: integer[2], ... while I'd like to have each graph indicating the 
: mentioned technologies.
: I've been trying issuing 
: tech=c("CCC", "CIC","TGC", "CSC","CPC", "TE")
: tapply(Pot,SGruppo,hist, main=tech)
: 
: but R prints in each histogram the six values in the title without cycling 
: among them.
: 
: How can I obtain what I want?
: 
: Ciao
: Vittorio



[R] tapply and weighted means

2006-01-12 Thread Florent Bresson
I'm trying to compute weighted means for different
groups but it only returns NA. If I use the following
data.frame truc:

x  y  w
1  1  1
1  2  2
1  3  1
1  4  2
0  2  1
0  3  2
0  4  1
0  5  1

where x is a factor, and then use the command :

tapply(truc$y,list(truc$x),wtd.mean, weights=truc$w)

I just get NA. What's the problem? What can I do?



Re: [R] tapply grand mean

2007-08-08 Thread Chuck Cleland
Lauri Nikkinen wrote:
> Hi R-users,
> 
> I have a data.frame like this (modificated from
> https://stat.ethz.ch/pipermail/r-help/2007-August/138124.html).
> 
> y1 <- rnorm(20) + 6.8
> y2 <- rnorm(20) + (1:20*1.7 + 1)
> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> y <- c(y1,y2,y3)
> x <- rep(1:5,12)
> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> d <- data.frame(x=x,y=y, f=f)
> 
> and this is how I can calculate mean of these levels.
> 
> tapply(d$y, list(d$x, d$f), mean)
> 
> But how can I calculate the mean of d$x 1 and 2 and the grand mean of d$x 1,
> 2, 3, 4, 5 (within d$f) into a table?

  You might like the tables produced by summary.formula() in the Hmisc
package:

library(Hmisc)

summary(y ~ x + f, data = d, fun=mean, method="cross", overall=TRUE)

 UseMethod by x, f

+-+
|N|
|y|
+-+
+---+---------+---------+---------+---------+
| x |   lev1  |   lev2  |   lev3  |   ALL   |
+---+---------+---------+---------+---------+
|1  | 4       | 4       | 4       |12       |
|   | 6.452326|15.861256|61.393455|27.902346|
+---+---------+---------+---------+---------+
|2  | 4       | 4       | 4       |12       |
|   | 7.403041|17.296270|68.208299|30.969203|
+---+---------+---------+---------+---------+
|3  | 4       | 4       | 4       |12       |
|   | 6.117648|17.976864|73.479837|32.524783|
+---+---------+---------+---------+---------+
|4  | 4       | 4       | 4       |12       |
|   | 7.831390|19.696998|80.323382|35.950590|
+---+---------+---------+---------+---------+
|5  | 4       | 4       | 4       |12       |
|   | 6.746213|21.101952|87.430087|38.426084|
+---+---------+---------+---------+---------+
|ALL|20       |20       |20       |60       |
|   | 6.910124|18.386668|74.167012|33.154601|
+---+---------+---------+---------+---------+

summary(y ~ I(x %in% c(1,2)) + f, data = d, fun=mean, method="cross",
overall=TRUE)

 UseMethod by I(x %in% c(1, 2)), f

+-+
|N|
|y|
+-+
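+-----------------+---------+---------+---------+---------+
|I(x %in% c(1, 2))|   lev1  |   lev2  |   lev3  |   ALL   |
+-----------------+---------+---------+---------+---------+
|      FALSE      |12       |12       |12       |36       |
|                 | 6.898417|19.591938|80.411102|35.633819|
+-----------------+---------+---------+---------+---------+
|      TRUE       | 8       | 8       | 8       |24       |
|                 | 6.927684|16.578763|64.800877|29.435774|
+-----------------+---------+---------+---------+---------+
|       ALL       |20       |20       |20       |60       |
|                 | 6.910124|18.386668|74.167012|33.154601|
+-----------------+---------+---------+---------+---------+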

> Regards,
> Lauri
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply grand mean

2007-08-08 Thread Lauri Nikkinen
Thanks Chuck but I would fancy the output made by tapply because the idea is
to make a barplot based on those values.

-Lauri




Re: [R] tapply grand mean

2007-08-08 Thread Chuck Cleland
Lauri Nikkinen wrote:
> Thanks Chuck but I would fancy the output made by tapply because the
> idea is to make a barplot based on those values.
>  
> -Lauri

sum1 <- summary(y ~ x + f, data = d, fun=mean,
method="cross", overall=TRUE)

df <- data.frame(x = sum1$x, f = sum1$f, y = sum1$S)

df
     x    f         y
1    1 lev1  6.452326
2    2 lev1  7.403041
3    3 lev1  6.117648
4    4 lev1  7.831390
5    5 lev1  6.746213
6  ALL lev1  6.910124
7    1 lev2 15.861256
8    2 lev2 17.296270
9    3 lev2 17.976864
10   4 lev2 19.696998
11   5 lev2 21.101952
12 ALL lev2 18.386668
13   1 lev3 61.393455
14   2 lev3 68.208299
15   3 lev3 73.479837
16   4 lev3 80.323382
17   5 lev3 87.430087
18 ALL lev3 74.167012
19   1  ALL 27.902346
20   2  ALL 30.969203
21   3  ALL 32.524783
22   4  ALL 35.950590
23   5  ALL 38.426084
24 ALL  ALL 33.154601

library(lattice)

barchart(y ~ x | f, data = df, layout=c(4,1,1))

OR

barchart(S ~ x | f, data = sum1, layout=c(4,1,1))
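If a plain tapply-style matrix is preferred (for example to feed base barplot()), one possible base-R sketch is to compute the margins with further tapply calls and bind them onto the table of cell means. This is only an illustration, not something taken from the Hmisc documentation; the seed and the row labels "x in 1:2" and "ALL" are my own choices.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20 * 1.7 + 1)
y3 <- rnorm(20) + (1:20 * 6.7 + 3.7)
d <- data.frame(x = rep(1:5, 12), y = c(y1, y2, y3),
                f = gl(3, 20, labels = paste("lev", 1:3, sep = "")))

cells <- tapply(d$y, list(d$x, d$f), mean)   # 5 x 3 matrix of cell means

low  <- d$x %in% c(1, 2)
m12  <- tapply(d$y[low], d$f[low], mean)     # mean over x = 1 and 2, per level of f
all5 <- tapply(d$y, d$f, mean)               # mean over x = 1..5, per level of f

# as.vector() drops the 1-d array attributes before binding the rows
tab <- rbind(cells, "x in 1:2" = as.vector(m12), ALL = as.vector(all5))
tab

# barplot(tab, beside = TRUE, legend.text = rownames(tab))
```

The result is an ordinary numeric matrix with seven rows (x = 1..5 plus the two summary rows) and one column per level of f.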


-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] tapply, data.frame problem

2007-01-17 Thread Chuck Cleland
Lauri Nikkinen wrote:
> Hi R-users,
> 
> I'm quite new to R and trying to learn the basics. I have the following
> problem concerning the conversion of an array object into a data frame. I have
> made the following data sets
> 
> tmp1 <- rnorm(100)
> tmp2 <- gl(10,2,length=100)
> tmp3 <- as.data.frame(cbind(tmp1,tmp2))
> tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
> tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
> and I want the levels from tmp2 be shown as a column in the data.frame, not
> as row name as it now does. To put it in another way, as a result, I want a
> data frame with two columns: levels and the sums of those levels. Row names
> can be, for example, numbers from 1 to 10.

aggregate(tmp3[1], tmp3[2], sum)
   tmp2        tmp1
1     1  8.41550650
2     2  3.65831086
3     3 -0.26296334
4     4  3.45368671
5     5 -4.64383794
6     6  0.25640949
7     7  0.02832348
8     8 -0.03811150
9     9  1.41724121
10   10 -1.06780900

?aggregate
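A self-contained sketch of the same idea, next to the tapply route for comparison. One detail worth noting: building the data frame with data.frame() rather than cbind() keeps tmp2 as a factor, since cbind() on a numeric vector and a factor gives a numeric matrix of level codes. The seed and the column names 'level' and 'sum' are my own choices.

```r
set.seed(123)                       # arbitrary seed
tmp1 <- rnorm(100)
tmp2 <- gl(10, 2, length = 100)
tmp3 <- data.frame(tmp1 = tmp1, tmp2 = tmp2)   # tmp2 stays a factor

# Route 1: aggregate() returns a data frame with the levels as a column
via_aggregate <- aggregate(tmp3["tmp1"], tmp3["tmp2"], sum)

# Route 2: tapply() returns a named array; move the names into a column
s <- tapply(tmp3$tmp1, tmp3$tmp2, sum)
via_tapply <- data.frame(level = names(s), sum = as.vector(s))

via_aggregate
via_tapply
```

Both results have ten rows with default row names 1 to 10; only the column names differ.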

> -Lauri Nikkinen
> Lahti, Finland
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] "tapply" and "data.frame"?

2007-01-23 Thread jim holtman
Is this what you want:

> tst
   p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
    1     5     1     8     6     5     8     7     4     4
> data.frame(point=names(tst), ind=tst)
      point ind
p1       p1   1
p10     p10   5
p100   p100   1
p1000 p1000   8
p1001 p1001   6
p1002 p1002   5
p1003 p1003   8
p1004 p1004   7
p1005 p1005   4
p1006 p1006   4
>
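If plain 1, 2, 3, ... row names are wanted, as in the layout asked for in the question, a small variation should do it. A sketch with an invented 'pp' vector, since the original data was not posted:

```r
# Invented grouping vector; 'point' only needs to have the same length
pp    <- c("p1", "p10", "p10", "p100", "p10", "p1000", "p1000")
point <- seq_along(pp)

tst <- tapply(point, pp, length)    # counts per group, as a named array

# as.vector() drops the names, so the data frame gets default row names
res <- data.frame(point = names(tst), ind = as.vector(tst), row.names = NULL)
res
```

The group labels end up in the 'point' column and the counts in 'ind', with the row names numbered from 1.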


On 1/23/07, Zhang Jian <[EMAIL PROTECTED]> wrote:
> I want to convert the output of "tapply" into a data frame, but I cannot
> get it to work.
> For example:
> > tst=tapply(point,pp,length)
> > tst[1:10]
>    p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
>     1     5     1     8     6     5     8     7     4     4
> > res=as.data.frame(tst)  # my attempt at the conversion
> > res[1:10,]
>    p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
>     1     5     1     8     6     5     8     7     4     4
> How can I transform it into the following?
> > res
>   point ind
> 1    p1   1
> 2   p10   5
> 3  p100   1
> 4 p1000   8
> 5 p1001   6
>
> Thanks!
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



RE: [R] tapply and names

2005-01-25 Thread Liaw, Andy
> From: Göran Broström
> 
> I have a data frame containing children, with variables 'year' = birth
> year, and 'm.id' = mother's id number. Let's assume that all 
> the births of
> each mother is represented in the data frame. 
> 
> Now I want to create a subset of this data frame containing 
> all children,
> whose mother's first birth was in the year 1816 or later. 
> This seems to
> work: 
> 
> mid <- tapply(dat$year, dat$m.id, min)
> mid <- as.numeric(names(mid)[mid >= 1816])
> dat <- dat[dat$m.id %in% mid, ]
> 
> but I'm worried about the second line, because the output 
> from 'tapply'
> isn't documented to have a 'dimnames' attribute (although it 
> has one, at
> least in R-2.1.0, 2005-01-19). Another aspect is that this 
> code relies on
> m.id being numeric; I would have to change it if the type of 
> m.id changes
> to, eg, character.
> 
> So, question: Is there a better way of doing this?

Would this work?

  dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Andy



Re: [R] tapply and names

2005-01-25 Thread Dimitris Rizopoulos
Your approach, after omitting the "as.numeric()" in the second line, 
seems to work even when `m.id' is a factor, i.e.,

dat <- data.frame(m.id=rep(letters[1:10], 10), year=sample(1805:1950, 
100, TRUE))
###
mid <- tapply(dat$year, dat$m.id, min)
mid <- names(mid)[mid >= 1816]
dat. <- dat[dat$m.id %in% mid, ]
dat; dat.

but maybe there is something better.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


Re: [R] tapply and names

2005-01-25 Thread Göran Broström
On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
> > From: Göran Broström
> > 
> > I have a data frame containing children, with variables 'year' = birth
> > year, and 'm.id' = mother's id number. Let's assume that all 
> > the births of
> > each mother is represented in the data frame. 
> > 
> > Now I want to create a subset of this data frame containing 
> > all children,
> > whose mother's first birth was in the year 1816 or later. 
> > This seems to
> > work: 
> > 
> > mid <- tapply(dat$year, dat$m.id, min)
> > mid <- as.numeric(names(mid)[mid >= 1816])
> > dat <- dat[dat$m.id %in% mid, ]
> > 
> > but I'm worried about the second line, because the output 
> > from 'tapply'
> > isn't documented to have a 'dimnames' attribute (although it 
> > has one, at
> > least in R-2.1.0, 2005-01-19). Another aspect is that this 
> > code relies on
> > m.id being numeric; I would have to change it if the type of 
> > m.id changes
> > to, eg, character.
> > 
> > So, question: Is there a better way of doing this?
> 
> Would this work?
> 
>   dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Yes, but you (or I) need

> dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                     ^^^
(took me some time to figure out), because

?ave

Usage:

 ave(x, ..., FUN = mean)

Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion. 
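For the record, a small made-up example of why the argument has to be named: since the usage is ave(x, ..., FUN = mean), an unnamed third argument lands in '...' and is taken as another grouping variable rather than as the function. The toy data below is invented for illustration.

```r
# Invented toy data: three mothers, five births
dat <- data.frame(m.id = c(1, 1, 2, 2, 3),
                  year = c(1810, 1820, 1816, 1830, 1850))

# ave() returns one value per row: here each mother's first birth year
first <- ave(dat$year, dat$m.id, FUN = min)
first

# Keep all children whose mother's first birth was in 1816 or later
sub <- dat[first >= 1816, ]
sub
```

Because the comparison is done row by row on the result of ave(), no conversion of m.id via names() is involved, so the same line should work whether m.id is numeric, character or a factor.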

Göran



[R] tapply and NA value

2005-03-25 Thread Leonardo Lami
Hi,
I'm writing for a little help.
I have a data frame with some NA values and I'd like to obtain the means of the 
values of a column grouped by the levels of a factor column of the data frame.
I'm using the function "tapply" but I see that if even a single NA value is present 
the result is NA.
Is there an option to get the correct result, or must I use another function?

Thanks to all
Leonardo
Leonardo
-- 
Leonardo Lami
[EMAIL PROTECTED]www.faunalia.it
Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy   Tel: (+39)349-1310164
GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html
https://www.biglumber.com



Re: [R] tapply and weighted means

2006-01-12 Thread Dimitris Rizopoulos
you need also to split the 'w' column, for each level of 'x'; you 
could use:

lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm





Re: [R] tapply and weighted means

2006-01-12 Thread Frank E Harrell Jr
Dimitris Rizopoulos wrote:
> you need also to split the 'w' column, for each level of 'x'; you 
> could use:
> 
> lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
> 
> 
> I hope it helps.
> 
> Best,
> Dimitris

Or:
library(Hmisc)
?wtd.mean
The help file has a built-in example of this.
Frank

> 
> 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] tapply and weighted means

2006-01-12 Thread Gavin Simpson
On Thu, 2006-01-12 at 15:44 +0100, Florent Bresson wrote:
> I' m trying to compute weighted mean on different
> groups but it only returns NA. If I use the following
> data.frame truc:
> 
> x  y  w
> 1  1  1
> 1  2  2
> 1  3  1
> 1  4  2
> 0  2  1
> 0  3  2
> 0  4  1
> 0  5  1
> 
> where x is a factor, and then use the command :
> 
> tapply(truc$y,list(truc$x),wtd.mean, weights=truc$w)
> 
> I just get NA. What's the problem ? What can I do ?

Florent,

I guess you didn't read the help for tapply, which in the Value section
states:

 Note that optional arguments to 'FUN' supplied by the '...'
 argument are not divided into cells.  It is therefore
 inappropriate for 'FUN' to expect additional arguments with the
 same length as 'X'.

So tapply is not the right tool for this job. We can use by() instead (a
wrapper for tapply) as so:

dat <- matrix(scan(), byrow = TRUE, ncol = 3)
1  1  1
1  2  2
1  3  1
1  4  2
0  2  1
0  3  2
0  4  1
0  5  1

colnames(dat) <- c("x", "y", "w")
dat <- as.data.frame(dat)
dat
(res <- by(dat, dat$x, function(z) weighted.mean(z$y, z$w)))

but if you want to easily access the numbers you need to do a little
work, e.g.

as.vector(res)

Also, I don't see a function wtd.mean in standard R and weighted.mean()
doesn't have a weights argument, so I guess you are using a function
from another package and did not tell us.
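To make the point about '...' concrete, here is a sketch with the poster's numbers that sticks to base R's weighted.mean(). tapply is run over the row indices, so that both y and w are subset per group inside the function. This is a common idiom rather than something taken from the tapply help page.

```r
truc <- data.frame(x = factor(c(1, 1, 1, 1, 0, 0, 0, 0)),
                   y = c(1, 2, 3, 4, 2, 3, 4, 5),
                   w = c(1, 2, 1, 2, 1, 2, 1, 1))

# Group the row indices, then use them to pick matching y and w values
wm <- tapply(seq_len(nrow(truc)), truc$x,
             function(i) weighted.mean(truc$y[i], truc$w[i]))
wm

# The split() route gives the same numbers as a named vector
sapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
```

For these data the group x = 0 gives 17/5 = 3.4 and the group x = 1 gives 16/6, about 2.667.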

HTH,

Gav
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [T] +44 (0)20 7679 5522
ENSIS Research Fellow [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] tapply and NA value

2005-03-25 Thread Dimitris Rizopoulos
you should look at the 'na.rm=FALSE' argument of '?mean()', i.e.,
x <- rnorm(100); x[sample(100, 10)] <- NA
f <- sample(letters[1:5], 100, TRUE)
###
tapply(x, f, mean)
tapply(x, f, mean, na.rm=TRUE)
I hope it helps.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


Re: [R] tapply and NA value

2005-03-25 Thread Ales Ziberna
I am not really sure what you mean. If I understand you correctly, then all 
you have to do is pass an additional parameter to tapply, na.rm=TRUE:

tapply(, na.rm=TRUE)

However, as I already said, I'm not sure what you did or what the 
problem is. Please provide the code that did not work, ideally with a workable 
example, as the posting guide suggests:
"PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html"

I hope this helps in some way,
Ales Ziberna


Re: [R] tapply and NA value

2005-03-25 Thread Leonardo Lami
Thanks very much!
All the best,
Leonardo


At 10:52 on Friday, 25 March 2005, Dimitris Rizopoulos wrote:
> you should look at the 'na.rm=FALSE' argument of '?mean()', i.e.,
>
> x <- rnorm(100); x[sample(100, 10)] <- NA
> f <- sample(letters[1:5], 100, TRUE)
> ###
> tapply(x, f, mean)
> tapply(x, f, mean, na.rm=TRUE)
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> 

-- 
Leonardo Lami
[EMAIL PROTECTED]www.faunalia.it
Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy   Tel: (+39)349-1310164
GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html
https://www.biglumber.com



[R] tapply with unequal length of arguments

2006-03-12 Thread
Hi everyone,

Is it possible to use tapply(x,y,mean) if not all groups of x by y are 
of the same length (for example if you have one missing observation)?

I tried tapply(x,y,mean,na.omit=T) but it doesn't work!

Steffi
-- 
-
Stefanie von Felten
PhD student

ETH Zürich
Institut für Pflanzenwissenschaften
ETH Zentrum, LFW A 2

Phone: 044 632 85 97
Fax: 044 632 11 53
e-mail:  [EMAIL PROTECTED]
http://www.ipw.agrl.ethz.ch/~svfelten/

and:

Universität Zürich
Institut für Umweltwissenschaften
Winterthurerstrasse 190
8057 Zürich

Phone: 044 635 61 23
Fax: 044 635 57 11
e-mail:  [EMAIL PROTECTED]
http://www.unizh.ch/uwinst/homepages/steffi.html



[R] tapply, how to get level information

2007-01-08 Thread Antje

Hello,

I'm applying a self-written function to a matrix on basis of different 
levels.
Is there any way, to get the level information within the self-written 
function???

t <- tapply(mat, levels, plotDensity)

plotDensity <- function(x) {
??? print(level(x)) ???
}

Antje
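
A minimal sketch of one possible approach, using made-up data: tapply() hands only the values of each group to the function, so the level has to be supplied separately. One option is to split first and pass each piece together with its name. describeGroup() is a hypothetical stand-in for plotDensity(), and grp stands in for the grouping vector.

```r
# Made-up data: a 2 x 3 matrix and one group label per matrix element.
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
grp <- c("a", "b", "a", "b", "a", "b")

# Hypothetical stand-in for plotDensity(): receives the values and the level.
describeGroup <- function(x, level) {
  paste0("level ", level, ": n = ", length(x), ", mean = ", mean(x))
}

# split() returns a list named by level; mapply() passes each piece
# together with its name to the function.
pieces <- split(as.vector(mat), grp)
res <- mapply(describeGroup, pieces, names(pieces))
print(res)
```

An alternative along the same lines is to call tapply() on seq_along(x) instead of x, so that the function receives the indices of each group and can look up both the values and the level itself.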



Re: [R] tapply with unequal length of arguments

2006-03-12 Thread ronggui
2006/3/12, Stefanie von Felten, IPW&IfU <[EMAIL PROTECTED]>:
> Hi everyone,
>
> Is it possible to use tapply(x,y,mean) if not all groups of x by y are
> of the same length (for example if you have one missing observation)?

Yes, it works.

> I tried tapply(x,y,mean,na.omit=T) but it doesn't work!
What does "it doesn't work" mean exactly? Can you give an example and
the error message?



--
黄荣贵
Department of Sociology
Fudan University


Re: [R] tapply with unequal length of arguments

2006-03-12 Thread Uwe Ligges
Stefanie von Felten, IPW&IfU wrote:

> Hi everyone,
> 
> Is it possible to use tapply(x,y,mean) if not all groups of x by y are 
> of the same length (for example if you have one missing observation)?
> 
> I tried tapply(x,y,mean,na.omit=T) but it doesn't work!


See ?tapply which tells you that the argument "..." is passed to FUN 
which is mean() in this case. mean() has an argument "na.rm", see ?mean.
So we get:

  tapply(x, y, mean, na.rm = TRUE)

Please read the help pages more carefully.
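
A small sketch of the point about "...", using made-up data (x and y below are illustrative, not from the thread). It also suggests why na.omit=T had no visible effect: as far as I can tell, mean() has no argument of that name, so it is absorbed by mean()'s own "..." and ignored.

```r
# Made-up data: one missing value in group "a".
x <- c(1, 2, NA, 4, 5, 6)
y <- c("a", "a", "a", "b", "b", "b")

# Without na.rm, the group containing the NA yields NA.
with_na <- tapply(x, y, mean)

# na.omit is not an argument of mean(); it is silently ignored.
wrong_arg <- tapply(x, y, mean, na.omit = TRUE)

# na.rm is not an argument of tapply() either, but it travels
# through "..." to mean(), which does understand it.
without_na <- tapply(x, y, mean, na.rm = TRUE)

print(with_na)
print(wrong_arg)
print(without_na)
```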

Uwe Ligges

> Steffi



[R] tapply() and barplot() help files for 1.8.1

2004-04-15 Thread David Whiting

Hi,

I've just upgraded to 1.9.0 and one of my Sweave files that produces a
number of barplots in a standard manner now produces them in a
different way.  I have made a couple of small changes to my code to
get back the output I was getting before upgrading and now (mostly
out of curiosity) would like to understand what has changed.

I *think* I've tracked it down to tapply() and/or barplot() and have
not seen anything in the NEWS file regarding changes to these
functions (as far as I can see).  As part of doing my homework, I would
like to read the version 1.8.1 help files for these two functions, but
now that I've upgraded I'm not sure where I can find them.  Is there a
simple way for me to get copies of these two help files to compare
with the versions in 1.9.0?  As far as I can see, barplot() and
tapply() in 1.9.0 work as described in their 1.9.0 help files (which
does not surprise me).

I've been lurking on this list long enough to know that if there has
been a change it is documented, so it must be that I just haven't
found it yet.  If there hasn't been a change, then I am totally
perplexed, because I have been running this Sweave file several times
a day for the last few weeks and have not changed that part of it
(I've been changing the LaTeX parts).

In the part of the code that has changed I use tapply() to summarise
some data and then plot it with barplot().  I now have to use matrix()
on the output of tapply() before using barplot() because tapply()
produces a list and barplot() wants a vector or matrix.

In the code below, z is a dataframe, yllperdth is a numeric and fld
is the name of a factor, both in the dataframe.

Old version (as used with R 1.8.1):

  ## Calculate the % of YLLs for each group in the cause classification.
  x <- tapply(z$yllperdth, z[, fld], sum)
  totalYLLs <- sum(x)
  x <- x / totalYLLs * 100
  x <- sort(x)
  
  ## Plot the chart. horiz = TRUE makes it a bar instead of 
  ## column chart.  las = 1 prints the labels horizontally.
  xplot <- barplot(x, 
   horiz = TRUE, 
   xlab = "Percent of YLLs",
   las = 1)


New Version (as used with R 1.9.0):

  ## Calculate the % of YLLs for each group in the cause classification.
  x <- tapply(z$yllperdth, z[, fld], sum)
  totalYLLs <- sum(x)
  x <- x / totalYLLs * 100
  x <- sort(x)

  causeNames <- names(x)  ## NEW BIT
  x <- matrix(x)  ## NEW BIT
  

  ## Plot the chart. horiz = TRUE makes it a bar instead of 
  ## column chart.  las = 1 prints the labels horizontally.
  xplot <- barplot(x, 
   beside = TRUE,   ## NEW BIT
   names.arg = causeNames,  ## NEW BIT
   horiz = TRUE, 
   xlab = "Percent of YLLs",
   las = 1)
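
A quick way to check what tapply() actually returns here, sketched with a made-up stand-in for z (the column name cause and the values are assumptions). On a current R, with a function that returns one number per group, the result is expected to be a one-dimensional array rather than a list; this has not been checked against 1.8.1 or 1.9.0.

```r
# Made-up stand-in for the data frame z described above.
z <- data.frame(cause = c("A", "A", "B", "B"),
                yllperdth = c(1, 2, 3, 4))

x <- tapply(z$yllperdth, z$cause, sum)

# Inspect the structure of the result.
print(is.list(x))   # is it a list?
print(dim(x))       # a dim attribute of length 1 means a one-dimensional array
print(names(x))     # the group labels

# The percentage calculation keeps the same structure.
pct <- sort(x / sum(x) * 100)
print(pct)
```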




> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    9.0
year     2004
month    04
day      12
language R


A little while before upgrading I noted my previous R version (for a
post that I redrafted 7 times and never sent because I found the answer
through refining my draft), and it was:

> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Patched
major    1
minor    8.1
year     2004
month    02
day      16
language R


So, can I get the old help files?  Or is it easy to point me to a
documented change?  Or is it clear from my code what has changed or
what I am or was doing wrong?

Thanks.

Dave

-- 
David Whiting
Dar es Salaam, Tanzania



[R] tapply huge speed difference if X has names

2005-08-08 Thread Matthew Dowle

Hi all,

Apologies if this has been raised before ... R's tapply is very fast, but if
X has names in this example, there seems to be a huge slow down: under 1
second compared to 151 seconds.  The following timings are repeatable and
are timed properly on a single user machine :

> X = 1:10
> names(X) = X
> system.time(fast<<-tapply(as.vector(X), rep(1:1,each=10), mean))  # as.vector() to drop the names
[1] 0.36 0.00 0.35 0.00 0.00
> system.time(slow<<-tapply(X, rep(1:1,each=10), mean))
[1] 149.95   1.83 151.79   0.00   0.00
> head(fast)
   1    2    3    4    5    6 
 5.5 15.5 25.5 35.5 45.5 55.5 
> head(slow)
   1    2    3    4    5    6 
 5.5 15.5 25.5 35.5 45.5 55.5 
> identical(fast,slow)
[1] TRUE
> 

Looking inside tapply, which then calls split, it seems there is an
is.null(names(x)) which prevents R's internal fast version from being
called. Why is that there? Could it be removed?  I often do something like
tapply(mat[,"colname"],...) where mat has rownames. Therefore the rownames
of mat become the names of the vector mat[,"colname"], and this seems to
slow down tapply a lot. Perhaps other functions which call split also suffer
this problem?

> split.default
function (x, f)
{
    if (is.list(f))
        f <- interaction(f)
    f <- factor(f)
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f %in% k]
    y
}
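
A small sketch of how the names get there and how to drop them, with a made-up matrix standing in for mat (all values and dimnames below are assumptions):

```r
# Made-up matrix with row names, standing in for 'mat'.
mat <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
              dimnames = list(c("r1", "r2", "r3", "r4"),
                              c("colname", "other")))
g <- c(1, 1, 2, 2)

# Extracting one column keeps the row names as names of the vector.
v <- mat[, "colname"]
print(names(v))

# unname() (like as.vector()) drops them before tapply() sees the vector.
print(is.null(names(unname(v))))
res <- tapply(unname(v), g, mean)
print(res)
```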

> 

> version
         _
platform x86_64-redhat-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    0.1
year     2004
month    11
day      15
language R
>


Thanks and regards,
Matthew





Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-15 Thread Martin Maechler
> "David" == David Whiting <[EMAIL PROTECTED]>
> on 15 Apr 2004 11:42:18 + writes:

David> Hi,

David> I've just upgraded to 1.9.0 and one of my Sweave
David> files that produces a number of barplots in a
David> standard manner now produces them in a different way.
David> I have made a couple of small changes to my code to
David> get back the output I was getting before
David> upgrading and now (mostly out of curiosity) would
David> like to understand what has changed.

and I'd like to help you.
As I keep installed `(almost) all released versions of R ever
installed on our machines'
I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.

The only difference in the help page help(tapply)
is an extra   "require(stats)" statement at the beginning of the
`Examples' section in 1.9.0.

and the only change to  tapply() is 
group <- rep.int(one, nx)#- to contain the splitting vector
instead of
group <- rep(one, nx)#- to contain the splitting vector

which hardly should have adverse results.

In barplot, there's the new 'offset' option  --- not in NEWS ()

and another change that may be a problem.

Can you dig harder and if possible provide a reproducible (small..)
example to make progress here...




Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-15 Thread Duncan Murdoch
On Thu, 15 Apr 2004 18:10:27 +0200, Martin Maechler
<[EMAIL PROTECTED]> wrote :

>> "David" == David Whiting <[EMAIL PROTECTED]>
>> on 15 Apr 2004 11:42:18 + writes:
>
>David> Hi,
>
>David> I've just upgraded to 1.9.0 and one of my Sweave
>David> files that produces a number of barplots in a
>David> standard manner now produces them in a different way.
>David> I have made a couple of small changes to my code to
>David> get back the output I was getting before
>David> upgrading and now (mostly out of curiosity) would
>David> like to understand what has changed.
>
>and I like to help you.
>As I keep installed `(almost) all released versions of R ever
>installed on our machines'
>I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.
>
>The only difference
> between the help page help(tapply)
>is an extra   "require(stats)" statement at the beginning of the
>`Examples' section in 1.9.0.
>
>and the only change to  tapply() is 
>group <- rep.int(one, nx)#- to contain the splitting vector
>instead of
>group <- rep(one, nx)#- to contain the splitting vector
>
>which hardly should have adverse results.
>
>In barplot, there's the new 'offset' option  --- not in NEWS ()
>
>and another change that may be a problem.

Here's a reproducible bug in barplot in 1.9.0 (based on an email I got
this morning from Richard Rowe):

x <- table(rep(1:5,1:5))
barplot(x)

The problem is that table() produces a one dimensional array, and
barplot() doesn't handle those properly now.  The offending line is
this one:

$ cvs diff -r 1.3 barplot.R
[junk deleted] 
43c43
<   width <- rep(width, length.out = NR * NC)
---
>   width <- rep(width, length.out = NR)

In the example above, x gets turned into a matrix with NR=1 row and
NC=5 columns so only one bar width gets set.
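
A sketch of how to see the one-dimensional array and of one possible workaround until barplot() is fixed: rebuild a plain named vector from the counts and labels. This is an illustration, not a fix taken from the R sources.

```r
x <- table(rep(1:5, 1:5))

# A dim attribute of length 1: a one-dimensional array, not a plain vector.
print(dim(x))

# Workaround sketch: keep the counts and labels, drop the dim attribute.
v <- setNames(as.vector(x), names(x))
print(is.null(dim(v)))
print(v)
```

The plain vector v can then be passed to barplot() in place of x.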

Duncan Murdoch



Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-16 Thread David Whiting
Martin Maechler <[EMAIL PROTECTED]> writes:

> and I like to help you.
> As I keep installed `(almost) all released versions of R ever
> installed on our machines'
> I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.
> 
> The only difference
>  between the help page help(tapply)
> is an extra   "require(stats)" statement at the beginning of the
> `Examples' section in 1.9.0.
> 
> and the only change to  tapply() is 
> group <- rep.int(one, nx)#- to contain the splitting vector
> instead of
> group <- rep(one, nx)#- to contain the splitting vector
> 
> which hardly should have adverse results.
> 
> In barplot, there's the new 'offset' option  --- not in NEWS ()
> 
> and another change that may be a problem.
> 
> Can you dig harder and if possible provide a reproducible (small..)
> example to make progress here...
> 

Last night I found I had a backup of the source of 1.8.0, built that
and tested an example and it worked as in 1.9.0.  I then started to
question my sanity (or at least my competence).

The code that follows should be a reproducible example.  It creates a
data frame that has the same structure as the data I am working with
(with a number of other columns dropped) and is followed by the
function that creates the barplot.  The changes I have had to make to
make it work as I thought it was working with 1.8.1 have ## NEW BIT
after them, i.e. those lines were not there in the version I ran with
1.8.1.  The important new lines are:

 x <- matrix(x)  ## NEW BIT

and 

 beside = TRUE,  ## NEW BIT



--- EXAMPLE ---

## Create some fake data.
x <- c(rep("", 926), 
rep("All Other Perinatal Causes", 46), 
rep("Anaemia", 3), 
rep("Congenital Abnormalities", 1), 
rep("Unsp. Direct Maternal Causes", 24))
y <- runif(length(x))
tempdat <- data.frame(smi=x, yllperdth=y)



## Define the function to make my barplot
bodShare <- function(x, fld, main = "", userpar = 18, xlimMult=1.3 ) {
  ###
  # A horizontal barchart to display BoD shares #
  ###
  z <- subset(x, as.character(x[,fld]) != "")
  z[, fld] <- factor(z[, fld])

  ## We need to change the parameters of the chart.
  ## First save the old settings.
  oldpar <- par("mar")
  newpar <- par("mar")

  ## Increase the size of the margin on the left so there 
  ## is enough space for the long text labels (which will 
  ## be displayed horizontally on the y-axis).
  newpar[2] <- userpar

  
  ## Reduce the top margin because I will use a \caption in LaTeX 
  ## instead.
  newpar[3] <- 1


  ## Now apply the new settings.
  par(mar = newpar)

  ## Calculate the % of YLLs for each group in the cause classification.
  x <- tapply(z$yllperdth, z[, fld], sum)
  totalYLLs <- sum(x)
  x <- x / totalYLLs * 100
  x <- sort(x)

  causeNames <- names(x)  ## NEW BIT
  x <- matrix(x)  ## NEW BIT
  

  ## Plot the chart. horiz = TRUE makes it a bar instead of 
  ## column chart.  las = 1 prints the labels horizontally.
  xplot <- barplot(x, 
##   main = main,
   horiz = TRUE, 
   beside = TRUE,## NEW BIT
   names.arg = causeNames,   ## NEW BIT
   xlab = "Percent of YLLs",
   xlim = c(0, max(x) * xlimMult), 
   las = 1)
  
  text(x + (max(x) * .15), xplot, formatC(x, digits=1, format='f'))

  ## Reset the old margin parameters.
  par(mar = oldpar)
  
  ## Write data to a table for export.
  # First we need to remove newlines from labels.
  names(x) <- sub("\n", "", names(x))
  write.table(as.table(x), file = paste("tables/", fld, ".csv", sep=""), col.names=NA, 
sep="\t")
  names(x) <- causeNames
  x[length(x)]
}

## Create the barplot.
bodShare(tempdat, "smi")


-- 
David Whiting
Dar es Salaam, Tanzania



Re: [R] tapply huge speed difference if X has names

2005-08-08 Thread Prof Brian Ripley
Please use a current version of R!

This was fixed long ago, and you will find it in the NEWS file:

 split() now handles vectors with names internally and so is
 almost as fast as on vectors without names (and maybe 100x
 faster than before).
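
To check whether an installed version still shows the slowdown, one could time both calls on a made-up vector. The sizes below are assumptions chosen so that the test runs quickly; they are not the sizes from the original post.

```r
# Assumed sizes: 1000 groups of 10 values each.
n_groups <- 1000
X <- seq_len(n_groups * 10)
names(X) <- X
g <- rep(seq_len(n_groups), each = 10)

# Time tapply() without and with names on X.
t_plain <- system.time(fast <- tapply(unname(X), g, mean))
t_named <- system.time(slow <- tapply(X, g, mean))

print(t_plain)
print(t_named)
print(identical(fast, slow))
```

If the two timings are of the same order, the installed split() handles names internally as described in the NEWS entry.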



-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
