Re: [R] Fw: calculate mean of multiple rows in a data frame

2011-12-02 Thread Jabez Wilson
Thank you, I copied the data from the R environment, but it came out wrong. You 
understood exactly what I wanted, and your solution is admirable: I clearly 
need to address the naming convention. Thanks for your help.

--- On Fri, 2/12/11, Jean V Adams  wrote:


From: Jean V Adams 
Subject: Re: [R] Fw: calculate mean of multiple rows in a data frame
To: "Jabez Wilson" 
Cc: "R-Help" 
Date: Friday, 2 December, 2011, 14:29



It's easier for folks to help you if you put your example data in a format that 
can be readily read in R.  See, for example, the dput() function, which you can 
use to provide us with something like this: 

DF <- structure(list(
  NAME = c("Control_1", "Control_2", "Control_1", "Control_3",
           "MM0289~RFU:11810.15", "MM0289~RFU:9238.41",
           "MM16597~RFU:36765.38", "MM16597~RFU:41258.94"),
  ID = c("probe~B01R01C01", "probe~B01R01C02", "probe~B01R09C01",
         "probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05",
         "probe~B44R15C20", "probe~B44R15C19"),
  a = c(3L, 712L, 937L, 464L, 99L, 605L, 700L, 132L),
  b = c(22L, 13L, 824L, 836L, 544L, 603L, 923L, 777L),
  c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 497L),
  d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)),
  .Names = c("NAME", "ID", "a", "b", "c", "d"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

If I understand what you're after, you want to summarize data within groups, 
but your NAME variable is not as general as you would like.  You can get around 
this by creating a new variable which is a shorter and more general version of 
the NAME variable.  I did this by saving just the part of the NAME before the 
colon, ":". 

shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1) 
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) 

    shortname   a     b     c     d 
1   Control_1 470 423.0 912.0 721.0 
2   Control_2 712  13.0  32.0 179.0 
3   Control_3 464 836.0 508.0  53.0 
4  MM0289~RFU 352 573.5 734.5 779.5 
5 MM16597~RFU 416 850.0 358.0 788.5 
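
Splitting at ":" keeps the "~RFU" suffix in the group label. If the label should instead stop at "~", as the original question asked, a regular expression can drop everything from the first "~" or ":" onward. What follows is only a sketch on a cut-down data frame; the names DF_small and res are made up for illustration.

```r
# Cut-down stand-in for DF; the values are taken from the example above.
DF_small <- data.frame(
  NAME = c("Control_1", "Control_1", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41"),
  a = c(3, 937, 99, 605),
  b = c(22, 824, 544, 603),
  stringsAsFactors = FALSE
)

# Drop everything from the first "~" or ":" to the end of the string.
shortname <- sub("[~:].*$", "", DF_small$NAME)

# Average the numeric columns within each shortened name.
res <- aggregate(DF_small[, c("a", "b")],
                 by = list(shortname = shortname), FUN = mean)
print(res)
```

Rows without either separator (the Control_* rows) are left unchanged by sub(), so they group exactly as before.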

Jean 


> Jabez Wilson wrote on 12/01/2011 03:15:39 PM:

>   NAME                 ID                a   b   c   d
> 1 Control_1            probe~B01R01C01 381 213 345 653
> 2 Control_2            probe~B01R01C02 574 629 563 783
> 3 Control_1            probe~B01R09C01 673 511 521 967
> 4 Control_3            probe~B01R09C02  53 809 999  50
> 5 MM0289~RFU:11810.15  probe~B29R13C06 681  34 115 587
> 6 MM0289~RFU:9238.41   probe~B29R13C05 784 443  20 784
> 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
> 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686
>
> Sorry, that should look like this:
>
>   NAME      ID                a   b   c   d
> 1 Control_1 probe~B01R01C01   3  22 926 774
> 2 Control_2 probe~B01R01C02 712  13  32 179
> 3 Control_1 probe~B01R09C01 937 824 898 668
> 4 Control_3 probe~B01R09C02 464

[R] Fw: calculate mean of multiple rows in a data frame

2011-12-01 Thread Jabez Wilson



  NAME                 ID                a   b   c   d
1 Control_1            probe~B01R01C01 381 213 345 653
2 Control_2            probe~B01R01C02 574 629 563 783
3 Control_1            probe~B01R09C01 673 511 521 967
4 Control_3            probe~B01R09C02  53 809 999  50
5 MM0289~RFU:11810.15  probe~B29R13C06 681  34 115 587
6 MM0289~RFU:9238.41   probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

Sorry, that should look like this:




  NAME                 ID                a   b   c   d
1 Control_1            probe~B01R01C01   3  22 926 774
2 Control_2            probe~B01R01C02 712  13  32 179
3 Control_1            probe~B01R09C01 937 824 898 668
4 Control_3            probe~B01R09C02 464 836 508  53
5 MM0289~RFU:11810.15  probe~B29R13C06  99 544 607 984
6 MM0289~RFU:9238.41   probe~B29R13C05 605 603 862 575
7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582
8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995

--- On Thu, 1/12/11, Jabez Wilson  wrote:


From: Jabez Wilson 
Subject: calculate mean of multiple rows in a data frame
To: "R-Help" 
Date: Thursday, 1 December, 2011, 20:45







Dear all, I have a data frame (DF) in the following format:










  NAME                 ID                a   b   c   d
1 Control_1            probe~B01R01C01 381 213 345 653
2 Control_2            probe~B01R01C02 574 629 563 783
3 Control_1            probe~B01R09C01 673 511 521 967
4 Control_3            probe~B01R09C02  53 809 999  50
5 MM0289~RFU:11810.15  probe~B29R13C06 681  34 115 587
6 MM0289~RFU:9238.41   probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

I would like to consolidate the data frame by parsing through the rows, and 
where the NAME is identical, consolidate into one row and return the mean.
I can do this for the first lines (Control_1 etc) by using aggregate()
aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
but since aggregate looks for unique lines it won't consolidate e.g. lines 5/6 
and 7/8.
Is there a way of telling aggregate to grep just the first part of the name 
(i.e. up to "~") and consolidate those?
I could pre-grep the file before importing into R, but I'd like to do it within 
R if possible.
Thanks for any suggestions
 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calculate mean of multiple rows in a data frame

2011-12-01 Thread Jabez Wilson
Dear all, I have a data frame (DF) in the following format:










  NAME                 ID                a   b   c   d
1 Control_1            probe~B01R01C01 381 213 345 653
2 Control_2            probe~B01R01C02 574 629 563 783
3 Control_1            probe~B01R09C01 673 511 521 967
4 Control_3            probe~B01R09C02  53 809 999  50
5 MM0289~RFU:11810.15  probe~B29R13C06 681  34 115 587
6 MM0289~RFU:9238.41   probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

I would like to consolidate the data frame by parsing through the rows, and 
where the NAME is identical, consolidate into one row and return the mean.
I can do this for the first lines (Control_1 etc) by using aggregate()
aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
but since aggregate looks for unique lines it won't consolidate e.g. lines 5/6 
and 7/8.
Is there a way of telling aggregate to grep just the first part of the name 
(i.e. up to "~") and consolidate those?
I could pre-grep the file before importing into R, but I'd like to do it within 
R if possible.
Thanks for any suggestions
 


Re: [R] how to add "waiting for page change" to my script

2011-11-24 Thread Jabez Wilson
That works wonderfully - thank you

--- On Thu, 24/11/11, Jim Holtman  wrote:


From: Jim Holtman 
Subject: Re: [R] how to add "waiting for page change" to my script
To: "Jabez Wilson" 
Cc: "R-Help" 
Date: Thursday, 24 November, 2011, 12:39


I think it is

par(ask = TRUE) 

Sent from my iPad

On Nov 24, 2011, at 7:28, Jabez Wilson  wrote:

> I'd like to "step" through 24 histograms by using the return or click button 
> option, as shown in the demo(graphics) demonstration. I've searched for 
> "interactive graphics", and "waiting for page change" in R documentation but 
> with no result. I'm sure that this is a relatively straightforward procedure. 
> Can anyone point me to the correct solution?
>  
> Jabez



Re: [R] how to add "waiting for page change" to my script

2011-11-24 Thread Jabez Wilson
Thanks, Enrico, that will do nicely.
 
Jab

--- On Thu, 24/11/11, Enrico Schumann  wrote:


From: Enrico Schumann 
Subject: Re: [R] how to add "waiting for page change" to my script
To: "Jabez Wilson" 
Cc: "R-Help" 
Date: Thursday, 24 November, 2011, 13:20


see ?devAskNewPage

plot(1:10, col = "green", pch = 19)
devAskNewPage(ask = TRUE)
plot(1:10, col = "blue", pch = 19)
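
For the 24 histograms in the question, the same idea can be wrapped around a loop. This is only a sketch: the data are simulated, and the names samples and old_ask are made up for illustration. The prompt is requested only in an interactive session, and the previous setting is restored afterwards.

```r
# Hypothetical data: 24 samples of 100 random normal values each.
set.seed(42)
samples <- replicate(24, rnorm(100), simplify = FALSE)

# Ask before each new page only when running interactively; keep the
# previous setting so it can be restored afterwards.
old_ask <- devAskNewPage(ask = interactive())

for (i in seq_along(samples)) {
  hist(samples[[i]], main = paste("Histogram", i), xlab = "value")
}

# Restore whatever the setting was before.
devAskNewPage(old_ask)
```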



Am 24.11.2011 13:28, schrieb Jabez Wilson:
> I'd like to "step" through 24 histograms by using the return or click button 
> option, as shown in the demo(graphics) demonstration. I've searched for 
> "interactive graphics", and "waiting for page change" in R documentation but 
> with no result. I'm sure that this is a relatively straightforward procedure. 
> Can anyone point me to the correct solution?
>
> Jabez

-- 
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/



[R] how to add "waiting for page change" to my script

2011-11-24 Thread Jabez Wilson
I'd like to "step" through 24 histograms by using the return or click button 
option, as shown in the demo(graphics) demonstration. I've searched for 
"interactive graphics", and "waiting for page change" in R documentation but 
with no result. I'm sure that this is a relatively straightforward procedure. 
Can anyone point me to the correct solution?
 
Jabez


[R] interpreting one-way anova tables

2010-09-20 Thread Jabez Wilson
Hi, I am trying to reconcile anova table in R (summary(lm)) with individual 
t.test.
datafilename="http://personality-project.org/R/datasets/R.appendix1.data";
data.ex1=read.table(datafilename,header=T)   #read the data into a table
summary(lm(Alertness~Dosage,data=data.ex1))

gives:

Call:
lm(formula = Alertness ~ Dosage, data = data.ex1)

Residuals:
   Min     1Q Median     3Q    Max 
-8.500 -2.437  0.250  2.687  8.500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   32.500      2.010  16.166 6.72e-11 ***
Dosageb       -4.250      2.659  -1.598 0.130880    
Dosagec      -13.250      3.179  -4.168 0.000824 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 4.924 on 15 degrees of freedom
Multiple R-squared: 0.5396,     Adjusted R-squared: 0.4782 
F-statistic: 8.789 on 2 and 15 DF,  p-value: 0.002977 

As far as I understand it the lines "Dosageb" and "Dosagec" represent the 
difference between dosage a (the reference level) and the other two dosages.
My question is this: are these differences and the p-values associated with 
them the same as a t.test or pairwise.t.test on these groups? If I do t.tests, 
I get different values for t and p-value from those in the anova table above.
Can someone please explain what the discrepancy is?
Thanks
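
One likely source of the discrepancy, offered as a sketch rather than a definitive answer: the coefficient tests in summary(lm) use a residual standard error pooled over all three groups (15 degrees of freedom above), whereas t.test() on two groups estimates the SD from those two groups only. The data below are made up, not the R.appendix1 data, and all variable names are illustrative.

```r
# Made-up data: three dosage groups of four observations each.
alertness <- c(30, 35, 33, 36,  28, 27, 31, 30,  20, 22, 19, 25)
dosage <- factor(rep(c("a", "b", "c"), each = 4))

# p-value for "b versus a" from the linear model coefficient table.
fit <- summary(lm(alertness ~ dosage))
p_lm <- fit$coefficients["dosageb", "Pr(>|t|)"]

# Two-group t-test on a and b only: its SD comes from these 8 values.
p_two <- t.test(alertness[dosage == "b"], alertness[dosage == "a"],
                var.equal = TRUE)$p.value

# Pairwise test with the SD pooled over all three groups.
p_pool <- pairwise.t.test(alertness, dosage, p.adjust.method = "none",
                          pool.sd = TRUE)$p.value["b", "a"]

print(c(lm = p_lm, two_group = p_two, pooled = p_pool))
```

In this sketch the lm and pooled values agree with each other, while the two-group value differs from both.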



  


Re: [R] pairwise.t.test vs t.test

2010-09-10 Thread Jabez Wilson
Thanks a lot, Peter. Excellent book btw.
 
Jab

--- On Fri, 10/9/10, peter dalgaard  wrote:


From: peter dalgaard 
Subject: Re: [R] pairwise.t.test vs t.test
To: "Jabez Wilson" 
Cc: "R-Help" 
Date: Friday, 10 September, 2010, 15:20



On Sep 10, 2010, at 16:01 , Jabez Wilson wrote:

> Dear all, I am perplexed when trying to get the same results using 
> pairwise.t.test and t.test.
> I'm using examples in the ISwR library, 
>> attach(red.cell.folate)
> I can get the same result for pairwise.t.test and t.test when I set the 
> variances to be non-equal, but not when they are assumed to be equal. Can 
> anyone explain the differences, or what I'm doing wrong?
> Here's an example where I compare the first two ventilations with 
> pairwise.t.test and t.test
>> pairwise.t.test(folate, ventilation, p.adj="none", pool.sd=F)
>         Pairwise comparisons using t tests with non-pooled SD 
> data:  folate and ventilation 
>           N2O+O2,24h N2O+O2,op
> N2O+O2,op 0.029      -        
> O2,24h    0.161      0.298    
> P value adjustment method: none 
> 
>> t.test(folate[1:8], folate[9:17], var.equal=F)
>         Welch Two Sample t-test
> data:  folate[1:8] and folate[9:17] 
> t = 2.4901, df = 11.579, p-value = 0.02906
> alternative hypothesis: true difference in means is not equal to 0 
> 95 percent confidence interval:
>    7.310453 113.050658 
> sample estimates:
> mean of x mean of y 
>  316.6250  256. 
>  
> So 0.029 and 0.02906 are identical but if I do the same with pool.sd and 
> var.equal = T, I get different results
>> pairwise.t.test(folate, ventilation, p.adj="none", pool.sd=T)
>         Pairwise comparisons using t tests with pooled SD 
> data:  folate and ventilation 
>           N2O+O2,24h N2O+O2,op
> N2O+O2,op 0.014      -        
> O2,24h    0.155      0.408    
> P value adjustment method: none 
> 
>> t.test(folate[1:8], folate[9:17], var.equal=T)
>         Two Sample t-test
> data:  folate[1:8] and folate[9:17] 
> t = 2.5582, df = 15, p-value = 0.02184
> alternative hypothesis: true difference in means is not equal to 0 
> 95 percent confidence interval:
>   10.03871 110.32240 
> sample estimates:
> mean of x mean of y 
>  316.6250  256. 
>  
> So 0.014 and 0.02184 are not the same.
>  
>  


The help page says:

"The pool.SD switch calculates a common SD for all groups" (NB: "all")

So the denominator is not the same as when testing each pair separately.

You can in fact do

pairwise.t.test(folate, ventilation, p.adj="none", pool.sd=F,var.eq=T)
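
A small check of the point above, on made-up data rather than red.cell.folate (the names y, g, p_ttest, p_pair and p_all are illustrative). With pool.sd = FALSE, pairwise.t.test() calls t.test() on each pair and passes var.equal along, so it should reproduce the separate two-sample test; with pool.sd = TRUE the SD comes from all three groups.

```r
# Made-up data: three groups of unequal size.
y <- c(12, 15, 11, 14, 18,  9, 10, 13, 8,  16, 20, 17, 19, 22, 21)
g <- factor(rep(c("g1", "g2", "g3"), times = c(5, 4, 6)))

# Separate two-sample test on g1 vs g2, equal variances assumed.
p_ttest <- t.test(y[g == "g1"], y[g == "g2"], var.equal = TRUE)$p.value

# pool.sd = FALSE runs t.test() on each pair; var.equal = TRUE is passed on.
p_pair <- pairwise.t.test(y, g, p.adjust.method = "none",
                          pool.sd = FALSE, var.equal = TRUE)$p.value["g2", "g1"]

# pool.sd = TRUE uses one SD estimated from all three groups.
p_all <- pairwise.t.test(y, g, p.adjust.method = "none",
                         pool.sd = TRUE)$p.value["g2", "g1"]

print(c(t.test = p_ttest, pairwise_unpooled = p_pair, pairwise_pooled = p_all))
```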




-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com




  


[R] pairwise.t.test vs t.test

2010-09-10 Thread Jabez Wilson
Dear all, I am perplexed when trying to get the same results using 
pairwise.t.test and t.test.
I'm using examples in the ISwR library, 
>attach(red.cell.folate)
I can get the same result for pairwise.t.test and t.test when I set the 
variances to be non-equal, but not when they are assumed to be equal. Can 
anyone explain the differences, or what I'm doing wrong?
Here's an example where I compare the first two ventilations with 
pairwise.t.test and t.test
> pairwise.t.test(folate, ventilation, p.adj="none", pool.sd=F)
        Pairwise comparisons using t tests with non-pooled SD 
data:  folate and ventilation 
          N2O+O2,24h N2O+O2,op
N2O+O2,op 0.029      -        
O2,24h    0.161      0.298    
P value adjustment method: none 

> t.test(folate[1:8], folate[9:17], var.equal=F)
    Welch Two Sample t-test
data:  folate[1:8] and folate[9:17] 
t = 2.4901, df = 11.579, p-value = 0.02906
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
   7.310453 113.050658 
sample estimates:
mean of x mean of y 
 316.6250  256. 
 
So 0.029 and 0.02906 are identical but if I do the same with pool.sd and 
var.equal = T, I get different results
> pairwise.t.test(folate, ventilation, p.adj="none", pool.sd=T)
        Pairwise comparisons using t tests with pooled SD 
data:  folate and ventilation 
          N2O+O2,24h N2O+O2,op
N2O+O2,op 0.014      -        
O2,24h    0.155      0.408    
P value adjustment method: none 

> t.test(folate[1:8], folate[9:17], var.equal=T)
    Two Sample t-test
data:  folate[1:8] and folate[9:17] 
t = 2.5582, df = 15, p-value = 0.02184
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
  10.03871 110.32240 
sample estimates:
mean of x mean of y 
 316.6250  256. 
 
So 0.014 and 0.02184 are not the same.
 
 


  


Re: [R] Re-arrange Columns in data frame

2009-11-25 Thread Jabez Wilson
Thanks very much. Using the matrix function and then DF[,ix] gives me exactly 
what I wanted.
Jabez

--- On Wed, 25/11/09, Gabor Grothendieck  wrote:


From: Gabor Grothendieck 
Subject: Re: [R] Re-arrange Columns in data frame
To: "Jabez Wilson" 
Cc: "R Mailing List" 
Date: Wednesday, 25 November, 2009, 16:27


Try this:

# first case
ix <- c(matrix(1:24, 4, byrow = TRUE))
DF[ix]

# second case
ix <- c(matrix(1:16, 4, byrow = TRUE))
DF[ix]
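
The same trick can be wrapped in a small function so the number of columns and groups are not hard-coded. This is a sketch; interleave_order is a made-up name, and it assumes the groups are contiguous and of equal size.

```r
# Hypothetical helper: column order that takes the 1st column of every
# group, then the 2nd column of every group, and so on.
interleave_order <- function(n_cols, n_groups) {
  stopifnot(n_cols %% n_groups == 0)
  # One row per group; reading the matrix column by column interleaves them.
  c(matrix(seq_len(n_cols), nrow = n_groups, byrow = TRUE))
}

ix24 <- interleave_order(24, 4)
ix16 <- interleave_order(16, 4)

# Toy data frame with 24 columns (V1..V24) to show the reordering.
DF <- as.data.frame(matrix(0, nrow = 10, ncol = 24))
print(names(DF[ix24])[1:8])
```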

On Wed, Nov 25, 2009 at 11:16 AM, Jabez Wilson  wrote:
> Hi, I have a data frame which is 24 columns by 10 rows. This is essentially 
> 6 groups of 4 columns. I want to re-arrange the columns into the following 
> order 1,7,13,19,2,8,14,20,3,9,15,21,4,10,16,22,5,11,17,23,6,12,18,24 i.e. 
> first of each group of 6 grouped together, then 2nd of each group of six etc.
> I know that I can do 
> df[,c(1,7,13,19,2,8,14,20,3,9,15,21,4,10,16,22,5,11,17,23,6,12,18,24)], but 
> what if I now have 4 groups of 4 columns, I would want the order to be 
> c(1,5,9,13,2,6,10,14,3,7,11,15,4,8,12,16). I know that seq() comes into it 
> somewhere, and I've got as far as seq(1,ncol(df),number_of_groups), but that 
> gives me only one sequence. Is there a way of combining with rep() that can 
> do this?
> Jabez
>
>
>
>
>



  


[R] Re-arrange Columns in data frame

2009-11-25 Thread Jabez Wilson
Hi, I have a data frame which is 24 columns by 10 rows. This is essentially 
6 groups of 4 columns. I want to re-arrange the columns into the following 
order 1,7,13,19,2,8,14,20,3,9,15,21,4,10,16,22,5,11,17,23,6,12,18,24 i.e. first 
of each group of 6 grouped together, then 2nd of each group of six etc.
I know that I can do 
df[,c(1,7,13,19,2,8,14,20,3,9,15,21,4,10,16,22,5,11,17,23,6,12,18,24)], but 
what if I now have 4 groups of 4 columns, I would want the order to be 
c(1,5,9,13,2,6,10,14,3,7,11,15,4,8,12,16). I know that seq() comes into it 
somewhere, and I've got as far as seq(1,ncol(df),number_of_groups), but that 
gives me only one sequence. Is there a way of combining with rep() that can do 
this?
Jabez


  


Re: [R] processing log file

2009-11-13 Thread Jabez Wilson
Thanks, that's helpful because I can see the individuals and how many times 
they accessed:
The 'plyr' solution of Karl Ove Hufthammer gives me the exact summary 
statistics that I'm looking for.
 
Jab

--- On Fri, 13/11/09, markle...@verizon.net  wrote:


From: markle...@verizon.net 
Subject: Re: Re: [R] processing log file
To: jabez...@yahoo.co.uk
Date: Friday, 13 November, 2009, 16:36


Hi: I think below does what you want but it doesn't come out formatted very 
nicely. Maybe someone can show you
the formatting ? Good luck.

table.users <- read.table(textConnection("Date UserName Machine
2008-11-25 John 641
2008-11-25    Clive 611
2008-11-25   Jeremy 641
2008-11-25 Walt 722
2008-11-25 Tony 645
2008-11-26 Tony 645
2008-11-26 Tony 641
2008-11-26 Tony 641
2008-11-26 Walt 641
2008-11-26 Walt 645
2008-11-30 John 641
2008-11-30    Clive 611
2008-11-30 Tony 641
2008-11-30 John 641
2008-11-30 John 641"),header=TRUE,as.is=TRUE)

print(table.users)
print(str(table.users))

lapply(split(table.users,table.users$Date),function(.df) {
    table(.df$Machine)
})

lapply(split(table.users,table.users$Date),function(.df) {
    table(.df$UserName)
})






On Nov 13, 2009, Karl Ove Hufthammer  wrote: 

On Fri, 13 Nov 2009 11:03:31 + (GMT) Jabez Wilson 
 wrote:
> What I want to do is to find out how many unique users logged 
> on each day, and how many individual machines were accessed per day.

Use the 'plyr' package:

library(plyr)
ddply(table.users, .(Date), summarise,
      users=length(unique(UserName)),
      machines=length(unique(Machine)))
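
If installing 'plyr' is not an option, a base R version along the same lines might look like this. It is only a sketch: n_unique and res are made-up names, and the data are re-entered from the question so the block runs on its own.

```r
table.users <- read.table(text = "Date UserName Machine
2008-11-25 John 641
2008-11-25 Clive 611
2008-11-25 Jeremy 641
2008-11-25 Walt 722
2008-11-25 Tony 645
2008-11-26 Tony 645
2008-11-26 Tony 641
2008-11-26 Tony 641
2008-11-26 Walt 641
2008-11-26 Walt 645
2008-11-30 John 641
2008-11-30 Clive 611
2008-11-30 Tony 641
2008-11-30 John 641
2008-11-30 John 641", header = TRUE, as.is = TRUE)

# Count distinct values in a vector.
n_unique <- function(x) length(unique(x))

# One row per date, with the number of distinct users and machines.
res <- aggregate(table.users[c("UserName", "Machine")],
                 by = list(Date = table.users$Date), FUN = n_unique)
print(res)
```

The UserName and Machine columns of the result hold counts rather than names, so renaming them (for example to users and machines) may make the output easier to read.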

-- 
Karl Ove Hufthammer




  


[R] processing log file

2009-11-13 Thread Jabez Wilson
Dear all, I'm trying to process a log file which logs the date, the username 
and the computer number accessed. The table looks like this:
>table.users
         Date UserName Machine
1  2008-11-25     John     641
2  2008-11-25    Clive     611
3  2008-11-25   Jeremy     641
4  2008-11-25     Walt     722
5  2008-11-25     Tony     645
6  2008-11-26     Tony     645
7  2008-11-26     Tony     641
8  2008-11-26     Tony     641
9  2008-11-26     Walt     641
10 2008-11-26     Walt     645
11 2008-11-30     John     641
12 2008-11-30    Clive     611
13 2008-11-30     Tony     641
14 2008-11-30     John     641
15 2008-11-30     John     641
..etc
What I want to do is to find out how many unique users logged on each day, and 
how many individual machines were accessed per day. In the above example, 
therefore, on 2008-11-25 there were 5 separate users accessing 4 machines, and 
on 2008-11-26 there were 2 unique users who used 2 machines (although both 
logged on more than once).
I've got as far as apply(table.users, 2, FUN=table) which gives me an output of 
date, or username or machine and how many times they were accessed, but not 
really what I want.
Any help appreciated
 
Jabez


  


Re: [R] expression matrix

2008-03-10 Thread Jabez Wilson
>Date: Sat, 8 Mar 2008 04:56:58 -0800 (PST)
>From: Keizer_71 <[EMAIL PROTECTED]>
>Subject: [R]  expression matrix
>To: r-help@r-project.org
>Message-ID: <[EMAIL PROTECTED]>
>Content-Type: text/plain; charset=us-ascii

>Hello,

>I am trying to run this R script but I keep getting this error.

>> expr<-exprs(golubMerge)
>Warning message:
>The exprSet class is deprecated, use ExpressionSet instead 

>I tried to find information on the website but no luck. (exprSet...etc)

>thank you.

  You must be very unlucky indeed. Top hit for me with "ExpressionSet" in 
google was this:
  An Introduction to Bioconductor's ExpressionSet Class
  www.bioconductor.org/packages/2.0/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

   


Re: [R] Statistical Questions: finding differentially expressed

2008-03-10 Thread Jabez Wilson
>Date: Thu, 6 Mar 2008 06:46:07 -0800 (PST)
>From: Keizer_71 <[EMAIL PROTECTED]>
>Subject: [R] Statistical Questions: finding differentially expressed
 >genes
>To: r-help@r-project.org
>Message-ID: <[EMAIL PROTECTED]>
>Content-Type: text/plain; charset=us-ascii


>Hi Everyone,

>I am trying to find a way to do this in excel to tell me which genes
>are the most differentially expressed. Sorry, i couldn't find excel forum
>section in nabble. However, if it is in R it is fine. This is microarray
>data, and it has been normalized. According to Dov Stekel in Microarray, i
>will need to calculate the log ratio (control-treatment). Once you have the
>log ratio, calculate using a paired t-test. Once you calculate the paired
>t-test, you will find the p-value and the t-test. Is there a way in excel to
>calculate the confidence level that is significant? For example, it will be
>under 1% for all the genes to be differentially expressed.

>The book did not explain how the log ratio will help me determine the
>significant value.


>GeneID   treatment control treatment control treatment control 
>Gene12.1   1 2 2.2 1.10.7  2.7 
>Gene21.5   1.4   1.72.2   1.3 1.2 
>Gene3  1.4   1.7   1.82.7   1.6  1.5 
>Gene4   2.2   2.42.12.3 2.1  1.9 
>Gene5   2.6   3.42.11.3   2.6 2.9 


>Objective: find genes that are differentially expressed.


  I'm not sure what you are asking, but to find whether one of your genes is 
significantly expressed is relatively straightforward in R or excel, and you 
have already outlined the procedure yourself. Have you tried to perform a 
paired t test or log transform in either software yet, and if so, what is the 
stumbling block?
  Read and follow the examples given in Dov Stekel's excellent book. There is 
no better microarray statistics primer IMHO, and reasons for log transforms and 
an example of exactly the analysis you require are clearly explained.
  I 

   


Re: [R] 1-pnorm values in a table

2008-03-06 Thread Jabez Wilson
   hsl.gov.uk> writes:
   
  [snip]
  > Try:
  > nrows <- 5
  > mm <- matrix(rnorm(30),nrow=nrows)
  > sd.by.col <- apply(mm,2,sd)
  > mean.by.col <- apply(mm,2,mean)
  > values <- 1-mapply(pnorm, q=as.vector(mm), mean=rep(mean.by.col, 
  > each=nrows), sd=rep(sd.by.col, each=nrows))
  > values <- matrix(values, nrow=nrows)
  > 
  > > p.s. I know I'm asking a lot, but ideally, I'd like to print out 
  > > the table with those 1-pnorm values only if they are in the right 
  > > hand tail (i.e. >= mean) and if not nothing or NA be written.
  > 
  > values[values>.5] <- NA
  > 
   
  I'm not sure, but I think that
  nrows <- 5
  mm <- matrix(rnorm(30),nrow=nrows)
  values <- pnorm(scale(mm),lower.tail=FALSE)
  values[values>.5] <- NA
  will do the same thing.
  lower.tail=FALSE is a little more accurate than 1-pnorm(...)
  cheers
  Ben Bolker
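
Applied to the table from the original question, the scale() approach might look like the sketch below. The object names test, m and upper are illustrative. Note that an upper-tail probability is at most 0.5 for a value at or above its column mean, so keeping only the right-hand tail means blanking the entries whose probability exceeds 0.5; that reading of the request is an assumption.

```r
# The table from the question, entered by hand.
test <- data.frame(
  Hin1 = c(9534.83, 13272.48, 12587.71, 15240.81, 15878.32),
  Hin2 = c(4001.74, 1519.88, 5686.94, 10031.57, 10517.14),
  Hin3 = c(157.16, 36.35, 656.62, 426.73, 18.93),
  Hin4 = c(3736.93, 33.64, 572.29, 275.29, 22.00),
  Hin5 = c(484.60, 46.68, 351.60, 561.30, 16.91),
  Hin6 = c(59.25, 82.11, 136.91, 302.38, 21.17),
  row.names = paste0("HAI", 1:5)
)
m <- as.matrix(test)

# scale() centres each column on its mean and divides by its sd, so the
# standard normal upper tail equals 1 - pnorm(x, mean(col), sd(col)).
upper <- pnorm(scale(m), lower.tail = FALSE)

# Keep only values at or above their column mean (probability <= 0.5).
upper[upper > 0.5] <- NA

print(round(upper, 3))
```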
   
   
  Brilliant! I am in awe
   
  Thanks for the other contributions
   
  Jab
   
   


[R] 1-pnorm values in a table

2008-03-06 Thread Jabez Wilson
Hi,
   
  I've read in a csv file (test.csv) which gives me the following table:
   
         Hin1     Hin2   Hin3    Hin4   Hin5   Hin6
HAI1  9534.83  4001.74 157.16 3736.93 484.60  59.25
HAI2 13272.48  1519.88  36.35   33.64  46.68  82.11
HAI3 12587.71  5686.94 656.62  572.29 351.60 136.91
HAI4 15240.81 10031.57 426.73  275.29 561.30 302.38
HAI5 15878.32 10517.14  18.93   22.00  16.91  21.17

  I would like to find a way of finding the 1-pnorm of each value in the table 
based on the mean and sd of the data only in the column in which the value 
lies. I can do it using a for loop, but would like to know if it can be done 
using e.g. apply or something similar, so that the whole table is printed out 
with the 1-pnorm values.
  1-pnorm(test[,1], mean(test[,1]), sd(test[,1])) gives me the values for col1 
only, but that's as far as I've got.
   
  tia
   
  p.s. I know I'm asking a lot, but ideally, I'd like to print out the table 
with those 1-pnorm values only if they are in the right hand tail (i.e. >= 
mean) and if not nothing or NA be written.
   

   


Re: [R] replace numbers in a column conditional on their value

2008-01-17 Thread Jabez Wilson
Splendid, thanks for your quick response.

[EMAIL PROTECTED] wrote:
> I have a data frame column in which I would like to replace some 
> of the numbers dependent on their value.
> 
> data frame = zz
> 
> AveExpr t P.Value FC
> 7.481964 7.323950 1.778503e-04 2.218760
> 7.585783 12.233056 6.679776e-06 2.155867
> 6.953215 6.996525 2.353705e-04 1.685733
> 7.647513 8.099859 9.512639e-05 1.674742
> 7.285446 7.558675 1.463732e-04 1.584071
> 6.405605 3.344031 1.276812e-02 1.541569
> 
> I would like to replace the values in column 'FC' which are >2 
> with their squared value.
> If I do this, however, I get a warning but it does the sum correctly.
> Warning message:
> number of items to replace is not a multiple of replacement length 
> in: zz[, 4][zz[, 4] > 2] <- zz[, 4]^2 

Try
zz$FC[zz$FC > 2] <- (zz$FC[zz$FC > 2])^2
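
The warning arises because the right-hand side zz[, 4]^2 has one value per row, while the left-hand side selects only the rows where FC > 2. R then takes the first values of the right-hand side, which happens to give the right answer here only because the two rows above 2 come first. A sketch with just the FC column (the names big, fc and fc2 are made up):

```r
zz <- data.frame(FC = c(2.218760, 2.155867, 1.685733,
                        1.674742, 1.584071, 1.541569))

big <- zz$FC > 2             # logical index, computed once
zz$FC[big] <- zz$FC[big]^2   # both sides now have the same length

# Equivalent result on a plain vector, using ifelse():
fc <- c(2.218760, 2.155867, 1.685733, 1.674742, 1.584071, 1.541569)
fc2 <- ifelse(fc > 2, fc^2, fc)

print(zz$FC)
```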

Regards,
Richie.

Mathematical Sciences Unit
HSL





[R] replace numbers in a column conditional on their value

2008-01-16 Thread Jabez Wilson
Dear R help,
   
  I have a data frame column in which I would like to replace some of the 
numbers dependent on their value.
   
  data frame = zz
   
 AveExpr         t      P.Value       FC
7.481964  7.323950 1.778503e-04 2.218760
7.585783 12.233056 6.679776e-06 2.155867
6.953215  6.996525 2.353705e-04 1.685733
7.647513  8.099859 9.512639e-05 1.674742
7.285446  7.558675 1.463732e-04 1.584071
6.405605  3.344031 1.276812e-02 1.541569

  I would like to replace the values in column 'FC' which are >2 with their 
squared value.
  If I do this, however, I get a warning but it does the sum correctly.
  Warning message:
number of items to replace is not a multiple of replacement length in: zz[, 
4][zz[, 4] > 2] <- zz[, 4]^2 
   
  Is there a way to do this without the warning?
   

   