[R] percent sign in plot annotation

2007-01-17 Thread Martin Keller-Ressel
Hello,

I would like to annotate a graph with the expression 'alpha = 5%' (the
alpha should be displayed as the Greek letter).
I tried

> text(1,1,expression(alpha == 5%))

which gives a syntax error.
Escaping the percent sign (\%) or doubling it (%%) does not help.
What should I do?

Thanks,

Martin Keller-Ressel



-- 
Martin Keller-Ressel
Research Unit of Financial and Actuarial Mathematics
TU Vienna

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] parallel coordinates plot

2007-01-17 Thread Jim Lemon
Marco Helbich wrote:
> Dear List,
> 
> I want to make a parallel coordinates plot with the specific variables on
> the abscissa and the cases on the ordinate should be dyed dependent on
> another nominal variable from the data frame. I use the parcoord function.
> 
Hi Marco,
If I understand your question, you want to color the points displayed 
for each variable depending upon some nominal value (probably a factor).
Given a data frame like this, you could plot:

my.df<-data.frame(var.names=paste("V",1:10,sep=""),
  var.values=rnorm(10)+5,var.colors=sample(2:4,10,TRUE))
plot(my.df$var.values,col=my.df$var.colors,axes=FALSE)
box()
axis(1,at=1:10,labels=as.character(my.df$var.names))
axis(2)
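
For an actual parallel coordinates plot rather than a single scatter of
points, parcoord() in package MASS accepts a vector of line colours; a
sketch using the iris data (my own example, not Marco's data), with line
colour keyed to a factor:

```r
## Sketch: colour parcoord() lines by a nominal variable (here, Species)
library(MASS)
parcoord(iris[, 1:4], col = as.integer(iris$Species))
```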

Jim



Re: [R] percent sign in plot annotation

2007-01-17 Thread Bettina Gruen
Martin Keller-Ressel wrote:
> Hello,
> 
> I would like to annotate a graph with the expression 'alpha = 5%' (the  
> alpha should be displayed as the greek letter).
> I tried
> 
>> text(1,1,expression(alpha == 5%))

Try
text(1,1,expression(alpha == 5*"%"))

Best,
Bettina



Re: [R] percent sign in plot annotation

2007-01-17 Thread Dimitris Rizopoulos
you can try

text(1, 1, expression(paste(alpha, " = 5%")))


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: "Martin Keller-Ressel" <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, January 17, 2007 10:57 AM
Subject: [R] percent sign in plot annotation


> Hello,
>
> I would like to annotate a graph with the expression 'alpha = 5%' 
> (the
> alpha should be displayed as the greek letter).
> I tried
>
>> text(1,1,expression(alpha == 5%))
>
> which gives a syntax error.
> escaping the percent sign (\%) or doubling (%%) does not help.
> What do I do?
>
> Thanks,
>
> Martin Keller-Ressel
>
>
>
> -- 
> Martin Keller-Ressel
> Research Unit of Financial and Actuarial Mathematics
> TU Vienna
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



Re: [R] percent sign in plot annotation

2007-01-17 Thread Philipp Pagel
On Wed, Jan 17, 2007 at 09:57:49AM -, Martin Keller-Ressel wrote:
> I would like to annotate a graph with the expression 'alpha = 5%' (the  
> alpha should be displayed as the greek letter).
> I tried
> 
> > text(1,1,expression(alpha == 5%))

text(1,1, expression(paste(alpha == 5, '%')) )

cu
Philipp

-- 
Dr. Philipp PagelTel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics  Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel



Re: [R] percent sign in plot annotation

2007-01-17 Thread Gavin Simpson
On Wed, 2007-01-17 at 09:57 +, Martin Keller-Ressel wrote:
> Hello,
> 
> I would like to annotate a graph with the expression 'alpha = 5%' (the  
> alpha should be displayed as the greek letter).
> I tried
> 
> > text(1,1,expression(alpha == 5%))
> 
> which gives a syntax error.
> escaping the percent sign (\%) or doubling (%%) does not help.
> What do I do?
> 
> Thanks,
> 
> Martin Keller-Ressel

Escaping a % with \ (and then escaping the \ itself) is not syntactically valid.

This works, but there may be better ways to do this:

plot(0:10, 0:10, type = "n")
text(5,5,expression(paste(alpha == 5, "%", sep = "")))

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] percent sign in plot annotation

2007-01-17 Thread Martin Keller-Ressel
Thanks to Bettina, Dimitris and Gavin for their help.
All their solutions work nicely.
For future reference, here are three ways to draw a percent sign in R  
plots:

plot(0:10, 0:10, type = "n")
text(5,7,expression(paste(alpha == 5, "%", sep = "")))
text(5, 5, expression(paste(alpha, " = 5%")))
text(5,3,expression(alpha == 5*"%"))

best regards,
Martin Keller-Ressel



-- 
Martin Keller-Ressel
Research Unit of Financial and Actuarial Mathematics
TU Vienna



[R] sleep data

2007-01-17 Thread Tom Backer Johnsen
When reading the documentation for the "sleep" data set in R, the 
clear impression is that this is an "independent groups" kind of design 
(two groups of 10 subjects each).  However, when browsing the original 
article (referred to in the help file), my equally clear impression is 
that these are really "repeated measures" data (one group of 10 
subjects, two observations each).  Which is correct?

Tom



Re: [R] nested hierarchical design

2007-01-17 Thread lorenz.gygax
> 1. The first I used to do in SPSS and I would like to be able 
> to do it in R as well.
> This is the hierarchical model I would like to use: a continuous 
> variable explained by factor A(fixed) + factor B(random) 
> nested in A + factor C (random) nested in factor B (which is nested
> in A).

You would need the following, I guess:

lme (var ~ A, data= NameOfDataFrame, random= ~ 1 | B/C)

This means that C is nested in B. If all C within B have the same levels of A, 
the model automatically treats B as nested in A (but you can check that based on 
the degrees of freedom in the output).

> 2. the same model but then for count data (like 15 out of 30, 
> 23 out of 60) instead of the continuous variable (I know the basics
> of glm in R)

I guess this can be viewed as a data set with binary outcomes and thus modelled 
by a generalised mixed-effects model with a binomial distribution, using glmmPQL 
(in library MASS) or lmer (in library lme4). Please refer to:

@Book{Pin:00a,
  author = {Pinheiro, Jose C and Bates, Douglas M},
  title = {Mixed-Effects Models in {S} and {S}-{P}{L}{U}{S}},
  publisher = {Springer},
  year = {2000},
  address = {New York}
}

@BOOK {Ven:02,
  AUTHOR = {Venables, W N and Ripley, B D},
  TITLE = {Modern Applied Statistics with {S}},
  PUBLISHER = {Springer},
  YEAR = {2002},
  ADDRESS = {New York},
  EDITION = {fourth}
}

@Article{Rnews:Bates:2005,
  author   = {Douglas Bates},
  title= {Fitting Linear Mixed Models in {R}},
  journal  = {R News},
  year = 2005,
  volume   = 5,
  number   = 1,
  pages= {27--30},
  month= {May},
  url  = {http://CRAN.R-project.org/doc/Rnews/},
}

and: http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests

Do not be surprised if the results are not identical to those in SPSS ...
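
A sketch of the two calls mentioned above, with hypothetical column names
(succ successes out of total trials, fixed factor A, random factors B and
C); note that in current versions of lme4 the binomial model is fitted
with glmer() rather than lmer():

```r
## 'd' is a hypothetical data frame with columns succ, total, A, B, C
library(MASS)    # provides glmmPQL
m1 <- glmmPQL(cbind(succ, total - succ) ~ A, random = ~ 1 | B/C,
              family = binomial, data = d)

library(lme4)    # provides glmer
m2 <- glmer(cbind(succ, total - succ) ~ A + (1 | B/C),
            family = binomial, data = d)
```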

Regards, Lorenz Gygax
- 
Swiss Federal Veterinary Office
Centre for proper housing of ruminants and pigs
Agroscope Reckenholz-Tänikon Research Station ART



[R] problem with unlist POSIX date at midnight

2007-01-17 Thread Hanneke Schuurmans
Dear R-users,

I use unlist() on POSIX dates to extract the year, hour, etc. With that I 
can search for files in my database, which are named in the form 
'mmddhh_synops.txt'.
However, I get stuck at midnight, where unlist() just gives NAs.
The script is given below; the problem occurs at acc.period[16] 
(midnight). However, when I write out the character string, unlist() works 
well, but as.character(acc.period[16]) is not the solution.

begin=paste(dates[i]-1,"09:00:00")
end=paste(dates[i],"08:00:00")
acc.period=seq(as.POSIXct(begin),as.POSIXct(end),"hour")

unlist(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
#  sec   min  hour  mday   mon  year  wday  yday isdst
#   NA    NA    NA    NA    NA    NA    NA    NA    -1
unlist(strptime(acc.period[17],format="%Y-%m-%d %H:%M:%S"))
#  sec   min  hour  mday   mon  year  wday  yday isdst
#    0     0     1     1     2   106     3    59     0

a="2006-03-01 00:00:00"
unlist(strptime(a,format="%Y-%m-%d %H:%M:%S"))
#  sec   min  hour  mday   mon  year  wday  yday isdst
#    0     0     0     1     2   106     3    59     0

Could someone help?

Thanks in advance!

Hanneke


-- 
-
ir J.M. (Hanneke) Schuurmans
PhD student Hydrology
Department of Physical Geography
Faculty of Geosciences - Universiteit Utrecht
P.O. Box 80115  3508 TC  Utrecht
T +31 (0)30 2532988  F +31 (0)30 2531145
W www.geo.uu.nl/staff/schuurmans



[R] Repeated measures

2007-01-17 Thread Tom Backer Johnsen
I am having a hard time understanding how to perform a "repeated 
measures" type of ANOVA with R.  When reading the document found here:

http://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_repms.html

I find that there is a reference to a function make.rm() that is 
supposed to rearrange a "one row per person" type of frame into a "one 
row per observation" type of frame.  But that function does not seem 
to be there, nor does help.search() suggest anything.  Is that 
function buried in some package?

Is there some simple documentation that might be useful somewhere, 
starting with a really simple problem (one group, two observations)?

Tom



[R] lmer or glm with family=binomial : probability variable

2007-01-17 Thread Francesca
Dear all,
We are dealing with a variable (BA) which indicates the overlap between
small mammal home ranges. It varies between 0 and 1 and can be
interpreted as "the probability of two home ranges overlapping";
therefore we would have modelled it with the binomial family, which is
also supported by the distribution of the variable itself.  However, lmer
and glm require the data to be presented as successes vs. failures. In our
case this is not possible, as BA is calculated by GIS on raster maps; in
other words, BA expresses p (the probability of success), but it is not
possible to know how many cases/attempts p came from.
Therefore, what we get from the analysis is:

   IDAN_IDAN        SESSO  SESSIONE            BA
1  1D00AD9_1D1421F  F_F           1  5.909904e-06
2  1D00AD9_602F513  M_F           1  5.640469e-03
3  1D00AD9_602FEAB  M_F           1  3.715911e-13
4  1D00AD9_603086B  F_F           1  2.350365e-17
5  1D00AD9_60778A4  M_F           1  1.589195e-08
6  1D00AD9_60779D7  F_F           1  7.343189e-22
7  1D00AD9_6723D30  M_F           1  8.725496e-01
8  1D1421F_602F513  M_F           1  6.757339e-02
9  1D1421F_602FEAB  M_F           1  7.612337e-01
10 1D1421F_603086B  F_F           1  4.623883e-06
11 1D1421F_60778A4  M_F           1  2.856006e-01
12 1D1421F_60779D7  F_F           1  9.752100e-11
13 1D1421F_6723D30  M_F           1  8.921498e-08
14 602F513_602FEAB  M_M           1  2.127866e-02
15 602F513_603086B  M_F           1  6.695516e-05
16 1D00AD9_671ED61  M_F           2  3.873126e-01
17 1D00AD9_6723D30  M_F           2  2.080799e-01
18 1D00AD9_672594F  M_F           2  3.983634e-15
19 1D1421F_602FEAB  M_F           2  2.956002e-01
20 1D1421F_603086B  F_F           2  2.150006e-06
21 1D1421F_60314C4  F_F           2  1.947681e-21
22 1D1421F_6033E53  M_F           2  1.855792e-01
23 1D1421F_60655F4  F_F           2  1.242808e-02
24 1D1421F_60778A4  M_F           2  1.398984e-02

> SESSIONE1<-factor(SESSIONE)
> model<-lmer(BA~ SESSO + (1|SESSIONE1:IDAN_IDAN) + (1|SESSIONE1),
data=foglio1, family=binomial)

Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos) 
2: nlminb returned message singular convergence (7) 
 in: LMEopt(x = mer, value = cv) 
3: nlminb returned message false convergence (8) 
 in: LMEopt(x = mer, value = cv) 
4: nlminb returned message singular convergence (7) 
 in: LMEopt(x = mer, value = cv) 
5: nlminb returned message false convergence (8) 
 in: LMEopt(x = mer, value = cv) 
6: nlminb returned message false convergence (8) 
 in: LMEopt(x = mer, value = cv) 
7: IRLS iterations for PQL did not converge 

Is there any possibility to model p vs. q = 1-p without going through
successes vs. failures frequencies?

Thank you very much for helping!!!

Best regards

Francesca Cagnacci


Francesca Cagnacci, PhD

Centro di Ecologia Alpina
Viote del Monte Bondone
38040 Trento
Tel. +393388668767 or +393397481073
Email [EMAIL PROTECTED] or [EMAIL PROTECTED]



Re: [R] problem with unlist POSIX date at midnight

2007-01-17 Thread Prof Brian Ripley
On Wed, 17 Jan 2007, Hanneke Schuurmans wrote:

> Dear R-users,
>
> I use unlist of POSIX dates to extract the year, hour etc. With that I
> can search for files in my database which are in the form
> 'mmddhh_synops.txt'
> However, I get stucked during midnight where unlist just gives NA's.
> The script is given below, the problem accurs at acc.period[16]
> (midnight). However when I write out the character, unlist works well.
> But as.character(acc.period[16]) is not the solution
>
> begin=paste(dates[i]-1,"09:00:00")
> end=paste(dates[i],"08:00:00")
> acc.period=seq(as.POSIXct(begin),as.POSIXct(end),"hour")
>
> unlist(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
> #  sec   min  hour  mday   mon  year  wday  yday isdst
> #   NA    NA    NA    NA    NA    NA    NA    NA    -1
> unlist(strptime(acc.period[17],format="%Y-%m-%d %H:%M:%S"))
> #  sec   min  hour  mday   mon  year  wday  yday isdst
> #    0     0     1     1     2   106     3    59     0
>
> a="2006-03-01 00:00:00"
> unlist(strptime(a,format="%Y-%m-%d %H:%M:%S"))
> #  sec   min  hour  mday   mon  year  wday  yday isdst
> #    0     0     0     1     2   106     3    59     0
>
> Could someone help?

Please see the footer comment

> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

We do need that to help you.  Using unclass() rather than unlist() will 
help us see what strptime actually returned (I suspect the NAs are from 
there as your (unshown) input was invalid, e.g. had hour=24).
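
A minimal illustration of the unclass() suggestion (base R only):

```r
## unclass() exposes the POSIXlt components that unlist() would flatten
x <- strptime("2006-03-01 00:00:00", format = "%Y-%m-%d %H:%M:%S")
unclass(x)   # list components: sec, min, hour, mday, mon, year, wday, yday, isdst

## an invalid time, e.g. hour 24, parses to NA:
strptime("2006-03-01 24:00:00", format = "%Y-%m-%d %H:%M:%S")
```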

We also need the other information the posting guide asks for, such as the 
output of sessionInfo().

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] lmer or glm with family=binomial : probability variable

2007-01-17 Thread lorenz.gygax
> We are dealing with a variable (BA) which indicates the overlap
> between small mammal home ranges. It varies between 0 and 1 and it
> can be interpreted as "the probability of two home ranges to
> overlap", therefore we would have modelled it with the binomial
> family, also supported by  the distribution of the variable itself.  
> However, lmer or glm require the data to be presented as successes
> vs failures. In our case, this is not possible as BA is calculated
> by GIS on raster maps; in other words, BA expressess p (probability
> of success), but it is not possible to know from how many
> cases/attempts p came from. 

These models internally use the logit link. You could potentially transform 
your response variable accordingly and fit a model based on the assumption of 
normally distributed variables (of course, you have to check whether the 
corresponding assumptions are met, e.g. by conducting an analysis of the 
residuals).

The logit link is: log(P/(1-P))
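
A sketch of that suggestion using the variable names from Francesca's
posting (note the transform is undefined at exactly 0 or 1, so check for
such values first):

```r
logit <- function(p) log(p / (1 - p))   # the logit transform
foglio1$BA.logit <- logit(foglio1$BA)   # transformed response

library(nlme)
m <- lme(BA.logit ~ SESSO, random = ~ 1 | SESSIONE1,
         data = foglio1)
plot(m)   # inspect residuals to check the normality assumption
```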

Regards, Lorenz Gygax
- 
Swiss Federal Veterinary Office
Centre for proper housing of ruminants and pigs
Agroscope Reckenholz-Tänikon Research Station ART



[R] Does R implement DBSCAN , ROCK, BIRCH?

2007-01-17 Thread Bhanu Kalyan.K
I saw that R has a cluster package with built-in PAM, CLARA and k-means (and 
many more) clustering algorithms. 

But I could not find the DBSCAN, ROCK or BIRCH algorithms (which I feel are 
standard ones). Aren't these implemented as well? 
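
For DBSCAN specifically, the fpc package on CRAN provides an
implementation; a sketch on toy data (the eps and MinPts values here are
illustrative, not recommendations):

```r
library(fpc)                       # dbscan() lives here
x <- matrix(rnorm(200), ncol = 2)  # toy 2-d data
cl <- dbscan(x, eps = 0.5, MinPts = 5)
cl$cluster                         # cluster labels; 0 marks noise points
```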


Bhanu Kalyan K
B.Tech Final Year, CSE

Tel: +91-9885238228

Alternate E-Mail: 
[EMAIL PROTECTED]

 



Re: [R] curious about dimension of 'apply' output when MARGIN=1

2007-01-17 Thread Patrick Burns
A logical reason for the phenomenon is that
matrices are stored down their columns. For
example:

 > matrix(1:15,5)
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15

When an 'apply' across rows is done, it will be
the values corresponding to each of the rows that
are together.

For matrices, merely transposing the result fixes
the "problem", but it is considerably more complex
in higher dimensional arrays.

There could be a spectrum of opinion from:

the original programmer was lazy and didn't adequately
serve users

to:

the simpler the program the fewer bugs there will be.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")


Gabor Grothendieck wrote:

>The reshape package has an idempotent apply, iapply:
>
>>library(reshape)
>>iapply(M,1,function(row) row+c(1,2))
>     [,1] [,2]
>[1,]    2    6
>[2,]    3    7
>[3,]    4    8
>
>On 1/16/07, Benjamin Tyner <[EMAIL PROTECTED]> wrote:
>
>>Reading the documentation for 'apply', I understand the following is
>>working exactly as documented:
>>
>> > M<-matrix(1:6,ncol=2)
>> > M
>>     [,1] [,2]
>>[1,]    1    4
>>[2,]    2    5
>>[3,]    3    6
>> > apply(M,2,function(column) column+c(1,2,3))
>>     [,1] [,2]
>>[1,]    2    5
>>[2,]    4    7
>>[3,]    6    9
>> > apply(M,1,function(row) row+c(1,2))
>>     [,1] [,2] [,3]
>>[1,]    2    3    4
>>[2,]    6    7    8
>>
>>I'm not proposing any changes or extra arguments to 'apply'. Rather, I'm
>>wondering what is the benefit for (or rationale behind) this somewhat
>>unintuitive behavior in the case that MARGIN=1.
>>
>>Thanks,
>>Ben



Re: [R] Repeated measures

2007-01-17 Thread Chuck Cleland
Tom Backer Johnsen wrote:
> I am having a hard time understanding how to perform a "repeated 
> measures" type of ANOVA with R.  When reading the document found here:
> 
> http://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_repms.html
> 
> I find that there is a reference to a function make.rm () that is 
> supposed to rearrange a "one row per person" type of frame to a "one 
> row per observation" type of frame.  But that function does not seem 
> to be there.  Nor does the help.search suggest anything.  Is that 
> function buried in some package?

  I'm not able to find that function.  Perhaps that document is out of date.

> Is there  some simple documentation that might be useful somewhere? 
> Starting with a really simple problem (one group, two observations)?

  Here is an example showing the use of reshape() and analysis via aov()
and lme() in the nlme package.

tolerance <-
read.table("http://www.ats.ucla.edu/stat/Splus/examples/alda/tolerance1.txt",
sep=",", header=TRUE)

tolerance.long <- reshape(tolerance,
  varying = list(c("tol11","tol12","tol13",
   "tol14", "tol15")),
  v.names = c("tol"), timevar = "time",
  times = 11:15, direction = "long")

tolerance.aov <- aov(tol ~ as.factor(time) * male + Error(id),
 data = tolerance.long)

summary(tolerance.aov)

Error: id
          Df   Sum Sq  Mean Sq
male       1 0.085168 0.085168

Error: Within
                     Df  Sum Sq Mean Sq F value  Pr(>F)
as.factor(time)       4  2.8326  0.7081  3.0538 0.02236 *
male                  1  0.3024  0.3024  1.3039 0.25745
as.factor(time):male  4  0.1869  0.0467  0.2015 0.93670
Residuals            69 16.0002  0.2319
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(nlme)

tolerance.lme <- lme(tol ~ as.factor(time) * male, random = ~ 1 | id,
 data = tolerance.long)

anova(tolerance.lme)
                     numDF denDF  F-value p-value
(Intercept)              1    56 353.9049  <.0001
as.factor(time)          4    56   5.1309  0.0014
male                     1    14   0.6071  0.4488
as.factor(time):male     4    56   0.3386  0.8508

  RSiteSearch("repeated measures") points to other examples, functions,
and documentation.

> Tom
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] percent sign in plot annotation

2007-01-17 Thread Peter Dalgaard
Martin Keller-Ressel wrote:
> Thanks to Bettina, Dimitris and Gavin for their help.
> All their solutions work nicely.
> For future reference, here are three ways to draw a percent sign in R  
> plots:
>
> plot(0:10, 0:10, type = "n")
> text(5,7,expression(paste(alpha == 5, "%", sep = "")))
> text(5, 5, expression(paste(alpha, " = 5%")))
> text(5,3,expression(alpha == 5*"%"))
>
>   
A bit surprising that nobody mentioned

expression(alpha == "5%")

-p



-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] curious about dimension of 'apply' output when MARGIN=1

2007-01-17 Thread Antonio, Fabio Di Narzo
2007/1/17, Patrick Burns <[EMAIL PROTECTED]>:
> A logical reason for the phenomenon is that
> matrices are stored down their columns. For
> example:
>
>  > matrix(1:15,5)
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>
> When an 'apply' across rows is done, it will be
> the values corresponding to each of the rows that
> are together.
>
> For matrices, merely transposing the result fixes
> the "problem", but it is considerably more complex
> in higher dimensional arrays.

I routinely use 'aperm'
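
A sketch of the aperm() idiom for higher-dimensional arrays (toy data, my
own example): apply() puts the function's result along the first
dimension, and aperm() permutes it back.

```r
A <- array(1:24, dim = c(2, 3, 4))
## each cell over the 3rd margin is replaced by a length-2 vector,
## so the result has dim c(2, 2, 3) with the new dimension first
B <- apply(A, c(1, 2), function(x) x[1] + c(10, 20))
aperm(B, c(2, 3, 1))   # move the new dimension last, giving dim c(2, 3, 2)
```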



-- 
Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy



[R] Effect size in GLIM models

2007-01-17 Thread Behnke Jerzy
Dear All,
I wonder if anyone can advise me as to whether there is a consensus as
to how the effect size should be calculated from GLIM models in R for
any specified significant main effect or interaction.

In investigating the causes of variation in infection in wild animals,
we have fitted 4-way GLIM models in R with negative binomial errors.
These are then simplified using the step() procedure, and finally each of
the remaining terms is deleted in turn; the model without that term is
compared to the model with it, to estimate a probability.

An ANOVA of each model gives the deviance explained by each interaction
and main effect, and the percentage deviance attributable to each factor
can be calculated from the NULL deviance.

However, we estimate probabilities by subsequent deletion of terms, and
this gives the LR statistic. Expressing the value of the LR statistic as
a percentage of 2 x log-likelihood in a model without any factors gives
lower values than the former procedure.

Are either of these appropriate? If so, which is best? Alternatively,
how can % deviance be calculated? We require the % deviance explained by
each factor or interaction, because we need to compare individual
factors (say host age) across a range of infections.
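
A minimal sketch of the two quantities described above, with hypothetical
variable names (glm.nb() in MASS fits the negative binomial GLM):

```r
library(MASS)
## 'd' is a hypothetical data frame with a count response and factors age, sex
m.full <- glm.nb(count ~ age * sex, data = d)
m.red  <- update(m.full, . ~ . - age:sex)
anova(m.red, m.full)   # likelihood-ratio test for the dropped interaction

## percentage of the NULL deviance attributable to each sequential term:
a <- anova(m.full)
100 * a[-1, "Deviance"] / a["NULL", "Resid. Dev"]
```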

Any advice will be most gratefully appreciated. I can send you a worked
example if you require more information.

Jerzy. M. Behnke,
The School of Biology,
The University of Nottingham,
University Park,
NOTTINGHAM, NG7 2RD

Tel: +0044 (0) 115 951 3208
Fax: +0044 (0) 115 951 3251

http://www.nottingham.ac.uk/biology/contact/academics/behnke/overview.phtml?P=1&R=1&S=&ID=11&from=iai&m1=&m2=
Useful links to field stations:
http://www.quintastudies.info/index.htm



This message has been checked for viruses but the contents of an attachment
may still contain software viruses, which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.



Re: [R] Help on variable ranking

2007-01-17 Thread Rupendra Chulyadyo
Thanks for quick responses.

My dataset consists of 213 independent variables (disease-related costs), and
the dependent variable is the total medical + pharmacy cost of the member.

As you had suggested, I tried to use hier.part function in R, but the
current implementation does not seem to allow more than 13 independent
variables at a time. Meanwhile, I am also trying to get access to the paper
"Variable importance by partitioning R^2".

I am currently using t-values from multiple regression to perform ranking of
variables. But the result does not seem to be satisfactory.

With regards,
Rupendra Chulyadyo
Institute of Engineering,
Tribhuvan University,
Nepal


On 1/17/07, Simon Blomberg <[EMAIL PROTECTED]> wrote:
>
> Before you do that, you might try reading this paper:
>
> Bring, J. 1995. Variable importance by partitioning R^2. Quality and
> Quantity 29:173-189.
>
> Cheers,
>
> Simon.
>
> Andrew Robinson wrote:
> > Rupendra,
> >
> > depending on the nature of your data (which you haven't mentioned),
> > you might try hierarchical partitioning, as found in the hier.part
> > package on CRAN.
> >
> > Cheers
> >
> > Andrew
> >
> > On Wed, Jan 17, 2007 at 11:07:18AM +0545, Rupendra Chulyadyo wrote:
> >> Hello all,
> >>
> >> I want to assign relative score to the predictor variables on the basis
> of
> >> its influence on the dependent variable. But I could not find any
> standard
> >> statistical approach appropriate for this purpose.
> >> Please suggest the possible approaches.
> >>
> >> Thanks in advance,
> >>
> >> Rupendra Chulyadyo
> >> Institute of Engineering,
> >> Tribhuvan University,
> >> Nepal
> >>
> >>
> >> __
> >> R-help@stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
> Centre for Resource and Environmental Studies
> The Australian National University
> Canberra ACT 0200
> Australia
> T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
> F: +61 2 6125 0757
> CRICOS Provider # 00120C
>
> The combination of some data and an aching desire for
> an answer does not ensure that a reasonable answer
> can be extracted from a given body of data.
> - John Tukey.
>



[R] problem with unlist POSIX date at midnight

2007-01-17 Thread Hanneke Schuurmans
Dear R-users,

I use unlist of POSIX dates to extract the year, hour etc. With that I 
can search for files in my database which are in the form 
'mmddhh_synops.txt'
However, I get stuck at midnight, where unlist just gives NA's.
The script is given below; the problem occurs at acc.period[16] 
(midnight). However, when I write out the character, unlist works well. 
But as.character(acc.period[16]) is not the solution

 > dates=seq(as.Date("2006-03-01"), as.Date("2006-11-01"), "days")
 > i=1
 > begin=paste(dates[i]-1,"09:00:00")
 > end=paste(dates[i],"08:00:00")
 > acc.period=seq(as.POSIXct(begin),as.POSIXct(end),"hour")
 >
 > unlist(strptime(acc.period[15],format="%Y-%m-%d %H:%M:%S"))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0    23    28     1   106     2    58     0
 > unlist(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
  sec   min  hour  mday   mon  year  wday  yday isdst
   NA    NA    NA    NA    NA    NA    NA    NA    -1
 > unclass(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
$sec
[1] NA

$min
[1] NA

$hour
[1] NA

$mday
[1] NA

$mon
[1] NA

$year
[1] NA

$wday
[1] NA

$yday
[1] NA

$isdst
[1] -1

 >
 > a="2006-03-01 00:00:00"
 > unlist(strptime(a,format="%Y-%m-%d %H:%M:%S"))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0     0     1     2   106     3    59     0
 >
 > sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "methods"   "stats" "graphics"  "grDevices" "utils" "datasets"
[7] "base"

Thanks in advance!

Hanneke

-- 
-
ir J.M. (Hanneke) Schuurmans
PhD student Hydrology
Department of Physical Geography
Faculty of Geosciences - Universiteit Utrecht
P.O. Box 80115  3508 TC  Utrecht
T +31 (0)30 2532988  F +31 (0)30 2531145
W www.geo.uu.nl/staff/schuurmans



Re: [R] Help on variable ranking

2007-01-17 Thread Frank E Harrell Jr
Rupendra Chulyadyo wrote:
> Hello all,
> 
> I want to assign relative score to the predictor variables on the basis of
> its influence on the dependent variable. But I could not find any standard
> statistical approach appropriate for this purpose.
> Please suggest the possible approaches.
> 
> Thanks in advance,
> 
> Rupendra Chulyadyo
> Institute of Engineering,
> Tribhuvan University,
> Nepal

You might consider using the bootstrap to get confidence intervals of 
the rank of each predictor's partial chi-square or partial R-square. 
The following takes into account all terms that might be associated with 
a variable (nonlinear and interaction terms, dummy variables).  The code 
is taken from an example in the anova.Design help file in the Design 
package.  Unless the dataset is huge and there is little collinearity, 
you will be surprised how difficult it is to pick winners and losers 
from the predictors [try ranking gene expressions from gene microarray 
data for even more of a shock].  Note that Bayesian ranking procedures 
are more accurate, but this quick and dirty approach isn't bad.

mydata <- data.frame(x1=runif(200), x2=runif(200),
  sex=factor(sample(c('female','male'),200,TRUE)))
set.seed(9)  # so can reproduce example
mydata$y <- ifelse(runif(200)<=plogis(mydata$x1-.5 + .5*(mydata$x2-.5) +
.5*(mydata$sex=='male')),1,0)

library(Design)
library(boot)
b <- boot(mydata, function(data, i, ...) rank(-plot(anova(
 lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data,subset=i)),
 sort='none', pl=FALSE)),
 R=25)  # should really do R=500 but will take a while
Rank <- b$t0
lim <- t(apply(b$t, 2, quantile, probs=c(.025,.975)))
# Use the Hmisc Dotplot function to display ranks and their confidence
# intervals.  Sort the categories by descending adj. chi-square, for ranks
original.chisq <- plot(anova(lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data=mydata)),
sort='none', pl=FALSE)
predictor <- as.factor(names(original.chisq))
predictor <- reorder.factor(predictor, -original.chisq)

Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank',
 main=expression(paste(
'Ranks and 0.95 Confidence Limits for ',chi^2,' - d.f.')))

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] problem with unlist POSIX date at midnight

2007-01-17 Thread Prof Brian Ripley
[Unstated, but a near repeat of a posting earlier today with the same 
subject, in response to unacknowledged help.]

The problem is not with unlist (as in your subject line) but your input to
strptime.

strptime() is intended to convert character strings, and

> as.character(acc.period[16])
[1] "2006-03-01"

is not in the format you specified.  Why are you using strptime() to 
convert objects of class POSIXct to class POSIXlt and not as.POSIXlt()?
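The suggested fix can be sketched as follows (a minimal illustration reconstructing the poster's sequence; as.POSIXlt() converts the POSIXct value directly, so midnight is no problem):

```r
# Rebuild the hourly sequence from the original example:
begin <- as.POSIXct("2006-02-28 09:00:00")
end   <- as.POSIXct("2006-03-01 08:00:00")
acc.period <- seq(begin, end, "hour")
# Use as.POSIXlt() instead of strptime(as.character(...)):
lt <- as.POSIXlt(acc.period[16])   # midnight -- hour is 0, not NA
unlist(lt)
```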


On Wed, 17 Jan 2007, Hanneke Schuurmans wrote:

> Dear R-users,
>
> I use unlist of POSIX dates to extract the year, hour etc. With that I
> can search for files in my database which are in the form
> 'mmddhh_synops.txt'
> However, I get stuck at midnight, where unlist just gives NA's.
> The script is given below; the problem occurs at acc.period[16]
> (midnight). However, when I write out the character, unlist works well.
> But as.character(acc.period[16]) is not the solution
>
> > dates=seq(as.Date("2006-03-01"), as.Date("2006-11-01"), "days")
> > i=1
> > begin=paste(dates[i]-1,"09:00:00")
> > end=paste(dates[i],"08:00:00")
> > acc.period=seq(as.POSIXct(begin),as.POSIXct(end),"hour")
> >
> > unlist(strptime(acc.period[15],format="%Y-%m-%d %H:%M:%S"))
>  sec   min  hour  mday   mon  year  wday  yday isdst
>    0     0    23    28     1   106     2    58     0
> > unlist(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
>  sec   min  hour  mday   mon  year  wday  yday isdst
>   NA    NA    NA    NA    NA    NA    NA    NA    -1
> > unclass(strptime(acc.period[16],format="%Y-%m-%d %H:%M:%S"))
> $sec
> [1] NA
>
> $min
> [1] NA
>
> $hour
> [1] NA
>
> $mday
> [1] NA
>
> $mon
> [1] NA
>
> $year
> [1] NA
>
> $wday
> [1] NA
>
> $yday
> [1] NA
>
> $isdst
> [1] -1
>
> >
> > a="2006-03-01 00:00:00"
> > unlist(strptime(a,format="%Y-%m-%d %H:%M:%S"))
>  sec   min  hour  mday   mon  year  wday  yday isdst
>    0     0     0     1     2   106     3    59     0
> >
> > sessionInfo()
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "methods"   "stats" "graphics"  "grDevices" "utils" "datasets"
> [7] "base"
>
> Thanks in advance!
>
> Hanneke
>
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Effect size in GLIM models

2007-01-17 Thread Prof Brian Ripley
On Wed, 17 Jan 2007, Behnke Jerzy wrote:

> Dear All,
> I wonder if anyone can advise me as to whether there is a consensus as
> to how the effect size should be calculated from GLIM models in R for
> any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by 
significance tests.  If you have a log link (you did not say), the model 
coefficients have a direct interpretation via multiplicative increases in 
rates.

> In investigating the causes of variation in infection in wild animals,
> we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial 
errors'?  Negative binomial regression is within the GLM framework only 
for fixed shape theta. Package MASS has glm.nb() which extends the 
framework and you may be using without telling us.  (AFAIK GLIM is a 
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

> These are then simplified using the STEP procedure, and finally each of
> the remaining terms is deleted in turn, and the model without that term
> compared to a model with that term to estimate probability

'probability' of what?

> An ANOVA of each model gives the deviance explained by each interaction
> and main effect, and the percentage deviance attributable to each factor
> can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the 
help for anova.negbin.

> However, we estimate probabilities by subsequent deletion of terms, and
> this gives the LR statistic. Expressing the value of the LR statistic as
> a percentage of 2xlog-like in a model without any factors, gives lower
> values than the former procedure.

I don't know anything to suggest percentages of LR statistics are 
reasonable summary measures.  There are extensions of R^2 to these models, 
but AFAIK they share the well-attested drawbacks of R^2.

> Are either of these appropriate? If so which is best, or alternatively
> how can % deviance be calculated. We require % deviance explained by
> each factor or interaction,  because we need to compare individual
> factors (say host age) across a range of infections.
>
> Any advice will be most gratefully appreciated. I can send you a worked
> example if you require more information.

We do ask for more information in the posting guide and the footer of 
every message.  I have had to guess uncomfortably much in formulating my 
answers.

> Jerzy. M. Behnke,
> The School of Biology,
> The University of Nottingham,
> University Park,
> NOTTINGHAM, NG7 2RD

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Fwd: model simplification in repeated anova

2007-01-17 Thread Michael Greeff
Hi,

I tried to do a model simplification in repeated anovas. Obviously  
it's not possible to use the function "anova" straight away. I got  
the suggestion to do it with proj (see below). In my data, however, I  
have in "block" not the third-order interaction (n:p:k), but two  
factors and their interaction (population, site, population:site).

I somehow didn't manage to get the right aov after the projection.


> From: "Richard M. Heiberger" <[EMAIL PROTECTED]>
>
>
> npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
>
> np.k.aovE <- aov(yield ~  N*P+K + Error(block), npk)
>
>
> npk.proj <- proj(npk.aovE)
> npk.x <- npk.proj$block[,"N:P:K"]
>
> npk.aov <- aov(terms(yield ~  npk.x + block + N+P+K+N:P+N:K+P:K,  
> keep.order=TRUE), npk)
>
> np.k.aov <- aov(terms(yield ~  block + N+P+K+N:P, keep.order=TRUE),  
> npk)
>
> summary(npk.aov)
> summary(np.k.aov)
>
> anova(npk.aov, np.k.aov)



Re: [R] Effect size in GLIM models

2007-01-17 Thread Reader Tom
Dear All

Thanks very much for the rapid reply Prof Ripley. I had been looking at
this analysis for my colleague (Prof Behnke) and suggested that he
contact the R mailing list because I couldn't answer his question. I
think some of the detail got lost in translation (he grew up with the
GLIM package!). So here are some more details:

We are indeed following your guidelines in the MASS book, and using
glm.nb to analyse some data on the abundance of several parasite species
in mice. We proceeded with model selection as suggested, and we are
reasonably happy that we end up with decent models for our several
parasite species. 

The question that Prof Behnke asked is: if we fit similar models
(initial full models have the same factors and covariates) with
different response variables (abundances of different species of
parasite), is there a way of comparing the relative effect sizes of the
key explanatory variables across different models? For example, if we
find that the best models for two different species include the term
"sex", is there a way of determining if sex explains more of the
variance in parasite abundance in species A than in species B? 

In a simple ANOVA with Gaussian errors, we might compare the percentage
variance explained. We could also look at the overall r^2 for the models
and determine how well (relatively) our different models perform. We
might end up concluding that for species A, we have found the most
important biological factors explaining parasite abundance, but that for
species B we have yet to explain a large proportion of the variance.

Is there something similar we can do with our glm.nb models? Clearly the
coefficients will tell us about relative effect sizes WITHIN a given
model, but what can we do when comparing completely different response
variables?!
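For what it's worth, one rough option (which shares the well-known drawbacks of R^2-type measures) is a deviance-based pseudo-R^2 computed per fitted model; a hedged sketch on simulated data, not the actual parasite data:

```r
library(MASS)
set.seed(1)
# Hypothetical data -- names are illustrative only
d <- data.frame(sex = factor(sample(c("f", "m"), 200, TRUE)),
                age = runif(200))
d$abundance <- rnbinom(200, mu = exp(1 + 0.5*(d$sex == "m") + d$age), size = 1)
fit <- glm.nb(abundance ~ sex + age, data = d)
# Proportion of null deviance accounted for by the model:
pseudoR2 <- 1 - fit$deviance / fit$null.deviance
```

Comparing such numbers across models fitted to different response variables is at best informal.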

Regards

Tom Reader

-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: 17 January 2007 14:02
To: Behnke Jerzy
Cc: r-help@stat.math.ethz.ch; Reader Tom
Subject: Re: [R] Effect size in GLIM models

On Wed, 17 Jan 2007, Behnke Jerzy wrote:

> Dear All,
> I wonder if anyone can advise me as to whether there is a consensus as

> to how the effect size should be calculated from GLIM models in R for 
> any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by
significance tests.  If you have a log link (you did not say), the model
coefficients have a direct interpretation via multiplicative increases
in rates.

> In investigating the causes of variation in infection in wild animals,

> we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial
errors'?  Negative binomial regression is within the GLM framework only
for fixed shape theta. Package MASS has glm.nb() which extends the
framework and you may be using without telling us.  (AFAIK GLIM is a
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

> These are then simplified using the STEP procedure, and finally each 
> of the remaining terms is deleted in turn, and the model without that 
> term compared to a model with that term to estimate probability

'probability' of what?

> An ANOVA of each model gives the deviance explained by each 
> interaction and main effect, and the percentage deviance attributable 
> to each factor can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the
help for anova.negbin.

> However, we estimate probabilities by subsequent deletion of terms, 
> and this gives the LR statistic. Expressing the value of the LR 
> statistic as a percentage of 2xlog-like in a model without any 
> factors, gives lower values than the former procedure.

I don't know anything to suggest percentages of LR statistics are
reasonable summary measures.  There are extensions of R^2 to these
models, but AFAIK they share the well-attested drawbacks of R^2.

> Are either of these appropriate? If so which is best, or alternatively

> how can % deviance be calculated. We require % deviance explained by 
> each factor or interaction,  because we need to compare individual 
> factors (say host age) across a range of infections.
>
> Any advice will be most gratefully appreciated. I can send you a 
> worked example if you require more information.

We do ask for more information in the posting guide and the footer of
every message.  I have had to guess uncomfortably much in formulating my
answers.

> Jerzy. M. Behnke,
> The School of Biology,
> The University of Nottingham,
> University Park,
> NOTTINGHAM, NG7 2RD

[R] Row limit for read.table

2007-01-17 Thread Frank McCown
I have been trying to read in a large data set using read.table, but 
I've only been able to grab the first 50,871 rows of the total 122,269 rows.

 > f <- 
read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat", 
header=TRUE, nrows=123000, comment.char="", sep="\t")
 > length(f$change_rate)
[1] 50871

 From searching the email archives, I believe this is due to size limits 
of a data frame.  So...

1) Why doesn't read.table give a proper warning when it doesn't place 
every read item into a data frame?

2) Why isn't there a parameter to read.table that allows the user to 
specify which columns s/he is interested in?  This functionality would 
allow extraneous columns to be ignored which would improve memory usage.

I've already made a work-around by loading the table into mysql and 
doing a select on the 2 columns I need.  I just wonder why the above 2 
points aren't implemented.  Maybe they are and I'm totally missing it.

Thanks,
Frank


-- 
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown/



[R] add non-linear line

2007-01-17 Thread Matt Sakals
I am trying to plot a non-linear trend line along with my data. I am  
sure the solution is simple; any help greatly appreciated.

Recent and historic attempts:
fit <- nls(sd ~ a + b*exp(-c/time) + d*geology,
           start = list(a=1, b=1, c=10, d=-1), model = TRUE)
plot(time, sd, col=3, main = "Regression", sub=formula(fit))
lines(time, fitted.values(fit), lwd=2)

#tt = seq(from=0, to=200, by=10)
#lines(time, predict(fit, time=tt))
#lines(time, predict(fit))
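A sketch of the usual fix (assuming `time`, `sd`, and `geology` are the variables used in the nls() call above): predict over a sorted grid via `newdata`, rather than plotting fitted values against unsorted x:

```r
# Grid of time values; hold geology at one level for the drawn curve
tt <- seq(min(time), max(time), length.out = 200)
nd <- data.frame(time = tt, geology = geology[1])
lines(tt, predict(fit, newdata = nd), lwd = 2)
# Note: predict(fit, time=tt) silently ignores "time" -- the argument must
# be newdata, a data frame whose column names match the model variables.
```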



[R] How to annotate a graph with non-transparent math labels?

2007-01-17 Thread Luis A Escobar
I do not know the precise language to describe the situation, but here
is what I want to do.

I need to annotate a graph that contains two or more curves with labels
that contain math symbols. The label must go on top of the curve. 

The problem is that when I annotate the plot, I can see the curve
behind the label. Here is an example using a simple straight line.

x<-c(0,1)
plot(x,x,type='l')
text(0.5,0.5,expression(theta),cex=3)

The line is visible inside the theta symbol. I would like to avoid that.

Following the solution to problem 3.2 in the ISwR book, for a related
problem, I used the symbols package to draw a rectangle (it could be a
circle, etc.) with white bg and white fg, and then write the label on
top of that rectangle.

Here is the code

x<-c(0,1)
plot(x,x,type='l')
text(0.5,0.5,expression(theta),cex=3)
xmax<-max(abs(x))
dimensions<-matrix(c(xmax/24,xmax/24),nrow=1)
symbols(0.5,0.5,rectangles=dimensions,bg='white',fg='white',add=TRUE,inches=FALSE)
text(0.5,0.5,expression(theta),cex=3)

It seems to work, but I am not happy with this solution. The symbols
statement above has fixed dimensions and does not work well with
multiple labels of different sizes. With multiple curves the symbols
statement might peel off other curves, axis, etc. Also if one changes
the size of the label (using cex), the size of the "rectangle" might
be too big or too small and it has to be adjusted.

What I would like is a simple way to write the label over the line
such that the label only wipes out what is needed to avoid seeing the
curve behind it. It is as if I need rectangles with rubbery dimensions
that contain just the label I am writing.
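One way to get such a rubbery rectangle (a sketch; strwidth()/strheight() measure a label in user coordinates for the current device and cex):

```r
x <- c(0, 1)
plot(x, x, type = "l")
lab <- expression(theta)
# Measure the label at the size it will be drawn with:
w <- strwidth(lab, cex = 3)
h <- strheight(lab, cex = 3)
# Blank out only the label's bounding box, then draw the label on top:
rect(0.5 - w/2, 0.5 - h/2, 0.5 + w/2, 0.5 + h/2, col = "white", border = NA)
text(0.5, 0.5, lab, cex = 3)
```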

I searched the FAQ for related problems but did not find anything
related.

My R skills are very rudimentary, and I would not be surprised if the
solution is trivial.

Thank you for any help.

Luis Escobar



[R] Correlation to a Single Gene

2007-01-17 Thread Damion Colin Nero
I am trying to find a way to perform pairwise correlations against one
gene in a matrix rather than computing every pairwise correlation.  I am
interested in how 1 transcription factor correlates to every gene in a
matrix of 55 experiments (columns) by 23,000 genes (rows), performing
the correlation by rows.  Trying to perform every pairwise correlation
in this fashion is too memory intensive for any computer I am currently
using so I am wondering if anyone had a method for doing pairwise
correlations to a single gene or if there is a preexisting package in R
that might address this.
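For what it's worth, a sketch of one memory-light approach: cor() accepts a vector and a matrix, so the target gene's profile can be correlated against every row at once without forming a 23,000 x 23,000 matrix (simulated data below; names are illustrative only):

```r
set.seed(1)
# Toy expression matrix: genes in rows, experiments in columns
expr <- matrix(rnorm(55 * 1000), nrow = 1000,
               dimnames = list(paste("gene", 1:1000, sep = ""), NULL))
target <- expr["gene1", ]
# cor(vector, matrix) gives one correlation per column of the matrix,
# so transpose: the result is a 1 x nrow(expr) matrix of correlations
r <- cor(target, t(expr))
```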

Damion Nero
Plant Molecular Biology Lab
Department of Biology 
New York University



[R] sort dataframe by field

2007-01-17 Thread Milton Cezar Ribeiro
Hi there,
   
  How can I sort (order?) a data.frame using a dataframe field (evidence) as 
classifier? My data.frame looks like:
   
  record   model   evidence
1 areatotha 6638.32581
2   areatotha_ca000 8111.01860
3  areatotha_ca000_Pais 1721.41828
4   areatotha_ca020  827.33097
5  areatotha_ca020_Pais 2212.40899
6   areatotha_ca040 3569.17169
7  areatotha_ca040_Pais 2940.01636
8   areatotha_ca060  992.62852
9  areatotha_ca060_Pais 4237.95709
10  areatotha_ca080   62.74915
11 areatotha_ca080_Pais 1726.55082
12   areatotha_Pais   52.02524
13  areatotha_ca000 3391.92930
14 areatotha_ca000_Pais   39.52170
15  areatotha_ca020  268.55875
16 areatotha_ca020_Pais   20.43317
17  areatotha_ca040 1698.75892
18 areatotha_ca040_Pais   43.90613
19  areatotha_ca060  350.79857
20 areatotha_ca060_Pais   51.04471

  Cheers,
   
  Miltinho, Brazil



[R] Fwd: model simplification in repeated anova

2007-01-17 Thread Michael Greeff
Hi,

I tried to do a model simplification in a repeated anova with 3  
factors (population,treatment,site). I tried to get rid of some  
interactions. Obviously it's not possible to use the function "anova"  
straight away to compare the different models. I got the suggestion  
to use the function proj (see below). In my data, however, I have in  
"block" not the third order interaction (n:p:k), but two factors and  
their interaction (population, site, population:site).

I somehow didn't manage to get the right aov after the projection.



> From: "Richard M. Heiberger" <[EMAIL PROTECTED]>
>
>
> npk.aovE <- aov(yield ~  N*P*K + Error(block), npk)
>
> np.k.aovE <- aov(yield ~  N*P+K + Error(block), npk)
>
>
> npk.proj <- proj(npk.aovE)
> npk.x <- npk.proj$block[,"N:P:K"]
>
> npk.aov <- aov(terms(yield ~  npk.x + block + N+P+K+N:P+N:K+P:K,  
> keep.order=TRUE), npk)
>
> np.k.aov <- aov(terms(yield ~  block + N+P+K+N:P, keep.order=TRUE),  
> npk)
>
> summary(npk.aov)
> summary(np.k.aov)
>
> anova(npk.aov, np.k.aov)
>



Re: [R] sort dataframe by field

2007-01-17 Thread Benilton Carvalho
say your data.frame is called "df"

df[order(df$evidence),]

or

df[order(df$evidence, decreasing=TRUE),]  # if you want the other way around
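order() also takes multiple keys, which helps if you want, for example, rows grouped by model and sorted by decreasing evidence within each group (a sketch, assuming the data frame is called df as above):

```r
# primary key: model; secondary key: evidence, largest first
df[order(df$model, -df$evidence), ]
```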

b

On Jan 17, 2007, at 11:21 AM, Milton Cezar Ribeiro wrote:

> Hi there,
>
>   How can I sort (order?) a data.frame using a dataframe field  
> (evidence) as classifier? My data.frame looks like:
>
>   record   model   evidence
> 1 areatotha 6638.32581
> 2   areatotha_ca000 8111.01860
> 3  areatotha_ca000_Pais 1721.41828
> 4   areatotha_ca020  827.33097
> 5  areatotha_ca020_Pais 2212.40899
> 6   areatotha_ca040 3569.17169
> 7  areatotha_ca040_Pais 2940.01636
> 8   areatotha_ca060  992.62852
> 9  areatotha_ca060_Pais 4237.95709
> 10  areatotha_ca080   62.74915
> 11 areatotha_ca080_Pais 1726.55082
> 12   areatotha_Pais   52.02524
> 13  areatotha_ca000 3391.92930
> 14 areatotha_ca000_Pais   39.52170
> 15  areatotha_ca020  268.55875
> 16 areatotha_ca020_Pais   20.43317
> 17  areatotha_ca040 1698.75892
> 18 areatotha_ca040_Pais   43.90613
> 19  areatotha_ca060  350.79857
> 20 areatotha_ca060_Pais   51.04471
>
>   Cheers,
>
>   Miltinho, Brazil
>
>  __
>
>
>   [[alternative HTML version deleted]]
>



[R] sp: proj4string has no impact

2007-01-17 Thread Albrecht Kauffmann
Hi all,

I'm faced with a problem applying the sp package: The projection argument in

readShapePoly(Shapefile,proj4string="CRS class argument")

e.g.: CRS("+proj=aea +lat_1=46 +lat_2=73 +lat_0=60 +lon_0=84 +x_0=0
+y_0=0 +ellps=clrk66 +units=m  +no_defs")

doesn't have any impact on the plotted object. I also tested the simple
example:

xy = cbind(x = 2 * runif(100) - 1, y = 2 * runif(100) - 1)
plot(SpatialPoints(xy, proj4string =
CRS("+proj=longlat")),xlim=c(-1,1),ylim=c(-1,1))

looks exactly like

plot(SpatialPoints(xy, proj4string = CRS("+proj=stere +lon_0=98 +over")))

or

plot(SpatialPoints(xy))

without any projection.

What I'm doing wrong?

I use the latest versions of sp and rgdal.
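A hedged note, in case it helps: in sp the proj4string only records which CRS the coordinates are already in; it does not reproject them, which is why the plots look identical. Actual reprojection is done with spTransform() from rgdal, roughly:

```r
library(sp)
library(rgdal)
xy <- cbind(x = 2 * runif(100) - 1, y = 2 * runif(100) - 1)
pts <- SpatialPoints(xy, proj4string = CRS("+proj=longlat"))
# Reproject -- this is what actually changes the picture:
pts.stere <- spTransform(pts, CRS("+proj=stere +lon_0=98 +over"))
plot(pts.stere)
```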

With many thanks for any hint,

Albrecht Kauffmann
Potsdam University



Re: [R] Row limit for read.table

2007-01-17 Thread Peter Dalgaard
Frank McCown wrote:
> I have been trying to read in a large data set using read.table, but 
> I've only been able to grab the first 50,871 rows of the total 122,269 rows.
>
>  > f <- 
> read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat", 
> header=TRUE, nrows=123000, comment.char="", sep="\t")
>  > length(f$change_rate)
> [1] 50871
>
>  From searching the email archives, I believe this is due to size limits 
> of a data frame.  So...
>   
I think you believe wrongly...
> 1) Why doesn't read.table give a proper warning when it doesn't place 
> every read item into a data frame?
>   
That isn't the problem, it is a somewhat obscure interaction between
quote= and sep= that is doing you in. Remove the sep="\t" and/or add
quote="" and your life should be easier.
> 2) Why isn't there a parameter to read.table that allows the user to 
> specify which columns s/he is interested in?  This functionality would 
> allow extraneous columns to be ignored which would improve memory usage.
>
>   
There is!  check out colClasses

> cc <- rep("NULL",5)
> cc[4:5] <- NA
> f <- read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
+ header=TRUE, sep="\t", quote="", colClasses=cc)
> str(f)
'data.frame':   122271 obs. of  2 variables:
 $ recovered  : Factor w/ 5 levels "changed","identical",..: 5 3 3 3 2 2
2 2 1 2 ...
 $ change_rate: num  1 0 0 1 0 0 0 0 0 0 ...

> I've already made a work-around by loading the table into mysql and 
> doing a select on the 2 columns I need.  I just wonder why the above 2 
> points aren't implemented.  Maybe they are and I'm totally missing it.
>
> Thanks,
> Frank
>
>
>   


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Row limit for read.table

2007-01-17 Thread Martin Becker
Frank McCown schrieb:
> I have been trying to read in a large data set using read.table, but 
> I've only been able to grab the first 50,871 rows of the total 122,269 rows.
>
>  > f <- 
> read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat", 
> header=TRUE, nrows=123000, comment.char="", sep="\t")
>  > length(f$change_rate)
> [1] 50871
>
>  From searching the email archives, I believe this is due to size limits 
> of a data frame.  So...
>
>   
It is not due to size limits, see below.
> 1) Why doesn't read.table give a proper warning when it doesn't place 
> every read item into a data frame?
>
>   
In your case, read.table behaves as documented.
The ' character is one of the standard quoting characters. Some (but 
very few) of the entries contain single ' chars, so sometimes more than 
ten thousand lines are just treated as a single entry. Try using 
quote="" to disable quoting, as documented on the help page:

f<-read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
header=TRUE, nrows=123000, comment.char="", sep="\t",quote="")

length(f$change_rate)
[1] 122271

> 2) Why isn't there a parameter to read.table that allows the user to 
> specify which columns s/he is interested in?  This functionality would 
> allow extraneous columns to be ignored which would improve memory usage.
>
>   
There is (colClasses, works as documented). Try

 f<-read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat", 

+ header=TRUE, nrows=123000, comment.char="", 
sep="\t",quote="",colClasses=c("character","NULL","NULL","NULL","NULL"))
 > dim(f)
[1] 122271  1

> I've already made a work-around by loading the table into mysql and 
> doing a select on the 2 columns I need.  I just wonder why the above 2 
> points aren't implemented.  Maybe they are and I'm totally missing it.
>
>   
Did you read the help page?
> Thanks,
> Frank
>
>
>   
Regards,
   Martin



Re: [R] Effect size in GLIM models

2007-01-17 Thread Bert Gunter
Folks:

I think this and several other recent posts on ranking predictors are nice
illustrations of a fundamental conundrum: Empirical models are fit as good
*predictors*; "meaningful" interpretation of separate parameters/components
of the predictors may well be difficult or impossible, especially in complex
models.  All that the fitting process guarantees if it works well is a good
overall predictor to data sampled from the same process. Unfortunately,
most/much of the time, those who apply the procedures are interested in
interpretation, not prediction.

Addendum: Interpretation is helped by well-designed studies and experiments,
hindered by data mining of observational data.

I don't think any of this is profound, just sometimes forgotten; however, I
would welcome public or private reaction to this comment, and especially
refinement/corrections.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley
Sent: Wednesday, January 17, 2007 6:02 AM
To: Behnke Jerzy
Cc: Reader Tom; r-help@stat.math.ethz.ch
Subject: Re: [R] Effect size in GLIM models

On Wed, 17 Jan 2007, Behnke Jerzy wrote:

> Dear All,
> I wonder if anyone can advise me as to whether there is a consensus as
> to how the effect size should be calculated from GLIM models in R for
> any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by 
significance tests.  If you have a log link (you did not say), the model 
coefficients have a direct interpretation via multiplicative increases in 
rates.
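
For instance, with a toy Poisson log-link fit (invented data, just to
show how the coefficients read):

```r
set.seed(1)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.5 + 0.3 * x))   # true rate ratio per unit x: exp(0.3)
fit <- glm(y ~ x, family = poisson)
exp(coef(fit))  # each unit increase in x multiplies the expected count by exp(beta)
```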

> In investigating the causes of variation in infection in wild animals,
> we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial 
errors'?  Negative binomial regression is within the GLM framework only 
for fixed shape theta. Package MASS has glm.nb() which extends the 
framework and you may be using without telling us.  (AFAIK GLIM is a 
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

> These are then simplified using the STEP procedure, and finally each of
> the remaining terms is deleted in turn, and the model without that term
> compared to a model with that term to estimate probability

'probability' of what?

> An ANOVA of each model gives the deviance explained by each interaction
> and main effect, and the percentage deviance attributable to each factor
> can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the 
help for anova.negbin.

> However, we estimate probabilities by subsequent deletion of terms, and
> this gives the LR statistic. Expressing the value of the LR statistic as
> a percentage of 2xlog-like in a model without any factors, gives lower
> values than the former procedure.

I don't know anything to suggest percentages of LR statistics are 
reasonable summary measures.  There are extensions of R^2 to these models, 
but AFAIK they share the well-attested drawbacks of R^2.

> Are either of these appropriate? If so which is best, or alternatively
> how can % deviance be calculated. We require % deviance explained by
> each factor or interaction,  because we need to compare individual
> factors (say host age) across a range of infections.
>
> Any advice will be most gratefully appreciated. I can send you a worked
> example if you require more information.

We do ask for more information in the posting guide and the footer of 
every message.  I have had to guess uncomfortably much in formulating my 
answers.

> Jerzy. M. Behnke,
> The School of Biology,
> The University of Nottingham,
> University Park,
> NOTTINGHAM, NG7 2RD
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Row limit for read.table

2007-01-17 Thread Vladimir Eremeev
The problem is somewhere in the file, probably with tab characters, as removing 
sep="\t" from your call does the job.

> dfr<-read.table("Tchange_rates_crawled.dat",header=TRUE)
> str(dfr)
'data.frame':   122271 obs. of  5 variables:
[skipped]
> dfr<-
read.table("Tchange_rates_crawled.dat",header=TRUE,stringsAsFactors=FALSE)
> str(dfr)
'data.frame':   122271 obs. of  5 variables:
[skipped]

R has no limitations you're talking about. A couple hours ago my R has 
successfully read in 46 rows of data from a text file in a data frame using 
read.table. 
I have also processed even larger data sets (more than 50 rows).



Re: [R] Row limit for read.table

2007-01-17 Thread Frank McCown
> In your case, read.table behaves as documented.
> The ' character is one of the standard quoting characters. Some (but 
> very few) of the entries contain single ' chars, so sometimes more than 
> ten thousand lines are just treated as a single entry. Try using 
> quote="" to disable quoting, as documented on the help page:
> 
> f<-read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
> header=TRUE, nrows=123000, comment.char="", sep="\t",quote="")
> 
> length(f$change_rate)
> [1] 122271


So either adding quote="" works or removing sep="\t" (and not using 
quote) works.  It seems an odd side-effect that specifying the separator 
changes the default behavior of quoting (because of the ' character).  I 
don't see that association made in the help file.


> There is (colClasses, works as documented). Try
> 
> f<-read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
> + header=TRUE, nrows=123000, comment.char="", 
> sep="\t",quote="",colClasses=c("character","NULL","NULL","NULL","NULL"))
>  > dim(f)
> [1] 122271  1

> Did you read the help page?

Of course I did.  For me the definition of colClasses wasn't clear... 
"A vector of classes to be assumed for the columns" didn't seem to be 
the same thing as "the columns you would like to be read."  I may have 
made the association if the help page had contained a simple example of 
using colClasses.

Thanks for the help,
Frank


-- 
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown/



Re: [R] add non-linear line

2007-01-17 Thread Duncan Murdoch
On 1/17/2007 10:42 AM, Matt Sakals wrote:
> I am trying to plot a non-linear trend line along with my data. I am  
> sure the solution is simple; any help greatly appreciated.
> 
> Recent and historic attempts:
> fit = nls((sd~a+b*exp(-c/time)+d*geology), start=list(a=1, b=1, c=10,  
> d=-1), model=TRUE)
> plot(time, sd, col=3, main = "Regression", sub=formula(fit))
> lines(time, fitted.values(fit), lwd=2)
> 
> #tt = seq(from=0, to=200, by=10)
> #lines(time, predict(fit, time=tt))
> #lines(time, predict(fit))

Your model has two independent variables, time and geology.  You won't 
get a simple trend line unless the value of geology is kept fixed.  You 
could do that with something like this code:

lines(time, predict(fit, newdata=data.frame(time=time, geology=0)))

Duncan Murdoch



Re: [R] add non-linear line

2007-01-17 Thread Duncan Murdoch
On 1/17/2007 12:24 PM, Duncan Murdoch wrote:
> On 1/17/2007 10:42 AM, Matt Sakals wrote:
>> I am trying to plot a non-linear trend line along with my data. I am  
>> sure the solution is simple; any help greatly appreciated.
>> 
>> Recent and historic attempts:
>> fit = nls((sd~a+b*exp(-c/time)+d*geology), start=list(a=1, b=1, c=10,  
>> d=-1), model=TRUE)
>> plot(time, sd, col=3, main = "Regression", sub=formula(fit))
>> lines(time, fitted.values(fit), lwd=2)
>> 
>> #tt = seq(from=0, to=200, by=10)
>> #lines(time, predict(fit, time=tt))
>> #lines(time, predict(fit))
> 
> Your model has two independent variables, time and geology.  You won't 
> get a simple trend line unless the value of geology is kept fixed.  You 
> could do that with something like this code:
> 
> lines(time, predict(fit, newdata=data.frame(time=time, geology=0)))

One other suggestion:  in the new data, make sure your "time" variable 
is sorted into increasing order, or the lines will be a mess.
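
Putting the two suggestions together, a sketch on simulated data
(coefficients and sample size invented, not the poster's):

```r
## toy data roughly matching the model form sd ~ a + b*exp(-c/time) + d*geology
set.seed(42)
time    <- runif(50, 1, 200)
geology <- rbinom(50, 1, 0.5)
sd      <- 3 + 2 * exp(-10 / time) - geology + rnorm(50, sd = 0.2)

fit <- nls(sd ~ a + b * exp(-c / time) + d * geology,
           start = list(a = 1, b = 1, c = 10, d = -1))

tt <- sort(time)                      # sorted, so lines() draws a clean curve
plot(time, sd, col = 3, main = "Regression")
lines(tt, predict(fit, newdata = data.frame(time = tt, geology = 0)), lwd = 2)
```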

Duncan Murdoch



Re: [R] Help on variable ranking

2007-01-17 Thread Weiwei Shi
I suggest using permutation on each predictor and see how much the
accuracy drops, no matter what modeling approach you used.
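
A bare-bones version of that idea, with a linear model standing in for
whatever model is actually used (variable names invented):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(200)
fit <- lm(y ~ x1 + x2, data = d)
base.mse <- mean((d$y - predict(fit, d))^2)

imp <- sapply(c("x1", "x2"), function(v) {
  dp <- d
  dp[[v]] <- sample(dp[[v]])                   # permute one predictor
  mean((d$y - predict(fit, dp))^2) - base.mse  # resulting loss in accuracy
})
imp   # x1 should show the larger increase in error
```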


HTH,

weiwei

On 1/17/07, Rupendra Chulyadyo <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I want to assign relative score to the predictor variables on the basis of
> its influence on the dependent variable. But I could not find any standard
> statistical approach appropriate for this purpose.
> Please suggest the possible approaches.
>
> Thanks in advance,
>
> Rupendra Chulyadyo
> Institute of Engineering,
> Tribhuvan University,
> Nepal
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III



Re: [R] Effect size in GLIM models

2007-01-17 Thread Reader Tom
Dear All (apologies if some of you have received this twice)
Thanks very much for the rapid reply Prof Ripley. I had been looking at this 
analysis for my colleague (Prof Behnke) and suggested that he contact the R 
mailing list because I couldn't answer his question. I think some of the detail 
got lost in translation (he grew up with the GLIM package!). So here are some 
more details:
We are indeed following your guidelines in the MASS book, and using glm.nb to 
analyse some data on the abundance of several parasite species in mice. We 
proceeded with model selection as suggested, and we are reasonably happy that 
we end up with decent models for our several parasite species. 
The question that Prof Behnke asked is: if we fit similar models (initial full 
models have the same factors and covariates) with different response variables 
(abundances of different species of parasite), is there a way of comparing the 
relative effect sizes of the key explanatory variables across different models? 
For example, if we find that the best models for two different species include 
the term "sex", is there a way of determining if sex explains more of the 
variance in parasite abundance in species A than in species B? 
In a simple ANOVA with Gaussian errors, we might compare the percentage 
variance explained. We could also look at the overall r^2 for the models and 
determine how well (relatively) our different models perform. We might end up 
concluding that for species A, we have found the most important biological 
factors explaining parasite abundance, but that for species B we have yet to 
explain a large proportion of the variance.
Is there something similar we can do with our glm.nb models? Clearly the 
coefficients will tell us about relative effect sizes WITHIN a given model, but 
what can we do when comparing completely different response variables?!
Regards 
Tom Reader 



-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Wed 17/01/2007 14:01
To: Behnke Jerzy
Cc: r-help@stat.math.ethz.ch; Reader Tom
Subject: Re: [R] Effect size in GLIM models
 
On Wed, 17 Jan 2007, Behnke Jerzy wrote:

> Dear All,
> I wonder if anyone can advise me as to whether there is a consensus as
> to how the effect size should be calculated from GLIM models in R for
> any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by 
significance tests.  If you have a log link (you did not say), the model 
coefficients have a direct interpretation via multiplicative increases in 
rates.

> In investigating the causes of variation in infection in wild animals,
> we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial 
errors'?  Negative binomial regression is within the GLM framework only 
for fixed shape theta. Package MASS has glm.nb() which extends the 
framework and you may be using without telling us.  (AFAIK GLIM is a 
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

> These are then simplified using the STEP procedure, and finally each of
> the remaining terms is deleted in turn, and the model without that term
> compared to a model with that term to estimate probability

'probability' of what?

> An ANOVA of each model gives the deviance explained by each interaction
> and main effect, and the percentage deviance attributable to each factor
> can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the 
help for anova.negbin.

> However, we estimate probabilities by subsequent deletion of terms, and
> this gives the LR statistic. Expressing the value of the LR statistic as
> a percentage of 2xlog-like in a model without any factors, gives lower
> values than the former procedure.

I don't know anything to suggest percentages of LR statistics are 
reasonable summary measures.  There are extensions of R^2 to these models, 
but AFAIK they share the well-attested drawbacks of R^2.

> Are either of these appropriate? If so which is best, or alternatively
> how can % deviance be calculated. We require % deviance explained by
> each factor or interaction,  because we need to compare individual
> factors (say host age) across a range of infections.
>
> Any advice will be most gratefully appreciated. I can send you a worked
> example if you require more information.

We do ask for more information in the posting guide and the footer of 
every message.  I have had to guess uncomfortably much in formulating my 
answers.

> Jerzy. M. Behnke,
> The School of Biology,
> The University of Nottingham,
> University Park,
> NOTTINGHAM, NG7 2RD

Re: [R] sleep data

2007-01-17 Thread Charles C. Berry

Yes, you refer to

Cushny, A. R. and Peebles, A. R. The action of optical isomers: II
hyoscines. The Journal of Physiology, 1905, 32: 501.510.

which was used by 'Student' to illustrate the paired t-test.

This is indeed a crossover design.
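
For reference, both readings of R's built-in copy of the data:

```r
data(sleep)
## as the help file suggests: two independent groups of 10
t.test(extra ~ group, data = sleep)
## as 'Student' analysed it: one group of 10, measured under both drugs
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
```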

On Wed, 17 Jan 2007, Tom Backer Johnsen wrote:

> When reading the documentation for the "sleep" data set in R, the
> impression is clear, this is an "independent groups" kind of design
> (two groups of 10 subjects each).  However, when browsing the original
> article (referred to in the help file), my impression is quite clear,
> this is really a "repeated measures" kind of data (one group of 10
> subjects, two observations).  What is correct?
>
> Tom
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901



Re: [R] Correlation to a Single Gene

2007-01-17 Thread Charles C. Berry
On Wed, 17 Jan 2007, Damion Colin Nero wrote:

> I am trying to find a way to perform pairwise correlations against one
> gene in a matrix rather than computing every pairwise correlation.  I am
> interested in how 1 transcription factor correlates to every gene in a
> matrix of 55 experiments (columns) by 23,000 genes (rows), performing
> the correlation by rows.  Trying to perform every pairwise correlation
> in this fashion is too memory intensive for any computer I am currently
> using so I am wondering if anyone had a method for doing pairwise
> correlations to a single gene or if there is a preexisting package in R
> that might address this.


You measure the transcription factor once in each of 55 experiments and 
you measure gene *expression* (or some other quantity) on each of 23000 
genes?

cor.vec <- cor (transfac, t( gene.mat ) )

will do.
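
A scaled-down sketch of that call (a toy 100-gene matrix standing in for
the 23,000 rows; genes in rows, experiments in columns):

```r
set.seed(1)
transfac <- rnorm(55)                            # one value per experiment
gene.mat <- matrix(rnorm(100 * 55), nrow = 100)  # 100 genes x 55 experiments
cor.vec  <- cor(transfac, t(gene.mat))           # one correlation per gene
dim(cor.vec)
## [1]   1 100
```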

Questions like this might best be posted to the bioconductor mail list.


>
> Damion Nero
> Plant Molecular Biology Lab
> Department of Biology
> New York University
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry(858) 534-2098
  Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]   UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901



Re: [R] sp: proj4string has no impact

2007-01-17 Thread Roger Bivand
On Wed, 17 Jan 2007, Albrecht Kauffmann wrote:

> Hi all,
> 
> I'm faced with a problem applying the sp package: The projection argument in
> 
> readShapePoly(Shapefile,proj4string="CRS class argument")
> 
> e.g.: CRS("+proj=aea +lat_1=46 +lat_2=73 +lat_0=60 +lon_0=84 +x_0=0
> +y_0=0 +ellps=clrk66 +units=m  +no_defs")
> 
> doesn't have any impact on the plotted object. I also tested the simple
> example:
> 
> xy = cbind(x = 2 * runif(100) - 1, y = 2 * runif(100) - 1)
> plot(SpatialPoints(xy, proj4string =
> CRS("+proj=longlat")),xlim=c(-1,1),ylim=c(-1,1))
> 
> looks exactly like
> 
> plot(SpatialPoints(xy, proj4string =CRS("+proj=stere +lon_0=98  +over"))
> 
> or
> 
> plot(SpatialPoints(xy))
> 
> without any projection.
> 
> What I'm doing wrong?

What impact would you like? The vertical scaling adjustment for 
geographical coordinates ("+proj=longlat") is proportional to the distance 
of the midpoint of ylim from the Equator, so in your case no adjustment 
would be expected:

library(sp)
set.seed(1)
x <- runif(20, -10, 10)
y <- runif(20, -10, 10)
par(mfrow=c(3,2))
plot(SpatialPoints(cbind(x, y)), axes=TRUE)
plot(SpatialPoints(cbind(x, y), proj4string=CRS("+proj=longlat")), 
axes=TRUE)
plot(SpatialPoints(cbind(x, y+50)), axes=TRUE)
plot(SpatialPoints(cbind(x, y+50), proj4string=CRS("+proj=longlat")), 
  axes=TRUE)
plot(SpatialPoints(cbind(x, y+70)), axes=TRUE)
plot(SpatialPoints(cbind(x, y+70), proj4string=CRS("+proj=longlat")),
  axes=TRUE)
par(mfrow=c(1,1))



> 
> I use the latest versions of sp and rgdal.
> 
> With many thanks for any hint,
> 
> Albrecht Kauffmann
> Potsdam University
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]



Re: [R] Correlation to a Single Gene

2007-01-17 Thread Robert Gentleman
In the package genefilter, from www.bioconductor.org there is a function 
to do this (genefinder, if I recall correctly)
best wishes
  Robert


Charles C. Berry wrote:
> On Wed, 17 Jan 2007, Damion Colin Nero wrote:
> 
>> I am trying to find a way to perform pairwise correlations against one
>> gene in a matrix rather than computing every pairwise correlation.  I am
>> interested in how 1 transcription factor correlates to every gene in a
>> matrix of 55 experiments (columns) by 23,000 genes (rows), performing
>> the correlation by rows.  Trying to perform every pairwise correlation
>> in this fashion is too memory intensive for any computer I am currently
>> using so I am wondering if anyone had a method for doing pairwise
>> correlations to a single gene or if there is a preexisting package in R
>> that might address this.
> 
> 
> You measure the transcription factor once in each of 55 experiments and 
> you measure gene *expression* (or some other quantity) on each of 23000 
> genes?
> 
>   cor.vec <- cor (transfac, t( gene.mat ) )
> 
> will do.
> 
> Questions like this might best be posted to the bioconductor mail list.
> 
> 
>> Damion Nero
>> Plant Molecular Biology Lab
>> Department of Biology
>> New York University
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> Charles C. Berry(858) 534-2098
>   Dept of Family/Preventive Medicine
> E mailto:[EMAIL PROTECTED] UC San Diego
> http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



[R] tapply, data.frame problem

2007-01-17 Thread Lauri Nikkinen
Hi R-users,

I'm quite new to R and trying to learn the basics. I have the following
problem concerning the conversion of an array object into a data frame. I have
made the following data sets:

tmp1 <- rnorm(100)
tmp2 <- gl(10,2,length=100)
tmp3 <- as.data.frame(cbind(tmp1,tmp2))
tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
and I want the levels from tmp2 be shown as a column in the data.frame, not
as row name as it now does. To put it in another way, as a result, I want a
data frame with two columns: levels and the sums of those levels. Row names
can be, for example, numbers from 1 to 10.

-Lauri Nikkinen
Lahti, Finland

[[alternative HTML version deleted]]



Re: [R] tapply, data.frame problem

2007-01-17 Thread Chuck Cleland
Lauri Nikkinen wrote:
> Hi R-users,
> 
> I'm quite new to R and trying to learn the basics. I have the following
> problem concerning the conversion of an array object into a data frame. I have
> made the following data sets:
> 
> tmp1 <- rnorm(100)
> tmp2 <- gl(10,2,length=100)
> tmp3 <- as.data.frame(cbind(tmp1,tmp2))
> tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
> tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
> and I want the levels from tmp2 be shown as a column in the data.frame, not
> as row name as it now does. To put it in another way, as a result, I want a
> data frame with two columns: levels and the sums of those levels. Row names
> can be, for example, numbers from 1 to 10.

aggregate(tmp3[1], tmp3[2], sum)
   tmp2tmp1
1 1  8.41550650
2 2  3.65831086
3 3 -0.26296334
4 4  3.45368671
5 5 -4.64383794
6 6  0.25640949
7 7  0.02832348
8 8 -0.03811150
9 9  1.41724121
10   10 -1.06780900

?aggregate
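
The same two-column data frame can also be built directly from the
tapply result (fresh toy data, mirroring the original setup):

```r
set.seed(1)
tmp1 <- rnorm(100)
tmp2 <- gl(10, 2, length = 100)
sums <- tapply(tmp1, tmp2, sum)
res  <- data.frame(level = names(sums), sum = as.vector(sums))
str(res)   # 10 obs. of 2 variables; row names default to 1:10
```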

> -Lauri Nikkinen
> Lahti, Finland
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] eval(parse(text vs. get when accessing a function

2007-01-17 Thread Ramon Diaz-Uriarte
(I overlooked the reply).

Thanks, Gabor. That is neat and easy! (and I should have been able to
see it on my own :-(

Best,

R.

On 1/8/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> The S4 is not essential.  You could do it in S3 too:
>
> > f.a <- function(x) x+1
> > f.b <- function(x) x+2
> > f <- function(x) UseMethod("f")
> >
> > f(structure(10, class = "a"))
> [1] 11
> attr(,"class")
> [1] "a"
>
> On 1/6/07, Ramon Diaz-Uriarte <[EMAIL PROTECTED]> wrote:
> > Hi Martin,
> >
> >
> >
> > On 1/6/07, Martin Morgan <[EMAIL PROTECTED]> wrote:
> > > Hi Ramon,
> > >
> > > It seems like a naming convention (f.xxx) and eval(parse(...)) are
> > > standing in for objects (of class 'GeneSelector', say, representing a
> > > function with a particular form and doing a particular operation) and
> > > dispatch (a function 'geneConverter' might handle a converter of class
> > > 'GeneSelector' one way, user supplied ad-hoc functions more carefully;
> > > inside geneConverter the only real concern is that the converter
> > > argument is in fact a callable function).
> > >
> > > eval(parse(...)) brings scoping rules to the fore as an explicit
> > > programming concern; here scope is implicit, but that's probably better
> > > -- R will get its own rules right.
> > >
> > > Martin
> > >
> > > Here's an S4 sketch:
> > >
> > > setClass("GeneSelector",
> > >  contains="function",
> > >  representation=representation(description="character"),
> > >  validity=function(object) {
> > >  msg <- NULL
> > >  argNames <- names(formals(object))
> > >  if (argNames[1]!="x")
> > >msg <- c(msg, "\n  GeneSelector requires a first argument 
> > > named 'x'")
> > >  if (!"..." %in% argNames)
> > >msg <- c(msg, "\n  GeneSelector requires '...' in its 
> > > signature")
> > >  if (0==length(object@description))
> > >msg <- c(msg, "\n  Please describe your GeneSelector")
> > >  if (is.null(msg)) TRUE else msg
> > >  })
> > >
> > > setGeneric("geneConverter",
> > >function(converter, x, ...) standardGeneric("geneConverter"),
> > >signature=c("converter"))
> > >
> > > setMethod("geneConverter",
> > >   signature(converter="GeneSelector"),
> > >   function(converter, x, ...) {
> > >   ## important stuff here
> > >   converter(x, ...)
> > >   })
> > >
> > > setMethod("geneConverter",
> > >   signature(converter="function"),
> > >   function(converter, x, ...) {
> > >   message("ad-hoc converter; hope it works!")
> > >   converter(x, ...)
> > >   })
> > >
> > > and then...
> > >
> > > > c1 <- new("GeneSelector",
> > > +   function(x, ...) prod(x, ...),
> > > +   description="Product of x")
> > > >
> > > > c2 <- new("GeneSelector",
> > > +   function(x, ...) sum(x, ...),
> > > +   description="Sum of x")
> > > >
> > > > geneConverter(c1, 1:4)
> > > [1] 24
> > > > geneConverter(c2, 1:4)
> > > [1] 10
> > > > geneConverter(mean, 1:4)
> > > ad-hoc converter; hope it works!
> > > [1] 2.5
> > > >
> > > > cvterr <- new("GeneSelector", function(y) {})
> > > Error in validObject(.Object) : invalid class "GeneSelector" object: 1:
> > >   GeneSelector requires a first argument named 'x'
> > > invalid class "GeneSelector" object: 2:
> > >   GeneSelector requires '...' in its signature
> > > invalid class "GeneSelector" object: 3:
> > >   Please describe your GeneSelector
> > > > xxx <- 10
> > > > geneConverter(xxx, 1:4)
> > > Error in function (classes, fdef, mtable)  :
> > > unable to find an inherited method for function "geneConverter", 
> > > for signature "numeric"
> > >
> >
> >
> >
> > Thanks!! That is actually a rather interesting alternative approach
> > and I can see it also adds a lot of structure to the problem. I have
> > to confess, though, that I am not a fan of OOP (nor of S4 classes); in
> > this case, in particular, it seems there is a lot of scaffolding in
> > the code above (the counterpoint to the structure?) and, regarding
> > scoping rules, I prefer to think about them explicitly (I find it much
> > simpler than inheritance).
> >
> > Best,
> >
> > R.
> >
> >
> > >
> > > "Ramon Diaz-Uriarte" <[EMAIL PROTECTED]> writes:
> > >
> > > > Dear Greg,
> > > >
> > > >
> > > > On 1/5/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > > >> Ramon,
> > > >>
> > > >> I prefer to use the list method for this type of thing, here are a 
> > > >> couple of reasons why (maybe you are more organized than me and would 
> > > >> never do some of the stupid things that I have, so these don't apply 
> > > >> to you, but you can see that the general suggestion applys to some of 
> > > >> the rest of us).
> > > >>
> > > >
> > > >
> > > > Those suggestions do apply to me of course (no claim to being
> > > > organized nor beyond idiocy here). And actually the suggest

[R] Memory leak with character arrays?

2007-01-17 Thread Peter Waltman
Hi -

When I'm trying to read a text file into a labeled character array, 
the memory stamp/footprint of R will exceed 4 gigs or more.  I've seen 
this behavior on Mac OS X, Linux for AMD_64 and x86_64, and the R 
versions are 2.4, 2.4 and 2.2, respectively.  So it would seem that 
this is platform and R version independent.

The file that I'm reading contains the upstream regions of the yeast 
genome, with each upstream region labeled using a FASTA header, i.e.:

FASTA header for gene 1
upstream region.
.

FASTA header for gene 2
upstream


The script I use - code below - opens the file, parses for a FASTA 
header, and then parses the header for the gene name.  Once this is 
done, it reads the following lines which contain the upstream region, 
and then adds it as an item to the character array, using the gene name 
as the name of the item it adds.  And then continues on to the following 
genes.

Each upstream region (the text to be added) is 550 bases (characters) 
long.  With ~6000 genes in the file I'm reading, this would be 550 * 
6000 * 8 ~= 25 Megs (if we're using ascii chars).

I realize that the character arrays/vectors will have a higher memory 
stamp b/c they are a named array and most likely aren't storing the text 
as ascii, but 4 gigs and up seems a bit excessive.  Or is it?

For an example, this is the output of top at the point at which R has 
processed around 5000 genes:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND 
 4969 waltman   18   0 *6746m 3.4g*  920 D  2.7 88.2  19:09.19 R

Is this expected behavior?  Can anyone recommend a less memory-intensive 
way to store this data?  The relevant code that reads in the file follows:

 code
 lines <- readLines( gzfile( seqs.fname ) )
 
  n.seqs <- 0
 
  upstream <- gene.names <- character()
  syn <- character( 0 )
  gene.start <- gene.end <- integer()
  gene <- seq <- ""


  for ( i in 1:length( lines ) ) {
line <- lines[ i ]
if ( line == "" ) next
if ( substr( line, 1, 1 ) == ">" ) {

  if ( seq != "" && gene != "" ) upstream[ gene ] <-
toupper( seq )
  splitted <- strsplit( line, "\t" )[[ 1 ]]
  splitted <- strsplit( splitted[ 1 ], ";\\ " )[[ 1 ]]
  gene <- toupper( substr( splitted[ 1 ], 2, nchar(
splitted[ 1 ] ) ) )
  syn <- splitted[ 2 ]
  if ( ! is.null( syn ) &&
  length( grep( valid.gene.regexp, gene, perl=T ) ) == 0 &&
  length( grep( valid.gene.regexp, syn, perl=T ) ) == 1
) gene <- syn
  else if ( length( grep( valid.gene.regexp, gene, perl=T,
ignore.case=T ) ) == 0 &&
   length( grep( valid.gene.regexp, syn, perl=T,
ignore.case=T ) ) == 0 ) next
  gene.start[ gene ] <- as.integer( splitted[ 9 ] )
  gene.end[ gene ] <- as.integer( splitted[ 10 ] )
  if ( n.seqs %% 100 == 0 ) cat.new( n.seqs, gene, "|", syn,
"| length=", nchar( seq ),
   gene.end[gene]-gene.start[gene]+1,"\n" )
  if ( ! is.na( syn ) && syn != "" ) gene.names[ gene ] <- syn
  else gene.names[ gene ] <- toupper( gene )
  n.seqs <- n.seqs + 1
  seq <- ""
} else {
  seq <- paste( seq, line, sep="" )
}
  }
  if ( seq != "" && gene != "" ) upstream[ gene ] <- toupper( seq )

 code

Thanks,

Peter Waltman

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to annotate a graph with non-transparent math labels?

2007-01-17 Thread jim holtman
try using strwidth & strheight


x <- c(0,1)
plot(x, x, type='l')
dimensions <- matrix(c(strwidth(expression(theta), cex=5),
                       strheight(expression(theta), cex=5)), nrow=1)
symbols(0.5, 0.5, rectangle=dimensions, bg='white', fg='white',
        add=TRUE, inches=FALSE)
text(0.5, 0.5, expression(theta), cex=5)
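The same strwidth()/strheight() measurements can also drive rect() directly, which avoids the symbols() call and adapts automatically to each label. A sketch under the assumption that it runs after the plot is set up ('textbox' and the padding factor are invented for illustration, not part of any package):

```r
# Hypothetical helper: white out just the area under a label, sized
# from strwidth()/strheight() so it matches the label at any cex.
textbox <- function(x, y, lab, cex = 1, pad = 1.2) {
  w <- strwidth(lab, cex = cex) * pad    # label width in user coordinates
  h <- strheight(lab, cex = cex) * pad   # label height in user coordinates
  rect(x - w/2, y - h/2, x + w/2, y + h/2, col = "white", border = NA)
  text(x, y, lab, cex = cex)
}

plot(c(0, 1), c(0, 1), type = "l")
textbox(0.5, 0.5, expression(theta), cex = 3)
```

Because the rectangle is measured per label, it should also cope with several labels of different sizes on the same plot.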





On 1/17/07, Luis A Escobar <[EMAIL PROTECTED]> wrote:
>
> I do not know the precise language to describe the situation, but here
> is what I want to do.
>
> I need to annotate a graph that contains two or more curves with labels
> that contain math symbols. The label must go on top of the curve.
>
> The problem is that when I annotate the plot, I can see the curve
> behind the label. Here is an example using a simple straight line.
>
> x<-c(0,1)
> plot(x,x,type='l')
> text(0.5,0.5,expression(theta),cex=3)
>
> The line is visible inside the theta symbol. I would like to avoid that.
>
> Following the solution to problem 3.2 in the ISwR book, for a related
> problem, I used the symbols() function to draw a rectangle (it can be a
> circle, etc.) with white bg and white fg and then wrote the label on top
> of that rectangle.
>
> Here is the code
>
> x<-c(0,1)
> plot(x,x,type='l')
> text(0.5,0.5,expression(theta),cex=3)
> xmax<-max(abs(x))
> dimensions<-matrix(c(xmax/24,xmax/24),nrow=1)
> symbols(0.5,0.5,rectangle=dimensions,bg='white',fg='white',add=TRUE,inches=FALSE)
> text(0.5,0.5,expression(theta),cex=3)
>
> It seems to work, but I am not happy with this solution. The symbols
> statement above has fixed dimensions and does not work well with
> multiple labels of different sizes. With multiple curves the symbols
> statement might peel off other curves, axis, etc. Also if one changes
> the size of the label (using cex), the size of the "rectangle" might
> be too big or too small and it has to be adjusted.
>
> What I would like to have is a simple way to write the label over the
> line such that the label only wipes out what is needed to avoid seeing
> the curve behind it. It is as if I need a rectangle with rubbery
> dimensions to contain just the label I am writing.
>
> I searched the FAQ for related problems but I did not find
> anything related.
>
> My R skills are very rudimentary and I would not be surprised if the
> solution is trivial.
>
> Thank you for any help.
>
> Luis Escobar
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coefficient of determination when intercept is zero

2007-01-17 Thread endeitz

I am curious as to the "lm" calculation of R2 (the multiple coefficient of
determination, I assume) when the intercept is zero.  I have 18 data points
and two independent variables:

First, a model with an intercept:

> mod0=lm(Div~Rain+Evap,data=test)
> summary(mod0)$r.squared
[1] 0.6257541
> cor(predict(mod0),test$Div)^2
[1] 0.6257541

The $r.squared and the result from "cor" are the same, as I would expect.

Now we try a model with zero intercept:

> mod1=lm(Div~0+Rain+Evap,data=test)
> summary(mod1)$r.squared
[1] 0.9099358
> cor(predict(mod1),test$Div)^2
[1] 0.5813659

Why has the $r.squared value increased to 0.9?  And now the result from
"cor" is not the same?  Is there a special way to calculate the coefficient
of determination when the intercept is zero?

Cheers,

Ed.
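For reference, when the intercept is suppressed, summary.lm() computes R^2 against the zero model, 1 - RSS/sum(y^2), instead of the usual mean model, 1 - RSS/sum((y - ybar)^2), which is why the value jumps. A sketch with simulated data (the real 'test' data frame isn't shown, so Rain, Evap and Div below are invented):

```r
set.seed(1)
Rain <- runif(18, 0, 100)
Evap <- runif(18, 0, 10)
Div  <- 50 + 0.3 * Rain - 2 * Evap + rnorm(18, sd = 5)
test <- data.frame(Div, Rain, Evap)

mod1 <- lm(Div ~ 0 + Rain + Evap, data = test)
rss  <- sum(resid(mod1)^2)

# summary()'s definition without an intercept: compare to the zero model
1 - rss / sum(test$Div^2)                    # should match summary(mod1)$r.squared
# the usual definition: compare to the mean model
1 - rss / sum((test$Div - mean(test$Div))^2)
```

The two denominators can differ greatly, so neither version of R^2 is comparable across models with and without an intercept.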

-- 
View this message in context: 
http://www.nabble.com/Coefficient-of-determination-when-intercept-is-zero-tf3030776.html#a8420811
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast Removing Duplicates from Every Column

2007-01-17 Thread Bert Jacobs
Hi,

 

Working further on this dataframe : my_data

 

  Col1 Col2 Col3 ... Col 159 Col 160 

 Row 1  0 0 LD ... 0   VD 

 Row 2  HD0 0  0   MD 

 Row 3  0 HDHD 0   LD 

 Row 4  LDHDHD 0   0 

 ......

 LastRowHDHDLD 0   MD

 

Running this line of code:

Test = apply(X=my_data, MARGIN=2, FUN=unique)

 

I get this list:

 

$Col1

[1] "0" "HD" "LD"   

$Col2

[1] "0" "HD"

$Col3

[1] "LD" "0" "HD"

...

$Col159

[1] "0" 

$Col160

[1] "VD" "MD" "LD" "0"

 

Now I was wondering how I can get this list into a data.frame:

because a simple data.frame doesn't work (error: arguments imply differing
number of rows)

 

Can someone help me out on this. Thx

 

So that I get the following result:

         Col1 Col2 Col3 ... Col 159 Col 160 

 Row 1   0    0    LD          0       VD

 Row 2   HD   HD   0           0       MD

 Row 3   LD   0    HD          0       LD

 Row 4   0    0    0           0       0

 

 

 

 

-Original Message-
From: Petr Pikal [mailto:[EMAIL PROTECTED] 
Sent: 05 January 2007 11:51
To: Bert Jacobs; 'R help list'
Subject: Re: [R] Fast Removing Duplicates from Every Column

 

Hi

 

I am not sure if I understand how you want to select unique items.

 

with

 sapply(DF, function(x) !duplicated(x))

you can get a data frame with TRUE when an item in a particular column is 

unique and FALSE otherwise. However, you then need to choose which 

rows to keep or discard

 

e.g.

 

DF[rowSums(sapply(DF, function(x) !duplicated(x)))>1,]

 

selects all rows in which are 2 or more unique values.

 

HTH

Petr

 

 

On 5 Jan 2007 at 9:54, Bert Jacobs wrote:

 

From: "Bert Jacobs" <[EMAIL PROTECTED]>

To:   "'R help list'" 

Date sent:Fri, 5 Jan 2007 09:54:17 +0100

Subject:  Re: [R] Fast Removing Duplicates from Every Column

 

> Hi,

> 

> I'm looking for some lines of code that does the following:

> I have a dataframe with 160 Columns and a number of rows (max 30):

> 

>  Col1 Col2 Col3 ... Col 159 Col 160 

> Row 1 0 0 LD ... 0   VD 

> Row 2 HD0 0  0   MD 

> Row 3 0 HDHD 0   LD 

> Row 4 LDHDHD 0   0 

> ...   ...

> LastRow   HDHDLD 0   MD

> 

> 

> Now I want a dataframe that looks like this. As you see all duplicates

> are removed. Can this dataframe be constructed in a fast way?

> 

>   Col1 Col2 Col3 ... Col 159 Col 160 

> Row 1   00LD   0  VD

> Row 2   HD   HD   00MD

> Row 3   LD   0HD   0LD

> 

> Thx for helping me out.

> Bert

> 

> __

> R-help@stat.math.ethz.ch mailing list

> https://stat.ethz.ch/mailman/listinfo/r-help

> PLEASE do read the posting guide

> http://www.R-project.org/posting-guide.html and provide commented,

> minimal, self-contained, reproducible code.

 

Petr Pikal

[EMAIL PROTECTED]

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R.oo Destructors

2007-01-17 Thread Feng, Ken [CIB-EQTY]

Has anyone figured out how to create a destructor in R.oo?

How I'd like to use it:  I have an object which opens a connection through RODBC
(held as a private member).  It would be nice if the connection closed
automatically (inside the destructor) when the object gets gc()'ed.

Thanks in advance.

Regards,
Ken
BTW, a >BIG< thanks to Henrik Bengtsson for creating the R.oo package!
Lucky for me as I am not smart enough to use S4...

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
I have a very long vector of character strings of the format
"GO:0008104.ISS" and need to strip off the dot and anything that follows
it. There are always 10 characters before the dot. The actual characters
and the number of them after the dot is variable.

So, I would like to return in the format "GO:0008104" . I could do this
with substr and loop over the entire vector, but I thought there might
be a more elegant (and faster) way to do this.

I have tried gsub using regular expressions without success. The code 

gsub(pattern= "\.*?" , replacement="", x=character.vector)

correctly locates the positions in the vector that contain the dot, but
replaces all of the strings with "". Obviously not what I want. Is there
a regular expression for replacement that would accomplish what I want?

Or, does R have a better way to do this?

Thanks,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work, & Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Gabor Grothendieck
Try this:

> gsub("[.].*", "", "GO:0008104.ISS")
[1] "GO:0008104"

On 1/17/07, Kimpel, Mark William <[EMAIL PROTECTED]> wrote:
> I have a very long vector of character strings of the format
> "GO:0008104.ISS" and need to strip off the dot and anything that follows
> it. There are always 10 characters before the dot. The actual characters
> and the number of them after the dot is variable.
>
> So, I would like to return in the format "GO:0008104" . I could do this
> with substr and loop over the entire vector, but I thought there might
> be a more elegant (and faster) way to do this.
>
> I have tried gsub using regular expressions without success. The code
>
> gsub(pattern= "\.*?" , replacement="", x=character.vector)
>
> correctly locates the positions in the vector that contain the dot, but
> replaces all of the strings with "". Obviously not what I want. Is there
> a regular expression for replacement that would accomplish what I want?
>
> Or, does R have a better way to do this?
>
> Thanks,
>
> Mark
>
> Mark W. Kimpel MD
>
>
>
> (317) 490-5129 Work, & Mobile
>
>
>
> (317) 663-0513 Home (no voice mail please)
>
> 1-(317)-536-2730 FAX
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Hong Ooi



substr is vectorised, so it should work fine without needing an explicit
loop.

-- 
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kimpel, Mark
William
Sent: Thursday, 18 January 2007 11:27 AM
To: r-help@stat.math.ethz.ch
Subject: [R] help with regexpr in gsub

I have a very long vector of character strings of the format
"GO:0008104.ISS" and need to strip off the dot and anything that follows
it. There are always 10 characters before the dot. The actual characters
and the number of them after the dot is variable.

So, I would like to return in the format "GO:0008104" . I could do this
with substr and loop over the entire vector, but I thought there might
be a more elegant (and faster) way to do this.

I have tried gsub using regular expressions without success. The code 

gsub(pattern= "\.*?" , replacement="", x=character.vector)

correctly locates the positions in the vector that contain the dot, but
replaces all of the strings with "". Obviously not what I want. Is there
a regular expression for replacement that would accomplish what I want?

Or, does R have a better way to do this?

Thanks,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work, & Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Seth Falcon
"Kimpel, Mark William" <[EMAIL PROTECTED]> writes:

> I have a very long vector of character strings of the format
> "GO:0008104.ISS" and need to strip off the dot and anything that follows
> it. There are always 10 characters before the dot. The actual characters
> and the number of them after the dot is variable.
>
> So, I would like to return in the format "GO:0008104" . I could do this
> with substr and loop over the entire vector, but I thought there might
> be a more elegant (and faster) way to do this.
>
> I have tried gsub using regular expressions without success. The code 
>
> gsub(pattern= "\.*?" , replacement="", x=character.vector)

I guess you want:

sub("([GO:0-9]+)\\..*$", "\\1", goids)

[You don't need gsub here]

But I don't understand why you wouldn't want to use substr.  At least
for me substr looks to be about 20x faster than sub for this
problem...


  > library(GO)
  > goids = ls(GOTERM)
  > gids = paste(goids, "ISS", sep=".")
  > gids[1:10]
   [1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
   [5] "GO:006.ISS" "GO:007.ISS" "GO:009.ISS" "GO:010.ISS"
   [9] "GO:011.ISS" "GO:012.ISS"
  
  > system.time(z <- substr(gids, 0, 10))
 user  system elapsed 
0.008   0.000   0.007 
  > system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
 user  system elapsed 
0.136   0.000   0.134 

+ seth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Marc Schwartz
On Wed, 2007-01-17 at 16:46 -0800, Seth Falcon wrote:
> "Kimpel, Mark William" <[EMAIL PROTECTED]> writes:
> 
> > I have a very long vector of character strings of the format
> > "GO:0008104.ISS" and need to strip off the dot and anything that follows
> > it. There are always 10 characters before the dot. The actual characters
> > and the number of them after the dot is variable.
> >
> > So, I would like to return in the format "GO:0008104" . I could do this
> > with substr and loop over the entire vector, but I thought there might
> > be a more elegant (and faster) way to do this.
> >
> > I have tried gsub using regular expressions without success. The code 
> >
> > gsub(pattern= "\.*?" , replacement="", x=character.vector)
> 
> I guess you want:
> 
> sub("([GO:0-9]+)\\..*$", "\\1", goids)
> 
> [You don't need gsub here]
> 
> But I don't understand why you wouldn't want to use substr.  At least
> for me substr looks to be about 20x faster than sub for this
> problem...
> 
> 
>   > library(GO)
>   > goids = ls(GOTERM)
>   > gids = paste(goids, "ISS", sep=".")
>   > gids[1:10]
>[1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
>[5] "GO:006.ISS" "GO:007.ISS" "GO:009.ISS" "GO:010.ISS"
>[9] "GO:011.ISS" "GO:012.ISS"
>   
>   > system.time(z <- substr(gids, 0, 10))
>  user  system elapsed 
> 0.008   0.000   0.007 
>   > system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
>  user  system elapsed 
> 0.136   0.000   0.134 

I think that some of the overhead here in using sub() is due to the
effective partitioning of the source vector, a more complex regex and
then just returning the first element.

This can be shortened to:

# Note that I have 12 elements here
> gids
 [1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
 [5] "GO:005.ISS" "GO:006.ISS" "GO:007.ISS" "GO:008.ISS"
 [9] "GO:009.ISS" "GO:010.ISS" "GO:011.ISS" "GO:012.ISS"

> system.time(z2 <- sub("\\..+", "", gids))
[1] 0 0 0 0 0

> z2
 [1] "GO:001" "GO:002" "GO:003" "GO:004" "GO:005"
 [6] "GO:006" "GO:007" "GO:008" "GO:009" "GO:010"
[11] "GO:011" "GO:012"


Which would appear to be quicker than using substr().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast Removing Duplicates from Every Column

2007-01-17 Thread jim holtman
Here is one way of doing it by 'padding' all the elements to the same
length:

>  x <- "Col1 Col2 Col3  Col159 Col160
+  Row1  0 0 LD  0   VD
+  Row2  HD0 0  0   MD
+  Row3  0 HDHD 0   LD
+  Row4  LDHDHD 0   0
+  LastRowHDHDLD 0   MD"
>  input <- read.table(textConnection(x), header=TRUE)
> Uniq <- apply(input, 2, unique)
> # find maximum length of an element
> maxLen <- max(sapply(Uniq, length))
> # pad with '0' all element to maxLen
> Uniq <- lapply(Uniq, function(x){
+ c(x, rep('0', maxLen - length(x)))
+ })
> as.data.frame(Uniq)
  Col1 Col2 Col3 Col159 Col160
100   LD  0 VD
2   HD   HD0  0 MD
3   LD0   HD  0 LD
4000  0  0
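An alternative to padding with '0' is to pad with NA via `length<-`, which keeps it clear which cells are fill. A sketch assuming the same 'input' data frame as above:

```r
Uniq   <- apply(input, 2, unique)           # list: one vector per column
maxLen <- max(sapply(Uniq, length))
padded <- lapply(Uniq, "length<-", maxLen)  # short vectors get trailing NAs
as.data.frame(padded, stringsAsFactors = FALSE)
```

Padding with NA rather than a data value avoids manufacturing extra "0" entries that were never in the original columns.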



On 1/17/07, Bert Jacobs <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>
>
> Working further on this dataframe : my_data
>
>
>
>  Col1 Col2 Col3 ... Col 159 Col 160
>
> Row 1  0 0 LD ... 0   VD
>
> Row 2  HD0 0  0   MD
>
> Row 3  0 HDHD 0   LD
>
> Row 4  LDHDHD 0   0
>
> ......
>
> LastRowHDHDLD 0   MD
>
>
>
> Running this line of code:
>
> Test = apply(X=my_data, MARGIN=2, FUN=unique)
>
>
>
> I get this list:
>
>
>
> $Col1
>
> [1] "0" "HD" "LD"
>
> $Col2
>
> [1] "0" "HD"
>
> $Col3
>
> [1] "LD" "0" "HD"
>
> ...
>
> $Col159
>
> [1] "0"
>
> $Col160
>
> [1] "VD" "MD" "LD" "0"
>
>
>
> Now I was wondering how I can get this list into a data.frame:
>
> because a simple data.frame doesn't work (error: arguments imply differing
> number of rows)
>
>
>
> Can someone help me out on this. Thx
>
>
>
> So that I get the following result:
>
>   Col1 Col2 Col3 ... Col 159 Col 160
>
> Row 1   0   0LD   0VD
>
> Row 2 HD   HD   00MD
>
> Row 3 LD   0HD   0LD
>
> Row 4  00000
>
>
>
>
>
>
>
>
>
> -Original Message-
> From: Petr Pikal [mailto:[EMAIL PROTECTED]
> Sent: 05 January 2007 11:51
> To: Bert Jacobs; 'R help list'
> Subject: Re: [R] Fast Removing Duplicates from Every Column
>
>
>
> Hi
>
>
>
> I am not sure if I understand how do you want to select unique items.
>
>
>
> with
>
> sapply(DF, function(x) !duplicated(x))
>
> you can get data frame with TRUE when an item in particular column is
>
> unique and FALSE in opposite. However then you need to choose which
>
> rows to keep or discard
>
>
>
> e.g.
>
>
>
> DF[rowSums(sapply(DF, function(x) !duplicated(x)))>1,]
>
>
>
> selects all rows in which are 2 or more unique values.
>
>
>
> HTH
>
> Petr
>
>
>
>
>
> On 5 Jan 2007 at 9:54, Bert Jacobs wrote:
>
>
>
> From: "Bert Jacobs" <[EMAIL PROTECTED]>
>
> To:   "'R help list'" 
>
> Date sent:Fri, 5 Jan 2007 09:54:17 +0100
>
> Subject:  Re: [R] Fast Removing Duplicates from Every Column
>
>
>
> > Hi,
>
> >
>
> > I'm looking for some lines of code that does the following:
>
> > I have a dataframe with 160 Columns and a number of rows (max 30):
>
> >
>
> >  Col1 Col2 Col3 ... Col 159 Col 160
>
> > Row 1 0 0 LD ... 0   VD
>
> > Row 2 HD0 0  0   MD
>
> > Row 3 0 HDHD 0   LD
>
> > Row 4 LDHDHD 0   0
>
> > ...   ...
>
> > LastRow   HDHDLD 0   MD
>
> >
>
> >
>
> > Now I want a dataframe that looks like this. As you see all duplicates
>
> > are removed. Can this dataframe be constructed in a fast way?
>
> >
>
> >   Col1 Col2 Col3 ... Col 159 Col 160
>
> > Row 1   00LD   0  VD
>
> > Row 2   HD   HD   00MD
>
> > Row 3   LD   0HD   0LD
>
> >
>
> > Thx for helping me out.
>
> > Bert
>
> >
>
> > __
>
> > R-help@stat.math.ethz.ch mailing list
>
> > https://stat.ethz.ch/mailman/listinfo/r-help
>
> > PLEASE do read the posting guide
>
> > http://www.R-project.org/posting-guide.html and provide commented,
>
> > minimal, self-contained, reproducible code.
>
>
>
> Petr Pikal
>
> [EMAIL PROTECTED]
>
>
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R.oo Destructors

2007-01-17 Thread Henrik Bengtsson
Hi,

I'm about to head out of the office for the next 48 hours, but the short
answer is to override the finalize() method in your subclass of
Object.  This will be called when R garbage-collects the object.  From
?finalize.Object:

 setConstructorS3("MyClass", function() {
extend(Object(), "MyClass")
  })

  setMethodS3("finalize", "MyClass", function(this) {
cat(as.character(this), "is about to be removed from the memory!\n")
  })

  o <- MyClass()
  o <- MyClass()
  o <- MyClass()
  o <- MyClass()
  gc()

  ## Not run:
MyClass: 0x31543776 is about to be removed from the memory!
MyClass: 0x31772572 is about to be removed from the memory!
MyClass: 0x32553244 is about to be removed from the memory!
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 246049  6.6 407500 10.9   35  9.4
Vcells 132538  1.1 786432  6.0   404358  3.1

rm(o)
gc()

MyClass: 0x31424196 is about to be removed from the memory!
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 246782  6.6 467875 12.5   354145  9.5
Vcells 110362  0.9 786432  6.0   404358  3.1
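Applied to the RODBC case in the original question, the pattern might look like the following sketch ('DbHandle' and the "mydsn" DSN are invented names, and RODBC must be installed for the odbc calls to exist):

```r
library(R.oo)

setConstructorS3("DbHandle", function(dsn = NULL) {
  # open the connection when the object is created
  ch <- if (!is.null(dsn)) RODBC::odbcConnect(dsn) else NULL
  extend(Object(), "DbHandle", .channel = ch)
})

setMethodS3("finalize", "DbHandle", function(this, ...) {
  # called by R.oo when the object is garbage collected
  if (!is.null(this$.channel)) {
    RODBC::odbcClose(this$.channel)
    this$.channel <- NULL
  }
})

# h <- DbHandle("mydsn"); rm(h); gc()   # connection closed in finalize()
```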

Hope this helps

Henrik

On 1/18/07, Feng, Ken [CIB-EQTY] <[EMAIL PROTECTED]> wrote:
>
> Has anyone figured out how to create a destructor in R.oo?
>
> How I'd like to use it:  I have an object which opens a connection thru RODBC
> (held as a private member) It would be nice if the connection closes 
> automatically
> (inside the destructor) when an object gets gc()'ed.
>
> Thanks in advance.
>
> Regards,
> Ken
> BTW, a >BIG< thanks to Henrik Bengtsson for creating the R.oo package!
> Lucky for me as I am not smart enough to use S4...
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
Thanks for 6 ways to skin this cat! I am just beginning to learn about
the power of regular expressions and appreciate the many examples of how
they can be used in this context. This knowledge will come in handy the
next time the number of characters is variable both before and after the
dot. On my machine and for my particular example, however, Seth is
correct in that substr is by far the fastest. I had forgotten that
substr is vectorized.

Below is the output of my speed trials and sessionInfo in case anyone is
curious. I artificially made the go.id vector 10X its normal length to
magnify differences. I did also check to verify that each solution
worked as predicted, which they all did.

Thanks again for your generous help, Mark

> length(go.ids)
[1] 79750
> go.ids[1:5]
[1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"
"GO:0006091.NAS"
> system.time(z <- gsub("[.].*", "", go.ids))
[1] 0.47 0.00 0.47   NA   NA
> system.time(z <- gsub('\\..+$','', go.ids))
[1] 0.56 0.00 0.56   NA   NA
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
[1] 1.08 0.00 1.09   NA   NA
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
[1] 1.03 0.00 1.03   NA   NA
> system.time(z <- sub("\\..+", "", go.ids))
[1] 0.49 0.00 0.48   NA   NA
> system.time(z <- substr(go.ids, 0, 10))
[1] 0.02 0.00 0.01   NA   NA
> sessionInfo()
R version 2.4.1 (2006-12-18) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "splines"   "stats" "graphics"  "grDevices" "datasets"  "utils"
"tools" "methods"   "base" 

other attached packages:
rat2302 xlsReadWritePro  qvalue   affycoretools
biomaRt   RCurl XML GOstatsCategory 
   "1.14.0" "1.0.6" "1.8.0" "1.6.0"
"1.8.1" "0.8-0" "1.2-0" "2.0.4" "2.0.3" 
 genefiltersurvivalKEGGRBGL
annotate  GO   graph RWinEdt   limma

   "1.12.0"  "2.30""1.14.1""1.10.0"
"1.12.1""1.14.1""1.12.0" "1.7-5" "2.9.1"

   affy  affyio Biobase 
   "1.12.2" "1.2.0""1.12.2"

Mark W. Kimpel MD 

 

(317) 490-5129 Work, & Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-Original Message-
From: Marc Schwartz [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 17, 2007 8:11 PM
To: Seth Falcon
Cc: Kimpel, Mark William; r-help@stat.math.ethz.ch
Subject: Re: [R] help with regexpr in gsub

On Wed, 2007-01-17 at 16:46 -0800, Seth Falcon wrote:
> "Kimpel, Mark William" <[EMAIL PROTECTED]> writes:
> 
> > I have a very long vector of character strings of the format
> > "GO:0008104.ISS" and need to strip off the dot and anything that
follows
> > it. There are always 10 characters before the dot. The actual
characters
> > and the number of them after the dot is variable.
> >
> > So, I would like to return in the format "GO:0008104" . I could do
this
> > with substr and loop over the entire vector, but I thought there
might
> > be a more elegant (and faster) way to do this.
> >
> > I have tried gsub using regular expressions without success. The
code 
> >
> > gsub(pattern= "\.*?" , replacement="", x=character.vector)
> 
> I guess you want:
> 
> sub("([GO:0-9]+)\\..*$", "\\1", goids)
> 
> [You don't need gsub here]
> 
> But I don't understand why you wouldn't want to use substr.  At least
> for me substr looks to be about 20x faster than sub for this
> problem...
> 
> 
>   > library(GO)
>   > goids = ls(GOTERM)
>   > gids = paste(goids, "ISS", sep=".")
>   > gids[1:10]
>[1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS"
"GO:004.ISS"
>[5] "GO:006.ISS" "GO:007.ISS" "GO:009.ISS"
"GO:010.ISS"
>[9] "GO:011.ISS" "GO:012.ISS"
>   
>   > system.time(z <- substr(gids, 0, 10))
>  user  system elapsed 
> 0.008   0.000   0.007 
>   > system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
>  user  system elapsed 
> 0.136   0.000   0.134 

I think that some of the overhead here in using sub() is due to the
effective partitioning of the source vector, a more complex regex and
then just returning the first element.

This can be shortened to:

# Note that I have 12 elements here
> gids
 [1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
 [5] "GO:005.ISS" "GO:006.ISS" "GO:007.ISS" "GO:008.ISS"
 [9] "GO:009.ISS" "GO:010.ISS" "GO:011.ISS" "GO:012.ISS"

> system.time(z2 <- sub("\\..+", "", gids))
[1] 0 0 0 0 0

> z2
 [1] "GO:001" "GO:002" "GO:003" "GO:004" "GO:005"
 [6] "GO:006" "GO:007" "GO:008" "GO:009" "GO:010"
[11] "GO:011" "GO:012"


Which would appe

Re: [R] Memory leak with character arrays?

2007-01-17 Thread jim holtman
What does the FASTA header look like?  You are using 'gene' to access things
in the array and if (for example) 'gene' is a character vector of 10, then
for every element of vectors that you are using (I count about 4-5 that use
this index) then you are going to have at least 550 * 6000 * 5 * 10 more
bytes (165MB) used just to store the names of the elements.

You are also dynamically increasing the size of the vectors which means a
lot of copying of the objects and therefore using a lot of memory that is
probably fragmenting your memory.

So if you look at all these vectors, how many of them will contain data?
What you might want to do is to preprocess the data (pass 1) to find out how
many 'gene's there are and then create a factor from this.  You can then
statically allocate the vectors and use the numeric value of the factor to
index into the vector.


So you might have fragmentation (that seems to be what your 'ps' command is
showing).  So it looks like a two-pass process: 1) determine how many genes
you have and statically allocate, 2) go through the data and use the
'factor' to index into the vectors.
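
A minimal sketch of the two-pass, preallocated approach described above (an editorial illustration with hypothetical toy data, not code from the thread):

```r
# Toy stand-in for the FASTA file: headers start with ">"
lines <- c(">geneA", "ACGT", "ACGT", ">geneB", "TTGG")

hdr <- grepl("^>", lines)                 # pass 1: locate the headers
gene.names <- sub("^>", "", lines[hdr])   # all gene names known up front
grp <- cumsum(hdr)                        # map every line to its record

# pass 2: fill a result that is allocated once, at its final size
seqs <- tapply(lines[!hdr], grp[!hdr], paste, collapse = "")
names(seqs) <- gene.names                 # group order follows header order

seqs[["geneA"]]                           # "ACGTACGT"
```

Because the result is allocated once at full size, none of the repeated copying triggered by growing `upstream[gene] <- ...` element-by-element takes place.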

On 1/17/07, Peter Waltman <[EMAIL PROTECTED]> wrote:
>
> Hi -
>
> When I'm trying to read in a text file into a labeled character array,
> the memory stamp/footprint of R will exceed 4 gigs or more.  I've seen
> this behavior on Mac OS X, Linux for AMD_64 and X86_64., and the R
> versions are 2.4, 2.4 and 2.2, respectively.  So, it would seem that
> this is platform and R version independant.
>
> The file that I'm reading contains the upstream regions of the yeast
> genome, with each upstream region labeled using a FASTA header, i.e.:
>
>FASTA header for gene 1
>upstream region.
>.
>
>FASTA header for gene 2
>upstream
>
>
> The script I use - code below - opens the file, parses for a FASTA
> header, and then parses the header for the gene name.  Once this is
> done, it reads the following lines which contain the upstream region,
> and then adds it as an item to the character array, using the gene name
> as the name of the item it adds.  And then continues on to the following
> genes.
>
> Each upstream region (the text to be added) is 550 bases (characters)
> long.  With ~6000 genes in the file I'm reading it, this would be 550 *
> 6000 * 8 (if we're using ascii chars) ~= 25 Megs (if we're using ascii
> chars).
>
> I realize that the character arrays/vectors will have a higher memory
> stamp b/c they are a named array and most likely aren't storing the text
> as ascii, but 4 gigs and up seems a bit excessive.  Or is it?
>
> For an example, this is the output of top, at the point which R has
> processed around 5000 genes:
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 4969 waltman   18   0 *6746m 3.4g*  920 D  2.7 88.2  19:09.19 R
>
> Is this expected behavior?  Can anyone recommend a less memory intensive
> way to store this data?  The relevant code that reads in the file follows:
>
> code
> lines <- readLines( gzfile( seqs.fname ) )
>
>  n.seqs <- 0
>
>  upstream <- gene.names <- character()
>  syn <- character( 0 )
>  gene.start <- gene.end <- integer()
>  gene <- seq <- ""
>
>
>  for ( i in 1:length( lines ) ) {
>line <- lines[ i ]
>if ( line == "" ) next
>if ( substr( line, 1, 1 ) == ">" ) {
>
>  if ( seq != "" && gene != "" ) upstream[ gene ] <-
>toupper( seq )
>  splitted <- strsplit( line, "\t" )[[ 1 ]]
>  splitted <- strsplit( splitted[ 1 ], ";\\ " )[[ 1 ]]
>  gene <- toupper( substr( splitted[ 1 ], 2, nchar(
>splitted[ 1 ] ) ) )
>  syn <- splitted[ 2 ]
>  if ( ! is.null( syn ) &&
>  length( grep( valid.gene.regexp, gene, perl=T ) ) == 0 &&
>  length( grep( valid.gene.regexp, syn, perl=T ) ) == 1
>) gene <- syn
>  else if ( length( grep( valid.gene.regexp, gene, perl=T,
>ignore.case=T ) ) == 0 &&
>   length( grep( valid.gene.regexp, syn, perl=T,
>ignore.case=T ) ) == 0 ) next
>  gene.start[ gene ] <- as.integer( splitted[ 9 ] )
>  gene.end[ gene ] <- as.integer( splitted[ 10 ] )
>  if ( n.seqs %% 100 == 0 ) cat.new( n.seqs, gene, "|", syn,
>"| length=", nchar( seq ),
>   gene.end[gene]-gene.start[gene]+1,"\n" )
>  if ( ! is.na( syn ) && syn != "" ) gene.names[ gene ] <- syn
>  else gene.names[ gene ] <- toupper( gene )
>  n.seqs <- n.seqs + 1
>  seq <- ""
>} else {
>  seq <- paste( seq, line, sep="" )
>}
>  }
>  if ( seq != "" && gene != "" ) upstream[ gene ] <- toupper( seq )
>
> code
>
> Thanks,
>
> Peter Waltman
>
> __
> R-help@stat.math

Re: [R] Memory leak with character arrays?

2007-01-17 Thread Peter Waltman

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coefficient of determination when intercept is zero

2007-01-17 Thread Prof Brian Ripley
This is documented on ?summary.lm.

It is not that 'intercept is zero' or 'zero intercept', it is that there 
is no intercept term in the model.
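
The point can be checked numerically: per ?summary.lm, when the model has no intercept term the total sum of squares is taken about zero rather than about mean(y) (a small sketch of my own, not part of the original reply):

```r
set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20)

fit <- lm(y ~ 0 + x)                  # no intercept term in the model
rss <- sum(residuals(fit)^2)

# denominator is sum(y^2), NOT sum((y - mean(y))^2)
r2 <- 1 - rss / sum(y^2)
stopifnot(all.equal(r2, summary(fit)$r.squared))
```

This is why the reported R-squared can exceed the squared correlation between fitted and observed values when the intercept is dropped.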

On Wed, 17 Jan 2007, endeitz wrote:

>
> I am curious as to the "lm" calculation of R2 (multiple coefficient of
> determination, I assume) when intercept is zero.  I have 18 data points, two
> independent variables:
>
> First, a model with an intercept:
>
>> mod0=lm(Div~Rain+Evap,data=test)
>> summary(mod0)$r.squared
> [1] 0.6257541
>> cor(predict(mod0),test$Div)^2
> [1] 0.6257541
>
> The $r.squared and the result from "cor" are the same, as I would expect.
>
> Now we try a model with zero intercept:
>
>> mod1=lm(Div~0+Rain+Evap,data=test)
>> summary(mod1)$r.squared
> [1] 0.9099358
>> cor(predict(mod1),test$Div)^2
> [1] 0.5813659
>
> Why has the $r.squared value increased to 0.9?  And now the result from
> "cor" is not the same?  Is there a special way to calculate the coefficient
> of determination when the intercept is zero?
>
> Cheers,
>
> Ed.
>
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to get the index of entry with max value in an array?

2007-01-17 Thread Feng Qiu
Hi all:
 A short question:
 For example, a=[3,4,6,2,3]: obviously the 3rd entry of the array has
the maximum value, and what I want is the index of that maximum: 3. Is there a neat
expression to get this index?

Thank you!

Best,

Feng


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get the index of entry with max value in an array?

2007-01-17 Thread talepanda
In R language, one solution is:

a<-c(3,4,6,2,3)
which(a==max(a))


On 1/18/07, Feng Qiu <[EMAIL PROTECTED]> wrote:
> Hi all:
>  A short question:
>  For example,  a=[3,4,6,2,3], obviously the 3rd entry of the array
> has the maxium value, what I want is index of the maxium value: 3. is there
> a neat expression to get this index?
>
> Thank you!
>
> Best,
>
> Feng
>
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get the index of entry with max value in an array?

2007-01-17 Thread Christos Hatzis
Or
which.max(a)

-Christos 
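
Worth noting (an editorial aside, not from the thread): the two answers differ when the maximum is tied. `which.max` returns only the first maximizing index, while `which(a == max(a))` returns all of them:

```r
a <- c(3, 4, 6, 2, 6)          # note the tie for the maximum
which.max(a)                   # 3    -- first maximizer only
which(a == max(a))             # 3 5  -- every maximizer
```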

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of talepanda
Sent: Wednesday, January 17, 2007 11:45 PM
To: Feng Qiu
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] how to get the index of entry with max value in an array?

In R language, one solution is:

a<-c(3,4,6,2,3)
which(a==max(a))


On 1/18/07, Feng Qiu <[EMAIL PROTECTED]> wrote:
> Hi all:
>  A short question:
>  For example,  a=[3,4,6,2,3], obviously the 3rd entry of the 
> array has the maxium value, what I want is index of the maxium value: 
> 3. is there a neat expression to get this index?
>
> Thank you!
>
> Best,
>
> Feng
>
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Prof Brian Ripley
One thing to watch with experiments like this is that the locale will 
matter.  Character operations will be faster in a single-byte locale (as 
used here) than in a variable-byte locale (and I suspect Seth and Marc 
used UTF-8), and the relative speeds may alter.  Also, the PCRE regexps 
are often much faster, and 'useBytes' can be much faster with ASCII data 
in UTF-8.

For example:

# R-devel, x86_64 Linux
library(GO)
goids <- ls(GOTERM)
gids <- paste(goids, "ISS", sep=".")
go.ids <- rep(gids, 10)
> length(go.ids)
[1] 205950

# In en_GB (single byte)

> system.time(z <- gsub("[.].*", "", go.ids))
user  system elapsed
   1.709   0.004   1.716
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
user  system elapsed
   0.241   0.004   0.246

> system.time(z <- gsub('\\..+$','', go.ids))
user  system elapsed
   2.254   0.018   2.286
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
user  system elapsed
   2.890   0.002   2.895
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
user  system elapsed
   2.716   0.002   2.721
> system.time(z <- sub("\\..+", "", go.ids))
user  system elapsed
   1.724   0.001   1.725
> system.time(z <- substr(go.ids, 0, 10))
user  system elapsed
   0.084   0.000   0.084

# in en_GB.utf8

> system.time(z <- gsub("[.].*", "", go.ids))
user  system elapsed
   1.689   0.020   1.712
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
user  system elapsed
   0.718   0.017   0.736
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE, useByte=TRUE))
user  system elapsed
   0.243   0.001   0.244

> system.time(z <- gsub('\\..+$','', go.ids))
user  system elapsed
   2.509   0.024   2.537
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
user  system elapsed
   3.772   0.004   3.779
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
user  system elapsed
   4.088   0.007   4.099
> system.time(z <- sub("\\..+", "", go.ids))
user  system elapsed
   1.920   0.004   1.927
> system.time(z <- substr(go.ids, 0, 10))
user  system elapsed
   0.096   0.002   0.098

substr still wins, but by a much smaller margin.


On Wed, 17 Jan 2007, Kimpel, Mark William wrote:

> Thanks for 6 ways to skin this cat! I am just beginning to learn about
> the power of regular expressions and appreciate the many examples of how
> they can be used in this context. This knowledge will come in handy the
> next time the number of characters is variable both before and after the
> dot. On my machine and for my particular example, however, Seth is
> correct in that substr is by far the fastest. I had forgotten that
> substr is vectorized.
>
> Below is the output of my speed trials and sessionInfo in case anyone is
> curious. I artificially made the go.id vector 10X its normal length to
> magnify differences. I did also check to verify that each solution
> worked as predicted, which they all did.
>
> Thanks again for your generous help, Mark
>
> length(go.ids)
> [1] 79750
>> go.ids[1:5]
> [1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"
> "GO:0006091.NAS"
>> system.time(z <- gsub("[.].*", "", go.ids))
> [1] 0.47 0.00 0.47   NA   NA
>> system.time(z <- gsub('\\..+$','', go.ids))
> [1] 0.56 0.00 0.56   NA   NA
>> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
> [1] 1.08 0.00 1.09   NA   NA
>> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
> [1] 1.03 0.00 1.03   NA   NA
>> system.time(z <- sub("\\..+", "", go.ids))
> [1] 0.49 0.00 0.48   NA   NA
>> system.time(z <- substr(go.ids, 0, 10))
> [1] 0.02 0.00 0.01   NA   NA
>> sessionInfo()
> R version 2.4.1 (2006-12-18)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "splines"   "stats" "graphics"  "grDevices" "datasets"  "utils"
> "tools" "methods"   "base"
>
> other attached packages:
>rat2302 xlsReadWritePro  qvalue   affycoretools
> biomaRt   RCurl XML GOstatsCategory
>   "1.14.0" "1.0.6" "1.8.0" "1.6.0"
> "1.8.1" "0.8-0" "1.2-0" "2.0.4" "2.0.3"
> genefiltersurvivalKEGGRBGL
> annotate  GO   graph RWinEdt   limma
>
>   "1.12.0"  "2.30""1.14.1""1.10.0"
> "1.12.1""1.14.1""1.12.0" "1.7-5" "2.9.1"
>
>   affy  affyio Biobase
>   "1.12.2" "1.2.0""1.12.2"
>
> Mark W. Kimpel MD
>
>
>
> (317) 490-5129 Work, & Mobile
>
>
>
> (317) 663-0513 Home (no voice mail please)
>
> 1-(317)-536-2730 FAX
>
>
> -Original Message-
> From: Marc Schwartz [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 17, 2007 8:11 PM
> To: Seth Falcon
> Cc: Kimpel, Mark William; r

Re: [R] how to get the index of entry with max value in an array?

2007-01-17 Thread Prof Brian Ripley
?which.max
?which.is.max

On Wed, 17 Jan 2007, Feng Qiu wrote:

> Hi all:
> A short question:
> For example,  a=[3,4,6,2,3], obviously the 3rd entry of the array has 
> the maxium value, what I want is index of the maxium value: 3. is there a 
> neat expression to get this index?
>
> Thank you!
>
> Best,
>
> Feng
>
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get the index of entry with max value in an array?

2007-01-17 Thread jim holtman
which.max

> a <- sample(1:10,10)
> a
 [1]  3  4  5  7  2  8  9  6 10  1
> which.max(a)
[1] 9
>



On 1/17/07, Feng Qiu <[EMAIL PROTECTED]> wrote:
>
> Hi all:
> A short question:
> For example,  a=[3,4,6,2,3], obviously the 3rd entry of the array
> has the maxium value, what I want is index of the maxium value: 3. is there
> a neat expression to get this index?
>
> Thank you!
>
> Best,
>
> Feng
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
Thanks Brian, that advice may help speed up my regexp operations in the
future. The computer science advice offered by those of you who are more
expert is appreciated by we biologists who are primarily working more at
the level of bioinformatics. Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work, & Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 17, 2007 11:49 PM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]; Seth Falcon; r-help@stat.math.ethz.ch
Subject: Re: [R] help with regexpr in gsub

One thing to watch with experiments like this is that the locale will 
matter.  Character operations will be faster in a single-byte locale (as

used here) than in a variable-byte locale (and I suspect Seth and Marc 
used UTF-8), and the relative speeds may alter.  Also, the PCRE regexps 
are often much faster, and 'useBytes' can be much faster with ASCII data

in UTF-8.

For example:

# R-devel, x86_64 Linux
library(GO)
goids <- ls(GOTERM)
gids <- paste(goids, "ISS", sep=".")
go.ids <- rep(gids, 10)
> length(go.ids)
[1] 205950

# In en_GB (single byte)

> system.time(z <- gsub("[.].*", "", go.ids))
user  system elapsed
   1.709   0.004   1.716
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
user  system elapsed
   0.241   0.004   0.246

> system.time(z <- gsub('\\..+$','', go.ids))
user  system elapsed
   2.254   0.018   2.286
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
user  system elapsed
   2.890   0.002   2.895
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
user  system elapsed
   2.716   0.002   2.721
> system.time(z <- sub("\\..+", "", go.ids))
user  system elapsed
   1.724   0.001   1.725
> system.time(z <- substr(go.ids, 0, 10))
user  system elapsed
   0.084   0.000   0.084

# in en_GB.utf8

> system.time(z <- gsub("[.].*", "", go.ids))
user  system elapsed
   1.689   0.020   1.712
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
user  system elapsed
   0.718   0.017   0.736
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE, useByte=TRUE))
user  system elapsed
   0.243   0.001   0.244

> system.time(z <- gsub('\\..+$','', go.ids))
user  system elapsed
   2.509   0.024   2.537
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
user  system elapsed
   3.772   0.004   3.779
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
user  system elapsed
   4.088   0.007   4.099
> system.time(z <- sub("\\..+", "", go.ids))
user  system elapsed
   1.920   0.004   1.927
> system.time(z <- substr(go.ids, 0, 10))
user  system elapsed
   0.096   0.002   0.098

substr still wins, but by a much smaller margin.


On Wed, 17 Jan 2007, Kimpel, Mark William wrote:

> Thanks for 6 ways to skin this cat! I am just beginning to learn about
> the power of regular expressions and appreciate the many examples of
how
> they can be used in this context. This knowledge will come in handy
the
> next time the number of characters is variable both before and after
the
> dot. On my machine and for my particular example, however, Seth is
> correct in that substr is by far the fastest. I had forgotten that
> substr is vectorized.
>
> Below is the output of my speed trials and sessionInfo in case anyone
is
> curious. I artificially made the go.id vector 10X its normal length to
> magnify differences. I did also check to verify that each solution
> worked as predicted, which they all did.
>
> Thanks again for your generous help, Mark
>
> length(go.ids)
> [1] 79750
>> go.ids[1:5]
> [1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"
> "GO:0006091.NAS"
>> system.time(z <- gsub("[.].*", "", go.ids))
> [1] 0.47 0.00 0.47   NA   NA
>> system.time(z <- gsub('\\..+$','', go.ids))
> [1] 0.56 0.00 0.56   NA   NA
>> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
> [1] 1.08 0.00 1.09   NA   NA
>> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
> [1] 1.03 0.00 1.03   NA   NA
>> system.time(z <- sub("\\..+", "", go.ids))
> [1] 0.49 0.00 0.48   NA   NA
>> system.time(z <- substr(go.ids, 0, 10))
> [1] 0.02 0.00 0.01   NA   NA
>> sessionInfo()
> R version 2.4.1 (2006-12-18)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "splines"   "stats" "graphics"  "grDevices" "datasets"
"utils"
> "tools" "methods"   "base"
>
> other attached packages:
>rat2302 xlsReadWritePro  qvalue   affycoretools
> biomaRt   RCurl XML GOstats
Category
>   "1.14.0" "1.0.6" "1.8.0" "1.6.0"
> "1.8.1" "0.8-0" "1.2-0" "2.0.4"
"2.0.3"
> genefiltersurvivalKEGGRBGL
> annotate   

Re: [R] how to get the index of entry with max value in an array?

2007-01-17 Thread Benilton Carvalho
which.max()
b

On Jan 17, 2007, at 11:20 PM, Feng Qiu wrote:

> Hi all:
>  A short question:
>  For example,  a=[3,4,6,2,3], obviously the 3rd entry of  
> the array has the maxium value, what I want is index of the maxium  
> value: 3. is there a neat expression to get this index?
>
> Thank you!
>
> Best,
>
> Feng

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coefficient of determination when intercept is zero

2007-01-17 Thread endeitz

Thanks for the guidance.  With the help of your explanation, I was able to
find this reference that provides a good explanation of why the definition
may be different in the "no intercept" or "regression through origin" case. 
For future readers of this list, the reference is:

Joseph G. Eisenhauer (2003)
Regression through the Origin
Teaching Statistics 25 (3), 76–80.
doi:10.1111/1467-9639.00136

Cheers,

Ed.



Prof Brian Ripley wrote:
> 
> This is documented on ?summary.lm.
> 
> It is not that 'intercept is zero' or 'zero intercept', it is that there 
> is no intercept term in the model.
> 
> On Wed, 17 Jan 2007, endeitz wrote:
> 
>>
>> I am curious as to the "lm" calculation of R2 (multiple coefficient of
>> determination, I assume) when intercept is zero.  I have 18 data points,
>> two
>> independent variables:
>>
>> First, a model with an intercept:
>>
>>> mod0=lm(Div~Rain+Evap,data=test)
>>> summary(mod0)$r.squared
>> [1] 0.6257541
>>> cor(predict(mod0),test$Div)^2
>> [1] 0.6257541
>>
>> The $r.squared and the result from "cor" are the same, as I would expect.
>>
>> Now we try a model with zero intercept:
>>
>>> mod1=lm(Div~0+Rain+Evap,data=test)
>>> summary(mod1)$r.squared
>> [1] 0.9099358
>>> cor(predict(mod1),test$Div)^2
>> [1] 0.5813659
>>
>> Why has the $r.squared value increased to 0.9?  And now the result from
>> "cor" is not the same?  Is there a special way to calculate the
>> coefficient
>> of determination when the intercept is zero?
>>
>> Cheers,
>>
>> Ed.
>>
>>
> 
> -- 
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Coefficient-of-determination-when-intercept-is-zero-tf3030776.html#a8425379
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 3D Plot

2007-01-17 Thread Arun Kumar Saha
Hi all R users,

I want to draw a 3D plot, where the Z-axis will represent the normal
densities each with zero mean but different volatilities, Y-axis will
represent the SD [volatilities), and X-axis will represent time at which
these SD are calculated.

Can anyone give me any clue? Your help will be highly appreciated.
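
One way to draw such a surface (an editorial sketch with assumed toy volatilities, not an answer from the archive) is to evaluate the densities on a grid and pass the resulting matrix to persp():

```r
sd.t  <- seq(0.5, 2, length.out = 30)   # hypothetical volatilities over time
times <- seq_along(sd.t)                # time index at which each SD applies
x <- seq(-4, 4, length.out = 50)        # grid on which to evaluate densities

# z[i, j] holds the N(0, sd.t[j]) density evaluated at x[i]
z <- sapply(sd.t, function(s) dnorm(x, mean = 0, sd = s))

persp(x, times, z, theta = 30, phi = 25,
      xlab = "x", ylab = "time", zlab = "density")
```

lattice::wireframe or the rgl package offer interactive alternatives to base persp().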

Thanks and regards,
Arun

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] selecting rows for inclusion in lm

2007-01-17 Thread John Sorkin
I am having trouble selecting rows of a dataframe that will be included
in a regression. I am trying to select those rows for which the variable
Meno equals PRE. I have used the code below:

difffitPre<-lm(data[,"diff"]~data[,"Age"]+data[,"Race"],data=data[data[,"Meno"]=="PRE",])
summary(difffitPre)

The output from the summary indicates that more than 76 rows are
included in the regression:

Residual standard error: 2.828 on 76 degrees of freedom

where in fact only 22 rows should be included as can be seen from the
following:

print(length(data[data[,"Meno"]=="PRE","Meno"]))
[1] 22

I would appreciate any help in modifying the data= parameter of the lm
so that I include only those subjects for which Meno=PRE.
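
For later readers of the archive: the usual idiom (an editorial sketch with made-up data, not a reply from the thread) is to name the variables directly in the formula and pass the row restriction via the subset argument:

```r
# hypothetical stand-in for the poster's data frame
set.seed(2)
data <- data.frame(diff = rnorm(30),
                   Age  = sample(40:60, 30, replace = TRUE),
                   Meno = sample(c("PRE", "POST"), 30, replace = TRUE))

# subset= is evaluated inside 'data', so only the PRE rows are fitted
fit <- lm(diff ~ Age, data = data, subset = Meno == "PRE")

length(residuals(fit)) == sum(data$Meno == "PRE")   # TRUE
```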

R 2.3.1
Windows XP

Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
[EMAIL PROTECTED]

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory leak with character arrays?

2007-01-17 Thread Peter Waltman

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.