[R] Problem in performing goodness of fit test in R.

2010-02-13 Thread Faiz Rasool
I am trying to perform goodness of fit test using R. I am using this website 
http://wiener.math.csi.cuny.edu/Statistics/R/simpleR/stat013.html for help. 
However, I am unable to carry out the test successfully. My code follows. It is 
taken from the website just mentioned. 
freq=c(22,21,22,27,22,36) # frequencies obtained after rolling the dice 150 
times.
prob=c(1,1,1,1,1,1)/6 # specify expected frequency for each category.
chisq.test(freq,p=prob) # I do not know what this line means. I just followed 
instructions on the website.
The erorr I receive is "erorr in chisq.test(freq,p=prob)/6 probabilities must 
sum to 1" 

I am very new to R, so any help would be appreciated. 
Faiz.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and RODBC

2010-02-13 Thread Daniel Nordlund
> -Original Message-
> From: Frank E Harrell Jr [mailto:f.harr...@vanderbilt.edu]
> Sent: Saturday, February 13, 2010 5:49 AM
> To: Daniel Nordlund
> Cc: r-help@r-project.org
> Subject: Re: [R] SAS and RODBC
> 
> Daniel Nordlund wrote:
> . . .
> 
> >
> > This is just a quick follow-up to my previous post.  Based on Prof.
> Ripley's response I went back and looked at the SAS log file and reread
> the RODBC help pages.  The problem of writing a SAS dataset was solved by
> setting colQuote=NULL in the call to the odbcConnect() function.
> >
> > ch <- odbcConnect('sasodbc', believeNRows=FALSE, colQuote=NULL)
> >
> > I hope this will be useful to others who may have the SAS BASE product
> and want to do graphics or statistical analyses with their SAS data, but
> can't afford the high licensing fees for the SAS STAT and GRAPH modules.
> Thanks to Prof. Ripley for the fine RODBC package.
> >
> > Dan
> >
> > Daniel Nordlund
> > Bothell, WA USA
> >
> >
> 
> Daniel since you have SAS BASE installed why not use sas.get in the
> Hmisc package and also get access to metadata such as variable labels
> that ODBC does not handle?  Besides providing better documentation, the
> labels are very useful as axis labels in plotting, etc.
> 
> Frank
> 

Frank,

I have used sas.get from Hmisc before, and I will continue to use it. I 
appreciate the work that you and your colleagues have done with Hmisc and the 
Design and rms packages.  However, the sas.get function still appears to be 
broken on Windows platforms (or Windows is broken :-).  I know how to fix the 
problem, but I am always looking for approaches where I don't have to fix 
things.   It may well be that the better documentation provided by sas.get will 
prove to out weigh the inconvenience of having to source an edited version of 
sas.get for my regular use.

As for moving data from R to SAS, I don't know of any methods other than the 
RODBC package with the SAS ODBC driver for writing SAS datasets.  Yes, I can 
write to csv or other file types that SAS can import, but if I can eliminate 
extra steps when going from R to SAS then that is a plus for me. 

Thanks again for the great tools,

Dan

Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NextMethod() example from S Programming by Vena bles and Ripley (page 78)

2010-02-13 Thread Ken Knoblauch
blue sky  gmail.com> writes:
> S Programming by Venables and Ripley (page 78) has the example listed
> at the end of this email. However, I get the following error when I
> try the example. I don't understand the descriptions of NextMethod on
> its help page. Could somebody let me know how to fix the error of this
> example?
> > test(x)
> c1
> c2
> Error in NextMethod() : no method to invoke
> Calls: test -> test.c1 -> NextMethod -> test.c2 -> NextMethod
> Execution halted
> 
> 
> test=function(x) UseMethod('test')
> 
> test.c1=function(x) {
>   cat('c1\n')
>   NextMethod()
>   x
> }
> 
> test.c2=function(x) {
>   cat('c2\n')
>   NextMethod()
>   x
> }
> 
> test.c3=function(x) {
>   cat('c3\n')
>   x
> }
> 
> x=1
> class(x)=c('c1','c2')
> test(x)
It works fine for me if you add a default method,
which I think is what it is looking for. 

test.default <- function(x){
cat("default")
x
}

 test(x)
c1
c2
default[1] 1
attr(,"class")
[1] "c1" "c2"



-- 
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NextMethod() example from S Programming by Venables and Ripley (page 78)

2010-02-13 Thread blue sky
S Programming by Venables and Ripley (page 78) has the example listed
at the end of this email. However, I get the following error when I
try the example. I don't understand the descriptions of NextMethod on
its help page. Could somebody let me know how to fix the error of this
example?

> test(x)
c1
c2
Error in NextMethod() : no method to invoke
Calls: test -> test.c1 -> NextMethod -> test.c2 -> NextMethod
Execution halted


test=function(x) UseMethod('test')

test.c1=function(x) {
  cat('c1\n')
  NextMethod()
  x
}

test.c2=function(x) {
  cat('c2\n')
  NextMethod()
  x
}

test.c3=function(x) {
  cat('c3\n')
  x
}

x=1
class(x)=c('c1','c2')
test(x)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Dimensional reduction package

2010-02-13 Thread mauede
Is there any R package which implements non-linear dimensionality reduction 
(LLE, ISOMAP, GTM, and so on)  and/or intrinsic dimensionality estimation ?
Thank you,
Maura


tutti i telefonini TIM!


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm function in R

2010-02-13 Thread Ista Zahn
Dear Something,
I think you have no hope of completing this project without going back
to an introductory regression text. The two models you listed (Y-Hat =
b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk) and (Y-Hat = b0 + b1X1 +
b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c) are identical
except for the typos. Cohen and Cohen [1] might be a good place to
start.

Best,
Ista

[1] 
http://www.amazon.com/Multiple-Regression-Correlation-Analysis-Behavioral/dp/0805822232

On Sat, Feb 13, 2010 at 8:03 PM, Something Something
 wrote:
>>>From your question it is difficult to determine what sort of tutoring you
> are expecting.
> The kind of tutoring you are providing is definitely helpful :)
>
> I guess I should quickly explain what I am trying to accomplish.  I have
> been a computer scientist for quite a few years, but I studied Statistics
> long time ago.  Now trying to get back into the Statistics field.
>
> For my latest project I have been asked to implement Multiple Regression
> Analysis (using interactions - I guess) in Java & Distributed Computing
> (more specifically Hadoop).  I looked at the possibility of using rJava, but
> it's 'single threaded' model plus complex deployment on 1000s of machines
> does not make it very appealing.  I looked at using 3rd party libraries such
> as 'Apache Commons Math' & Flanagan's JAVA statistics library.  Both of them
> use this formula:
> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk
>
> and NOT the one I have been asked to implement - which is -
> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c
>
> (Not to mention these tools do not use BigDecimal so the answers given by
> them are not as precise as those from R).
>
> So now the deadline is approaching... and I can't find any Java library that
> matches results from R.. so I have to roll up my sleeves and start coding in
> Java - which means I need to get up to speed on Y-Hat calculations very
> quickly.  Ideally, looking for a paper (or a text book) that gives step by
> step instructions on calculating coefficients such as (b0, b1... b13).  I
> have been searching but haven't found anything.  My last resort is to look
> at R source code and start converting it to Java.
>
> I am new to R and a little rusty on Statistics, so I apologize for all the
> dumb questions, and GREATLY appreciate your patience and help.  Thanks.
>
>
> On Sat, Feb 13, 2010 at 3:20 PM, David Winsemius 
> wrote:
>
>>
>> On Feb 13, 2010, at 5:03 PM, Something Something wrote:
>>
>>  I tried..
>>>
>>> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
>>> formula(mod)
>>>
>>> This produced
>>> Y ~ X1 * X2 * X3
>>>
>>>
>>> When I typed just mod I got:
>>>
>>> Call:
>>> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)
>>>
>>> Coefficients:
>>> (Intercept)          X11          X21          X31      X11:X21
>>>  X11:X31
>>>    X21:X31  X11:X21:X31
>>>  177.9245       0.2005       2.4482       3.1216       0.8127     -26.6166
>>>    -3.0398      29.6049
>>>
>>>
>>> I am trying to figure out how R computed all these coefficients.
>>>
>>
>> From your question it is difficult to determine what sort of tutoring you
>> are expecting. To get the code of an R formula, you just type its name:
>>
>> lm
>>
>> Leads to lm.fit:
>>
>> lm.fit
>>
>> Reading further it appears the lm and lm.fit functions are really front
>> ends for this call:
>>
>> .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
>>        tol = as.double(tol), coefficients = mat.or.vec(p, ny),
>>        residuals = y, effects = y, rank = integer(1L), pivot = 1L:p,
>>        qraux = double(p), work = double(2 * p), PACKAGE = "base")
>>
>> Seems pretty likely that is a QR decomposition-based method that i
>> implemented in compiled code.
>>
>> So if you want to go deeper, at least you know what to search for. Or if
>> you want to know how regression works on a matrix level, you should consult
>> a good reference text or Wikipedia, which is surprisingly good for that sort
>> of question these days.
>>
>> --
>> David.
>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter 
>>> wrote:
>>>
>>>  ?formula


 Bert Gunter
 Genentech Nonclinical Statistics

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Something Something
 Sent: Saturday, February 13, 2010 1:24 PM
 To: Daniel Nordlund
 Cc: r-help@r-project.org
 Subject: Re: [R] lm function in R

 Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
 to
 '+'.

 Seems like when I put * it means - interaction & when I put + it's not an
 interaction.

 Is it correct to assume then that...

 When I put + R evaluates the following equation:
 Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk


 But when I put * R evaluates the following equation;
 Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12

Re: [R] Hierarchical data sets: which software to use?

2010-02-13 Thread kMan
Dear Anton,

4Mb is not a lot of data. A Gb still wouldn't be that troublesome in a flat
file. Your data can be migrated to a relational database at a future point. 

Sincerely,
KeithC.

-Original Message-
From: Anton du Toit [mailto:atdutoitrh...@gmail.com] 
Sent: Saturday, February 13, 2010 3:54 AM
To: r-help
Subject: Re: [R] Hierarchical data sets: which software to use?

Hi Douglas,

Thanks for your helpful response. I've commented on some of the points you
raised below:

> Although this may not be helpful for your immediate goal, storing and
manipulating data of this size and complexity (and, I expect, cost for
collection) really calls for tools like relational databases.  A single flat
file of 2500 variables by 1500 cases is almost never the best way to
organize such data.  A normalized representation as a collection of
interlinked tables in a relational data base is much more effective and less
error prone.  The widespread use of spreadsheets or SPSS data sets or SAS
data sets which encourage the "single table with a gargantuan number of
columns, most of which are missing data in most cases" approach to
organization of longitudinal data is regrettable.

I'm both relieved and daunted by this. Daunted because it means I'll need to
learn another package (probably postGreSQL or MySQL?), but relieved because
constructing a 2500 by 1500 file seemed intuitively wrong, as well as
introducing the possibility of errors unnecessarily--surely it makes more
sense to leave the data as is.

As far as immediate goals go--I am at the beginning of a thesis, and I have
more research planned after that, so I want to get things right from the
start.


> For later analysis in R it is better to start with "long" form of the
data, as opposed to the "wide" form, even if it means repeating demographic
information over several occasions.  Using a relational database allows for
a long view to be generated without the possibility of inconsistency in the
demographics.  I am using the descriptions "long" and "wide" in the sense
that they are used in the reshape help page.  See

?reshape

> in R.  The long view is also called the subject/occasion view in the
sense that each row corresponds to one subject on one occasion.

> Robert Gentleman's book "R Programming for Bioinformatics" provides
background on linking R to relational databases.

Thanks--I'll look this one up.


> As I said at the beginning, you may not want to undertake the
necessary study and effort to reorganize your data for this specific project
but if you do this a lot you may want to consider it.

As above: a stitch in time, I suppose.

Thanks again.

Anton

On Sat, Feb 6, 2010 at 3:22 AM, Douglas Bates  wrote:
> On Sun, Jan 31, 2010 at 10:24 PM, Anton du Toit 
wrote:
>> Dear R-helpers,
>>
>> I’m writing for advice on whether I should use R or a different 
>> package or language. I’ve looked through the R-help archives, some 
>> manuals, and some other sites as well, and I haven’t done too well 
>> finding relevant info, hence my question here.
>>
>> I’m working with hierarchical data (in SPSS lingo). That is, for each 
>> case
>> (person) I read in three types of (medical) record:
>>
>> 1. demographic data: name, age, sex, address, etc
>>
>> 2. ‘admissions’ data: this generally repeats, so I will have 20 or so 
>> variables relating to their first hospital admission, then the same 
>> 20 again for their second admission, and so on
>>
>> 3. ‘collections’ data, about 100 variables containing the results of 
>> a battery of standard tests. These are administered at intervals and 
>> so this is repeating data as well.
>>
>> The number of repetitions varies between cases, so in its one case 
>> per line format the data is non-rectangular.
>>
>> At present I have shoehorned all of this into SPSS, with each case on 
>> one line. My test database has 2,500 variables and 1,500 cases (or 
>> persons), and in SPSS’s *.SAV format is ~4MB. The one I finally work 
>> with will be larger again, though likely within one order of 
>> magnitude. Down the track, funding permitting, I hope to be working with
tens of thousands of cases.
>
> Although this may not be helpful for your immediate goal, storing and 
> manipulating data of this size and complexity (and, I expect, cost for
> collection) really calls for tools like relational databases.  A 
> single flat file of 2500 variables by 1500 cases is almost never the 
> best way to organize such data.  A normalized representation as a 
> collection of interlinked tables in a relational data base is much 
> more effective and less error prone.  The widespread use of 
> spreadsheets or SPSS data sets or SAS data sets which encourage the 
> "single table with a gargantuan number of columns, most of which are 
> missing data in most cases" approach to organization of longitudinal 
> data is regrettable.
>
> For later analysis in R it is better to start with "long" form of the 
> data, as opposed to the "wide" form, even if it me

Re: [R] lm function in R

2010-02-13 Thread Daniel Malter
It seems to me that your question is more about the econometrics than about
R. Any introductory econometric textbook or compendium on econometrics will
cover this as it is a basic. See, for example, Greene 2006 or Wooldridge
2002.

Say X is your data matrix, that contains columns for each of the individual
variables (x), columns with their interactions, and one column of 1s for the
intercept. Let y be your dependent variable. Then, OLS estimates are
computed by

X'X inverse X'y

Or in R

solve(t(X)%*%X)%*%t(X)%*%y


Best,
Daniel 


-
cuncta stricte discussurus
-
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Something Something
Sent: Saturday, February 13, 2010 5:04 PM
To: Bert Gunter
Cc: r-help@r-project.org
Subject: Re: [R] lm function in R

I tried..

mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
formula(mod)

This produced
Y ~ X1 * X2 * X3


When I typed just mod I got:

Call:
lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)

Coefficients:
(Intercept)  X11  X21  X31  X11:X21  X11:X31
 X21:X31  X11:X21:X31
   177.9245   0.2005   2.4482   3.1216   0.8127 -26.6166
 -3.0398  29.6049


I am trying to figure out how R computed all these coefficients.





On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter  wrote:

> ?formula
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On
> Behalf Of Something Something
> Sent: Saturday, February 13, 2010 1:24 PM
> To: Daniel Nordlund
> Cc: r-help@r-project.org
> Subject: Re: [R] lm function in R
>
> Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
> to
> '+'.
>
> Seems like when I put * it means - interaction & when I put + it's not an
> interaction.
>
> Is it correct to assume then that...
>
> When I put + R evaluates the following equation:
> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk
>
>
> But when I put * R evaluates the following equation;
> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c
>
> Is this correct?  If it is then can someone point me to any sources that
> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
> calculated.  I guess, one source is the R source code :) but is there any
> other documentation anywhere?
>
> Please let me know.  Thanks.
>
>
>
> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
> wrote:
>
> > > -Original Message-
> > > From: r-help-boun...@r-project.org [mailto:
> r-help-boun...@r-project.org]
> > > On Behalf Of Something Something
> > > Sent: Friday, February 12, 2010 5:28 PM
> > > To: Phil Spector; r-help@r-project.org
> > > Subject: Re: [R] lm function in R
> > >
> > > Thanks for the replies everyone.  Greatly appreciate it.  Some
> progress,
> > > but
> > > now I am getting the following values when I don't use "as.factor"
> > >
> > > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> > >
> > > Is that what you guys get?
> >
> >
> > If you look at Phil's response below, no, that is not what he got.  The
> > difference is that you are specifying an interaction, whereas Phil did
> not
> > (because the equation you initially specified did not include an
> > interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
> >
> > >
> > >
> > > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > > wrote:
> > >
> > > > By converting the two variables to factors, you are fitting
> > > > an entirely different model.  Leave out the as.factor stuff
> > > > and it will work exactly as you want it to.
> > > >
> > > >  dat
> > > >>
> > > >  V1 V2 V3 V4
> > > > 1 s1 14  4  1
> > > > 2 s2 23  4  2
> > > > 3 s3 30  7  2
> > > > 4 s4 50  7  4
> > > > 5 s5 39 10  3
> > > > 6 s6 67 10  6
> > > >
> > > >> names(dat) = c('id','y','x1','x2')
> > > >> z = lm(y~x1+x2,dat)
> > > >> predict(z)
> > > >>
> > > >   123456 15.16667
> 24.7
> > > > 27.7 46.7 40.16667 68.7
> > > >
> > > >
> > > >- Phil Spector
> > > > Statistical Computing
> Facility
> > > > Department of Statistics
> > > > UC Berkeley
> > > > spec...@stat.berkeley.edu
> > > >
> > > >
> > > >
> > > > On Fri, 12 Feb 2010, Something Something wrote:
> > > >
> > > >  Hello,
> > > >>
> > > >> I am trying to learn how to perform Multiple Regression Analysis in
> R.
> > > I
> > > >> decided to take a simple example given in this PDF:
> > > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > > >>
> > > >> I created a small CSV called, students.csv that contains the
> following
> > > >> data:
> > > >>
> > > >> s1 14 4 1
> > > >> s2 23 4 2
> > > >> s3 30 7 2

Re: [R] lm function in R

2010-02-13 Thread Something Something
>>From your question it is difficult to determine what sort of tutoring you
are expecting.
The kind of tutoring you are providing is definitely helpful :)

I guess I should quickly explain what I am trying to accomplish.  I have
been a computer scientist for quite a few years, but I studied Statistics
long time ago.  Now trying to get back into the Statistics field.

For my latest project I have been asked to implement Multiple Regression
Analysis (using interactions - I guess) in Java & Distributed Computing
(more specifically Hadoop).  I looked at the possibility of using rJava, but
it's 'single threaded' model plus complex deployment on 1000s of machines
does not make it very appealing.  I looked at using 3rd party libraries such
as 'Apache Commons Math' & Flanagan's JAVA statistics library.  Both of them
use this formula:
Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk

and NOT the one I have been asked to implement - which is -
Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c

(Not to mention these tools do not use BigDecimal so the answers given by
them are not as precise as those from R).

So now the deadline is approaching... and I can't find any Java library that
matches results from R.. so I have to roll up my sleeves and start coding in
Java - which means I need to get up to speed on Y-Hat calculations very
quickly.  Ideally, looking for a paper (or a text book) that gives step by
step instructions on calculating coefficients such as (b0, b1... b13).  I
have been searching but haven't found anything.  My last resort is to look
at R source code and start converting it to Java.

I am new to R and a little rusty on Statistics, so I apologize for all the
dumb questions, and GREATLY appreciate your patience and help.  Thanks.


On Sat, Feb 13, 2010 at 3:20 PM, David Winsemius wrote:

>
> On Feb 13, 2010, at 5:03 PM, Something Something wrote:
>
>  I tried..
>>
>> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
>> formula(mod)
>>
>> This produced
>> Y ~ X1 * X2 * X3
>>
>>
>> When I typed just mod I got:
>>
>> Call:
>> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)
>>
>> Coefficients:
>> (Intercept)  X11  X21  X31  X11:X21
>>  X11:X31
>>X21:X31  X11:X21:X31
>>  177.9245   0.2005   2.4482   3.1216   0.8127 -26.6166
>>-3.0398  29.6049
>>
>>
>> I am trying to figure out how R computed all these coefficients.
>>
>
> From your question it is difficult to determine what sort of tutoring you
> are expecting. To get the code of an R formula, you just type its name:
>
> lm
>
> Leads to lm.fit:
>
> lm.fit
>
> Reading further it appears the lm and lm.fit functions are really front
> ends for this call:
>
> .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
>tol = as.double(tol), coefficients = mat.or.vec(p, ny),
>residuals = y, effects = y, rank = integer(1L), pivot = 1L:p,
>qraux = double(p), work = double(2 * p), PACKAGE = "base")
>
> Seems pretty likely that is a QR decomposition-based method that i
> implemented in compiled code.
>
> So if you want to go deeper, at least you know what to search for. Or if
> you want to know how regression works on a matrix level, you should consult
> a good reference text or Wikipedia, which is surprisingly good for that sort
> of question these days.
>
> --
> David.
>
>>
>>
>>
>>
>>
>> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter 
>> wrote:
>>
>>  ?formula
>>>
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Statistics
>>>
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On
>>> Behalf Of Something Something
>>> Sent: Saturday, February 13, 2010 1:24 PM
>>> To: Daniel Nordlund
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] lm function in R
>>>
>>> Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
>>> to
>>> '+'.
>>>
>>> Seems like when I put * it means - interaction & when I put + it's not an
>>> interaction.
>>>
>>> Is it correct to assume then that...
>>>
>>> When I put + R evaluates the following equation:
>>> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk
>>>
>>>
>>> But when I put * R evaluates the following equation;
>>> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c
>>>
>>> Is this correct?  If it is then can someone point me to any sources that
>>> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
>>> calculated.  I guess, one source is the R source code :) but is there any
>>> other documentation anywhere?
>>>
>>> Please let me know.  Thanks.
>>>
>>>
>>>
>>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
>>> wrote:
>>>
>>>  -Original Message-
> From: r-help-boun...@r-project.org [mailto:
>
 r-help-boun...@r-project.org]
>>>
 On Behalf Of Something Something
> Sent: Friday, February 12, 2010 5:28 PM
> To: Phil Spector; r-help@r-project.org
> Subject: Re: [R] lm funct

Re: [R] lm function in R

2010-02-13 Thread Gabor Grothendieck
You can find out the model matrix like this where we use the builtin
data frame CO2 to illustrate:

fo <- uptake ~ Treatment * Type

mod <- lm(fo, CO2); mod

mm <- model.matrix(fo, CO2)
View(mm)

On Sat, Feb 13, 2010 at 5:03 PM, Something Something
 wrote:
> I tried..
>
> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
> formula(mod)
>
> This produced
> Y ~ X1 * X2 * X3
>
>
> When I typed just mod I got:
>
> Call:
> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)
>
> Coefficients:
> (Intercept)          X11          X21          X31      X11:X21      X11:X31
>     X21:X31  X11:X21:X31
>   177.9245       0.2005       2.4482       3.1216       0.8127     -26.6166
>     -3.0398      29.6049
>
>
> I am trying to figure out how R computed all these coefficients.
>
>
>
>
>
> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter  wrote:
>
>> ?formula
>>
>>
>> Bert Gunter
>> Genentech Nonclinical Statistics
>>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On
>> Behalf Of Something Something
>> Sent: Saturday, February 13, 2010 1:24 PM
>> To: Daniel Nordlund
>> Cc: r-help@r-project.org
>> Subject: Re: [R] lm function in R
>>
>> Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
>> to
>> '+'.
>>
>> Seems like when I put * it means - interaction & when I put + it's not an
>> interaction.
>>
>> Is it correct to assume then that...
>>
>> When I put + R evaluates the following equation:
>> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk
>>
>>
>> But when I put * R evaluates the following equation;
>> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c
>>
>> Is this correct?  If it is then can someone point me to any sources that
>> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
>> calculated.  I guess, one source is the R source code :) but is there any
>> other documentation anywhere?
>>
>> Please let me know.  Thanks.
>>
>>
>>
>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
>> wrote:
>>
>> > > -Original Message-
>> > > From: r-help-boun...@r-project.org [mailto:
>> r-help-boun...@r-project.org]
>> > > On Behalf Of Something Something
>> > > Sent: Friday, February 12, 2010 5:28 PM
>> > > To: Phil Spector; r-help@r-project.org
>> > > Subject: Re: [R] lm function in R
>> > >
>> > > Thanks for the replies everyone.  Greatly appreciate it.  Some
>> progress,
>> > > but
>> > > now I am getting the following values when I don't use "as.factor"
>> > >
>> > > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
>> > >
>> > > Is that what you guys get?
>> >
>> >
>> > If you look at Phil's response below, no, that is not what he got.  The
>> > difference is that you are specifying an interaction, whereas Phil did
>> not
>> > (because the equation you initially specified did not include an
>> > interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>> >
>> > >
>> > >
>> > > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
>> > > wrote:
>> > >
>> > > > By converting the two variables to factors, you are fitting
>> > > > an entirely different model.  Leave out the as.factor stuff
>> > > > and it will work exactly as you want it to.
>> > > >
>> > > >  dat
>> > > >>
>> > > >  V1 V2 V3 V4
>> > > > 1 s1 14  4  1
>> > > > 2 s2 23  4  2
>> > > > 3 s3 30  7  2
>> > > > 4 s4 50  7  4
>> > > > 5 s5 39 10  3
>> > > > 6 s6 67 10  6
>> > > >
>> > > >> names(dat) = c('id','y','x1','x2')
>> > > >> z = lm(y~x1+x2,dat)
>> > > >> predict(z)
>> > > >>
>> > > >       1        2        3        4        5        6 15.16667
>> 24.7
>> > > > 27.7 46.7 40.16667 68.7
>> > > >
>> > > >
>> > > >                                        - Phil Spector
>> > > >                                         Statistical Computing
>> Facility
>> > > >                                         Department of Statistics
>> > > >                                         UC Berkeley
>> > > >                                         spec...@stat.berkeley.edu
>> > > >
>> > > >
>> > > >
>> > > > On Fri, 12 Feb 2010, Something Something wrote:
>> > > >
>> > > >  Hello,
>> > > >>
>> > > >> I am trying to learn how to perform Multiple Regression Analysis in
>> R.
>> > > I
>> > > >> decided to take a simple example given in this PDF:
>> > > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
>> > > >>
>> > > >> I created a small CSV called, students.csv that contains the
>> following
>> > > >> data:
>> > > >>
>> > > >> s1 14 4 1
>> > > >> s2 23 4 2
>> > > >> s3 30 7 2
>> > > >> s4 50 7 4
>> > > >> s5 39 10 3
>> > > >> s6 67 10 6
>> > > >>
>> > > >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
>> > > >>
>> > > >> Now the expected results are:
>> > > >>
>> > > >> yHat[0]:15.168
>> > > >> yHat[1]:24.668
>> > > >> yHat[2]:27.664
>> > > >> yHat[3]:46.664
>> > > >> yHat[4]:40.164
>> > > >> yHat[5]:68.67
>> > > >>
>> > > >>

Re: [R] lm function in R

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 5:03 PM, Something Something wrote:


I tried..

mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
formula(mod)

This produced
Y ~ X1 * X2 * X3


When I typed just mod I got:

Call:
lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)

Coefficients:
(Intercept)  X11  X21  X31  X11:X21   
X11:X31

X21:X31  X11:X21:X31
  177.9245   0.2005   2.4482   3.1216   0.8127  
-26.6166

-3.0398  29.6049


I am trying to figure out how R computed all these coefficients.


From your question it is difficult to determine what sort of tutoring  
you are expecting. To get the code of an R formula, you just type its  
name:


lm

Leads to lm.fit:

lm.fit

Reading further it appears the lm and lm.fit functions are really  
front ends for this call:


.Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
tol = as.double(tol), coefficients = mat.or.vec(p, ny),
residuals = y, effects = y, rank = integer(1L), pivot = 1L:p,
qraux = double(p), work = double(2 * p), PACKAGE = "base")

Seems pretty likely that is a QR decomposition-based method that i  
implemented in compiled code.


So if you want to go deeper, at least you know what to search for. Or  
if you want to know how regression works on a matrix level, you should  
consult a good reference text or Wikipedia, which is surprisingly good  
for that sort of question these days.


--
David.






On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter  
 wrote:



?formula


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org 
]

On
Behalf Of Something Something
Sent: Saturday, February 13, 2010 1:24 PM
To: Daniel Nordlund
Cc: r-help@r-project.org
Subject: Re: [R] lm function in R

Thanks Dan.  Yes that was very helpful.  I didn't see the change  
from '*'

to
'+'.

Seems like when I put * it means - interaction & when I put + it's  
not an

interaction.

Is it correct to assume then that...

When I put + R evaluates the following equation:
Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk


But when I put * R evaluates the following equation;
Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +   
+ c


Is this correct?  If it is then can someone point me to any sources  
that
will explain how the coefficients (such as b0... bk, b12.. ,  
b123..) are
calculated.  I guess, one source is the R source code :) but is  
there any

other documentation anywhere?

Please let me know.  Thanks.



On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
wrote:


-Original Message-
From: r-help-boun...@r-project.org [mailto:

r-help-boun...@r-project.org]

On Behalf Of Something Something
Sent: Friday, February 12, 2010 5:28 PM
To: Phil Spector; r-help@r-project.org
Subject: Re: [R] lm function in R

Thanks for the replies everyone.  Greatly appreciate it.  Some

progress,

but
now I am getting the following values when I don't use "as.factor"

13.14167 25.11667 28.34167 49.14167 40.39167 66.86667

Is that what you guys get?



If you look at Phil's response below, no, that is not what he  
got.  The
difference is that you are specifying an interaction, whereas Phil  
did

not

(because the equation you initially specified did not include an
interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.




On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
wrote:


By converting the two variables to factors, you are fitting
an entirely different model.  Leave out the as.factor stuff
and it will work exactly as you want it to.

dat



V1 V2 V3 V4
1 s1 14  4  1
2 s2 23  4  2
3 s3 30  7  2
4 s4 50  7  4
5 s5 39 10  3
6 s6 67 10  6


names(dat) = c('id','y','x1','x2')
z = lm(y~x1+x2,dat)
predict(z)


 123456 15.16667

24.7

27.7 46.7 40.16667 68.7


  - Phil Spector
   Statistical Computing

Facility

   Department of Statistics
   UC Berkeley
   spec...@stat.berkeley.edu



On Fri, 12 Feb 2010, Something Something wrote:

Hello,


I am trying to learn how to perform Multiple Regression  
Analysis in

R.

I

decided to take a simple example given in this PDF:
http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf

I created a small CSV called, students.csv that contains the

following

data:

s1 14 4 1
s2 23 4 2
s3 30 7 2
s4 50 7 4
s5 39 10 3
s6 67 10 6

Col headers:  Student id, Memory span(Y), age(X1), speech  
rate(X2)


Now the expected results are:

yHat[0]:15.168
yHat[1]:24.668
yHat[2]:27.664
yHat[3]:46.664
yHat[4]:40.164
yHat[5]:68.67

This is based on the following equation (given in the PDF):  Y =

1.67

+

X1

+
9.50 X2

I ran the following commands in R:

data = re

Re: [R] NMDS ordination

2010-02-13 Thread Dan Davison
Aisyah  ioz.ac.uk> writes:

> 
> 
> Hi
> 
> Im currently trying to plot my NMDS data together with fitted variables
> (envfit funct) on an ordination plot. The plot function shows two
> displays="sites" and "sp". I was wondering how to plot it so that the sites
> come up as different points for different sites but the species come up as
> actual names? It looks a little busy at the moment with everything in.

Please provide an example. I.e. working code creating the plot as you have it at
the moment. Include an example data set and be explicit about what packages are
needed. Don't post a large data set -- just create a minimal example
demonstrating the problem you are having.


> 
> Sya

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm function in R

2010-02-13 Thread Something Something
I tried..

mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
formula(mod)

This produced
Y ~ X1 * X2 * X3


When I typed just mod I got:

Call:
lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)

Coefficients:
(Intercept)  X11  X21  X31  X11:X21  X11:X31
 X21:X31  X11:X21:X31
   177.9245   0.2005   2.4482   3.1216   0.8127 -26.6166
 -3.0398  29.6049


I am trying to figure out how R computed all these coefficients.





On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter  wrote:

> ?formula
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On
> Behalf Of Something Something
> Sent: Saturday, February 13, 2010 1:24 PM
> To: Daniel Nordlund
> Cc: r-help@r-project.org
> Subject: Re: [R] lm function in R
>
> Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*'
> to
> '+'.
>
> Seems like when I put * it means - interaction & when I put + it's not an
> interaction.
>
> Is it correct to assume then that...
>
> When I put + R evaluates the following equation:
> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk
>
>
> But when I put * R evaluates the following equation;
> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c
>
> Is this correct?  If it is then can someone point me to any sources that
> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
> calculated.  I guess, one source is the R source code :) but is there any
> other documentation anywhere?
>
> Please let me know.  Thanks.
>
>
>
> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
> wrote:
>
> > > -Original Message-
> > > From: r-help-boun...@r-project.org [mailto:
> r-help-boun...@r-project.org]
> > > On Behalf Of Something Something
> > > Sent: Friday, February 12, 2010 5:28 PM
> > > To: Phil Spector; r-help@r-project.org
> > > Subject: Re: [R] lm function in R
> > >
> > > Thanks for the replies everyone.  Greatly appreciate it.  Some
> progress,
> > > but
> > > now I am getting the following values when I don't use "as.factor"
> > >
> > > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> > >
> > > Is that what you guys get?
> >
> >
> > If you look at Phil's response below, no, that is not what he got.  The
> > difference is that you are specifying an interaction, whereas Phil did
> not
> > (because the equation you initially specified did not include an
> > interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
> >
> > >
> > >
> > > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > > wrote:
> > >
> > > > By converting the two variables to factors, you are fitting
> > > > an entirely different model.  Leave out the as.factor stuff
> > > > and it will work exactly as you want it to.
> > > >
> > > >  dat
> > > >>
> > > >  V1 V2 V3 V4
> > > > 1 s1 14  4  1
> > > > 2 s2 23  4  2
> > > > 3 s3 30  7  2
> > > > 4 s4 50  7  4
> > > > 5 s5 39 10  3
> > > > 6 s6 67 10  6
> > > >
> > > >> names(dat) = c('id','y','x1','x2')
> > > >> z = lm(y~x1+x2,dat)
> > > >> predict(z)
> > > >>
> > > >   123456 15.16667
> 24.7
> > > > 27.7 46.7 40.16667 68.7
> > > >
> > > >
> > > >- Phil Spector
> > > > Statistical Computing
> Facility
> > > > Department of Statistics
> > > > UC Berkeley
> > > > spec...@stat.berkeley.edu
> > > >
> > > >
> > > >
> > > > On Fri, 12 Feb 2010, Something Something wrote:
> > > >
> > > >  Hello,
> > > >>
> > > >> I am trying to learn how to perform Multiple Regression Analysis in
> R.
> > > I
> > > >> decided to take a simple example given in this PDF:
> > > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > > >>
> > > >> I created a small CSV called, students.csv that contains the
> following
> > > >> data:
> > > >>
> > > >> s1 14 4 1
> > > >> s2 23 4 2
> > > >> s3 30 7 2
> > > >> s4 50 7 4
> > > >> s5 39 10 3
> > > >> s6 67 10 6
> > > >>
> > > >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
> > > >>
> > > >> Now the expected results are:
> > > >>
> > > >> yHat[0]:15.168
> > > >> yHat[1]:24.668
> > > >> yHat[2]:27.664
> > > >> yHat[3]:46.664
> > > >> yHat[4]:40.164
> > > >> yHat[5]:68.67
> > > >>
> > > >> This is based on the following equation (given in the PDF):  Y =
> 1.67
> > +
> > > X1
> > > >> +
> > > >> 9.50 X2
> > > >>
> > > >> I ran the following commands in R:
> > > >>
> > > >> data = read.table("students.csv", head=F, as.is=T, na.string=".",
> > > >> row.nam=NULL)
> > > >> X1 = as.factor(data[[3]])
> > > >> X2 = as.factor(data[[4]])
> > > >> Y = data[[2]]
> > > >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> > > >> Y.ha

Re: [R] (no subject)

2010-02-13 Thread Ben Bolker
  [cc'ing back to r-help]

  You need to get the separator right: you specify slashes below where
your date is space-separated

as.Date("02 03 2008", format="%m %d  %Y")

  note that the solution I gave you below assumed that your dates were
in d/m/Y order (since you didn't specify).  You will have to modify
things slightly (i.e. specifying format explicitly) if that's not true.

Hussain Abusaaq wrote:
> I hope that will work. just to make my problem clear
> when i used as.date  for e "02 03 2008"  it gives me NA
> 
> for example
>  
>>  as.Date("02 03 2008",format="%m/%d/%Y")
> [1] NA
> I hope you can help me here
> - Original Message -
> From: Ben Bolker 
> Date: Saturday, February 13, 2010 1:03 pm
> Subject: Re: [R] (no subject)
> To: r-h...@stat.math.ethz.ch
> 
>> Hussain Abusaaq  fsu.edu> writes:
>>
>>> I have this vector and I want to change it to date.
>>>
>>> for example G=[05 12 2008]
>>>
>>> or g=[2]
>>> f=[3]
>>> y=[2208]
>>>
>> Something like as.Date(paste(rev(G),collapse="-"))
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.htmland provide commented, minimal, self-contained, 
>> reproducible code.
>>


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / people.biology.ufl.edu/bolker
GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc



signature.asc
Description: OpenPGP digital signature
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Highlighting points in a quantile plot for different values of a second column

2010-02-13 Thread Peter Ehlers

Saptarshi Guha wrote:

Hello,
I have the following data frame, using lattice and qqmath, how do I display the
quantile plot (e.g distribution = qunif), but highlight the f==TRUE and f==FALSE
differently?

library(lattice)
pp <- function(n) ((1:n)-0.5)/n

x=data.frame(x=runif(20),f=sapply(runif(20),function(r) r>0.5))

I would like a single curve, the following example displays two curves.

qqmath(~x,groups=f,data=x,distribution=qunif,f.value=pp)



Here's one way:

qqmath(~x, data=x, type='b',
  col=ifelse(x$f, 2, 4),
  pch=ifelse(x$f, 17, 19),cex=2)

 -Peter Ehlers




Thank you
Saptarshi

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm function in R

2010-02-13 Thread Bert Gunter
?formula 


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Something Something
Sent: Saturday, February 13, 2010 1:24 PM
To: Daniel Nordlund
Cc: r-help@r-project.org
Subject: Re: [R] lm function in R

Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*' to
'+'.

Seems like when I put * it means - interaction & when I put + it's not an
interaction.

Is it correct to assume then that...

When I put + R evaluates the following equation:
Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk


But when I put * R evaluates the following equation;
Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c

Is this correct?  If it is then can someone point me to any sources that
will explain how the coefficients (such as b0... bk, b12.. , b123..) are
calculated.  I guess, one source is the R source code :) but is there any
other documentation anywhere?

Please let me know.  Thanks.



On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
wrote:

> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Something Something
> > Sent: Friday, February 12, 2010 5:28 PM
> > To: Phil Spector; r-help@r-project.org
> > Subject: Re: [R] lm function in R
> >
> > Thanks for the replies everyone.  Greatly appreciate it.  Some progress,
> > but
> > now I am getting the following values when I don't use "as.factor"
> >
> > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> >
> > Is that what you guys get?
>
>
> If you look at Phil's response below, no, that is not what he got.  The
> difference is that you are specifying an interaction, whereas Phil did not
> (because the equation you initially specified did not include an
> interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>
> >
> >
> > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > wrote:
> >
> > > By converting the two variables to factors, you are fitting
> > > an entirely different model.  Leave out the as.factor stuff
> > > and it will work exactly as you want it to.
> > >
> > >  dat
> > >>
> > >  V1 V2 V3 V4
> > > 1 s1 14  4  1
> > > 2 s2 23  4  2
> > > 3 s3 30  7  2
> > > 4 s4 50  7  4
> > > 5 s5 39 10  3
> > > 6 s6 67 10  6
> > >
> > >> names(dat) = c('id','y','x1','x2')
> > >> z = lm(y~x1+x2,dat)
> > >> predict(z)
> > >>
> > >   123456 15.16667 24.7
> > > 27.7 46.7 40.16667 68.7
> > >
> > >
> > >- Phil Spector
> > > Statistical Computing Facility
> > > Department of Statistics
> > > UC Berkeley
> > > spec...@stat.berkeley.edu
> > >
> > >
> > >
> > > On Fri, 12 Feb 2010, Something Something wrote:
> > >
> > >  Hello,
> > >>
> > >> I am trying to learn how to perform Multiple Regression Analysis in
R.
> > I
> > >> decided to take a simple example given in this PDF:
> > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > >>
> > >> I created a small CSV called, students.csv that contains the
following
> > >> data:
> > >>
> > >> s1 14 4 1
> > >> s2 23 4 2
> > >> s3 30 7 2
> > >> s4 50 7 4
> > >> s5 39 10 3
> > >> s6 67 10 6
> > >>
> > >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
> > >>
> > >> Now the expected results are:
> > >>
> > >> yHat[0]:15.168
> > >> yHat[1]:24.668
> > >> yHat[2]:27.664
> > >> yHat[3]:46.664
> > >> yHat[4]:40.164
> > >> yHat[5]:68.67
> > >>
> > >> This is based on the following equation (given in the PDF):  Y = 1.67
> +
> > X1
> > >> +
> > >> 9.50 X2
> > >>
> > >> I ran the following commands in R:
> > >>
> > >> data = read.table("students.csv", head=F, as.is=T, na.string=".",
> > >> row.nam=NULL)
> > >> X1 = as.factor(data[[3]])
> > >> X2 = as.factor(data[[4]])
> > >> Y = data[[2]]
> > >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> > >> Y.hat = fitted(mod)
> > >> Y.hat
> > >>
> > >> This gives me the following output:
> > >>
> > >>  Y.hat
> > >>>
> > >> 1  2  3  4  5  6
> > >> 14 23 30 50 39 67
> > >>
> > >> Obviously I am doing something wrong.  Please help.  Thanks.
> > >>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/postin

Re: [R] lm function in R

2010-02-13 Thread Something Something
Thanks Dan.  Yes that was very helpful.  I didn't see the change from '*' to
'+'.

Seems like when I put * it means - interaction & when I put + it's not an
interaction.

Is it correct to assume then that...

When I put + R evaluates the following equation:
Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + · · · + bkXk


But when I put * R evaluates the following equation;
Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +  + c

Is this correct?  If it is then can someone point me to any sources that
will explain how the coefficients (such as b0... bk, b12.. , b123..) are
calculated.  I guess, one source is the R source code :) but is there any
other documentation anywhere?

Please let me know.  Thanks.



On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund wrote:

> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Something Something
> > Sent: Friday, February 12, 2010 5:28 PM
> > To: Phil Spector; r-help@r-project.org
> > Subject: Re: [R] lm function in R
> >
> > Thanks for the replies everyone.  Greatly appreciate it.  Some progress,
> > but
> > now I am getting the following values when I don't use "as.factor"
> >
> > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> >
> > Is that what you guys get?
>
>
> If you look at Phil's response below, no, that is not what he got.  The
> difference is that you are specifying an interaction, whereas Phil did not
> (because the equation you initially specified did not include an
> interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>
> >
> >
> > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > wrote:
> >
> > > By converting the two variables to factors, you are fitting
> > > an entirely different model.  Leave out the as.factor stuff
> > > and it will work exactly as you want it to.
> > >
> > >  dat
> > >>
> > >  V1 V2 V3 V4
> > > 1 s1 14  4  1
> > > 2 s2 23  4  2
> > > 3 s3 30  7  2
> > > 4 s4 50  7  4
> > > 5 s5 39 10  3
> > > 6 s6 67 10  6
> > >
> > >> names(dat) = c('id','y','x1','x2')
> > >> z = lm(y~x1+x2,dat)
> > >> predict(z)
> > >>
> > >   123456 15.16667 24.7
> > > 27.7 46.7 40.16667 68.7
> > >
> > >
> > >- Phil Spector
> > > Statistical Computing Facility
> > > Department of Statistics
> > > UC Berkeley
> > > spec...@stat.berkeley.edu
> > >
> > >
> > >
> > > On Fri, 12 Feb 2010, Something Something wrote:
> > >
> > >  Hello,
> > >>
> > >> I am trying to learn how to perform Multiple Regression Analysis in R.
> > I
> > >> decided to take a simple example given in this PDF:
> > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > >>
> > >> I created a small CSV called, students.csv that contains the following
> > >> data:
> > >>
> > >> s1 14 4 1
> > >> s2 23 4 2
> > >> s3 30 7 2
> > >> s4 50 7 4
> > >> s5 39 10 3
> > >> s6 67 10 6
> > >>
> > >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
> > >>
> > >> Now the expected results are:
> > >>
> > >> yHat[0]:15.168
> > >> yHat[1]:24.668
> > >> yHat[2]:27.664
> > >> yHat[3]:46.664
> > >> yHat[4]:40.164
> > >> yHat[5]:68.67
> > >>
> > >> This is based on the following equation (given in the PDF):  Y = 1.67
> +
> > X1
> > >> +
> > >> 9.50 X2
> > >>
> > >> I ran the following commands in R:
> > >>
> > >> data = read.table("students.csv", head=F, as.is=T, na.string=".",
> > >> row.nam=NULL)
> > >> X1 = as.factor(data[[3]])
> > >> X2 = as.factor(data[[4]])
> > >> Y = data[[2]]
> > >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> > >> Y.hat = fitted(mod)
> > >> Y.hat
> > >>
> > >> This gives me the following output:
> > >>
> > >>  Y.hat
> > >>>
> > >> 1  2  3  4  5  6
> > >> 14 23 30 50 39 67
> > >>
> > >> Obviously I am doing something wrong.  Please help.  Thanks.
> > >>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot different regression models on one graph

2010-02-13 Thread Bert Gunter
Brian Joiner, my teacher and insightful guide to the world of applied
statistics when I was a student, succinctly summarized this discussion by:

"Even the data aren't sufficient."

(This is an inside joke for statisticians).

Cheers,
Bert 

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Peter Ehlers
Sent: Saturday, February 13, 2010 12:35 PM
To: David Winsemius
Cc: Rhonda Reidy; r-help@r-project.org
Subject: Re: [R] Plot different regression models on one graph

Rhonda:

As David points out, a cubic fit is rather speculative. I think that
one needs to distinguish two situations: 1) theoretical justification
for a cubic model is available, or 2) you're dredging the data for
the "best" fit. Your case is the second. That puts you in the realm
of EDA (exploratory data analysis). You're free to fit any model you
wish, but you should assess the value of the model. One convenient
way is to use the influence.measures() function, which will show
that points 9 and 10 in your data have a strong influence on your
cubic fit (as, of course, your eyes would tell you). A good thing
to do at this point is to fit 3 more cubic models:
1) omitting point 9, 2) omitting point 10, 3) omitting both.

Try it and plot the resulting fits. You may be surprised.

Conclusion: Without more data, you can conclude nothing
about a non-straightline fit.

(And this hasn't even addressed the relative abundance of x=0 data.)

  -Peter Ehlers

David Winsemius wrote:
> 
> On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:
> 
>> The following variables have the following significant relationships 
>> (x is the explanatory variable): linear, cubic, exponential, logistic. 
>> The linear relationship plots without any trouble.
>>
>> Cubic is the 'best' model, but it is not plotting as a smooth curve 
>> using the following code:
>>
>> cubic.lm<- lm(y~poly(x,3))
> 
> Try:
> 
> lines(0:80, predict(cubic.lm, data.frame(x=0:80)),lwd=2)
> 
> But I really must say the superiority of that relationship over a linear 
> one is far from convincing to my eyes. Seems to be caused by one data 
> point. I hope the quotes around "best" mean tha tyou have the same qualms.
> 
> 
>> lines(x,predict(cubic.lm),lwd=2)
>>
>> How do I plot the data and the estimated curves for all of these 
>> regression models in the same plot?
>>
>> x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)
>>
>> y <- 
>>
c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042,0.042,0.
041,0.045,0.021,0.018,0.017,0.018,0.028,0.022) 
>>
>>
>> Thanks in advance.
>>
>> Rhonda Reidy
>>

-- 
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NMDS ordination

2010-02-13 Thread Aisyah

Hi

Im currently trying to plot my NMDS data together with fitted variables
(envfit funct) on an ordination plot. The plot function shows two
displays="sites" and "sp". I was wondering how to plot it so that the sites
come up as different points for different sites but the species come up as
actual names? It looks a little busy at the moment with everything in.

Sya
-- 
View this message in context: 
http://n4.nabble.com/NMDS-ordination-tp1554632p1554632.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] max and in the zoo package

2010-02-13 Thread Gabor Grothendieck
Try this:

> library(zoo)
> set.seed(1)
> z <- zoo(rnorm(100), as.Date(0) + 1:100)
> mx <- z[which.max(z)]; mx
1970-03-03
  2.401618
> coredata(mx)
[1] 2.401618
> time(mx)
[1] "1970-03-03"

On Sat, Feb 13, 2010 at 3:39 PM, Edouard Tallent  wrote:
> hi everyone.
> i am dealing on a zoo-class object.i am aware of the 'min' and 'max' function 
> to get the minimum and the maximum values.getting these maximum and minimum 
> values is ont thing.but, how to get the (zoo) dates these values occur ?
> in fact, in analysing a time series i am interested in the particular 
> (extreme) values AND in the moments they happen.
> regards.edouard.
>
>        [[alternative(swapped) HTML version deleted]]
>
> This is MIME Epilogue
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Highlighting points in a quantile plot for different values of a second column

2010-02-13 Thread Saptarshi Guha
Hello,
I have the following data frame, using lattice and qqmath, how do I display the
quantile plot (e.g distribution = qunif), but highlight the f==TRUE and f==FALSE
differently?

library(lattice)
pp <- function(n) ((1:n)-0.5)/n

x=data.frame(x=runif(20),f=sapply(runif(20),function(r) r>0.5))

I would like a single curve, the following example displays two curves.

qqmath(~x,groups=f,data=x,distribution=qunif,f.value=pp)



Thank you
Saptarshi

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] max and in the zoo package

2010-02-13 Thread Edouard Tallent
hi everyone.
i am dealing on a zoo-class object.i am aware of the 'min' and 'max' function 
to get the minimum and the maximum values.getting these maximum and minimum 
values is ont thing.but, how to get the (zoo) dates these values occur ?
in fact, in analysing a time series i am interested in the particular (extreme) 
values AND in the moments they happen.
regards.edouard.

[[alternative(swapped) HTML version deleted]]

This is MIME Epilogue

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot different regression models on one graph

2010-02-13 Thread Peter Ehlers

Rhonda:

As David points out, a cubic fit is rather speculative. I think that
one needs to distinguish two situations: 1) theoretical justification
for a cubic model is available, or 2) you're dredging the data for
the "best" fit. Your case is the second. That puts you in the realm
of EDA (exploratory data analysis). You're free to fit any model you
wish, but you should assess the value of the model. One convenient
way is to use the influence.measures() function, which will show
that points 9 and 10 in your data have a strong influence on your
cubic fit (as, of course, your eyes would tell you). A good thing
to do at this point is to fit 3 more cubic models:
1) omitting point 9, 2) omitting point 10, 3) omitting both.

Try it and plot the resulting fits. You may be surprised.

Conclusion: Without more data, you can conclude nothing
about a non-straightline fit.

(And this hasn't even addressed the relative abundance of x=0 data.)

 -Peter Ehlers

David Winsemius wrote:


On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:

The following variables have the following significant relationships 
(x is the explanatory variable): linear, cubic, exponential, logistic. 
The linear relationship plots without any trouble.


Cubic is the 'best' model, but it is not plotting as a smooth curve 
using the following code:


cubic.lm<- lm(y~poly(x,3))


Try:

lines(0:80, predict(cubic.lm, data.frame(x=0:80)),lwd=2)

But I really must say the superiority of that relationship over a linear 
one is far from convincing to my eyes. Seems to be caused by one data 
point. I hope the quotes around "best" mean tha tyou have the same qualms.




lines(x,predict(cubic.lm),lwd=2)

How do I plot the data and the estimated curves for all of these 
regression models in the same plot?


x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)

y <- 
c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042,0.042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022) 



Thanks in advance.

Rhonda Reidy



--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to do calculations in data matrices?

2010-02-13 Thread Dan Davison
Zoppoli, Gabriele (NIH/NCI) [G]  mail.nih.gov> writes:

> 
> Please give me just a reference where I can find something useful. 

The others are right that rather than randomly googling, you should bite
the bullet and sit down for a couple of hours with some introductory
material on R (a book, or one of the freely available pdfs). Unless you are
never going to use R again, it will be worth it. But seeing as you asked your
question clearly, here's one
way to do the steps you specify. Hopefully this will help as well.

First, make a matrix to work with:

> mat1 <- matrix(sample(1:10, size=12, replace=TRUE), ncol=4)
> mat1
 [,1] [,2] [,3] [,4]
[1,]57   109
[2,]4   1082
[3,]8   1095

> 
> 
> In summary, I need to :
> 
> - find the median of each row of a matrix

You can use apply for that:

> row.medians <- apply(mat1, 1, median)
> row.medians
[1] 8.0 6.0 8.5

> - create a new matrix with each value in the first matrix divided by
> the median of its row

That's easy to *do*:

> mat2 <- mat1 / row.medians
> mat2
  [,1] [,2] [,3]  [,4]
[1,] 0.625 0.875000 1.25 1.125
[2,] 0.667 1.67 1.33 0.333
[3,] 0.9411765 1.176471 1.058824 0.5882353

but it may take more time to understand why that worked. How come it
knew that we wanted to divide each row by the median of the row? (Hint:
understand the "byrow" argument in ?matrix and the mentions of the word
"recyling" in ?Arithmetic).

> - if a value "a" in the second matrix is < 1, I need to substitute it
> with 1/a

First make a logical vector which identifies the elements of the matrix
you want to operate on:
> is.small <- mat2 < 1

Then perform the operation on those elements:

> mat2[is.small] <- 1 / mat2[is.small]
> mat2
   [,1] [,2] [,3]  [,4]
[1,] 1.6000 1.142857 1.25 1.125
[2,] 1.5000 1.67 1.33 3.000
[3,] 1.0625 1.176471 1.058824 1.700


# you could also use
ifelse(mat2 < 1, 1/mat2, mat2)

dan

> 
> I know that for some of you it must be overeasy, but I swear I googled
> for two hours with keywords "operations", "calculations", "data
> matrices", "data tables", and "CRAN", and I didn't find anything
> useful.
> 
> Thank you all
> 
> Gabriele Zoppoli, MD
> Ph.D. Fellow, Experimental and Clinical Oncology and Hematology,
> University of Genova, Genova, Italy
> Guest Researcher, LMP, NCI, NIH, Bethesda MD
> 
> Work: 301-451-8575
> Mobile: 301-204-5642
> Email: zoppolig  mail.nih.gov
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot different regression models on one graph

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:

The following variables have the following significant relationships  
(x is the explanatory variable): linear, cubic, exponential,  
logistic. The linear relationship plots without any trouble.


Cubic is the 'best' model, but it is not plotting as a smooth curve  
using the following code:


cubic.lm<- lm(y~poly(x,3))


Try:

lines(0:80, predict(cubic.lm, data.frame(x=0:80)),lwd=2)

But I really must say the superiority of that relationship over a  
linear one is far from convincing to my eyes. Seems to be caused by  
one data point. I hope the quotes around "best" mean tha tyou have the  
same qualms.




lines(x,predict(cubic.lm),lwd=2)

How do I plot the data and the estimated curves for all of these  
regression models in the same plot?


x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)

y <-  
c 
(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042,0.042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022 
)


Thanks in advance.

Rhonda Reidy


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot different regression models on one graph

2010-02-13 Thread Sharpie


Rhonda Reidy wrote:
> 
> The following variables have the following significant relationships (x is
> the explanatory variable): linear, cubic, exponential, logistic. The
> linear relationship plots without any trouble. 
> 
> Cubic is the 'best' model, but it is not plotting as a smooth curve using
> the following code:
> 
> cubic.lm<- lm(y~poly(x,3))
> lines(x,predict(cubic.lm),lwd=2)
> 
> How do I plot the data and the estimated curves for all of these
> regression models in the same plot? 
> 
> x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)
> 
> y <-
> c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042,0.042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022)
> 

Hi Rhonda,

The problem is that the default behavior of predict() only produces output
corresponding to the original values of x used in the model.  To get a
"smooth" curve, you will want to predict at a large number of x values that
fall within the range of the data.  This can be done using the pretty()
function, or using a composition of the seq() and range() functions if you
desire more control:

  # Create a new vector of X values at which to generate predictions:
  predX <- pretty( x, n = 100 )

  plot( y ~ x )

  # Add the regression model, but generate predictions using the
  # points in predX-- the default is to predict using points in x.

  lines( predX, predict( cubic.lm, newdata = list( x = predX )), lwd = 2 )


The above should produce the output you want.  However, I suggest Hadley
Wickham's excellent ggplot2 package-- I feel the following is more
expressive than using base graphics functions:

  require( ggplot2 )
  Plot <- qplot( x, y ) + stat_smooth( method = 'lm', formula = 'y ~
poly(x,3)' )
  print( Plot )

The amazing power of ggplot2 is that when you want to alter or extend your
plot, you simply "add" more ggplot2 commands to your plot object:

  # Add a linear regression model and change the color theme used in the
output
  Plot <- Plot + stat_smooth( method = 'lm', color = 'red' ) + theme_bw()
  print( Plot )


Hope this helps!

-Charlie
-- 
View this message in context: 
http://n4.nabble.com/Plot-different-regression-models-on-one-graph-tp1554606p1554622.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to do calculations in data matrices?

2010-02-13 Thread Liviu Andronic
On Sat, Feb 13, 2010 at 5:39 PM, Zoppoli, Gabriele (NIH/NCI) [G]
 wrote:
> Please give me just a reference where I can find something useful.
>
To complement Sarah's suggestions, a good place to start is Quick-R [1].
Liviu

[1] http://www.statmethods.net/stats/descriptives.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] legend mathematical annotation problem

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 1:45 PM, David Winsemius wrote:



On Feb 13, 2010, at 1:13 PM, Mark Heckmann wrote:

1) I need to plot a legend containing the mathematical symbol  
greater-than-or-equal sign.

And I want the text to start with that symbol.

plot(110, 0.8)
categories <- expression(blank >= 85, 84.9 - 80, 79.9 - 75, 74.9 -  
70, 69.9 - 65, 64.9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)

What I want is just ">=85" to be printed, but without something  
(here "blank") in front it is no correct expression.


2) Also I want German type decimals, that is a comma instead of a  
point.
But the problem is, that the comma is used as argument separator in  
expression.


plot(110, 0.8)
categories <- expression(blank >= 85, 84,9 - 80, 79,9 - 75, 74,9 -  
70, 69,9 - 65, 64,9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)


Try:
> plot(110, 0.8)
> categories <-legend(110, 0.8, c( expression(" ">= " 85"),  
"84,9 - 80", "79,9 - 75", "74,9 - 70", "69,9 - 65", "64,9 - 60",  
expression(""<" 60"))

+ , lty=1:7, lwd=3, col=1, merge=TRUE)


That worked because of side-effects. Better might be:

categories <- c( expression(" " >=  " 85"), "84,9 - 80", "79,9 -  
75", "74,9 - 70", "69,9 - 65", "64,9 - 60", expression("" < " 60"))


plot(110, 0.8)
legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)



The commas inside quotes do not cause problems.



This does obviously not work. Any ideas?

Thanks,
Mark
–––
Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] legend mathematical annotation problem

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 1:13 PM, Mark Heckmann wrote:

1) I need to plot a legend containing the mathematical symbol  
greater-than-or-equal sign.

And I want the text to start with that symbol.

plot(110, 0.8)
categories <- expression(blank >= 85, 84.9 - 80, 79.9 - 75, 74.9 -  
70, 69.9 - 65, 64.9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)

What I want is just ">=85" to be printed, but without something  
(here "blank") in front it is no correct expression.


2) Also I want German type decimals, that is a comma instead of a  
point.
But the problem is, that the comma is used as argument separator in  
expression.


plot(110, 0.8)
categories <- expression(blank >= 85, 84,9 - 80, 79,9 - 75, 74,9 -  
70, 69,9 - 65, 64,9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)


Try:
> plot(110, 0.8)
> categories <-legend(110, 0.8, c( expression(" ">= " 85"), "84,9  
- 80", "79,9 - 75", "74,9 - 70", "69,9 - 65", "64,9 - 60",  
expression(""<" 60"))

+ , lty=1:7, lwd=3, col=1, merge=TRUE)

The commas inside quotes do not cause problems.



This does obviously not work. Any ideas?

Thanks,
Mark
–––
Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot different regression models on one graph

2010-02-13 Thread Rhonda Reidy
The following variables have the following significant relationships (x is the 
explanatory variable): linear, cubic, exponential, logistic. The linear 
relationship plots without any trouble. 

Cubic is the 'best' model, but it is not plotting as a smooth curve using the 
following code:

cubic.lm<- lm(y~poly(x,3))
lines(x,predict(cubic.lm),lwd=2)

How do I plot the data and the estimated curves for all of these regression 
models in the same plot? 

x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)

y <- 
c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042,0.042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022)

Thanks in advance.

Rhonda Reidy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] legend mathematical annotation problem

2010-02-13 Thread Mark Heckmann
1) I need to plot a legend containing the mathematical symbol greater- 
than-or-equal sign.

And I want the text to start with that symbol.

plot(110, 0.8)
categories <- expression(blank >= 85, 84.9 - 80, 79.9 - 75, 74.9 - 70,  
69.9 - 65, 64.9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)

What I want is just ">=85" to be printed, but without something (here  
"blank") in front it is no correct expression.


2) Also I want German type decimals, that is a comma instead of a point.
But the problem is, that the comma is used as argument separator in  
expression.


plot(110, 0.8)
categories <- expression(blank >= 85, 84,9 - 80, 79,9 - 75, 74,9 - 70,  
69,9 - 65, 64,9 - 60, blank< 60)

legend(110, 0.8, categories, lty=1:7, lwd=3, col=1, merge=TRUE)

This does obviously not work. Any ideas?

Thanks,
Mark
–––
Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2010-02-13 Thread Ben Bolker
Hussain Abusaaq  fsu.edu> writes:

> I have this vector and I want to change it to date.
> 
> for example G=[05 12 2008]
> 
> or g=[2]
> f=[3]
> y=[2208]
> 

Something like as.Date(paste(rev(G),collapse="-"))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2010-02-13 Thread Hussain Abusaaq
Hi, I havw some porblem in R and i need your help.

I have this vector and I want to change it to date.

for example G=[05 12 2008]


or g=[2]
f=[3]
y=[2208]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to do calculations in data matrices?

2010-02-13 Thread Sarah Goslee
Some places to start
apply()
sweep()
and any of the good introductory guides to R available online.

For finding R material, it's vastly easier to use the custom search engine
at www.rseek.org than to wander aimlessly around google trying to
find things with "R" in them.

We'd be happy to give more specific answers if you need them,
especially if you do as the posting guide asks and provide
a reproducible example. (Not that you need one for this question,
since you did ask just for references.)

Sarah

On Sat, Feb 13, 2010 at 12:39 PM, Zoppoli, Gabriele (NIH/NCI) [G]
 wrote:
> Please give me just a reference where I can find something useful.
>
> In summary, I need to :
>
> - find the median of each row of a matrix
> - create a new matrix with each value in the first matrix divided by the 
> median of its row
> - if a value "a" in the second matrix is < 1, I need to substitute it with 1/a
>
> I know that for some of you it must be overeasy, but I swear I googled for 
> two hours with keywords "operations", "calculations", "data matrices", "data 
> tables", and "CRAN", and I didn't find anything useful.
>
> Thank you all
>
>
> Gabriele Zoppoli, MD
> Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University 
> of Genova, Genova, Italy
> Guest Researcher, LMP, NCI, NIH, Bethesda MD
>
> Work: 301-451-8575
> Mobile: 301-204-5642
> Email: zoppo...@mail.nih.gov
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Sarah Goslee
http://www.stringpage.com
http://www.astronomicum.com
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logical operations with lists

2010-02-13 Thread Greg Snow
The fact that A and B have dimensions indicates that they are more than simple 
vectors.  It appears that when you create D you are subsetting columns rather 
than rows, try:

> D <- C[which(C %in% A ==FALSE),] # note the ","

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Zoppoli, Gabriele (NIH/NCI) [G]
> Sent: Friday, February 12, 2010 3:58 PM
> To: r-help@r-project.org
> Subject: Re: [R] logical operations with lists
> 
> I'm sorry but here's what I get:
> 
> > A[1:10,]
>  [1] UQCRC1 IDH3B  PDHA1  SUCLA2 COX5B  SDHB   SDHA   MDH2   DLD
> COQ7
> 
> > dim(A)
> [1] 10131
> 
> > B[1:10,]
>  [1] 3.8-1.2 3.8-1.3 3.8-1.4 3.8-1.5 5-HT3c2 A1BGA1CFA2BP1
> A2LD1   A2M
> 
> > dim(B)
> [1] 55546 1
> 
> > C<-rbind(A,B)
> > dim(C)
> [1] 56559 1
> 
> > D <- C[which(C %in% A ==FALSE)]
> > dim(D)
> [1] 56559 0
> 
> and so with any other proposed method.
> 
> I imported the list A and B this way:
> 
> > A<-as.vector(read.delim("E:/A.txt",sep="\t",header=FALSE))
> 
> and then removed the redundant rows with:
> 
> > A<-unique(A)
> 
> Guess I'm doing something really wrong here... Sorry for the
> inexperience, I'm trying to improve...
> 
> 
> 
> 
> 
> Gabriele Zoppoli, MD
> Ph.D. Fellow, Experimental and Clinical Oncology and Hematology,
> University of Genova, Genova, Italy
> Guest Researcher, LMP, NCI, NIH, Bethesda MD
> 
> Work: 301-451-8575
> Mobile: 301-204-5642
> Email: zoppo...@mail.nih.gov
> 
> From: jbreic...@gmail.com [jbreic...@gmail.com] On Behalf Of Jonathan
> [jonsle...@gmail.com]
> Sent: Friday, February 12, 2010 5:21 PM
> To: Zoppoli, Gabriele (NIH/NCI) [G]
> Cc: r-help@r-project.org
> Subject: Re: [R] logical operations with lists
> 
> This is probably not the best way, but (assuming you had vectors and
> not lists, since I'm not sure what your list looks like):
> 
> C <- B[which(B %in% A ==FALSE)]
> 
> Regards,
> Jonathan
> 
> 
> On Fri, Feb 12, 2010 at 5:06 PM, Zoppoli, Gabriele (NIH/NCI) [G]
>  wrote:
> > Sorry, maybe it's easy but I haven't found anything useful:
> >
> > how can I obtain a list C that contains all the members in the list B
> that are not in list A? This are lists of nanes, not numbers!
> >
> > Thank you
> >
> >
> > Gabriele Zoppoli, MD
> > Ph.D. Fellow, Experimental and Clinical Oncology and Hematology,
> University of Genova, Genova, Italy
> > Guest Researcher, LMP, NCI, NIH, Bethesda MD
> >
> > Work: 301-451-8575
> > Mobile: 301-204-5642
> > Email: zoppo...@mail.nih.gov
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to do calculations in data matrices?

2010-02-13 Thread Zoppoli, Gabriele (NIH/NCI) [G]
Please give me just a reference where I can find something useful. 

In summary, I need to :

- find the median of each row of a matrix
- create a new matrix with each value in the first matrix divided by the median 
of its row
- if a value "a" in the second matrix is < 1, I need to substitute it with 1/a

I know that for some of you it must be overeasy, but I swear I googled for two 
hours with keywords "operations", "calculations", "data matrices", "data 
tables", and "CRAN", and I didn't find anything useful. 

Thank you all


Gabriele Zoppoli, MD
Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University of 
Genova, Genova, Italy
Guest Researcher, LMP, NCI, NIH, Bethesda MD

Work: 301-451-8575
Mobile: 301-204-5642
Email: zoppo...@mail.nih.gov
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple figures margin problem

2010-02-13 Thread Greg Snow
I still have not received a copy of the esp package, so I don't know what 
exactly you tried, or what the results are.

The following works for me and seems to fit your description, but without code 
and details of the results and how they differ from what you expect, we can 
only guess.

x1 <- runif(100, 1, 25)
x2 <- runif(100, 1, 25)
y1 <- rnorm(100, x1)
y2 <- rnorm(100, x2)
y3 <- rnorm(100, x1+x2)


par(mfcol=c(3,2), mar=c(0,0,0,0), oma=c(5,4,1,1)+0.1, xpd=NA)

plot(x1,y1, xaxt='n')
plot(x1,y2, xaxt='n')
plot(x1,y3)
plot(x2,y1, xaxt='n', yaxt='n')
plot(x2,y2, xaxt='n', yaxt='n')
plot(x2,y3, yaxt='n')



you may also want to look at the pairs function or the pairs2 function 
(TeachingDemos package) for examples of axes in the margins.  In fact you can 
get similar to the above by 

pairs2( cbind(x1,x2), cbind(y1,y2,y3), gap=0 )





-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of mnstn
> Sent: Friday, February 12, 2010 3:27 PM
> To: r-help@r-project.org
> Subject: Re: [R] Multiple figures margin problem
> 
> 
> Hello Greg,
> I tried that and got a similar result. The axes are still hidden. I am
> studying R Intro and Fig2A and 3B in
> http://research.stowers-institute.org/efg/R/Graphics/Basics/mar-
> oma/index.htm
> . They seem to indicate that the "mar" parameter alone controls the
> visibility of axis labels. Am I missing something obvious?
> MoonStone
> --
> View this message in context: http://n4.nabble.com/Multiple-figures-
> margin-problem-tp1490455p1490493.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Labels on a pyramide

2010-02-13 Thread Orvalho Augusto
This is good!

Yes I must learn about dput.

Thank you
Caveman



On Sat, Feb 13, 2010 at 6:57 PM, David Winsemius  wrote:
> I took a shot at getting "assymetric lableling but it's by no means perfect.
> You really, really ,really, should learn to offer data with dput. See below:
>
>
> On Feb 13, 2010, at 9:27 AM, Orvalho Augusto wrote:
>
>> I am using pyramid.plot() from the plotrix package.
>>
>> I have something like this
>>
>> 
>> xy.pop<-dados$masfr
>> xx.pop<-dados$femfr
>> #agelabels<-dados$femlab
>> xycol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),11)
>> xxcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),11)
>> xylab<-dados$maslab
>> xxlab<-dados$femlab
>> agelabels<-xylab
>>
>> png("piramide9808.png")
>>
>> par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels,top.labels=c("Masculino","","Feminino"),
>>      main="Primeiras 10 cancros mais frequentes por
>> sexo...",xycol=xycol,xxcol=xxcol,gap=0, labelcex=0))
>>
>> dev.off()
>>
>> #
>>
>> the dados a dataframe fabircated by someother program and looks like:
>>
>> ##
>> ordem   femlab  femfa   femfr   maslab  masfa   masfr
>>
>> 1       Colo do utero   258     26.76348548     Prostata        613
>> 43.81701215
>>
>> 2       Mama    110     11.41078838     Figado  84      6.004288778
>>
>> 3       Esofago 62      6.43153527      Pele    70      5.003573981
>>
>> 4       Figado  60      6.22406639      Sarcoma de Kaposi       65
>>  4.64617584
>>
>> 5       Pele    48      4.979253112     Esofago 63      4.503216583
>>
>> 6       Bexiga  37      3.838174274     Pulmao  46      3.288062902
>>
>> 7       Corpo do utero  34      3.526970954     Bexiga  43
>>  3.073624017
>>
>> 8       "Utero, SOE"    28      2.904564315     Penis   33
>>  2.358827734
>>
>> 9       Sarcoma de Kaposi       28      2.904564315     Laringe 27
>>  1.929949964
>>
>> 10      Vulva e Vagina  24      2.489626556     Colon   24
>>  1.715511079
>>
>> 11      Outras localizacoes     275     28.52697095     Outras
>> localizacoes     331     23.65975697
>
>
> In case anyone wants to take a crack at this without the hassle of
> recreating htis dataset...
>
> dput(dados) can be used to recreate the complete dataframe
>
> dados <- structure(list(ordem = 1:11, femlab = structure(c(2L, 6L, 4L,
> 5L, 8L, 1L, 3L, 10L, 9L, 11L, 7L), .Label = c("Bexiga", "Colo do utero",
> "Corpo do utero", "Esofago", "Figado", "Mama", "Outras localizacoes",
> "Pele", "Sarcoma de Kaposi", "Utero SOE", "Vulva e Vagina"), class =
> "factor"),
>    femfa = c(258L, 110L, 62L, 60L, 48L, 37L, 34L, 28L, 28L,
>    24L, 275L), femfr = c(26.76348548, 11.41078838, 6.43153527,
>    6.22406639, 4.979253112, 3.838174274, 3.526970954, 2.904564315,
>    2.904564315, 2.489626556, 28.52697095), maslab = structure(c(9L,
>    4L, 7L, 11L, 3L, 10L, 1L, 8L, 5L, 2L, 6L), .Label = c("Bexiga",
>    "Colon", "Esofago", "Figado", "Laringe", "Outras localizacoes",
>    "Pele", "Penis", "Prostata", "Pulmao", "Sarcoma de Kaposi"
>    ), class = "factor"), masfa = c(613L, 84L, 70L, 65L, 63L,
>    46L, 43L, 33L, 27L, 24L, 331L), masfr = c(43.81701215, 6.004288778,
>    5.003573981, 4.64617584, 4.503216583, 3.288062902, 3.073624017,
>    2.358827734, 1.929949964, 1.715511079, 23.65975697)), .Names = c("ordem",
> "femlab", "femfa", "femfr", "maslab", "masfa", "masfr"), class =
> "data.frame", row.names = c(NA,
> -11L))
>
>
>
> xy.pop<-dados$masfr
> xx.pop<-dados$femfr
> #agelabels<-paste(dados$femlab, "\t\t\t", dados$maslab, sep='')
> xycol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),11)
> xxcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),11)
> #xylab<-dados$maslab
> #xxlab<-dados$femlab
> #agelabels<-xylab
>
>
> par(mar=pyramid.plot(xy.pop, xx.pop, labels= rep("",length(xy.pop)),
> top.labels=c("Masculino", "", "Feminino"),
>      main="Primeiras 10 cancros mais frequentes por
> sexo...", xycol=xycol,xxcol=xxcol, gap=0))
> text(-xy.pop-5*nchar(dados$maslab), 1:length(xy.pop), dados$maslab
> );text(xx.pop+7*nchar(dados$femlab), 1:length(xx.pop), dados$femlab )
>
> # Could not get the right ratio of xx.pop to nchar to get completely correct
> placement and given the  small number you might just want to put in a
> "hand-crafted" vector,
>
>>
>>
>> #
>>
>> The problem is (1) I do not want plot agelabels on the center and (2)
>> I want plot different labels for each pair of the bars (one label for
>> masculine and the other feminine).
>>
>> The data represent the 10 most frequent cancer in a group of individuals.
>>
>> Can some one help please?
>>
>> Caveman
>>
>>
>>
>> --
>> OpenSource Software Consultant
>> CENFOSS (www.cenfoss.co.mz)
>> SP Tech (www.sptech.co.mz)
>> email: orvaq...@cenfoss.co.mz
>> cell: +258828810980
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>>

Re: [R] Labels on a pyramide

2010-02-13 Thread David Winsemius
I took a shot at getting "assymetric lableling but it's by no means  
perfect. You really, really ,really, should learn to offer data with  
dput. See below:



On Feb 13, 2010, at 9:27 AM, Orvalho Augusto wrote:


I am using pyramid.plot() from the plotrix package.

I have something like this


xy.pop<-dados$masfr
xx.pop<-dados$femfr
#agelabels<-dados$femlab
xycol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),11)
xxcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1), 
11)

xylab<-dados$maslab
xxlab<-dados$femlab
agelabels<-xylab

png("piramide9808.png")
par 
(mar 
= 
pyramid 
.plot 
(xy 
.pop,xx.pop,labels=agelabels,top.labels=c("Masculino","","Feminino"),

  main="Primeiras 10 cancros mais frequentes por
sexo...",xycol=xycol,xxcol=xxcol,gap=0, labelcex=0))

dev.off()

#

the dados a dataframe fabircated by someother program and looks like:

##
ordem   femlab  femfa   femfr   maslab  masfa   masfr

1   Colo do utero   258 26.76348548 Prostata613 
43.81701215

2   Mama110 11.41078838 Figado  84  6.004288778

3   Esofago 62  6.43153527  Pele70  5.003573981

4   Figado  60  6.22406639  Sarcoma de Kaposi   65  
4.64617584

5   Pele48  4.979253112 Esofago 63  4.503216583

6   Bexiga  37  3.838174274 Pulmao  46  3.288062902

7   Corpo do utero  34  3.526970954 Bexiga  43  3.073624017

8   "Utero, SOE"  28  2.904564315 Penis   33  2.358827734

9   Sarcoma de Kaposi   28  2.904564315 Laringe 27  
1.929949964

10  Vulva e Vagina  24  2.489626556 Colon   24  1.715511079

11	Outras localizacoes	275	28.52697095	Outras localizacoes	331	 
23.65975697



In case anyone wants to take a crack at this without the hassle of  
recreating htis dataset...


dput(dados) can be used to recreate the complete dataframe

dados <- structure(list(ordem = 1:11, femlab = structure(c(2L, 6L, 4L,
5L, 8L, 1L, 3L, 10L, 9L, 11L, 7L), .Label = c("Bexiga", "Colo do utero",
"Corpo do utero", "Esofago", "Figado", "Mama", "Outras localizacoes",
"Pele", "Sarcoma de Kaposi", "Utero SOE", "Vulva e Vagina"), class =  
"factor"),

femfa = c(258L, 110L, 62L, 60L, 48L, 37L, 34L, 28L, 28L,
24L, 275L), femfr = c(26.76348548, 11.41078838, 6.43153527,
6.22406639, 4.979253112, 3.838174274, 3.526970954, 2.904564315,
2.904564315, 2.489626556, 28.52697095), maslab = structure(c(9L,
4L, 7L, 11L, 3L, 10L, 1L, 8L, 5L, 2L, 6L), .Label = c("Bexiga",
"Colon", "Esofago", "Figado", "Laringe", "Outras localizacoes",
"Pele", "Penis", "Prostata", "Pulmao", "Sarcoma de Kaposi"
), class = "factor"), masfa = c(613L, 84L, 70L, 65L, 63L,
46L, 43L, 33L, 27L, 24L, 331L), masfr = c(43.81701215, 6.004288778,
5.003573981, 4.64617584, 4.503216583, 3.288062902, 3.073624017,
2.358827734, 1.929949964, 1.715511079, 23.65975697)), .Names =  
c("ordem",
"femlab", "femfa", "femfr", "maslab", "masfa", "masfr"), class =  
"data.frame", row.names = c(NA,

-11L))



xy.pop<-dados$masfr
xx.pop<-dados$femfr
#agelabels<-paste(dados$femlab, "\t\t\t", dados$maslab, sep='')
xycol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),11)
xxcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),11)
#xylab<-dados$maslab
#xxlab<-dados$femlab
#agelabels<-xylab


par(mar=pyramid.plot(xy.pop, xx.pop, labels= rep("",length(xy.pop)),  
top.labels=c("Masculino", "", "Feminino"),

  main="Primeiras 10 cancros mais frequentes por
sexo...", xycol=xycol,xxcol=xxcol, gap=0))
text(-xy.pop-5*nchar(dados$maslab), 1:length(xy.pop), dados 
$maslab );text(xx.pop+7*nchar(dados$femlab), 1:length(xx.pop), dados 
$femlab )


# Could not get the right ratio of xx.pop to nchar to get completely  
correct placement and given the  small number you might just want to  
put in a "hand-crafted" vector,





#

The problem is (1) I do not want plot agelabels on the center and (2)
I want plot different labels for each pair of the bars (one label for
masculine and the other feminine).

The data represent the 10 most frequent cancer in a group of  
individuals.


Can some one help please?

Caveman



--
OpenSource Software Consultant
CENFOSS (www.cenfoss.co.mz)
SP Tech (www.sptech.co.mz)
email: orvaq...@cenfoss.co.mz
cell: +258828810980

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R

Re: [R] Help for numbers close to 1 in R

2010-02-13 Thread Gabor Grothendieck
Check out ?expm1

On Sat, Feb 13, 2010 at 9:17 AM, cjmr  wrote:
>
> Hi all.
>
> I have a problem with adding 1 and very small numbers
>
> ex:
>>p=1e-49
>>1+p
> [1] 1
>
> My problem comes from some values of x for the value:
>    1-exp(-sum((x-1/sqrt(3))^2))
> the return values are always equivalent.
>
> Please help
> Thanks
> --
> View this message in context: 
> http://n4.nabble.com/Help-for-numbers-close-to-1-in-R-tp1554416p1554416.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why double quote is preferred?

2010-02-13 Thread Douglas Bates
On Fri, Feb 12, 2010 at 4:25 PM, blue sky  wrote:
> ?'`' shows the following:
>
> "Single and double quotes delimit character constants. They can be
> used interchangeably but double quotes are *preferred* (and character
> constants are printed using double quotes), so single quotes are
> normally only used to delimit character constants containing double
> quotes."
>
> It is not clear to me why double quote is preferred (I don't think
> that "character constants are printed using double quotes" should be
> the reason, in the sense that single quote can be used for printing if
> we want to do so). It seems that double quote and single quote can be
> used interchangeably, except that "Single quotes need to be escaped by
> backslash in single-quoted strings, and double quotes in double-quoted
> strings."
>
> Could somebody why double quote is preferred?

To avoid confusion for those who are accustomed to programming in the
C family of languages (C, C++, Java), where there is a difference in
the meaning of single quotes and double quotes.
A C programmer reads 'a' as a single character and "a" as a character
string consisting of the letter 'a' followed by a null character to
terminate the string.  In R there is no character data type, there are
only character strings.  For consistency with other languages it helps
if character strings are delimited by double quotes.  The single quote
version in R is for convenience.  On most keyboards you don't need to
use the shift key to type a single quote but you do need the shift for
a double quote.

P.S. I do not plan to get into an extended debate on this issue, Peng.
 (As others have pointed out, it is considered bad form to disguise
your identity on this list.)  Your opinions on the design of the
language have been noted but if you want the language to be redesigned
to your specifications, you will need to fork your own version.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help for numbers close to 1 in R

2010-02-13 Thread Liviu Andronic
On Sat, Feb 13, 2010 at 2:17 PM, cjmr  wrote:
>>p=1e-49
>>1+p
> [1] 1
>

Try this:
> source("http://r-bc.googlecode.com/svn/trunk/R/bc.R";)
> p <- bc(1e-49)
> p
[1] ".1"
> 1+p
[1] "1.1"

You might want to check this [1] and search the archives for "precision".
Liviu

[1] http://code.google.com/p/r-bc/


> My problem comes from some values of x for the value:
>    1-exp(-sum((x-1/sqrt(3))^2))
> the return values are always equivalent.
>
> Please help
> Thanks
> --
> View this message in context: 
> http://n4.nabble.com/Help-for-numbers-close-to-1-in-R-tp1554416p1554416.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create probeAnno object?

2010-02-13 Thread Martin Morgan
On 02/11/2010 08:03 PM, AMBUJ wrote:
> Hi,
> Thank you Viviana for the description to create probeAnno object. 
> The below link was very helpful:
> http://svitsrv25.epfl.ch/R-doc/library/Ringo/html/probeAnnoClass.html

This is just the help page for one of Ringo's classes. It's available in
R with

  > library(Ringo)
  > ?probeAnno

Have you looked at

  > browseVignettes("Ringo")

? Ringo is a Bioconductor package so ask on the Bioconductor mailing list

  http://bioconductor.org

Might as well include the maintainer in your email, since they're likely
to be the expert

  > packageDescription("Ringo")

and always useful to include the output of

  > sessionInfo()

to unambiguously report the version of R and Ringo you're using. A first
action is usually to update to the latest released R and corresponding
Bioconductor packages,

  http://bioconductor.org/install

> 
> I tried creating the object in the following ways where: startProbe & 
> endProbe are the vectors which has the genomic start co-ordinates and end 
> co-ordinates and index is a vector to store identifier of the probes. 
> intensityData is the vector that stores data to be segmented
> 
> Method 1.
>> map<-new("environment",startProbe,endProbe,index)

calling 'new' without naming the second and subsequent arguments is not
likely to create the object you expect. Maybe

  map = new("environment",
 startProbe=startProbe, endProbe=endProbe,
 index=index)

?

>> arrayName<-"2009_01_18_veg1028_w"
>> genome<-"genome"
>> probeAnnotation<-new("probeAnno",map,arrayName,genome)

Here you're probably aiming for something like

  new("probeAnno", map=map, arrayName=arrrayName, genome=genome)

Martin

>> segEnv<-segChrom(intensityData,probeAnno=probeAnnotation,chr="1",strands="+",nrBasesPerSegment=750)
> 

> Running 'segment' on chromosome 1.+Error in mget(paste(chrstrd[j],
what, sep = "."), probeAnno) : second argument must be an environment

> Method 2.
>> pa3<-posToProbeAnno("~/523/POSFormat_tab.csv");Creating probeAnno mapping 
>> for chromosome 1 Done.
>> arrayName(pa3)<-"S.Pombe"
>> genome(pa3)<-"genome"
>> show(pa3)
> A 'probeAnno' object holding the mapping between reporters and genomic 
> positions.
> Chromosomes: 1
> Microarray platform: S.Pombe 
> Genome: genome
>> segEnv<-segChrom(intensityData,probeAnno=pa3,chr="1",strands="+",nrBasesPerSegment=750)
> 
> Running 'segment' on chromosome 1.+Error in mget(paste(chrstrd[j], what, sep 
> = "."), probeAnno) :   second argument must be an environment
> 
> 
> Both the methods gave the same error "second argument must be an 
> environment". I am unable to execute segChrom() of tilingArray package. Any 
> sugetions would be helpful. Thanks in advance.
> 
> 
> Regards,
> 
> Ambuj A Thacker
> 
> 
>   The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. 
> http://in.yahoo.com/
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using getSYMBOL, annotate package on a list with empty elements.

2010-02-13 Thread Martin Morgan
On 02/13/2010 07:17 AM, David Winsemius wrote:
> 
> On Feb 13, 2010, at 10:09 AM, Martin Morgan wrote:
> 
>> On 02/13/2010 06:21 AM, David Winsemius wrote:
>>>
>>> On Feb 13, 2010, at 3:18 AM, Sahil Seth wrote:
>>>
 Hi,
 I have been trying to find a solution to this issue, but have not been
 able
 to so !
 I am trying to use sapply on the function getSYMBOL,
>>>
>>> The annotate package is from BioConductor.
>>>
 an extract from the list is:
> test.goP[13:14]
 $`GO:050`
 IEA   IEA   IEA   IEA   TAS   TAS   TAS
 IEA
 "5270753" "5720725" "1690128" "4850681"  "110433" "2640544" "4900370"
 "1430280"
 IEA   NAS   TAS   IEA
 "6110044" "1170615" "6590546" "1690632"

 $`GO:052`
 [1] NA

 goG=sapply(test.goP,getSYMBOL,data="hgu95av2")
>>>
>>> I was a bit surprised to see a data= argument to sapply. That didn't
>>> seem typical but perhaps I am not aware of all the tricks available.
>>
>>> args(sapply)
>> function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
>>
>> The ... arguments are passed to FUN. Here FUN is getSYMBOL
>>
>>> args(getSYMBOL)
>> function (x, data)
>>
>> so data="hgu95av2" is used in each application of getSYMBOL. This could
>> have been written as
>>
>>  sapply(test.goP, getSYMBOL, "hgu95av2")
>>
>> using positional matching to getSYMBOL's arguments. A 'trick' that is
>> sometimes useful is to use named arguments to force the apply to vary
>> the second (or other) argument, rather than the first.
>>
>>
 error: "Error in .checkKeysAreWellFormed(keys) :
 keys must be supplied in a character vector with no NAs "
 In this the 14th element has missing values, thus getSYMBOL raises
 issues.
>>>
>>> The way this is being displayed makes me think you are processing a list
>>> with named elements. You should use str to determine what test.goP
>>> really is. Then you will have a better idea what functions would be
>>> appropriate. you may want to compose a function to go with the sapply
>>> loop that omots the NA's:
>>>
>>> ?na.omit
>>>
>>> You should also post the results of dput or dump on the test objects you
>>> are working with. That way the list readers can also get access to that
>>> information.

 GetSYMBOL has to be given a char array, so a simple solution is
 infact to
 delete the missing elements from the list.

 I have been trying to find a solution for it, but in vain:
 tried: completecases(goP), na.omit(goP) and several other things.

 Any suggestions please ?
>>
>>
> 
> Would that be:
> 
>  sapply(!is.na(test.goP), getSYMBOL, "hgu95av2")

actually,

  sapply(test.goP[!is.na(test.goP)], getSYMBOL, "hgu95av2")

;).

Martin

> 
> ?
> 
>>
>>>
>>> In addition to the above suggestions ... post on the correct list?
>>
>>  http://bioconductor.org/
>>
>> Martin
>>
>>>
 -- 
> 
> 
>> Phone: (206) 667-2793
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using getSYMBOL, annotate package on a list with empty elements.

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 10:09 AM, Martin Morgan wrote:


On 02/13/2010 06:21 AM, David Winsemius wrote:


On Feb 13, 2010, at 3:18 AM, Sahil Seth wrote:


Hi,
I have been trying to find a solution to this issue, but have not  
been

able
to so !
I am trying to use sapply on the function getSYMBOL,


The annotate package is from BioConductor.


an extract from the list is:

test.goP[13:14]

$`GO:050`
IEA   IEA   IEA   IEA   TAS   TAS   TAS
IEA
"5270753" "5720725" "1690128" "4850681"  "110433" "2640544"  
"4900370"

"1430280"
IEA   NAS   TAS   IEA
"6110044" "1170615" "6590546" "1690632"

$`GO:052`
[1] NA

goG=sapply(test.goP,getSYMBOL,data="hgu95av2")


I was a bit surprised to see a data= argument to sapply. That didn't
seem typical but perhaps I am not aware of all the tricks available.



args(sapply)

function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

The ... arguments are passed to FUN. Here FUN is getSYMBOL


args(getSYMBOL)

function (x, data)

so data="hgu95av2" is used in each application of getSYMBOL. This  
could

have been written as

 sapply(test.goP, getSYMBOL, "hgu95av2")

using positional matching to getSYMBOL's arguments. A 'trick' that is
sometimes useful is to use named arguments to force the apply to vary
the second (or other) argument, rather than the first.



error: "Error in .checkKeysAreWellFormed(keys) :
keys must be supplied in a character vector with no NAs "
In this the 14th element has missing values, thus getSYMBOL raises
issues.


The way this is being displayed makes me think you are processing a  
list

with named elements. You should use str to determine what test.goP
really is. Then you will have a better idea what functions would be
appropriate. you may want to compose a function to go with the sapply
loop that omots the NA's:

?na.omit

You should also post the results of dput or dump on the test  
objects you
are working with. That way the list readers can also get access to  
that

information.


GetSYMBOL has to be given a char array, so a simple solution is  
infact to

delete the missing elements from the list.

I have been trying to find a solution for it, but in vain:
tried: completecases(goP), na.omit(goP) and several other things.

Any suggestions please ?





Would that be:

 sapply(!is.na(test.goP), getSYMBOL, "hgu95av2")

?





In addition to the above suggestions ... post on the correct list?


 http://bioconductor.org/

Martin




--




Phone: (206) 667-2793


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help for numbers close to 1 in R

2010-02-13 Thread cjmr

Hi all.

I have a problem with adding 1 and very small numbers

ex:
>p=1e-49
>1+p
[1] 1

My problem comes from some values of x for the value:
1-exp(-sum((x-1/sqrt(3))^2))
the return values are always equivalent.

Please help
Thanks
-- 
View this message in context: 
http://n4.nabble.com/Help-for-numbers-close-to-1-in-R-tp1554416p1554416.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using getSYMBOL, annotate package on a list with empty elements.

2010-02-13 Thread Martin Morgan
On 02/13/2010 06:21 AM, David Winsemius wrote:
> 
> On Feb 13, 2010, at 3:18 AM, Sahil Seth wrote:
> 
>> Hi,
>> I have been trying to find a solution to this issue, but have not been
>> able
>> to so !
>> I am trying to use sapply on the function getSYMBOL,
> 
> The annotate package is from BioConductor.
> 
>> an extract from the list is:
>>> test.goP[13:14]
>> $`GO:050`
>>  IEA   IEA   IEA   IEA   TAS   TAS   TAS
>> IEA
>> "5270753" "5720725" "1690128" "4850681"  "110433" "2640544" "4900370"
>> "1430280"
>>  IEA   NAS   TAS   IEA
>> "6110044" "1170615" "6590546" "1690632"
>>
>> $`GO:052`
>> [1] NA
>>
>> goG=sapply(test.goP,getSYMBOL,data="hgu95av2")
> 
> I was a bit surprised to see a data= argument to sapply. That didn't
> seem typical but perhaps I am not aware of all the tricks available.

> args(sapply)
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

The ... arguments are passed to FUN. Here FUN is getSYMBOL

> args(getSYMBOL)
function (x, data)

so data="hgu95av2" is used in each application of getSYMBOL. This could
have been written as

  sapply(test.goP, getSYMBOL, "hgu95av2")

using positional matching to getSYMBOL's arguments. A 'trick' that is
sometimes useful is to use named arguments to force the apply to vary
the second (or other) argument, rather than the first.


>> error: "Error in .checkKeysAreWellFormed(keys) :
>>  keys must be supplied in a character vector with no NAs "
>> In this the 14th element has missing values, thus getSYMBOL raises
>> issues.
> 
> The way this is being displayed makes me think you are processing a list
> with named elements. You should use str to determine what test.goP
> really is. Then you will have a better idea what functions would be
> appropriate. you may want to compose a function to go with the sapply
> loop that omots the NA's:
> 
> ?na.omit
> 
> You should also post the results of dput or dump on the test objects you
> are working with. That way the list readers can also get access to that
> information.
>>
>> GetSYMBOL has to be given a char array, so a simple solution is infact to
>> delete the missing elements from the list.
>>
>> I have been trying to find a solution for it, but in vain:
>> tried: completecases(goP), na.omit(goP) and several other things.
>>
>> Any suggestions please ?

  sapply(is.na(test.goP), getSYMBOL, "hgu95av2")

> 
> In addition to the above suggestions ... post on the correct list?

  http://bioconductor.org/

Martin

> 
>> -- 
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failed install of package xlsReadWrite

2010-02-13 Thread Hans-Peter Suter
2010/2/11  :
> Does anyone have a work-around for a failed installation of this package?
>> library(xlsReadWrite)
...
> xls.getshlib()
>
> However, the xls.getshlib() command fails with this error:

(For the record) It probably was a corporate firewall problem.

Apart from info on www.swissr.org (thanks for pointing this out) you
can always download the file directly. The full listing is here:
http://dl.dropbox.com/u/2602516/swissrpkg/listing.html

Direct links to the current (R2.10.x/1.5.1 pkg) version, either:
- download the zipped dll:
http://dl.dropbox.com/u/2602516/swissrpkg/bin/win32/shlib/xlsReadWrite_1.5.1_dll.zip
and manually replace xlsReadWrite.dll (in e.g.
'C:\Programme\R\R-2.10.1\library\xlsReadWrite\libs') or
- download the full package:
http://dl.dropbox.com/u/2602516/swissrpkg/bin/win32/2.10/xlsReadWrite_1.5.1.zip
and re-install.

Note: 0.0.0-versions are development versions. They will be updated
when there is something 'fixworthy' while a new official version is
'in wait mode'. Currently 0.0.0 fixes a bug with non-absolute file
paths in 1.5.1.

Cheers,
Hans-Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Labels on a pyramide

2010-02-13 Thread Orvalho Augusto
I am using pyramid.plot() from the plotrix package.

I have something like this


xy.pop<-dados$masfr
xx.pop<-dados$femfr
#agelabels<-dados$femlab
xycol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),11)
xxcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),11)
xylab<-dados$maslab
xxlab<-dados$femlab
agelabels<-xylab

png("piramide9808.png")
par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels,top.labels=c("Masculino","","Feminino"),
   main="Primeiras 10 cancros mais frequentes por
sexo...",xycol=xycol,xxcol=xxcol,gap=0, labelcex=0))

dev.off()

#

the dados a dataframe fabircated by someother program and looks like:

##
ordem   femlab  femfa   femfr   maslab  masfa   masfr

1   Colo do utero   258 26.76348548 Prostata613 
43.81701215

2   Mama110 11.41078838 Figado  84  6.004288778

3   Esofago 62  6.43153527  Pele70  5.003573981

4   Figado  60  6.22406639  Sarcoma de Kaposi   65  
4.64617584

5   Pele48  4.979253112 Esofago 63  4.503216583

6   Bexiga  37  3.838174274 Pulmao  46  3.288062902

7   Corpo do utero  34  3.526970954 Bexiga  43  3.073624017

8   "Utero, SOE"28  2.904564315 Penis   33  2.358827734

9   Sarcoma de Kaposi   28  2.904564315 Laringe 27  
1.929949964

10  Vulva e Vagina  24  2.489626556 Colon   24  1.715511079

11  Outras localizacoes 275 28.52697095 Outras localizacoes 
331 23.65975697


#

The problem is (1) I do not want plot agelabels on the center and (2)
I want plot different labels for each pair of the bars (one label for
masculine and the other feminine).

The data represent the 10 most frequent cancer in a group of individuals.

Can some one help please?

Caveman



-- 
OpenSource Software Consultant
CENFOSS (www.cenfoss.co.mz)
SP Tech (www.sptech.co.mz)
email: orvaq...@cenfoss.co.mz
cell: +258828810980

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using getSYMBOL, annotate package on a list with empty elements.

2010-02-13 Thread David Winsemius


On Feb 13, 2010, at 3:18 AM, Sahil Seth wrote:


Hi,
I have been trying to find a solution to this issue, but have not  
been able

to so !
I am trying to use sapply on the function getSYMBOL,


The annotate package is from BioConductor.


an extract from the list is:

test.goP[13:14]

$`GO:050`
 IEA   IEA   IEA   IEA   TAS   TAS   TAS
IEA
"5270753" "5720725" "1690128" "4850681"  "110433" "2640544" "4900370"
"1430280"
 IEA   NAS   TAS   IEA
"6110044" "1170615" "6590546" "1690632"

$`GO:052`
[1] NA

goG=sapply(test.goP,getSYMBOL,data="hgu95av2")


I was a bit surprised to see a data= argument to sapply. That didn't  
seem typical but perhaps I am not aware of all the tricks available.



error: "Error in .checkKeysAreWellFormed(keys) :
 keys must be supplied in a character vector with no NAs "
In this the 14th element has missing values, thus getSYMBOL raises  
issues.


The way this is being displayed makes me think you are processing a  
list with named elements. You should use str to determine what  
test.goP really is. Then you will have a better idea what functions  
would be appropriate. you may want to compose a function to go with  
the sapply loop that omots the NA's:


?na.omit

You should also post the results of dput or dump on the test objects  
you are working with. That way the list readers can also get access to  
that information.


GetSYMBOL has to be given a char array, so a simple solution is  
infact to

delete the missing elements from the list.

I have been trying to find a solution for it, but in vain:
tried: completecases(goP), na.omit(goP) and several other things.

Any suggestions please ?


In addition to the above suggestions ... post on the correct list?


--


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "drop if missing" command?

2010-02-13 Thread David Winsemius


On Feb 12, 2010, at 6:48 PM, Nora Kozloff wrote:


This will probably seem very simple to experienced R programmers:

I am doing a snp association analysis and am at the model-fitting  
stage. I

am using  the Stats package's  "drop1" with the following code:
##geno is the dataset
## the dependent variable (casectrln) is dichotomous and coded 0,1
##  rs743572_2 is one of the snps (which is coded 0,1,2 for the 3  
genotypes)


library(stats)

modadd = glm(geno$casectrln ~rs743572_2 + factor(racegrp)+  
factor(smokgp)+

factor(alcgp)+ factor(bmigp) + factor(ipssgp)
+ agebase, family="binomial")
drop1(mod,scale=0,test=c("Chisq"), x=NULL, k=2)

There is a great deal of missing data in this dataset for both snps  
and for
covariates--so I have been instructed not to simply drop all cases  
with

missing genotype or covariate data .


It sounds as though you have been asked to do some sort of imputation  
of missing data.



How can I drop the observations which
are missing  only for the snp I am modeling at the time, then  
reinstate

those observations to model the next snp?

Thanks for any help,

Nora

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and RODBC

2010-02-13 Thread Frank E Harrell Jr

Daniel Nordlund wrote:
. . .



This is just a quick follow-up to my previous post.  Based on Prof. Ripley's 
response I went back and looked at the SAS log file and reread the RODBC help 
pages.  The problem of writing a SAS dataset was solved by setting 
colQuote=NULL in the call to the odbcConnect() function.

ch <- odbcConnect('sasodbc', believeNRows=FALSE, colQuote=NULL) 


I hope this will be useful to others who may have the SAS BASE product and want 
to do graphics or statistical analyses with their SAS data, but can't afford 
the high licensing fees for the SAS STAT and GRAPH modules.  Thanks to Prof. 
Ripley for the fine RODBC package.

Dan

Daniel Nordlund
Bothell, WA USA
 



Daniel since you have SAS BASE installed why not use sas.get in the 
Hmisc package and also get access to metadata such as variable labels 
that ODBC does not handle?  Besides providing better documentation, the 
labels are very useful as axis labels in plotting, etc.


Frank

--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with EXPASY HTML form submission in RCurl package

2010-02-13 Thread Duncan Temple Lang


Sunando Roy wrote:
> 
> Hi Duncan,
> 
> Thanks for your help. I changed the P but the output that I get is not
> what I expect. The form gets aborted without any actual output. I get
> the same result with
> 
> postForm("http://www.expasy.ch/tools/protscale.html";)


That URL  (...protscale.html) is the HTML page that contains the form.
It is not the URL to which you are supposed to submit the form request.
That information is in the attributes of the  .
That is

   http://www.expasy.ch/cgi-bin/protscale.pl?1

So you have to know a little about HTML forms in order to
figure out how to map the HTML description to a request.
That is what your browser does ( including hidden fields in
the form, etc.)
It is also the purpose of the R package RHTMLForms (www.omegahat.org/RHTMLForms
and
 install.packages("RHTMLForms", repos = "http://www.omegahat.org/R";, type = 
"source")
)

e.g.

 library(RHTMLForms)
 f = getHTMLFormDescription("http://www.expasy.ch/tools/protscale.html";)
 fun = createFunction(f[[2]])


 o = fun(prot_id = "P05130", weight_var = "exponential", style = "POST")

(The protscale.html is malformed with an < that messes up parsing the
 linear option.)

Now, of course, you have to parse the resulting HTML (in the string given in 
the variable o)
to get the information the form submission generated.


  D.


> 
> just with an added message that there was no input passed on. But with
> the input like I presented I get the same output. I could make some of
> the examples work like for e.g
> 
> postForm("http://www.omegahat.org/RCurl/testPassword/form_validation.cgi";,
> your_name = "Duncan",
> your_age = "35-55",
> your_sex = "m",
> submit = "submit",
> .opts = list(userpwd = "bob:welcome"))
> 
> which would suggest atleast the setup is correct.
> I parsed the expasy protscale source code to identify the variables but
> the form does not seem to go through. I can post the html body code if
> needed.
> 
> Regards
> 
> Sunando
> On Fri, Feb 12, 2010 at 3:54 PM, Duncan Temple Lang
> mailto:dun...@wald.ucdavis.edu>> wrote:
> 
> 
> 
> Sunando Roy wrote:
> > Hi,
> >
> > I am trying to submit a form to the EXPASY protscale server (
> > http://www.expasy.ch/tools/protscale.html). I am using the RCurl
> package and
> > the postForm function available in it. I have extracted the
> variables for
> > the form from the HTML source page. According to the syntax of
> postForm, I
> > just need to mention the url and assign values to the input
> mentioned in the
> > HTML code.
> > The code that I am using is:
> > postForm("http://www.expasy.ch/tools/protscale.html";,
> > "sequence" = "
> > ",
> > scale = "Molecular weight",
> > window = "5",
> > weight_edges = "100",
> > weight_var = "linear",
> > norm = "no",
> > submit = "Submit"), .checkparams = TRUE)
> 
> 
> I don't think that is what you actually submitted to R.
> It is a syntax error as you end the cal to postForm) after "Submit"
> and then have an extra  ", .checkparams = TRUE)" afterwards.
> 
> But, when you remove the ')' after  "Submit",
> the problem you get is that .checkparams is not a parameter
> of postForm(), but .checkParams is.  R is case-sensitive
> so the problem is that .checkparams is being treated as a parameter
> of your form.
> 
> So change the p to P in .checkparams, and it works.
> 
>  D.
> 
> > the constant error that I get is:
> > Error in postForm("http://www.expasy.ch/tools/protscale.html";,
> .params =
> > list(sequence = "not",  :  STRING_ELT() can only be applied to a
> 'character
> > vector', not a 'logical'
> >
> > Is there any other way to submit an HTML form in R ?
> >
> > Thanks for the help
> >
> > Regards
> >
> > Sunando
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org  mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hierarchical data sets: which software to use?

2010-02-13 Thread Anton du Toit
Hi Douglas,

Thanks for your helpful response. I've commented on some of the points
you raised below:

> Although this may not be helpful for your immediate goal, storing and
manipulating data of this size and complexity (and, I expect, cost for
collection) really calls for tools like relational databases.  A
single flat file of 2500 variables by 1500 cases is almost never the
best way to organize such data.  A normalized representation as a
collection of interlinked tables in a relational data base is much
more effective and less error prone.  The widespread use of
spreadsheets or SPSS data sets or SAS data sets which encourage the
"single table with a gargantuan number of columns, most of which are
missing data in most cases" approach to organization of longitudinal
data is regrettable.

I'm both relieved and daunted by this. Daunted because it means I'll
need to learn another package (probably postGreSQL or MySQL?), but
relieved because constructing a 2500 by 1500 file seemed intuitively
wrong, as well as introducing the possibility of errors
unnecessarily--surely it makes more sense to leave the data as is.

As far as immediate goals go--I am at the beginning of a thesis, and I
have more research planned after that, so I want to get things right
from the  start.


> For later analysis in R it is better to start with "long" form of the
data, as opposed to the "wide" form, even if it means repeating
demographic information over several occasions.  Using a relational
database allows for a long view to be generated without the
possibility of inconsistency in the demographics.  I am using the
descriptions "long" and "wide" in the sense that they are used in the
reshape help page.  See

?reshape

> in R.  The long view is also called the subject/occasion view in the
sense that each row corresponds to one subject on one occasion.

> Robert Gentleman's book "R Programming for Bioinformatics" provides
background on linking R to relational databases.

Thanks--I'll look this one up.


> As I said at the beginning, you may not want to undertake the
necessary study and effort to reorganize your data for this specific
project but if you do this a lot you may want to consider it.

As above: a stitch in time, I suppose.

Thanks again.

Anton

On Sat, Feb 6, 2010 at 3:22 AM, Douglas Bates  wrote:
> On Sun, Jan 31, 2010 at 10:24 PM, Anton du Toit  
> wrote:
>> Dear R-helpers,
>>
>> I’m writing for advice on whether I should use R or a different package or
>> language. I’ve looked through the R-help archives, some manuals, and some
>> other sites as well, and I haven’t done too well finding relevant info,
>> hence my question here.
>>
>> I’m working with hierarchical data (in SPSS lingo). That is, for each case
>> (person) I read in three types of (medical) record:
>>
>> 1. demographic data: name, age, sex, address, etc
>>
>> 2. ‘admissions’ data: this generally repeats, so I will have 20 or so
>> variables relating to their first hospital admission, then the same 20 again
>> for their second admission, and so on
>>
>> 3. ‘collections’ data, about 100 variables containing the results of a
>> battery of standard tests. These are administered at intervals and so this
>> is repeating data as well.
>>
>> The number of repetitions varies between cases, so in its one case per line
>> format the data is non-rectangular.
>>
>> At present I have shoehorned all of this into SPSS, with each case on one
>> line. My test database has 2,500 variables and 1,500 cases (or persons), and
>> in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger
>> again, though likely within one order of magnitude. Down the track, funding
>> permitting, I hope to be working with tens of thousands of cases.
>
> Although this may not be helpful for your immediate goal, storing and
> manipulating data of this size and complexity (and, I expect, cost for
> collection) really calls for tools like relational databases.  A
> single flat file of 2500 variables by 1500 cases is almost never the
> best way to organize such data.  A normalized representation as a
> collection of interlinked tables in a relational data base is much
> more effective and less error prone.  The widespread use of
> spreadsheets or SPSS data sets or SAS data sets which encourage the
> "single table with a gargantuan number of columns, most of which are
> missing data in most cases" approach to organization of longitudinal
> data is regrettable.
>
> For later analysis in R it is better to start with "long" form of the
> data, as opposed to the "wide" form, even if it means repeating
> demographic information over several occasions.  Using a relational
> database allows for a long view to be generated without the
> possibility of inconsistency in the demographics.  I am using the
> descriptions "long" and "wide" in the sense that they are used in the
> reshape help page.  See
>
> ?reshape
>
> in R.  The long view is also called the subject/occasion view in the
> sense th

[R] Using getSYMBOL, annotate package on a list with empty elements.

2010-02-13 Thread Sahil Seth
Hi,
I have been trying to find a solution to this issue, but have not been able
to so !
I am trying to use sapply on the function getSYMBOL,
an extract from the list is:
> test.goP[13:14]
$`GO:050`
  IEA   IEA   IEA   IEA   TAS   TAS   TAS
IEA
"5270753" "5720725" "1690128" "4850681"  "110433" "2640544" "4900370"
"1430280"
  IEA   NAS   TAS   IEA
"6110044" "1170615" "6590546" "1690632"

$`GO:052`
[1] NA

goG=sapply(test.goP,getSYMBOL,data="hgu95av2")
error: "Error in .checkKeysAreWellFormed(keys) :
  keys must be supplied in a character vector with no NAs "
In this the 14th element has missing values, thus getSYMBOL raises issues.

GetSYMBOL has to be given a char array, so a simple solution is infact to
delete the missing elements from the list.

I have been trying to find a solution for it, but in vain:
tried: completecases(goP), na.omit(goP) and several other things.

Any suggestions please ?
Thanks a lot !

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.