[R] residual plots for lmer in lme4 package

2007-08-15 Thread Margaret Gardiner-Garden
Hi,  

 

I was wondering if I might be able to ask some advice about doing residual
plots for the lmer function in the lme4 package. 

 

Our group's aim is to find out whether the expression staining of a particular
gene in a sample (or "core") is related to the pathology of the core.

To do this, we used the lmer function to fit the logistic mixed model
below.  I apologise in advance for the lack of subscripts.

 

 logit P(y_ij = 1) = beta_0 + U_i + beta_1 * Pathol_ij, where U_i ~ N(0, sigma_u^2),

i indexes patient, j indexes measurement, Pathol is an indicator variable
(0,1) for benign epithelium versus cancer, and y_ij is the staining
indicator (0,1) for each core: y_ij equals 1 if the core stains positive
and 0 otherwise.

 

(I have inserted some example R code at the end of this message)

 

I was wondering if you knew how I could test that the errors Ui are normally
distributed in my fit.  I am not familiar with how to do residual plots for
a mixed logistic regression (or even for any logistic regression!). 

 

Any advice would be greatly appreciated!

 

Thanks and Regards

Marg

 

Example code:

 

lmer(Intensity.over2.hyp.canc~Pathology + (1|Patient.ID), data=
HSD17beta4.hyp.canc, family="binomial", na.action="na.omit")

 

 

   

#Family: binomial(logit link)
#      AIC      BIC    logLik deviance
# 414.1101 431.4147 -203.0550 406.1101
#
#Random effects:
# Groups     Name        Variance Std.Dev.
# Patient.ID (Intercept) 4.9558   2.2262
#
#number of obs: 559, groups: Patient.ID, 177
#
#Estimated scale (compare to 1)  0.6782544
#
#Fixed effects:
#                     Estimate Std. Error z value  Pr(>|z|)
#(Intercept)          -2.05734    0.24881 -8.2686 < 2.2e-16 ***
#PathologyHyperplasia -1.76627    0.44909 -3.9330 8.389e-05 ***

 

NB. Intensity.over2.hyp.canc is the staining of the core (i.e. 0 or 1);

Pathology is Hyperplasia or Cancer
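A common way to check the normality of the random intercepts is to extract their predicted values with ranef() and look at a normal Q-Q plot. The sketch below is illustrative only; it assumes the fit from the example code has been stored in an object named fit:

```r
## Assumed: fit holds the model from the example code above, e.g.
## fit <- lmer(Intensity.over2.hyp.canc ~ Pathology + (1 | Patient.ID),
##             data = HSD17beta4.hyp.canc, family = "binomial")
u <- ranef(fit)$Patient.ID[, "(Intercept)"]  # predicted random intercepts U_i
qqnorm(u); qqline(u)  # an approximately straight line supports normality
```

Note that these are predicted (shrunken) effects, so the plot is an informal diagnostic rather than a formal test of the normality assumption.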

 

 

Dr Margaret Gardiner-Garden

Garvan Institute of Medical Research

384 Victoria Street

Darlinghurst Sydney

NSW 2010 Australia

 

Phone: 61 2 9295 8348

Fax: 61 2 9295 8321

 

 



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Difficult survival model

2007-08-15 Thread Daniel Malter
Hi all,

I would appreciate your advice how to model the survival model for the
following data, especially if it can be modeled in one model or if I should
(have to) break it up (i.e. assume that some events are independent of each
other, etc.).  Data is on an experimental stock market simulation. Dependent
variable is the hazard of transition from either holding the stock to
closing the position (-1) or from not holding the stock to opening a
position in that stock (1) in the variable CHNG. Main variables of interest
are the regressors IV1 and IV2 (which are covariates, not factors). BUY and
SELL indicate the number of stocks of type STOCK that were purchased/sold in
ROUND. Accordingly, NBT and NET record the number of STOCK held at the
beginning and the end of period ROUND, respectively. HP is a 0/1 dummy
indicating whether or not shares of STOCK were held by an investor in any
given period.

And these are the issues I see in the data:

1. Over time, the stock may be held several times. This suggests a
conditional hazard. TIMESHELD measures the number of times a stock has been
held (meaning that the stock position must have been closed in between).
Alternatively, it could measure the number of prior transactions in the
stock (all kinds). This I intend to model with strata.

2. I also have repeated measures for individuals. As some investor may be
more of a trader than others, I want to model this with a random effect
(frailty).

3. Different stocks may have a different risk of being traded. This I would
model with another frailty term (or would I rather model it differently?).

4. In case that an investor holds a stock, there are competing risks between
increasing or decreasing the position.

5. There are also competing risks, so to speak, between the different
stocks. That is, when an investor makes a buy or sell decision, s/he decides
between all available stocks. One stock competes with all other stocks to
be purchased. When selling, though, a stock competes only with the other
stocks in the portfolio.

I have no idea how to model 4 and 5, or whether this is possible at all. I
would also appreciate your feedback on the suggested way of modeling 1, 2,
and 3.
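For what it's worth, points 1 and 2 could be sketched in a counting-process Cox model along the following lines. This is purely illustrative: the data set name and the tstart/tstop interval variables (which would have to be built from ROUND) are hypothetical, and coxph() accepts only a single frailty() term.

```r
library(survival)
## Strata handle the conditional hazard (point 1); the frailty term captures
## investor-level heterogeneity (point 2). An event is any transition in CHNG.
fit <- coxph(Surv(tstart, tstop, CHNG != 0) ~ IV1 + IV2 +
               strata(TIMESHELD) + frailty(ID),
             data = trades)
summary(fit)
```

A second, crossed frailty for STOCK (point 3) is beyond coxph(); a mixed-effects Cox model (e.g. the coxme package) would be one route.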

Apologies for the lengthy description and thanks much for your support,
Daniel


ID ROUND STOCK BUY SELL NBT NET IV1 IV2 HP CHNG TIMESHELD
 1   1     A     0    0   0   0   3   4  0    0     0
 1   2     A    10    0   0  10   3   5  1    1     1
 1   3     A     0    0  10  10   5   4  1    0     1
 1   4     A     0   10  10   0   2   1  1   -1     1
 1   5     A     0    0   0   0   3   4  0    0     1
 1   1     A    20    0   0  20   2   5  1    1     2
 1   2     A     0    0  20  20   4   5  1    0     2
 1   3     A     0    0  20  20   2   3  1    0     2
 1   4     A     0    0  20  20   5   3  1    0     2
 1   5     A     0   20  20   0   4   5  1   -1     2
 2   1     B    15    0   0  15   1   2  1    1     1
 2   2     B     0    0  15  15   1   2  1    0     1
 2   3     B     0    0  15  15   2   2  1    0     1
 2   4     B     0   15  15   0   5   1  1   -1     1
 2   5     B     0    0   0   0   4   4  0    0     1
 2   1     B     5    0   0   5   3   4  1    1     2
 2   2     B     0    0   5   5   5   2  1    0     2
 2   3     B     0    5   5   0   4   2  1   -1     2
 2   4     B     0    0   0   0   4   1  0    0     2
 2   5     B     0    0   0   0   4   1  0    0     2


PhD Program Strategy
Dept. of Management and Organization
Robert H. Smith School of Business   
University of Maryland
Van Munching Hall   
College Park, MD  20742
www.rhsmith.umd.edu
www.umd.edu

mailto:[EMAIL PROTECTED]



Re: [R] getting lapply() to work for a new class

2007-08-15 Thread Prof Brian Ripley
On Wed, 15 Aug 2007, Pijus Virketis wrote:

> Thank you.
>
> When I tried to set as.list() in baseenv(), I learned that its bindings
> are locked.

Of course.  Did you not see my comment about 'to protect code against 
redefining functions'?

> Does this mean that the thing to do is just to write my own
> "lapply", which does the coercion using my "private" as.list(), and then
> invokes the base lapply()?

I believe 'the thing to do' is to call your as.list explicitly.  After 
all, the first 'l' in lapply means 'list', so it is 'natural' to call 
it on a list.
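A minimal illustration of the explicit coercion, using a toy class (the class and slot names are invented for the example):

```r
setClass("Bag", representation(items = "list"))  # toy S4 class holding a list
setMethod("as.list", "Bag", function(x, ...) x@items)

b <- new("Bag", items = list(1, 2, 3))
## lapply() itself still uses base::as.list, so coerce explicitly first:
lapply(as.list(b), function(el) el * 2)
```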

And please do NOT edit other people's messages without indication: the R 
posting guide covers that and it is a copyright violation.

>
> -P
>
> -Original Message-

Not so: an EDITED version of my message.

> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 15, 2007 5:18 PM
>
>> As far as I can tell, lapply() needs the class to be coercible to a
>> list. Even after I define as.list() and as.vector(x, mode="list")
>> methods, though, I still get an "Error in as.vector(x, "list") :
>> cannot coerce to vector". What am I doing wrong?
>
> Not considering namespaces.  Setting an S4 method for as.list() creates
> an object called as.list in your workspace, but the lapply function uses
> the as.list in the base namespace.  That's the whole point of
> namespaces: to protect code against redefining functions.
>
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Possible memory leak with R v.2.5.0

2007-08-15 Thread Peter Waltman

I'm working with a very large matrix (22k rows x 2k cols) of RNA
expression data with R v.2.5.0 on a RedHat Enterprise machine, x86_64
architecture.

The relevant code is below, but I call a function that takes a cluster
of this data (a list structure that contains a $rows elt which lists
the rows (genes) in the cluster by ID, but not the actual data itself).

The function creates two copies of the matrix, one containing the rows
in the cluster, and one with the rest of the rows in the matrix.
After doing some statistical massaging, the function returns a
statistical score for each row/gene in the matrix, producing a
vector of 22k elt's.

When I run 'top', I see that the memory stamp of R after loading the
matrix is ~750M.  However, after calling this function on 10 clusters,
this jumps to > 3.7 gig (at least by 'top's measurement), and this
will not be reduced by any subsequent calls to gc().

Output from gc() is:

 > gc()
             used  (Mb) gc trigger   (Mb) max used  (Mb)
 Ncells   377925  20.2    6819934  364.3   604878  32.4
 Vcells 88857341 678.0  240204174 1832.7 90689707 692.0
 >

output from top is:

    PID USER      PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
   1199 waltman   17   0 3844m 3.7g  3216 S  0.0 23.6 29:58.74 R

Note, the relevant call that invoked my function is:

 test <- sapply(c(1:10), function(x)
     get.vars.for.cluster(clusterStack[[x]], opt="rows"))

Finally, for fun, I rm()'d all variables with the rm(list=ls())
command, and then called gc().  The memory of this "empty" instance of
R is still 3.4 gig, i.e.

R console:

 > rm(list=ls())
 > ls()
 character(0)
 > gc()
             used  (Mb) gc trigger   (Mb) max used  (Mb)
 Ncells   363023  19.4    5455947  291.4   604878  32.4
 Vcells 44434871 339.1  192163339 1466.1 90689707 692.0
 >

Subsequent top output:

    PID USER      PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
   1199 waltman   16   0 3507m 3.4g  3216 S  0.0 21.5 29:58.92 R

Thanks for any help or suggestions,
Peter Waltman

p.s. code snippet follows.  Note that I've added extra rm() and gc()
calls within the function to try to reduce the memory stamp, to no avail.

 get.vars.for.cluster <- function(cluster, genes=get.global("gene.ids"),
                                  opt=c("rows", "cols"),
                                  ratios=get.global("ratios"), var.norm=T,
                                  r.sig=get.global("r.sig"),
                                  allow.anticor=get.global("allow.anticor")) {
   cat("phw dbg msg\n")
   cluster <<- cluster
   opt <- match.arg(opt)
   rows <- cluster$rows
   cols <- cluster$cols
   if (opt == "rows") {
     cat("phw dbg msg: if opt == rows\n")
     r <- ratios[rows, cols]
     r.all <- ratios[genes, cols]
     avg.rows <- apply(r, 2, mean, na.rm=T)  ## median
     rm(r)           # phw added 8/9/07
     gc(reset=TRUE)  # phw added 8/9/07
     devs <- apply(r.all, 1, "-", avg.rows)
     if (!allow.anticor) rm(r.all, avg.rows)  # phw added 8/9/07
     gc(reset=TRUE)  # phw added 8/9/07
     cat("phw dbg msg: finished calc'ing avg.rows & devs\n")
     ## This is what we'd use from the deHoon paper (bioinformatics/bth927)
     ## sd.rows <- apply(r, 2, sd)
     ## devs <- devs * devs
     ## sd.rows <- sd.rows * sd.rows
     ## sds <- apply(devs, 2, "/", sd.rows)
     ## sds <- apply(sds, 2, sum)
     ## return(log10(sds))
     ## This is faster and nearly equivalent
     vars <- apply(devs, 2, var, na.rm=T)
     rm(devs)
     gc(reset=TRUE)       # phw added 8/9/07
     test <- log10(vars)  # phw added 8/9/07
     rm(vars)             # phw added 8/9/07
     gc(reset=TRUE)       # phw added 8/9/07
     vars <- log10(test)  # phw added 8/9/07
     rm(test)             # phw added 8/9/07
     gc(reset=TRUE)       # phw added 8/9/07
     # vars <- log10(vars)
     cat("phw dbg msg: finished calc'ing vars\n")
     ## HOW TO ALLOW FOR ANTICOR??? Here's how:
     if (allow.anticor) {
       cat("phw dbg msg: allow.anticor==T\n")
       ## Get variance against the inverse of the mean profile
       devs.2 <- apply(r.all, 1, "-", -avg.rows)
       gc(reset=TRUE)  # phw added 8/9/07
       vars.2 <- apply(devs.2, 2, var, na.rm=T)
       rm(devs.2)
       gc(reset=TRUE)  # phw added 8/9/07
       vars.2 <- log10(vars.2)
       gc(reset=TRUE)  # phw added 8/9/07
       ## For each gene take the min of variance or anti-cor variance
       vars <- cbind(vars, vars.2)
       rm(vars.2)

Re: [R] Automatically moving cursor in RGui

2007-08-15 Thread chenxh007
In my output, I should see the last few lines.

If you want to track the output, you can try:
for (i in 1:1)
{
#   print("Who is this?")
print (i)
flush.console ()
}

Xiaohui

[EMAIL PROTECTED] wrote:
> Hi,
> 
> I have a small question about RGui. I have a piece of code that I run out of 
> TinnR. Says.
> 
> for (i in 1:1) 
>print("Who is this?")
> 
> When I send this code to RGui. On RGui output, I can only see the first, 
> perhaps 10-20 lines, and the rest I have to scroll down to see them. How can 
> I set the RGui so that the scrollbar automatically moving down to the latest 
> output content line? Anybody knows the answer to this? Thank you.
> 
> - adschai
> 



Re: [R] binomial simulation

2007-08-15 Thread Moshe Olshansky
Thank you - I wasn't aware of this function.
One can even use lchoose which allows really huge
arguments (more than 2^1000)!
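For example, staying on the log scale avoids the overflow entirely. A sketch using the numbers from Joseph's message below:

```r
N <- 200; k <- 100; p <- 0.6
## log-scale evaluation of choose(N,k) * p^k * (1-p)^(N-k):
exp(lchoose(N, k) + k * log(p) + (N - k) * log(1 - p))
dbinom(k, N, p)  # the same quantity via the built-in binomial density
```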

--- "Lucke, Joseph F" <[EMAIL PROTECTED]>
wrote:

> C is an R function for setting contrasts in a
> factor.  Hence the funky
> error message.
> ?C
> 
> Use choose() for your C(N,k)
> ?choose
> 
> choose(200,2)
> 19900
> 
> choose(200,100)
>  9.054851e+58 
> 
> N=200; k=100; m=50; p=.6; q=.95
> 
> choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m)
> 6.554505e-41
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of Moshe Olshansky
> Sent: Wednesday, August 15, 2007 2:06 AM
> To: sigalit mangut-leiba; r-help
> Subject: Re: [R] binomial simulation
> 
> No wonder that you are getting overflow, since
> gamma(N+1) = n! and 200! > (200/e)^200 > 10^370.
> There exists another way to compute C(N,k). Let me
> know if you need this
> and I will explain to you how this can be done.
> But do you really need to compute the individual
> probabilities? May be
> you need something else and there is no need to
> compute the individual
> probabilities?
> 
> Regards,
> 
> Moshe.
> 
> --- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:
> 
> > Thank you,
> > I'm trying to run the joint probabilty:
> > 
> > C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
> > 
> > and get the error: Error in C(N, k) : object not
> interpretable as a 
> > factor
> > 
> > so I tried the long way:
> > 
> > gamma(N+1)/(gamma(k+1)*(gamma(N-k)))
> > 
> > and the same with k, and got the error:
> > 
> > 1: value out of range in 'gammafn' in: gamma(N +
> 1)
> > 2: value out of range in 'gammafn' in: gamma(N -
> k) 
> > 
> > Do you know why it's not working?
> > 
> > Thanks again,
> > 
> > Sigalit.
> > 
> > 
> > 
> > On 8/14/07, Moshe Olshansky
> <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > As I understand this,
> > > P(T+ | D-)=1-P(T+ | D+)=0.05
> > > is the probability not to detect desease for a
> > person
> > > at ICU who has the desease. Correct?
> > >
> > > What I asked was whether it is possible to
> > mistakenly
> > > detect the desease for a person who does not
> have
> > it?
> > >
> > > Assuming that this is impossible the formula is
> > below:
> > >
> > > If there are N patients, each has a probability
> p
> > to
> > > have the desease (p=0.6 in your case) and q is
> the probability to 
> > > detect the desease for a person who
> > has
> > > it (q = 0.95 for ICU and q = 0.8 for a regular
> > unit),
> > > then
> > >
> > > P(k have the desease AND m are detected) = P(k
> have the desease)*P(m
> 
> > > are detected / k have
> > the
> > > desease) =
> > > C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
> > > where C(a,b) is the Binomial coefficient "a
> above
> > b" -
> > > the number of ways to choose b items out of a
> > (when
> > > the order does not matter). You of course must
> > assume
> > > that N >= k >= m >= 0 (otherwise the probability
> > is
> > > 0).
> > >
> > > To generate such pairs (k infected and m
> detected)
> > you
> > > can do the following:
> > >
> > > k <- rbinom(N,1,p)
> > > m <- rbinom(k,1,q)
> > >
> > > Regards,
> > >
> > > Moshe.
> > >
> > > --- sigalit mangut-leiba <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > Hi,
> > > > The probability of false detection is: P(T+ |
> D-)=1-P(T+ | 
> > > > D+)=0.05.
> > > > and I want to find the joint probability
> > > > P(T+,D+)=P(T+|D+)*P(D+)
> > > > Thank you for your reply,
> > > > Sigalit.
> > > >
> > > >
> > > > On 8/13/07, Moshe Olshansky
> > <[EMAIL PROTECTED]>
> > > > wrote:
> > > > >
> > > > > Hi Sigalit,
> > > > >
> > > > > Do you want to find the probability P(T+ = t
> > AND
> > > > D+ =
> > > > > d) for all the combinations of t and d (for
> > ICU
> > > > and
> > > > > Reg.)?
> > > > > Is the probability of false detection (when
> > there
> > > > is
> > > > > no disease) always 0?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Moshe.
> > > > >
> > > > > --- sigalit mangut-leiba <[EMAIL PROTECTED]>
> > > > wrote:
> > > > >
> > > > > > hello,
> > > > > > I asked about this simulation a few days
> > ago,
> > > > but
> > > > > > still i can't get what i
> > > > > > need.
> > > > > > I have 2 units: icu and regular. from icu
> I
> > want
> > > > to
> > > > > > take 200 observations
> > > > > > from binomial distribution, when
> probability
> > for
> > > > > > disease is: p=0.6.
> > > > > > from regular I want to take 300
> observation
> > with
> > > > the
> > > > > > same probability: p=0.6
> > > > > > .
> > > > > > the distribution to detect disease when
> > disease
> > > > > > occurred- *for someone from
> > > > > > icu* - is: p(T+ | D+)=0.95.
> > > > > > the distribution to detect disease when
> > disease
> > > > > > occurred- *for someone from
> > > > > > reg.unit* - is: p(T+ | D+)=0.8.
> > > > > > I want to compute the joint distribution
> for
> > > > each
> 
=== message truncated ===


[R] Automatically moving cursor in RGui

2007-08-15 Thread adschai
Hi,

I have a small question about RGui. I have a piece of code that I run out of
TinnR. Say:

for (i in 1:1) 
   print("Who is this?")

When I send this code to RGui, I can only see the first, perhaps 10-20,
lines of output, and the rest I have to scroll down to see. How can I set
RGui so that the scrollbar automatically moves down to the latest output
line? Does anybody know the answer to this? Thank you.

- adschai



Re: [R] Covariance of data with missing values.

2007-08-15 Thread Ted Harding
On 15-Aug-07 21:16:32, Rolf Turner wrote:
> 
> I have a data matrix X (n x k, say) each row of which constitutes
> an observation of a k-dimensional random variable which I am willing,
> if not happy, to assume to be Gaussian, with mean ``mu'' and
> covariance matrix ``Sigma''.  Distinct rows of X may be assumed to
> correspond to independent realizations of this random variable.
> 
> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
> missing values.
> [...]

One question, Rolf: How big is k (no of columns)?

If it's greater than 30, you may have problems with 'norm', since the
function prelim.norm() builds up its image of the places where there
are missing values as "packed integers" with code:

r <- 1 * is.na(x)

mdp <- as.integer((r %*% (2^((1:ncol(x)) - 1))) + 1)

i.e. 'r' would be n x k, with 1s where your X had missing values and 0s
elsewhere.  Then each row of 'r' is converted into a 32-bit integer whose
"1" bits correspond to the 1s in 'r'.  You'll get "NA" warnings if k > 30,
and things could go wrong!
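The failure mode for k > 30 can be seen directly (a small illustration, not from the original message):

```r
## An all-missing row in a 31-column matrix packs to 2^31, which exceeds
## .Machine$integer.max (2^31 - 1):
r <- rep(1, 31)                    # indicator row: every entry missing
val <- (r %*% 2^((1:31) - 1)) + 1  # 2147483648
as.integer(val)                    # NA, with a coercion warning
```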

In that case, I hope Chuck's suggestion works!

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-07   Time: 00:10:33
-- XFMail --



Re: [R] Covariance of data with missing values.

2007-08-15 Thread Rolf Turner


Thank you ***very*** much Ted and Chuck.  I will install these two  
packages and study
them over, and then in all probability pester you with more questions  
when I find things
that I don't understand!

Thanks again.

cheers,

Rolf


On 16/08/2007, at 10:24 AM, Chuck Cleland wrote:

> Ted Harding wrote:
>> Hi Rolf!
>>
>> Have a look at the 'norm' package.
>>
>> This does just what you're asking for (assuming multivariate
>> normal, and allowing MAR missingness -- i.e. probability of
>> missing may depend on observed values, but must not depend on
>> unobserved).
>>
>> Read the documentation for the various functions *very* carefully!
>> Drop me a line if you want more info.
>>
>> Best wishes,
>> Ted.
>
>   Also, consider mlest() in the mvnmle package.
>
>> On 15-Aug-07 21:16:32, Rolf Turner wrote:
>>> I have a data matrix X (n x k, say) each row of which constitutes
>>> an observation of a k-dimensional random variable which I am  
>>> willing,
>>> if not happy, to assume to be Gaussian, with mean ``mu'' and
>>> covariance matrix ``Sigma''. Distinct rows of X may be assumed to
>>> correspond to independent realizations of this random variable.
>>>
>>> Most rows of X (all but 240 out of 6000+ rows) contain one or more
>>> missing values. If I am willing to assume that missing entries are
>>> missing completely at random (MCAR) then I can estimate the  
>>> covariance
>>> matrix Sigma via maximum likelihood, by employing the EM algorithm.
>>> Or so I believe.
>>>
>>> Has this procedure been implemented in R in an accessible form?
>>> I've had a bit of a scrounge through the searching facilities,
>>> and have gone through the FAQ, and have found nothing that I could
>>> discern to be directly relevant.
>>>
>>> Thanks for any pointers that anyone may be able to give.
>>>
>>>   cheers,
>>>
>>>   Rolf Turner
>>
>> 
>> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 15-Aug-07   Time: 23:17:48
>> -- XFMail --
>>
>
>
> -- 
> Chuck Cleland, Ph.D.
> NDRI, Inc.
> 71 West 23rd Street, 8th floor
> New York, NY 10010
> tel: (212) 845-4495 (Tu, Th)
> tel: (732) 512-0171 (M, W, F)
> fax: (917) 438-0894





Re: [R] getting lapply() to work for a new class

2007-08-15 Thread Martin Morgan
Pijus,

My 2 cents, which might be post-hoc rationalization.

If your class 'is' a list, with additional information, then

setClass("Test", contains="list")

and you're ready to go.

On the other hand, if your class 'has' a list, along with other
information, create an accessor

setGeneric("test", function(object) standardGeneric("test"))
setMethod("test",
  signature=signature(object="Test"),
  function(object) slot(object, "test"))

obj <- new("Test", test=list(1,2,3))
lapply(test(obj), print)

(You could define a generic on 'lapply' and a method that dispatches
to your class, but that implies that Test 'is' a list).

Martin


"Pijus Virketis" <[EMAIL PROTECTED]> writes:

> Thank you.
>
> When I tried to set as.list() in baseenv(), I learned that its bindings
> are locked. Does this mean that the thing to do is just to write my own
> "lapply", which does the coercion using my "private" as.list(), and then
> invokes the base lapply()?
>
> -P
>
> -Original Message-
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, August 15, 2007 5:18 PM
>
>> As far as I can tell, lapply() needs the class to be coercible to a 
>> list. Even after I define as.list() and as.vector(x, mode="list") 
>> methods, though, I still get an "Error in as.vector(x, "list") : 
>> cannot coerce to vector". What am I doing wrong?
>
> Not considering namespaces.  Setting an S4 method for as.list() creates
> an object called as.list in your workspace, but the lapply function uses
> the as.list in the base namespace.  That's the whole point of
> namespaces: to protect code against redefining functions.
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



Re: [R] Covariance of data with missing values.

2007-08-15 Thread Chuck Cleland
Ted Harding wrote:
> Hi Rolf!
> 
> Have a look at the 'norm' package.
> 
> This does just what you're asking for (assuming multivariate
> normal, and allowing MAR missingness -- i.e. probability of
> missing may depend on observed values, but must not depend on
> unobserved).
> 
> Read the documentation for the various functions *very* carefully!
> Drop me a line if you want more info.
> 
> Best wishes,
> Ted.

  Also, consider mlest() in the mvnmle package.

> On 15-Aug-07 21:16:32, Rolf Turner wrote:
>> I have a data matrix X (n x k, say) each row of which constitutes
>> an observation of a k-dimensional random variable which I am willing,
>> if not happy, to assume to be Gaussian, with mean ``mu'' and
>> covariance matrix ``Sigma''. Distinct rows of X may be assumed to
>> correspond to independent realizations of this random variable.
>>
>> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
>> missing values. If I am willing to assume that missing entries are
>> missing completely at random (MCAR) then I can estimate the covariance
>> matrix Sigma via maximum likelihood, by employing the EM algorithm. 
>> Or so I believe.
>>
>> Has this procedure been implemented in R in an accessible form?
>> I've had a bit of a scrounge through the searching facilities,
>> and have gone through the FAQ, and have found nothing that I could
>> discern to be directly relevant.
>>
>> Thanks for any pointers that anyone may be able to give.
>>
>>   cheers,
>>
>>   Rolf Turner
> 
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 15-Aug-07   Time: 23:17:48
> -- XFMail --
> 


-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] Covariance of data with missing values.

2007-08-15 Thread Ted Harding
Hi Rolf!

Have a look at the 'norm' package.

This does just what you're asking for (assuming multivariate
normal, and allowing MAR missingness -- i.e. probability of
missing may depend on observed values, but must not depend on
unobserved).

Read the documentation for the various functions *very* carefully!
Drop me a line if you want more info.

Best wishes,
Ted.

On 15-Aug-07 21:16:32, Rolf Turner wrote:
> 
> I have a data matrix X (n x k, say) each row of which constitutes
> an observation of a k-dimensional random variable which I am willing,
> if not happy, to assume to be Gaussian, with mean ``mu'' and
> covariance matrix ``Sigma''. Distinct rows of X may be assumed to
> correspond to independent realizations of this random variable.
> 
> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
> missing values. If I am willing to assume that missing entries are
> missing completely at random (MCAR) then I can estimate the covariance
> matrix Sigma via maximum likelihood, by employing the EM algorithm. 
> Or so I believe.
> 
> Has this procedure been implemented in R in an accessible form?
> I've had a bit of a scrounge through the searching facilities,
> and have gone through the FAQ, and have found nothing that I could
> discern to be directly relevant.
> 
> Thanks for any pointers that anyone may be able to give.
> 
>   cheers,
> 
>   Rolf Turner


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 23:17:48
-- XFMail --



Re: [R] getting lapply() to work for a new class

2007-08-15 Thread Pijus Virketis
Thank you.

When I tried to set as.list() in baseenv(), I learned that its bindings
are locked. Does this mean that the thing to do is just to write my own
"lapply", which does the coercion using my "private" as.list(), and then
invokes the base lapply()?

-P

-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 15, 2007 5:18 PM

> As far as I can tell, lapply() needs the class to be coercible to a 
> list. Even after I define as.list() and as.vector(x, mode="list") 
> methods, though, I still get an "Error in as.vector(x, "list") : 
> cannot coerce to vector". What am I doing wrong?

Not considering namespaces.  Setting an S4 method for as.list() creates
an object called as.list in your workspace, but the lapply function uses
the as.list in the base namespace.  That's the whole point of
namespaces: to protect code against redefining functions.



[R] Covariance of data with missing values.

2007-08-15 Thread Rolf Turner

I have a data matrix X (n x k, say) each row of which constitutes an  
observation of
a k-dimensional random variable which I am willing, if not happy, to  
assume to be
Gaussian, with mean ``mu'' and covariance matrix ``Sigma''.  Distinct  
rows of X may
be assumed to correspond to independent realizations of this random  
variable.

Most rows of X (all but 240 out of 6000+ rows) contain one or more  
missing values.
If I am willing to assume that missing entries are missing completely  
at random (MCAR)
then I can estimate the covariance matrix Sigma via maximum  
likelihood, by
employing the EM algorithm.  Or so I believe.

Has this procedure been implemented in R in an accessible form?  I've  
had a bit of a
scrounge through the searching facilities, and have gone through the  
FAQ, and have
found nothing that I could discern to be directly relevant.
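(For reference: the CRAN package `norm` implements exactly this EM estimation for multivariate normal data with missing values. A sketch, assuming `norm` is installed and `X` is the data matrix:)

```r
## sketch: ML estimation of mu and Sigma under MCAR via EM,
## using the 'norm' package (assumes install.packages("norm"))
library(norm)
s        <- prelim.norm(as.matrix(X))   # summarise the missingness patterns
thetahat <- em.norm(s)                  # run EM to the ML estimate
pars     <- getparam.norm(s, thetahat)  # back-transform the parameters
pars$mu                                 # estimated mean vector
pars$sigma                              # estimated covariance matrix Sigma
```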

Thanks for any pointers that anyone may be able to give.

cheers,

Rolf Turner




Re: [R] lmer coefficient distributions and p values

2007-08-15 Thread Steven McKinney

Hi Daniel,

You might want to review the further advances Doug Bates has made in
lme4 since the post you cite in your email.

http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3565.html
In this thread Doug Bates discusses fitting using maximum
likelihood for testing purposes.  There is now an anova()
method for lmer() and lmer2() fits performed using method="ML".  
You can compare different models and get p-values for 
p-value obsessed journals using this approach.
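That comparison might be set up along these lines (a sketch with invented model, variable and data names):

```r
## sketch: nested models fitted by ML, compared by likelihood ratio
## (y, genotype, animal and d are placeholder names)
library(lme4)
fit0 <- lmer(y ~ 1        + (1 | animal), data = d, method = "ML")
fit1 <- lmer(y ~ genotype + (1 | animal), data = d, method = "ML")
anova(fit0, fit1)   # chi-square test of the genotype effect, with a p-value
```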

I believe I read a thread discussing the use of maximum-likelihood
fits for ANOVA tests, with REML fits on the final models for
better parameter estimates - but I can't find that thread.
Hopefully Doug Bates or anyone involved in that discussion can chime in.


See also for example this wiki 
http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests

and the new listserv at
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


Also check the latest documentation for lme4 and the lmer()
and lmer2() functions at
http://cran.r-project.org/
under Packages -> lme4.


Hope this helps

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Daniel Lakeland
Sent: Wed 8/15/2007 9:34 AM
To: r-help@stat.math.ethz.ch
Subject: [R] lmer coefficient distributions and p values
 
I am helping my wife do some statistical analysis. She is a biologist,
and she has performed some measurements on various genotypes of
mice. My background is in applied mathematics and engineering, and I
have a fairly good statistics background, but I am by no means a PhD
level expert in statistical methods.

We have used the lmer package to fit various models for the various
experiments that she has done (random effects from multiple
measurements for each animal or each trial, and fixed effects from
developmental stage, and genotype etc). The results are fairly clear
cut to me, and I suggested that she publish the results as coefficient
estimates for the relevant contrasts, and their standard error
estimates. However, she has read the statistical guidelines for the
journal and they insist on p values.

I personally think that p values, and sharp-null hypothesis tests are
misguided and should be banned from publications, but it doesn't much
matter what I think compared to what the editors want.

Based on searching the archives, and finding this message:

https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html

I am aware of the theoretical difficulties with p values from lmer
results. I am also aware of the mcmcsamp function which performs some
kind of bayesian sampling from the posterior distribution of the
coefficients based on some kind of prior (I will need to do some more
reading to more fully understand this). Is this the primary way in
which we can estimate the distribution of the model coefficients and
calculate a p value or a confidence interval?
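(For reference, that usage is roughly as follows - a sketch only, since
mcmcsamp's interface has changed across lme4 versions; `fit1` is a
placeholder for a previously fitted lmer model:)

```r
## sketch: sample from the posterior of the fitted model's parameters
## and summarise with coda's HPDinterval()
library(coda)
samp <- mcmcsamp(fit1, n = 10000)  # fit1: an existing lmer fit (placeholder)
HPDinterval(samp)                  # 95% highest-posterior-density intervals
```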

What can I do with the t statistic provided by lmer? If as the above
message suggests, we are uncertain of the correct F and by extension t
distributions to use, what help are the t statistics? I suppose I
could test them against a very low degree of freedom t distribution
(say 3) and publish those p values?

Again, I'm content to ignore p values and stick to estimates, but the
journal isn't.

BTW: thanks to all on this list, I've benefitted greatly from R and
from the archives of help topics.

-- 
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan



Re: [R] getting lapply() to work for a new class

2007-08-15 Thread Prof Brian Ripley
On Wed, 15 Aug 2007, Pijus Virketis wrote:

> I would like to get lapply() to work in the natural way on a class I've
> defined.

What you have not said is that this is an S4 class.

> As far as I can tell, lapply() needs the class to be coercible
> to a list. Even after I define as.list() and as.vector(x, mode="list")
> methods, though, I still get an "Error in as.vector(x, "list") : cannot
> coerce to vector". What am I doing wrong?

Not considering namespaces.  Setting an S4 method for as.list() creates an 
object called as.list in your workspace, but the lapply function uses the 
as.list in the base namespace.  That's the whole point of namespaces: to 
protect code against redefining functions.

This works as documented for S3 methods (since as.list is S3 generic): it 
is a 'feature' of S4 methods that deserves to be much more widely 
understood.


> # dummy class
> setClass("test", representation(test="list"))
>
> # set up as.list()
> test.as.list <- function(x) [EMAIL PROTECTED]
> setMethod("as.list", signature(x="test"), test.as.list)
>
> # set up as.vector(x, mode="list")
> test.as.vector <- function(x, mode) [EMAIL PROTECTED]
> setMethod("as.vector", signature(x="test", mode="character"),
> test.as.vector)
>
> obj <- new("test", test=list(1, 2, 3))
>
> # this produces "Error in as.vector(x, "list") : cannot coerce to
> vector" on R 2.4.1
> lapply(obj, print)
>
> # these work
> lapply(as.list(obj), print)
> lapply(as.vector(obj, "list"), print)
>
> Thank you,
>
> Pijus

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Problem Connecting to Oracle with R from Windows XP

2007-08-15 Thread Song, Alex
Hello,

 

I installed RGui 2.5.1 and the DBI package on Windows XP and tried to connect to 
an Oracle database on a Linux server. When I tried to use 
dbDriver("Oracle"), I got an error as follows:

 

> drv <- dbDriver("Oracle")

Error in do.call(as.character(drvName), list(...)) : 

could not find function "Oracle"

> 

 

Could anyone tell me how to connect to Oracle with R from Windows XP? Do I need 
to configure any environment variables? Do I need to configure ODBC? Do I need 
to install any other packages? I have Oracle client 10g installed on my 
computer and I can connect to the Oracle database using other client software 
like Toad, SQLPlus, or SQL Developer.
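(For reference: dbDriver("Oracle") needs the ROracle package, which DBI
alone does not provide. A commonly used alternative on Windows is RODBC
against a configured ODBC data source. A sketch; the DSN name,
credentials and table are placeholders:)

```r
## sketch: connect through ODBC instead of ROracle
## ("my_dsn", "user", "password" and my_table are placeholders)
library(RODBC)
ch <- odbcConnect("my_dsn", uid = "user", pwd = "password")
sqlQuery(ch, "SELECT COUNT(*) FROM my_table")
odbcClose(ch)
```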

 

Thank you very much.

 

Alex

 

Alex Song 

Data Management Specialist / Spécialiste en gestion des données

National Forest Inventory / Inventaire forestier national

Pacific Forestry Centre / Centre de foresterie du Pacifique
Canadian Forest Service / Service canadien des forêts
Natural Resources Canada / Ressources naturelles Canada

Government of Canada / Gouvernement du Canada

506 West Burnside Road / 506 chemin Burnside ouest

Victoria, BC V8Z 1M5

Phone (250) 363-3342

Facs: (250) 363-0775

 

Email: [EMAIL PROTECTED]  

 





[R] AIC and logLik for logistic regression in R and S-PLUS

2007-08-15 Thread Leandra Desousa
Dear R users,

I am using 'R' version 2.2.1 and 'S-PLUS' version 6.0, and I have loaded the 
MASS library in 'S-PLUS'.

I am running a logistic regression using glm:

---
 > mydata.glm<-glm(COMU~MeanPycUpT+MeanPycUpS, family=binomial, data=mydata)
---

The values in summary(mydata.glm) are identical for 'R' and 'S-PLUS' 
(except that S-PLUS does not return an AIC value).
Here is the summary(mydata.glm):
---
 >summary(mydata.glm)
Call:
glm(formula = COMU ~ MeanPycUpT + MeanPycUpS, family = binomial,data = 
mydata)
Deviance Residuals:
  Min   1Q  Median3Q   Max 
-2.3514  -0.8268  -0.4831   0.8866   1.9864

Coefficients:
 Estimate   std.Error z value Pr(>|z|)
(Intercept)   75.482 41.6161.814   0.0697 .
MeanPycUpT 1.143  2.8670.399   0.6902
MeanPycUpS-2.548  1.225   -2.081   0.0374 *
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.316  on 21  degrees of freedom
Residual deviance: 23.900  on 19  degrees of freedom
AIC: 29.9



When I use the 'AIC' and the 'logLik' command in 'R' and 'S-PLUS' I get 
different values:

'R'
-
 > AIC(mydata.glm)
[1] 29.89986

 > logLik(mydata.glm)
'log Lik.' -11.94993 (df=3)
-

'S-PLUS'
-
 > AIC(mydata.glm)
[1] 71.03222

 > logLik(mydata.glm)
[1] -31.51611
-

However, if I use the 'extractAIC' command in 'S-PLUS' the returned 
value is the same as the one in 'R'.

'R'

 > AIC(mydata.glm)
[1] 29.89986

 > extractAIC(mydata.glm)
[1]  3.0 29.89986
-

'S-PLUS'

 > extractAIC(mydata.glm)
[1]  3.0 29.89986
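The agreement is easy to check by hand, since for a glm
AIC = residual deviance + 2 * (number of estimated parameters):

```r
## residual deviance 23.900 with 3 estimated parameters (from the summary):
23.900 + 2 * 3            # 29.9, matching AIC() in R and extractAIC()
-2 * (-11.94993) + 2 * 3  # the same value computed from R's logLik()
```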



*
MY QUESTIONS ARE:

1) Which AIC value is the correct one?
2) Which log-likelihood value is the correct one?
3) If 'extractAIC' in 'S-PLUS' and all of the values in 'R' are correct, 
and the 'AIC' and 'logLik' values in 'S-PLUS' are wrong, then:
   why can 'S-PLUS' not retrieve a log-likelihood value from my glm 
object ('mydata.glm'), even though it uses the log-likelihood to 
calculate its residual deviance?
***

Thank you for your time and kind attention.

Sincerely,

Leandra de Sousa
--
Ph.D Candidate
School of Fisheries and Ocean Sciences
University of Alaska Fairbanks
245 O'Neill Bldg.
Fairbanks, AK
99775
e.mail: [EMAIL PROTECTED]



Re: [R] Mann-Whitney U

2007-08-15 Thread Marc Schwartz
On Wed, 2007-08-15 at 12:06 -0600, Natalie O'Toole wrote:
> Hi,
> 
> I do want to use the Mann-Whitney test which ranks my data and then
> uses 
> those ranks rather than the actual data.
> 
> Here is the R code i am using:
> 
>  group1<- 
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> > group2<- 
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> > result <-  wilcox.test(group1, group2, paired=FALSE, conf.level =
> 0.95, 
> na.action)

You did not specify a value for the na.action argument, hence the error
message you are getting.

It defaults to 'na.omit', unless you have modified R's options.
See ?na.action for more information.

In this case, it will remove any NA values from the two vectors prior to
calculating the statistic.

The additional arguments are really superfluous here. You can simply
use:

  wilcox.test(group1, group2)

> paired = FALSE so that the Wilcoxon rank sum test which is equivalent
> to 
> the Mann-Whitney test is used (my samples are NOT paired).
> conf.level = 0.95 to specify the confidence level
> na.action is used because i have a NA value (i suspect i am not using 
> na.action in the correct manner)
> 
> When i use this code i get the following error message:
> 
> Error in arg == choices : comparison (1) is possible only for atomic
> and 
> list types
> 
> When i use this code:
> 
>  group1<- 
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> > group2<- 
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> > result <-  wilcox.test(group1, group2, paired=FALSE, conf.level =
> 0.95)
> 
> I get the following result:
> 
>   Wilcoxon rank sum test with continuity correction
> 
> data:  group1 and group2 
> W = 405.5, p-value = 0.6494
> alternative hypothesis: true location shift is not equal to 0 
> 
> Warning message:
> cannot compute exact p-value with ties in:
> wilcox.test.default(group1, 
> group2, paired = FALSE, conf.level = 0.95) 
> 
> The W value here is 405.5 with a p-value of 0.6494
> 
> 
> in SPSS, i am ranking my data and then performing a Mann-Whitney U by 
> selecting analyze - non-parametric tests - 2 independent samples  and
> then 
> checking off the Mann-Whitney U test.
> 
> For the Mann-Whitney test in SPSS i am gettting the following results:
> 
> Mann-Whitney U = 350.5
>  2- tailed p value = 0.643
> 
> I think maybe the discrepancy has to do with the specification of the
> NA 
> values in R, but i'm not sure.
> 
> 
> If anyone has any suggestions, please let me know!
> 
> I hope i have provided enough information to convey my problem.
> 
> Thank-you, 
> 
> Nat

It would appear that SPSS is reversing the two groups in its
calculation and NOT using a continuity correction by default.

If you review the internal code for wilcox.test(), by using:

  stats:::wilcox.test.default

you can see that the relevant code in this case is:

  r <- rank(c(x - mu, y))
  n.x <- as.double(length(x))
  n.y <- as.double(length(y))

  STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2


Thus, if we use 'x' and 'y' for your two groups, respectively, we get:

x <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,
   1.9,1.96, 2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,
   3.07,3.3)

y <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,
   1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,
   2.77,2.9,2.94,NA)

mu <- 0


# Now remove the NA values
x <- na.omit(x)
y <- na.omit(y)


r <- rank(c(x - mu, y))
n.x <- as.double(length(x))
n.y <- as.double(length(y))

> r
 [1]  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0 26.5
[14] 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5 50.5
[27] 54.0 55.0  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0
[40] 23.0 25.0 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0
[53] 49.0 52.0 53.0

> n.x
[1] 28

> n.y
[1] 27


STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2

> STATISTIC
[1] 405.5

This is the value you get with R as you have used it.


Now, to replicate the statistic in SPSS, use the following code, with x
and y interchanged:

r <- rank(c(y - mu, x))
n.x <- as.double(length(x))
n.y <- as.double(length(y))

STATISTIC <- sum(r[seq_along(y)]) - n.y * (n.y + 1)/2

So we get:

> r
 [1]  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0 23.0 25.0
[14] 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0 49.0 52.0
[27] 53.0  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0
[40] 26.5 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5
[53] 50.5 54.0 55.0

> n.x
[1] 28

> n.y
[1] 27

> STATISTIC
[1] 350.5


So we now match SPSS' calculation of the statistic.
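As a check, the two versions of the statistic are complementary: they
always sum to n.x * n.y.

```r
## R's W plus SPSS's U equals the total number of (x, y) pairs
405.5 + 350.5   # 756
28 * 27         # 756
```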


Now, to complete the process and replicate th

Re: [R] time series periodic data

2007-08-15 Thread Rolf Turner

On 16/08/2007, at 12:26 AM, Petr PIKAL wrote:

> Dear all
>
> Please help me with analysis of some periodic data.
>
> I have an output from measurement each minute and this output is  
> modulated
> by rotation of the equipment (approx 6.5 min/revolution). I can easily
> spot this frequency from
>
> spectrum(mydata, some suitable span)
>
> However from other analysis I suspect there is a longer term  
> oscilation
> (about 70-80 min) I am not able to find it from mentioned data.
>
> Plese give me some hint how I could prove that such long term  
> modulation
> of my data exist in presence of quite strong modulation by this  
> short term
> oscilations.

You may be able to get some mileage out of complex demodulation
(at, say, frequencies of 1/70, 1/71, ..., 1/80 cycles per minute).

See Peter Bloomfield's book (``Fourier Analysis of Time Series:  An  
Introduction'',
2nd ed., Wiley, 2000) for a good introduction to complex demodulation.
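A rough sketch of the idea (`mydata` is the poster's series; the trial
frequency and smoothing span are invented values to be tuned):

```r
## complex demodulation at a trial frequency f0 (cycles per minute):
## shift f0 down to zero frequency, low-pass filter, inspect amplitude
f0  <- 1/75                               # trial value in the 70-80 min band
t   <- seq_along(mydata)
dem <- mydata * exp(-2i * pi * f0 * t)    # demodulate
w   <- rep(1/31, 31)                      # 31-min moving average as low-pass
amp <- Mod(complex(real      = filter(Re(dem), w),
                   imaginary = filter(Im(dem), w)))
plot(amp, type = "l")   # a sustained amplitude suggests a real oscillation
```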

cheers,

Rolf Turner




[R] getting lapply() to work for a new class

2007-08-15 Thread Pijus Virketis
Hi,

I would like to get lapply() to work in the natural way on a class I've
defined. As far as I can tell, lapply() needs the class to be coercible
to a list. Even after I define as.list() and as.vector(x, mode="list")
methods, though, I still get an "Error in as.vector(x, "list") : cannot
coerce to vector". What am I doing wrong?

# dummy class
setClass("test", representation(test="list"))

# set up as.list()
test.as.list <- function(x) [EMAIL PROTECTED]
setMethod("as.list", signature(x="test"), test.as.list)

# set up as.vector(x, mode="list")
test.as.vector <- function(x, mode) [EMAIL PROTECTED]
setMethod("as.vector", signature(x="test", mode="character"),
test.as.vector)

obj <- new("test", test=list(1, 2, 3))

# this produces "Error in as.vector(x, "list") : cannot coerce to
vector" on R 2.4.1
lapply(obj, print)

# these work
lapply(as.list(obj), print)
lapply(as.vector(obj, "list"), print)

Thank you,

Pijus



Re: [R] Mann-Whitney U

2007-08-15 Thread Peter Dalgaard
Lucke, Joseph F wrote:
> R and SPSS are using different but equivalent statistics.  R is using
> the rank sum of group1 adjusted for the mean rank. SPSS is using the
> rank sum of group2 adjusted for the mean rank. 
>
>   
Close: It is the _minimum_ possible rank sum that is getting subtracted. 
If everyone in group1 is less than everyone in group2, R's W statistic  
will be zero. Other way around in SPSS.
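A quick illustration of that minimum:

```r
## every value in the first group is below every value in the second,
## so the rank sum of group 1 is at its minimum and R reports W = 0
wilcox.test(1:3, 4:6)$statistic   # W = 0
```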

> Example.
>   
>> G1=group1
>> G2=group2[-length(group2)] #get rid of the NA
>> n1=length(G1) #n1=28
>> n2=length(G2) #n2=27
>> 
> # convert to ranks
>   
>> W=rank(c(G1,G2))
>> R1=W[1:n1] #put the ranks back into the groups
>> R2=W[n1+1:n2]
>> 
> #Get the sum of the ranks for each group
>   
>> W1=sum(R1)
>> W2=sum(R2)
>> 
> #Adjust for mean rank for group 1
>   
>> W1-n1*(n1+1)/2
>> 
> [1] 405.5
> #Adjust for mean rank for group 2
>   
>> W2-n2*(n2+1)/2
>> 
> [1] 350.5
>
> W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.
>
> Ties throw a wrench in the works.  R uses a continuity correction by
> default, SPSS does not.
> Taking out the continuity correction,
>   
>> wilcox.test(G1,G2,correct=FALSE)
>> 
>
> Wilcoxon rank sum test
>
> data:  G1 and G2 
> W = 405.5, p-value = 0.6433
> alternative hypothesis: true location shift is not equal to 0 
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(G1, G2,
> correct = FALSE) 
>
> This p-value is the same as SPSS's.
>
>
> Consult a serious non-parametrics text.  I used
> Lehmann, E. L., Nonparametrics: Statistical methods based on ranks.
> 1975. Holden-Day. San Francisco, CA.
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Natalie O'Toole
> Sent: Wednesday, August 15, 2007 1:07 PM
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] Mann-Whitney U
>
> Hi,
>
> I do want to use the Mann-Whitney test which ranks my data and then uses
> those ranks rather than the actual data.
>
> Here is the R code i am using:
>
>  group1<-
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
> 2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
>   
>> group2<-
>> 
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
> 7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
>   
>> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 
>> 0.95,
>> 
> na.action)
>
> paired = FALSE so that the Wilcoxon rank sum test which is equivalent to
> the Mann-Whitney test is used (my samples are NOT paired).
> conf.level = 0.95 to specify the confidence level na.action is used
> because i have a NA value (i suspect i am not using na.action in the
> correct manner)
>
> When i use this code i get the following error message:
>
> Error in arg == choices : comparison (1) is possible only for atomic and
> list types
>
> When i use this code:
>
>  group1<-
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
> 2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
>   
>> group2<-
>> 
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
> 7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
>   
>> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 
>> 0.95)
>> 
>
> I get the following result:
>
>   Wilcoxon rank sum test with continuity correction
>
> data:  group1 and group2
> W = 405.5, p-value = 0.6494
> alternative hypothesis: true location shift is not equal to 0 
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(group1,
> group2, paired = FALSE, conf.level = 0.95) 
>
> The W value here is 405.5 with a p-value of 0.6494
>
>
> in SPSS, i am ranking my data and then performing a Mann-Whitney U by
> selecting analyze - non-parametric tests - 2 independent samples  and
> then checking off the Mann-Whitney U test.
>
> For the Mann-Whitney test in SPSS i am getting the following results:
>
> Mann-Whitney U = 350.5
>  2- tailed p value = 0.643
>
> I think maybe the descrepancy has to do with the specification of the NA
> values in R, but i'm not sure.
>
>
> If anyone has any suggestions, please let me know!
>
> I hope i have provided enough information to convey my problem.
>
> Thank-you, 
>
> Nat
> __
>
>
> Natalie,
>
> It's best to provide at least a sample of your data.  Your field names 
> suggest 
> that your data might be collected in units of mm^2 or some similar 
> measurement of area.  Why do you want to use Mann-Whitney, which will
> rank 
>
> your data and then use those ranks rather than your actual data?  Unless
>
> your 
> sample is quite small, why not use a two sample t-test?  Also,are your 
> samples paired?  If they aren't, did you use the "paired = FALSE"
> option?
>
> JWDougherty
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the p

Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Nick Chorley
Thanks everyone!

NC

On 15/08/07, Ted Harding <[EMAIL PROTECTED]> wrote:
>
> On 15-Aug-07 18:17:27, Greg Snow wrote:
> >
> > 
> >
> > Ted Harding wrote:
> > [snip]
> >
> >> So you if you want the density plot, you would need to calculate
> >> this for yourself. E.g.
> >>
> >> H1$density <- counts/sum(counts)
> >> plot.histogram(H1,freq=FALSE)
> >
> > shouldn't that be:
> > H1$density <- counts/sum(counts)/diff(brkpts)
> >
> > ?
>
> Of course! Thank you.
> Ted.
>
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 15-Aug-07   Time: 20:23:13
> -- XFMail --
>




Re: [R] Mann-Whitney U

2007-08-15 Thread Lucke, Joseph F
R and SPSS are using different but equivalent statistics.  R is using
the rank sum of group1 adjusted for the mean rank. SPSS is using the
rank sum of group2 adjusted for the mean rank. 

Example.
> G1=group1
> G2=group2[-length(group2)] #get rid of the NA
> n1=length(G1) #n1=28
> n2=length(G2) #n2=27
# convert to ranks
> W=rank(c(G1,G2))
> R1=W[1:n1] #put the ranks back into the groups
> R2=W[n1+1:n2]
#Get the sum of the ranks for each group
> W1=sum(R1)
> W2=sum(R2)
#Adjust for mean rank for group 1
> W1-n1*(n1+1)/2
[1] 405.5
#Adjust for mean rank for group 2
> W2-n2*(n2+1)/2
[1] 350.5

W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.

Ties throw a wrench in the works.  R uses a continuity correction by
default, SPSS does not.
Taking out the continuity correction,
> wilcox.test(G1,G2,correct=FALSE)

Wilcoxon rank sum test

data:  G1 and G2 
W = 405.5, p-value = 0.6433
alternative hypothesis: true location shift is not equal to 0 

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(G1, G2,
correct = FALSE) 

This p-value is the same as SPSS's.


Consult a serious non-parametrics text.  I used
Lehmann, E. L., Nonparametrics: Statistical methods based on ranks.
1975. Holden-Day. San Francisco, CA.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Natalie O'Toole
Sent: Wednesday, August 15, 2007 1:07 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Mann-Whitney U

Hi,

I do want to use the Mann-Whitney test which ranks my data and then uses
those ranks rather than the actual data.

Here is the R code i am using:

 group1<-
c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2<-
c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 
> 0.95,
na.action)

paired = FALSE so that the Wilcoxon rank sum test which is equivalent to
the Mann-Whitney test is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level. na.action is used
because I have an NA value (I suspect I am not using na.action in the
correct manner).

When i use this code i get the following error message:

Error in arg == choices : comparison (1) is possible only for atomic and
list types

When i use this code:

 group1<-
c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2<-
c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 
> 0.95)

I get the following result:

  Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 405.5, p-value = 0.6494
alternative hypothesis: true location shift is not equal to 0 

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(group1,
group2, paired = FALSE, conf.level = 0.95) 

The W value here is 405.5 with a p-value of 0.6494


in SPSS, i am ranking my data and then performing a Mann-Whitney U by
selecting analyze - non-parametric tests - 2 independent samples  and
then checking off the Mann-Whitney U test.

For the Mann-Whitney test in SPSS I am getting the following results:

Mann-Whitney U = 350.5
 2- tailed p value = 0.643

I think maybe the discrepancy has to do with the specification of the NA
values in R, but i'm not sure.


If anyone has any suggestions, please let me know!

I hope i have provided enough information to convey my problem.

Thank-you, 

Nat
__


Natalie,

It's best to provide at least a sample of your data.  Your field names 
suggest 
that your data might be collected in units of mm^2 or some similar 
measurement of area.  Why do you want to use Mann-Whitney, which will
rank 

your data and then use those ranks rather than your actual data?  Unless

your 
sample is quite small, why not use a two sample t-test?  Also,are your 
samples paired?  If they aren't, did you use the "paired = FALSE"
option?

JWDougherty

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




 


[R] Mann-Whitney U test discrepancies

2007-08-15 Thread Natalie O'Toole

Hi,

I do want to use the Mann-Whitney test, which ranks my data and then
uses those ranks rather than the actual data.

Here is the R code I am using:

group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95, na.action)

paired = FALSE so that the Wilcoxon rank sum test, which is equivalent
to the Mann-Whitney test, is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level.
na.action is used because I have an NA value (I suspect I am not using
na.action in the correct manner).

When I use this code I get the following error message:

Error in arg == choices : comparison (1) is possible only for atomic
and list types

When I use this code:

group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95)

I get the following result:

        Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 405.5, p-value = 0.6494
alternative hypothesis: true location shift is not equal to 0

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(group1,
group2, paired = FALSE, conf.level = 0.95)

The W value here is 405.5 with a p-value of 0.6494.

In SPSS, I am ranking my data and then performing a Mann-Whitney U by
selecting Analyze - Nonparametric Tests - 2 Independent Samples and then
checking off the Mann-Whitney U test.

For the Mann-Whitney test in SPSS I am getting the following results:

Mann-Whitney U = 350.5
2-tailed p value = 0.643

I think maybe the discrepancy has to do with the specification of the
NA values in R, but I'm not sure.

If anyone has any suggestions, please let me know!

I hope I have provided enough information to convey my problem.

Thank-you,

Nat
   __
   
   This  communication  is intended for the use of the recipient to which
   it is
   addressed, and may contain confidential, personal, and or privileged
   information. Please contact the sender immediately if you are not the
   intended recipient of this communication, and do not copy, distribute,
   or
   take action relying on it. Any communication received in error, or
   subsequent reply, should be deleted or destroyed.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 18:17:27, Greg Snow wrote:
> 
> 
> 
> Ted Harding wrote:
> [snip]
> 
> >> So if you want the density plot, you would need to calculate
>> this for yourself. E.g.
>> 
>> H1$density <- counts/sum(counts)
>> plot.histogram(H1,freq=FALSE)
> 
> shouldn't that be:
> H1$density <- counts/sum(counts)/diff(brkpts)
> 
> ?

Of course! Thank you.
Ted.
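Put together as a runnable sketch (the counts and break points here are made up; the component names follow the structure returned by hist()):

```r
# Rebuild a histogram object from externally supplied counts and break points
brkpts <- c(0, 1, 2, 4)                    # hypothetical bin edges
counts <- c(10, 30, 20)                    # hypothetical bin counts
H1 <- structure(list(breaks   = brkpts,
                     counts   = counts,
                     density  = counts / sum(counts) / diff(brkpts),
                     mids     = (brkpts[-1] + brkpts[-length(brkpts)]) / 2,
                     xname    = "x",
                     equidist = FALSE),
                class = "histogram")
plot(H1, freq = FALSE)          # dispatches to plot.histogram
sum(H1$density * diff(brkpts))  # 1: the densities now integrate to one
```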


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 20:23:13
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loading JMP Files

2007-08-15 Thread Marc Schwartz
You might want to specifically look at R Windows FAQ:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

In Windows, you need to use double back-slashes or use Unix-like
forward-slashes. So, for example, use:

  read.xport("D:\\Databases\\nameoffile.xpt")

Additionally, on data import matters, reading the R Data Import/Export
Manual, available online at:

  http://cran.r-project.org/doc/manuals/R-data.html

or from the menus in your R installation, would be helpful.

Lastly, you should not have needed to download the 'foreign' *package*,
as it is part of the default R installation. Using:

  library(foreign)

should have been sufficient.
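For example (the path is hypothetical), the two spellings name the same file:

```r
library(foreign)                       # ships with R; no separate download needed
f1 <- "D:\\Databases\\nameoffile.xpt"  # doubled backslashes
f2 <- "D:/Databases/nameoffile.xpt"    # forward slashes
identical(gsub("\\\\", "/", f1), f2)   # TRUE: same path, two spellings
## read.xport(f1)                      # would then open the file, if it exists
```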

HTH,

Marc Schwartz

On Wed, 2007-08-15 at 11:43 -0700, Diana C. Dolan wrote:
> Hi,
> I know how to use SPSS and JMP, and have quite a few
> JMP files I would like to use in R.  I converted them
> to .xpt files, downloaded the 'foreign' library then
> tried this command:
> 
> >read.xport("D:\\Databases\nameoffile.xpt")
> 
> to which I get:
> 
> >Error in lookup.xport(file) : unable to open file
> 
> I have read FAQ lists and Google searched and cannot
> figure out what I'm doing wrong that my files won't
> open.  I tried saving to the C drive, but no luck
> there.  I also have no luck getting R to read my SPSS
> files with read.spss
> 
> My file names do have spaces and dashes, and I do have
> variables/variable names longer than 8 characters.
> 
> Please help!  I am very new to R and do not understand
> all the package reference manuals...I can not seem to
> find a simple, basic guide to how to command R and use
> basic functions without a bunch of jargon (eg 'usage'
> and 'arguments').  It would help to at least be able
> to load my files to practice on.
> 
> Any help would be appreciated!
> Thanks,
> Diana
> 
> 
>
> 
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loading JMP Files

2007-08-15 Thread Liaw, Andy
JMP can write CSV, and that's probably a safer choice than XPT.

Andy 

From: Diana C. Dolan
> 
> Hi,
> I know how to use SPSS and JMP, and have quite a few
> JMP files I would like to use in R.  I converted them
> to .xpt files, downloaded the 'foreign' library then
> tried this command:
> 
> >read.xport("D:\\Databases\nameoffile.xpt")
> 
> to which I get:
> 
> >Error in lookup.xport(file) : unable to open file
> 
> I have read FAQ lists and Google searched and cannot
> figure out what I'm doing wrong that my files won't
> open.  I tried saving to the C drive, but no luck
> there.  I also have no luck getting R to read my SPSS
> files with read.spss
> 
> My file names do have spaces and dashes, and I do have
> variables/variable names longer than 8 characters.
> 
> Please help!  I am very new to R and do not understand
> all the package reference manuals...I can not seem to
> find a simple, basic guide to how to command R and use
> basic functions without a bunch of jargon (eg 'usage'
> and 'arguments').  It would help to at least be able
> to load my files to practice on.
> 
> Any help would be appreciated!
> Thanks,
> Diana
> 
> 
>
> __
> __
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RFclustering - is it available in R?

2007-08-15 Thread Liaw, Andy
Basically the random forest algorithm can generate a proximity 
matrix of the data, and it's up to you how you would want to 
proceed from there.  You can feed that into clustering 
algorithms that accept a similarity matrix, or turn it into a 
distance matrix for clustering algorithms that need a distance
matrix (e.g., hclust()).  You may or may not want to do 
ordination as the UCLA folks suggest.
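A sketch of that workflow on the iris measurements (the unsupervised call, the averaging linkage, and k = 3 are arbitrary choices, not part of the original advice):

```r
library(randomForest)

x <- iris[, 1:4]                       # predictors only: unsupervised forest
set.seed(1)
rf <- randomForest(x, proximity = TRUE)
d  <- as.dist(1 - rf$proximity)        # turn the similarity into a distance
hc <- hclust(d, method = "average")    # any distance-based clusterer works here
table(cutree(hc, k = 3), iris$Species)
```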

I think this is one of the great things about working in R:
you have the freedom to choose how you want to proceed from
some intermediate result, and are not locked into something
someone decided to hardwire into the software.

Andy

From: Gavin Simpson
> 
> On Wed, 2007-08-15 at 09:44 -0700, David Katz wrote:
> > Several searches turned up nothing. Perhaps I will try to 
> implement it if
> > nobody else has. Thanks.
> 
> You can do this with Andy Liaw's randomForest package, and
> the first hit on a Google search (on term "RFclustering") was this:
> 
> http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm
> 
> which shows how one might go about this with some helper functions.
> 
> G
> 
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Gavin Simpson [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
>  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using sampling weights in glm

2007-08-15 Thread Thomas Lumley
On Wed, 15 Aug 2007, Werner Wernersen wrote:

> Hi,
>
> I have a supposedly representative sample from a
> stratified survey. Hence, all observations have an
> associated sample weight each which inflate the sample
> to population size.
>
> I want to run a glm probit regression on the data but
> I am not clear about if or better how I can use the
> weights in it. From the description of the weights
> argument of glm it seems to me that I cannot plug
> these weights in there.
>

You want svyglm() in the survey package.
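A sketch of that route, with made-up data and hypothetical variable names
('strat' for the stratum, 'sampwt' for the sampling weight):

```r
library(survey)

set.seed(1)
d <- data.frame(y      = rbinom(200, 1, 0.4),
                x      = rnorm(200),
                strat  = rep(1:4, each = 50),
                sampwt = runif(200, 50, 150))
des <- svydesign(ids = ~1, strata = ~strat, weights = ~sampwt, data = d)
fit <- svyglm(y ~ x, design = des, family = quasibinomial(link = "probit"))
summary(fit)   # design-based standard errors, unlike glm(weights = ...)
```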

  -thomas


Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Loading JMP Files

2007-08-15 Thread Diana C. Dolan
Hi,
I know how to use SPSS and JMP, and have quite a few
JMP files I would like to use in R.  I converted them
to .xpt files, downloaded the 'foreign' library then
tried this command:

>read.xport("D:\\Databases\nameoffile.xpt")

to which I get:

>Error in lookup.xport(file) : unable to open file

I have read FAQ lists and Google searched and cannot
figure out what I'm doing wrong that my files won't
open.  I tried saving to the C drive, but no luck
there.  I also have no luck getting R to read my SPSS
files with read.spss

My file names do have spaces and dashes, and I do have
variables/variable names longer than 8 characters.

Please help!  I am very new to R and do not understand
all the package reference manuals...I can not seem to
find a simple, basic guide to how to command R and use
basic functions without a bunch of jargon (eg 'usage'
and 'arguments').  It would help to at least be able
to load my files to practice on.

Any help would be appreciated!
Thanks,
Diana


   


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in building R

2007-08-15 Thread Prof Brian Ripley
On Wed, 15 Aug 2007, Giovanni Petris wrote:

>
> Hello,
>
> I am upgrading to the current R 2.5.1 under Sun Solaris 8.

Actually, 2.5.1 is not current: '2.5.1 patched' aka R-patched is and this 
has already been addressed there.

>  I call the configure script with the --without-readline flag, and it 
> works fine. Then, when I invoke make, I get this kind of error messages:
>
>
> make[2]: Entering directory `/usr/local/R/R-2.5.1-inst/src/library'
> >>> Building/Updating help pages for package 'base'
> Formats: text html latex example
> Can't use an undefined value as filehandle reference at 
> /usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.
> >>> Building/Updating help pages for package 'tools'
> Formats: text html latex example
> Can't use an undefined value as filehandle reference at 
> /usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.
> >>> Building/Updating help pages for package 'utils'
> Formats: text html latex example
> Can't use an undefined value as filehandle reference at 
> /usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.
>
>
> (I don't know if this has to do with perl, but I have version 5.005_03)

It does.  My memory is that version of Perl predates Solaris 8 (it comes 
from the 1990's).  You need Perl >= 5.6.1, and I would suggest installing 
Perl 5.8.x (which is already 6 years old), as the next version of R will 
require it.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Independent Variable Rankings with MARS

2007-08-15 Thread sifter77

Hi,

I am trying to extract MARS' component analysis from the model. 
Essentially, for an input matrix where each row is a different device, and
each column is a different measurement, I'd like to know which columns were
selected as most important for making the model.  

I saw the all.terms and selected.terms parameters in the model output and
documentation, but could not figure out how to map them to the columns of my
input matrix if they are in fact related to my question.

Would somebody with knowledge of the MARS model please help me?

Thank you.

~James
-- 
View this message in context: 
http://www.nabble.com/Independent-Variable-Rankings-with-MARS-tf4274882.html#a12167779
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] yahoo finance data feed to R

2007-08-15 Thread Nestor Arguea
On Wednesday 15 August 2007 12:27 pm, Szymon Plucinski wrote:
> Hello,
>
> I was wondering if it is possible to create a live data feed from Yahoo
> Finance stock data into an R program? Do any such modules already exist?
> Thanks for any help.
>
> Szymon
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

library(fCalendar)
?yahooImport

-- 
Nestor M. Arguea, Chair
Department of Marketing and Economics
University of West Florida
11000 University Parkway
Pensacola, FL 32514
Phone: (850)474-3071
Fax: (850)474-3069

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in building R

2007-08-15 Thread Giovanni Petris

Hello,

I am upgrading to the current R 2.5.1 under Sun Solaris 8.  I call the
configure script with the --without-readline flag, and it works
fine. Then, when I invoke make, I get this kind of error messages:


make[2]: Entering directory `/usr/local/R/R-2.5.1-inst/src/library'
 >>> Building/Updating help pages for package 'base'
 Formats: text html latex example 
Can't use an undefined value as filehandle reference at 
/usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.
 >>> Building/Updating help pages for package 'tools'
 Formats: text html latex example 
Can't use an undefined value as filehandle reference at 
/usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.
 >>> Building/Updating help pages for package 'utils'
 Formats: text html latex example 
Can't use an undefined value as filehandle reference at 
/usr/local/R/R-2.5.1-inst/share/perl/R/Rdconv.pm line 78.


(I don't know if this has to do with perl, but I have version 5.005_03)

Does anybody have a suggestion about how to fix the problem? 

Thanks in advance,
Giovanni

-- 

Giovanni Petris  <[EMAIL PROTECTED]>
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Greg Snow



Ted Harding wrote:
[snip]

> So if you want the density plot, you would need to calculate
> this for yourself. E.g.
> 
> H1$density <- counts/sum(counts)
> plot.histogram(H1,freq=FALSE)

shouldn't that be:
H1$density <- counts/sum(counts)/diff(brkpts)

?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mann-Whitney U

2007-08-15 Thread Natalie O'Toole
Hi,

I do want to use the Mann-Whitney test which ranks my data and then uses 
those ranks rather than the actual data.

Here is the R code i am using:

 group1<- 
c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2<- 
c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95, 
na.action)

paired = FALSE so that the Wilcoxon rank sum test which is equivalent to 
the Mann-Whitney test is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level
na.action is used because i have a NA value (i suspect i am not using 
na.action in the correct manner)

When i use this code i get the following error message:

Error in arg == choices : comparison (1) is possible only for atomic and 
list types

When i use this code:

 group1<- 
c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2<- 
c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <-  wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95)

I get the following result:

  Wilcoxon rank sum test with continuity correction

data:  group1 and group2 
W = 405.5, p-value = 0.6494
alternative hypothesis: true location shift is not equal to 0 

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(group1, 
group2, paired = FALSE, conf.level = 0.95) 

The W value here is 405.5 with a p-value of 0.6494


in SPSS, i am ranking my data and then performing a Mann-Whitney U by 
selecting analyze - non-parametric tests - 2 independent samples  and then 
checking off the Mann-Whitney U test.

For the Mann-Whitney test in SPSS i am getting the following results:

Mann-Whitney U = 350.5
 2- tailed p value = 0.643

I think maybe the discrepancy has to do with the specification of the NA 
values in R, but i'm not sure.
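One way to compare the two statistics directly: R's wilcox.test reports W, the
Mann-Whitney U for the first sample, while SPSS reports the smaller of U1 and
U2, and the two halves always satisfy U1 + U2 = n1*n2. With the sample sizes
above (28 and 27 once the NA is dropped):

```r
n1 <- 28; n2 <- 27   # sample sizes after the NA is removed
W  <- 405.5          # R's statistic (U for group1)
n1 * n2 - W          # 350.5 -- the U that SPSS reports
```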


If anyone has any suggestions, please let me know!

I hope i have provided enough information to convey my problem.

Thank-you, 

Nat
__


Natalie,

It's best to provide at least a sample of your data.  Your field names 
suggest 
that your data might be collected in units of mm^2 or some similar 
measurement of area.  Why do you want to use Mann-Whitney, which will rank 

your data and then use those ranks rather than your actual data?  Unless 
your 
sample is quite small, why not use a two sample t-test?  Also, are your 
samples paired?  If they aren't, did you use the "paired = FALSE" option?

JWDougherty

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




 

This communication is intended for the use of the recipient to which it is 
addressed, and may
contain confidential, personal, and or privileged information. Please 
contact the sender
immediately if you are not the intended recipient of this communication, 
and do not copy,
distribute, or take action relying on it. Any communication received in 
error, or subsequent
reply, should be deleted or destroyed.



 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA: Factorial designs with a separate control

2007-08-15 Thread Charles C. Berry


Lamac,

The ANOVA shown in 'pamph14' may not be suitable for your data.

If the replicates are separate experiments or blocks, you will need to 
become familiar with the nlme package and the lme function in it. ( Your 
labelling of replicates suggests that this is the case, viz. no
A==1 & B == 3 & replicate == 1 combination was found.) There is an 
excellent book that serves as reference for that package (Pinheiro, J.C., 
and Bates, D.M. (2000) "Mixed-Effects Models in S and S-PLUS", Springer.)
If this turns out to be too 'deep' for you, you will do best to find a 
statistician who is well versed in linear mixed effects models to help 
you.

--

However, if the replicates are independent realizations, then

> # copy Lamac's data to the clipboard first
>
> dat <- read.table("clipboard",header=T)
> fit <- lm ( response ~ I( A==0 ) + as.factor(A)*as.factor(B), dat )
> anova( fit )

Gives you the sequential sums of squares ( your design is unbalanced hence 
the 'sequential' qualifier ). If you want what Wendy Bergerud (author of 
'pamph14') called the 'TREAT' sum of squares, you can add all but the 
'Residuals' SS. Likewise with the df.

You can also get the TREAT ANOVA table directly by revising the formula in 
lm() above. The resulting formula is very simple, and you should have some 
fun - as well as strengthen your skill with formulas in R - trying to find 
it. You may want to review Chapter 11 Statistical models in R in 
An Introduction to R before trying this.

HTH,

Chuck

On Tue, 14 Aug 2007, lamack lamack wrote:

> Dear all, I would like to run in R the anova showed in the following
> pamphlet.
>
> http://www.for.gov.bc.ca/hre/biopamph/pamp14.pdf
>
> For A = 0 and B = 0 I have the control group.
>
> Best regards.
>
> AB   replication   response
> 0 0 1 24
> 0 0 2 27
> 0 0 3 36
> 0 0 4 28
> 0 0 5 32
> 1 1 1 43
> 1 1 2 39
> 1 1 3 32
> 1 1 4 36
> 1 1 5 50
> 1 2 1 34
> 1 2 2 35
> 1 2 3 45
> 1 2 4 37
> 1 2 5 52
> 1 3 2 34
> 1 3 3 35
> 1 3 4 58
> 1 3 5 35
> 1 4 1 26
> 1 4 2 49
> 1 4 3 44
> 1 4 4 39
> 1 4 5 37
> 1 5 1 37
> 1 5 2 32
> 1 5 3 33
> 1 5 4 37
> 1 5 5 36
> 2 1 1 43
> 2 1 2 33
> 2 1 3 33
> 2 1 4 38
> 2 1 5 41
> 2 2 1 42
> 2 2 2 45
> 2 2 3 34
> 2 2 4 29
> 2 2 5 33
> 2 3 1 33
> 2 3 2 42
> 2 3 3 30
> 2 3 4 28
> 2 3 5 40
> 2 4 1 39
> 2 4 2 33
> 2 4 3 20
> 2 4 4 29
> 2 4 5 24
> 2 5 1 37
> 2 5 2 46
> 2 5 3 33
> 2 5 4 27
> 2 5 5 34
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] yahoo finance data feed to R

2007-08-15 Thread Szymon Plucinski
Hello,

I was wondering if it is possible to create a live data feed from Yahoo
Finance stock data into an R program? Do any such modules already exist?
Thanks for any help.

Szymon

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Formula in lm inside lapply

2007-08-15 Thread Gabor Grothendieck
Here is another solution that gets around the non-standard
way that subset= is handled in lm.  It has the advantage that unlike
the previous solution where formula1 and group == x appear literally
in the output, in this one the formula appears written out and
group == "A" and group == "B" appear:

> lapply(levels(DF$group), function(x) do.call("lm",
+    list(formula1, quote(DF), subset = bquote(group == .(x)))))
[[1]]

Call:
lm(formula = y ~ x1, data = DF, subset = group == "A")

Coefficients:
(Intercept)   x1
1.04855  0.04585


[[2]]

Call:
lm(formula = y ~ x1, data = DF, subset = group == "B")

Coefficients:
(Intercept)   x1
1.13593 -0.01627


On 8/15/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> It can't find x since the environment of formula1 and of formula2 is the 
> Global
> Environment and x is not there -- its local to the function.
>
> Try this:
>
> #generating data
> set.seed(1)
> DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1), x2 = rnorm(100, 1),
>  group = rep(c("A", "B"), c(40, 60)))
>
> formula1 <- as.formula(y ~ x1)
> lapply(levels(DF$group), function(x) {
>   environment(formula1) <- environment()
>   lm(formula1, DF, subset = group == x)
> })
>
> formula2 <- as.formula(y ~ x1 + x2)
> lapply(levels(DF$group), function(x) {
>   environment(formula2) <- environment()
>   lm(formula2, DF, subset = group == x)
> })
>
>
>
> On 8/15/07, Li, Yan (IED) <[EMAIL PROTECTED]> wrote:
> > I am trying to run separate regressions for different groups of
> > observations using the lapply function. It works fine when I write the
> > formula inside the lm() function. But I would like to pass formulae into
> > lm(), so I can do multiple models more easily. I got an error message
> > when I tried to do that. Here is my sample code:
> >
> > #generating data
> > x1 <- rnorm(100,1)
> > x2 <- rnorm(100,1)
> > y  <- rnorm(100,1)
> > group <- rep(c("A","B"),c(40,60))
> > group <- factor(group)
> > df <- data.frame(y,x1,x2,group)
> >
> > #write formula inside lm--works fine
> > res1 <- lapply(levels(df$group), function(x) lm(y~x1,df, subset = group
> > ==x))
> > res1
> > res2 <- lapply(levels(df$group),function(x) lm(y~x1+x2,df, subset =
> > group ==x))
> > res2
> >
> > #try to pass formula into lm()--does not work
> > formula1 <- as.formula(y~x1)
> > formula2 <- as.formula(y~x1+x2)
> > resf1 <- lapply(levels(df$group),function(x) lm(formula1,df, subset =
> > group ==x))
> > resf1
> > resf2 <- lapply(levels(df$group),function(x) lm(formula2,df, subset =
> > group ==x))
> > Resf2
> >
> > The error message is
> > 'Error in eval(expr, envir, enclos): object "x" not found'
> >
> > Any help is greatly appreciated!
> >
> > Yan
> > 
> >
> > This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RFclustering - is it available in R?

2007-08-15 Thread Gavin Simpson
On Wed, 2007-08-15 at 09:44 -0700, David Katz wrote:
> Several searches turned up nothing. Perhaps I will try to implement it if
> nobody else has. Thanks.

You can do this with Andy Liaw's randomForest package, and
the first hit on a Google search (on term "RFclustering") was this:

http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm

which shows how one might go about this with some helper functions.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Formula in lm inside lapply

2007-08-15 Thread Weiwei Shi
try this:

> x = predict(z, Iris[-train, ])
> x1 <- rnorm(100,1)
> x2 <- rnorm(100,1)
> y  <- rnorm(100,1)
> group <- rep(c("A","B"),c(40,60))
> group <- factor(group)
> df <- data.frame(y,x1,x2,group)
> resf1 <- lapply(levels(df$group),function(x) {formula1 <- as.formula(y~x1); 
> lm(formula1,df, subset =group ==x)}) # put the formula defn into function(x)
> resf1
[[1]]

Call:
lm(formula = formula1, data = df, subset = group == x)

Coefficients:
(Intercept)   x1
 0.8532   0.1189


[[2]]

Call:
lm(formula = formula1, data = df, subset = group == x)

Coefficients:
(Intercept)   x1
 0.7116   0.3398

HTH,

Weiwei

On 8/15/07, Li, Yan (IED) <[EMAIL PROTECTED]> wrote:
> I am trying to run separate regressions for different groups of
> observations using the lapply function. It works fine when I write the
> formula inside the lm() function. But I would like to pass formulae into
> lm(), so I can do multiple models more easily. I got an error message
> when I tried to do that. Here is my sample code:
>
> #generating data
> x1 <- rnorm(100,1)
> x2 <- rnorm(100,1)
> y  <- rnorm(100,1)
> group <- rep(c("A","B"),c(40,60))
> group <- factor(group)
> df <- data.frame(y,x1,x2,group)
>
> #write formula inside lm--works fine
> res1 <- lapply(levels(df$group), function(x) lm(y~x1,df, subset = group
> ==x))
> res1
> res2 <- lapply(levels(df$group),function(x) lm(y~x1+x2,df, subset =
> group ==x))
> res2
>
> #try to pass formula into lm()--does not work
> formula1 <- as.formula(y~x1)
> formula2 <- as.formula(y~x1+x2)
> resf1 <- lapply(levels(df$group),function(x) lm(formula1,df, subset =
> group ==x))
> resf1
> resf2 <- lapply(levels(df$group),function(x) lm(formula2,df, subset =
> group ==x))
> Resf2
>
> The error message is
> 'Error in eval(expr, envir, enclos): object "x" not found'
>
> Any help is greatly appreciated!
>
> Yan
> 
>
> This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Formula in lm inside lapply

2007-08-15 Thread Gabor Grothendieck
It can't find x since the environment of formula1 and of formula2 is the Global
Environment and x is not there -- its local to the function.

Try this:

#generating data
set.seed(1)
DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1), x2 = rnorm(100, 1),
  group = rep(c("A", "B"), c(40, 60)))

formula1 <- as.formula(y ~ x1)
lapply(levels(DF$group), function(x) {
   environment(formula1) <- environment()
   lm(formula1, DF, subset = group == x)
})

formula2 <- as.formula(y ~ x1 + x2)
lapply(levels(DF$group), function(x) {
   environment(formula2) <- environment()
   lm(formula2, DF, subset = group == x)
})
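A variant (an editor's sketch, not from the original thread) avoids touching the formula's environment altogether: build the call with do.call(), so the subset reaches lm() as an already-evaluated logical vector and no lookup of `x` in the formula's environment is ever needed.

```r
# Sketch: pass the subset as an evaluated logical vector via do.call(),
# so lm() never has to find `x` in the formula's environment.
set.seed(1)
DF <- data.frame(y = rnorm(100, 1), x1 = rnorm(100, 1),
                 group = factor(rep(c("A", "B"), c(40, 60))))
formula1 <- y ~ x1
fits <- lapply(levels(DF$group), function(x)
  do.call("lm", list(formula1, data = DF, subset = DF$group == x)))
sapply(fits, function(f) length(fitted(f)))  # 40 and 60 rows per group
```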



On 8/15/07, Li, Yan (IED) <[EMAIL PROTECTED]> wrote:
> I am trying to run separate regressions for different groups of
> observations using the lapply function. It works fine when I write the
> formula inside the lm() function. But I would like to pass formulae into
> lm(), so I can do multiple models more easily. I got an error message
> when I tried to do that. Here is my sample code:
>
> [rest of quoted message snipped]



Re: [R] Formula in lm inside lapply

2007-08-15 Thread Henrique Dallazuanna
try this:

lapply(levels(df$group), function(x)lm(formula1, data=df[group==x,]))

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

On 15/08/07, Li, Yan (IED) <[EMAIL PROTECTED]> wrote:
>
> I am trying to run separate regressions for different groups of
> observations using the lapply function. It works fine when I write the
> formula inside the lm() function. But I would like to pass formulae into
> lm(), so I can do multiple models more easily. I got an error message
> when I tried to do that. Here is my sample code:
>
> [rest of quoted message snipped]



[R] RFclustering - is it available in R?

2007-08-15 Thread David Katz

Several searches turned up nothing. Perhaps I will try to implement it if
nobody else has. Thanks.
-- 



[R] lmer coefficient distributions and p values

2007-08-15 Thread Daniel Lakeland
I am helping my wife do some statistical analysis. She is a biologist,
and she has performed some measurements on various genotypes of
mice. My background is in applied mathematics and engineering, and I
have a fairly good statistics background, but I am by no means a PhD
level expert in statistical methods.

We have used the lmer function (from the lme4 package) to fit various
experiments that she has done (random effects from multiple
measurements for each animal or each trial, and fixed effects from
developmental stage, and genotype etc). The results are fairly clear
cut to me, and I suggested that she publish the results as coefficient
estimates for the relevant contrasts, and their standard error
estimates. However, she has read the statistical guidelines for the
journal and they insist on p values.

I personally think that p values, and sharp-null hypothesis tests are
misguided and should be banned from publications, but it doesn't much
matter what I think compared to what the editors want.

Based on searching the archives, and finding this message:

https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html

I am aware of the theoretical difficulties with p values from lmer
results. I am also aware of the mcmcsamp function which performs some
kind of bayesian sampling from the posterior distribution of the
coefficients based on some kind of prior (I will need to do some more
reading to more fully understand this). Is this the primary way in
which we can estimate the distribution of the model coefficients and
calculate a p value or a confidence interval?

What can I do with the t statistic provided by lmer? If as the above
message suggests, we are uncertain of the correct F and by extension t
distributions to use, what help are the t statistics? I suppose I
could test them against a very low degree of freedom t distribution
(say 3) and publish those p values?
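The conservative-df workaround described above is easy to sketch in base R (the t statistic 2.5 here is made up, standing in for one reported by lmer):

```r
# Made-up t statistic; compare a deliberately conservative reference
# (t with 3 df) against the normal approximation.
t_stat <- 2.5
p_t3     <- 2 * pt(abs(t_stat), df = 3, lower.tail = FALSE)
p_normal <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)
round(c(t3 = p_t3, normal = p_normal), 4)  # the t3 p value is larger
```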

Again, I'm content to ignore p values and stick to estimates, but the
journal isn't.

BTW: thanks to all on this list, I've benefitted greatly from R and
from the archives of help topics.

-- 
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan



[R] Negative Binomial: glm.nb

2007-08-15 Thread Ted Harding
Hi Folks,

I'm playing with glm.nb() in MASS.

Reference: the negative binomial distribution

  P(y) = (Gamma(theta+y)/(Gamma(theta)*y!))*(p^theta)*(1-p)^y

  y = 0,1,2,...

in the notation of the MASS book (section 7.4), where

  p = theta/(mu + theta)  so  (1-p) = mu/(mu + theta)

where mu is the expected value of Y. It seems from ?glm.nb that
an initial value of theta is either supplied, or obtained by
a method of moments after an initial Poisson fit to the data;
the fit then alternates back and forth:

given theta, glm is applied to fit the linear model, with
log link, for mu; then theta is re-estimated, then the linear model,
and so on.

I hope I've understood right!

However, this procedure is based on the assumption that the same
value of theta applies across the board, regardless of the values
of any covariates or factors.

The MASS book (Section 7.4) illustrates the use of glm.nb() on
the Quine data (for numbers of days absence from school in a
school year by Australian children). There are four factors in
addition to the  "outcome" Days:

Eth: Ethnicity (Aboriginal "A"/Non-Aboriginal "N")
Lrn: Learning Ability (Slow Learner "SL", Average Learner "AL")
Age: Age Group (Primary "F0", First/Second/Third Forms "F1/2/3")
Sex: Gender ("M"/"F")

This dataset is earlier analysed thoroughly in the MASS book
(Section 6.8), primarily with reference to transforming Days
by a Box-Cox transformation. The negative binomial does not
play a role in that part.

When it comes to the glm.nb(), however, the assumption of "constant
theta" is implicit in Section 7.4.

But, if you look at the data, this does not seem all that reasonable.

Preamble: Get the variables out of 'quine' into convenient names.

library(MASS)
D<-quine$Days ; E<-quine$Eth; L<-quine$Lrn; A<-quine$Age; S<-quine$Sex

D.A<-D[E=="A"]; L.A<-L[E=="A"]; A.A<-A[E=="A"]; S.A<-S[E=="A"]
D.N<-D[E=="N"]; L.N<-L[E=="N"]; A.N<-A[E=="N"]; S.N<-S[E=="N"]

Now, if you look at the histograms of D.A and D.N separately:

  hist(D.A,breaks=-0.5+4*(0:22))
  hist(D.N,breaks=-0.5+4*(0:22))

you can see that there is a major difference in shape, such as
would arise if theta>1 for Eth="A", and theta<1 for Eth="N".
This is confirmed by a separate covariate-free fit of the mean
to each group, from which the fitted theta is returned:

summary(glm.nb(D.A~1))
-->   Theta:  1.499
  Std. Err.:  0.260

summary(glm.nb(D.N~1))
-->   Theta:  0.919
  Std. Err.:  0.159

(1.499-0.919)/sqrt(0.260^2 + 0.159^2)
##[1] 1.903113

1-pnorm(1.903113)
##[1] 0.0285129

so we have a 1-sided P-value of 0.029, a 2-sided of 0.057
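The comparison above can be wrapped as a small helper (same numbers as in the two fits; an approximate z test treating the two estimates as independent normals):

```r
# Approximate z test for two independent theta estimates, as used above
theta_z_test <- function(th1, se1, th2, se2) {
  z <- (th1 - th2) / sqrt(se1^2 + se2^2)
  c(z = z, p.one.sided = pnorm(abs(z), lower.tail = FALSE))
}
round(theta_z_test(1.499, 0.260, 0.919, 0.159), 4)  # z = 1.9031, p = 0.0285
```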

That's a fairly strong indication that theta is not the same
for the two groups; and the shapes of the histograms show that
the difference has an important effect on the distribution.

Of course, this dataset is notoriously unbalanced, so maybe this
is a reflection of the mixtures of categories. A simple linear
model on each subgroup eases the above situation a bit:

summary(glm.nb(D.A ~ S.A+A.A+L.A))
-->   Theta:  1.767
  Std. Err.:  0.318
summary(glm.nb(D.N ~ S.N+A.N+L.N))
  Theta:  1.107
  Std. Err.:  0.201
1-pnorm((1.767-1.107)/sqrt(0.318^2+0.201^2)) = 0.0397

and throwing everything into the pot eases it a bit more:

summary(glm.nb(D.A ~ S.A*A.A*L.A))
-->   Theta:  2.317
  Std. Err.:  0.438
summary(glm.nb(D.N ~ S.N*A.N*L.N))
-->   Theta:  1.581
  Std. Err.:  0.318
1-pnorm((2.317-1.581)/sqrt(0.438^2+0.318^2)) = 0.0870

But, while the theta-values have now moved well up into the ">1"
range (where they have less effect on the shape of the distribution),
nevertheless the differences of the "A" and "N" estimates have
not changed much: 0.580 --> 0.660 --> 0.736
even though their P-values have eased off somewhat (and, with 69
and 77 data for the "A" and "N" groups respectively, we don't have
huge power here, I think).

I've gone through the above in detail, since it's an example
which you can all work through yourselves in R, to illustrate the
issue about variation of theta in a negative binomial model, when
you want to get at the effects of covariates on the outcome Y (the
counts of events). In real life I'm considering other data where
the "theta" effect is more marked, and probably more important.
Which finally leads on to my

QUERY: Is there anything implement{ed|able} in R which can address
the issue that theta may vary according to factors or covariates,
as well as the level of mu given theta?

Or is one reduced to running glm.nb() (or the like) on subsets?
(Not that this would be much use if theta depended on a continuous
covariate).

With thanks, and best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 17:28:14
-- XFMail --


[R] Formula in lm inside lapply

2007-08-15 Thread Li, Yan (IED)
I am trying to run separate regressions for different groups of
observations using the lapply function. It works fine when I write the
formula inside the lm() function. But I would like to pass formulae into
lm(), so I can do multiple models more easily. I got an error message
when I tried to do that. Here is my sample code:

#generating data
x1 <- rnorm(100,1)
x2 <- rnorm(100,1)
y  <- rnorm(100,1)
group <- rep(c("A","B"),c(40,60))
group <- factor(group)
df <- data.frame(y,x1,x2,group)

#write formula inside lm--works fine
res1 <- lapply(levels(df$group), function(x) lm(y~x1,df, subset = group
==x))
res1
res2 <- lapply(levels(df$group),function(x) lm(y~x1+x2,df, subset =
group ==x))
res2

#try to pass formula into lm()--does not work
formula1 <- as.formula(y~x1)
formula2 <- as.formula(y~x1+x2)
resf1 <- lapply(levels(df$group),function(x) lm(formula1,df, subset =
group ==x))
resf1
resf2 <- lapply(levels(df$group),function(x) lm(formula2,df, subset =
group ==x))
resf2

The error message is 
'Error in eval(expr, envir, enclos): object "x" not found'

Any help is greatly appreciated!

Yan





Re: [R] binomial simulation

2007-08-15 Thread Lucke, Joseph F
C is an R function for setting contrasts in a factor.  Hence the funky
error message.
?C

Use choose() for your C(N,k)
?choose

choose(200,2)
19900

choose(200,100)
 9.054851e+58 

N=200; k=100; m=50; p=.6; q=.95
choose(N,k)*p^k*(1-p)^(N-k)*choose(k,m)*q^m*(1-q)^(k-m)
6.554505e-41
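Equivalently (an editor's note), the whole product can be written with dbinom(), which computes the binomial pmf without ever forming the huge factorials -- the same identity, same numbers as above:

```r
# Joint probability via dbinom(), which avoids gamma() overflow:
# dbinom(k, N, p) == choose(N,k) * p^k * (1-p)^(N-k)
N <- 200; k <- 100; m <- 50; p <- 0.6; q <- 0.95
dbinom(k, N, p) * dbinom(m, k, q)  # 6.554505e-41, matching the above
```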

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Moshe Olshansky
Sent: Wednesday, August 15, 2007 2:06 AM
To: sigalit mangut-leiba; r-help
Subject: Re: [R] binomial simulation

No wonder that you are getting overflow, since
gamma(N+1) = N! and 200! > (200/e)^200 > 10^370.
There exists another way to compute C(N,k). Let me know if you need this
and I will explain to you how this can be done.
But do you really need to compute the individual probabilities? May be
you need something else and there is no need to compute the individual
probabilities?

Regards,

Moshe.

--- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:

> Thank you,
> I'm trying to run the joint probabilty:
> 
> C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
> 
> and get the error: Error in C(N, k) : object not interpretable as a 
> factor
> 
> so I tried the long way:
> 
> gamma(N+1)/(gamma(k+1)*(gamma(N-k)))
> 
> and the same with k, and got the error:
> 
> 1: value out of range in 'gammafn' in: gamma(N + 1)
> 2: value out of range in 'gammafn' in: gamma(N - k) 
> 
> Do you know why it's not working?
> 
> Thanks again,
> 
> Sigalit.
> 
> 
> 
> On 8/14/07, Moshe Olshansky <[EMAIL PROTECTED]> wrote:
> >
> > As I understand this,
> > P(T+ | D-) = 1 - P(T+ | D+) = 0.05
> > is the probability of not detecting the disease for a person at ICU
> > who has the disease. Correct?
> >
> > What I asked was whether it is possible to mistakenly detect the
> > disease for a person who does not have it?
> >
> > Assuming that this is impossible, the formula is below:
> >
> > If there are N patients, each has a probability p to have the
> > disease (p = 0.6 in your case) and q is the probability to detect
> > the disease for a person who has it (q = 0.95 for ICU and q = 0.8
> > for a regular unit), then
> >
> > P(k have the disease AND m are detected) =
> > P(k have the disease) * P(m are detected | k have the disease) =
> > C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
> > where C(a,b) is the binomial coefficient "a above b" - the number
> > of ways to choose b items out of a (when the order does not
> > matter). You of course must assume that N >= k >= m >= 0
> > (otherwise the probability is 0).
> >
> > To generate such pairs (k infected and m detected) you can do the
> > following:
> >
> > k <- rbinom(N,1,p)
> > m <- rbinom(k,1,q)
> >
> > Regards,
> >
> > Moshe.
> >
> > --- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > The probability of false detection is:
> > > P(T+ | D-) = 1 - P(T+ | D+) = 0.05.
> > > and I want to find the joint probability
> > > P(T+,D+) = P(T+|D+)*P(D+)
> > > Thank you for your reply,
> > > Sigalit.
> > >
> > > On 8/13/07, Moshe Olshansky <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi Sigalit,
> > > >
> > > > Do you want to find the probability P(T+ = t AND D+ = d) for
> > > > all the combinations of t and d (for ICU and Reg.)?
> > > > Is the probability of false detection (when there is no
> > > > disease) always 0?
> > > >
> > > > Regards,
> > > >
> > > > Moshe.
> > > >
> > > > --- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > hello,
> > > > > I asked about this simulation a few days ago, but still I
> > > > > can't get what I need.
> > > > > I have 2 units: icu and regular. From icu I want to take 200
> > > > > observations from a binomial distribution, where the
> > > > > probability for disease is p=0.6.
> > > > > From regular I want to take 300 observations with the same
> > > > > probability p=0.6.
> > > > > The distribution to detect disease when disease occurred -
> > > > > *for someone from icu* - is: p(T+ | D+)=0.95.
> > > > > The distribution to detect disease when disease occurred -
> > > > > *for someone from reg. unit* - is: p(T+ | D+)=0.8.
> > > > > I want to compute the joint distribution for each unit:
> > > > > p(T+,D+) for icu, and the same for reg.
> > > > > I tried:
> > > > >
> > > > > pdeti <- 0
> > > > >
> > > > > pdetr <- 0
> > > > >
> > > > > picu <- pdeti*.6
> > > > >
> > > > > preg <- pdetr*.6
> > > > >
> > > > > dept <- c("icu","reg")
> > > > >
> > > > > icu <- rbinom(200, 1, .6)
> > > > >
> > > > > reg <- rbinom(300, 1, .6)
> > > > >
> > > > > for(i in 1:300) {
> > > > >
> > > > > if(dept=="icu") pdeti==0.95
> > > > >
> > > > > if (dept=="reg") pdetr== 0.80
> > > > >
> > > > > }
> > > > >
> > > > > print(picu)
> > > > >
> > > > > print(preg)
> > > > >
> > > > > and got 50 warnings:
> > > > >
> > > > > the condition has length > 1 and only the first element will
> > > > > be used in: if

[R] Polynomial fitting

2007-08-15 Thread Shiazy Fuzzy
Hi everybody!

I'm looking some way to do in R a polynomial fit, say like polyfit
function of Octave/MATLAB.

For those who don't know, c = polyfit(x,y,m) finds the coefficients of a
polynomial p(x) of degree m that fits the data, p(x[i]) to y[i], in a
least squares sense. The result c is a vector of length m+1 containing
the polynomial coefficients in descending powers:
p(x) = c[1]*x^m + c[2]*x^(m-1) + ... + c[m]*x + c[m+1]

For prediction, one can then use function polyval like the following:

y0 = polyval( polyfit( x, y, degree ), x0 )

y0 are the prediction values at points x0 using the given polynomial.

In R, we know there is lm for 1-degree polynomial:
lm( y ~ x ) == polyfit( x, y, 1 )

and for prediction I can just create a function like:
lsqfit <- function( model, xx ) return( xx * coefficients(model)[2] +
coefficients(model)[1] );
and then: y0 <- lsqfit(model, x0)
(I've tried with predict.lm( model, newdata=x0 ) but obtain a bad result)

For a degree greater than 1, say m, what can I use?
I've tried with
   lm( y ~ poly(x, degree=m) )
I've also looked at glm, nlm, approx, ... but with these I can't
specify the polynomial degree.
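A sketch of how polyfit/polyval map onto lm() and predict() (data made up here; `raw = TRUE` asks poly() for plain, unorthogonalized powers):

```r
# Made-up data; poly(x, m, raw = TRUE) fits the raw polynomial, so
# coef() gives c0, c1, ..., cm in *ascending* powers (rev() them to
# match MATLAB's descending order).
set.seed(1)
x <- seq(0, 2, length.out = 50)
y <- 1 + 2 * x - 3 * x^2 + rnorm(50, sd = 0.1)
fit <- lm(y ~ poly(x, degree = 2, raw = TRUE))   # like polyfit(x, y, 2)
coef(fit)
x0 <- c(0.25, 0.5)
predict(fit, newdata = data.frame(x = x0))       # like polyval(c, x0)
```

Note that predict() wants `newdata` to be a data frame whose column names match the predictors; passing a bare vector is the usual cause of the "bad result" described above.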

Thank you so much!

Sincerely,

-- Marco



Re: [R] help and Firefox

2007-08-15 Thread Erich Neuwirth
I found a solution to my problem.
It is described here:
http://kb.mozillazine.org/Windows_error_opening_Internet_shortcut_or_local_HTML_file_-_Firefox

Essentially, it involves switching off DDE for some file associations
in Windows Explorer.

It really is a Firefox bug.


Erich Neuwirth wrote:
> My configuration is  Windows XP, R-2.5.1patched.
> My standard browser in Windows is Firefox 2.0.6,
> and I am using htmlhelp.
> I have problems with starting the browser for displaying help.
> help("lm") works as it should when Firefox is already running.
> When I do help("lm") and the browser is not yet started,
> I get
> Error in shell.exec(url) :
> 'C:\PROGRA~2\R\R-25~1.1\library\stats\html\lm.html' not found
> but nevertheless the browser starts and the html file is displayed
> some seconds later.
> 
> If I try to use help from within a function and the browser is not open,
> the browser will not start and therefore help will not be displayed.
> 
> Has anybody else experienced the same problem?
> Is there a solution?
> 
> 


-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459



[R] mitools and plm packages

2007-08-15 Thread Nathan Paxton
Hi all,

I am trying to use the functions in the plm package with multiply  
imputed datasets. I had tried to combine the datasets using the  
imputationList() function of mitools. plm, however, requires a data  
argument, and I don't know where to point it to. I'd appreciate any  
help people might have.

A (possible) fuller description of the problem and code is in a
previous message:
https://stat.ethz.ch/pipermail/r-help/2007-August/138670.html

Thanks.
-N
--
Nathan A. Paxton
Ph.D. Candidate
Dept. of Government, Harvard University

Resident Tutor
John Winthrop House, Harvard University

napaxton AT fas DOT harvard DOT edu
http://www.fas.harvard.edu/~napaxton



[R] using sampling weights in glm

2007-08-15 Thread Werner Wernersen
Hi,

I have a supposedly representative sample from a
stratified survey. Hence, each observation has an
associated sampling weight which inflates the sample
to population size.

I want to run a glm probit regression on the data but
I am not clear about whether, or rather how, I can use
the weights in it. From the description of the weights
argument of glm it seems to me that I cannot plug
these weights in there.

Has anybody some advice for me?

Many thanks,
  Werner
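One caveat worth sketching (an editor's note, with simulated data rather than the poster's): glm's `weights` argument treats weights as precision/frequency weights. Normalized sampling weights reproduce the weighted point estimates, but the reported standard errors ignore the survey design; design-based standard errors are what svyglm() in the survey package provides.

```r
# Simulated example: probit fit with normalized sampling weights.
# Point estimates are the weighted ones, but glm's standard errors do
# NOT account for the survey design -- for that, see
# svydesign()/svyglm() in the survey package.
set.seed(1)
n <- 500
d <- data.frame(x = rnorm(n), w = runif(n, 0.5, 2))
d$y <- rbinom(n, 1, pnorm(0.5 * d$x))
fit <- glm(y ~ x, family = binomial(link = "probit"),
           data = d, weights = w / mean(w))   # expect a non-integer warning
coef(fit)
```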



[R] shorthand for plotmath-annotation in certain font

2007-08-15 Thread Daniel Sabanés Bové
Dear useRs,

I wanted a function that allows me to use the font.lab font,
especially italic, in plotmath annotation. Consulting the R-help
list did not reveal a solution, only similar requests, so I wrote
a little function which I would like to share:

## code begin

math <- function (..., font = par ("font.lab"))
{
## unpack dot-dot-dot argument
dots <- as.list (substitute (...[]))
dots <- dots[2:(length (dots) - 1)]

## font extension
font <- switch (font, "plain", "bold", "italic", "bolditalic")
dots <- paste (font, "(", dots, ")")

## parse and build expression
do.call ("expression", as.list (parse (text = dots)))
}

## code end

Using this function, you can abbreviate the annotation work, a
small example:

## code begin

r <- rexp (100)
theta <- runif (100, 0, 2*pi)
group <- sample (1:4, 100, replace = TRUE)
par (font.lab = 4)
plot (r * cos (theta), r * sin (theta),
  xlab = math (r %*% cos (theta)), ylab = math (r %*% sin (theta)),
  pch = group)
legend ("topleft", title = "group",
legend = math (15 * alpha, 37 * beta, gamma + v[1], delta %/% d),
pch = 1:4)

## code end

Note that cos and sin are not "built-in" functions that plotmath can
recognize; they are thus rendered bolditalic as well. Using font = 3,
meaning "italic", and using the
Computer Math Symbol Adobe-Symbol-encoded fonts (cmsyase) developed by
Paul Murrell
(see http://www.stat.auckland.ac.nz/~paul/R/CM/CMR.html), you can produce
annotation that resembles that of the gold-standard LaTeX quite well.

regards,
Daniel - who is not subscribed to the list, please send direct answers
to my email address as well
as to the list, thank you.



Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Vladimir Eremeev
Hello Nick,

Wednesday, August 15, 2007, 1:18:34 PM, you wrote:

NC> On 15/08/07, Vladimir Eremeev <[EMAIL PROTECTED]> wrote:
NC> Nick Chorley-3 wrote:
>>
>> I have a large amount of data that I would like to create a histogram of
>> and
>> plot and do things with in R. It is pretty much impossible to read the
>> data
>> into R, so I have written a program to bin the data and now have a list of
>> counts in each bin. Is it possible to somehow import this into R and use
>> hist(), so I can, for instance, plot the probability density? I have
>> looked
>> at the help page for hist(), but couldn't find anything related to this
>> there.
>>

NC> Hi! And why do you think it's impossible to import the data in R?
NC> It can handle rather large data volumes, especially in Linux. Just study
NC> help("Memory-limits").
NC> My data file is 4.8 GB!

NC> You can plot something looking like a histogram using barplot() or plot(...
NC> type="h").

NC> The problem with those is that I can't plot the probability density.

NC> You can create the "histogram" class object manually.

NC> For example,
NC> [ import bin counts... probably, it is a table of 2 columns, defining bin
NC> borders and counts.
NC>   let's  store it in ncounts. ]

NC> Yes, that's what I have. 

>> hst<-hist(rnorm(nrow(ncounts)),plot=FALSE)
>> str(hst)  # actually I called hist(rnorm(100))
>> List of 7
>>  $ breaks : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
>>  $ counts : int [1:9] 3 6 12 9 24 19 14 9 4
>>  $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
>>  $ density: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
>>  $ mids   : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
>>  $ xname  : chr "rnorm(100)"
>>  $ equidist   : logi TRUE
>>  - attr(*, "class")= chr "histogram"
>> hst$breaks <-  [ your breaks ]
>> hst$counts <-  [ your counts ]
>> hst$intensities <-  [ your intensities ]

NC> My data isn't normally distributed, so I tried rexp() rather
NC> than rnorm(), but it's not looking like it should

The call of the random generator doesn't matter, since
it is used just to create a numeric vector for the hist().
And call to hist() just creates the dummy structure, which you must
fill with your data.

You then replace the returned result with yours.
You can call hist(1:100) with the same success. And any other numeric
vector can be used to call hist.
If the result doesn't look like it should then you, probably,
incorrectly or incompletely altered the list returned by hist().

Actually, you can create this structure from scratch:

hst<-list(breaks= [your breaks], counts= [your counts],
  intensities = [your intensities], density=[your density],
  mids= [your mids], xname= "hist(of  your data)",
  equidist=TRUE [or FALSE] )
attr(hst,"class")<-"histogram"
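A concrete, self-contained version of that from-scratch construction (bin edges and counts made up for illustration):

```r
# Made-up bin edges and imported counts; the density values must
# integrate to 1 over the bins.
breaks <- 0:5
counts <- c(2L, 5L, 9L, 4L, 1L)
widths <- diff(breaks)
dens <- counts / (sum(counts) * widths)
hst <- structure(list(breaks = breaks, counts = counts,
                      intensities = dens, density = dens,
                      mids = breaks[-length(breaks)] + widths / 2,
                      xname = "imported data", equidist = TRUE),
                 class = "histogram")
plot(hst, freq = FALSE)  # draws the probability density
```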
 
>> Studying the hist.default() sources will help you to understand, how every
>> list element is created.

Type hist.default (without parentheses) on the R prompt, and it will
display you the sources of this function.
You can also use dump(hist.default,file="hist_default.R") to save it
to a text file.

-- 
Best regards,
 Vladimir    mailto:[EMAIL PROTECTED]


--SevinMail--



Re: [R] Convert factor to numeric vector of labels

2007-08-15 Thread John Kane
My reason for setting stringsAsFactors = FALSE is more
that I really dislike having R convert what I "think"
are character variables to factors when I import data.


I suspect that it takes quite a few new users by
surprise that what they had intended to be a character
variable has become a factor. And it can take a long
time to track down the problem if you're a newbie.
-

A quick (overly simple) example where I had intended
the data in the second column to be character. 

Original data found at
http://ca.geocities.com/jrkrideau/R/facts.txt

1, b
1, b
3, b
3, b
4, a
4, a
3, a

options(stringsAsFactors = TRUE)

df  <-
read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt";)
 ; df[,2]
[1]  b  b  b  a  a  a
Levels:  a  b


options(stringsAsFactors = FALSE)

df  <-
read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt";)
 ; df[,2]
[1] " b" " b" " b" " a" " a" " a"
-

There are probably good reasons for setting the
default either way and while currently, I am strongly
of the FALSE persuasion, I can see some serious
problems changing the default, particularly when most
existing code will assume TRUE.  

It might be that a  "Why are my character variables
turning into factors"  as a complement to "How do I
convert factors to numeric" in the FAQ would be
sufficient.  As it is the reader knows what seems to
have happened but there is no clue as to why or how
this is happening.

If there are enough problems in importing numeric as
factors a note about the default might be worthwhile
in both FAQ entries  since it seems to indicate that
this is not a rare problem.
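Per-column control is also possible via colClasses, which sidesteps the global option entirely (a sketch; the `text=` shortcut assumes a reasonably recent R and stands in for the poster's URL):

```r
# Read the same kind of data with explicit column classes instead of
# relying on the stringsAsFactors option; strip.white removes the
# leading spaces that otherwise end up in the values.
df <- read.csv(text = "1, b\n1, b\n3, a", header = FALSE,
               colClasses = c("numeric", "character"),
               strip.white = TRUE)
str(df)  # V1 numeric, V2 character -- no factor conversion
```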



--- Matthew Keller <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> If we, the R community, are endeavoring to make R
> user friendly
> (gasp!), I think that one of the first places to
> start would be in
> setting stringsAsFactors = FALSE. Several times I've
> run into
> instances of folks decrying R's "ridiculous usage of
> memory" in
> reading data, only to come to find out that these
> folks were
> unknowingly importing certain columns as factors.
> The fix is easy once
> you know it, but it isn't obvious to new users, and
> I'd bet that it
> turns some % of people off of the program. Factors
> are not used often
> enough to justify this default behavior in my
> opinion. When factors
> are used, the user knows to treat the variable as a
> factor, and so it
> can be done on a case-by-case (or should I say
> variable-by-variable?)
> basis.
> 
> Is this a default that should be changed?
> 
> Matt
> 
> 

> > This is one of R's rather _endearing_  little
> > idiosyncrasies. I ran into it a while ago.
> >
>
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html
> >
> >
> > For some reason, possibly historical, the option
> > "stringAsFactors" is set to TRUE.
> >
> > As Prof Ripley says FAQ 7.10 will tell you
> > as.numeric(as.character(f)) # for a one-off
> conversion
> >
> > >From Gabor Grothendieck  A one-off solution for a
> > complete data.frame
> >
> > DF <- data.frame(let = letters[1:3], num = 1:3,
> >  stringsAsFactors = FALSE)
> >
> > str(DF)  # to see what has happened.
> >
> > You can reset the option globally, see below. 
> However
> > you might want to read Gabor Grothendieck's
> comment
> > about this in the thread referenced above since it
> > could cause problems if you transfer files alot.
> >
> > Personally I went with the global option since I
> don't
> > tend to transfer programs to other people and I
> was
> > getting tired of tracking down errors in my
> programs
> > caused by numeric and character variables suddenly
> > deciding to become factors.
> >
> > >From Steven Tucker:
> >
> > You can also this option globally with
> >  options(stringsAsFactors = TRUE)  # in
> > \library\base\R\Rprofile
> >
> > --- Falk Lieder <[EMAIL PROTECTED]>
> wrote:
> >
> > > Hi,
> > >
> > > I have imported a data file to R. Unfortunately
> R
> > > has interpreted some
> > > numeric variables as factors. Therefore I want
> to
> > > reconvert these to numeric
> > > vectors whose values are the factor levels'
> labels.
> > > I tried
> > > as.numeric(),
> > > but it returns a vector of factor levels (i.e.
> > > 1,2,3,...) instead of labels
> > > (i.e. 0.71, 1.34, 2.61,…).
> > > What can I do instead?
> > >
> > > Best wishes, Falk
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> >
> 
> 
> -- 
> Matthew C Keller
> Postdoctoral Fellow
> Virginia Institute for Psychiatric and Behavioral
> Genetics
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and pro

Re: [R] "ung�ltige Versionsspezifikation"

2007-08-15 Thread Martin Maechler
> "JK" == John Kane <[EMAIL PROTECTED]>
> on Wed, 15 Aug 2007 08:55:59 -0400 (EDT) writes:

JK> I think we need more information about your system.
JK> Please run
JK> sessionInfo()
JK> and include the information in another posting.

Yes, indeed.
However R version 2.3.1 seems a bit too old to fit to a current
version of cairo (rather 'cairoDevice' ??).

And BTW: It is a *package*, not a library!!!

Martin Maechler, ETH Zurich

JK> --- "Mag. Ferri Leberl" <[EMAIL PROTECTED]> wrote:

>> Dear everybody,
>> excuse me if this question ist trivial, however, I
>> have now looked for
>> an answer for quite a while and therefore dare
>> placing it here.
>> I want to export .svg-files and got here the advice
>> to employ the
>> cairo-library.
>> I downloaded the *current*-version here and expanded
>> it
>> to /usr/local/lib/R/site-library.
>> library(cairo) returned ungültige
>> Versionsspezifikation (INVALID VERSION
>> SPECIFICATION).
>> I got some answer here (EPM39), but, stupid enough,
>> I could not employ
>> it as I don't know where to look for the variable
>> named there.
>> The R-version I employ is 2.3.1.
>> Can anybody solve the (probably simple) problem?
>> Thank you in advance.
>> Yours, Ferri



JK> 

JK> __
JK> R-help@stat.math.ethz.ch mailing list
JK> https://stat.ethz.ch/mailman/listinfo/r-help
JK> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
JK> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problems with specifying an offset term in lmer and lmer2

2007-08-15 Thread Matthias Schmidt
Dear all,

I just want to integrate an offset in a linear mixed model applying 
functions lmer and lmer2 from library(lme4). It seems that lmer just 
ignores the offset term and lmer2 does the fit but replies an error if 
calling the fitted object:

#lmer with offset
lmeTNr_offset<-lmer(res_id~1+offset(fit.gam)+(1|TNr),Bu_WP)
lmeTNr_offset
Linear mixed-effects model fit by REML
Formula: res_id ~ 1 + offset(fit.gam) + (1 | TNr)
Data: Bu_WP
AIC   BIC logLik MLdeviance REMLdeviance
  -8869 -8852   4436  -8883-8873
Random effects:
  Groups   Name Variance Std.Dev.
  TNr   0.016274 0.12757
  Residual  0.033355 0.18263
number of obs: 25208, groups: TNr, 4812

Fixed effects:
  Estimate Std. Error t value
(Intercept) -0.008815   0.002342  -3.764

#results in the same output as the specification without an offset:

#lmer without offset
lmeTNr<-lmer(res_id~1+(1|TNr),Bu_WP)
lmeTNr
Linear mixed-effects model fit by REML
Formula: res_id ~ 1 + (1 | TNr)
Data: Bu_WP
AIC   BIC logLik MLdeviance REMLdeviance
  -8869 -8852   4436  -8883-8873
Random effects:
  Groups   Name Variance Std.Dev.
  TNr   0.016274 0.12757
  Residual  0.033355 0.18263
number of obs: 25208, groups: TNr, 4812

Fixed effects:
  Estimate Std. Error t value
(Intercept) -0.008815   0.002342  -3.764

#lmer2 with offset
lme2TNr_offset<-offset<-lmer2(res_id~1+offset(fit.gam)+(1|TNr),Bu_WP)

#calling lme2TNr_offset results in an error:

error in checkSlotAssignment(object, name, value) :
 assignment of an object of class "array" is not valid for slot 
"offset" in an object of class "summary.lmer2"; is(value, "numeric") is 
not TRUE


thanks for any help

best regards

Matthias




---
Dr. Matthias Schmidt

Nordwestdeutsche Forstliche Versuchsanstalt
Abteilung Waldwachstum
Grätzelstr. 2
37079 Göttingen
TEL.: 0551/69401-110
Fax:  0551/69401-160

The Northwest German Forest Research Institute
Department of Forest Growth
Grätzelstr. 2
37079 Göttingen
TEL.: 0551/69401-110
Fax:  0551/69401-160

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "ung�ltige Versionsspezifikation"

2007-08-15 Thread John Kane
I think we need more information about your system.
Please run
sessionInfo()
and include the information in another posting.


--- "Mag. Ferri Leberl" <[EMAIL PROTECTED]> wrote:

> Dear everybody,
> excuse me if this question ist trivial, however, I
> have now looked for
> an answer for quite a while and therefore dare
> placing it here.
> I want to export .svg-files and got here the advice
> to employ the
> cairo-library.
> I downloaded the *current*-version here and expanded
> it
> to /usr/local/lib/R/site-library.
> library(cairo) returned ungültige
> Versionsspezifikation (INVALID VERSION
> SPECIFICATION).
> I got some answer here (EPM39), but, stupid enough,
> I could not employ
> it as I don't know where to look for the variable
> named there.
> The R-version I employ is 2.3.1.
> Can anybody solve the (probably simple) problem?
> Thank you in advance.
> Yours, Ferri



  

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] "ungültige Versionsspezifikation"

2007-08-15 Thread Mag. Ferri Leberl
Dear everybody,
excuse me if this question ist trivial, however, I have now looked for
an answer for quite a while and therefore dare placing it here.
I want to export .svg-files and got here the advice to employ the
cairo-library.
I downloaded the *current*-version here and expanded it
to /usr/local/lib/R/site-library.
library(cairo) returned ungültige Versionsspezifikation (INVALID VERSION
SPECIFICATION).
I got some answer here (EPM39), but, stupid enough, I could not employ
it as I don't know where to look for the variable named there.
The R-version I employ is 2.3.1.
Can anybody solve the (probably simple) problem?
Thank you in advance.
Yours, Ferri

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] library kernlab, ksvm(x,y,scaled=T)

2007-08-15 Thread strinz
Hello,

I encountered the following problem with the parameter scaled in ksvm()
from package kernlab:
[Package kernlab version 0.9-5]

library(kernlab)

>   svp =ksvm(x=mydata,y=y,scaled=T)  
Using automatic sigma estimation (sigest) for RBF or laplace kernel 

>   svp =ksvm(x=mydata,y=y,scaled=F)  
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
Error in drop((scal^2) * crossprod(fitted(ret) - y)/m) : 
object "scal" not found

> class(mydata)
[1] "matrix"

> range(mydata)
[1]   0 100

what is the problem here ?

best regards
Björn




"Jetzt Handykosten senken mit klarmobil - 14 Ct./Min.! Hier klicken"
http://produkte.shopping.freenet.de/handy_voip_isdn/klarmobil/index.html?pid=730025

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] time series periodic data

2007-08-15 Thread Petr PIKAL
Dear all

Please help me with analysis of some periodic data. 

I have an output from measurement each minute and this output is modulated 
by rotation of the equipment (approx 6.5 min/revolution). I can easily 
spot this frequency from

spectrum(mydata, some suitable span)

However from other analysis I suspect there is a longer term oscilation 
(about 70-80 min) I am not able to find it from mentioned data.

Plese give me some hint how I could prove that such long term modulation 
of my data exist in presence of quite strong modulation by this short term 
oscilations.

Thank you

Best regards

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mda and kmeans

2007-08-15 Thread avanisco
Hello,

I am using the function mda of the mda library in order to discriminate 
4 groups with 8 explanatory variables. I only have 66 observations.
I tested all possible combinations of those variable and run for each 
the Mixture Discriminant Analysis.

For some iterations, I got an error message: "error in kmeans(xx, 
start): initial centers are not distinct".

I understood that the function kmeans() called by mda() choose randomly 
the initial centers for starting the clustering procedure.
As I aim to boostrap this function and need a lot of random selections, 
I'd like to avoid the effects of replicated centers by keeping the 
initial centers constant.

When debugging, it seems that mda() is linked with kmeans() by the 
following condition:
  if (inherits(weights, "mda")) {
if (is.null(weights$weights))weights <- 
predict(weights, x, type = "weights",g = fg)
else weights <- weights$weights
}

This condition call mda.start() if "weight" is null.
Kmeans() is called in mda.start() by starter() where arguments for 
kmeans (xx and start) are calculated.

The problem arises in the function sample() in starter() which sample 
randomly the data set.
For example, I could obtain duplicated row such as followed:

Debug: start <- xx[sample(1:nrow(xx), size = nc), ]
debug: TT <- kmeans(xx, start)
Browse[1]> start
  etm5  etm6  elevationslope SI NDVIEVI
28   0.7746975 0.4611835 -0.5566161 1.646738  4.5260250 1.519095  0.2501180
28.1 0.7746975 0.4611835 -0.5566161 1.646738  4.5260250 1.519095  0.2501180
30.1 0.4137596 0.2615745 -0.5367707 1.889310 -0.2040883 0.824643 -0.1526292

In sample function,it seems that sampling without replacement is the 
default. But actually, in the case above it sampled 2 times the same 
row (28).

So, this is still a black box for me.
Even if as it is mentionned in the help page of mda(), "the 'weights' 
argument need never be accessed", do you think it's possible to avoid 
this duplicated sampling?
  Thanks in advance for your ideas,


Amelie Vaniscotte
University of Franche-comté
25000 Besançon

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 08:30:08, Nick Chorley wrote:
> Hi,
> 
> I have a large amount of data that I would like to create a
> histogram of and plot and do things with in R. It is pretty
> much impossible to read the data into R, so I have written a
> program to bin the data and now have a list of counts in each
> bin. Is it possible to somehow import this into R and use
> hist(), so I can, for instance, plot the probability density?
> I have looked at the help page for hist(), but couldn't find
> anything related to this there.
> 
> Regards,
> 
> Nicky Chorley

Presumably you now have (or can readily generate) files on your
system whose contents are (or are equivalent to) something like:

brkpts.dat
0.0
0.5
1.0

9.5
10.0

counts.dat
10
7
38

7
0

where there is one more line in brkpts.dat than in counts.dat

Now simply read both files into R, creating variables 'brkpts',
'counts'

Now create a histogram template (any silly old data will do):

H1 <- hist(c(1,2))

Next, attach your variables to it:

H1$breaks <- brkpts
H1$counts <- counts

and you have your histogram in R. Also, you can use the data
in the variables 'brkpts', 'counts' to feed into any other
procedure which can acept data in this form.

Example (simulating the above in R):

  brkpts<-0.5*(0:20)
  counts<-rpois(20,7.5)
  H1<-hist(c(1,2))
  H1$breaks <- brkpts
  H1$counts <- counts
  plot(H1)

Or, if you want a more "realistic-looking" one, follow on with:

midpts<-(brkpts[1:20]+brkpts[2:21])/2
counts<-rpois(20,100*dnorm(midpts,mean=5,sd=3))
H1$breaks <- brkpts
H1$counts <- counts
plot(H1)


In other words, you've already done R's work for it with your
program which bins the data. All you need to do in R is to get
these results into the right place in R.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 10:38:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 10:15:13, Nick Chorley wrote:
>>[...]
>> Now create a histogram template (any silly old data will do):
>>
>> H1 <- hist(c(1,2))
>>
>> Next, attach your variables to it:
>>
>> H1$breaks <- brkpts
>> H1$counts <- counts
>>
>> and you have your histogram in R. Also, you can use the data
>> in the variables 'brkpts', 'counts' to feed into any other
>> procedure which can acept data in this form.
> 
> 
> This is precisely what I wanted to do, except I didn't realise
> that you could assign to the variables in the histogram object
> like this. 
> Normally when constructing the histogram, I'd use hist(x, prob=T)
> to plot the probability density against x, but obviously if you're
> just assigning values to the variables in the object, you can't do
> that. I tried putting "prob=T" in the call to hist when making the
> dummy object, but that didn't help.

Followup:

The histogram object I constructed in the example from my previous
reply contains the following:

H1
$breaks
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
[12]  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0

$counts
 [1]  3  6  3  7  7  8 19 10 16 12 16 12 13 12 11 13 11  7  3  6

$intensities
[1] 0.992 1.000

$density
[1] 0.992 1.000

$mids
[1] 1.25 1.75

$xname
[1] "c(1, 2)"

$equidist
[1] TRUE

All these things are calculated when hist() is called for
raw data.

The "$breaks" and "$counts" are what I assigned to it with

  H1$breaks <- brkptsH1$counts <- counts

"$intensities", "$density", "$mids", "$xname" and "$equidist"
are what was set up by the initial call

H1 <- hist(c(0,1))

Previously, I used "plot(H1)" to illustrate that the above
assignments put the breakpoints and the counts into the
histogram object. For a more thorough approach, you need to
use plot.histogram() instead.

Here you can, in fact, set the "prob=TRUE" condition in the
plot by setting "freq=FALSE" in t call to plot.histogram().

But then it will look at "$density" for the values to plot.

So you if you want the density plot, you would need to calculate
this for yourself. E.g.

H1$density <- counts/sum(counts)
plot.histogram(H1,freq=FALSE)

And so on ... there are many relevant details in the help pages
?hist and ?plot.histogram

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 11:53:14
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Function for reading in multidimensional tables / data.frames

2007-08-15 Thread Werner Wernersen
Never mind, using scan() and putting it into an array
of the specific dimensions is sufficient for my case.

But it still would be interesting to know if there is
some function to read in more complex data objects. 

Thanks, 
  Werner

> Hi,
> 
> I was wondering if there is already some function
> implemented into R that reads in tables with more
> than
> 2 dimensions. There is probably something neat out
> there...
> 
> Thanks,
>   Werner



  Wissenswertes für Bastler und Hobby Handwerker. BE A BETTER HEIMWERKER! 
www.yahoo.de/clever

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help and Firefox

2007-08-15 Thread Erich Neuwirth
My configuration is  Windows XP, R-2.5.1patched.
My standard browser in Windows is Firefox 2.0.6,
and I am using htmlhelp.
I have problems with starting the browser for displaying help.
help("lm") works as it should when Firefox is already running.
When I do help("lm") and the browser is not yet started,
I get
Error in shell.exec(url) :
'C:\PROGRA~2\R\R-25~1.1\library\stats\html\lm.html' not found
but nevertheless the browser starts and the html file is displayed
some seconds later.

If I try to use help from within a function and the browser is not open,
the browser will not start and therefore help will not be displayed.

Has anybody else experienced the same problem?
Is there a solution?


-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Nick Chorley
On 15/08/07, Ted Harding <[EMAIL PROTECTED]> wrote:
>
> On 15-Aug-07 08:30:08, Nick Chorley wrote:
> > Hi,
> >
> > I have a large amount of data that I would like to create a
> > histogram of and plot and do things with in R. It is pretty
> > much impossible to read the data into R, so I have written a
> > program to bin the data and now have a list of counts in each
> > bin. Is it possible to somehow import this into R and use
> > hist(), so I can, for instance, plot the probability density?
> > I have looked at the help page for hist(), but couldn't find
> > anything related to this there.
> >
> > Regards,
> >
> > Nicky Chorley
>
> Presumably you now have (or can readily generate) files on your
> system whose contents are (or are equivalent to) something like:
>
> brkpts.dat
> 0.0
> 0.5
> 1.0
> 
> 9.5
> 10.0
>
> counts.dat
> 10
> 7
> 38
> 
> 7
> 0
>
> where there is one more line in brkpts.dat than in counts.dat
>
> Now simply read both files into R, creating variables 'brkpts',
> 'counts'
>
> Now create a histogram template (any silly old data will do):
>
> H1 <- hist(c(1,2))
>
> Next, attach your variables to it:
>
> H1$breaks <- brkpts
> H1$counts <- counts
>
> and you have your histogram in R. Also, you can use the data
> in the variables 'brkpts', 'counts' to feed into any other
> procedure which can acept data in this form.


This is precisely what I wanted to do, except I didn't realise that you
could assign to the variables in the histogram object like this.  Normally
when constructing the histogram, I'd use hist(x, prob=T) to plot the
probability density against x, but obviously if you're just assigning values
to the variables in the object, you can't do that. I tried putting "prob=T"
in the call to hist when making the dummy object, but that didn't help.

Example (simulating the above in R):
>
>   brkpts<-0.5*(0:20)
>   counts<-rpois(20,7.5)
>   H1<-hist(c(1,2))
>   H1$breaks <- brkpts
>   H1$counts <- counts
>   plot(H1)
>
> Or, if you want a more "realistic-looking" one, follow on with:
>
> midpts<-(brkpts[1:20]+brkpts[2:21])/2
> counts<-rpois(20,100*dnorm(midpts,mean=5,sd=3))
> H1$breaks <- brkpts
> H1$counts <- counts
> plot(H1)
>
>
> In other words, you've already done R's work for it with your
> program which bins the data. All you need to do in R is to get
> these results into the right place in R.


Yes!  Thanks very much!

NC

Hoping this helps,
> Ted.
>
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 15-Aug-07   Time: 10:38:05
> -- XFMail --
>

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Nick Chorley


On 15/08/07, Vladimir Eremeev <[EMAIL PROTECTED]> wrote:

Nick Chorley-3 wrote:
>
> I have a large amount of data that I would like to create a histogram of
> and
> plot and do things with in R. It is pretty much impossible to read the
> data
> into R, so I have written a program to bin the data and now have a list of
> counts in each bin. Is it possible to somehow import this into R and use
> hist(), so I can, for instance, plot the probability density? I have
> looked
> at the help page for hist(), but couldn't find anything related to this
> there.
>

Hi! And why do you think, its impossible to import the data in R?
It can handle rather large data volumes, especially in Linux. Just study
help("Memory-limits").


My data file is 4.8 GB! 



You can plot something looking like a histogram using barplot() or plot(...
type="h").


The problem with those is that I can't plot the probability density.



You can create the "histogram" class object manually.

For example,

[ import bin counts... probably, it is a table of 2 columns, defining bin
borders and counts.
  let's  store it in ncounts. ]


Yes, that's what I have. 



> hst<-hist(rnorm(nrow(ncounts)),plot=FALSE)
> str(hst)  # actually I called hist(rnorm(100))
List of 7
 $ breaks : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
 $ counts : int [1:9] 3 6 12 9 24 19 14 9 4
 $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
 $ density: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
 $ mids   : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
 $ xname  : chr "rnorm(100)"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"
> hst$breaks <-  [ bsdfgsdghsdghdfgh ]
> hst$counts <-  [ asfd109,mnasdfkjhdsfl ]
> hst$intensities <-


My data isn't normally distributed, so I tried rexp() rather than rnorm(), but 
it's not looking like it should



Studying the hist.default() sources will help you to understand, how every
list element is created.

--
View this message in context: 
http://www.nabble.com/Possible-to-%22import%22-histograms-in-R--tf4271809.html#a12158586
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







--SevinMail--

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Vladimir Eremeev


Nick Chorley-3 wrote:
> 
> I have a large amount of data that I would like to create a histogram of
> and
> plot and do things with in R. It is pretty much impossible to read the
> data
> into R, so I have written a program to bin the data and now have a list of
> counts in each bin. Is it possible to somehow import this into R and use
> hist(), so I can, for instance, plot the probability density? I have
> looked
> at the help page for hist(), but couldn't find anything related to this
> there.
> 

Hi! And why do you think, its impossible to import the data in R?
It can handle rather large data volumes, especially in Linux. Just study
help("Memory-limits").

You can plot something looking like a histogram using barplot() or plot(...
type="h").

You can create the "histogram" class object manually.

For example, 

[ import bin counts... probably, it is a table of 2 columns, defining bin
borders and counts.
  let's  store it in ncounts. ]

> hst<-hist(rnorm(nrow(ncounts)),plot=FALSE)
> str(hst)  # actually I called hist(rnorm(100))
List of 7
 $ breaks : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
 $ counts : int [1:9] 3 6 12 9 24 19 14 9 4
 $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
 $ density: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
 $ mids   : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
 $ xname  : chr "rnorm(100)"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"
> hst$breaks <-  [ bsdfgsdghsdghdfgh ]
> hst$counts <-  [ asfd109,mnasdfkjhdsfl ]
> hst$intensities <- 

Studying the hist.default() sources will help you to understand, how every
list element is created.

-- 
View this message in context: 
http://www.nabble.com/Possible-to-%22import%22-histograms-in-R--tf4271809.html#a12158586
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Possible to "import" histograms in R?

2007-08-15 Thread Nick Chorley
Hi,

I have a large amount of data that I would like to create a histogram of and
plot and do things with in R. It is pretty much impossible to read the data
into R, so I have written a program to bin the data and now have a list of
counts in each bin. Is it possible to somehow import this into R and use
hist(), so I can, for instance, plot the probability density? I have looked
at the help page for hist(), but couldn't find anything related to this
there.

Regards,

Nicky Chorley

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2007-08-15 Thread adrian
"Lawrence D. Brenninkmeyer" <[EMAIL PROTECTED]> writes:

> I am trying to find a way to diffuse GIS data on a European map. I have a
> dataset consisting of particular locations scattered across Europe,
> along with magnitude and value information.

The dataset is a spatial point pattern, where the locations are points,
and the value or magnitude is a `mark' attached to the point.

You can use the package 'spatstat' to smooth the values.

 library(spatstat)
# extract coords
 xy <- geocode[,c("LONG","LAT")]
# make an enclosing rectangle
 Box <- ripras(xy, shape="rectangle")
# create point pattern
 X <- as.ppp(xy, Box) %mark% (geocode$VALUE)
# visual check
 plot(X)
# smooth using density.ppp
 smoothVP <- density(X, weights = X$marks)
 smoothP <- density(X)
 smoothV <- eval.im(smoothVP/smoothV)
# inspect
 plot(smoothV)

The result 'smoothV' is a pixel image (object of class "im") that you can
manipulate, threshold, take histograms, etc

# plot the region where smoothed value exceeds 42
 plot(solutionset(smoothV > 42))

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Function for reading in multidimensional tables / data.frames

2007-08-15 Thread Werner Wernersen
Hi,

I was wondering if there is already some function
implemented into R that reads in tables with more than
2 dimensions. There is probably something neat out
there...

Thanks,
  Werner


  Wissenswertes zum Thema PC, Zubehör oder Programme. BE A BETTER 
INTERNET-GURU!  www.yahoo.de/clever

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to write to a table column by column?

2007-08-15 Thread Moshe Olshansky
Hi Yuchen,

First of all please notice that you may not have more
than 2^8 = 256 columns in Excel, so if you have more
than 256 time-series you can not put them all in one
Excel sheet. I also believe that you can not have more
than 2^16 = 65536 rows.
If you do not have more than 256 time-series, I see
two possibilities.
The first one is to create a list which will hold all
your time-series (each list member holds one
time-series) and then make a double loop: one from 1
to as long as there is more data and if the first
variable is i you create a string by looping through
the list and adding to your string either element i
(and a tab) of time-series j if it's length is >= j or
a tab (and a tab) otherwise. Then you can write each
row to the file (using 'cat'). When you are done
import the resulting file into Excel.
Another possibility is to use xlsReadWritePro which
allows you to write to a specific column of a specific
sheet of an Excel file (and to append this).

Regards,

Moshe.

--- Yuchen Luo <[EMAIL PROTECTED]> wrote:

> Dear friends.
> Every loop of my program will result in a list that
> is very long, with a
> structure similar to the one below:
> 
> Lst <- list(name="Fred", wife="Mary",
> daily.incomes=c(1:850))
> 
> Please notice the large size of "daily.incomes".
> 
> I need to store all such lists in a csv file so that
> I can easily view them
> in Excel. Excel cannot display a row of more than
> 300 elements, therefore, I
> have to store the lists as columns. It is not hard
> to store one list as a
> column in the csv file. The problem is how to store
> the second list as a
> second column, so that the two columns will lie side
> by side to each other
> and I can easily compare their elements. ( If I use
> 'appened=TRUE', the
> second time series will be stored in the same
> column. )
> 
> Thank you for your tine and your help will be highly
> appreciated!!
> 
> Best
> 
> Yuchen Luo
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>



Re: [R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column

2007-08-15 Thread Maciej Hoffman-Wecker

Thank you very much Professor Ripley! Afterwards it seems obvious where to 
look. 

Have a nice day,
Maciej

PS: Yes, my machine does not have much memory, but it is sufficient for the 
smaller trial data. 

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 14 August 2007 19:14
To: Maciej Hoffman-Wecker
Cc: r-help@stat.math.ethz.ch
Subject: Re: Re: [R] Import of Access data via RODBC changes column name ("NO" 
to "Expr1014") and the content of the column

On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote:

> Dear Professor Ripley,
>
> Thank you very much for your response. I sent the problem, as I didn't 
> have any more ideas where to search for the reason. I didn't say this 
> is an R bug, knowing the responses to such mails.-)
>
> But I succeeded in developing a tiny example, that reproduces the bug 
> (wherever it is).

Thank you, that was helpful: much easier to follow than the previous code.

...

>> library(RODBC)
>> .con <- odbcConnectAccess("./test2.mdb")
>> (.d <- try(sqlQuery(.con, "select * from Tab1")))
>  F1 NO F2
> 1  1  1  1
> 2  2  2  2
> 3  0 NA  1
> 4  1  0  0
>> (.d <- try(sqlQuery(.con, "select F1 , NO , F2 from Tab1")))
>  F1 Expr1001 F2
> 1  1        0  1
> 2  2        0  2
> 3  0        0  1
> 4  1        0  0
>> close(.con)
>
> So the problem occurs if the column names are specified within the query.
> Is the query "select F1 , NO , F2 from Tab1" invalid?

I believe so. 'NO' is an SQL92 and ODBC reserved word, at least according to 
http://www.bairdgroup.com/reservedwords.cfm

See also http://support.microsoft.com/default.aspx?scid=kb;en-us;286335
which says

   For existing objects with names that contain reserved words, you can
   avoid errors by surrounding the object name with brackets ([ ]).

and lists 'NO' as a reserved word.  RODBC quotes all column names it uses to be 
sure (and knows about most non-standard quoting mechanisms from the ODBC driver 
in use).  But this was a query you generated and so you need to do the quoting.
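For example (a sketch; `quote_reserved` is a hypothetical helper, not
part of RODBC, and the reserved-word list here is only illustrative):

```r
## Bracket-quote reserved words when building the query by hand.
## RODBC quotes column names only in the queries it generates itself.
quote_reserved <- function(cols, reserved = c("NO", "DATE", "TIME")) {
  ifelse(toupper(cols) %in% reserved, paste0("[", cols, "]"), cols)
}
qry <- paste("select",
             paste(quote_reserved(c("F1", "NO", "F2")), collapse = ", "),
             "from Tab1")
qry                           # "select F1, [NO], F2 from Tab1"
## .d <- sqlQuery(.con, qry)  # the NO column now comes back intact
```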

> Regarding the memory issue, I _knew_ that there must be a reason for 
> running out of memory space. Sorry for not being more specific. My 
> question then is:
>
> Is there a way to 'reset' the environment without quitting R and 
> restarting it?

Sorry, no.  You cannot move objects in memory.

But why '447Mb' is coming up is still unexplained, and suggests that the 
machine has a peculiar amount of memory or some flag has been used.


>
> Thank you for your help.
>
> Kind regards,
> Maciej
>
>
> -----Original Message-----
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 14 August 2007 11:51
> To: Maciej Hoffman-Wecker
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Import of Access data via RODBC changes column name ("NO" to 
> "Expr1014") and the content of the column
>
> On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote:
>
>>
>> Dear all,
>>
>> I have some problems with importing data from an Access data base via
>> RODBC to R. The data base contains several tables, which all are
>> imported consecutively. One table has a column with column name "NO".
>> If I run the code attached on the bottom of the mail I get no
>> complain, but the column name (name of the respective vector of the
>> data.frame) is "Expr1014" instead of "NO". Additionally the original
>> column (type
>> "text") containes "0"s and missings, but the imported column contains
>> "0"s only (type "int"). If I change the column name in the Access data
>> base to "NOx", the import works fine with the right name and the same
>> data.
>>
>> Previously I generated a tiny Access data base which reproduced the
>> problem. To be on the safe side I installed the latest version (2.5.1)
>> and now the example works fine, but within my production process the
>> error still remains. An import into Excel via ODBC works fine.
>>
>> So there is no way to figure out whether this is a bug or a
>> feature.-)
>
> It's most likely an ODBC issue, but you have not provided a reproducible 
> example.
>
>> The second problem I have is that when I rerun "rm(list = ls(all =
>> T)); gc()" and the import several times I get the following error:
>>
>> Error in odbcTables(channel) : Calloc could not allocate (263168 of 1)
>> memory In addition: Warning messages:
>> 1: Reached total allocation of 447Mb: see help(memory.size) in:
>> odbcQuery(channel, query, rows_at_time)
>> 2: Reached total allocation of 447Mb: see help(memory.size) in:
>> odbcQuery(channel, query, rows_at_time)
>> 3: Reached total allocation of 447Mb: see help(memory.size) in:
>> odbcTables(channel)
>> 4: Reached total allocation of 447Mb: see help(memory.size) in:
>> odbcTables(channel)
>>
>> which is surprising to me, as the first two statements should delete
>> all
>
> How do you _know_ what they 'should' do?  That only deletes all objects in 
> the workspace, not all objects in R, and not all memory blocks used by R.
>
> Please do read ?"Memory-limits" 

Re: [R] binomial simulation

2007-08-15 Thread Moshe Olshansky
No wonder that you are getting overflow, since
gamma(N+1) = N! and 200! > (200/e)^200 > 10^370.
There is another way to compute C(N,k); let me
know if you need it and I will explain how it
can be done.
But do you really need to compute the individual
probabilities? Maybe you need something else and
there is no need to compute the individual
probabilities at all?
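For what it's worth, one such way in base R (a sketch using `choose`,
`lchoose`, and `dbinom`; the numbers N = 200, p = 0.6, q = 0.95 are
taken from the example quoted below, k and m are arbitrary):

```r
## Stable alternatives to forming the huge factorials directly.
N <- 200; k <- 100; m <- 90
p <- 0.6; q <- 0.95
choose(N, k)        # C(N,k) without overflow
exp(lchoose(N, k))  # the same, computed on the log scale
## C(N,k)*p^k*(1-p)^(N-k) is exactly dbinom(k, N, p), so the joint
## probability C(N,k)*p^k*(1-p)^(N-k) * C(k,m)*q^m*(1-q)^(k-m) is:
dbinom(k, N, p) * dbinom(m, k, q)
```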

Regards,

Moshe.

--- sigalit mangut-leiba <[EMAIL PROTECTED]> wrote:

> Thank you,
> I'm trying to run the joint probabilty:
> 
> C(N,k)*p^k*(1-p)^(N-k)*C(k,m)*q^m*(1-q)^(k-m)
> 
> and get the error: Error in C(N, k) : object not
> interpretable as a factor
> 
> so I tried the long way:
> 
> gamma(N+1)/(gamma(k+1)*(gamma(N-k)))
> 
> and the same with k, and got the error:
> 
> 1: value out of range in 'gammafn' in: gamma(N + 1)
> 2: value out of range in 'gammafn' in: gamma(N - k)
> 
> 
> Do you know why it's not working?
> 
> Thanks again,
> 
> Sigalit.
> 
> 
> 
> On 8/14/07, Moshe Olshansky <[EMAIL PROTECTED]>
> wrote:
> >
> > As I understand this,
> > P(T+ | D-) = 1 - P(T+ | D+) = 0.05
> > is the probability of not detecting the disease
> > for a person at ICU who has the disease. Correct?
> >
> > What I asked was whether it is possible to
> > mistakenly detect the disease for a person who
> > does not have it?
> >
> > Assuming that this is impossible, the formula is
> > below:
> >
> > If there are N patients, each with probability p
> > of having the disease (p = 0.6 in your case), and
> > q is the probability of detecting the disease for
> > a person who has it (q = 0.95 for ICU and q = 0.8
> > for a regular unit), then
> >
> > P(k have the disease AND m are detected) =
> > P(k have the disease) * P(m are detected | k have
> > the disease) =
> > C(N,k)*p^k*(1-p)^(N-k) * C(k,m)*q^m*(1-q)^(k-m)
> > where C(a,b) is the binomial coefficient "a choose
> > b" - the number of ways to choose b items out of a
> > (when the order does not matter). You must of
> > course assume that N >= k >= m >= 0 (otherwise the
> > probability is 0).
> >
> > To generate such a pair (k infected and m
> > detected) you can do the following:
> >
> > k <- rbinom(1, N, p)
> > m <- rbinom(1, k, q)
> >
> > Regards,
> >
> > Moshe.
> >
> > --- sigalit mangut-leiba <[EMAIL PROTECTED]>
> wrote:
> >
> > > Hi,
> > > The probability of false detection is: P(T+ |
> > > D-)=1-P(T+ | D+)=0.05.
> > > and I want to find the joint probability
> > > P(T+,D+)=P(T+|D+)*P(D+)
> > > Thank you for your reply,
> > > Sigalit.
> > >
> > >
> > > On 8/13/07, Moshe Olshansky
> <[EMAIL PROTECTED]>
> > > wrote:
> > > >
> > > > Hi Sigalit,
> > > >
> > > > Do you want to find the probability P(T+ = t
> AND
> > > D+ =
> > > > d) for all the combinations of t and d (for
> ICU
> > > and
> > > > Reg.)?
> > > > Is the probability of false detection (when
> there
> > > is
> > > > no disease) always 0?
> > > >
> > > > Regards,
> > > >
> > > > Moshe.
> > > >
> > > > --- sigalit mangut-leiba <[EMAIL PROTECTED]>
> > > wrote:
> > > >
> > > > > hello,
> > > > > I asked about this simulation a few days
> ago,
> > > but
> > > > > still I can't get what I
> > > > > need.
> > > > > I have 2 units: icu and regular. from icu I
> want
> > > to
> > > > > take 200 observations
> > > > > from binomial distribution, when probability
> for
> > > > > disease is: p=0.6.
> > > > > from regular I want to take 300 observations
> with
> > > the
> > > > > same probability: p=0.6
> > > > > .
> > > > > the distribution to detect disease when
> disease
> > > > > occurred- *for someone from
> > > > > icu* - is: p(T+ | D+)=0.95.
> > > > > the distribution to detect disease when
> disease
> > > > > occurred- *for someone from
> > > > > reg.unit* - is: p(T+ | D+)=0.8.
> > > > > I want to compute the joint distribution for
> > > each
> > > > > unit: p(T+,D+) for icu,
> > > > > and the same for reg.
> > > > > I tried:
> > > > >
> > > > > pdeti <- 0
> > > > >
> > > > > pdetr <- 0
> > > > >
> > > > > picu <- pdeti*.6
> > > > >
> > > > > preg <- pdetr*.6
> > > > >
> > > > > dept <- c("icu","reg")
> > > > >
> > > > > icu <- rbinom(200, 1, .6)
> > > > >
> > > > > reg <- rbinom(300, 1, .6)
> > > > >
> > > > > for(i in 1:300) {
> > > > >
> > > > > if(dept=="icu") pdeti==0.95
> > > > >
> > > > > if (dept=="reg") pdetr== 0.80
> > > > >
> > > > > }
> > > > >
> > > > > print(picu)
> > > > >
> > > > > print(preg)
> > > > >
> > > > > and got 50 warnings:
> > > > >
> > > > > the condition has length > 1 and only the
> first
> > > > > element will be used in: if
> > > > > (dept == "icu") pdeti == 0.95
> > > > > the condition has length > 1 and only the
> first
> > > > > element will be used in: if
> > > > > (dept == "reg") pdetr == 0.8
> > > > >
> > > > > I would appreciate any suggestions,
> > > > >
> > > > > thank you,
> > > > >
> > > > > Sigalit.
> > > > >
> > > > >   [[alternative HTML version deleted]]
> > > > >
> > > > >
> __
> > > > > R-help@stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch