Re: [R] two methods for regression, two different results

2005-04-05 Thread Jari Oksanen
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
> Please forgive a straight stats question, and the informal notation.
>  
> let us say we wish to perform a linear regression:
> y=b0 + b1*x + b2*z
>  
> There are two ways this can be done, the usual way, as a single
> regression, 
> fit1<-lm(y~x+z)
> or by doing two regressions. In the first regression we could have y as
> the dependent variable and x as the independent variable 
> fit2<-lm(y~x). 
> The second regression would be a regression in which the residuals from
> the first regression would be the dependent variable, and the
> independent variable would be z.
> fit2<-lm(fit2$residuals~z)
>  
> I would think the two methods would give the same p value and the same
> beta coefficient for z. They don't. Can someone help me understand why
> the two methods do not give the same results? Additionally, could
> someone tell me when one method might be better than the other, i.e.
> what question does the first method answer, and what question does the
> second method answer? I have searched a number of textbooks and have not
> found this question addressed.
>  
John,

Bill Venables already told you that they don't do that, because x and z
are not orthogonal. Here is a simpler way of getting the same result as
he suggested for the coefficient of z (but only for z):

> x <- runif(100)
> z <- x + rnorm(100, sd=0.4)
> y <- 3 + x + z + rnorm(100, sd=0.3)
> mod <- lm(y ~ x + z)
> mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
> summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

> summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.15731    0.06070  -2.592   0.0110 *  
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, even
though you already removed the effect of x... In the orthogonal case the
coefficient for x would be 0.

Residuals are equal in these two models:

> range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, the fitted values are not equal, since you fit mod2 to
the residuals after removing the effect of x...

cheers, jari oksanen
-- 
Jari Oksanen <[EMAIL PROTECTED]>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] French Curve

2005-04-05 Thread Martin Maechler
> "Michael" == Michael A Miller <[EMAIL PROTECTED]>
> on Tue, 05 Apr 2005 10:28:21 -0500 writes:

> "dream" == dream home <[EMAIL PROTECTED]> writes:
>> Does it sound like spline work would do the job? It would be
>> hard to persuade him to use some modern math technique
>> but he did ask me to help him implement the French Curve
>> so he can do his work in Excel, rather than PAPER.

Michael> Splines are useful for interpolating points with a
Michael> continuous curve that passes through, or near, the
Michael> points.

not only!  (see below)

Michael> If you are looking for a way to estimate a
Michael> curve with a noise component removed, I think you'd
Michael> be better off filtering your data, rather than
Michael> interpolating with a spline.  

yes  for  "rather than interpolating"
no   for  "with a spline"

There's the  smooth.spline()   *smoothing* spline function (with
predict() methods, even for 1st and 2nd derivatives) which is
liked by `many' and even preferred to other ``filters'' for
diverse reasons, notably for the fact that spline smoothing
corresponds to linear "filtering" with a curvature-adaptive
varying bandwidth.

Michael> Median (or mean) filtering may give results
Michael> similar to those from your chemist's manual method.
Michael> That is easy to do with running() from the gtools
Michael> package.  The validity of this is another question!

Median filtering aka "running medians" has one distinctive
advantage {over smooth.spline() or other so-called linear smoothers}:
   It is "robust", i.e. not distorted by gross outliers.
Running medians is implemented in runmed() {standard "stats" package}
in a particularly optimized way rather than using the more general
running(.) approach of package 'gtools'.
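
To make the contrast concrete -- a minimal sketch on made-up noisy data
(variable names are illustrative only):

x <- seq(0, 10, length = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
y[25] <- 5                     # one gross outlier
sm <- smooth.spline(x, y)      # linear smoother, curvature-adaptive bandwidth
ym <- runmed(y, k = 11)        # running medians: robust
plot(x, y)
lines(predict(sm), col = 2)    # pulled toward the outlier
lines(x, ym, col = 4)          # essentially unaffected by it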

Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RE: ïdo0ïi4grjj40j09gjijgpïdï

2005-04-05 Thread domino . admin
Privitak je zaražen virusom. Kontaktirajte sistem administratora.
An attachment is infected by a virus. Contact the system administrator.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] Version 0.93 of GAM package on CRAN

2005-04-05 Thread Trevor Hastie
I have posted an update to the GAM package. Note that this package
implements gam() as described in the "White" S book (Statistical Models
in S). In particular, you can fit models with lo() terms (local
regression) and/or s() terms (smoothing splines), mixed in, of course,
with any terms appropriate for GLMs.
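
For instance -- a minimal sketch, assuming a hypothetical data frame
'dat' with response y, continuous predictors x1 and x2, and a factor f:

library(gam)
fit <- gam(y ~ lo(x1, span = 0.5) + s(x2, df = 4) + f,
           family = gaussian, data = dat)
summary(fit)
plot(fit, se = TRUE)   # one panel per term, with standard error bands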

A number of bugs in version 0.92 have been fixed; notably

1) some problems with predict and newdata
2) plot.gam now works with any model for which predict( ..., 
type="terms") is appropriate
   (well, at least several). Examples are lm, glm, gam and coxph models.

  So for example, if you have fit a Cox model

cox1 <- coxph( Surv(Survival, death) ~ Grade + ns(Age,4) + ns(Size,4))

  Then plot.gam(cox1, se=T) will produce three plots, one for each term 
in the model, with standard error bands.

3) I have implemented the fast versions of backfitting for models 
consisting of all local regression
 terms (lo.wam) or all smoothing spline terms (s.wam).

Please let me know of any problems with the gam package.

Trevor Hastie

---
  Trevor Hastie   [EMAIL PROTECTED]
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231 (Statistics)  Fax: (650) 725-8977 
  (650) 498-5233 (Biostatistics)   Fax: (650) 725-6951
  URL: http://www-stat.stanford.edu/~hastie 
    address: room 104, Department of Statistics, Sequoia Hall
   390 Serra Mall, Stanford University, CA 94305-4065 
  

[[alternative text/enriched version deleted]]

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] two methods for regression, two different results

2005-04-05 Thread Bill.Venables
This is possible if x and z are orthogonal, but in general it doesn't
work, as you have noted.  (If it did work it would almost amount to a way
of inverting general square matrices by working one row at a time, no
going back...)

It is possible to fit a bivariate regression using simple linear
regression techniques iteratively like this, but it is a bit more
involved than your two-step process.

1. regress y on x and take the residuals: ryx <- resid(lm(y ~ x))

2. regress z on x and take the residuals: rzx <- resid(lm(z ~ x))

3. regress ryx on rzx: fitz <- lm(ryx ~ rzx)

4. this gives you the estimate of the coefficient on z (what you call
below b2):
b2 <- coef(fitz)[2]

5. regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.
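
Put together as a runnable sketch (simulated data for illustration; the
two-step estimates agree with those from the direct fit):

set.seed(1)
x <- runif(100); z <- x + rnorm(100); y <- 1 + 2*x + 3*z + rnorm(100)
ryx  <- resid(lm(y ~ x))             # step 1
rzx  <- resid(lm(z ~ x))             # step 2
fitz <- lm(ryx ~ rzx)                # step 3
b2   <- coef(fitz)[2]                # step 4
fitx <- lm(I(y - b2*z) ~ x)          # step 5
rbind(two.step = c(coef(fitx), b2),
      direct   = coef(lm(y ~ x + z)))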

None of this works with significances, though, because in going about it
this way you have essentially disguised the degrees of freedom involved.
So you can get the right estimates, but the standard errors,
t-statistics and residual variances are all somewhat inaccurate (though
usually not by much).

If x and z are orthogonal the (curious looking) step 2 is not needed.

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm)
for fitting very large regressions essentially by iterative processes to
avoid constructing a huge model matrix.

Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help@stat.math.ethz.ch
Subject: [R] two methods for regression, two different results


Please forgive a straight stats question, and the informal notation.
 
let us say we wish to perform a linear regression:
y=b0 + b1*x + b2*z
 
There are two ways this can be done, the usual way, as a single
regression, 
fit1<-lm(y~x+z)
or by doing two regressions. In the first regression we could have y as
the dependent variable and x as the independent variable 
fit2<-lm(y~x). 
The second regression would be a regression in which the residuals from
the first regression would be the dependent variable, and the
independent variable would be z.
fit2<-lm(fit2$residuals~z)
 
I would think the two methods would give the same p value and the same
beta coefficient for z. They don't. Can someone help me understand why
the two methods do not give the same results? Additionally, could
someone tell me when one method might be better than the other, i.e.
what question does the first method answer, and what question does the
second method answer? I have searched a number of textbooks and have not
found this question addressed.
 
Thanks,
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
-- NOTE NEW EMAIL ADDRESS:
[EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] two methods for regression, two different results

2005-04-05 Thread John Sorkin
Please forgive a straight stats question, and the informal notation.
 
let us say we wish to perform a linear regression:
y=b0 + b1*x + b2*z
 
There are two ways this can be done, the usual way, as a single
regression, 
fit1<-lm(y~x+z)
or by doing two regressions. In the first regression we could have y as
the dependent variable and x as the independent variable 
fit2<-lm(y~x). 
The second regression would be a regression in which the residuals from
the first regression would be the dependent variable, and the
independent variable would be z.
fit2<-lm(fit2$residuals~z)
 
I would think the two methods would give the same p value and the same
beta coefficient for z. They don't. Can someone help me understand why
the two methods do not give the same results? Additionally, could
someone tell me when one method might be better than the other, i.e.
what question does the first method answer, and what question does the
second method answer? I have searched a number of textbooks and have not
found this question addressed.
 
Thanks,
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
- NOTE NEW EMAIL ADDRESS:
[EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] nonlinear equation system

2005-04-05 Thread Jin Shusong
Dear all,

  Are there any functions which can solve a system of
nonlinear equations?  My system is R 2.0.1 + Debian.

Thank you in advance.
-- 


   Yours  Sincerely

   Shusong Jin


===
Add: Meng Wah Complex, RM518 Email: [EMAIL PROTECTED]
 Dept. of Statistics Tel:   (+852)28597942
  and Actuarial Science  fax:   (+852)28589041
 The Univ. of Hong Kong
 Pokfulam Road, Hong Kong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to do aggregate operations with non-scalar functions

2005-04-05 Thread Gabor Grothendieck
On Apr 5, 2005 6:59 PM, Itay Furman <[EMAIL PROTECTED]> wrote:
> 
> Hi,
> 
> I have a data set, the structure of which is something like this:
> 
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
> 
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this.  Aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
> 
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x  x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
> 
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
> 
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25% 
> -0.7727268  0.1693188
> --
> a: b
> x: x
> NULL
> --
> 
> [snip]
> 
> I would like to end up with a data frame like this:
> 
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
> 
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
> 
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
> 


One can use 

do.call("rbind", by(df, list(a = a, x = x), f))

where f is the appropriate function. 

In this case f can be defined in terms of df.quantile, which 
is like quantile except that it returns a one-row data frame:

df.quantile <- function(x, p) 
as.data.frame(t(data.matrix(quantile(x, p))))

f <- function(df, p = c(0.25, 0.5))
cbind(df[1,1:2], df.quantile(df[,"r"], p))
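
For example, with the data from the original post, the whole pipeline
is then (a sketch; df, a and x as defined above):

a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))
do.call("rbind", by(df, list(a = a, x = x), f))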

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] nlme & SASmixed in 2.0.1

2005-04-05 Thread Deepayan Sarkar
On Tuesday 05 April 2005 18:40, Murray Jorgensen wrote:
> I assigned a class the first problem in Pinheiro & Bates, which uses
> the data set PBIB from the SASmixed package. I have recently
> downloaded 2.0.1 and its associated packages. On trying
>
> library(SASmixed)
> data(PBIB)
> library(nlme)
> plot(PBIB)
>
> I get a warning message
> Warning message:
> replacing previous import: coef in: namespaceImportFrom(self,
> asNamespace(ns))
>
> after library(nlme) and a "pairs" type plot of PBIB.

SASmixed now depends on lme4, which conflicted (until recently) with 
nlme. If you had your calls to library(SASmixed) and library(nlme) 
reversed, you probably would have gotten a warning.

I think the simplest thing you can do is not to load SASmixed at all. 
Instead, use 

data(PBIB, package = "SASmixed")

However, these datasets are (for all practical purposes) vanilla data 
frames, and you won't get trellis-style plots unless you create nlme 
groupedData objects yourself. (You should get them if you load the 
latticeExtra package manually, and then use 'gplot' instead of 'plot' 
to plot them.)
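
For example, to build the grouped-data object yourself (a sketch, using
the PBIB variables from your lme call below):

library(nlme)
data(PBIB, package = "SASmixed")
PBIB.g <- groupedData(response ~ Treatment | Block, data = PBIB)
plot(PBIB.g)   # trellis-style plot, as under 1.8.1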

Deepayan

> Pressing on I get:
>  > lme1 <- lme(response ~ Treatment, data=PBIB, random =~1| Block)
>  > summary(lme1)
>
> Error: No slot of name "rep" for this object of class "lme"
> Error in deviance([EMAIL PROTECTED], REML = REML) :
>  Unable to find the argument "object" in selecting a method
> for function "deviance"
>
>
> Everything works fine under 1.8.1 and plot(PBIB) is of trellis style,
> which is what I think the authors intend.
>
> Cheers,   Murray Jorgensen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to do aggregate operations with non-scalar functions

2005-04-05 Thread Rich FitzJohn
Hi Itay,

Not sure if by() can do it directly, but this does it from first
principles, using lapply() and tapply() (which aggregate uses
internally).  It would be reasonably straightforward to wrap this up
in a function.

a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))
## Probabilities for quantile
p <- c(.25, .5, .75)

## This does the hard work of calculating the statistics over your
## combinations, and over the values in `p'
y <- lapply(p, function(y)
tapply(df$r, list(a=a, x=x), quantile, probs=y))

## Then, we need to work out what combinations of a & x are possible:
## these are the header columns.  aggregate() does this in a much more
## complicated way, which may handle more difficult cases than this
## (e.g. if there are lots of missing values points, or something).
vars <- expand.grid(dimnames(y[[1]]))

## Finish up by converting `y' into a true data.frame, and omitting
## all the cases where all the values in `y' are NA: these are
## combinations of a and x that we did not encounter.
y <- as.data.frame(lapply(y, as.vector))
names(y) <- paste(100 * p, "%", sep="")  # e.g. "25%", "50%", "75%"
i <- colSums(apply(y, 1, is.na)) != ncol(y)
y <- cbind(vars, y)[i,]

Cheers,
Rich

On Apr 6, 2005 10:59 AM, Itay Furman <[EMAIL PROTECTED]> wrote:
> 
> Hi,
> 
> I have a data set, the structure of which is something like this:
> 
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
> 
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this.  Aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
> 
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x  x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
> 
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
> 
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25%
> -0.7727268  0.1693188
> --
> a: b
> x: x
> NULL
> --
> 
> [snip]
> 
> I would like to end up with a data frame like this:
> 
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
> 
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
> 
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
> 
>Thank you very much in advance,
>Itay
> 
> 
> [EMAIL PROTECTED]  /  +1 (206) 543 9040  /  U of Washington
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 


-- 
Rich FitzJohn
rich.fitzjohn  gmail.com   |http://homepages.paradise.net.nz/richa183
  You are in a maze of twisty little functions, all alike

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Regression Modeling Strategies Workshop by Frank Harrell in Southern California

2005-04-05 Thread Madeline Bauer
Dr. Frank E. Harrell, Jr., Professor and Chair of the Department of 
Biostatistics at Vanderbilt University is giving a one-day workshop on 
Regression Modeling Strategies on Friday, April 29, 2005.  Analyses of the 
example datasets use R/S-Plus and make extensive use of the Hmisc library 
written by Professor Harrell.  The workshop is sponsored by the Southern 
California Chapter of the American Statistical Association (SCASA) and will 
be held on the campus of California State University, Long Beach, about 30 
miles south of Los Angeles.  The workshop is open to anyone with interest 
in the topic, and SCASA is charging $60 if registering by April 18th and 
$65 after the 18th.  The registration fee covers all handouts, coffee and 
tea all day, a buffet lunch, and door prizes.  Springer-Verlag will have 
copies of the book Regression Modeling Strategies on sale at a 33% discount.

The workshop's web site is:
http://www.stat.ucla.edu/~rgould/asw2005/. For further information contact 
Anita Iannucci ([EMAIL PROTECTED]), Harold Dyck ([EMAIL PROTECTED]) or 
Madeline Bauer ([EMAIL PROTECTED]).

SCASA web site:  http://www.sc-asa.org/
===
Madeline Bauer, Ph.D.  University of Southern California
Keck School of Medicine (Infectious Diseases)
IRD Room 620 (MC9520), 2020 Zonal Ave, Los Angeles 90033
(323) 226-2775 [Voice & FAX]
[EMAIL PROTECTED]  [Fastest communication method]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] nlme & SASmixed in 2.0.1

2005-04-05 Thread Murray Jorgensen
I assigned a class the first problem in Pinheiro & Bates, which uses the 
data set PBIB from the SASmixed package. I have recently downloaded 
2.0.1 and its associated packages. On trying

library(SASmixed)
data(PBIB)
library(nlme)
plot(PBIB)

I get a warning message
Warning message:
replacing previous import: coef in: namespaceImportFrom(self, 
asNamespace(ns))

after library(nlme) and a "pairs" type plot of PBIB.
Pressing on I get:
> lme1 <- lme(response ~ Treatment, data=PBIB, random =~1| Block)
> summary(lme1)
Error: No slot of name "rep" for this object of class "lme"
Error in deviance([EMAIL PROTECTED], REML = REML) :
Unable to find the argument "object" in selecting a method for 
function "deviance"
>

Everything works fine under 1.8.1 and plot(PBIB) is of trellis style, 
which is what I think the authors intend.

Cheers,   Murray Jorgensen
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] How to do aggregate operations with non-scalar functions

2005-04-05 Thread Itay Furman
Hi,
I have a data set, the structure of which is something like this:
a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))
The true data set has >1 million rows. The factors "a" and "x"
have about 70 levels each; combined together they subset 'df'
into ~900 data frames.
For each such subset I'd like to compute various statistics
including quantiles, but I can't find an efficient way of
doing this.  Aggregate() gives me the desired structure - 
namely, one row per subset - but I can use it only to compute
a single quantile.

aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
  a x  x
1 a x  0.1693188
2 a y  0.1566322
3 b y -0.2677410
4 b z -0.6505710
With by() I could compute several quantiles per subset at
each shot, but the structure of the output is not
convenient for further analysis and visualization.
by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
a: a
x: x
        0%        25% 
-0.7727268  0.1693188 
-- 
a: b
x: x
NULL
--

[snip]
I would like to end up with a data frame like this:
  a x         0%        25% 
1 a x -0.7727268  0.1693188 
2 a y -0.3410671  0.1566322 
3 b y -0.2914710 -0.2677410 
4 b z -0.8502875 -0.6505710

I checked sweep() and apply() and didn't see how to harness
them for that purpose.
So, is there a simple way to convert the object returned
by by() into a data.frame?
Or, is there a better way to go with this?
Finally, if I should roll my own coercion function: any tips?
Thank you very much in advance,
Itay

[EMAIL PROTECTED]  /  +1 (206) 543 9040  /  U of Washington
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] summing columns using partial labels

2005-04-05 Thread Rich FitzJohn
Gidday,

Perhaps try something along these lines:

## Establish which 4-letter group each row belongs to
prefix <- substr(names(d), 1, 4)
gp <- match(prefix, unique(prefix))
gp[regexpr("\\.total$", names(d)) > -1] <- NA # Exclude `*.total' columns

## Sum up each of the groups
d.sums <- lapply(split(seq(along=d), gp), function(x) rowSums(d[x]))
names(d.sums) <- paste(unique(prefix), "sum", sep=".")

## Append to the end of the original data.frame
d.new <- cbind(d, d.sums)
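
## For example, on a toy data frame in the style described (hypothetical
## values), the lines above add tosk.sum and hysa.sum columns, and
## tosk.sum matches the tosk.total check column:
d <- data.frame(tosk.fai = 1:3, tosk.isd = 4:6, tosk.total = c(5, 7, 9),
                hysa.fai = 7:9, hysa.isd = 10:12)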

Cheers,
Rich

On Apr 6, 2005 6:05 AM, T Petersen <[EMAIL PROTECTED]> wrote:
> I have a dataset of the form
> 
> Year  tosk.fai   tosk.isd   tosk.gr  ...  tosk.total   hysa.fai
> hysa.isd ...
> 
> and so on. I want to sum all the columns using the first four letters in
> the column labels (e.g. 'tosk', 'hysa', etc.). How can you do that? Also,
> the sums should be without the '.total' column (e.g. 'tosk.total') as
> this serves as a check that everything was done right.
> 
> Kind regards
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 


-- 
Rich FitzJohn
rich.fitzjohn  gmail.com   |http://homepages.paradise.net.nz/richa183
  You are in a maze of twisty little functions, all alike

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R can not show plots (in Mac OS X terminal)

2005-04-05 Thread Tony Han Bao
On 5 Apr 2005, at 9:39 pm, Minyu Chen wrote:
Sorry for bothering again, but it doesn't work yet. Now it shows "x11" 
when I type getOption("device"), but when I do the plot, the terminal 
just simply told me x11 is not available.

This is why I asked you whether you had X11 before compiling R. It's 
no surprise that it doesn't work. I think now the best and cleanest way 
is to recompile R with the correct configuration so that it sets X11 
as the default device.

Alternatively, you may try options(device = 'quartz') to try to make use 
of quartz (Aqua), but I fear it won't work either.

Thanks,
Minyu Chen
On 5 Apr 2005, at 21:30, Tony Han Bao wrote:
On 5 Apr 2005, at 8:45 pm, Minyu Chen wrote:
No, the only output is postscript. As I just installed X11, I did not 
have it before compiling R.
You can try to set the device to x11 by issuing the following command,
options(device = 'x11')
and hope now it works.
What to do now except for getting and compiling R Aqua?
Getting and compiling R Aqua is quite easy and you should do so for 
OS X to make use of its quartz and easy pdf related features.

All the best,
Thanks,
Minyu Chen
On 5 Apr 2005, at 20:23, Tony Han Bao wrote:
On 5 Apr 2005, at 19:12, Minyu Chen wrote:
Dear all:
I am a newbie on the Mac. I just installed R and found that R did not 
react to my plot command (I use the command line in the terminal). It 
did not give me any error message, either. All it did was give a new 
command prompt--no reaction to the plot command. I suppose whenever I 
give a plot command, it will invoke AquaTerm for a small graph, as I 
experience in Octave. What can I do about it?

issue command
>getOption("device")
to check the output device. The default I have on OS X is X11; did 
you have it installed before compiling R?

It could also be the case that you issued command such as 
postscript() before plot(...) but forgot to issue dev.off().

Many thanks,
Minyu Chen
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

Tony Han Bao
[EMAIL PROTECTED]




Tony Han Bao
[EMAIL PROTECTED]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] negative binomial glmm's

2005-04-05 Thread dave fournier
Hi,
I guess by now you realize that we have been trying to
promote our NBGLMM in order to show people some of the
capabilities of our random effects software ADMB-RE.
If you want, I can help you to analyze your data with
the package we talk about on the R list. All I ask is that,
if it works for you, you write a brief note to the list
telling how it helped you.
 Cheers,
  dave
   Dave Fournier
   Otter Research Ltd
   PO Box 2040, Sidney B.C.
   Canada  V8L 3S3
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R can not show plots (in Mac OS X terminal)

2005-04-05 Thread Tony Han Bao
On 5 Apr 2005, at 8:45 pm, Minyu Chen wrote:
No, the only output is postscript. As I just installed X11, I did not 
have it before compiling R.
You can try to set the device to x11 by issuing the following command,
options(device = 'x11')
and hope now it works.
What to do now except for getting and compiling R Aqua?
Getting and compiling R Aqua is quite easy and you should do so for OS 
X to make use of its quartz and easy pdf related features.

All the best,
Thanks,
Minyu Chen
On 5 Apr 2005, at 20:23, Tony Han Bao wrote:
On 5 Apr 2005, at 19:12, Minyu Chen wrote:
Dear all:
I am a newbie on the Mac. I just installed R and found that R did not 
react to my plot command (I use the command line in the terminal). It 
did not give me any error message, either. All it did was give a new 
command prompt--no reaction to the plot command. I suppose whenever I 
give a plot command, it will invoke AquaTerm for a small graph, as I 
experience in Octave. What can I do about it?

issue command
>getOption("device")
to check the output device. The default I have on OS X is X11; did you 
have it installed before compiling R?

It could also be the case that you issued command such as 
postscript() before plot(...) but forgot to issue dev.off().

Many thanks,
Minyu Chen
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

Tony Han Bao
[EMAIL PROTECTED]


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] lists: removing elements, iterating over elements,

2005-04-05 Thread Gabor Grothendieck
On Apr 5, 2005 1:36 PM, Paul Johnson <[EMAIL PROTECTED]> wrote:
> I'm writing R code to calculate Hierarchical Social Entropy, a diversity
> index that Tucker Balch proposed.  One article on this was published in
> Autonomous Robots in 2000. You can find that and others through his web
> page at Georgia Tech.
> 
> http://www.cc.gatech.edu/~tucker/index2.html
> 
> While I work on this, I realize (again) that I'm a C programmer
> masquerading in R, and it's really tricky working with R lists.  Here are
> things that surprise me, I wonder what your experience/advice is.
> 
> I need to calculate overlapping U-diametric clusters of a given radius.
>   (Again, I apologize this looks so much like C.)
> 
> ## Returns a list of all U-diametric clusters of a given radius
> ## Give an R distance matrix
> ## Clusters may overlap.  Clusters may be identical (redundant)
> getUDClusters <-function(distmat,radius){
>   mem <- list()
> 
>   nItems <- dim(distmat)[1]
>   for ( i in 1:nItems ){
> mem[[i]] <- c(i)
>   }
> 
>   for ( m in 1:nItems ){
> for ( n in 1:nItems ){
>   if (m != n & (distmat[m,n] <= radius)){
>##item is within radius, so add to collection m
> mem[[m]] <- sort(c( mem[[m]],n))
>   }
> }
>   }
> 
>   return(mem)
> }
> 
> That generates the list, like this:
> 
> [[1]]
> [1]  1  3  4  5  6  7  8  9 10
> 
> [[2]]
> [1]  2  3  4 10
> 
> [[3]]
> [1]  1  2  3  4  5  6  7  8 10
> 
> [[4]]
> [1]  1  2  3  4 10
> 
> [[5]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[6]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[7]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[8]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[9]]
> [1]  1  5  6  7  8  9 10
> 
> [[10]]
>  [1]  1  2  3  4  5  6  7  8  9 10
> 
> The next task is to eliminate the redundant elements.  unique() does not
> apply to lists, so I have to scan one by one.
> 
>   cluslist <- getUDClusters(distmat,radius)
> 
>   ##find redundant (same) clusters
>   redundantCluster <- c()
>   for (m in 1:(length(cluslist)-1)) {
> for ( n in (m+1): length(cluslist) ){
>   if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
> if ( sum(cluslist[[m]] == cluslist[[n]]) ){
>   redundantCluster <- c( redundantCluster,n)
> }
>   }
> }
>   }
> 
>   ##make sure they are sorted in reverse order
>   if (length(redundantCluster)>0)
> {
>   redundantCluster <- unique(sort(redundantCluster, decreasing=T))
> 
>   ## remove redundant clusters (must do in reverse order to preserve
> index of cluslist)
>   for (i in redundantCluster) cluslist[[i]] <- NULL
> }
> 
> Question: am I deleting the list elements properly?
> 
> I do not find explicit documentation for R on how to remove elements
> from lists, but trial and error tells me
> 
> myList[[5]] <- NULL
> 
> will remove the 5th element and then "close up" the hole caused by
> deletion of that element.  That shuffles the index values, so I have to
> be careful in dropping elements. I must work from the back of the list
> to the front.
> 
> Is there an easier or faster way to remove the redundant clusters?
> 
> Now, the next question.  After eliminating the redundant sets from the
> list, I need to calculate the total number of items present in the whole
> list, figure how many are in each subset--each list item--and do some
> calculations.
> 
> I expected this would iterate over the members of the list--one step for
> each subcollection
> 
> for (i in cluslist){
> 
> }
> 
> but it does not.  It iterates over the items within the subsets of the
> list "cluslist."  I mean, if cluslist has 5 sets, each with 10 elements,
> this for loop takes 50 steps, one for each individual item.
> 
> I find this does what I want
> 
> for (i in 1:length(cluslist))
> 
> But I found out the hard way :)
> 
> Oh, one more quirk that fooled me.  Why does unique() applied to a
> distance matrix throw away the 0's?  I think that's really bad!
> 
> > x <- rnorm(5)
> > myDist <- dist(x,diag=T,upper=T)
> > myDist
>           1         2         3         4         5
> 1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
> 2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
> 3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
> 4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
> 5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
> > unique(myDist)
>  [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
>  [8] 0.9989292 1.1163793 2.1153085
> >
> 
> --

If L is our list of vectors then the following gets the unique
elements of L.  

I have assumed that the individual vectors are sorted 
(sort them first if not via lapply(L, sort)) and that each element 
has a unique name (give it one if not, e.g. names(L) <- seq(L)).

The first line binds them together into rows.   This will
recycle to make them the same length and give you a warning 
but that's ok since you only need to know if they are the same or 
not.  Now, unique applied to a matrix finds the unique rows, and in the
third line one maps the unique rows back to the corresponding list
elements.
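
A minimal sketch of the approach just described, under the same
assumptions (sorted vectors, unique names):

m <- do.call("rbind", L)   # 1: bind into rows (recycles shorter vectors, warns)
u <- unique(m)             # 2: unique rows = unique clusters
L[!duplicated(m)]          # 3: the corresponding original list elements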

Re: [R] exclusion rules for propensity score matching (pattern rec)

2005-04-05 Thread Frank E Harrell Jr
Alexis J. Diamond wrote:
hi,
thanks for the reply to my query about exclusion rules for propensity
score matching.

Exclusion can be based on the non-overlap regions from the propensity.
It should not be done in the individual covariate space.

i want a rule inspired by non-overlap in propensity score space, but that
binds in the space of the Xs.  because i don't really know how to
interpret the fact that i've excluded, say, people with scores > .87,
but i DO know what it means to say that i've excluded people from
country XYZ over age Q because i can't find good matches for them. if i
make my rule based on Xs, i know who i can and cannot make inference for,
and i can explain to other people who are the units that i can and cannot
make inference for.
after posting to the list last night, i thought of using the RGENOUD
package (genetic algorithm) to search over the space of exclusion rules
(eg., var 1 = 1, var 2 = 0 var 3 = 1 or 0, var 4 = 0); the loss function
associated with a rule should be increasing in # of tr units w/out support
excluded and decreasing in # of tr units w/ support excluded.
it might be tricky to get the right loss function, and i know this idea is
kind of nutty, but it's the only automated search method i could think of.
any comments?
alexis
Using the X space directly will not result in optimum exclusions unless 
you use a distance function, but that will make assumptions.  My advice 
is to use rpart to make a classification rule that approximates the 
exclusion criteria to some desired degree of accuracy.  I.e., use rpart 
to predict propensity < lower cutoff and separately to predict 
propensity > upper cutoff.  This just assists in interpretation.
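
A minimal sketch of that rule (hypothetical objects: X is the covariate
data frame, ps the vector of estimated propensity scores, and the
cutoffs are illustrative):

library(rpart)
lo <- 0.05; hi <- 0.87
rule.lo <- rpart(factor(ps < lo) ~ ., data = X, method = "class")
rule.hi <- rpart(factor(ps > hi) ~ ., data = X, method = "class")
print(rule.lo)   # the splits read as an exclusion rule in terms of the Xs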

Frank


I tend to look
at the 10th smallest and largest values of propensity for each of the
two treatment groups for making the decision.  You will need to exclude
non-overlap regions whether you use matching or covariate adjustment of
propensity but covariate adjustment (using e.g. regression splines in
the logit of propensity) is often a better approach once you've been
careful about non-overlap.
Frank Harrell

On Tue, 5 Apr 2005, Frank E Harrell Jr wrote:

[EMAIL PROTECTED] wrote:
Dear R-list,
i have 6 different sets of samples.  Each sample has about 5000 observations,
with each observation comprised of 150 baseline covariates (X), 125 of which
are dichotomous. Roughly 20% of the observations in each sample are "treatment"
and the rest are "control" units.
i am doing propensity score matching, i have already estimated propensity
scores(predicted probabilities) using logistic regression, and in each sample i
am going to have to exclude approximately 100 treated observations for which I
cannot find matching control observations (because the scores for these treated
units are outside the support of the scores for control units).
in each sample, i must identify an exclusion rule that is interpretable on the
scale of the X's that excludes these unmatchable treated observations and
excludes as FEW of the remaining treated observations as possible.
(the reason is that i want to be able to explain, in terms of the Xs, who the
individuals are that i am making causal inference about.)
i've tried some simple stuff over the past few days and nothing's worked.
is there an R-package or algorithm, or even estimation strategy that anyone
could recommend?
(i am really hoping so!)
thank you,
alexis diamond

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University


--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] lists: removing elements, iterating over elements,

2005-04-05 Thread Huntsinger, Reid
To get the neighborhoods of radius r of each point in your data set, given
distances calculated already in the matrix d, you could do (but note below)

$ A <- (d <= r)

then rows (or columns) of A are indicator vectors for the neighborhoods.
"Unique" will work on these vectors, as "unique.array", to give the unique
rows, which would be the unique neighborhood lists:

$ unique(A)

Your question about why "unique" applied to a distance matrix ignores zeros
points to a possible problem: the object you get from dist() is not a
matrix. The "upper" and "diag" options only control printing. If you check
length() you'll see you only have n(n-1)/2 elements, the lower triangle of
the distance matrix. (To answer the question: unique() sees only these;
there's not a method for objects of class dist.) So you need to do

$ d <- as.matrix(distmat)

to get a matrix. 
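
A minimal end-to-end sketch (simulated points; the radius r is
arbitrary):

x <- rnorm(10)
d <- as.matrix(dist(x))   # full n x n matrix, not just the lower triangle
r <- 0.5
A <- d <= r               # row i indicates the neighborhood of point i
u <- unique(A)            # unique neighborhoods, one indicator row each
apply(u, 1, which)        # back to vectors of member indices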

Reid Huntsinger



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul Johnson
Sent: Tuesday, April 05, 2005 1:36 PM
To: r-help@stat.math.ethz.ch
Subject: [R] lists: removing elements, iterating over elements,


I'm writing R code to calculate Hierarchical Social Entropy, a diversity 
index that Tucker Balch proposed.  One article on this was published in 
Autonomous Robots in 2000. You can find that and others through his web 
page at Georgia Tech.

http://www.cc.gatech.edu/~tucker/index2.html

While I work on this, I realize (again) that I'm a C programmer 
masquerading in R, and it's really tricky working with R lists.  Here are 
things that surprise me, I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius. 
   (Again, I apologize this looks so much like C.)


## Returns a list of all U-diametric clusters of a given radius
## Give an R distance matrix
## Clusters may overlap.  Clusters may be identical (redundant)
getUDClusters <-function(distmat,radius){
   mem <- list()

   nItems <- dim(distmat)[1]
   for ( i in 1:nItems ){
 mem[[i]] <- c(i)
   }


   for ( m in 1:nItems ){
 for ( n in 1:nItems ){
   if (m != n & (distmat[m,n] <= radius)){
##item is within radius, so add to collection m
 mem[[m]] <- sort(c( mem[[m]],n))
   }
 }
   }

   return(mem)
}


That generates the list, like this:

[[1]]
[1]  1  3  4  5  6  7  8  9 10

[[2]]
[1]  2  3  4 10

[[3]]
[1]  1  2  3  4  5  6  7  8 10

[[4]]
[1]  1  2  3  4 10

[[5]]
[1]  1  3  5  6  7  8  9 10

[[6]]
[1]  1  3  5  6  7  8  9 10

[[7]]
[1]  1  3  5  6  7  8  9 10

[[8]]
[1]  1  3  5  6  7  8  9 10

[[9]]
[1]  1  5  6  7  8  9 10

[[10]]
  [1]  1  2  3  4  5  6  7  8  9 10


The next task is to eliminate the redundant elements.  unique() does not 
apply to lists, so I have to scan one by one.


   cluslist <- getUDClusters(distmat,radius)

   ##find redundant (same) clusters
   redundantCluster <- c()
   for (m in 1:(length(cluslist)-1)) {
 for ( n in (m+1): length(cluslist) ){
   if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
 if ( sum(cluslist[[m]] == cluslist[[n]]) ){
   redundantCluster <- c( redundantCluster,n)
 }
   }
 }
   }


   ##make sure they are sorted in reverse order
   if (length(redundantCluster)>0)
 {
   redundantCluster <- unique(sort(redundantCluster, decreasing=T))

   ## remove redundant clusters (must do in reverse order to preserve 
index of cluslist)
   for (i in redundantCluster) cluslist[[i]] <- NULL
 }


Question: am I deleting the list elements properly?

I do not find explicit documentation for R on how to remove elements 
from lists, but trial and error tells me

myList[[5]] <- NULL

will remove the 5th element and then "close up" the hole caused by 
deletion of that element.  That shuffles the index values, so I have to 
be careful in dropping elements. I must work from the back of the list 
to the front.


Is there an easier or faster way to remove the redundant clusters?


Now, the next question.  After eliminating the redundant sets from the 
list, I need to calculate the total number of items present in the whole 
list, figure how many are in each subset--each list item--and do some 
calculations.

I expected this would iterate over the members of the list--one step for 
each subcollection

for (i in cluslist){

}

but it does not.  It iterates over the items within the subsets of the 
list "cluslist."  I mean, if cluslist has 5 sets, each with 10 elements, 
this for loop takes 50 steps, one for each individual item.

I find this does what I want

for (i in 1:length(cluslist))

But I found out the hard way :)


Oh, one more quirk that fooled me.  Why does unique() applied to a 
distance matrix throw away the 0's?  I think that's really bad!

 > x <- rnorm(5)
 > myDist <- dist(x,diag=T,upper=T)
 > myDist
          1         2         3         4         5
1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
3 1.

Re: [R] R can not show plots (in Mac OS X terminal)

2005-04-05 Thread Tony Han Bao
On 5 Apr 2005, at 19:12, Minyu Chen wrote:
Dear all:
I am a newbie on the Mac. I just installed R and found that R did not 
react to my plot command (I use the command line in the terminal). It 
did not give me any error message, either. All it did was give a new 
command prompt--no reaction to the plot command. I suppose whenever I 
give a plot command, it will invoke AquaTerm for a small graph, as I 
experience in Octave. What can I do about it?

issue command
>getOption("device")
to check the output device. The default I have on OS X is X11; did you 
have it installed before compiling R?

It could also be the case that you issued command such as postscript() 
before plot(...) but forgot to issue dev.off().

Many thanks,
Minyu Chen
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

Tony Han Bao
[EMAIL PROTECTED]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] lists: removing elements, iterating over elements,

2005-04-05 Thread Liaw, Andy
> From: Paul Johnson
> 
> I'm writing R code to calculate Hierarchical Social Entropy, 
> a diversity 
> index that Tucker Balch proposed.  One article on this was 
> published in 
> Autonomous Robots in 2000. You can find that and others 
> through his web 
> page at Georgia Tech.
> 
> http://www.cc.gatech.edu/~tucker/index2.html
> 
> While I work on this, I realize (again) that I'm a C programmer 
> masquerading in R, and it's really tricky working with R 
> lists.  Here are 
> things that surprise me, I wonder what your experience/advice is.
> 
> I need to calculate overlapping U-diametric clusters of a 
> given radius. 
>(Again, I apologize this looks so much like C.)
> 
> 
> ## Returns a list of all U-diametric clusters of a given radius
> ## Give an R distance matrix
> ## Clusters may overlap.  Clusters may be identical (redundant)
> getUDClusters <-function(distmat,radius){
>mem <- list()
> 
>nItems <- dim(distmat)[1]
>for ( i in 1:nItems ){
>  mem[[i]] <- c(i)
>}

This loop can be replaced with mem <- as.list(1:nItems)...

>for ( m in 1:nItems ){
>  for ( n in 1:nItems ){
>if (m != n & (distmat[m,n] <= radius)){
>   ##item is within radius, so add to collection m
>  mem[[m]] <- sort(c( mem[[m]],n))
>}
>  }
>}

If I understood the code correctly, this should do the same:

neighbors <- which(distmat <= radius, arr.ind=TRUE)
neighbors <- neighbors[neighbors[, 1] != neighbors[, 2],]
mem <- split(neighbors[, 2], neighbors[, 1])

What I'm not sure of is whether you intend to include the i-th item in the
i-th list (since the distance is presumably 0).  Your code seems to indicate
no, as you have m != n in the if() condition.  The second line above removes
such results.  However, your list below seems to indicate that you do have
such elements in your lists.  If such results can not be in the list, then
the list should already be unique, no?

For deleting an element of a list, see R FAQ 7.1.

HTH,
Andy

 
>return(mem)
> }
> 
> 
> That generates the list, like this:
> 
> [[1]]
> [1]  1  3  4  5  6  7  8  9 10
> 
> [[2]]
> [1]  2  3  4 10
> 
> [[3]]
> [1]  1  2  3  4  5  6  7  8 10
> 
> [[4]]
> [1]  1  2  3  4 10
> 
> [[5]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[6]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[7]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[8]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[9]]
> [1]  1  5  6  7  8  9 10
> 
> [[10]]
>   [1]  1  2  3  4  5  6  7  8  9 10
> 
> 
> The next task is to eliminate the redundant elements.  
> unique() does not 
> apply to lists, so I have to scan one by one.
> 
> 
>cluslist <- getUDClusters(distmat,radius)
> 
>##find redundant (same) clusters
>redundantCluster <- c()
>for (m in 1:(length(cluslist)-1)) {
>  for ( n in (m+1): length(cluslist) ){
>if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
>  if ( sum(cluslist[[m]] == cluslist[[n]]) ){
>redundantCluster <- c( redundantCluster,n)
>  }
>}
>  }
>}
> 
> 
>##make sure they are sorted in reverse order
>if (length(redundantCluster)>0)
>  {
>redundantCluster <- unique(sort(redundantCluster, 
> decreasing=T))
> 
>## remove redundant clusters (must do in reverse order to preserve 
> index of cluslist)
>for (i in redundantCluster) cluslist[[i]] <- NULL
>  }
> 
> 
> Question: am I deleting the list elements properly?
> 
> I do not find explicit documentation for R on how to remove elements 
> from lists, but trial and error tells me
> 
> myList[[5]] <- NULL
> 
> will remove the 5th element and then "close up" the hole caused by 
> deletion of that element.  That shuffles the index values, so 
> I have to 
> be careful in dropping elements. I must work from the back of 
> the list 
> to the front.
> 
> 
> Is there an easier or faster way to remove the redundant clusters?
> 
> 
> Now, the next question.  After eliminating the redundant sets 
> from the 
> list, I need to calculate the total number of items present 
> in the whole 
> list, figure how many are in each subset--each list item--and do some 
> calculations.
> 
> I expected this would iterate over the members of the 
> list--one step for 
> each subcollection
> 
> for (i in cluslist){
> 
> }
> 
> but it does not.  It iterates over the items within the 
> subsets of the 
> list "cluslist."  I mean, if cluslist has 5 sets, each with 
> 10 elements, 
> this for loop takes 50 steps, one for each individual item.
> 
> I find this does what I want
> 
> for (i in 1:length(cluslist))
> 
> But I found out the hard way :)
> 
> 
> Oh, one more quirk that fooled me.  Why does unique() applied to a 
> distance matrix throw away the 0's?  I think that's really bad!
> 
>  > x <- rnorm(5)
>  > myDist <- dist(x,diag=T,upper=T)
>  > myDist
>           1         2         3         4         5
> 1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
> 2 1.2929976 0.000 0.3728735 1.3718027 0.

Re: [R] GLMs: Negative Binomial family in R?

2005-04-05 Thread Pierre Kleiber
Check out these recent postings to the R list:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/48429.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/48646.html
  Cheers, Pierre
[EMAIL PROTECTED] wrote:
Greetings R Users!
I have a data set of count responses for which I have made repeated observations
on the experimental units (stream reaches) over two air photo dates, hence the
mixed effect.  I have been using Dr. Jim Lindsey's GLMM function found in his
"repeated" measures package with the "poisson" family.
My problem though is that I don't think the Poisson distribution is the right
one to describe my data, which is overdispersed; the variance is greater than
the mean.  I have read that "negative binomial" regression models can
account for some of the differences among observations by adding in an error
term that is independent of the covariates.
I haven't yet come across a mixed effects model that can use the "negative
binomial" distribution.
If any of you know of such a function - I will certainly look forward to hearing
from you!  Additionally, if any of you have insight on zero-inflated data, and
testing for this, I'd be interested in your comments too.  I'll post a summary
of your responses to this list.
Best Regards,
Nadele Flynn, M.Sc. candidate.
University of Alberta
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
--
-
Pierre Kleiber, Ph.D   Email: [EMAIL PROTECTED]
Fishery BiologistTel: 808 983-5399 / (hm)808 737-7544
NOAA Fisheries Service - Honolulu LaboratoryFax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-
 "God could have told Moses about galaxies and mitochondria and
  all.  But behold... It was good enough for government work."
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] GLMs: Negative Binomial family in R?

2005-04-05 Thread Anders Nielsen
Hi,

Also consider using the function supplied in the post:

https://stat.ethz.ch/pipermail/r-help/2005-March/066752.html

for fitting negative binomial mixed effects models.

Cheers,

Anders.


On Tue, 5 Apr 2005, Achim Zeileis wrote:

> On Tue,  5 Apr 2005 11:20:37 -0600 [EMAIL PROTECTED] wrote:
>
> > Greetings R Users!
> >
> > I have a data set of count responses for which I have made repeated
> > observations on the experimental units (stream reaches) over two air
> > photo dates, hence the mixed effect.  I have been using Dr. Jim
> > Lindsey's GLMM function found in his"repeated" measures package with
> > the "poisson" family.
> >
> > My problem though is that I don't think the Poisson distribution is
> > the right one to describe my data, which is overdispersed; the variance
> > is greater than the mean.  I have read that "negative binomial"
> > regression models can account for some of the differences among
> > observations by adding in an error term that is independent of the
> > covariates.
>
> glm.nb() from package MASS fits negative binomial GLMs.
>
> > I haven't yet come across a mixed effects model that can use the
> > "negative binomial" distribution.
>
> For known theta, you can plug negative.binomial(theta) into glmmPQL()
> for example. (Both functions are also available in MASS.) I'm not sure
> whether there is also code available for unknown theta.
>
> > If any of you know of such a function - I will certainly look forward
> > to hearing from you!  Additionally, if any of you have insight on
> > zero-inflated data, and testing for this, I'd be interested in your
> > comments too.  I'll post a summary of your responses to this list.
>
> Look at package zicounts for zero-inflated Poisson and NB models. For
> these models, there is also code available at
>   http://pscl.stanford.edu/content.html
> which also hosts code for hurdle models.
>
> hth,
> Z
>
> > Best Regards,
> > Nadele Flynn, M.Sc. candidate.
> > University of Alberta
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
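
A minimal sketch of the two suggestions above (hypothetical data frame
'dat' with count response y, covariate x, and grouping factor site):

library(MASS)
fit <- glm.nb(y ~ x, data = dat)   # NB GLM; theta is estimated
theta.hat <- fit$theta
library(nlme)                      # glmmPQL fits via lme from nlme
fit.mix <- glmmPQL(y ~ x, random = ~ 1 | site,
                   family = negative.binomial(theta = theta.hat),
                   data = dat)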

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] lists: removing elements, iterating over elements,

2005-04-05 Thread Berton Gunter
The following strategy may or may not work, depending on whether the numbers
in your lists are integers or could be the result of floating point
computations (so that 2 might be 1.99... etc.). 

As I understand it, you wish to reduce an arbitrary list to one with unique
members, where each member is a list/set, of course. If so, one way to do it
is to convert the members to a vector of strings, find the unique strings,
then convert the result back to a list. Something like (warning: not fully
tested)

> test<-list(a=1:3,b=1:4,c=1:3,d=1:5,e=1:3)
> l1<-lapply(test, paste,collapse='+')
> l2<-unique(unlist(l1))
> l2
[1] "1+2+3" "1+2+3+4"   "1+2+3+4+5"
> lapply(strsplit(l2,split='+',fixed=TRUE),as.numeric)
[[1]]
[1] 1 2 3

[[2]]
[1] 1 2 3 4

[[3]]
[1] 1 2 3 4 5

The basic idea is to get your list into a form where the efficiency of
unique() can be brought to bear. There may of course be better ways to do
this.
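
Wrapped up as a helper, the same round trip might look like this (equally
untested, and still assuming integer-valued members):

uniqueSets <- function(lst) {
    keys <- sapply(lst, paste, collapse = "+")   # one string key per set
    lapply(strsplit(unique(keys), split = "+", fixed = TRUE), as.numeric)
}
uniqueSets(test)   # the same three sets as above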

HTH

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Paul Johnson
> Sent: Tuesday, April 05, 2005 10:36 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] lists: removing elements, iterating over elements, 
> 
> I'm writing R code to calculate Hierarchical Social Entropy, 
> a diversity 
> index that Tucker Balch proposed.  One article on this was 
> published in 
> Autonomous Robots in 2000. You can find that and others 
> through his web 
> page at Georgia Tech.
> 
> http://www.cc.gatech.edu/~tucker/index2.html
> 
> While I work on this, I realize (again) that I'm a C programmer 
> masquerading in R, and it's really tricky working with R 
> lists.  Here are 
> things that surprise me, I wonder what your experience/advice is.
> 
> I need to calculate overlapping U-diametric clusters of a 
> given radius. 
>(Again, I apologize this looks so much like C.)
> 
> 
> ## Returns a list of all U-diametric clusters of a given radius
> ## Give an R distance matrix
> ## Clusters may overlap.  Clusters may be identical (redundant)
> getUDClusters <-function(distmat,radius){
>mem <- list()
> 
>nItems <- dim(distmat)[1]
>for ( i in 1:nItems ){
>  mem[[i]] <- c(i)
>}
> 
> 
>for ( m in 1:nItems ){
>  for ( n in 1:nItems ){
>if (m != n & (distmat[m,n] <= radius)){
>   ##item is within radius, so add to collection m
>  mem[[m]] <- sort(c( mem[[m]],n))
>}
>  }
>}
> 
>return(mem)
> }
> 
> 
> That generates the list, like this:
> 
> [[1]]
> [1]  1  3  4  5  6  7  8  9 10
> 
> [[2]]
> [1]  2  3  4 10
> 
> [[3]]
> [1]  1  2  3  4  5  6  7  8 10
> 
> [[4]]
> [1]  1  2  3  4 10
> 
> [[5]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[6]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[7]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[8]]
> [1]  1  3  5  6  7  8  9 10
> 
> [[9]]
> [1]  1  5  6  7  8  9 10
> 
> [[10]]
>   [1]  1  2  3  4  5  6  7  8  9 10
> 
> 
> The next task is to eliminate the redundant elements.  
> unique() does not 
> apply to lists, so I have to scan one by one.
> 
> 
>cluslist <- getUDClusters(distmat,radius)
> 
>##find redundant (same) clusters
>redundantCluster <- c()
>for (m in 1:(length(cluslist)-1)) {
>  for ( n in (m+1): length(cluslist) ){
>if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
>  if ( all(cluslist[[m]] == cluslist[[n]]) ){
>redundantCluster <- c( redundantCluster,n)
>  }
>}
>  }
>}
> 
> 
>##make sure they are sorted in reverse order
>if (length(redundantCluster)>0)
>  {
>redundantCluster <- unique(sort(redundantCluster, 
> decreasing=T))
> 
>## remove redundant clusters (must do in reverse order to preserve 
> index of cluslist)
>for (i in redundantCluster) cluslist[[i]] <- NULL
>  }
> 
> 
> Question: am I deleting the list elements properly?
> 
> I do not find explicit documentation for R on how to remove elements 
> from lists, but trial and error tells me
> 
> myList[[5]] <- NULL
> 
> will remove the 5th element and then "close up" the hole caused by 
> deletion of that element.  That shuffles the index values, so I have to
> be careful in dropping elements. I must work from the back of the list
> to the front.
> 
> 
> Is there an easier or faster way to remove the redundant clusters?
> 
> 
> Now, the next question.  After eliminating the redundant sets 
> from the 
> list, I need to calculate the total number of items present 
> in the whole 
> list, figure how many are in each subset--each list item--and do some 
> calculations.
> 
> I expected this would iterate over the members of the 
> list--one step for 
> each subcollection
> 
> for (i in cluslist){
> 
> }
> 
> but it does not.  It iterates over the items within the 
> subsets of the 
> list "cluslist."  I mean, if cl

[R] R can not show plots (in Mac OS X terminal)

2005-04-05 Thread Minyu Chen
Dear all:
I am a newbie on the Mac. I just installed R and found that R did not react
to my plot command (I use the command line in Terminal). It did not give me
any error message, either. All it did was give a new command prompt--no
reaction to the plot command. I assumed that whenever I give a plot command,
it would invoke AquaTerm for a small graph, as I experience in Octave. What
can I do about it?
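
A few things worth checking at the R prompt (a sketch; whether quartz() is
available depends on how this R was built):

capabilities()   # look for the "aqua" (Quartz) and "X11" entries
quartz()         # if aqua is TRUE, open a Quartz device explicitly
plot(1:10)       # then try plotting again
dev.cur()        # confirms which device, if any, is open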

Many thanks,
Minyu Chen
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] summing columns using partial labels

2005-04-05 Thread T Petersen
I have a dataset of the form
Year  tosk.fai   tosk.isd   tosk.gr  ...  tosk.total   hysa.fai  
hysa.isd ...

and so on. I want to sum all the columns using the first four letters of
the column labels (e.g. 'tosk', 'hysa' etc.). How can you do that? Also,
the sums should be computed without the '.total' columns (e.g. 'tosk.total'),
as these serve as a check that everything was done right.
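
One possible approach, sketched under the assumption that the data frame is
called 'dat' (a made-up name) and that everything before the first dot is
the group prefix:

keep <- setdiff(names(dat), "Year")              # drop the Year column
keep <- keep[regexpr("\\.total$", keep) == -1]   # drop the .total check columns
grp  <- sub("\\..*$", "", keep)                  # prefixes: "tosk", "hysa", ...
sums <- sapply(split(keep, grp),
               function(cols) rowSums(dat[, cols, drop = FALSE]))
## 'sums' has one column per prefix; compare it against the .total
## columns as the check that everything was done right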

Kind regards
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] GLMs: Negative Binomial family in R?

2005-04-05 Thread Achim Zeileis
On Tue,  5 Apr 2005 11:20:37 -0600 [EMAIL PROTECTED] wrote:

> Greetings R Users!
> 
> I have a data set of count responses for which I have made repeated
> observations on the experimental units (stream reaches) over two air
> photo dates, hence the mixed effect.  I have been using Dr. Jim
> Lindsey's GLMM function found in his"repeated" measures package with
> the "poisson" family.
> 
> My problem though is that I don't think the poisson distribution is
> the right one to describe my data, which is overdispersed; the variance
> is greater than the mean.  I have read that the "negative binomial"
> regression models can account for some of the differences among
> observations by adding in an error term that is independent of the
> covariates.

glm.nb() from package MASS fits negative binomial GLMs.

> I haven't yet come across a mixed effects model that can use the
> "negative binomial" distribution.

For known theta, you can plug negative.binomial(theta) into glmmPQL()
for example. (Both functions are also available in MASS.) I'm not sure
whether there is also code available for unknown theta.
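
A minimal sketch of both suggestions (the variable names and theta = 2 are
made up for illustration):

library(MASS)
fit1 <- glm.nb(counts ~ year, data = dat)   # NB GLM, theta estimated
fit2 <- glmmPQL(counts ~ year, random = ~ 1 | reach,
                family = negative.binomial(theta = 2),  # theta must be given
                data = dat)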

> If any of you know of such a function - I will certainly look forward
> to hearing from you!  Additionally, if any of you have insight on
> zero-inflated data, and testing for this, I'd be interested in your
> comments too.  I'll post a summary of your responses to this list.

Look at package zicounts for zero-inflated Poisson and NB models. For
these models, there is also code available at
  http://pscl.stanford.edu/content.html
which also hosts code for hurdle models.

hth,
Z

> Best Regards,
> Nadele Flynn, M.Sc. candidate.
> University of Alberta
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] lists: removing elements, iterating over elements,

2005-04-05 Thread Paul Johnson
I'm writing R code to calculate Hierarchical Social Entropy, a diversity 
index that Tucker Balch proposed.  One article on this was published in 
Autonomous Robots in 2000. You can find that and others through his web 
page at Georgia Tech.

http://www.cc.gatech.edu/~tucker/index2.html
While I work on this, I realize (again) that I'm a C programmer 
masquerading in R, and it's really tricky working with R lists.  Here are 
things that surprise me; I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius. 
  (Again, I apologize this looks so much like C.)

## Returns a list of all U-diametric clusters of a given radius
## Give an R distance matrix
## Clusters may overlap.  Clusters may be identical (redundant)
getUDClusters <-function(distmat,radius){
  mem <- list()
  nItems <- dim(distmat)[1]
  for ( i in 1:nItems ){
mem[[i]] <- c(i)
  }
  for ( m in 1:nItems ){
for ( n in 1:nItems ){
  if (m != n & (distmat[m,n] <= radius)){
##item is within radius, so add to collection m
mem[[m]] <- sort(c( mem[[m]],n))
  }
}
  }
  return(mem)
}
That generates the list, like this:
[[1]]
[1]  1  3  4  5  6  7  8  9 10
[[2]]
[1]  2  3  4 10
[[3]]
[1]  1  2  3  4  5  6  7  8 10
[[4]]
[1]  1  2  3  4 10
[[5]]
[1]  1  3  5  6  7  8  9 10
[[6]]
[1]  1  3  5  6  7  8  9 10
[[7]]
[1]  1  3  5  6  7  8  9 10
[[8]]
[1]  1  3  5  6  7  8  9 10
[[9]]
[1]  1  5  6  7  8  9 10
[[10]]
 [1]  1  2  3  4  5  6  7  8  9 10
The next task is to eliminate the redundant elements.  unique() does not 
apply to lists, so I have to scan one by one.

  cluslist <- getUDClusters(distmat,radius)
  ##find redundant (same) clusters
  redundantCluster <- c()
  for (m in 1:(length(cluslist)-1)) {
for ( n in (m+1): length(cluslist) ){
  if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
    if ( all(cluslist[[m]] == cluslist[[n]]) ){
  redundantCluster <- c( redundantCluster,n)
}
  }
}
  }
  ##make sure they are sorted in reverse order
  if (length(redundantCluster)>0)
{
  redundantCluster <- unique(sort(redundantCluster, decreasing=T))
  ## remove redundant clusters (must do in reverse order to preserve 
index of cluslist)
  for (i in redundantCluster) cluslist[[i]] <- NULL
}

Question: am I deleting the list elements properly?
I do not find explicit documentation for R on how to remove elements 
from lists, but trial and error tells me

myList[[5]] <- NULL
will remove the 5th element and then "close up" the hole caused by 
deletion of that element.  That shuffles the index values, so I have to 
be careful in dropping elements. I must work from the back of the list 
to the front.

Is there an easier or faster way to remove the redundant clusters?
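One shortcut (a sketch, assuming redundantCluster is non-empty): negative
indexing returns the list without those elements in a single step, so no
reverse-order bookkeeping is needed:

cluslist <- cluslist[-unique(redundantCluster)]
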
Now, the next question.  After eliminating the redundant sets from the 
list, I need to calculate the total number of items present in the whole 
list, figure how many are in each subset--each list item--and do some 
calculations.

I expected this would iterate over the members of the list--one step for 
each subcollection

for (i in cluslist){
}
but it does not.  It iterates over the items within the subsets of the 
list "cluslist."  I mean, if cluslist has 5 sets, each with 10 elements, 
this for loop takes 50 steps, one for each individual item.

I find this does what I want
for (i in 1:length(cluslist))
But I found out the hard way :)
Oh, one more quirk that fooled me.  Why does unique() applied to a 
distance matrix throw away the 0's?  I think that's really bad!

> x <- rnorm(5)
> myDist <- dist(x,diag=T,upper=T)
> myDist
  1 2 3 4 5
1 0.000 1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.000 0.3728735 1.3718027 0.7435058
3 1.6658710 0.3728735 0.000 0.9989292 1.1163793
4 2.6648003 1.3718027 0.9989292 0.000 2.1153085
5 0.5494918 0.7435058 1.1163793 2.1153085 0.000
> unique(myDist)
 [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
 [8] 0.9989292 1.1163793 2.1153085
>
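That behaviour comes from what unique() actually sees: a dist object stores
only the n*(n-1)/2 lower-triangle distances as a plain vector, and the diag=
and upper= arguments affect printing only, so the diagonal zeros are never
part of the data:

length(myDist)      # 10, i.e. 5*4/2 stored distances, not 25
as.vector(myDist)   # the values unique() operates on; no zeros among them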
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] GLMs: Negative Binomial family in R?

2005-04-05 Thread nflynn
Greetings R Users!

I have a data set of count responses for which I have made repeated observations
on the experimental units (stream reaches) over two air photo dates, hence the
mixed effect.  I have been using Dr. Jim Lindsey's GLMM function found in his
"repeated" measures package with the "poisson" family.

My problem though is that I don't think the poisson distribution is the right
one to describe my data, which is overdispersed; the variance is greater than
the mean.  I have read that the "negative binomial" regression models can
account for some of the differences among observations by adding in an error
term that is independent of the covariates.

I haven't yet come across a mixed effects model that can use the "negative
binomial" distribution.

If any of you know of such a function - I will certainly look forward to hearing
from you!  Additionally, if any of you have insight on zero-inflated data, and
testing for this, I'd be interested in your comments too.  I'll post a summary
of your responses to this list.

Best Regards,
Nadele Flynn, M.Sc. candidate.
University of Alberta

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Dendrogram for a type unbalanced ANOVA

2005-04-05 Thread Ben Fairbank
It is seldom a good idea to use the _significance_ of a difference as a
surrogate for its magnitude.  The reason is that the significance varies
with many irrelevant aspects of the analysis, such as the model and the
sample size.  If you have other measures of group similarity, then there
are many hierarchical clustering methods that might meet your needs. Try
?hclust to get started.
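
For example, clustering the groups on their means alone might look like this
sketch ('grp.means' is a hypothetical named vector with one entry per group):

hc <- hclust(dist(grp.means))
plot(hc)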

Ben Fairbank

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of bob sandefur
Sent: Tuesday, April 05, 2005 11:39 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Dendrogram for a type unbalanced ANOVA

Hi-

I have about 20 groups for which I know the mean, variance, and number
of
points per group. Is here an R function where I can plot (3 group
example)
something like

|  |- 2
|  |---|
|  |   |- 1
|  |
|--|
|  |- 3
|
|
0 25 50   75 100

ie 1 and 2 are different at 75% level of confidence
   1 2 combined are  different from 3 at 25% level of confidence

If this plot is a silly idea can someone point me to a reference for
anything similar?

Thanx

Robert (Bob) L. Sandefur PE
Senior Geostatistician / Reserve Analyst
CAM
200 Union Suite G-13
Lakewood, Co
80228
 
[EMAIL PROTECTED]
 
303 472-3240 (cell) <-best  choice
 
303 716-1617 ext 14

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Dendrogram for a type unbalanced ANOVA

2005-04-05 Thread bob sandefur
Hi-

I have about 20 groups for which I know the mean, variance, and number of
points per group. Is here an R function where I can plot (3 group example)
something like

|  |- 2
|  |---|
|  |   |- 1
|  |
|--|
|  |- 3
|
|
0 25 50   75 100

ie 1 and 2 are different at 75% level of confidence
   1 2 combined are  different from 3 at 25% level of confidence

If this plot is a silly idea can someone point me to a reference for
anything similar?

Thanx

Robert (Bob) L. Sandefur PE
Senior Geostatistician / Reserve Analyst
CAM
200 Union Suite G-13
Lakewood, Co
80228
 
[EMAIL PROTECTED]
 
303 472-3240 (cell) <-best  choice
 
303 716-1617 ext 14

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] extracting Proportion Var and Cumulative Var values from factanal

2005-04-05 Thread Tamas Gal
Hi R users,
I need some help in the followings:
I'm doing factor analysis and I need to extract the loading values and
the Proportion Var and Cumulative Var values one by one.
Here is what I am doing:

> fact <- factanal(na.omit(gnome_freq_r2),factors=5);
> fact$loadings

Loadings:
 Factor1 Factor2 Factor3 Factor4 Factor5
b1freqr2  0.246   0.486  0.145 
b2freqr2  0.129   0.575   0.175   0.130   0.175 
b3freqr2  0.605   0.253   0.166   0.138   0.134 
b4freqr2  0.191   0.220   0.949 
b5freqr2  0.286   0.265   0.113   0.891   0.190 
b6freqr2  0.317   0.460   0.151 
b7freqr2  0.138   0.199  0.119   0.711 
b8freqr2  0.769   0.258  0.195   0.137 
b9freqr2  0.148   0.449  0.103   0.327 

Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings  1.294   1.268   1.008   0.927   0.730
Proportion Var   0.144   0.141   0.112   0.103   0.081
Cumulative Var   0.144   0.285   0.397   0.500   0.581
> 

I can get the loadings using:

> fact$loadings[1,1]
[1] 0.2459635
> 

but I couldn't find the way to do the same with the Proportion Var and
Cumulative Var values.
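
The Proportion Var row is just the per-factor sum of squared loadings divided
by the number of observed variables, and Cumulative Var is its cumulative
sum, so both can be recomputed directly (a sketch):

L    <- unclass(fact$loadings)   # plain loadings matrix
p    <- nrow(L)                  # number of observed variables
ss   <- colSums(L^2)             # the "SS loadings" row
prop <- ss / p                   # the "Proportion Var" row
cum  <- cumsum(prop)             # the "Cumulative Var" row
prop[1]                          # e.g. Proportion Var for Factor1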

Thanks,
Tamas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help with three-way anova

2005-04-05 Thread John Fox
Dear Michael,

For unbalanced data, you might want to take a look at the Anova()
function in the car package.

As well, it probably makes sense to read something about how linear
models are expressed in R. ?lm and ?formula both have some information
about model formulas; the Introduction to R manual that comes with R
has a chapter on statistical models; and books on R typically take up
the subject at greater length.
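
A minimal sketch of that route, reusing the model and the data frame name
from the original post:

library(car)
fit <- lm(IL.4 ~ Infected + Vaccinated + Lesions, data = data)
Anova(fit)   # Type II tests, better suited to the unbalanced design
## an interaction, where estimable, can be added as Infected*Vaccinated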

I hope this helps,
 John 

On Tue, 5 Apr 2005 15:51:46 +0100
 "michael watson \(IAH-C\)" <[EMAIL PROTECTED]> wrote:
> Hi
> 
> I have data from 12 subjects.  The measurement is log(expression) of
> a
> particular gene and can be assumed to be normally distributed.  The
> 12
> subjects are divided into the following groups:
> 
> Infected, Vaccinated, Lesions - 3 measurements
> Infected, Vaccinated, No Lesions - 2 measurements
> Infected, Not Vaccinated, Lesions - 4 measurements
> Uninfected, Not Vaccinated, No Lesions - 3 measurements
> 
> Although presence/absence of lesions could be considered to be a
> phenotype, here I would like to use it as a factor.  This explains
> some
> of the imbalance in the design (ie we could not control how many
> subjects, if any, in each group would get lesions).
> 
> First impressions - the data looks like we would expect.  Gene
> expression is lowest in the infected/not vaccinated group, then next
> lowest is the infected/vaccinated group and finally comes the
> uninfected/not vaccinated group.  So the working hypothesis is that
> gene
> expression of the gene in question is lowered by infection, but that
> the
> vaccine somehow alleviates this effect, but not as much as to the
> level
> of a totally uninfected subject.  We *might* have access to data
> relating to uninfected/vaccinated group, my pet scientist is digging
> for
> this as we speak.
> 
> As for lesions, well none of the uninfected subjects have them, all
> of
> the infected/not vaccinated subjects have them, and some of the
> infected/vaccinated have them, some don't.  Again, this makes for a
> very
> sensible hypothesis if we treat presence/absence of lesions as a
> phenotype, but in addition to that I want to know if gene expression
> is
> linked to presence/absence of lesion, but only one group of subjects
> has
> both lesions and non-lesions within it.  Eye-balling this group,
> presence/absence of lesions and gene expression are not linked.
> 
> So I have this as a data.frame in R, and I wanted to run an analysis
> of
> variance.  I did:
> 
> aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
> summary(aov)
> 
> And got:
> 
>             Df  Sum Sq Mean Sq F value    Pr(>F)
> Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
> Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
> Lesions      1  0.0393  0.0393  0.0878 0.7746009
> Residuals    8  3.5798  0.4475
> ---
> 
> This tells me that Infected and Vaccinated are highly significant,
> whereas lesions are not.
> 
> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?
> 2) Have I used aov() correctly?  If so, how do I get access results
> for
> interactions?
> 3) Is there some other, more relevant way of analysing this?  What I
> am
> really interested in is the gene expression, and whether it can be
> shown
> to be statistically related to one or more of the factors involved
> (Infected, Vaccinated, Lesions) or interactions between those
> factors.
> 
> Many thanks in advance
> 
> Mick
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help with three-way anova

2005-04-05 Thread Federico Calboli
On Tue, 2005-04-05 at 15:51 +0100, michael watson (IAH-C) wrote:

> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?

I'd say no. Use lm() instead, save your analysis in an object and then
possibly use drop1() to check the analysis

> 2) Have I used aov() correctly?  If so, how do I get access results for
> interactions?

The use of aov() per se seems fine, but you did not put any interaction
in the model... for that use factor * factor.
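
Put together, a sketch (again with the data frame from the original post):

fit <- lm(IL.4 ~ Infected * Vaccinated + Lesions, data = data)
summary(fit)
drop1(fit, test = "F")   # F tests for the terms that can be dropped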

HTH,

F

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] cat bailing out in a for loop

2005-04-05 Thread Liaw, Andy
See ?try.
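
A sketch of the original loop wrapped in try(), using the objects from the
post quoted below:

for (i in 1:42) {
    xxx <- try(HWE.exact(genotype(laba.con[, i + 3], sep = "")))
    if (inherits(xxx, "try-error")) {
        cat(colnames(laba)[i + 3], "NA (HWE.exact failed; fixed SNP?)\n")
    } else {
        cat(colnames(laba)[i + 3], xxx$p.value, "\n")
    }
}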

Andy

> From:  Federico Calboli
> 
> Dear All,
> 
> I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42
> SNPs. I am using the function HWE.exact from the package "genetics".
> 
> In order not to do a lot of coding "by hand", I have a for loop that
> goes through each column (each column is one SNP) and gives me the
> p.value for HWE.exact. Unfortunately some SNP have reached 
> fixation and
> HWE.exact requires a 2 alleles scenario.
> 
> So my problem is that my for loop:
> 
> ##
> 
> for (i in 1:42){
> xxx<-HWE.exact(genotype(laba.con[,i+3], sep=""))
> cat(colnames(laba)[i+3],xxx$p.value,"\n")}
> 
> ##
> 
> bails out as soon as it hits a SNP at fixation for one allele, because
> HWE.exact fails. 
> 
> I have a lot of this game to play and checking things by hand is not
> ideal, and I do not care about failed SNPs; all I want is for 
> the loop to
> carry on regardless of what's in xxx$p.value, even if HWE.exact failed
> to calculate it. Dump anything in there! 
> 
> Is there any way of forcing the loop to carry on?
> 
> Cheers,
> 
> Federico Calboli
> 
> -- 
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
> 
> Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
> 
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] cat bailing out in a for loop

2005-04-05 Thread Uwe Ligges
Federico Calboli wrote:
Dear All,
I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42
SNPs. I am using the function HWE.exact from the package "genetics".
In order not to do a lot of coding "by hand", I have a for loop that
goes through each column (each column is one SNP) and gives me the
p.value for HWE.exact. Unfortunately some SNP have reached fixation and
HWE.exact requires a 2 alleles scenario.
So my problem is that my for loop:
##
for (i in 1:42){
xxx<-HWE.exact(genotype(laba.con[,i+3], sep=""))
cat(colnames(laba)[i+3],xxx$p.value,"\n")}
##
bails out as soon as it hits a SNP at fixation for one allele, because
HWE.exact fails. 

I have a lot of this game to play and checking things by hand is not
ideal, and I do not care about failed SNPs; all I want is for the loop to
carry on regardless of what's in xxx$p.value, even if HWE.exact failed
to calculate it. Dump anything in there! 

Is there any way of forcing the loop to carry on?
See ?try.
Uwe Ligges

Cheers,
Federico Calboli
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Fitdistr and likelihood

2005-04-05 Thread Carsten Steinhoff
Hi all,

I'm using the function "fitdistr" (library MASS) to fit a distribution to
given data.
What I need to do next is get the log-likelihood value from this
estimation.

Is there any simple possibility to realize it?
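
A sketch with a made-up gamma fit: the fitted object keeps the maximised
log-likelihood as its 'loglik' component (and, if memory serves, a logLik()
method works as well):

library(MASS)
x   <- rgamma(100, shape = 2, rate = 1)   # invented data
fit <- fitdistr(x, "gamma")
fit$loglik                                # the log-likelihood value
logLik(fit)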

Regards, Carsten

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] French Curve

2005-04-05 Thread Michael A. Miller
> "dream" == dream home <[EMAIL PROTECTED]> writes:

> Does it sound like spline work would do the job? It would be hard
> to persuade him to use some modern math technique but he
> did ask me to help him implement the French Curve so he can
> do his work in Excel, rather than PAPER.

Splines are useful for interpolating points with a continuous
curve that passes through, or near, the points.  If you are
looking for a way to estimate a curve with a noise component
removed, I think you'd be better off filtering your data, rather
than interpolating with a spline.  Median (or mean) filtering may
give results similar to those from your chemist's manual
method.  That is easy to do with running() from the gtools
package.  The validity of this is another question!

require(gtools)

x <- seq(250)/10
y1 <- sin(x) + 15 + rnorm(250)/2
y2 <- cos(x) + 12 + rnorm(250)

plot(x, y1, ylim=c(0,18), col='grey')
points(x, y2, pch=2, col='grey')
points(x, y1-y2, col='grey', pch=3)

## running median filters
lines(running(x), running(y1, fun=median), col='blue')
lines(running(x), running(y2, fun=median), col='blue')
lines(running(x), running(y1, fun=median)-running(y2, fun=median), col='blue')

## running mean filters
lines(running(x), running(y1), col='red')
lines(running(x), running(y2), col='red')
lines(running(x), running(y1)-running(y2), col='red')

f <- sin(x) + 15 - ( cos(x) + 12 )
lines(x, f)

Mike

-- 
Michael A. Miller   [EMAIL PROTECTED]
  Imaging Sciences, Department of Radiology, IU School of Medicine

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] cat bailing out in a for loop

2005-04-05 Thread Federico Calboli
Dear All,

I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42
SNPs. I am using the function HWE.exact from the package "genetics".

In order not to do a lot of coding "by hand", I have a for loop that
goes through each column (each column is one SNP) and gives me the
p.value for HWE.exact. Unfortunately some SNP have reached fixation and
HWE.exact requires a 2 alleles scenario.

So my problem is that my for loop:

##

for (i in 1:42){
xxx<-HWE.exact(genotype(laba.con[,i+3], sep=""))
cat(colnames(laba)[i+3],xxx$p.value,"\n")}

##

bails out as soon as it hits a SNP at fixation for one allele, because
HWE.exact fails. 

I have a lot of this game to play and checking things by hand is not
ideal, and I do not care about failed SNPs; all I want is for the loop to
carry on regardless of what's in xxx$p.value, even if HWE.exact failed
to calculate it. Dump anything in there! 

Is there any way of forcing the loop to carry on?

Cheers,

Federico Calboli

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] rgl to quicktime

2005-04-05 Thread simone gabbriellini
hello,
is it possible to convert a 3D plot I made with rgl package to a 
quicktime VR?

thanks,
simone
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dead wood code

2005-04-05 Thread Dan Bebber
Mike,

I used R for exactly that purpose, to test a new
method for sampling coarse woody debris in silico
against existing alternatives. The results are
published in:
Bebber, D.P. & Thomas, S.C., 2003. Prism sweeps for
coarse woody debris. Canadian Journal of Forest
Research 33, 1737–1743.
I will put a reprint in the post, and will forward the
R code in a separate message. Please cite the article
and acknowledge the original code in any publications.

Cheers,
Dan Bebber

Department of Plant Sciences
University of Oxford
South Parks Road
Oxford
OX1 3RB
UK

> 
> Date: Tue, 5 Apr 2005 10:07:00 -0400
> From: "Mike Saunders"
> <[EMAIL PROTECTED]>
> To: "R Help" 
> Subject: [R] Dead wood code
> 
> 
> Is there a package, or does anyone have code they
> are willing to share,
> that would allow me to simulate sampling of dead
> wood pieces across an
> area?  I am specifically looking for code to
> simulate the dead wood
> distribution as small line segments across an
> extent, and then I will
> "sample" the dead wood using different sampling
> techniques including
> line transects, fixed area plots, etc.
> 
> Thanks,
> Mike
> 
> Mike Saunders
> Research Assistant
> Forest Ecosystem Research Program
> Department of Forest Ecosystem Sciences
> University of Maine
> Orono, ME  04469
> 207-581-2763 (O)
> 207-581-4257 (F)
> 
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> 
> 
> 
> 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] exclusion rules for propensity score matchng (pattern rec)

2005-04-05 Thread Alexis J. Diamond

hi,

thanks for the reply to my query about exclusion rules for propensity
score matching.

> Exclusion can be based on the non-overlap regions from the propensity.
> It should not be done in the individual covariate space.

i want a rule inspired by non-overlap in propensity score space, but that
binds in the space of the Xs.  because i don't really know how to
interpret the fact that i've excluded, say, people with scores > .87,
but i DO know what it means to say that i've excluded people from
country XYZ over age Q because i can't find good matches for them. if i
make my rule based on Xs, i know who i can and cannot make inference for,
and i can explain to other people who are the units that i can and cannot
make inference for.

after posting to the list last night, i thought of using the RGENOUD
package (genetic algorithm) to search over the space of exclusion rules
(eg., var 1 = 1, var 2 = 0 var 3 = 1 or 0, var 4 = 0); the loss function
associated with a rule should be increasing in # of tr units w/out support
excluded and decreasing in # of tr units w/ support excluded.

it might be tricky to get the right loss function, and i know this idea is
kind of nutty, but it's the only automated search method i could think of.

any comments?

alexis


> I tend to look
> at the 10th smallest and largest values of propensity for each of the
> two treatment groups for making the decision.  You will need to exclude
> non-overlap regions whether you use matching or covariate adjustment of
> propensity but covariate adjustment (using e.g. regression splines in
> the logit of propensity) is often a better approach once you've been
> careful about non-overlap.
>
> Frank Harrell


On Tue, 5 Apr 2005, Frank E Harrell Jr wrote:

> [EMAIL PROTECTED] wrote:
> > Dear R-list,
> >
> > i have 6 different sets of samples.  Each sample has about 5000 
> > observations,
> > with each observation comprised of 150 baseline covariates (X), 125 of which
> > are dichotomous. Roughly 20% of the observations in each sample are 
> > "treatment"
> > and the rest are "control" units.
> >
> > i am doing propensity score matching, i have already estimated propensity
> > scores(predicted probabilities) using logistic regression, and in each 
> > sample i
> > am going to have to exclude approximately 100 treated observations for 
> > which I
> > cannot find matching control observations (because the scores for these 
> > treated
> > units are outside the support of the scores for control units).
> >
> > in each sample, i must identify an exclusion rule that is interpretable on 
> > the
> > scale of the X's that excludes these unmatchable treated observations and
> > excludes as FEW of the remaining treated observations as possible.
> > (the reason is that i want to be able to explain, in terms of the Xs, who
> > the individuals are that I am making causal inference about.)
> >
> > i've tried some simple stuff over the past few days and nothing's worked.
> > is there an R-package or algorithm, or even estimation strategy that anyone
> > could recommend?
> > (i am really hoping so!)
> >
> > thank you,
> >
> > alexis diamond
> >
>
>
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Help with three-way anova

2005-04-05 Thread michael watson \(IAH-C\)
Hi

I have data from 12 subjects.  The measurement is log(expression) of a
particular gene and can be assumed to be normally distributed.  The 12
subjects are divided into the following groups:

Infected, Vaccinated, Lesions - 3 measurements
Infected, Vaccinated, No Lesions - 2 measurements
Infected, Not Vaccinated, Lesions - 4 measurements
Uninfected, Not Vaccinated, No Lesions - 3 measurements

Although presence/absence of lesions could be considered to be a
phenotype, here I would like to use it as a factor.  This explains some
of the imbalance in the design (ie we could not control how many
subjects, if any, in each group would get lesions).

First impressions - the data looks like we would expect.  Gene
expression is lowest in the infected/not vaccinated group, then next
lowest is the infected/vaccinated group and finally comes the
uninfected/not vaccinated group.  So the working hypothesis is that gene
expression of the gene in question is lowered by infection, but that the
vaccine somehow alleviates this effect, but not as much as to the level
of a totally uninfected subject.  We *might* have access to data
relating to uninfected/vaccinated group, my pet scientist is digging for
this as we speak.

As for lesions, well none of the uninfected subjects have them, all of
the infected/not vaccinated subjects have them, and some of the
infected/vaccinated have them, some don't.  Again, this makes for a very
sensible hypothesis if we treat presence/absence of lesions as a
phenotype, but in addition to that I want to know if gene expression is
linked to presence/absence of lesion, but only one group of subjects has
both lesions and non-lesions within it.  Eye-balling this group,
presence/absence of lesions and gene expression are not linked.

So I have this as a data.frame in R, and I wanted to run an analysis of
variance.  I did:

aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
summary(aov)

And got:

            Df  Sum Sq Mean Sq F value    Pr(>F)
Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
Lesions      1  0.0393  0.0393  0.0878 0.7746009
Residuals    8  3.5798  0.4475
---

This tells me that Infected and Vaccinated are highly significant,
whereas lesions are not.

So, what I want to know is:

1) Given my unbalanced experimental design, is it valid to use aov?
2) Have I used aov() correctly?  If so, how do I get access results for
interactions?
3) Is there some other, more relevant way of analysing this?  What I am
really interested in is the gene expression, and whether it can be shown
to be statistically related to one or more of the factors involved
(Infected, Vaccinated, Lesions) or interactions between those factors.

Many thanks in advance

Mick

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] exclusion rules for propensity score matchng (pattern rec)

2005-04-05 Thread Frank E Harrell Jr
[EMAIL PROTECTED] wrote:
Dear R-list,
i have 6 different sets of samples.  Each sample has about 5000 observations,
with each observation comprised of 150 baseline covariates (X), 125 of which
are dichotomous. Roughly 20% of the observations in each sample are "treatment"
and the rest are "control" units.
i am doing propensity score matching, i have already estimated propensity
scores(predicted probabilities) using logistic regression, and in each sample i
am going to have to exclude approximately 100 treated observations for which I
cannot find matching control observations (because the scores for these treated
units are outside the support of the scores for control units).
in each sample, i must identify an exclusion rule that is interpretable on the
scale of the X's that excludes these unmatchable treated observations and
excludes as FEW of the remaining treated observations as possible.
(the reason is that i want to be able to explain, in terms of the Xs, who the
individuals are that I am making causal inference about.)
i've tried some simple stuff over the past few days and nothing's worked.
is there an R-package or algorithm, or even estimation strategy that anyone
could recommend?
(i am really hoping so!)
thank you,
alexis diamond
Exclusion can be based on the non-overlap regions from the propensity. 
It should not be done in the individual covariate space.  I tend to look 
at the 10th smallest and largest values of propensity for each of the 
two treatment groups for making the decision.  You will need to exclude 
non-overlap regions whether you use matching or covariate adjustment of 
propensity but covariate adjustment (using e.g. regression splines in 
the logit of propensity) is often a better approach once you've been 
careful about non-overlap.
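
In code, that rule of thumb might look like the following sketch, assuming
fitted propensities 'ps', a 0/1 treatment indicator 'tr', and at least ten
units per group (all names invented):

lo <- max(sort(ps[tr == 1])[10], sort(ps[tr == 0])[10])
hi <- min(sort(ps[tr == 1], decreasing = TRUE)[10],
          sort(ps[tr == 0], decreasing = TRUE)[10])
keep <- ps >= lo & ps <= hi   # analyse only the region of common support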

Frank Harrell
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R package that has (much) the same capabilities as SAS v9 PROC GENMOD

2005-04-05 Thread Thomas Lumley
On Tue, 5 Apr 2005, Simon Blomberg wrote:
The questioner clearly wants generalized linear mixed models. lmer in package
lme4 may be more appropriate (Prof. Bates is a co-author). glmmPQL should
do the same job, though with less accuracy.
Actually, I think the questioner wants GEE, from geepack or yags.  SAS has 
an excellent glmm implementation, but it's through PROC NLMIXED rather 
than GENMOD, which does marginal models.
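
A sketch of the GEE route via geepack, with invented variable names (the
twin-pair ID is the cluster):

library(geepack)
fit <- geeglm(diagnosis ~ cohort + sex + method + zygosity,
              id = pair, family = binomial,
              corstr = "exchangeable", data = twins)
summary(fit)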

-thomas
Simon.
check glm()
On Apr 4, 2005 6:46 PM, William M. Grove <[EMAIL PROTECTED]> wrote:
 I need capabilities, for my data analysis, like the Pinheiro & Bates
 S-Plus/R package nlme() but with binomial family and logit link.
 I need multiple crossed, possibly interacting fixed effects (age cohort of
 twin when entered study, sex of twin, sampling method used to acquire twin
 pair, and twin zygosity), a couple of random effects other than the cluster
 variable, and the ability to have a variable of the sort that P&B call
 "outer" to the clustering variable---zygosity.  Dependent variables are all
 parental (mom, dad separately of course) psychiatric diagnoses.

 In my data, twin pair ID is the clustering variable; correlations are
 expected to be exchangeable but substantially different between members of
 monozygotic twin pairs and members of dizygotic twin pairs.  Hence, in my
 analyses, the variable that's "outer" to twin pair is monozygotic vs.
 dizygotic which of course applies to the whole pair.

 nlme() does all that but requires quasi-continuous responses, according to
 the preface/intro of P&B's mixed models book and what I infer from online
 help (i.e., no family= or link= argument).

 The repeated() library by Lindsey seems to handle just one nested random
 effect, or so I believe I read while scanning backlogs of the R-Help list.

 glmmPQL() is in the ballpark of what I need, but once again seems to lack
 the "outer" variable specification that nlme() has, and which PROC GENMOD
 also has---and which I need.
 I read someplace of yags() that apparently uses GEE to estimate parameters
 of nonlinear models including GLIMs/mixed models, just the way PROC GENMOD
 (and many another program) does.  But on trying to install it (either
 v4.0-1.zip or v4.0-2.tar.gz from Carey's site, or Ripley's Windows port)
 from a local, downloaded zip file (or tar.gz file converted to zip file), I
 always get an error saying:
  > Error in file(file, "r") : unable to open connection
  > In addition: Warning message:
  > cannot open file `YAGS/DESCRIPTION'
 with no obvious solution.

 So I can't really try it out to see if it does what I want.
 You may ask:  Why not just use GENMOD and skip the R hassles?  Because I
 want to embed the GLIM/mixed model analysis in a stratified resampling
 bootstrapping loop.  Very easy to implement in R, moderately painful to do
 in SAS.

 Can anybody give me a lead, or some guidance, about getting this job done
 in R?  Thanks in advance for your help.
 Regards,
 Will Grove  | Iohannes Paulus PP. II, xxx
 Psychology Dept. |
 U. of Minnesota  |
 -+
 X-headers have PGP key info.; Call me at 612.625.1599 to verify key
 fingerprint before accepting signed mail as authentic!

 
 
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


--
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 8057  email: [EMAIL PROTECTED]
F: +61 2 6125 5573
CRICOS Provider # 00120C
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] accessing header information in a table

2005-04-05 Thread Wensui Liu
good catch. you can also get access to both colnames and rownames by dimnames().
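
For example:

dimnames(myTable)[[1]]   # row names, same as rownames(myTable)
dimnames(myTable)[[2]]   # column names, same as colnames(myTable)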

On Apr 5, 2005 9:57 AM, Jabez Wilson <[EMAIL PROTECTED]> wrote:
> thank you for your replies
> 
> what I really wanted was the individual names, but I think I can get them with
> 
> colnames(myTable)[x]
> 
> where x is 1:7
> 
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 


-- 
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dead wood code

2005-04-05 Thread Duncan Murdoch
On Tue, 5 Apr 2005 10:07:00 -0400, "Mike Saunders"
<[EMAIL PROTECTED]> wrote :

>Is there a package, or does anyone have code they are willing to share, that 
>would allow me to simulate sampling of dead wood pieces across an area?  I am 
>specifically looking for code to simulate the dead wood distribution as small 
>line segments across an extent, and then I will "sample" the dead wood using 
>different sampling techniques including line transects, fixed area plots, etc.

Walter Zucchini and others wrote a package for wildlife surveys; that
might have some methods in common with what you need.  The package was
called WiSP.  I don't see it on CRAN, but you can read the abstract
here:

http://www.math.usu.edu/~iface03/abstracts/03049.html

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Dead wood code

2005-04-05 Thread Mike Saunders
Is there a package, or does anyone have code they are willing to share, that 
would allow me to simulate sampling of dead wood pieces across an area?  I am 
specifically looking for code to simulate the dead wood distribution as small 
line segments across an extent, and then I will "sample" the dead wood using 
different sampling techniques including line transects, fixed area plots, etc.

Thanks,
Mike

Mike Saunders
Research Assistant
Forest Ecosystem Research Program
Department of Forest Ecosystem Sciences
University of Maine
Orono, ME  04469
207-581-2763 (O)
207-581-4257 (F)


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] accessing header information in a table

2005-04-05 Thread Jabez Wilson
thank you for your replies
 
what I really wanted was the individual names, but I think I can get them with 
 
colnames(myTable)[x] 
 
where x is 1:7


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] accessing header information in a table

2005-04-05 Thread Dimitris Rizopoulos
Look at "?names()", "?colnames()", e.g. check
names(myTable)
I hope it helps.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

- Original Message - 
From: "Jabez Wilson" <[EMAIL PROTECTED]>
To: 
Sent: Tuesday, April 05, 2005 3:41 PM
Subject: [R] accessing header information in a table


Dear list, I have read an excel table into R using read.table() and 
headers=T. The table is 7 columns of figures. Is there any way to 
gain access to the header information? Say the table headers are 
Mon, Tue etc I can get to the data with myTable$Tue, but how do I 
get the headers themselves.

TIA
Jab

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re[2]: [R] problems with subset (misunderstanding somewhere)

2005-04-05 Thread Wladimir Eremeev
Ted,

TH> Did you test the function cubic.distance?
Yes, I did.

TH> As written, I think it
TH> will always return a single value,
Yes, here was the misunderstanding.
Subset required a vector, and I gave it a scalar.

Prof. Ripley has already shown my mistake.

--
Best regards
Wladimir Eremeev mailto:[EMAIL PROTECTED]

==
Research Scientist, PhD   Leninsky Prospect 33,
Space Monitoring & Ecoinformation Systems Sector, Moscow, Russia, 119071,
Institute of Ecology, Phone: (095) 135-9972;
Russian Academy of Sciences   Fax: (095) 135-9972

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] accessing header information in a table

2005-04-05 Thread Jabez Wilson

Dear list, I have read an excel table into R using read.table() and headers=T. 
The table is 7 columns of figures. Is there any way to gain access to the 
header information? Say the table headers are Mon, Tue etc I can get to the 
data with myTable$Tue, but how do I get the headers themselves.

TIA

Jab


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] problems with subset (misunderstanding somewhere)

2005-04-05 Thread Ted Harding
On 05-Apr-05 Wladimir Eremeev wrote:
> Dear r-help,
> 
> I have the following function defined:
> 
> cubic.distance<-function(x1,y1,z1,x2,y2,z2) {
>   max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2)))
> }
> 
> I have a data frame from which I make subsets.
> 
> When I call
>   subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2)
> I have the result with 0 rows.
> 
> However, the data frame contains the row (among others, that suit)
> tb19v tb19h tb37v
> 226.6 189.3 208.4

Did you test the function cubic.distance? As written, I think it
will always return a single value, since max() returns the maximum
of *all* the values, not by rows (even if you use cbind() rather
than c()).

If yor redefine the function as

cubic.distance<-function(x1,y1,z1,x2,y2,z2) {
  apply(cbind(abs(x1-x2),abs(y1-y2),abs(z1-z2)),1,max)
}

I think you will find it does what you want (if I have
understood your problem correctly).

Example (with the function defined as above):

 x<-cbind(rnorm(10,190,1),rnorm(10,210,1),rnorm(10,227,1))
 x<-cbind(rnorm(10,190,2),rnorm(10,210,2),rnorm(10,227,2))
 colnames(x)<-c("tb19h","tb37v","tb19v")
 x.df<-as.data.frame(x)
 (cubic.distance(x.df$tb19h,x.df$tb37v,x.df$tb19v,190,210,227)<=2)
#[1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
 subset(x.df,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2)
#      tb19h    tb37v    tb19v
#3  189.3930 211.4345 226.3436
#4  189.4521 208.8493 228.0324
#9  188.2441 210.4914 226.4521
#10 191.4781 211.5234 226.1837

With your definition, you would have got the *single*
result FALSE, since there is at least one case where
the distance > 2, so the max > 2, so the subset criterion
evaluates to FALSE, so no rows are selected.

Hoping this helps,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-05   Time: 14:29:22
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problems with subset (misunderstanding somewhere)

2005-04-05 Thread Prof Brian Ripley
On Tue, 5 Apr 2005, Wladimir Eremeev wrote:
Dear r-help,
I have the following function defined:
cubic.distance<-function(x1,y1,z1,x2,y2,z2) {
 max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2)))
}
I have a data frame from which I make subsets.
When I call
 subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2)
I have the result with 0 rows.
However, the data frame contains the row (among others, that suit)
tb19v tb19h tb37v
226.6 189.3 208.4
Call
  cubic.distance(189.3,208.4,226.6,190,210,227)
gives
[1] 1.6
Next call:
cubic.distance(189.3,208.4,226.6,190,210,227)<=2
[1] TRUE
It seems to me that I have made errors somewhere in these calls.
Could you please be so kind as to tell me where they are?
Your function finds the maximum distance over all rows (it is passed 
vectors).  Replace max() by pmax() for a logical result for each row.
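
That is, a sketch of the repaired function:

cubic.distance <- function(x1, y1, z1, x2, y2, z2)
    pmax(abs(x1 - x2), abs(y1 - y2), abs(z1 - z2))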

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] problems with subset (misunderstanding somewhere)

2005-04-05 Thread Wladimir Eremeev
Dear r-help,

I have the following function defined:

cubic.distance<-function(x1,y1,z1,x2,y2,z2) {
  max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2)))
}

I have a data frame from which I make subsets.

When I call
  subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2)
I have the result with 0 rows.

However, the data frame contains the row (among others, that suit)
tb19v tb19h tb37v
226.6 189.3 208.4

Call
   cubic.distance(189.3,208.4,226.6,190,210,227)
gives
[1] 1.6

Next call:
> cubic.distance(189.3,208.4,226.6,190,210,227)<=2
[1] TRUE

It seems to me that I have made errors somewhere in these calls.
Could you please be so kind as to tell me where they are?
Thank you.

--
Best regards
Wladimir Eremeev mailto:[EMAIL PROTECTED]

==
Research Scientist, PhD   Leninsky Prospect 33,
Space Monitoring & Ecoinformation Systems Sector, Moscow, Russia, 119071,
Institute of Ecology, Phone: (095) 135-9972;
Russian Academy of Sciences   Fax: (095) 135-9972

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] extract date

2005-04-05 Thread Gabor Grothendieck
I just started using gmail and one thing that I thought would
be annoying but sometimes is actually interesting is the
ads at the right-hand side.  They are keyed off the content
of the email and in the case of your post produced:

http://www.visibone.com/regular-expressions/?via=google120

http://www.regexpbuddy.com

The first one is advertising a javascript reference card (which
I happen to own and is excellent); but in any case, the contents 
of the regexp part of the reference card are fully reproduced on 
the web page and include dozens of examples of regexps that 
you could try.  I haven't explored the other web site.

Although I have not read it, there is a book called Mastering
Regular Expressions.

By the way, here is an alternative to calculating nd in Prof.
Ripley's post just to give you something else to play with.
I think I prefer his solution but this one is arguably a bit
simpler.  The three portions separated by the two bars
are each deleted if they are present.  gsub causes it
to repeatedly try them so that it does not stop after
deleting the first one:

nd <- gsub("Date: |.*, | ..:.*$", "", dates)
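
For example, applied to two of the sample lines from the earlier post (the
results are what I would expect, not pasted from a session):

dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600")
gsub("Date: |.*, | ..:.*$", "", dates)   # expect "21 Feb 04" "13 Feb 2004"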

On Apr 5, 2005 7:22 AM, Petr Pikal <[EMAIL PROTECTED]> wrote:
> Dear Prof.Ripley
> 
> Thank you for your answer. After some trial and error I ended up
> with a suitable extraction function, which gives me a substantial
> increase in positive answers.
> 
> Nevertheless I definitely need to gain more practice in regular
> expressions, but from the help page I can grasp only easy things. Is
> there any "Regular expressions for dummies" available?
> 
> Best regards
> Petr Pikal
> 
> On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:
> 
> > On Tue, 5 Apr 2005, Petr Pikal wrote:
> >
> > > Dear all,
> > >
> > > please, is there a way to extract a date from data
> > > that look like this:
> >
> > Yes, if you delimit all the possibilities.
> >
> > > 
> > > "Date: Sat, 21 Feb 04 10:25:43 GMT"
> > > "Date: 13 Feb 2004 13:54:22 -0600"
> > > "Date: Fri, 20 Feb 2004 17:00:48 +"
> > > "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> > > "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> > > "Date: 20 Feb 2004 02:18:58 -0600"
> > > "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> > > 
> > >
> > > I used
> > >
> > > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> > > sep="-"), format="%d-%b-%Y")
> > >
> > > which works for lines 3:5 and 7 (these are the most common in my
> > > dataset) but obviously does not work with the other lines.
> >
> > For those examples, in character vector 'dates' (without quotes):
> >
> > > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*",
> >   "\\1 \\2 \\3", dates)
> > > strptime(nd, "%d %b %y")
> > [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
> > [6] "2020-02-20" "2020-02-15"
> >
> > You should be able to amend the regexp for a wider range of forms, but
> > your first line is ambiguous (2004 or 2021?) so there are limits.
> >
> > > If there is no straightforward solution I can live with what I use now
> > > but some automagical function like
> > >
> > > give.me.date.from.my.string.regardles.of.formating(x)
> > > would be great.
> >
> > It would be impossible: when Americans write 07/04/2004 they do not
> > mean April 7th.
> >
> > --
> > Brian D. Ripley,  [EMAIL PROTECTED]
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> 
> Petr Pikal
> [EMAIL PROTECTED]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] extract date

2005-04-05 Thread Prof Brian Ripley
On Tue, 5 Apr 2005, Petr Pikal wrote:
Dear Prof.Ripley
Thank you for your answer. After some trial and error I ended up
with a suitable extraction function, which gives me a substantial
increase in positive answers.
Nevertheless I definitely need to gain more practice in regular
expressions, but from the help page I can grasp only easy things. Is
there any "Regular expressions for dummies" available?
Not that I know of.
One of my sysadmins uses an O'Reilly pocket guide for reference, and the 
O'Reilly `Mastering Regular Expressions' book is to my mind no better 
than the POSIX standards.

A quick look on Amazon suggests
`Sams Teach Yourself Regular Expressions in 10 Minutes' (Sams, ISBN 0672325667)
to be highly rated.
Best regards
Petr Pikal
On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:
On Tue, 5 Apr 2005, Petr Pikal wrote:
Dear all,
please, is there a way to extract a date from data
that look like this:
Yes, if you delimit all the possibilities.

"Date: Sat, 21 Feb 04 10:25:43 GMT"
"Date: 13 Feb 2004 13:54:22 -0600"
"Date: Fri, 20 Feb 2004 17:00:48 +"
"Date: Fri, 14 Jun 2002 16:22:27 -0400"
"Date: Wed, 18 Feb 2004 08:53:56 -0500"
"Date: 20 Feb 2004 02:18:58 -0600"
"Date: Sun, 15 Feb 2004 16:01:19 +0800"

I used
strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
sep="-"), format="%d-%b-%Y")
which works for lines 3:5 and 7 (these are the most common in my
dataset) but obviously does not work with the other lines.
For those examples, in character vector 'dates' (without quotes):
nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*",
  "\\1 \\2 \\3", dates)
strptime(nd, "%d %b %y")
[1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
[6] "2020-02-20" "2020-02-15"
You should be able to amend the regexp for a wider range of forms, but
your first line is ambiguous (2004 or 2021?) so there are limits.
If there is no straightforward solution I can live with what I use now
but some automagical function like
give.me.date.from.my.string.regardles.of.formating(x)
would be great.
It would be impossible: when Americans write 07/04/2004 they do not
mean April 7th.
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] extract date

2005-04-05 Thread Petr Pikal
Dear Prof.Ripley

Thank you for your answer. After some trial and error I ended up
with a suitable extraction function, which gives me a substantial
increase in positive answers.

Nevertheless I definitely need to gain more practice in regular 
expressions, but from the help page I can grasp only easy things. Is 
there any "Regular expressions for dummies" available?

Best regards
Petr Pikal


On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:

> On Tue, 5 Apr 2005, Petr Pikal wrote:
> 
> > Dear all,
> >
> > please, is there a way to extract a date from data
> > that look like this:
> 
> Yes, if you delimit all the possibilities.
> 
> > 
> > "Date: Sat, 21 Feb 04 10:25:43 GMT"
> > "Date: 13 Feb 2004 13:54:22 -0600"
> > "Date: Fri, 20 Feb 2004 17:00:48 +"
> > "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> > "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> > "Date: 20 Feb 2004 02:18:58 -0600"
> > "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> > 
> >
> > I used
> >
> > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> > sep="-"), format="%d-%b-%Y")
> >
> > which works for lines 3:5 and 7 (these are the most common in my
> > dataset) but obviously does not work with the other lines.
> 
> For those examples, in character vector 'dates' (without quotes):
> 
> > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*",
>   "\\1 \\2 \\3", dates)
> > strptime(nd, "%d %b %y")
> [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
> [6] "2020-02-20" "2020-02-15"
> 
> You should be able to amend the regexp for a wider range of forms, but
> your first line is ambiguous (2004 or 2021?) so there are limits.
> 
> > If there is no straightforward solution I can live with what I use now
> > but some automagical function like
> >
> > give.me.date.from.my.string.regardles.of.formating(x)
> > would be great.
> 
> It would be impossible: when Americans write 07/04/2004 they do not
> mean April 7th.
> 
> -- 
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Stats Question: Single data item versus Sample from Normal Distribution

2005-04-05 Thread Ted Harding
On 05-Apr-05 Ross Clement wrote:
> Hi. I have a question that I have asked in other stat forums
> but do not yet have an answer for. I would like to know if
> there is some way in R or otherwise of performing the following
> hypothesis test.
> 
> I have a single data item x. The null hypothesis is that x
> was selected from a normal distribution N(mu,sigma). The
> alternate hypothesis is that x does not come from this
> distribution.
> 
> However, I do not know the values of mu and sigma. I have a
> sample of size N from which I can estimate mu and sigma.
> So, say that I have N(m,s,N), and x. I would like to say with
> some certainty (e.g. 95%) that I can, or can't reject the
> hypothesis that x came from N(mu,sigma). I would also like a
> power test to say how large N should be given the degree of
> accuracy I need when accepting or rejecting individual x
> values.
> 
> What is the name of the hypothesis test I need for this?
> Is it built into R, or are there packages I could use?

There is no name because there is no unique test.

The difficulty lies in your statement of alternative hypothesis:
"that x does not come from this distribution."

This allows any distribution whatever to be a possible source
of your single observation x. Therefore, whatever the value
of x, you can reject the null hypothesis that it comes from
any N(mu,sigma^2) that is remotely compatible with your N data,
in favour of some distribution that happens to predict with
near-certainty that you will get that particular observation x.

On that basis, for instance, suppose you had m=1.1 and s=2.5
say. And suppose x=1.15 which is very close to m with a
difference which is much smaller than s. You are still
entitled to reject H0 on the basis that your alternative
allows you to postulate N(1.15,0.0001) as the source
of the observation x.

What you need to do is to make clear what feature of the
value of x, in relation to any given Normal distribution,
would constitute an indication that it was not sampled
from that distribution.

If (as I surmise) this is simply "distance from mu" [the
true mean of the Normal distribution], so that you are
basically testing whether x is an "outlier", then you
could use the simple fact that the distribution of

   ((x - m) / s) * (N / (N + 1))^0.5

has a t distribution with (N-1) degrees of freedom.

This, if you have to give it a name, would be a "t" test
since that is all it depends on.

Note, however, that this pre-supposes that the variance
of the distribution from which x was sampled is the
same as the variance of the distribution giving your N
values, and also that both distributions are Normal,
differing therefore only in their means. So this is a
tight restriction of your original universal class of
alternatives.
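
In R the whole test is only a few lines; here is a minimal sketch
on simulated data (the sample size, the sample and x are all
invented for illustration):

N <- 50
y <- rnorm(N, mean = 1.1, sd = 2.5)        # the reference sample
m <- mean(y); s <- sd(y)
x <- 6.0                                   # the single new observation
tstat <- ((x - m) / s) * (N / (N + 1))^0.5
2 * pt(abs(tstat), df = N - 1, lower.tail = FALSE)  # two-sided p-value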

Best wishes,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-05   Time: 11:35:01
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Stats Question: Single data item versus Sample from Normal Distribution

2005-04-05 Thread Liaw, Andy
> From: Liaw, Andy 
> 
> > From: Liaw, Andy 
> > 
> > Here's one possibility, assuming muhat and sigmahat are 
> > estimates of mu and sigma from N iid draws of N(mu, sigma^2):
> > 
> > tStat <- abs(x - muhat) / sigmahat
> > pValue <- pt(tStat, df=N, lower=TRUE)
> 
> Oops...  That should be:
> 
> pValue <- pt(tStat, df=N, lower=TRUE) / 2

I'm going mad!  That should be:

pValue <- pt(tStat, df=N, lower=FALSE) / 2

Sorry for wasting bandwidth...

Andy

> Andy
> 
> > 
> > I'm not quite sure what df tStat should have (exercise for 
> > math stat), but given fairly large N, that should make little 
> > difference.
> > 
> > Andy
> > 
> > > From: Ross Clement
> > > 
> > > Hi. I have a question that I have asked in other stat forums 
> > > but do not
> > > yet have an answer for. I would like to know if there is 
> > some way in R
> > > or otherwise of performing the following hypothesis test.
> > > 
> > > I have a single data item x. The null hypothesis is that x 
> > > was selected
> > > from a normal distribution N(mu,sigma). The alternate 
> > > hypothesis is that
> > > x does not come from this distribution.
> > > 
> > > However, I do not know the values of mu and sigma. I have a 
> > sample of
> > > size N from which I can estimate mu and sigma. So, say that I have
> > > N(m,s,N), and x. I would like to say with some certainty 
> > > (e.g. 95%) that
> > > I can, or can't reject the hypothesis that x came from 
> > N(mu,sigma). I
> > > would also like a power test to say how large N should be 
> given the
> > > degree of accuracy I need when accepting or rejecting individual x
> > > values.
> > > 
> > > What is the name of the hypothesis test I need for this? 
> Is it built
> > > into R, or are there packages I could use?
> > > 
> > > Thanks in anticipation,
> > > 
> > > Ross Clement.
> > > 
> > > __
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! 
> > > http://www.R-project.org/posting-guide.html
> > > 
> > > 
> > > 
> > 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Stats Question: Single data item versus Sample from Normal Distribution

2005-04-05 Thread Liaw, Andy
> From: Liaw, Andy 
> 
> Here's one possibility, assuming muhat and sigmahat are 
> estimates of mu and sigma from N iid draws of N(mu, sigma^2):
> 
> tStat <- abs(x - muhat) / sigmahat
> pValue <- pt(tStat, df=N, lower=TRUE)

Oops...  That should be:

pValue <- pt(tStat, df=N, lower=TRUE) / 2

Andy

> 
> I'm not quite sure what df tStat should have (exercise for 
> math stat), but given fairly large N, that should make little 
> difference.
> 
> Andy
> 
> > From: Ross Clement
> > 
> > Hi. I have a question that I have asked in other stat forums 
> > but do not
> > yet have an answer for. I would like to know if there is 
> some way in R
> > or otherwise of performing the following hypothesis test.
> > 
> > I have a single data item x. The null hypothesis is that x 
> > was selected
> > from a normal distribution N(mu,sigma). The alternate 
> > hypothesis is that
> > x does not come from this distribution.
> > 
> > However, I do not know the values of mu and sigma. I have a 
> sample of
> > size N from which I can estimate mu and sigma. So, say that I have
> > N(m,s,N), and x. I would like to say with some certainty 
> > (e.g. 95%) that
> > I can, or can't reject the hypothesis that x came from 
> N(mu,sigma). I
> > would also like a power test to say how large N should be given the
> > degree of accuracy I need when accepting or rejecting individual x
> > values.
> > 
> > What is the name of the hypothesis test I need for this? Is it built
> > into R, or are there packages I could use?
> > 
> > Thanks in anticipation,
> > 
> > Ross Clement.
> > 
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> > 
> > 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Stats Question: Single data item versus Sample from Normal Distribution

2005-04-05 Thread Liaw, Andy
Here's one possibility, assuming muhat and sigmahat are estimates of mu and
sigma from N iid draws of N(mu, sigma^2):

tStat <- abs(x - muhat) / sigmahat
pValue <- pt(tStat, df=N, lower=TRUE)

I'm not quite sure what df tStat should have (exercise for math stat), but
given fairly large N, that should make little difference.

Andy

> From: Ross Clement
> 
> Hi. I have a question that I have asked in other stat forums 
> but do not
> yet have an answer for. I would like to know if there is some way in R
> or otherwise of performing the following hypothesis test.
> 
> I have a single data item x. The null hypothesis is that x 
> was selected
> from a normal distribution N(mu,sigma). The alternate 
> hypothesis is that
> x does not come from this distribution.
> 
> However, I do not know the values of mu and sigma. I have a sample of
> size N from which I can estimate mu and sigma. So, say that I have
> N(m,s,N), and x. I would like to say with some certainty 
> (e.g. 95%) that
> I can, or can't reject the hypothesis that x came from N(mu,sigma). I
> would also like a power test to say how large N should be given the
> degree of accuracy I need when accepting or rejecting individual x
> values.
> 
> What is the name of the hypothesis test I need for this? Is it built
> into R, or are there packages I could use?
> 
> Thanks in anticipation,
> 
> Ross Clement.
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] build failed of package

2005-04-05 Thread Lars Schouw
Dear Professor Ripley

The good news is that I got PVM up and running on two
Windows nodes now. I had to connect them to each
other manually, for now not using rsh or ssh.

Building rpvm for Windows might not be as easy as
it sounds. Has anyone tried this out successfully before?

I also looked at the snow package; that did not look so bad.

Regards
Lars
--- Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> On Thu, 24 Mar 2005, A.J. Rossini wrote:
> 
> > Looks like you are trying to install source
> tarball on Windows without
> > the relevant toolset (compiler, etc)?
> 
> To save further hassle, rpvm is not going to build
> on Windows 
> unless you have PVM installed and working on
> Windows.
> 
> If that is the case, this looks like the use of the
> wrong make, with the 
> wrong shell (that message is coming from a Windows
> shell, not sh.exe). 
> Do see the warnings in README.packages about the
> MinGW make.
> 
> > On Thu, 24 Mar 2005 00:11:34 -0800 (PST), Lars
> Schouw
> > <[EMAIL PROTECTED]> wrote:
> >> I am trying to install the rpvm package doing
> this:
> >>
> >> C:\R\rw2000\bin>rcmd install rpvm_0.6-2.tar.gz
> >>
> >> '.' is not recognized as an internal or external
> >> command,
> >> operable program or batch file.
> >> '.' is not recognized as an internal or external
> >> command,
> >> operable program or batch file.
> >> make: *** /rpvm: No such file or directory. 
> Stop.
> >> make: *** [pkg-rpvm] Error 2
> >> *** Installation of rpvm failed ***
> >>
> >> Removing 'C:/R/rw2000/library/rpvm'
> >>
> >> What does this error message tell me?
> 
> 
> -- 
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Stats Question: Single data item versus Sample from Normal Distribution

2005-04-05 Thread Ross Clement
Hi. I have a question that I have asked in other stat forums but do not
yet have an answer for. I would like to know if there is some way in R
or otherwise of performing the following hypothesis test.

I have a single data item x. The null hypothesis is that x was selected
from a normal distribution N(mu,sigma). The alternate hypothesis is that
x does not come from this distribution.

However, I do not know the values of mu and sigma. I have a sample of
size N from which I can estimate mu and sigma. So, say that I have
N(m,s,N), and x. I would like to say with some certainty (e.g. 95%) that
I can, or can't reject the hypothesis that x came from N(mu,sigma). I
would also like a power test to say how large N should be given the
degree of accuracy I need when accepting or rejecting individual x
values.

What is the name of the hypothesis test I need for this? Is it built
into R, or are there packages I could use?

Thanks in anticipation,

Ross Clement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] extract date

2005-04-05 Thread Prof Brian Ripley
On Tue, 5 Apr 2005, Petr Pikal wrote:
Dear all,
please, is there a way to extract a date from data
that look like this:
Yes, if you delimit all the possibilities.

"Date: Sat, 21 Feb 04 10:25:43 GMT"
"Date: 13 Feb 2004 13:54:22 -0600"
"Date: Fri, 20 Feb 2004 17:00:48 +"
"Date: Fri, 14 Jun 2002 16:22:27 -0400"
"Date: Wed, 18 Feb 2004 08:53:56 -0500"
"Date: 20 Feb 2004 02:18:58 -0600"
"Date: Sun, 15 Feb 2004 16:01:19 +0800"

I used
strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
sep="-"), format="%d-%b-%Y")
which works for lines 3:5 and 7 (these are the most common in my
dataset) but obviously does not work with the other lines.
For those examples, in character vector 'dates' (without quotes):
nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*",
 "\\1 \\2 \\3", dates)
strptime(nd, "%d %b %y")
[1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
[6] "2020-02-20" "2020-02-15"
You should be able to amend the regexp for a wider range of forms, but 
your first line is ambiguous (2004 or 2021?) so there are limits.

If there is no straightforward solution I can live with what I use now but
some automagical function like

give.me.date.from.my.string.regardles.of.formating(x)
would be great.
It would be impossible: when Americans write 07/04/2004 they do not mean 
April 7th.
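
If, for this particular sample, you are willing to assume that every
bare two-digit year means 20xx (an assumption, not something the
strings themselves can tell you), one option is to pad those years
before parsing:

dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600")
nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates)
nd <- sub(" ([0-9][0-9])$", " 20\\1", nd)   # pad bare 2-digit years to 20xx
as.Date(nd, format = "%d %b %Y")            # %b needs an English locale here
# [1] "2004-02-21" "2004-02-13"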

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] extract date

2005-04-05 Thread Petr Pikal
Dear all, 

please, is there a way to extract a date from data
that look like this:


"Date: Sat, 21 Feb 04 10:25:43 GMT"
"Date: 13 Feb 2004 13:54:22 -0600" 
"Date: Fri, 20 Feb 2004 17:00:48 +"
"Date: Fri, 14 Jun 2002 16:22:27 -0400"
"Date: Wed, 18 Feb 2004 08:53:56 -0500"
"Date: 20 Feb 2004 02:18:58 -0600" 
"Date: Sun, 15 Feb 2004 16:01:19 +0800"


I used 

strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), 
sep="-"), format="%d-%b-%Y")

which works for lines 3:5 and 7 (these are the most common in my
dataset) but obviously does not work with the other lines. If there is no
straightforward solution I can live with what I use now but some
automagical function like

give.me.date.from.my.string.regardles.of.formating(x)

would be great.

Thank you.

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] locfit and memory allocation

2005-04-05 Thread Prof Brian Ripley
Calling gc() before starting a memory-intensive task is normally a good 
idea, as it helps avoid memory fragmentation (which is possibly a problem 
in a 32-bit OS, but you did not say).  R 2.1.0 beta has some dodges to
help, so you may find it helpful to try that out.
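
A minimal sketch of that advice (the fitting function below is a
stand-in invented for illustration, not the actual locfit call):

heavy.fit <- function(i) sum(rnorm(1e6))   # placeholder for a memory-hungry fit
results <- numeric(200)
for (i in 1:200) {
  gc()                                     # collect first: large allocations face less fragmentation
  results[i] <- heavy.fit(i)
}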

On Mon, 4 Apr 2005, Mike Hickerson wrote:
Hello
I am getting memory allocation errors when running a function that uses
locfit within a for loop.  After 25 or so loops, it gives this error.
"Error: cannot allocate vector of size 281250 Kb"
I am running on a Linux cluster with a Gb of RAM.  The problem never
happens on my OS X machine (which has less memory).  The total data set
is 130 columns by 5000 rows: the first 129 columns are response
variables and the 130th is the parameter.  The function fits a local
regression of the 129 variables in the ith row of m[] against the 129
variables in the 5000 rows, after m was split into 130 vectors called
Var1, ..., Var129, and PARAMETER.
array <- scan(("DataFile"),nlines=5000)
 m<-matrix(array,ncol=130,byrow=T)
for (i in 1:200)
{
result<-
function(m[i,c(1,,129)],PARAMETER,cbind(Var1,...,Var129)seq(1,len=50
00),F)
}
Any ideas on how to avoid this memory allocation problem would be
greatly appreciated.  Garbage collection? (or is that too slow?)
Many Thanks in Advance!
Mike

Mike Hickerson
University of California
Museum of Vertebrate Zoology
3101 Valley Life Sciences Building
Berkeley, California  94720-3160  USA
voice 510-642-8911
cell: 510-701-0861
fax 510-643-8238
[EMAIL PROTECTED]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

RE: [R] Amount of memory under different OS

2005-04-05 Thread Prof Brian Ripley
On Mon, 4 Apr 2005, bogdan romocea wrote:
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use
more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of
RAM, the seller will probably warn you about Windows and recommend a
suitable OS.
There _are_ versions of Windows 2000 and later which can support more than 
4Gb (and I hope if you buy a box with more memory you would be sold the 
appropriate version).  If you have a 32-bit address space you cannot have 
more than 4Gb of addresses, physical or virtual.  So some modern `32-bit' 
chips have segmented addressing: see e.g.

http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx
(where PAE is the scheme used on IA-32 aka ix86 chips).  AFAIK the story 
is the same for most IA-32 OSes: each process gets a 4Gb address space and 
(usually) 3Gb of that is user space.  R's memory usage will get cramped 
when the address space is tight: you probably do not want to be using 1Gb 
objects in a 3Gb address space (and if you need to, try R-2.1.0 beta).
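
On a Windows build of R you can also query, and within what the OS
will grant, raise R's own cap; a small illustration (memory.limit()
is a Windows-only function):

memory.limit()              # current cap for this session, in Mb
memory.limit(size = 3000)   # request ~3Gb; honoured only if the OS can supply it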

The question did not mention R, and `very tough' is not at all explicit.
For large R tasks we have been recommending 64-bit OSes for quite a long 
time.  There are 64-bit versions of Windows for AMD64/EM64T chips, but
not versions of the compilers used to build R for Windows.  There are many 
users here of R under Linux (usually Fedora Core 3 or SuSE 9.x) on such 
chips.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 02, 2005 12:48 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Amount of memory under different OS
I have a problem: I need to perform a very tough analysis, so I would 
like to buy a new computer with about 16 GB of RAM. Is it possible to 
use all this memory under Windows, or do I have to install another OS?

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Principal Component Analysis in R

2005-04-05 Thread F.Tusell
I think the function that does the printing of the loadings assumes
that eigenvectors are normalized to the corresponding eigenvalue, rather
than unity, and the output it produces is incorrect.
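
One way to check the scaling for yourself on a stock dataset
(princomp() itself returns unit-length loadings, so the squared
columns below sum to one):

pc <- princomp(USArrests, cor = TRUE)
colSums(unclass(pc$loadings)^2)   # all 1: columns are unit-length eigenvectors
# Comp.1 Comp.2 Comp.3 Comp.4
#      1      1      1      1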

ft.
-- 
Fernando TUSELL                             e-mail:
Departamento de Econometría y Estadística   [EMAIL PROTECTED] 
Facultad de CC.EE. y Empresariales Tel:   (+34)94.601.3733
Avenida Lendakari Aguirre, 83  Fax:   (+34)94.601.3754
E-48015 BILBAO  (Spain)Secr:  (+34)94.601.3740

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html