Re: [R] two methods for regression, two different results
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
> Please forgive a straight stats question, and the informal notation.
>
> Let us say we wish to perform a linear regression:
> y = b0 + b1*x + b2*z
>
> There are two ways this can be done, the usual way, as a single
> regression,
> fit1 <- lm(y~x+z)
> or by doing two regressions. In the first regression we could have y as
> the dependent variable and x as the independent variable
> fit2 <- lm(y~x).
> The second regression would be a regression in which the residuals from
> the first regression would be the dependent variable, and the
> independent variable would be z.
> fit2 <- lm(fit2$residuals~z)
>
> I would think the two methods would give the same p value and the same
> beta coefficient for z. They don't. Can someone help me understand why
> the two methods do not give the same results. Additionally, could
> someone tell me when one method might be better than the other, i.e.
> what question does the first method answer, and what question does the
> second method answer. I have searched a number of textbooks and have not
> found this question addressed.

John,

Bill Venables already told you that they don't do that, because they are not orthogonal. Here is a simpler way of getting the same result as he suggested for the coefficients of z (but only for z):

> x <- runif(100)
> z <- x + rnorm(100, sd=0.4)
> y <- 3 + x + z + rnorm(100, sd=0.3)
> mod <- lm(y ~ x + z)
> mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
> summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

> summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070  -2.592   0.0110 *
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, although you already removed the effect of x... In the orthogonal case the coefficient for x would be 0. Residuals are equal in these two models:

> range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, fitted values are not equal, since you fit mod2 to the residuals after removing the effect of x...

cheers, jari oksanen
--
Jari Oksanen <[EMAIL PROTECTED]>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
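Jari's remark about orthogonality is easy to check directly. A minimal base-R sketch (freshly simulated data, not the numbers from the thread): when z is constructed to be orthogonal to the intercept and to x, the two-step approach does recover the same point estimate for z, though standard errors still differ slightly because the residual degrees of freedom differ.

```r
# Construct z exactly orthogonal to the intercept and to x, then
# compare the one-shot fit with the two-step fit.
set.seed(1)
x <- rnorm(100)
z <- residuals(lm(rnorm(100) ~ x))   # orthogonal to 1 and x by construction
y <- 3 + x + z + rnorm(100, sd = 0.3)

full     <- lm(y ~ x + z)
two.step <- lm(residuals(lm(y ~ x)) ~ z)

coef(full)["z"]       # same point estimate ...
coef(two.step)["z"]   # ... in the orthogonal case
```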
Re: [R] French Curve
> "Michael" == Michael A Miller <[EMAIL PROTECTED]> > on Tue, 05 Apr 2005 10:28:21 -0500 writes: > "dream" == dream home <[EMAIL PROTECTED]> writes: >> Does it sound like spline work do the job? It would be >> hard to persuave him to use some modern math technique >> but he did ask me to help him implement the French Curve >> so he can do his work in Excel, rather than PAPER. Michael> Splines are useful for interpolating points with a Michael> continuous curve that passes through, or near, the Michael> points. not only! (see below) Michael> If you are looking for a way to estimate a Michael> curve with a noise component removed, I think you'd Michael> be better off filtering your data, rather than Michael> interpolating with a spline. yes for "rather than interpolating" no for "with a spline" There's the smooth.spline() *smoothing* spline function (with predict() methods, even for 1st and 2nd derivatives) which is liked by `many' and even prefered to other ``filters'' for diverse reasons, notably for the fact that spline smoothing corresponds to linear "filtering" with a curvature-adaptive varying bandwith. Michael> Median (or mean) filtering may give results Michael> similar to those from your chemist's manual method. Michael> That is easy to do with running from the gtools Michael> package. The validity of this is another question! Median filtering aka "running medians" has one distinctive advantage {over smooth.spline() or other so called linear smoothers}: It is "robust" i.e. not distorted by gross outliers. Running medians is implemented in runmed() {standard "stats" package} in a particularly optimized way rather than using the more general running(.) approach of package 'gtools'. Martin Maechler, ETH Zurich __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RE: ïdo0ïi4grjj40j09gjijgpïdï
Privitak je zaražen virusom. Kontaktirajte sistem administratora.

[An attachment is infected by a virus. Contact the system administrator.]
[R] [R-pkgs] Version 0.93 of GAM package on CRAN
I have posted an update to the GAM package. Note that this package implements gam() as described in the "White" S book (Statistical Models in S). In particular, you can fit models with lo() terms (local regression) and/or s() terms (smoothing splines), mixed in, of course, with any terms appropriate for glms.

A number of bugs in version 0.92 have been fixed; notably:

1) some problems with predict and newdata;

2) plot.gam now works with any model for which predict(..., type="terms") is appropriate (well, at least several). Examples are lm, glm, gam and coxph models. So for example, if you have fit a Cox model

cox1 <- coxph(Surv(Survival, death) ~ Grade + ns(Age, 4) + ns(Size, 4))

then

plot.gam(cox1, se=T)

will produce three plots, one for each term in the model, with standard error bands;

3) I have implemented the fast versions of backfitting for models consisting of all local regression terms (lo.wam) or all smoothing spline terms (s.wam).

Please let me know of any problems with the gam package.

Trevor Hastie
---
Trevor Hastie  [EMAIL PROTECTED]
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231 (Statistics)  Fax: (650) 725-8977
(650) 498-5233 (Biostatistics)  Fax: (650) 725-6951
URL: http://www-stat.stanford.edu/~hastie
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065
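A minimal sketch of the mixed-terms interface the announcement describes, on simulated data (this assumes the gam package is installed from CRAN; the data and variable names are made up for illustration):

```r
library(gam)   # assumed installed from CRAN

set.seed(1)
d <- data.frame(x = runif(200), z = runif(200))
d$y <- sin(2 * pi * d$x) + d$z^2 + rnorm(200, sd = 0.2)

# A lo() local-regression term and an s() smoothing-spline term, mixed
fit <- gam(y ~ lo(x) + s(z, 4), data = d)
summary(fit)
plot(fit, se = TRUE)   # one panel per smooth term, with standard-error bands
```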
RE: [R] two methods for regression, two different results
This is possible if x and z are orthogonal, but in general it doesn't work, as you have noted. (If it did work, it would almost amount to a way of inverting general square matrices by working one row at a time, no going back...)

It is possible to fit a bivariate regression using simple linear regression techniques iteratively like this, but it is a bit more involved than your two-step process.

1. Regress y on x and take the residuals: ryx <- resid(lm(y ~ x))
2. Regress z on x and take the residuals: rzx <- resid(lm(z ~ x))
3. Regress ryx on rzx: fitz <- lm(ryx ~ rzx)
4. This gives you the estimate of the coefficient on z (what you call below b2): b2 <- coef(fitz)[2]
5. Regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.

None of this works with significances, though, because in going about it this way you have essentially disguised the degrees of freedom involved. So you can get the right estimates, but the standard errors, t-statistics and residual variances are all somewhat inaccurate (though usually not by much).

If x and z are orthogonal, the (curious-looking) step 2 is not needed.

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm) for fitting very large regressions essentially by iterative processes, to avoid constructing a huge model matrix.

Bill Venables

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help@stat.math.ethz.ch
Subject: [R] two methods for regression, two different results
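Put together, the five steps read as follows (a sketch with simulated data in the style of the thread; all.equal() confirms that step 4 reproduces the z coefficient of the one-shot fit):

```r
set.seed(42)
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

ryx  <- resid(lm(y ~ x))            # 1. residuals of y on x
rzx  <- resid(lm(z ~ x))            # 2. residuals of z on x
fitz <- lm(ryx ~ rzx)               # 3. regress ryx on rzx
b2   <- coef(fitz)[2]               # 4. estimate of the coefficient on z
fitx <- lm(I(y - b2 * z) ~ x)       # 5. recovers b0 and b1

all.equal(unname(b2), unname(coef(lm(y ~ x + z))["z"]))   # TRUE
```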
[R] two methods for regression, two different results
Please forgive a straight stats question, and the informal notation.

Let us say we wish to perform a linear regression:

y = b0 + b1*x + b2*z

There are two ways this can be done: the usual way, as a single regression,

fit1 <- lm(y~x+z)

or by doing two regressions. In the first regression we could have y as the dependent variable and x as the independent variable:

fit2 <- lm(y~x)

The second regression would be a regression in which the residuals from the first regression would be the dependent variable, and the independent variable would be z:

fit2 <- lm(fit2$residuals~z)

I would think the two methods would give the same p value and the same beta coefficient for z. They don't. Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed.

Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
410-605-7119
-- NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED]
[R] nonlinear equation system
Dear all,

Are there any functions which can solve a system of nonlinear equations? My system is R 2.0.1 + Debian. Thank you in advance.

--
Yours Sincerely,
Shusong Jin
===
Add: Meng Wah Complex, RM518    Email: [EMAIL PROTECTED]
Dept. of Statistics             Tel: (+852) 28597942
and Actuarial Science           Fax: (+852) 28589041
The Univ. of Hong Kong
Pokfulam Road, Hong Kong
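Base R's uniroot() handles only one equation in one unknown; a common workaround for a small system is to minimize the sum of squared equation residuals with a general-purpose optimizer such as optim(). A base-R sketch on a made-up two-equation system (the equations are illustrative, not from the thread):

```r
# Solve f1(x, y) = 0 and f2(x, y) = 0 by least squares.
f <- function(p) {
  x <- p[1]; y <- p[2]
  c(x^2 + y^2 - 4,       # f1: a circle of radius 2
    exp(x) + y - 1)      # f2
}
obj <- function(p) sum(f(p)^2)

sol <- optim(c(1, 1), obj)
sol$par      # an approximate root
f(sol$par)   # both components should be near 0
```

Note that a minimizer of obj is a root of the system only if the minimum reached is (near) zero; a clearly positive minimum means the optimizer stalled in a local minimum and a different starting point is needed.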
Re: [R] How to do aggregate operations with non-scalar functions
On Apr 5, 2005 6:59 PM, Itay Furman <[EMAIL PROTECTED]> wrote: > > Hi, > > I have a data set, the structure of which is something like this: > > > a <- rep(c("a", "b"), c(6,6)) > > x <- rep(c("x", "y", "z"), c(4,4,4)) > > df <- data.frame(a=a, x=x, r=rnorm(12)) > > The true data set has >1 million rows. The factors "a" and "x" > have about 70 levels each; combined together they subset 'df' > into ~900 data frames. > For each such subset I'd like to compute various statistics > including quantiles, but I can't find an efficient way of > doing this. Aggregate() gives me the desired structure - > namely, one row per subset - but I can use it only to compute > a single quantile. > > > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25) > a x x > 1 a x 0.1693188 > 2 a y 0.1566322 > 3 b y -0.2677410 > 4 b z -0.6505710 > > With by() I could compute several quantiles per subset at > each shot, but the structure of the output is not > convenient for further analysis and visualization. > > > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25)) > a: a > x: x > 0%25% > -0.7727268 0.1693188 > -- > a: b > x: x > NULL > -- > > [snip] > > I would like to end up with a data frame like this: > > a x 0%25% > 1 a x -0.7727268 0.1693188 > 2 a y -0.3410671 0.1566322 > 3 b y -0.2914710 -0.2677410 > 4 b z -0.8502875 -0.6505710 > > I checked sweep() and apply() and didn't see how to harness > them for that purpose. > > So, is there a simple way to convert the object returned > by by() into a data.frame? > Or, is there a better way to go with this? > Finally, if I should roll my own coercion function: any tips? > One can use do.call("rbind", by(df, list(a = a, x = x), f)) where f is the appropriate function. 
In this case f can be described in terms of df.quantile, which is like quantile except it returns a one-row data frame:

df.quantile <- function(x, p)
  as.data.frame(t(data.matrix(quantile(x, p))))
f <- function(df, p = c(0.25, 0.5))
  cbind(df[1, 1:2], df.quantile(df[, "r"], p))
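Assembled into a runnable whole (using the same simulated data as the original question; combinations of a and x that do not occur yield NULL from by() and are simply skipped by rbind):

```r
set.seed(1)
a  <- rep(c("a", "b"), c(6, 6))
x  <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

df.quantile <- function(x, p)
  as.data.frame(t(data.matrix(quantile(x, p))))
f <- function(df, p = c(0, 0.25))
  cbind(df[1, 1:2], df.quantile(df[, "r"], p))

out <- do.call("rbind", by(df, list(a = a, x = x), f))
out   # one row per observed (a, x) combination, one column per quantile
```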
Re: [R] nlme & SASmixed in 2.0.1
On Tuesday 05 April 2005 18:40, Murray Jorgensen wrote:
> I assigned a class the first problem in Pinheiro & Bates, which uses
> the data set PBIB from the SASmixed package. I have recently
> downloaded 2.0.1 and its associated packages. On trying
>
> library(SASmixed)
> data(PBIB)
> library(nlme)
> plot(PBIB)
>
> I get a warning message
> Warning message:
> replacing previous import: coef in: namespaceImportFrom(self,
> asNamespace(ns))
>
> after library(nlme) and a "pairs" type plot of PBIB.

SASmixed now depends on lme4, which conflicted (until recently) with nlme. If you had your calls to library(SASmixed) and library(nlme) reversed, you probably would have gotten a warning. I think the simplest thing you can do is not to load SASmixed at all. Instead, use

data(PBIB, package = "SASmixed")

However, these datasets are (for all practical purposes) vanilla data frames, and you won't get trellis-style plots unless you create nlme groupedData objects yourself. (You should get them if you load the latticeExtra package manually, and then use 'gplot' instead of 'plot' to plot them.)

Deepayan

> Pressing on I get:
> > lme1 <- lme(response ~ Treatment, data=PBIB, random =~1| Block)
> > summary(lme1)
> Error: No slot of name "rep" for this object of class "lme"
> Error in deviance([EMAIL PROTECTED], REML = REML) :
>         Unable to find the argument "object" in selecting a method
>         for function "deviance"
>
> Everything works fine under 1.8.1 and plot(PBIB) is of trellis style,
> which is what I think the authors intend.
>
> Cheers, Murray Jorgensen
Re: [R] How to do aggregate operations with non-scalar functions
Hi Itay,

Not sure if by() can do it directly, but this does it from first principles, using lapply() and tapply() (which aggregate() uses internally). It would be reasonably straightforward to wrap this up in a function.

a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))

## Probabilities for quantile
p <- c(.25, .5, .75)

## This does the hard work of calculating the statistics over your
## combinations, and over the values in `p'
y <- lapply(p, function(y) tapply(df$r, list(a=a, x=x), quantile, probs=y))

## Then, we need to work out what combinations of a & x are possible:
## these are the header columns. aggregate() does this in a much more
## complicated way, which may handle more difficult cases than this
## (e.g. if there are lots of missing values, or something).
vars <- expand.grid(dimnames(y[[1]]))

## Finish up by converting `y' into a true data.frame, and omitting
## all the cases where all the values in `y' are NA: these are
## combinations of a and x that we did not encounter.
y <- as.data.frame(lapply(y, as.vector))
names(y) <- paste(p, "%", sep="")
i <- colSums(apply(y, 1, is.na)) != ncol(y)
y <- cbind(vars, y)[i,]

Cheers, Rich

On Apr 6, 2005 10:59 AM, Itay Furman <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a data set, the structure of which is something like this:
>
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
>
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this. Aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x          x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
>
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
>
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25%
> -0.7727268  0.1693188
> --
> a: b
> x: x
> NULL
> --
>
> [snip]
>
> I would like to end up with a data frame like this:
>
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
>
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
>
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
>
> Thank you very much in advance,
> Itay
>
> [EMAIL PROTECTED] / +1 (206) 543 9040 / U of Washington

--
Rich FitzJohn
rich.fitzjohn gmail.com | http://homepages.paradise.net.nz/richa183
You are in a maze of twisty little functions, all alike
[R] Regression Modeling Strategies Workshop by Frank Harrell in Southern California
Dr. Frank E. Harrell, Jr., Professor and Chair of the Department of Biostatistics at Vanderbilt University, is giving a one-day workshop on Regression Modeling Strategies on Friday, April 29, 2005. Analyses of the example datasets use R/S-Plus and make extensive use of the Hmisc library written by Professor Harrell. The workshop is sponsored by the Southern California Chapter of the American Statistical Association (SCASA) and will be held on the campus of California State University, Long Beach, about 30 miles south of Los Angeles.

The workshop is open to anyone with an interest in the topic, and SCASA is charging $60 if registering by April 18th and $65 after the 18th. The registration fee covers all handouts, coffee and tea all day, a buffet lunch, and door prizes. Springer-Verlag will have copies of the book Regression Modeling Strategies on sale at a 33% discount.

The workshop's web site is: http://www.stat.ucla.edu/~rgould/asw2005/. For further information contact Anita Iannucci ([EMAIL PROTECTED]), Harold Dyck ([EMAIL PROTECTED]) or Madeline Bauer ([EMAIL PROTECTED]). SCASA web site: http://www.sc-asa.org/

===
Madeline Bauer, Ph.D.
University of Southern California Keck School of Medicine (Infectious Diseases)
IRD Room 620 (MC9520), 2020 Zonal Ave, Los Angeles 90033
(323) 226-2775 [Voice & FAX]
[EMAIL PROTECTED] [Fastest communication method]
[R] nlme & SASmixed in 2.0.1
I assigned a class the first problem in Pinheiro & Bates, which uses the data set PBIB from the SASmixed package. I have recently downloaded 2.0.1 and its associated packages. On trying

library(SASmixed)
data(PBIB)
library(nlme)
plot(PBIB)

I get a warning message

Warning message:
replacing previous import: coef in: namespaceImportFrom(self, asNamespace(ns))

after library(nlme) and a "pairs" type plot of PBIB. Pressing on I get:

> lme1 <- lme(response ~ Treatment, data=PBIB, random =~1| Block)
> summary(lme1)
Error: No slot of name "rep" for this object of class "lme"
Error in deviance([EMAIL PROTECTED], REML = REML) :
        Unable to find the argument "object" in selecting a method
        for function "deviance"

Everything works fine under 1.8.1, and plot(PBIB) is of trellis style, which is what I think the authors intend.

Cheers, Murray Jorgensen
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]  Fax 7 838 4155
Phone +64 7 838 4773 wk  Home +64 7 825 0441  Mobile 021 1395 862
[R] How to do aggregate operations with non-scalar functions
Hi,

I have a data set, the structure of which is something like this:

a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))

The true data set has >1 million rows. The factors "a" and "x" have about 70 levels each; combined together they subset 'df' into ~900 data frames. For each such subset I'd like to compute various statistics including quantiles, but I can't find an efficient way of doing this. Aggregate() gives me the desired structure - namely, one row per subset - but I can use it only to compute a single quantile.

aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
  a x          x
1 a x  0.1693188
2 a y  0.1566322
3 b y -0.2677410
4 b z -0.6505710

With by() I could compute several quantiles per subset at each shot, but the structure of the output is not convenient for further analysis and visualization.

by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
a: a
x: x
        0%        25%
-0.7727268  0.1693188
--
a: b
x: x
NULL
--
[snip]

I would like to end up with a data frame like this:

  a x         0%        25%
1 a x -0.7727268  0.1693188
2 a y -0.3410671  0.1566322
3 b y -0.2914710 -0.2677410
4 b z -0.8502875 -0.6505710

I checked sweep() and apply() and didn't see how to harness them for that purpose.

So, is there a simple way to convert the object returned by by() into a data.frame? Or, is there a better way to go with this? Finally, if I should roll my own coercion function: any tips?

Thank you very much in advance,
Itay

[EMAIL PROTECTED] / +1 (206) 543 9040 / U of Washington
Re: [R] summing columns using partial labels
Gidday,

Perhaps try something along these lines:

## Establish which 4-letter group each column belongs to
prefix <- substr(names(d), 1, 4)
gp <- match(prefix, unique(prefix))
gp[regexpr("\\.total$", names(d)) > -1] <- NA   # Exclude `*.total' columns

## Sum up each of the groups
d.sums <- lapply(split(seq(along=d), gp), function(x) rowSums(d[x]))
names(d.sums) <- paste(unique(prefix), "sum", sep=".")

## Append to the end of the original data.frame
d.new <- cbind(d, d.sums)

Cheers, Rich

On Apr 6, 2005 6:05 AM, T Petersen <[EMAIL PROTECTED]> wrote:
> I have a dataset of the form
>
> Year tosk.fai tosk.isd tosk.gr ... tosk.total hysa.fai hysa.isd ...
>
> and so on. I want to sum all the columns using the first four letters in
> the column labels (e.g. 'tosk', 'hysa' etc.). How can you do that? Also,
> the sums should be without the '.total' column (e.g. 'tosk.total') as
> this serves as a check that everything was done right.
>
> Kind regards

--
Rich FitzJohn
rich.fitzjohn gmail.com | http://homepages.paradise.net.nz/richa183
You are in a maze of twisty little functions, all alike
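A small worked example of the same recipe (column names invented in the style of the question; tosk.total is built to equal the sum of the other tosk columns, so the check column confirms the result):

```r
d <- data.frame(tosk.fai = 1:3, tosk.isd = 4:6, tosk.total = c(5, 7, 9),
                hysa.fai = 2:4, hysa.isd = 1:3)

prefix <- substr(names(d), 1, 4)
gp <- match(prefix, unique(prefix))
gp[regexpr("\\.total$", names(d)) > -1] <- NA   # exclude the *.total columns

d.sums <- lapply(split(seq(along = d), gp), function(x) rowSums(d[x]))
names(d.sums) <- paste(unique(prefix), "sum", sep = ".")

cbind(d, d.sums)   # tosk.sum equals tosk.total, so the check passes
```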
Re: [R] R can not show plots (in Mac OS X terminal)
On 5 Apr 2005, at 9:39 pm, Minyu Chen wrote:

> Sorry for bothering again, but it doesn't work yet. Now it shows "x11"
> when I type getOption("device"), but when I do the plot, the terminal
> simply told me x11 is not available.

This is why I asked you whether you had X11 before compiling R. It's no surprise that it doesn't work. I think now the best and cleanest way is to recompile R with the correct configuration so that it sets X11 as the default device. Alternatively you may try

options(device = 'quartz')

to try to make use of quartz (Aqua), but I fear it won't work either.

> Thanks, Minyu Chen
>
> On 5 Apr 2005, at 21:30, Tony Han Bao wrote:
>
>> On 5 Apr 2005, at 8:45 pm, Minyu Chen wrote:
>>
>>> No, the only output is postscript. As I just installed X11, I did not
>>> have it before compiling R.
>>
>> You can try to set the device to x11 by issuing the following command,
>> options(device = 'x11')
>> and hope now it works.
>>
>>> What to do now except for getting and compiling R Aqua?
>>
>> Getting and compiling R Aqua is quite easy and you should do so for
>> OS X to make use of its quartz and easy pdf related features.
>>
>> All the best,
>>
>>> Thanks, Minyu Chen
>>>
>>> On 5 Apr 2005, at 20:23, Tony Han Bao wrote:
>>>
>>>> On 5 Apr 2005, at 19:12, Minyu Chen wrote:
>>>>
>>>>> Dear all: I am a newbie in Mac. Just installed R and found R did
>>>>> not react on my command plot (I use the command line in terminal).
>>>>> It did not give me any error message, either. All it did was just
>>>>> giving out a new command prompt -- no reaction to the plot command.
>>>>> I suppose whenever I give out a command of plot, it will invoke
>>>>> AquaTerm for a small graph, as I experience in octave. What can I
>>>>> do for it?
>>>>
>>>> Issue the command getOption("device") to check the output device.
>>>> The default I have on OS X is X11; do you have it installed before
>>>> compiling R? It could also be the case that you issued a command
>>>> such as postscript() before plot(...) but forgot to issue dev.off().
>>>>
>>>>> Many thanks, Minyu Chen

Tony Han Bao
[EMAIL PROTECTED]
[R] negative binomial glmm's
Hi,

I guess by now you realize that we have been trying to promote our NB GLMM in order to show people some of the capabilities of our random effects software ADMB-RE. If you want, I can help you to analyze your data with the package we talk about on the R list. All I ask is that, if it works for you, you write a brief note to the list telling how it helped you.

Cheers,

dave

Dave Fournier
Otter Research Ltd
PO Box 2040, Sidney, B.C.
Canada V8L 3S3
Re: [R] R can not show plots (in Mac OS X terminal)
On 5 Apr 2005, at 8:45 pm, Minyu Chen wrote:

> No, the only output is postscript. As I just installed X11, I did not
> have it before compiling R.

You can try to set the device to x11 by issuing the following command,

options(device = 'x11')

and hope now it works.

> What to do now except for getting and compiling R Aqua?

Getting and compiling R Aqua is quite easy and you should do so for OS X to make use of its quartz and easy pdf related features.

All the best,

> Thanks, Minyu Chen
>
> On 5 Apr 2005, at 20:23, Tony Han Bao wrote:
>
>> On 5 Apr 2005, at 19:12, Minyu Chen wrote:
>>
>>> Dear all: I am a newbie in Mac. Just installed R and found R did not
>>> react on my command plot (I use the command line in terminal). It did
>>> not give me any error message, either. All it did was just giving out
>>> a new command prompt -- no reaction to the plot command. I suppose
>>> whenever I give out a command of plot, it will invoke AquaTerm for a
>>> small graph, as I experience in octave. What can I do for it?
>>
>> Issue the command getOption("device") to check the output device. The
>> default I have on OS X is X11; do you have it installed before
>> compiling R? It could also be the case that you issued a command such
>> as postscript() before plot(...) but forgot to issue dev.off().
>>
>>> Many thanks, Minyu Chen

Tony Han Bao
[EMAIL PROTECTED]
Re: [R] lists: removing elements, iterating over elements,
On Apr 5, 2005 1:36 PM, Paul Johnson <[EMAIL PROTECTED]> wrote: > I'm writing R code to calculate Hierarchical Social Entropy, a diversity > index that Tucker Balch proposed. One article on this was published in > Autonomous Robots in 2000. You can find that and others through his web > page at Georgia Tech. > > http://www.cc.gatech.edu/~tucker/index2.html > > While I work on this, I realize (again) that I'm a C programmer > masquerading in R, and its really tricky working with R lists. Here are > things that surprise me, I wonder what your experience/advice is. > > I need to calculate overlapping U-diametric clusters of a given radius. > (Again, I apologize this looks so much like C.) > > ## Returns a list of all U-diametric clusters of a given radius > ## Give an R distance matrix > ## Clusters may overlap. Clusters may be identical (redundant) > getUDClusters <-function(distmat,radius){ > mem <- list() > > nItems <- dim(distmat)[1] > for ( i in 1:nItems ){ > mem[[i]] <- c(i) > } > > for ( m in 1:nItems ){ > for ( n in 1:nItems ){ > if (m != n & (distmat[m,n] <= radius)){ >##item is within radius, so add to collection m > mem[[m]] <- sort(c( mem[[m]],n)) > } > } > } > > return(mem) > } > > That generates the list, like this: > > [[1]] > [1] 1 3 4 5 6 7 8 9 10 > > [[2]] > [1] 2 3 4 10 > > [[3]] > [1] 1 2 3 4 5 6 7 8 10 > > [[4]] > [1] 1 2 3 4 10 > > [[5]] > [1] 1 3 5 6 7 8 9 10 > > [[6]] > [1] 1 3 5 6 7 8 9 10 > > [[7]] > [1] 1 3 5 6 7 8 9 10 > > [[8]] > [1] 1 3 5 6 7 8 9 10 > > [[9]] > [1] 1 5 6 7 8 9 10 > > [[10]] > [1] 1 2 3 4 5 6 7 8 9 10 > > The next task is to eliminate the redundant elements. unique() does not > apply to lists, so I have to scan one by one. 
> > cluslist <- getUDClusters(distmat,radius) > > ##find redundant (same) clusters > redundantCluster <- c() > for (m in 1:(length(cluslist)-1)) { > for ( n in (m+1): length(cluslist) ){ > if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){ > if ( sum(cluslist[[m]] == cluslist[[n]]){ > redundantCluster <- c( redundantCluster,n) > } > } > } > } > > ##make sure they are sorted in reverse order > if (length(redundantCluster)>0) > { > redundantCluster <- unique(sort(redundantCluster, decreasing=T)) > > ## remove redundant clusters (must do in reverse order to preserve > index of cluslist) > for (i in redundantCluster) cluslist[[i]] <- NULL > } > > Question: am I deleting the list elements properly? > > I do not find explicit documentation for R on how to remove elements > from lists, but trial and error tells me > > myList[[5]] <- NULL > > will remove the 5th element and then "close up" the hole caused by > deletion of that element. That suffles the index values, So I have to > be careful in dropping elements. I must work from the back of the list > to the front. > > Is there an easier or faster way to remove the redundant clusters? > > Now, the next question. After eliminating the redundant sets from the > list, I need to calculate the total number of items present in the whole > list, figure how many are in each subset--each list item--and do some > calculations. > > I expected this would iterate over the members of the list--one step for > each subcollection > > for (i in cluslist){ > > } > > but it does not. It iterates over the items within the subsets of the > list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements, > this for loop takes 50 steps, one for each individual item. > > I find this does what I want > > for (i in 1:length(cluslist)) > > But I found out the hard way :) > > Oh, one more quirk that fooled me. Why does unique() applied to a > distance matrix throw away the 0's I think that's really bad! 
> > > x <- rnorm(5) > > myDist <- dist(x,diag=T,upper=T) > > myDist > 1 2 3 4 5 > 1 0.000 1.2929976 1.6658710 2.6648003 0.5494918 > 2 1.2929976 0.000 0.3728735 1.3718027 0.7435058 > 3 1.6658710 0.3728735 0.000 0.9989292 1.1163793 > 4 2.6648003 1.3718027 0.9989292 0.000 2.1153085 > 5 0.5494918 0.7435058 1.1163793 2.1153085 0.000 > > unique(myDist) > [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058 > [8] 0.9989292 1.1163793 2.1153085 > > > > -- If L is our list of vectors then the following gets the unique elements of L. I have assumed that the individual vectors are sorted (sort them first if not via lapply(L, sort)) and that each element has a unique name (give it one if not, e.g. names(L) <- seq(L)). The first line binds them together into rows. This will recycle to make them the same length and give you a warning but that's ok since you only need to know if they are the same or not. Now, unique applied to a matrix finds the unique rows and in the third line
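The row-binding approach described above can be sketched end to end as follows (the example list and names are invented; duplicated() on the bound matrix marks the redundant rows):

```r
## Hedged sketch of the approach above (example data invented):
## bind the sorted, named vectors into rows, then deduplicate on the matrix.
L <- list(a = c(1, 3, 4), b = c(2, 3, 4), c = c(1, 3, 4))
L <- lapply(L, sort)           # ensure each element is sorted
m <- do.call("rbind", L)       # recycles unequal lengths (with a warning)
L[!duplicated(m)]              # one representative per unique row
```

Here elements a and c are identical, so only a and b survive.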
Re: [R] exclusion rules for propensity score matchng (pattern rec)
Alexis J. Diamond wrote: hi, thanks for the reply to my query about exclusion rules for propensity score matching. Exclusion can be based on the non-overlap regions from the propensity. It should not be done in the individual covariate space. i want a rule inspired by non-overlap in propensity score space, but that binds in the space of the Xs. because i don't really know how to interpret the fact that i've excluded, say, people with scores > .87, but i DO know what it means to say that i've excluded people from country XYZ over age Q because i can't find good matches for them. if i make my rule based on Xs, i know who i can and cannot make inference for, and i can explain to other people who are the units that i can and cannot make inference for. after posting to the list last night, i thought of using the RGENOUD package (genetic algorithm) to search over the space of exclusion rules (eg., var 1 = 1, var 2 = 0, var 3 = 1 or 0, var 4 = 0); the loss function associated with a rule should be increasing in # of tr units w/out support excluded and decreasing in # of tr units w/ support excluded. it might be tricky to get the right loss function, and i know this idea is kind of nutty, but it's the only automated search method i could think of. any comments? alexis Using the X space directly will not result in optimum exclusions unless you use a distance function, but that will make assumptions. My advice is to use rpart to make a classification rule that approximates the exclusion criteria to some desired degree of accuracy. I.e. use rpart to predict propensity < lower cutoff and separately to predict propensity > upper cutoff. This just assists in interpretation. Frank I tend to look at the 10th smallest and largest values of propensity for each of the two treatment groups for making the decision. You will need to exclude non-overlap regions whether you use matching or covariate adjustment of propensity, but covariate adjustment (using e.g.
regression splines in the logit of propensity) is often a better approach once you've been careful about non-overlap. Frank Harrell On Tue, 5 Apr 2005, Frank E Harrell Jr wrote: [EMAIL PROTECTED] wrote: Dear R-list, i have 6 different sets of samples. Each sample has about 5000 observations, with each observation comprised of 150 baseline covariates (X), 125 of which are dichotomous. Roughly 20% of the observations in each sample are "treatment" and the rest are "control" units. i am doing propensity score matching, i have already estimated propensity scores (predicted probabilities) using logistic regression, and in each sample i am going to have to exclude approximately 100 treated observations for which I cannot find matching control observations (because the scores for these treated units are outside the support of the scores for control units). in each sample, i must identify an exclusion rule that is interpretable on the scale of the X's that excludes these unmatchable treated observations and excludes as FEW of the remaining treated observations as possible. (the reason is that i want to be able to explain, in terms of the Xs, who the individuals are that I am making causal inference about.) i've tried some simple stuff over the past few days and nothing's worked. is there an R-package or algorithm, or even estimation strategy that anyone could recommend? (i am really hoping so!) thank you, alexis diamond -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
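The rpart suggestion above might look like the following sketch; every detail here (variable names, the 0.87 cutoff, the simulated propensity) is invented for illustration, not taken from the actual data:

```r
library(rpart)
set.seed(1)
dat <- data.frame(age     = rnorm(500, 40, 10),
                  country = factor(sample(c("XYZ", "ABC"), 500, replace = TRUE)))
## fake propensity scores; in practice these come from the logistic regression
ps <- plogis(-4 + 0.08 * dat$age + 2 * (dat$country == "XYZ"))
dat$excluded <- factor(ps > 0.87)       # the upper-cutoff non-overlap region
tree <- rpart(excluded ~ age + country, data = dat, method = "class")
print(tree)   # the splits read as an exclusion rule in terms of the X's
```

The tree's splits ("age >= ..., country = XYZ") are exactly the kind of X-space statement the poster wants to report.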
RE: [R] lists: removing elements, iterating over elements,
To get the neighborhoods of radius r of each point in your data set, given distances calculated already in the matrix d, you could do (but note below) $ A <- (d <= r) then rows (or columns) of A are indicator vectors for the neighborhoods. "Unique" will work on these vectors, as "unique.array", to give the unique rows, which would be the unique neighborhood lists: $ unique(A) Your question about why "unique" applied to a distance matrix ignores zeros points to a possible problem: the object you get from dist() is not a matrix. The "upper" and "diag" options only control printing. If you check length() you'll see you only have n(n-1)/2 elements, the lower triangle of the distance matrix. (To answer the question: unique() sees only these; there's not a method for objects of class dist.) So you need to do $ d <- as.matrix(distmat) to get a matrix. Reid Huntsinger -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Johnson Sent: Tuesday, April 05, 2005 1:36 PM To: r-help@stat.math.ethz.ch Subject: [R] lists: removing elements, iterating over elements, I'm writing R code to calculate Hierarchical Social Entropy, a diversity index that Tucker Balch proposed. One article on this was published in Autonomous Robots in 2000. You can find that and others through his web page at Georgia Tech. http://www.cc.gatech.edu/~tucker/index2.html While I work on this, I realize (again) that I'm a C programmer masquerading in R, and its really tricky working with R lists. Here are things that surprise me, I wonder what your experience/advice is. I need to calculate overlapping U-diametric clusters of a given radius. (Again, I apologize this looks so much like C.) ## Returns a list of all U-diametric clusters of a given radius ## Give an R distance matrix ## Clusters may overlap. 
Clusters may be identical (redundant) getUDClusters <-function(distmat,radius){ mem <- list() nItems <- dim(distmat)[1] for ( i in 1:nItems ){ mem[[i]] <- c(i) } for ( m in 1:nItems ){ for ( n in 1:nItems ){ if (m != n & (distmat[m,n] <= radius)){ ##item is within radius, so add to collection m mem[[m]] <- sort(c( mem[[m]],n)) } } } return(mem) } That generates the list, like this: [[1]] [1] 1 3 4 5 6 7 8 9 10 [[2]] [1] 2 3 4 10 [[3]] [1] 1 2 3 4 5 6 7 8 10 [[4]] [1] 1 2 3 4 10 [[5]] [1] 1 3 5 6 7 8 9 10 [[6]] [1] 1 3 5 6 7 8 9 10 [[7]] [1] 1 3 5 6 7 8 9 10 [[8]] [1] 1 3 5 6 7 8 9 10 [[9]] [1] 1 5 6 7 8 9 10 [[10]] [1] 1 2 3 4 5 6 7 8 9 10 The next task is to eliminate the redundant elements. unique() does not apply to lists, so I have to scan one by one. cluslist <- getUDClusters(distmat,radius) ##find redundant (same) clusters redundantCluster <- c() for (m in 1:(length(cluslist)-1)) { for ( n in (m+1): length(cluslist) ){ if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){ if ( sum(cluslist[[m]] == cluslist[[n]]){ redundantCluster <- c( redundantCluster,n) } } } } ##make sure they are sorted in reverse order if (length(redundantCluster)>0) { redundantCluster <- unique(sort(redundantCluster, decreasing=T)) ## remove redundant clusters (must do in reverse order to preserve index of cluslist) for (i in redundantCluster) cluslist[[i]] <- NULL } Question: am I deleting the list elements properly? I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me myList[[5]] <- NULL will remove the 5th element and then "close up" the hole caused by deletion of that element. That suffles the index values, So I have to be careful in dropping elements. I must work from the back of the list to the front. Is there an easier or faster way to remove the redundant clusters? Now, the next question. 
After eliminating the redundant sets from the list, I need to calculate the total number of items present in the whole list, figure how many are in each subset--each list item--and do some calculations. I expected this would iterate over the members of the list--one step for each subcollection for (i in cluslist){ } but it does not. It iterates over the items within the subsets of the list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements, this for loop takes 50 steps, one for each individual item. I find this does what I want for (i in 1:length(cluslist)) But I found out the hard way :) Oh, one more quirk that fooled me. Why does unique() applied to a distance matrix throw away the 0's I think that's really bad! > x <- rnorm(5) > myDist <- dist(x,diag=T,upper=T) > myDist 1 2 3 4 5 1 0.000 1.2929976 1.6658710 2.6648003 0.5494918 2 1.2929976 0.000 0.3728735 1.3718027 0.7435058 3 1.
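Reid's point about dist() objects can be verified directly; a small sketch (the radius of 0.5 is arbitrary):

```r
x <- rnorm(5)
myDist <- dist(x, diag = TRUE, upper = TRUE)  # diag/upper only affect printing
length(myDist)              # 10 = 5*4/2: only the lower triangle is stored
inherits(myDist, "matrix")  # FALSE: class "dist", so unique() sees a plain vector
d <- as.matrix(myDist)      # full symmetric 5 x 5 matrix, zeros on the diagonal
A <- d <= 0.5               # rows are indicator vectors for the neighborhoods
unique(A)                   # unique rows = unique neighborhood lists
```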
Re: [R] R can not show plots (in Mac OS X terminal)
On 5 Apr 2005, at 19:12, Minyu Chen wrote: Dear all: I am a newbie on the Mac. I just installed R and found that R did not react to my plot command (I use the command line in Terminal). It did not give me any error message, either. All it did was give a new command prompt--no reaction to the plot command. I supposed that whenever I give a plot command, it would invoke AquaTerm for a small graph, as I experienced in Octave. What can I do about it? Issue the command >getOption("device") to check the output device. The default I have on OS X is X11; did you have it installed before compiling R? It could also be the case that you issued a command such as postscript() before plot(...) but forgot to issue dev.off(). Many thanks, Minyu Chen Tony Han Bao [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] lists: removing elements, iterating over elements,
> From: Paul Johnson > > I'm writing R code to calculate Hierarchical Social Entropy, > a diversity > index that Tucker Balch proposed. One article on this was > published in > Autonomous Robots in 2000. You can find that and others > through his web > page at Georgia Tech. > > http://www.cc.gatech.edu/~tucker/index2.html > > While I work on this, I realize (again) that I'm a C programmer > masquerading in R, and its really tricky working with R > lists. Here are > things that surprise me, I wonder what your experience/advice is. > > I need to calculate overlapping U-diametric clusters of a > given radius. >(Again, I apologize this looks so much like C.) > > > ## Returns a list of all U-diametric clusters of a given radius > ## Give an R distance matrix > ## Clusters may overlap. Clusters may be identical (redundant) > getUDClusters <-function(distmat,radius){ >mem <- list() > >nItems <- dim(distmat)[1] >for ( i in 1:nItems ){ > mem[[i]] <- c(i) >} This loop can be replaced with mem <- as.list(1:nItems)... >for ( m in 1:nItems ){ > for ( n in 1:nItems ){ >if (m != n & (distmat[m,n] <= radius)){ > ##item is within radius, so add to collection m > mem[[m]] <- sort(c( mem[[m]],n)) >} > } >} If I understood the code correctly, this should do the same: neighbors <- which(distmat <= radius, arr.ind=TRUE) neighbors <- neighbors[neighbors[, 1] != neighbors[, 2],] mem <- split(neighbors[, 2], neighbors[, 1]) What I'm not sure of is whether you intend to include the i-th item in the i-th list (since the distance is presumably 0). Your code seems to indicate no, as you have m != n in the if() condition. The second line above removes such results. However, your list below seems to indicate that you do have such elements in your lists. If such results can not be in the list, then the list should already be unique, no? For deleting an element of a list, see R FAQ 7.1. 
HTH, Andy >return(mem) > } > > > That generates the list, like this: > > [[1]] > [1] 1 3 4 5 6 7 8 9 10 > > [[2]] > [1] 2 3 4 10 > > [[3]] > [1] 1 2 3 4 5 6 7 8 10 > > [[4]] > [1] 1 2 3 4 10 > > [[5]] > [1] 1 3 5 6 7 8 9 10 > > [[6]] > [1] 1 3 5 6 7 8 9 10 > > [[7]] > [1] 1 3 5 6 7 8 9 10 > > [[8]] > [1] 1 3 5 6 7 8 9 10 > > [[9]] > [1] 1 5 6 7 8 9 10 > > [[10]] > [1] 1 2 3 4 5 6 7 8 9 10 > > > The next task is to eliminate the redundant elements. > unique() does not > apply to lists, so I have to scan one by one. > > >cluslist <- getUDClusters(distmat,radius) > >##find redundant (same) clusters >redundantCluster <- c() >for (m in 1:(length(cluslist)-1)) { > for ( n in (m+1): length(cluslist) ){ >if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){ > if ( sum(cluslist[[m]] == cluslist[[n]]){ >redundantCluster <- c( redundantCluster,n) > } >} > } >} > > >##make sure they are sorted in reverse order >if (length(redundantCluster)>0) > { >redundantCluster <- unique(sort(redundantCluster, > decreasing=T)) > >## remove redundant clusters (must do in reverse order to preserve > index of cluslist) >for (i in redundantCluster) cluslist[[i]] <- NULL > } > > > Question: am I deleting the list elements properly? > > I do not find explicit documentation for R on how to remove elements > from lists, but trial and error tells me > > myList[[5]] <- NULL > > will remove the 5th element and then "close up" the hole caused by > deletion of that element. That suffles the index values, So > I have to > be careful in dropping elements. I must work from the back of > the list > to the front. > > > Is there an easier or faster way to remove the redundant clusters? > > > Now, the next question. After eliminating the redundant sets > from the > list, I need to calculate the total number of items present > in the whole > list, figure how many are in each subset--each list item--and do some > calculations. 
> > I expected this would iterate over the members of the > list--one step for > each subcollection > > for (i in cluslist){ > > } > > but it does not. It iterates over the items within the > subsets of the > list "cluslist." I mean, if cluslist has 5 sets, each with > 10 elements, > this for loop takes 50 steps, one for each individual item. > > I find this does what I want > > for (i in 1:length(cluslist)) > > But I found out the hard way :) > > > Oh, one more quirk that fooled me. Why does unique() applied to a > distance matrix throw away the 0's I think that's really bad! > > > x <- rnorm(5) > > myDist <- dist(x,diag=T,upper=T) > > myDist >1 2 3 4 5 > 1 0.000 1.2929976 1.6658710 2.6648003 0.5494918 > 2 1.2929976 0.000 0.3728735 1.3718027 0.
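The vectorized replacement for the double loop suggested above can be exercised on a tiny invented distance matrix (drop = FALSE is added here defensively, in case only one off-diagonal pair survives the radius test):

```r
distmat <- as.matrix(dist(c(0, 0.1, 1)))  # 3 points; only 1 and 2 are close
radius  <- 0.5
neighbors <- which(distmat <= radius, arr.ind = TRUE)
neighbors <- neighbors[neighbors[, 1] != neighbors[, 2], , drop = FALSE]
mem <- split(neighbors[, 2], neighbors[, 1])
mem   # $`1` contains 2 and $`2` contains 1; point 3 gets no entry
```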
Re: [R] GLMs: Negative Binomial family in R?
Check out these recent postings to the R list: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/48429.html http://finzi.psych.upenn.edu/R/Rhelp02a/archive/48646.html Cheers, Pierre [EMAIL PROTECTED] wrote: Greetings R Users! I have a data set of count responses for which I have made repeated observations on the experimental units (stream reaches) over two air photo dates, hence the mixed effect. I have been using Dr. Jim Lindsey's GLMM function found in his "repeated" measures package with the "poisson" family. My problem though is that I don't think the poisson distribution is the right one to describe my data, which is overdispersed; the variance is greater than the mean. I have read that "negative binomial" regression models can account for some of the differences among observations by adding in an error term that is independent of the covariates. I haven't yet come across a mixed effects model that can use the "negative binomial" distribution. If any of you know of such a function - I will certainly look forward to hearing from you! Additionally, if any of you have insight on zero-inflated data, and testing for this, I'd be interested in your comments too. I'll post a summary of your responses to this list. Best Regards, Nadele Flynn, M.Sc. candidate. University of Alberta -- - Pierre Kleiber, Ph.D Email: [EMAIL PROTECTED] Fishery Biologist Tel: 808 983-5399 / (hm) 808 737-7544 NOAA Fisheries Service - Honolulu Laboratory Fax: 808 983-2902 2570 Dole St., Honolulu, HI 96822-2396 - "God could have told Moses about galaxies and mitochondria and all. But behold... It was good enough for government work." __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] GLMs: Negative Binomial family in R?
Hi, Also consider using the function supplied in the post: https://stat.ethz.ch/pipermail/r-help/2005-March/066752.html for fitting negative binomial mixed effects models. Cheers, Anders. On Tue, 5 Apr 2005, Achim Zeileis wrote: > On Tue, 5 Apr 2005 11:20:37 -0600 [EMAIL PROTECTED] wrote: > > > Greetings R Users! > > > > I have a data set of count responses for which I have made repeated > > observations on the experimental units (stream reaches) over two air > > photo dates, hence the mixed effect. I have been using Dr. Jim > > Lindsey's GLMM function found in his"repeated" measures package with > > the "poisson" family. > > > > My problem though is that I don't think the poisson distribution is > > the right one to discribe my data which is overdispersed; the variance > > is greater than the mean. I have read that the "negative binomial" > > regression models can account for some of the differences among > > observations by adding in a error term that independent of the the > > covariates. > > glm.nb() from package MASS fits negative binomial GLMs. > > > I haven't yet come across a mixed effects model that can use the > > "negative binomial" distribution. > > For known theta, you can plug negative.binomial(theta) into glmmPQL() > for example. (Both functions are also available in MASS.) I'm not sure > whether there is also code available for unknown theta. > > > If any of you know of such a function - I will certainly look forward > > to hearing from you! Additionally, if any of you have insight on > > zero-inflated data, and testing for this, I'd be interested in your > > comments too. I'll post a summary of your responses to this list. > > Look at package zicounts for zero-inflated Poisson and NB models. For > these models, there is also code available at > http://pscl.stanford.edu/content.html > which also hosts code for hurdle models. > > hth, > Z > > > Best Regards, > > Nadele Flynn, M.Sc. candidate. 
> > University of Alberta > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] lists: removing elements, iterating over elements,
The following strategy may or may not work, depending on whether the numbers in your lists are integers or could be the result of floating point computations (so that 2 might be 1.99... etc.). As I understand it, you wish to reduce an arbitrary list to one with unique members, where each member is a list/set, of course. If so, one way to do it is to convert the members to a vector of strings, find the unique strings, then convert the result back to a list. Something like (warning: not fully tested) > test<-list(a=1:3,b=1:4,c=1:3,d=1:5,e=1:3) > l1<-lapply(test, paste,collapse='+') > l2<-unique(unlist(l1)) > l2 [1] "1+2+3" "1+2+3+4" "1+2+3+4+5" > lapply(strsplit(l2,split='+',fixed=TRUE),as.numeric) [[1]] [1] 1 2 3 [[2]] [1] 1 2 3 4 [[3]] [1] 1 2 3 4 5 The basic idea is to get your list into a form where the efficiency of unique() can be brought to bear. There may of course be better ways to do this. HTH Cheers, -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Paul Johnson > Sent: Tuesday, April 05, 2005 10:36 AM > To: r-help@stat.math.ethz.ch > Subject: [R] lists: removing elements, iterating over elements, > > I'm writing R code to calculate Hierarchical Social Entropy, > a diversity > index that Tucker Balch proposed. One article on this was > published in > Autonomous Robots in 2000. You can find that and others > through his web > page at Georgia Tech. > > http://www.cc.gatech.edu/~tucker/index2.html > > While I work on this, I realize (again) that I'm a C programmer > masquerading in R, and it's really tricky working with R > lists. Here are > things that surprise me, I wonder what your experience/advice is. > > I need to calculate overlapping U-diametric clusters of a > given radius. >(Again, I apologize this looks so much like C.)
> > > ## Returns a list of all U-diametric clusters of a given radius > ## Give an R distance matrix > ## Clusters may overlap. Clusters may be identical (redundant) > getUDClusters <-function(distmat,radius){ >mem <- list() > >nItems <- dim(distmat)[1] >for ( i in 1:nItems ){ > mem[[i]] <- c(i) >} > > >for ( m in 1:nItems ){ > for ( n in 1:nItems ){ >if (m != n & (distmat[m,n] <= radius)){ > ##item is within radius, so add to collection m > mem[[m]] <- sort(c( mem[[m]],n)) >} > } >} > >return(mem) > } > > > That generates the list, like this: > > [[1]] > [1] 1 3 4 5 6 7 8 9 10 > > [[2]] > [1] 2 3 4 10 > > [[3]] > [1] 1 2 3 4 5 6 7 8 10 > > [[4]] > [1] 1 2 3 4 10 > > [[5]] > [1] 1 3 5 6 7 8 9 10 > > [[6]] > [1] 1 3 5 6 7 8 9 10 > > [[7]] > [1] 1 3 5 6 7 8 9 10 > > [[8]] > [1] 1 3 5 6 7 8 9 10 > > [[9]] > [1] 1 5 6 7 8 9 10 > > [[10]] > [1] 1 2 3 4 5 6 7 8 9 10 > > > The next task is to eliminate the redundant elements. > unique() does not > apply to lists, so I have to scan one by one. > > >cluslist <- getUDClusters(distmat,radius) > >##find redundant (same) clusters >redundantCluster <- c() >for (m in 1:(length(cluslist)-1)) { > for ( n in (m+1): length(cluslist) ){ >if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){ > if ( sum(cluslist[[m]] == cluslist[[n]]){ >redundantCluster <- c( redundantCluster,n) > } >} > } >} > > >##make sure they are sorted in reverse order >if (length(redundantCluster)>0) > { >redundantCluster <- unique(sort(redundantCluster, > decreasing=T)) > >## remove redundant clusters (must do in reverse order to preserve > index of cluslist) >for (i in redundantCluster) cluslist[[i]] <- NULL > } > > > Question: am I deleting the list elements properly? > > I do not find explicit documentation for R on how to remove elements > from lists, but trial and error tells me > > myList[[5]] <- NULL > > will remove the 5th element and then "close up" the hole caused by > deletion of that element. 
That suffles the index values, So > I have to > be careful in dropping elements. I must work from the back of > the list > to the front. > > > Is there an easier or faster way to remove the redundant clusters? > > > Now, the next question. After eliminating the redundant sets > from the > list, I need to calculate the total number of items present > in the whole > list, figure how many are in each subset--each list item--and do some > calculations. > > I expected this would iterate over the members of the > list--one step for > each subcollection > > for (i in cluslist){ > > } > > but it does not. It iterates over the items within the > subsets of the > list "cluslist." I mean, if cl
[R] R can not show plots (in Mac OS X terminal)
Dear all: I am a newbie on the Mac. I just installed R and found that R did not react to my plot command (I use the command line in Terminal). It did not give me any error message, either. All it did was give a new command prompt--no reaction to the plot command. I supposed that whenever I give a plot command, it would invoke AquaTerm for a small graph, as I experienced in Octave. What can I do about it? Many thanks, Minyu Chen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] summing columns using partial labels
I have a dataset of the form Year tosk.fai tosk.isd tosk.gr ... tosk.total hysa.fai hysa.isd ... and so on. I want to sum all the columns using the first four letters of the column labels (e.g. 'tosk', 'hysa', etc.). How can you do that? Also, the sums should be computed without the '.total' column (e.g. 'tosk.total'), as this serves as a check that everything was done right. Kind regards __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
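One possible approach to the question above (a sketch; the column names and values below are invented, and split.default() is just one of several ways to group a data frame's columns): drop the '.total' columns, group the rest by their four-letter prefix, and rowSums() each group.

```r
## Invented data following the naming scheme in the question
d <- data.frame(Year = 1:3,
                tosk.fai = c(1, 2, 3), tosk.isd = c(4, 5, 6),
                tosk.total = c(5, 7, 9),
                hysa.fai = c(2, 3, 4), hysa.isd = c(1, 1, 1))
keep <- setdiff(names(d), "Year")
keep <- keep[!grepl("\\.total$", keep)]  # leave the '.total' columns out
grp  <- substr(keep, 1, 4)               # first four letters: 'tosk', 'hysa'
sums <- sapply(split.default(d[keep], grp),
               function(cols) rowSums(as.data.frame(cols)))
sums                                     # one column of sums per prefix
all(sums[, "tosk"] == d$tosk.total)      # use the '.total' column as the check
```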
Re: [R] GLMs: Negative Binomial family in R?
On Tue, 5 Apr 2005 11:20:37 -0600 [EMAIL PROTECTED] wrote: > Greetings R Users! > > I have a data set of count responses for which I have made repeated > observations on the experimental units (stream reaches) over two air > photo dates, hence the mixed effect. I have been using Dr. Jim > Lindsey's GLMM function found in his "repeated" measures package with > the "poisson" family. > > My problem though is that I don't think the poisson distribution is > the right one to describe my data, which is overdispersed; the variance > is greater than the mean. I have read that the "negative binomial" > regression models can account for some of the differences among > observations by adding in an error term that is independent of the > covariates. glm.nb() from package MASS fits negative binomial GLMs. > I haven't yet come across a mixed effects model that can use the > "negative binomial" distribution. For known theta, you can plug negative.binomial(theta) into glmmPQL() for example. (Both functions are also available in MASS.) I'm not sure whether there is also code available for unknown theta. > If any of you know of such a function - I will certainly look forward > to hearing from you! Additionally, if any of you have insight on > zero-inflated data, and testing for this, I'd be interested in your > comments too. I'll post a summary of your responses to this list. Look at package zicounts for zero-inflated Poisson and NB models. For these models, there is also code available at http://pscl.stanford.edu/content.html which also hosts code for hurdle models. hth, Z > Best Regards, > Nadele Flynn, M.Sc. candidate. > University of Alberta > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
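The glm.nb() route mentioned above can be tried on simulated overdispersed counts; a minimal sketch with arbitrary parameter values:

```r
library(MASS)
set.seed(1)
x <- runif(200)
y <- rnegbin(200, mu = exp(1 + x), theta = 2)  # MASS's negative binomial simulator
fit <- glm.nb(y ~ x)
summary(fit)   # coefficients plus the estimated theta (dispersion)
```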
[R] lists: removing elements, iterating over elements,
I'm writing R code to calculate Hierarchical Social Entropy, a diversity index that Tucker Balch proposed. One article on this was published in Autonomous Robots in 2000. You can find that and others through his web page at Georgia Tech. http://www.cc.gatech.edu/~tucker/index2.html While I work on this, I realize (again) that I'm a C programmer masquerading in R, and it's really tricky working with R lists. Here are things that surprise me; I wonder what your experience/advice is. I need to calculate overlapping U-diametric clusters of a given radius. (Again, I apologize this looks so much like C.) ## Returns a list of all U-diametric clusters of a given radius ## Give an R distance matrix ## Clusters may overlap. Clusters may be identical (redundant) getUDClusters <-function(distmat,radius){ mem <- list() nItems <- dim(distmat)[1] for ( i in 1:nItems ){ mem[[i]] <- c(i) } for ( m in 1:nItems ){ for ( n in 1:nItems ){ if (m != n & (distmat[m,n] <= radius)){ ##item is within radius, so add to collection m mem[[m]] <- sort(c( mem[[m]],n)) } } } return(mem) } That generates the list, like this: [[1]] [1] 1 3 4 5 6 7 8 9 10 [[2]] [1] 2 3 4 10 [[3]] [1] 1 2 3 4 5 6 7 8 10 [[4]] [1] 1 2 3 4 10 [[5]] [1] 1 3 5 6 7 8 9 10 [[6]] [1] 1 3 5 6 7 8 9 10 [[7]] [1] 1 3 5 6 7 8 9 10 [[8]] [1] 1 3 5 6 7 8 9 10 [[9]] [1] 1 5 6 7 8 9 10 [[10]] [1] 1 2 3 4 5 6 7 8 9 10 The next task is to eliminate the redundant elements. unique() does not apply to lists, so I have to scan one by one.
cluslist <- getUDClusters(distmat,radius) ##find redundant (same) clusters redundantCluster <- c() for (m in 1:(length(cluslist)-1)) { for ( n in (m+1): length(cluslist) ){ if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){ if ( all(cluslist[[m]] == cluslist[[n]]) ){ redundantCluster <- c( redundantCluster,n) } } } } ##make sure they are sorted in reverse order if (length(redundantCluster)>0) { redundantCluster <- unique(sort(redundantCluster, decreasing=T)) ## remove redundant clusters (must do in reverse order to preserve index of cluslist) for (i in redundantCluster) cluslist[[i]] <- NULL } Question: am I deleting the list elements properly? I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me myList[[5]] <- NULL will remove the 5th element and then "close up" the hole caused by deletion of that element. That shuffles the index values, so I have to be careful in dropping elements. I must work from the back of the list to the front. Is there an easier or faster way to remove the redundant clusters? Now, the next question. After eliminating the redundant sets from the list, I need to calculate the total number of items present in the whole list, figure out how many are in each subset--each list item--and do some calculations. I expected this would iterate over the members of the list--one step for each subcollection for (i in cluslist){ } but it does not. It iterates over the items within the subsets of the list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements, this for loop takes 50 steps, one for each individual item. I find this does what I want for (i in 1:length(cluslist)) But I found out the hard way :) Oh, one more quirk that fooled me. Why does unique() applied to a distance matrix throw away the 0's? I think that's really bad!
> x <- rnorm(5)
> myDist <- dist(x, diag=T, upper=T)
> myDist
          1         2         3         4         5
1 0.000     1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.000     0.3728735 1.3718027 0.7435058
3 1.6658710 0.3728735 0.000     0.9989292 1.1163793
4 2.6648003 1.3718027 0.9989292 0.000     2.1153085
5 0.5494918 0.7435058 1.1163793 2.1153085 0.000
> unique(myDist)
 [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
 [8] 0.9989292 1.1163793 2.1153085
>

--
Paul E. Johnson                email: [EMAIL PROTECTED]
Dept. of Political Science     http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas           Office: (785) 864-9086
Lawrence, Kansas 66044-3177    FAX: (785) 864-5700

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
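(The 0's vanish because a "dist" object never stores the diagonal or the upper triangle: diag= and upper= only affect printing, and unique() works on the underlying vector of the n(n-1)/2 lower-triangle distances. A small check:)

```r
# diag= and upper= on dist() only change how the object prints; the object
# itself stores just the n(n-1)/2 lower-triangle distances as a vector.
x <- c(0, 1, 3)
d <- dist(x, diag = TRUE, upper = TRUE)
length(as.vector(d))  # 3 stored values for n = 3, not 9
as.vector(d)          # 1 3 2 -- no zeros: the diagonal is implicit
```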
[R] GLMs: Negative Binomial family in R?
Greetings R Users! I have a data set of count responses for which I have made repeated observations on the experimental units (stream reaches) over two air photo dates, hence the mixed effect. I have been using Dr. Jim Lindsey's GLMM function found in his "repeated" measures package with the "poisson" family. My problem though is that I don't think the poisson distribution is the right one to describe my data, which is overdispersed; the variance is greater than the mean. I have read that "negative binomial" regression models can account for some of the differences among observations by adding in an error term that is independent of the covariates. I haven't yet come across a mixed effects model that can use the "negative binomial" distribution. If any of you know of such a function - I will certainly look forward to hearing from you! Additionally, if any of you have insight on zero-inflated data, and testing for this, I'd be interested in your comments too. I'll post a summary of your responses to this list. Best Regards, Nadele Flynn, M.Sc. candidate. University of Alberta __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
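(Not the mixed model asked for, but a quick fixed-effects check: MASS::glm.nb fits a plain negative binomial, which at least shows whether the NB variance function absorbs the overdispersion. A sketch on simulated counts:)

```r
# Fixed-effects negative binomial via MASS::glm.nb, on simulated
# overdispersed counts; compare its fit against plain Poisson.
library(MASS)
set.seed(5)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 1.5)  # overdispersed counts
fit.p  <- glm(y ~ x, family = poisson)
fit.nb <- glm.nb(y ~ x)
fit.nb$theta        # estimated NB dispersion parameter
AIC(fit.p, fit.nb)  # the NB model should fit these data markedly better
```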
RE: [R] Dendrogram for a type unbalanced ANOVA
It is seldom a good idea to use the _significance_ of a difference as a surrogate for its magnitude. The reason is that the significance varies with many irrelevant aspects of the analysis, such as the model and the sample size. If you have other measures of group similarity, then there are many hierarchical clustering methods that might meet your needs. Try ?hclust to get started. Ben Fairbank -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bob sandefur Sent: Tuesday, April 05, 2005 11:39 AM To: r-help@stat.math.ethz.ch Subject: [R] Dendrogram for a type unbalanced ANOVA Hi- I have about 20 groups for which I know the mean, variance, and number of points per group. Is here an R function where I can plot (3 group example) something like | |- 2 | |---| | | |- 1 | | |--| | |- 3 | | 0 25 50 75 100 ie 1 and 2 are different at 75% level of confidence 1 2 combined are different from 3 at 25% level of confidence If this plot is a silly idea can someone point me to a reference for anything similar? Thanx Robert (Bob) L. Sandefur PE Senior Geostatistician / Reserve Analyst CAM 200 Union Suite G-13 Lakewood, Co 80228 [EMAIL PROTECTED] 303 472-3240 (cell) <-best choice 303 716-1617 ext 14 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Dendrogram for a type unbalanced ANOVA
Hi- I have about 20 groups for which I know the mean, variance, and number of points per group. Is there an R function where I can plot (3 group example) something like

|      |- 2
|  |---|
|  |   |- 1
|  |--|
|      |- 3
|
0    25    50    75    100

ie 1 and 2 are different at 75% level of confidence, 1 2 combined are different from 3 at 25% level of confidence. If this plot is a silly idea can someone point me to a reference for anything similar? Thanx

Robert (Bob) L. Sandefur PE
Senior Geostatistician / Reserve Analyst
CAM
200 Union Suite G-13
Lakewood, Co 80228
[EMAIL PROTECTED]
303 472-3240 (cell) <-best choice
303 716-1617 ext 14

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] extracting Proportion Var and Cumulative Var values from factanal
Hi R users, I need some help with the following: I'm doing factor analysis and I need to extract the loading values and the Proportion Var and Cumulative Var values one by one. Here is what I am doing:

> fact <- factanal(na.omit(gnome_freq_r2), factors=5);
> fact$loadings

Loadings: Factor1 Factor2 Factor3 Factor4 Factor5
b1freqr2 0.246 0.486 0.145
b2freqr2 0.129 0.575 0.175 0.130 0.175
b3freqr2 0.605 0.253 0.166 0.138 0.134
b4freqr2 0.191 0.220 0.949
b5freqr2 0.286 0.265 0.113 0.891 0.190
b6freqr2 0.317 0.460 0.151
b7freqr2 0.138 0.199 0.119 0.711
b8freqr2 0.769 0.258 0.195 0.137
b9freqr2 0.148 0.449 0.103 0.327

               Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings      1.294   1.268   1.008   0.927   0.730
Proportion Var   0.144   0.141   0.112   0.103   0.081
Cumulative Var   0.144   0.285   0.397   0.500   0.581
>

I can get the loadings using:

> fact$loadings[1,1]
[1] 0.2459635
>

but I couldn't find the way to do the same with the Proportion Var and Cumulative Var values. Thanks, Tamas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
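(For the record, those two rows are computed on the fly by the print method for loadings, so you can reproduce them from the loadings matrix itself: Proportion Var is SS loadings divided by the number of observed variables, and Cumulative Var is its running sum. A sketch on a built-in data set, with mtcars standing in for gnome_freq_r2:)

```r
# Proportion Var = SS loadings / number of variables; Cumulative Var = cumsum.
# mtcars is only a stand-in data set here.
fa   <- factanal(mtcars, factors = 2)
L    <- unclass(fa$loadings)   # plain matrix of loadings
ss   <- colSums(L^2)           # the "SS loadings" row
prop <- ss / nrow(L)           # the "Proportion Var" row
cumv <- cumsum(prop)           # the "Cumulative Var" row
```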
Re: [R] Help with three-way anova
Dear Michael, For unbalanced data, you might want to take a look at the Anova() function in the car package. As well, it probably makes sense to read something about how linear models are expressed in R. ?lm and ?formula both have some information about model formulas; the Introduction to R manual that comes with R has a chapter on statistical models; and books on R typically take up the subject at greater length. I hope this helps, John On Tue, 5 Apr 2005 15:51:46 +0100 "michael watson \(IAH-C\)" <[EMAIL PROTECTED]> wrote: > Hi > > I have data from 12 subjects. The measurement is log(expression) of > a > particular gene and can be assumed to be normally distributed. The > 12 > subjects are divided into the following groups: > > Infected, Vaccinated, Lesions - 3 measurements > Infected, Vaccintaed, No Lesions - 2 measurements > Infected, Not Vaccinated, Lesions - 4 measurements > Uninfected, Not Vaccinated, No Lesions - 3 measurements > > Although presence/absence of lesions could be considered to be a > phenotype, here I would like to use it as a factor. This explains > some > of the imbalance in the design (ie we could not control how many > subjects, if any, in each group would get lesions). > > First impressions - the data looks like we would expect. Gene > expression is lowest in the infected/not vaccinated group, then next > lowest is the infected/vaccinated group and finally comes the > uninfected/not vaccinated group. So the working hypothesis is that > gene > expression of the gene in question is lowered by infection, but that > the > vaccine somehow alleviates this effect, but not as much as to the > level > of a totally uninfected subject. We *might* have access to data > relating to uninfected/vaccinated group, my pet scientist is digging > for > this as we speak. > > As for lesions, well none of the uninfected subjects have them, all > of > the infected/not vaccinated subjects have them, and some of the > infected/vaccinated have them, some don't. 
Again, this makes for a > very > sensible hypothesis if we treat presence/absence of lesions as a > phenotype, but in addition to that I want to know if gene expression > is > linked to presence/absence of lesion, but only one group of subjects > has > both lesions and non-lesions within it. Eye-balling this group, > presence/absence of lesions and gene expression are not linked. > > So I have this as a data.frame in R, and I wanted to run an analysis > of > variance. I did: > > aov <- aov(IL.4 ~ Infected + Vaccinated + Lesions, data) > summary(aov) > > And got: > > Df Sum Sq Mean Sq F valuePr(>F) > Infected 1 29.8482 29.8482 66.7037 3.761e-05 *** > Vaccinated 1 13.5078 13.5078 30.1868 0.0005777 *** > Lesions 1 0.0393 0.0393 0.0878 0.7746009 > Residuals8 3.5798 0.4475 > --- > > This tells me that Infected and Vaccinated are highly significant, > whereas lesions are not. > > So, what I want to know is: > > 1) Given my unbalanced experimental design, is it valid to use aov? > 2) Have I used aov() correctly? If so, how do I get access results > for > interactions? > 3) Is there some other, more relevant way of analysing this? What I > am > really interested in is the gene expression, and whether it can be > shown > to be statistically related to one or more of the factors involved > (Infected, Vaccinated, Lesions) or interactions between those > factors. > > Many thanks in advance > > Mick > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html John Fox Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with three-way anova
On Tue, 2005-04-05 at 15:51 +0100, michael watson (IAH-C) wrote: > So, what I want to know is: > > 1) Given my unbalanced experimental design, is it valid to use aov? I'd say no. Use lm() instead, save your analysis in an object and then possibly use drop1() to check the analysis > 2) Have I used aov() correctly? If so, how do I get access results for > interactions? The use of aov() per se seems fine, but you did not put any interaction in the model... for that use factor * factor. HTH, F -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
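(To make the factor * factor suggestion concrete — a sketch with an invented 12-row data frame in place of the poster's; only the formula syntax matters. '*' expands to main effects plus their interaction, and drop1() gives the marginal tests, which is what you want with an unbalanced design:)

```r
# Invented data standing in for the poster's; '*' adds the interaction term.
set.seed(1)
dat <- data.frame(
  IL.4       = rnorm(12),
  Infected   = factor(rep(c("Y", "N"), c(9, 3))),
  Vaccinated = factor(rep(c("Y", "N"), 6))
)
fit <- lm(IL.4 ~ Infected * Vaccinated, data = dat)  # main effects + interaction
drop1(fit, test = "F")  # F test for the interaction, respecting marginality
coef(fit)               # intercept, two main effects, one interaction term
```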
RE: [R] cat bailing out in a for loop
See ?try. Andy > From: Federico Calboli > > Dear All, > > I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 > SNPs. I am using the function HWE.exact from the package "genetics". > > In order not to do a lot of coding "by hand", I have a for loop that > goes through each column (each column is one SNP) and gives me the > p.value for HWE.exact. Unfortunately some SNP have reached > fixation and > HWE.exact requires a 2 alleles scenario. > > So my problem is that my for loop: > > ## > > for (i in 1:42){ > xxx<-HWE.exact(genotype(laba.con[,i+3], sep="")) > cat(colnames(laba)[i+3],xxx$p.value,"\n")} > > ## > > bails out as soon as it hits a SNP at fixation for one allele, because > HWE.exact fails. > > I have a lot of this game to play and checking things by hand is not > idea, and I do not care about failed SNP, all I want is for > the loop to > carry on regardless of what's in xxx$p.value, even if HWE.exact failed > to calculte it. Dump anything in there! > > Is there any way of forcing the loop to carry on? > > Cheers, > > Federico Calboli > > -- > Federico C. F. Calboli > Department of Epidemiology and Public Health > Imperial College, St Mary's Campus > Norfolk Place, London W2 1PG > > Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 > > f.calboli [.a.t] imperial.ac.uk > f.calboli [.a.t] gmail.com > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] cat bailing out in a for loop
Federico Calboli wrote: Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p.value for HWE.exact. Unfortunately some SNP have reached fixation and HWE.exact requires a 2 alleles scenario. So my problem is that my for loop: ## for (i in 1:42){ xxx<-HWE.exact(genotype(laba.con[,i+3], sep="")) cat(colnames(laba)[i+3],xxx$p.value,"\n")} ## bails out as soon as it hits a SNP at fixation for one allele, because HWE.exact fails. I have a lot of this game to play and checking things by hand is not idea, and I do not care about failed SNP, all I want is for the loop to carry on regardless of what's in xxx$p.value, even if HWE.exact failed to calculte it. Dump anything in there! Is there any way of forcing the loop to carry on? See ?try. Uwe Ligges Cheers, Federico Calboli __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
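(To spell out the ?try suggestion — a self-contained sketch where 'flaky' is an invented stand-in for HWE.exact that fails on a fixed SNP; the loop keeps going and just reports the failure:)

```r
# 'flaky' stands in for HWE.exact: it errors when a column has only one value.
flaky <- function(x) { if (length(unique(x)) < 2) stop("fixed"); mean(x) }
columns <- list(a = c(1, 2), b = c(1, 1), c = c(0, 3))
for (nm in names(columns)) {
  res <- try(flaky(columns[[nm]]), silent = TRUE)
  if (inherits(res, "try-error")) {
    cat(nm, "NA (call failed)\n")   # loop carries on past the failure
  } else {
    cat(nm, res, "\n")
  }
}
```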
[R] Fitdistr and likelihood
Hi all, I'm using the function "fitdistr" (library MASS) to fit a distribution to given data. What I have to do further, is getting the log-Likelihood-Value from this estimation. Is there any simple possibility to realize it? Regards, Carsten __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
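(The object returned by fitdistr() carries the maximised log-likelihood in its loglik component, and MASS also supplies a logLik() method for it. A sketch on simulated normal data:)

```r
# fitdistr() stores the maximised log-likelihood in the fitted object.
library(MASS)
set.seed(2)
x <- rnorm(200, mean = 5, sd = 2)
fit <- fitdistr(x, "normal")
fit$loglik    # the maximised log-likelihood as a plain number
logLik(fit)   # the same value as a "logLik" object
```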
Re: [R] French Curve
> "dream" == dream home <[EMAIL PROTECTED]> writes: > Does it sound like spline work do the job? It would be hard > to persuave him to use some modern math technique but he > did ask me to help him implement the French Curve so he can > do his work in Excel, rather than PAPER. Splines are useful for interpolating points with a continuous curve that passes through, or near, the points. If you are looking for a way to estimate a curve with a noise component removed, I think you'd be better off filtering your data, rather than interpolating with a spline. Median (or mean) filtering may give results similar to those from your chemist's manual method. That is easy to do with running from the gtools package. The validity of this is another question! require(gtools) x <- seq(250)/10 y1 <- sin(x) + 15 + rnorm(250)/2 y2 <- cos(x) + 12 + rnorm(250) plot(x, y1, ylim=c(0,18), col='grey') points(x, y2, pch=2, col='grey') points(x, y1-y2, col='grey', pch=3) ## running median filters lines(running(x), running(y1, fun=median), col='blue') lines(running(x), running(y2, fun=median), col='blue') lines(running(x), running(y1, fun=median)-running(y2, fun=median), col='blue') ## running mean filters lines(running(x), running(y1), col='red') lines(running(x), running(y2), col='red') lines(running(x), running(y1)-running(y2), col='red') f <- sin(x) + 15 - ( cos(x) + 12 ) lines(x, f) Mike -- Michael A. Miller [EMAIL PROTECTED] Imaging Sciences, Department of Radiology, IU School of Medicine __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] cat bailing out in a for loop
Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p.value for HWE.exact. Unfortunately some SNPs have reached fixation and HWE.exact requires a 2-allele scenario. So my problem is that my for loop:

##
for (i in 1:42){
  xxx <- HWE.exact(genotype(laba.con[,i+3], sep=""))
  cat(colnames(laba)[i+3], xxx$p.value, "\n")
}
##

bails out as soon as it hits a SNP at fixation for one allele, because HWE.exact fails. I have a lot of this game to play and checking things by hand is not ideal, and I do not care about failed SNPs; all I want is for the loop to carry on regardless of what's in xxx$p.value, even if HWE.exact failed to calculate it. Dump anything in there! Is there any way of forcing the loop to carry on? Cheers, Federico Calboli -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] rgl to quicktime
hello, is it possible to convert a 3D plot I made with rgl package to a quicktime VR? thanks, simone __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Dead wood code
Mike, I used R for exactly that purpose, to test a new method for sampling coarse woody debris in silico against existing alternatives. The results are published in: Bebber, D.P. & Thomas, S.C., 2003. Prism sweeps for coarse woody debris. Canadian Journal of Forest Research 33, 1737-1743. I will put a reprint in the post, and will forward the R code in a separate message. Please cite the article and acknowledge the original code in any publications. Cheers, Dan Bebber Department of Plant Sciences University of Oxford South Parks Road Oxford OX1 3RB UK > > Date: Tue, 5 Apr 2005 10:07:00 -0400 > From: "Mike Saunders" > <[EMAIL PROTECTED]> > To: "R Help" > Subject: [R] Dead wood code > > > Is there a package, or does anyone have code they > are willing to share, > that would allow me to simulate sampling of dead > wood pieces across an > area? I am specifically looking for code to > simulate the dead wood > distribution as small line segments across an > extent, and then I will > "sample" the dead wood using different sampling > techniques including > line transects, fixed area plots, etc. > > Thanks, > Mike > > Mike Saunders > Research Assistant > Forest Ecosystem Research Program > Department of Forest Ecosystem Sciences > University of Maine > Orono, ME 04469 > 207-581-2763 (O) > 207-581-4257 (F) > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > > > Send instant messages to your online friends http://uk.messenger.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] exclusion rules for propensity score matchng (pattern rec)
hi, thanks for the reply to my query about exclusion rules for propensity score matching. > Exclusion can be based on the non-overlap regions from the propensity. > It should not be done in the individual covariate space. i want a rule inspired by non-overlap in propensity score space, but that binds in the space of the Xs. because i don't really know how to interpret the fact that i've excluded, say, people with scores > .87, but i DO know what it means to say that i've excluded people from country XYZ over age Q because i can't find good matches for them. if i make my rule based on Xs, i know who i can and cannot make inference for, and i can explain to other people who are the units that i can and cannot make inference for. after posting to the list last night, i thought of using the RGENOUD package (genetic algorithm) to search over the space of exclusion rules (eg., var 1 = 1, var 2 = 0 var 3 = 1 or 0, var 4 = 0); the loss function associated with a rule should be increasing in # of tr units w/out support excluded and decreasing in # of tr units w/ support excluded. it might be tricky to get the right loss function, and i know this idea is kind of nutty, but it's the only automated search method i could think of. any comments? alexis > I tend to look > at the 10th smallest and largest values of propensity for each of the > two treatment groups for making the decision. You will need to exclude > non-overlap regions whether you use matching or covariate adjustment of > propensity but covariate adjustment (using e.g. regression splines in > the logit of propensity) is often a better approach once you've been > careful about non-overlap. > > Frank Harrell On Tue, 5 Apr 2005, Frank E Harrell Jr wrote: > [EMAIL PROTECTED] wrote: > > Dear R-list, > > > > i have 6 different sets of samples. Each sample has about 5000 > > observations, > > with each observation comprised of 150 baseline covariates (X), 125 of which > > are dichotomous. 
Roughly 20% of the observations in each sample are > > "treatment" > > and the rest are "control" units. > > > > i am doing propensity score matching, i have already estimated propensity > > scores(predicted probabilities) using logistic regression, and in each > > sample i > > am going to have to exclude approximately 100 treated observations for > > which I > > cannot find matching control observations (because the scores for these > > treated > > units are outside the support of the scores for control units). > > > > in each sample, i must identify an exclusion rule that is interpretable on > > the > > scale of the X's that excludes these unmatchable treated observations and > > excludes as FEW of the remaining treated observations as possible. > > (the reason is that i want to be able to explain, in terms of the Xs, who > > the > > individuals are that I making causal inference about.) > > > > i've tried some simple stuff over the past few days and nothing's worked. > > is there an R-package or algorithm, or even estimation strategy that anyone > > could recommend? > > (i am really hoping so!) > > > > thank you, > > > > alexis diamond > > > > > > -- > Frank E Harrell Jr Professor and Chair School of Medicine > Department of Biostatistics Vanderbilt University > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Help with three-way anova
Hi I have data from 12 subjects. The measurement is log(expression) of a particular gene and can be assumed to be normally distributed. The 12 subjects are divided into the following groups: Infected, Vaccinated, Lesions - 3 measurements Infected, Vaccinated, No Lesions - 2 measurements Infected, Not Vaccinated, Lesions - 4 measurements Uninfected, Not Vaccinated, No Lesions - 3 measurements Although presence/absence of lesions could be considered to be a phenotype, here I would like to use it as a factor. This explains some of the imbalance in the design (ie we could not control how many subjects, if any, in each group would get lesions). First impressions - the data looks like we would expect. Gene expression is lowest in the infected/not vaccinated group, then next lowest is the infected/vaccinated group and finally comes the uninfected/not vaccinated group. So the working hypothesis is that gene expression of the gene in question is lowered by infection, but that the vaccine somehow alleviates this effect, but not as much as to the level of a totally uninfected subject. We *might* have access to data relating to uninfected/vaccinated group, my pet scientist is digging for this as we speak. As for lesions, well none of the uninfected subjects have them, all of the infected/not vaccinated subjects have them, and some of the infected/vaccinated have them, some don't. Again, this makes for a very sensible hypothesis if we treat presence/absence of lesions as a phenotype, but in addition to that I want to know if gene expression is linked to presence/absence of lesion, but only one group of subjects has both lesions and non-lesions within it. Eye-balling this group, presence/absence of lesions and gene expression are not linked. So I have this as a data.frame in R, and I wanted to run an analysis of variance. 
I did:

aov <- aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
summary(aov)

And got:

            Df  Sum Sq Mean Sq F value    Pr(>F)
Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
Lesions      1  0.0393  0.0393  0.0878 0.7746009
Residuals    8  3.5798  0.4475
---

This tells me that Infected and Vaccinated are highly significant, whereas lesions are not. So, what I want to know is:

1) Given my unbalanced experimental design, is it valid to use aov?
2) Have I used aov() correctly? If so, how do I get access to results for interactions?
3) Is there some other, more relevant way of analysing this? What I am really interested in is the gene expression, and whether it can be shown to be statistically related to one or more of the factors involved (Infected, Vaccinated, Lesions) or interactions between those factors.

Many thanks in advance

Mick

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] exclusion rules for propensity score matchng (pattern rec)
[EMAIL PROTECTED] wrote: Dear R-list, i have 6 different sets of samples. Each sample has about 5000 observations, with each observation comprised of 150 baseline covariates (X), 125 of which are dichotomous. Roughly 20% of the observations in each sample are "treatment" and the rest are "control" units. i am doing propensity score matching, i have already estimated propensity scores(predicted probabilities) using logistic regression, and in each sample i am going to have to exclude approximately 100 treated observations for which I cannot find matching control observations (because the scores for these treated units are outside the support of the scores for control units). in each sample, i must identify an exclusion rule that is interpretable on the scale of the X's that excludes these unmatchable treated observations and excludes as FEW of the remaining treated observations as possible. (the reason is that i want to be able to explain, in terms of the Xs, who the individuals are that I making causal inference about.) i've tried some simple stuff over the past few days and nothing's worked. is there an R-package or algorithm, or even estimation strategy that anyone could recommend? (i am really hoping so!) thank you, alexis diamond Exclusion can be based on the non-overlap regions from the propensity. It should not be done in the individual covariate space. I tend to look at the 10th smallest and largest values of propensity for each of the two treatment groups for making the decision. You will need to exclude non-overlap regions whether you use matching or covariate adjustment of propensity but covariate adjustment (using e.g. regression splines in the logit of propensity) is often a better approach once you've been careful about non-overlap. Frank Harrell __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
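(A sketch of the trimming rule described above, on simulated data: take the 10th smallest and 10th largest propensity score within each treatment group and keep only the region where both groups have support. All names and data here are invented for illustration:)

```r
# Overlap trimming: keep units whose propensity score lies between the 10th
# smallest and 10th largest score observed in BOTH treatment groups.
set.seed(3)
n     <- 500
treat <- rbinom(n, 1, 0.2)
ps    <- plogis(rnorm(n, mean = ifelse(treat == 1, 0.5, -0.5)))
lo <- max(sort(ps[treat == 1])[10], sort(ps[treat == 0])[10])
hi <- min(sort(ps[treat == 1], decreasing = TRUE)[10],
          sort(ps[treat == 0], decreasing = TRUE)[10])
keep <- ps >= lo & ps <= hi   # restrict the analysis to the overlap region
```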
Re: [R] R package that has (much) the same capabilities as SAS v9 PROC GENMOD
On Tue, 5 Apr 2005, Simon Blomberg wrote: The questioner clearly wants generalized linear mixed models. lmer in package lme4 may be more appropriate. (Prof. Bates is a co-author.). glmmPQL should do the same job, though, but with less accuracy. Actually, I think the questioner wants GEE, from geepack or yags. SAS has an excellent glmm implementation, but it's through PROC NLMIXED rather than GENMOD, which does marginal models. -thomas Simon. check glm() On Apr 4, 2005 6:46 PM, William M. Grove <[EMAIL PROTECTED]> wrote: I need capabilities, for my data analysis, like the Pinheiro & Bates S-Plus/R package nlme() but with binomial family and logit link. I need multiple crossed, possibly interacting fixed effects (age cohort of twin when entered study, sex of twin, sampling method used to acquire twin pair, and twin zygosity), a couple of random effects other than the cluster variable, and the ability to have a variable of the sort that P&B call "outer" to the clustering variable---zygosity. Dependent variables are all parental (mom, dad separately of course) psychiatric diagnoses. In my data, twin pair ID is the clustering variable; correlations are expected to be exchangeable but substantially different between members of monozygotic twin pairs and members of dizygotic twin pairs. Hence, in my analyses, the variable that's "outer" to twin pair is monozygotic vs. dizygotic which of course applies to the whole pair. nlme() does all that but requires quasi-continuous responses, according to the preface/intro of P&B's mixed models book and what I infer from online help (i.e., no family= or link= argument). The repeated() library by Lindsey seems to handle just one nested random effect, or so I believe I read while scanning backlogs of the R-Help list. glmmPQL() is in the ballpark of what I need, but once again seems to lack the "outer" variable specification that nlme() has, and which PROC GENMOD also has---and which I need. 
I read someplace of yags() that apparently uses GEE to estimate parameters of nonlinear models including GLIMs/mixed models, just the way PROC GENMOD (and many another program) does. But on trying to install it (either v4.0-1.zip or v4.0-2.tar.gz from Carey's site, or Ripley's Windows port) from a local, downloaded zip file (or tar.gz file converted to zip file), I always get an error saying: > Error in file(file, "r") : unable to open connection > In addition: Warning message: > cannot open file `YAGS/DESCRIPTION' with no obvious solution. So I can't really try it out to see if it does what I want. You may ask: Why not just use GENMOD and skip the R hassles? Because I want to embed the GLIM/mixed model analysis in a stratified resampling bootstrapping loop. Very easy to implement in R, moderately painful to do in SAS. Can anybody give me a lead, or some guidance, about getting this job done in R? Thanks in advance for your help. Regards, Will Grove | Iohannes Paulus PP. II, xxx Psychology Dept. | U. of Minnesota | -+ X-headers have PGP key info.; Call me at 612.625.1599 to verify key fingerprint before accepting signed mail as authentic! Will Grove | Iohannes Paulus PP. II, xxx Psychology Dept. | U. of Minnesota | -+ X-headers have PGP key info.; Call me at 612.625.1599 to verify key fingerprint before accepting signed mail as authentic! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- WenSui Liu, MS MA Senior Decision Support Analyst Division of Health Policy and Clinical Effectiveness Cincinnati Children Hospital Medical Center __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. 
Visiting Fellow School of Botany & Zoology The Australian National University Canberra ACT 0200 Australia T: +61 2 6125 8057 email: [EMAIL PROTECTED] F: +61 2 6125 5573 CRICOS Provider # 00120C __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
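(For the archives — a minimal glmmPQL sketch of the twin structure; MASS and nlme ship with R, and the data and variable names (diag, zyg, pairID) are invented. This fits a subject-specific logit model with a pair-level random intercept; it does not give zygosity-specific within-pair correlations, which is the harder part of the original request:)

```r
# glmmPQL sketch: logit link, random intercept per twin pair.
# 'diag', 'zyg', and 'pairID' are made-up variable names.
library(MASS)  # glmmPQL, which fits via nlme underneath
set.seed(4)
dat <- data.frame(
  pairID = rep(1:100, each = 2),
  zyg    = rep(sample(c("MZ", "DZ"), 100, replace = TRUE), each = 2),
  x      = rnorm(200)
)
dat$diag <- rbinom(200, 1, plogis(dat$x))
# zyg is constant within pair, i.e. "outer" to the clustering variable
fit <- glmmPQL(diag ~ x + zyg, random = ~ 1 | pairID,
               family = binomial, data = dat, verbose = FALSE)
summary(fit)$tTable
```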
Re: [R] accessing header information in a table
good catch. you can also get access to both colnames and rownames by dimnames(). On Apr 5, 2005 9:57 AM, Jabez Wilson <[EMAIL PROTECTED]> wrote: > thank you for your replies > > what I really wanted was the individual names, but I think I can get them with > > colnames(myTable)[x] > > where x is 1:7 > > Send instant messages to your online friends http://uk.messenger.yahoo.com > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > -- WenSui Liu, MS MA Senior Decision Support Analyst Division of Health Policy and Clinical Effectiveness Cincinnati Children Hospital Medical Center __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
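To make the dimnames() suggestion concrete, here is a small sketch; the data frame is made up, but the thread's `myTable` would behave the same way.

```r
# Hypothetical stand-in for the thread's 'myTable' with day-name headers
myTable <- data.frame(Mon = 1:3, Tue = 4:6, Wed = 7:9)

colnames(myTable)        # all column headers
colnames(myTable)[2]     # a single header, "Tue"
dimnames(myTable)        # a list: row names first, then column names
dimnames(myTable)[[2]]   # identical to colnames(myTable)
```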
Re: [R] Dead wood code
On Tue, 5 Apr 2005 10:07:00 -0400, "Mike Saunders" <[EMAIL PROTECTED]> wrote : >Is there a package, or does anyone have code they are willing to share, that >would allow me to simulate sampling of dead wood pieces across an area? I am >specifically looking for code to simulate the dead wood distribution as small >line segments across an extent, and then I will "sample" the dead wood using >different sampling techniques including line transects, fixed area plots, etc. Walter Zucchini and others wrote a package for wildlife surveys; that might have some methods in common with what you need. The package was called WiSP. I don't see it on CRAN, but you can read the abstract here: http://www.math.usu.edu/~iface03/abstracts/03049.html Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Dead wood code
Is there a package, or does anyone have code they are willing to share, that would allow me to simulate sampling of dead wood pieces across an area? I am specifically looking for code to simulate the dead wood distribution as small line segments across an extent, and then I will "sample" the dead wood using different sampling techniques including line transects, fixed area plots, etc. Thanks, Mike Mike Saunders Research Assistant Forest Ecosystem Research Program Department of Forest Ecosystem Sciences University of Maine Orono, ME 04469 207-581-2763 (O) 207-581-4257 (F) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
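Not an answer from the thread, but a minimal sketch of the kind of simulation being asked about, assuming a square extent with uniformly random piece midpoints and orientations (every name and parameter here is invented for illustration):

```r
# Simulate dead-wood pieces as short line segments over a 100 x 100
# extent, then "sample" them with one fixed-area circular plot.
set.seed(1)
n <- 500
len   <- runif(n, 0.5, 3)          # piece lengths
theta <- runif(n, 0, pi)           # orientations
xm <- runif(n, 0, 100)             # midpoint x
ym <- runif(n, 0, 100)             # midpoint y
x1 <- xm - len/2 * cos(theta); y1 <- ym - len/2 * sin(theta)
x2 <- xm + len/2 * cos(theta); y2 <- ym + len/2 * sin(theta)

# Fixed-area plot: count pieces whose midpoint falls inside a circle
# of radius 10 centred on (50, 50).  Transect sampling would instead
# test each segment for intersection with a sample line.
in.plot <- (xm - 50)^2 + (ym - 50)^2 <= 10^2
sum(in.plot)
```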
Re: [R] accessing header information in a table
thank you for your replies what I really wanted was the individual names, but I think I can get them with colnames(myTable)[x] where x is 1:7 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] accessing header information in a table
Look at "?names()", "?colnames()", e.g. check names(myTable) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/16/336899 Fax: +32/16/337015 Web: http://www.med.kuleuven.ac.be/biostat/ http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm - Original Message - From: "Jabez Wilson" <[EMAIL PROTECTED]> To: Sent: Tuesday, April 05, 2005 3:41 PM Subject: [R] accessing header information in a table Dear list, I have read an excel table into R using read.table() and headers=T. The table is 7 columns of figures. Is there any way to gain access to the header information? Say the table headers are Mon, Tue etc I can get to the data with myTable$Tue, but how do I get the headers themselves. TIA Jab __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re[2]: [R] problems with subset (misunderstanding somewhere)
Ted, TH> Did you test the function cubic.distance? Yes, I did. TH> As written, I think it TH> will always return a single value, Yes, here was the misunderstanding. Subset required a vector, and I gave it a scalar. Prof. Ripley has already shown my mistake. -- Best regards Wladimir Eremeev mailto:[EMAIL PROTECTED] == Research Scientist, PhD Leninsky Prospect 33, Space Monitoring & Ecoinformation Systems Sector, Moscow, Russia, 119071, Institute of Ecology, Phone: (095) 135-9972; Russian Academy of Sciences Fax: (095) 135-9972 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] accessing header information in a table
Dear list, I have read an excel table into R using read.table() and headers=T. The table is 7 columns of figures. Is there any way to gain access to the header information? Say the table headers are Mon, Tue etc I can get to the data with myTable$Tue, but how do I get the headers themselves. TIA Jab __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] problems with subset (misunderstanding somewhere)
On 05-Apr-05 Wladimir Eremeev wrote: > Dear r-help, > > I have the following function defined: > > cubic.distance<-function(x1,y1,z1,x2,y2,z2) { > max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2))) > } > > I have a data frame from which I make subsets. > > When I call > subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2) > I have the result with 0 rows. > > However, the data frame contains the row (among others, that suit) > tb19v tb19h tb37v > 226.6 189.3 208.4 Did you test the function cubic.distance? As written, I think it will always return a single value, since max() returns the maximum of *all* the values, not by rows (even if you use cbind() rather than c()). If you redefine the function as cubic.distance<-function(x1,y1,z1,x2,y2,z2) { apply(cbind(abs(x1-x2),abs(y1-y2),abs(z1-z2)),1,max) } I think you will find it does what you want (if I have understood your problem correctly). Example (with the function defined as above): x<-cbind(rnorm(10,190,2),rnorm(10,210,2),rnorm(10,227,2)) colnames(x)<-c("tb19h","tb37v","tb19v") x.df<-as.data.frame(x) (cubic.distance(x.df$tb19h,x.df$tb37v,x.df$tb19v,190,210,227)<=2) #[1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE subset(x.df,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2) #       tb19h    tb37v    tb19v #3  189.3930 211.4345 226.3436 #4  189.4521 208.8493 228.0324 #9  188.2441 210.4914 226.4521 #10 191.4781 211.5234 226.1837 With your definition, you would have got the *single* result FALSE, since there is at least one case where the distance > 2, so the max > 2, so the subset criterion evaluates to FALSE, so no rows are selected. Hoping this helps, Ted. E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 05-Apr-05 Time: 14:29:22 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problems with subset (misunderstanding somewhere)
On Tue, 5 Apr 2005, Wladimir Eremeev wrote: Dear r-help, I have the following function defined: cubic.distance<-function(x1,y1,z1,x2,y2,z2) { max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2))) } I have a data frame from which I make subsets. When I call subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2) I have the result with 0 rows. However, the data frame contains the row (among others, that suit) tb19v tb19h tb37v 226.6 189.3 208.4 Call cubic.distance(189.3,208.4,226.6,190,210,227) gives [1] 1.6 Next call: cubic.distance(189.3,208.4,226.6,190,210,227)<=2 [1] TRUE It seems to me, that I have made errors somewhere in calls. Could you, please, be so kind, to tell me, where they are? Your function finds the maximum distance over all rows (it is passed vectors). Replace max() by pmax() for a logical result for each row. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
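A short sketch of the pmax() fix suggested above. The toy data frame is invented, but its first row reproduces the values quoted in the question, so the arithmetic can be checked by hand:

```r
# pmax() works elementwise, so the subset condition gets one
# distance per row instead of a single overall maximum.
cubic.distance <- function(x1, y1, z1, x2, y2, z2) {
  pmax(abs(x1 - x2), abs(y1 - y2), abs(z1 - z2))
}

# toy data frame (made up; row 1 matches the question's values)
df <- data.frame(tb19h = c(189.3, 185.0),
                 tb37v = c(208.4, 200.0),
                 tb19v = c(226.6, 220.0))

cubic.distance(df$tb19h, df$tb37v, df$tb19v, 190, 210, 227)
# row 1: max(0.7, 1.6, 0.4) = 1.6, so it passes the <= 2 test
subset(df, cubic.distance(tb19h, tb37v, tb19v, 190, 210, 227) <= 2)
```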
[R] problems with subset (misunderstanding somewhere)
Dear r-help, I have the following function defined: cubic.distance<-function(x1,y1,z1,x2,y2,z2) { max(c(abs(x1-x2),abs(y1-y2),abs(z1-z2))) } I have a data frame from which I make subsets. When I call subset(dataframe,cubic.distance(tb19h,tb37v,tb19v,190,210,227)<=2) I have the result with 0 rows. However, the data frame contains the row (among others, that suit) tb19v tb19h tb37v 226.6 189.3 208.4 Call cubic.distance(189.3,208.4,226.6,190,210,227) gives [1] 1.6 Next call: > cubic.distance(189.3,208.4,226.6,190,210,227)<=2 [1] TRUE It seems to me, that I have made errors somewhere in calls. Could you, please, be so kind, to tell me, where they are? Thank you. -- Best regards Wladimir Eremeev mailto:[EMAIL PROTECTED] == Research Scientist, PhD Leninsky Prospect 33, Space Monitoring & Ecoinformation Systems Sector, Moscow, Russia, 119071, Institute of Ecology, Phone: (095) 135-9972; Russian Academy of Sciences Fax: (095) 135-9972 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] extract date
I just started using gmail and one thing that I thought would be annoying but sometimes is actually interesting is the ads at the right hand side. They are keyed off the content of the email and in the case of your post produced: http://www.visibone.com/regular-expressions/?via=google120 http://www.regexpbuddy.com The first one is advertising a javascript reference card (which I happen to own and is excellent); but in any case, the contents of the regexp part of the reference card are fully reproduced on the web page and include dozens of examples of regexps that you could try. I haven't explored the other web site. Although I have not read it, there is a book called Mastering Regular Expressions. By the way, here is an alternative to calculating nd in Prof. Ripley's post just to give you something else to play with. I think I prefer his solution but this one is arguably a bit simpler. The three portions separated by the two bars are each deleted if they are present. gsub causes it to repeatedly try them so that it does not stop after deleting the first one: nd <- gsub("Date: |.*, | ..:.*$", "", dates) On Apr 5, 2005 7:22 AM, Petr Pikal <[EMAIL PROTECTED]> wrote: > Dear Prof.Ripley > > Thank you for your answer. After some tests and errors I finished > with suitable extraction function which gives me substantial > increase in positive answers. > > Nevertheless I definitely need to gain more practice in regular > expressions, but from the help page I can grasp only easy things. Is > there any "Regular expressions for dummies" available? > > Best regards > Petr Pikal > > On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote: > > > On Tue, 5 Apr 2005, Petr Pikal wrote: > > > > > Dear all, > > > > > > please, is there any possibility how to extract a date from data > > > which are like this: > > > > Yes, if you delimit all the possibilities. 
> > > > > > > > "Date: Sat, 21 Feb 04 10:25:43 GMT" > > > "Date: 13 Feb 2004 13:54:22 -0600" > > > "Date: Fri, 20 Feb 2004 17:00:48 +" > > > "Date: Fri, 14 Jun 2002 16:22:27 -0400" > > > "Date: Wed, 18 Feb 2004 08:53:56 -0500" > > > "Date: 20 Feb 2004 02:18:58 -0600" > > > "Date: Sun, 15 Feb 2004 16:01:19 +0800" > > > > > > > > > I used > > > > > > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), > > > sep="-"), format="%d-%b-%Y") > > > > > > which suits to lines 3:5 and 7 (such are the most common in my > > > dataset) but obviously does not work with other lines. > > > > For those examples, in character vector 'dates' (without quotes): > > > > > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", > > "\\1 \\2 \\3", dates) > > > strptime(nd, "%d %b %y") > > [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18" > > [6] "2020-02-20" "2020-02-15" > > > > You should be able to amend the regexp for a wider range of forms, but > > your first line is ambiguous (2004 or 2021?) so there are limits. > > > > > If there is no stightforward solution I can live with what I use now > > > but some automagical function like > > > > > > give.me.date.from.my.string.regardles.of.formating(x) > > > would be great. > > > > It would be impossible: when Americans write 07/04/2004 they do not > > mean April 7th. > > > > -- > > Brian D. Ripley, [EMAIL PROTECTED] > > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > > University of Oxford, Tel: +44 1865 272861 (self) 1 South > > Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, > > UKFax: +44 1865 272595 > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > Petr Pikal > [EMAIL PROTECTED] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] extract date
On Tue, 5 Apr 2005, Petr Pikal wrote: Dear Prof.Ripley Thank you for your answer. After some tests and errors I finished with suitable extraction function which gives me substantial increase in positive answers. Nevertheless I definitely need to gain more practice in regular expressions, but from the help page I can grasp only easy things. Is there any "Regular expressions for dummies" available? Not that I know of. One of my sysadmins uses an O'Reilly pocket guide for reference, and the O'Reilly `Mastering Regular Expressions' book is to my mind no better than the POSIX standards. A quick look on Amazon suggests Sams Teach Yourself Regular Expressions in 10 Minutes SAMS 0672325667 to be highly rated. Best regards Petr Pikal On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote: On Tue, 5 Apr 2005, Petr Pikal wrote: Dear all, please, is there any possibility how to extract a date from data which are like this: Yes, if you delimit all the possibilities. "Date: Sat, 21 Feb 04 10:25:43 GMT" "Date: 13 Feb 2004 13:54:22 -0600" "Date: Fri, 20 Feb 2004 17:00:48 +" "Date: Fri, 14 Jun 2002 16:22:27 -0400" "Date: Wed, 18 Feb 2004 08:53:56 -0500" "Date: 20 Feb 2004 02:18:58 -0600" "Date: Sun, 15 Feb 2004 16:01:19 +0800" I used strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), sep="-"), format="%d-%b-%Y") which suits to lines 3:5 and 7 (such are the most common in my dataset) but obviously does not work with other lines. For those examples, in character vector 'dates' (without quotes): nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates) strptime(nd, "%d %b %y") [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18" [6] "2020-02-20" "2020-02-15" You should be able to amend the regexp for a wider range of forms, but your first line is ambiguous (2004 or 2021?) so there are limits. 
If there is no stightforward solution I can live with what I use now but some automagical function like give.me.date.from.my.string.regardles.of.formating(x) would be great. It would be impossible: when Americans write 07/04/2004 they do not mean April 7th. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] extract date
Dear Prof.Ripley Thank you for your answer. After some tests and errors I finished with suitable extraction function which gives me substantial increase in positive answers. Nevertheless I definitely need to gain more practice in regular expressions, but from the help page I can grasp only easy things. Is there any "Regular expressions for dummies" available? Best regards Petr Pikal On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote: > On Tue, 5 Apr 2005, Petr Pikal wrote: > > > Dear all, > > > > please, is there any possibility how to extract a date from data > > which are like this: > > Yes, if you delimit all the possibilities. > > > > > "Date: Sat, 21 Feb 04 10:25:43 GMT" > > "Date: 13 Feb 2004 13:54:22 -0600" > > "Date: Fri, 20 Feb 2004 17:00:48 +" > > "Date: Fri, 14 Jun 2002 16:22:27 -0400" > > "Date: Wed, 18 Feb 2004 08:53:56 -0500" > > "Date: 20 Feb 2004 02:18:58 -0600" > > "Date: Sun, 15 Feb 2004 16:01:19 +0800" > > > > > > I used > > > > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), > > sep="-"), format="%d-%b-%Y") > > > > which suits to lines 3:5 and 7 (such are the most common in my > > dataset) but obviously does not work with other lines. > > For those examples, in character vector 'dates' (without quotes): > > > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", > "\\1 \\2 \\3", dates) > > strptime(nd, "%d %b %y") > [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18" > [6] "2020-02-20" "2020-02-15" > > You should be able to amend the regexp for a wider range of forms, but > your first line is ambiguous (2004 or 2021?) so there are limits. > > > If there is no straightforward solution I can live with what I use now > > but some automagical function like > > > > give.me.date.from.my.string.regardles.of.formating(x) > > would be great. > > It would be impossible: when Americans write 07/04/2004 they do not > mean April 7th. > > -- > Brian D. 
Ripley, [EMAIL PROTECTED] > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) 1 South > Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, > UKFax: +44 1865 272595 > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Stats Question: Single data item versus Sample from Norm
On 05-Apr-05 Ross Clement wrote: > Hi. I have a question that I have asked in other stat forums > but do not yet have an answer for. I would like to know if > there is some way in R or otherwise of performing the following > hypothesis test. > > I have a single data item x. The null hypothesis is that x > was selected from a normal distribution N(mu,sigma). The > alternate hypothesis is that x does not come from this > distribution. > > However, I do not know the values of mu and sigma. I have a > sample of size N from which I can estimate mu and sigma. > So, say that I have N(m,s,N), and x. I would like to say with > some certainty (e.g. 95%) that I can, or can't reject the > hypothesis that x came from N(mu,sigma). I would also like a > power test to say how large N should be given the degree of > accuracy I need when accepting or rejecting individual x > values. > > What is the name of the hypothesis test I need for this? > Is it built into R, or are there packages I could use? There is no name because there is no unique test. The difficulty lies in your statement of alternative hypothesis: "that x does not come from this distribution." This allows any distribution whatever to be a possible source of your single observation x. Therefore, whatever the value of x, you can reject the null hypothesis that it comes from any N(mu,sigma^2) that is remotely compatible with your N data, in favour of some distribution that happens to predict with near-certainty that you will get that particular observation x. On that basis, for instance, suppose you had m=1.1 and s=2.5 say. And suppose x=1.15 which is very close to m with a difference which is much smaller than s. You are still entitled to reject H0 on the basis that your alternative allows you to postulate N(1.15,0.0001) as the source of the observation x. 
What you need to do is to make clear what feature of the value of x, in relation to any given Normal distribution, would constitute an indication that it was not sampled from that distribution. If (as I surmise) this is simply "distance from mu" [the true mean of the Normal distribution], so that you are basically testing whether x is an "outlier", then you could use the simple fact that the distribution of ((x - m)(N/(N+1))^0.5)/s has a t distribution with (N-1) degrees of freedom. This, if you have to give it a name, would be a "t" test since that is all it depends on. Note, however, that this pre-supposes that the variance of the distribution from which x was sampled is the same as the variance of the distribution giving your N values, and also that both distributions are Normal, differing therefore only in their means. So this is a tight restriction of your original universal class of alternatives. Best wishes, Ted. E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 05-Apr-05 Time: 11:35:01 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
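The "outlier t" test described above is easy to check numerically. A minimal sketch, with an invented sample and observation (under H0, the statistic below follows a t distribution with N-1 degrees of freedom, as stated):

```r
# Single observation x vs. a sample of N draws from the same normal:
# t = (x - m) * sqrt(N/(N+1)) / s  ~  t with N-1 df under H0.
set.seed(42)
N <- 30
samp <- rnorm(N, mean = 10, sd = 2)      # made-up reference sample
x <- 17                                  # the single new observation
m <- mean(samp)
s <- sd(samp)
tstat <- (x - m) * sqrt(N / (N + 1)) / s
p <- 2 * pt(-abs(tstat), df = N - 1)     # two-sided p-value
c(tstat = tstat, p = p)
```

The sqrt(N/(N+1)) factor accounts for the estimation error in m: Var(x - m) = sigma^2 * (N + 1)/N when x is independent of the sample.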
RE: [R] Stats Question: Single data item versus Sample from Norma l Distribution
> From: Liaw, Andy > > > From: Liaw, Andy > > > > Here's one possibility, assuming muhat and sigmahat are > > estimtes of mu and sigma from N iid draws of N(mu, sigma^2): > > > > tStat <- abs(x - muhat) / sigmahat > > pValue <- pt(tStat, df=N, lower=TRUE) > > Oops... That should be: > > pValue <- pt(tStat, df=N, lower=TRUE) / 2 I'm going mad! That should be: pValue <- pt(tStat, df=N, lower=FALSE) / 2 Sorry for wasting bw... Andy > Andy > > > > > I'm not quite sure what df tStat should have (exercise for > > math stat), but given fairly large N, that should make little > > difference. > > > > Andy > > > > > From: Ross Clement > > > > > > Hi. I have a question that I have asked in other stat forums > > > but do not > > > yet have an answer for. I would like to know if there is > > some way in R > > > or otherwise of performing the following hypothesis test. > > > > > > I have a single data item x. The null hypothesis is that x > > > was selected > > > from a normal distribution N(mu,sigma). The alternate > > > hypothesis is that > > > x does not come from this distribution. > > > > > > However, I do not know the values of mu and sigma. I have a > > sample of > > > size N from which I can estimate mu and sigma. So, say that I have > > > N(m,s,N), and x. I would like to say with some certainty > > > (e.g. 95%) that > > > I can, or can't reject the hypothesis that x came from > > N(mu,sigma). I > > > would also like a power test to say how large N should be > given the > > > degree of accuracy I need when accepting or rejecting individual x > > > values. > > > > > > What is the name of the hypothesis test I need for this? > Is it built > > > into R, or are there packages I could use? > > > > > > Thanks in anticipation, > > > > > > Ross Clement. > > > > > > __ > > > R-help@stat.math.ethz.ch mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide! 
> > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Stats Question: Single data item versus Sample from Norma l Distribution
> From: Liaw, Andy > > Here's one possibility, assuming muhat and sigmahat are > estimtes of mu and sigma from N iid draws of N(mu, sigma^2): > > tStat <- abs(x - muhat) / sigmahat > pValue <- pt(tStat, df=N, lower=TRUE) Oops... That should be: pValue <- pt(tStat, df=N, lower=TRUE) / 2 Andy > > I'm not quite sure what df tStat should have (exercise for > math stat), but given fairly large N, that should make little > difference. > > Andy > > > From: Ross Clement > > > > Hi. I have a question that I have asked in other stat forums > > but do not > > yet have an answer for. I would like to know if there is > some way in R > > or otherwise of performing the following hypothesis test. > > > > I have a single data item x. The null hypothesis is that x > > was selected > > from a normal distribution N(mu,sigma). The alternate > > hypothesis is that > > x does not come from this distribution. > > > > However, I do not know the values of mu and sigma. I have a > sample of > > size N from which I can estimate mu and sigma. So, say that I have > > N(m,s,N), and x. I would like to say with some certainty > > (e.g. 95%) that > > I can, or can't reject the hypothesis that x came from > N(mu,sigma). I > > would also like a power test to say how large N should be given the > > degree of accuracy I need when accepting or rejecting individual x > > values. > > > > What is the name of the hypothesis test I need for this? Is it built > > into R, or are there packages I could use? > > > > Thanks in anticipation, > > > > Ross Clement. > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > > > > > > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Stats Question: Single data item versus Sample from Norma l Distribution
Here's one possibility, assuming muhat and sigmahat are estimates of mu and sigma from N iid draws of N(mu, sigma^2): tStat <- abs(x - muhat) / sigmahat pValue <- pt(tStat, df=N, lower=TRUE) I'm not quite sure what df tStat should have (exercise for math stat), but given fairly large N, that should make little difference. Andy > From: Ross Clement > > Hi. I have a question that I have asked in other stat forums > but do not > yet have an answer for. I would like to know if there is some way in R > or otherwise of performing the following hypothesis test. > > I have a single data item x. The null hypothesis is that x > was selected > from a normal distribution N(mu,sigma). The alternate > hypothesis is that > x does not come from this distribution. > > However, I do not know the values of mu and sigma. I have a sample of > size N from which I can estimate mu and sigma. So, say that I have > N(m,s,N), and x. I would like to say with some certainty > (e.g. 95%) that > I can, or can't reject the hypothesis that x came from N(mu,sigma). I > would also like a power test to say how large N should be given the > degree of accuracy I need when accepting or rejecting individual x > values. > > What is the name of the hypothesis test I need for this? Is it built > into R, or are there packages I could use? > > Thanks in anticipation, > > Ross Clement. > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] build failed of package
Dear Professor Ripley The good news is that I got PVM up and running on two Windows nodes now. I had to connect them with each other manually, for now not using rsh or ssh. Now building RPVM for Windows might not be as easy as it sounds. Did anyone try this out successfully before? I also looked at the SNOW package, but that did not look so bad. Regards Lars --- Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > On Thu, 24 Mar 2005, A.J. Rossini wrote: > > > Looks like you are trying to install source > tarball on Windows without > > the relevant toolset (compiler, etc)? > > To save further hassle, rpvm is not going to build > on Windows > unless you have PVM installed and working on > Windows. > > If that is the case, this looks like the use of the > wrong make, with the > wrong shell (that message is coming from a Windows > shell, not sh.exe). > Do see the warnings in README.packages about the > MinGW make. > > > On Thu, 24 Mar 2005 00:11:34 -0800 (PST), Lars > Schouw > > <[EMAIL PROTECTED]> wrote: > >> I am trying to install the rpvm package doing > this: > >> > >> C:\R\rw2000\bin>rcmd install rpvm_0.6-2.tar.gz > >> > >> '.' is not recognized as an internal or external > >> command, > >> operable program or batch file. > >> '.' is not recognized as an internal or external > >> command, > >> operable program or batch file. > >> make: *** /rpvm: No such file or directory. > Stop. > >> make: *** [pkg-rpvm] Error 2 > >> *** Installation of rpvm failed *** > >> > >> Removing 'C:/R/rw2000/library/rpvm' > >> > >> What does this error message tell me? > > > -- > Brian D. Ripley, > [EMAIL PROTECTED] > Professor of Applied Statistics, > http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 > 272861 (self) > 1 South Parks Road, +44 1865 > 272866 (PA) > Oxford OX1 3TG, UKFax: +44 1865 > 272595 > __ 
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Stats Question: Single data item versus Sample from Normal Distribution
Hi. I have a question that I have asked in other stat forums but do not yet have an answer for. I would like to know if there is some way in R or otherwise of performing the following hypothesis test. I have a single data item x. The null hypothesis is that x was selected from a normal distribution N(mu,sigma). The alternate hypothesis is that x does not come from this distribution. However, I do not know the values of mu and sigma. I have a sample of size N from which I can estimate mu and sigma. So, say that I have N(m,s,N), and x. I would like to say with some certainty (e.g. 95%) that I can, or can't reject the hypothesis that x came from N(mu,sigma). I would also like a power test to say how large N should be given the degree of accuracy I need when accepting or rejecting individual x values. What is the name of the hypothesis test I need for this? Is it built into R, or are there packages I could use? Thanks in anticipation, Ross Clement. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] extract date
On Tue, 5 Apr 2005, Petr Pikal wrote: Dear all, please, is there any possibility how to extract a date from data which are like this: Yes, if you delimit all the possibilities. "Date: Sat, 21 Feb 04 10:25:43 GMT" "Date: 13 Feb 2004 13:54:22 -0600" "Date: Fri, 20 Feb 2004 17:00:48 +" "Date: Fri, 14 Jun 2002 16:22:27 -0400" "Date: Wed, 18 Feb 2004 08:53:56 -0500" "Date: 20 Feb 2004 02:18:58 -0600" "Date: Sun, 15 Feb 2004 16:01:19 +0800" I used strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), sep="-"), format="%d-%b-%Y") which suits to lines 3:5 and 7 (such are the most common in my dataset) but obviously does not work with other lines. For those examples, in character vector 'dates' (without quotes): nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates) strptime(nd, "%d %b %y") [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18" [6] "2020-02-20" "2020-02-15" You should be able to amend the regexp for a wider range of forms, but your first line is ambiguous (2004 or 2021?) so there are limits. If there is no stightforward solution I can live with what I use now but some automagical function like give.me.date.from.my.string.regardles.of.formating(x) would be great. It would be impossible: when Americans write 07/04/2004 they do not mean April 7th. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
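A hedged variation on the gsub() approach above: pick "%y" or "%Y" per entry depending on how many digits the year has, which sidesteps the "2004" → "2020-..." misparse visible in the output when a fixed "%y" is used. The sample dates are taken from the question; parsing "%b" month names assumes an English locale.

```r
dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600",
           "Date: Fri, 14 Jun 2002 16:22:27 -0400")
# same normalisation as in the reply above
nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates)
year4 <- grepl(" [0-9]{4}$", nd)           # four-digit year?
parsed <- as.Date(rep(NA_character_, length(nd)))
parsed[year4]  <- as.Date(nd[year4],  format = "%d %b %Y")
parsed[!year4] <- as.Date(nd[!year4], format = "%d %b %y")
parsed
# [1] "2004-02-21" "2004-02-13" "2002-06-14"
```

Note that "%y" still guesses the century (00-68 maps to 20xx), so the two-digit case remains ambiguous in exactly the way the reply points out.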
[R] extract date
Dear all,

please, is there any possibility how to extract a date from data which are like this:

"Date: Sat, 21 Feb 04 10:25:43 GMT"
"Date: 13 Feb 2004 13:54:22 -0600"
"Date: Fri, 20 Feb 2004 17:00:48 +"
"Date: Fri, 14 Jun 2002 16:22:27 -0400"
"Date: Wed, 18 Feb 2004 08:53:56 -0500"
"Date: 20 Feb 2004 02:18:58 -0600"
"Date: Sun, 15 Feb 2004 16:01:19 +0800"

I used

strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), sep="-"),
         format="%d-%b-%Y")

which suits lines 3:5 and 7 (such are the most common in my dataset) but obviously does not work with other lines. If there is no straightforward solution I can live with what I use now, but some automagical function like give.me.date.from.my.string.regardless.of.formatting(x) would be great.

Thank you.
Petr Pikal
[EMAIL PROTECTED]
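[No such automagical function exists in base R, but for the formats shown above a rough approximation (a sketch only; the helper name is made up, and it assumes every string contains a "day monthname year" run and an English locale for month abbreviations) is to extract that run and pick a two- or four-digit year format per element:]

```r
## Sketch: pull "day monthname year" out of each date string and choose
## %y or %Y depending on how many digits the year has, so two-digit and
## four-digit years are both handled.
give.me.a.date <- function(x) {
  nd  <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", x)
  yr  <- sub(".* ", "", nd)
  fmt <- ifelse(nchar(yr) == 4, "%d %b %Y", "%d %b %y")
  as.Date(mapply(function(d, f) format(strptime(d, f)), nd, fmt,
                 USE.NAMES = FALSE))
}

give.me.a.date(c("Date: Sat, 21 Feb 04 10:25:43 GMT",
                 "Date: 13 Feb 2004 13:54:22 -0600"))
# [1] "2004-02-21" "2004-02-13"
```

[Note that %y still applies a century cutoff (04 becomes 2004, 69 would become 1969), so genuinely ambiguous inputs remain ambiguous.]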
Re: [R] locfit and memory allocation
Calling gc() before starting a memory-intensive task is normally a good idea, as it helps avoid memory fragmentation (which is possibly a problem on a 32-bit OS, but you did not say). R 2.1.0 beta has some dodges to help, so you may find it helpful to try that out.

On Mon, 4 Apr 2005, Mike Hickerson wrote:

> Hello
>
> I am getting memory allocation errors when running a function that uses
> locfit within a for loop. After 25 or so loops, it gives this error.
>
> "Error: cannot allocate vector of size 281250 Kb"
>
> Running on a linux cluster with a Gb of RAM. The problem never happens
> on my OS X machine (less memory). The total data is 130 cols by 5000
> rows. The first 129 cols are response variables, the 130th is the
> parameter. The function fits a local regression between the 129
> variables in the ith row of m[] and the 129 variables in the 5000 rows,
> after m was fed into 130 different vectors called Var1, ..., Var129,
> and PARAMETER.
>
> array <- scan(("DataFile"), nlines=5000)
> m <- matrix(array, ncol=130, byrow=T)
> for (i in 1:200) {
>   result <- function(m[i,c(1,,129)], PARAMETER, cbind(Var1,...,Var129),
>                      seq(1,len=5000), F)
> }
>
> Any ideas on how to avoid this memory allocation problem would be
> greatly appreciated. Garbage collection? (or is that too slow?)
>
> Many Thanks in Advance!
> Mike
>
> Mike Hickerson
> University of California
> Museum of Vertebrate Zoology
> 3101 Valley Life Sciences Building
> Berkeley, California 94720-3160 USA
> voice 510-642-8911  cell: 510-701-0861  fax 510-643-8238
> [EMAIL PROTECTED]
>
> [[alternative text/enriched version deleted]]

-- 
Brian D. Ripley, [EMAIL PROTECTED]
RE: [R] Amount of memory under different OS
On Mon, 4 Apr 2005, bogdan romocea wrote:

> You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use
> more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of
> RAM, the seller will probably warn you about Windows and recommend a
> suitable OS.

There _are_ versions of Windows 2000 and later which can support more than 4Gb (and I hope if you buy a box with more memory you would be sold the appropriate version).

If you have a 32-bit address space you cannot have more than 4Gb of addresses, physical or virtual. So some modern `32-bit' chips have segmented addressing: see e.g. http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx (where PAE is the scheme used on IA-32 aka ix86 chips). AFAIK the story is the same for most IA-32 OSes: each process gets a 4Gb address space and (usually) 3Gb of that is user space.

R's memory usage will get cramped when the address space is tight: you probably do not want to be using 1Gb objects in a 3Gb address space (and if you need to, try R-2.1.0 beta).

The question did not mention R, and `very tough' is not at all explicit. For large R tasks we have been recommending 64-bit OSes for quite a long time. There are 64-bit versions of Windows for AMD64/EM64T chips, but not versions of the compilers used to build R for Windows. There are many users here of R under Linux (usually Fedora Core 3 or SuSE 9.x) on such chips.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Saturday, April 02, 2005 12:48 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Amount of memory under different OS
>
> I have a problem: I need to perform a very tough analysis, so I would
> like to buy a new computer with about 16 GB of RAM. Is it possible to
> use all this memory under Windows, or do I have to install another OS?

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Re: [R] Principle Component Analysis in R
I think the function that does the printing of the loadings assumes that the eigenvectors are normalized to the corresponding eigenvalue, rather than to unity, and the output it produces is incorrect.

ft.

-- 
Fernando TUSELL                             e-mail:
Departamento de Econometría y Estadística   [EMAIL PROTECTED]
Facultad de CC.EE. y Empresariales          Tel:  (+34)94.601.3733
Avenida Lendakari Aguirre, 83               Fax:  (+34)94.601.3754
E-48015 BILBAO (Spain)                      Secr: (+34)94.601.3740
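[A quick way to check which normalization a PCA routine uses (a sketch, not part of the original message; it assumes princomp(), since the thread does not say which function was involved) is to look at the column norms of the loadings. princomp() returns unit-length eigenvectors, so rescaling each column by its standard deviation gives eigenvalue-normalized vectors instead:]

```r
## Sketch: inspect how PCA loadings are normalized, using princomp() on
## the built-in USArrests data.
pc <- princomp(USArrests, cor = TRUE)

## princomp() loadings are unit-length eigenvectors:
colSums(pc$loadings[, ]^2)    # each column sums (of squares) to 1

## Rescaling each column by its component standard deviation gives
## vectors whose squared length equals the corresponding eigenvalue:
scaled <- sweep(pc$loadings[, ], 2, pc$sdev, "*")
colSums(scaled^2)             # equals pc$sdev^2, i.e. the eigenvalues
```

[If a printing routine expects one convention and receives the other, every loading is off by a factor of the component's standard deviation, which would explain output that looks "incorrect" while the underlying decomposition is fine.]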