[R] [R-pkgs] new package 'trackObjs' - mirror objects to files, provide summaries & modification times
bles are saved on disk and will be no longer accessible until tracking is started again. * The objects are stored each in their own file in the tracking dir, in the format used by 'save()'/'load()' (RData files). List of basic functions and common calling patterns: Six functions cover the majority of common usage of the trackObjs package: * 'track.start(dir=...)': start tracking the global environment, with files saved in 'dir' * 'track.stop()': stop tracking (any unsaved tracked variables are saved to disk and all tracked variables become unavailable until tracking starts again) * 'track(x)': start tracking 'x' - 'x' in the global environment is replaced by an active binding and 'x' is saved in its corresponding file in the tracking directory and, if caching is on, in the tracking environment * 'track(x <- value)': start tracking 'x' * 'track(list=c('x', 'y'))': start tracking specified variables * 'track(all=TRUE)': start tracking all untracked variables in the global environment * 'untrack(x)': stop tracking variable 'x' - the R object 'x' is put back as an ordinary object in the global environment * 'untrack(all=TRUE)': stop tracking all variables in the global environment (but tracking is still set up) * 'untrack(list=...)': stop tracking specified variables * 'track.summary()': print a summary of the basic characteristics of tracked variables: name, class, extent, and creation, modification and access times. * 'track.remove(x)': completely remove all traces of 'x' from the global environment, tracking environment and tracking directory. Note that if variable 'x' in the global environment is tracked, 'remove(x)' will make 'x' an "orphaned" variable: 'remove(x)' will just remove the active binding from the global environment, and leave 'x' in the tracked environment and on file, and 'x' will reappear after restarting tracking. 
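The active-binding mechanism mentioned for 'track(x)' can be illustrated in base R. What follows is only a sketch of the general idea, not the trackObjs implementation: the environments 'ws' and 'store' and the temporary file are invented for the illustration, with 'ws' standing in for the global environment and 'store' for the tracking environment.

```r
# Sketch only: mirror one variable to an RData file through an active binding.
ws    <- new.env()   # hypothetical stand-in for the global environment
store <- new.env()   # hypothetical stand-in for the tracking environment
path  <- tempfile(fileext = ".rda")

makeActiveBinding("x", function(value) {
  if (missing(value)) {
    # read access: return the cached copy
    get("x", envir = store)
  } else {
    # write access: update the cache and write it out in save()/load() format
    assign("x", value, envir = store)
    save(list = "x", envir = store, file = path)
    invisible(value)
  }
}, ws)

ws$x <- 1:3                 # goes through the write branch
check <- new.env()
load(path, envir = check)   # the file now holds the same value
identical(check$x, ws$x)
```

Every assignment to 'x' is written to disk straight away in this sketch; the package itself offers options (such as caching) that this illustration does not attempt to reproduce.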
Complete list of functions and common calling patterns:

The 'trackObjs' package provides many additional functions for controlling how tracking is performed (e.g., whether or not tracked variables are cached in memory), examining the state of tracking (show which variables are tracked, untracked, orphaned, masked, etc.) and repairing tracking environments and databases that have become inconsistent or incomplete (this may result from resource limitations, e.g., being unable to write a save file due to lack of disk space, or from manual tinkering, e.g., dropping a new save file into a tracking directory).

[truncated here -- see ?trackObjs]

-- Tony Plate

PS: to give credit where due, the end of ?trackObjs says:

References:
Roger D. Peng. Interacting with data using the filehash package. R News, 6(4):19-24, October 2006. 'http://cran.r-project.org/doc/Rnews' and 'http://sandybox.typepad.com/software'
David E. Brahm. Delayed data packages. R News, 2(3):11-12, December 2002. 'http://cran.r-project.org/doc/Rnews'

See Also: [...] Inspiration from the packages 'g.data' and 'filehash'.

___
R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
__
R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Q: selecting a name when it is known as a string
For the column names of the result of expand.grid(), I would just assign them the values I wanted, like this:

> x <- expand.grid(tmp=1:3,y=1:2)
> x
  tmp y
1   1 1
2   2 1
3   3 1
4   1 2
5   2 2
6   3 2
> colnames(x)[1] <- "whatever"
> x
  whatever y
1        1 1
2        2 1
3        3 1
4        1 2
5        2 2
6        3 2
>

-- Tony Plate

D. R. Evans wrote:
> D. R. Evans said the following at 09/04/2007 04:14 PM :
>> I am 100% certain that there is an easy way to do this, but after
>
> I have reconsidered this and now believe it to be essentially impossible
> (or at the very least remarkably difficult) although I don't understand why
> it is so :-(
>
> At least, I spent another two hours trying variations on the suggestions I
> received, but still nothing worked properly.
>
> It sure seems like it _ought_ to be easy, because of the following argument:
>
> If I type an expression such as "A <- " then R is perfectly
> capable of parsing the and executing it and assigning the
> result to A. So it seems to follow that it ought to be able to parse a
> string that contains exactly the same sequence of characters (after all,
> why should the R parsing engine care whether the input string comes from
> the terminal or from a variable?) and therefore it should be possible to
> assign "" to a variable and then have R parse that variable
> precisely as if it had been typed.
>
> That was my logic as to why this ought to be easy, anyway. (And there was
> the subsidiary argument that this is easy in the other languages I use, but
> R is sufficiently different that I'm not certain that that argument carries
> much force.)
>
> It does seem that there are several ways to make the
>
> lo <- loess(percent ~ ncms * ds, d, control=loess.control(trace.hat = 'approximate'))
>
> command work OK if the right hand side is in a character variable, but I
> haven't been able to find a way to make
>
> grid <- data.frame(expand.grid(ds=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS))
>
> work.
> > I always end up with a parse error or a complaint that "'newdata' does not > contain the variables needed" when I perform the next task: > > plo <- predict(lo, grid). > > So I guess I have to stick with half a dozen compound "if" statements, all > of which do essentially the same thing :-( > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
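For the question in the thread above, one possible way to drive both the 'lo <-' and 'grid <-' statements from the string in ORDINATE is sketched here. The data frame 'd' is synthetic and the numeric limits are made-up values; only the names ORDINATE, MINVAL, MAXVAL, MINCMS, MAXCMS, percent, ncms and ds come from the original post. This is an illustration, not necessarily the approach the poster ended up using.

```r
# Synthetic stand-in for the poster's data frame 'd' (assumption)
set.seed(1)
d <- data.frame(ncms = runif(200, 1, 10), ds = runif(200, 1, 10))
d$percent <- 50 + 2 * d$ncms - d$ds + rnorm(200)

ORDINATE <- "ds"                      # the column name, known only as a string
MINVAL <- 2; MAXVAL <- 9              # made-up limits for the illustration
MINCMS <- 2; MAXCMS <- 9

# Build the formula from the string instead of hard-coding 'ds'
fml <- as.formula(paste("percent ~ ncms *", ORDINATE))
lo  <- loess(fml, d, control = loess.control(trace.hat = "approximate"))

# Build the grid with a column whose name is taken from ORDINATE
grid <- do.call(expand.grid,
                setNames(list(MINVAL:MAXVAL, MINCMS:MAXCMS),
                         c(ORDINATE, "ncms")))

# The grid columns now match the variables in the model
plo <- predict(lo, grid)
```

Because the grid columns carry the same names as the model variables, predict() has no reason to complain that 'newdata' lacks the variables it needs.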
Re: [R] Q: selecting a name when it is known as a string
You can use substitute() for this. The drawback with this approach is that the formula in the call in the printed value of loess() is ugly. > x <- data.frame(y=rnorm(20), x1=rnorm(20), x2=rnorm(20)) > loess(y~x2, data=x) Call: loess(formula = y ~ x2, data = x) Number of Observations: 20 Equivalent Number of Parameters: 4.68 Residual Standard Error: 1.208 > loess(substitute(y~X, list(X=as.name('x2'))), data=x) Call: loess(formula = substitute(y ~ X, list(X = as.name("x2"))), data = x) Number of Observations: 20 Equivalent Number of Parameters: 4.68 Residual Standard Error: 1.208 > loess(y~x1, data=x) Call: loess(formula = y ~ x1, data = x) Number of Observations: 20 Equivalent Number of Parameters: 4.87 Residual Standard Error: 1.179 > loess(substitute(y~X, list(X=as.name('x1'))), data=x) Call: loess(formula = substitute(y ~ X, list(X = as.name("x1"))), data = x) Number of Observations: 20 Equivalent Number of Parameters: 4.87 Residual Standard Error: 1.179 > hope this helps, Tony Plate D. R. Evans wrote: > I am 100% certain that there is an easy way to do this, but after > experimenting off and on for a couple of days, and searching everywhere I > could think of, I haven't been able to find the trick. > > I have this piece of code: > > ... > attach(d) > > if (ORDINATE == 'ds') > { lo <- loess(percent ~ ncms * ds, d, control=loess.control(trace.hat = > 'approximate')) > grid <- data.frame(expand.grid(ds=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS)) > ... > > then there several almost-identical "if" statements for different values of > ORDINATE. For example, the next "if" statement starts with: > > ... > if (ORDINATE == 'dsl') > { lo <- loess(percent ~ ncms * dsl, d, control=loess.control(trace.hat = > 'approximate')) > grid <- data.frame(expand.grid(dsl=MINVAL:MAXVAL, ncms=MINCMS:MAXCMS)) > ... > > This is obviously pretty silly code (although of course it does work). 
> > I imagine that my question is obvious: given that I have a variable, > ORDINATE, whose value is a string, how do I re-write statements such as the > "lo <-" and "grid <-" statements above so that they use ORDINATE instead of > the hard-coded names "ds" and "dsl". > > I am almost sure (almost) that it has something to do with "deparse()", but > I couldn't find the right incantation, and the ?deparse() help left my head > swimming. > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
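Regarding the ugly printed call noted in the substitute() reply above: one alternative, sketched here and not taken from the thread, is to build the whole call with bquote() and evaluate it, so that the stored call shows the substituted variable name. The variable 'v' is a hypothetical stand-in for a string such as ORDINATE.

```r
set.seed(1)
x <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

v <- "x2"   # hypothetical: the predictor name held as a string

# bquote() splices the symbol into the call before it is evaluated,
# so loess() sees the formula y ~ x2 directly
fit <- eval(bquote(loess(y ~ .(as.name(v)), data = x)))

fit$call    # the formula in the stored call is y ~ x2
```

The trade-off is that the code that builds the call is a little harder to read than a plain loess() call.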
Re: [R] how do i use the get function to obtain an element from a list...
One simple way that I haven't seen mentioned yet is to do: > get("a")$x (which of course allows further variants such as get("a")$x[3:6] ...) -- Tony Plate Juan Manuel Barreneche wrote: > my problem can be explained with the following example: > > x <- 1:12 > y <- 13:24 > a <- data.frame(x = x, y = y) > > ## if i write > a$x > ## it returns > [1] 1 2 3 4 5 6 7 8 9 10 11 12 > > ## but the function get doesn't recognize a$x. Instead it produces the > following error: > get("a$x") > Error in get(x, envir, mode, inherits) : variable "a$x" was not found > > i intend to do it inside a loop, using a new object (and hence, a new > name) for each iteration (i.e., instead of a$x, it would be a$1, a$2, > a$3, and so on, for a million times). > > i would greatly appreciate it if someone could help me on this issue, > > thanks in advance, > > Juan Manuel Barreneche, > Zoología de Vertebrados, > Facultad de Ciencias, > UDELAR, Uruguay. > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
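When the element name is itself held in a string, as in the loop the poster describes, `[[` can be combined with get(). A small sketch using the data frame from the question; the variable names 'obj_name' and 'col_name' are invented for the illustration.

```r
x <- 1:12
y <- 13:24
a <- data.frame(x = x, y = y)

obj_name <- "a"   # name of the object, as a string
col_name <- "x"   # name of the element, as a string

# get() finds the object, [[ ]] picks the element by name
first <- get(obj_name)[[col_name]]

# The same idea inside a loop over element names
means <- sapply(names(a), function(nm) mean(get(obj_name)[[nm]]))
means
```

Unlike `$`, the `[[` operator evaluates its argument, which is why it works with a name stored in a variable.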
Re: [R] poor rbind performance
As Jim points out, building up a data frame by rbinding in a loop can be a slow way to do things in R. Here's an example of how you can easily read data frames into a list:

> # Create 3 files
> invisible(lapply(1:3, function(i) write.csv(file=paste("tmp",i,".csv",sep=""), data.frame(i=2*i+(1:2),c=letters[2*i+(1:2)]))))
> # Read the files into a list of data frames
> list.of.dfs <- lapply(paste("tmp",1:3,".csv",sep=""), read.csv, row.names=1)
> # rbind the data frames
> myData <- do.call("rbind", list.of.dfs)
> myData
  i c
1 3 c
2 4 d
3 5 e
4 6 f
5 7 g
6 8 h
>

(and of course, these last two expressions can be composed into a single expression if you want)

-- Tony Plate

Aydemir, Zava (FID) wrote:
> Hi
>
> I rbind data frames in a loop in a cumulative way and the performance
> deteriorates very quickly.
>
> My code looks like this:
>
> for( k in 1:N)
> {
> filename <- paste("/tmp/myData_",as.character(k),".txt",sep="")
> myDataTmp <- read.table(filename,header=TRUE,sep=",")
> if( k == 1) {
> myData <- myDataTmp
> }
> else{
> myData <- rbind(myData,myDataTmp)
> }
> }
>
> Some more details:
> - the size of the stored text files is about 100,000 rows and 50 columns
> each
> - for k=1: rbind takes 0.0004 seconds
> - for k=2: rbind takes 13 seconds
> - for k=3: rbind takes 30 seconds
> - for k=4: rbind takes 36 seconds
> etc
>
> Any suggestions to improve speed?
>
> Thanks
>
> Zava
Re: [R] Problem with RSVGTipsDevice
The new version of RSVGTipsDevice (0.7.1) that is now available on CRAN should fix this problem. Please let me know if it doesn't, or if there are other problems.

-- Tony Plate

mister_bluesman wrote:
> Hi there.
>
> I am still trying to get the RSVGTipsDevice to work, yet I cannot.
>
> I have copied the first example from RSVGTipsDevice documentation:
>
> library(RSVGTipsDevice)
> devSVGTips("C:\\svgplot1.svg", toolTipMode=1,
> title="SVG example plot 1: shapes and points, tooltips are title + 1 line")
> plot(c(0,10),c(0,10), type="n", xlab="x", ylab="y",
> main="Example SVG plot with title + 1 line tips (mode=1)")
> setSVGShapeToolTip(title="A rectangle", desc="that is yellow")
> rect(1,1,4,6, col='yellow')
> setSVGShapeToolTip(title="1st circle with title only")
> points(5.5,7.5,cex=20,pch=19,col='red')
> setSVGShapeToolTip(title="A triangle", desc="big and green")
> polygon(c(3,6,8), c(3,6,3), col='green')
> # no tooltips on these points
> points(2:8, 8:2, cex=3, pch=19, col='black')
> # tooltips on each of these points
> invisible(sapply(1:7, function(x)
> {setSVGShapeToolTip(title=paste("point", x))
> points(x+1, 8-x, cex=3, pch=1, col='black')}))
> dev.off()
>
> This results in the following output:
>
> http://www.nabble.com/file/p11064573/svgplot1.svg svgplot1.svg
>
> It opens, but when I hover over the triangle, for example, no tooltip box
> appears. I have tried opening the file through Firefox and
> XP IE, and on more than one computer, yet it does not work. Do I need to
> install something else as well?
>
> Many thanks
Re: [R] how to find how many modes in 2 dimensions case
If you want to count the local maxima in the n x n matrix returned by kde2d, AND you know there are no ties, you could do something like the following:

> set.seed(1)
> x <- matrix(sample(10, 25, rep=TRUE), 5, 5)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    9    3    5   10
[2,]    4   10    2    8    3
[3,]    6    7    7   10    7
[4,]   10    7    4    4    2
[5,]    3    1    8    8    3
> sum(x > cbind(0, x[,-5]) & x > cbind(x[,-1], 0) & x > rbind(x[-1,], 0) & x > rbind(0, x[-5,]))
[1] 4
>

Just be careful that your counting formula matches your definition of "neighbor" (the above formula does not include diagonal neighbors). And of course, ties make things more complicated (note that the above simple algorithm misses the local maximum consisting of two 8's in the last row.)

-- Tony Plate

Patrick Wang wrote:
> Hi,
>
> Does anyone know how to count the number of modes in 2 dimensions using
> kde2d function?
>
> Thanks
> Pat
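The counting formula in the reply above can be wrapped in a small function that works for any matrix size and optionally includes diagonal neighbors. This is a sketch: the function name and the 'diagonal' argument are invented here, and the matrix is typed in directly (the values shown in the reply) because sample() can return different numbers under newer versions of R. Like the original formula, it uses strict comparisons, so plateaus of tied values are not counted.

```r
# Count strict local maxima of a numeric matrix.
# 'diagonal = TRUE' also compares against the four diagonal neighbors.
count_local_maxima <- function(m, diagonal = FALSE) {
  nr <- nrow(m)
  nc <- ncol(m)
  # Pad with -Inf so that edge cells only compete with real neighbors
  p <- matrix(-Inf, nr + 2, nc + 2)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  shifts <- list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  if (diagonal) {
    shifts <- c(shifts, list(c(-1, -1), c(-1, 1), c(1, -1), c(1, 1)))
  }
  is_max <- matrix(TRUE, nr, nc)
  for (s in shifts) {
    neighbor <- p[(2:(nr + 1)) + s[1], (2:(nc + 1)) + s[2], drop = FALSE]
    is_max <- is_max & (m > neighbor)
  }
  sum(is_max)
}

# The 5 x 5 matrix from the reply, entered column by column
x <- matrix(c(3, 4, 6, 10, 3,
              9, 10, 7, 7, 1,
              3, 2, 7, 4, 8,
              5, 8, 10, 4, 8,
              10, 3, 7, 2, 3), 5, 5)

count_local_maxima(x)                   # 4, as in the reply
count_local_maxima(x, diagonal = TRUE)  # also 4 for this matrix
```

Padding with -Inf rather than 0 is a design choice made here so that the function also behaves sensibly for matrices containing zeros or negative values.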
Re: [R] Interactive plots?
The package RSVGTipsDevice allows you to do just that -- you create a plot in an SVG file that can be viewed in a browser like FireFox, and the points (or shapes) in that plot can have pop-up tooltips.

-- Tony Plate

mister_bluesman wrote:
> Hi there.
>
> I have a matrix that provides place names and the distances between them:
>
>         Chelt Exeter London Birm
> Chelt       0    118     96   50
> Exeter    118      0    118  163
> London     96    118      0  118
> Birm       50    163    118    0
>
> After performing multidimensional scaling I get the following points plotted
> as follows
>
> http://www.nabble.com/file/p10810700/demo.jpeg
>
> I would like to know how, if I hover over a point, I can get a little box
> telling me which place the point refers to. Does anyone know?
>
> Many thanks.
Re: [R] R2 always increases as variables are added?
The answer to your question three is that the calculation of r-squared in summary.lm does depend on whether or not an intercept is included in the model. (Another part of the reason for your puzzlement is, I think, that you are computing R-squared as SSR/SST, which is only valid when the model has an intercept). The code is in summary.lm, here are the relevant excerpts (assuming your model does not have weights):

r <- z$residuals
f <- z$fitted
w <- z$weights
if (is.null(w)) {
    mss <- if (attr(z$terms, "intercept")) sum((f - mean(f))^2) else sum(f^2)
    rss <- sum(r^2)
}
...
ans$r.squared <- mss/(mss + rss)

If you want to compare models with and without an intercept based on R^2, then I suspect it's most appropriate to use the version of R^2 that does not use a mean.

It's also worthwhile thinking about what you are actually doing. I find the most intuitive definition of R^2 (http://en.wikipedia.org/wiki/R_squared) is

R2 = 1 - SSE / SST

where
SSE = sum_i (yhat_i - y_i)^2 (sum of squared errors in predictions for your model) and
SST = sum_i (y_i - mean(y))^2 (sum of squared errors in predictions for an intercept-only model)

This means that the standard definition of R2 effectively compares the model with an intercept-only model. As the error in predictions goes down, R2 goes up, and the model that uses the mean(y) as a prediction (i.e., the intercept-only model) provides a scale for these errors.

If you think or know that the true mean of y is zero then it may be appropriate to compare against a zero model rather than an intercept-only model (in SST). And if the sample mean of y is quite different from zero, and you compare a no-intercept model against an intercept-only model, then you're going to get results that are not easily interpreted.

Note that a common way of expressing and computing R^2 is as SSR/SST (which you used). (Where SSR = sum_i (yhat_i - mean(y))^2 ).
However, this is only valid when the model has an intercept (i.e., SSR/SST = 1 - SSE/SST ONLY when the model has an intercept.) Here's some examples, based on your example: > set.seed(1) > data <- data.frame(x1=rnorm(10), x2=rnorm(10), y=rnorm(10), I=1) > > lm1 <- lm(y~1, data=data) > summary(lm1)$r.squared [1] 0 > y.hat <- fitted(lm1) > sum((y.hat-mean(data$y))^2)/sum((data$y-mean(data$y))^2) [1] 5.717795e-33 > > # model with no intercept > lm2 <- lm(y~x1+x2-1, data=data) > summary(lm2)$r.squared [1] 0.6332317 > y.hat <- fitted(lm2) > # no-intercept version of R^2 (2 ways to compute) > 1-sum((y.hat-data$y)^2)/sum((data$y)^2) [1] 0.6332317 > sum((y.hat)^2)/sum((data$y)^2) [1] 0.6332317 > # standard (assuming model has intercept) computations for R^2: > SSE <- sum((y.hat - data$y)^2) > SST <- sum((data$y - mean(data$y))^2) > SSR <- sum((y.hat - mean(data$y))^2) > 1 - SSE/SST [1] 0.6252577 > # Note that SSR/SST != 1 - SSE/SST (because the model doesn't have an intercept) > SSR/SST [1] 0.6616612 > > # model with intercept included in data > lm3 <- lm(y~x1+x2+I-1, data=data) > summary(lm3)$r.squared [1] 0.6503186 > y.hat <- fitted(lm3) > # no-intercept version of R^2 (2 ways to compute) > 1-sum((y.hat-data$y)^2)/sum((data$y)^2) [1] 0.6503186 > sum((y.hat)^2)/sum((data$y)^2) [1] 0.6503186 > # standard (assuming model has intercept) computations for R^2: > SSE <- sum((y.hat - data$y)^2) > SST <- sum((data$y - mean(data$y))^2) > SSR <- sum((y.hat - mean(data$y))^2) > 1 - SSE/SST [1] 0.6427161 > SSR/SST [1] 0.6427161 > > hope this helps, Tony Plate Disclaimer: I too do not have any degrees in statistics, but I'm 95% sure the above is mostly correct :-) If there are any major mistakes, I'm sure someone will point them out. ??? wrote: > Hi, everybody, > > 3 questions about R-square: > -(1)--- Does R2 always increase as variables are added? > -(2)--- Does R2 always greater than 1? > -(3)--- How is R2 in summary(lm(y~x-1))$r.squared > calculated? 
It is different from (r.square=sum((y.hat-mean > (y))^2)/sum((y-mean(y))^2)) > > I will illustrate these problems by the following codes: > -(1)--- R2 doesn't always increase as variables are added > >> x=matrix(rnorm(20),ncol=2) >> y=rnorm(10) >> >> lm=lm(y~1) >> y.hat=rep(1*lm$coefficients,length(y)) >> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2)) > [1] 2.646815e-33 >> lm=lm(y~x-1) >> y.hat=x%*%lm$coefficients >> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2)) > [1] 0.4443356 >> This is the biggest model, but its R2 is not the biggest, > why? >> lm=lm(
Re: [R] getting informative error messages
Prof Brian Ripley wrote: > It is not clear to me what you want here. I just wanted to be able to quickly find the expression in which an error occurred when it was inside a lengthy function. I now know that 'debug()' can help with this (debug() allows me to easily step through the function and see where the error occurs.) > Errors are tagged by a 'call', and f(1:3) is the innermost 'call' (special > primitives do not set a context and so do not count if you consider '[' > to be a function). Thanks for the explanation. I suspected that it had something to do with primitive functions, but was unable to confirm that by searching. > > The message could tell you what the type was, but it does not and we have > lost the pool of active contributors we once had to submit tested patches > for things like that. What is required to test patches for things like this? Is there anything written up on that anywhere? I've not been able to clearly discern what the desired output of 'make check' is -- there seem to be reported differences that don't actually matter, but I didn't see a fast and easy way of distinguishing those from the ones that do matter. I did look in R-exts, and on developer.r-project.org but was unable to find clear guidance there either. -- Tony Plate > > > On Mon, 7 May 2007, Tony Plate wrote: > >> Certain errors seem to generate messages that are less informative than >> most -- they just tell you which function an error happened in, but >> don't indicate which line or expression the error occurred in. >> >> Here's a toy example: >> >>> f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)} >>> options(error=NULL) >>> f(1:3) >> Error in f(1:3) : invalid subscript type >>> traceback() >> 1: f(1:3) >> In this function, it's clear that the error is in subscripting 'x', but >> it's not always so immediately obvious in lengthier functions. >> >> Is there anything I can do to get a more informative error message in >> this type of situation? 
I couldn't find any help in the section >> "Debugging R Code" in "R-exts" (or anything at all relevant in "R-intro"). >> >> (Different values for options(error=...) and different formatting of the >> function made no difference.) >> >> -- Tony Plate >> >>> sessionInfo() >> R version 2.5.0 (2007-04-23) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" >> [7] "base" >> >> other attached packages: >> tap.misc >>"1.0" >> __ >> R-help@stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
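For the toy function in this thread, the pieces of an error condition can be inspected programmatically with tryCatch(). This is a small sketch using base R only; the exact wording of the message and the call attached to the error may differ between R versions, so no particular output is claimed here.

```r
f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)}

# Capture the error object instead of letting it stop execution
err <- tryCatch(f(1:3), error = function(e) e)

conditionMessage(err)   # the text of the error message
conditionCall(err)      # the call the error is tagged with

# Interactive alternative mentioned in the reply: step through the function
# to see which expression fails (commented out because it needs a console)
# debug(f); f(1:3); undebug(f)
```

Inspecting the condition does not by itself reveal the failing line inside a long function, which is why stepping through with debug() remains the more direct route.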
[R] getting informative error messages
Certain errors seem to generate messages that are less informative than most -- they just tell you which function an error happened in, but don't indicate which line or expression the error occurred in. Here's a toy example: > f <- function(x) {a <- 1; y <- x[list(1:3)]; b <- 2; return(y)} > options(error=NULL) > f(1:3) Error in f(1:3) : invalid subscript type > traceback() 1: f(1:3) > In this function, it's clear that the error is in subscripting 'x', but it's not always so immediately obvious in lengthier functions. Is there anything I can do to get a more informative error message in this type of situation? I couldn't find any help in the section "Debugging R Code" in "R-exts" (or anything at all relevant in "R-intro"). (Different values for options(error=...) and different formatting of the function made no difference.) -- Tony Plate > sessionInfo() R version 2.5.0 (2007-04-23) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base" other attached packages: tap.misc "1.0" > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] new package: RSVGTipsDevice: create SVG plots with tooltips & hyperlinks
the DESCRIPTION file:

Package: RSVGTipsDevice
Version: 0.7.0
Date: 04/30/2007
Title: An R SVG graphics device with dynamic tips and hyperlinks
Author: Tony Plate <[EMAIL PROTECTED]>, based on RSvgDevice by T Jake Luciani <[EMAIL PROTECTED]>
Maintainer: Tony Plate <[EMAIL PROTECTED]>
Depends: R (>= 1.4)
Description: A graphics device for R that uses the w3.org xml standard for Scalable Vector Graphics. This version supports tooltips with 1 to 3 lines, hyperlinks, and line styles.
License: GPL version 2 or newer. http://www.gnu.org/copyleft/gpl.html
Re: [R] applying rbind to list elements
do.call("rbind", l)

or, in the case of matrices, using the abind package:

abind(l, along=1)

> library(abind)
> l <- list(matrix(1:6, ncol=2), matrix(11:14, ncol=2))
> abind(l, along=1)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
[4,]   11   13
[5,]   12   14
>

Hendrik Fuß wrote:
> Hi,
>
> I have a list of n data.frames (or matrices) which I would like to
> convert to a single data.frame using rbind:
>
> x <- rbind( l[[1]], l[[2]], l[[3]], l[[4]], ..., l[[n]] )
>
> Is there a simple way to do this?
>
> thanks
> Hendrik
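The reply above demonstrates the matrix case with abind; for completeness, here is a small sketch of the do.call() form applied to a list of data frames, which is what the question asked about. The data frames are made up for the illustration.

```r
# A list of data frames with the same columns (made-up example data)
l <- list(
  data.frame(id = 1:2, v = c("a", "b")),
  data.frame(id = 3:4, v = c("c", "d")),
  data.frame(id = 5:6, v = c("e", "f"))
)

# do.call() passes the list elements as separate arguments to rbind(),
# i.e. it is equivalent to rbind(l[[1]], l[[2]], l[[3]])
x <- do.call("rbind", l)
x
```

This works for any list length, so the number of data frames does not need to be known in advance.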
Re: [R] regular expressions with grep() and negative indexing
I use regexpr() instead of grep() in cases like this, e.g.: x2[regexpr("exclude",x2)==-1] (regexpr returns a vector of the same length as character vector given it, so there's no problem with it returning a zero length vector) -- Tony Plate Peter Dalgaard wrote: > Stephen Tucker wrote: >> Dear R-helpers, >> >> Does anyone know how to use regular expressions to return vector elements >> that don't contain a word? For instance, if I have a vector >> x <- c("seal.0","seal.1-exclude") >> I'd like to get back the elements which do not contain the word "exclude", >> using something like (I know this doesn't work) but: >> grep("[^(exclude)]",x) >> >> I can use >> x[-grep("exclude",x)] >> for this case but then if I use this expression in a recursive function, it >> will not work for instances in which the vector contains no elements with >> that word. For instance, if I have >> x2 <- c("dolphin.0","dolphin.1") >> then >> x2[-grep("exclude",x2)] >> will give me 'character(0)' >> >> I know I can accomplish this in several steps, for instance: >> myfunc <- function(x) { >> iexclude <- grep("exclude",x) >> if(length(iexclude) > 0) x2 <- x[-iexclude] else x2 <- x >> # do stuff with x2 <...? >> } >> >> But this is embedded in a much larger function and I am trying to minimize >> intermediate variable assignment (perhaps a futile effort). But if anyone >> knows of an easy solution, I'd appreciate a tip. >> > It has come up a couple of times before, and yes, it is a bit of a pain. > > Probably the quickest way out is > > negIndex <- function(i) > >if(length(i)) > >-i > >else > >TRUE > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
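The regexpr() idiom from the reply above can be compared with logical indexing via grepl(), which is available in current versions of R (it is not mentioned in the thread, so treat it as an addition). Both avoid the empty-index problem because they produce one value per input element. The helper names below are invented for the illustration.

```r
x  <- c("seal.0", "seal.1-exclude")
x2 <- c("dolphin.0", "dolphin.1")

# regexpr() returns -1 for elements that do not match
drop_regexpr <- function(v, pattern) v[regexpr(pattern, v) == -1]

# grepl() returns TRUE/FALSE per element, so negate it
drop_grepl <- function(v, pattern) v[!grepl(pattern, v)]

drop_regexpr(x, "exclude")    # only "seal.0" remains
drop_grepl(x2, "exclude")     # both elements remain, nothing matched

# The pitfall from the question: a negative index built from an empty grep()
x2[-grep("exclude", x2)]      # character(0)
```

Either helper can be used inline without an intermediate variable, which was the poster's stated goal.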
Re: [R] intersect more than two sets
I don't think there's that sort of "apply-reduce" function in R, but for this problem, the last line below happens to be a "one-liner": > set.seed(1) > x <- lapply(1:10, function(i) sample(letters, 20)) > table(unlist(x)) a b c d e f g h i j k l m n o p q r s t u v w x y z 6 8 7 8 9 9 10 9 8 10 6 7 9 7 6 8 8 6 9 6 9 6 9 7 6 7 > which(table(unlist(x))==10) g j 7 10 > names(which(table(unlist(x))==10)) [1] "g" "j" > Weiwei Shi wrote: > assume t2 is a list of size 11 and each element is a vector of characters. > > the following codes can get what I wanted but I assume there might be > a one-line code for that: > > t3 <- t2[[1]] > for ( i in 2:11){ > t3 <- intersect(t2[[i]], t3) > } > > or there is no such "apply"? > > On 4/24/07, Weiwei Shi <[EMAIL PROTECTED]> wrote: >> Hi, >> I searched the archives and did not find a good solution to that. >> >> assume I have 10 sets and I want to have the common character elements of >> them. >> >> how could i do that? >> >> -- >> Weiwei Shi, Ph.D >> Research Scientist >> GeneGO, Inc. >> >> "Did you always know?" >> "No, I did not. But I believed..." >> ---Matrix III >> > > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
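Current versions of base R do include a fold function, Reduce(), which expresses the loop from the question directly. A small sketch with made-up sets; note that the table() approach in the reply above assumes no element is repeated within a set, whereas intersect() does not need that assumption.

```r
# Three made-up character sets standing in for the list 't2'
t2 <- list(c("a", "b", "c", "d"),
           c("b", "c", "d", "e"),
           c("c", "b", "f"))

# Reduce() applies intersect() pairwise from left to right:
# intersect(intersect(t2[[1]], t2[[2]]), t2[[3]])
common <- Reduce(intersect, t2)
common
```

The same call works unchanged for a list of 10 or 11 sets, as in the original question.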
Re: [R] Handling of arrays
Try the following and look at what they return: str(ca) dimnames(ca) -- Tony Plate [EMAIL PROTECTED] wrote: > Dear R-Experts, > > I just imported a workspace from Matlab. I know that I can get the names of > the imported variables with names(). It works. The variable "ca" consists of > several elements. I want to get the names of the elements to handle my output > better. But names(ca) doesn't work. Why? I did the following commands: > >> class(ca) > [1] "array" >> mode(ca) > [1] "list" >> dim(ca) > [1] 66 1 1 >> length(ca) > [1] 66 > > How can I now get the names which are stored in ca? When I use the command > "ca[18]" I receive the content which stands there but not the name collables > which I wanted to extract. > > Any ideas? > > Thanks, Corinna __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
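To show why names() can come back empty while dimnames() still carries labels, here is a sketch of an object with the same class, mode and shape pattern as the poster's 'ca'. It is only an assumption that the Matlab import stored its names this way; the element names and contents below are invented.

```r
# A list-mode array with dim c(3, 1, 1), analogous to the poster's c(66, 1, 1)
ca <- array(list(1:3, "text", TRUE),
            dim = c(3, 1, 1),
            dimnames = list(c("alpha", "beta", "gamma"), NULL, NULL))

class(ca)          # "array"
mode(ca)           # "list"
names(ca)          # NULL: a multi-dimensional array has no 'names'
dimnames(ca)[[1]]  # the labels live in the first dimnames component

# Elements can then be extracted by label
ca[["beta", 1, 1]]
```

If dimnames(ca) is also NULL for the real object, str(ca) is the next place to look, since the names may be stored inside the individual list elements instead.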
Re: [R] Fastest way to repeatedly subset a data frame?
This type of information about speeds of various techniques can really only be found out by trying things out, especially because R-core has recently made a fair number of improvements to some of the underlying code in R. That's part of the reason I put these tests together -- I wanted to know for myself what sort of speed differences there are now among the various approaches.

-- Tony Plate

Iestyn Lewis wrote:
> This is fantastic. I just tested the first match() method and it is
> acceptably fast. I'll look into some of the even better methods
> later. Thank you for taking the time to put this together.
>
> Is this kind of optimization information on the web anywhere? I can
> imagine that a lot of people have slow sets of commands that could be
> optimized with this kind of knowledge.
>
> Thank you so much,
>
> Iestyn
>
> Tony Plate wrote:
>> Here's some timings on seemingly minor variations of data structure
>> showing timings ranging by a factor of 100 (factor of 3 if the worst
>> is omitted). One of the keys is to avoid use of the partial string
>> match that happens with ordinary data frame subscripting.
Re: [R] Fastest way to repeatedly subset a data frame?
Here's some timings on seemingly minor variations of data structure showing timings ranging by a factor of 100 (factor of 3 if the worst is omitted). One of the keys is to avoid use of the partial string match that happens with ordinary data frame subscripting.

-- Tony Plate

> n <- 10000 # number of rows in data frame
> k <- 500 # number of vectors in indexing list
> # use a data frame with regular row names and id as factor (defaults for data.frame)
> df <- data.frame(id=paste("ID", seq(len=n), sep=""), result=seq(len=n), stringsAsFactors=TRUE)
> object.size(df)
[1] 440648
> df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   3.00    0.00    3.03
>
> # use a data frame with automatic row names (should be low overhead) and id as factor
> df <- data.frame(id=paste("ID", seq(len=n), sep=""), result=seq(len=n), row.names=NULL, stringsAsFactors=TRUE)
> object.size(df)
[1] 440648
> df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   2.68    0.00    2.70
>
> # use a data frame with automatic row names (should be low overhead) and id as character
> df <- data.frame(id=paste("ID", seq(len=n), sep=""), result=seq(len=n), row.names=NULL, stringsAsFactors=FALSE)
> object.size(df)
[1] 400448
> df[1:3,,drop=FALSE]
   id result
1 ID1      1
2 ID2      2
3 ID3      3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) df[match(i, df$id),,drop=FALSE]))
   user  system elapsed
   1.54    0.00    1.59
>
> # use a data frame with ids as the row names & subscripting for matching (should be high overhead)
> df <- data.frame(id=paste("ID", seq(len=n), sep=""), result=seq(len=n), row.names="id")
> object.size(df)
[1] 400384
> df[1:3,,drop=FALSE]
    result
ID1      1
ID2      2
ID3      3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) df[i,,drop=FALSE]))
   user  system elapsed
 109.15    0.04  111.28
>
> # use a data frame with ids as the row names & match()
> df <- data.frame(id=paste("ID", seq(len=n), sep=""), result=seq(len=n), row.names="id")
> object.size(df)
[1] 400384
> df[1:3,,drop=FALSE]
    result
ID1      1
ID2      2
ID3      3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) df[match(i, rownames(df)),,drop=FALSE]))
   user  system elapsed
   1.53    0.00    1.58
>
> # use a named numeric vector to store the same data as was stored in the data frame
> x <- seq(len=n)
> names(x) <- paste("ID", seq(len=n), sep="")
> object.size(x)
[1] 400104
> x[1:3]
ID1 ID2 ID3
  1   2   3
> set.seed(1)
> ids <- lapply(seq(k), function(i) paste("ID", sample(n, size=sample(seq(ceiling(n/1000), n/2, 1))), sep=""))
> sum(sapply(ids, length))
[1] 1263508
> system.time(lapply(ids, function(i) x[match(i, names(x))]))
   user  system elapsed
   1.14    0.05    1.19
>

Iestyn Lewis wrote:
> Good tip - an Rprof trace over my real data set resulted in a file
> filled with:
>
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> pmatch [.data.frame [ FUN lapply
> ...
> with very few other calls in there. pmatch seems to be the string
> search function, so I'm guessing there's no hashing going on, or not
> very good hashing.
>
> I'll let you know how the environment option works - the Bioconductor
> project seems to make extensive use of it, so I'm guessing it's the way
> to go.
>
>
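The three lookup styles timed above can be compared on a small scale with the sketch below. It is only an illustration: the data size, the variable names (wanted, df_rn, by_match and so on) are made up here, and absolute timings will differ between R versions, so it checks that the approaches agree rather than how fast they are.

```r
n <- 1000
df <- data.frame(id = paste0("ID", seq_len(n)),
                 result = seq_len(n),
                 stringsAsFactors = FALSE)

set.seed(1)
wanted <- paste0("ID", sample(n, 5))

# 1. Exact lookup with match() on an ordinary column.
by_match <- df[match(wanted, df$id), , drop = FALSE]

# 2. Character subscripting on row names (the slow case in the timings above).
df_rn <- data.frame(result = df$result, row.names = df$id)
by_name <- df_rn[wanted, , drop = FALSE]

# 3. A named vector indexed through match() on its names.
x <- setNames(df$result, df$id)
by_vec <- x[match(wanted, names(x))]

# Wrap any of the three in system.time() to compare them on your own data.
```

All three return the same values; the choice only affects how the matching is done internally.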
Re: [R] Replacement in an expression - can't use parse()
Peter Dalgaard wrote: > Daniel Berg wrote: >> Dear all, >> >> Suppose I have a very long expression e. Lets assume, for simplicity, that >> it is >> >> e = expression(u1+u2+u3) >> >> Now I wish to replace u2 with x and u3 with 1. I.e. the 'new' >> expression, after replacement, should be: >> >> >>> e >>> >> expression(u1+x+1) >> >> My question is how to do the replacement? >> >> I have tried using: >> >> >>> e = parse(text=gsub("u2","x",e)) >>> e = parse(text=gsub("u3",1,e)) >>> >> Even though this works fine in this simple example, the use of parse >> when e is very long will fail since parse has a maximum line length >> and will cut my expressions. I need to keep mode(e)=expression since I >> will use e further in symbolic derivation and division. >> >> Any suggestions are most welcome. >> > The short answer is substitute(). > > However, this is not entirely trivial to apply if you have your > expression already inside an expression() object. > > The easy thing to do is > >> substitute(u1+u2+u3, list(u2=quote(x),u3=1)) > u1 + x + 1 > > but notice that this "autoquotes" the first argument, so > >> substitute(e, list(u2=quote(x),u3=1)) > e > > which is pretty much useless. > > (Arguably it would have been a better design to avoid this feature and > require substitute(quote(.)) for the former case.) > > The way around this is to add a further layer of substitute() to insert > the value of e: > > eval(substitute(substitute(call,list(u2=quote(x),u3=1)),list(call=e[[1]]))) > u1 + x + 1 > > Notice that substitute will not go inside expression objects, so we need > to extract the mode "call" object using e[[1]]. Also, the result is > "call" not "expression". You may need an as.expression construct around > the result to get exactly what you asked. 
>

I usually use do.call() to do this kind of thing:

> e <- expression(u1+u2+u3)
> e
expression(u1 + u2 + u3)
> do.call("substitute", list(e[[1]], list(u2=quote(x),u3=1)))
u1 + x + 1
>

(and of course one can wrap the result in as.expression() to get an expression back).

Are there any circumstances where this construct will produce different results to the nested substitute suggested by Peter?

-- Tony
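For reference, the two constructs from this thread can be put side by side as below. This is a sketch for the simple example only; it shows that they agree here and does not answer the general question of whether they can ever differ. The variable names (via_do_call, via_nested, new_e) are just illustrative.

```r
e <- expression(u1 + u2 + u3)
replacements <- list(u2 = quote(x), u3 = 1)

# do.call() evaluates e[[1]] first, so substitute() receives the call itself
# rather than the symbol 'e'.
via_do_call <- do.call("substitute", list(e[[1]], replacements))

# Nested substitute(), as in Peter Dalgaard's reply.
via_nested <- eval(substitute(substitute(call, list(u2 = quote(x), u3 = 1)),
                              list(call = e[[1]])))

# Wrap the call to get back an object of mode "expression".
new_e <- as.expression(via_do_call)

# The result can still be used for symbolic differentiation.
d_dx <- D(new_e[[1]], "x")
```

Because no text is deparsed and re-parsed, the line-length limit of parse() mentioned in the original question does not come into play.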
Re: [R] Prefered date and date/time classes
I put a list of date/time classes and pointers to documents describing them on the R-wiki at http://wiki.r-project.org/rwiki/doku.php?id=guides:times-dates

The various reasons one might use each of them are described in the documents. (If anyone feels like adding summaries to the "tips" section on the Wiki, please go ahead!)

-- Tony Plate

Petr Pikal wrote:
> Hi
>
> On 27 Mar 2007 at 9:09, Charles Dupont wrote:
>
> Date sent: Tue, 27 Mar 2007 09:09:27 -0500
> From: Charles Dupont <[EMAIL PROTECTED]>
> Organization: Vanderbilt University; Department of Biostatistics
> To: r-help@stat.math.ethz.ch
> Subject: [R] Prefered date and date/time classes
>
>> What are the preferred date, and date/time classes for R?
>
> It is probably a personal choice. You can use POSIX, chron or other
> options. They are nicely described in RNEWS 4-1 in section Help Desk.
>
> Regards
> Petr
>
>> Thanks
>>
>> Charles Dupont
>>
>> --
>> Charles Dupont Computer System Analyst School of Medicine
>> Department of Biostatistics Vanderbilt University
>
> Petr Pikal
> [EMAIL PROTECTED]
Re: [R] data.frame handling
'table()' can compute your desired result in this particular case (though I don't know if it's what you want in general):

> y <- factor(c("a","b","c")[c(1,1,1,2,2,3,3,3)])
> x <- factor(c("x","y","z")[c(1,2,3,1,2,1,2,3)])
> table(x, y)
   y
x   a b c
  x 1 1 1
  y 1 1 1
  z 1 0 1
>

If x and y are already columns in a data frame, then just do

> table(X$factor1, X$factor2)

hope this helps,

Tony Plate

Michela Cameletti wrote:
> Dear R-users,
> I have a little problem that I can't solve by myself.
> I have a data frame with 2 factors and 8 observations (see the following
> code):
>
> y <- c(1,1,1,2,2,3,3,3)
> y <- factor(y)
> levels(y) <- c("a","b","c")
> x <- c(1,2,3,1,2,1,2,3)
> x <- factor(x)
> levels(x) <- c("x","y","z")
> X <- data.frame(factor1=x,factor2=y)
>
> and the final result is
>
>   factor1 factor2
> 1       x       a
> 2       y       a
> 3       z       a
> 4       x       b
> 5       y       b
> 6       x       c
> 7       y       c
> 8       z       c
>
> From the above data I'd like to obtain the following matrix:
>
>   a b c
> x 1 1 1
> y 1 1 1
> z 1 0 1
>
> Do you have any advice? Can you help me please?
> Thank you in advance,
> Michela
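Starting from the data frame in the question, the whole thing might look like the sketch below. The xtabs() call is an alternative added here for comparison, not something suggested in the thread.

```r
X <- data.frame(
  factor1 = factor(c("x", "y", "z", "x", "y", "x", "y", "z")),
  factor2 = factor(c("a", "a", "a", "b", "b", "c", "c", "c"))
)

# Cross-tabulate the two factor columns: rows are factor1, columns are factor2.
tab <- table(X$factor1, X$factor2)

# The same counts through the formula interface, which keeps the column
# names as dimension labels.
tab_formula <- xtabs(~ factor1 + factor2, data = X)

# Combinations that never occur (here "z" with "b") appear as 0.
missing_cell <- tab["z", "b"]
```

The result is a "table" object; individual cells can be indexed like a matrix, as the last line shows.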
Re: [R] timeDate & business day
There are two articles describing time and date classes in the R News newsletter:

Brian D. Ripley and Kurt Hornik. Date-time classes. R News, 1(2):8-11, June 2001. http://cran.r-project.org/doc/Rnews/Rnews_2001-2.pdf

Gabor Grothendieck and Thomas Petzoldt. R help desk: Date and time classes in R. R News, 4(1):29-32, June 2004. http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

The Ripley and Hornik article discusses the "POSIXt" (Posix time) classes: "POSIXlt" (POSIX local time) and "POSIXct" (POSIX calendar time). The Grothendieck and Petzoldt article discusses the "Date", "chron" and "POSIXt" classes, and has a very helpful table of how to do various operations on "Date", "chron" and "POSIXct" objects.

There is also the fCalendar package, which includes a timeDate class and has support for holidays, operations on timeDate objects, and various other features useful for dealing with times and dates as they are used in financial data. Obviously, there is the online help for the fCalendar package, but there are also three other documents describing how to work with timeDate objects:

Computing with R and S-Plus For Financial Engineers 1 - Part I - Markets, Basic Statistics, Date and Time Management, Diethelm Würtz http://www.itp.phys.ethz.ch/econophysics/R/docs/fBasics.pdf

R and Rmetrics for Teaching. Financial Engineering and Computational Finance, Part II, Dates, Time, and, Calendars, Diethelm Würtz http://www.itp.phys.ethz.ch/econophysics/R/docs/rCalendar.pdf

S4 'timeDate' and 'timeSeries' Classes for R, Diethelm Würtz http://www.itp.phys.ethz.ch/econophysics/R/pdf/calendar.pdf

-- Tony Plate

Michael Toews wrote:
> Sadly, I don't know of any tutorials or much help on the web for R ...
> that doesn't mean it doesn't exist ... 
you might just have to look > around for it (www.rseek.org is a good place to start) > I've learned almost everything I know through: > ?strptime > > Also check out the methods for the classes, for example: > > methods(class="Date") > methods(class="POSIXct") > > And certainly check their help pages ... there is loads of stuff here > that I haven't discovered myself. (Note, if you are new to S3 classes .. > if it begins with the method, then "." class, you only need to type the > beginning. For example "summary(ymd)" ... not "summary.Date(ymd)" if > "ymd" has `class(ymd) == "Date" `. > > I think the fundamental things to know are there are three main > DateTimeClasses: > > 1. "POSIXct" - has date, time and optionally time-zone info -- very > handy for using in data.frame objects (and frankly I think it > should be renamed to "DateTime" since the class "POSIXct" has > nothing really to do directly with date/times) > 2. "POSIXlt" - as far as I'm concerned, this is has the same > functionality as "POSIXct", but it cannot be used in data.frame > objects (and frankly, I think it should be deprecated in favour of > #1 to reduce future confusion) > 3. "Date" - use this if you don't care about times or time-zones > > But it would be nice to track down a good tutorial somewhere. > +mt > > Young Cho wrote: >> Thanks so Michael! If you know of a tutorial or introductory document >> about timeDate manipulation or time series manipulation in R, can you >> share it? It is hard to find by googling... I'd very appreciate any >> advice. > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
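A small base-R sketch of the three classes discussed in this message, including a crude weekday filter in the spirit of the thread's subject. It assumes "business day" simply means Monday to Friday; holidays are not handled (that is what the holiday support in fCalendar is for), and the variable names are illustrative only.

```r
# "Date": calendar dates without times or time zones.
ymd <- as.Date(as.character(c(20050107L, 20050108L, 20050109L, 20050110L)),
               format = "%Y%m%d")
gaps <- diff(ymd)                       # differences in days

# "POSIXct": date-time stored as a single number, convenient in data frames.
ct <- as.POSIXct("2005-01-04 09:30:00", tz = "GMT")
stamp <- format(ct, "%Y%m%d")

# "POSIXlt": the same instant as a list of components (hour, wday, ...).
lt <- as.POSIXlt(ct)
hour_of_day <- lt$hour
day_of_week <- lt$wday                  # 0 = Sunday, ..., 6 = Saturday

# Assumed definition of a business day: not Saturday (6) or Sunday (0).
business <- ymd[!(as.POSIXlt(ymd)$wday %in% c(0, 6))]

# List the S3 methods available for a class.
date_methods <- methods(class = "Date")
```

Here 2005-01-08 and 2005-01-09 fall on a weekend, so only the Friday and the Monday remain in business.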
Re: [R] timeDate & business day
The R timeDate class is in the fCalendar package.

Does anyone know how to change the output format of a timeDate object? (Other than by explicitly supplying a format= argument to the format() function.)

I tried creating a timeDate object, and then changing the format slot. However, all the functions I used on the object ('print', 'format', 'as.character', 'show') seemed to ignore the value in the format slot.

And does anyone else find it a little confusing that print() and show() convert timeDate to the local time zone, but as.character() and format() display it in the time zone of its "FinCenter" slot?

Here is a transcript:

> library(fCalendar)
> tt <- c("2005-01-04", "2005-01-05", "2005-01-06", "2005-01-07")
> x <- timeDate(tt)
> x@format
[1] "%Y-%m-%d"
> # Change the format on the timeDate object
> x@format <- "%Y%m%d"
> x
An object of class "timeDate"
Slot "Data":
[1] "2005-01-03 17:00:00 Mountain Standard Time"
[2] "2005-01-04 17:00:00 Mountain Standard Time"
[3] "2005-01-05 17:00:00 Mountain Standard Time"
[4] "2005-01-06 17:00:00 Mountain Standard Time"

Slot "Dim":
[1] 4

Slot "format":
[1] "%Y%m%d"

Slot "FinCenter":
[1] "GMT"

> # Can get what I want by explicitly supplying format
> # argument to format()
> format(x, format="%Y%m%d")
[1] "20050104" "20050105" "20050106" "20050107"
> # But format() seems to ignore the format slot
> format(x)
[1] "2005-01-04" "2005-01-05" "2005-01-06" "2005-01-07"
> print(x)
GMT
[1] [2005-01-04] [2005-01-05] [2005-01-06] [2005-01-07]
> as.character(x)
[1] "2005-01-04" "2005-01-05" "2005-01-06" "2005-01-07"
attr(,"control")
FinCenter
    "GMT"
>
> sessionInfo()
R version 2.4.1 (2006-12-18)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"

other attached packages:
  fCalendar     fEcofin
"240.10068" "240.10067"
> Sys.getenv("TZ")
TZ
""
>

-- Tony Plate

Michael Toews wrote:
> Those numbers look like ... well, numbers. You want characters! Try
> converting the integer to a character before trying to do a string
> parse, e.g.:
>
> ymd.int <- c(20050104, 20050105, 20050106, 20050107, 20050110, 20050111,
> 20050113, 20050114)
> ymd <- as.Date(as.character(ymd.int),"%Y%m%d")
>
> As far as the other functions you are looking at ("timeDate",
> "timeRelative") -- I've never seen these, so I'm guessing they are
> S-PLUS. In R, you can use "diff" or "difftime" (which works with "Date"
> and "POSIXlt"-or Date-Time classes) , e.g.:
>
> diff(ymd)
> diff(ymd,2)
> diff(ymd,3)
>
> or do some arithmetic:
>
> difftime(ymd[1],ymd[4])
> difftime(ymd[1],ymd[4],unit="weeks")
>
> Hopefully this is helpful to you!
> +mt
Re: [R] optim(method="L-BFGS-B") abnormal termination
I usually see this message only when my gradient and objective functions do not match each other. I debug by comparing a finite difference approximation to the gradient with the result of the gradient function. I think you can also run optim() without supplying a gr() function - optim() will then use a finite difference approximation. If optim() works fine like this with your function, that's a strong sign that your gradient function doesn't match your objective function.

It is of course possible that your gradient function is properly specified, and the function along the line being searched is so badly behaved that the line search can't find a minimum in 20 steps. If that's the case you might want to look into scaling issues, or reformulating the problem.

It's also possible that even if you have a theoretically well-behaved objective and gradient, your computation of them may be subject to rounding error and giving apparently discontinuous results to optim().

I'd look into all of the above possibilities before I tried increasing the limit of 20 evaluations in the line search - in my experience 20 steps is plenty to find an adequate point for a reasonably well-behaved function. It may be possible to increase the number of steps, but I don't see how from the docs for ?optim. Of course, the source is available.

hope this helps,

Tony Plate

Petr Klasterecky wrote:
> Hi,
> my call of optim() with the L-BFGS-B method ended with the following
> error message: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
>
> Further tracing shows:
> Line search cannot locate an adequate point after 20 function and
> gradient evaluations
> final value 0.086627
> stopped after 7 iterations
>
> Could someone pls tell me whether it is possible to increase the limit
> of 20 evaluations? Is it even worth doing so?
>
> My function(s) to be minimized are polynomial functions of tens of
> variables - let say 10 - 60 variables, all of them constrained to the
> (0,1) interval. 
Is it even possible and meaningfull to attempt such > minimization? (Suppose I have good starting values.) > > Thaks, Petr __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
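The gradient check described in the reply can be sketched as follows. The objective fn, its gradient gr and the helper num_grad() are made-up examples (a small polynomial in three variables, not the poster's function); only optim() and its arguments are from R itself.

```r
# Hypothetical polynomial objective and its hand-written gradient.
target <- c(0.2, 0.7, 0.5)
fn <- function(p) sum((p - target)^2) + p[1]^2 * p[2]^2
gr <- function(p) 2 * (p - target) + c(2 * p[1] * p[2]^2, 2 * p[1]^2 * p[2], 0)

# Central finite-difference approximation to the gradient (helper defined here).
num_grad <- function(f, p, h = 1e-6) {
  sapply(seq_along(p), function(i) {
    step <- replace(numeric(length(p)), i, h)
    (f(p + step) - f(p - step)) / (2 * h)
  })
}

p0 <- c(0.5, 0.5, 0.5)

# Step 1: the analytic and numerical gradients should agree closely.
grad_error <- max(abs(gr(p0) - num_grad(fn, p0)))

# Step 2: run with and without gr; a large disagreement points at gr.
fit_gr <- optim(p0, fn, gr, method = "L-BFGS-B", lower = 0, upper = 1)
fit_fd <- optim(p0, fn, method = "L-BFGS-B", lower = 0, upper = 1)
```

If grad_error is not small, or if fit_fd converges while fit_gr stops with the line-search message, the gradient function is the first thing to re-derive.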
Re: [R] How to print a double quote
> cat('Open fnd "test"\n')
Open fnd "test"
> cat("Open fnd \"test\"\n")
Open fnd "test"
>

Bos, Roger wrote:
> Can anyone tell me how to get R to include a double quote in the middle
> of a character string?
>
> For example, the following code is close:
>
>> fnd<-"Open fnd 'test'"
>> cat(fnd)
> Open fnd 'test'>
>
> But instead of Open fnd 'test' I need: Open fnd "test". Difference
> seems minor, but I am writing batch files for another program to read in
> and it has to have the double quotes to work.
>
> Thanks in advance for any help or ideas,
>
> Roger
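Since the goal in the question is writing batch files, here is a short sketch that writes such a line to a file and reads it back. The temporary file stands in for the real batch file, whose name and format are not given in the thread.

```r
# Two equivalent ways to write the same string.
single_quoted <- 'Open fnd "test"'
escaped <- "Open fnd \"test\""
same_string <- identical(single_quoted, escaped)

# print() shows the string with escapes; cat() and writeLines() emit the
# raw characters, which is what the other program will read.
batch_file <- tempfile(fileext = ".txt")   # stand-in for the real batch file
writeLines(escaped, con = batch_file)
written <- readLines(batch_file)
```

The backslashes exist only in the R source; the file itself contains plain double quotes.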
Re: [R] indexing
> a <- data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2))
> x <- c(1,1,2,7,6,5,4,3,2,2,2)
> match(x, a$class)
 [1]  1  1  4 NA NA  3 NA  2  4  4  4
> a[match(x, a$class), "value"]
 [1]  6.5  6.5 12.0   NA   NA  8.5   NA  7.5 12.0 12.0 12.0
>

-- Tony Plate

javier garcia-pintado wrote:
> Hello,
> In a nutshell, I've got a data.frame like this:
>
>> assignation <- data.frame(value=c(6.5,7.5,8.5,12.0),class=c(1,3,5,2))
>> assignation
>
>   value class
> 1   6.5     1
> 2   7.5     3
> 3   8.5     5
> 4  12.0     2
>
> and a long vector of classes like this:
>
>> x <- c(1,1,2,7,6,5,4,3,2,2,2...)
>
> And would like to obtain a vector of length = length(x), with the
> corresponding values extracted from assignation table. Like this:
>
>> x.value
>
> [1] 6.5 6.5 12.0 NA NA 8.5 NA 7.5 12.0 12.0 12.0
>
> Could you help me with an elegant way to do this ?
> (I just can do it with looping for each class in the assignation table,
> what a think is not perfect in R's sense)
>
> Wishes,
> Javier
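The match() lookup above, plus one alternative using a named vector, written out as a script. The named-vector variant is an addition for comparison and is not from the thread.

```r
assignation <- data.frame(value = c(6.5, 7.5, 8.5, 12.0),
                          class = c(1, 3, 5, 2))
x <- c(1, 1, 2, 7, 6, 5, 4, 3, 2, 2, 2)

# match() gives, for each element of x, the row of the table holding that
# class (NA where the class is absent), so one indexing step does the lookup.
x_value <- assignation[match(x, assignation$class), "value"]

# Alternative sketch: a named vector used as a lookup table.
lookup <- setNames(assignation$value, assignation$class)
x_value_named <- unname(lookup[as.character(x)])
```

Both versions return NA for classes 7, 6 and 4, which do not appear in the table.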
Re: [R] Outlook does threading
Your final paragraph has the take-home message for everyone (not just MS Outlook users): "just create, from scratch, a new message when initiating a new subject."

Viewing threads can be completely different to sorting based on the subject line. Your initial post with the subject "regexpr and parsing question" was in fact a reply to the message from Gabor Grothendieck in the thread "Re: [R] change plotting symbol for groups in trellis graph." (I can see this by looking at the header information: I see an "In-reply-to:" header item.) When I view threads in the Thunderbird mail reader, your post and replies with the subject "regexpr and parsing question" do in fact show up under the thread in which Gabor's message appeared, not in their own thread.

According to http://office.microsoft.com/en-us/outlook/HA011356671033.aspx, one can view threads in Outlook by selecting "View->Arrange By->Conversation".

Hope this helps (in case the horse was not thoroughly dead already.)

-- Tony Plate

Kimpel, Mark William wrote:
> See below for Bert Gunter's off list reply to me (which I do
> appreciate). I'm putting it back on the list because it seems there is
> still confusion regarding the difference between threading and sorting
> by subject. I thought the example I will give below will serve as
> instructional for other Outlook users who may be similarly confused as I
> was (am?).
>
> Per Bert's instructions, I just set up my inbox to sort by subject. I
> sent one email to myself with the subject "test1" and then replied to it
> without changing the subject. The reply correctly went to "test1" in the
> inbox sorter. I then changed the subject heading in the test1 reply to
> "test2" and sent it to myself. This time Outlook re-categorized it and
> put it in a separate compartment in the view called "test2".
>
> If Outlook can do threading the way the R mail server does, I don't
> think this is the way to do it. 
> > Unless someone has an idea of how to correctly set up Outlook to do > threading in the manner that the R mail server does, I think the message > for us Outlook users is to just create, from scratch, a new message when > initiating a new subject. > > Thanks for all your help. > > Mark > > -Original Message- > From: Bert Gunter [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 31, 2007 7:03 PM > To: Kimpel, Mark William > Subject: Outlook does threading > > Mark: > > No need to bother the R list with this. Outlook does threading. Just > sort on > Subject in the viewer. > > Bert Gunter > Genentech Nonclinical Statistics > South San Francisco, CA 94404 > 650-467-7374 > > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Kimpel, Mark > William > Sent: Wednesday, January 31, 2007 3:36 PM > To: Peter Dalgaard > Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] > Subject: Re: [R] possible spam alert > > Peter, > > Thanks you for your explanation, I had taken Mr. Connolly's message to > me to imply that I was not changing the subject line. I use MS Outlook > 2007 and, unless I am just not seeing it, Outlook does not normally > display the "in reply to" header, I was under the mistaken impression > that that was what the Subject line was for. See, for example, the > header to your message to me below. Outlook will, however, sort messages > by Subject, and that is what I thought was meant by threading. > > Well, I learned something today and apologize for any inconvenience my > posts may have caused. > > BTW, I use Outlook because it is supported by my university server and > will synch my appointments and contacts with my PDA, which runs Windows > CE. If anyone has a suggestion for me of a better email program that > will provide proper threading AND work with a MS email server and synch > with Windows CE, I'd love to hear it. > > Thanks again, > > Mark > > Mark W. 
Kimpel MD > > > > (317) 490-5129 Work, & Mobile > > > > (317) 663-0513 Home (no voice mail please) > > 1-(317)-536-2730 FAX > > > -Original Message- > From: Peter Dalgaard [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 31, 2007 6:25 PM > To: Kimpel, Mark William > Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch > Subject: Re: [R] possible spam alert > > Kimpel, Mark William wrote: > >>The last two times I have originated message threads on R or >>Bioconductor I have received the message included below from someone >>named Patrick Connolly. Both times I was the originator of the message
Re: [R] Simple Date problems with cbind
> It is probably something blindingly simple but can
> anyone suggest something?

You need to use the format code "%Y" for 4-digit years.

You need to create a data frame using 'data.frame()' (cbind() creates a matrix when given just vectors).

> as.Date(c("2005/01/24" ,"2006/01/23" ,"2006/01/23"), "%Y/%m/%d")
[1] "2005-01-24" "2006-01-23" "2006-01-23"
> data.frame(int=1:3, date=as.Date(c("2005/01/24" ,"2006/01/23" ,"2006/01/23"), "%Y/%m/%d"))
  int       date
1   1 2005-01-24
2   2 2006-01-23
3   3 2006-01-23
> (x <- data.frame(int=1:3, date=as.Date(c("2005/01/24" ,"2006/01/23" ,"2006/01/23"), "%Y/%m/%d")))
  int       date
1   1 2005-01-24
2   2 2006-01-23
3   3 2006-01-23
> class(x)
[1] "data.frame"
> sapply(x, class)
      int      date
"integer"    "Date"
>

-- Tony Plate
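To make the cbind() point concrete, the sketch below puts the two constructions next to each other; the variable names are illustrative.

```r
dates <- as.Date(c("2005/01/24", "2006/01/23", "2006/01/23"),
                 format = "%Y/%m/%d")

# cbind() on plain vectors builds a numeric matrix, so the Date class is
# dropped and only the underlying day counts remain.
m <- cbind(int = 1:3, date = dates)

# data.frame() keeps each column's own class.
df <- data.frame(int = 1:3, date = dates)
column_classes <- sapply(df, class)
```

In m the date column holds the number of days since 1970-01-01, which is why dates combined with cbind() stop printing as dates.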
Re: [R] How to avoid test for NAs in foreign function call
Supply NAOK=TRUE argument to .C; the help page for .C() contains the following:

Usage

.C(name, ..., NAOK = FALSE, DUP = TRUE, PACKAGE)

Also, you might want to consider using the "raw" data type instead of integers -- that way you should have fewer problems with R code making unwanted interpretations of certain bit patterns.

-- Tony Plate

Knut M. Wittkowski wrote:
> We have packed logical vectors into integers, 32 flags at a time and
> then want to AND or OR these vectors of "integers" using other C functions.
>
> The problem: occasionally, the packed sequence of 32 logical values
> resembles NA, causing the error message:
>
> Error in bitAND(packed1, packed2, lenx) :
> NAs in foreign function call (arg 1)
>
> How does one instruct R to avoid checking for NAs?
>
> Knut M. Wittkowski, PhD,DSc
> --
> The Rockefeller University,
> Center for Clinical and Translational Science
> Research Design and Biostatistics,
> 1230 York Ave #121B, Box 322, NY,NY 10021
> +1(212)327-7175, +1(212)327-8450 (Fax), [EMAIL PROTECTED]
> http://www.rockefeller.edu/ccts/rdbs.php
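The "raw" suggestion can be tried entirely in R, without any C code, as sketched below. This is an illustration of the idea rather than a replacement for the poster's bitAND routine; it uses the base functions packBits() and rawToBits(), and the flag vectors are made up.

```r
flags_a <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
flags_b <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Pack 8 flags per byte; raw values have no NA, so no bit pattern is special.
packed_a <- packBits(flags_a, type = "raw")
packed_b <- packBits(flags_b, type = "raw")

# & and | act bitwise on raw vectors.
packed_and <- packed_a & packed_b
packed_or  <- packed_a | packed_b

# Unpack again to check against the element-wise logical operations.
unpacked_and <- as.integer(rawToBits(packed_and)) == 1L
unpacked_or  <- as.integer(rawToBits(packed_or)) == 1L

# Why integers cause trouble: 32 flags with only the top bit set give the
# bit pattern that R uses for NA_integer_.
top_bit_only <- packBits(c(rep(FALSE, 31), TRUE), type = "integer")
looks_like_na <- is.na(top_bit_only)
```

The last two lines reproduce the situation from the question: a perfectly valid set of flags that R reads as a missing value once it is stored in an integer.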
Re: [R] ifelse question
I think you can find your answer if you study this part of the documentation for ifelse:

Details:
     If yes or no are too short, their elements are recycled. yes will be evaluated if and only if any element of test is true, and analogously for no.

Also, consider this call:

ifelse(1:12 > 5, 1:3, 11:14)

-- Tony Plate

Jacques Ropers wrote:
>>But you got only two (eventually one) distinct values, right? Look at
>>the code for 'ifelse': yes and no are only called once each, then
>>recycled to desired length.
>>
>>I guess you want something like
>>
>>x <- rnorm(10)
>>y <- rnorm(10)
>>z <- rnorm(10)
>>y1 <- ifelse(x > 0, y, z)
>>
>
> Thanks for the help.
>
> Although this would do the trick, is there a way to call repetitively
> rnorm (rpois...) *inside the ifelse* rather than constructing the vector
> outside ? Like in the following where cos() and sin() functions are
> evaluated for each row :
> x <- rnorm(10)
> y1 <- ifelse(x > 0, cos(x), sin(x))
>
> I am trying to understand the difference of behaviour. R acts as if
> rnorm(1) return value were known after the first call and does not
> evaluate rnorm(1) in
>
> y1 <- ifelse(x > 0, rnorm(1) , rnorm(1))
>
> again after the first evaluation.
>
> Jacques.
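To make the recycling concrete, here is a small sketch (the seed and the names `r`, `y_two`, `y_all` are illustrative). It shows what the suggested call returns, and one way to get a fresh random draw per element while still generating the values inside the ifelse() call.

```r
# 'yes' and 'no' are each evaluated once, then recycled to length(test).
r <- ifelse(1:12 > 5, 1:3, 11:14)
# positions 1-5 come from the recycled 'no'  (11 12 13 14 11 ...),
# positions 6-12 come from the recycled 'yes' (... 3 1 2 3 1 2 3)

set.seed(42)
x <- rnorm(10)

# rnorm(1) is evaluated at most once per branch, so the result
# contains at most two distinct values.
y_two <- ifelse(x > 0, rnorm(1), rnorm(1))

# Asking for length(x) draws in each branch gives every position its own value.
y_all <- ifelse(x > 0, rnorm(length(x)), rnorm(length(x)))
```

cos(x) and sin(x) behave "per row" only because they already return a vector as long as x; rnorm(1) returns a single number, which is then recycled.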
Re: [R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
Did you try supplying gradient information to nlminb? (I note that nlminb is used for the optimization, but I don't see any gradient information supplied to it.) I would suspect that supplying gradient information would greatly speed up the computation (as you note in comments at http://otter-rsch.ca/tresults.htm.)

I'm curious -- when you say "R may not be a suitable platform for development for such models", what aspect of R do you feel is lacking? Is it the specific optimization routines available, or is it some other more general aspect?

Also, another optimization algorithm available in R is the "L-BFGS-B" method for optim() (in R, optim() lives in the 'stats' package rather than MASS). I've had extremely good experiences with using this code in S-PLUS. It can take box constraints, and can use gradient information. It is my first choice for most optimization problems, and I believe it is very widely used. Did you try using that optimization routine with this problem?

-- Tony Plate

dave fournier wrote:
> There has recently been some discussion on the list about
> AD Model builder and the suitability of R for constructing the
> types of models used in fisheries management.
>
> https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html
>
> https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html
>
> I think that many R users underestimate the numerical challenges
> that some of the typical nonlinear statistical models used in different
> fields present. R may not be a suitable platform for development for
> such models.
>
> Around 10 years ago John Schnute, Laura Richards, and Norm Olsen
> with Canadian federal fisheries undertook an investigation
> comparing various statistical modeling packages for a simple
> age-structured statistical model of the type commonly used in
> fisheries. They compared AD Model Builder, Gauss, Matlab, and
> Splus. Unfortunately a working model could not be produced with Splus
> so its times could not be included in the comparison. It is possible
> to produce a working model with the present day version of R so that
> R can now be directly compared with AD Model Builder for this type of model.
>
> I have put the results of the test together with the original
> Schnute and Richards paper and the working R and AD Model Builder
> codes on Otter's web site
>
> http://otter-rsch.ca/tresults.htm
>
> The results are that AD Model builder is roughly 1000 times faster than
> R for this problem. ADMB takes about 2 seconds to converge while
> R takes over 90 minutes.
>
> This is a simple toy example. Real fisheries models are often hundreds of
> times more computationally intensive than this one.
>
> Cheers,
>
> Dave
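As an illustration of what supplying gradient information and box constraints looks like, here is a minimal sketch using optim() with method "L-BFGS-B". The objective is the standard Rosenbrock test function, not the fisheries model from the thread, and the names `fr`, `grr`, the starting point, and the bounds are all assumptions chosen for the example.

```r
# Rosenbrock "banana" function: minimum of 0 at (1, 1).
fr <- function(p) {
  100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
}

# Analytic gradient of fr with respect to p[1] and p[2].
grr <- function(p) {
  c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
     200 * (p[2] - p[1]^2))
}

# L-BFGS-B accepts the gradient and simple lower/upper bounds.
fit <- optim(par = c(-1.2, 1), fn = fr, gr = grr,
             method = "L-BFGS-B",
             lower = c(-2, -2), upper = c(2, 2))

fit$par          # should be close to c(1, 1)
fit$convergence  # 0 indicates successful convergence
```

nlminb() takes the same kind of function through its `gradient` argument; without it, derivatives are approximated numerically, which costs extra objective evaluations per iteration.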
Re: [R] data storage/cubes and pointers in R
What kind of operations do you need to be able to do?

I frequently use 3 and higher dimensional arrays for storing data, and then I use indexing operations to extract slices of data, or sometimes apply() and friends to process the data.

The abind() function (in the 'abind' package) will bind together vectors and arrays into higher dimensional arrays -- it might come in handy for you.

-- Tony Plate

Piet van Remortel wrote:
> Hi all,
>
> I am faced with the situation where I want to store/analyze
> relatively large, organized sets of numerical data, which depend on a
> number of conditions (biological properties, exposure times,
> concentrations etc etc). Imagine about a hundred dataframes of a few
> thousand numerical values, with some annotation in text for some
> entries.
>
> Intuitively, I would like to be able to slice the data in a 'data-cube'
> kind of way to query, analyze, cluster, fit etc., which
> resembles the database data-cube way of thinking common in the db
> world these days. ( http://en.wikipedia.org/wiki/Data_cube )
>
> I have no knowledge of a package that supports such things in an
> elegant way within R. If this exists, please point me to it.
>
> Also considering implementing a similar setup myself, I started
> wondering about the possibility of using references (or "pointers"
> aargh) to dataframes and storing them in a list etc. Separate lists
> can then represent different 'views' on the shared instance
> dataframes etc. I have no knowledge if that is even possible in R,
> and if that is even the smart way to do it. If someone could provide
> some help, that would be great.
>
> Other option is of course to link to MySQL and do all data handling
> in that way. Also considering that.
>
> Any thoughts/hints would be appreciated !
>
> thanks,
>
> Piet
>
> --
> Dr. P. van Remortel
> Intelligent Systems Lab
> Dept. of Mathematics and Computer Science
> University of Antwerp
> Belgium
> http://www.islab.ua.ac.be
> +32 3 265 33 57 (secr.)
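A rough sketch of the array approach described in the reply, using base R only. The dimension names (`dose`, `time`, `strain`) and the data are invented for illustration; a real data cube would be filled from the actual measurements.

```r
# A 2 x 3 x 4 "cube" with named dimensions (made-up data).
cube <- array(as.numeric(1:24), dim = c(2, 3, 4),
              dimnames = list(dose   = c("low", "high"),
                              time   = c("t1", "t2", "t3"),
                              strain = paste("s", 1:4, sep = "")))

# Slice: all time points for one dose and one strain.
slice <- cube["high", , "s2"]

# Aggregate: mean over time for every dose x strain combination.
by_dose_strain <- apply(cube, c(1, 3), mean)

# Sub-cube: keep two strains, all doses and times.
sub_cube <- cube[, , c("s1", "s3"), drop = FALSE]
```

Indexing by dimnames keeps queries readable, and apply() over a chosen set of margins plays the role of a "group by" on the cube.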
Re: [R] problem about using list element in for cycle
Your problem is that you are using cat() on a factor. Use as.character() or format() to convert the factor to character data, which cat will then print in the way you want.

> x <- data.frame(L=letters[1:3])
> x
  L
1 a
2 b
3 c
> x$L
[1] a b c
Levels: a b c
> cat(x$L, "\n")
1 2 3
> cat(as.character(x$L), "\n")
a b c
> cat(format(x$L), "\n")
a b c
>

Hu Chen wrote:
> sorry, pressed "sent" by mistake.
> for example
>
>>data <- read.csv("data.txt")
>>single
>
>        V1      V2
> 1 YHR165C  CG8877
> 2 YJL130C CG18572
> 3 YDL171C  CG9674
> 4 YKR054C  CG7507
> 5 YDL140C  CG1554
> 6 YLR106C CG13185
> 7 YGL206C  CG9012
> 8 YNL262W  CG6768
> 9 YER172C  CG5931
>
>>typeof(data)
>
> [1] "list"
>
>>for (i in 1:nrow(data)){
>
> cat(data[i,1]
>}
>
> it'll not return things like "YHR165C" but number like 6,7,9..
> is this a new feature of list? how to turn off it.
> thanks
>
> On 10/23/06, Hu Chen <[EMAIL PROTECTED]> wrote:
>
>>for example
>>data <- read.csv("data.txt")
>>typeof(data)
>>[1] "list"
>>for (i in 1:nrow(data)){
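A small sketch of the same point, with the factor conversion made explicit through the `stringsAsFactors` argument of data.frame() (read.csv() accepts the same argument). The variable names are illustrative. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so on a current R the string columns come back as character unless factors are requested.

```r
# Strings stored as a factor: the underlying values are integer codes.
f <- data.frame(L = letters[1:3], stringsAsFactors = TRUE)
codes  <- as.integer(f$L)     # the integer codes that cat() prints
labels <- as.character(f$L)   # the labels the poster expected

# Strings kept as character: cat() prints them directly.
s <- data.frame(L = letters[1:3], stringsAsFactors = FALSE)
cat(s$L, "\n")
```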
Re: [R] not understanding a do.call
Suppose you have a list of equal-length numeric vectors and you want to bind them together in a matrix. You want a piece of code that will work no matter how many vectors are in the list. That's what this construct with do.call() is useful for, e.g.:

> a <- 1:3
> b <- 4:6
> c <- 7:9
> x1 <- list(a=a,b=b)
> x2 <- list(a=a,b=b,c=c)
> do.call("cbind", x1)
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
> do.call("cbind", x2)
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
>

-- Tony Plate

Leeds, Mark (IED) wrote:
> I did a ?do.call but i don't think i understand it.
>
> if a, b,c,d are numeric vectors then could someone explain the
> difference between
>
> do.call("cbind",list(a,b,c,d))
>
> and cbind(a,b,c,d).
>
> or point to an archive on it.
>
> the return value of cbind is a matrix or dataframe depending on what is
> sent in but i don't
> understand when it would be useful to use do.call. i realize it takes a
> list but that's
> all i know about why one use it ? thanks.
Re: [R] Fwd: rarefy a matrix of counts
Two things to note: (1) rep() can be vectorized: > rep(1:3, 2:4) [1] 1 1 2 2 2 3 3 3 3 > (2) you will likely get much better performance if you work with integers and convert to strings after sampling (or use factors), e.g.: > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)] [1] "red" "blue" "red" "red" "red" > -- Tony Plate Brian Frappier wrote: > I tried all of the approaches below. > > the problem with: > > > x <- data.frame(matrix(NA,100,3)) > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100) > > if you want result in data frame > > or > > x<-vector("list", 3) > > for (i in 2:ncol(DF)) x[[,i-1]] <- sample(rep(DF[,1], DF[,i]),100) > > is that this code still samples the rows, not the elements, i.e. returns > 100 or 300 in the matrix cells instead of "red" or a matrix of counts by > color (object type) like: >x1x2 x3 > red 32 560 > gr6895 40 > sum 100 100 100 > > It looks like Tony is right: sampling without replacement requires > listing of all elements to be sampled. But, the code Petr provided > > x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100) > > did give me a clue of how to quickly make such a list using the 'rep' > command. I will for-loop a rep statement using my original matrix to > create a list of elements for each sample: > > Thanks Petr and Tony for your help! > > On 10/11/06, *Tony Plate* <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> > wrote: > > Here's a way using apply(), and the prob= argument of sample(): > > > df <- data.frame(sample1=c(red=400,green=100,black=300), > sample2=c(300,0,1000), sample3=c(2500,200,500)) > > df >sample1 sample2 sample3 > red 400 3002500 > green 100 0 200 > black 3001000 500 > > set.seed(1) > > apply(df, 2, function(counts) sample(seq(along=counts), rep=T, > size=7, prob=counts)) > sample1 sample2 sample3 > [1,] 1 3 1 > [2,] 1 3 1 > [3,] 3 3 1 > [4,] 2 3 2 > [5,] 1 3 1 > [6,] 2 3 1 > [7,] 2 3 3 > > > > Note that this does sampling WITH replacement. 
> AFAIK, sampling without replacement requires enumerating the entire > population to be sampled from. I.e., you cannot do > > sample(1:3, prob=1:3, rep=F, size=4) > instead of > > sample(c(1,2,2,3,3,3), rep=F, size=4) > > -- Tony Plate > > From reading ?sample, I was a little unclear on whether sampling > without replacement could work > > Petr Pikal wrote: > > Hi > > > > a litle bit different story. But > > > > x1 <- sample(c(rep("red",400),rep("green", 100), > > rep("black",300)),100) > > > > is maybe close. With data frame (if it is not big) > > > > > >>DF > > > > color sample1 sample2 sample3 > > 1 red 400 3002500 > > 2 green 100 0 200 > > 3 black 3001000 500 > > > > x <- data.frame(matrix(NA,100,3)) > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100) > > if you want result in data frame > > or > > x<-vector("list", 3) > > for (i in 2:ncol(DF)) x[[,i-1]] <- sample(rep(DF[,1], DF[,i]),100) > > > > if you want it in list. Maybe somebody is clever enough to discard > > for loop but you said you have 80 columns which shall be no problem. > > > > HTH > > Petr > > > > > > > > > > > > > > > > On 11 Oct 2006 at 10:11, Brian Frappier wrote: > > > > Date sent:Wed, 11 Oct 2006 10:11:33 -0400 > > From: "Brian Frappier" < [EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> > > To: "Petr Pikal" <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> > > Subject: Fwd: [R] rarefy a matrix of counts > > > > > >>-- Forwarded message -- >
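The two points in this reply (vectorized rep(), and sampling integer codes rather than strings) can be combined into a loop-free rarefaction. This is a sketch under assumptions: `rarefy_column` is a made-up helper name, the data frame is the one shown in the quoted example, and the sample size of 100 is the one mentioned in the thread.

```r
df <- data.frame(sample1 = c(red = 400, green = 100, black = 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500))

# Draw 'size' individuals WITHOUT replacement from one column of counts
# and return the rarefied counts per type.
rarefy_column <- function(counts, size) {
  pool <- rep(seq_along(counts), counts)   # one integer code per individual
  tabulate(sample(pool, size), nbins = length(counts))
}

set.seed(1)
rarefied <- sapply(df, rarefy_column, size = 100)
rownames(rarefied) <- rownames(df)
rarefied
```

Each column of the result sums to 100, and a type with a zero count in the original sample (green in sample2) stays at zero.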
Re: [R] Fwd: rarefy a matrix of counts
Here's a way using apply(), and the prob= argument of sample():

> df <- data.frame(sample1=c(red=400,green=100,black=300), sample2=c(300,0,1000), sample3=c(2500,200,500))
> df
      sample1 sample2 sample3
red       400     300    2500
green     100       0     200
black     300    1000     500
> set.seed(1)
> apply(df, 2, function(counts) sample(seq(along=counts), rep=T, size=7, prob=counts))
     sample1 sample2 sample3
[1,]       1       3       1
[2,]       1       3       1
[3,]       3       3       1
[4,]       2       3       2
[5,]       1       3       1
[6,]       2       3       1
[7,]       2       3       3
>

Note that this does sampling WITH replacement.

AFAIK, sampling without replacement requires enumerating the entire population to be sampled from. I.e., you cannot do
> sample(1:3, prob=1:3, rep=F, size=4)
instead of
> sample(c(1,2,2,3,3,3), rep=F, size=4)

-- Tony Plate

From reading ?sample, I was a little unclear on whether sampling without replacement could work

Petr Pikal wrote:
> Hi
>
> a little bit different story. But
>
> x1 <- sample(c(rep("red",400),rep("green", 100),
> rep("black",300)),100)
>
> is maybe close. With data frame (if it is not big)
>
>>DF
>
>   color sample1 sample2 sample3
> 1   red     400     300    2500
> 2 green     100       0     200
> 3 black     300    1000     500
>
> x <- data.frame(matrix(NA,100,3))
> for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> if you want result in data frame
> or
> x<-vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
>
> if you want it in list. Maybe somebody is clever enough to discard
> for loop but you said you have 80 columns which shall be no problem.
>
> HTH
> Petr
>
> On 11 Oct 2006 at 10:11, Brian Frappier wrote:
>
> Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> From: "Brian Frappier" <[EMAIL PROTECTED]>
> To: "Petr Pikal" <[EMAIL PROTECTED]>
> Subject: Fwd: [R] rarefy a matrix of counts
>
>>-- Forwarded message --
>>From: Brian Frappier <[EMAIL PROTECTED]>
>>Date: Oct 11, 2006 10:10 AM
>>Subject: Re: [R] rarefy a matrix of counts
>>To: r-help@stat.math.ethz.ch
>>
>>Hi Petr,
>>
>>Thanks for your response. I have data that looks like the following:
>>
>>             sample 1  sample 2  sample 3
>>red candy         400       300      2500
>>green candy       100         0       200
>>black candy       300      1000       500
>>
>>I don't want to randomly select either the samples (columns) or the
>>"candy" types (rows), which sample as you state would allow me.
>>Instead, I want to randomly sample 100 candies from each sample and
>>retain info on their associated type. I could make a list of all the
>>candies in each sample:
>>
>>sample 1
>>red
>>red
>>red
>>red
>>green
>>green
>>black
>>red
>>black
>>...
>>
>>and then randomly sample those rows. Repeat for each sample. But, I
>>am not sure how to do that without a lot of loops, and am wondering if
>>there is an easier way in R. Thanks! I should have laid this out in
>>the first email...sorry.
>>
>>
>>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
>>
>>>Hi
>>>
>>>I am not experienced in Matlab and from your explanation I do not
>>>understand what exactly do you want. It seems that you want randomly
>>>choose a sample of 100 rows from your matrix, what can be achieved by
>>>sample.
>>>
>>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
>>>DF[sample(1:100, 10),]
>>>
>>>If you want to do this several times, you need to save your result
>>>and then it depends on what you want to do next. One suitable form
>>>is list of matrices the other is array and you can use for loop for
>>>completing it.
>>>
>>>HTH
>>>Petr
>>>
>>>
>>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
>>>
>>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
>>>From: "Brian Frappier" <[EMAIL PROTECTED]>
>>>To: r-help@stat.math.ethz.ch
>>>Subject: [R] rarefy a matrix of counts
>>>
>>>
>>>>Hi all,
>>
Re: [R] shifting a huge matrix left or right efficiently ?
If you're able to work with the transpose of your matrix, you might consider the function 'filter()', e.g.:

> filter(diag(1:5), c(2,3), sides=1)
Time Series:
Start = 1
End = 5
Frequency = 1
  [,1] [,2] [,3] [,4] [,5]
1   NA   NA   NA   NA   NA
2    3    4    0    0    0
3    0    6    6    0    0
4    0    0    9    8    0
5    0    0    0   12   10
>

I don't know if the conversion to and from a time-series class will impact the timing, but if this might serve your purposes, it's easy to do some experiments to find out.

- Tony Plate

Huang-Wen Chen wrote:
> I'm wondering what's the best way to shift a huge matrix left or right.
> My current implementation is the following:
>
> shiftMatrixL <- function(X, shift, padding=0) {
> cbind(X[, -1:-shift], matrix(padding, dim(X)[1], shift))
> }
>
> X <- shiftMatrixL(X, 1)*3 + shiftMatrixL(X,2)*5...
>
> However, it's still slow due to heavy use of this function.
> The resulting matrix will only be read once and then discarded,
> so I believe the best implementation of this function is in C,
> manipulating the internal data structure of this matrix.
> Anyone know similar package for doing this job ?
>
> Huang-Wen
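To check that filter() on the transpose really computes a weighted sum of shifted copies, here is a small sketch. Assumptions: `shiftMatrixR` is a made-up right-shifting counterpart of the poster's helper (with sides=1, filter() looks backwards, which corresponds to a right shift), NA is used as padding to match what filter() returns at the edge, and the test matrix is arbitrary.

```r
X <- matrix(as.numeric(1:12), nrow = 3)

# Shift columns to the right by k, padding on the left.
shiftMatrixR <- function(X, k, padding = NA) {
  cbind(matrix(padding, nrow(X), k),
        X[, seq_len(ncol(X) - k), drop = FALSE])
}

# Explicit version: 3 * (shift by 1) + 5 * (shift by 2).
direct <- 3 * shiftMatrixR(X, 1) + 5 * shiftMatrixR(X, 2)

# filter() works down the rows of its input, so pass the transpose.
# Coefficients c(0, 3, 5) mean 0*x[i] + 3*x[i-1] + 5*x[i-2].
f <- stats::filter(t(X), c(0, 3, 5), sides = 1)

# Drop the time-series attributes and transpose back.
via_filter <- t(matrix(as.numeric(f), nrow = nrow(f)))

all.equal(direct, via_filter)
```

Whether this is faster than the cbind() approach for a huge matrix would have to be measured, as the reply says.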
Re: [R] how ot replace the diagonal of a matrix
You are indexing with numeric 0's and 1's, which will refer to only the matrix element 1,1 (multiple times), cf:

> matrix(1:9,3)[diag(3)]
[1] 1 1 1
>

Try one of these:

> idx <- diag(3) > 0
> idx <- which(diag(3)>0)
> idx <- cbind(seq(len=n), seq(len=n))

(For very large matrices, the third will be more efficient, I believe.)

-- Tony Plate

roger bos wrote:
> Dear useRs,
>
> Trying to replace the diagonal of a matrix is not working for me. I
> want a matrix with .6 on the diag and .4 elsewhere. The following
> code looks like it should work--when I look at mps and idx they look
> how I want them to--but it only replaces the first element, not each
> element on the diagonal.
>
> mps <- matrix(rep(.4, 3*3), nrow=n, byrow=TRUE)
> idx <- diag(3)
> mps
> idx
> mps[idx] <- rep(.6,3)
>
> I also tried something along the lines of diag(mps=.6, ...) but it
> didn't know what mps was.
>
> Thanks,
>
> Roger
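A short sketch of two working alternatives, assuming n = 3 as in the question: the two-column index matrix suggested in the reply, and base R's replacement form `diag(x) <- value`, which the reply does not mention but which does the same job.

```r
n <- 3

# Option 1: index with a two-column matrix of (row, column) pairs.
mps1 <- matrix(0.4, n, n)
idx <- cbind(seq_len(n), seq_len(n))
mps1[idx] <- 0.6

# Option 2: the replacement form of diag().
mps2 <- matrix(0.4, n, n)
diag(mps2) <- 0.6

identical(mps1, mps2)
```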
Re: [R] List-manipulation
Does this do what you want?

> x <- list(1,2,3:7,8,9:10)
> sapply(x, function(xx) xx[1])
[1] 1 2 3 8 9
>

-- Tony Plate

Benjamin Otto wrote:
> Hi,
>
> Sorry for the question, I know it should be basic knowledge but I'm
> struggling for two hours now.
>
> How do I select only the first entry of each list member and ignore the
> rest?
>
> So for
>
>>$"121_at"
>>-113691170
>
>>$"1255_g_at"
>>42231151
>
>>$"1316_at"
>>35472685 35472588
>
>>$"1320_at"
>>-88003869
>
> I only want to select
>
> -113691170, 42231151, 35472685 and -88003869 .?
>
> Regards
> Benjamin
>
> --
> Benjamin Otto
> Universitaetsklinikum Eppendorf Hamburg
> Institut fuer Klinische Chemie
> Martinistrasse 52
> 20246 Hamburg
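Applied to a named list shaped like the one in the question (the list name `probes` is made up; the values are copied from the quoted message), the same idea looks like this. vapply() is used here as a stricter variant of sapply() that checks each result is a single number.

```r
probes <- list("121_at"    = -113691170,
               "1255_g_at" = 42231151,
               "1316_at"   = c(35472685, 35472588),
               "1320_at"   = -88003869)

# First entry of every list member; names are carried over.
firsts <- vapply(probes, function(v) v[1], numeric(1))
firsts
```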
Re: [R] symbolic matrix elements...
If I construct the matrix by list()ing together the expressions rather than c()ing, then it works OK:

> x <- matrix(list( expression(x3-5*x+4), expression(log(x2-4*x
> x[1,1]
[[1]]
expression(x3 - 5 * x + 4)

> x[[1,1]]
expression(x3 - 5 * x + 4)
> D(x[[1,1]], "x")
-5
>

The reason c() doesn't work properly here might have something to do with it creating a language object of an unconventional type:

> c( expression(x3-5*x+4), expression(log(x2-4*x)))
expression(x3 - 5 * x + 4, log(x2 - 4 * x))
> expression(x3-5*x+4)
expression(x3 - 5 * x + 4)
>

Using list() with language objects is much safer if you just want to make lists of them.

-- Tony Plate

Evan Cooch wrote:
>
> Eik Vettorazzi wrote:
>
>>test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
>>works.
>
> Well, not really (or I'm misunderstanding). Your code enters fine (no
> errors), but I can't access individual elements - e.g., test[1,1] gives
> me an error:
>
> > test=matrix(c( expression(x^3-5*x+4), expression(log(x^2-4*x
> > test[1,1]
> Error: matrix subscripting not handled for this type
>
> Meaning...what?
>
>>btw. you received an error because D expects an expression and you
>>offered a list
>
> OK - so why then are each of the elements identified as an expression
> when I print out the vector? Each element is reported to be an
> expression. OK, if so, then I remain puzzled as to how this is a 'list'.
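A complete version of the list-based construction, as a sketch. The matrix() call in the archived reply is cut off, so the layout used here (`nrow = 1`) is an assumption, and the expressions are written with `x^3` and `x^2` as in the quoted question rather than the `x3` and `x2` that appear in the reply's transcript.

```r
# A 1 x 2 matrix whose cells each hold an expression.
m <- matrix(list(expression(x^3 - 5*x + 4),
                 expression(log(x^2 - 4*x))),
            nrow = 1)

# [[i, j]] extracts the expression stored in a cell; D() differentiates it.
d11 <- D(m[[1, 1]], "x")   # 3 * x^2 - 5
d12 <- D(m[[1, 2]], "x")

# Evaluate the derivatives at chosen points.
eval(d11, list(x = 2))
eval(d12, list(x = 5))
```

With single brackets, m[1, 1] returns a list of length one that still wraps the expression, which is why `[[` is needed before passing a cell to D().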
Re: [R] Access Rows in a Data Frame by Row Name
Matrix-style indexing works for both columns and rows of data frames. E.g.:

> x <- data.frame(a=1:5, b=6:10, d=11:15)
> x
  a  b  d
1 1  6 11
2 2  7 12
3 3  8 13
4 4  9 14
5 5 10 15
> x[2:4,c(1,3)]
  a  d
2 2 12
3 3 13
4 4 14
>

Time spent reading the help document "An Introduction to R" will probably be well worth it. The relevant sections are "5 Arrays and matrices", and "6.3 Data frames".

-- Tony Plate

Michael Gormley wrote:
> I have created a data frame using the read.table command. I want to be able
> to access the rows by the row name, or a vector of row names. I know that you
> can access columns by using the data.frame.name$col.name. Is there a way to
> access row names in a similar manner?
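The same matrix-style indexing accepts character vectors, which is what the question asks about. A brief sketch with made-up row names:

```r
x <- data.frame(a = 1:5, b = 6:10, d = 11:15,
                row.names = c("r1", "r2", "r3", "r4", "r5"))

# One row by name (result is a one-row data frame).
one <- x["r2", ]

# Several rows and columns by name.
some <- x[c("r2", "r4"), c("a", "d")]

# A single cell by row and column name.
cell <- x["r2", "b"]
```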
Re: [R] rename cols
The following works for data frames and matrices (you didn't say which you were working with).

> x <- data.frame(V1=1:3,V2=4:6)
> x
  V1 V2
1  1  4
2  2  5
3  3  6
> colnames(x) <- c("Apple", "Orange")
> x
  Apple Orange
1     1      4
2     2      5
3     3      6
>

For a data frame, 'names(x) <- c("Apple", "Orange")' also works, because a data frame is stored internally as a list of columns.

-- Tony Plate

Ethan Johnsons wrote:
> A quick question please!
>
> How do you rename column names? i.e. V1 --> Apple; V2 --> Orange, etc.
>
> thx much
>
> ej
Re: [R] problem with putting objects in list
I suspect you are not thinking about the list and the subsetting/extraction operators in the right way.

A list contains a number of components. To get a subset of the list, use the '[' operator. The subset can contain zero or more components of the list, and it is a list itself. So, if x is a list, then x[2] is a list containing a single component.

To extract a component from the list, use the '[[' operator. You can only extract one component at a time. If you supply a vector index with more than one element, it will index recursively.

> x <- list(1,2:3,letters[1:3])
> x
[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c"

> # a subset of the list
> x[2:3]
[[1]]
[1] 2 3

[[2]]
[1] "a" "b" "c"

> # a list with one component:
> x[2]
[[1]]
[1] 2 3

> # the second component itself
> x[[2]]
[1] 2 3
> # recursive indexing
> x[[c(2,1)]]
[1] 2
> x[[c(3,2)]]
[1] "b"
>

Rainer M Krug wrote:
> Hi
>
> I use the following code and it stores the results of density() in the
> list dr:
>
> dens <- function(run) { density( positions$X[positions$run==run], bw=3,
> cut=-2 ) }
> dr <- lapply(1:5, dens)
>
> but the results are stored in dr[[i]] and not dr[i], i.e. plot(dr[[1]])
> works, but plot(dr[1]) doesn't.
>
> Is there any way that I can store them in dr[i]?
>
> Thanks a lot,
>
> Rainer
Re: [R] Cannot get simple data.frame binding.
Maybe I'm missing something, but your "Real life code" looks like it should work.

What happens when you do:

> ire1 <- data.frame(md1[, 1:11], other)
Error in data.frame(md1[, 1:11], other) : arguments imply differing number of rows: 11, 75
> str(md1[, 1:11])
> str(other)

?

Maybe the labelled data frame is causing the problem? Did you try as.data.frame(md1[,1:11])? (I'm guessing that will strip off extra attributes).

-- Tony Plate

John Kane wrote:
> I am stuck on a simple problem where an example works
> fine but the real one does not.
>
> I have a data.frame where I wish to sum up some values
> across the rows and create a new data.frame with some
> of old data.frame variables and the new summed
> variable.
>
> It works fine in my simple example but I am doing
> something wrong in the real world. In the real world
> I am loading a labeled data.frame. The original data
> comes from a spss file imported using spss.get but the
> current data.frame is a subset of the original spss
> file.
>
> EXAMPLE
> cata <- c( 1,1,6,1,1,NA)
> catb <- c( 1,2,3,4,5,6)
> doga <- c(3,5,3,6,4, 0)
> dogb <- c(2,4,6,8,10, 12)
> rata <- c (NA, 9, 9, 8, 9, 8)
> ratb <- c( 1,2,3,4,5,6)
> bata <- c( 12, 42,NA, 45, 32, 54)
> batb <- c( 13, 15, 17,19,21,23)
> id <- c('a', 'b', 'b', 'c', 'a', 'b')
> site <- c(1,1,4,4,1,4)
> mat1 <- cbind(cata, catb, doga, dogb, rata, ratb,
> bata, batb)
>
> data1 <- data.frame(site, id, mat1)
> attach(data1)
> data1
> aa <- which(names(data1)=="rata")
> bb <- length(names(data1))
>
> mat1 <- as.matrix(data1[,aa:bb])
> food <- apply( mat1, 1, sum , na.rm=T)
> food
>
> abba <- data.frame(data1[, 1:6], food)
> abba
>
> --
> Real life problem
>
>>load("C:/start/R.objects/partly.corrected.materials.Rdata")
>>md1<-partly.corrected.materials
>>aa <- which(names(md1)=="oaks")
>>bb <- length(names(md1))
>>
>># sum the values of the "other" variables
>>mat1 <- as.matrix( md1[, aa:bb] )
>>other <- apply(mat1,1, sum, na.rm=T)
>>ire1 <- data.frame(md1[, 1:11], other)
>
> Error in data.frame(md1[, 1:11], other) : arguments
> imply differing number of rows: 11, 75
>
> -
>
> I have simply worked around the problem by using
> ire1 <- data.frame(md1$site, md1$colour, md1$ss1 ... ,
> other)
> but I would like to know what stupid thing I am doing.
>
> Thanks
Re: [R] regex scares me
I think this does the trick. Note that it is case sensitive.

> x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab", "lad.TAB")
> grep("lad.*\\.tab$", x, value=T)
[1] "lad.tab"     "xxladyy.tab"
>

Jon Minton wrote:
> Hi, apologies if this is too simple but I've been stuck on the following for
> a while:
>
> I have a vector of strings: filenames with a name before the extension and a
> variety of possible extensions
>
> I want to select only those files with:
>
> 1) a ".tab" extension
>
> AND
>
> 2) the character sequence "lad" anywhere in the name of the file before the
> extension.
>
> Surely this won't take long to do, I thought. (But I was wrong.)
>
> What's the regexp pattern to specify here?
>
> Thanks,
>
> Jon Minton
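Since the pattern is case sensitive, a case-insensitive variant may be useful. This sketch reuses the test vector from the reply and adds grep()'s `ignore.case` argument; the comments spell out what each part of the pattern does.

```r
x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab", "lad.TAB")

# lad    the literal characters "lad", anywhere in the name
# .*     any characters (possibly none) after it
# \\.    a literal dot (escaped, since a bare . matches any character)
# tab$   "tab" at the very end of the string
strict  <- grep("lad.*\\.tab$", x, value = TRUE)
relaxed <- grep("lad.*\\.tab$", x, value = TRUE, ignore.case = TRUE)

strict   # "lad.tab" "xxladyy.tab"
relaxed  # additionally picks up "LAD.tab" and "lad.TAB"
```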
Re: [R] meta characters in file path
What is the problem you are having? Seems to work fine for me running under Windows 2000:

> write.table(data.frame(a=1:3,b=4:6), file="@# x.csv", sep=",")
> read.csv(file="@# x.csv")
  a b
1 1 4
2 2 5
3 3 6
> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
     XML
"0.99-8"
>

Li,Qinghong,ST.LOUIS,Molecular Biology wrote:
> Hi,
>
> I need to read in some files. The file names contain come meta characters
> such as @, #, and white spaces etc, In read.csv, file= option, is there any
> way that one can make the function to recognize a file path with those
> characters?
>
> Thanks
> Johnny
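The same round trip can be tried without cluttering the working directory. This is a sketch only: the file name is the one from the reply above, placed under tempdir(), and whether a given character is legal in a file name still depends on the operating system and file system.

```r
# File name containing '@', '#' and a space, built with file.path()
f <- file.path(tempdir(), "@# x.csv")

df <- data.frame(a = 1:3, b = 4:6)
write.csv(df, file = f, row.names = FALSE)  # the name is passed as an ordinary string

back <- read.csv(file = f)                  # no special quoting of the path is needed
print(back)

unlink(f)                                   # clean up the temporary file
```

The point is that file= takes a plain character string; the characters are only "meta" to a shell, not to R's file functions.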
Re: [R] deleting a directory
?unlink says that unlink() can remove directories (and has a 'recursive' argument). 'unlink' is in the "SEE ALSO" section in ?file.remove.

-- Tony Plate

Sundar Dorai-Raj wrote:
> Hi, all,
>
> I'm looking a utility for removing a directory from within R. Currently,
> I'm using:
>
> foo <- function(...) {
>    mydir <- tempdir()
>    dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
>    on.exit(system(sprintf("rm -rf %s", mydir)))
>    ## do some stuff in "mydir"
>    invisible()
> }
>
> However, this is assumes "rm" is available. I know of ?dir.create, but
> there is no opposite. And ?file.remove appears to work only on files and
> not directories.
>
> Any advice? Or is my current approach the only solution?
>
> > R.version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          3.1
> year           2006
> month          06
> day            01
> svn rev        38247
> language       R
> version.string Version 2.3.1 (2006-06-01)
>
> Thanks,
>
> --sundar
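Applied to the function in the question, the suggestion might look like the sketch below. One deliberate change, which is my assumption rather than anything from the thread: the scratch directory is created with tempfile() as a fresh subdirectory, because removing tempdir() itself would take the session's temporary directory away.

```r
foo <- function() {
  mydir <- tempfile("scratch")              # fresh path under tempdir()
  dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
  on.exit(unlink(mydir, recursive = TRUE))  # portable stand-in for "rm -rf"
  ## do some stuff in "mydir"
  writeLines("some stuff", file.path(mydir, "a.txt"))
  invisible(mydir)
}

d <- foo()
file.exists(d)  # the directory and its contents are gone once foo() returns
```

Without recursive = TRUE, unlink() leaves directories alone, so the argument is what makes this work.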
Re: [R] Functions ,Optim, & Dataframe
I added an example of passing additional arguments through optim() to the objective and gradient functions to the Discussion section of the Wiki-fied R documentation. See it at http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim

-- Tony Plate

PS. I had to add "&purge=true" to the end of the URL, i.e., http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim&purge=true in order to see the original documentation the first time -- it's something to do with bad cache entries for the page.

Michael Papenfus wrote:
> I think I need to clarify a little further on my original question.
>
> I have the following two rows of data:
> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
> >mydat
>   d1 d2   p1   p2
> 1  3  6 0.55 0.85
> 2  5 10 0.05 0.35
>
> I need to optimize the following function using optim for each row in mydat
> fr<-function(x) {
> u<-x[1]
> v<-x[2]
> sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2))
> }
> x0<-c(1,1)# starting values for two unknown parameters
> y<-optim(x0,fr)
>
> In my defined function fr, (d1 d2 p1 p2) are known values which I need
> to read in from my dataframe and u & v are the TWO unknown parameters.
> I want to solve this equation for each row of my dataframe.
>
> I can get this to work when I manually plug in the known values (d1 d2
> p1 p2). However, I would like to apply this to each row in my dataframe
> where the known values are automatically passed to my function which
> then is sent to optim which solves for the two unknown parameters for
> each row in the dataframe.
>
> thanks again,
> mike
Re: [R] Functions ,Optim, & Dataframe
Supply your additional arguments to optim() and they will get passed to your function:

> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
>
> fr<-function(x, d) {
+     # d is a vector of d1, d2, p1 & p2
+     u <- x[1]
+     v <- x[2]
+     d1 <- d[1]
+     d2 <- d[2]
+     p1 <- d[3]
+     p2 <- d[4]
+     sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2)))
+ }
> x0 <- c(1,1)  # starting values for two unknown parameters
> y1 <- optim(x0,fr,d=unlist(mydat[1,]))
> y2 <- optim(x0,fr,d=unlist(mydat[2,]))
> y1$par
[1] 0.462500 0.828125
> y2$par
[1] -1.0937500  0.2828125
> yall <- apply(mydat, 1, function(d) optim(x0,fr,d=d))
> yall[[1]]$par
[1] 0.462500 0.828125
> yall[[2]]$par
[1] -1.0937500  0.2828125
>

One thing you must be careful of is that none of the arguments to your function match or partially match the named arguments of optim(), which are:

> names(formals(optim))
[1] "par"     "fn"      "gr"      "method"  "lower"   "upper"   "control"
[8] "hessian" "..."
>

For example, if your function has an argument 'he=', you will not be able to pass it, because if you say optim(x0, fr, he=3), the 'he' will match the 'hessian=' argument of optim(), and it will not be interpreted as being a '...' argument.

-- Tony Plate

Michael Papenfus wrote:
> I think I need to clarify a little further on my original question.
>
> I have the following two rows of data:
> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
> >mydat
>   d1 d2   p1   p2
> 1  3  6 0.55 0.85
> 2  5 10 0.05 0.35
>
> I need to optimize the following function using optim for each row in mydat
> fr<-function(x) {
> u<-x[1]
> v<-x[2]
> sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2))
> }
> x0<-c(1,1)# starting values for two unknown parameters
> y<-optim(x0,fr)
>
> In my defined function fr, (d1 d2 p1 p2) are known values which I need
> to read in from my dataframe and u & v are the TWO unknown parameters.
> I want to solve this equation for each row of my dataframe.
>
> I can get this to work when I manually plug in the known values (d1 d2
> p1 p2). However, I would like to apply this to each row in my dataframe
> where the known values are automatically passed to my function which
> then is sent to optim which solves for the two unknown parameters for
> each row in the dataframe.
>
> thanks again,
> mike
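A way to sidestep the argument-name matching issue altogether is to wrap the objective in a closure, so that nothing extra has to travel through optim()'s '...'. The sketch below uses a made-up toy objective and made-up data ('fr', 'targets') purely so the result is easy to check; it is not the lognormal problem from the question.

```r
# Toy objective: squared distance between x and a known vector d
fr <- function(x, d) sum((x - d)^2)

# Hypothetical data: one row of known values per problem
targets <- data.frame(t1 = c(3, 5), t2 = c(6, 10))

x0 <- c(1, 1)  # starting values
fits <- lapply(seq_len(nrow(targets)), function(i) {
  d <- unlist(targets[i, ])
  # The anonymous function captures d, so optim() only ever sees function(x)
  optim(x0, function(x) fr(x, d))
})

# One row of estimated parameters per row of 'targets'
pars <- t(sapply(fits, function(f) f$par))
print(round(pars, 2))
```

With this pattern the objective's argument names can be anything, since none of them are visible to optim().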
Re: [R] transformation matrice of vector into array
Here's a way to convert a matrix of vectors like you have into an array:

> x <- array(lapply(seq(0,len=6,by=4), "+", c(a=1,b=2,c=3,d=4)), dim=c(2,3), dimnames=list(c("X","Y"),c("e","f","g")))
> x
  e         f         g
X Numeric,4 Numeric,4 Numeric,4
Y Numeric,4 Numeric,4 Numeric,4
> x[["Y","e"]]
a b c d
5 6 7 8
> xa <- array(unlist(x, use.names=F), dim=c(length(x[[1,1]]),dim(x)), dimnames=c(list(names(x[[1,1]])),dimnames(x)))
> x["Y","e"]
[[1]]
a b c d
5 6 7 8

> xa[,"Y","e"]
a b c d
5 6 7 8
>

Then you can do whatever sums you want over the array. I have not extensively checked the above code, and if I were going to use it, I would do numerous spot checks of elements to make sure all the elements are going to the right places -- it's not too difficult to make mistakes when pulling apart and reassembling arrays like this.

(For simpler cases involving lists of vectors or matrices, the abind() function can help.)

-- Tony Plate

Jessica Gervais wrote:
> Hi,
>
> I need some help
>
> I have a matrix M(m,n) in which each element is a vector V of lenght 6
>   1      2      3      4      5      6      7
> 1 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 2 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 3 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 4 List,6 List,6 List,6 List,6 List,6 List,6 List,6
>
> i would like to make the sum on the matrix of each element of the
> matrix, that is to say
> sum(on the matrix)(M[j,][[j]][[1]])
> sum(on the matrix)(M[j,][[j]][[2]])
> ...
> sum(on the matrix)(M[j,][[j]][[6]])
>
> I don't really know how to do.
> I thought it was possible to transform the matrix M into an array A of
> dimension (m,n,6), and then use the command sum(colsums(A[,,1]), which
> seems to be possible and quite fast.
> ...but I don't know how to convert a matrix of vector into an array
>
> As anyone any little idea about that ?
>
> Thanks by advance
>
> Jessica
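For the sums asked about in the question, one possibility once the array exists is rowSums(), which on a 3-d array sums over every dimension except the first. A sketch that rebuilds the same small example as in the reply above ('sums' is just an illustrative name):

```r
# Matrix-shaped list whose elements are named vectors of length 4
x <- array(lapply(seq(0, len = 6, by = 4), "+", c(a = 1, b = 2, c = 3, d = 4)),
           dim = c(2, 3), dimnames = list(c("X", "Y"), c("e", "f", "g")))

# Flatten into a 4 x 2 x 3 array: vector position first, then the matrix dims
xa <- array(unlist(x, use.names = FALSE),
            dim = c(length(x[[1, 1]]), dim(x)),
            dimnames = c(list(names(x[[1, 1]])), dimnames(x)))

# Sum each vector position over the whole matrix
sums <- rowSums(xa)
print(sums)

# Spot check against a direct sum over the list elements
check <- sum(sapply(x, function(v) v[["a"]]))
```

Because the vector index is the first dimension here, the per-component total is a row sum; with the (m,n,6) layout mentioned in the question, apply(A, 3, sum) would play the same role.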
Re: [R] References verifying accuracy of R for basic statistical calculations and tests
This might be a place to start:
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

Among the references listed there are:

Assessing the Reliability of Statistical Software: Part I, by B. D. McCullough (1998)
http://www.amstat.org/publications/tas/mccull-1.pdf

Assessing the Reliability of Statistical Software: Part II, by B. D. McCullough (1999)
http://www.amstat.org/publications/tas/mccull.pdf

Those might have some relevance.

Then, doing within an R session:

> RSiteSearch("Assessing Reliability Statistical Software")

turns up 14 hits, many of them looking relevant [leaving "the" and "of" in the query results in the search engine timing out - odd?]

-- Tony Plate

Corey Powell wrote:
> Do you know of any references that verify the accuracy of R for basic
> statistical calculations and tests. The results of these studies should
> indicate that R results are the same as the results of other statistical
> packages to a certain number of decimal places on some benchmark calculations.
>
> Thanks,
>
> Corey Powell
> Clinical Data Analyst
> Broncus Technologies
> [EMAIL PROTECTED]
Re: [R] max / pmax
Here's an example of how I think you can do what you want. Play with the definition of the function highest.use() to get random selection of multiple maxima.

> drug.names <- c("marijuana", "crack", "cocaine", "heroin")
> drugs <- factor(drug.names, levels=drug.names)
> drugs
[1] marijuana crack     cocaine   heroin
Levels: marijuana crack cocaine heroin
> as.numeric(drugs)
[1] 1 2 3 4
> N <- 20
> set.seed(1)
> primary.drug <- sample(drugs, N, rep=T)
> primary.drug[sample(1:20, 10)] <- NA
> primary.drug
 [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
 [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
> # usage frequencies
> marijuana <- sample(1:3, N, rep=T)
> crack <- sample(1:3, N, rep=T)
> cocaine <- sample(1:3, N, rep=T)
> heroin <- sample(1:3, N, rep=T)
> cbind(marijuana, crack, cocaine, heroin)
      marijuana crack cocaine heroin
 [1,]         2     2       2      1
 [2,]         2     3       3      1
 [3,]         2     2       2      2
 [4,]         1     1       2      3
 [5,]         3     1       2      3
 [6,]         3     1       3      3
 [7,]         3     1       3      2
 [8,]         1     2       2      2
 [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
> highest.use <- function(x) {y <- which(x==max(x, na.rm=T)); if (length(y)==1) return(y) else return(NA)}
> apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
 [1] NA NA NA  4 NA NA NA NA NA  3 NA  3  1  4  3  1  3  2 NA NA
> impute.primary.drug <- drugs[ifelse(is.na(primary.drug), apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use), as.numeric(primary.drug))]
> data.frame(primary.drug, impute.primary.drug)
   primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>
>

Brian Perron wrote:
> Hello R users,
>
> I am relatively new to R and cannot seem to crack a coding problem. I
> am working with substance abuse data, and I have a variable called
> "primary.drug" which is considered the drug of choice for each
> subject. I have just a few missing values on that variable. Instead
> of using a multiple imputation method like chained equations, I would
> prefer to derive these values from other survey responses.
> Specifically, I have a frequency of use (in days) for each of the major
> drugs, so I would like the missing values to be replaced by that drug
> with the highest level of use. I am starting with the "ifelse" and
> "max" statements, but I know it is wrong:
>
> impute.primary.drug <- ifelse(is.na(primary.drug), max(marijuana,
> crack, cocaine, heroin), primary.drug)
>
> Here are the problems. First, the max statement (should it be "pmax"?),
> returns the highest numeric quantity rather than the variable itself.
> In other words, I want to test which drug has the highest value, but
> return the variable name rather than the observed value. Second, if
> ties are observed, how can I specify the value to be NA? Or, how can I
> specify one of the values to be randomly selected?
>
> Thank in advance for your assistance.
>
> Regards,
> Brian
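For the "randomly selected" variant mentioned in the question, one possible change to highest.use() is sketched below. The function name highest.use.random and the small 'usage' matrix are made up for illustration. Indexing with sample.int() avoids the well-known surprise that sample(y, 1) treats a single number y as 1:y.

```r
# Like highest.use(), but picks one of the tied maxima at random
highest.use.random <- function(x) {
  y <- which(x == max(x, na.rm = TRUE))
  if (length(y) == 1) y else y[sample.int(length(y), 1)]
}

# Hypothetical usage frequencies for three subjects
usage <- cbind(marijuana = c(2, 1, 3), crack = c(2, 1, 1),
               cocaine = c(2, 2, 3), heroin = c(1, 3, 3))

set.seed(1)
picked <- apply(usage, 1, highest.use.random)
colnames(usage)[picked]  # drug names rather than column numbers
```

Only subject 2 has a unique maximum (heroin); the other two picks depend on the random seed.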
Re: [R] how to multiply a constant to a matrix?
I still can't see why this is a problem. If a 1x1 matrix should be treated as a scalar, then it can just be wrapped in drop(), and the arithmetic will be computed correctly by R. Are there any cases where this cannot be done? More specifically, are there any matrix algebra expressions where, depending on the particular dimensions of the variables used, drop() must be used in some cases, and not in other cases?

A related but different behavior is the default dropping of dimensions with extent equal to one by indexing operations. This can be problematic because if one is not careful, incorrect results can be obtained for particular values used in the expression.

For example, consider the following, in which we are trying to compute the cross product of some columns of x with some rows of y. If x has n rows and y has n columns, then the result should always be an nxn matrix. However, if we are not careful with using drop=F in the indexing expressions, we can inadvertently end up with a 1x1 inner product matrix result for the case where we just use one column of x and one row of y. The solution to this is to always use drop=F in indexing in situations where this can occur.

> x <- matrix(1:9, ncol=3)
> y <- matrix(-(1:9), ncol=3)
> i <- 1:2
> x[,i] %*% y[i,]
     [,1] [,2] [,3]
[1,]   -9  -24  -39
[2,]  -12  -33  -54
[3,]  -15  -42  -69
> i <- 1:3
> x[,i] %*% y[i,]
     [,1] [,2] [,3]
[1,]  -30  -66 -102
[2,]  -36  -81 -126
[3,]  -42  -96 -150
> # i has just one element -- the expression without drop=F
> # no longer computes an outer product
> i <- 2
> x[,i] %*% y[i,]
     [,1]
[1,]  -81
> x[,i,drop=F] %*% y[i,,drop=F]
     [,1] [,2] [,3]
[1,]   -8  -20  -32
[2,]  -10  -25  -40
[3,]  -12  -30  -48
>

Cannot all cases in the situations you mention be handled in an analogous manner, by always wrapping appropriate quadratic expressions in drop(), or are there some cases where the result of the quadratic expression must be treated as a matrix, and other cases where the result of the quadratic expression must be treated as a scalar?

-- Tony Plate

Michael wrote:
> imagine when you have complicated matrix algebra computation using R,
>
> you cannot prevent some middle-terms become quadratic and absorbs into one
> scalar, right?
>
> if R cannot intelligently determine this, and you have to manually add
> "drop" everywhere,
>
> do you think it is reasonable?
>
> On 5/23/06, Patrick Burns <[EMAIL PROTECTED]> wrote:
>
>>I think
>>
>>drop(B/D) * solve(A)
>>
>>would be a more transparent approach.
>>
>>It isn't that R can not do what you want, it is that
>>it is saving you from shooting yourself in the foot
>>in your attempt. What you are doing is not really
>>a matrix computation.
>>
>>
>>Patrick Burns
>>[EMAIL PROTECTED]
>>+44 (0)20 8525 0696
>>http://www.burns-stat.com
>>(home of S Poetry and "A Guide for the Unwilling S User")
>>
>>Michael wrote:
>>
>>>This is very strange:
>>>
>>>I want compute the following in R:
>>>
>>>g = B/D * solve(A)
>>>
>>>where B and D are quadratics so they are just a scalar number, e.g. B=t(a)
>>>%*% F %*% a;
>>>
>>>I want to multiply B/D to A^(-1),
>>>
>>>but R just does not allow me to do that and it keeps complaining that
>>>"nonconformable array, etc."
>>>
>>>I tried the following two tricks and they worked:
>>>
>>>as.numeric(B/D) * solve(A)
>>>
>>>diag(as.numeric(B/D), 5, 5) %*% solve (A)
>>>
>>>But if R cannot intelligently do scalar and matrix multiplication, it is
>>>really problemetic.
>>>
>>>It basically cannot be used to do computations, since in complicated matrix
>>>algebras, you have to distinguish where is scalar, and scalars obtained from
>>>quadratics cannot be directly used to multiply another matrix, etc. It is
>>>going to a huge mess...
>>>
>>>Any thoughts?
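To make the drop() suggestion concrete, here is a small sketch of the computation g = B/D * solve(A) from the original question. The particular values of 'a', 'Fm' (standing in for the questioner's F) and 'A' are invented for illustration.

```r
a  <- c(1, 2)
Fm <- diag(c(2, 3))            # stands in for the matrix called F in the question
A  <- matrix(c(2, 0, 0, 4), 2)

B <- t(a) %*% Fm %*% a         # 1x1 matrix, value 14
D <- t(a) %*% a                # 1x1 matrix, value 5

# Elementwise '*' between a 1x1 and a 2x2 matrix is refused by R
failed <- inherits(try(B / D * solve(A), silent = TRUE), "try-error")

# drop() strips the dimensions, leaving a plain scalar
g <- drop(B / D) * solve(A)
print(g)
```

Here drop(B / D) is 2.8, so g is simply 2.8 times the inverse of A.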
Re: [R] Subset dataframe based on condition
Works OK for me:

> x <- data.frame(a=10^(-2:7), b=10^(10:1))
> subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
> subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10
>

Do you get all "numeric" for the following?

> sapply(x, class)
        a         b
"numeric" "numeric"
>

If not, then your data frame is probably encoding the information in some way that you don't want (though if it was as factors, I would have expected a warning from the comparison operator).

You might get more help by distilling your problem to a simple example that can be tried out by others.

-- Tony Plate

Sachin J wrote:
> Hi,
>
> I am trying to extract subset of data from my original data frame
> based on some condition. For example : (mydf -original data frame, submydf
> - subset dada frame)
>
> >submydf = subset(mydf, a > 1 & b <= a),
>
> here column a contains values ranging from 0.01 to 10. I want to
> extract only those matching condition 1 i.e a > . But when i execute
> this command it is not giving me appropriate result. The subset df -
> submydf contains rows with 0.01 also. Please help me to resolve this
> problem.
>
> Thanks in advance.
>
> Sachin
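If sapply(mydf, class) does report something other than "numeric", converting the columns before subsetting is one way forward. The sketch below uses a made-up 'mydf' whose columns were read as text; that this is what happened in the original poster's case is only a guess.

```r
# Hypothetical data frame whose numbers arrived as character strings
mydf <- data.frame(a = c("0.01", "5", "10"),
                   b = c("0.5", "2", "20"),
                   stringsAsFactors = FALSE)
sapply(mydf, class)   # both "character", so comparisons would not be numeric

# as.character() first also makes this safe if the columns are factors
mydf[] <- lapply(mydf, function(col) as.numeric(as.character(col)))

submydf <- subset(mydf, a > 1 & b <= a)
print(submydf)
```

After conversion only the row with a = 5, b = 2 satisfies both conditions.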
Re: [R] for loop should check the looping index !!
Yep, you missed the fact that 2:1 generates the sequence c(2,1).

Personally, I'd excuse you for missing this, as the documentation for seq says:

     The operator ':' and the 'seq(from, to)' form generate the
     sequence 'from, from+1, ..., to'.

Maybe I'm missing something, but I don't see anywhere on the help page for seq and ":" any mention of the fact that seq() generates a descending sequence if 'to' is less than 'from'.

In programming, *never* use a construct like 1:length(x) or 2:length(x); always use something like seq(1,len=length(x)) (or simply seq(len=length(x))), or seq(2, len=length(x)-1), or seq(along=x)[-1].

-- Tony Plate

johan Faux wrote:
> Hello ,
>
> a<-c(1)
> for(i in 2:length(a))
> do.something with a[[i]]
>
> I get :
> Error in a[[i]] : subscript out of bounds
>
> Am I missing something here? Doesnt R check the value of i inside "for"
> and if the condition is not tru, dont do anything
>
> thanks,
> johan
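A short sketch of the difference, using the one-element vector from the question. seq_along() is the current spelling of seq(along=x); the 'visited' vector is only there to record which indices the loop body sees.

```r
a <- c(1)

bad <- 2:length(a)          # c(2, 1): the loop body would run with i = 2, then i = 1

visited <- integer(0)
for (i in seq_along(a)[-1]) {   # integer(0) here, so the body never runs
  visited <- c(visited, i)
}
length(visited)

# With a longer vector the same construct starts at the second element
b <- c(10, 20, 30)
idx <- seq_along(b)[-1]     # 2 3
```

The loop over an empty index vector is simply skipped, which is the behaviour the original poster expected from 2:length(a).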
Re: [R] Convert matrix to data.frame
When I try converting a matrix to a data frame, it works for me:

> x <- matrix(1:6,ncol=2,dimnames=list(LETTERS[1:3],letters[24:25]))
> data.frame(x)
  x y
A 1 4
B 2 5
C 3 6
> str(data.frame(x))
`data.frame':   3 obs. of  2 variables:
 $ x: int  1 2 3
 $ y: int  4 5 6
>

You can also use as.data.frame() to convert a matrix to a data.frame (but note that if colnames are missing from the matrix, as.data.frame() constructs different colnames than does data.frame()).

You say "it didn't work" -- it's difficult to help with such a non-specific complaint. Can you explain exactly how it didn't work for you? (e.g., show the exact error message).

-- Tony Plate

Chia, Yen Lin wrote:
> Hi all,
>
> I wonder how could I convert a matrix A to a dataframe such that
> whenever I'm running a linear model such lme, I can use A$x1? I tried
> data.frame(A), it didn't work. Should I initialize A not as a matrix?
> Thanks.
>
> Yen Lin
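The remark about column names can be seen directly with a matrix that has no colnames. In the sketch below, 'm' and 'A' are made-up matrices, and the column names x1 and x2 are chosen to match the A$x1 usage in the question.

```r
m <- matrix(1:6, ncol = 2)        # no colnames
n1 <- names(data.frame(m))        # "X1" "X2"
n2 <- names(as.data.frame(m))     # "V1" "V2"

# Naming the columns up front makes A$x1-style access predictable
A <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("x1", "x2")))
dfA <- data.frame(A)
dfA$x1
```

Note that `$` works on the data frame but not on the matrix itself, which may be what "didn't work" in the question.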
Re: [R] Correct way to test for exact dimensions of matrix or array
There's a gotcha in using identical() to compare dimensions -- it also compares names, e.g.:

> x <- array(1:14, dim=c(rows=3,cols=5))
> dim(x)
rows cols
   3    5
> identical(dim(x)+0, c(3,5))
[1] FALSE
> identical(as.numeric(dim(x)+0), c(3,5))
[1] TRUE
>

Gabor Grothendieck wrote:
> If its just succint you are after then this is slightly
> shorter:
>
>    identical(dim(x)+0, c(3,5))
>
> On 1/10/06, Gregory Jefferis <[EMAIL PROTECTED]> wrote:
>
>>Thanks for suggestions. This is a simple question in principle, but there
>>seem to be some wrinkles - I am always having to think quite carefully about
>>how to test for equality in R. I should also have said that I would like
>>the check to be efficient as well safe and succinct.
>>
>>One suggestion was:
>>
>>       isTRUE(all.equal(dim(obj), c(3, 5)))
>>
>>But that is not so efficient because all.equal does lots of work esp if it
>>the objects are not equal.
>>
>>Another suggestion was:
>>
>>       all( dim( obj) == c(3,5) )
>>
>>But that is not safe eg because dim(vector(10)) is NULL and
>>all(NULL==c(3,5)) is actually TRUE (to my initial surprise) so vectors would
>>pass through the net.
>>
>>So, so far the only way that is efficient, safe and succinct is:
>>
>>       identical( dim( obj) , as.integer(c(3,5)))
>>
>>Martin Maechler pointed out that at the beginning of a function you might
>>want to break down the test into something less succinct, that printed more
>>specific error messages - a good suggestion for a top level function that is
>>supposed to be user friendly.
>>
>>Any other suggestions? Many thanks,
>>
>>Greg Jefferis.
>>
>>On 10/1/06 15:13, "Martin Maechler" <[EMAIL PROTECTED]> wrote:
>>
>>> "Gregory" == Gregory Jefferis <[EMAIL PROTECTED]> on Tue, 10 Jan 2006 14:47:43 + writes:
>>>
>>>Gregory> Dear R Users,
>>>
>>>Gregory> I want to test the dimensions of an incoming
>>>Gregory> vector, matrix or array safely
>>>
>>>Gregory> and succinctly. Specifically I want to check if
>>>Gregory> the unknown object has exactly 2 dimensions with a
>>>Gregory> specified number of rows and columns.
>>>
>>>Gregory> I thought that the following would work:
>>>
>>>    >obj=matrix(1,nrow=3,ncol=5)
>>>    >identical( dim( obj) , c(3,5) )
>>>
>>>Gregory> [1] FALSE
>>>
>>>Gregory> But it doesn't because c(3,5) is numeric and the dims are
>>>integer. I
>>>Gregory> therefore ended up doing something like:
>>>
>>>    >identical( dim( obj) , as.integer(c(3,5)))
>>>
>>>Gregory> OR
>>>
>>>    >isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>the last one is almost perfect if you leave a way the superfluous
>>>isTRUE(..).
>>>
>>>But, you say that it's part of your function checking it's
>>>arguments.
>>>In that case, I'd recommend
>>>
>>>    if(length(d <- dim(obj)) != 2)
>>>        stop("'d' must be matrix-like")
>>>    if(!all(d == c(3,5)))
>>>        stop("the matrix must be 3 x 5")
>>>
>>>which also provides for nice error messages in case of error.
>>>A more concise form with less nice error messages is
>>>
>>>    stopifnot(length(d <- dim(obj)) == 2,
>>>              d == c(3,50))
>>>
>>>    ## you can leave away all(.) for things in stopifnot(.)
>>>
>>>Gregory> Neither of which feel quite right. Is there a 'correct' way to
>>>do this?
>>>
>>>Gregory> Many thanks,
>>>
>>>You're welcome,
>>>Martin Maechler, ETH Zurich
>>>
>>>Gregory> Greg Jefferis.
>>>
>>>Gregory> PS Thinking about it, the second form is (doubly) wrong because:
>>>
>>>    >obj=array(1,dim=c(3,5,3,5))
>>>    >isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>Gregory> [1] TRUE
>>>
>>>Gregory> OR
>>>    >obj=numeric(10)
>>>    >isTRUE(all( dim( obj) == c(3,5) ))
>>>
>>>Gregory> [1] TRUE
>>>
>>>Gregory> (neither of which are equalities that I am happy with!)
>>>
>>
>>--
>>Gregory Jefferis, PhD                      and:
>>Research Fellow
>>Department of Zoology                      St John's College
>>University of Cambridge                    Cambridge
>>Downing Street                             CB2 1TP
>>Cambridge, CB2 3EJ
>>United Kingdom
>>
>>Tel: +44 (0)1223 336683                    +44 (0)1223 339899
>>Fax: +44 (0)1223 336676                    +44 (0)1223 337720
>>
>>[EMAIL PROTECTED]
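Pulling the points of this thread together, a check that is indifferent to names on dim(), to integer versus double, and to NULL dims could be sketched as follows. The helper name has_dims is made up here; it is not a base R function.

```r
# TRUE only if obj has exactly the expected dimensions
has_dims <- function(obj, expected) {
  d <- dim(obj)
  !is.null(d) &&                     # plain vectors have NULL dims
    length(d) == length(expected) && # rules out recycling, e.g. c(3,5,3,5)
    all(d == expected)               # '==' ignores names and integer/double
}

r1 <- has_dims(matrix(1, nrow = 3, ncol = 5), c(3, 5))
r2 <- has_dims(array(1, dim = c(3, 5, 3, 5)), c(3, 5))
r3 <- has_dims(numeric(10), c(3, 5))
r4 <- has_dims(array(1:15, dim = c(rows = 3, cols = 5)), c(3, 5))
c(r1, r2, r3, r4)
```

The two cases from the PS in the quoted message (a 4-d array and a plain vector) are both rejected, while the matrix with named dims is accepted.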
Re: [R] Wikis etc.
I second Frank's comment!

I wonder if questioners who receive a bunch of useful replies could be encouraged to enter a summary of those on a Wiki, in much the same way as users of S-news were expected to post a summary of their answers as a way of giving something back.

An existing R Wiki is located at http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome

However, there's currently not much on it. Recently on R-help there was a summary of using databases with R, which looked very useful, so I put that on the Wiki. Maybe if others just start putting things there it can gather momentum?

-- Tony Plate

Frank E Harrell Jr wrote:
> I feel that as long as people continue to provide help on r-help wikis
> will not be successful. I think we need to move to a central wiki or
> discussion board and to move away from e-mail. People are extremely
> helpful but e-mail seems to be to always be memory-less and messages get
> too long without factorization of old text. R-help is now too active
> and too many new users are asking questions asked dozens of times for
> e-mail to be effective.
>
> The wiki also needs to collect and organize example code, especially for
> data manipulation. I think that new users would profit immensely from a
> compendium of examples.
>
> Just my .02 Euros
>
> Frank
[R] update to posting guide: use 'sessionInfo()' instead of 'version'
Some changes have been made to the posting guide, based on suggestions from various R-help contributors over the past year.

The most significant change is the recommendation to use 'sessionInfo()' rather than 'version' when asking questions about unexpected behavior or bugs. This change was made because 'sessionInfo()' reports the version and a list of packages currently attached. As more and more packages become available, it becomes more likely that unexpected behavior is due to conflicts between packages, so this is relevant information.

[Note that sessionInfo() currently does not report all the information that 'version' does (it omits at least "Status" and "svn rev"). R-core members are aware of this -- whether or not they change this is up to them.]

-- Tony Plate
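In practice, a posting might paste the output of the first call below. The two extra lines are only a sketch of how the fields mentioned in the note can be read from R.version if they happen to matter for a particular report.

```r
info <- sessionInfo()    # R version, platform, attached packages
print(info)

# Fields that, per the note above, sessionInfo() may not show
R.version$status         # empty string for a released version
R.version[["svn rev"]]   # revision of the R sources
```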
Re: [R] R and databases - a comment
This is very useful, thanks for posting! I created a page for this at the R Wiki: http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?DataBases

If anyone has any info to add, go at it!

-- Tony Plate

charles loboz wrote:
> 1. That was a part of a private email exchange. It has been suggested
> that more people may be interested.
>
> 2. I did use various databases (significant part of my job) for the last
> 15 years. Some with R for the last 3 years as a hobby. Some comments on
> the ones used below. Sorry, no links - I am time-constrained at the
> moment - please google if interested in details. The remarks are from the
> point of view of R user, not that of 'general database user'.
>
> 3. SQLITE. www.sqlite.org - probably the best database to use with R. No
> setup, no administration, embedded - so less connection overhead. All
> data in one file - so easy to transfer. Solid. Very functional SQL, fast
> if you play it right (almost as fast as SQLServer on Windows...). Some
> limitations - no stored procedures. Some preprocessing/parsing can be
> done using TCL - well integrated with sqlite if you need that. Due to an
> implementation quirk you can even compute recursive functions (like
> exponential moving average or Fibonacci numbers) with SQL :-). Easy
> import/export of data to text files. After trying a few other dbs I
> settled down on this one. Even considered writing a tutorial on SQLite
> use with R (like how to process gigabytes of data on a 128mb computer
> :-) ) - but time constraints stopped me. [Personally I think that SQLite
> should come bundled with the standard R installation. Could even be used
> to keep a lot of R's internal stuff, would probably simplify overall
> coding. But that is for others to decide]
>
> All other databases (including mysql) require typical setup -
> installation, administration, user rights, keeping track of ports,
> services/daemons, directories, backups etc - so some db administrative
> skills are required. I am not sure how many R users are willing to go
> through that. The ones who are may be interested in the stuff below.
>
> 4. www.postgres.org Postgres. Free. As complete as one can wish, small
> download, great functionality. Interfaces well to other languages, so you
> can do numerics in C++ and store that in the database (though why not do
> numerics in R?). Current version 8.1, much improved.
>
> 5. Firebird. Open source version of Interbase. Easy setup and can have
> all data in one file. But... slow development - not many developers
> there. SQL full but somewhat quirky (when porting from other dialects).
>
> 6. Mysql. The inheritance from the original ISAM system still shows. Nice
> user interface, but... if you need real db why not use postgres? If you
> need something simpler, without administration, why not use SQLITE? No
> doubt mysql is fine for many simple websites etc - this is mysql's niche.
>
> 7. derby and hsqldb. Both are written in Java, open source. HSQLDB (used
> now by OpenOffice) allows creation of in-memory tables and it's fast
> there - but its usage from inside R is tricky - there is no easily
> available, installable and current ODBC driver. Similar for derby - the
> ODBC driver is there, but installation can be tricky to
> non-professionals. Maybe in the future...
>
> There are three 'express' versions of commercial databases. They all
> share some restrictions, like max disc data size 2-4gb, max mem size
> 1-2gb and usage of single processor only. Plus various licensing
> restrictions, so be careful how you use them.
>
> - Microsoft - in beta now, over 100mb download (windows only) (the old
>   version, MSDE, is also available)
> - Oracle - 150mb download, if I remember correctly even free to
>   distribute, but check the license
> - DB2 - 500mb download, currently 90 day version, IBM strong rumour is
>   that early next year the new version will be free.
>
> Each commercial DB has some OLAP capability, but I am not sure how much
> of it is/will be available in the Express version.
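To make the "no setup, no administration" point about SQLite concrete, here is a minimal sketch using the DBI and RSQLite packages. These packages are an assumption of this example (the thread does not name an interface), and the code is guarded so it does nothing if they are not installed.

```r
# Sketch only: assumes the CRAN packages DBI and RSQLite are installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

    # An in-memory database; give a file name instead to get the
    # single-file database described in the message above.
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

    # Copy a built-in data frame into a table and aggregate it with SQL.
    DBI::dbWriteTable(con, "mtcars", mtcars)
    res <- DBI::dbGetQuery(con,
        "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl ORDER BY cyl")
    print(res)

    DBI::dbDisconnect(con)
}
```

There is no server to install or configure: the connection call is the whole setup.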
Re: [R] Still a bug with NA in sd() or var()?
Roger Dungan wrote:
> [snip]
> There are obvious work-rounds, like
>
> > sd(x, is.na(x)==F)
>
> which gives the result (with error message)
> [1] 1.707825
> Warning message:
> the condition has length > 1 and only the first element will be used in:
> if (na.rm) "complete.obs" else "all.obs"

What you are doing here looks very odd to me -- you are passing a vector of logicals as the value for the argument na.rm. This is odd because na.rm should be just a single logical value, not a vector of the same length as x (hence the warning message). Only the first element of that vector is used, so you are passing essentially a random value. By luck, in your example, the first element was T, which is why you got a value of 1.707825 as the result, and not NA.

The rest might fall into place when this understanding is cleared up.

-- Tony Plate
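A small sketch of the intended usage, with made-up data (the vector 'x' below is not the poster's data): 'na.rm' takes a single TRUE or FALSE. Note that recent versions of R turn a length > 1 condition in if() into an error rather than a warning, so the work-round quoted above may now fail outright.

```r
x <- c(1, 2, NA, 4, 5)     # illustrative data containing one NA

sd(x)                      # NA: missing values propagate by default
s1 <- sd(x, na.rm = TRUE)  # a single logical: drop NAs before computing
s2 <- sd(x[!is.na(x)])     # equivalent: subset the vector explicitly

all.equal(s1, s2)
```

The same 'na.rm = TRUE' argument works for var(), mean() and most other summary functions.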
Re: [R] unvectorized option for outer()
Apologies for the cross post. I explicitly tried to avoid this but somehow r-help got tacked onto the end of the To: line without my realizing it.

-- Tony Plate

Tony Plate wrote:
> [following on from a thread on R-help, but my post here seems more
> appropriate to R-devel]
> ...
[R] unvectorized option for outer()
[following on from a thread on R-help, but my post here seems more appropriate to R-devel]

Would a patch to make outer() work with non-vectorized functions be considered? It seems to come up moderately often on the list, which probably indicates that many many people get bitten by the same incorrect expectation, despite the documentation and the FAQ entry. It looks pretty simple to modify outer() appropriately: one extra function argument and an if-then-else clause to call mapply(FUN, ...) instead of calling FUN directly.

Here's a function demonstrating this:

outer2 <- function (X, Y, FUN = "*", ..., VECTORIZED=TRUE)
{
    no.nx <- is.null(nx <- dimnames(X <- as.array(X)))
    dX <- dim(X)
    no.ny <- is.null(ny <- dimnames(Y <- as.array(Y)))
    dY <- dim(Y)
    if (is.character(FUN) && FUN == "*") {
        robj <- as.vector(X) %*% t(as.vector(Y))
        dim(robj) <- c(dX, dY)
    }
    else {
        FUN <- match.fun(FUN)
        Y <- rep(Y, rep.int(length(X), length(Y)))
        if (length(X) > 0)
            X <- rep(X, times = ceiling(length(Y)/length(X)))
        if (VECTORIZED)
            robj <- FUN(X, Y, ...)
        else
            robj <- mapply(FUN, X, Y, MoreArgs=list(...))
        dim(robj) <- c(dX, dY)
    }
    if (no.nx)
        nx <- vector("list", length(dX))
    else if (no.ny)
        ny <- vector("list", length(dY))
    if (!(no.nx && no.ny))
        dimnames(robj) <- c(nx, ny)
    robj
}

# Some examples
f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p}
outer2(1:2, 3:5, f, 2)
outer2(numeric(0), 3:5, f, 2)
outer2(1:2, numeric(0), f, 2)
outer2(1:2, 3:5, f, 2, VECTORIZED=F)
outer2(numeric(0), 3:5, f, 2, VECTORIZED=F)
outer2(1:2, numeric(0), f, 2, VECTORIZED=F)

# Output on examples
> f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p}
> outer2(1:2, 3:5, f, 2)
in f
     [,1] [,2] [,3]
[1,]    9   16   25
[2,]   36   64  100
> outer2(numeric(0), 3:5, f, 2)
in f
     [,1] [,2] [,3]
> outer2(1:2, numeric(0), f, 2)
in f

[1,]
[2,]
> outer2(1:2, 3:5, f, 2, VECTORIZED=F)
in f
in f
in f
in f
in f
in f
     [,1] [,2] [,3]
[1,]    9   16   25
[2,]   36   64  100
> outer2(numeric(0), 3:5, f, 2, VECTORIZED=F)
     [,1] [,2] [,3]
> outer2(1:2, numeric(0), f, 2, VECTORIZED=F)

[1,]
[2,]
>

If a patch to add this feature would be considered, I'd be happy to submit one (including documentation). If so, and if there are any potential traps I should bear in mind, please let me know!

-- Tony Plate

Rau, Roland wrote:
> Dear all,
>
> a big thanks to Thomas Lumley, James Holtman and Tony Plate for their
> answers. They all pointed in the same direction => I need a vectorized
> function to be applied. Hence, I will try to work with a 'wrapper'
> function as described in the FAQ.
>
> Thanks again,
> Roland
>
>> -Original Message-
>> From: Thomas Lumley [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, October 27, 2005 11:39 PM
>> To: Rau, Roland
>> Cc: r-help@stat.math.ethz.ch
>> Subject: Re: [R] outer-question
>>
>> You want FAQ 7.17 Why does outer() behave strangely with my function?
>>
>> -thomas
>>
>> On Thu, 27 Oct 2005, Rau, Roland wrote:
>>
>>> Dear all,
>>>
>>> This is a rather lengthy message, but I don't know what I made wrong in
>>> my real example since the simple code works.
Re: [R] outer-question
It looks like you didn't vectorize the function you gave "outer" in your longer example. Consider your short example with a diagnostic printout:

> a <- 1:3
> b <- 1:4
> f <- function(a,b,d) {
+     cat("In f:", length(a), length(b), "\n")
+     return(a*b+(sum(d)))
+ }
> additional <- runif(100)
> outer(X=a, Y=b, FUN=f, d=additional)
In f: 12 12
         [,1]     [,2]     [,3]     [,4]
[1,] 53.61985 54.61985 55.61985 56.61985
[2,] 54.61985 56.61985 58.61985 60.61985
[3,] 55.61985 58.61985 61.61985 64.61985
>

Note that "f" is called only once, with vectors of length 12 for both "a" and "b". outer() expects FUN to return a vector of that same length. Your loggomp() collapses everything with sum() and returns a single number, which is why you get the "dims [product 900] do not match the length of object [1]" error.

-- Tony Plate

Rau, Roland wrote:
> Dear all,
>
> This is a rather lengthy message, but I don't know what I made wrong in
> my real example since the simple code works.
> I have two variables a, b and a function f for which I would like to
> calculate all possible combinations of the values of a and b.
> If f is multiplication, I would simply do:
>
> a <- 1:5
> b <- 1:5
> outer(a,b)
>
> ## A bit more complicated is this:
> f <- function(a,b,d) {
>   return(a*b+(sum(d)))
> }
> additional <- runif(100)
> outer(X=a, Y=b, FUN=f, d=additional)
>
> ## So far so good. But now my real example. I would like to plot the
> ## log-likelihood surface for two parameters alpha and beta of
> ## a Gompertz distribution with given data
>
> ### I have a function to generate random-numbers from a
> ### Gompertz-Distribution (using the 'inversion method')
>
> random.gomp <- function(n, alpha, beta) {
>   return( (log(1-(beta/alpha*log(1-runif(n)))))/beta)
> }
>
> ## Now I generate some 'lifetimes'
> no.people <- 1000
> al <- 0.1
> bet <- 0.1
> lifetimes <- random.gomp(n=no.people, alpha=al, beta=bet)
>
> ### Since I neither have censoring nor truncation in this simple case,
> ### the log-likelihood should be simply the sum of the log of the
> ### the densities (following the parametrization of Klein/Moeschberger
> ### Survival Analysis, p. 38)
>
> loggomp <- function(alphas, betas, timep) {
>   return(sum(log(alphas) + betas*timep + (alphas/betas *
>     (1-exp(betas*timep)))))
> }
>
> ### Now I thought I could obtain a matrix of the log-likelihood surface
> ### by specifying possible values for alpha and beta with the given data.
> ### I was able to produce this matrix with two for-loops. But I thought
> ### I could use also 'outer' in this case.
> ### This is what I tried:
>
> possible.alphas <- seq(from=0.05, to=0.15, length=30)
> possible.betas <- seq(from=0.05, to=0.15, length=30)
>
> outer(X=possible.alphas, Y=possible.betas, FUN=loggomp, timep=lifetimes)
>
> ### But the result is:
>
> > outer(X=possible.alphas, Y=possible.betas, FUN=loggomp, timep=lifetimes)
> Error in outer(X = possible.alphas, Y = possible.betas, FUN = loggomp, :
>   dim<- : dims [product 900] do not match the length of object [1]
> In addition: Warning messages:
> ...
>
> ### Can somebody give me some hint where the problem is?
> ### I checked my definition of 'loggomp' but I thought this looks fine:
> loggomp(alphas=possible.alphas[1], betas=possible.betas[1], timep=lifetimes)
> loggomp(alphas=possible.alphas[4], betas=possible.betas[10], timep=lifetimes)
> loggomp(alphas=possible.alphas[3], betas=possible.betas[11], timep=lifetimes)
>
> ### I'd appreciate any kind of advice.
> ### Thanks a lot in advance.
> ### Roland
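One way to build the wrapper that FAQ 7.17 describes is Vectorize(), which wraps mapply() around a scalar function. The sketch below is an illustration rather than the poster's final code: the stand-in data from rexp(), the small 4 x 3 grid and the name 'loggomp.vec' are assumptions made for this example.

```r
# Scalar log-likelihood: one alpha, one beta, a whole vector of times.
loggomp <- function(alpha, beta, timep) {
    sum(log(alpha) + beta * timep + (alpha / beta) * (1 - exp(beta * timep)))
}

# Vectorize over alpha and beta only; 'timep' is passed through unchanged.
loggomp.vec <- Vectorize(loggomp, vectorize.args = c("alpha", "beta"))

set.seed(1)
lifetimes <- rexp(50)                        # stand-in data for illustration
alphas <- seq(0.05, 0.15, length.out = 4)
betas  <- seq(0.05, 0.15, length.out = 3)

# outer() now receives one value per (alpha, beta) pair, as it expects.
surface <- outer(alphas, betas, loggomp.vec, timep = lifetimes)
dim(surface)
```

Each cell of 'surface' equals the scalar function evaluated at the matching pair, so it can be passed to contour() or persp() directly.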
Re: [R] R on a supercomputer
In general, R is not written in such a way that data remain in cache. However, R can use optimized BLAS libraries, and these are written that way. So if your version of R is compiled to use an optimized BLAS library appropriate to the machine (e.g., ATLAS, or Prof. Goto's BLAS), AND a considerable amount of the computation done in your R program involves basic linear algebra (matrix multiplication, etc.), then you might see a good speedup.

-- Tony Plate

Kimpel, Mark William wrote:
> I am using R with Bioconductor to perform analyses on large datasets
> using bootstrap methods. In an attempt to speed up my work, I have
> inquired about using our local supercomputer and asked the administrator
> if he thought R would run faster on our parallel network. I received the
> following reply:
>
> "The second benefit is that the processors have large caches.
>
> Briefly, everything is loaded into cache before going into the
> processor. With large caches, there is less movement of data between
> memory and cache, and this can save quite a bit of time. Indeed, when
> programmers optimize code they usually think about how to do things to
> keep data in cache as long as possible.
>
> Whether you would receive any benefit from larger cache depends on how
> R is written. If it's written such that data remain in cache, the
> speed-up could be considerable, but I have no way to predict it."
>
> My question is, "is R written such that data remain in cache?"
>
> Thanks,
>
> Mark W. Kimpel MD
> Indiana University School of Medicine
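As a rough way to see whether the BLAS point in the reply above matters for a given workload, one can time a plain matrix product, which R hands to whatever BLAS it was linked against. This is a sketch with arbitrary sizes; timings depend entirely on the machine and build, and the extSoftVersion() lookup is guarded because older R versions do not report a BLAS entry.

```r
set.seed(1)
n <- 300                                  # arbitrary size for illustration
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)

# Matrix multiplication and crossprod() are delegated to the BLAS.
t.mult  <- system.time(C <- A %*% B)
t.cross <- system.time(AtA <- crossprod(A))
print(rbind(mult = t.mult, cross = t.cross))

# Newer R versions report which BLAS library is in use.
esv <- extSoftVersion()
if ("BLAS" %in% names(esv)) print(esv[["BLAS"]])
```

If most of a bootstrap loop is spent outside operations like these, swapping the BLAS is unlikely to change much.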
Re: [R] Assign references
Looking at what objects exist after the call to myFunk() should give you a clue as to what happened:

> remove(list=objects())
> myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
> x<-0; y<-0;
> myFunk(4,5,x,y)
> x
[1] 0
> y
[1] 0
> objects()
[1] "bar"    "foo"    "myFunk" "x"      "y"
> bar
[1] 20
> foo
[1] 9
>

The assignments created new global variables named "foo" and "bar"; "x" and "y" were never touched.

I suspect that you might have slightly misinterpreted Thomas Lumley's explanations of how the <<- operator works in different situations (the LHS must exist if you are assigning using a replacement operator, e.g., as in "foo[1] <<- ...", but not when you are assigning the whole object as in "foo <<- ...").

But I really would suggest careful consideration of what might be the best way to approach your problem -- modifying global data from within a function is not the standard way of using R. Unless you are very careful about how you do it, it is likely to cause headaches for yourself and/or others down the road (because R is just not intended to be used that way). The standard way of doing this sort of thing in R is to modify a local copy of the dataframe and return that, or if you have to return several dataframes, then return a list of dataframes.

-- Tony Plate

[EMAIL PROTECTED] wrote:
> Folks,
>
> I've run into trouble while writing functions that I hope will create
> and modify a dataframe or two. To that end I've written a toy function
> that simply sets a couple of variables (well, tries but fails).
> Searching the archives, Thomas Lumley recently explained the <<-
> operator, showing that it was necessary for x and y to exist prior to
> the function call, but I haven't the faintest why this isn't working:
>
> > myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
> > x<-0; y<-0;
> > myFunk(4,5,x,y)
> > x<-0; y<-0;
> > myFunk(4,5,x,y)
> > x
> [1] 0
> > y
> [1] 0
>
> What (no doubt simple) reason is there for x and y not changing?
>
> Thank you,
> cur
> --
> Curt Seeliger, Data Ranger
> CSC, EPA/WED contractor
> 541/754-4638
> [EMAIL PROTECTED]
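The "return a list" approach recommended in the reply above might look like the following sketch. The component names 'sum' and 'prod' and the result variable 'res' are made up for this example.

```r
# Return both results in a list instead of assigning with <<-.
myFunk <- function(a, b) {
    list(sum = a + b, prod = a * b)
}

res <- myFunk(4, 5)

# The caller decides which variables receive the results.
x <- res$sum
y <- res$prod
x
y
```

The same pattern extends to data frames: modify a local copy inside the function and return it, or return list(df1 = ..., df2 = ...) when there are several.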
Re: [R] books about MCMC to use MCMC R packages?
I've found "Bayesian Data Analysis" by Gelman, Carlin, Stern & Rubin (2nd ed) to be quite useful for understanding how MCMC can be used for Bayesian models. It has a little bit of R code in it too.

-- Tony Plate

Molins, Jordi wrote:
> Dear list users,
>
> I need to learn about MCMC methods, and since there are several packages
> in R that deal with this subject, I want to use them.
>
> I want to buy a book (or more than one, if necessary) that satisfies the
> following requirements:
>
> - it teaches well MCMC methods;
>
> - it is easy to implement numerically the ideas of the book, and notation
>   and concepts are similar to the corresponding R packages that deal with
>   MCMC methods.
>
> I have done a search and 2 books seem to satisfy my requirements:
>
> - Markov Chain Monte Carlo In Practice, by W.R. Gilks and others.
>
> - Monte Carlo Statistical methods, Robert and Casella.
>
> What do people think about these books? Is there a suggestion of some
> other book that could satisfy better my requirements?
>
> Thank you very much in advance.
Re: [R] Regular expressions & sub
> x <- scan("clipboard", what="")
Read 7 items
> x
[1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
> gsub("[0-9]*\\.", "", x)
[1] "11" "11" "11" "31" "2"  "3"  "8"
>

Bernd Weiss wrote:
> Dear all,
>
> I am struggling with the use of regular expression. I got
>
> > as.character(test$sample.id)
> [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
>
> and need
>
> [1] "11" "11" "11" "31" "2" "3" "8"
>
> I.e. remove everything before the "." .
>
> TIA,
>
> Bernd
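A slightly stricter variant of the pattern used in the reply above, offered as a sketch: anchoring at the start of the string and using sub() removes only the text up to and including the first ".", whatever characters precede it. The variable name 'after.dot' is made up for this example.

```r
x <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

# "^[^.]*\\." : from the start, any run of non-dot characters, then a dot.
after.dot <- sub("^[^.]*\\.", "", x)
after.dot

# The two patterns differ when there is more than one dot:
sub("^[^.]*\\.", "", "1.2.3")    # drops only the first field
gsub("[0-9]*\\.", "", "1.2.3")   # drops every "digits." group
```

For the sample ids in this thread both patterns give the same result.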
Re: [R] queer data set
Here's one way of working with the data you gave:

> x <- read.table(file("clipboard"), fill=T, header=T)
> x
  HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
> apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"

> do.call("rbind", apply(x, 1, function(x) {
+    y <- strsplit(x[4], ";")[[1]]
+    x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=T)
+    return(cbind(x3, if (length(y)) y else "NA"))
+ }))
      [,1] [,2] [,3] [,4]
 [1,] "A1" "B1" "C1" "X11"
 [2,] "A1" "B1" "C1" "X12"
 [3,] "A1" "B1" "C1" "X13"
 [4,] "A2" "B2" "C2" "X21"
 [5,] "A2" "B2" "C2" "X22"
 [6,] "A2" "B2" "C2" "X23"
 [7,] "A2" "B2" "C2" "X24"
 [8,] "A2" "B2" "C2" "X25"
 [9,] "A3" "B3" "C3" "NA"
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"
>

This of course is a matrix; you can convert it back to a dataframe using as.data.frame() if you desire. Use either "NA" (with quotes) or NA (without quotes) to control whether you get just the string "NA" or an actual character NA value in column 4.

If you're processing a huge amount of data, you can probably do better by rewriting the above code to avoid implicit coercions of data types.

hope this helps,

Tony Plate

S.O. Nyangoma wrote:
> I have a dataset that is basically structureless. Its dimension varies
> from row to row and sep(s) are a mixture of tab and semi colon (;) and
> example is
>
> HEADER1 HEADER2 HEADER3 HEADER3
> A1      B1      C1      X11;X12;X13
> A2      B2      C2      X21;X22;X23;X24;X25
> A3      B3      C3
> A4      B4      C4      X41;X42;X43
> A5      B5      C5      X51
>
> etc., say. Note that a blank under HEADER3 corresponds to non
> occurance and all semi colon (;) delimited variables are under
> HEADER3. These values run into tens of thousands. I want to give some
> order to this queer matrix to something like:
>
> HEADER1 HEADER2 HEADER3 HEADER3
> A1      B1      C1      X11
> A1      B1      C1      X12
> A1      B1      C1      X13
> A1      B1      C1      X14
> A2      B2      C2      X21
> A2      B2      C2      X22
> A2      B2      C2      X23
> A2      B2      C2      X24
> A2      B2      C2      X25
> A2      B2      C2      X26
> A3      B3      C3      NA
> A4      B4      C4      X41
> A4      B4      C4      X42
> A4      B4      C4      X43
>
> Is there a brilliant R-way of doing such task?
>
> Goodday. Stephen.
>
> - Original Message -
> From: Prof Brian Ripley <[EMAIL PROTECTED]>
> Date: Monday, August 15, 2005 11:13 pm
> Subject: Re: [R] How to get a list work in RData file
>
>> On Mon, 15 Aug 2005, Xiyan Lon wrote:
>>
>>> Dear R-Helper,
>>
>> (There are quite a few of us.)
>>
>>> I want to know how I get a list work which I saved in RData file. For
>>> example,
>>
>> I don't understand that at all, but it looks as if you want to save an
>> unevaluated call, in which case see ?quote and use something like
>>
>> xyadd <- quote(test.xy(x=2, y=3))
>>
>> load and saving has nothing to do with this: it doesn't change the
>> meaning of objects in the workspace.
>>
>>> > test.xy <- function(x,y) {
>>> +    xy <- x+y
>>> +    xy
>>> + }
>>> > xyadd <- test.xy(x=2, y=3)
>>> > xyadd
>>> [1] 5
>>> > x1 <- c(2,43,60,8)
>>> > y1 <- c(91,7,5,30)
>>> >
>>> > xyad
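The remark in the reply above about avoiding implicit coercions could be followed up along these lines. This is a sketch under assumptions: the toy data frame, the column name 'HEADER4' and the result name 'long' are invented for the example, and lengths() needs a reasonably recent version of R.

```r
# Toy input: three id columns plus one ";"-delimited column (may be empty).
x <- data.frame(HEADER1 = c("A1", "A2", "A3"),
                HEADER2 = c("B1", "B2", "B3"),
                HEADER3 = c("C1", "C2", "C3"),
                HEADER4 = c("X11;X12;X13", "X21;X22", ""),
                stringsAsFactors = FALSE)

# Split once for the whole column; empty cells become a single NA.
parts <- strsplit(x$HEADER4, ";", fixed = TRUE)
parts[lengths(parts) == 0] <- list(NA_character_)

# Repeat each input row once per piece, then attach the pieces.
n <- lengths(parts)
long <- data.frame(x[rep(seq_len(nrow(x)), n), 1:3],
                   HEADER4 = unlist(parts),
                   row.names = NULL, stringsAsFactors = FALSE)
long
```

Because nothing passes through apply(), the id columns keep their original types, and the empty cell becomes a real NA rather than the string "NA".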
Re: [R] Why only a "" string for heading for row.names with write.csv with a matrix?
Here's a relatively easy way to get what I think you want. Note that converting x to a data frame before cbind'ing allows the type of the elements of x to be preserved:

> x <- matrix(1:6, 2,3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")
> x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
> write.table(cbind(id=row.names(x), as.data.frame(x)), row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6
>

As to why you can't get this via an argument to write.table (or write.csv), I suspect that part of the answer is a wish to avoid "creeping featuritis". Transferring data between programs is notoriously infuriating. There are more data formats than there are programs, but few programs use the same format as their default & preferred format. So to accommodate everyone's preferred format would require an extremely large number of features in the data import/export functions. Maintaining software that contains a large number of features is difficult -- it's easy for errors to creep in because there are so many combinations of how different features can be used on different functions.

The alternative to having lots of features on each function is to have a relatively small set of powerful functions that can be used to construct the behavior you want. This type of software is thought by many to be easier to maintain and extend. I think this is pretty much the preferred approach in R.

The above one-liner for writing the data in the form you want is really not much more complex than using an additional argument to write.table(). (And if you need to do this kind of thing frequently, then it's easy in R to create your own wrapper function for 'write.table'.)

One might object to this line of explanation by noting that many functions already have many arguments and lots of features. I think the situation is that the original author of any particular function gets to decide what features the function will have, and after that there is considerable reluctance (justifiably) to add new features, especially in cases where the desired functionality can be easily achieved in other ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
> Consider:
>
> > x <- matrix(1:6, 2,3)
> > rownames(x) <- c("ID1", "ID2")
> > colnames(x) <- c("Attr1", "Attr2", "Attr3")
>
> > x
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
>
> > write.csv(x,file="x.csv")
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
>
> Have I missed an easy way to get the "" string to be something meaningful?
>
> There is no information in the "" string. This column heading for the row
> names often could be used as a database key, but the "" entry would need
> to be manually edited first. Why not provide a way to specify the string
> instead of putting "" as the heading for the rownames?
>
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
>
>   Header line
>   R prefers the header line to have no entry for the row names,
>   . . .
>   Some other systems require a (possibly empty) entry for the row names,
>   which is what write.table will provide if argument col.names = NA is
>   specified. Excel is one such system.
>
> Why is an "empty" entry the only option here?
>
> A quick solution that comes to mind seems a bit kludgy:
>
> > y <- cbind(rownames(x), x)
> > colnames(y)[1] <- "ID"
> > y
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
>
> > write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
>
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> characters strings, too).
>
> It's also not clear why I had to use write.table above, instead of
> write.csv:
>
> > write.csv(y, row.names=F, col.names=T, file="y.csv")
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
>   col.names = NA makes no sense when row.names = FALSE
>
> Thanks for any insight about this.
>
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
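The wrapper function mentioned in the reply above could be sketched as follows. The function name 'write.csv.id' and its 'id.name' argument are hypothetical (they are not part of R), and the sketch assumes 'x' has row names.

```r
# Hypothetical wrapper: write a matrix or data frame as CSV with the
# row names in a first column that has a caller-chosen header.
write.csv.id <- function(x, file = "", id.name = "id", ...) {
    out <- data.frame(rownames(x), as.data.frame(x),
                      check.names = FALSE, stringsAsFactors = FALSE)
    names(out)[1] <- id.name
    write.table(out, file = file, sep = ",", row.names = FALSE, ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))

# With file = "" (the default here) the output goes to the console.
write.csv.id(x, id.name = "ID")
```

Converting with as.data.frame() before combining keeps the numeric columns numeric, so only the id column is quoted in the output.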
Re: [R] Seeking help with a loop
> x <- data.frame(q33a=3:4,q33b=5:6,q35a=1:2,q35b=2:1)
> y <- list()
> for (i in grep("q33", colnames(x), value=TRUE))
+    y[[sub("q33","",i)]] <- ifelse(x[[sub("q33","q35",i)]]==1, x[[i]], NA)
> as.data.frame(y)
   a  b
1  3 NA
2 NA  6
> # if you really want to create new variables rather
> # than have them in a data frame:
> # (use paste() or sub() to modify the names if you
> # want something like "newfielda")
> for (i in names(y)) assign(i, y[[i]])
> a
[1]  3 NA
> b
[1] NA  6
>

hope this helps,

Tony Plate

Greg Blevins wrote:
> Hello R Helpers,
>
> After spending considerable time attempting to write a loop (and searching
> the help archives) I have decided to post my problem.
>
> In a dataframe I have columns labeled:
>
> q33a q33b q33c...q33r q35a q35b q35c...q35r
>
> What I want to do is create new variables based on the following logic:
> newfielda <- ifelse(q35a==1, q33a, NA)
> newfieldb <- ifelse(q35b==1, q33b, NA)
> ...
> newfieldr
>
> What I did was create two new dataframes, one containing q33a-r the other
> q35a-r and tried to loop over both, but I could not get any of the loop
> syntax I tried to give me the result I was seeking.
>
> Any help would be much appreciated.
>
> Greg Blevins
> Partner
> The Market Solutions Group, Inc.
> Minneapolis, MN
>
> Windows XP, R 2.1.1
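For comparison with the loop in the reply above, the same result can be had without an explicit loop by masking one block of columns with the other. This is a sketch on the same two-column toy data; the names 'suffix' and 'new' are made up, and it assumes every q33 column has a matching q35 column.

```r
x <- data.frame(q33a = 3:4, q33b = 5:6, q35a = 1:2, q35b = 2:1)

# Suffixes shared by the two blocks of columns ("a", "b", ...).
suffix <- sub("^q33", "", grep("^q33", names(x), value = TRUE))

# Start from the q33 block and blank out cells where q35 is not 1.
new <- x[paste0("q33", suffix)]
new[x[paste0("q35", suffix)] != 1] <- NA
names(new) <- paste0("newfield", suffix)
new
```

The comparison yields a logical matrix, so the assignment sets all failing cells to NA in one step.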
Re: [R] Generating correlated data from uniform distribution
Isn't this a little trickier with non-normal variables? It sounds like Menghui Chen wants variables that have uniform marginal distribution, and a specified correlation.

When I look at histograms (or just the quantiles) of the rows of dat2 in your example, I see something for dat2[2,] that does not look much like it comes from a uniform distribution.

> dat<-matrix(runif(2000),2,1000)
> rho<-.77
> R<-matrix(c(1,rho,rho,1),2,2)
> ch<-chol(R)
> dat2<-t(ch)%*%dat
> cor(dat2[1,],dat2[2,])
[1] 0.7513892
> hist(dat2[1,])
> hist(dat2[2,])
>
> quantile(dat2[1,])
         0%         25%         50%         75%        100%
0.000655829 0.246216035 0.507075912 0.745158441 0.16418
> quantile(dat2[2,])
       0%       25%       50%       75%      100%
0.0393046 0.4980066 0.7150426 0.9208855 1.3864704
>

-- Tony Plate

Jim Brennan wrote:
> > dat<-matrix(runif(2000),2,1000)
> > rho<-.77
> > R<-matrix(c(1,rho,rho,1),2,2)
> > ch<-chol(R)
> > dat2<-t(ch)%*%dat
> > cor(dat2[1,],dat2[2,])
> [1] 0.7513892
>
> > dat<-matrix(runif(2),2,1)
> > rho<-.28
> > R<-matrix(c(1,rho,rho,1),2,2)
> > ch<-chol(R)
> > dat2<-t(ch)%*%dat
> > cor(dat2[1,],dat2[2,])
> [1] 0.2681669
>
> > dat<-matrix(runif(20),2,10)
> > rho<-.28
> > R<-matrix(c(1,rho,rho,1),2,2)
> > ch<-chol(R)
> > dat2<-t(ch)%*%dat
> > cor(dat2[1,],dat2[2,])
> [1] 0.2814035
>
> See ?choleski
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Menghui Chen
> Sent: July 1, 2005 4:49 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Generating correlated data from uniform distribution
>
> Dear R users,
>
> I want to generate two random variables (X1, X2) from uniform
> distribution (-0.5, 0.5) with a specified correlation coefficient r.
> Does anyone know how to do it in R?
>
> Many thanks!
>
> Menghui
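One standard way to keep the marginals uniform, which is not proposed anywhere in this thread and is offered only as a sketch, is a Gaussian copula: draw correlated normals, then push each through pnorm(). For this construction the Pearson correlation of the resulting uniforms is (6/pi) * asin(rho/2), so the normal correlation 'rho' is chosen by inverting that relation. All variable names below are made up for the example.

```r
set.seed(42)
r   <- 0.77                    # target correlation between the uniforms
rho <- 2 * sin(pi * r / 6)     # correlation needed on the normal scale
n   <- 10000

# Two standard normals with correlation rho.
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# pnorm() maps each to Uniform(0, 1); shift to (-0.5, 0.5) as requested.
u1 <- pnorm(z1) - 0.5
u2 <- pnorm(z2) - 0.5

cor(u1, u2)                    # close to r, up to sampling error
range(u1); range(u2)           # both stay inside (-0.5, 0.5)
```

Unlike the Cholesky mixing of uniforms, hist(u2) here looks flat, because the transformation is applied after the correlation is introduced.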
Re: [R] function for cumulative occurrence of elements
I'm not entirely sure what you want, but is it "9 5 3" for this data? (9 "new" species occur at the first point, 5 "new" at the second, and 3 "new" at the third). If this is right, then to get "accumulation curve when random Points are considered", you can probably just index rows of dt appropriately. > dd <- read.table("clipboard", header=T) > dd[,1:3] Pointspecies frequency 1 7 American_elm 7 2 7 apple 2 3 7 black_cherry 8 4 7 black_oak 1 5 7chokecherry 1 6 7 oak_sp 1 7 7 pignut_hickory 1 8 7 red_maple 1 9 7 white_oak 5 10 9 black_spruce 2 11 9blue_spruce 2 12 9missing12 13 9 Norway_spruce 8 14 9 white_spruce 3 1512 apple 2 1612 black_cherry 1 1712 black_locust 1 1812 black_walnut 1 1912 lilac 3 2012missing 2 > # dt: table of which species occur at which "Points" > dt <- table(dd$Point, dd$species) > # doc: for each species, the index of the "Point" where > # it first occurs > doc <- apply(dt, 2, function(x) which(x==1)[1]) > doc American_elm apple black_cherry black_locust black_oak 1 1 1 3 1 black_spruce black_walnutblue_sprucechokecherry lilac 2 3 2 1 3 missing Norway_spruce oak_sp pignut_hickory red_maple 2 2 1 1 1 white_oak white_spruce 1 2 > table(doc) doc 1 2 3 9 5 3 > hope this helps, Tony Plate Steven K Friedman wrote: > Hello, > > I have a data set with 9700 records, and 7 parameters. > > The data were collected for a survey of forest communities. Sample plots > (1009) and species (139) are included in this data set. I need to determine > how species are accumulated as new plots are considered. Basically, I want > to develop a species area curve. > > I've included the first 20 records from the data set. Point represents the > plot id. The other parameters are parts of the information statistic H'. > > Using "Table", I can construct a data set that lists the occurrence of a > species at any Point (it produces a binary 0/1 data table). 
From there it > get confusing, regarding the most efficient approach to determining the > addition of new and or repeated species occurrences. > > ptcount <- table(sppoint.freq$species, sppoint.freq$Point) > > From here I've played around with colSums to calculate the number of species > at each Point. The difficulty is determining if a species is new or > repeated. Also since there are 1009 points a function is needed to screen > every Point. > > Two goals are of interest: 1) the species accumulation curve, and 2) an > accumulation curve when random Points are considered. > > Any help would be greatly appreciated. > > Thank you > Steve Friedman > > > Pointspecies frequency point.list point.prop log.prop > point.hprime > 1 7 American elm 7 27 0.25925926 -1.3499267 > 0.3499810 > 2 7 apple 2 27 0.07407407 -2.6026897 > 0.1927918 > 3 7 black cherry 8 27 0.29629630 -1.2163953 > 0.3604134 > 4 7 black oak 1 27 0.03703704 -3.2958369 > 0.1220680 > 5 7chokecherry 1 27 0.03703704 -3.2958369 > 0.1220680 > 6 7 oak sp 1 27 0.03703704 -3.2958369 > 0.1220680 > 7 7 pignut hickory 1 27 0.03703704 -3.2958369 > 0.1220680 > 8 7 red maple 1 27 0.03703704 -3.2958369 > 0.1220680 > 9 7 white oak 5 27 0.18518519 -1.6863990 > 0.3122961 > 10 9 black spruce 2 27 0.07407407 -2.6026897 > 0.1927918 > 11 9blue spruce 2 27 0.07407407 -2.6026897 > 0.1927918 > 12 9missing12 27 0. -0.8109302 > 0.3604134 > 13 9 Norway spruce 8 27 0.29629630 -1.2163953 > 0.3604134 > 14 9 white spruce 3 27 0. -2.1972246 > 0.2441361 > 1512 apple 2 27 0.07407407 -2.6026897 > 0.1927918 > 1612 black cherry 1 27
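The "first occurrence" idea in the reply above can be turned into an accumulation curve, including one for a random ordering of the points. The following is only a sketch under assumptions: the toy data frame and the helper name new.species() are made up for illustration and are not part of the original data or of any package.

```r
# Toy data standing in for the survey (values are illustrative only)
dd <- data.frame(
  Point   = c(7, 7, 7, 9, 9, 12, 12, 12),
  species = c("elm", "apple", "oak", "spruce", "apple", "apple", "lilac", "elm")
)

# Hypothetical helper: number of new species contributed by each point,
# when the points are visited in the order given by 'ord'
new.species <- function(dd, ord = unique(dd$Point)) {
  # position in the visiting order at which each record's point is reached
  visit <- match(dd$Point, ord)
  # for each species, the earliest visit at which it is seen
  first <- as.vector(tapply(visit, dd$species, min))
  # count first appearances per visit (zero if a point adds nothing new)
  counts <- tabulate(first, nbins = length(ord))
  names(counts) <- ord
  counts
}

new.in.order <- new.species(dd)
accum <- cumsum(new.in.order)      # species accumulation curve

# Accumulation curve for one random ordering of the points
set.seed(1)
accum.random <- cumsum(new.species(dd, sample(unique(dd$Point))))
```

Whatever the ordering, the last value of the curve is the total number of species, which is a quick sanity check. Averaging accum.random over many random orderings would give a smoothed curve.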
Re: [R] summary(as.factor(x) - force to not sort the result according factor levels
Christoph Lehmann wrote:

Hi

The result of a summary(as.factor(x)) call (see example below) is sorted according to the factor levels. How can I get the result not sorted, but in the original order of the levels in x?

By creating the factor with the levels in the order you want:

> test <- c(120402, 120402, 120402, 1323, 1323, 200393, 200393, 200393, 200393, 200393)
> summary(factor(test, levels=unique(test)))
120402   1323 200393
     3      2      5
Re: [R] na.action
Maybe this does what you want:

> x <- as.matrix(read.table("clipboard"))
> x
  V1 V2 V3 V4
1 NA  0  0  0
2  0 NA  0 NA
3  0  0 NA  2
4  0  0  2 NA
> rowSums(x==2, na.rm=T)
1 2 3 4
0 0 1 1
>

There are probably at least 5 or 6 other quite sensible ways of doing this, but this is probably the fastest (and the least versatile). A more general building block is the sum() function, as in:

> sum(x[3,]==2, na.rm=T)
[1] 1
>

The key is the use of the 'na.rm=T' argument value.

hope this helps,

Tony Plate

Tim Smith wrote:

Hi,

I had the following code:

testp <- rcorr(t(datcm1),type = "pearson")
mat1 <- testp[[1]][,] > 0.6
mat2 <- testp[[3]][,] < 0.05
mat3 <- mat1 + mat2

The resulting mat3 (smaller version) matrix looks like:

NA  0  0  0
 0 NA  0 NA
 0  0 NA  2
 0  0  2 NA

To get the number of times a '2' appears in the rows, I was trying to run the following code:

numrow = nrow(mat3)
counter <- matrix(nrow = numrow,ncol =1)
for(i in 1:numrow){
  count = 0;
  for(j in 1:numrow){
    if(mat3[i,j] == 2){
      count = count + 1
    }
  }
  counter[i,1] = count
}

However, I get the following error:

'Error in if (mat3[i, j] == 2) { : missing value where TRUE/FALSE needed'

I also tried to use the na.action, but couldn't get anything. I'm sure there must be a relatively easy fix to this. Is there a workaround for this problem?

thanks,

Tim
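The error in the loop arises because NA == 2 is NA, and if() cannot branch on NA. As a small self-contained sketch (the matrix is retyped from the example above; the explicit is.na() guard is just one possible way to repair the loop):

```r
# The example matrix, entered by rows
mat3 <- matrix(c(NA, 0, 0, 0,
                 0, NA, 0, NA,
                 0, 0, NA, 2,
                 0, 0, 2, NA), nrow = 4, byrow = TRUE)

# Vectorized count of 2's per row; NA comparisons are dropped by na.rm
counts <- rowSums(mat3 == 2, na.rm = TRUE)

# Loop version with an explicit NA guard, so if() never sees NA
counter <- integer(nrow(mat3))
for (i in seq_len(nrow(mat3))) {
  for (j in seq_len(ncol(mat3))) {
    if (!is.na(mat3[i, j]) && mat3[i, j] == 2) {
      counter[i] <- counter[i] + 1L
    }
  }
}
```

Both approaches give the same per-row counts; the vectorized form is shorter and usually faster.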
Re: [R] Subarrays
Here's one way:

> subarray <- function(x, marginals, intervals) {
+     if (length(marginals) != length(intervals))
+         stop("marginals and intervals must be the same length (intervals can be a list)")
+     if (any(marginals<1 | marginals>length(dim(x))))
+         stop("marginals must contain values in 1:length(dim(x))")
+     ic <- Quote(x[, drop=T])
+     # ic has 4 elts with one empty index arg
+     ic2 <- ic[c(1, 2, rep(3, length(dim(x))), 4)]
+     # ic2 has an empty arg for each dim of x
+     ic2[marginals+2] <- intervals
+     eval(ic2)
+ }
> subarray(v, c(1,4), c(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,4), list(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,3,4), list(c(1,3,4),1,2))
     [,1] [,2] [,3] [,4]
[1,]   65   69   73   77
[2,]   67   71   75   79
[3,]   68   72   76   80
>

Question for language experts: is this the best way to create and manipulate R language expressions that contain empty arguments, or are there other preferred ways?

-- Tony Plate

Gunnar Hellmund wrote:

Define an array

v<-1:256
dim(v)<-rep(4,4)

Subarrays can be obtained as follows:

v[3,2,,2]
[1]  71  87 103 119

v[3,,,2]
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127

In the general case this procedure is very tedious. Given an array A, dim(A)=(dim_1,dim_2,...,dim_d), and two vectors v1=(n_i1,...n_ik), v2=(int_1,...,int_k) ('marginals' and relevant 'interval numbers'), is there a smart way to obtain A[,...,int_1,,int_2,,,int_k,] ?

Best wishes

Gunnar Hellmund
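One commonly used alternative that avoids building a call with empty arguments is to pass TRUE for every dimension that should be kept whole and hand the index list to "[" via do.call(). This is only a sketch; the name subarray2() is made up here and is not from the original post.

```r
# Hypothetical alternative to subarray(): TRUE stands in for an empty index
subarray2 <- function(x, marginals, intervals, drop = TRUE) {
  d <- length(dim(x))
  stopifnot(length(marginals) == length(intervals),
            all(marginals >= 1 & marginals <= d))
  idx <- rep(list(TRUE), d)              # TRUE selects everything along a dimension
  idx[marginals] <- as.list(intervals)   # fill in the requested indices
  do.call("[", c(list(x), idx, list(drop = drop)))
}

v <- array(1:256, rep(4, 4))
s1 <- subarray2(v, c(1, 4), c(3, 2))                     # same as v[3,,,2]
s2 <- subarray2(v, c(1, 3, 4), list(c(1, 3, 4), 1, 2))   # same as v[c(1,3,4),,1,2]
```

A logical TRUE index is recycled along its dimension, so it behaves like the empty argument for selection purposes.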
Re: [R] Reconstruction of a "valid" expression within a function
You are passing just a string to subset(). At the very least you need to parse it (but still this does not work easily with subset() -- see below). But are you sure you need to do this? subset() for dataframes already accepts subset expressions involving the columns of the dataframe, e.g.: > df <- data.frame(x=1:10,y=rep(1:5,2)) > subset(df, y==2) x y 2 2 2 7 7 2 > However, it's tricky to get subset() to work with an expression for its subset argument. This is because of the way it evaluates its subset expression (look at the code for subset.data.frame()). > subset(df, parse(text="df$y==2")) Error in subset.data.frame(df, parse(text = "df$y==2")) : 'subset' must evaluate to logical > subset(df, parse(text="y==2")) Error in subset.data.frame(df, parse(text = "y==2")) : 'subset' must evaluate to logical > It's a little tricky in general passing R language expressions around, because many functions that work with expressions work with the unevaluated form of the actual argument, rather than with an R language expression as the value of a variable. E.g.: > with(df, y==2) [1] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE > cond <- parse(text="y==2") > cond expression(y == 2) > with(df, cond) expression(y == 2) One way to make these types of functions work with R language expressions as the value of a variable is to use do.call(): > do.call("with", list(df, cond)) [1] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE > So, returning to subset(), you can give it an expression that is stored in the value of a variable like this: > do.call("subset", list(df, cond)) x y 2 2 2 7 7 2 > However, if you're a beginner at R, I suspect that you'll get much further if you avoid such meta-language constructs and just find a way to make subset() work for you without trying to paste together R language expressions. 
Hope this helps, -- Tony Plate Pascal Boisson wrote: Hello all, I have some trouble in reconstructing a valid expression within a function, here is my question. I am building a function : SUB<-function(DF,subset=TRUE) { #where DF is a data frame, with Var1, Var2, Fact1, Fact2, Fact3 #and subset would be an expression, eg. Fact3 == 1 #in a first time I want to build a subset from DF #I managed to, with an expression like eg. DF$Fact3, # but I would like to skip the DF$ for convenience # so I tried something like this : tabsub<-deparse(substitute(subset)) dDF<-deparse(substitute(DF)) if (tabsub[1]!="TRUE") { subset<-paste(dDF,"$",tabsub,sep="")} #At this point, I have a string that seems to be the expression that I want sDF<-subset(DF, subset) } #But I have an error message : Error in r & !is.na(r) : operations are possible only for numeric or logical types I can not understand why is that, even after I've tried to convert properly the string into an expression. I've been all the day trying to sort that problem ... Maybe this attempt is ackward and I have not understood what is really behind an expression. But if anyone could give me a tip concerning this problem or point me to relevant references, I would really appreciate. Thanks Pascal Boisson _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ DISCLAIMER:\ \ This email is from the Scottish Crop Researc...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
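For the function in the question, one way to "make subset() work for you" without pasting strings together is to capture the condition with substitute() and evaluate it with the data frame's columns in scope, which is essentially what subset.data.frame() does internally. The following is a minimal sketch, not a drop-in replacement for the original SUB(); the argument handling shown is an assumption about what was intended.

```r
# Sketch of SUB(): the condition is written without the DF$ prefix
SUB <- function(DF, subset = TRUE) {
  cond <- substitute(subset)               # capture the unevaluated expression
  keep <- eval(cond, DF, parent.frame())   # evaluate with DF's columns visible
  DF[keep & !is.na(keep), , drop = FALSE]  # treat NA as "do not keep"
}

df <- data.frame(x = 1:10, y = rep(1:5, 2))
res <- SUB(df, y == 2)

# If the condition really does arrive as a string, parse it first and
# evaluate the resulting call in the data frame
keep2 <- eval(parse(text = "y == 2")[[1]], df)
res2 <- df[keep2, , drop = FALSE]
```

Calling SUB(df) with no condition returns all rows, because the default TRUE is recycled over the rows.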
Re: [R] Getting the name of an object as character
If you're trying to find the textual form of an actual argument, here's one way:

> foo <- function(x) {
+     xn <- substitute(x)
+     if (is.name(xn) && !exists(as.character(xn)))
+         as.character(xn)
+     else
+         x
+ }
> foo(x)
[1] 3
> foo(xx)
[1] "xx"
> foo(list(xx))
Error in foo(list(xx)) : Object "xx" not found
>

If you want the textual form of arguments that are expressions, use deparse() and a different test (and beware that deparse() can return a vector of character data).

Although you can do this in R, it is not always advisable practice. Many people who have written functions with non-standard evaluation rules like this have come to regret it. One reason is that it makes these functions difficult to use in programs; another is that the behavior of the function can depend upon what global variables exist; another is that when the function works as intended, that's great, but when it doesn't, users can get quite confused trying to figure out what it's doing. The R function help() is an example of a commonly used function with a non-standard evaluation rule.

-- Tony Plate

Ali - wrote:

This could be really trivial, but I cannot find the right function to get the name of an object as a character. Assume we have a function like:

getName <- function(obj)

Now if we call the function like:

getName(blabla)

and 'blabla' is not a defined object, I want getName to return "blabla". In other words, if paste("blabla") returns "blabla", I want to define a paste function which returns the same character by:

paste(blabla)
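For the deparse() variant mentioned above, a minimal sketch could look like the following. The function name arg.name() is made up for illustration; the paste(..., collapse) guards against deparse() returning more than one string for long expressions.

```r
# Return the text of the actual argument without evaluating it
arg.name <- function(x) {
  paste(deparse(substitute(x)), collapse = " ")
}

n1 <- arg.name(blabla)   # 'blabla' need not exist: the argument is never evaluated
n2 <- arg.name(a + b)    # works for expressions too
```

Because x is never forced, no "object not found" error occurs even when the symbol is undefined. The caveats above about non-standard evaluation apply equally to this version.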
Re: [R] Defining binary indexing operators
Excuse me! I misunderstood the question, and indeed, it is necessary to be that complicated when you try to make x$y behave the same as foo(x,y), rather than foo(x,"y") (doing the former would be inadvisable, as I think someone else pointed out too.)

Tony Plate wrote:

It's not necessary to be that complicated, is it? AFAIK, the '$' operator is treated specially by the parser so that its RHS is treated as a string, not a variable name. Hence, a method for "$" can just take the indexing argument directly as given -- no need for any fancy language tricks (eval(), etc.)

> x <- structure(3, class = "myclass")
> y <- 5
> foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="")
> foo(x, y)
[1] "3 indexed by '5'"
> "$.myclass" <- foo
> x$y
[1] "3 indexed by 'y'"
>

The point of the above example is that foo(x,y) behaves differently from x$y even when both call the same function: foo(x,y) uses the value of the variable 'y', whereas x$y uses the string "y". This is as desired for an indexing operator "$".

-- Tony Plate

Gabor Grothendieck wrote:

On 4/27/05, Ali - <[EMAIL PROTECTED]> wrote:

Assume we have a function like:

foo <- function(x, y)

how is it possible to define a binary indexing operator, denoted by $, so that x$y functions the same as foo(x, y)

Here is an example. Note that $ does not evaluate y so you have to do it yourself:

x <- structure(3, class = "myclass")
y <- 5
foo <- function(x,y) x+y
"$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i)); foo(x, i) }
x$y # structure(8, class = "myclass")
Re: [R] Defining binary indexing operators
It's not necessary to be that complicated, is it? AFAIK, the '$' operator is treated specially by the parser so that its RHS is treated as a string, not a variable name. Hence, a method for "$" can just take the indexing argument directly as given -- no need for any fancy language tricks (eval(), etc.) > x <- structure(3, class = "myclass") > y <- 5 > foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="") > foo(x, y) [1] "3 indexed by '5'" > "$.myclass" <- foo > x$y [1] "3 indexed by 'y'" > The point of the above example is that foo(x,y) behaves differently from x$y even when both call the same function: foo(x,y) uses the value of the variable 'y', whereas x$y uses the string "y". This is as desired for an indexing operator "$". -- Tony Plate Gabor Grothendieck wrote: On 4/27/05, Ali - <[EMAIL PROTECTED]> wrote: Assume we have a function like: foo <- function(x, y) how is it possible to define a binary indexing operator, denoted by $, so that x$y functions the same as foo(x, y) Here is an example. Note that $ does not evaluate y so you have to do it yourself: x <- structure(3, class = "myclass") y <- 5 foo <- function(x,y) x+y "$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i)); foo(x, i) } x$y # structure(8, class = "myclass") [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
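To see the contrast with an operator that does evaluate its index, the same toy class can be given a "[[" method alongside "$". This is only an illustrative sketch; the class name and both methods are toy definitions, not anything from a package.

```r
x <- structure(list(), class = "myclass")
y <- 5

# '$' hands the method the literal name written after the operator
"$.myclass"  <- function(x, name) paste0("indexed by '", name, "'")
# '[[' evaluates its index, so the method receives the value of y
"[[.myclass" <- function(x, i) paste0("indexed by '", i, "'")

by.dollar  <- x$y
by.bracket <- x[[y]]
```

Here by.dollar mentions 'y' while by.bracket mentions '5', which is usually a good reason to use "[[" (or an ordinary function) when the index is held in a variable.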
Re: [R] Summarizing factor data in table?
Do you want to count the number of non-NA divisions and organizations in the data for each year (where duplicates are counted as many times as they appear)?

> tapply(!is.na(foo$div), foo$yr, sum)
1998 1999 2000
   0    4    2
> tapply(!is.na(foo$org), foo$yr, sum)
1998 1999 2000
   4    4    2
>

Or perhaps the number of unique non-NA divisions and organizations in the data for each year?

> tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
   0    4    2
> tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
   4    4    2
>

(I don't understand where the "3" in your desired output comes from though, which maybe indicates I completely misunderstand your request.)

Andy Bunn wrote:

I have a very simple query with regard to summarizing the number of factors present in a certain snippet of a data frame. Given the following data frame:

foo <- data.frame(yr = c(rep(1998,4), rep(1999,4), rep(2000,2)),
                  div = factor(c(rep(NA,4),"A","B","C","D","A","C")),
                  org = factor(c(1:4,1:4,1,2)))

I want to get two new variables. Object ndiv would give the number of divisions by year:

1998 0
1999 3
2000 2

Object norgs would give the number of organizations:

1998 4
1999 4
2000 2

I figure xtabs should be able to do it, but I'm stuck without a for loop. Any suggestions?

-Andy
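Both per-year counts can also be collected into one table by applying the same tapply() to each column. A small sketch using the data frame from the question (the helper name n.unique() is made up for illustration):

```r
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# Number of distinct non-NA values
n.unique <- function(x) length(unique(x[!is.na(x)]))

# One column of counts per variable, one row per year
counts <- sapply(foo[c("div", "org")],
                 function(col) tapply(col, foo$yr, n.unique))
```

The result is a matrix with one row per year, so ndiv and norgs are simply its two columns.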
Re: [R] Index matrix to pick elements from 3-dimensional matrix
I'm assuming what you want to do is randomly sample from slices of A selected on the 3-rd dimension, as specified by J. Here's a way that uses indexing by a matrix. The cbind() builds a three column matrix of indices, the first two of which are randomly selected. The use of replace() is to make the result have the same attributes, e.g., dim and dimnames, as J. > A <- array(letters[1:12],c(2,2,3)) > J <- matrix(c(1,2,3,3),2,2) > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))]) [,1] [,2] [1,] "b" "l" [2,] "f" "k" > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))]) [,1] [,2] [1,] "b" "l" [2,] "h" "i" > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))]) [,1] [,2] [1,] "c" "l" [2,] "h" "k" > -- Tony Plate Robin Hankin wrote: Hello Juhana try this (but there must be a better way!) stratified.select <- function(A,J){ out <- sapply(J,function(i){sample(A[,,i],1)}) attributes(out) <- attributes(J) return(out) } A <- array(letters[1:12],c(2,2,3)) J <- matrix(c(1,2,3,3),2,2) R> stratified.select(A,J) [,1] [,2] [1,] "b" "i" [2,] "g" "k" R> stratified.select(A,J) [,1] [,2] [1,] "d" "j" [2,] "f" "l" R> best wishes Robin On Apr 26, 2005, at 05:16 am, juhana vartiainen wrote: Hi all Suppose I have a dim=c(2,2,3) matrix A, say: A[,,1]= a b c d A[,,2]= e f g h A[,,3]= i j k l Suppose that I want to create a 2x2 matrix X, which picks elements from the above-mentioned submatrices according to an index matrix J referring to the "depth" dimension: J= 1 3 2 3 In other words, I want X to be X= a j g l since the matrix J says that the (1,1)-element should be picked from A[,,1], the (1,2)-element should be picked from A[,,3], etc. I have A and I have J. Is there an expression in A and J that creates X? 
Thanks Juhana -- Juhana Vartiainen docent in economics Director, FIEF (Trade Union Foundation for Economic Research, Stockholm), http://www.fief.se gsm +46 70 360 9915 office +46 8 696 9915 homepage http://www.fief.se/staff/Juhana/index.html -- Robin Hankin Uncertainty Analyst Southampton Oceanography Centre European Way, Southampton SO14 3ZH, UK tel 023-8059-7743
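If the goal is the deterministic pick described in the original question (element (i,j) of X taken from A[i,j,J[i,j]]) rather than random sampling within slices, the same matrix-indexing idea applies with row(J) and col(J) supplying the first two index columns. A sketch, with A built so that its slices print as in the question (that construction via aperm() is an assumption about the intended layout):

```r
# Slices laid out by rows, as displayed in the question: A[,,1] = a b / c d
A <- aperm(array(letters[1:12], c(2, 2, 3)), c(2, 1, 3))
J <- matrix(c(1, 2, 3, 3), 2, 2)     # prints as 1 3 / 2 3

# Three-column index matrix: (row, column, slice) for every cell of J
idx <- cbind(c(row(J)), c(col(J)), c(J))
X <- array(A[idx], dim(J))
```

With these inputs X prints as a j / g l, matching the desired result in the question.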
Re: [R] Proba( Ut+2=1 / ((Ut+1==1) && (Ut==1))) ?
table() can return all the n-gram statistics, e.g.:

> v <- sample(c(-1,1), 1000, rep=TRUE)
> table("v_{t-2}"=v[-seq(to=length(v), len=2)], "v_{t-1}"=v[-c(1,length(v))], "v_t"=v[-(1:2)])
, , v_t = -1

       v_{t-1}
v_{t-2}  -1   1
     -1 136 134
     1  131 112

, , v_t = 1

       v_{t-1}
v_{t-2}  -1   1
     -1 131 113
     1  115 126

>

This says that there were 136 cases in which a -1 followed two -1's (and 126 cases in which a 1 followed two 1's).

If you're really only interested in particular contexts, you can do something like:

> table(v[-seq(to=length(v), len=2)]==1 & v[-c(1,length(v))]==1 & v[-(1:2)]==1)

FALSE  TRUE
  872   126
> table(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)

FALSE  TRUE
  862   136

or

> sum(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)
[1] 136
>

vincent wrote:

Dear all,

First I apologize if my question is quite simple, but I'm very new to R. I have vectors of the form

v = c(1,1,-1,-1,-1,1,1,1,1,-1,1)

(longer than this one of course). The elements are only +1 or -1. I would like to calculate:
- the frequencies of -1 occurrences after 2 consecutive -1
- the frequencies of +1 occurrences after 2 consecutive +1

It looks probably something like: Proba( Ut+2=1 / ((Ut+1==1) && (Ut==1)))

Could someone please give me a little hint about how I should/could begin to proceed?

Thanks (Thanks also to the R creators/contributors, this software seems really great!)
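The counts above can be turned into the conditional frequencies asked for by dividing by the number of times each context occurs. A minimal sketch on the short vector from the question (variable names are made up for illustration):

```r
v <- c(1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1)
n <- length(v)

prev2 <- v[1:(n - 2)]   # value at t-2
prev1 <- v[2:(n - 1)]   # value at t-1
cur   <- v[3:n]         # value at t

# Estimate of P(v_t = 1 | v_{t-1} = 1, v_{t-2} = 1)
ctx.pos <- prev2 == 1 & prev1 == 1
p.pos <- mean(cur[ctx.pos] == 1)

# Estimate of P(v_t = -1 | v_{t-1} = -1, v_{t-2} = -1)
ctx.neg <- prev2 == -1 & prev1 == -1
p.neg <- mean(cur[ctx.neg] == -1)
```

If a context never occurs, the corresponding mean() is NaN, so longer series should be checked for that case.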
[R] pointer to comments re Paul Murrell's new book, R, & SAS on Andrew Gelman's blog
There are some interesting comments re Paul Murrell's new book, R, & SAS on Andrew Gelman's blog: http://www.stat.columbia.edu/~cook/movabletype/archives/2005/04/a_new_book_on_r.html -- Tony Plate __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] terminate R program when trying to access out-of-bounds array element?
Oops. The message in the 'stop' should be something more like "numeric index out of range". -- Tony Plate Tony Plate wrote: One way could be to make a special class with an indexing method that checks for out-of-bounds numeric indices. Here's an example for vectors: > setOldClass(c("oobcvec")) > x <- 1:3 > class(x) <- "oobcvec" > x [1] 1 2 3 attr(,"class") [1] "oobcvec" > "[.oobcvec" <- function(x, ..., drop=T) { +if (!missing(..1) && is.numeric(..1) && any(is.na(..1) | ..1 < 1 | ..1 > length(x))) +stop("numeric vector out of range") +NextMethod("[") + } > x[2:3] [1] 2 3 > x[2:4] Error in "[.oobcvec"(x, 2:4) : numeric vector out of range > Then, for vectors for which you want out-of-bounds checks done when they indexed, set the class to "oobcvec". This should work for simple vectors (I checked, and it works if the vectors have names). If you want this write a method like this for indexing matrices, you can use ..1 and ..2 to refer to the i and j indices. If you want to also be able to check for missing character indices, you'll just need to add more code. Note that the above example disallows 0 and negative indices, which may or may not be what you want. If you're extensively using other classes that you've defined, and you want out-of-bounds checking for them, then you need to integrate the checks into the subsetting methods for those classes -- you can't just use the above approach. hope this helps, Tony Plate Vivek Rao wrote: I want R to stop running a script (after printing an error message) when an array subscript larger than the length of the array is used, for example x = c(1) print(x[2]) rather than printing NA, since trying to access such an element may indicate an error in my program. Is there a way to get this behavior in R? Explicit testing with the is.na() function everywhere does not seem like a good solution. Thanks. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
Re: [R] terminate R program when trying to access out-of-bounds array element?
One way could be to make a special class with an indexing method that checks for out-of-bounds numeric indices. Here's an example for vectors:

> setOldClass(c("oobcvec"))
> x <- 1:3
> class(x) <- "oobcvec"
> x
[1] 1 2 3
attr(,"class")
[1] "oobcvec"
> "[.oobcvec" <- function(x, ..., drop=T) {
+     if (!missing(..1) && is.numeric(..1) && any(is.na(..1) | ..1 < 1 | ..1 > length(x)))
+         stop("numeric vector out of range")
+     NextMethod("[")
+ }
> x[2:3]
[1] 2 3
> x[2:4]
Error in "[.oobcvec"(x, 2:4) : numeric vector out of range
>

Then, for vectors for which you want out-of-bounds checks done when they are indexed, set the class to "oobcvec". This should work for simple vectors (I checked, and it works if the vectors have names).

If you want to write a method like this for indexing matrices, you can use ..1 and ..2 to refer to the i and j indices. If you want to also be able to check for missing character indices, you'll just need to add more code. Note that the above example disallows 0 and negative indices, which may or may not be what you want.

If you're extensively using other classes that you've defined, and you want out-of-bounds checking for them, then you need to integrate the checks into the subsetting methods for those classes -- you can't just use the above approach.

hope this helps,

Tony Plate

Vivek Rao wrote:

I want R to stop running a script (after printing an error message) when an array subscript larger than the length of the array is used, for example

x = c(1)
print(x[2])

rather than printing NA, since trying to access such an element may indicate an error in my program. Is there a way to get this behavior in R? Explicit testing with the is.na() function everywhere does not seem like a good solution. Thanks.
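A slightly simplified variant of the bounds-checking method, with a named index argument and the error message suggested in the follow-up ("numeric index out of range"), might look like the sketch below. It makes the same assumptions as the original: vectors only, and zero or negative indices are rejected.

```r
# Bounds-checking "[" method for the toy class "oobcvec"
"[.oobcvec" <- function(x, i, ...) {
  if (!missing(i) && is.numeric(i) &&
      any(is.na(i) | i < 1 | i > length(x)))
    stop("numeric index out of range")
  NextMethod("[")
}

x <- structure(1:3, class = "oobcvec")

ok  <- unclass(x[2:3])   # in-range indexing behaves as usual
err <- tryCatch(x[4], error = function(e) conditionMessage(e))
```

In a script, the stop() call halts execution at the offending subscript instead of letting an NA propagate silently.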
Re: [R] problem using uniroot with integrate
At Wednesday 09:27 AM 3/9/2005, Ken Knoblauch wrote:

Hi,

I'm trying to calculate the value of the variable, dp, below, in the argument to the integral of dnorm(x-dp) * pnorm(x)^(m-1). This corresponds to the estimate of the sensitivity of an observer in an m-alternative forced choice experiment, given the probability of a correct response, Pc, a Gaussian assumption for the noise and no bias. The function that I wrote below gives me an error:

Error in f(x, ...) : recursive default argument reference

The problem seems to be at the statement using uniroot, because the function est.dp works fine outside of the main function. I've been using R for a while but there are still many nuances about the scoping and the use of environments that I'm weak on and would like to understand better. I would appreciate any suggestions or solutions that anyone might offer for fixing my error. Thank you.

dprime.mAFC <- function(Pc, m) {
    est.dp <- function(dp, Pc = Pc, m = m) {
        pr <- function(x, dpt = dp, m0 = m) {
            dnorm(x - dpt) * pnorm(x)^(m0 - 1)
        }
        Pc - integrate(pr, lower = -Inf, upper = Inf, dpt = dp, m0 = m)$value
    }
    dp.res <- uniroot(est.dp, interval = c(0,5), Pc = Pc, m = m)
    dp.res$root
}

You've got several problems here:

* recursive argument defaults: these are unnecessary but result in the particular error message you are seeing (e.g., in the def of est.dp, the default value for the argument 'm' is the value of the argument 'm' itself -- default values for arguments are interpreted in the frame of the function itself)

* the argument m=m you supply to uniroot() is being interpreted as specifying the 'maxiter' argument to uniroot()

I think you can fix it by changing the 'm' argument of function est.dp to be named 'm0', and specifying 'm0' in the call to uniroot. (But I can't tell for sure because you didn't supply a working example -- when I just guess at values to pass in I get numerical errors.)

Also, it would be best to remove the incorrect recursive default arguments for the functions est.dp and pr.

-- Tony Plate
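One way to sidestep both problems at once is to let the inner functions pick up Pc and m through lexical scoping, so that nothing extra is passed through uniroot() or integrate(). The following is only a sketch of that idea, not a tested replacement for the original function; the search interval c(0, 5) is kept from the question.

```r
dprime.mAFC <- function(Pc, m) {
  # est.dp sees Pc and m from the enclosing function: no default arguments needed
  est.dp <- function(dp) {
    pr <- function(x) dnorm(x - dp) * pnorm(x)^(m - 1)
    Pc - integrate(pr, lower = -Inf, upper = Inf)$value
  }
  uniroot(est.dp, interval = c(0, 5))$root
}

# For m = 2 the integral has the closed form pnorm(dp / sqrt(2)),
# which gives a convenient check: this Pc corresponds to dp = 1
dp.check <- dprime.mAFC(pnorm(1 / sqrt(2)), m = 2)
```

Values of Pc at or below chance (1/m), or so close to 1 that the root lies beyond 5, would make uniroot() fail because the function no longer changes sign on the interval.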
Re: [R] glm and percentage data with many zero values
A very quick and easy thing to do with count data is to add 1 (or 0.5) to all your counts (I'm sure you can work backwards from abundance data to counts and then forward again). This gets rid of zero problems. In some cases this approximates a Bayesian approach with a low-information prior (though I'm not at all sure whether this is the case with a glm with Poisson errors). -- Tony Plate At Wednesday 08:02 AM 4/20/2005, Christian Kamenik wrote: Dear all, I am interested in correctly testing effects of continuous environmental variables and ordered factors on bacterial abundance. Bacterial abundance is derived from counts and expressed as percentage. My problem is that the abundance data contain many zero values: Bacteria <- c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15) First I tried transforming the data (e.g., logit) but because of the zeros I was not satisfied. Next I converted the percentages into integer values by round(Bacteria*10) or ceiling(Bacteria*10) and calculated a glm with a Poisson error structure; however, I am not very happy with this approach because it changes the original percentage data substantially (e.g., 0.03 becomes either 0 or 1). The same is true for converting the percentages into factors and calculating a multinomial or proportional-odds model (anyway, I do not know if this would be a meaningful approach). I was searching the web and the best answer I could get was http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html in which several persons suggested quasi-likelihood. Would it be reasonable to use a glm with quasipoisson? If yes, how I can I find the appropriate variance function? Any other suggestions? 
Many thanks in advance, Christian Christian Kamenik Institute of Plant Sciences University of Bern Altenbergrain 21 3013 Bern Switzerland
Re: [R] two dimensional array of object elements
Create your original matrix as a list datatype. When assigning elements, be careful with the list structure, as the example indicates.

> m <- 2; n <- 3
> a <- array(list(),c(m,n))
> a[1,2] <- list(b=1,c=2)
Error in "[<-"(`*tmp*`, 1, 2, value = list(b = 1, c = 2)) :
        number of items to replace is not a multiple of replacement length
> a[1,2] <- list(list(b=1,c=2))
>

At Friday 11:36 AM 2/11/2005, Weijie Cai wrote:

Hi list,

I want to create a two (possibly three) dimensional array of objects. These objects are classes in object oriented style. I failed by using

a<-array(NA,c(m,n))
for (i in 1:m){
  for (j in 1:n){
    a[i,j]<-My.Obj
  }
}

The elements are still NA. Any suggestions?

Thanks
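With a list-mode array, the double-bracket form a[[i, j]] assigns a single element directly, which avoids the extra list() wrapping. A small sketch, where the objects stored are plain lists standing in for the poster's class instances (an assumption, since My.Obj was not shown):

```r
m <- 2; n <- 3
a <- array(vector("list", m * n), c(m, n))   # m x n array whose cells can hold any object

for (i in 1:m) {
  for (j in 1:n) {
    a[[i, j]] <- list(row = i, col = j)      # stand-in for an arbitrary object
  }
}

elt <- a[[1, 2]]   # retrieve one stored object
```

Single brackets, as in a[1, 2], return a length-one list containing the object, while a[[1, 2]] returns the object itself.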
Re: [R] reading the seed from a simulation
With most modern random number generators you can't capture the current state in a single 32-bit integer. (I suspect the .Random.seed you are seeing is the state contained in 625 integers.) The easiest way to run reproducible simulations is to explicitly set the seed, using an integer, before each run. Then it's easy to put the random number generator into the same state again, e.g.:

for (sim.num in 1:100) {
    set.seed(sim.num)
    ... run simulation ...
}

If you can't do this, you can record the value of .Random.seed prior to the simulation, and then when you want to reproduce that simulation again, set .Random.seed to that value, e.g.:

> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> sample(1:100, 5)
[1] 90 94 65 62  6
> set.seed(1)
> sample(1:100, 5)
[1] 27 37 57 89 20
> saved.seed <- .Random.seed
> sample(1:100, 5)
[1] 90 94 65 62  6
> .Random.seed <- saved.seed
> sample(1:100, 5)
[1] 90 94 65 62  6
>

This is not guaranteed to work with all random-number generators; see the NOTE section in ?set.seed

-- Tony Plate

At Friday 09:50 AM 12/17/2004, Suzette Blanchard wrote:

Greetings,

I have a simulation of a nonlinear model that is failing. But it does not fail til way into the simulation. I would like to look at the run that is failing and maybe I could if I could capture the seed for the failing run. The help file on set.seed says you can do it but when I tried

rs<-.Random.seed
print(paste("rs",rs,sep=" "))

I got 626 of them so I don't know how to identify the right one. Please can you help?

Thank you, Suzette

=
Suzette Blanchard, Ph.D.
UCSD-PPRU
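Both patterns from the reply above can be sketched in a few lines. The function `run.one` and its artificial failure at run 7 are hypothetical, included only so that there is a failing run to find.

```r
# Pattern 1: one integer seed per run, recording which runs fail.
# run.one() is a hypothetical simulation that fails on run 7.
run.one <- function(sim.num) {
  set.seed(sim.num)
  x <- rnorm(10)
  if (sim.num == 7) stop("simulated failure")
  mean(x)
}

failed <- integer(0)
for (sim.num in 1:10) {
  res <- tryCatch(run.one(sim.num), error = function(e) NULL)
  if (is.null(res)) failed <- c(failed, sim.num)
}
failed                       # seeds of the runs that need a closer look

# Pattern 2: save the full generator state and restore it later
set.seed(1)
saved.seed <- .Random.seed
first  <- sample(1:100, 5)
assign(".Random.seed", saved.seed, envir = globalenv())
replay <- sample(1:100, 5)
identical(first, replay)
```

The `assign(..., envir = globalenv())` form is used so that the restore also works when called from inside a function, where a plain `.Random.seed <- saved.seed` would only create a local variable.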
RE: [R] Percentages in contingency tables *warning trivial question*
The 'abind' function in the 'abind' package is a generalized binding function for arrays. (I've never tried it with tables.)

At Monday 04:36 AM 12/13/2004, BXC (Bendix Carstensen) wrote:

[...snip...] The last step is necessary in the absence of a generalized cbind/rbind for tables/arrays. Please correct me if such a thing exists. If it does, it should be referenced under "see also" in the help page for cbind.
Re: [R] Re: Protocol for answering basic questions
Perhaps something like the following paragraph should be added to the start of the "Posting Guide" (as a new paragraph right after the existing first paragraph):

Note that R-help is *not* intended for questions that are easily answered by consulting one of the FAQs or other introductory material (see "Do your homework before posting" below). Such questions are actively discouraged and are likely to evoke a brusque response. Questions about seemingly simple matters that are mentioned in the FAQs or other introductory material *are welcomed* on R-help *when the questioner obviously has done their homework and the question is accompanied by an explanation* like "FAQ 7.2.1 seems to be relevant to this but I couldn't understand/apply the answer because ...".

Something like this would make it very clear up front what type of questions are not appropriate. (I'm not trying at all to dictate the policy, but as far as I can tell, the above summarizes the attitude of the majority of very knowledgeable helpers that respond to questions on R-help.)

Also, I think that John Maindonald's idea of a "I am new to R, where do I start?" page, with a link from the posting guide, is an excellent idea.

I'm aware that some feel that the posting guide is already too long, but my feeling is that if users don't read a very easily accessible posting guide AND post inappropriate questions AND become offended by brusque responses, then they are beyond where they can easily be helped. The most important thing is to make it very clear what types of questions are and are not considered appropriate, so that beginning users know what they are getting into.

And the following might merit inclusion in the FAQ:

Why is R-help not for hand-holding beginner questions?

R-help is a high traffic list and the general sentiment is that too many very simple questions will overwhelm everyone and most importantly result in the knowledgeable helpers ceasing to participate.
The reason that there is no "R-help-me-quickly-I-dont-want-to-read-the-documentation" list is that no-one has felt that it would work well -- it is unlikely that many knowledgeable users of R would be willing to participate. Without such users participating, it is likely that sometimes bad advice would be offered and stand uncorrected, because R is a complex language with many ways of doing things, some markedly inferior to others. For these reasons, some feel it would be a very bad idea to create such a list. (However, anyone who believes otherwise and wishes to start and maintain such a list or other similar service is free to do so.) One reason for this overall state of affairs is that R is free software and consequently there is no revenue stream to support a hand-holding support service with paid employees. So although the actual software is free, some investment in terms of time spent reading documentation is required in order to use it. Furthermore, many of the frequent helpers on R-help have written introductory documents intended to help beginners with many aspects of learning and using R (e.g., "An Introduction to R", and the various FAQs). Consequently they sometimes get fed up getting asked again and again the same question they have already written a document to explain. Nonetheless, the general sentiment on R-help is very helpful -- a quote summarizes it well: "It's OK if you need some spoonfeeding (I need that quite often myself), but at least show how you have tried to use the spoon yourself, instead of just showing us your open mouth." [Attribution to Andy Liaw, or remain anonymous?] As some feel that sufficient time and bandwidth has already been spent on this issue, if anyone has any comments on this particular matter of an addition to the posting guide (or FAQ), feel free to choose to respond to me privately, and I will summarize as appropriate. 
-- Tony Plate
Re: [R] hashing using named lists
Use match() for exact matching, i.e.,

> test[[match("name", names(test))]]

Yes, it is more cumbersome. This partial matching is considered by some to be a design fault, but changing it would break too many programs that depend upon it.

I don't understand your question about all.equal.list() -- it does seem to require exact matches on names, e.g.:

> all.equal(list(a=1:3), list(aa=1:3))
[1] "Names: 1 string mismatches"
> all.equal(list(aa=1:3), list(a=1:3))
[1] "Names: 1 string mismatches"
>

(the above run in R 2.0.0)

-- Tony Plate

(BTW, in R this operation is generally called "indexing" or "subscripting" or "extraction", but not "hashing". "Hashing" is a specific technique for managing and looking up indices, which is why some other programming languages refer to list-like objects that are indexed by character strings as "hashes". I don't think hashing is used for list names in R, but someone please correct me if I'm wrong!)

At Thursday 09:29 AM 11/18/2004, ulas karaoz wrote:

hi all,

I am trying to use named list to hash a bunch of vector by name, for instance:

test = list()
test$name = c(1,2,3)

the problem is that when i try to get the values back by using the name, the matching isn't done in an exact way, so test$na is not NULL. is there a way around this? Why by default all.equal.list doesnt require an exact match? How can I do hashing in R?

thanks. ulas.
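The lookups discussed above can be compared side by side. This is a sketch under two assumptions that go beyond the thread: that it runs in a current version of R, where `[[` matches names exactly by default (the transcript above was run in R 2.0.0), and that an environment is an acceptable substitute when hashed lookup by name is wanted.

```r
test <- list()
test$name <- c(1, 2, 3)

test$na                                  # '$' partially matches "name"
test[["na"]]                             # exact matching: no such element
test[[match("name", names(test))]]       # the match() idiom from the reply

# An environment looks names up exactly (and uses a hash table)
h <- new.env(hash = TRUE)
assign("name", c(1, 2, 3), envir = h)
exists("na", envir = h, inherits = FALSE)
get("name", envir = h)
```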
Re: [R] Resources for optimizing code
Have you tried reading the manual "An Introduction to R", with special attention to "Array Indexing"? (Indexing for data frames is pretty similar to indexing for matrices.)

Unless I'm misunderstanding, what you want to do is very simple. It is possible to use numeric vectors with 0 and 1 to indicate whether you want to keep the row, but it's a little easier with logical vectors. Here's an example:

> x <- data.frame(a=1:5,b=letters[1:5])
> keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
> keep.num
[1] 1 0 1 0 1
> keep.logical <- (x$a %% 2) == 1
> keep.logical
[1]  TRUE FALSE  TRUE FALSE  TRUE
> x[keep.num==1,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
> x[keep.logical,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
>

At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote:

I want to eliminate certain observations in a large dataframe (21000x100). I have written code which does this using a binary vector (0=delete obs, 1=keep), but it uses for loops, and so it's slow and in the extreme it causes R to hang for indefinite time periods.

I'm looking for one of two things:
1. A document which discusses how to avoid for loops and situations in which it's impossible to avoid for loops.
or
2. A function which can do the above better than mine.

My code is pasted below.

Thanks so much, Janet

# asst is a binary vector of length= nrow(DATAFRAME).
# 1= observations you want to keep. 0= observation to get rid of.
remove.xtra.f <-function(asst, DATAFRAME) {
    n<-sum(asst, na.rm=T)
    newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
    j<-1
    for(i in 1:length(data)) {
        if (asst[i]==1) {
            newdata[j,]<-DATAFRAME[i,]
            j<-j+1
        }
    }
    newdata.f<-as.data.frame(newdata)
    names(newdata.f)<-names(DATAFRAME)
    return(newdata.f)
}

--
Janet Rosenbaum [EMAIL PROTECTED]
PhD Candidate in Health Policy, Harvard GSAS
Harvard Injury Control Research Center, Harvard School of Public Health
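Applied to the function in the question, the logical-indexing idea from the reply reduces the loop to a single subscript. The name `remove.xtra` and the treatment of `NA` entries as "drop" are assumptions made for this sketch.

```r
# asst: 0/1 vector with one entry per row of DATAFRAME (1 = keep).
# Assumption for this sketch: NA entries are treated as "do not keep".
remove.xtra <- function(asst, DATAFRAME) {
  keep <- !is.na(asst) & asst == 1
  DATAFRAME[keep, , drop = FALSE]
}

x   <- data.frame(a = 1:5, b = letters[1:5])
out <- remove.xtra(c(1, 0, 1, NA, 1), x)
out
```

Because the subscript works on the data frame directly, column types and names are preserved without the detour through a matrix.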
Re: [R] Reading word by word in a dataset
Trying to make it work when not all rows have the same numbers of fields seems like a good place to use the "flush" argument to scan() (to skip everything after the first field on the line).

With the following copied to the clipboard:

i1-apple 10$ New_York
i2-banana
i3-strawberry 7$Japan

do:

> scan("clipboard", "", flush=T)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
> sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
Read 3 items
[1] "apple"      "banana"     "strawberry"
>

-- Tony Plate

At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:

Uwe and Andy's solutions are great for many applications but won't work if not all rows have the same numbers of fields. Consider for example the following modification of Lee's example:

i1-apple 10$ New_York
i2-banana
i3-strawberry 7$Japan

If I copy this to "clipboard" and run Andy's code, I get the following:

> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
        line 2 did not have 3 elements

We can get around this using "scan", then splitting things apart similar to the way Uwe described:

> dat <-
+ scan("clipboard", character(0), sep="\n")
Read 3 items
> dash <- regexpr("-", dat)
> dat2 <- substring(dat, pmax(0, dash)+1)
>
> blank <- regexpr(" ", dat2)
> if(any(blank<0))
+ blank[blank<0] <- nchar(dat2[blank<0])
> substring(dat2, 1, blank)
[1] "apple " "banana" "strawberry "

hope this helps. spencer graves

Uwe Ligges wrote:

Liaw, Andy wrote:

Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:

read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
             V1
1      i1-apple
2     i2-banana
3 i3-strawberry

... and if only the words after "-" are of interest, the statement can be followed by

sapply(strsplit(, "-"), "[", 2)

Uwe Ligges

HTH, Andy

From: j lee

Hello All,

I'd like to read first words in lines into a new file. If I have a data file the following, how can I get the first words: apple, banana, strawberry?
i1-apple 10$ New_York
i2-banana 5$London
i3-strawberry 7$Japan

Is there any similar question already posted to the list? I am a bit new to R, having a few months of experience now.

Cheers, John

--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567
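The scan() approach from the top of this thread can be tried without a clipboard by passing the lines through the `text` argument of scan(). The vector `txt` below retypes the ragged example, with the whitespace between fields assumed to be a single space.

```r
# The ragged example from the thread; field separators assumed to be spaces
txt <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$Japan")

# flush = TRUE discards everything after the first field on each line
first <- scan(text = txt, what = "", flush = TRUE, quiet = TRUE)

# strip the "i1-", "i2-", ... prefix
words <- sub("^[A-Za-z0-9]*-", "", first)
words
```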
Re: [R] make apply() return a list
for()-loops aren't so bad. Look inside the code of apply() and see what it uses! The important thing is that you use vectorized functions to manipulate vectors. It's often fine to use for-loops to manipulate the rows or columns of a matrix, but once you've extracted a row or a column, then use a vectorized function to manipulate that data.

In any case, one way to get apply() to return a list is to wrap the result from the subfunction inside a list, e.g.:

> x <- apply(matrix(1:6,2), 1, function(x) list(c(mean=mean(x), sd=sd(x))))
> x
[[1]]
[[1]][[1]]
mean   sd
   3    2

[[2]]
[[2]][[1]]
mean   sd
   4    2

> # to remove the extra level of listing here, do:
> lapply(x, "[[", 1)
[[1]]
mean   sd
   3    2

[[2]]
mean   sd
   4    2

>

At Monday 11:37 AM 11/1/2004, Arne Henningsen wrote:

Hi,

I have a dataframe (say myData) and want to get a list (say myList) that contains a matrix for each row of the dataframe myData. These matrices are calculated based on the corresponding row of myData. Using a for()-loop to do this is very slow. Thus, I tried to use apply(). However, afaik apply() does only return a list if the matrices have different dimensions, while my matrices have all the same dimension. To get a list I could change the dimension of one matrix artificially and restore it after apply().

This is a (very much) simplified example of what I did:

> myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )
> myFunction <- function( values ) {
+    myMatrix <- matrix( values, 2, 2 )
+    if( all( values == myData[ 1, ] ) ) {
+       myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
+    }
+    return( myMatrix )
+ }
> myList <- apply( myData, 1, myFunction )
> myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
> myList
$"1"
     [,1] [,2]
[1,]    1    1
[2,]    4    4

$"2"
     [,1] [,2]
[1,]    2    2
[2,]    5    5

$"3"
     [,1] [,2]
[1,]    3    3
[2,]    6    6

This exactly does what I want and really speeds up the calculation, but I wonder if there is an easier way to make apply() return a list.
Thanks for your help, Arne

--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
[EMAIL PROTECTED]
http://www.uni-kiel.de/agrarpol/ahenningsen/
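Another way to get one matrix per row, not mentioned in the thread and offered here only as a sketch, is to run lapply() over the row indices. lapply() always returns a list, so the artificial change of dimension is no longer needed. `myFunction` below is the simplified version from the question without the workaround.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Simplified per-row computation, as in the question but without
# the cbind() workaround
myFunction <- function(values) matrix(values, 2, 2)

# lapply() over row indices always yields a list, one element per row
myList <- lapply(seq_len(nrow(myData)),
                 function(i) myFunction(unlist(myData[i, ])))
names(myList) <- rownames(myData)

myList[["1"]]
length(myList)
```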
Re: [R] why should you set the mode in a vector?
It's useful when you need to be certain of the mode of a vector. One such situation is when you are about to call a C-language function using the .C() interface. As you point out, some assignments (even just to vector elements) can change the mode of the entire vector. This is why it's important to check the mode of vectors passed to external language functions immediately before the call.

As to what assigning the mode does, it specifies (or changes, if necessary) the underlying type of storage of the vector. In R, all the elements in a vector have the same storage mode. In the example below, the vector is initially stored as double-precision floats, but after the assignment of character data to element 2, the vector is stored as character data (with suitably coerced values of the other elements). After assignment of list data to element 1, the entire vector becomes a list (i.e., a vector of pointers to general objects). [The terminology I'm using here is a little loose, but someone please correct me if it is outright wrong.] Finally, the assigning of mode "numeric" to the list fails because not all elements can be coerced. (And I'm not sure why the last assignment succeeds and produces the results it does.)

> v <- vector(mode="numeric",length=4)
> v[3:4] <- 3:4
> storage.mode(v)
[1] "double"
> v[2] <- "foo"
> v
[1] "0"   "foo" "3"   "4"
> storage.mode(v)
[1] "character"
>
> v[1] <- list(1:3)
> v
[[1]]
[1] 1 2 3

[[2]]
[1] "foo"

[[3]]
[1] "3"

[[4]]
[1] "4"

> mode(v) <- "numeric"
Error in as.double.default(list(as.integer(c(1, 2, 3)), "foo", "3", "4")) :
        (list) object cannot be coerced to double
> x <- v[2:4]
> mode(x) <- "numeric"
> x
[1] NA NA NA
>

-- Tony Plate

At Friday 03:41 PM 10/29/2004, Joel Bremson wrote:

Hi all,

If I write

v = vector(mode="numeric",length=10)

I'm still allowed to assign non-numerics to v. Furthermore, R figures out what kind of vector I've got anyway when I use the mode() function. So what is it that assigning a mode does?
Joel
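The point about checking the storage mode immediately before an external call can be illustrated without any compiled code. In this sketch the .C() call itself is omitted; only the preparation step is shown.

```r
v <- vector(mode = "numeric", length = 4)
v[3:4] <- 3:4
mode.before <- storage.mode(v)        # storage is still double here

v[2] <- "foo"                         # silently turns the whole vector to character
mode.after <- storage.mode(v)

# Coerce back explicitly; "foo" cannot be converted and becomes NA
x <- suppressWarnings(as.numeric(v))

# What one would do right before handing x to .C(): pin the storage mode
storage.mode(x) <- "double"

c(mode.before, mode.after, storage.mode(x))
x
```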
Re: [R] gsub() on Matrix
Many more recent regular expression implementations have ways of indicating a match on a word boundary. It's usually "\b". Here's what you did:

> gsub("x1", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + i10 + xi1"

The following worked for me to just change "x1" to "i1", while leaving alone any larger "word" that contains "x1":

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + x10 + xx1"
>

Note that the backslash must be escaped itself to get past the R lexical analyser, which is independent of the regexp processor. What the regexp processor sees is just a single backslash.

For more on this, look for perl documentation of regular expressions. Be aware that to use full perl regexps, you must supply the perl=T argument to gsub(). Also note that "\b" seems to be part of the most basic regular expression language in R; it even works with extended=F:

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
[1] "i1 + x2 + x10 + xx1"
>

(I assumed the fact that you have a matrix of strings is not relevant.)

Hope this helps,

Tony Plate

At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:

Hi,

Suppose I've got a matrix, and the first few elements look like

"x1 + x3 + x4 + x5 + x1:x3 + x1:x4"
"x1 + x2 + x3 + x5 + x1:x2 + x1:x5"
"x1 + x3 + x4 + x5 + x1:x3 + x1:x5"

and so on (have got terms from x1 ~ x14). If I want to replace all the x1 with i7, all x2 with i14, all x3 with i13, for example. Is there an easy way?
I tried to put what I want to replace in a vector, like:

repl = c("i7", "i14", "i13", "d2", "i8", "i5", "i6", "i3", "A", "i9", "i2", "i4", "i15", "i21")

and have another vector, say:

> orig
 [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
[11] "x11" "x12" "x13" "x14"

Then I tried something like

gsub(orig, repl, mat) ## mat is the name of my matrix

but it didn't work *_*. It would replace terms like x10 with i70.

(I know it may be an easy question...but I haven't done much regular expression)

Cheers, Kevin

Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200 Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301
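Since gsub() takes a single pattern, the word-boundary idea from the reply has to be applied once per name. A loop over `orig` and `repl` is one way to do that; the two-row matrix `mat` below is a cut-down stand-in for the poster's matrix.

```r
orig <- paste0("x", 1:14)
repl <- c("i7", "i14", "i13", "d2", "i8", "i5", "i6", "i3", "A",
          "i9", "i2", "i4", "i15", "i21")

# Cut-down stand-in for the poster's matrix of model formulas
mat <- matrix(c("x1 + x3 + x4 + x5 + x1:x3 + x1:x4",
                "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"), ncol = 1)

out <- mat
for (k in seq_along(orig)) {
  # \\b on both sides keeps "x1" from matching inside "x10" .. "x14"
  out[] <- gsub(paste0("\\b", orig[k], "\\b"), repl[k], out, perl = TRUE)
}
out
```

The sequential replacement is safe here only because no replacement string looks like one of the remaining `x` names; with overlapping names, a two-pass scheme through temporary placeholders would be needed.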
Re: [R] indexing problem
Maybe this does what you want:

> dm <- cbind(1:2,11:12,101:102)
> idx <- cbind(c(1,2),c(2,3))
> row(idx)
     [,1] [,2]
[1,]    1    1
[2,]    2    2
> cbind(as.vector(row(idx)), as.vector(idx))
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    1    2
[4,]    2    3
> dm[cbind(as.vector(row(idx)), as.vector(idx))]
[1]   1  12  11 102
> array(dm[cbind(as.vector(row(idx)), as.vector(idx))], dim=dim(idx))
     [,1] [,2]
[1,]    1   11
[2,]   12  102
>

At Tuesday 12:43 PM 10/19/2004, you wrote:

ah sorry, here's an example:

> dm = cbind(1:2,11:12,101:102)
> dm
     [,1] [,2] [,3]
[1,]    1   11  101
[2,]    2   12  102
> idx=cbind(c(1,2),c(2,3))
> idx
     [,1] [,2]
[1,]    1    2
[2,]    2    3

the result I want to get:

 1  11
12 102

that is: each row of idx gives the column index in dm

diana

Sundar Dorai-Raj wrote:

[EMAIL PROTECTED] wrote:

Hi, I have the following indexing problem, can you help me please?

Given:
dm = a data.frame or a matrix dm,
idx = a 2 columns (or any number) matrix with the same number of rows as dm

I want to get a subset of dm: for each row, the columns which are specified by idx.

thank you, diana

Diana,

From what I gather it appears as if you want to split dm by all the unique rows of idx? Is that right? If so, you can do the following:

x <- split(dm, do.call("paste", as.data.frame(idx)))

This will split dm into a list with each element a subset of dm corresponding to a unique row in idx. The length of x will be the number of unique rows in idx. If this is not what you want, please provide an example and what you expect to see.

--sundar
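The matrix-indexing trick in the reply above can also be written column by column, which some may find easier to read. The helper name `pick.cols` is made up for this sketch.

```r
# For each row i of dm, pick the columns listed in row i of idx.
# 'pick.cols' is a hypothetical helper name.
pick.cols <- function(dm, idx) {
  rows <- seq_len(nrow(dm))
  # a two-column (row, col) index matrix selects one element per row
  sapply(seq_len(ncol(idx)), function(j) dm[cbind(rows, idx[, j])])
}

dm  <- cbind(1:2, 11:12, 101:102)
idx <- cbind(c(1, 2), c(2, 3))
res <- pick.cols(dm, idx)
res
```

One caveat: when `dm` has only one row, sapply() returns a plain vector rather than a matrix, so the `array(..., dim = dim(idx))` form from the reply is the safer choice in that case.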
Re: [R] read "4-jan-02" as date
Works fine when you give as.Date() a character vector. I suspect the Date column in your data frame is a factor.

> d <- c("12-Jan-01", "11-Jan-01", "10-Jan-01", "9-Jan-01", "8-Jan-01", "5-Jan-01")
> d
[1] "12-Jan-01" "11-Jan-01" "10-Jan-01" "9-Jan-01"  "8-Jan-01"  "5-Jan-01"
> as.Date(d, format="%d-%b-%y")
[1] "2001-01-12" "2001-01-11" "2001-01-10" "2001-01-09" "2001-01-08"
[6] "2001-01-05"
> as.Date(factor(d), format="%d-%b-%y")
Error in fromchar(x) : character string is not in a standard unambiguous format
>

Hope this helps,

Tony Plate

At Monday 09:04 AM 10/11/2004, bogdan romocea wrote:

Dear R users,

I have a column with dates (character) in a data frame:

12-Jan-01
11-Jan-01
10-Jan-01
9-Jan-01
8-Jan-01
5-Jan-01

and I need to convert them to (Julian) dates so that I can sort the whole data frame by date. I thought it would be very simple, but after checking the documentation and the list I still don't have something that works.

1. as.Date returns the error below. What am I doing wrong? As far as I can see the character strings are in standard format.

d$Date <- as.Date(d$Date, format="%d-%b-%y")
Error in fromchar(x) : character string is not in a standard unambiguous format

2. as.date {Survival} produces this error,

d$Date <- as.date(d$Date, order = "dmy")
Error in as.date(d$Date, order = "dmy") : Cannot coerce to date format

3. Assuming all else fails, is there a text function similar to SCAN in SAS? Given a string like "9-Jan-01" and "-" as separator, I'd like a function that can read the first, second and third values (9, Jan, 01), so that I can get Julian dates with mdy.date {survival}.

Thanks in advance, b.
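If the column really is a factor, as suspected in the reply, converting it with as.character() first sidesteps the problem, and the data frame can then be sorted by date. The data frame `d` and its `value` column are invented for this sketch, and `%b` assumes an English (or C) locale for the month abbreviations.

```r
# Hypothetical data frame with the dates stored as a factor
d <- data.frame(Date  = factor(c("12-Jan-01", "9-Jan-01", "5-Jan-01")),
                value = c(3, 2, 1))

# Convert via character; '%b' assumes English month abbreviations
d$Date <- as.Date(as.character(d$Date), format = "%d-%b-%y")

# Sort the whole data frame by date
d.sorted <- d[order(d$Date), ]
d.sorted
```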