Re: [R] Scatterplot Showing All Points
Wayne Aldo Gavioli wrote: Hello all, I'm trying to graph a scatterplot of a large (5,000 x,y coordinates) of data with the caveat that many of the data points overlap with each other (share the same x AND y coordinates). In using the usual plot command, plot(education, xlab=etc, ylab=etc) it seems that the overlap of points is not shown in the graph. Namely, there are 5,000 points that should be plotted, as I mentioned above, but because so many of the points overlap with each other exactly, only about 50-60 points are actually plotted on the graph. Thus, there's no indication that Point A shares its coordinates with 200 other pieces of data and thus is very common while Point B doesn't share its coordinates with any other pieces of data and thus isn't common at all. Is there anyway to indicate the frequency of such points on such a graph? Should I be using a different command than plot? Hi Wayne, While this is not a really pretty picture, you can get a viewable plot with count.overplot if the first two elements of education are named x and y and they are the coordinates you want to plot. Otherwise, pass the x and y coordinates separately. library(plotrix) count.overplot(education, tol=c(diff(range(education$x))/10, diff(range(education$y))/10)) Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
Wayne Aldo Gavioli wgavioli at fas.harvard.edu writes: Hello all, I'm trying to graph a scatterplot of a large (5,000 x,y coordinates) of data with the caveat that many of the data points overlap with each other (share the same x AND y coordinates). In using the usual plot command, plot(education, xlab=etc, ylab=etc) it seems that the overlap of points is not shown in the graph. Namely, there are 5,000 points that should be plotted, as I mentioned above, but because so many of the points overlap with each other exactly, only about 50-60 points are actually plotted on the graph. Thus, there's no indication that Point A shares its coordinates with 200 other pieces of data and thus is very common while Point B doesn't share its coordinates with any other pieces of data and thus isn't common at all. Is there anyway to indicate the frequency of such points on such a graph? Should I be using a different command than plot? One suggestion seems to be still missing: 'sunflowerplot' of base R. May look taggy, though, if you have 200 petals. Actually the documentation of sunflowerplot is wrong in botanical sense. Sunflowers have composite flowers in capitula, and the things called 'petals' in documentation are ligulate, sterile ray-florets (each with vestigial petals which are not easily visible in sunflower, but in some other species you may see three (occasionally two) teeth). cheers, jari oksanen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bar plot colors
Winkel, David wrote: All, I have a question regarding colors in bar plots. I want to stack a total of 18 cost values in each bar. Basically, it is six cost types and each cost type has three components- direct, indirect, and induced costs. I would like to use both solid color bars and bars with the slanted lines (using the density parameter). The colors would distinguish cost types and the lines would distinguish direct/indirect/induced. I want the cost types (i.e. colors) to be stacked together for each cost type. In other words, I don't want all of the solid bars at the bottom and all of the slanted lines at the top. So far, I have made a bar plot with all solid colors and then tried to overwrite that bar plot by calling barplot() again and putting the white slanted lines across the bars. However, I can't get this method to work while still grouping the cost types together. Hi David, This is a real challenge: heights-matrix(sample(10:70,54),ncol=3) bar.colors-rep(rep(2:7,each=3),3) bar.densities-rep(10,54) bar.angles-matrix(rep(rep(c(45,90,135),6),3),ncol=3) barplot(heights,col=bar.colors) barplot(heights,angle=bar.angles,add=TRUE,density=bar.densities) Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot-How to define fill colours?
Dear R's (most likely Hadley), I want to build a stacked bar plot where I would like to define which colours will be used for each of the groups. However, I do not seem to find a way to do this, even if I've been looking over many places. I have tried several variations, and my final try was this code, but I still do not manage to get the colours as I pre-define. Any hints about how to get this? Thanks in advance, Pedro my code: plotdata1-data.frame(x=rep(factor(1:4),4), y=rep(0.1*(1:4),4), +group=as.character(rep(c('white', 'red', 'blue', 'green'),rep(4,4 plot0-ggplot() plot3-plot0+layer(data=plotdata1, mapping=aes_string(x='x',y='y', +fill='group'),geom='bar', stat='identity', position='stack') print(plot3) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] integration
Dear All, I need to perform a numerical integration of one dimensional fucntions. The extrems of integration are both finite and the functions I'm working on are quite complicated. I have already tried both area() and integrate(), but they do not perform well: area() is very slow and integrate() does not converge. Are in R other functions for numerical integration of one dimentional functions? Thanks in advance Davide Tiscali.Fax: il tuo fax online in promo fino al 31 dicembre, paghi 15€ e ricarichi 20€ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] hazard ratio of interaction Cox model
Dear Forum, I have a question about interaction estimate in the Cox model: why the hazard ratio of the interaction is not produced in the summary of the model? (Instead, the estimate of the coefficient is given in the print of the model.) # Example: modINT -cph( Surv(T_BASE, T_FIN,STATUS)~ NYHA + ASINI + RFP + FE_REC + XX_PR*XX_DISF) print(modINT) coef se(coef) zp NYHA=2 1.2540.584 2.15 0.031767 ASINI 0.6650.409 1.62 0.104247 RFP=20.7250.704 1.03 0.302578 FE_REC=2-1.6370.810 -2.02 0.043331 XX_PR2.1890.649 3.37 0.000748 XX_DISF 3.2331.000 3.23 0.001222 XX_PR * XX_DISF -2.8521.280 -2.23 0.025852 summary(modINT) Effects Response : Surv(T_BASE, T_FIN, STATUS) Factor LowHigh Diff. Effect S.E. Lower 0.95 Upper 0.95 ASINI 2.0725 2.85 0.7775 0.52 0.32 -0.111.14 Hazard Ratio 2.0725 2.85 0.7775 1.68NA 0.903.13 XX_PR 0. 1.00 1. 2.19 0.65 0.923.46 Hazard Ratio 0. 1.00 1. 8.92NA 2.50 31.86 XX_DISF0. 1.00 1. 3.23 1.00 1.275.19 Hazard Ratio 0. 1.00 1. 25.35NA 3.57 179.88 NYHA - 2:1 1. 2.00 NA 1.25 0.58 0.112.40 Hazard Ratio 1. 2.00 NA 3.50NA 1.12 11.00 RFP - 2:1 1. 2.00 NA 0.73 0.70 -0.652.10 Hazard Ratio 1. 2.00 NA 2.07NA 0.528.20 FE_REC - 2:1 1. 2.00 NA -1.64 0.81 -3.23 -0.05 Hazard Ratio 1. 2.00 NA 0.19NA 0.040.95 Adjusted to: XX_PR=0 XX_DISF=0 Be a better friend, newshound, and [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-users
Kunio takezawa wrote: R-users E-mail: r-help@r-project.org I have a quenstion on gam() in gam package. The help of gam() says: 'gam' uses the _backfitting algorithm_ to combine different smoothing or fitting methods. On the other hand, lm.wfit(), which is a routine of gam.fit() contains: z - .Fortran(dqrls, qr = x * wts, n = n, p = p, y = y * wts, ny = ny, tol = as.double(tol), coefficients = mat.or.vec(p, ny), residuals = y, effects = mat.or.vec(n, ny), rank = integer(1), pivot = 1:p, qraux = double(p), work = double(2 * p), PACKAGE = base) It may indicate that QR decomposition is used to derive an additive model instead of backfitting. I am wondering if my guess is correct, or this the _backfitting algorithm has another meaning. Please don't ask the same question multiple times! And no, backfitting and QR are unrelated concepts. You need to read up on the theory, there are two fundamental books: Hastie Tibshirani (gam package) and Simon Wood (mgcv package). Both are a bit much to ask to have summarized in email. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] axis names in triangle.plot
Hi folks, I am using a data.frame with sediment grain sizes: grain sand silt clay OAT 10.03 56.77 18.25 OAT 10.40 57.40 17.94 WG1 50.03 20.68 12.57 WG1 43.20 25.69 13.41 WG1 33.89 31.10 14.48 WG2 2.84 62.81 20.79 WG2 2.79 60.46 19.16 WG2 16.27 33.04 6.48 WG2 1.39 57.90 9.13 WG3 4.54 52.91 17.20 WG3 5.20 50.55 15.65 WG3 7.71 49.13 10.80 WG3 4.43 50.03 11.83 WG3 1.72 57.53 14.20 WG3 1.51 58.99 13.96 I would like to do a trinagle.plot with labeled axis-names sand silt and clay. However using the command: tringle.plot(grain) from ade4-apckage) does not plot the axis names and there is no paramter to set the labels like xlab in the plot command. Does anybody has a good advice? Thanks in advance Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.table() and precision?
On Mon, 17 Dec 2007, Moshe Olshansky wrote: Dear List, Following the below question I have a question of my own: Suppose that I have large matrices which are produced sequentially and must be used sequentially in the reverse order. I do not have enough memory to store them and so I would like to write them to disk and then read them. This raises two questions: 1) what is the fastest (and the most economic space-wise) way to do this? Using save/load is the simplest. Don't worry about finding better solutions until you know those are not good enough. (serialize / unserialize is another interface to the same underlying idea.) 2) functions like write, write.table, etc. write the data the way it is printed and this may result in a loss of accuracy. Is there any way to prevent this, except for setting the digits option to a higher value or using format prior to writing the data? Do please read the help before making false claims. ?write.table says Real and complex numbers are written to the maximal possible precision. OTOH, ?write says it is a wrapper for cat, whose help says 'cat' converts numeric/complex elements in the same way as 'print' (and not in the same way as 'as.character' which is used by the S equivalent), so 'options' 'digits' and 'scipen' are relevant. However, it uses the minimum field width necessary for each element, rather than the same field width for all elements. so this hints as.character() might be a useful preprocessor. Is it possible to write binary files (similar to Fortran)? See ?writeBin. save/load by default write binary files, but use of writeBin can be faster (and less flexible). Any suggestion will be greatly appreciated. Somehow you have missed a great deal of information about R I/O. Try help.start() and reading the sections the search engine shows you that look relevant. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Res: Scatterplot Showing All Points
?jitter is simpler: x-rep(1:10,10) y-x plot(x,y) #100 points, only 10 show plot(jitter(x),jitter(y)) #overlap removed. Milton Cezar Ribeiro [EMAIL PROTECTED] 18/12/2007 04:36 Hi Wayne, I have two suggestion to you. 1. You add some random noise on both x and y data or 2. You graph bubble points, where the size is proportional to the frequence of the xy combination. x-sample(1:10,1,replace=T) y-sample(1:10,1,replace=T) xy-cbind(x,y) x11(1400,800) par(mfrow=c(1,3)) plot(xy) xy.random-xy+rnorm(2,0.1,0.1) plot(xy.random,cex=0.1) xy.tab-data.frame(table(x,y)) xy.tab$x-as.numeric(as.character(xy.tab$x)) xy.tab$y-as.numeric(as.character(xy.tab$y)) min.freq-min(xy.tab$Freq) max.freq-max(xy.tab$Freq) plot(xy.tab$x,xy.tab$y,cex=(xy.tab$Freq-min.freq)/(max.freq-min.freq)*5) Kind regards, Miltinho Brazil - Mensagem original De: Wayne Aldo Gavioli [EMAIL PROTECTED] Para: r-help@r-project.org Enviadas: Segunda-feira, 17 de Dezembro de 2007 22:14:23 Assunto: [R] Scatterplot Showing All Points Hello all, I'm trying to graph a scatterplot of a large (5,000 x,y coordinates) of data with the caveat that many of the data points overlap with each other (share the same x AND y coordinates). In using the usual plot command, plot(education, xlab=etc, ylab=etc) it seems that the overlap of points is not shown in the graph. Namely, there are 5,000 points that should be plotted, as I mentioned above, but because so many of the points overlap with each other exactly, only about 50-60 points are actually plotted on the graph. Thus, there's no indication that Point A shares its coordinates with 200 other pieces of data and thus is very common while Point B doesn't share its coordinates with any other pieces of data and thus isn't common at all. Is there anyway to indicate the frequency of such points on such a graph? Should I be using a different command than plot? Thanks, Wayne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. para armazenamento! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] axis names in triangle.plot
Hi Thomas, This looks quite strange. By default, the function use the column names as labels. What is grain ? A data.frame ? Why have you duplicated row.names? Please return the results of class(grain) and names(grain) Cheers, PS: If you have questions related to ade4, you can use the adelist (http://listes.univ-lyon1.fr/wws/info/adelist) Thomas Hoffmann wrote: Hi folks, I am using a data.frame with sediment grain sizes: grain sand silt clay OAT 10.03 56.77 18.25 OAT 10.40 57.40 17.94 WG1 50.03 20.68 12.57 WG1 43.20 25.69 13.41 WG1 33.89 31.10 14.48 WG2 2.84 62.81 20.79 WG2 2.79 60.46 19.16 WG2 16.27 33.04 6.48 WG2 1.39 57.90 9.13 WG3 4.54 52.91 17.20 WG3 5.20 50.55 15.65 WG3 7.71 49.13 10.80 WG3 4.43 50.03 11.83 WG3 1.72 57.53 14.20 WG3 1.51 58.99 13.96 I would like to do a trinagle.plot with labeled axis-names sand silt and clay. However using the command: tringle.plot(grain) from ade4-apckage) does not plot the axis names and there is no paramter to set the labels like xlab in the plot command. Does anybody has a good advice? Thanks in advance Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Stéphane DRAY ([EMAIL PROTECTED] ) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I 43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88 http://biomserv.univ-lyon1.fr/~dray/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dual Core vs Quad Core
Hiding in the windows faq is the observation that R's computation is single-threaded, and so it cannot use more than one CPU. So multi-core should make no difference other than allowing R to run with less interruption from other tasks. That is often a significant advantage, though. Andrew Perrin [EMAIL PROTECTED] 18/12/2007 01:13 On Mon, 17 Dec 2007, Kitty Lee wrote: Dear R-users, I use R to run spatial stuff and it takes up a lot of ram. Runs can take hours or days. I am thinking of getting a new desktop. Can R take advantage of the dual-core system? I have a dual-core computer at work. But it seems that right now R is using only one processor. The new computers feature quad core with 3GB of RAM. Can R take advantage of the 4 chips? Or am I better off getting a dual core with faster processing speed per chip? Thanks! Any advice would be really appreciated! K. If I have my information right, R will use dual- or quad-cores if it's doing two (or four) things at once. The second core will help a little bit insofar as whatever else your machine is doing won't interfere with the one core on which it's running, but generally things that take a single thread will remain on a single core. As for RAM, if you're doing memory-bound work you should certainly be using a 64-bit machine and OS so you can utilize the larger memory space. -- Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu Associate Professor of Sociology; Book Review Editor, _Social Forces_ University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gene shaving method
Gene shaving is implemented in the GeneClust package for R which you can download from http://odin.mdacc.tmc.edu/~kim/geneclust/ For more details see The Analysis of Gene Expression Data: Methods and Software edited by Giovanni Parmiagiani, Elizabeth S. Garett, Rafael A. Irizarry and Scott L. Zeger (you can get a preview on Google Book Search http://books.google.com/books?id=r9gROQvdelcCpg=PA352lpg=PA352dq=gene clustsource=webots=3FO3jQlfrpsig=BeOgUK2cgfuv7d12vWsvpRXOkBU#PPA353,M 1) Best wishes, Heather -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Aimin Yan Sent: 17 December 2007 19:51 To: r-help@r-project.org Subject: [R] gene shaving method Does anyone know if Hastie's gene shaving method is implemented in R Thanks, Aimin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] accessing dimension names
I have a matrix y: dimnames(y) $x93 [1] 1 2 $x94 [1] 0 1 2 .. so on (there are other dimensions as well) I need to access a particular dimension, but a random mechanism tells me which dimension it would. So, sometimes I might need to access dimnames(y)$x93, some other time it would be dimnames(y)$x94.. and so on. Now let that random dimension be idx, then dimnames(y)$paste('x',idx,sep='') doesn't work. Can anyone help? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Two repeated warnings when running gam(mgcv) to analyse my dataset?
The model here is just a penalised GLM, and the warnings relate to the GLM fitting process. Fitted probabilities of 0 or 1 can be perfectly appropriate, but do indicate that the linear predictor is not really uniquely defined, and that some care may be needed in interpreting results (for example, if the fitted probabilities are zero or one, then a CI for the corresponding linear predictor will depend more on the prior assumptions about smoothness than anything else). This problem is not really GAM specific, it relates to any `logistic regression' model. Similarly, the GLM fitting IRLS iterations are not guaranteed to converge, and can fail, especially for overly flexible logistic regression models. Try this, for example x - 1:10 y - c(0,0,0,0,0,1,1,1,1,1) glm(y~x,family=binomial) I get... ... Warning messages: 1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, : algorithm did not converge 2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, : fitted probabilities numerically 0 or 1 occurred ...as models become more complex the scope for this sort of thing to happen increases, and some simplification may be appropriate. That said, mgcv::gam fitting with all smoothing parameters fixed, is slightly more likely to fail in this way than `glm' or `mgcv::gam' with some smoothing parameters estimated, because of the steps taken to stabilise divergent fit iterations. When all smoothing parameters are fixed, mgcv uses older fitting routines that don't try as hard to stabilise a divergent fit as the newer fitting routines. This is a bit of an anomaly and I'll try and fix it for a future release. best, Simon On Monday 17 December 2007 11:53, zhijie zhang wrote: Dear Simon, Sorry for an incomplete listing of the question. #mgcv version is 1.3-29, R 2.6.1, windows XP #m.gam-gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+ disbinary,family=binomial(logit),data=point) The above program's the core codes in my following loop programs. It works well if i run the above codes only one time for my dataset, but warnings will occur if i run many times for the following loop. while (j1001) { + index=sample(ID, replace=F) + m.data$x=coords[index,]$x + m.data$y=coords[index,]$y + # For each permutation, we run the GAM using the optimal span for the above model m.gam + s.gam -gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+disbin ary,,sp=c( 5.582647e-07,4.016504e-02,2.300424e-04,1.274065e+03,9.558236e-09, 1.868827e-08),family=binomial(logit),data=m.data) + permresults[,i]=predict.gam(s.gam) + i=i+1 + if (j%%100==0) print(i) + j=j+1 + } [1] 101 [1] 201 [1] 301 [1] 401 [1] 501 [1] 601 [1] 701 [1] 801 [1] 901 [1] 1001 warnings() over 50 warnings() 1: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... : fitted probabilities numerically 0 or 1 occurred .. 14: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... Algorithm did not converge .. On Dec 17, 2007 4:54 PM, Simon Wood [EMAIL PROTECTED] wrote: What mgcv version are you running (and on what platform)? n Thursday 13 December 2007 17:46, zhijie zhang wrote: Dear all, I run the GAMs (generalized additive models) in gam(mgcv) using the following codes. m.gam -gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+disb in ary,family=binomial(logit),data=point) And two repeated warnings appeared. Warnings: 1: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... : Algorithm did not converge 2: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... : fitted probabilities numerically 0 or 1 occurred Q1: For warning1, could it be solved by changing the value of mgcv.toloptions for gam.control(mgcv.tol=1e-7)? Q1: For warning2, is there any impact for the results if the fitted probabilities numerically 0 or 1 occurred ? How can i solve it? I didn't try the possible solutions for them, because it took such a longer time to run the whole programs. Could anybody suggest their solutions? Any help or suggestions are greatly appreciated. Thanks. -- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283 -- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] axis names in triangle.plot
Thanks for this advice. grain was not a data.frame but a matrix. Now it works:-) Cheers Thomas Stéphane Dray schrieb: Hi Thomas, This looks quite strange. By default, the function use the column names as labels. What is grain ? A data.frame ? Why have you duplicated row.names? Please return the results of class(grain) and names(grain) Cheers, PS: If you have questions related to ade4, you can use the adelist (http://listes.univ-lyon1.fr/wws/info/adelist) Thomas Hoffmann wrote: Hi folks, I am using a data.frame with sediment grain sizes: grain sand silt clay OAT 10.03 56.77 18.25 OAT 10.40 57.40 17.94 WG1 50.03 20.68 12.57 WG1 43.20 25.69 13.41 WG1 33.89 31.10 14.48 WG2 2.84 62.81 20.79 WG2 2.79 60.46 19.16 WG2 16.27 33.04 6.48 WG2 1.39 57.90 9.13 WG3 4.54 52.91 17.20 WG3 5.20 50.55 15.65 WG3 7.71 49.13 10.80 WG3 4.43 50.03 11.83 WG3 1.72 57.53 14.20 WG3 1.51 58.99 13.96 I would like to do a trinagle.plot with labeled axis-names sand silt and clay. However using the command: tringle.plot(grain) from ade4-apckage) does not plot the axis names and there is no paramter to set the labels like xlab in the plot command. Does anybody has a good advice? Thanks in advance Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reshape Dataframe
Hi, I'm having a bit of problems in creating a new dataframe. Below you'll find a description of the current dataframe and of the dataframe that needs to be created. Can someone help me out on this one? Thx in advance. Bert Current Dataframe Var1Var2Var3Var4 A Fa W1 1 A Si W1 2 A Fa W2 3 A Si W3 4 B Si W1 5 C La W2 6 C Do W4 7 New Dataframe Var1Var2W1 W2 W3 W4 A Fa 1 3 A Si 2 4 A La A Do B Fa B Si 5 B La B Do C Fa C Si C La 6 C Do 7 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reshape Dataframe
On 12/18/07, Bert Jacobs [EMAIL PROTECTED] wrote: Hi, I'm having a bit of problems in creating a new dataframe. Below you'll find a description of the current dataframe and of the dataframe that needs to be created. Can someone help me out on this one? library(reshape) dfm - melt(df, id = 1:3) cast(dfm, ... ~ Var3) You can find out more about the reshape package at http://had.co.nz/reshape Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2 - getting at the grobs
Dear All, I continue trying to get several of my plotting functions to use ggplot, because I really do like the concept of the graphical objects, and working with them in the abstract. I am now trying to access the grobs to manipulate using grid. However, until now all I managed was to get the plot as a gTree object, and manipulate it as a gTree from there. The problem is that then it is no longer a ggplot, and thus I can no longer use the ggplot functions. How to get at the grobs, without converting the ggplot into a gTree? Thanks, Pedro __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] integration
Hi Davide, It is difficult to say what the problem is without knowing more about the nature of the integrand. So, you should do a couple of preliminary things before attempting compute the integral. First, is the integral is finite? You should establish this. Second, plot the integrand over the entire interval. Then you need to think about the following: Is the integrand unimodal, with the mass concentrated over a small region? Or is it multimodal? Does it have thick tail? Assuming that the integral is finite, you could try a few things: 1. Divide the interval of integration into several small intervals (say, 10 or 100), and then use integrate() on each and then add up the results. You can make this process more efficient if you know where the mass is concentrated. 2. Transform the integrand. 3. Try a simple trapezoidal rule quadrature. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, December 17, 2007 11:03 AM To: r-help@r-project.org Subject: [R] integration Dear All, I need to perform a numerical integration of one dimensional fucntions. The extrems of integration are both finite and the functions I'm working on are quite complicated. I have already tried both area() and integrate(), but they do not perform well: area() is very slow and integrate() does not converge. Are in R other functions for numerical integration of one dimentional functions? Thanks in advance Davide Tiscali.Fax: il tuo fax online in promo fino al 31 dicembre, paghi 15€ e ricarichi 20€ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reshape Dataframe
On Dec 18, 2007 9:07 AM, Bert Jacobs [EMAIL PROTECTED] wrote: Hi, I'm having a bit of problems in creating a new dataframe. Below you'll find a description of the current dataframe and of the dataframe that needs to be created. Can someone help me out on this one? Thx in advance. Bert Current Dataframe Var1Var2Var3Var4 A Fa W1 1 A Si W1 2 A Fa W2 3 A Si W3 4 B Si W1 5 C La W2 6 C Do W4 7 New Dataframe Var1Var2W1 W2 W3 W4 A Fa 1 3 A Si 2 4 A La A Do B Fa B Si 5 B La B Do C Fa C Si C La 6 C Do 7 Try this: out - ftable(xtabs(Var4 ~ Var1 + Var2 + Var3, DF)) out[out == 0] - NA Omit the last line is 0 fill is what you had wanted. This will do it except that it will eliminate all rows without data: out2 - reshape(DF, dir = wide, timevar = Var3, idvar = c(Var1, Var2)) out2[is.na(out2)] - 0 Omit the last line if NA fill is what you wanted. The reshape package melt/cast routines (see Hadley's solution in this thread) can be used to give a similar result to the reshape command above (i.e. all missing rows are not included) except that cast is a bit more flexible since it has a fill= argument. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
Duncan Murdoch wrote: On 18/12/2007 7:31 AM, Antony Unwin wrote: Wayne, Try the iplot command in iPlots. You can then vary both the pointsize and the transparency of your scatterplot interactively and decide which scatterplot conveys the information best. Sometimes it's helpful to use more than one scatterplot when presenting your results. (I must admit to being very surprised that jittering and sunflower plots have been suggested for a dataset of 5000 points. Do those who mentioned these methods have examples on that scale where they are effective?) Sure. The original post said there were about 50-60 unique locations. This plot: x - rbinom(5000, 20, 0.15) y - rbinom(5000, 20, 0.15) plot(x,y) has a few more unique locations; tune those probabilities if you want it closer. Due to the overlap, the distribution is very unclear. But this plot plot(jitter(x), jitter(y)) Another alternative is smoothscatter() in the geneplotter package from Bioconductor, which does a pretty reasonable job with these example data. Best, Jim makes the distribution quite clear. I wouldn't use the default pch if I had 5 points, but with pch=., it's not so bad even in that case. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reshape Dataframe
On Dec 18, 2007 9:54 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote: On Dec 18, 2007 9:07 AM, Bert Jacobs [EMAIL PROTECTED] wrote: Hi, I'm having a bit of problems in creating a new dataframe. Below you'll find a description of the current dataframe and of the dataframe that needs to be created. Can someone help me out on this one? Thx in advance. Bert Current Dataframe Var1Var2Var3Var4 A Fa W1 1 A Si W1 2 A Fa W2 3 A Si W3 4 B Si W1 5 C La W2 6 C Do W4 7 New Dataframe Var1Var2W1 W2 W3 W4 A Fa 1 3 A Si 2 4 A La A Do B Fa B Si 5 B La B Do C Fa C Si C La 6 C Do 7 Try this: out - ftable(xtabs(Var4 ~ Var1 + Var2 + Var3, DF)) out[out == 0] - NA Omit the last line is 0 fill is what you had wanted. This will do it except that it will eliminate all rows without data: out2 - reshape(DF, dir = wide, timevar = Var3, idvar = c(Var1, Var2)) out2[is.na(out2)] - 0 Omit the last line if NA fill is what you wanted. The reshape package melt/cast routines (see Hadley's solution in this thread) can be used to give a similar result to the reshape command above (i.e. all missing rows are not included) except that cast is a bit more flexible since it has a fill= argument. Just one correction. The cast function in reshape has an add.missing= argument that can control this so actually any of the solutions could be obtained with cast using the fill= and add.missing= arguments to control which one you want. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 18 Dec 2007, at 2:42 pm, Duncan Murdoch wrote: (I must admit to being very surprised that jittering and sunflower plots have been suggested for a dataset of 5000 points. Do those who mentioned these methods have examples on that scale where they are effective?) Sure. The original post said there were about 50-60 unique locations. This plot: x - rbinom(5000, 20, 0.15) y - rbinom(5000, 20, 0.15) plot(x,y) has a few more unique locations; tune those probabilities if you want it closer. Due to the overlap, the distribution is very unclear. But this plot plot(jitter(x), jitter(y)) makes the distribution quite clear. No it doesn't! It makes it moderately clearer than the plot without jittering. One good alternative here is the fluctuation diagram variant of a mosaic plot: xx-as.factor(x) yy-as.factor(y) imosaic(xx,yy, type=f) Using jittering for categorical data is really not to be recommended and will certainly degrade in performance as the dataset gets bigger. Antony Unwin Professor of Computer-Oriented Statistics and Data Analysis, University of Augsburg, Germany [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 18/12/2007 10:01 AM, Antony Unwin wrote: On 18 Dec 2007, at 2:42 pm, Duncan Murdoch wrote: (I must admit to being very surprised that jittering and sunflower plots have been suggested for a dataset of 5000 points. Do those who mentioned these methods have examples on that scale where they are effective?) Sure. The original post said there were about 50-60 unique locations. This plot: x - rbinom(5000, 20, 0.15) y - rbinom(5000, 20, 0.15) plot(x,y) has a few more unique locations; tune those probabilities if you want it closer. Due to the overlap, the distribution is very unclear. But this plot plot(jitter(x), jitter(y)) makes the distribution quite clear. No it doesn't! It makes it moderately clearer than the plot without jittering. One good alternative here is the fluctuation diagram variant of a mosaic plot: xx-as.factor(x) yy-as.factor(y) imosaic(xx,yy, type=f) That plot is better than jittering, but there's the problem in the mosaic plot of understanding the scale of the rectangles: is it area or diameter that encodes the count? With a jittered plot, you lose resolution when the number of points gets too high because you just see a mess of ink, but at least you only require the viewer to count in order to get a close numerical reading from the plot. I could also claim that while imperfect, at least jittering is widely applicable. For example, if the data were not on a regular grid, perhaps because they had been generated like this: xloc - rnorm(50) yloc - rnorm(50) index - sample(1:50, 5000, rep=TRUE, prob = abs(xloc)) x - xloc[index] y - yloc[index] then jittering still works as well (or as poorly), but the imosaic would not work at all. There are better plots than jittering available, but jittering is easy. (Actually, with this dataset, plot(jitter(x), jitter(y)) is really poor, because jitter() chooses a bad amount of jittering. But with manual tuning (e.g. plot(jitter(x, a=0.1), jitter(y, a=0.1), pch=.)) it's not too bad. So I'd say jittering worked, but the R implementation of it may need improvement). Using jittering for categorical data is really not to be recommended and will certainly degrade in performance as the dataset gets bigger. Yes, I probably wouldn't recommend jittering if there were more than a few hundred replications at any point, or more than a few hundred unique points. Duncan Murdoch P.S. iplots 1.1-1 may have an init problem in Windows: in my first attempt, the plot made the boxes too large to fit in their cells, but it fixed itself when I resized the window, and the bug doesn't seem to be repeatable. Antony Unwin Professor of Computer-Oriented Statistics and Data Analysis, University of Augsburg, Germany __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 18/12/2007 10:02 AM, James W. MacDonald wrote: Duncan Murdoch wrote: On 18/12/2007 7:31 AM, Antony Unwin wrote: Wayne, Try the iplot command in iPlots. You can then vary both the pointsize and the transparency of your scatterplot interactively and decide which scatterplot conveys the information best. Sometimes it's helpful to use more than one scatterplot when presenting your results. (I must admit to being very surprised that jittering and sunflower plots have been suggested for a dataset of 5000 points. Do those who mentioned these methods have examples on that scale where they are effective?) Sure. The original post said there were about 50-60 unique locations. This plot: x - rbinom(5000, 20, 0.15) y - rbinom(5000, 20, 0.15) plot(x,y) has a few more unique locations; tune those probabilities if you want it closer. Due to the overlap, the distribution is very unclear. But this plot plot(jitter(x), jitter(y)) Another alternative is smoothscatter() in the geneplotter package from Bioconductor, which does a pretty reasonable job with these example data. Yes, I agree. (As an aside, there's actually a capital S in smoothScatter(), and it's a bit of a pain to install, because geneplotter depends on something that depends on DBI, which is not so easily available these days.) Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
Duncan Murdoch wrote: Yes, I agree. (As an aside, there's actually a capital S in smoothScatter(), and it's a bit of a pain to install, because geneplotter depends on something that depends on DBI, which is not so easily available these days.) Somehow I always forget the capital S and wonder if I have loaded the correct package ;-D As for installing the required dependencies, I believe this is actually quite straightforward: source(http://www.bioconductor.org/biocLite.R;) biocLite(geneplotter) Should install geneplotter and all required dependencies. Best, Jim Duncan Murdoch -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bar plot colors
I think you're going to find that barchart with that many values in a bar is going to be pretty well uninterpretable. Jim Lemon gives the desired barchart but it is very difficult to read. Stealing his code to create the same matrix I'd suggest may be looking at a dotchart. I'm not sure if this is even close to an optimal solution but I do think it's a bit better than a barchart approach == heights-matrix(sample(10:70,54),ncol=3) bar.colors-rep(rep(2:7,each=3),3) cost.types - c(Direct, Indirec, Induced) colnames(heights) - c(A, B, C) rownames(heights) - c(rep(cost.types, 6)) dotchart(heights, col=bar.colors, pch=16, cex=.6) === --- Winkel, David [EMAIL PROTECTED] wrote: All, I have a question regarding colors in bar plots. I want to stack a total of 18 cost values in each bar. Basically, it is six cost types and each cost type has three components- direct, indirect, and induced costs. I would like to use both solid color bars and bars with the slanted lines (using the density parameter). The colors would distinguish cost types and the lines would distinguish direct/indirect/induced. I want the cost types (i.e. colors) to be stacked together for each cost type. In other words, I don't want all of the solid bars at the bottom and all of the slanted lines at the top. So far, I have made a bar plot with all solid colors and then tried to overwrite that bar plot by calling barplot() again and putting the white slanted lines across the bars. However, I can't get this method to work while still grouping the cost types together. Thanks in advance for any help you can provide. David Winkel Applied Biology and Aerosol Technology Battelle Memorial Institute 505 King Ave. Columbus, Ohio 43201 614.424.3513 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 18 Dec 2007, at 4:49 pm, Duncan Murdoch wrote: One good alternative here is the fluctuation diagram variant of a mosaic plot: xx-as.factor(x) yy-as.factor(y) imosaic(xx,yy, type=f) That plot is better than jittering, but there's the problem in the mosaic plot of understanding the scale of the rectangles: is it area or diameter that encodes the count? Area is used. With a jittered plot, you lose resolution when the number of points gets too high because you just see a mess of ink, but at least you only require the viewer to count in order to get a close numerical reading from the plot. If someone needs a count, they should be given a table. Graphics are for qualitative conclusions not details. Anyway, counting will only work for really small datasets. I could also claim that while imperfect, at least jittering is widely applicable. For example, if the data were not on a regular grid, perhaps because they had been generated like this: xloc - rnorm(50) yloc - rnorm(50) index - sample(1:50, 5000, rep=TRUE, prob = abs(xloc)) x - xloc[index] y - yloc[index] then jittering still works as well (or as poorly), but the imosaic would not work at all. That's right and that's (almost) the sort of example I was thinking of. For a limited number of locations like this a bubble plot would be best (which has already been suggested in this thread, I think). For many locations and few replications I would still go for varying pointsize and transparency. Incidentally, to check your suggestion I ran your code and discovered that the transparency in iplot does not seem to like replications. Very strange, we'll have to check why. I then looked closely at the numbers of replications generated and discovered that case 25 was picked 325 times and case 40 only once. Rather too extreme for my liking! Running it again gave very similar results, though not exactly the same: this time it was 325 times for case 25 and case 40 was not picked at all. Other numbers varied slightly. This is not what I expected, any ideas? P.S. iplots 1.1-1 may have an init problem in Windows: in my first attempt, the plot made the boxes too large to fit in their cells, but it fixed itself when I resized the window, and the bug doesn't seem to be repeatable. Thanks. This happens occasionally on the Mac too. Refreshing solves it in practice, but we need to find out why it can happen (and stop it happening!). Antony Unwin Professor of Computer-Oriented Statistics and Data Analysis, University of Augsburg, Germany [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating the number of days from dates
Sorry for using library instead package, but library() is one command for using packages. ... which is why all efforts to make folks say package instead of library are doomed to fail, IMHO. Besides, in English, library also means a collection of software or data usually reflecting a specific theme or application (#9 on the list from http://dictionary.reference.com/ ). Therefore: library == package [1] TRUE! and just about the only way to clear up the confusion would be to rename library() to package(), and replace library with folder or directory. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Knut Krueger Sent: Monday, December 17, 2007 2:11 AM To: 'R R-help' Subject: Re: [R] calculating the number of days from dates it's a package , not a library, please! Sorry for using library instead package, but library() is one command for using packages. Therefore I (and it seems that i am not the only one) used library instead package. Knut __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sweave and Scientific Workplace
Dear HelpeRs, a colleague of mine uses Scientific Workplace to write his LaTeX documents. I made his mouth water mentioning the advantages of using Sweave. Not using SW myself I wonder if anyone out there has gathered some experiences in using the combination of both. Thank you in advance Dietrich -- Dietrich Trenkler c/o Universitaet Osnabrueck Rolandstr. 8; D-49069 Osnabrueck, Germany email: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 12/18/2007 11:21 AM, James W. MacDonald wrote: Duncan Murdoch wrote: Yes, I agree. (As an aside, there's actually a capital S in smoothScatter(), and it's a bit of a pain to install, because geneplotter depends on something that depends on DBI, which is not so easily available these days.) Somehow I always forget the capital S and wonder if I have loaded the correct package ;-D As for installing the required dependencies, I believe this is actually quite straightforward: source(http://www.bioconductor.org/biocLite.R;) biocLite(geneplotter) Should install geneplotter and all required dependencies. Yes, that works. Not sure why DBI was unavailable for a simple install of geneplotter from the Windows Rgui; when I try it now (on a different PC, maybe using a different mirror) it's there. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
On 12/18/2007 12:44 PM, Antony Unwin wrote: On 18 Dec 2007, at 4:49 pm, Duncan Murdoch wrote: One good alternative here is the fluctuation diagram variant of a mosaic plot: xx-as.factor(x) yy-as.factor(y) imosaic(xx,yy, type=f) That plot is better than jittering, but there's the problem in the mosaic plot of understanding the scale of the rectangles: is it area or diameter that encodes the count? Area is used. With a jittered plot, you lose resolution when the number of points gets too high because you just see a mess of ink, but at least you only require the viewer to count in order to get a close numerical reading from the plot. If someone needs a count, they should be given a table. Graphics are for qualitative conclusions not details. Anyway, counting will only work for really small datasets. I could also claim that while imperfect, at least jittering is widely applicable. For example, if the data were not on a regular grid, perhaps because they had been generated like this: xloc - rnorm(50) yloc - rnorm(50) index - sample(1:50, 5000, rep=TRUE, prob = abs(xloc)) x - xloc[index] y - yloc[index] then jittering still works as well (or as poorly), but the imosaic would not work at all. That's right and that's (almost) the sort of example I was thinking of. For a limited number of locations like this a bubble plot would be best (which has already been suggested in this thread, I think). For many locations and few replications I would still go for varying pointsize and transparency. Incidentally, to check your suggestion I ran your code and discovered that the transparency in iplot does not seem to like replications. Very strange, we'll have to check why. I then looked closely at the numbers of replications generated and discovered that case 25 was picked 325 times and case 40 only once. Rather too extreme for my liking! Running it again gave very similar results, though not exactly the same: this time it was 325 times for case 25 and case 40 was not picked at all. Other numbers varied slightly. This is not what I expected, any ideas? abs(xloc) typically varies by a factor of about 100 from smallest to largest, but sometimes the small end is really small, and so the ratio is really big. Duncan Murdoch P.S. iplots 1.1-1 may have an init problem in Windows: in my first attempt, the plot made the boxes too large to fit in their cells, but it fixed itself when I resized the window, and the bug doesn't seem to be repeatable. Thanks. This happens occasionally on the Mac too. Refreshing solves it in practice, but we need to find out why it can happen (and stop it happening!). Antony Unwin Professor of Computer-Oriented Statistics and Data Analysis, University of Augsburg, Germany __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R brakes when submitting a query to MySQL
Hello, I would like to retrieve data stored in MySQL database, so I installed RMySQL package. I can successfully connect with the my database using the following code dvr-dbDriver(MySQL) con2-dbConnect(dvr,group=exbardiv) mysqlDescribeConnection(con2) MySQLConnection:(972,0) User: mmorag Host: localhost Dbname: exbardiv Connection type: localhost via TCP/IP No resultSet available I can even see the tables in the database dbListTables(con2) [1] agouebhigh_ld rescuesjlc_info sjlc_ld temp [7] temp_snp1 temp_snp2 However, when I try to query the database, R breakes. res-dbSendQuery(con,'select * from sjlc_ld') Can anyone help me tune up the connection between R and MySQL? Thank you, Marc. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify [EMAIL PROTECTED] quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R brakes when submitting a query to MySQL
Is it your use of 'con' rather than 'con2' in dbSendQuery? -Kevin -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Moragues Sent: Tuesday, December 18, 2007 1:14 PM To: r-help@r-project.org Subject: [R] R brakes when submitting a query to MySQL Hello, I would like to retrieve data stored in MySQL database, so I installed RMySQL package. I can successfully connect with the my database using the following code dvr-dbDriver(MySQL) con2-dbConnect(dvr,group=exbardiv) mysqlDescribeConnection(con2) MySQLConnection:(972,0) User: mmorag Host: localhost Dbname: exbardiv Connection type: localhost via TCP/IP No resultSet available I can even see the tables in the database dbListTables(con2) [1] agouebhigh_ld rescuesjlc_info sjlc_ld temp [7] temp_snp1 temp_snp2 However, when I try to query the database, R breakes. res-dbSendQuery(con,'select * from sjlc_ld') Can anyone help me tune up the connection between R and MySQL? Thank you, Marc. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER:\ \ This email is from the Scottish Crop Rese...{{dropped:30}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
Another approach which I'm pleased with but was not suggested so far is jitter + kde2d from MASS: plot(jitter(x), jitter(y)) if (!exists(kde2d)) require(MASS) kdesamp - 2 #depending on your RAM forkde - if (kdesamp length(x)) sample(1:length(x), kdesamp, replace=FALSE) else 1:length(x) d - kde2d(x[forkde], y[forkde]) contour(d, add=TRUE) -Original Message- From: [EMAIL PROTECTED] Subject: Re: [R] Scatterplot Showing All Points __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R brakes when submitting a query to MySQL
You are right, it was a mistake copying and pasting the code. There is no error message from R when I run con2. I get a Windows error message saying R for windows terminal front-end has encountered a problem and need to close. The error signature is: AppName: rterm.exe AppVer: 2.60.43063.0ModName: msvcrt.dll ModVer: 7.0.2600.2180Offset: 000378c0 Marc. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zembower, Kevin Sent: 18 December 2007 18:18 To: r-help@r-project.org Subject: Re: [R] R brakes when submitting a query to MySQL Is it your use of 'con' rather than 'con2' in dbSendQuery? -Kevin -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Moragues Sent: Tuesday, December 18, 2007 1:14 PM To: r-help@r-project.org Subject: [R] R brakes when submitting a query to MySQL Hello, I would like to retrieve data stored in MySQL database, so I installed RMySQL package. I can successfully connect with the my database using the following code dvr-dbDriver(MySQL) con2-dbConnect(dvr,group=exbardiv) mysqlDescribeConnection(con2) MySQLConnection:(972,0) User: mmorag Host: localhost Dbname: exbardiv Connection type: localhost via TCP/IP No resultSet available I can even see the tables in the database dbListTables(con2) [1] agouebhigh_ld rescuesjlc_info sjlc_ld temp [7] temp_snp1 temp_snp2 However, when I try to query the database, R breakes. res-dbSendQuery(con,'select * from sjlc_ld') Can anyone help me tune up the connection between R and MySQL? Thank you, Marc. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER:\ \ This email is from the Scottish Crop Rese...{{dropped:30}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify [EMAIL PROTECTED] quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] PCA - cov.wt(z) : 'x' must contain finite values only
The problem is the missing values. The argument na.action is not active in princomp(), which I think is a bug, even though the help page claims that factory fresh default is na.omit. So, you need to either get rid of the rows with any missing values in them, or use a PCA code that can deal with missing values by somehow imputing them. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Johnson, Bethany Sent: Tuesday, December 18, 2007 1:14 PM To: r-help@r-project.org Subject: [R] PCA - cov.wt(z) : 'x' must contain finite values only I am trying to run PCA on a matrix (the first column and row are headers). There are several cells with NA's. When I run PCA with the following code: __ setwd(I:/PCA) AsianProp-read.csv(Matrix.csv, sep=,, header=T, row.names=1) attach(AsianProp) AsianProp AsianProp.pca-princomp(AsianProp, na.omit) _ I get the error message: cov.wt(z) : 'x' must contain finite values only What am I doing wrong? Thanks very much! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Scatterplot3d model reporting question
I've used the scatterplot3d function to graph some data and had it graph a smooth fit. Is there a way to actualy find out the function of the surface? I've looked through the help and figured out how to get it to report the following: Family: gaussian Link function: identity Formula: y ~ s(x, z) Parametric coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 0.207500.01223 16.97 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Approximate significance of smooth terms: edf Est.rank F p-value s(x,z) 8.403 17 9.729 1.76e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 R-sq.(adj) = 0.684 Deviance explained = 70.9% GCV score = 0.017692 Scale est. = 0.016151 n = 108 But I'm still not really sure what I'm looking at, either that or smooth means something different than I thought. Any help would be great! thanks, -Max __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] comparing poisson distributions
Hello all, I would like to compare two sets of count data which form Poisson distributions. I'd like to generate some sort of p-value of the likely-hood that the distributions are the same. Thanks in advance for your advice. Cheers, Mark Mark Gosink, Ph.D. Head of Computational Biology Scripps Florida 5353 Parkside Drive - RFA Jupiter, FL 33458 tel: 561-799-8921 fax: 561-799-8952 [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 3d plotting
I am trying to dp a 3d plot. I tried persp but my data is not a matrix. the 3dplot function returns this error, (list) object cannot be coerced to 'double' heres my code, td-read.csv(td.csv, header=TRUE) price-read.csv(price.csv, header=TRUE) contractdate-read.csv(contractdate.csv, header=TRUE) library(rgl) plot3d(td,contractdate,price) the 3 csv data files have the following format, 1 2 3 4 5 6 ... 60,000 basically I have 3 columns, x y and z that have 6 rows (data points) I want to plot. - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Import GAUSS .FMT files
Dear All, Is it possible to import GAUSS .FMT files into R? Thanks for your time. Kind Regards, Pedro N. Rodriguez [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 3d plotting
60,000 I hope that you actually haven't got any comma to separate the thousands... it separates fields in a csv files (as the Comma Separated Values name may suggest). If so, get rid of the commas. the 3dplot function returns this error, (list) object cannot be coerced to 'double' td-read.csv(td.csv, header=TRUE) price-read.csv(price.csv, header=TRUE) contractdate-read.csv(contractdate.csv, header=TRUE) You have to coerce the 1-column dataframes created by read.csv to numeric vectors or to a 6x3 dataframe. solution 1: myData - cbind(td,contractdate,price) library(rgl) plot3d(mydata) solution 2: td - as.numeric(td) ... price - as.numeric(price) plot3d(td,contractdate,price) Bye, ScionForbai __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Specifying starting values in lme (nlme package) using msScale
I am using package nlme and would like to specify initial values for a linear mixed-effects model to help with convergence. I am trying to specify those initial values using the msScale option under ‘control’ in the lme() function: lme(Y ~ X1, random= ~ X1|X2, control=list(msScale=lmeScale)) where, (as far as I understand), lmeScale is a function that can take initial values for parameters. However, I am unsure about how to input those starting values (e.g., what names lme will recognize for fixed and random effects, in what format, and if a partial list of initial values would be acceptable?). Any advice or examples of code inputting starting values would be extremely helpful. I have been unable to find examples myself online. Note, although it may be easier to do this in the lme4 package, I would prefer to use nlme. Thank you for your attention. Sincerely, Carrie Holt, Ph.D., M.Sc., B.Sc.(Honours) University of Washington School of Aquatic Fishery Sciences Box 355020 Seattle, WA 98195 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Analyzing Publications from Pubmed via XML
Armin Goralczyk [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: On 12/18/07, David Winsemius [EMAIL PROTECTED] wrote: David Winsemius [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: Armin Goralczyk [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: I tried the above function with simple search terms and it worked fine for me (also more output thanks to Martin's post) but when I use search terms attributed to certain fields, i.e. with [au] or [ta], I get the following error message: pm.srch() 1: laryngeal neoplasms[mh] 2: I am wondering if you used spaces, rather than +'s? If so then you may want your function to do more gsub-processing of the input string. I tried my theory that one would need +'s instead of spaces, but disproved it. Spaces in the input string seems to produce acceptable results on my WinXP/R.2.6.1/RGui system even with more complex search strings. -- It's not the spaces, the problem is the tag (sorry that I didn't specify this), or maybe the string []. I am working on a Mac OS X 10.4 with R version 2.6. Is it maybe a string conversion problem? In the following warning strings in the html adress seem to be different: Fehler in .Call(RS_XML_ParseTree, as.character(file), handlers, as.logical(ignoreBlanks), : error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmedter m=laryngeal neoplasms[mh] I/O warning : failed to load external entity http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubme dterm=laryngeal%20neoplasms%5Bmh%5D I do not have an up-to-date version of R on my Mac, since I have not yet upgraded to OSX10.4. I can try with my older version of R, but failure (or even success) with versions OSX-10.2/R-2.0 is not likely to be very informative. If you will post an example of the input that is resulting in the error, I can try it on my WinXP machine. If we cannot reproduce it there, then it may be more appropriate to take further questions to the Mac-R mailing list. The error message suggests to me that the fault lies in the connection phase of the task. -- David Winsemius __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] All anchored series from a vector?
Hi all, What may be a smart, efficient way to get the following result: myvector - c(A,B,C,D,E) myseries - miracle(myvector) myseries [1] [[1]] A [2] [[1]] A B [3] [[1]] A B [4] [[1]] A B C [5] [[1]] A B C D [6] [[1]] A B C D E Thanks for any hints, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] All anchored series from a vector?
Debugged version: lapply(1:length(myvector), function(.length) { myvector[1:.length] }) Thanks for showing the direction! Joh [EMAIL PROTECTED] wrote: From: Johannes Graumann [EMAIL PROTECTED] Date: 2007/12/18 Tue PM 04:40:37 CST To: [EMAIL PROTECTED] Subject: [R] All anchored series from a vector? lapply(1:length(myvector) function(.length) { c(myvector[1}:myvector[.length]) }) but test it because i didn't. Hi all, What may be a smart, efficient way to get the following result: myvector - c(A,B,C,D,E) myseries - miracle(myvector) myseries [1] [[1]] A [2] [[1]] A B [3] [[1]] A B [4] [[1]] A B C [5] [[1]] A B C D [6] [[1]] A B C D E Thanks for any hints, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] All anchored series from a vector?
On Wed, Dec 19, 2007 at 12:01:25AM +0100, Johannes Graumann wrote: Debugged version: lapply(1:length(myvector), function(.length) { myvector[1:.length] }) Thanks for showing the direction! Joh Note that this fails if length(myvector)==0. Good to know the corner cases. Gabor [EMAIL PROTECTED] wrote: From: Johannes Graumann [EMAIL PROTECTED] Date: 2007/12/18 Tue PM 04:40:37 CST To: [EMAIL PROTECTED] Subject: [R] All anchored series from a vector? lapply(1:length(myvector) function(.length) { c(myvector[1}:myvector[.length]) }) but test it because i didn't. Hi all, What may be a smart, efficient way to get the following result: myvector - c(A,B,C,D,E) myseries - miracle(myvector) myseries [1] [[1]] A [2] [[1]] A B [3] [[1]] A B [4] [[1]] A B C [5] [[1]] A B C D [6] [[1]] A B C D E Thanks for any hints, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Csardi Gabor [EMAIL PROTECTED]MTA RMKI, ELTE TTK __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] All anchored series from a vector?
Nothing to be sorry about. You suggested a viable solution untested ... my job to figure it out ;0) Joh [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] Date: 2007/12/18 Tue PM 02:50:52 CST To: Johannes Graumann [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] All anchored series from a vector? i'm sorry. i tested it afterwards and of course it had some problems. below is the working version. myvector-c(A,B,C,D,E) result- lapply(1:length(myvector), function(.length) { myvector[1:.length] }) print(result) From: Johannes Graumann [EMAIL PROTECTED] Date: 2007/12/18 Tue PM 04:40:37 CST To: [EMAIL PROTECTED] Subject: [R] All anchored series from a vector? lapply(1:length(myvector) function(.length) { c(myvector[1}:myvector[.length]) }) but test it because i didn't. Hi all, What may be a smart, efficient way to get the following result: myvector - c(A,B,C,D,E) myseries - miracle(myvector) myseries [1] [[1]] A [2] [[1]] A B [3] [[1]] A B [4] [[1]] A B C [5] [[1]] A B C D [6] [[1]] A B C D E Thanks for any hints, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Update of the np package (version 0.14-1)
Dear R users, An updated version of the np package has recently been uploaded to CRAN (version 0.14-1). The package is briefly described in a recent issue of Rnews (October, 2007, http://cran.r-project.org/doc/Rnews/Rnews_2007-2.pdf) for those who might be interested. A somewhat more detailed paper that describes the np package is forthcoming in the Journal of Statistical Software (http://www.jstatsoft.org) for those might be interested. A much more thorough treatment of the subject matter can be found in Li, Q. and J. S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press, ISBN: 0691121613 (768 Pages) for those who might be interested (http://press.princeton.edu/titles/8355.html) Information on the np package: This package provides a variety of nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor datatypes. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC:www.nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC:www.sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET:www.sharcnet.ca). Changes from version 0.13-1 to 0.14-1: * now use optim rather than nlm for minimisation in single index and smooth coefficient models * fixed bug in klein-spady objective function * regression standard errors are now available in the case of no continuous variables * summary should look prettier, print additional information * tidied up lingering issues with out-of-sample data and conditional modes * fixed error when plotting asymptotic errors with conditional densities * fixed a bug in npplot with partially linear regressions and plot.behavior='data' or 'plot-data' * maximum default number of multistarts is now set to 5 * least-squares cross-validation of conditional densities uses a new, faster algorithm * new, faster algorithm for least-squares cross-validation for both local-constant and local linear regressions. The estimator has changed somewhat: both cross-validation and the estimator use a method of shrinking towards the local constant estimator rather than the standard ridge approach that shrinks towards zero * optimised smooth coefficient code, added ridging * fixed bug in uniform CDF kernel * fixed bug where npindexbw would ignore bandwidth.compute = FALSE and compute bandwidths when supplied with a preexisting bw object * now can handle estimation out of discrete support. * summary would misreport the values of discrete scale factors which were computed with bwscaling = TRUE We are grateful to John Fox, Achim Zeilies, Roger Koenker, and numerous users for their valuable feedback which resulted in an improved version of the package. -- Jeffrey Racine Tristen Hayfield. -- Professor J. S. Racine Phone: (905) 525 9140 x 23825 Department of EconomicsFAX:(905) 521-8232 McMaster Universitye-mail: [EMAIL PROTECTED] 1280 Main St. W.,Hamilton, URL: http://www.economics.mcmaster.ca/racine/ Ontario, Canada. L8S 4M4 `The generation of random numbers is too important to be left to chance' ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How can I extract the AIC score from a mixed model object produced using lmer?
I am running a series of candidate mixed models using lmer (package lme4) and I'd like to be able to compile a list of the AIC scores for those models so that I can quickly summarize and rank the models by AIC. When I do logistic regression, I can easily generate this kind of list by creating the model objects using glm, and doing: md - c(md1.lr, md2.lr, md3.lr) aic - c(md1.lr$aic, md2.lr$aic, md3.lr$aic) aic2 - cbind(md, aic) but when I try to extract the AIC score from the model object produced by lmer I get: md1.lme$aic NULL Warning message: In md1.lme$aic : $ operator not defined for this S4 class, returning NULL So... How do I query the AIC value out of a mixed model object created by lmer? --- Peter Singleton USFS Pacific Northwest Research Station 1133 N. Western Ave. Wenatchee WA 98801 Phone: (509)664-1732 Fax: (509)665-8362 E-mail: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Random forests
Dear all, I would like to use a tree regression method to analyze my dataset. I am interested in the fact that random forests creates in-bag and out-of-bag datasets, but I also need an estimate of support for each split. That seems hard to do in random forests since each tree is grown using a subset of the predictor variables. I was thinking of setting mtry = number of predictor variables, growing several trees, and computing the support for each node as the number of times that a certain predictor variable was chosen for that node. Can this be implemented using random forests? Thanks! Naiara. -- Naiara Pinto PhD Candidate Ecology, Evolution and Behavior University of Texas Austin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I extract the AIC score from a mixed model object produced using lmer?
You can calculate the AIC as follows: (fm1 - lmer(Reaction ~ Days + (Days|Subject), sleepstudy)) aic1 - AIC(logLik(fm1)) Hope this helps. Dave On 12/18/07, Peter H Singleton [EMAIL PROTECTED] wrote: I am running a series of candidate mixed models using lmer (package lme4) and I'd like to be able to compile a list of the AIC scores for those models so that I can quickly summarize and rank the models by AIC. When I do logistic regression, I can easily generate this kind of list by creating the model objects using glm, and doing: md - c(md1.lr, md2.lr, md3.lr) aic - c(md1.lr$aic, md2.lr$aic, md3.lr$aic) aic2 - cbind(md, aic) but when I try to extract the AIC score from the model object produced by lmer I get: md1.lme$aic NULL Warning message: In md1.lme$aic : $ operator not defined for this S4 class, returning NULL So... How do I query the AIC value out of a mixed model object created by lmer? --- Peter Singleton USFS Pacific Northwest Research Station 1133 N. Western Ave. Wenatchee WA 98801 Phone: (509)664-1732 Fax: (509)665-8362 E-mail: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- David Barron Said Business School Jesus College Park End Street Oxford Oxford OX1 1HP OX1 3DW 01865 288906 01865 279684 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating the number of days from dates
On 19/12/2007, at 6:46 AM, bogdan romocea wrote: Sorry for using library instead package, but library() is one command for using packages. ... which is why all efforts to make folks say package instead of library are doomed to fail, IMHO. Yes, but it gives so much pleasure to those who appreciate the distinction to rail at those who don't! :-) cheers, Rolf Turner ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting magnitude
On Dec 18, 2007 2:06 PM, [EMAIL PROTECTED] wrote: I am plotting fishing vessel positions and want these points to be relative in size to the catch at that point. Is this possible? I am just begining to use R and my search of the help section didnt help in this area. Heres what Im using so far xyplot(data$latdeg~data$londeg |vessek , groups=vessek, xlim=rev(range(69:77)),ylim=(range(35:42)), data=data, main=list (Mackerel catches, cex=1.0), ylab=latitude, notch=T, varwidth=T, xlab=longitude, cex.axis=0.5,) any info would be appreciated This is pretty easy to do with the ggplot2 package: library(ggplot2) qplot(longdeg, latdeg, data = data, facets = . ~ vessek, size = catch) or maybe qplot(longdeg, latdeg, data = data, facets = . ~ vessek, size = catch) + scale_area() if you want the area of the points proportional to the catch, rather than their radius Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Clustering Question (Support Vector Clustering)
I am currently designing a clustering algorithm in collaboration with one of my colleagues. For comparison purposes we would like to contrast it with the Support Vector Clustering algorithm of (A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. Support vector clustering. Journal of Machine Learning Research, 2:125-137, 2001). This is supposedly the most powerful unsupervised clustering algorithm available. Unfortunately, we cannot find code for it anywhere. We have tried emailing the authors and they cannot find the code either (one of them has not answer yet). I was wondering, perhaps someone in the R community has implemented it or knows about some implementation for it. We really do not want to implement it ourselves because this will probably delay our paper considerably. One more thing, I am talking about Support vector clustering not classification, algorithms for that we could find in abundance. Thank you for any help. Ionut Florescu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple plots with single box
Hello, I am trying to display some harmonic functions in a plot. The kind of display I have in mind is like the one that cn be obtained by a call to plot.ts with plot.type = multiple. The only difference is that I want a single box containing all the plots instead of one box per plot. I thought box(which = outer) would have done the job, but it didn't. Below is the code I have used so far. (R 2.5.1, I know, I know...) Any help is greatly appreciated. Thank you in advance, Giovanni = ### Plot harmonic functions n - 6 # even omega - 2 * pi / n par(mfrow = c(n - 1, 1), mar = c(0, 5.1, 0, 5.1), oma = c(3, 1, 2, 1)) for (i in 1:(n/2 - 1)) { curve(cos(x * i * omega), 0, n, ylim = c(-1.1, 1.1), ylab = , axes = FALSE) points(1:n, cos(i * omega * 1:n)) axis(2); abline(h = 0, col = lightgrey) curve(sin(x * i * omega), 0, n, ylim = c(-1.1, 1.1), ylab = , axes = FALSE) points(1:n, sin(i * omega * 1:n)) axis(4); abline(h = 0, col = lightgrey) } curve(cos(x * (n/2) * omega), 0, n, ylim = c(-1.1, 1.1), ylab = , axes = FALSE) points(1:n, rep(c(-1,1), n/2)) axis(1); axis(2); abline(h = 0, col = lightgrey) -- Giovanni Petris [EMAIL PROTECTED] Department of Mathematical Sciences University of Arkansas - Fayetteville, AR 72701 Ph: (479) 575-6324, 575-8630 (fax) http://definetti.uark.edu/~gpetris/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Factor Madness
From ?cbind: Data frame methods The cbind data frame method is just a wrapper for data.frame(..., check.names = FALSE). This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless stringsAsFactors = TRUE is passed. (I'm guessing 'spectrum' is a data.frame before the code fragment you've shown) hope this helps, Tony Plate Johannes Graumann wrote: Why is class(spectrum[[Ion]]) after this factor? spectrum - cbind(spectrum,Ion=rep(, nrow(spectrum)),Deviation.AMU=rep(0.0, nrow(spectrum))) slowly going crazy ... Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gam() in gam package
R-users E-mail: r-help@r-project.org This iteration seems to be for iteratively reweighted least squares not for backfitting. And lm.wfit may solve multiple linear equation using QR decomposition; but I am not sure. Let me tell you something about my guess above. The iteration below is from 1 to maxit. for (iter in 1:maxit) { -- fit - eval(bf.call) -- } The help of gam.control() says: maxit: maximum number of local scoring iterations bf.maxit: maximum number of backfitting iterations Therefore the iteration above is local scoring iteration not backfitting iteration. The first line of lm.wfit() is: function (x, y, w, offset = NULL, method = qr, tol = 1e-07, singular.ok = TRUE, ...) There is no bf.maxit or its equivalent in these arguments. And it contains 'method = qr'; it may mean that this routine solves a multiple linear equation using QR decomposition. -- *[EMAIL PROTECTED]* http://cse.naro.affrc.go.jp/takezawa/intro.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] accessing dimension names
To access your dimension idx you could do either assign(a,paste(dimnames(y)$x,idx,sep=)) or eval(parse(text=paste(a-dimnames(y)$x,idx,sep=))) --- [EMAIL PROTECTED] wrote: I have a matrix y: dimnames(y) $x93 [1] 1 2 $x94 [1] 0 1 2 .. so on (there are other dimensions as well) I need to access a particular dimension, but a random mechanism tells me which dimension it would. So, sometimes I might need to access dimnames(y)$x93, some other time it would be dimnames(y)$x94.. and so on. Now let that random dimension be idx, then dimnames(y)$paste('x',idx,sep='') doesn't work. Can anyone help? Thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating a database
Can a simple matrix be used for this or would it be better and more efficient to create an external database to hold the data. If so, should the database be created using C and how would I do this (seeing as that I have never programmed in C)? You don't want to be down at the C level, most likely: it would be much more straightforward and programmer-efficient to use one of the available bindings to one of the available open-source databases. R has useful / usable bindings to postgresql, sqlite, and mysql, among many others. These are, however, more generally useful when you reach the point that you simply can't manage the volume of your data in R objects or in data frames. [And, well, you can go a LONG way with intelligently named R objects. :-)] --elijah __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] assigning and saving datasets in a loop, with names changing with i
Dear R users, I am analysing a very large data set and I need to perform several data manipulations. The dataset is so big that the only way I can play with it without having memory problems (E.g. cannot allocate vectors of size...) is to write a batch script to: 1. cut the data into pieces 2. save the pieces in seperate .RData files 3. Remove everything from the environment 4. load one of the piece 5. perform the manipulations on it 6. save it and remove it from the environment 7. Redo 4-6 for every piece 8. Merge everything together at the end It works if coded line by line but since I'll have to perform these tasks on other data sets, I am trying to automate this as much as I can. I am using a loop in which I used 'assign' and 'get' (pseudo code below). My problem is when I use 'get', it prints the whole object on the screen. I am wondering whether there is a more efficient way to do what I need to do. Any help would be appreciated. Please keep in mind that the whole process is quite computer-intensive, so I can't keep everything in the environment while R performs calculations. Say I have 1 big dataframe called data. I use 'split' to divide it into a list of 12 dataframes (call this list my.list) my.fun is a function that takes a dataframe, performs several manipulations on it and returns a dataframe. for (i in 1:12){ assign( paste( data, i, sep=), my.fun(my.list[i])) # this works # now I need to save this new object as a RData. # The following line does not work save(paste(data, i, sep = ), file = paste( paste(data, i, sep = ), RData, sep=.)) } # This works but it is a bit convoluted!!! temp - get(paste(data, i, sep = )) save(temp, file = lala.RData) } I am *sure* there is something more clever to do but I can't find it. Any help would be appreciated. best regards, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] assigning and saving datasets in a loop, with names changing with i
you want to use: save(list=paste(data, i, sep=), file=paste(data, i, .Rdata, sep=)) b On Dec 18, 2007, at 9:24 PM, Marie Pierre Sylvestre wrote: Dear R users, I am analysing a very large data set and I need to perform several data manipulations. The dataset is so big that the only way I can play with it without having memory problems (E.g. cannot allocate vectors of size...) is to write a batch script to: 1. cut the data into pieces 2. save the pieces in seperate .RData files 3. Remove everything from the environment 4. load one of the piece 5. perform the manipulations on it 6. save it and remove it from the environment 7. Redo 4-6 for every piece 8. Merge everything together at the end It works if coded line by line but since I'll have to perform these tasks on other data sets, I am trying to automate this as much as I can. I am using a loop in which I used 'assign' and 'get' (pseudo code below). My problem is when I use 'get', it prints the whole object on the screen. I am wondering whether there is a more efficient way to do what I need to do. Any help would be appreciated. Please keep in mind that the whole process is quite computer-intensive, so I can't keep everything in the environment while R performs calculations. Say I have 1 big dataframe called data. I use 'split' to divide it into a list of 12 dataframes (call this list my.list) my.fun is a function that takes a dataframe, performs several manipulations on it and returns a dataframe. for (i in 1:12){ assign( paste( data, i, sep=), my.fun(my.list[i])) # this works # now I need to save this new object as a RData. # The following line does not work save(paste(data, i, sep = ), file = paste( paste(data, i, sep = ), RData, sep=.)) } # This works but it is a bit convoluted!!! temp - get(paste(data, i, sep = )) save(temp, file = lala.RData) } I am *sure* there is something more clever to do but I can't find it. Any help would be appreciated. best regards, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.table() and precision?
Thank you for your response! 'write.table' writes up to 15 decimal digits which is not the machine (double) precision but not far from that - sorry for the misleading comments! After all I found a way to do what I needed without using disk or much memory and doing only twice as much work as I could with unlimited memory, so I will stick to this approach. --- Prof Brian Ripley [EMAIL PROTECTED] wrote: On Mon, 17 Dec 2007, Moshe Olshansky wrote: Dear List, Following the below question I have a question of my own: Suppose that I have large matrices which are produced sequentially and must be used sequentially in the reverse order. I do not have enough memory to store them and so I would like to write them to disk and then read them. This raises two questions: 1) what is the fastest (and the most economic space-wise) way to do this? Using save/load is the simplest. Don't worry about finding better solutions until you know those are not good enough. (serialize / unserialize is another interface to the same underlying idea.) 2) functions like write, write.table, etc. write the data the way it is printed and this may result in a loss of accuracy. Is there any way to prevent this, except for setting the digits option to a higher value or using format prior to writing the data? Do please read the help before making false claims. ?write.table says Real and complex numbers are written to the maximal possible precision. OTOH, ?write says it is a wrapper for cat, whose help says 'cat' converts numeric/complex elements in the same way as 'print' (and not in the same way as 'as.character' which is used by the S equivalent), so 'options' 'digits' and 'scipen' are relevant. However, it uses the minimum field width necessary for each element, rather than the same field width for all elements. so this hints as.character() might be a useful preprocessor. Is it possible to write binary files (similar to Fortran)? See ?writeBin. save/load by default write binary files, but use of writeBin can be faster (and less flexible). Any suggestion will be greatly appreciated. Somehow you have missed a great deal of information about R I/O. Try help.start() and reading the sections the search engine shows you that look relevant. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strange timings in convolve(x,y,type=open)
On Tue, 18 Dec 2007, Art Owen wrote: Dear R-ophiles, I've found something very odd when I apply convolve to ever larger vectors. Here is an example below with vectors ranging from 2^11 to 2^17. There is a funny bump up at 2^12. Then it gets very slow at 2^16. The time is consumed by fft, which is called with vectors of length 2*2^i-1 when type = 'o' system.time( fft(seq_len(2^13-1)) ) user system elapsed 0.310.000.32 system.time( fft(seq_len(2^14-1)) ) user system elapsed 0 0 0 There are no factors of 2^13-1 or 2^17-1 or 2^18-1 for (i in 11:20 ) print( c(index=i, nfact=length(which( 0 == (2^(i+1)-1)%%(2:trunc(sqrt(2^(i+1)-1)) ) index nfact 1111 index nfact 12 0 index nfact 13 3 index nfact 14 3 index nfact 15 7 index nfact 16 0 index nfact 1715 index nfact 18 0 index nfact 1923 index nfact 20 5 It looks like the code in fft.c tries to find factors of the series length and works from there. So, the size of the problem evidently depends on its factors. HTH, Chuck for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type=o))) user system elapsed 0.002 0.000 0.002 user system elapsed 0.373 0.002 0.375 user system elapsed 0.014 0.001 0.016 user system elapsed 0.031 0.002 0.034 user system elapsed 0.126 0.004 0.130 user system elapsed 194.095 0.013 194.185 user system elapsed 0.345 0.011 0.356 This example is run on a fedora machine with 64 bits. I hit the same wall at 2^16 on a Macbook (Intel processor I think). The fedora machine is running R 2.5.0. That's a bit old (April 07) but I saw no mention of this speed problem in some web searching, and it's not mentioned in the 2.6 what's new notes. I've rerun it and found the same bump at 12 and wall at 16. The timing at 2^16 can change appreciably. In one other case it was about 270 user, 271 elapsed. The 2^18 case ran for hours without ever finishing. At first I thought that this was a memory latency issue. Maybe it is. But that makes it hard to explain why 2^17 works better than 2^16. I've seen that three times now, so I'm almost ready to call it reproducible. Also, one of the machines I'm using has lots of memory. Maybe it's a cache issue ... but that still does not explain why 2^12 is slower than 2^13 or 2^16 is slower than 2^17. I've checked by running convolve without type=o and I don't see the wall. Similarly fft does not have that problem. Here's an example without type=open for( k in 11:20)print(system.time( convolve( 1:2^k,1:2^k))) user system elapsed 0.001 0.000 0.000 user system elapsed 0.001 0.000 0.001 user system elapsed 0.002 0.000 0.002 user system elapsed 0.004 0.000 0.004 user system elapsed 0.009 0.001 0.010 user system elapsed 0.017 0.001 0.018 user system elapsed 0.138 0.005 0.143 user system elapsed 0.368 0.012 0.389 user system elapsed 1.010 0.032 1.051 user system elapsed 1.945 0.069 2.015 This is more what I expected. Something like N or N log(N) , with the difference hard to discern in granularity and noise. The convolve function is not very big (see below). When type is not specified, it defaults to circular. So my guess is that something mysterious might be happening inside the first else clause below, at least on some architectures. -Art Owen convolve function (x, y, conj = TRUE, type = c(circular, open, filter)) { type - match.arg(type) n - length(x) ny - length(y) Real - is.numeric(x) is.numeric(y) if (type == circular) { if (ny != n) stop(length mismatch in convolution) } else { n1 - ny - 1 x - c(rep.int(0, n1), x) n - length(y - c(y, rep.int(0, n - 1))) } x - fft(fft(x) * (if (conj) Conj(fft(y)) else fft(y)), inv = TRUE) if (type == filter) (if (Real) Re(x) else x)[-c(1:n1, (n - n1 + 1):n)]/n else (if (Real) Re(x) else x)/n } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
[R] plot cummulative sum from calendar time
I have the following list of observations of calendar time: [1] 03-Nov-1997 09-Oct-1991 27-Aug-1992 01-Jul-1994 19-Jan-1990 12-Nov-1993 [7] 08-Oct-1993 10-Nov-1982 08-Dec-1986 23-Dec-1987 02-Aug-1995 20-Oct-1998 [13] 29-Apr-1991 16-Mar-1994 20-May-1991 28-Dec-1987 14-Jul-1999 27-Nov-1998 [19] 09-Sep-1999 26-Aug-1999 20-Jun-1997 05-May-1995 26-Mar-1998 15-Aug-1994 [25] 24-Jun-1996 02-Oct-1996 12-Aug-1985 24-Jun-1992 09-Jan-1991 04-Mar-1988 3230 Levels: 01-Apr-1987 01-Apr-1990 01-Apr-1991 01-Apr-1996 ... 31-Oct-1998 I want to make a plot where x-axis is the calendar time and y-axix is the cumulative sum of observations observed on or before that dates. Can anyone give some suggestions? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 4 questions regarding hypothesis testing, survey package, ts on samples, plotting
Good morning! I have 4 questions which trouble me: 1. I want to test the hypothesis that the 2 proportions (the mean of a binomial) which come from 2 different samples are equal. I want to use the following function z= (p1-p2)/ sqrt((p1(1-p1)/n1)+(p2(1-p2)/n2)) which is one of the standard formulas for this case. Is there such a function in R? p1=the proportion from the first sample n1=the first sample size p1=the proportion from the second sample n2=the second sample size 2. How can i find the median of a variable in survey package? a-c(1:10) b-sample(1:20, 10, replace=T) b1-sample(0:1, 10, replace=T) c-data.frame(a,b, b1) library(survey) design-svydesign(id=~1, data=c) svymean(~b, design) svymean gives me the mean, but what function gives me the svymedian, and what function gives me the svyproportion for b1? I want to know the median for a metric variable and the proportion for a binomial variable ? 3. Could time series analysis (arima/arma) be applied on data that are obtained from a sample?. Eg: Say I have data from the same sample in 40 moments. Could I use arima/arma on the series? Or must i have data for the whole population like data from the Statistical Office on investment, inflation, demographics and so on? 4. I want to modify the scale of my axes within a plot but i really could not find this option. I think there is such an option, but i can not find it. x-rnorm(100, 100, 3) x-ts(x, frequency=12) acf(x) plot(x) On the above example, i want to a scale like this 95,96,97,98,104,105 (on y) and 1,2,3,4,...7,8 (on x). Thank you very much and have a great day! - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.