Re: [R] zero truncated poisson regression
On Sun, 31 Jul 2011, Iasonas Lamprianou wrote:

> Thanks. pscl seems to be a sensible option. I have the count variable
> with the name N. This variable can only take values bigger than zero!
> I have two explanatory variables with the names type and diam, but when I run
>
>   hpm <- hurdle(n ~ type + diam, data = an, dist = "poisson")
>
> I get the message "invalid dependent variable, minimum count is not zero".
> Well, I know that N > 0; that is why I want to run a zero-truncated model.
> But I must be missing something... and the manual does not seem to help a lot...
> Can anyone help please?

As previously pointed out by others on this list: hurdle() is not what you are looking for (although it is related to what you want to do). The hurdle() model is a two-part model consisting of a zero-truncated count part and a binary part for modeling N = 0 vs. N > 0. See also vignette("countreg", package = "pscl") for details.

As you don't need the binary hurdle part, you cannot use hurdle() directly. This is why the package countreg on R-Forge provides the function zerotrunc(), which essentially does the same thing as the count part in hurdle().

  install.packages("countreg", repos = "http://R-Forge.R-project.org")
  library(countreg)
  m <- zerotrunc(n ~ type + diam, data = an, dist = "poisson")
  summary(m)

> Dr. Iasonas Lamprianou
> Department of Social and Political Sciences
> University of Cyprus
>
> From: Mitchell Maltenfort <mmal...@gmail.com>
> To: Iasonas Lamprianou <lampria...@yahoo.com>; r-help@r-project.org
> Sent: Sunday, 31 July 2011, 20:45
> Subject: Re: [R] zero truncated poisson regression
>
> Pscl package.
>
> On 7/31/11, Iasonas Lamprianou <lampria...@yahoo.com> wrote:
>> Dear friends,
>> does anyone know how I can run a zero truncated Poisson regression using R (or even SPSS)?
>>
>> Dr. Iasonas Lamprianou
>> Department of Social and Political Sciences
>> University of Cyprus
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
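For readers who want to see what a zero-truncated count model actually fits, here is a minimal base-R sketch that maximises the zero-truncated Poisson likelihood directly. It is not the countreg implementation; the simulated data, the variable names x and y, and the helper ztp_negloglik are assumptions made purely for illustration.

```r
# Sketch: zero-truncated Poisson regression by maximum likelihood (base R only).
# All names and the simulated data are illustrative assumptions.
set.seed(1)
n <- 5000
x <- runif(n)
y_all <- rpois(n, lambda = exp(0.5 + 1 * x))  # true coefficients: 0.5 and 1
keep <- y_all > 0                             # zeros are never observed
x <- x[keep]
y <- y_all[keep]

# Negative log-likelihood, using
# P(Y = y | Y > 0) = dpois(y, lambda) / (1 - exp(-lambda))
ztp_negloglik <- function(beta, x, y) {
  lambda <- exp(beta[1] + beta[2] * x)
  -sum(dpois(y, lambda, log = TRUE) - log1p(-exp(-lambda)))
}

fit <- optim(c(log(mean(y)), 0), ztp_negloglik, x = x, y = y,
             method = "BFGS", hessian = TRUE)
est <- fit$par
se <- sqrt(diag(solve(fit$hessian)))          # from the observed information
print(cbind(estimate = est, std.error = se))
```

An ordinary Poisson fit on the same positive-only data would treat the missing zeros as evidence of larger means, which is presumably why the truncated fit tends to give more plausible estimates.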
Re: [R] Use dump or write? or what?
oaxacamatt wrote:

> I am calculating two t-test values for each of many files, then save them to a file, calculate another set and append, repeat.

You did not tell us what you want to do with the data in the file. If you just want a copy of the output, bracketing with sink(file) and sink() can be useful. If you want to process the results later, try save/load.

Dieter
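The two suggestions above could look roughly like this. It is only a sketch: the file names come from tempfile() and the t-test input is made up.

```r
# Sketch of both suggestions; file names come from tempfile(), data are made up.
res <- t.test(1:10, y = 7:20)

# 1) Keep a plain-text copy of the printed output
txt_file <- tempfile(fileext = ".txt")
sink(txt_file)      # start diverting console output to the file
print(res)
sink()              # stop diverting
printed <- readLines(txt_file)

# 2) Keep the R object itself for later processing
rda_file <- tempfile(fileext = ".RData")
save(res, file = rda_file)
rm(res)
load(rda_file)      # restores 'res' under its original name
res$p.value
```

The sink() copy is meant for reading; the saved object keeps every component of the test (statistic, degrees of freedom, confidence interval) available for later computation.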
Re: [R] Use dump or write? or what?
Hi Matt,

I assume that you want a tabular text file of the results. Since I don't know what your tempA and tempB are, I'll steal some examples from ?t.test:

  t.example.1 <- t.test(1:10, y = c(7:20))
  t.example.2 <- t.test(1:10, y = c(7:20, 200))

Now looking at ?dump, the first argument needs to be *character*, signifying "The names of one or more R objects to be dumped". So put the names of the objects in quotes. I'm ignoring your ttest_results = tempfile() line because it appears you want to put the results into dumpdata.txt.

  dump("t.example.1", file = "dumpdata.txt")
  dump("t.example.2", file = "dumpdata.txt", append = TRUE)

Both objects show up in the file (yay!) but the result probably isn't what you're after, with some R-like code along the lines of

  t.example.1 <- structure(list(statistic
  [ ... lots of other stuff that isn't in a tabular format ... ]

write()ing a list isn't the way to go either:

  write(t.example.1, "test.txt")
  Error in cat(list(...), file, sep, fill, labels, append) :
    argument 1 (type 'list') cannot be handled by 'cat'

write.table() of the whole result gives some kind of problem as well:

  write.table(t.example.1, "test.txt")
  Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
    cannot coerce class 'htest' into a data.frame

What about saving only part of the results? The first line below overwrites the dumpdata.txt created above. The second line appends to the file, and also doesn't write the column names because they are already present from the first write.table.

  write.table(t.example.1[1:3], "dumpdata.txt")
  write.table(t.example.2[1:3], "dumpdata.txt", append = TRUE, col.names = FALSE)

There are certainly many variations to try. This writes only the statistic, parameter and p.value of the t-tests. Here is the resulting file.

  cat(readLines("dumpdata.txt"), sep = "\n")
  "statistic" "parameter" "p.value"
  "t" -5.43492976389406 21.982212340189 1.85528183251181e-05
  "t" -1.63290263320121 14.1645989530125 0.124513498089745

Jeff

On Sun, Jul 31, 2011 at 8:41 PM, Matt Curcio <matt.curcio...@gmail.com> wrote:

> Greetings all,
> I am calculating two t-test values for each of many files, then save them to a file, calculate another set and append, repeat. But I can't figure out how to write it to file and then append subsequent t-tests. (maybe too tired ;} )
> I have tried to use dump and file.append to no avail.
>
>   ttest_results = tempfile()
>   two_sample_ttest <- t.test(tempA, tempB, var.equal = TRUE)
>   welch_ttest <- t.test(tempA, tempB, var.equal = FALSE)
>   dump(two_sample_ttest, file = "dumpdata.txt", append = TRUE)
>   ttest_results <- file.append(ttest_results, two_sample_ttest)
>
> Any suggestions,
> M
> --
> Matt Curcio
> M: 401-316-5358
> E: matt.curcio...@gmail.com
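Building on the write.table() idea, one possible way to handle the "two tests per file, then append" workflow is to turn each pair of tests into two data-frame rows. This is a sketch only: the list samples stands in for whatever is read from each real file, and the column names are arbitrary choices.

```r
# Sketch: one row per test, appended to a single tab-separated file.
# 'samples' is hypothetical and stands in for the data read from each file.
samples <- list(
  file1 = list(a = 1:10, b = 7:20),
  file2 = list(a = 1:10, b = c(7:20, 200))
)
out_file <- tempfile(fileext = ".txt")

for (nm in names(samples)) {
  s <- samples[[nm]]
  pooled <- t.test(s$a, s$b, var.equal = TRUE)
  welch  <- t.test(s$a, s$b, var.equal = FALSE)
  rows <- data.frame(
    source    = nm,
    test      = c("pooled", "welch"),
    statistic = unname(c(pooled$statistic, welch$statistic)),
    df        = unname(c(pooled$parameter, welch$parameter)),
    p.value   = c(pooled$p.value, welch$p.value)
  )
  first <- !file.exists(out_file)       # write the header only once
  write.table(rows, out_file, append = !first, col.names = first,
              row.names = FALSE, sep = "\t")
}

results <- read.delim(out_file)         # read the accumulated table back
print(results)
```

Writing the header only on the first pass avoids the warning write.table() gives when column names are appended to an existing file.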
[R] Help with modFit of FME package
Dear R users,

I'm trying to fit an ODE to an experimental time series. In the attachment you will find the R code I wrote using modFit and modCost of the FME package, and the file with the time series. When I run summary(Fit) I obtain the error message below, and the values of the parameters are equal to the initial guesses I gave them. The problem is not due to the fact that I have only one equation (I also tried with more equations, but I still obtain this error). I would appreciate it if someone could help me understand the reason for the error and fix it.

Thanks for your attention,
Paola Lecca

Here is the error:

  summary(Fit)

  Parameters:
                Estimate Std. Error t value Pr(>|t|)
  pro1_strength        1         NA      NA       NA

  Residual standard error: 2.124 on 10 degrees of freedom
  Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
  In addition: Warning message:
  In summary.modFit(Fit) : Cannot estimate covariance; system is singular

--
Paola Lecca, PhD
The Microsoft Research - University of Trento
Centre for Computational and Systems Biology
Piazza Manci 17, 38123 Povo/Trento, Italy
Phone: +39 0461282843
Fax: +39 0461282814

Time series:

  time  pp1_mrna
  0     0
  2     2.754
  4     2.958
  6     4.058
  8     3.41
  10    3.459
  12    2.453
  14    1.234
  16    2.385
  18    3.691
  20    3.252
Re: [R] Plot frame color and linewidth
marcel wrote:

> I have a figure with a lattice plot and a basic plot. Is there a way to select the color and line width of the surrounding boxes for each of these? I could not find any documentation on this.

Thanks for providing a nice self-contained example. There was nothing wrong with it, but you can simplify your life with:

  library(lattice)
  xyplot(1 ~ 1, par.settings = list(axis.line = list(col = "green")))

When lost in trellis space, I always do:

  str(trellis.par.get())

For standard graphics, there is an example with gray color at the bottom of the par help page.

Dieter
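For the base-graphics half of the question, a minimal sketch using box() might look as follows. It is not the example from the par help page; the colours and line widths are arbitrary, and the null PDF device is used only so the code runs without opening a window.

```r
# Sketch for base graphics; colours and widths are arbitrary choices.
frame_col <- "gray"
pdf(NULL)                                  # null device so the sketch runs anywhere
plot(1:10, axes = FALSE)                   # draw without the default frame and axes
axis(1)
axis(2)
box(col = frame_col, lwd = 3)              # frame around the plot region
box(which = "figure", col = "green", lwd = 1)  # frame around the whole figure region
invisible(dev.off())
```

Drawing the plot with axes = FALSE first keeps the default black frame from being painted underneath the coloured one.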
Re: [R] export/import matrix
Rosario Garcia Gil-2 wrote:

> I have a problem keeping the format when I export a matrix with the write.table() function. When I import the data volcano from rgl package it looks like this in R:
>
>   data[1:5,]
>        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
>   [1,]  100  100  101  101  101  101  101  100  100   100   101   101   102   102
>   [2,]  101  101  102  102  102  102  102  101  101   101   102   102   103   103
>   ...
>
> I use this data to represent a 3D map with the following script and it works PERFECTLY!
>
>   y <- 2 * data
>   x <- 10 * (1:nrow(y))
>   z <- 10 * (1:ncol(y))
>   ylim <- range(y)
>   ylen <- ylim[2] - ylim[1] + 1
>   colorlut <- terrain.colors(ylen)
>   col <- colorlut[y - ylim[1] + 1]
>   rgl.open()
>   rgl.surface(x, z, y, color = col, back = "lines")
>   ...
>
> Then I export it with write.table(data, file = "datam.txt", row.names = TRUE, col.names = TRUE), ... when I import it back into R again with read.table("datam.txt") it looks like this in R: ...
> The script I mentioned before no longer works on it; if I convert it to a matrix with as.matrix it still does not work. ...

It is always better to report what str(mydata) looks like, instead of showing the data. And I am quite sure that something like as.matrix would work, but you did not tell us what the error message in "still does not work" looked like.

Dieter
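Since the actual error was never reported, the following is only a guess at the likely cause: read.table() returns a data frame with row and column names, not a bare numeric matrix. The sketch uses the volcano matrix from base R's datasets package and a temporary file in place of datam.txt.

```r
# Sketch: round-trip a matrix through write.table()/read.table().
# Uses the 'volcano' matrix shipped with base R's datasets package.
f <- tempfile(fileext = ".txt")
write.table(volcano, file = f, row.names = TRUE, col.names = TRUE)

back <- read.table(f)            # comes back as a data.frame, not a matrix
str(back[, 1:3])                 # inspect the structure, not just the printout

m <- unname(as.matrix(back))     # drop the row/column names added on the way
storage.mode(m) <- "double"      # values were re-read as integers
same <- isTRUE(all.equal(m, volcano))
print(same)
```

If the plotting script indexes or multiplies the object as a plain matrix, converting with as.matrix() and dropping the dimnames, as above, may be enough to make it work again.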
Re: [R] Problems with ks.test()
(I'm replying to your original post because your follow-up omits the context.)

The K-S test is designed for continuous distributions. You have far too many zeros in your data to get anything reasonable out of the test. For your data, the K-S statistic is the difference in the (e)cdfs at zero. Your results just show that this can be sensitive to the degree of rounding used for the theoretical cdf.

Peter Ehlers

On 2011-07-29 02:07, Jochen1980 wrote:

> Hi,
> I have two data point vectors. Now I want to run ks.test(). If you print both vectors you will see that they fit pretty well. Here is a picture:
> http://www.jochen-bauer.net/downloads/kstest-r-help-list-plot.png
> As you can see there is one histogram, and moreover the Gumbel density function is plotted. I took the bin midpoints and the bin heights as vector1 and computed the distribution values at all bin midpoints as vector2. I pass these two vectors to ks.test(). Are those the right vectors if I want to decide afterwards whether my experimental data is Gumbel-distributed?
> Surprisingly, the p-value changes tremendously if I calculate more digits from my theoretical formula. If I round to 0 digits, p is 1; if I round to 4 digits, p drops to 0. How could this happen? I thought more digits would bring more accurate results?!
>
> Case 0 digits:
> XXX
>  [1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 7
> [19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [91] 0 0 0 0 0 0 0 0 0 0
>  [1] 0 0 0 0 1 10 49 113 160 168 147 113 81 55 37 24 15 10
> [19] 6 4 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
> [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [91] 0 0 0 0 0 0 0 0 0 0
> [1] Results
> [1] Analysis of the input data
> [1] Mean: 0.104537195
> [1] Std. dev.: 0.0277657985898433
> [1] Parameter estimation from the data, assuming a Gumbel distribution
> [1] Mu: 0.0920411082987717
> [1] Beta: 0.0216489043196013
> [1] KS test - 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights
> [1] KST D: 0.04
> [1] KST P: 1
> XXX
>
> Case 4 digits:
>  [1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 7
> [19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [91] 0 0 0 0 0 0 0 0 0 0
>   [1] 0.000 0.000 0.000 0.006 0.622 10.094 49.271 112.776 160.174
>  [10] 168.419 146.527 113.137 81.026 55.344 36.690 23.870 15.347 9.793
>  [19] 6.220 3.939 2.490 1.572 0.992 0.625 0.394 0.248 0.157
>  [28] 0.099 0.062 0.039 0.025 0.016 0.010 0.006 0.004 0.002
>  [37] 0.002 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000
>  [46] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
>  [55] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
>  [64] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
>  [73] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
>  [82] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
>  [91] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
> [100] 0.000
> [1] Results
> [1] Analysis of the input data
> [1] Mean: 0.104537195
> [1] Std. dev.: 0.0277657985898433
> [1] Parameter estimation from the data, assuming a Gumbel distribution
> [1] Mu: 0.0920411082987717
> [1] Beta: 0.0216489043196013
> [1] KS test - 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights
> [1] KST D: 0.2
> [1] KST P: 0.0366
>
> Thanks in advance for some help.
> Jochen
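In line with the point that the K-S test is meant for continuous data, one alternative is to pass the raw observations and a Gumbel CDF to ks.test() instead of two vectors of histogram heights. The sketch below assumes simulated data; pgumbel and rgumbel are small helpers defined here (base R has no Gumbel functions), and the parameter values are simply rounded from the output quoted above.

```r
# Sketch: K-S test on raw values against a fully specified Gumbel CDF.
# pgumbel/rgumbel are helper functions defined here, not base R functions.
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))
rgumbel <- function(n, mu, beta) mu - beta * log(-log(runif(n)))  # inverse-CDF sampling

set.seed(42)
mu <- 0.092
beta <- 0.0216
x <- rgumbel(1000, mu, beta)            # stands in for the 1000 raw measurements

ks <- ks.test(x, pgumbel, mu = mu, beta = beta)
print(ks$statistic)
print(ks$p.value)
```

One caveat: if mu and beta are estimated from the same data that are being tested, the standard K-S p-value is no longer exact and tends to be too large, so it should be read with some caution.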
[R] Beta fit returns NaNs
Hi,

sorry for repeating the question, but this is kind of important to me and I don't know whom I should ask. As noted before, when I do a parameter fit to the beta distribution I get:

  fitdist(vectNorm, "beta")
  Fitting of the distribution ' beta ' by maximum likelihood
  Parameters:
           estimate Std. Error
  shape1   2.148779  0.1458042
  shape2 810.067515 61.8608126
  Warning messages:
  1: In dbeta(x, shape1, shape2, log) : NaNs produced
  2: In dbeta(x, shape1, shape2, log) : NaNs produced
  3: In dbeta(x, shape1, shape2, log) : NaNs produced
  4: In dbeta(x, shape1, shape2, log) : NaNs produced
  5: In dbeta(x, shape1, shape2, log) : NaNs produced
  6: In dbeta(x, shape1, shape2, log) : NaNs produced

My vector has about 900 points. Are those 6 warning messages something to be really concerned about, or what do they mean?
Re: [R] zero truncated poisson regression
Thank you, it works! And the estimates (as well as the standard errors) seem to be more reasonable now, compared to the normal Poisson model. Thank you.

However, I tried to find a manual (although I did manage to find the paper published in the Journal of Statistical Software). For example, what does zerotrunc() return?

Dr. Iasonas Lamprianou
Department of Social and Political Sciences
University of Cyprus

From: Achim Zeileis <achim.zeil...@uibk.ac.at>
To: Iasonas Lamprianou <lampria...@yahoo.com>
Cc: Mitchell Maltenfort <mmal...@gmail.com>; r-help@r-project.org
Sent: Monday, 1 August 2011, 10:10
Subject: Re: [R] zero truncated poisson regression
Re: [R] Beta fit returns NaNs
On Aug 1, 2011, at 10:33, baxy77 wrote:

> Hi,
> sorry for repeating the question, but this is kind of important to me and I don't know whom I should ask. As noted before, when I do a parameter fit to the beta distribution I get:
>
>   fitdist(vectNorm, "beta")
>   Fitting of the distribution ' beta ' by maximum likelihood
>   Parameters:
>            estimate Std. Error
>   shape1   2.148779  0.1458042
>   shape2 810.067515 61.8608126
>   Warning messages:
>   1: In dbeta(x, shape1, shape2, log) : NaNs produced
>   2: In dbeta(x, shape1, shape2, log) : NaNs produced
>   3: In dbeta(x, shape1, shape2, log) : NaNs produced
>   4: In dbeta(x, shape1, shape2, log) : NaNs produced
>   5: In dbeta(x, shape1, shape2, log) : NaNs produced
>   6: In dbeta(x, shape1, shape2, log) : NaNs produced
>
> My vector has about 900 points. Are those 6 warning messages something to be really concerned about, or what do they mean?

They are probably harmless. It just means that in the search of the parameter space, the fitting algorithm ventured into forbidden territory (most likely, it tried a negative value for one of the shape parameters). You could try setting start= to something closer to the final estimates and see if the warnings go away.

BTW: I assume this is using the fitdistrplus contributed package (and not just misspelling fitdistr from MASS)? You really should specify such things, to make it easier for people to help, but also out of courtesy to the author.

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg
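To illustrate the "forbidden territory" explanation, here is a small base-R sketch. It does not use fitdistrplus; it shows that dbeta() returns NaN for a negative shape parameter, and then fits the two shapes with optim() on the log scale so that no trial value can be negative. The simulated data, the method-of-moments starting values and the helper negloglik are assumptions made for illustration.

```r
# Sketch: why the NaN warnings appear, and one way to avoid them.
# Simulated data; shape values chosen to resemble the estimates in the thread.
set.seed(123)
x <- rbeta(900, shape1 = 2, shape2 = 800)

# A negative shape parameter lies outside the parameter space:
bad <- suppressWarnings(dbeta(0.5, shape1 = -1, shape2 = 2))   # NaN

# Optimise over log(shape), so every trial value is positive.
negloglik <- function(p, x) {
  -sum(dbeta(x, shape1 = exp(p[1]), shape2 = exp(p[2]), log = TRUE))
}

# Method-of-moments starting values
m <- mean(x)
v <- var(x)
k <- m * (1 - m) / v - 1
start <- log(c(m * k, (1 - m) * k))

fit <- optim(start, negloglik, x = x, method = "BFGS")
shapes <- exp(fit$par)                  # back-transform to shape1, shape2
print(shapes)
```

Good starting values serve the same purpose as the start= suggestion above: the optimiser begins close to the answer and has less reason to wander towards the boundary.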
Re: [R] Beta fit returns NaNs
On 2011-08-01 01:33, baxy77 wrote:

> Hi,
> sorry for repeating the question, but this is kind of important to me and I don't know whom I should ask. As noted before, when I do a parameter fit to the beta distribution I get:
>
>   fitdist(vectNorm, "beta")
>   Fitting of the distribution ' beta ' by maximum likelihood
>   Parameters:
>            estimate Std. Error
>   shape1   2.148779  0.1458042
>   shape2 810.067515 61.8608126
>   Warning messages:
>   1: In dbeta(x, shape1, shape2, log) : NaNs produced
>   2: In dbeta(x, shape1, shape2, log) : NaNs produced
>   3: In dbeta(x, shape1, shape2, log) : NaNs produced
>   4: In dbeta(x, shape1, shape2, log) : NaNs produced
>   5: In dbeta(x, shape1, shape2, log) : NaNs produced
>   6: In dbeta(x, shape1, shape2, log) : NaNs produced
>
> My vector has about 900 points. Are those 6 warning messages something to be really concerned about, or what do they mean?

Those warnings are from optim(). You probably don't have to worry about them. I usually use fitdistr() in the MASS package. But it will require reasonable start values.

To avoid the warnings, you could try using the parameter estimates from your fitdist(vectNorm, "beta") call as start values and re-run fitdist() with those values, and you might also set the optim method to "BFGS" (which, BTW, is the default in fitdistr()).

  library(fitdistrplus)
  fitdist(vectNorm, "beta", start = list(shape1 = 2.15, shape2 = 810), optim.method = "BFGS")

Peter Ehlers
Re: [R] Spatial Data Interpolation
Dear Peter,

The Spatial task view lists a number of interpolation methods [1]. Some of those support spatio-temporal interpolation. For example, gstat supports spatio-temporal kriging [2,3,4].

regards,
Paul

[1] http://cran.r-project.org/web/views/Spatial.html
[2] http://en.wikipedia.org/wiki/Kriging
[3] http://www.google.nl/search?q=space+time+kriging
[4] http://cran.r-project.org/web/packages/gstat/index.html

On 07/30/2011 08:35 PM, Peter Maclean wrote:

> Dear GIS people,
> What is the best way of implementing spatial data interpolation (from large to small grids), especially for dummies? I searched the internet and could not get a concrete answer. Here is an example with simulated data.
>
>   #Example of spatial data interpolation
>   require(utils)
>   #I need to interpolate the temp and rain data (from its surrounding points)
>   #for the same period and accounting for elevation
>   #New coordinates and elevation
>   lat <- seq(-1, -5, by = -0.1)
>   lon <- seq(28, 30, by = 0.1)
>   year <- seq(2000, 2005, by = 1)
>   period <- c("Mar", "Apr", "May")
>   ndata <- list(year = year, period = period, lat = lat, lon = lon)
>   ndata <- expand.grid(ndata)
>   ndata$elev <- sample(1000:8000, nrow(ndata), replace = TRUE)
>   ndata <- ndata[order(ndata$year, ndata$period), ]
>   fix(ndata)
>   #Original data with elevation-same period
>   lat <- seq(-1, -5, by = -0.5)
>   lon <- seq(28, 30, by = 0.5)
>   data <- list(year = year, period = period, lat = lat, lon = lon)
>   data <- expand.grid(data)
>   data$temp <- sample(15:100, nrow(data), replace = TRUE)
>   data$rain <- sample(0:1000, nrow(data), replace = TRUE)
>   data <- data[order(data$year, data$period), ]
>   data <- na.omit(merge(data, ndata, by = c("year", "period", "lat", "lon")))
>   fix(data)
>   ##
>   #Spatial-Temporal Interpolation from original data (temp and rain) to new data
>
> Peter Maclean
> Department of Economics
> UDSM

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
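As a very rough starting point before moving to kriging with gstat, the coarse-to-fine step can be sketched with inverse distance weighting (IDW) in base R. Note that this swaps in a much simpler technique than the kriging recommended above: it ignores elevation and time, and treats longitude/latitude as plane coordinates. The function idw and the simulated temperatures are assumptions made for illustration.

```r
# Sketch: inverse distance weighting (IDW) from a coarse grid to a finer one.
# IDW is a simpler stand-in for kriging; it ignores elevation and time.
idw <- function(known_xy, known_val, new_xy, power = 2) {
  known_xy <- as.matrix(known_xy)
  new_xy <- as.matrix(new_xy)
  apply(new_xy, 1, function(p) {
    d <- sqrt((known_xy[, 1] - p[1])^2 + (known_xy[, 2] - p[2])^2)
    if (any(d < 1e-9)) return(known_val[which.min(d)])  # target sits on a known point
    w <- 1 / d^power                                    # closer points weigh more
    sum(w * known_val) / sum(w)
  })
}

set.seed(1)
coarse <- expand.grid(lon = seq(28, 30, by = 0.5), lat = seq(-5, -1, by = 0.5))
coarse$temp <- runif(nrow(coarse), 15, 35)              # made-up temperatures

fine <- expand.grid(lon = seq(28, 30, by = 0.1), lat = seq(-5, -1, by = 0.1))
fine$temp <- idw(coarse[, c("lon", "lat")], coarse$temp, fine[, c("lon", "lat")])
head(fine)
```

For the real problem, the same call would be repeated per year and period, and a method that can use elevation as a covariate (for example universal kriging in gstat) is likely to be the better choice.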
Re: [R] Beta fit returns NaNs
Yes, it is the fitdistrplus package. Sorry for not mentioning it earlier. Usually I do those things, but this time it somehow slipped my mind. Sorry, and thank you both!
Re: [R] Legend for 2 plots on same screen
On 08/01/2011 04:52 AM, Cheryl Johnson wrote:

> Hello,
> I have two plots on the same screen. I use the command par(mfrow=c(1,2)) in order to do this. When I try to make a legend for both plots, it only puts the legend in the plot on the right side. If I would like a legend that is outside of both of the plots, how would I do this?

Hi Cheryl,
You probably want to use par(xpd=TRUE) to allow displaying the legend outside the plot areas. Look at the color.legend (plotrix) function to see how it's done.

Jim
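A base-graphics sketch of the xpd idea follows. It uses xpd = NA (clipping to the whole device rather than the figure region) and grconvertX()/grconvertY() to place the legend in device coordinates; the outer margin size, colours and labels are arbitrary choices, and the null PDF device is there only so the code runs without a display.

```r
# Sketch: one shared legend below two side-by-side plots.
pdf(NULL)                                        # null device so the sketch runs anywhere
op <- par(mfrow = c(1, 2), oma = c(3, 0, 0, 0))  # reserve an outer margin at the bottom
plot(1:10, col = "red", pch = 16)
plot(10:1, col = "blue", pch = 17)

par(xpd = NA)                                    # allow drawing anywhere on the device
leg <- legend(x = grconvertX(0.5, from = "ndc", to = "user"),   # horizontal centre
              y = grconvertY(0.02, from = "ndc", to = "user"),  # just above the bottom edge
              legend = c("series A", "series B"),
              col = c("red", "blue"), pch = c(16, 17),
              horiz = TRUE, xjust = 0.5, yjust = 0, bty = "n")
par(op)
invisible(dev.off())
```

Because the legend position is converted from normalised device coordinates, it does not matter that the current user coordinates belong to the right-hand panel.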
Re: [R] R-help Digest, Vol 102, Issue 1
We are on vacation until 20 August and will not be answering e-mails. In urgent cases, please contact Stefanie von Felten steffi.vonfel...@oikostat.ch
[R] axes label
Dear All,

I am trying to put 10^-8 st km^-2 day^-1 on the x-axis of my plot. I tried using:

  ylab = expression(paste("st / ", plain(km)^2, " / day"))

to see if I can at least get the unit before thinking about the power of 10 (10^-8). However, this didn't give the result I expected. The power 2 in km was missing.

I will be glad for any help on how to label 10^-8 st km^-2 day^-1 on the axis.

Many thanks
Regards
Ogbos
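One way such a label might be written with plotmath is sketched below. The thread does not show why the exponent went missing, so this is not a diagnosis, only a form that is expected to render the superscripts; the plotted data are placeholders.

```r
# Sketch: axis labels with a power of ten and negative exponents (plotmath).
lab_power <- expression(10^-8 ~ st ~ km^-2 ~ day^-1)          # 10^-8 st km^-2 day^-1
lab_slash <- expression(paste("st / ", km^2, " / day"))       # st / km^2 / day

pdf(NULL)                                 # null device so the sketch runs anywhere
op <- par(mar = c(5, 5, 2, 1))            # a little extra room for the superscripts
plot(1:10, xlab = lab_power, ylab = lab_slash)
par(op)
invisible(dev.off())
```

In plotmath, ~ inserts a space between terms and ^ raises the following term to a superscript, so no paste() is needed for the first form.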
Re: [R] Save generic plot to file (before rendering to device)
Bumping this one up because the 'before.plot.new' solution turned out to be sub-optimal after all.

>> It should be possible to do this with a before.plot.new hook, right?
>
> Yes, sure, if you treat the first and last plot separately.

It turns out that the before.plot.new hook is not triggered at the right moments. I'm not sure if this is intended behavior or an incorrect implementation. What I was expecting is a hook/event that is triggered every time before a new graphics frame is opened. E.g. if there is an open PDF device and some plots are printed, the number of times the hook is called should be exactly equal to the number of pages in the resulting PDF document.

Sometimes this works as expected, sometimes it doesn't. At the end of this message is some example code. In the first example, the hook works as expected and is called 4 times, as there are 4 plots. In all the other examples the event is either triggered too often or not triggered at all. I guess the hook is called when the plot.new() function is explicitly called, which might not always happen.

My question would be (1) whether this is the intended behavior for 'before.plot.new', and (2) if yes, whether it would be possible to define an additional event that always triggers, and only triggers, when a completely new graphics frame is opened, i.e. whenever a PDF device would start a new page.

Thank you.

  #set the hook (event listener)
  setHook("before.plot.new", NULL, "replace")   # clear any existing hooks first
  setHook("before.plot.new", function() { message("Yay! A new plot!") })

  #works as expected:
  plot(lm(speed ~ dist, cars), ask = FALSE)

  #triggered way too often, once for every panel of the plot
  plot(mtcars)

  #not triggered at all by lattice
  library(lattice)
  dotplot(speed ~ dist, cars)

  #not triggered at all by ggplot2
  library(ggplot2)
  qplot(speed, dist, data = cars)
Re: [R] help with algorithm
On 07/31/2011 05:57 PM, r student wrote:
> I'm wondering if anyone can give some basic advice about how to approach a specific task in R. I'm new to R but have used SAS for many years, and while I can muscle through a lot of the code details, I'm unsure of a few things.
>
> Specific questions:
>
> If I have to perform a set of actions on a group of files, should I use a loop (I feel like I've heard people say to avoid looping in R)?

Hi,

Looping over several files is best done using the apply family of functions. I use the llply, ldply and ddply functions from the plyr package a lot for processing. An example of looping over files and recombining the results would look something like:

    library(plyr)
    listoffiles = list.files("/where/the/files/are")
    combinedResult = ldply(listoffiles, function(filename) {
      bla = read.table(filename)
      ... now maybe do some stuff with it...
      return(result) # Note that result is a data.frame
                     # Can contain e.g. summary stats of bla
    })

ldply will automatically combine the results of the function calls in an efficient manner. It can take some time to get the hang of these things, but I love working with them when processing data.

> How to get means for by groups and subset a files based on those (subset highest and lowest groups)? (I can do this in multiple steps* but wonder what the best, R way is to do this.)

When your data.frame is called dat and has the form:

    value by
        1  A
        5  A
        3  B
    etc.

you can use ddply like this to get the mean value per category in 'by':

    ddply(dat, .(by), summarise, m = mean(value))

> How to draw cutoff lines at specific points on density plots?
>
> How to create a matrix of plots? (Take 4 separate plots and put them into a single graphic.)

I really like the ggplot2 package. It can draw several plots in one graphic using a special syntax construct (no need to subdivide the canvas manually or to keep the axes of the plots equal by hand). Take a look at the ggplot2 website, specifically the examples given for the facet_wrap and facet_grid functions.

cheers,
Paul

> * Get group means, add means back to file, sort by mean, take first and last groups
>
> Feel free to excoriate me if I'm asking for too much help. If possible though, a few words of advice (loops are the best way, just use the main parameter to combine plots) would be lovely if you can provide. Thanks!

-- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
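For readers without plyr, the file loop and the "highest and lowest group" step can also be sketched in base R. This is only an illustration under assumptions: the three small files are generated in a temporary directory, and the column names `value` and `by` simply follow the example data frame in the reply above.

```r
# Assumption: create three small example files so the sketch is self-contained.
set.seed(1)
dir <- tempfile("files")
dir.create(dir)
for (i in 1:3) {
  d <- data.frame(value = rnorm(6), by = rep(c("A", "B", "C"), 2))
  write.table(d, file.path(dir, sprintf("part%d.txt", i)), row.names = FALSE)
}

# Read every file with lapply() and stack the pieces into one data frame.
files <- list.files(dir, full.names = TRUE)
per_file <- lapply(files, function(f) {
  d <- read.table(f, header = TRUE)
  d$file <- basename(f)   # remember where each row came from
  d
})
dat <- do.call(rbind, per_file)

# Mean of 'value' per group, sorted from lowest to highest.
means <- aggregate(value ~ by, data = dat, FUN = mean)
means <- means[order(means$value), ]

# Keep only the rows that belong to the lowest and the highest group.
extremes <- means$by[c(1, nrow(means))]
subset_dat <- dat[dat$by %in% extremes, ]
```

The lapply() plus do.call(rbind, ...) pair plays the role that ldply plays in the plyr version, and aggregate() stands in for ddply with summarise.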
[R] Odp: converting factor to numeric gives NAs introduced by coercion
Hi

> Hi, I have a dataframe that I imported from a .txt file by:
>
>     skogTemp <- read.delim2("Skogaryd_shoot_data.txt", header=TRUE, fill=TRUE)
>
> and the data are factors, how can I avoid factors from the beginning? Although the file contains both characters and numbers.

You have got an answer but here are some comments. If you have characters and numbers in one column, the character values are converted to NA by as.numeric.

> I tried to convert some of the columns from factor to numeric and as I understood it you can not use only as.numeric but as.character first. I got this warning message:
>
>     skogTemp_1 <- as.numeric(as.character(skogTemp_1[,2:4]))
>     Warning message: NAs introduced by coercion

What is skogTemp_1? I presume skogTemp is a data frame, and in that case you can not use such a construction directly.

> I have lots of NAs in my data. I tried to check what class I had now but another warning is given me:
>
>     class(skogTemp_1[,2])

skogTemp_1 is probably a vector with only one dimension, therefore you get this error. class(skogTemp_1) shall give you the desired result, however I prefer ?str

Regards
Petr

>     Error in skogTemp_1[, 2] : incorrect number of dimensions
>     class(skogTemp_1[1,2])
>     Error in skogTemp_1[1, 2] : incorrect number of dimensions
>
> frustrating... I don't know what this means. Can anyone help? Thank you, Angelica
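A minimal sketch of the column-by-column conversion discussed above. The data frame `skog` and its contents are invented for illustration; only the as.numeric(as.character(...)) idiom comes from the thread.

```r
# Hypothetical data: two factor columns that hold mostly numbers.
skog <- data.frame(
  id   = 1:4,
  temp = factor(c("12.5", "13.1", "n/a", "11.9")),
  hum  = factor(c("80", "75", "82", "missing"))
)

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(skog$temp)

# Convert via character, one column at a time; non-numeric strings become NA.
to_numeric <- function(col) suppressWarnings(as.numeric(as.character(col)))
skog[c("temp", "hum")] <- lapply(skog[c("temp", "hum")], to_numeric)

str(skog)
```

Applying the conversion with lapply() avoids calling as.character() on a whole data frame slice such as skogTemp_1[,2:4], which does not convert each column separately.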
Re: [R] axes label
On 2011-08-01 03:32, ogbos okike wrote:
> Dear All, I am trying to put 10^-8 st km^-2day^-1 on x-axis of my plot. I tried using:
>
>     ylab = expression(paste("st / ", plain(km)^2, " / day"))
>
> to see if I can at least get the unit before thinking about the power of 10 (10^-8). However, ylab = expression(paste("st / ", plain(km)^2, " / day")) didn't give the result I expected. The power 2 in km was missing.

Works for me. But I don't see the need for paste() or plain(). Try this:

    plot(0, ylab="", xlab="")
    title(ylab = expression(10^{-8} ~ st ~ km^{-2} ~ day^{-1}))

Replace any '~' with '*' if you don't want the space.

Peter Ehlers

> I will be glad for any help on how to label 10^-8 st km^-2day^-1 on the axis. Many thanks
> Regards
> Ogbos
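To see the difference between '~' and '*' side by side, here is a small sketch. The object names and the choice to draw into a null PDF device (so that no file or window is created) are assumptions made for illustration.

```r
# '~' inserts a space between the pieces, '*' juxtaposes them without one.
lab_spaced <- expression(10^{-8} ~ st ~ km^{-2} ~ day^{-1})
lab_tight  <- expression(10^{-8} * st * km^{-2} * day^{-1})

pdf(NULL)                       # assumption: draw off-screen, write no file
plot(0, xlab = "", ylab = "")
title(ylab = lab_spaced)        # spaced version on the y-axis
title(xlab = lab_tight)         # tight version on the x-axis
invisible(dev.off())
```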
[R] ivreg and structural change
Hello, I am looking for some help with this question: how could I test for structural breaks in an instrumental variables model? For example, I was trying to do something with my model with three time series.

    tax_ivreg <- ivreg(l_y ~ l_x2 + l_x1 + dl_y | lag(l_x2, -1) + lag(l_x2, -2) + lag(l_x1, -1) + lag(l_x1, -2) + lag(l_y, -1) + lag(l_y, -2), data=tax1)
    summary(tax_ivreg)

    ## after estimating it, something weird happened with the several tests in package strucchange. For example:

    cusum <- efp(l_y ~ l_x2 + l_x1 + dl_y | lag(l_x2, -1) + lag(l_x2, -2) + lag(l_x1, -1) + lag(l_x1, -2) + lag(l_y, -1) + lag(l_y, -2), data=tax1, type="OLS-CUSUM")
    sctest(cusum)
    plot(cusum)
    coef(cusum, breaks=2)

    ## And:

    cusum <- efp(tax_ivreg, data=tax1, type="OLS-CUSUM")
    sctest(cusum)
    plot(cusum)
    coef(cusum, breaks=2)

    ## 1. The plots of the two above were very different and
    ## 2. When I ask for the breaks, instead of the dates, it returned me a line of the summary of the estimated tax_ivreg

Any help would be very appreciated.

Thanks
Claudio

-- http://www.shikida.net and http://works.bepress.com/claudio_shikida/
Re: [R] Help with modFit of FME package
On Aug 1, 2011, at 3:41 AM, Paola Lecca wrote:
> Dear R users, I'm trying to fit a set of ODEs to an experimental time series. In the attachment you find the R code I wrote using modFit and modCost of FME package and the file of the time series.

This is getting a bit tiresome. None of the three duplicate such messages have had successful attachment of any code. Why don't you look at what got distributed to the list? The rule I have developed is that I should assume that any file not ending in .pdf or .txt will get scrubbed by the mail server. I realize it is not an exact rule, but it keeps me from submitting files ending in .r or .rdata because I know they will get scrubbed. For some reason a recent rewrite of the Posting Guide appears to have left out this information, which my memory tells me used to be there last year.

-- David.

> When I run summary(Fit) I obtain this error message, and the values of the parameters are equal to the initial guesses I gave to them. The problem is not due to the fact that I have only one equation (I tried also with more equations, but I still obtain this error). I would appreciate if someone could help me in understanding the reason of the error and in fixing it. Thanks for your attention, Paola Lecca.
>
> Here the error:
>
>     summary(Fit)
>     Parameters:
>                   Estimate Std. Error t value Pr(>|t|)
>     pro1_strength        1         NA      NA       NA
>     Residual standard error: 2.124 on 10 degrees of freedom
>     Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
>     In addition: Warning message:
>     In summary.modFit(Fit) : Cannot estimate covariance; system is singular
>
> -- *Paola Lecca, PhD* *The Microsoft Research - University of Trento* *Centre for Computational and Systems Biology* *Piazza Manci 17 38123 Povo/Trento, Italy* *Phone: +39 0461282843* *Fax: +39 0461282814*
> wild_pp1_mrna.txt

David Winsemius, MD West Hartford, CT
[R] Help with modFit of FME package 2
* Apologies for multiple posting *

I attached to my previous e-mail a .r file, and it was not permitted by the rules of the mailing list. Again, please receive my sincere apologies for this. I re-send the e-mail with a .txt attachment in the hope someone can help me to solve my problem.

I'm trying to fit a set of ODEs to an experimental time series. In the attachment you find the R code I wrote using modFit and modCost of FME package and the file of the time series. When I run summary(Fit) I obtain this error message, and the values of the parameters are equal to the initial guesses I gave to them. The problem is not due to the fact that I have only one equation (I tried also with more equations, but I still obtain this error). I would appreciate if someone could help me in understanding the reason of the error and in fixing it.

Thanks for your attention,
Paola Lecca.

Here the error:

    summary(Fit)
    Parameters:
                  Estimate Std. Error t value Pr(>|t|)
    pro1_strength        1         NA      NA       NA
    Residual standard error: 2.124 on 10 degrees of freedom
    Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
    In addition: Warning message:
    In summary.modFit(Fit) : Cannot estimate covariance; system is singular

-- *Paola Lecca, PhD* *The Microsoft Research - University of Trento* *Centre for Computational and Systems Biology* *Piazza Manci 17 38123 Povo/Trento, Italy* *Phone: +39 0461282843* *Fax: +39 0461282814*

Attached data (wild_pp1_mrna.txt):

    time pp1_mrna
    0    0
    2    2.754
    4    2.958
    6    4.058
    8    3.41
    10   3.459
    12   2.453
    14   1.234
    16   2.385
    18   3.691
    20   3.252

Attached code:

    require(deSolve)
    require(FME)

    ## PART 1 ##

    # Differential equations
    model_1_part_1 <- function(t, S, parameters) {
      with(as.list(parameters), {
        cod1 = pro1_strength
        pp1_mrna_degradation_rate <- 1
        v1 = cod1
        v2 = pp1_mrna_degradation_rate * S[1]
        dS1 = v1 - v2
        list(c(dS1))
      })
    }

    # Parameters
    parms_part_1 <- c(pro1_strength = 1.0)

    # Initial values of the species concentration
    S <- c(pp1_mrna = 0)
    times <- seq(0, 20, by = 2)

    # Solve the system
    ode_solutions_part_1 <- ode(S, times, model_1_part_1, parms = parms_part_1)
    ode_solutions_part_1
    summary(ode_solutions_part_1)

    ## Default plot method
    plot(ode_solutions_part_1)

    # Estimate of the parameters
    experiment <- read.table("./wild_pp1_mrna.txt", header=TRUE)
    rw <- dim(experiment)[1]
    names <- array("", rw)
    for (i in 1:rw) {
      names[i] <- "pp1_mrna"
    }
    names
    observed_data_part_1 <- data.frame(name = names, time = experiment[,1], val = experiment[,2])
    observed_data_part_1
    ode_solutions_part_1

    Cost_function <- function (pars) {
      out <- ode_solutions_part_1
      cost <- modCost(model = out, obs = observed_data_part_1, y = "val")
      cost
    }
    Cost_function(parms)

    # Fit the model to the observed data
    Fit <- modFit(f = Cost_function, p = parms_part_1)
    Fit

    # Summary of the fit
    summary(Fit)

    # Model coefficients
    coef(Fit)

    # Deviance of the fit
    deviance(Fit)
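One possible reading of the error, offered as a hypothesis rather than a confirmed diagnosis: in the posted code Cost_function() ignores its argument pars and always returns the cost of the solution computed once with the initial guess, so the cost does not change with the parameter and modFit has nothing to optimise. The sketch below re-solves the ODE inside the cost function. It assumes the deSolve and FME packages are installed; the shortened names (model, S0, cost_function) are mine, and the data are the values from the attached file.

```r
library(deSolve)
library(FME)

# Observations in the long format expected by modCost(): name, time, value.
observed <- data.frame(
  name = "pp1_mrna",
  time = seq(0, 20, by = 2),
  val  = c(0, 2.754, 2.958, 4.058, 3.41, 3.459,
           2.453, 1.234, 2.385, 3.691, 3.252)
)

# Same single equation as in the post: constant production, first-order decay.
model <- function(t, S, parameters) {
  with(as.list(parameters), {
    degradation_rate <- 1
    dS1 <- pro1_strength - degradation_rate * S[1]
    list(c(dS1))
  })
}

S0    <- c(pp1_mrna = 0)
times <- seq(0, 20, by = 2)

# The cost function solves the ODE again for every candidate 'pars',
# so the residuals actually depend on the parameter being estimated.
cost_function <- function(pars) {
  out <- ode(y = S0, times = times, func = model, parms = pars)
  modCost(model = out, obs = observed, y = "val")
}

Fit <- modFit(f = cost_function, p = c(pro1_strength = 1.0))
coef(Fit)
summary(Fit)
```

If this reading is right, the estimate should move away from the initial guess of 1 and summary(Fit) should report a standard error instead of NA.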
Re: [R] Odp: converting factor to numeric gives NAs introduced by coercion
If you are not going to be using factors, then you can keep everything as character (if there are non-numerics in a column) by adding 'as.is=TRUE' as a parameter to the 'read.table' family of functions.

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
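A short sketch of the as.is suggestion. The tab-separated text in `raw` is invented so that the example runs without the original file; the second call, which declares "n/a" as a missing-value marker through na.strings, is an addition of mine and not something proposed in the thread.

```r
# Hypothetical file contents: decimal commas, one non-numeric entry in 'temp'.
raw <- "site\ttemp\tnote\nA\t12,5\tok\nB\tn/a\tsensor fault\nC\t11,9\tok\n"

# as.is = TRUE keeps text columns as character instead of factor.
skog <- read.delim2(text = raw, header = TRUE, fill = TRUE, as.is = TRUE)
str(skog)

# A character column can then be converted explicitly where it makes sense.
skog$temp_num <- suppressWarnings(
  as.numeric(sub(",", ".", skog$temp, fixed = TRUE))
)

# Alternative (my assumption about the data): tell the reader that "n/a"
# means missing, so 'temp' is parsed as numeric directly.
skog2 <- read.delim2(text = raw, header = TRUE, fill = TRUE,
                     as.is = TRUE, na.strings = c("NA", "n/a"))
```

Note that recent R versions (4.0.0 and later) default to stringsAsFactors = FALSE, so character columns are no longer turned into factors unless that is requested.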
Re: [R] export/import matrix
On Jul 31, 2011, at 7:54 PM, Rosario Garcia Gil wrote:
> Hello
> I have a problem on keeping the format when I export a matrix file with the write.table() function.

The quick answer is ... don't do that. Use save() if you want to preserve the attributes of an R object. And that especially applies if you don't understand the differences between R object types. I have discarded a longer answer that complained about your failure to provide complete code.

-- David

> When I import the data volcano from rgl package it looks like this in R:
>
>     data[1:5,]
>          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
>     [1,]  100  100  101  101  101  101  101  100  100   100   101   101   102   102
>     [2,]  101  101  102  102  102  102  102  101  101   101   102   102   103   103
>     [3,]  102  102  103  103  103  103  103  102  102   102   103   103   104   104
>     [4,]  103  103  104  104  104  104  104  103  103   103   103   104   104   104
>     [5,]  104  104  105  105  105  105  105  104  104   103   104   104   105   105
>
> I use this data to represent a 3D map with the following script and it works PERFECT!
>
>     y <- 2*data
>     x <- 10 * (1:nrow(y))
>     z <- 10 * (1:ncol(y))
>     ylim <- range(y)
>     ylen <- ylim[2] - ylim[1] + 1
>     colorlut <- terrain.colors(ylen)
>     col <- colorlut[y-ylim[1] + 1]
>     rgl.open()
>     rgl.surface(x,z,y, color=col, back="lines")
>
> Then I export it as write.table(data, file="datam.txt", row.names=TRUE, col.names=TRUE), when I import it back into R again with read.table("datam.txt") it looks like this in R:
>
>        V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
>     1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103
>     2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 104 105 104
>     3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 105 106 105
>     4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 106 107 106
>     5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 107 108 108
>
> The script I mentioned before does not work on it anymore; if I convert it to a matrix with as.matrix it still does not work.
>
> I have read the pdf on import/export of R and searched by googling but I have not found any answer to my problem. I am sorry if the answer is very obvious but I have tried for more than a week. Any help is really welcome, thanks in advance.
> Rosario

David Winsemius, MD West Hartford, CT
Re: [R] export/import matrix
If you are just exporting it so you can read it back into R later, it is better to use save/load since it keeps the data in the internal format, so it will look the same. Can you describe what you are going to be doing with the data that you 'export'? That might help us come up with a solution to your problem.

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
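The two round trips can be compared with a short sketch. It uses the `volcano` matrix that ships with base R (package datasets) and temporary files; the object names are placeholders.

```r
data <- volcano                       # numeric matrix from package 'datasets'

# Round trip 1: save()/load() keeps the object exactly as it was.
rdata_file <- tempfile(fileext = ".RData")
save(data, file = rdata_file)
original <- data
rm(data)
load(rdata_file)                      # restores an object named 'data'
same_after_load <- identical(data, original)

# Round trip 2: write.table()/read.table() goes through text and
# comes back as a data frame with row and column names.
txt_file <- tempfile(fileext = ".txt")
write.table(original, file = txt_file, row.names = TRUE, col.names = TRUE)
back <- read.table(txt_file)
class(back)                           # a data.frame, not a matrix

# Converting back: as.matrix() keeps the names, unname() drops them again.
restored <- unname(as.matrix(back))
same_values <- isTRUE(all.equal(restored, original))
```

After the text round trip the values are the same, but the object is a data frame until it is converted, and the converted matrix carries dimnames unless they are removed.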
Re: [R] Use dump or write? or what?
Can you define better exactly what you want to do with the data? I would suggest that you keep each of the outputs (objects) of the test in a 'list'; that way you can access each one and do what you need. You can also 'save' the list and later 'load' it into another session.

On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio matt.curcio...@gmail.com wrote:
> Greetings all,
> I am calculating two t-test values for each of many files then save it to file calculate another set and append, repeat. But I can't figure out how to write it to file and then append subsequent t-tests. (maybe too tired ;} ) I have tried to use dump and file.append to no avail.
>
>     ttest_results = tempfile()
>     two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE)
>     welch_ttest <- t.test (tempA, tempB, var.equal = FALSE)
>     dump (two_sample_ttest, file = "dumpdata.txt", append=TRUE)
>     ttest_results <- file.append (ttest_results, two_sample_ttest)
>
> Any suggestions,
> M
> -- Matt Curcio M: 401-316-5358 E: matt.curcio...@gmail.com

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
Re: [R] Is R the right choice for simulating first passage times of random walks?
On Sunday, 31.07.2011, at 23:32 -0500, R. Michael Weylandt wrote:
> Glad to help -- I haven't taken a look at Dennis' solution (which may be far better than mine), but if you do want to keep going down the path outlined below you might consider the following:

I will try Dennis’ solution right away but looked at your suggestions first. Thank you very much.

> Instead of throwing away a simulation if something starts negative, why not just multiply the entire sample by -1: that lets you still use the sample and saves you some computations: of course you'll have to remember to adjust your final results accordingly.

That is a nice suggestion. For a symmetric random walk this is indeed possible and equivalent to looking when the walk first hits zero.

> This might avoid the loop:
>
>     x = ## Whatever x is.
>     xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
>     which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count things, this +1 may be extraneous.
>
> The inner expression sets a 0 except where there is a switch from negative to positive and a one there: the which.max function returns the location of the first maximum, which is the first 1, in the vector. If you are guaranteed the run starts negative, then the location of the first positive should give you the length of the negative run.

That is the same idea as from Bill [1]. The problem is, when the walk never returns to zero in a sample, `which.max(»everything FALSE«)` returns 1 [2]. That is no problem though when we do not have to worry about a walk starting with a positive value. The added 1 (+1) can be omitted when we count the epoch of first hitting 0 instead of how long the walk stayed negative, which is always one less.

Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we start with a negative value. `(x>=0)` should be good enough in this case.

> This all gives you,
>
>     f4 <- function(n = 10, # number of simulations
>                    length = 10) # length of iterated sum
>     {
>       R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
>       R = apply(R,1,cumsum)
>       R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row

The line above seems to look at the columns instead of the rows. I think the following is correct, since after the `apply()` above the random walks are in the columns.

    R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]

>       fTemp <- function(x) {
>         xLag = c(0,x[-length(x)])
>         return(which.max((x>=0) & (xLag <0))+1)
>       }
>       countNegative = apply(R,2,fTemp)
>       tabulate(as.vector(countNegative), length)
>     }
>
> That just crashed my computer though, so I wouldn't recommend it for large n,length.

Welcome to my world. I would have never thought that simulating random walks with a length of, say, a million would create that much data and push common desktop systems with, let us say, 4 GB of RAM to their limits.

> Instead, you can help a little by combining the lagging and the all in one.
>
>     f4 <- function(n = 10, length = 10)
>     {
>       R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
>       R = apply(R,1,cumsum)
>       R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
>       R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0)
>       countNegative = apply(R,1,which.max) + 1
>       return (tabulate(as.vector(countNegative), length) )
>     }

I left that one out, because as written above the check can be shortened.

> Of course, this is all starting to approach a very specific question that could actually be approached much more efficiently if it's your end goal (though I think I remember from your first email a different end goal):

That is true. But to learn some optimization techniques on a simple example is much appreciated and will hopefully help me later on for the iterated random walk cases.

> We can use the symmetry and restartability of RW to do the following:
>
>     x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T))
>     D = diff(which(x == 0))

Nice!

> This will give you a vector of how long x stays positive or negative at a time. Thinking through some simple translations lets you see that this set has the same distribution as how long a RW that starts negative stays negative.

I have to write those translations down. At first sight, though, we again need to handle the case where it stays negative the whole time. `D` then has length 0 and we have to count that as a walk longer than `BIGNUMBER`.

> Again, this is only good for answering a very specific question about random walks and may not be useful if you have other more complicated questions in sight.

Just testing for 0 will not be enough for iterated random walks, since an iterated random walk can go from negative to non-negative without being zero at this time/epoch.

I implemented all your suggestions and got
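The two ideas discussed in this message (the first epoch at which a walk that starts negative reaches zero, and the return times obtained from diff(which(x == 0))) can be tried on a small scale with the sketch below. The helper first_hit() and the sample sizes are my own choices for illustration, not code from the thread; returning NA when the walk never reaches zero is one way to avoid the which.max() pitfall mentioned above.

```r
set.seed(1)

# Epoch at which a walk that starts negative first reaches zero, or NA if it
# never does within the sample. which.max() on an all-FALSE vector returns 1,
# so that case is checked explicitly.
first_hit <- function(x) {
  hit <- x >= 0
  if (!any(hit)) return(NA_integer_)
  which.max(hit)
}

# One symmetric random walk; flip it if it starts positive (symmetry argument).
walk <- cumsum(sample(c(-1L, 1L), 1000, replace = TRUE))
if (walk[1] > 0) walk <- -walk
first_hit(walk)

# Return times to zero of one long walk: distances between successive zeros.
x <- cumsum(sample(c(-1L, 1L), 1e5, replace = TRUE))
D <- diff(which(x == 0))
table(D)[1:5]   # counts for the shortest return times in this sample
```

Every entry of D is even, because a walk with steps of plus or minus one can only return to zero after an even number of steps.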
[R] Problem Fixed: axes label
Hi Peter, Many thanks. It worked.

Regards
Ogbos
Re: [R] Use dump or write? or what?
Greetings all,

Thanks for all your help so far. Let me give a better idea of what I am doing. I have hundreds of files that I need to plow through with a t-test and a correlation test. BTW, 'tempA' and 'tempB' are simply columns of numbers from a gene-chip experiment that spits out DNA 'amounts'. So I have set up a loop to read the files and carry out the tests but need to save the results for later inspection (and Jim H., you are probably right, for later inspection). By inspection I mean I don't know what I want to do with it yet. Remember: that's why they call it Research. So it seems that 'save/load' might be a good alternative for my work.

Any suggestions,
M

-- Matt Curcio M: 401-316-5358 E: matt.curcio...@gmail.com
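A minimal sketch of the "keep everything in a list, then save it" approach suggested earlier in the thread. The simulated vectors stand in for the tempA and tempB columns read from each file, and the chip identifiers are made up.

```r
set.seed(123)
results <- list()

# Assumption: in the real loop tempA and tempB would be read from each file.
for (id in c("chip01", "chip02", "chip03")) {
  tempA <- rnorm(20, mean = 10)
  tempB <- rnorm(20, mean = 10.5)
  results[[id]] <- list(
    two_sample  = t.test(tempA, tempB, var.equal = TRUE),
    welch       = t.test(tempA, tempB, var.equal = FALSE),
    correlation = cor.test(tempA, tempB)
  )
}

# Later inspection: pull one number out of every stored test object.
p_values <- sapply(results, function(r) r$two_sample$p.value)

# Store the whole list and restore it, e.g. in another session.
out_file <- tempfile(fileext = ".RData")
save(results, file = out_file)
rm(results)
load(out_file)
names(results)
```

Because the complete test objects are kept, statistics that were not extracted at first (confidence intervals, estimates, degrees of freedom) remain available after loading.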
[R] formula used by R to compute the t-values in a linear regression
Hello,

I was wondering if someone knows the formula used by the function lm to compute the t-values. I am trying to implement a linear regression myself. Assuming that I have K variables and N observations, the formula I am using is:

For the k-th variable, t-value = b_k / sigma_k

where b_k is the coefficient for the k-th variable, and sigma_k = (t(x) x)^(-1)_kk is its standard deviation. I find

    sigma_k = sigma * n / (n * Sum x_{k,i}^2 - (Sum x_{k,i}^2))

with sigma the estimated standard deviation of the residuals,

    sigma = sqrt(1/(N-K-1) * Sum epsilon_i^2)

with:
N: number of observations
K: number of variables

This formula comes from my old course of econometrics. For some reason it doesn't match the t-value produced by R (I am off by about 1%). I can match the other results produced by R (coefficients of the regression, r squared, etc.). I would be grateful if someone could provide some clarifications.

Samuel
Re: [R] formula used by R to compute the t-values in a linear regression
On Aug 1, 2011, at 9:27 AM, Samuel Le wrote:
> Hello, I was wondering if someone knows the formula used by the function lm to compute the t-values. I am trying to implement a linear regression myself. Assuming that I have K variables and N observations, the formula I am using is: For the k-th variable, t-value = b_k / sigma_k, where b_k is the coefficient for the k-th variable, and sigma_k = (t(x) x)^(-1)_kk is its standard deviation. I find sigma_k = sigma * n / (n * Sum x_{k,i}^2 - (Sum x_{k,i}^2)) with sigma the estimated standard deviation of the residuals, sigma = sqrt(1/(N-K-1) * Sum epsilon_i^2), with N: number of observations, K: number of variables.
>
> This formula comes from my old course of econometrics. For some reason it doesn't match the t-value produced by R (I am off by about 1%). I can match the other results produced by R (coefficients of the regression, r squared, etc.).

Usually such a small difference results from using different degrees of freedom. Have you reduced the df's appropriately after considering the number of other estimated parameters? Just quoting formulas from your econometrics reference is not enough to answer the question. We would need to see code (as the message at the end of every posting states).

> I would be grateful if someone could provide some clarifications.
> Samuel

David Winsemius, MD West Hartford, CT
Re: [R] formula used by R to compute the t-values in a linear regression
On Aug 1, 2011, at 15:27, Samuel Le wrote:

> Hello, I was wondering if someone knows the formula used by the function lm to compute the t-values. I am trying to implement a linear regression myself. Assuming that I have K variables, and N observations, the formula I am using is: For the k-th variable, t-value = b_k/sigma_k, with b_k the coefficient for the k-th variable, and sigma_k = (t(x) x)^(-1)_kk its standard deviation. I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 - (sum x_{k,i}^2)), with sigma the estimated standard deviation of the residuals, sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2), N the number of observations and K the number of variables. This formula comes from my old course of econometrics. For some reason it doesn't match the t-value produced by R (I am off by about 1%). I can match the other results produced by R (coefficients of the regression, r squared, etc.). I would be grateful if someone could provide some clarifications.

AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k involves matrix inversion.

Also, even for K=1, beware that textbook formulas like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and easily lose multiple digits of precision, so software tends to use rather more careful algorithms.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg
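A minimal sketch of the two points made in this reply, using simulated data (the names x1, x2, y and the data themselves are illustrative assumptions, not Samuel's). The first part computes the standard errors for general K from the diagonal of (X'X)^(-1); the second contrasts the textbook sum-of-squares formula with a centred calculation on values with a large offset.

```r
# Simulated data; x1, x2 and y are illustrative names only.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X       <- cbind(1, x1, x2)                  # design matrix, intercept included
XtX_inv <- solve(crossprod(X))               # (X'X)^(-1): the matrix inversion
b       <- drop(XtX_inv %*% crossprod(X, y)) # least-squares coefficients
res     <- y - drop(X %*% b)                 # residuals
rdf     <- n - ncol(X)                       # N - K - 1, with K = 2 slopes
s2      <- sum(res^2) / rdf                  # residual variance
se      <- sqrt(s2 * diag(XtX_inv))          # standard errors
tval    <- b / se

# Compare with what summary.lm reports
fit <- lm(y ~ x1 + x2)
all.equal(unname(tval), unname(coef(summary(fit))[, "t value"]))

# Cancellation in the textbook formula SSDx = SSx - (Sx)^2/n
x      <- 1e9 + 1:10
naive  <- sum(x^2) - sum(x)^2 / length(x)    # subtracts two numbers near 1e19
stable <- sum((x - mean(x))^2)               # centre first, then square
c(naive = naive, stable = stable)
```

Note that solve(crossprod(X)) is used here only because it mirrors the formula; lm itself works from a QR decomposition of X.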
Re: [R] Reading name-value data
Yes! Would you mind filing an issue so I don't forget?

Hadley

On Friday, July 29, 2011, Stavros Macrakis macra...@alum.mit.edu wrote:
> Perfect! Thanks!
>
> By the way, I see that, unlike base rbind, it does not work for vectors and lists:
>
>     rbind(c(a=1), c(b=2)) => matrix(1:2, 2, 1, dimnames=list(NULL, "a")) == as.matrix(data.frame(a=1:2))
>
> but
>
>     rbind.fill(c(a=1), c(b=2)) => NULL
>
> Shouldn't it give something like
>
>     matrix(c(1,NA,NA,2), 2, 2, dimnames=list(NULL, c("a","b")))
>
> or
>
>     data.frame(a=c(1,NA), b=c(NA,2))
>
> If, on the other hand, it insists on data.frames as input, it should err out if given non-data-frames.
>
> -s
>
> On Thu, Jul 28, 2011 at 19:30, Hadley Wickham had...@rice.edu wrote:
> > Use plyr::rbind.fill? That does match up columns by name.
> >
> > Hadley
> >
> > On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
> > > I have a file of data where each line is a series of name-value pairs, but where the names are not necessarily the same from line to line, e.g.
> > >
> > >     a=1,b=2,d=5
> > >     b=4,c=3,e=3
> > >     a=5,d=1
> > >
> > > I would like to create a data frame which lines up the data in the corresponding columns. In this case, this would be
> > >
> > >     data.frame(a = c(1, NA, 5), b = c(2, 4, NA), c = c(NA, 3, NA), d = c(5, NA, 1), e = c(NA, 3, NA))
> > >
> > > One way I can think of doing this is to read in the data as one 'long' data frame per line with a unique ID, e.g. line one becomes
> > >
> > >     cbind(id=1, data.frame(variable=c('a','b','d'), value=c(1,2,5)))
> > >
> > > then rbind all the lines and use the reshape package function 'cast'.
> > >
> > > Is there a more straightforward way? (I'd have thought rbind would line up columns by name, but it doesn't.)
> > >
> > > -s

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
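A base-R sketch of the task in the original question, for the three sample lines only. It is one possible approach and not how plyr::rbind.fill works internally; the helper name parse_line is hypothetical, and the values are assumed to be numeric.

```r
lines <- c("a=1,b=2,d=5", "b=4,c=3,e=3", "a=5,d=1")

# Hypothetical helper: turn one line such as "a=1,b=2" into a named numeric vector.
parse_line <- function(line) {
  pairs <- strsplit(strsplit(line, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  vals  <- as.numeric(vapply(pairs, `[`, character(1), 2))
  names(vals) <- vapply(pairs, `[`, character(1), 1)
  vals
}

rows <- lapply(lines, parse_line)
cols <- sort(unique(unlist(lapply(rows, names))))

# Indexing a named vector by a name it lacks yields NA, which lines up the columns.
mat <- t(vapply(rows, function(r) unname(r[cols]), numeric(length(cols))))
colnames(mat) <- cols
df <- as.data.frame(mat)
df
```

With real files, readLines() would supply the `lines` vector.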
Re: [R] formula used by R to compute the t-values in a linear regression
> -----Original Message-----
> [mailto:r-help-boun...@r-project.org] On Behalf Of Samuel Le
> Subject: [R] formula used by R to compute the t-values in a linear regression
>
> I was wondering if someone knows the formula used by the function lm to compute the t-values.

Typing summary.lm I found the standard error and t calculation (around lines 58-62 of the resulting listing):

    resvar <- rss/rdf
    R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
    se <- sqrt(diag(R) * resvar)
    est <- z$coefficients[Qr$pivot[p1]]
    tval <- est/se

You can also find (rather further up) that the degrees of freedom used are taken directly from the linear model's $df (z$df in the function). Others noted that incorrect df often cause problems; you can check that you are using the correct df by inspecting the lm summary.

The standard errors are apparently (as is usual for a least squares problem, I think) taken from the diagonal of the inverse of the hessian, multiplied by the residual variance. Unfortunately I could not get at the hessian calculation quite as easily (it looks like it uses a function that's not exported from stats), so that's left as an exercise in browsing source code ...

S Ellison
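The quoted lines can be tried outside summary.lm on any fitted model. The sketch below is an illustration on the built-in cars data (my choice of example, not from the thread); it takes rss, rdf, p1 and Qr from the fitted object the way the listing's variable names suggest. chol2inv applied to the upper-triangular R factor of the QR decomposition returns (X'X)^(-1), which is the matrix whose diagonal is needed.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in data set, for illustration

Qr  <- fit$qr                          # QR decomposition stored by lm
p1  <- seq_len(fit$rank)               # indices of the estimable coefficients
rdf <- fit$df.residual                 # residual degrees of freedom
rss <- sum(residuals(fit)^2)           # residual sum of squares

resvar <- rss / rdf
R      <- chol2inv(Qr$qr[p1, p1, drop = FALSE])  # (X'X)^(-1) from the R factor
se     <- sqrt(diag(R) * resvar)
est    <- coef(fit)[Qr$pivot[p1]]
tval   <- est / se

# Should agree with the t values printed by summary(fit)
cbind(by_hand = tval, summary_lm = coef(summary(fit))[, "t value"])
```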
Re: [R] formula used by R to compute the t-values in a linear regression
Exactly. My formula holds only for k=1, this is how I generated it. Do you have any references concerning the rather more careful algorithms?

Thanks,
Samuel

-----Original Message-----
From: peter dalgaard [mailto:pda...@gmail.com]
Sent: 01 August 2011 14:45
To: Samuel Le
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] formula used by R to compute the t-values in a linear regression

AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k involves matrix inversion. Also, even for K=1, beware that textbook formulas like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and easily loses multiple digits of precision, so software tends to use rather more careful algorithms.
Re: [R] formula used by R to compute the t-values in a linear regression
Yes, that's what I was looking for.

Many thanks,
Samuel

-----Original Message-----
From: S Ellison [mailto:s.elli...@lgcgroup.com]
Sent: 01 August 2011 15:16
To: Samuel Le; r-h...@stat.math.ethz.ch
Subject: RE: formula used by R to compute the t-values in a linear regression
[R] General indexing in multidimensional arrays
Dear R community,

I have a general question regarding indexing in multidimensional arrays. Imagine I have a three-dimensional array and I only want to extract one vector along a single dimension from it:

    data <- array(rnorm(64), dim=c(4,4,4))
    result <- data[1,1,]

If I want to extract more than one of these vectors, it would really help me to supply a logical matrix of the size of the first two dimensions:

    indices <- matrix(FALSE, ncol=4, nrow=4)
    indices[1,3] <- TRUE
    indices[4,1] <- TRUE
    result <- data[indices,]

This, however, gives me an error. I am used to this kind of indexing from Matlab and was wondering whether there is an easy way to do this in R without supplying complicated index matrices of all three dimensions or logical vectors of the size of the whole matrix. The only way I could imagine would be:

    result <- data[rep(as.vector(indices), times=4)]

but this seems rather complicated and also depends on the order of the dimensions I want to extract.

I do not want R to copy Matlab's behaviour; I am just wondering whether I missed a concept of indexing in R.

Thanks a lot
Jannis
Re: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb
Thanks a lot for the help. Actually, I am using a Mac (R for Mac OS X GUI 1.40-devel Leopard build 32-bit (5751)), but I think I can get access to Windows 7 64-bit.

What I am trying to do is a maximization through grid search (because I am not sure that any of the optim() methods works well enough in my case; at least, all of them give quite different results). I want the optimization for a Monte Carlo analysis of the Smoothed Maximum Score estimator, so I want it to be as efficient as possible, but given that I am something of an amateur in R and in programming in general, I doubt that I can achieve that.

Thanks again for your help
Dimitris
[R] Problem with gam() after R update
Dear group,

I have been experiencing problems with the gam() function since updating R to version 2.13.1. The function in both the gam and mgcv packages has stopped working. Before the update, the same code worked fine. The function from the gam package yields the following warning:

    Residual degrees of freedom are negative or zero. This occurs when the sum of the parametric and nonparametric degrees of freedom exceeds the number of observations. The model is probably too complex for the amount of data available.

while gam() from mgcv crashes R. Did I miss something?

Thank you in advance.
PJ
[R] Plotting question
Hi,

I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable, which is categorical with 10 levels. I have been plotting x against y and using grey scales to distinguish the level of the categorical variable for each point. It looks OK to me, but a journal reviewer says this is of no use. I cannot afford to pay for colour prints. Any ideas on the best way to distinguish 10 groups on an xy scatter plot? If all else fails I can just remove the graph and give them a table of regression coefficients.

Thanks.

Yours Sincerely
Andrew McCulloch
[R] How to colour specific edges in a dendrogram
Dear Mailing-list,

I used hclust to make a dendrogram of 2613 leaves. I also have a list with the names of certain labels which are of interest, and I would like to visualize where they appear within the dendrogram. I found an example of how to use dendrapply to colour the labels, but the problem is that with 2613 leaves I cannot plot the labels, as it gets super messy.

I have now tried to write a function using dendrapply() to colour the edges of the leaves of interest red. Unfortunately, I have failed to write this function. Could someone help me out with the stub of a function for colouring edges?

I have:
- the dendrogram
- a list of labels whose edges should be coloured

I would like to colour the edges between the final leaf node and its parent node.

Thank you very much for your help!
Jan
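A possible stub for the function asked about, sketched on the built-in USArrests data rather than the poster's 2613-leaf tree. The labels in `wanted` and the function name colour_leaf_edges are assumptions made for illustration. It relies on the "edgePar" attribute of a dendrogram node, which plot.dendrogram uses when drawing the edge from the parent to that node.

```r
hc   <- hclust(dist(USArrests))          # built-in data, for illustration
dend <- as.dendrogram(hc)

wanted <- c("Texas", "Ohio", "Utah")     # hypothetical labels of interest

# Applied to every node: leaves whose label is wanted get a red incoming edge.
colour_leaf_edges <- function(node) {
  if (is.leaf(node) && attr(node, "label") %in% wanted) {
    attr(node, "edgePar") <- list(col = "red")
  }
  node
}

coloured <- dendrapply(dend, colour_leaf_edges)

# Suppress the leaf labels so a large tree stays readable
plot(coloured, leaflab = "none")
```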
[R] Impact of multiple imputation on correlations
Dear all,

I have been attempting to use multiple imputation (MI) to handle missing data in my study. I use the mice package in R for this. The deeper I get into this process, the more I realize I first need to understand some basic concepts which I hope you can help me with.

For example, let us consider two arbitrary variables in my study that have the following missingness pattern:

    Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%)
    Variable 1 available, Variable 2 missing:   37 (31.3%)
    Variable 1 missing,   Variable 2 available: 10 (8.4%)
    Variable 1 missing,   Variable 2 missing:   20 (16.9%)

I am interested in the correlation between Variable 1 and Variable 2.

Q1. Does it even make sense for me to use MI (or anything else, really) to replace my missing data when such large fractions are not available?

Plot 1 (http://imgur.com/KFV9yCmV1sl) provides a scatter plot of these example variables in the original data. The correlation coefficient r = -0.34 and p = 0.016.

Q2. I notice that correlations between variables in imputed data (pooled estimates over all imputations) are much lower and less significant than the correlations in the original data. For this example, the pooled estimates for the imputed data show r = -0.11 and p = 0.22. Since this seems to happen in all the variable combinations that I have looked at, I would like to know if MI is known to have this behavior, or whether this is specific to my imputation.

Q3. When going through the imputations, the distribution of the individual variables (min, max, mean, etc.) matches the original data. However, correlations and least-square line fits vary quite a bit from imputation to imputation (see Plot 2, http://imgur.com/KFV9ylCmV1s). Is this normal?

Q4. Since my results differ (quite significantly) between the original and imputed data, which one should I trust?

Thank you for your help in advance.

Tina
Re: [R] [R-Forge] R 2.13.1 can't find package binaries on R-Forge
Dear all,

this must have been a temporary problem. In this case I assume that the build cycle did not finish in time, i.e., binaries were synced to the staging area although not all were built.

best,
stefan

On 07/31/2011 05:52 PM, David Winsemius wrote:
> On Jul 31, 2011, at 11:26 AM, Michael Friendly wrote:
>
> > [Env: Win XP]
> >
> > I've just upgraded from R 2.12.2 to R 2.13.1. As part of my upgrade process, I typically install some in-development packages from R-Forge that are not on CRAN. But for the first time, it doesn't work, e.g.,
> >
> >     install.packages("p3d", repos="http://R-Forge.R-project.org")
> >     trying URL 'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip'
> >     Error in download.file(url, destfile, method, mode = "wb", ...) :
> >       cannot open URL 'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip'
> >     In addition: Warning message:
> >     In download.file(url, destfile, method, mode = "wb", ...) :
> >       cannot open: HTTP status was '404 Not Found'
> >     Warning in download.packages(pkgs, destdir = tmpd, available = available, :
> >       download of package 'p3d' failed
> >
> > The list of packages I install this way is:
> >
> >     special <- c("p3d", "patchDVI", "spacemakeR", "spida")
> >     install.packages(special, repos="http://R-Forge.R-project.org")
> >
> > Is this just an R-Forge problem?
>
> I'm not informed about the workings of R-Forge, but did you notice that there were no packages in that bin/windows directory whose alphabetical collation would be after lowercase "i"? That seems to suggest some sort of system error encountered before the next package after ipreds was completed. On the project page the binaries for Windows are listed as offline.
>
> https://r-forge.r-project.org/R/?group_id=431
>
> I don't see any C modules in the source. Have you tried installing from source?
Re: [R] General indexing in multidimensional arrays
On 11-08-01 5:38 AM, Jannis wrote:
> Dear R community,
>
> I have a general question regarding indexing in multidimensional arrays. Imagine I have a three-dimensional array and I only want to extract one vector along a single dimension from it:
>
>     data <- array(rnorm(64), dim=c(4,4,4))
>     result <- data[1,1,]
>
> If I want to extract more than one of these vectors, it would really help me to supply a logical matrix of the size of the first two dimensions:
>
>     indices <- matrix(FALSE, ncol=4, nrow=4)
>     indices[1,3] <- TRUE
>     indices[4,1] <- TRUE
>     result <- data[indices,]
>
> This, however, gives me an error. I am used to this kind of indexing from Matlab and was wondering whether there is an easy way to do this in R without supplying complicated index matrices of all three dimensions or logical vectors of the size of the whole matrix. The only way I could imagine would be:
>
>     result <- data[rep(as.vector(indices), times=4)]
>
> but this seems rather complicated and also depends on the order of the dimensions I want to extract.
>
> I do not want R to copy Matlab's behaviour; I am just wondering whether I missed a concept of indexing in R.

Base R doesn't have anything like that as far as I know. The closest is matrix indexing: you construct a 3-column matrix whose rows are the indices of each element you want to extract.

Possibly plyr or some other package has functions to do this.

Duncan Murdoch
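The matrix indexing described in this reply can be sketched as follows for the array in the question. The construction of the 3-column index matrix from the logical mask is my own illustration (variable names rc, k and idx are assumptions); only which(), cbind() and the matrix form of `[` are used.

```r
set.seed(42)
data <- array(rnorm(64), dim = c(4, 4, 4))
indices <- matrix(FALSE, nrow = 4, ncol = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

rc <- which(indices, arr.ind = TRUE)   # one row per selected (row, col) pair
k  <- dim(data)[3]

# 3-column index matrix: each selected pair combined with every third index
idx <- cbind(rc[rep(seq_len(nrow(rc)), each = k), , drop = FALSE],
             rep(seq_len(k), times = nrow(rc)))

# data[idx] returns a plain vector; reshape to one column per selected pair
result <- matrix(data[idx], nrow = k)
result
```

The columns follow the column-major order in which which() reports the TRUE cells, so here the first column is data[4, 1, ] and the second is data[1, 3, ].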
[R] error message jpeg62.dll missing
Dear R-help,

We are getting an error message `jpeg62.dll missing'. We are running Windows 7 64-bit, from a Mac using Boot Camp. Do you know of this error message, and can you give us help trying to resolve the problem?

many thanks
Rocky

Rocky Hyacinth
Technician
Department of Archaeology
University of Sheffield
United Kingdom
Re: [R] Plotting question
On 11-08-01 5:44 AM, Andrew McCulloch wrote:
> Hi, I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable which is categorical with 10 levels. I have been plotting x against y and using gray scales to distinguish the level of the categorical variable for each point. It looks ok to me but a journal reviewer says this is not any use. I cannot afford to pay for colour prints. Any ideas on what is the best way to distinguish 10 groups on an xy scatter plot?

Plot digits or letters or other symbols.

Duncan Murdoch

> If all else fails I can just remove the graph and give them a table of regression coefficients. Thanks.
>
> Yours Sincerely
> Andrew McCulloch
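A short sketch of this suggestion on simulated data (x, y and the grouping factor g are made up for illustration). The pch argument of plot() accepts a vector of single characters, so each point can be drawn as the letter or digit of its group, which survives black-and-white printing.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
g <- factor(sample(LETTERS[1:10], 100, replace = TRUE), levels = LETTERS[1:10])

# One plotting character per point: the letter of its group ...
sym_letters <- as.character(g)
# ... or, alternatively, a digit 0-9
sym_digits <- as.character(as.integer(g) - 1)

plot(x, y, pch = sym_letters, cex = 0.8)
```

Replacing sym_letters with sym_digits in the plot() call gives the digit version.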
Re: [R] fitting a sinus curve
Dear David and Hans-Werner,

Thank you very much for your help. I would now like to compare whether a polynomial or the sinus model fits better. How can I see the R-squared or the F-statistic for the sinus regression, so as to be able to compare it with the polynomial model?

Thanks a lot and have a nice evening.

Best,
Marianne

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hans W Borchers
Sent: Friday, July 29, 2011 12:21 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] fitting a sinus curve

David Winsemius dwinsemius at comcast.net writes:
> On Jul 28, 2011, at 1:07 PM, Hans W Borchers wrote:
> > maaariiianne marianne.zeyringer at ec.europa.eu writes:
> > > Dear R community! I am new to R and would be very grateful for any kind of help. I am a PhD student and need to fit a model to an electricity load profile of a household (curve with two peaks). I was thinking of looking if a polynomial of 4th order, a sinus/cosinus combination or a combination of 3 parabels fits the data best.
> > > I have problems with the sinus/cosinus regression:
> > >
> > >     time <- c(
> > >       0.00,  0.15,  0.30,  0.45,  1.00,  1.15,  1.30,  1.45,
> > >       2.00,  2.15,  2.30,  2.45,  3.00,  3.15,  3.30,  3.45,
> > >       4.00,  4.15,  4.30,  4.45,  5.00,  5.15,  5.30,  5.45,
> > >       6.00,  6.15,  6.30,  6.45,  7.00,  7.15,  7.30,  7.45,
> > >       8.00,  8.15,  8.30,  8.45,  9.00,  9.15,  9.30,  9.45,
> > >      10.00, 10.15, 10.30, 10.45, 11.00, 11.15, 11.30, 11.45,
> > >      12.00, 12.15, 12.30, 12.45, 13.00, 13.15, 13.30, 13.45,
> > >      14.00, 14.15, 14.30, 14.45, 15.00, 15.15, 15.30, 15.45,
> > >      16.00, 16.15, 16.30, 16.45, 17.00, 17.15, 17.30, 17.45,
> > >      18.00, 18.15, 18.30, 18.45, 19.00, 19.15, 19.30, 19.45,
> > >      20.00, 20.15, 20.30, 20.45, 21.00, 21.15, 21.30, 21.45,
> > >      22.00, 22.15, 22.30, 22.45, 23.00, 23.15, 23.30, 23.45)
> > >     watt <- c(
> > >       94.1,  70.8,  68.2,  65.9,  63.3,  59.5,  55,    50.5,
> > >       46.6,  43.9,  42.3,  41.4,  40.8,  40.3,  39.9,  39.5,
> > >       39.1,  38.8,  38.5,  38.3,  38.3,  38.5,  39.1,  40.3,
> > >       42.4,  45.6,  49.9,  55.3,  61.6,  68.9,  77.1,  86.1,
> > >       95.7, 105.8, 115.8, 124.9, 132.3, 137.6, 141.1, 143.3,
> > >      144.8, 146,   147.2, 148.4, 149.8, 151.5, 153.5, 156,
> > >      159,   162.4, 165.8, 168.4, 169.8, 169.4, 167.6, 164.8,
> > >      161.5, 158.1, 154.9, 151.8, 149,   146.5, 144.4, 142.7,
> > >      141.5, 140.9, 141.7, 144.9, 151.5, 161.9, 174.6, 187.4,
> > >      198.1, 205.2, 209.1, 211.1, 212.2, 213.2, 213,   210.4,
> > >      203.9, 192.9, 179,   164.4, 151.5, 141.9, 135.3, 131,
> > >      128.2, 126.1, 124.1, 121.6, 118.2, 113.4, 107.4, 100.8)
> > >     df <- data.frame(time, watt)
> > >     lmfit <- lm(time ~ watt + cos(time) + sin(time), data = df)
> >
> > Your regression formula does not make sense to me. You seem to expect a periodic function within 24 hours, and if not it would still be possible to subtract the trend and then look at a periodic solution. Applying a trigonometric regression results in the following approximations:
> >
> >     library(pracma)
> >     plot(2*pi*time/24, watt, col="red")
> >     ts <- seq(0, 2*pi, len = 100)
> >     xs6 <- trigApprox(ts, watt, 6)
> >     xs8 <- trigApprox(ts, watt, 8)
> >     lines(ts, xs6, col="blue", lwd=2)
> >     lines(ts, xs8, col="green", lwd=2)
> >     grid()
> >
> > where as examples the trigonometric fits of degree 6 and 8 are used.
> > I would not advise to use higher orders, even if the fit is not perfect.
>
> Thank you! That is a real gem of a worked example. Not only did it introduce me to a useful package I was not familiar with, but there was even a worked example in one of the help pages that might have specifically answered the question about getting a 2nd(?) order trig regression. If I understood the commentary on that page, this method might also be appropriate for an irregular time series, whereas trigApprox and trigPoly would not?

That's true. For the moment, the trigPoly() function works correctly only with equidistant data between 0 and 2*pi.

> This is adapted from the trigPoly help page in Hans Werner's pracma package:

The error I made myself was to take the 'time' variable literally, though obviously the numbers after the decimal point were meant as minutes. Thus

    time <- seq(0, 23.75, len = 96)

would be a better choice. The rest in your adaptation is absolutely correct.

>     A <- cbind(1, cos(pi*time/24), sin(pi*time/24), cos(2*pi*time/24), sin(2*pi*time/24))
>     (ab <- qr.solve(A, watt))
>     # [1] 127.29131 -26.88824 -10.06134 -36.22793 -38.56219
>     ts <- seq(0, pi, length.out = 100)
>     xs <- ab[1] + ab[2]*cos(ts) + ab[3]*sin(ts) + ab[4]*cos(2*ts) + ab[5]*sin(2*ts)
>     plot(pi*time/24, watt, col = "red", xlim=c(0, pi), ylim=range(watt), main = "Trigonometric Regression")
>     lines(ts, xs, col="blue")
>
> Hans: I corrected the spelling of Trigonometric, but other than that I may well have introduced other errors for which I would be happy to be corrected. For instance, I'm unsure of the terminology regarding the ordinality of this model. I'm also not sure if my pi/24 and 2*pi/24 factors were correct in normalizing the time scale, although the prediction seemed sensible.

And yes, this curve is the best
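On the question of R-squared and the F-statistic for the trigonometric model: one way to obtain them, sketched here under stated assumptions, is to fit the sine and cosine terms with lm() so that summary() reports the usual statistics. The data below are a synthetic stand-in on the same 96-point quarter-hour grid, not the posted load profile, and the choice of two harmonics is arbitrary. Because the two models are not nested, the sketch compares them by adjusted R-squared and AIC rather than by an F-test between them.

```r
# Synthetic stand-in for the load profile (not the poster's data)
set.seed(123)
time <- seq(0, 23.75, length.out = 96)   # hours, quarter-hour steps
w    <- 2 * pi * time / 24               # one full cycle per day
watt <- 120 - 30 * cos(w) - 40 * sin(2 * w) + rnorm(96, sd = 8)

# Trigonometric regression with two harmonics, fitted by ordinary least squares
fit_trig <- lm(watt ~ cos(w) + sin(w) + cos(2 * w) + sin(2 * w))
# Fourth-order polynomial in time (orthogonal polynomial basis)
fit_poly <- lm(watt ~ poly(time, 4))

summary(fit_trig)$r.squared       # R-squared of the trigonometric fit
summary(fit_trig)$fstatistic      # F value with numerator and denominator df
summary(fit_poly)$r.squared

# Both models use four regressors plus an intercept
c(trig = summary(fit_trig)$adj.r.squared,
  poly = summary(fit_poly)$adj.r.squared)
AIC(fit_trig, fit_poly)
```

With the real data, `watt` would be the posted vector and `time` the corrected seq(0, 23.75, len = 96) grid.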
Re: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb
On Aug 1, 2011, at 3:04 AM, Dimitris.Kapetanakis wrote:

> Thanks a lot for the help. Actually, I am using a mac which (R for Mac OS X GUI 1.40-devel Leopard build 32-bit (5751)) but I think I can find access on windows 7 64-bit.

I don't think that was what Holtman was advising. You just need more available memory; there is no need to use Win7. The Mac platform has been 64-bit capable longer than the Windoze OS, anyway. The way you get there might be as simple as rebooting, not starting any other applications, and re-running your code. Success depends upon how much addressable memory you have, which you did not state. All of the stuff below is immaterial to these considerations.

> What I am trying to do is a maximization through grid search (because I am not sure that any of the optim() methods works sufficiently to my case, at least all of them provide quite different results), the reason that I want the optimizing is because I want to use it for a Monte Carlo analysis for Smoothed Maximum Score estimator, and for that reason I want the optimization to be the most efficient possible, but given that I am kind of amateur on R and on programming in general, I doubt that I can do that sufficiently.

Your code ran without problem on my Mac running Leopard using an R64 GUI session with 32 GB RAM (R.app GUI 1.41 (5866)).

    str(G.search)
     num [1:4000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...

I have no idea whether it produced meaningful results, but a 120 million item matrix is not a problem with enough physical memory. It's only around a Gig. Your error indicated a problem with allocating 915.5 Mb. That should be possible (although borderline) on a 4 GB Mac running 32-bit R. (32-bit R is more memory efficient when working with physical memory of 4 GB or less because the pointer size is smaller.)

--
david.

David Winsemius, MD
West Hartford, CT
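The figures in this reply can be checked with a line of arithmetic, assuming the matrix holds doubles (8 bytes per element) and that R reports "Mb" in units of 1024^2 bytes:

```r
n_items <- 120e6              # "a 120 million item matrix"
bytes   <- n_items * 8        # a double occupies 8 bytes
mb      <- bytes / 1024^2     # size in the units used by the error message
gb      <- bytes / 1024^3
round(mb, 1)                  # 915.5, the size in the error message
round(gb, 2)                  # 0.89, i.e. "around a Gig"
```

R needs this much contiguous address space for a single vector, which is why a 32-bit session can fail even when the total free memory looks sufficient.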
Re: [R] Plotting problems directional or rose plots
Hi again, I have tried playing around with the code given to me by Alan and Jim, thank you for the code but unfortunatelyI can't seem to get either of them to work... Alans does not work with the sample data and Jims is giving the error : Error in radial.grid(labels = labels, label.pos = label.pos, radlab = radlab, : could not find function boxed.labels I have also tried Rose plots in the (heR.Misc) library to to avail. Sorry, does anyone know how to get the plots I need? Thank you all for reading this and for your help k. On Tue, Jul 26, 2011 at 10:20 PM, kitty kitty.a1...@gmail.com wrote: Hi, I'm trying to get a plot that looks somewhat like the attached image (sketched in word). I think I need somthing called a rose diagram? but I can't get it to do what I want. I'm happy to use any library. Essentially, I want a circle with degree slices every 10 degrees with 0 at the top representing north, and 'tick marks' around the outside in 10 degree increments to match the slices (so the slices need to be ofset by 5 degrees so the 0 degree slice actually faces north) I then want to be able to colour in the slices depending on the distance that the factor extends to; so for example the 9000 dist is the largest in the example so should fill the slice, a distance in this plot of 4500 would fill halfway up the slice. I also want to be able to specify the colour of each slice so that I can relate it back to the spatial correlograms I have. I have added some sample data below. Thank you for reading my post, All help is greatly appreciated, K sample data: #distance factor extends to dist-c(5000,7000,9000,4500,6000,500) #direction angle-c(0,10,20,30,40,50) #list of desired colour example, order corrisponds to associated angle/direction color.list-c('red','blue','green','yellow','pink','black') (my real data is from 0 to 350 degrees, and so I have corresponding distance and colour data for each 10 degree increment). 
Re: [R] General indexing in multidimensional arrays
On Aug 1, 2011, at 10:50 AM, Duncan Murdoch wrote:
> On 11-08-01 5:38 AM, Jannis wrote:
>> Dear R community, I have a general question regarding indexing in multidimensional arrays. Imagine I have a three dimensional array and I only want to extract one vector along a single dimension from it:
>>
>> data <- array(rnorm(64),dim=c(4,4,4))
>> result <- data[1,1,]
>>
>> If I want to extract more than one of these vectors, it would now really help me to supply a logical matrix of the size of the first two dimensions:
>>
>> indices <- matrix(FALSE,ncol=4,nrow=4)
>> indices[1,3] <- TRUE
>> indices[4,1] <- TRUE
>> result <- data[indices,]

Is this the right answer?

result <- which(indices, arr.ind=TRUE)
result
     row col
[1,]   4   1
[2,]   1   3
apply(result, 1, function(x) data[x[1], x[2], ])
            [,1]       [,2]
[1,]  1.62880528  0.7781005
[2,] -0.08861725 -2.1791674
[3,]  0.78242531 -1.0352826
[4,]  1.40012118 -1.2541230

If so, it should be possible to encapsulate that behavior in a function.

-- David Winsemius, MD West Hartford, CT

>> This, however, would give me an error. I am used to this kind of indexing from Matlab and was wondering whether there exists an easy way to do this in R without supplying complicated index matrices of all three dimensions or logical vectors of the size of the whole matrix. The only way I could imagine would be:
>>
>> result <- data[rep(as.vector(indices),times=4)]
>>
>> but this seems rather complicated and also depends on the order of the dimensions I want to extract. I do not want R to copy Matlab's behaviour; I am just wondering whether I missed one concept of indexing in R?
>
> Base R doesn't have anything like that as far as I know. The closest is matrix indexing: you construct a 3 column matrix whose rows are the indices of each element you want to extract. Possibly plyr or some other package has functions to do this.
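The two ideas in this thread (applying a 2-d logical mask to every slice, and matrix indexing with a 3-column index matrix) can be compared side by side. The following is only a sketch: the array is filled with 1:64 instead of rnorm() so the result is easy to check by eye, and the names res_apply, res_matrix, pos and idx are made up for the illustration.

```r
# 4 x 4 x 4 array with known values (an assumption; the thread used rnorm)
data <- array(1:64, dim = c(4, 4, 4))

# Logical mask over the first two dimensions, as in the question
indices <- matrix(FALSE, nrow = 4, ncol = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

# Option 1: each slice data[, , k] is a 4 x 4 matrix, so the logical
# matrix can subset it directly. Rows of the result follow the
# column-major order of the TRUE cells: (4,1) first, then (1,3).
res_apply <- apply(data, 3, function(slice) slice[indices])

# Option 2: matrix indexing with a 3-column index matrix, one row per
# element to extract (row, col, position along the third dimension)
pos <- which(indices, arr.ind = TRUE)
k   <- dim(data)[3]
idx <- cbind(pos[rep(seq_len(nrow(pos)), times = k), , drop = FALSE],
             rep(seq_len(k), each = nrow(pos)))
res_matrix <- matrix(data[idx], nrow = nrow(pos))

res_apply
res_matrix
```

Both results have one row per selected (row, col) pair and one column per position along the third dimension. Option 1 is shorter; option 2 avoids the apply() call, which may matter for large arrays.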
Re: [R] Reorganize(stack data) a dataframe inducing names
Dear Contributors,

Thanks for any help you can provide. I searched the threads but I could not find any query that satisfied my needs. This is my database:

index time values
13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43

I need to select only the rows of this database that correspond to the first occurrence of each string in the column index. In the example shown I would like to obtain a new data.frame which is:

index time values
13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50

As you can see, it is not the whole string that changes, but a substring that is part of it. I want to select only the values in the row where each distinct substring appears for the first time. I know how to select rows according to a substring condition on the index column, but I cannot use that here because the substring changes and, moreover, the number of occurrences per substring is variable.

Thank you for any help you can provide.
Francesca
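One possible way to pick the first row for each distinct substring, sketched under assumptions: the column names key, date and values are invented here (the posted header does not line up with the data), the rows are assumed to be already in the desired order, and the varying substring is assumed to be the third dot-separated field of the string.

```r
# Small made-up subset of the posted data; column names are assumptions
x <- data.frame(
  key    = c("DATA.Q211.SUM.Index", "DATA.Q211.SUM.Index",
             "DATA.Q211.TDS.Index", "DATA.Q211.TDS.Index",
             "DATA.Q211.UBS.Index"),
  date   = c("04/08/11", "05/10/11", "04/07/11", "05/20/11", "05/02/11"),
  values = c(1.42, 1.45, 1.35, 1.40, 1.30),
  stringsAsFactors = FALSE
)

# Extract the varying part (third dot-separated field): "SUM", "TDS", ...
code <- sapply(strsplit(x$key, ".", fixed = TRUE), `[`, 3)

# duplicated() flags every repeat of a value, so its negation keeps
# only the first row in which each code appears
first <- x[!duplicated(code), ]
first
```

If the rest of the string never changes, `!duplicated(x$key)` on the whole string gives the same rows without the strsplit() step.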
Re: [R] Plotting question
plot(1:10, pch=letters[1:10])

On Mon, Aug 1, 2011 at 4:44 AM, Andrew McCulloch amccu...@yahoo.co.uk wrote:
> Hi, I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable which is categorical with 10 levels. I have been plotting x against y and using gray scales to distinguish the level of the categorical variable for each point. It looks ok to me but a journal reviewer says this is not any use. I cannot afford to pay for colour prints. Any ideas on what is the best way to distinguish 10 groups on an xy scatter plot? If all else fails I can just remove the graph and give them a table of regression coefficients.
>
> Thanks. Yours Sincerely Andrew McCulloch
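The one-liner above can be extended to the situation described in the question. This is a minimal sketch with simulated data; the variables x, y and grp are invented stand-ins for the poster's 100 points and 10-level factor.

```r
set.seed(1)
n   <- 100
grp <- factor(sample(letters[1:10], n, replace = TRUE),
              levels = letters[1:10])   # hypothetical 10-level grouping
x   <- rnorm(n)
y   <- x + rnorm(n)

# pch accepts a character vector: each point is drawn with the
# (single-character) label of its own group
sym <- as.character(grp)
plot(x, y, pch = sym, cex = 0.8)
```

Whether ten letter symbols stay readable depends on how much the groups overlap, so it is worth looking at the result before settling on it.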
[R] Write.table Question
Hi,

I'm trying to create an abbreviated data file from a larger version. I can use the subset command to create a value for this data:

dat <- subset(raw.data, select=c(SNP, Pvalue))
head(dat)
  SNP Pvalue
1 rs11 0.6516
2 rs12 0.3311
3 rs13 0.5615

but when I try to write it out using:

write.table(dat, file = "/path/to/my/data.txt", sep = " ", col.names=NA)

I end up with a file that looks like this:

SNP Pvalue
1 rs11 0.6516
2 rs12 0.3311
3 rs13 0.5615

when what I want is something that looks like this:

rs11 0.6516
rs12 0.3311
rs13 0.5615

What should I be including?

Thanks,
Margaux
[R] 5 arguments passed to .Internal(matrix) which requires 7
Hello, I am having a problem with the function matrix. Specifically, when I pass three arguments (two more being instantiated in the function), I get the following error message: Error in matrix(0, 30, 10) : 5 arguments passed to .Internal(matrix) which requires 7 I looked into it, and someone has suggested that this may be the function from an old version of R. I recently changed my source path from the lucid version to the maverick version and installed all of the R packages I need like so, but why would this change the matrix() function? Also, how does R know that I passed five arguments (only three being given) if the matrix() function is supposed to take seven arguments? Thank you, Robert
Re: [R] Problem with gam() after R update
On Aug 1, 2011, at 5:01 AM, Przemek Jura wrote:
> Dear group, I have been experiencing some problems with the gam() function since updating R to version 2.13.1. The function in both the gam and mgcv packages has stopped working. Before, with the same code, everything was fine.

Reports like this often turn out to be inaccurate because either the (not offered) code was not the same or the (also not offered) data was different. Did you reinstall these packages? How? How many versions up was the update? sessionInfo()?

> The function from the gam package yields the following warning: Residual degrees of freedom are negative or zero. This occurs when the sum of the parametric and nonparametric degrees of freedom exceeds the number of observations. The model is probably too complex for the amount of data available

That certainly looks like an informative error message. What do you want us to do about it?

> while gam() from mgcv crashes R.

A report of a real crash should go to the package maintainer with a lot more detail than you have provided above.

> Did I miss something?

Perhaps reading the Posting Guide?

> Thank you in advance. PJ

David Winsemius, MD West Hartford, CT
Re: [R] Plotting question
IMHO:

On Mon, Aug 1, 2011 at 7:51 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
> On 11-08-01 5:44 AM, Andrew McCulloch wrote:
>> Hi, I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable which is categorical with 10 levels. I have been plotting x against y and using gray scales to distinguish the level of the categorical variable for each point. It looks ok to me but a journal reviewer says this is not any use. I cannot afford to pay for colour prints. Any ideas on what is the best way to distinguish 10 groups on an xy scatter plot?
>
> Plot digits or letters or other symbols.
>
> Duncan Murdoch

No, this does not work. See Cleveland's books (e.g. Visualizing Data). 10 is too many symbols to constantly refer to a legend to keep straight, and digits or letters do not allow you to readily perceive the pattern. (Caveat: if most of the data are only 2 or 3 of the symbols, then these can work.)

I think the OP's idea of using gray scales was better. I would dispute the reviewer and refer them to appropriate references. Alternatively, thermometer plots (aka filled rectangle plots) would be best. Again, Cleveland's books provide scientific justification rather than merely the (possibly uninformed) aesthetic opinion of a reviewer. Presumably, the journal editor would accept hard data and psychological research in preference to opinions.

>> If all else fails I can just remove the graph and give them a table of regression coefficients.

No. I think your attempt to use a graph is a much better way to go. Try to resist poor practices such as just publishing summary statistics.

Cheers,
Bert

>> Thanks. Yours Sincerely Andrew McCulloch
-- Bert Gunter, Genentech Nonclinical Biostatistics
Re: [R] General indexing in multidimensional arrays
What do you think about this?

apply(data, 3, '[', indices)

On Mon, Aug 1, 2011 at 4:38 AM, Jannis bt_jan...@yahoo.de wrote:
> Dear R community, I have a general question regarding indexing in multidimensional arrays. Imagine I have a three dimensional array and I only want to extract one vector along a single dimension from it:
>
> data <- array(rnorm(64),dim=c(4,4,4))
> result <- data[1,1,]
>
> If I want to extract more than one of these vectors, it would now really help me to supply a logical matrix of the size of the first two dimensions:
>
> indices <- matrix(FALSE,ncol=4,nrow=4)
> indices[1,3] <- TRUE
> indices[4,1] <- TRUE
> result <- data[indices,]
>
> This, however, would give me an error. I am used to this kind of indexing from Matlab and was wondering whether there exists an easy way to do this in R without supplying complicated index matrices of all three dimensions or logical vectors of the size of the whole matrix. The only way I could imagine would be:
>
> result <- data[rep(as.vector(indices),times=4)]
>
> but this seems rather complicated and also depends on the order of the dimensions I want to extract. I do not want R to copy Matlab's behaviour; I am just wondering whether I missed one concept of indexing in R?
>
> Thanks a lot
> Jannis
Re: [R] Write.table Question
Hi Margaux,

Check the row.names and col.names arguments of write.table. See ?write.table

write.table(dat, file = "/path/to/my/data.txt", sep = " ", col.names=FALSE, row.names=FALSE)

HTH,
Ivan

On 8/1/2011 17:18, Margaux Keller wrote:
> Hi, I'm trying to create an abbreviated data file from a larger version. I can use the subset command to create a value for this data:
>
> dat <- subset(raw.data, select=c(SNP, Pvalue))
> head(dat)
>   SNP Pvalue
> 1 rs11 0.6516
> 2 rs12 0.3311
> 3 rs13 0.5615
>
> but when I try to write it out using:
>
> write.table(dat, file = "/path/to/my/data.txt", sep = " ", col.names=NA)
>
> I end up with a file that looks like this:
>
> SNP Pvalue
> 1 rs11 0.6516
> 2 rs12 0.3311
> 3 rs13 0.5615
>
> when what I want is something that looks like this:
>
> rs11 0.6516
> rs12 0.3311
> rs13 0.5615
>
> What should I be including? Thanks, Margaux

-- Ivan CALANDRA, PhD Student, University of Hamburg, Biozentrum Grindel und Zoologisches Museum, Dept. Mammalogy
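To see the effect of the row.names and col.names arguments, here is a small self-contained sketch. The data frame is rebuilt by hand from the three rows shown in the question, the output goes to a temporary file rather than the poster's path, and quote = FALSE is an addition not mentioned in the thread (without it, character values are written inside double quotes).

```r
# Hand-made copy of the three example rows
dat <- data.frame(SNP    = c("rs11", "rs12", "rs13"),
                  Pvalue = c(0.6516, 0.3311, 0.5615),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")   # hypothetical output path
write.table(dat, file = out, sep = " ",
            row.names = FALSE,      # drop the 1, 2, 3 row labels
            col.names = FALSE,      # drop the "SNP Pvalue" header line
            quote = FALSE)          # write rs11 rather than "rs11"

written <- readLines(out)
written
```

Each element of written is then one line of the form `rs11 0.6516`, which is the layout asked for.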
Re: [R] How to make a nomogam and Calibration plot
Kindly do not attach questions in a separate document. Install and read the documentation for the R rms package, and see the handouts at http://biostat.mc.vanderbilt.edu/rms

Frank

sytangping wrote:
> Dear R users, I am a new R user and I have got stuck while trying to write an academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors which have been selected from a logistic regression run in SPSS. As usual, a calibration plot is needed to validate the prediction accuracy of the nomogram. However, I have tried many times and read a lot of posts on this topic, but I still can't figure out how to draw the nomogram and the calibration plot. My dataset and detailed questions are in the two attached files. I would be very grateful if someone could spare the time to help with my questions. Warmest regards! Ping Tang
> http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls Dataset.xls
> http://r.789695.n4.nabble.com/file/n3710068/R_help.doc R_help.doc

-- Frank Harrell, Department of Biostatistics, Vanderbilt University
[R] possible reason for merge not working
Hi Guys,

I am working on a merge of 2 data frames, using the command:

x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)

names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, and so does x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually see and find many similarities between the names and UCSCgenes columns. I am wondering whether there is a syntax error or a logical one. Comments appreciated.

Thanks,
Dan
[R] How to make a nomogam and Calibration plot
Dear R users,

I am a new R user and I have got stuck while trying to write an academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors which have been selected from a logistic regression run in SPSS. As usual, a calibration plot is needed to validate the prediction accuracy of the nomogram. However, I have tried many times and read a lot of posts on this topic, but I still can't figure out how to draw the nomogram and the calibration plot. My dataset and detailed questions are in the two attached files. I would be very grateful if someone could spare the time to help with my questions.

Warmest regards!
Ping Tang

http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls Dataset.xls
http://r.789695.n4.nabble.com/file/n3710068/R_help.doc R_help.doc
Re: [R] Reorganize(stack data) a dataframe inducing names
Try this. I had to add extra names to your data since it was not clear how it was organized. Next time use 'dput' to enclose data.

x <- read.table(textConnection("index time key date values
13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43")
    , header = TRUE
    , as.is = TRUE
    )
closeAllConnections()
x
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
2  13733 27974 DATA.Q211.SUM.Index 05/10/11   1.45
3  13734 27984 DATA.Q211.SUM.Index 06/01/11   1.22
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
5  13747 28624 DATA.Q211.TDS.Index 05/20/11   1.40
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
7  13755 29272 DATA.Q211.UBS.Index 05/03/11   1.48
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
11 13776 31225  DATA.Q211.WF.Index 05/12/11   1.42
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
13 13790 31875 DATA.Q211.WPC.Index 04/08/11   1.42
14 13791 31883 DATA.Q211.WPC.Index 05/10/11   1.43
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
16 13805 32525 DATA.Q211.XTB.Index 05/30/11   1.40
17 13806 32532 DATA.Q211.XTB.Index 06/28/11   1.43

# get index of first occurrence of 'key' column
indx <- !duplicated(x$key)
x[indx,]
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50

On Mon, Aug 1, 2011 at 11:13 AM, Francesca francesca.panco...@gmail.com wrote:
> Dear Contributors, thanks for any help you can provide. I searched the threads but I could not find any query that satisfied my needs. This is my database:
>
> index time values
> 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
> 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
> 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
> 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
> 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
> 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
> 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
> 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
> 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
> 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
> 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
> 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
> 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
> 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
> 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
> 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
> 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43
>
> I need to select only the rows of this database that correspond to the first occurrence of each string in the column index. In the example shown I would like to obtain a new data.frame which is:
>
> index time values
> 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
> 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
> 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
> 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
> 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
> 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
> 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
> 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>
> As you can see, it is not the whole string that changes, but a substring that is part of it. I want to select only the values in the row where each distinct substring appears for the first time. I know how to select
Re: [R] possible reason for merge not working
What you see and what the data really is may be two different things. You should have at least enclosed an 'str' of the two data frames; even better would be a subset of the data using 'dput'. Most likely your problem is that your data is not what you 'expect' it to be.

On Mon, Aug 1, 2011 at 12:17 PM, world peace buysellrentof...@gmail.com wrote:
> Hi Guys, I am working on a merge of 2 data frames, using the command:
>
> x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)
>
> names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, and so does x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually see and find many similarities between the names and UCSCgenes columns. I am wondering whether there is a syntax error or a logical one. Comments appreciated. Thanks, Dan

-- Jim Holtman, Data Munger Guru
What is the problem that you are trying to solve?
Re: [R] possible reason for merge not working
Dan,

If the variables you are merging by are character variables, there may be subtle differences that you haven't noticed, e.g., capitalization or spacing. You can look for differences by listing the unique values:

table(c(annotatedData$names, UCSCgenes$Ensembl.Gene.ID))

Jean

Jean V. Adams, Statistician, U.S. Geological Survey, Great Lakes Science Center

On 08/01/2011 11:24 AM, world peace buysellrentof...@gmail.com wrote:
> Hi Guys, I am working on a merge of 2 data frames, using the command:
>
> x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)
>
> names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, and so does x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually see and find many similarities between the names and UCSCgenes columns. I am wondering whether there is a syntax error or a logical one. Comments appreciated. Thanks, Dan
Re: [R] possible reason for merge not working
On Aug 1, 2011, at 12:17 PM, world peace wrote:
> Hi Guys, I am working on a merge of 2 data frames, using the command:
>
> x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)
>
> names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, and so does x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually see and find many similarities

The merge function does not work on similarities. Matches need to be exact.

> between the names and UCSCgenes columns. I am wondering whether there is a syntax error or a logical one.

Probably logical.

-- David Winsemius, MD West Hartford, CT
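The advice in this thread (merge() needs exact matches, so check for differences in case or spacing) can be turned into a quick diagnostic. The sketch below uses two tiny made-up data frames with the thread's column names; the helper clean_key and the particular mismatches (a trailing space, lower case) are assumptions chosen for illustration, not the poster's actual problem.

```r
# Made-up stand-ins for the two data frames in the thread
annotatedData <- data.frame(names = c("ENSG01 ", "ensg02", "ENSG03"),
                            score = c(1.2, 3.4, 5.6),
                            stringsAsFactors = FALSE)
UCSCgenes <- data.frame(Ensembl.Gene.ID = c("ENSG01", "ENSG02", "ENSG04"),
                        symbol = c("A", "B", "D"),
                        stringsAsFactors = FALSE)

# Step 1: how many keys match exactly? merge() can only join these.
n_exact <- sum(annotatedData$names %in% UCSCgenes$Ensembl.Gene.ID)

# Step 2: normalise the keys (character type, no surrounding
# whitespace, one case) and count again
clean_key <- function(k) toupper(trimws(as.character(k)))
annotatedData$key <- clean_key(annotatedData$names)
UCSCgenes$key     <- clean_key(UCSCgenes$Ensembl.Gene.ID)
n_clean <- sum(annotatedData$key %in% UCSCgenes$key)

# Step 3: merge on the cleaned key, keeping every row of annotatedData
x <- merge(annotatedData, UCSCgenes, by = "key", all.x = TRUE)
x
```

A jump from n_exact to n_clean indicates that formatting differences, rather than truly missing IDs, caused the all-NA columns. Running str() on both data frames, as suggested in the thread, shows whether a key is stored as a factor or a character vector.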
Re: [R] Is R the right choice for simulating first passage times of random walks?
I've only got a 20 minute layover, but three quick remarks:

1) Do a sanity check on your data size: if you want a million walks of a thousand steps, that already gets you to a billion integers to store. Even at a very low bound of one byte each, that's already 1GB for the data, and you still have to process it all and run the OS. If you bump this to walks of length 10k, you are in big trouble. Considered like that, it shouldn't surprise you that you are getting near memory limits. If you really do need such a large simulation and are willing to make the time/space tradeoff, it may be worth doing simulations in smaller batches (say 50-100) and aggregating the needed stats for analysis. Also, consider direct use of the rm() function for memory management.

2) If you know that which.max()==1 can't happen for your data, might this trick be easier than forcing it through some tricky logic inside the which.max()?

X=which.max(...)
if(X[1]==1) X=Inf # or whatever value

3) I don't have any texts at hand to confirm this, but isn't the expected value of the first hit time of a RW infinite? I think a handwaving proof can be squeezed out of the optional stopping theorem with T=min(T_a,T_b) for a<0<b and let a -> -Inf. If I remember right, this suggests you are trying to calculate a CI for a distribution with no finite moments, a difficult task to say the least.

Hope these help, and I'll write a more detailed reply to your notes below later,

Michael Weylandt

PS - what's an iterated RW? This is all outside my field (hence my spitball on #2 above)

PS2 - sorry about the row/column mix-up: I usually think of sample paths as rows...

On Aug 1, 2011, at 8:49 AM, Paul Menzel paulepan...@users.sourceforge.net wrote:
> On Sunday, 31.07.2011, at 23:32 -0500, R. Michael Weylandt wrote:
>> Glad to help -- I haven't taken a look at Dennis' solution (which may be far better than mine), but if you do want to keep going down the path outlined below you might consider the following:
>
> I will try Dennis' solution right away but looked at your suggestions first. Thank you very much.
>
>> Instead of throwing away a simulation if something starts negative, why not just multiply the entire sample by -1: that lets you still use the sample and saves you some computations: of course you'll have to remember to adjust your final results accordingly.
>
> That is a nice suggestion. For a symmetric random walk this is indeed possible and equivalent to looking at when the walk first hits zero.
>
>> This might avoid the loop:
>>
>> x = ## Whatever x is.
>> xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
>> which.max((x>=0) & (xLag<0)) + 1 # Depending on how you've decided to count things, this +1 may be extraneous.
>>
>> The inner expression sets a 0 except where there is a switch from negative to positive and a one there: the which.max function returns the location of the first maximum, which is the first 1, in the vector. If you are guaranteed the run starts negative, then the location of the first positive should give you the length of the negative run.
>
> That is the same idea as from Bill [1]. The problem is that when the walk never returns to zero in a sample, which.max() applied to an all-FALSE vector returns 1 [2]. That is no problem, though, when we do not have to worry about a walk starting with a positive value, and adding 1 (+1) can be omitted when we count the epochs of first hitting 0 instead of the time of how long the walk stayed negative, which is always one less. Additionally my check `(x>=0) & (xLag<0)` is redundant when we know we start with a negative value. `(x>=0)` should be good enough in this case.
>
>> This all gives you,
>>
>> f4 <- function(n = 10, # number of simulations
>>                length = 10) # length of iterated sum
>> {
>>   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
>>   R = apply(R,1,cumsum)
>>   R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
>
> The line above seems to look at the columns instead of rows. I think the following is correct since after the `apply()` above the random walks are in the columns.
>
> R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
>
>>   fTemp <- function(x) {
>>     xLag = c(0,x[-length(x)])
>>     return(which.max((x>=0) & (xLag<0))+1)
>>   }
>>   countNegative = apply(R,2,fTemp)
>>   tabulate(as.vector(countNegative), length)
>> }
>>
>> That just crashed my computer though, so I wouldn't recommend it for large n,length.
>
> Welcome to my world. I would have never thought that simulating random walks with a length of, say, a million would create that much data and push common desktop systems with, let us say, 4 GB of RAM to their limits.
>
>> Instead, you can help a little by combining the lagging and the all in one.
>>
>> f4 <- function(n = 10, llength = 10) {
>>   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
>>   R = apply(R,1,cumsum)
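Several of the points raised in this thread (flipping walks that start positive, handling walks that never reach zero, and simulating in batches to limit memory use) can be combined in one small sketch. This is not the code from the thread: the function names first_hit and simulate_hits, the batch size, and the choice to report non-returning walks as NA are all assumptions made for illustration.

```r
# First step t at which a +/-1 walk that starts negative reaches 0.
# Returns NA if that never happens within the simulated steps.
first_hit <- function(steps) {
  s <- cumsum(steps)
  if (s[1] > 0) s <- -s        # symmetric walk: flip so it starts negative
  hit <- which(s >= 0)
  if (length(hit) == 0L) NA_integer_ else hit[1]
}

# Simulate n_walks walks of n_steps steps, `batch` walks at a time, and
# keep only the tabulated hitting times, never all the paths at once.
simulate_hits <- function(n_walks, n_steps, batch = 100L) {
  counts <- integer(n_steps)   # counts[t] = walks that first hit 0 at step t
  no_hit <- 0L                 # walks that never hit 0
  done   <- 0L
  while (done < n_walks) {
    b     <- min(batch, n_walks - done)
    steps <- matrix(sample(c(-1L, 1L), b * n_steps, replace = TRUE),
                    nrow = n_steps)          # one walk per column
    hits   <- apply(steps, 2, first_hit)
    no_hit <- no_hit + sum(is.na(hits))
    counts <- counts + tabulate(hits[!is.na(hits)], nbins = n_steps)
    done   <- done + b
  }
  list(counts = counts, no_hit = no_hit)
}

set.seed(42)
res <- simulate_hits(n_walks = 1000L, n_steps = 50L)
res$counts[1:10]
res$no_hit
```

Using which() with an explicit empty-result check sidesteps the which.max() ambiguity discussed above, where an all-FALSE vector returns 1. Peak memory is governed by batch * n_steps rather than n_walks * n_steps, at the cost of more loop iterations.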
Re: [R] possible reason for merge not working
The answer was indeed in subtle differences, and 'str' did help. Problem is solved. Thanks everybody for the comments, which were all very useful. Best,

On Mon, Aug 1, 2011 at 12:25 PM, jim holtman jholt...@gmail.com wrote:
> What you see and what the data really is may be two different things. You should have at least enclosed an 'str' of the two data frames; even better would be a subset of the data using 'dput'. Most likely your problem is that your data is not what you 'expect' it to be.
> -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?

On Mon, Aug 1, 2011 at 12:17 PM, world peace buysellrentof...@gmail.com wrote:
> Hi Guys, I am working on a merge of 2 data frames, using the command:
>
>     x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)
>
> names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, and so has x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually find many matching values between the names and Ensembl.Gene.ID columns. I am wondering if there is a syntax error or a logical one. Comments appreciated. Thanks, Dan
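The thread does not say what the "subtle differences" were, so the following is only a sketch of the kind of check that `str()` makes possible. The data are made up, and the trailing space in one key is an assumed example of a mismatch, not the poster's actual problem.

```r
# Toy stand-ins for the two data frames; one key carries a trailing space.
annotatedData <- data.frame(names = c("ENSG01 ", "ENSG02", "ENSG03"),
                            score = 1:3, stringsAsFactors = FALSE)
UCSCgenes <- data.frame(Ensembl.Gene.ID = c("ENSG01", "ENSG02", "ENSG04"),
                        symbol = c("a", "b", "d"), stringsAsFactors = FALSE)

# Inspect the key columns: type (character vs factor) and exact values
str(annotatedData$names)
str(UCSCgenes$Ensembl.Gene.ID)

# How many keys match exactly before any cleaning?
matches_before <- sum(annotatedData$names %in% UCSCgenes$Ensembl.Gene.ID)

# Normalise the keys (here: coerce to character and strip whitespace)
annotatedData$names <- trimws(as.character(annotatedData$names))
UCSCgenes$Ensembl.Gene.ID <- trimws(as.character(UCSCgenes$Ensembl.Gene.ID))
matches_after <- sum(annotatedData$names %in% UCSCgenes$Ensembl.Gene.ID)

x <- merge(annotatedData, UCSCgenes,
           by.x = "names", by.y = "Ensembl.Gene.ID", all.x = TRUE)
x
```

Counting exact matches with `%in%` before merging is a quick way to tell whether an all-NA result comes from the keys or from the merge call itself.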
Re: [R] Impact of multiple imputation on correlations
Hi Tina,

That is quite a bit of missingness, especially considering the sample size is not large to begin with. This would make me treat *any* result cautiously. That said, if you have a reasonable idea what mechanism is causing the missingness, or if you can model the missing data mechanism from additional variables in your study well enough that you are confident (for some definition of confident) that the missingness is random after accounting for your model (conditional independence; in Rubin's terminology this is MAR, missing at random, rather than MCAR), you are in a reasonable place to use MI and draw inferences from the results. Even if you are uncertain about this, it is *not* any better to just say, well, there was too much missing data for me to feel safe using MI, so here is the correlation based just on the observed data. That _will be biased_ unless the missing data mechanism is completely random (even unconditioned on anything else in your study; for example, if participants flipped coins to decide which questions to respond to).

When averaging correlations, it is conventional to average the inverse hyperbolic tangent of the correlations and then use the hyperbolic tangent to transform the averaged value back to the original units (also known as Fisher's Z transformation). The mice package may do this automatically if there is a function to compute pooled correlations.

How the results differ between simply deleting cases with any unobserved value and using MI varies. There may be no difference, a larger difference, or a smaller difference. Looking at the scatter plot matrix from the different imputations, I do not know that I would actually classify that as varying quite a bit. I realize the sign of the slope changes some, but that is not too surprising because all of them are somewhat close to flat. You can compare the between-imputation variance to the within-imputation variance (I think mice gives you this information).
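As a small illustration of the averaging step described above: the five correlations below are invented for the example (they are not Tina's results), and this sketch only shows the transform, average, back-transform arithmetic, not a full pooling of standard errors.

```r
# Hypothetical correlations from five imputed data sets (made-up numbers)
r_imp <- c(-0.05, -0.18, -0.12, -0.02, -0.20)

z <- atanh(r_imp)          # Fisher's z: inverse hyperbolic tangent of each r
z_bar <- mean(z)           # average on the z scale
r_pooled <- tanh(z_bar)    # back-transform to the correlation scale

r_pooled
mean(r_imp)                # plain average, for comparison
```

For correlations this close to zero the two averages are nearly identical; the transformation matters more as the correlations approach -1 or 1.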
I partly addressed your last question at the beginning---I would certainly not trust the correlation obtained simply by deleting missingness, but I also would not trust the result obtained using MI unless it was well setup. Although you have shown us some of the data, you have not mentioned how you modelled the missingness. This can have a substantial impact on your results (and also their trustworthyness). mice provides a number of different models and you have a choice in what variables you use if you collect a lot in your study. Given all of this, I would suggest finding a local statistician or consultant to talk with about this. Your question(s) are more statistical than they are R related. Also, in addition to learning more about MI (there are several good books and articles on it that you can look up or email me offlist and I can provide references if you want), someone who is there can be more helpful because they will have access to your whole dataset and can work with you to find the best variables/model to model the missing data mechanism. I hope this helps and good luck, Josh On Mon, Aug 1, 2011 at 12:03 AM, lifty.g...@gmx.de wrote: Dear all, I have been attempting to use multiple imputation (MI) to handle missing data in my study. I use the mice package in R for this. The deeper I get into this process, the more I realize I first need to understand some basic concepts which I hope you can help me with. For example, let us consider two arbitrary variables in my study that have the following missingness pattern: Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%) Variable 1 available, Variable 2 missing: 37 (31,3%) Variable 1 missing, Variable 2 available: 10 (8,4%) Variable 1 missing, Variable 2 missing: 20 (16,9%) I am interested in the correlation between Variable 1 and Variable 2. Q1. Does it even make sense for me to use MI (or anything else, really) to replace my missing data when such large fractions are not available? 
Plot 1 (http://imgur.com/KFV9yCmV1sl) provides a scatter plot of these example variables in the original data. The correlation coefficient r = -0.34 and p = 0.016. Q2. I notice that correlations between variables in imputed data (pooled estimates over all imputations) are much lower and less significant than the correlations in the original data. For this example, the pooled estimates for the imputed data show r = -0.11 and p = 0.22. Since this seems to happen in all the variable combinations that I have looked at, I would like to know if MI is known to have this behavior, or whether this is specific to my imputation. Q3. When going through the imputations, the distribution of the individual variables (min, max, mean, etc.) matches the original data. However, correlations and least-square line fits vary quite a bit from imputation to imputation (see Plot 2, http://imgur.com/KFV9ylCmV1s). Is this normal? Q4. Since my results differ (quite significantly) between the original and
[R] 5 arguments passed to .Internal(matrix) which requires 7
Hello, I am having a problem with the function matrix. Specifically, when I pass three arguments (two more being instantiated in the function), I get the following error message: Error in matrix(0, 30, 10) : 5 arguments passed to .Internal(matrix) which requires 7 I looked into it, and someone has suggested that this may be the function from an old version of R. I recently changed my source path from the lucid version to the maverick version and installed all of the R packages I need like so, but why would this change the matrix() function? Also, how does R know that I passed five arguments (only three being given) if the matrix() function is supposed to take seven arguments? Thank you, Robert
[R] Inserting column in between
Dear all, I have a very simple question. I have a data frame of 50 columns and I want to insert a column in the 30th position, but I do not want to delete the column that is already there. Is it possible to insert a column in between, so that the new values are in the 30th column, the old 30th column becomes the 31st, the 31st becomes the 32nd, and so on, until the 50th column becomes the 51st? I will be very thankful to you. Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London
Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
Robert,

What code did you run to get that error? Do you get the error if the only code that you run is

    matrix(0, 30, 10)

You gave three arguments to matrix(), which requires none but can take up to five. Inside the function matrix() there is a call to .Internal(matrix(...)), which requires 7 arguments. To see this, print the function by typing its name at the prompt:

    matrix

Jean

`·.,, (((º `·.,, (((º `·.,, (((º
Jean V. Adams Statistician U.S. Geological Survey Great Lakes Science Center 223 East Steinfest Road Antigo, WI 54409 USA
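A few checks that might help narrow this down. One plausible reading, which the thread does not confirm, is that the R-level `matrix()` and the compiled internals come from different R versions (for example after switching package repositories); the lines below only show how to inspect what is installed and make no claim about the poster's system.

```r
# Which R build is actually running?
R.version.string

# Arguments accepted by the R-level matrix(): five, all with defaults
names(formals(matrix))

# Is matrix() the one from base, or has something masked it?
environmentName(environment(matrix))
find("matrix")

# On a consistent installation this simply works
m <- matrix(0, 30, 10)
dim(m)
```

If `find("matrix")` lists anything ahead of `package:base`, a user-defined or package-level `matrix` is masking the base function.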
Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
Y'know, you aren't likely to get many responses with this kind of request. Why don't you go read the posting guidelines and come back with: R version info; sample data; and the actual commands used, so we can reproduce the problem.

--- Jeff Newmiller DCN:jdnew...@dcn.davis.ca.us Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity.
Re: [R] Limited number of principal components in PCA
Providing the data will help, but the first thing I noted is that you have more columns (variables) than rows (cases). PCA will return a maximum of (the number of columns) or (the number of rows - 1) components, whichever is less. With 84 columns and 66 rows, you can get no more than 65 components. If the variables are highly correlated, you will get fewer components, and that probably explains the reduction to 54. I would guess the variables are highly correlated and the first eigenvalue is very large.

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley Sent: Friday, July 29, 2011 10:20 PM To: William Armstrong Cc: r-help@r-project.org Subject: Re: [R] Limited number of principal components in PCA

Hi Billy, Can you provide your data? You could attach it as a text file or provide it by pasting the output of dput(Q) into an email. It would help if we could reproduce what you are doing. You might also consider a list or forum that is more statistics oriented than R-help, as your questions are more related to the statistics than the software itself (but still, if you give us data, you will probably get farther). Cheers, Josh

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/

On Fri, Jul 29, 2011 at 11:33 AM, William Armstrong william.armstr...@noaa.gov wrote:
> Hi all, I am attempting to run PCA on a matrix (nrow=66, ncol=84) using 'prcomp' (stats package). My data (referred to as 'Q' in the code below) are separate river streamflow gaging stations (columns) and peak instantaneous discharge (rows). I am attempting to use PCA to identify regions that vary together. I am entering the following command:
>
>     test_pca_Q <- prcomp(~., data=Q, scale.=TRUE, retx=FALSE, na.action=na.omit)
>
> It is outputting 54 'standard deviation' numbers (which are the sqrt(eigenvalues) with respect to a certain PC, am I correct?), and 54 'rotation' numbers, which are the variable loadings with respect to a given PC. I have two questions:
> 1.) Why is it only outputting 54 PCs and standard deviations? If I have 84 variables, isn't the maximum number of PCs I can create 84 as well?
> 2.) Can I now use the 'rotation' values to find clusters of gages that are acting together, or is there another step I must take?
> Thank you very much for your insight. Billy
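A small simulation on random data (not the poster's streamflow matrix) illustrating the row limit described above. It also shows one further possibility worth checking, offered here as an assumption rather than a diagnosis: with `na.action = na.omit`, any row containing an NA is dropped before the decomposition, which by itself lowers the number of components `prcomp()` reports. The object names `Q_na`, `p_full` and `p_na` are made up for this sketch.

```r
set.seed(1)
# 66 cases (rows) by 84 variables (columns), as in the question
Q <- as.data.frame(matrix(rnorm(66 * 84), nrow = 66, ncol = 84))

p_full <- prcomp(~ ., data = Q, scale. = TRUE, retx = FALSE)
length(p_full$sdev)          # 66 values returned, one per row
sum(p_full$sdev > 1e-8)      # how many are not numerically zero

# Put missing values into 12 of the rows, then omit incomplete rows
Q_na <- Q
Q_na[1:12, 1] <- NA
p_na <- prcomp(~ ., data = Q_na, scale. = TRUE, retx = FALSE,
               na.action = na.omit)
length(p_na$sdev)            # 54 values: only 54 complete rows remain

# Number of complete rows actually used
sum(complete.cases(Q_na))
```

Running `sum(complete.cases(Q))` on the real data would show directly how many rows survive `na.omit`.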
Re: [R] Inserting column in between
    x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Inserting column in between
Doesn't work -- you lose column names. Try this instead:

    yourframe[,30:51] <- cbind(newcolumn, yourframe[,30:50])

Adjust column names afterwards via:

    names(yourframe)[30:51] <- c("newcolname", names(yourframe[30:50]))

Cheers, Bert

On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] Plotting problems directional or rose plots
Searching R Graphical Manual (http://www.oga-lab.net/RGM2/, mirror http://www.oga-lab.net/RGM2/) shows possible candidates in packages circular (windrose), IDPmisc (plot.rose), climatol (rosavent), openair (windRose), and oce (as.windrose).

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of kitty Sent: Monday, August 01, 2011 10:39 AM To: r-help@r-project.org Subject: Re: [R] Plotting problems directional or rose plots

Hi again, I have tried playing around with the code given to me by Alan and Jim. Thank you for the code, but unfortunately I can't seem to get either of them to work. Alan's does not work with the sample data, and Jim's gives the error: Error in radial.grid(labels = labels, label.pos = label.pos, radlab = radlab, : could not find function boxed.labels. I have also tried rose plots in the (heR.Misc) library, to no avail. Sorry, does anyone know how to get the plots I need? Thank you all for reading this and for your help, k.

On Tue, Jul 26, 2011 at 10:20 PM, kitty kitty.a1...@gmail.com wrote:
> Hi, I'm trying to get a plot that looks somewhat like the attached image (sketched in Word). I think I need something called a rose diagram, but I can't get it to do what I want. I'm happy to use any library. Essentially, I want a circle with slices every 10 degrees, with 0 at the top representing north, and 'tick marks' around the outside in 10 degree increments to match the slices (so the slices need to be offset by 5 degrees so that the 0 degree slice actually faces north). I then want to be able to colour in the slices depending on the distance that the factor extends to; so, for example, the 9000 dist is the largest in the example and should fill the slice, while a distance of 4500 in this plot would fill halfway up the slice. I also want to be able to specify the colour of each slice so that I can relate it back to the spatial correlograms I have. I have added some sample data below. Thank you for reading my post. All help is greatly appreciated, K
>
> sample data:
>
>     # distance factor extends to
>     dist <- c(5000,7000,9000,4500,6000,500)
>     # direction
>     angle <- c(0,10,20,30,40,50)
>     # list of desired colours, order corresponds to the associated angle/direction
>     color.list <- c('red','blue','green','yellow','pink','black')
>
> (My real data is from 0 to 350 degrees, and so I have corresponding distance and colour data for each 10 degree increment.)
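If none of the packages listed above fits, the plot described in the question can also be approximated with base graphics alone. This is a rough sketch under that assumption, not the code from Alan or Jim; the helper names `wedge` and `rose_plot` are invented for the example.

```r
dist <- c(5000, 7000, 9000, 4500, 6000, 500)
angle <- c(0, 10, 20, 30, 40, 50)
color.list <- c("red", "blue", "green", "yellow", "pink", "black")

# Outline of one wedge centred on a compass bearing (0 = north, clockwise),
# spanning half_width degrees to either side, out to radius r.
wedge <- function(centre_deg, r, half_width = 5, n = 20) {
  a <- seq(centre_deg - half_width, centre_deg + half_width,
           length.out = n) * pi / 180
  cbind(x = c(0, r * sin(a)), y = c(0, r * cos(a)))
}

rose_plot <- function(angle, dist, col, rmax = max(dist)) {
  lim <- c(-1.15, 1.15) * rmax
  plot(0, 0, type = "n", xlim = lim, ylim = lim, asp = 1,
       axes = FALSE, xlab = "", ylab = "")
  bearings <- seq(0, 350, by = 10)
  # empty 10-degree slices as a background grid
  for (b in bearings) polygon(wedge(b, rmax), border = "grey")
  # filled slices; the radius is the distance, so rmax fills the slice
  for (i in seq_along(angle)) polygon(wedge(angle[i], dist[i]), col = col[i])
  # bearing labels just outside the circle
  text(1.08 * rmax * sin(bearings * pi / 180),
       1.08 * rmax * cos(bearings * pi / 180),
       labels = bearings, cex = 0.6)
  invisible(NULL)
}

rose_plot(angle, dist, color.list)
```

Because each wedge runs from 5 degrees below to 5 degrees above its bearing, the 0 degree slice is centred on north, as requested.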
Re: [R] Inserting column in between -- better way?
Folks: I consider my reply below rather clumsy: one has to keep track of index numbers other than the one being inserted, and must separately change column names. Is there an essentially better way to do this, either via base R or via an R package? I leave it to you to define "essentially better". Thanks. Cheers, Bert

On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter bgun...@gene.com wrote:
> Doesn't work -- you lose column names. Try this instead:
>
>     yourframe[,30:51] <- cbind(newcolumn, yourframe[,30:50])
>
> Adjust column names afterwards via:
>
>     names(yourframe)[30:51] <- c("newcolname", names(yourframe[30:50]))

-- Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] Accessing the index of factor in by() function
Since I didn't get an answer to this question, I'm rephrasing it in simpler terms: I have a data frame and I want to split it based on the levels of one of its columns, and apply a function to each section of the data. The output of the function may be drawing a plot, returning a value, whatever. I want to do it efficiently though (for loops are very slow). How can I do that? M

On Tue, Jul 26, 2011 at 10:12 AM, Ista Zahn iz...@psych.rochester.edu wrote:

Hi Merik, Please keep the mailing list copied.

On Tue, Jul 26, 2011 at 6:44 AM, Merik Nanish merik.nan...@gmail.com wrote:
> You can convert my data into a dataframe simply by dat <- data.frame(id, month, value). That doesn't help though.

Can you be more specific? What is the problem you are having?

> And no, that's not what I'm looking for. What I intend to do is for by to loop through the data based on the levels of the id factor (1, 2, and 3), and for each level, for my function to print out the values of value and month belonging to the section of data with that id.

OK, easy enough:

    dat.tmp <- data.frame(id, month, value)
    my.plot <- function(dat) {print(dat[, c("id", "value")])}
    by(dat.tmp, id, my.plot)

> Right now, I achieve this with a for loop but I want to avoid looping in the data as much as possible.

Why? What do you have against loops? Best, Ista

On Tue, Jul 26, 2011 at 12:18 AM, Ista Zahn iz...@psych.rochester.edu wrote:

Hi Merik, by() works most easily with data.frames. Is this what you are after?

    my.plot <- function(dat) { print(dat$value); print(dat$month[dat$id==dat$value]) }
    by(dat.tmp, id, my.plot)

Best, Ista

On Mon, Jul 25, 2011 at 9:19 PM, Merik Nanish merik.nan...@gmail.com wrote:

Hello, Here are three vectors to give context to my question below:

    id <- c(1,1,1,1,1,2,2,2,3,3,3)
    month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
    value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)

and I want to plot value over month separately for each id. Before I can do that, I need to section both month and value, based on ID. I create a my.plot function like this (at this point, it doesn't draw any plots; it is just an effort to help me understand what I'm doing):

    my.plot <- function(y) { print(y); print(month[id==y]) }

Now, I tried:

    by(value, id, my.plot)

But of course, it didn't do what I wanted. I realized that the parameter passed to my.plot is a section of value per ID, and not the ID value itself. The question is, how can I get the value of the factor ID at each level of by()? Please advise, Merik

-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
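One way to get at the id value inside the per-group function is to split the data frame and iterate over the names of the pieces. This is a sketch using the three vectors from the thread; the names `pieces`, `res` and `summary_tab` are invented here, and the plotting call is left commented out so the example stays side-effect free.

```r
id <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
dat <- data.frame(id, month, value)

# One data frame per level of id; the list names are the id values
pieces <- split(dat, dat$id)

res <- lapply(names(pieces), function(k) {
  d <- pieces[[k]]
  # k is the current id; d holds only that id's rows
  # plot(d$month, d$value, type = "b", main = paste("id", k))
  data.frame(id = k, n = nrow(d), mean_value = mean(d$value))
})
summary_tab <- do.call(rbind, res)
summary_tab
```

With `by()` the same information is available when the whole data frame is passed rather than a single vector, since each chunk then still carries its own id column (for example `d$id[1]`).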
Re: [R] Inserting column in between
Not when I do it.

    a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50)
    a
        A  B  D  E
    1   1 11 31 41
    2   2 12 32 42
    3   3 13 33 43
    4   4 14 34 44
    5   5 15 35 45
    6   6 16 36 46
    7   7 17 37 47
    8   8 18 38 48
    9   9 19 39 49
    10 10 20 40 50
    b <- cbind(a[,1:2], C=21:30, a[,3:4])
    b
        A  B  C  D  E
    1   1 11 21 31 41
    2   2 12 22 32 42
    3   3 13 23 33 43
    4   4 14 24 34 44
    5   5 15 25 35 45
    6   6 16 26 36 46
    7   7 17 27 37 47
    8   8 18 28 38 48
    9   9 19 29 39 49
    10 10 20 30 40 50

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Monday, August 01, 2011 12:18 PM To: Sarah Goslee Cc: r-help@r-project.org Subject: Re: [R] Inserting column in between

> Doesn't work -- you lose column names.
Re: [R] Inserting column in between
Bert,

On Mon, Aug 1, 2011 at 1:17 PM, Bert Gunter gunter.ber...@gene.com wrote:
> Doesn't work -- you lose column names.

But I don't lose column names:

    x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
    x
      A B C D E
    1 1 1 1 1 1
    2 2 2 2 2 2
    3 3 3 3 3 3
    newcol <- 4:6
    cbind(x[,1:2], newcol, x[,3:ncol(x)])
      A B newcol C D E
    1 1 1      4 1 1 1
    2 2 2      5 2 2 2
    3 3 3      6 3 3 3

It's even possible to change names in the cbind() statement:

    cbind(x[,1:2], Y=newcol, x[,3:ncol(x)])
      A B Y C D E
    1 1 1 4 1 1 1
    2 2 2 5 2 2 2
    3 3 3 6 3 3 3

If for some reason it isn't working for you, you might try explicitly calling cbind.data.frame() instead of the default cbind().

> Try this instead: yourframe[,30:51] <- cbind(newcolumn, yourframe[,30:50])
> Adjust column names afterwards via: names(yourframe)[30:51] <- c("newcolname", names(yourframe[30:50]))

This shouldn't be necessary, I think. What happens if you use my above example?

Sarah

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Inserting column in between -- better way?
Bert,

On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter gunter.ber...@gene.com wrote:
> Folks: I consider my reply below rather clumsy: one has to keep track of index numbers other than the one being inserted, and must separately change column names. Is there an essentially better way to do this, either via base R or via an R package? I leave it to you to define "essentially better".

Having tried your solution with sample data, I'd have to agree. :) Your approach does mess up the column names, and it also doesn't work if x is a matrix rather than a data frame. Mine, using the full cbind(), works in both cases, preserving the column names and running even if x is a matrix. It could be written as a function, but since it's only one line and really only requires knowing at what position you'd like to add the new column, it hardly seems worth it unless it's something to be done repeatedly.

    x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
    newcol <- 4:6
    cbind(x[,1:2], newcol, x[,3:ncol(x)])
      A B newcol C D E
    1 1 1      4 1 1 1
    2 2 2      5 2 2 2
    3 3 3      6 3 3 3
    x[,3:6] <- cbind(newcol, x[,3:5])
    x
      A B C D E E.1
    1 1 1 4 1 1   1
    2 2 2 5 2 2   2
    3 3 3 6 3 3   3

    x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
    x <- as.matrix(x)
    cbind(x[,1:2], newcol, x[,3:ncol(x)])
         A B newcol C D E
    [1,] 1 1      4 1 1 1
    [2,] 2 2      5 2 2 2
    [3,] 3 3      6 3 3 3
    x[,3:6] <- cbind(newcol, x[,3:5])
    Error in x[, 3:6] <- cbind(newcol, x[, 3:5]) : subscript out of bounds

Sarah

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Inserting column in between -- better way?
On Mon, Aug 1, 2011 at 1:43 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Bert,
>
> On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter gunter.ber...@gene.com wrote:
>> Folks:
>>
>> I consider my reply below rather clumsy: one has to keep track of index
>> numbers other than the one being inserted, and must separately change the
>> column names. Is there an essentially better way to do this, either via
>> base R or via an R package? I leave it to you to define "essentially
>> better".

A variation on the theme that I prefer for aesthetic reasons is

    a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50)
    a$F <- 21:30
    a <- a[, c(1:2, 5, 3:4)]

I doubt that it is "essentially better", as it still requires keeping track of the index, but to me this is easier to follow.

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
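The reorder-by-index idea can be wrapped in a small helper so the index bookkeeping happens in one place. This is only a sketch: the function name `insert_col_by_index` and its arguments are made up for illustration, it assumes `x` is a data frame, and it relies on base R's append() to build the new column order.

```r
# Hypothetical helper (not from the thread): insert `new` as column number
# `pos` of data frame `x` by appending it and then reordering the columns.
insert_col_by_index <- function(x, new, pos, name = "newcol") {
  stopifnot(is.data.frame(x), pos >= 1, pos <= ncol(x) + 1)
  n <- ncol(x)
  x[[name]] <- new                       # new column lands in position n + 1
  ord <- append(seq_len(n), n + 1, after = pos - 1)
  x[, ord, drop = FALSE]                 # reorder so the new column sits at pos
}

a <- data.frame(A = 1:10, B = 11:20, D = 31:40, E = 41:50)
a2 <- insert_col_by_index(a, 21:30, pos = 3, name = "F")
names(a2)
```

Note that if `name` already exists in `x`, the assignment overwrites that column instead of adding one, so a real version would probably want to check for that.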
Re: [R] Accessing the index of factor in by() function
Merik,

You did get an answer to the question, and it's even included in the material below. What doesn't work for you in Ista's suggestion?

    id <- c(1,1,1,1,1,2,2,2,3,3,3)
    month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
    value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
    dat.tmp <- data.frame(id, month, value)
    my.plot <- function(dat) {print(dat[, c("id", "value")])}
    by(dat.tmp, id, my.plot)

But if for some reason you need to get the separate sections, not just act on them, this might also work:

    dat.split <- split(dat.tmp, dat.tmp$id)
    lapply(dat.split, my.plot)

Sarah

On Mon, Aug 1, 2011 at 1:34 PM, Merik Nanish merik.nan...@gmail.com wrote:
> Since I didn't get an answer to this question, I'm rephrasing my question
> in simpler terms:
>
> I have a dataframe and I want to split it based on the levels of one of its
> columns, and apply a function to each section of the data. Output of the
> function may be drawing a plot, returning a value, whatever. I want to do it
> efficiently though (for loops are very slow). How can I do that?
>
> M
>
> On Tue, Jul 26, 2011 at 10:12 AM, Ista Zahn iz...@psych.rochester.edu wrote:
>> Hi Merik,
>> Please keep the mailing list copied.
>>
>> On Tue, Jul 26, 2011 at 6:44 AM, Merik Nanish merik.nan...@gmail.com wrote:
>>> You can convert my data into a dataframe simply by
>>> dat <- data.frame(id, month, value). That doesn't help though.
>>
>> Can you be more specific? What is the problem you are having?
>>
>>> And no, that's not what I'm looking for. What I intend to do is for "by"
>>> to loop through the data based on levels of the "id" factor (1, 2, and 3),
>>> and for each level, for my function to print out the values of "value" and
>>> "month" belonging to the section of data with that id.
>>
>> OK, easy enough:
>>
>> dat.tmp <- data.frame(id, month, value)
>> my.plot <- function(dat) {print(dat[, c("id", "value")])}
>> by(dat.tmp, id, my.plot)
>>
>>> Right now, I achieve this with a for loop but I want to avoid looping in
>>> the data as much as possible.
>>
>> Why? What do you have against loops?
>>
>> Best,
>> Ista
>>
>>> On Tue, Jul 26, 2011 at 12:18 AM, Ista Zahn iz...@psych.rochester.edu wrote:
>>>> Hi Merik,
>>>> by() works most easily with data.frames. Is this what you are after?
>>>>
>>>> my.plot <- function(dat) {
>>>>   print(dat$value); print(dat$month[dat$id==dat$value]) }
>>>> by(dat.tmp, id, my.plot)
>>>>
>>>> Best,
>>>> Ista
>>>>
>>>> On Mon, Jul 25, 2011 at 9:19 PM, Merik Nanish merik.nan...@gmail.com wrote:
>>>>> Hello,
>>>>>
>>>>> Here are three vectors to give context to my question below:
>>>>>
>>>>> id <- c(1,1,1,1,1,2,2,2,3,3,3)
>>>>> month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
>>>>> value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
>>>>>
>>>>> and I want to plot "value" over "month" separately for each "id". Before
>>>>> I can do that, I need to section both month and value, based on ID. I
>>>>> create a my.plot function like this (at this point, it doesn't draw any
>>>>> plots, it is just an effort to help me understand what I'm doing):
>>>>>
>>>>> my.plot <- function(y) { print(y); print(month[id==y]) }
>>>>>
>>>>> Now, I tried:
>>>>>
>>>>> by(value, id, my.plot)
>>>>>
>>>>> But of course, it didn't do what I wanted. I realized that the parameter
>>>>> passed to my.plot is a section of "value" per ID, and not the ID value
>>>>> itself. Question is, how can I get the value of factor ID at each level
>>>>> of by()?
>>>>>
>>>>> Please advise,
>>>>>
>>>>> Merik

--
Sarah Goslee
http://www.functionaldiversity.org
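To make the point about "getting the id at each level" concrete, here is a minimal sketch of two ways the level can reach the function. The helper name `summarise_id` and the summary it returns are invented for illustration; only by(), split(), and Map() from base R are used.

```r
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
dat.tmp <- data.frame(id, month, value)

# Way 1: pass the whole data frame to by(). Each piece handed to the
# function is a sub-data-frame, so the id column is still available in it.
summarise_id <- function(dat) {
  c(id = dat$id[1], n = nrow(dat), mean_value = mean(dat$value))
}
res_by <- by(dat.tmp, dat.tmp$id, summarise_id)

# Way 2: split() keeps each level as a list name; Map() passes the name
# and the matching piece to the function side by side.
pieces <- split(dat.tmp, dat.tmp$id)
res_map <- Map(function(level, dat) {
  paste0("id ", level, ": ", nrow(dat), " rows")
}, names(pieces), pieces)
res_map
```

Either form avoids an explicit for loop, although a loop over `names(pieces)` would work just as well and is not necessarily slower.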
Re: [R] Inserting column in between
Thanks Sarah and David.

Yes, but note this:

    z <- data.frame(a=1:2, b=3:4)
    z
      a b
    1 1 3
    2 2 4
    newdat <- 5:6
    cbind(z[,1], newdat, z[,2])
           newdat
    [1,] 1      5 3
    [2,] 2      6 4
    cbind.data.frame(z[,1], newdat, z[,2])
      z[, 1] newdat z[, 2]
    1      1      5      3
    2      2      6      4

Aha moment! -- You need drop=FALSE:

    cbind(z[,1,drop=FALSE], newdat, z[,2,drop=FALSE])
      a newdat b
    1 1      5 3
    2 2      6 4

So your solution does not work in general (and you may not have intended it to); while mine does, but is blatantly clumsy. I would say the better approach is merely to add the drop = FALSE option to yours even though it is unnecessary in your simple example:

    cbind(x[,1:2,drop = FALSE], newcol, x[,3:ncol(x),drop = FALSE])

... and I would definitely count this as an R 'gotcha' (and it has gotcha'ed me before).

Cheers,

-- Bert
[R] fill Matrix quicker
Dear all,

I have a quite simple question. I want to fill up a matrix as done in the following function, but the performance is very bad for large dimensions. Is there a way to do this with apply or something similar?

    makeMatrix <- function(a, b, dim) {
      X = matrix(0, ncol=dim, nrow=dim)
      for (i in c(1:dim)) {
        for (j in c(1:dim)) {
          if (i==j) {X[i,j] <- a}
          else { X[i,j] <- exp((-1*abs(i-j))/(3*b)) }
        }
      }
      X
    }

--
View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710428.html
Re: [R] error message jpeg62.dll missing
See the footer of this and every R-help message.

In particular, that DLL is not used by R itself, so this is probably something called from a third-party package. A number of packages used to use that DLL (which is rather out of date), but no longer, so is your R actually current (the posting guide asked you to update *before* posting: it also asked you for 'at a minimum' information)?

On Mon, 1 Aug 2011, Rocky Hyacinth wrote:
> Dear R-help
>
> We are getting an error message `jpeg62.dll missing'. We are running
> Windows 7 64-bit, from a Mac using Boot Camp. Do you know of this error
> message, and can you give us help trying to resolve the problem?
>
> many thanks
>
> Rocky
>
> Rocky Hyacinth
> Technician
> Department of Archaeology
> University of Sheffield
> United Kingdom
>
> [[alternative HTML version deleted]]

And not to send HTML

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] fill Matrix quicker
Making use of the row() and col() functions speeds things up a bit.

    makeMatrix2 <- function(a, b, dim) {
      X <- matrix(NA, ncol=dim, nrow=dim)
      X <- exp( (-1*abs(row(X) - col(X)))/(3*b) )
      diag(X) <- a
      X
    }

    system.time(makeMatrix(1, 2, 1000))
    system.time(makeMatrix2(1, 2, 1000))

Jean

`·.,, (((º `·.,, (((º `·.,, (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409 USA
Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
Yes, even if I only run the command matrix(0,30,10) I get the error. I am running R with Ubuntu 10.10 (maverick) with R version:

    R version 2.13.1 (2011-07-08)

When I check the function matrix, I can see that it is only passing five arguments to the function .Internal() (shown below).

    function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
    {
        data <- as.vector(data)
        if (missing(nrow))
            nrow <- ceiling(length(data)/ncol)
        else if (missing(ncol))
            ncol <- ceiling(length(data)/nrow)
        .Internal(matrix(data, nrow, ncol, byrow, dimnames))
    }
    <environment: namespace:base>

On Mon, Aug 1, 2011 at 1:02 PM, Jean V Adams jvad...@usgs.gov wrote:
> Robert,
>
> What code did you run to get that error? Do you get the error if the only
> code that you run is ...
>
>     matrix(0, 30, 10)
>
> You gave three arguments to matrix, which requires none, but can take up
> to five. In the function matrix there is a call to .Internal(matrix) which
> requires 7 arguments. See ...
>
>     matrix
>
> Jean
>
> `·.,, (((º `·.,, (((º `·.,, (((º
>
> Jean V. Adams
> Statistician
> U.S. Geological Survey
> Great Lakes Science Center
> 223 East Steinfest Road
> Antigo, WI 54409 USA
>
> From: Robert Pfister rw...@virginia.edu
> To: r-help@r-project.org
> Date: 08/01/2011 11:56 AM
> Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7
> Sent by: r-help-boun...@r-project.org
>
>> Hello,
>>
>> I am having a problem with the function matrix. Specifically, when I pass
>> three arguments (two more being instantiated in the function), I get the
>> following error message:
>>
>>     Error in matrix(0, 30, 10) :
>>       5 arguments passed to .Internal(matrix) which requires 7
>>
>> I looked into it, and someone has suggested that this may be the function
>> from an old version of R. I recently changed my source path from the lucid
>> version to the maverick version and installed all of the R packages I need
>> like so, but why would this change the matrix() function? Also, how does R
>> know that I passed five arguments (only three being given) if the matrix()
>> function is supposed to take seven arguments?
>>
>> Thank you,
>>
>> Robert
Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
That's interesting. My function matrix() looks like this:

    function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
    {
        if (is.object(data) || !is.atomic(data))
            data <- as.vector(data)
        .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow),
            missing(ncol)))
    }
    <environment: namespace:base>

I'm running Windows R version 2.13.0 (2011-04-13).

Jean

From: Robert Pfister rw...@virginia.edu
To: Jean V Adams jvad...@usgs.gov
Cc: r-help@r-project.org
Date: 08/01/2011 01:35 PM
Subject: Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
Re: [R] fill Matrix quicker
Most certainly you can speed it up:

    X <- exp(-abs(row(X) - col(X)) / (3*b))
    diag(X) <- a

should do what you want.

This is called 'vectorization' and is discussed lots of places -- for instance, in the two documents mentioned below in my signature.

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
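As a check on the vectorization idea, the sketch below compares the loop version from the original post with a vectorized variant built on outer(), which avoids needing an existing matrix for row() and col(). The name `makeMatrixOuter` is made up here, and the use of outer() is a substitution for illustration, not what either reply proposed.

```r
# Loop version from the original post, kept for comparison.
makeMatrix <- function(a, b, dim) {
  X <- matrix(0, ncol = dim, nrow = dim)
  for (i in seq_len(dim)) {
    for (j in seq_len(dim)) {
      X[i, j] <- if (i == j) a else exp(-abs(i - j) / (3 * b))
    }
  }
  X
}

# Vectorized variant: outer() builds the full matrix of i - j differences
# in one call, then the diagonal is overwritten with `a`.
makeMatrixOuter <- function(a, b, dim) {
  idx <- seq_len(dim)
  X <- exp(-abs(outer(idx, idx, "-")) / (3 * b))
  diag(X) <- a
  X
}

# Both versions should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(makeMatrix(1, 2, 50), makeMatrixOuter(1, 2, 50))))
```

Timing both with system.time() at a larger `dim` should show the difference the thread describes, though the exact numbers depend on the machine.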
Re: [R] Inserting column in between -- better way?
Actually Sarah's method fails if the insertion is after the first or before the last column:

    x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
    newcol <- 4:6
    cbind(x[,1], newcol, x[,2:ncol(x)])
      x[, 1] newcol B C D E
    1      1      4 1 1 1 1
    2      2      5 2 2 2 2
    3      3      6 3 3 3 3
    cbind(x[,1:4], newcol, x[,ncol(x)])
      A B C D newcol x[, ncol(x)]
    1 1 1 1 1      4            1
    2 2 2 2 2      5            2
    3 3 3 3 3      6            3

Inserting drop=FALSE fixes them.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Monday, August 01, 2011 12:44 PM
To: Bert Gunter
Cc: r-help@r-project.org
Subject: Re: [R] Inserting column in between -- better way?
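Pulling the thread's conclusion together, the cbind() approach with drop = FALSE can be sketched as a function that handles data frames and matrices, including insertion at the first and last positions. The name `insert_column` and its arguments are hypothetical; this is one possible arrangement, not code posted by anyone in the thread.

```r
# Hypothetical helper: insert `newcol` as column number `pos` of `x`,
# where `x` is a data frame or a matrix.
insert_column <- function(x, newcol, pos, name = "newcol") {
  n <- ncol(x)
  stopifnot(pos >= 1, pos <= n + 1, length(newcol) == nrow(x))
  # Build a one-column piece of the same kind as x, carrying the new name.
  new <- matrix(newcol, ncol = 1, dimnames = list(NULL, name))
  if (is.data.frame(x)) new <- as.data.frame(new)
  if (pos == 1)     return(cbind(new, x))      # nothing to the left
  if (pos == n + 1) return(cbind(x, new))      # nothing to the right
  # drop = FALSE keeps single-column slices as columns with their names.
  cbind(x[, 1:(pos - 1), drop = FALSE], new, x[, pos:n, drop = FALSE])
}

x <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = 1:3, E = 1:3)
insert_column(x, 4:6, pos = 2)             # after the first column
insert_column(as.matrix(x), 4:6, pos = 5)  # before the last column, matrix input
```

The explicit branches for the two end positions avoid relying on how cbind() treats zero-column pieces; the drop = FALSE arguments are what protect the `pos = 2` and `pos = n` cases discussed above.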
[R] Identifying US holidays
Hello!

I am trying to identify which ones of a vector of dates are US holidays. And, ideally, which is which. And I do not know (a priori) which dates those should be. I have, for example:

    x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by="day")
    (x)

I think chron should help me here - but maybe I am not using it properly:

    library(chron)
    is.holiday(x) # Says that none of those dates are holidays

?is.holiday says: "holidays" is an object that should be listing holidays. But I want to figure out which of my dates are US holidays and don't want to provide a list of holidays.

Package timeDate does almost what I need:

    library(timeDate)
    holidayNYSE(2008:2010)
    holidayNYSE()

However, I don't need all the NYSE holidays (like Good Friday). Just the major US holidays - New Year's, MLK, Memorial Day, Independence Day, Labor Day, Halloween, Thanksgiving, Christmas.

Is there any way to identify major US holidays?

Thanks a lot!

-
Dimitri Liakhovitski
marketfusionanalytics.com
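One way to get exactly the holidays listed in the question, without depending on any package, is to compute them from their calendar rules. The sketch below uses only base R; the function names `nth_weekday` and `us_holidays` are invented for illustration, and it returns the nominal calendar dates, not the "observed" weekday when a fixed-date holiday falls on a weekend.

```r
# n-th occurrence (n > 0) or last occurrence (n = -1) of a weekday in a month.
# wday follows as.POSIXlt: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
nth_weekday <- function(year, month, wday, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  days <- seq(first, by = "day", length.out = 31)
  days <- days[as.POSIXlt(days)$mon + 1 == month]   # keep this month only
  hits <- days[as.POSIXlt(days)$wday == wday]
  if (n > 0) hits[n] else hits[length(hits)]
}

# Major US holidays for one year, as a named Date vector.
us_holidays <- function(year) {
  fixed <- function(m, d) as.Date(sprintf("%d-%02d-%02d", year, m, d))
  dates <- c(fixed(1, 1),
             nth_weekday(year, 1, 1, 3),    # third Monday of January
             nth_weekday(year, 5, 1, -1),   # last Monday of May
             fixed(7, 4),
             nth_weekday(year, 9, 1, 1),    # first Monday of September
             fixed(10, 31),
             nth_weekday(year, 11, 4, 4),   # fourth Thursday of November
             fixed(12, 25))
  names(dates) <- c("New Year's Day", "MLK Day", "Memorial Day",
                    "Independence Day", "Labor Day", "Halloween",
                    "Thanksgiving", "Christmas")
  dates
}

x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
hol <- us_holidays(2011)
is_hol <- x %in% hol                      # which dates are holidays
which_hol <- names(hol)[match(x, hol)]    # which holiday (NA if none)
data.frame(date = x[is_hol], holiday = which_hol[is_hol])
```

For a vector spanning several years, the same idea applies with `do.call(c, lapply(years, us_holidays))` as the lookup table.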
Re: [R] error in self-made function - cannot deal with objects of length = 1
bjmjarrett wrote:
> ...
> rate <- function(x){
>   storage <- matrix(nrow=length(x),ncol=1)
>   ifelse(length(x)==1,storage[1,] <- NA,{
>     storage[1,] <- x[1]/max(x)
>     for(i in 2:length(x)){
>       p <- i-1
>       storage[i,] <- ((x[i] - x[p]) / max(x))
>     }
>   })
>   return(storage)
> }
>
> but I end up with this error when I try and use the above function in
> tapply():
>
> Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & :
>   replacement has length zero

ifelse is for vector arguments. You should use if() {...} else {...}

But why not just

    c(x[1], diff(x))/max(x)

Berend

--
View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710621.html
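The diff() one-liner can be combined with a plain if() to handle the single-reading case. This is a sketch only: returning NA for a group of length one follows what the original function was trying to do, and the `eggs` and `year` vectors are made-up sample data, not data from the thread.

```r
# Rate of increase relative to the yearly maximum; a single reading has no
# previous value to compare against, so it yields NA (as the question intended).
rate <- function(x) {
  if (length(x) < 2) return(rep(NA_real_, length(x)))
  c(x[1], diff(x)) / max(x)
}

# Made-up sample data: four readings in 2009, two in 2010, one in 2011.
eggs <- c(2, 5, 9, 10, 4, 7, 3)
year <- c(2009, 2009, 2009, 2009, 2010, 2010, 2011)
res <- tapply(eggs, year, rate)
res
```

Because if() tests a single condition and returns whichever branch applies, it avoids the problem with ifelse(), which expects a vector test and vector `yes`/`no` values.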
Re: [R] example package for devel newcomers
On 7/31/2011 6:24 PM, Alexandre Aguiar wrote:
> On Sunday, 31 July 2011, you wrote:
>> My memory is that this question gets asked every few months and one of
>> the stock answers is to use the function 'package.skeleton' in the utils
>> package as a starting point.
>
> Got that from docs. And actually I already have most of the code written.
> My question addresses known tricks and impressions by experienced R
> interface programmers. This kind of stuff can be really useful. For
> instance, tricks are much better than docs when embedding php.
>
> Thanx.

Hadley Wickham is working on this sort of thing. I know he has given a master class on package development. Some things related to that are on the wiki associated with his devtools package:

https://github.com/hadley/devtools/wiki

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
[R] Error while trying to install a package
Hi Everyone,

When I try to install a package using

    install.packages("agricolae")

I see

    --- Please select a CRAN mirror for use in this session ---

and then the cursor keeps blinking; I don't get a popup menu to choose a CRAN mirror. Is it due to my proxy server settings? I tried echo $http_proxy, but it doesn't show any proxy; it's blank.

Please help me.

Thanks,
Sushil.

[[alternative HTML version deleted]]
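A common way around a mirror prompt that never appears is to choose the mirror in code, so install.packages() has nothing to ask. The sketch below assumes the hang comes from a graphical chooser that cannot be displayed, which the post does not confirm; the mirror URL is an example, and any CRAN mirror URL can be substituted.

```r
# Set the CRAN mirror for this session so no chooser is needed.
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

# Alternative: keep the prompt but force a text menu instead of a popup.
options(menu.graphics = FALSE)

# With a mirror set, the install call no longer has to ask. It is commented
# out here so that the sketch runs without network access.
# install.packages("agricolae")

# One-off form that names the mirror directly in the call:
# install.packages("agricolae", repos = "https://cloud.r-project.org")
```

If a proxy really is required, that is a separate issue from the mirror prompt and would need the proxy environment variables set before R starts.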
Re: [R] fill Matrix quicker
Thanks a lot, that will do the trick.

--
View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710533.html
[R] error in self-made function - cannot deal with objects of length = 1
I have a function to calculate the rate of increase (the difference between the value and the previous value divided by the total number of eggs in a year) of egg production over the course of a year:

    rate <- function(x){
      storage <- matrix(nrow=length(x),ncol=1)
      storage[1,] <- x[1] / max(x) # as there is no previous value
      for(i in 2:length(x)){
        p <- i - 1
        storage[i,] <- (x[i] - x[p]) / max(x)
      }
      return(storage)
    }

However, as it requires the subtraction of one term from the previous term, it fails when dealing with objects of length 1 (when only one reading has been taken in a year).

I have tried adding an ifelse() function into `rate' with NA added for length 1:

    rate <- function(x){
      storage <- matrix(nrow=length(x),ncol=1)
      ifelse(length(x)==1,storage[1,] <- NA,{
        storage[1,] <- x[1]/max(x)
        for(i in 2:length(x)){
          p <- i-1
          storage[i,] <- ((x[i] - x[p]) / max(x))
        }
      })
      return(storage)
    }

but I end up with this error when I try and use the above function in tapply():

    Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & :
      replacement has length zero

Thanks in advance,

Ben

--
View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710555.html
Re: [R] 5 arguments passed to .Internal(matrix) which requires 7
It looks like I had been missing an update needed for Ubuntu systems. All I needed was the following. Thank you.

    update.packages(lib.loc = "/usr/local/lib/R/site-library")

On Mon, Aug 1, 2011 at 2:39 PM, Jean V Adams jvad...@usgs.gov wrote:
> That's interesting. My function matrix() looks like this:
>
> function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
> {
>     if (is.object(data) || !is.atomic(data))
>         data <- as.vector(data)
>     .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow),
>         missing(ncol)))
> }
> <environment: namespace:base>
>
> I'm running Windows R version 2.13.0 (2011-04-13).
>
> Jean
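When a base function behaves as if it came from a different R version, a few read-only checks can help locate the mismatch before updating anything. The sketch below is a generic diagnostic and an assumption about what is useful here, not what was run in the thread; a "Built" version that differs from the running R is often harmless, so the list it produces is a starting point for inspection only.

```r
# Which R is running, and which library directories are searched?
R.version.string
.libPaths()

# Where is matrix() found? Normally only "package:base" is listed;
# anything else on the list is masking the base version.
find("matrix")

# Installed packages whose recorded build version differs from the running R.
ip <- installed.packages()
running <- paste(R.version$major, R.version$minor, sep = ".")
stale <- ip[which(ip[, "Built"] != running),
            c("Package", "LibPath", "Built"), drop = FALSE]
stale
```

If packages in a site library turn out to have been built under an older release, update.packages() with the matching `lib.loc`, as in the message above, is the usual next step.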