Re: [R] x86 SSE* Pointer Favors
Let me pick up on

> Enabling SSE instructions in addition while building R (yes, you have to enable them explicitly, see man gcc) is possible but does not help much since all maths is mostly done in BLAS.

The final part is not true for my 'maths', only for those doing linear algebra. Enabling use of SSE registers can help with CPU scheduling, and so can have a surprisingly large effect, so if you only run R on a single CPU type it is worth tuning the code to that CPU (e.g. -mtune=core2) alongside turning up optimization levels.

On Fri, 13 Jun 2008, Ivan Adzhubey wrote:

> Hi Ivo,
>
> On Friday 13 June 2008 12:23:06 am ivo welch wrote:
>> Dear Statisticians---
>>
>> This is not even an R question, so please forgive me. I have so much ignorance in this matter that I do not know where to begin. I hope someone can point me to documentation and/or a sample.
>
> You will surely find some answers to your questions if you look into the R-admin.html file under the "Building from source" section. Do a search on BLAS and you will be presented with some options. A bit of searching on the R web site for the same keyword will give you even more food for thought.
>
>> I want to compute a covariance as quickly as non-humanly possible on an Intel Core processor (up to SSE4) under Linux. Alas, I have no idea how to engage CPU vectorization. Do I need to use special data types, or is "double" correct? Does SSE* understand NaN? Should I rely on gcc autodetection of the vectorized meaning of my code, or are there specific libraries that I should call?
>
> I use the Goto BLAS library and it works great. It usually runs 3 to 30 times faster than the stock R BLAS library, depending on your code. Enabling SSE instructions in addition while building R (yes, you have to enable them explicitly, see man gcc) is possible but does not help much since all maths is mostly done in BLAS. That said, optimized BLAS libraries give the most speed increase with older processors. The newer crop of multi-core CPUs with large shared caches is much more difficult to hand-tune code for. You may want to subscribe to the Goto BLAS mailing list for an in-depth discussion. The ATLAS community is also very helpful (I use their code with our AMD CPUs).
>
>> What I want to learn about is as simple as it gets:
>>
>>   typedef double Double;  // or whatever SSE* needs as close equivalent
>>   Double vector1[N], vector2[N];
>>   // then fill them with stuff.
>
> R does not have types; everything that does not look like a character string or an integer is treated as double. All arithmetic is always done in double precision.
>
>>   vector3 = vector_mult(vector1, vector2, N);
>>   vector4 = sum(vector1, N);
>>
>> I just need a pointer and/or primer.
>>
>> PS: If someone knows of a superfast vectorized implementation of Gentleman's WLS algorithm, please point me to it, too. I am still using my old non-vectorized C routines.
>
> HTH,
> Ivan
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
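As a hedged R-side illustration of the advice above: the element-wise product and the sum Ivo asks about are already single vectorised calls in R, and a covariance can be routed through crossprod(), which on a build linked against an optimised BLAS (such as the Goto BLAS or ATLAS mentioned above) is normally evaluated by that BLAS. The helper name fast_cov() is hypothetical; this is a sketch, not a benchmark.

```r
# Covariance of the columns of a numeric matrix via crossprod(),
# which R normally hands to the linked BLAS.
fast_cov <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))   # subtract each column mean
  crossprod(centered) / (nrow(x) - 1)    # t(centered) %*% centered, scaled
}

set.seed(1)
v1 <- rnorm(1e5)
v2 <- rnorm(1e5)
v3 <- v1 * v2      # element-wise product (Ivo's vector_mult)
s1 <- sum(v1)      # reduction (Ivo's sum)

m <- cbind(v1, v2, v3)
all.equal(fast_cov(m), cov(m))   # TRUE up to rounding
```

fast_cov() does no NA handling: a single NA or NaN in a column propagates to every covariance involving that column, whereas cov() offers use = "complete.obs" and related options.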
Re: [R] Problem with rowMeans()
Erik Iverson wrote:
> ss wrote:
>> It is:
>>
>> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', row.names = NULL, header=TRUE, fill=TRUE)
>> > class(data[3])
>> [1] "data.frame"
>
> Oops, should have said class(data[[3]]) and is.numeric(data[[3]])

Oops, my typo. Of course, data[3] is a *data frame* (if data is one), so is.numeric(data[3]) must be FALSE. But clearly, if column 3 was excluded, is.numeric(data[[3]]) must have been FALSE.

vQ
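A small self-contained illustration of the point above, using a made-up data frame (dat and its columns are hypothetical stand-ins for the read.table() result): single brackets return a one-column data frame, so is.numeric() is FALSE even for a numeric column, while double brackets return the column itself. Selecting only the numeric columns is one way to make rowMeans() work when the data also carry, e.g., gene symbols.

```r
dat <- data.frame(symbol = c("g1", "g2", "g3"),
                  s1 = c(1, 2, 3),
                  s2 = c(5, 6, 7),
                  stringsAsFactors = FALSE)

class(dat[3])          # "data.frame": single brackets keep a data frame
is.numeric(dat[3])     # FALSE, even though that column is numeric
is.numeric(dat[[3]])   # TRUE: double brackets extract the column itself
is.numeric(dat[[1]])   # FALSE: a character column, which rowMeans() cannot use

num_cols <- vapply(dat, is.numeric, logical(1))
rowMeans(dat[num_cols])   # 3 4 5
```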
[R] parsing - input buffer overflow
Hi,

I am trying to parse a large amount of text using gregexpr(). Unfortunately, I get an "input buffer overflow" message when I attempt that with too large an amount of text. The error message occurs before the parsing: the problem is that I cannot assign the text to a variable (an object) if the text is too large. This problem has been mentioned before, which I found using RSiteSearch(). However, that post is from 2006, and I thought it might have improved by now. Is there any way to increase the limit or to get around this problem?

x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

# What I want to achieve is to parse the text for the number of occurrences of a certain character string within the text.
# This is done using:
n=100 # choose n large enough
length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))

But again, if the text is large, I cannot assign it to x.

I'd be grateful for any suggestions.

Cheers,
Daniel

-
cuncta stricte discussurus
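Separately from the input-length problem, the count itself does not need the "choose n large enough" step. gregexpr() returns one start position per match, or -1 when there is none, so the number of matches is the length of that vector. A minimal sketch; count_matches() is a hypothetical helper name, and it handles a single string (apply it over a character vector with vapply() if needed):

```r
count_matches <- function(pattern, text, ignore.case = TRUE) {
  pos <- gregexpr(pattern, text, ignore.case = ignore.case)[[1]]
  if (pos[1] == -1) 0L else length(pos)   # -1 signals "no match"
}

x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"
count_matches("Saint", x)      # 3
count_matches("Atlantis", x)   # 0
```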
[R] Sweave: looping over mixed R/LaTeX code
Dear guRus,

I would like to loop over a medium amount of Sweave code, including both R and LaTeX chunks. Is there any way to do so? As an illustration, can I create a .tex file like this using a loop within a .Rnw file, where the "1,2,3" comes from some iteration variable in R?

\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}

Right now, I do have a working but painful solution. I put the loop contents in a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
   Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
   ...
   Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw

This does what I need; however, it is a major pain code-wise. For example, there appears to be no way to control the loop during execution (n must be known in advance), and I need to control all graphics using \includegraphics with the iteration counter paste()d into the filename.

An alternative may be not using Sweave and working with one giant sink() and lots of print()s, letting R just write the entire .tex file. This also appears inelegant to me.

Is there a better way to do this?

I have tried to do my homework, see below. Do I get partial credit ;-) ?

Thank you all for your time!
Stephan

#

I can't simply start a for loop within an R chunk and finish it in another one.

whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it would simply reuse the R chunks if it did work, without changing things between loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...} with the entire loop contents and then simply write \loopcontent \loopcontent ... or \input or \include the loop content from an external file.
Of course it would be possible to not use Sweave and just use the output from the R console, but there are a couple of figures I would really like to see close to the relevant portions of the calculations.

I also thought about putting the entire loop in *one* R chunk, but then I see no way to include LaTeX chunks *within* this R chunk. I can't just sink() to the .tex file in the middle of the R chunk (as the sink() output gets appended to the .tex file only after Sweave is done with it).

I have read the Sweave manual and FAQs and the R/R Windows FAQ, and I did both RSiteSearches and RSeek searches for all combinations of "Sweave" and "loop", "for", "while" I could think of.

For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  tcltk     methods   base

other attached packages:
[1] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] svMisc_0.9-5
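One way to avoid the separate loop.Rnw files is to let a single R chunk write the LaTeX itself: Sweave's results=tex chunk option passes the chunk's printed output into the .tex file as LaTeX, and the loop can save its own figures and reference them with \includegraphics instead of going through fig=TRUE. A minimal sketch under those assumptions; the helper loop_tex() and the loopfig-<i>.pdf file names are hypothetical:

```r
# Writes one figure per iteration into fig_dir and returns the LaTeX
# lines to splice into the document.
loop_tex <- function(n, fig_dir = ".") {
  out <- character(0)
  for (i in seq_len(n)) {
    fig <- file.path(fig_dir, sprintf("loopfig-%d.pdf", i))
    pdf(fig, width = 5, height = 4)
    plot(cumsum(rnorm(50)), type = "l", main = paste("Iteration", i))
    dev.off()
    out <- c(out,
             sprintf("\\section*{Iteration %d}", i),
             sprintf("\\includegraphics[width=0.6\\textwidth]{%s}", fig),
             "")
  }
  out
}

# Inside the .Rnw file the chunk would read:
# <<loop, results=tex, echo=FALSE>>=
# cat(loop_tex(3), sep = "\n")
# @
```

Because n is an ordinary R value here, it can be computed earlier in the document instead of being fixed in advance.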
[R] adding custom axis to image.plot() and strange clipping behavior
Hi list,

I wanted to plot an image with a colorbar to the right of the plot, but set my own axis labels (text rather than numbers) on the image. I have previously accomplished this with two calls to image(), but the package 'fields' has a wrapper function, image.plot(), which does this task conveniently.

However, I could not add axes to the original image after a call to image.plot(); I found that I needed to set par(xpd=TRUE) within the function to allow this to happen:

###=== begin code
library(fields)

## make data matrix
m <- matrix(1:15,ncol=3)

## plot
image.plot(m,axes=FALSE)
axis(1) # doesn't work

par(xpd=TRUE)
axis(1) # still doesn't work

## replace the 28th element of the body of image.plot()
## and assign to new function called 'imp'
## here I just use the second condition of 'if' statement
## and set 'xpd = TRUE'
imp <- `body<-`(image.plot,value=`[[<-`(body(image.plot),28,
                quote({par(big.par)
                       par(plt = big.par$plt, xpd = TRUE)
                       par(mfg = mfg.save, new = FALSE)
                       invisible()})))
imp(m,axes=FALSE)
box()
axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
## clip to plotting region for additional
## graphical elements to be added:
par(xpd=FALSE)
abline(v=0.5)
###=== end code

The axis() documentation says:
"Note that xpd is not accepted as clipping is always to the device region"
so I am surprised to find (1) that par(xpd=TRUE) works in the case above, and (2) that it must be called before the function call is terminated. I wonder if anyone has any insights into this behavior. I have reproduced this on both my Linux box (Ubuntu Gutsy Gibbon 64-bit, R 2.7.0, fields package version 4.1) and my Windows machine (32-bit XP Pro, R 2.7.0, fields package version 4.1).
Thanks very much,
Stephen
Re: [R] Sweave: looping over mixed R/LaTeX code
Dear Stephan,

I have the same problem as you. My solution is a bit different, but not very elegant: I have a master document (let's say master.Snw) and a file containing the code to repeat (what would be inside the loop). In the master document I start a counter at 0, and I copy "\SweaveInput{loop.Snw}" as many times as the n of the loop. And in my loop.Snw, I don't forget to increment the counter by 1. Not marvelous, but it works...

Delphine

Delphine Fontaine
Statistician
Data & Statistics Department
Genexion SA

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stephan Kolassa
> Sent: Friday, 13 June 2008 10:22
> To: r-help@r-project.org
> Subject: [R] Sweave: looping over mixed R/LaTeX code
[R] Looping, Control Flow & Conditional Statements
Dear R Group:

I have little experience using R and even less experience with control-flow type questions. See the following code:

a1 = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
for(i in 1:1){
  sx <- paste("a",i,sep="")
  s <- eval(parse(text = paste("a",i,sep="")))
  {g = numeric(length(s))
   k = numeric(length(s))
   {for (i in 1:length(s))
     {for (j in 1:length(s))
       ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
     }}
   h1 <- hist(g,freq=TRUE)
   h <- h1$counts[4]
   cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
  }}

The output is:

> g
 [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
> k
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> h
[1] 7

and a text file, which has:

a1 : 7

k is a by-product of the ifelse statement and is of no interest, and g and h only go part-way to answering my question, which is: every time an object, i.e. a1 (which is actually a time series)

0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0

has a value over 0, how long do the values stay above 0? In this case a1 has two groups or events where the value is above zero: the first event lasts for 3 'days' and the second event lasts for 4 'days'. My code tells me that there was a total of 7 'days' in events, i.e. above 0, but what I need to know is that there were two 'events' and the 1st lasted 3 'days' and the 2nd lasted 4 'days'. Essentially I want a text file output to say:

a1.1 : 3
a1.2 : 4

My thinking is that I need to somehow get the code working through each vector one value at a time, and when a value is found to meet the criterion of > 0, R creates a new vector. To use the above example, it would come to the first value > 0 and then create the new vector a1.1 = (1,1,1); then, as the next value in the series is 0, it would close this new vector 'a1.1'. It would then continue until it reaches the next value > 0 and then create the vector a1.2 = (1,1,1,1); then again, as the next value in the series is 0, it would close this new vector, and so on.
Then all I need to do is count the '1's in these new vectors to find how many days they met this criterion of being greater than 0.

I hope the above makes sense, and I really hope there is someone willing and able to help. I don't know how to proceed.

Thanks,
Garth
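The "events" described above are runs of consecutive values above 0, which is exactly what base R's rle() (run-length encoding) computes, so no value-by-value loop or temporary vectors are needed. A minimal sketch; event_lengths() and the series list are hypothetical names:

```r
event_lengths <- function(x) {
  runs <- rle(x > 0)            # runs of TRUE (above 0) and FALSE
  runs$lengths[runs$values]     # keep only the lengths of the TRUE runs
}

a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
series <- list(a1 = a1)         # add a2 = ..., a3 = ... as needed

out <- unlist(lapply(names(series), function(nm) {
  len <- event_lengths(series[[nm]])
  if (length(len) == 0) return(character(0))   # series never above 0
  paste0(nm, ".", seq_along(len), " : ", len)
}))

writeLines(out)   # a1.1 : 3 and a1.2 : 4, one per line
# writeLines(out, "C:/temp/test-beta.txt") writes the same lines to a file
```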
Re: [R] parsing - input buffer overflow
On Fri, 13 Jun 2008, Daniel Malter wrote:

> Hi, I am trying to parse a large amount of text using gregexpr(). Unfortunately, I get an "input buffer overflow" message when I attempt that with too large an amount of text. The error message occurs before the parsing. The problem is that I cannot assign the text to a variable (an object) if the text is too large.

R does have limits on the command line length (1024 bytes up to R-devel, 4096 bytes there). What happens if you exceed that depends on the interface you are using (and you have not told us). Beyond that, the parser has a limit of MAXELTSIZE (8192 bytes) on strings.

I don't see any need for 'improvement' though: why are you entering very long strings as part of the R program? They are data, and e.g. readLines() and scan() have no limits on string length beyond those imposed by R's internals (2^31-1 bytes).

> This problem has been mentioned before, which I found using the RSiteSearch. However, the post is from 2006, and I thought it might have improved by now. Is there any way to increase the limit or to get around this problem?
>
> x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

I presume that is not an example? It looks like a character vector which has been collapsed by paste(x, collapse = ", ") and would be better split into its components with strsplit() than searched with gregexpr().

> #What I want to achieve is to parse the text for the number of occurrences of a certain character string within the text.
> #This is done using:
> n=100 #choose n large enough
> length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))
>
> But again, if the text is large, I cannot assign it to x.
>
> I'd be grateful for any suggestions.
>
> Cheers,
> Daniel

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
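A sketch of the approach suggested above, treating the text as data: read it with readLines() instead of typing it into the program, split it into its components with strsplit(), and count the components of interest. The temporary file stands in for the real data file, whose name is not given in the thread; note that this counts components beginning with "Saint", which matches the example data but is not the same as counting every occurrence inside the text.

```r
# Stand-in for the real data file (contents taken from the example above)
path <- tempfile(fileext = ".txt")
writeLines("Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha", path)

txt   <- readLines(path)                             # not subject to the parser's string limit
items <- unlist(strsplit(txt, ", ", fixed = TRUE))   # one element per place name
n_saint <- sum(grepl("^saint", items, ignore.case = TRUE))
n_saint   # 3
```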
Re: [R] How to increase the for() loop speed?
Rafael Barros de Rezende:
> I would like to know if there is a way to increase the for() loop speed
> because in my routine the calculations are too slow.

Read the article 'How Can I Avoid This Loop or Make It Faster?' on page 46 in the latest R News: <http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf>.

-- 
Karl Ove Hufthammer
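Rafael's routine is not shown, so as a generic, hedged illustration only: two common remedies for slow for() loops are to preallocate the result instead of growing it, and, where possible, to replace the loop with a vectorised expression. The three hypothetical functions below compute the same thing:

```r
# Grows the result: c() allocates a new, longer vector on every step
slow_square <- function(x) {
  out <- c()
  for (i in seq_along(x)) out <- c(out, x[i]^2)
  out
}

# Preallocates the result once, then fills it in
prealloc_square <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# No explicit loop: the arithmetic operator is already vectorised
vec_square <- function(x) x^2

x <- runif(2e4)
system.time(slow_square(x))
system.time(prealloc_square(x))
system.time(vec_square(x))
```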
[R] Output of silhouette (cluster package)
Dear R users,

I am writing to you about the graphical output of silhouette() (cluster package). From the example of silhouette in help(silhouette):

> ar <- agnes(ruspini)
> si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
+                   daisy(ruspini))
> plot(si3, nmax = 80, cex.names = 0.5)

from which one may conclude that group 1 is composed of units 1 to 20, group 2 of units 21 to 43, group 3 of units 44 to 57, group 4 of units 58 to 60 and, finally, group 5 of units 61 to 75. However, this seems to contradict the printed output of silhouette, where the fourth group is composed of units 46 to 48 instead of units 58 to 60 (which belong to the third cluster), see

> si3
      cluster neighbor   sil_width
 [1,]       1        5 0.679838078
 [2,]       1        5 0.745615002
 [3,]       1        5 0.758796123
 [4,]       1        4 0.715554768
 [5,]       1        5 0.664657114
 [6,]       1        4 0.783993831
 [7,]       1        2 0.590057470
 [8,]       1        4 0.747969458
 [9,]       1        5 0.792304760
[10,]       1        4 0.803547635
[11,]       1        4 0.742402051
[12,]       1        4 0.722302731
[13,]       1        4 0.665412622
[14,]       1        5 0.756910666
[15,]       1        5 0.700685403
[16,]       1        5 0.743601834
[17,]       1        5 0.614854124
[18,]       1        5 0.708007860
[19,]       1        5 0.700093839
[20,]       1        4 0.568989067
[21,]       2        4 0.751866935
[22,]       2        4 0.790783667
[23,]       2        4 0.802659788
[24,]       2        4 0.785895823
[25,]       2        4 0.822943473
[26,]       2        4 0.831313347
[27,]       2        4 0.818043337
[28,]       2        4 0.805454305
[29,]       2        4 0.770547118
[30,]       2        4 0.768289979
[31,]       2        3 0.794485567
[32,]       2        4 0.829925955
[33,]       2        4 0.807379640
[34,]       2        4 0.790626589
[35,]       2        4 0.817427927
[36,]       2        3 0.793572412
[37,]       2        4 0.760561408
[38,]       2        4 0.743170109
[39,]       2        3 0.761413953
[40,]       2        3 0.704193051
[41,]       2        4 0.297007126
[42,]       2        4 0.522049838
[43,]       2        3 0.488556828
[44,]       3        4 0.377632488
[45,]       3        4 0.007214464
[46,]       4        3 0.699407534
[47,]       4        3 0.837451212
[48,]       4        3 0.794349431
[49,]       3        4 0.632862996
[50,]       3        4 0.586149139
[51,]       3        4 0.647326133
[52,]       3        4 0.650020368
[53,]       3        4 0.629131005
[54,]       3        4 0.618843633
[55,]       3        4 0.586439350
[56,]       3        4 0.586788051
[57,]       3        4 0.668108812
[58,]       3        4 0.650074540
[59,]       3        4 0.628444500
[60,]       3        4 0.591393005
[61,]       5        1 0.770110294
[62,]       5        1 0.815309198
[63,]       5        4 0.771622667
[64,]       5        1 0.806125429
[65,]       5        1 0.850310507
[66,]       5        1 0.822984066
[67,]       5        1 0.852743923
[68,]       5        1 0.762055943
[69,]       5        1 0.839180986
[70,]       5        1 0.854894699
[71,]       5        1 0.838106473
[72,]       5        1 0.774812117
[73,]       5        1 0.795021304
[74,]       5        1 0.759681469
[75,]       5        1 0.742553847
attr(,"Ordered")
[1] FALSE
attr(,"call")
silhouette.default(x = cutree(ar, k = 5), dist = daisy(ruspini))
attr(,"class")
[1] "silhouette"

Thanks for your attention,
Cristiano

-
Cristiano Varin
[EMAIL PROTECTED]
http://www.dst.unive.it/~sammy/
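A way to check cluster membership directly rather than reading it off the plot. As far as I can tell, plot.silhouette() draws the observations sorted by cluster and then by decreasing silhouette width (the order sortSilhouette() produces), so the position of a bar is not the observation number. With the cluster sizes in the printed output above (20, 23, 14, 3 and 15), that ordering would put cluster 4 at bars 58 to 60, although its members are observations 46 to 48. The sketch deviates from the help-page example by calling as.hclust() before cutree(), an assumption made because cutree() expects an hclust-like tree:

```r
library(cluster)

ar  <- agnes(ruspini)
si3 <- silhouette(cutree(as.hclust(ar), k = 5), daisy(ruspini))

table(si3[, "cluster"])                       # cluster sizes
split(seq_len(nrow(si3)), si3[, "cluster"])   # observation numbers per cluster

# Assumed plotting order: by cluster, then by decreasing silhouette width
ord <- order(si3[, "cluster"], -si3[, "sil_width"])
data.frame(bar = 44:60, obs = ord[44:60], cluster = si3[ord[44:60], "cluster"])
```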
Re: [R] model simplification using Crawley as a guide
Peter Dalgaard wrote:
...
That'll be anti-hist()-amine, I presume?

I would think p-necillin a more appropriate treatment.

Jim
[R] uncertainty bounds for a weighted moving average
Hi,

well, this is not an R-specific question, but perhaps you can help. If I've got an irregularly sampled time series and apply a moving-average filter (e.g., with a triangular kernel), how could the uncertainty bounds be calculated?

Thanks and best regards
J.
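One common answer, sketched under explicit assumptions: if the observations are a smooth trend plus independent errors with a common variance sigma^2, a normalised weighted average sum(w_i * y_i) has variance sigma^2 * sum(w_i^2), which gives pointwise bounds once sigma^2 is estimated. The sketch estimates sigma^2 from successive differences, which assumes the trend changes little between neighbouring samples; it ignores smoothing bias and autocorrelation, both of which would widen honest bounds. Function names are hypothetical, and h must be large enough that every output time has data within h of it:

```r
# Normalised triangular-kernel weights around time t0 (half-width h)
tri_weights <- function(t, t0, h) {
  w <- pmax(1 - abs(t - t0) / h, 0)
  w / sum(w)
}

# Weighted moving average with pointwise bounds, assuming
# y = smooth trend + independent errors with common variance sigma^2
wma_bounds <- function(t, y, t_out, h, level = 0.95) {
  o <- order(t)
  t <- t[o]
  y <- y[o]
  # difference-based estimate: E[(y[i+1] - y[i])^2] is about 2 * sigma^2
  sigma2 <- sum(diff(y)^2) / (2 * (length(y) - 1))
  z <- qnorm(1 - (1 - level) / 2)
  res <- vapply(t_out, function(t0) {
    w   <- tri_weights(t, t0, h)
    est <- sum(w * y)
    se  <- sqrt(sigma2 * sum(w^2))   # Var(sum w_i y_i) = sigma^2 * sum w_i^2
    c(estimate = est, lower = est - z * se, upper = est + z * se)
  }, numeric(3))
  data.frame(time = t_out, t(res))
}

set.seed(1)
tt <- sort(runif(200, 0, 10))   # irregular sampling times
yy <- sin(tt) + rnorm(200, sd = 0.2)
wma_bounds(tt, yy, t_out = 1:9, h = 1)
```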
[R] Writing a new link for a GLM.
Hi,

I wish to write a new link function for a GLM. R's glm routine does not supply the "loglog" link. I modified the make.link function, adding the code:

    }, loglog = {
        linkfun <- function(mu) -log(-log(mu))
        linkinv <- function(eta) exp(-exp(-eta))
        mu.eta <- function(eta) exp(-exp(-eta)-eta)
        valideta <- function(eta) all(eta != 0)
    }, stop(sQuote(link), " link not recognised"))
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}

and then called glm with the argument

glm(y~x1+x2+x3,family=binomial(link=make.link("loglog")),data=X)

and that seems to work. Is this the way to include a new link function? Any other suggestions?

Jan.

--
|Jan Graffelman                          |tel: +34-93-4011739              |
|Dpt. of Statistics & Operations Research|fax: +34-93-4016575              |
|Universitat Politecnica de Catalunya    |email: [EMAIL PROTECTED]         |
|Av. Diagonal 647, 6th floor             |www: http://www-eio.upc.es/~jan/ |
|08028 Barcelona, Spain                  |                                 |
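R's family functions also accept a link given as an object of class "link-glm" (the kind of object make.link() returns), so a sketch that avoids editing make.link() itself is to build that object directly. loglog_link() is a hypothetical name; the functions follow the definitions above, with two changes that are my own assumptions: valideta accepts any finite eta (the inverse link is defined for every real eta), and the inverse link and its derivative are kept away from 0 and 1, similar to the guards make.link() uses for "cloglog". The data are simulated:

```r
loglog_link <- function() {
  eps <- .Machine$double.eps
  structure(list(
    linkfun  = function(mu) -log(-log(mu)),
    linkinv  = function(eta) pmin(pmax(exp(-exp(-eta)), eps), 1 - eps),
    mu.eta   = function(eta) pmax(exp(-exp(-eta) - eta), eps),   # d mu / d eta
    valideta = function(eta) all(is.finite(eta)),
    name     = "loglog"
  ), class = "link-glm")
}

set.seed(123)
n  <- 500
x1 <- rnorm(n)
y  <- rbinom(n, 1, exp(-exp(-(0.5 + 1.2 * x1))))   # simulated from a loglog model

fit <- glm(y ~ x1, family = binomial(link = loglog_link()))
coef(fit)   # close to 0.5 and 1.2, up to sampling error
```

Since loglog(mu) = -cloglog(1 - mu), fitting the built-in "cloglog" link to 1 - y should give the same estimates with their signs flipped, which is a handy cross-check.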
Re: [R] adding custom axis to image.plot() and strange clipping behavior
I also noticed that adding a custom axis with image.plot() was a problem; you can also do:

library(fields)
m <- matrix(1:15, ncol = 3)
par(mar = c(5, 5, 5, 7))   # wider right margin
image(m, axes = FALSE)
# add axis
at <- axTicks(1)
axis(1, at = at, labels = letters[seq_along(at)])
box()
## add legend
image.plot(m, legend.only = TRUE)

On Fri, 13 Jun 2008, Stephen Tucker wrote:
> Hi list,
>
> I wanted to plot an image with a colorbar to the right of the plot, but
> set my own axis labels (text rather than numbers) to the image.
> [...]
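For completeness, a similar layout can be sketched with base graphics only, without 'fields': layout() splits the device into a wide panel for the image and a narrow one for a hand-made colour bar, so axis() on the first panel behaves as usual. plot_with_bar() is a hypothetical helper; the colour bar simply draws the palette over the data range, and because the function restores par() on exit, extra elements such as abline() are best added inside it.

```r
plot_with_bar <- function(m, col = heat.colors(64), xlabels = NULL) {
  op <- par(no.readonly = TRUE)
  on.exit(par(op))                        # restore graphics settings afterwards
  layout(matrix(1:2, nrow = 1), widths = c(5, 1))

  # Panel 1: the image with custom text labels on the x axis
  par(mar = c(5, 4, 2, 1))
  image(m, col = col, axes = FALSE)
  at <- axTicks(1)
  if (is.null(xlabels)) xlabels <- letters[seq_along(at)]
  axis(1, at = at, labels = xlabels)
  axis(2)
  box()

  # Panel 2: colour bar, one cell per colour, spanning the data range
  zr   <- range(m, na.rm = TRUE)
  brk  <- seq(zr[1], zr[2], length.out = length(col) + 1)
  mids <- (brk[-1] + brk[-length(brk)]) / 2
  par(mar = c(5, 1, 2, 3))
  image(x = c(0, 1), y = brk, z = matrix(mids, nrow = 1), col = col,
        axes = FALSE, xlab = "", ylab = "")
  axis(4, las = 1)
  box()
  invisible(at)
}

at <- plot_with_bar(matrix(1:15, ncol = 3))
```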
Re: [R] Sweave: looping over mixed R/LaTeX code
Stephan Kolassa <...gmx.de> writes:

> I would like to loop over a medium amount of Sweave code, including both R and LaTeX chunks. Is there any way to do so? As an illustration, can I create a .tex file like this using a loop within a .Rnw file, where the "1,2,3" comes from some iteration variable in R?
>
> \documentclass{article}
> \usepackage{Sweave}
> \begin{document}
> Iteration 1
> Iteration 2
> Iteration 3
> \end{document}

I normally do this with a \newcommand: all the LaTeX stuff goes in the \newcommand{}, and the parameters created by R are passed to it.

Dieter
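A minimal sketch of the macro approach described above, with the LaTeX layout in a parametrised macro and R only supplying the arguments. The macro name \iterblock and its two arguments are hypothetical; the R part would sit in a chunk declared with results=tex and echo=FALSE so that its output lands in the .tex file as LaTeX:

```r
# LaTeX side (in the .Rnw preamble), shown here as a comment:
#   \newcommand{\iterblock}[2]{\subsection*{Iteration #1}Sample mean: #2\par}
# R side: emit one macro call per iteration.
iter_calls <- function(values) {
  sprintf("\\iterblock{%d}{%s}", seq_along(values), format(values, digits = 3))
}

set.seed(42)
means <- vapply(1:3, function(i) mean(rnorm(10)), numeric(1))
cat(iter_calls(means), sep = "\n")
```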
Re: [R] MCA in R
Dear Kimmo,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of K. Elo
> Sent: June-13-08 1:43 AM
> To: r-help@r-project.org
> Subject: Re: [R] MCA in R
>
> Dear John,
>
> thanks for Your quick reply.
>
> John Fox wrote:
>> Dear Kimmo,
>>
>> MCA is a rather old name (introduced, I think, in the 1960s by Sonquist and Morgan in the OSIRIS package) for a linear model consisting entirely of factors and with only additive effects -- i.e., an ANOVA model with no interactions.
>
> It is true that MCA is an old name, but the technique itself is still robust, I think. The problem I am facing is that I have a research project where I try to find out which factors affect measured knowledge of a specific issue. As predictors I have formal education, interest, gender and consumption of different media (TV, newspapers etc.). Now, these are correlated predictors and running e.g. a simple anova (anova(lm(...)) as You suggested) won't - if I have understood correctly - consider the problem of correlated predictors. MCA would do this.

That's because anova() calculates sequential ("type-I") sums of squares; if you use the Anova() function in the car package, for example, you'll get so-called type-II sums of squares -- for each factor after the others. You could also more tediously do these tests directly using the anova() function, by contrasting alternative models: the full model and the model deleting each factor in turn.

> A colleague of mine has run anova and MCA in SPSS and the results differ significantly.

Yes, see above.

> Because I am more familiar with R, I just hoped that this marvelous statistical package could handle MCA, too :)
>
>> Typically, the results of an MCA are reported using "adjusted means." You could compute these manually, or via the effects package.
>
> Well, I am interested in the eta and beta values, too.
Aren't the eta values just the square roots of the R^2's from the individual one-way ANOVAs? I don't remember how the betas are defined, but do recall that they are a peculiar attempt to define standardized partial regression coefficients for factors that combine all of the levels.

> I have tried to use the effects package but my attempts with all.effects resulted in errors. I have to figure out what's going wrong here :)

If you tell me what you did, ideally including an example that I can reproduce, I can probably tell you what's wrong.

Regards,
 John

> Kind regards,
> Kimmo Elo
>
> --
> University of Turku, Finland
> Dep. of political science
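A sketch of the base-R route John mentions, on simulated data (all variable names are hypothetical stand-ins for Kimmo's predictors, and interest is made to depend on education so the predictors are correlated). For an additive model, drop1() tests each factor after all the others, which is the "delete each factor in turn" comparison; anova() on two nested fits gives the same test for one factor. The eta values follow John's tentative suggestion, the square root of R^2 from each one-way model.

```r
set.seed(2008)
n <- 300
educ <- factor(sample(c("low", "mid", "high"), n, replace = TRUE))
# make 'interest' depend on education, so the predictors are correlated
interest <- factor(ifelse(runif(n) < ifelse(educ == "high", 0.7, 0.3), "yes", "no"))
gender <- factor(sample(c("f", "m"), n, replace = TRUE))
knowledge <- 10 + 2 * (educ == "high") + 1.5 * (interest == "yes") + rnorm(n)
d <- data.frame(knowledge, educ, interest, gender)

fit <- lm(knowledge ~ educ + interest + gender, data = d)
anova(fit)                              # sequential (type-I): depends on term order
drop1(fit, test = "F")                  # each factor after all the others
anova(update(fit, . ~ . - educ), fit)   # same test for educ, by comparing models

# eta per factor: sqrt of R^2 from the one-way model
eta <- sapply(c("educ", "interest", "gender"), function(v)
  sqrt(summary(lm(reformulate(v, "knowledge"), data = d))$r.squared))
eta
```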
[R] Switching the order of legend boxes in a lattice bar graph
I suspect there is a simple solution to this problem, but I have been unable to find it. Below is some code that I have run to create 3 lattice graphs. I have been asked to change the legend so that "No" and dark blue appear above "Y" and light blue, mirroring the stacked bars in the graph, which have dark blue above light blue. I have tried changing the data as well as the order of the legend text, without success. Any assistance is much appreciated.

Regards,
Bob Green

library(lattice)
SNFP1 <- as.table(matrix(c(4,1, 4,4, 1,3, 2,7, 1,6, 0,4), ncol = 6, dimnames = list(group=c("Y","No"), Status=c("A","B", "C", "D", "E", "F"))))
barplot(SNFP1, beside=FALSE, legend=TRUE, ylim=c(0, 60), ylab="N of patients", main ="district 1", col=c("light blue", "dark blue"))
# "A","B", "C", "D", "E", "F"
SNFP2 <- as.table(matrix(c(3,7, 1,5, 0,1, 0,1), ncol = 4, dimnames = list(group=c("Y","No"), Status=c("G","H", "I", "J"))))
barplot(SNFP2, beside=FALSE, legend=TRUE, ylim=c(0, 60), ylab="N of patients", main ="district 2", col=c("light blue", "dark blue"))
# "G", "H", "I", "J",
SNFP3 <- as.table(matrix(c(3,0, 0,2, 3,4), ncol = 3, dimnames = list(group=c("Y","No"), Status=c("K","L", "M"))))
barplot(SNFP3, beside=FALSE, legend=TRUE, ylim=c(0, 60), ylab="N of patients", main ="district 3", col=c("light blue", "dark blue"))
df1 <- as.data.frame(t(SNFP1))
df2 <- as.data.frame(t(SNFP2))
df3 <- as.data.frame(t(SNFP3))
stuff <- make.groups(A=df1, B=df2, C=df3)
# simple version
barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE, scales=list(x=list(relation="free")), auto.key=TRUE)
# advanced version
barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE, as.table=TRUE, layout=c(2,2), skip=c(F,T,F,F), scales=list(x=list(relation="free")), ylab="patients", main="Figure 1: X by district", par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))), auto.key=list(x = .6, y = .7, corner = c(0, 0)))
Re: [R] Switching the order of legend boxes in a lattice bar graph
Hi Bob,

Would this:

mykey <- list(rectangles = list(col=c("dark blue","light blue")), text=list(lab=c("No","Yes")), x = .6, y = .7, corner = c(0, 0))
barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE, as.table=TRUE, layout=c(2,2), skip=c(F,T,F,F), scales=list(x=list(relation="free")), ylab="patients", main="Figure 1: X by district", par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))), key=mykey)

solve your problem?

Regards,
Markus

Markus Gesmann │Associate Director│Libero Ventures Ltd, One Broadgate, London EC2M 2QS
tel: +44 (0)207 826 9080│ dir: +44 (0)207 826 9085│fax: +44 (0)207 826 9090 │www.libero.uk.com
A Lehman Brothers Company
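A self-contained sketch of the same idea, rebuilding the tables from the question (with their closing parentheses restored) and deriving the key's labels and colours from levels(stuff$group) instead of typing them by hand, so the key cannot drift out of sync with the data. The variable names cols and p are illustrative. In a stacked barchart the first group level ("Y") is drawn at the bottom, which is why both vectors are reversed for the key.

```r
library(lattice)

# Tables from the original post
SNFP1 <- as.table(matrix(c(4,1, 4,4, 1,3, 2,7, 1,6, 0,4), ncol = 6,
  dimnames = list(group = c("Y", "No"), Status = c("A", "B", "C", "D", "E", "F"))))
SNFP2 <- as.table(matrix(c(3,7, 1,5, 0,1, 0,1), ncol = 4,
  dimnames = list(group = c("Y", "No"), Status = c("G", "H", "I", "J"))))
SNFP3 <- as.table(matrix(c(3,0, 0,2, 3,4), ncol = 3,
  dimnames = list(group = c("Y", "No"), Status = c("K", "L", "M"))))
stuff <- make.groups(A = as.data.frame(t(SNFP1)),
                     B = as.data.frame(t(SNFP2)),
                     C = as.data.frame(t(SNFP3)))

# Fill colours in the order of levels(stuff$group): "Y" first, then "No"
cols <- c("light blue", "dark blue")

# Reverse colours and labels so the key reads top-down like the stacks
mykey <- list(rectangles = list(col = rev(cols)),
              text = list(lab = rev(levels(stuff$group))),
              x = .6, y = .7, corner = c(0, 0))

p <- barchart(Freq ~ Status | which, groups = group, data = stuff,
              stack = TRUE, as.table = TRUE, layout = c(2, 2),
              skip = c(FALSE, TRUE, FALSE, FALSE),
              scales = list(x = list(relation = "free")),
              ylab = "patients", main = "Figure 1: X by district",
              par.settings = list(superpose.polygon = list(col = cols)),
              key = mykey)
print(p)
```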
Re: [R] Writing a new link for a GLM.
I wrote an R News note about this sort of thing in 2006; you can navigate there via CRAN...

url: www.econ.uiuc.edu/~roger      Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox: 217-333-4558                   University of Illinois
fax: 217-244-6678                   Champaign, IL 61820

On Jun 13, 2008, at 4:54 AM, Jan Graffelman wrote:

Hi,
I wish to write a new link function for a GLM. R's glm routine does not supply the "loglog" link. I modified the make.link function, adding the code:

}, loglog = {
  linkfun <- function(mu) -log(-log(mu))
  linkinv <- function(eta) exp(-exp(-eta))
  mu.eta <- function(eta) exp(-exp(-eta)-eta)
  valideta <- function(eta) all(eta != 0)
}, stop(sQuote(link), " link not recognised"))
structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta, valideta = valideta, name = link), class = "link-glm")
}

and then call glm with the argument

glm(y~x1+x2+x3,family=binomial(link=make.link("loglog")),data=X)

and that seems to work. Is this the way to include a new link function? Any other suggestions?

Jan.

--
Jan Graffelman                             tel: +34-93-4011739
Dpt. of Statistics & Operations Research   fax: +34-93-4016575
Universitat Politecnica de Catalunya       email: [EMAIL PROTECTED]
Av. Diagonal 647, 6th floor                www: http://www-eio.upc.es/~jan/
08028 Barcelona, Spain
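The family functions in R also accept an object of class "link-glm" directly as the link argument (see ?family), so a new link can be defined without editing make.link(). The sketch below does this for the loglog link using the formulas from Jan's message. The constructor name loglog() and the simulated data are illustrative assumptions. valideta simply returns TRUE here, on the assumption that any real eta is admissible for this link; the all(eta != 0) check in the quoted code looks like it was carried over from the inverse link.

```r
# Constructor for a "link-glm" object with mu = exp(-exp(-eta))
loglog <- function() {
  linkfun  <- function(mu)  -log(-log(mu))
  linkinv  <- function(eta) exp(-exp(-eta))
  # derivative d mu / d eta, used by the IRLS iterations in glm.fit
  mu.eta   <- function(eta) exp(-exp(-eta) - eta)
  valideta <- function(eta) TRUE
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = "loglog"),
            class = "link-glm")
}

# Hypothetical binary data generated under the loglog link
set.seed(42)
x <- runif(2000, -2, 2)
y <- rbinom(2000, 1, exp(-exp(-(0.5 + 1.5 * x))))

fit <- glm(y ~ x, family = binomial(link = loglog()))
coef(fit)  # should be close to the simulated values 0.5 and 1.5
```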
[R] R and Brownian Motion / Langevin Equation package
Hi, I'm writing a short course tutorial on Brownian motion and the Langevin equation. At the end of the theory section I want to add a short GNU R example, so the students can play around a little. I already looked in the MASS book (by Venables and Ripley), but I couldn't find any Brownian motion / Langevin equation package. Are there any good packages or tutorials available that cover Brownian motion and the Langevin equation in R? Thanks, Peter
Re: [R] MCA in R
Although John Fox naturally mentions his Anova function, I would like to point out that drop1() (and MASS::dropterm) also does the tests of Type-II ANOVA of which John says 'more tediously do these tests directly'.

It seems a lot easier to teach newcomers about drop1() than to introduce the SAS terminology and then say (to quote ?Anova)

   'the definitions used here do not correspond precisely to those employed by SAS'

(I would welcome a description of the precise differences on the Anova help page.)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] MCA in R
Dear Brian,

> -Original Message-
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> Sent: June-13-08 8:13 AM
> To: John Fox
> Cc: 'K. Elo'; r-help@r-project.org
> Subject: Re: [R] MCA in R
>
> Although John Fox naturally mentions his Anova function, I would like to point out that drop1() (and MASS::dropterm) also does the tests of Type-II ANOVA of which John says 'more tediously do these tests directly'.

It's true that for an additive model (such as Kimmo's), drop1() and Anova() produce the same sums of squares, but for a model in which some terms are marginal to others, drop1() produces tests only for the high-order terms. One could specify scope = ~ . to drop1(), but that produces so-called "type-III" tests. Perhaps there's some convenient way around this of which I'm unaware.

> It seems a lot easier to teach newcomers about drop1() than to introduce the SAS terminology and then say (to quote ?Anova)
>
>    'the definitions used here do not correspond precisely to those employed by SAS'
>
> (I would welcome a description of the precise differences on the Anova help page.)

As I recall, the differences are for "type-III" tests, which in Anova() depend on the contrast coding.

Regards,
John
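The marginality point can be seen directly in a small hypothetical example with an interaction: by default drop1() only offers a:b for deletion, while scope = ~ . also drops each main effect with the interaction kept, giving the "type-III"-style tests mentioned above. Because such tests depend on the contrast coding, the sketch switches to sum-to-zero contrasts first; that choice is an assumption about the intended parameterization, not something prescribed in the thread.

```r
# Hypothetical balanced 2 x 4 design with 5 observations per cell
set.seed(2)
d <- data.frame(a = gl(2, 20), b = gl(4, 5, 40))
d$y <- as.integer(d$a) + 0.5 * as.integer(d$b) + rnorm(40)

# "Type-III"-style tests are only interpretable with sum-to-zero contrasts
old <- options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(y ~ a * b, data = d)

# Default scope respects marginality: only the highest-order term is tested
d1 <- drop1(fit, test = "F")
d1

# scope = ~ . also drops a and b while keeping a:b in the model
d3 <- drop1(fit, scope = ~ ., test = "F")
d3

options(old)  # restore the previous contrasts setting
```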
Re: [R] Problems with mars in R in the case of nonlinear functions
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and it underfits very badly.

Try the "earth" package, which extends the mars function in the mda package. Your example becomes

library(earth) # was mda
f <- function(x,y) { x^2-y^2 }
x <- seq(-1,1,length=10)
x <- outer(x*0,x,FUN="+")
y <- t(x)
X <- cbind(as.vector(x),as.vector(y))
z <- f(x,y)
fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit) # note better fit than before
# your original plotting code could be used too

For this kind of data, you could possibly use the minspan parameter. MARS by default does not allow every observation to be used as a knot in the generated basis functions. This strategy increases resistance to runs of correlated noise in the data. For non-noisy data, you can set minspan=1 to allow MARS to consider every observation as a potential knot. If your data were noisy, then minspan=1 could overfit the data. With earth, you can use trace=2 to see the calculated minspan value.

If you run the above example with the earth parameter trace=1, you will see that the stopping condition for the forward pass is:

Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)

To make the forward pass continue further, change the "delta RSq threshold" by using the thresh parameter:

fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model "looks" better when plotted, but note that using thresh here makes almost no change to the GRSq. That is, with the lower threshold the model is more complicated (has more terms) but does not have greater predictive power.

The threshold is just one of the reasons that the forward pass can terminate (reaching the maximum number of terms nk is another). AFAIK Friedman's code (that you ran from Matlab) does not use the threshold but instead just continues forward stepping until nk is reached. In this case the Matlab model is arguably more complicated than it needs to be. I believe the forward threshold for MARS was an innovation of Hastie and Tibshirani, but I could be wrong.

To reduce mailing list traffic, let's continue this discussion off-line, i.e. by direct mail to each other, and if necessary I will summarize the results of our discussions in the earth documentation.

Regards
Steve

| Message: 76
| Date: Thu, 12 Jun 2008 13:35:35 -0700
| From: Janne Huttunen <[EMAIL PROTECTED]>
| Subject: [R] Problems with mars in R in the case of nonlinear functions
| To:
| Message-ID: <[EMAIL PROTECTED]>
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| Hi,
|
| I'm trying to use mars function in R to interpolate nonlinear multivariate functions.
| However, it seems that mars gives me a fit which uses only very few basis function and
| it underfits very badly.
|
| For example, I have tried the following code to test mars:
|
| require("mda")
|
| f <- function(x,y) { x^2-y^2 };
| #f <- function(x,y) { x+2*y };
|
| # Grid
| x <- seq(-1,1,length=10);
| x <- outer(x*0,x,FUN="+"); y <- t(x);
| X <- cbind(as.vector(x),as.vector(y));
|
| # Data
| z <- f(x,y);
|
| fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);
|
| # Plotting
| par(mfrow=c(1,2),pty="s")
| lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
| persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
|   xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
| persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
|   col='lightblue',
|   xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
|   phi=25,theta=55,zlim=lims)
|
| (the code is also here if someone wants to try it:
| http://venda.uku.fi/~jmhuttun/R/marstest.R)
|
| The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The fitted model contains only
| 5 terms, which is not enough in this case. Adjusting parameters like nk, thresh, penalty and degree
| seems to have only a minor effect or no effect at all. It's also strange that when I increase
| the number of points in the grid, the results are even worse:
| see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid.
| However, mars seems to work well with linear functions (e.g. with the function which
| is commented out in the above code).
|
| Does anyone know what is wrong in this case? Am I missing something, or is there something
| wrong in my code?
|
| This seems not to be a problem with the MARS method in general. For example,
| Friedman's MARS implementation (run in Matlab) gives a rather good fit:
| see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
|
| Thank you
|
| Janne
|
| --
| Janne Huttunen
| University of California
| Department of Statistics
| 367 Evans Hall Berkeley, CA 94720-3860
| email: [EMAIL PROTECTED]
| phone: +1-510-502-5205
| office room: 449 Evans Hall
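A sketch of the two knobs discussed above, assuming the earth package is installed: minspan = 1 lets every observation be a knot candidate, and a lower thresh lets the forward pass run longer. Comparing rsq (R-squared on the training data) with grsq (the GCV-based generalized R-squared, an estimate of predictive performance) is one way to check whether the extra terms buy anything. As noted above, minspan = 1 is only appropriate for noise-free data like this grid; the object names fit_default and fit_dense are illustrative.

```r
library(earth)

# Same noise-free test surface as in the original post
f <- function(x, y) x^2 - y^2
g <- seq(-1, 1, length = 10)
x <- outer(g * 0, g, FUN = "+")
y <- t(x)
X <- cbind(as.vector(x), as.vector(y))
z <- as.vector(f(x, y))

fit_default <- earth(X, z)
fit_dense   <- earth(X, z, minspan = 1, thresh = 1e-6)

# Training R^2, GCV-based R^2 and number of selected terms for each model
rbind(default = c(rsq = fit_default$rsq, grsq = fit_default$grsq,
                  terms = length(fit_default$selected.terms)),
      dense   = c(rsq = fit_dense$rsq, grsq = fit_dense$grsq,
                  terms = length(fit_dense$selected.terms)))
```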
Re: [R] R and Brownian Motion / Langevin Equation package
Googling "R and Brownian Motion" turned up this link:
http://landshape.org/enm/r-code-for-brownian-motion/
Maybe this will help.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
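For a classroom example that needs no extra package, a minimal sketch in base R: standard Brownian motion as the cumulative sum of independent N(0, dt) increments, and the Langevin equation dv = -gamma v dt + sigma dW (an Ornstein-Uhlenbeck process) integrated with the Euler-Maruyama scheme. The function names brownian() and langevin() and the parameter values are illustrative choices. A quick check students can do: for long runs the variance of v should settle near sigma^2 / (2 gamma).

```r
# Standard Brownian motion on a grid: W(0) = 0, increments ~ N(0, dt)
brownian <- function(n, dt) c(0, cumsum(rnorm(n, mean = 0, sd = sqrt(dt))))

# Euler-Maruyama scheme for dv = -gamma * v dt + sigma dW
langevin <- function(n, dt, gamma, sigma, v0 = 0) {
  dW <- rnorm(n, mean = 0, sd = sqrt(dt))
  v <- numeric(n + 1)
  v[1] <- v0
  for (i in seq_len(n)) {
    v[i + 1] <- v[i] - gamma * v[i] * dt + sigma * dW[i]
  }
  v
}

set.seed(1)
dt <- 0.01
n  <- 1000
tt <- seq(0, by = dt, length.out = n + 1)
w  <- brownian(n, dt)
v  <- langevin(n, dt, gamma = 1, sigma = 1)

matplot(tt, cbind(w, v), type = "l", lty = 1, col = c("black", "red"),
        xlab = "t", ylab = "value")
legend("topleft", c("Brownian motion", "Langevin (OU) velocity"),
       lty = 1, col = c("black", "red"))
```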
Re: [R] piper diagram
I think RSEIS may have a Piper diagram.

On Thu, Jun 12, 2008 at 8:39 PM, Michael Grant <[EMAIL PROTECTED]> wrote:
> Sorry, no previous message text or addresses, but I just cleaned my mailbox and then found something relevant. Regarding the Piper diagram: I just noticed the 'hydrogeo' package on CRAN, courtesy of one Myles English. That should be what you need or close to it.
>
> Best regards,
>
> Michael Grant

--
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
[R] C# and R
Hello R-Users, I came across this link on CodeProject.com and was wondering if anyone has implemented this, and what the benefits of doing so are. It may also be of some help to others. Here is a link to the project: http://www.codeproject.com/KB/cs/RtoCSharp.aspx Regards, Neil Gupta
Re: [R] C# and R
This is about Windows, C# and R-(D)COM. The latter has its own list, which would be much more appropriate. See http://sunsite.univie.ac.at/rcom/ (Linked from CRAN->Software->Other.)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Problem with Freq function {prettyR}
Does someone have an idea? Thanks a lot! Udo

Quoting Udo <[EMAIL PROTECTED]>:
> Dear list,
> I have a problem with freq from prettyR.
>
> Please have a look at my syntax with a little example:
>
> library(prettyR)
>
> #Version 1
> test.df<-data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
> test.df
> freq(test.df) #No error message
>
> #Version 2
> test.df<-data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
> test.df
> freq(test.df)
>
> Error message: "Error in vector("integer", length) : Vector size can't be NA"
>
> Can someone tell me why an error message occurs in version two? I am helpless...
>
> Thanks in advance!
>
> Udo König
>
> Clinic for Child and Adolescent Psychiatry
> Philipps University of Marburg / Germany
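The cause of the freq() error inside prettyR is not established in this thread, so the following is only a workaround sketch, not a fix: base R's table() applied column by column gives per-variable frequency tables that do not depend on the column order. The names tabs and test.df follow the post; useNA = "ifany" adds an NA count only when missing values are present.

```r
set.seed(1)
# The second layout from the post: character column first
test.df <- data.frame(gender = sample(c("f", "m"), 8, TRUE),
                      q1 = sample(1:4, 8, TRUE))

# One frequency table per column
tabs <- lapply(test.df, function(col) table(col, useNA = "ifany"))
tabs

# Percentages per column, rounded to one decimal place
lapply(tabs, function(tb) round(100 * prop.table(tb), 1))
```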
[R] Access violation when calling Front41
Hello! When I tried to call Front41 from R, I ran into a problem. After I entered system('front41.exe'), an error occurred:

"jwe0019i-u The program was terminated abnormally with Exception Code EXCEPTION_ACCESS_VIOLATION.
error summary (Fortran)
error number error level error count
jwe0019i u 1
total error count = 1
FRONTIER - Version 4.1c ***"

How can I deal with it?

--
Siyi FENG
Department of Agricultural Economics
Texas A&M University, 2124 TAMU
College Station, TX 77843-2124
[EMAIL PROTECTED]
[R] subsetting data-frame by vector of characters
Hi, I have a very simple problem, but I can't think how to solve it without using a for loop and creating a large logical vector. Given the nature of the problem, I am sure there is a "1-liner" that could do the same thing much more efficiently. Basically, I have a data frame with characters in it, e.g.

> names.and.numbers
  Name   Fave.Number
1 John   7
2 Tony   12
3 Phil   14
4 Adam   22
5 Robert 23

Now, imagine I have a vector of names, i.e.:

> names = c("John,Phil,Robert")

All I want to do is get the subset of the data frame which corresponds to the names in the vector "names", i.e.

  Name   Fave.Number
1 John   7
2 Phil   14
3 Robert 23

Sorry, I know it's trivial, but I'm new to R and it's hard to start thinking in R. As I say, I've written a complicated for loop using intersect and creating a logical table, but this is very long-winded!!!

Regards,
Jim
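A minimal sketch of the usual one-liner, assuming the data frame looks like the one printed in the question: %in% returns one logical value per row (TRUE where Name is among the wanted names), and that vector indexes the rows directly. Note that c("John,Phil,Robert") as written in the question is a single string; each name has to be its own element. The name wanted is used instead of names to avoid masking the base function names().

```r
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE)

# One element per name, not one comma-separated string
wanted <- c("John", "Phil", "Robert")

# Logical row index: TRUE where Name is in wanted
picked <- names.and.numbers[names.and.numbers$Name %in% wanted, ]
picked

# Equivalent, using subset()
subset(names.and.numbers, Name %in% wanted)
```

The result keeps the original row names (1, 3, 5); use rownames(picked) <- NULL if consecutive numbering is wanted.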
[R] Maximum likelihood estimation in R with censored Data
Hello, I'm trying to calculate the maximum likelihood estimators for a dataset which contains censored data. I started by using the function "nlm", but isn't there a dedicated function for doing this for, e.g., the Weibull and the log-normal distributions? Thanks, Olivia
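One option, assuming right-censored data, is survreg() from the survival package (shipped with R as a recommended package), which fits parametric models including dist = "weibull" and dist = "lognormal" by maximum likelihood. survreg uses a log-linear parameterization, so the usual Weibull shape and scale are 1/fit$scale and exp(intercept). For comparison, the sketch also writes the censored log-likelihood out by hand (log-density for observed events, log-survivor function for censored times) and maximizes it with optim(), a swap for the nlm() mentioned in the question. The simulated data and object names are hypothetical.

```r
library(survival)

# Hypothetical right-censored Weibull data
set.seed(3)
n <- 300
t_event <- rweibull(n, shape = 1.5, scale = 10)
t_cens  <- runif(n, 0, 20)
tobs    <- pmin(t_event, t_cens)
status  <- as.numeric(t_event <= t_cens)  # 1 = event observed, 0 = censored

# Weibull fit: log(T) = mu + sigma * W, so shape = 1/sigma, scale = exp(mu)
fit_wb <- survreg(Surv(tobs, status) ~ 1, dist = "weibull")
shape_hat <- 1 / fit_wb$scale
scale_hat <- unname(exp(coef(fit_wb)[1]))
c(shape = shape_hat, scale = scale_hat)

# Log-normal fit: meanlog = intercept, sdlog = scale
fit_ln <- survreg(Surv(tobs, status) ~ 1, dist = "lognormal")
c(meanlog = unname(coef(fit_ln)[1]), sdlog = fit_ln$scale)

# The same Weibull MLE written out: events contribute log f(t),
# censored observations contribute log S(t)
negll <- function(par) {
  k <- exp(par[1]); lam <- exp(par[2])  # optimize on the log scale
  -sum(status * dweibull(tobs, k, lam, log = TRUE) +
       (1 - status) * pweibull(tobs, k, lam, lower.tail = FALSE, log.p = TRUE))
}
opt <- optim(c(0, log(mean(tobs))), negll)
exp(opt$par)  # compare with c(shape_hat, scale_hat)
```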
Re: [R] Package Installation produces "linux/limits.h: No such file or directory" error when installing the lpSolve package
Hi,
I too had this same problem, but it was resolved by installing two packages:
1. kernel-headers
2. kernel-devel
I hope this helps in your case.
Regards,
Sharwan

Joe_K wrote: > > Dear Friends, > > I am trying to install a few packages in R and am receiving error > messages. Since the error messages are different, I am posting them > separately. The second error is with the installation of lpSolve. > > The core error message is: > > In file included from /usr/include/bits/posix1_lim.h:153, > from /usr/include/limits.h:145, > from > /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122, > from > /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/syslimits.h:7, > from > /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:11, > from colamd.c:677: > /usr/include/bits/local_lim.h:36:26 > error: linux/limits.h: No such file or directory > make: *** [colamd.o] Error 1 > ERROR: compilation failed for package 'lpSolve' > > > The first thing that I tried was to figure out where linux/limits.h was. > I discovered that there are seven versions of limits.h on the system and > they are not identical. > > /usr/include/limits.h > /usr/src/linux-2.6.22.13-0.3/Documentation/i2c/chips/limits.h > /usr/include/c++/4.2.1/tr1/limits.h > /usr/lib64/qt4/demos/qtdemo/xml/limits.h > /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h > /usr/src/linux-2.6.22.13-0.3/include/asm-arm/limits.h > /usr/src/linux-2.6.22.13-0.3/include/asm-arm26/limits.h > > Only one has "linux" immediately preceding it in the path: > /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h > > I assume that /usr/include/bits/local_lim.h is trying to use a relative > path. The only line in local_lim.h with limits.h in it is: > > #include <linux/limits.h> > > So, I tried modifying the line to read: > > #include > > That did not work, so I changed it back again. I guess my theory about it > looking for a relative path was wrong.
> > Since then, I have been Googling the issue all weekend and have found > similar errors, but not exactly the same. Some are suggesting changing > kernel headers and other files. Since the context of these other posts > are dissimilar, I figured it best not to mess with kernel headers or some > of the other radical solutions offered. > > There was one suggestion in a post to install glibc-headers, however, I > cannot seem to find that for Suse 10.3. Is it something included in > another package? Is it something that is now obsolete? > > CAN ANYONE HELP ME DEBUG THIS? > > I am running R version 2.6.1 (2007-11-26) on Suse Linux 10.3 64-bit x86_64 > on a Boxx Technologies computer with a TYAN Thunder K8WE S2895 Motherboard > with 4Gb Ram and 2 dual CPUs (total of 4 CPUs). The CPUs are AMD Opteron. > Hard Disk Usage is 4 150 Gb SATA drives array with a Com3 9550SX > Controller set at RAID 5. > > The full error message received from Rkward upon the package installation > attempt was: > > R version 2.6.1 (2007-11-26) > Copyright (C) 2007 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > > Natural language support but running in an English locale > > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. 
> >> > options (repos=c (CRAN="http://lib.stat.cmu.edu/R/CRAN")) >> install.packages (pkgs=c ("lpSolve"), >> lib="/home/joe/R/x86_64-unknown-linux-gnu-library/2.6", >> destdir="/home/joe/.rkward/package_archive", dependencies=TRUE) > trying URL > 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/lpSolve_5.5.8.tar.gz' > Content type 'application/x-gzip' length 449804 bytes (439 Kb) > opened URL > > downloaded 439 Kb > /home/joe/R/x86_64-unknown-linux-gnu-library/2.6 > * Installing *source* package 'lpSolve' ... > ** libs > gcc -std=gnu99 -I/usr/lib64/R/include -I/usr/lib64/R/include -I . > -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL > -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL > -DINLINE=static -DParanoia -I/usr/local/include -fpic -g -O2 -c > colamd.c -o colamd.o > In file included from /usr/include/bits/posix1_lim.h:153, > from /usr/include/limits.h:145, > from > /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122, > from > /usr/lib64/gcc/x86_64-suse-linux/4
Re: [R] Problem with Freq function {prettyR}
Since this is a contributed package, you should be contacting the maintainer (as mentioned in the posting guide). Anyway, the problem occurs because in the second case you have a factor in the first column and a numeric vector in the second. This part of the code illustrates what I mean:

for (i in 1:nfreq) {
  if (display.na) nna <- sum(is.na(x[[i]])) else nna <- 0
  xt <- na.omit(x[[i]])
  if (is.null(levels)) levels <- unique(xt)
  if (is.numeric(x[[i]])) xt <- factor(xt, levels = levels)

So the first time through this loop the levels variable is set to c("m","f"). The second time through, levels is no longer NULL, so when the xt variable is created it is essentially this:

xt <- factor(xt, levels = c("m","f"))

and since xt contains only numbers, none of them match a level, so every element becomes <NA> (Levels: m f).

Best,

Jim

[EMAIL PROTECTED] wrote: Does someone have an idea? Thanks a lot! Udo Quoting Udo <[EMAIL PROTECTED]>: Dear list, I have a problem with freq from prettyR. Please have a look at my syntax with a little example:

library(prettyR)
#Version 1
test.df<-data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
test.df
freq(test.df) #No error message
#Version 2
test.df<-data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
test.df
freq(test.df)

Error message: "Error in vector("integer", length) : Vector size can't be NA"

Can someone tell me why an error message occurs in version two? I am helpless... Thanks in advance! Udo K ö n i g Clinic for Child and Adolescent Psychiatry Philipps University of Marburg / Germany __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
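To see the mechanism Jim describes without installing prettyR, here is a hypothetical reconstruction of just the quoted loop pattern, on made-up data. It is not prettyR's actual freq() code; the shared variable is renamed `lev` (it is called `levels` in the quoted excerpt) so it does not shadow the base function. The second half shows one possible fix, assuming the intent is simply to compute the levels separately for each column.

```r
# Made-up data: a character column first, then a numeric one (as in Version 2)
x <- data.frame(gender = c("f", "m", "f", "m"),
                q1     = c(1, 2, 2, 4),
                stringsAsFactors = FALSE)

# Buggy pattern: 'lev' is filled from the first column and then reused
lev <- NULL
buggy <- vector("list", length(x))
for (i in seq_along(x)) {
  xt <- na.omit(x[[i]])
  if (is.null(lev)) lev <- unique(xt)                # set only once
  if (is.numeric(x[[i]])) xt <- factor(xt, levels = lev)
  buggy[[i]] <- xt
}
print(buggy[[2]])   # numbers matched against "f"/"m": every element is <NA>

# Possible fix: compute the levels per column instead of sharing one variable
fixed <- lapply(x, function(col) {
  xt <- na.omit(col)
  factor(xt, levels = unique(xt))
})
print(fixed$q1)
```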
Re: [R] subsetting data-frame by vector of characters
On 6/13/2008 10:07 AM, james perkins wrote: Hi, I have a very simple problem but I can't think how to solve it without using a for loop and creating a large logical vector. However, given the nature of the problem I am sure there is a "1-liner" that could do the same thing much more efficiently. Basically I have a dataframe with characters in, e.g.

>names.and.numbers
(index) Name   Fave.Number
1       John    7
2       Tony   12
3       Phil   14
4       Adam   22
5       Robert 23

Now, imagine I have a vector of names, i.e.:

>names = c("John,Phil,Robert")

All I want to do is get the subset of the dataframe which corresponds to the names in the vector "Names", i.e.

(index) Name   Fave.Number
1       John    7
2       Phil   14
3       Robert 23

Sorry, I know it's trivial but I'm new to R and it's hard to start thinking in R. As I say, I've written a complicated for loop using intersect and creating a logical table, but this is very long-winded!!!

How about this:

subset(names.and.numbers, Name %in% mynames)

where mynames is the vector of names you want?

?subset
?is.element

Regards, Jim

-- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
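A small runnable sketch of Chuck's suggestion, using a toy copy of the data frame from the question (the values are typed in from the example; `mynames` is the separate-strings vector Chuck assumes). It prints the logical vector that `%in%` builds, one TRUE/FALSE per row, which is what subset() then uses to keep rows.

```r
# Toy copy of the data frame from the question
names.and.numbers <- data.frame(
  Name        = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
mynames <- c("John", "Phil", "Robert")   # three separate strings

# %in% gives one TRUE/FALSE per row: is this row's Name among mynames?
keep <- names.and.numbers$Name %in% mynames
print(keep)

# subset() keeps the rows where the condition is TRUE
res <- subset(names.and.numbers, Name %in% mynames)
print(res)
```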
[R] histogram
Hello everyone, I am trying to plot a histogram from the following code:

dat<-read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt")
x<-dat[2,23:46]
y=mean(x,trim=0,na.rm=T)
colMeans(dat[2,23:46])
boxplot(dat[2,23:46])
hist(dat[2,23:46])

The box plot is fine, but the histogram keeps giving me the error that x must be numeric. I am not sure what is wrong with the instructions for the histogram plot. Any help would be appreciated.

Paul
Re: [R] subsetting data-frame by vector of characters
james perkins wrote:
> Hi,
>
> I have a very simple problem but I can't think how to solve it without
> using a for loop and creating a large logical vector. However given
> the nature of the problem I am sure there is a "1-liner" that could do
> the same thing much more efficiently.
>
> bascially I have a dataframe with characters in, eg
>
> >names.and.numbers
>
> (index) Name   Fave.Number
> 1       John    7
> 2       Tony   12
> 3       Phil   14
> 4       Adam   22
> 5       Robert 23
>
> Now, imagine I have a vector of names, ie:
>
> >names = c("John,Phil,Robert")

this is a one-element vector: a single string in which the names are concatenated. or did you mean: names = c("John", "Phil", "Robert")

> All I want to do is get the subset of the dataframe which corresponds
> to the names in the vector "Names". IE
>
> (index) Name   Fave.Number
> 1       John    7
> 2       Phil   14
> 3       Robert 23

this should do:

names.and.numbers[names.and.numbers$Name %in% names, ]

if names is as you say above, do

names.and.numbers[names.and.numbers$Name %in% strsplit(names, ",")[[1]], ]

(strsplit returns a list, so take its first element; matching against the list itself finds nothing.)

you do create a logical vector here (what does 'large' mean?), but no loop is involved at the surface.

vQ
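A minimal sketch of the comma-separated case, on a toy copy of the data. It shows why the `[[1]]` matters: strsplit() returns a list, and `%in%` coerces a list element holding several strings into one deparsed string, so without unwrapping it no row matches.

```r
names.and.numbers <- data.frame(
  Name        = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- "John,Phil,Robert"        # one comma-separated string

parts  <- strsplit(names, ",")     # a list holding one character vector
wanted <- parts[[1]]               # the character vector itself

# Matching against 'parts' (the list) finds nothing ...
print(names.and.numbers$Name %in% parts)

# ... matching against the unwrapped vector selects the three rows
res <- names.and.numbers[names.and.numbers$Name %in% wanted, ]
print(res)
```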
Re: [R] histogram
It is hard to respond without reproducible examples. Do str(dat[2,23:46]) and see what it reports. My guess is that one of the columns is not numeric. Find out which one it is, fix it and then try 'hist' again. On Fri, Jun 13, 2008 at 10:21 AM, Paul Adams <[EMAIL PROTECTED]> wrote: > Hello everyone, > I am trying to plot a histogram from the following code: > dat<-read.table(file="C:\\Documents and Settings\\Owner\\My > Documents\\Yeast\\Yeast.txt",header=T,row.names=1) > file.show(file="C:\\Documents and Settings\\Owner\\My > Documents\\Yeast\\Yeast.txt") > x<-dat[2,23:46] > y=mean(x,trim=0,na.rm=T) > colMeans(dat[2,23:46]) > boxplot(dat[2,23:46]) > hist(dat[2,23:46]) > The box plot is fine but the histogram keeps giving me the error that x > must be numeric.I am not sure what is wrong here with the instructions > for the histogram plot. > Any help would be appreciated > Paul -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
Re: [R] histogram
Paul Adams wrote: Hello everyone, I am trying to plot a histogram from the following code: dat<-read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1) file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt") x<-dat[2,23:46] y=mean(x,trim=0,na.rm=T) colMeans(dat[2,23:46]) boxplot(dat[2,23:46]) hist(dat[2,23:46])

Check the class of your object:

class(dat[2, 23:46])

It may be a data.frame. If so, you can try to convert it accordingly (see ?as.numeric).

Erik
Re: [R] subsetting data-frame by vector of characters
Thanks a lot for that. It's mainly the %in% I needed to work out.

"Large" didn't mean anything in particular, just that it gets quite long with the real data. I did mean: names = c("John", "Phil", "Robert")

The only problem with the method you suggest is that I lose the indexing, i.e. in the example, instead of:

(index) Name   Fave.Number
1       John    7
2       Phil   14
3       Robert 23

I end up with

(index) Name   Fave.Number
1       John    7
3       Phil   14
5       Robert 23

This isn't a problem at the moment, but I guess it could be if I used the table later in loops. Is there an easy way to re-index the table?

Kind regards

Jim

Wacek Kusnierczyk wrote: james perkins wrote: Hi, I have a very simple problem but I can't think how to solve it without using a for loop and creating a large logical vector. However given the nature of the problem I am sure there is a "1-liner" that could do the same thing much more efficiently. bascially I have a dataframe with characters in, eg names.and.numbers (index)NameFave.Number 1John7 2Tony12 3Phil14 4Adam22 5Robert23 Now, imagine I have a vector of names, ie: names = c("John,Phil,Robert") this is a one-element vector of string(s) that are concatenated names (strings with names). or you mean: names = c("John", "Phil", "Robert") All I want to do is get the subset of the dataframe which corresponds to the names in the vector "Names". IE (index)NameFave.Number 1John7 2Phil14 3Robert23 this should do: names.and.numbers[names.and.numbers$Name %in% names,] if names is as you say above, do names.and.numbers[names.and.numbers$Name %in% strsplit(names,","), ] you do create a logical vector here (what does 'large' mean?), but no loop is involved at the surface. vQ
Re: [R] histogram
Hi, please someone correct me, but

On 13/06/2008, 07:21, [EMAIL PROTECTED] wrote:
> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My
> Documents\\Yeast\\Yeast.txt",header=T,row.names=1)

Check mode and class of dat. read.table provided you with a dataframe of, essentially, string data. You have to apply as.numeric where it fits.

> x<-dat[2,23:46]
^ most probably here.

Regards
Lars

p.s. Your code is awful to read; please add some spaces where appropriate.

-- Lars Fischer tel: +49 (0)6151 16-2889 Technische Universität Darmstadt Fachbereich Informatik/ FG Sicherheit in der Informationstechnik PGP FPR: A197 CBE1 91FC 0CE3 A71D 77F2 1094 CB6E CEE3 7111
Re: [R] histogram
jim holtman wrote: > It is hard to respond without reproducible examples. Do > str(dat[2,23:46]) and see what it reports. My guess is that one of > the columns is not numeric. Find out which one it is, fix it and then > try 'hist' again. >

No, this will be wrong whatever the data are. The problem is that dat[2,23:46] is a one-row dataframe, i.e. a list, which is not a numeric vector. Possibly

hist(unlist(dat[2,23:46]))

is what is wanted. I don't think the boxplot is "fine" either, except in the sense that it does not give an error (try boxplot(airquality[2,])).

> On Fri, Jun 13, 2008 at 10:21 AM, Paul Adams <[EMAIL PROTECTED]> wrote: > >> Hello everyone, >> I am trying to plot a histogram from the following code: >> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My >> Documents\\Yeast\\Yeast.txt",header=T,row.names=1) >> file.show(file="C:\\Documents and Settings\\Owner\\My >> Documents\\Yeast\\Yeast.txt") >> x<-dat[2,23:46] >> y=mean(x,trim=0,na.rm=T) >> colMeans(dat[2,23:46]) >> boxplot(dat[2,23:46]) >> hist(dat[2,23:46]) >> The box plot is fine but the histogram keeps giving me the error that x >> must be numeric.I am not sure what is wrong here with the instructions >> for the histogram plot. >> Any help would be appreciated >> Paul

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
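A small sketch of Peter's point, using a hypothetical two-row stand-in for `dat` (the real data use columns 23:46). Selecting one row still gives a data.frame, which hist() rejects; unlist() flattens that row into a plain numeric vector it accepts. `plot = FALSE` is used only so the example runs without a graphics device.

```r
# Hypothetical stand-in for 'dat': 2 rows, 4 numeric columns
dat <- data.frame(a = c(1.2, 3.4), b = c(2.2, 5.1),
                  c = c(4.4, 0.9), d = c(3.3, 2.8))

row2 <- dat[2, ]                  # still a one-row data.frame, i.e. a list
print(class(row2))

# hist() refuses the data frame ...
err <- tryCatch(hist(row2, plot = FALSE), error = conditionMessage)
print(err)

# ... but accepts the row once it is flattened to a numeric vector
v <- unlist(row2)
h <- hist(v, plot = FALSE)
print(h$counts)
```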
Re: [R] subsetting data-frame by vector of characters
james perkins wrote: > Thanks a lot for that. Its the %in% I needed to work out mainly > > large didn't mean anything in particular, just that it gets quite long > with the real data. > I did mean: names = c("John", "Phil", "Robert") > > The only problem is that using the method you suggest is that I lose > the indexing, ie in the example, instead of: > > (index)NameFave.Number > 1John7 > 2Phil14 > 3Robert23 > > > I end up with > > > (index) Name Fave.Number > 1 John 7 > 3 Phil 14 > 5 Robert 23 > > This isnt a problem at the moment but I guess it could be if I used > the table later in loops. Is there an easy way to re-index the table? >

Notice that these are names, not numbers: result[2,1] is "Phil" in both cases. If it bothers you, just set

rownames(result) <- NULL

(BTW, are your names unique? In that case you could set them as rownames and use them for indexing.)

rownames(names.and.numbers) <- names.and.numbers$Name
names.and.numbers[names, ]

> Kind regards > > Jim > > Wacek Kusnierczyk wrote: >> james perkins wrote: >> >>> Hi, >>> >>> I have a very simple problem but I can't think how to solve it without >>> using a for loop and creating a large logical vector. However given >>> the nature of the problem I am sure there is a "1-liner" that could do >>> the same thing much more efficiently. >>> >>> bascially I have a dataframe with characters in, eg >>> >>> names.and.numbers >>> (index)NameFave.Number >>> 1John7 >>> 2Tony12 >>> 3Phil14 >>> 4Adam22 >>> 5Robert23 >>> >>> >>> Now, imagine I have a vector of names, ie: >>> >>> names = c("John,Phil,Robert") >> >> this is a one-element vector of string(s) that are concatenated names >> (strings with names). >> or you mean: names = c("John", "Phil", "Robert") >> >> >> >>> All I want to do is get the subset of the dataframe which corresponds >>> to the names in the vector "Names". IE >>> >>> (index)NameFave.Number >>> 1John7 >>> 2Phil14 >>> 3Robert23 >>> >> >> this should do: >> names.and.numbers[names.and.numbers$Name %in% names,] >> >> if names is as you say above, do >> names.and.numbers[names.and.numbers$Name %in% strsplit(names,","), ] >> >> you do create a logical vector here (what does 'large' mean?), but no >> loop is involved at the surface. >> >> vQ

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
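A runnable sketch of both of Peter's suggestions on a toy copy of the data. After subsetting, the row labels are the original ones ("1", "3", "5"), but positional indexing is unaffected; setting the row names to NULL relabels them 1, 2, 3. The second part assumes the names are unique, as Peter's aside requires.

```r
names.and.numbers <- data.frame(
  Name        = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John", "Phil", "Robert")

result <- names.and.numbers[names.and.numbers$Name %in% names, ]
print(rownames(result))      # labels carried over from the original rows
print(result[2, "Name"])     # positional indexing still gives "Phil"

rownames(result) <- NULL     # relabel the rows 1, 2, 3
print(result)

# With unique names, use them as row names and index by name directly
by.name <- names.and.numbers
rownames(by.name) <- by.name$Name
print(by.name[names, ])
```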
[R] package under unix
Hi the list,

I have written a package for clustering longitudinal data using a non-parametric algorithm. I developed the package under Windows. To be as user-friendly as possible, the package uses some graphical procedures to "show" the user how the cluster construction evolves, and to export the graphs in a friendly way. Here are some examples: http://christophe.genolini.free.fr/kml

Everything works fine... under Windows. Unfortunately, it seems it does not work under Linux. I first used the instruction:

windows(5,5,xpos=0)

which seems to be incompatible. Then I used:

if(getOption("device")=="windows"){windows(5,5,xpos=0)}else{}

but it is not portable either. I do not know Linux, so it will be very hard for me to test and change my code. On the other hand, I spent a lot of time developing a graphical interface for exporting the results in an easy way, so it would be a pity to remove the code that deals with graphics. Can someone help?

Christophe
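One possible way around the portability problem, sketched here as an assumption rather than something proposed in the thread: branch on `.Platform$OS.type` and use `dev.new()` (available in current R) everywhere except Windows. `open_plot_window` is a hypothetical helper name. windows() is only ever called on Windows, since that function exists only in the Windows build of R.

```r
# Hypothetical helper: open a 5 x 5 inch plotting device on any platform
open_plot_window <- function(width = 5, height = 5) {
  if (.Platform$OS.type == "windows") {
    # windows() exists only in the Windows build; xpos is Windows-specific
    windows(width = width, height = height, xpos = 0)
  } else {
    # dev.new() opens whatever default device this session is configured for
    # (X11 or quartz interactively, a pdf file in a batch session)
    grDevices::dev.new(width = width, height = height)
  }
  invisible(grDevices::dev.cur())   # number of the device just opened
}
```

A package could call `open_plot_window()` wherever it previously called `windows(5,5,xpos=0)`; the window position is simply not set on other platforms.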
Re: [R] subsetting data-frame by vector of characters
james perkins wrote: > Thanks a lot for that. Its the %in% I needed to work out mainly > > large didn't mean anything in particular, just that it gets quite long > with the real data. > I did mean: names = c("John", "Phil", "Robert") > > The only problem is that using the method you suggest is that I lose > the indexing, ie in the example, instead of: > > (index)NameFave.Number > 1John7 > 2Phil14 > 3Robert23 > > > I end up with > > > (index) Name Fave.Number > 1 John 7 > 3 Phil 14 > 5 Robert 23 > > This isnt a problem at the moment but I guess it could be if I used > the table later in loops. Is there an easy way to re-index the table?

strange. i ran this simulated example, and it's ok:

d = data.frame(a=letters[rep(1:5,2)], b=letters[10:1])
d[d$a %in% letters[1:3], ]

you can always add an index column:

d = data.frame(index=seq_len(nrow(d)), d)

vQ
[R] Wanted: your examples of logged axes with custom tick marks
Dear all, I'm trying to improve the default layout of tick marks for log scaled axes in ggplot2. To this end, it would be really useful to see what people actually do in practice. If you've ever made a log-log (or semi-log) plot and customised the location of the ticks, I'd really appreciate a copy of your graph (if it's publicly available) or a statement of the range of the data, and the tick marks you used. I'm not aware of any published research on this topic, but if I've missed something, a pointer to relevant work would be greatly appreciated. Thanks! Hadley -- http://had.co.nz/
[R] CRAN package XML (omegahat)
Hi, I'm having issues using this package to parse large XML files. Where should bugs be reported? The omegahat website has several broken links. Regards David Keegan.
[R] Rest of a division
Dear useRs,

How do I get the remainder (the rest) of a division? For instance, in C it is:

4%2 gives 0

Best regards,
-- Eric B Ferreira Exact Sciences Department Federal University of Lavras Brasil
Re: [R] Rest of a division
Eric Ferreira wrote: > Dear useRs, > > How do I ask for the rest of a division? > > For instantce, in C is like: > > 4%2 = 0 > > Best regards,

> 4%%2
[1] 0

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
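A few more cases of the `%%` operator Peter shows, together with its companion `%/%` (integer division). One point worth knowing when coming from C: in R the result of `%%` takes the sign of the divisor, so `-7 %% 3` is 2, whereas C's `-7 % 3` gives -1. `%%` also works on non-integers.

```r
4 %% 2      # remainder: 0
7 %% 3      # 1
7 %/% 3     # integer division: 2
-7 %% 3     # 2, because the result has the sign of the divisor
5.5 %% 2    # 1.5: %% also works on non-integers

# The two operators fit together: x == (x %/% y) * y + x %% y
x <- c(7, -7, 5.5); y <- 3
stopifnot(all(x == (x %/% y) * y + x %% y))
```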
Re: [R] Rest of a division
?"%%" On Jun 13, 2008, at 11:23 AM, Eric Ferreira wrote: Dear useRs, How do I ask for the rest of a division? For instantce, in C is like: 4%2 = 0 Best regards, -- Eric B Ferreira Exact Sciences Department Federal University of Lavras Brasil Haris Skiadas Department of Mathematics and Computer Science Hanover College
Re: [R] CRAN package XML (omegahat)
Bugs go to the package maintainer, for this and all packages:

> packageDescription('XML')[['Maintainer']]
[1] "Duncan Temple Lang <[EMAIL PROTECTED]>"

You will have the best luck with the usual: sessionInfo(), an easily reproducible and compact example, use of current software versions, etc.

Martin

David Keegan wrote: Hi, I'm having issues using this package to parse large XML files. Where should bugs be reported? The omegahat website has several broken links. Regards David Keegan.

-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
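A small sketch of the lookup Martin describes. It queries the base package "stats" so that it runs on any R installation; substitute "XML" when that package is installed. sessionInfo() produces the environment details that are worth pasting into the bug report.

```r
# Who maintains an installed package? ("stats" is used so this runs anywhere)
desc <- utils::packageDescription("stats")
print(desc$Maintainer)

# Details to include in a bug report
print(utils::sessionInfo())
```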
Re: [R] Sweave: looping over mixed R/LaTeX code
Stephan Kolassa wrote on 06/13/2008 03:22 AM: Dear guRus, I would like to loop over a medium amount of Sweave code, including both R and LaTeX chunks. Is there any way to do so? As an illustration, can I create a .tex file like this using a loop within a .Rnw file, where the "1,2,3" comes from some iteration variable in R?

\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}

Another alternative would be to use the brew package from CRAN: http://cran.r-project.org/web/packages/brew/index.html While the disadvantage would be a change of syntax from Sweave to brew, you would gain the advantage of looping over code chunks. brew also installs a collection of example files, one being a conversion of the Sweave test file to brew. Scope out the 'Examples' section from the brew help page.

Best,
Jeff

Right now, I do have a working but painful solution. I put the loop contents in a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
   Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
   ...
   Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw
This does what I need; however, it is a major pain code-wise, e.g., there appears to be no way to control the loop during execution (n must be known in advance), and I need to control all graphics using \includegraphics with the iteration counter paste()d into the filename. An alternative may be not using Sweave and working with one giant sink() and lots of print()s, letting R just write the entire .tex file. This also appears inelegant to me. Is there a better way to do this? I have tried to do my homework, see below. Do I get partial credit ;-) ? Thank you all for your time! Stephan

# I can't simply start a for loop within an R chunk and finish it in another one. whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it would simply reuse the R chunks if it did work, without changing things between loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...} with the entire loop contents and then simply write \loopcontent \loopcontent ... or \input or \include the loop content from an external file. Of course it would be possible to not use Sweave and just use the output from the R console, but there are a couple of figures I would really like to see close to the relevant portions of the calculations. I also thought about putting the entire loop in *one* R chunk, but then I see no way to include LaTeX chunks *within* this R chunk. I can't just sink() to the .tex file in the middle of the R chunk (as the sink() gets appended to the .tex file only after Sweave is done with it). I have read the Sweave manual and FAQs and the R/R Windows FAQ, I did both RSiteSearches and RSeek searches for all combinations of "Sweave" and "loop", "for", "while" I could think of. For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22)
i386-pc-mingw32
locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets tcltk methods base
other attached packages:
[1] svIDE_0.9-5
loaded via a namespace (and not attached):
[1] svMisc_0.9-5

-- http://biostat.mc.vanderbilt.edu/JeffreyHorner
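A different route from brew, not suggested in the thread and sketched here under assumptions: Sweave's `results=tex` chunk option copies whatever an R chunk prints straight into the .tex file, so a single chunk can loop and emit both the LaTeX text and a per-iteration figure. `emit_iteration` and the `loop-fig-*.pdf` file names are hypothetical, the plot is a placeholder for the real analysis, and `\includegraphics` requires the graphicx package in the LaTeX document.

```r
# Hypothetical helper: write one iteration's LaTeX (heading + figure) to stdout.
# In an .Rnw file, the loop at the bottom would sit in a chunk opened with
#   <<results=tex, echo=FALSE>>=
# so that everything cat()-ed lands directly in the generated .tex file.
emit_iteration <- function(i, figdir = ".") {
  fig <- file.path(figdir, sprintf("loop-fig-%d.pdf", i))
  pdf(fig, width = 4, height = 4)                 # one figure file per iteration
  plot(rnorm(20), main = paste("Iteration", i))   # placeholder analysis
  dev.off()
  cat(sprintf("\\section*{Iteration %d}\n", i))
  cat(sprintf("\\includegraphics{%s}\n\n", fig))
  invisible(fig)
}

n <- 3   # can be computed at run time, e.g. from the data
for (i in seq_len(n)) emit_iteration(i)
```

Because the loop runs inside one chunk, n does not need to be known in advance and no intermediate loop_i.tex files are required.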
Re: [R] Rest of a division
?Arithmetic Eric Ferreira wrote: Dear useRs, How do I ask for the rest of a division? For instantce, in C is like: 4%2 = 0 Best regards,
[R] help with colsplit (reshape)
Dear list,

I'm trying to figure out how to use the reshape package to reshape data from a "wide" format to a "long" format. I have data like this:

pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2, out.1, out.2)

and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8), idvar="pid", v.names=c("PredA", "PredB", "Out"), timevar="measure.num", times=c(1,2), direction="long"))
    pid predA measure.num PredA PredB Out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package" as a guide, I tried the following:

M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names = c("treatment", "time")))

but this gave a warning and resulted in

head(M.Data2)
  pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html which led me to try

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names = c("treatment", "time")))

which gave:

head(M.Data2)
  pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer, but no cigar. I would be grateful if someone could tell me (a) how to reshape the data as described above using the reshape package, (b) what the difference between split = "." and split = "\\." is, and (c) whether more information about the colsplit command is available anywhere.

Thank you very much in advance,
Ista
Re: [R] Looping, Control Flow & Conditional Statements
See ?rle

Start with this:

a1.runs <- rle( a1 )
a1.runs$lengths[ a1.runs$values>0 ]
[1] 3 4

HTH,

Chuck

p.s.
library(fortunes)
fortune(106)

If the answer is parse() you should usually rethink the question.
   -- Thomas Lumley
      R-help (February 2005)

-- see ?get

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:

Dear R Group:

I have little experience using R and even less experience with control flow type questions. See the following code:

a1 = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
for(i in 1:1){
sx <- paste("a",i,sep="")
s <- eval(parse(text = paste("a",i,sep="")))
{g = numeric(length(s))
k = numeric(length(s))
{for (i in 1:length(s))
{for (j in 1:length(s))
ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
}}
h1 <- hist(g,freq=TRUE)
h <- h1$counts[4]
cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
}}

The output is:

g
[1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
k
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h
[1] 7

and a text file, which has:

a1 : 7

k is a by-product of the ifelse statement and is of no interest, and g and h only go part-way to answering my question, which is: every time an object, i.e. a1 (which is actually a time series),

0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0

has a value over 0, how long do the values stay above 0? In this case a1 has two groups or events where the value is above zero: the first event lasts for 3 'days' and the second event lasts for 4 'days'. My code tells me that there was a total of 7 'days' in an event, i.e. above 0, but what I need to know is that there were two 'events' and the 1st lasted 3 'days' and the 2nd lasted 4 'days'. Essentially I want a text file output to say:

a1.1 : 3
a1.2 : 4

My thinking is that I need to somehow get the code working through each vector one value at a time, and when a value is found to meet the criterion of > 0, R creates a new vector. To use the above example, it would come to the first value > 0 and then create the new vector a1.1 = (1,1,1); then, as the next value in the series is 0, it would close this new vector 'a1.1'. It would then continue until it reaches the next value > 0 and then create the vector a1.2 = (1,1,1,1); then again, as the next value in the series is 0, it would close this new vector, and so on. Then all I need to do is perform a count of '1's in these new vectors to find how many days they met this criterion of being greater than 0.

I hope the above makes sense and I really hope there is someone willing and able to help. I don't know how to proceed.

Thanks,
Garth

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED] UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
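Building on Chuck's rle() hint, here is a minimal sketch of the whole task: find the runs of values above 0 in each series and write one "name.k : length" line per run. It assumes the series are kept in a named list (`series`, with a made-up `a2` added purely for illustration) rather than being looked up by name with parse(), which is the point of fortune(106).

```r
# Keep the series in a named list instead of constructing names with parse()
series <- list(
  a1 = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0),
  a2 = c(2, 3, 0, 5)                      # hypothetical second series
)

event_report <- function(x, name) {
  runs <- rle(x > 0)                       # runs of TRUE (above 0) and FALSE
  len  <- runs$lengths[runs$values]        # keep only the "above 0" runs
  if (length(len) == 0) return(character(0))
  paste0(name, ".", seq_along(len), " : ", len)
}

report <- unlist(Map(event_report, series, names(series)), use.names = FALSE)
writeLines(report)
# writeLines(report, "C:/temp/test-beta.txt")  # write the text file instead
```

Because rle() works on the logical vector `x > 0`, any positive values (not just 1s) count towards an event.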
Re: [R] alternative to matching/merge?
Jim,

My code is this:

mergefunc <- function(x,seqFile){
  # merge(seqFile,x)
  cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)), ])
}
LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)

Each matrix/data.frame takes 0.2 seconds, and doing this 1240 times takes ~4 minutes.

Thanks,
Lana

-Original Message-
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2008 6:40 PM
To: Lana Schaffer
Cc: r-help@r-project.org
Subject: Re: [R] alternative to matching/merge?

It would be nice if you at least included the code that you are using and a subset of the data. Have you run Rprof to determine which of the functions is consuming the time?

On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer <[EMAIL PROTECTED]> wrote:
>
> Greetings,
> I am doing matching/merge for a table (40919x3) to data which is in
> the form of a list of 1268 data.frames. Using lapply this is taking
> ~5 minutes. I know that the match/merge functions are time consuming,
> so is there an alternative way to accomplish this goal? Is lapply
> not efficient?
>
> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> [EMAIL PROTECTED]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] Problem with Freq function {prettyR}
Thanks a lot, Jim!

> Since this is a contributed package, you should be contacting the
> maintainer (as mentioned in the posting guide).

sorry

>
> Anyway, the problem occurs because in the second case you have a factor
> in the first column and numeric in the second. This part of the code
> will illustrate what I mean:
>
> for (i in 1:nfreq) {
>     if (display.na)
>         nna <- sum(is.na(x[[i]]))
>     else nna <- 0
>     xt <- na.omit(x[[i]])
>     if (is.null(levels))
>         levels <- unique(xt)
>     if (is.numeric(x[[i]]))
>         xt <- factor(xt, levels = levels)
>
> So the first time through this loop the levels variable is set to
> c("m","f"). On the second time levels is no longer NULL, so when the xt
> variable is created it is essentially this:
>
> xt <- factor(xt, levels = c("m","f"))
>
> and since xt contains only numbers you get
>
> [1]
> Levels: m f
>
> Best,
>
> Jim
>
> [EMAIL PROTECTED] wrote:
> > Does someone have an idea?
> > Thanks a lot!
> >
> > Udo
> >
> > Quoting Udo <[EMAIL PROTECTED]>:
> >
> >> Dear list,
> >> I have a problem with freq from prettyR.
> >>
> >> Please have a look at my syntax with a little example:
> >>
> >> library(prettyR)
> >>
> >> #Version 1
> >> test.df<-data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
> >> test.df
> >> freq(test.df) #No error message
> >>
> >> #Version 2
> >> test.df<-data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
> >> test.df
> >> freq(test.df)
> >>
> >> Error message: "Error in vector("integer", length) : Vector size can't be NA"
> >>
> >> Can someone tell me why an error message occurs in version two? I am
> >> helpless...
> >>
> >> Thanks in advance!
> >>
> >> Udo König
> >>
> >> Clinic for Child and Adolescent Psychiatry
> >> Philipps University of Marburg / Germany
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
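To see the mechanism Jim describes without prettyR, here is a stripped-down imitation of the quoted loop. `count_levels()` is a hypothetical helper, not prettyR's actual code. With `reset = FALSE` it reuses the levels found in the first column, so a numeric column that comes second is coded against "f"/"m" and every value becomes NA. Recomputing the levels for each column (`reset = TRUE`) avoids that. The same logic explains why Version 1, with the numeric column first, runs cleanly.

```r
# Hypothetical imitation of the loop quoted above (not prettyR's actual code)
count_levels <- function(x, reset = FALSE) {
  lev <- NULL
  out <- list()
  for (i in seq_along(x)) {
    if (reset) lev <- NULL                 # fix: find levels afresh for each column
    xt <- na.omit(x[[i]])
    if (is.null(lev)) lev <- unique(xt)    # otherwise only set on the first pass
    if (is.numeric(x[[i]])) xt <- factor(xt, levels = lev)
    out[[names(x)[i]]] <- table(xt)
  }
  out
}

test.df <- data.frame(gender = c("f", "m", "m", "f", "f", "m", "f", "m"),
                      q1     = c(1, 2, 3, 4, 1, 2, 3, 4))
count_levels(test.df)$q1                 # coded against f/m: all counts are 0
count_levels(test.df, reset = TRUE)$q1   # counts for 1, 2, 3, 4
```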
Re: [R] alternative to matching/merge?
What is the structure of 'd.frame' and 'seqFile'? Run Rprof so that we can see which of the functions it is spending its time in. What happens if x$index is not in seqFile$index? Are the values in the 'index' unique in both structures? Subsetting a data frame can be expensive when compared to using a matrix. Could you use a matrix instead of a data frame; are all the columns the same mode? Again, either a subset of the data would be helpful or an 'str' on the data objects being used so that we can understand what they are.

On Fri, Jun 13, 2008 at 12:03 PM, Lana Schaffer <[EMAIL PROTECTED]> wrote:
> Jim,
> My code is this:
> mergefunc <- function(x,seqFile){
>   # merge(seqFile,x)
>   cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)), ])
> }
> LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
> Each matrix/data.frame takes 0.2 seconds and then to do this
> 1240 times takes ~4 minutes.
> Thanks,
> Lana

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] alternative to matching/merge?
Jim,

d.frame[[i]] is a list of data.frames and seqFile is a data.frame. I have converted them to vectors/matrices and the timing is the same as with data.frames. 'index' is unique in both structures. The list is subset into data.frame/matrix structures.

Lana

-Original Message-
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 9:45 AM
To: Lana Schaffer
Cc: r-help@r-project.org
Subject: Re: [R] alternative to matching/merge?

What is the structure of 'd.frame' and 'seqFile'? Run Rprof so that we can see which of the functions it is spending its time in. What happens if x$index is not in seqFile$index? Are the values in the 'index' unique in both structures? Subsetting a data frame can be expensive when compared to using a matrix. Could you use a matrix instead of a data frame; are all the columns the same mode? Again, either a subset of the data would be helpful or an 'str' on the data objects being used so that we can understand what they are.
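Since Lana reports that `index` is unique in both structures, one alternative worth timing is a single match() over all rows instead of one per data frame: stack the list once, look every index up in `seqFile` in one call, then split the result back into a list. The sketch below uses small made-up stand-ins for `d.frame[[1]]` and `seqFile` and assumes every data frame in the list has the same columns. Whether it is actually faster on the real data (do.call(rbind, ...) has its own cost) is something Rprof or system.time() would need to confirm.

```r
set.seed(1)
# Made-up stand-ins: a lookup table and a list of small data frames
seqFile <- data.frame(index = 1:1000, a = rnorm(1000))
dfs <- lapply(1:50, function(i) data.frame(index = sample(1000, 20), val = runif(20)))

# Current approach: one match() per data frame
slow <- lapply(dfs, function(x)
  cbind(x, seqFile[match(x$index, seqFile$index), -1, drop = FALSE]))

# Alternative: stack once, match once, split back into a list
all_rows <- do.call(rbind, dfs)
grp <- rep(seq_along(dfs), vapply(dfs, nrow, integer(1)))  # which frame each row came from
hit <- match(all_rows$index, seqFile$index)                # NA where an index is missing
merged <- cbind(all_rows, seqFile[hit, -1, drop = FALSE])
fast <- split(merged, grp)
```

Both versions drop the duplicate `index` column from `seqFile` (the `-1`); Lana's original keeps it, which is easy to restore if needed.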
[R] Level Plot and Scale of Colorkey
I am drawing level plots but I would like to specify the range of the colorkey. I am not having any success figuring this out, so any help would be greatly appreciated! Here is an example of what I am trying to do:

library(lattice)
disp <- 1
x <- seq(1, 10, by=1)
y <- seq(1, 10, by=1)
g <- expand.grid(x = x, y = y)
g$z <- 1/exp((abs(g$x-5)+abs(g$y-5))*disp)
g$z <- g$z/sum(g$z)
levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate", colorkey=TRUE, col.regions=(col=gray((0:32)/32)))

I would like to enforce the number of divisions on the colorkey scale and its range, for example from 0 to 0.1 in increments of 0.02 (just as an example). I apologize if this is an obvious question, but I have read the documentation and scoured the archives and cannot figure it out.
[R] cluster.stats
Dear list,

I just tried to use the function cluster.stats in the package fpc. I have a couple of questions about the syntax:

cluster.stats(d, clustering, alt.clustering=NULL, silhouette=TRUE, G2=FALSE, G3=FALSE)

1) Is the distance object (d) an object obtained by applying the function dist() to my own original matrix?
2) Is clustering the vector of cluster memberships resulting from one of the many clustering methods?

Thank you very much in advance, and sorry for such a basic question, but I did not manage to get it clear in my mind.

Laura
Re: [R] Level Plot and Scale of Colorkey
Try

colscaledivs=100 #colscaledivs=15 here is the R default
levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate", colorkey=TRUE, at=seq(from=-0.01, to=0.25, length=colscaledivs), col.regions=(col=gray((0:colscaledivs)/colscaledivs)))

Toby Marthews

On Fri 13 June 2008 18:50, emma hartnett wrote:
> I am drawing level plots but I would like to specify the range of the
> colorkey, I am not having any success figuring this out so any help would
> be greatly appreciated!
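For the specific request (a fixed range with fixed divisions), a minimal sketch using lattice's `at` argument, with the same breakpoints passed to the colorkey so its tick labels line up with the colour boundaries. In this example z peaks at roughly 0.22, so a key running only from 0 to 0.1 would not cover the largest values; the 0 to 0.25 range in steps of 0.05 below is an illustrative choice, not a recommendation.

```r
library(lattice)

disp <- 1
g <- expand.grid(x = 1:10, y = 1:10)
g$z <- 1 / exp((abs(g$x - 5) + abs(g$y - 5)) * disp)
g$z <- g$z / sum(g$z)

brks <- seq(0, 0.25, by = 0.05)          # fixed key range and divisions
p <- levelplot(z ~ x * y, g,
               xlab = "x co-ordinate", ylab = "y co-ordinate",
               at = brks,                                   # colour boundaries
               col.regions = gray((0:32) / 32),
               colorkey = list(at = brks,                   # key uses the same cuts
                               labels = list(at = brks)))   # ticks at each cut
print(p)
```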
[R] restricted coefficient and factor for linear regression.
Hi,

My data set is data.frame(id, yr, y, l, e, k). I would like to estimate Lee and Schmidt's (1993, OUP) model in R. My colleague wrote SAS code as follows:

** procedures for creating dummy variables are omitted **
** di# and dt# are dummy variables for industry and time **
data a2;
  merge a1 a2 a;
  by id yr;
proc sysnlin maxit=100 outest=beta2;
  endogenous y;
  exogenous l e k di1-di12 dt2-dt10;
  parms a0 0.94 al -0.14 ae 1.8 ak -0.9
        b1 0 b2 0 b3 0 b4 0 b5 0 b6 0 b7 0 b8 0 b9 0 b10 0 b11 0 b12 0
        c2 0 c3 0 c4 0 c5 0 c6 0 c7 0 c8 0 c9 0 c10 0;
  y=a0+al*l+ae*e+ak*k
    +(b1*di1+b2*di2+b3*di3+b4*di4+b5*di5+b6*di6
      +b7*di7+b8*di8+b9*di9+b10*di10+b11*di11+b12*di12)*
     (1*dt1+c2*dt2+c3*dt3+c4*dt4+c5*dt5+c6*dt6+c7*dt7
      +c8*dt8+c9*dt9+c10*dt10);
  title '* lee/schmidt parameter estimates *';

My R code is as follows:

library(plm)
dt <- read.table("dt.dta", sep = "\t", header = T)
dt$id <- factor(dt$id)
dt$yr <- factor(dt$yr)
fit.model <- I(log(y)) ~ I(log(l)) + I(log(e)) + yr * id
re.fit.gls <- pggls(fit.model, data = dt)

I've got the following error message:

Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent

I would like to figure out three things.
1. How can I restrict a coefficient in the model? As you can see in the SAS code, the coefficient of dt1 is restricted to 1.
2. If it is possible to restrict coefficients, is it possible to restrict coefficients of factors? If so, how?

Thanks in advance.

Best,
Dong-hyun Oh
Center of Excellence for Science and Innovation Studies
Royal Institute of Technology, Sweden
e-mail: [EMAIL PROTECTED]
cel: +46 73 563 45 22
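One way to mirror the SAS specification in R is nls(), where a coefficient fixed at 1 is simply written into the formula rather than estimated (in a purely linear model, offset() plays the same role). The sketch below is an illustration on simulated data under several assumptions: 4 industries and 5 years instead of 12 and 10, made-up true values, and a time-effect vector `c(1, cc)` whose first element is fixed at 1, like `1*dt1`. Indexing the parameter vectors `b` and `cc` by factors stands in for the di# and dt# dummies. Starting values matter: if all industry effects start equal, or all time effects start at 1, the gradient is singular, so unequal starting values are used.

```r
set.seed(42)
n   <- 400
ind <- factor(sample(1:4, n, replace = TRUE))   # industry (simulated)
yr  <- factor(sample(1:5, n, replace = TRUE))   # year (simulated)
l <- rnorm(n); e <- rnorm(n); k <- rnorm(n)

b_true <- c(0.5, 1.0, 1.5, 2.0)        # industry effects (made up)
c_true <- c(1, 1.2, 0.8, 1.5, 0.6)     # time effects, first one fixed at 1
y <- 0.9 - 0.1 * l + 1.8 * e - 0.9 * k +
  b_true[ind] * c_true[yr] + rnorm(n, sd = 0.1)
dat <- data.frame(y, l, e, k, ind, yr)

# Industry effect times time effect; the first time effect is restricted to 1
fit <- nls(y ~ a0 + al * l + ae * e + ak * k + b[ind] * c(1, cc)[yr],
           data    = dat,
           start   = list(a0 = 0, al = 0, ae = 0, ak = 0,
                          b  = c(1, 1.5, 2, 2.5),
                          cc = c(1.1, 0.9, 1.1, 0.9)),
           control = nls.control(maxiter = 200))
round(coef(fit), 2)
```

Indexing a vector by a factor uses the factor's integer codes, so `b[ind]` picks the effect of each observation's industry. The estimates are named `b1`, ..., `b4` and `cc1`, ..., `cc4`, where `cc1` is the effect of the second year.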
Re: [R] Maximum likelihood estimation in R with censored Data
Bluder Olivia k-ai.at> writes:

>
> Hello,
>
> I'm trying to calculate the Maximum likelihood estimators for a dataset
> which contains censored data.
>
> I started by using the function "nlm", but isn't there a separate method
> for doing this for e.g. the "weibull" and the "log-normal" distribution?
>
> Thanks,
>
> Olivia

This is not *quite* enough detail about what you want to do. Can you (as the posting guide suggests!) give us a small example of what you want to do? You may be able to do this via the survreg() command in the survival package, or you may want to do it yourself by constructing a log-likelihood function with dweibull() for uncensored data and pweibull() for censored data [or dlnorm/plnorm].

Ben Bolker
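Here is a minimal sketch of the second route Ben mentions, for right-censored Weibull data. Observed times contribute the density (dweibull) and censored times the survival probability (pweibull with lower.tail = FALSE); the negative log-likelihood is then minimised with optim(). The data are simulated, and censoring at a single fixed time is an assumption made only to keep the example short. For log-normal data, dlnorm/plnorm would take the place of dweibull/pweibull.

```r
set.seed(123)
n <- 2000
t_true    <- rweibull(n, shape = 1.5, scale = 10)   # simulated lifetimes
cens_time <- 12                                     # assumed fixed censoring time
obs_time  <- pmin(t_true, cens_time)
event     <- as.integer(t_true <= cens_time)        # 1 = observed, 0 = right-censored

negloglik <- function(par) {
  shape <- exp(par[1]); scale <- exp(par[2])        # log scale keeps both positive
  -(sum(dweibull(obs_time[event == 1], shape, scale, log = TRUE)) +
    sum(pweibull(obs_time[event == 0], shape, scale,
                 lower.tail = FALSE, log.p = TRUE)))
}

fit <- optim(c(0, log(mean(obs_time))), negloglik)
est <- exp(fit$par)
names(est) <- c("shape", "scale")
est
```

The same model can be cross-checked with survreg(Surv(obs_time, event) ~ 1, dist = "weibull") from the survival package, keeping in mind that survreg reports the Weibull on its own (log-scale) parameterisation.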
[R] nls() vs lm() estimates
Hi,

I'm trying to understand why the coefficients "a" and "b" for the model

W = a*L^b

estimated via nls() differ from those obtained for the log-transformed model

log(W) = log(a) + b*log(L)

estimated via lm(). Also, if I didn't make a mistake, R-squared suggests a "better" fit for the model using coefficients estimated by lm(). Perhaps I'm doing something wrong in nls()? I hope the code below explains this better. Thanks in advance for any hints.

Héctor

L <- c(8,8.1,8.5,9,9.4,9.4,9.5,9.5,9.5,9.6,9.8,10,10,10,10,10,10,10,10,10,10,
10.2,10.3,10.4,10.4,10.4,10.4,10.5,10.5,10.5,10.5,10.5,10.5,10.5,10.5,
10.7,10.7,10.8,10.9,10.9,10.9,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,
11.1,11.1,11.2,11.2,11.2,11.3,11.3,11.3,11.3,11.3,11.4,11.4,11.4,11.4,
11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.6,11.6,11.6,11.6,11.6,11.6,11.6,11.6,
11.7,11.7,11.7,11.7,11.7,11.8,11.8,11.8,11.8,11.8,11.9,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,
12.1,12.2,12.2,12.2,12.3,12.3,12.3,12.3,12.3,12.3,12.3,12.3,12.3,
12.4,12.4,12.4,12.4,12.4,12.4,12.5,12.5,12.5,12.5,12.5,12.5,12.5,12.5,
12.6,12.6,12.7,12.7,12.8,12.8,12.8,12.9,
13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13.2,13.2,13.3,13.5,13.5,13.5,13.5,13.5,13.5,14)
W <- c(11,13,13.45,21.66,19.5,19.73,19.74,19.42,21.48,20.47,23.02,22.7,20.19,23.3,27.05,19.81,
20.01,26,24,25,20,25,26.29,31.26,23.08,29.85,24.27,27.49,25,26.03,24,26,28.21,24.62,21.69,
24.68,23.6,25.42,26.7,30.25,30.06,33.62,32,30,32.46,30,30,28.8,30.2,31.44,32.84,33.04,35,
28,29,33,34,28,28.51,35.67,33.72,33,28.53,34.85,34.5,37.44,37.74,31.36,30.12,36.03,33.4,
33.51,34,33,33.79,34.93,35,34.13,35.65,34,32.77,41.71,31.26,32.4,28.81,35.63,34.96,36.74,
32.38,38.14,34.12,40.26,40.27,36.96,38.35,42.36,40.33,31.59,34.44,38,42.63,40,36.28,37,
34.4,34,33.64,39.05,40.46,35.45,38.72,35,33,35,33,40,35,37,36,32,43,35,40,33.54,40.06,
43.38,40.3,44.81,43,46.32,37.45,37.71,45.9,36.1,44.78,43.12,45.5,41.62,38,37,43.08,43.82,
47.25,43,41.59,43.58,41,44,48,43,45.46,43.5,43.38,47.54,45,46.92,44.75,49.02,43.37,43.44,
48,43,46,42,48,45,48,43,45,46,43,40,42,40,43,43,50,44,50.65,42.11,50,51.44,53.1,52,56.2,
45,49,55)

## Using nls() to find "a" and "b" for model: W = a*L^b
WL.nls <- nls((W ~ a * L^b), start = list(a = 0.02, b = 1), trace = TRUE, algorithm = "default", model = TRUE)
summary(WL.nls)

## Scatterplot with fitted model
plot(L, W)
lines(L, predict(WL.nls), col = "blue", lwd = 2)

## Finding "log(a)" and "b" for log transformed model: log(W) = log(a) + b*log(L)
logWL.lm <- lm(log10(W) ~ log10(L))
summary(logWL.lm)

## Adding model to plot
lines(L, 10^coef(logWL.lm)[1]*L^coef(logWL.lm)[2], col = "red", lwd = 2)

## R-squared for W = a*L^b
Rsq.nls <- sum((predict(WL.nls) - mean(W))^2) / sum((W - mean(W))^2)

## R-squared for W = a*L^b with coefs from log(W) = log(a) + b*log(L)
pred <- 10^coef(logWL.lm)[1]*L^coef(logWL.lm)[2]
Rsq.lm <- sum((pred - mean(W))^2) / sum((W - mean(W))^2)
text(c(9, 13), c(50, 20), paste("R-squared:", formatC(c(Rsq.nls, Rsq.lm), digits = 4)), col = c("blue", "red"))
Re: [R] cluster.stats
Dear Laura,

> Dear list,
> I just tried to use the function cluster.stats in the package fpc. I
> have a couple of questions about the syntax:
> cluster.stats(d,clustering,alt.clustering=NULL,
> silhouette=TRUE,G2=FALSE,G3=FALSE)
> 1) the distance object (d) is an object obtained by the function dist()
> on my own original matrix?

d is allowed to be an object of class dist or a dissimilarity matrix. The answer to your question depends on what your "original matrix" is. If it is something on which you can compute a distance by dist(), you're right, at least if dist() delivers the distance you are interested in.

> 2) clustering is the clusters vector as result of one of the many
> clustering methods?

The help page tells you what clustering can be. So it could be the clustering/partition vector of a clustering method or it could be something else. Note that cluster.stats doesn't depend on any particular clustering method. It computes the statistics regardless of where the clustering vector comes from.

Best regards,
Christian

> Thank you very much in advance and sorry for such basic question, but I
> did not manage to clarify my mind.
> Laura

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
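To make Christian's answer concrete, a minimal sketch: `d` comes from dist() on a data matrix with observations in rows, and `clustering` is an integer vector of cluster numbers 1, ..., k, here taken from cutree() on an hclust() tree, although labels from any other method would do. The iris data and average linkage are arbitrary choices, and the fpc call is wrapped so the snippet still runs where fpc is not installed.

```r
x  <- as.matrix(iris[, 1:4])                        # observations in rows
d  <- dist(x)                                       # Euclidean distances, class "dist"
cl <- cutree(hclust(d, method = "average"), k = 3)  # integer cluster labels 1..3

if (requireNamespace("fpc", quietly = TRUE)) {
  st <- fpc::cluster.stats(d, cl)
  print(st$avg.silwidth)                            # average silhouette width
}
```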
Re: [R] help with colsplit (reshape)
> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
> = c("treatment", "time")))
>
> which gave:
>
> head(M.Data2)
>   pid variable value treatment  time
> 1   1    predA    -1     predA predA
> 2   2    predA    -2     predA predA
> 3   3    predA    -1     predA predA
> 4   4    predA    -2     predA predA
> 5   5    predA    -1     predA predA
> 6   6    predA    -2     predA predA
>
> Closer but no cigar.

Have a look at the whole thing - it's getting it right most of the time. Going back to the original variable names, I see that "PredA" does not have a time associated with it. What do you expect the time to be?

> I would be grateful if someone will tell me (a) how to reshape the data as
> described above using the reshape package, (b) what difference between split
> = "." and split = "\\." is,

The splitting argument is a regular expression, and in regular expression speak "." means to match any one character. "\\." escapes the full stop, so it only matches full stops.

> and (c) if more information about the colsplit
> command is available anywhere.

Probably the best way is just to look at the code (it's pretty simple):

> colsplit.character
function (x, split = "", names)
{
    vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
    names(vars) <- names
    as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own function following those lines.

Hadley

--
http://had.co.nz/
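A quick illustration of Hadley's point using strsplit(), which colsplit.character calls. With split = "." every character counts as a separator, so only empty strings are left, while "\\." (or fixed = TRUE) splits only at the full stop. The last lines show why "predA" turns up in both new columns: it contains no full stop, so strsplit() returns a single piece and rbind() recycles it to fill the second column.

```r
strsplit("predB.1", ".")                 # "." matches any character: only "" left
strsplit("predB.1", "\\.")               # escaped: split at full stops only
strsplit("predB.1", ".", fixed = TRUE)   # same result, no regular expression

# A name without a full stop yields one piece, which rbind() recycles
pieces <- do.call(rbind, strsplit(c("predA", "predB.1"), "\\."))
pieces
```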
Re: [R] nls() vs lm() estimates
Héctor Villalobos wrote:
> Hi, I'm trying to understand why the coefficients "a" and "b" for the
> model: W = a*L^b estimated via nls() differs from those obtained for the
> log transformed model: log(W) = log(a) + b*log(L) estimated via lm().
> Also, if I didn't make a mistake, R-squared suggests a "better" adjustment
> for the model using coefficients estimated by lm(). Perhaps I'm doing
> something wrong in nls()?

I didn't try your code, but in general these estimates are different: for the former estimate you minimize the norm of the difference W-a*L^b (W are ) and for the latter you minimize the norm of the difference log(W)-(log(a)+b*log(L)). The solution for these problems are equal. Which approach you should choose depends on the errors; for an additive error model the former is the better choice.

--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall
Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall
Re: [R] Maximum likelihood estimation in R with censored Data
On Fri. 13 June at 13:55, Ben Bolker wrote:

> This is not *quite* enough detail about what you want to do. Can you
> (as the posting guide suggests!) give us a small example of what you
> want to do? You may be able to do this via the survreg() command in the
> survival package, or you may want to do it yourself by constructing a
> log-likelihood function with dweibull() for uncensored data and
> pweibull() for censored data [or dlnorm/plnorm].

If you want to go the second route, function coverage() in package actuar will build the censored density function for you. You can then feed this function to fitdistr() just like for "usual" ML estimation.

HTH

Vincent

> Ben Bolker
Re: [R] nls() vs lm() estimates
Janne Huttunen wrote:
> I didn't try your code, but in general these estimates are different:
> for the former estimate you minimize the norm of the difference W-a*L^b
> (W are ) and for the latter you minimize the norm of the difference
> log(W)-(log(a)+b*log(L)). The solution for these problems are equal.
> Which approach you should choose depends on the errors; for an additive
> error model the former is the better choice.

I should read what I have written before sending my message. I meant that the solutions of these problems are NOT equal (in general) and therefore the estimates differ.

--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall
Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall
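A small sketch of Janne's point on made-up data (not Héctor's): nls(W ~ a * L^b) minimises the residual sum of squares of W, lm(log(W) ~ log(L)) minimises it for log(W), and fitting the log-scale model with nls() reproduces lm(). On Héctor's R-squared: for a nonlinear or back-transformed fit the regression sum of squares plus the residual sum of squares no longer adds up to the total, so "explained SS / total SS" can favour the wrong model. Comparing residual sums of squares on the original scale is more direct, and by construction the nls fit cannot lose that comparison.

```r
set.seed(1)
L <- runif(100, 8, 14)
W <- 0.05 * L^2.6 * exp(rnorm(100, sd = 0.1))   # made-up data, multiplicative error

fit.lm      <- lm(log(W) ~ log(L))                                  # least squares on log scale
fit.nls     <- nls(W ~ a * L^b, start = list(a = 0.05, b = 2.5))    # least squares on raw scale
fit.nls.log <- nls(log(W) ~ log(a) + b * log(L), start = list(a = 0.05, b = 2.5))

# Same criterion, same answer: nls on the log scale reproduces lm
rbind(lm      = c(a = exp(coef(fit.lm)[[1]]), b = coef(fit.lm)[[2]]),
      nls.log = coef(fit.nls.log),
      nls.raw = coef(fit.nls))

# Residual sum of squares on the original W scale
rss <- function(pred) sum((W - pred)^2)
rss.raw <- rss(fitted(fit.nls))
rss.log <- rss(exp(coef(fit.lm)[[1]]) * L^coef(fit.lm)[[2]])
c(nls.raw = rss.raw, lm.backtransformed = rss.log)
```

Which fit to prefer still depends on the error structure, as Janne says: additive errors favour the raw-scale nls fit, multiplicative errors (as simulated here) favour the log-scale fit.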
[R] Quartile regression question
I have data that look like

lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
...
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099

I can regress log weight on log length simply enough, but how would I model the third quartile of log weights? In other words, rather than finding a 2nd quartile (or 50th percentile) regression line, e.g.,

mod=lm(logweight~loglength)

can R find a 75th percentile line? Further, since my data contain more than one lake, is there a way to run 3rd quartile regressions on each lake? I would imagine that regressing each population would require some call of the subset function, but I cannot figure out how to call it.

Thanks in advance,
SR

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460
phone: (406) 994-6643
fax: (406) 994-7479
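The usual tool for this is quantile regression, for example rq(logweight ~ loglength, tau = 0.75) in the quantreg package (note that lm() fits the conditional mean, not the 50th percentile). To show what such a fit minimises without relying on an add-on package, the sketch below minimises the "check" loss for tau = 0.75 with optim() and fits one line per lake via split() instead of subset(). The data are simulated stand-ins for the lake/loglength/logweight columns, and a general-purpose optimiser on a non-smooth loss is only a rough substitute for rq()'s dedicated algorithm.

```r
# Check (pinball) loss: residuals above the line weigh tau, below weigh 1 - tau
check_loss <- function(beta, x, y, tau) {
  r <- y - (beta[1] + beta[2] * x)
  sum(r * (tau - (r < 0)))
}

qreg_line <- function(x, y, tau = 0.75) {
  ols <- coef(lm(y ~ x))
  # start from the OLS line shifted up to the tau-quantile of its residuals
  start <- c(ols[1] + quantile(y - ols[1] - ols[2] * x, tau), ols[2])
  optim(unname(start), check_loss, x = x, y = y, tau = tau)$par
}

set.seed(7)                                   # simulated stand-in data
fish <- data.frame(lake = rep(1:3, each = 200))
fish$loglength <- runif(600, 2.3, 2.9)
fish$logweight <- -5 + 3 * fish$loglength + rnorm(600, sd = 0.1)

# One 75th-percentile line (intercept, slope) per lake
by_lake <- lapply(split(fish, fish$lake),
                  function(d) qreg_line(d$loglength, d$logweight, tau = 0.75))
do.call(rbind, by_lake)
```

With quantreg installed, the per-lake version would be the same lapply() over split(fish, fish$lake) with rq(logweight ~ loglength, tau = 0.75, data = d) inside.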
Re: [R] help with colsplit (reshape)
Thanks Hadley, with your help I'm getting things figured out. On Jun 13, 2008, at 2:09 PM, hadley wickham wrote: M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names = c("treatment", "time"))) which gave: head(M.Data2) pid variable value treatment time 1 1predA-1 predA predA 2 2predA-2 predA predA 3 3predA-1 predA predA 4 4predA-2 predA predA 5 5predA-1 predA predA 6 6predA-2 predA predA Closer but no cigar. Have a look at the whole thing - it's getting it right most of the time. Going back to the original variable names, I see that "PredA" does not have a time associated with it. What do you expect the time to be? Right, there is no time associated with this variable. So I tried again, treating it as an id:

M.Data <- melt(Data, id = c("pid", "predA"))

From here I was able to achieve the desired result, as follows:

M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names=c("measure", "time")))
M.Data$variable <- M.Data$measure
M.Data <- M.Data[-5]
L.Data <- cast(M.Data, ... ~ variable)

This is perhaps a bit inelegant but it works! I'm interested in knowing if there is a better way to do it, but I'm happy that I've at least figured out this much. As always I'm humbled by the generosity of people who not only make their software available but also take the time to answer questions on this list. Thank you! -Ista

I would be grateful if someone could tell me (a) how to reshape the data as described above using the reshape package, (b) what the difference between split = "." and split = "\\." is, The splitting argument is a regular expression, and in regular expression speak "." means to match any one character. "\\." escapes the full stop, so it only matches full stops. and (c) if more information about the colsplit command is available anywhere.
Probably the best way is just to look at the code (it's pretty simple):

colsplit.character
function (x, split = "", names) {
  vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
  names(vars) <- names
  as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own function following those lines. Hadley -- http://had.co.nz/
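A small base-R sketch of the regular-expression point, plus a hypothetical helper split_to_df() that follows the same steps as the colsplit.character code quoted above. The input strings are made up for illustration, and as.is = TRUE is passed so that type.convert() keeps text columns as character.

```r
x <- c("predA.1", "predB.2")

# split = "." is a regular expression: "." matches any character, so every
# character is consumed as a delimiter and only empty strings remain
strsplit(x[1], ".")[[1]]

# Escaping the dot (or using fixed = TRUE) splits only at literal full stops
strsplit(x, "\\.")
strsplit(x, ".", fixed = TRUE)

# Hypothetical colsplit-style helper built the same way as colsplit.character
split_to_df <- function(x, split, names) {
  pieces <- do.call(rbind, strsplit(x, split))          # one row per input string
  vars <- as.data.frame(pieces, stringsAsFactors = FALSE)
  names(vars) <- names
  # convert each column to the most natural type (e.g. "1" -> 1L)
  as.data.frame(lapply(vars, type.convert, as.is = TRUE),
                stringsAsFactors = FALSE)
}
split_to_df(x, "\\.", c("measure", "time"))
```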
Re: [R] Quartile regression question
Hello, Look at package quantreg. Philippe Grosjean
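A sketch of Philippe's suggestion using rq() from quantreg, assuming the package is installed. Only the handful of rows posted in the question are used, so the fitted numbers themselves mean little. tau = 0.75 requests the third-quartile line, and split() plus lapply() fits one model per lake.

```r
library(quantreg)  # assumes the quantreg package is installed

# The sample rows posted in the question (lakes 1 and 102 only)
d <- read.csv(text = "lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099")

# 75th-percentile (third-quartile) regression, all lakes pooled
fit_all <- rq(logweight ~ loglength, tau = 0.75, data = d)
coef(fit_all)

# One third-quartile fit per lake: split the data frame by 'lake'
fits <- lapply(split(d, d$lake), function(s)
  coef(rq(logweight ~ loglength, tau = 0.75, data = s)))
fits
```

With so few points per lake, rq() may warn that the solution is nonunique. If only one lake is wanted, rq() also accepts a subset argument, for example subset = lake == 102.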
Re: [R] Quartile regression question
Thanks for your help. Worked great. SR
Re: [R] help with colsplit (reshape)
You're welcome. And don't worry too much about data cleaning routines being elegant - it's very very hard to write elegant code to clean up something that's not at all elegant. Hadley -- http://had.co.nz/
Re: [R] alternative to matching/merge?
On Fri, Jun 13, 2008 at 11:45 AM, jim holtman wrote:
> What is the structure of 'd.frame' and 'seqFile'? Run Rprof so that
> we can see which of the functions it is spending its time in. What
> happens if x$index is not in seqFile$index? Are the values in the
> 'index' unique in both structures? Subsetting a data frame can be
> expensive when compared to using a matrix. Could you use a matrix
> instead of a data frame; are all the columns the same mode? Again
> either a subset of data would be helpful or an 'str' on the data
> objects being used so that we can understand what they are.

A few other ideas to try:

* try merging do.call("rbind", d.frame) and seqFile, and then splitting the results back up
* try giving seqFile rownames (rownames(seqFile) <- seqFile$index) and then use character matching: cbind(x, seqFile[as.character(x$index), ])
* if there is a one-to-one correspondence between index in seqFile and all data.frames in d.frame, merge all of the d.frames together, order both by index, then just cbind

Hadley -- http://had.co.nz/
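A sketch of the row-name matching idea with small made-up stand-ins for x and seqFile (the structure of the poster's real objects is unknown). match() is shown as a base-R alternative that performs the same lookup without relying on row names.

```r
# Hypothetical stand-ins for the poster's objects
seqFile <- data.frame(index = c(10, 20, 30), info = c("a", "b", "c"),
                      stringsAsFactors = FALSE)
x <- data.frame(index = c(30, 10, 10), value = 1:3)

# Row-name lookup: index values become row names, rows are then pulled by name
rownames(seqFile) <- seqFile$index
by_name <- cbind(x, info = seqFile[as.character(x$index), "info"])

# Same lookup with match(), no row names needed
x$info <- seqFile$info[match(x$index, seqFile$index)]
```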
[R] stretching text vertically
I'd like to stretch a plotted character vertically, to create a "sequence logo". Is there a parameter to allow stretching text() output vertically or squeezing it horizontally? I know about Oliver Bembom's seqLogo library, but this generates a sequence logo plot using a separate bitmap device. I want to recreate the sequence logo *inside* an existing plot. Alternatively, is there a way to embed one plot inside another? I could use imagemagick outside R to 'montage' separate bitmaps, but then the sequence logo is going to be very difficult to align (base for base) with the plot I'm trying to join it to. Thanks for any tips, Alex
[R] Importing data with different delimiters
All, I have a data file with 56 entries that looks like this:

City State JanTemp Lat Long
Mobile, AL 44 31.2 88.5
Montgomery, AL 38 32.9 86.8
Phoenix, AZ 35 33.6 112.5
Little Rock, AR 31 35.4 92.8
Los Angeles, CA 47 34.3 118.7
San Francisco, CA 42 38.4 123.0

I would like to "read" this data into a dataframe. Is it possible to do without editing the datafile? D.
[R] Help with stat.table in Epi package,
R Fans-- I am having problems with the following code. It worked under R 2.6.0 but not in 2.7.0.

> library(Epi)
> df <- read.table("c:/Documents and Settings/Troy S/My Documents/debug_chisq_080613b.txt")
> summary(df)
      cvd             agecat 
 Min.   :0.0000   (0,40] :1  
 1st Qu.:0.0000   (40,60]:2  
 Median :0.0000              
 Mean   :0.3333              
 3rd Qu.:0.5000              
 Max.   :1.0000              
> fa <- as.factor(df$cvd)
> fb <- as.factor(df$agecat)
> stat.table(index=list("a"=fa, "b"=fb))
Error in eval(expr, envir, enclos) : could not find function "count"

The file contents are

"cvd" "agecat"
"1" 0 "(0,40]"
"2" 1 "(40,60]"
"3" 0 "(40,60]"

My sessionInfo is

R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats4 splines stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] Epi_1.0.8 coin_0.6-9 modeltools_0.2-15 mvtnorm_0.9-0
[5] survival_2.34-1

loaded via a namespace (and not attached):
[1] tools_2.7.0

Any help would be great! Troy
Re: [R] Importing data with different delimiters
Assuming that the only problem is the blank in the city names, here is one way of doing it:

> inFile <- textConnection("City State JanTemp Lat Long
+ Mobile, AL 44 31.2 88.5
+ Montgomery, AL 38 32.9 86.8
+ Phoenix, AZ 35 33.6 112.5
+ Little Rock, AR 31 35.4 92.8
+ Los Angeles, CA 47 34.3 118.7
+ San Francisco, CA 42 38.4 123.0")
> lines <- readLines(inFile)
> # get rid of blanks in city names
> newLines <- sub("(.*?) +(.*),", "\\1_\\2,", lines)
>
> x <- read.table(textConnection(newLines), header=TRUE)
> closeAllConnections()
> x
            City State JanTemp  Lat  Long
1        Mobile,    AL      44 31.2  88.5
2    Montgomery,    AL      38 32.9  86.8
3       Phoenix,    AZ      35 33.6 112.5
4   Little_Rock,    AR      31 35.4  92.8
5   Los_Angeles,    CA      47 34.3 118.7
6 San_Francisco,    CA      42 38.4 123.0

If you want, you can then go back and replace the "_" with a blank in the city name.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
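A self-contained variant of Jim's approach, using read.table(text = ...) on a few of the posted rows and then restoring the city names. perl = TRUE is an addition of this sketch so that the lazy (.*?) is handled by the PCRE engine. For the real file, readLines() on the file path would take the place of the txt vector.

```r
# A few of the posted rows
txt <- c("City State JanTemp Lat Long",
         "Mobile, AL 44 31.2 88.5",
         "Little Rock, AR 31 35.4 92.8",
         "San Francisco, CA 42 38.4 123.0")

# Join multi-word city names with "_" so read.table sees a single field;
# lines without a blank before the comma (e.g. "Mobile,") are left unchanged
fixed <- sub("(.*?) +(.*),", "\\1_\\2,", txt, perl = TRUE)

x <- read.table(text = fixed, header = TRUE, stringsAsFactors = FALSE)

# Put the blanks back and drop the trailing comma from the city names
x$City <- sub(",$", "", gsub("_", " ", x$City))
x
```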
[R] rbind() problem
Hi, I would like to rbind 2 data frames. They both have some common column names, but each also has some unique column names. Is there any simple function that rbinds these 2 data frames, filling in NAs for the columns with unique names? thanks
[R] Correcting the display of colnames and rownames
Dear all, I have a data frame of dimension 720 columns by 360 rows, to which I am trying to add numerical row and column labels, using the 'sequence' command. The original data, which I read in using 'read.table', had no such labels at all. I've got as far as successfully using the sequence command and getting the labels to display. However, I'm finding that for the negative numbers in particular, the values aren't displaying correctly. For the value '-179.75', for example, it displays as 'X.179.75'. Even for positive numbers, the 'X' prefix appears at the start of the label (but without the '.'). I have made numerous attempts at addressing this. I'm currently as far as adopting the following approach; I'll show what I've done for just the column headings - I've adopted the same approach for row headings, with the same results/problem so far.

columnnames <- seq(from = -179.75, to = 179.75, length = 720)
as.numeric <- colnames(Jan)
colnames(Jan) <- make.names(columnnames)

N.B. 'Jan' (as in January) refers to the data frame in question. So my thinking here is to assign the values to be used as column labels to 'columnnames', and use 'make.names' to assign these values to the column names of the data frame. I've also tried changing 'colnames(Jan)' to be a numeric class, as I was previously having problems assigning the values to the labels - I think because by default 'colnames' is of class 'character vector'? If anyone is able to suggest how I can solve the problem of the values not being displayed as I'd hoped (namely, removing the 'X' and displaying '-' for negative numbers), then I'd be very grateful. Many thanks, Steve
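A sketch of where the 'X' comes from and one way around it, using a made-up all-zero 360 x 720 data frame in place of Jan. make.names() deliberately produces syntactically valid R names (prepending "X" and replacing "-" with "."), so assigning the numbers directly avoids the mangling.

```r
lon <- seq(from = -179.75, to = 179.75, length.out = 720)

# make.names() is the source of the mangling: "-179.75" becomes "X.179.75"
make.names(lon[1:2])

# Hypothetical stand-in for the 360 x 720 'Jan' data frame
Jan <- as.data.frame(matrix(0, nrow = 360, ncol = 720))

# Assigning the numbers directly stores them as character strings, unchanged
colnames(Jan) <- lon
rownames(Jan) <- seq(from = -89.75, to = 89.75, length.out = 360)
colnames(Jan)[1:2]
```

Columns named this way have to be accessed with backticks (Jan$`-179.75`), and data.frame() and read.table() will mangle such names again unless called with check.names = FALSE.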
Re: [R] Using lm with a matrix?
Many thanks, works great!

Charilaos Skiadas-3 wrote:
> Try this:
>
> lapply( 1:2, function(i) lm( y~x, data=list(x=xdat[,i], y=ydat[,i]) ) )
>
> Haris Skiadas
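A sketch of Haris's one-liner with made-up matrices xdat and ydat, where column i of each holds one x/y pair. Using seq_len(ncol(xdat)) instead of 1:2 lets it adapt to any number of columns, and sapply(fits, coef) gathers the intercepts and slopes into one matrix.

```r
set.seed(42)
xdat <- matrix(rnorm(20), ncol = 2)                       # hypothetical predictors
ydat <- 2 * xdat + matrix(rnorm(20, sd = 0.1), ncol = 2)  # true slope 2 in each column

# One lm() per column pair; a plain list works as the 'data' argument
fits <- lapply(seq_len(ncol(xdat)), function(i)
  lm(y ~ x, data = list(x = xdat[, i], y = ydat[, i])))

# 2 x 2 matrix: intercepts in row 1, slopes in row 2
co <- sapply(fits, coef)
co
```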
[R] Weights and coxph
I am confused by the results of the weights option for coxph. I replicated each row of the example data from the coxph help page three times, in the data frame test_freq. I had expected that the coefficients, significance tests, and tests of non-proportionality would yield the same results for the replicated and non-replicated data, but the output below shows differences in all three metrics. Is this the result of a curved response variable? This is likely more of a conceptual question than a language question, but all help is sincerely appreciated. Mike

> test1
$time
[1] 4 3 1 1 2 2 3

$status
[1]  1 NA  1  0  1  1  0

$x
[1] 0 2 1 1 1 0 0

$sex
[1] 0 0 0 0 1 1 1

$wt
[1] 3 3 3 3 3 3 3

> test_freq
   time status x sex
1     4      1 0   0
2     4      1 0   0
3     4      1 0   0
4     3     NA 2   0
5     3     NA 2   0
6     3     NA 2   0
7     1      1 1   0
8     1      1 1   0
9     1      1 1   0
10    1      0 1   0
11    1      0 1   0
12    1      0 1   0
13    2      1 1   1
14    2      1 1   1
15    2      1 1   1
16    2      1 0   1
17    2      1 0   1
18    2      1 0   1
19    3      0 0   1
20    3      0 0   1
21    3      0 0   1

> t1 <- coxph( Surv(time, status) ~ x + strata(sex), data=test1, weights=wt)
> summary(t1)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1, weights = wt)

  n=6 (1 observation deleted due to missingness)

  coef exp(coef) se(coef)    z    p
x 1.17      3.22    0.744 1.57 0.12

  exp(coef) exp(-coef) lower .95 upper .95
x      3.22      0.311     0.749      13.8

Rsquare= 0.353 (max possible= 0.999 )
Likelihood ratio test= 2.61 on 1 df, p=0.106
Wald test= 2.47 on 1 df, p=0.116
Score (logrank) test = 2.67 on 1 df, p=0.102

> cox.zph(t1)
      rho   chisq     p
x -0.0716 0.00598 0.938

> t_freq <- coxph( Surv(time, status) ~ x + strata(sex), data=test_freq)
> summary(t_freq)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test_freq)

  n=18 (3 observations deleted due to missingness)

  coef exp(coef) se(coef)    z     p
x 1.41      4.09    0.756 1.86 0.063

  exp(coef) exp(-coef) lower .95 upper .95
x      4.09      0.245     0.929      18.0

Rsquare= 0.185 (max possible= 0.879 )
Likelihood ratio test= 3.69 on 1 df, p=0.0549
Wald test= 3.47 on 1 df, p=0.0626
Score (logrank) test = 3.84 on 1 df, p=0.0499

> cox.zph(t_freq)
      rho  chisq     p
x -0.0697 0.0526 0.819
[R] overlaid transparent histograms
Hello all-- I'm attempting to produce overlaid histograms with partially transparent columns. Whether this display will end up being useful, I can't say. But I do want to get it right. I've already got one solution (shown below), but I tried some other versions and had questions about my results. (Note: I'm using a quartz device, so transparency shows up correctly. You might need to print to a pdf device to get transparency, according to the docs I've read)

--8<---cut here---start->8---
## Working version:
data(lexdec, package="languageR")
attach(lexdec)
x <- log(c(BNCw, Frequency))
label <- c(rep("BNCw", length(BNCw)), rep("CELEX", length(Frequency)))
h <- data.frame(x, label)
g <- ggplot(h, aes(x=x, fill=label))
g + geom_bar(position="identity") + scale_fill_manual(values = c( alpha("red", 0.5), alpha("blue", 0.5)))
detach(lexdec)
--8<---cut here---end--->8---

Three questions:
1a) Why does the following code not produce transparent bars?
1b) How can I manually specify the elements of the legend for this version of the plot?

--8<---cut here---start->8---
## Non-working version
data(lexdec, package="languageR")
g <- ggplot(lexdec)
g + geom_histogram(aes(x=log(BNCw), fill = alpha("red", .5))) + geom_histogram(aes(x=log(BNCc), fill = alpha("blue", .5)))
--8<---cut here---end--->8---

2) Does anyone have a way to accomplish the same thing in lattice? I saw the post at http://www.nabble.com/Overlay-plots-from-different-data-sets-using-the-Lattice-package-tp14824421p14824421.html, but couldn't figure out how to extend these suggestions to overlaid transparent histograms.
Thanks in advance for any help, /au

> sessionInfo()
R version 2.7.0 (2008-04-22)
powerpc-apple-darwin8.10.1

locale:
C

attached base packages:
[1] grid splines stats graphics grDevices utils datasets
[8] methods base

other attached packages:
 [1] ggplot2_0.6 colorspace_0.95 RColorBrewer_1.0-2 MASS_7.2-42
 [5] proto_0.3-8 reshape_0.8.0 languageR_0.92 coda_0.13-2
 [9] lme4_0.999375-15 Matrix_0.999375-10 zipfR_0.6-0 lattice_0.17-8
[13] Design_2.1-1 survival_2.34-1 Hmisc_3.4-3

-- Austin Frank http://aufrank.net GPG Public Key (D7398C2F): http://aufrank.net/personal.asc
[R] "False convergence" in LME
I tried to use LME (on a fairly large dataset, so I am not including it), and I got this error message:

Error in lme.formula(formula(paste(c(toString(TargetName), "as.factor(nodeInd)"), : nlminb problem, convergence error code = 1 message = false convergence (8)

Is there any way to get more information or to get the potentially wrong estimates from LME? (Also, the page in the NLMINB documentation, http://netlib.bell-labs.com/cm/cs/cstr/153.pdf, has errors in it, which makes it harder to check on what is happening.) Thank you in advance! Rebecca
[R] Subset by Factor by date
I have a dataframe, x, with over 60,000 rows that contains one Factor, "id", with 27 levels. The dataframe contains numerous continuous values (along column "diff") per day (column "date") for every level of id. I would like to select only one row per animal per day, i.e. that containing the minimum value of "diff", along the full length of 1:nrow(x). I am not yet able to conduct anything beyond the simplest of functions and I was hoping someone could suggest an effective way of producing this output. e.g. given this input:

id day diff
1 01-01-09 0.5
1 01-01-09 0.7
2 01-01-09 0.2
2 01-01-09 0.4
1 01-02-09 0.1
1 01-02-09 0.3
2 01-02-09 0.3
2 01-02-09 0.4

I would like to produce this output:

id day diff
1 01-01-09 0.5
2 01-01-09 0.2
1 01-02-09 0.1
2 01-02-09 0.3

It doesn't seem extremely difficult but I'm sure there are easier ways than how I am currently approaching it!
Re: [R] Subset by Factor by date
See ?aggregate

> DF
  id      day diff
1  1 01-01-09  0.5
2  1 01-01-09  0.7
3  2 01-01-09  0.2
4  2 01-01-09  0.4
5  1 01-02-09  0.1
6  1 01-02-09  0.3
7  2 01-02-09  0.3
8  2 01-02-09  0.4

> aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
  id      day   x
1  1 01-01-09 0.5
2  2 01-01-09 0.2
3  1 01-02-09 0.1
4  2 01-02-09 0.3

Note that I have not converted the 'day' column to a 'date' class. You would need to do that to perform any other date related operations (including chronological sorting) on that column. See ?as.Date for more information. For example:

DF$day <- as.Date(DF$day, format = "%m-%d-%y")

HTH, Marc Schwartz
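Marc's example as a script that can be pasted into R (the data frame is typed in from the posted sample), followed by a short demonstration of why converting 'day' to Date matters: "%m-%d-%y" strings sort alphabetically, not chronologically. The extra dates in days are made up for the illustration.

```r
DF <- data.frame(id   = rep(c(1, 1, 2, 2), 2),
                 day  = rep(c("01-01-09", "01-02-09"), each = 4),
                 diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
                 stringsAsFactors = FALSE)

# Minimum 'diff' per id and day
mins <- aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
mins

# Character dates sort alphabetically; Date objects sort chronologically
days <- c("01-02-09", "12-31-08", "01-01-09")
sort(days)                                # "12-31-08" ends up last
sort(as.Date(days, format = "%m-%d-%y"))  # 2008-12-31 comes first
```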
Re: [R] Looping, Control Flow & Conditional Statements
Thanks Chuck, 'rle' was just what I needed. G

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED]
Sent: Saturday, 14 June 2008 02:00
To: Warren, Garth (CSE, Gungahlin)
Cc: r-help@r-project.org
Subject: Re: [R] Looping, Control Flow & Conditional Statements

See ?rle

Start with this:

> a1.runs <- rle( a1 )
> a1.runs$lengths[ a1.runs$values>0 ]
[1] 3 4

HTH,

Chuck

p.s.

> library(fortunes)
> fortune(106)

If the answer is parse() you should usually rethink the question.
   -- Thomas Lumley
      R-help (February 2005)

-- see ?get

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:

> Dear R Group:
>
> I have little experience using R and even less experience with control
> flow type questions.
>
> See the following code:
>
> a1 = c(0, 1, 1, 1,
>        0, 0, 0, 0, 0,
>        0, 0, 1,
>        1, 1, 1, 0, 0)
>
> for(i in 1:1){
>   sx <- paste("a",i,sep="")
>   s <- eval(parse(text = paste("a",i,sep="")))
>   {g = numeric(length(s))
>    k = numeric(length(s))
>    {for (i in 1:length(s))
>      {for (j in 1:length(s))
>        ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
>      }}
>    h1 <- hist(g,freq=TRUE)
>    h <- h1$counts[4]
>    cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
>   }}
>
> The output is:
>> g
> [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
>> k
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> h
> [1] 7
>
> & a text file, which has:
>   a1 : 7
>
> k is a by-product of the ifelse statement and is of no interest & g and
> h only go part-way to answering my question, which is:
>
> For every time an object i.e. a1 (which is actually a time series) - 0 1
> 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 has as value over 0 how long do the
> values stay above 0. So in this case a1 has two groups or events where
> the value is above zero, the first event lasts for 3 'days' and the
> second event lasts for 4 'days'. I have my code telling me that there
> was a total of 7 'days' in event or above 0, but what I need to know is
> that there were two 'events' and the 1st lasted 3 'days' and the 2nd
> lasted '4' days. Essentially I want a text file output to say:
>
>   a1.1 : 3
>   a1.2 : 4
>
> My thinking is that I need to somehow get the code working through each
> vector one value at a time and when a value is found to meet the criteria
> of > 0 R creates a new vector; to use the above example it would come
> to the first value >0 and then create the new vector a1.1 = (1,1,1) then
> as the next value in the series is 0 it would close this new vector
> 'a1.1'. It would then continue until it reaches the next value >0 and
> then create the vector a1.2 = (1,1,1,1) then again as the next value in
> the series is 0 it would close this new vector, and so on.
>
> Then all I need to do is perform a count of '1's in these new vectors to
> find how many days they met this criterion of being greater than 0
>
> I hope the above makes sense and I really hope there is someone willing
> and able to help. I don't know how to proceed.
>
> Thanks,
>
> Garth

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
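Building on Chuck's rle() hint, a sketch that produces the "a1.1 : 3" style lines Garth asked for. The list series is a hypothetical container that replaces the eval(parse()) lookup, in the spirit of the fortune; further series would simply be added to it.

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# Hypothetical container for all series, instead of building names with parse()
series <- list(a1 = a1)

out <- character(0)
for (nm in names(series)) {
  runs <- rle(series[[nm]] > 0)               # runs of TRUE (value > 0) and FALSE
  event_lengths <- runs$lengths[runs$values]  # keep only the lengths of TRUE runs
  out <- c(out, paste0(nm, ".", seq_along(event_lengths), " : ", event_lengths))
}
writeLines(out)  # or writeLines(out, "C:/temp/test-beta.txt")
```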
Re: [R] overlaid transparent histograms
> Three questions:
> 1a) Why does the following code not produce transparent bars?

Because you're setting the fill colour (not mapping it to a variable in your dataset), the fill needs to be outside of aes():

g + geom_histogram(aes(x=log(BNCw)), fill = alpha("red", .5)) +
  geom_histogram(aes(x=log(BNCc)), fill = alpha("blue", .5))

> 1b) How can I manually specify the elements of the legend for this
> version of the plot?

Use the "manual" scale:

g + geom_histogram(aes(x=log(BNCw), fill = "w")) +
  geom_histogram(aes(x=log(BNCc), fill = "c")) +
  scale_fill_manual("BNC type", values = alpha(c("red","blue"), 0.5))

Hadley -- http://had.co.nz/
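A sketch of the same set-versus-map distinction in current ggplot2 syntax. It assumes a recent ggplot2 rather than the 0.6 release used in this thread, and uses simulated data instead of languageR's lexdec: fill is mapped to a column inside aes(), while alpha is set to a constant outside it.

```r
library(ggplot2)  # assumes a current ggplot2, not the 0.6 release in this thread

# Simulated stand-in for the two frequency measures
set.seed(1)
h <- data.frame(x = c(rnorm(200), rnorm(200, mean = 1)),
                label = rep(c("BNCw", "CELEX"), each = 200))

p <- ggplot(h, aes(x = x, fill = label)) +         # fill is mapped: inside aes()
  geom_histogram(position = "identity", alpha = 0.5, bins = 30) +  # alpha is set: outside aes()
  scale_fill_manual(name = "Source", values = c(BNCw = "red", CELEX = "blue"))
p
```

position = "identity" overlays the two histograms instead of stacking them, which is what makes the transparency visible.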
Re: [R] Subset by Factor by date
aggregate() is indeed a useful function in this case, but it only returns the columns by which it was grouped. Is there a way I can use this while simultaneously retaining all the other column values in the dataframe? e.g. add superfluous (yet pertinent for later) column containing any information at all and retain it in the final output
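One base-R alternative that keeps every column, sketched with order() and duplicated(): sort so the smallest diff comes first within each id/day group, then drop the later rows of each group. The extra column A is made up to show that other columns survive; on ties, the row that came first in the original data is kept.

```r
DF <- data.frame(id   = rep(c(1, 1, 2, 2), 2),
                 day  = rep(c("01-01-09", "01-02-09"), each = 4),
                 diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
                 A    = 1:8,                # hypothetical extra column
                 stringsAsFactors = FALSE)

# Sort so the smallest 'diff' comes first within each id/day group ...
o <- DF[order(DF$id, DF$day, DF$diff), ]
# ... then keep only the first row of each id/day pair; all columns come along
best <- o[!duplicated(o[c("id", "day")]), ]
best
```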
[R] strsplit, keeping delimiters
Hi all, Does anyone have a version of strsplit that keeps the delimiters it splits on? e.g. from x <- "A: 123 B: 456 C: 678" I'd like to get c("A:", "123 ", "B: ", "456 ", "C: ", 678) but strsplit(x, "[A-Z]+:") gives me c("", " 123 ", " 456 ", " 678") Any ideas? Thanks, Hadley -- http://had.co.nz/
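One possible approach, sketched with base R's gregexpr() and regmatches() instead of strsplit(): match either a label such as "A:" or a run of characters that are neither capitals nor colons, so the labels come back as tokens instead of being discarded. It assumes the values never contain capital letters or colons; a capital not followed by ":" would be dropped.

```r
x <- "A: 123 B: 456 C: 678"

# Each match is either a label ("A:") or the text between labels
tokens <- regmatches(x, gregexpr("[A-Z]+:|[^A-Z:]+", x))[[1]]
tokens

# Strip the surrounding blanks if they are not wanted
trimws(tokens)
```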
Re: [R] rbind() problem
Hi array (?), You can use the reshape package by Hadley Wickham for this:

df1 <- data.frame(V1 = rnorm(10), V2 = rnorm(10), V4 = rnorm(10))
df2 <- data.frame(V1 = rnorm(10), V3 = rnorm(10), V4 = rnorm(10))
library(reshape)
rbind.fill(df1, df2)

HTH, Tobias
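A base-R fallback for when reshape is not available: a hypothetical two-data-frame helper (rbind_fill2 is an illustrative name, not a package function) that adds each missing column as NA and then calls rbind() on identically ordered columns.

```r
# Hypothetical helper: rbind two data frames, filling absent columns with NA
rbind_fill2 <- function(a, b) {
  cols <- union(names(a), names(b))
  for (nm in setdiff(cols, names(a))) a[[nm]] <- NA   # columns only in b
  for (nm in setdiff(cols, names(b))) b[[nm]] <- NA   # columns only in a
  rbind(a[cols], b[cols])                             # same columns, same order
}

df1 <- data.frame(V1 = rnorm(10), V2 = rnorm(10), V4 = rnorm(10))
df2 <- data.frame(V1 = rnorm(10), V3 = rnorm(10), V4 = rnorm(10))
res <- rbind_fill2(df1, df2)
```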
Re: [R] Subset by Factor by date
On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:

aggregate() is indeed a useful function in this case, but it only returns the columns by which it was grouped. Is there a way I can use this while simultaneously retaining all the other column values in the dataframe? e.g. add a superfluous (yet pertinent for later) column containing any information at all and retain it in the final output.

I have had exactly this kind of need many times, and I have finally created a function for it, which I hope to include soon in an upcoming package. Here is a run of it (I added an extra "A" column containing just the numbers 1:8):

> DF
  id      day diff A
1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8
> byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
  diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09
3  0.1 5  1 01-02-09
4  0.3 7  2 01-02-09

Would that do what you want?

I've appended the function byDataFrame and its prerequisite, a function parseIndexList. I'm not quite set on the names yet, but anyway. Hope this helps. I haven't really tested it on large data sets; it might perform poorly. Any suggestions on speeding up the code, or corrections, are welcome.
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

parseIndexList <- function(indexList) {
  # browser()
  # Wrap a single index vector; as.list() would split it element by element
  if (!is.list(indexList)) indexList <- list(indexList)
  nI <- length(indexList)
  namelist <- vector("list", nI)
  names(namelist) <- names(indexList)
  extent <- integer(nI)
  nx <- length(indexList[[1]])
  one <- as.integer(1)
  group <- rep.int(one, nx)
  ngroup <- one
  # seq_along() rather than seq.int(), so a one-element list also works
  for (i in seq_along(indexList)) {
    index <- as.factor(indexList[[i]])
    if (length(index) != nx)
      stop("arguments must have same length")
    namelist[[i]] <- sort(unique(indexList[[i]]))
    extent[i] <- length(namelist[[i]])
    group <- group + ngroup * (as.integer(index) - one)
    ngroup <- ngroup * nlevels(index)
  }
  nms <- do.call(expand.grid, namelist)
  ind <- unique(sort(group))
  res <- data.frame(index=ind, nms[ind, , drop=FALSE])
  return(list(cases=group, groups=res))
}

byDataFrame <- function (data, INDEX, FUN, newnames, omit.index.cols=TRUE, ...) {
  #
  # Part of the code shamelessly stolen from tapply
  IND <- eval(substitute(INDEX), data)
  nms <- as.character(as.list(substitute(INDEX)))
  if (!is.list(IND)) {
    IND <- list(IND)
    names(IND) <- nms
  } else {
    names(IND) <- nms[-1]
  }
  funname <- paste(as.character(substitute(FUN)), collapse=".")
  indexInfo <- parseIndexList(IND)
  FUNx <- if (omit.index.cols) {
    omit.cols <- match(names(indexInfo$groups)[-1], names(data))
    # drop=FALSE keeps a data frame even if only one column remains
    function(x, ...) FUN(data[x, -omit.cols, drop=FALSE], ...)
  } else {
    function(x, ...) FUN(data[x, , drop=FALSE], ...)
  }
  ans <- lapply(split(seq_len(nrow(data)), indexInfo$cases), FUNx, ...)
  index <- as.numeric(names(ans))
  if (!is.data.frame(ans[[1]])) {
    ans <- lapply(ans, function(x) {
      dframe <- as.data.frame(t(x))
      if (is.null(names(x))) names(dframe) <- funname
      dframe
    })
  }
  lengths <- sapply(ans, nrow)
  ans <- do.call(rbind, ans)
  if (!missing(newnames)) names(ans) <- newnames
  # index holds group codes, not row positions, so look them up with match()
  rows <- match(index, indexInfo$groups$index)
  nms <- indexInfo$groups[rep(rows, lengths), -1, drop=FALSE]
  res <- cbind(ans, nms)
  res
}
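For the specific case in this thread (keep the whole row with the smallest diff for each id/day combination), a shorter base R route is to split the complete data frame, pick one row per piece, and bind the pieces back together. This is a sketch, not a replacement for byDataFrame; it rebuilds the example DF shown above by hand:

```r
# Rebuild the example data from the thread
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  A    = 1:8
)

# Split the whole data frame (all columns) by id x day,
# dropping id/day combinations that never occur
pieces <- split(DF, list(DF$id, DF$day), drop = TRUE)

# Keep the row with the smallest diff in each group, then restack
best <- do.call(rbind, lapply(pieces, function(x) x[which.min(x$diff), ]))
rownames(best) <- NULL
best
```

Because split() keeps every column, A survives without special handling; ties in diff go to the first row of the group, as with which.min().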
Re: [R] strsplit, keeping delimiters
Try this:

> library(gsubfn)
> x <- "A: 123 B: 456 C: 678"
> strapply(x, "[^ :]+[ :]|[^ :]+$")
[[1]]
[1] "A:" "123 " "B:" "456 " "C:" "678"

and check out the gsubfn home page at:

http://gsubfn.googlecode.com

On Sat, Jun 14, 2008 at 1:35 AM, hadley wickham <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Does anyone have a version of strsplit that keeps the string that is
> split by. e.g. from
> x <- "A: 123 B: 456 C: 678"
>
> I'd like to get
>
> c("A:", "123 ", "B: ", "456 ", "C: ", 678)
>
> but
> strsplit(x, "[A-Z]+:")
>
> gives me
> c("", " 123 ", " 456 ", " 678")
>
> Any ideas?
>
> Thanks,
>
> Hadley
>
> -- 
> http://had.co.nz/
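If you would rather avoid the gsubfn dependency, more recent versions of R can apply the same pattern with base gregexpr() plus regmatches(). A minimal sketch, assuming the same input string and regular expression as the strapply() call:

```r
x <- "A: 123 B: 456 C: 678"

# Same pattern as the strapply() call: a token followed by a space or
# colon, or else a final token at the very end of the string
pattern <- "[^ :]+[ :]|[^ :]+$"

# gregexpr() finds every match; regmatches() extracts the matched text
parts <- regmatches(x, gregexpr(pattern, x))[[1]]
parts
```

Because the trailing space or colon is part of each match rather than the split point, the delimiters are kept, which is what strsplit() cannot do.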