[R] Inverse of FAQ 7.31.
Why does R think these numbers ***are*** equal?

In a somewhat bizarre set of circumstances I calculated

    x0 <- 0.03580067
    x1 <- 0.03474075
    y0 <- 0.4918823
    y1 <- 0.4474461
    dx <- x1 - x0
    dy <- y1 - y0
    xx <- (x0 + x1)/2
    yy <- (y0 + y1)/2
    chk <- yy*dx - xx*dy + x0*dy - y0*dx

If you think about it ***very*** carefully ( :-) ) you'll see that ``chk'' ought to be zero.

Blow me down, R gets 0. Exactly. To as many significant digits/decimal places as I can get it to print out.

But I wrote a wee function in C to do the *same* calculation and dyn.load()-ed it and called it with .C(). And I got -1.248844e-19.

This is of course zero, to all floating point arithmetic intents and purposes. But if I name the result returned by my call to .C() ``xxx'' and ask

    xxx >= 0

I get FALSE whereas ``chk >= 0'' returns TRUE (as does ``chk <= 0'', of course). (And inside my C function, the comparison ``xxx >= 0'' yields ``false'' as well.)

I was vaguely thinking that raw R arithmetic would be equivalent to C arithmetic. (Isn't R written in C?)

Can someone explain to me how it is that R (magically) gets it exactly right, whereas a call to .C() gives the sort of ``approximately right'' answer that one might usually expect? I know that R Core is ***good*** but even they can't make C do infinite precision arithmetic. :-)

This is really just idle curiosity --- I realize that this phenomenon is one that I'll simply have to live with. But if I can get some deeper insight as to why it occurs, well, that would be nice.

cheers,

Rolf Turner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Loops to assign a unique ID to a column
Dear R help,

I am fairly new to data management and programming in R, and am trying to write what is probably a simple loop, but am not having any luck. I have a dataframe with something like the following (but much bigger):

    Dates <- c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", "13/10/2010", "13/10/2010")
    Groups <- c("A","B","B","A","B","C")
    data <- data.frame(Dates, Groups)

I would like to create a new column in the dataframe, and give each distinct date by group a unique identifying number starting with 1, so that the resulting column would look something like:

    ID <- c(1,2,2,3,4,5)

The loop that I have started to write is something like this (but doesn't work!):

    data$ID <- as.number(c())
    for(i in unique(data$Dates)){
      for(j in unique(data$Groups)){
        data$ID[i,j] <- i
        i <- i+1
      }
    }

Am I on the right track? Any help on this is much appreciated!

Chandra
Re: [R] Loops to assign a unique ID to a column
Dear Chandra,

You're on the wrong track. You don't need for loops, as you can do this vectorised.

    as.numeric(interaction(data$Groups, data$Dates, drop = TRUE))

Best regards,

Thierry

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Chandra Salgado Kent
Sent: Tuesday 2 August 2011 9:12
To: r-help@r-project.org
Subject: [R] Loops to assign a unique ID to a column
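As a small illustration of the vectorised approach, here is a sketch run on the example data from the question. The quoting of the date and group strings is an assumption about the original input, and the second column, `ID2`, is an alternative added here for comparison rather than something proposed in the thread.

```r
# Example data from the question (string quotes are assumed)
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups)

# interaction() builds one factor level per Groups x Dates combination;
# drop = TRUE discards combinations that never occur (here C on 12/10/2010)
data$ID <- as.numeric(interaction(data$Groups, data$Dates, drop = TRUE))

# Alternative: number the combinations in order of first appearance
key      <- paste(data$Dates, data$Groups)
data$ID2 <- match(key, unique(key))

print(data)
```

Note that `interaction()` numbers the combinations by factor level order (alphabetical for character input), not chronologically, so dates stored as strings may need converting with `as.Date()` first if the order of the IDs matters.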
Re: [R] Inverse of FAQ 7.31.
On Aug 2, 2011, at 08:02 , Rolf Turner wrote:

> Can someone explain to me how it is that R (magically) gets it exactly right, whereas a call to .C() gives the sort of ``approximately right'' answer that one might usually expect?

I think the long and the short of it is that R lost a couple of bits of precision that C retained. This sort of thing happens if R stores things into 64 bit floating point objects while C keeps them in 80 bit CPU registers.

In general, floating point calculations do not obey the laws of math, for example the associative law (i.e., (a+b)-c ?= a+(b-c), especially if b and c are large and nearly equal), so any reordering of expressions by the compiler may give a slightly different result.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
"Døden skal tape!" --- Nordahl Grieg
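The point about associativity can be seen directly at the R prompt. The following is a minimal sketch, not taken from the thread: the values 0.1, 0.2 and 0.3 are chosen only because they are a familiar case, and the tolerance is the default one used by `all.equal()`.

```r
# Associativity can fail in IEEE 754 double precision arithmetic
u <- 0.1; v <- 0.2; w <- 0.3
left  <- (u + v) + w
right <- u + (v + w)
print(left == right)                    # exact comparison: FALSE on typical hardware
print(isTRUE(all.equal(left, right)))   # comparison within a tolerance: TRUE

# The value reported from the .C() call in the thread
xxx <- -1.248844e-19
print(xxx >= 0)                         # exact comparison against zero: FALSE

# Treat tiny values as zero by comparing against a tolerance instead
tol <- sqrt(.Machine$double.eps)        # about 1.5e-8
print(abs(xxx) < tol)                   # TRUE
```

Which of two mathematically equivalent expressions happens to come out "exactly right" depends on the order of operations and on where intermediate results are rounded, so comparisons with a tolerance are the safer habit in both R and C.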
Re: [R] Plotting problems directional or rose plots
On 08/02/2011 01:38 AM, kitty wrote:
> Hi again,
>
> I have tried playing around with the code given to me by Alan and Jim. Thank you for the code, but unfortunately I can't seem to get either of them to work. Alan's does not work with the sample data and Jim's is giving the error:
>
>     Error in radial.grid(labels = labels, label.pos = label.pos, radlab = radlab, :
>       could not find function "boxed.labels"
>
> I have also tried Rose plots in the (heR.Misc) library to no avail. Sorry, does anyone know how to get the plots I need?

Hi kitty,

Oops, I forgot that the code calls boxed.labels, a function in the plotrix package. Install that and it should work.

Jim
Re: [R] Environment of a LM created in a function
Dear Peter,

Thanks for your concise answer, it works perfectly.

By the way, I fully agree that data or df are not good names for data.frames. I was aware of that and I usually avoid those names (not consistently, though, I have to admit; it is too tempting ;). However, if one uses those evil names, one cannot expect to receive meaningful error messages. Thus, I was not astonished by the peculiar error message itself (in fact I was well aware that this has to do with the bad naming and the fact that data is, above all, a function) and I suspected the error to be due to environment issues.

I tried the workaround of passing the very same data argument explicitly to update:

    update(models[[1]], . ~ ., data = dat)

which worked, but which left the stale impression of redundancy and, even more dangerous, error proneness: what happens if the name of the data frame is changed earlier?

Finally, your suggestion with

    update(models[[1]], . ~ ., data = model.frame(models[[1]]))

solved all the issues (and I was wondering why I did not try it out myself, so obviously I was not seeing the wood for the trees).

So, thanks a lot for your help. Have a nice day.

KR,

-Thorn
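To make the situation concrete, here is a small self-contained sketch of a model fitted inside a function and then refitted from the outside. The function `fit_inside` and its toy data are hypothetical and only stand in for the code discussed in this thread.

```r
# Hypothetical helper: fits a model on a data frame that exists only
# inside the function
fit_inside <- function() {
  dat <- data.frame(x = 1:10,
                    y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2))
  lm(y ~ x, data = dat)
}
fit <- fit_inside()

# The stored call still refers to `dat`, which is not visible out here
print(fit$call)

# Refit from the model frame stored in the fitted object, so the original
# data frame name is never looked up
refit <- update(fit, . ~ ., data = model.frame(fit))
print(all.equal(coef(fit), coef(refit)))
```

One caveat with this design: the model frame only contains the variables that appear in the fitted formula, so this works for refitting or dropping terms, but presumably not for adding variables that were not in the original model.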
Re: [R] if function problems
Hi

another possibility is to use the properties of logical values:

    (x < 0)*x
    [1] -3 -2 -1  0  0  0  0

Regards
Petr

> In addition to what David said:
>
> On Mon, Aug 1, 2011 at 6:57 PM, zoe_zhang 1987.zhan...@gmail.com wrote:
>> Dear All,
>>
>> Sorry to bother. I want to write a function in R using if. Say I have a dataset x; if x[i]<0, then x[i]=x[i], if x[i]>0, then x[i]=0. For example, x=-3:3, then using the function, x becomes [-3,-2,-1,0,0,0,0].
>>
>> I wrote the code as follows,
>>
>>     gjr=function(x) {lena=length(x) for(i in 1:lenx) if (x[i]<0) return (x[i]) if (x[i]>0) return (0) x}
>>
>> but then, doing gjr(x), it only comes out with one number. Does anyone have any suggestions?
>
> You define `lena`, but then use `lenx` in `for (i in 1:lenx)` in your function ... I guess this might have something to do with it.
>
> You shouldn't use a for loop, though, and just follow David's advice by using logical indexing, or the `ifelse` function, ie:
>
>     R> ifelse(x < 0, x, 0)
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
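The suggestions in this thread can be compared side by side. The sketch below assumes the intended rule is "keep negative values, set positive values to 0", as implied by the example output; the `pmin()` variant and the corrected loop are additions for comparison, not something posted in the thread.

```r
x <- -3:3

# 1. Logical indexing: overwrite the positive entries in a copy
a <- x
a[a > 0] <- 0

# 2. ifelse(): keep x where the test holds, otherwise use 0
b <- ifelse(x < 0, x, 0)

# 3. Arithmetic on logicals: TRUE counts as 1, FALSE as 0
d <- (x < 0) * x

# 4. pmin(): element-wise minimum of x and 0
e <- pmin(x, 0)

# A corrected version of the loop (function name gjr taken from the thread)
gjr <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] > 0) x[i] <- 0   # modify in place instead of return()-ing early
  }
  x
}

print(rbind(a, b, d, e, loop = gjr(x)))
```

The original function returned a single number because `return()` exits the function at the first element it tests; assigning into `x[i]` and returning `x` at the end avoids that.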
Re: [R] ivreg and structural change
On Mon, 1 Aug 2011, Claudio Shikida wrote:

> Hello,
>
> I am looking for some help with this question: how could I test structural breaks in an instrumental variables model?

In principle, most of the tests used in the standard linear regression model can also be transferred to the IV case. However, many of the functions in strucchange do not do this. A notable exception is the function gefp(), see its manual page for references. This allows you to do something like

    gefp(y ~ x1 + x2 | z1 + z2, fit = ivreg, data = d)

etc.

> For example, I was trying to do something with my model with three time series.
>
>     tax_ivreg <- ivreg(l_y ~ l_x2 + l_x1 + dl_y | lag(l_x2, -1)+lag(l_x2, -2)+
>                        lag(l_x1, -1)+lag(l_x1, -2)+lag(l_y, -1)+lag(l_y, -2), data=tax1)
>     summary(tax_ivreg)

I guess that this does not do what you want it to do. I would guess that this essentially yields a standard linear regression because the lag() is not correctly processed.

If you want to use ivreg(), you need to set up the lagged variables by hand in advance. Alternatively, you can use dynlm() from the dynlm package, which allows you to use lag() or the simpler L() function in the formula together with zoo data.

For an example of how to set up the lagged variables by hand, you can look at the manual page of breakpoints(), especially the seatbelt data example.

hth,
Z

> ## after estimating it, something weird happened with the several tests in package strucchange. For example:
>
>     cusum <- efp(l_y ~ l_x2 + l_x1 + dl_y | lag(l_x2, -1)+lag(l_x2, -2)+
>                  lag(l_x1, -1)+lag(l_x1, -2)+lag(l_y, -1)+lag(l_y, -2), data=tax1, type="OLS-CUSUM")
>     sctest(cusum)
>     plot(cusum)
>     coef(cusum, breaks=2)
>
> ## And:
>
>     cusum <- efp(tax_ivreg, data=tax1, type="OLS-CUSUM")
>     sctest(cusum)
>     plot(cusum)
>     coef(cusum, breaks=2)
>
> ## 1. The plots of the two above were very different and
> ## 2. When I ask for the breaks, instead of the dates, it returned me a line of the summary of the estimated tax_ivreg
>
> Any help would be very appreciated.
>
> Thanks
> Claudio
>
> --
> http://www.shikida.net and http://works.bepress.com/claudio_shikida/
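A minimal base-R sketch of what "setting up the lagged variables by hand" might look like is given below. The helper `lag_by` and the toy values are hypothetical; only the column names are borrowed from the thread. The first lines show why lag() on a ts object is easy to misuse in a formula: it shifts the time index, not the stored values.

```r
# lag() on a ts object only shifts the time index; the stored values are unchanged
x <- ts(c(5, 7, 6, 9, 8))
same_values <- all(as.numeric(lag(x, -1)) == as.numeric(x))
print(same_values)

# Hypothetical helper: shift a plain vector down by k positions, padding with NA
lag_by <- function(v, k) c(rep(NA, k), head(v, -k))

# Toy data; column names follow the thread, the values are made up
tax1 <- data.frame(l_y  = c(1.0, 1.2, 1.1, 1.4, 1.5, 1.7),
                   l_x1 = c(0.5, 0.6, 0.8, 0.7, 0.9, 1.0))
tax1$l_y_lag1  <- lag_by(tax1$l_y, 1)
tax1$l_x1_lag1 <- lag_by(tax1$l_x1, 1)

# Drop the rows lost to lagging before fitting anything
tax1 <- na.omit(tax1)
print(tax1)
```

With explicit columns such as `l_y_lag1` in the data frame, the lagged series can then be named directly in the model formula (for example as instruments after the `|`), rather than relying on lag() being interpreted inside the formula.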
Re: [R] if function problems
Thank you for your addition, Steve. I followed David's suggestion and finally got the answer. It was my carelessness: I should have put lena instead of lenx. I also tried your code and it worked well.

I appreciate your help. I have learnt a lot from this forum.

Cheers,
Zoe
[R] R CMD check problem
Dear friends,

I am building an R package called *mypackage*. I followed every possible step (to my understanding) for the same. I got the following problem while doing *R CMD check mypackage*.

    * installing *source* package 'mypackage' ...
    ** libs
    cygwin warning:
      MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf
      Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf
      CYGWIN environment variable option nodosfilewarning turns off this warning.
      Consult the user's guide for more details about POSIX paths:
        http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    ERROR: compilation failed for package 'mypackage'
    * removing 'C:/Rpackages/mypackage.Rcheck/mypackage'

What I understood from the above is that it is something to do with the PATH variable. I had set the following PATH variable:

    C:\Rtools\bin;C:\Rtools\MinGW\bin;C:\Program Files\R\R-2.13.0\bin;C:\Program Files\MiKTeX 2.9\miktex\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;C:\Program Files\HTML Help Workshop

Can anybody suggest what possibly could have gone wrong?

Thanks,
BN Mandal
[R] How to 'mute' a function (like confint())
Dear R-helpers,

I am using confint() within a function, and I want to turn off the message it prints:

    x <- rnorm(100)
    y <- x^1.1+rnorm(100)
    nlsfit <- nls(y ~ g0*x^g1, start=list(g0=1,g1=1))

    confint(nlsfit)
    Waiting for profiling to be done...
            2.5%    97.5%
    g0 0.4484198 1.143761
    g1 1.0380479 2.370057

I cannot find any way to turn off 'Waiting for...'. I tried options(max.print=0) and even

    sink(tempfile())
    confint(nlsfit)
    sink()

This suppresses the printing of the table, but not the cat()-ing of the 'Waiting for...'. It keeps writing this message; is there any way to mute it, for this function and more generally?

thanks,
Remko
Re: [R] Fitting ELISA measurements unknowns to 4 parameter logistic model
Try http://www.myassays.com/four-parameter-fit.assay

It's free, requires no install and is pre-configured for ELISAs. Just paste and go.

AW
[R] Clean up a scatterplot with too much data
I'm working with a lot of data right now, but I'm new to R, and not very good with it, hence my request for help. What type of graph could I use to straighten out things like...

http://r.789695.n4.nabble.com/file/n3711389/Untitled.png

...this? I want to see general frequencies. Should I use something like a 3D histogram, or is there an easier way like, say, shading? I'm sure these are both possible, but I don't know which is easiest or how to implement either of them.

Thanks!
Re: [R] How to 'mute' a function (like confint())
See ?suppressMessages

On Tue, 2 Aug 2011, Remko Duursma wrote:

> I am using confint() within a function, and I want to turn off the message it prints:
>
>     Waiting for profiling to be done...
>
> It keeps writing this message; is there any way to mute it, for this function and more generally?

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
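As a rough illustration of the pointer above, the sketch below uses two hypothetical stand-in functions rather than confint() itself, so that it runs without fitting a model. It shows that suppressMessages() silences output produced with message(), while output produced with cat() has to be handled differently, for example with capture.output().

```r
# Hypothetical stand-in for a function that reports progress via message()
noisy_message <- function() {
  message("Waiting for profiling to be done...")
  42
}

# Hypothetical stand-in for a function that prints via cat() instead
noisy_cat <- function() {
  cat("Waiting...\n")
  42
}

# message() output is a condition, so suppressMessages() can muffle it
res1 <- suppressMessages(noisy_message())

# cat() writes to standard output, so suppressMessages() does not affect it;
# capture.output() collects that text instead of showing it
captured <- capture.output(res2 <- noisy_cat())

print(c(res1, res2))
```

This would also explain why sink() did not help in the original question: by default sink() diverts standard output only, whereas message() writes to the standard error stream.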
Re: [R] Clean up a scatterplot with too much data
Hi,

One solution could be to subsample the data, or jitter the data (give it some random noise). A more elegant solution, imho, is to use a 2d histogram (a 3d histogram is not a good alternative; I think it is much better to use color instead of a third dimension). I don't think this is easy to make using the standard plot system in R, but ggplot2 handles it nicely. This would involve you needing to learn ggplot2, but I would highly recommend that anyway :). An example of the plot I have in mind can be seen at:

http://had.co.nz/ggplot2/stat_bin2d.html

Just scroll down a bit for some examples.

cheers,
Paul

On 08/02/2011 05:26 AM, DimmestLemming wrote:
> What type of graph could I use to straighten out things like...
>
> http://r.789695.n4.nabble.com/file/n3711389/Untitled.png
>
> ...this? I want to see general frequencies. Should I use something like a 3D histogram, or is there an easier way like, say, shading?

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
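A 2d histogram of the kind described in this reply might look roughly like the sketch below. The data are simulated as a stand-in for the poster's data set, the bin count is an arbitrary choice, and the plotting step is guarded so that the code still runs where ggplot2 is not installed.

```r
set.seed(42)
# Simulated stand-in for a large, overplotted data set
df <- data.frame(x = rnorm(50000), y = rnorm(50000))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Each rectangular tile is coloured by the number of points falling into it
  p <- ggplot(df, aes(x = x, y = y)) + geom_bin2d(bins = 50)
  print(p)
}
```

Using colour for the counts keeps the plot two-dimensional, which is the reason given above for preferring it to a 3d histogram.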
Re: [R] R-help Digest, Vol 102, Issue 2
We are on vacation until 20 August and will not answer e-mails. In urgent cases, please contact Stefanie von Felten steffi.vonfel...@oikostat.ch
Re: [R] Reorganize(stack data) a dataframe inducing names
Works perfectly. Thanks.
f.

On 1 August 2011 18:22, jim holtman jholt...@gmail.com wrote:
> Try this: had to add extra names to your data since it was not clear how it was organized. Next time use 'dput' to enclose data.
>
>     x <- read.table(textConnection("index time key date values
>     13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
>     13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
>     13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
>     13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
>     13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
>     13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
>     13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
>     13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
>     13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
>     13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
>     13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
>     13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
>     13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
>     13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
>     13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>     13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
>     13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43")
>         , header = TRUE
>         , as.is = TRUE
>         )
>     closeAllConnections()
>
>     # get index of first occurrence of 'key' column
>     indx <- !duplicated(x$key)
>     x[indx,]
>        index  time                 key     date values
>     1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
>     4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
>     6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
>     8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
>     9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
>     10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
>     12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
>     15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
>
> On Mon, Aug 1, 2011 at 11:13 AM, Francesca francesca.panco...@gmail.com wrote:
>> Dear Contributors,
>>
>> thanks for any help you can provide. I searched the threads but I could not find any query that satisfied my needs. This is my database:
>>
>>     index time values
>>     13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
>>     13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45
>>     13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22
>>     13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
>>     13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40
>>     13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
>>     13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48
>>     13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
>>     13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
>>     13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
>>     13776 31225 DATA.Q211.WF.Index 05/12/11 1.42
>>     13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
>>     13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42
>>     13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43
>>     13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>>     13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40
>>     13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43
>>
>> I need to select only the rows of this database that correspond to each of the first occurrences of the string represented in column index. In the example shown I would like to obtain a new data.frame which is
>>
>>     index time values
>>     13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42
>>     13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35
>>     13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30
>>     13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43
>>     13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48
>>     13775 31215 DATA.Q211.WF.Index 04/14/11 1.44
>>     13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40
>>     13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
>>
>> As you can see, it is not the whole string to change, rather a
Re: [R] Clean up a scatterplot with too much data
DimmestLemming wrote:

> I'm working with a lot of data right now, but I'm new to R, and not very good with it, hence my request for help. What type of graph could I use to straighten out things like...
>
> http://r.789695.n4.nabble.com/file/n3711389/Untitled.png

Three nice alternatives:

    example(smoothScatter)
    example(sunflowerplot)
    library(hexbin)
    example(hexbinplot)

(And do remove the outliers before plotting.)

--
Karl Ove Hufthammer
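For readers who want to try two of these alternatives on their own data rather than via example(), a minimal sketch with simulated data could look like this. It uses only base graphics (smoothScatter() additionally relies on the recommended KernSmooth package being installed), and the rounding step before sunflowerplot() is an assumption made here so that points actually coincide.

```r
set.seed(1)
# Simulated stand-in for a heavily overplotted scatterplot
x <- rnorm(20000)
y <- x + rnorm(20000)

# Smoothed colour density representation of the scatterplot
smoothScatter(x, y)

# Sunflower plot: each "petal" is one coinciding point, so it works best on
# rounded or discrete data
sunflowerplot(round(x, 1), round(y, 1))
```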
Re: [R] R CMD check problem
On 11-08-02 5:26 AM, Baidya Nath Mandal wrote:
> Dear friends,
>
> I am building an R package called *mypackage*. I followed every possible step (to my understanding) for the same. I got the following problem while doing *R CMD check mypackage*.
>
>     * installing *source* package 'mypackage' ...
>     ** libs
>     cygwin warning:
>       MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf
>       Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf
>       CYGWIN environment variable option nodosfilewarning turns off this warning.
>       Consult the user's guide for more details about POSIX paths:
>         http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

I believe that warning is ignorable, but you can turn it off using

    set CYGWIN=nodosfilewarning

It probably didn't cause the error below.

>     ERROR: compilation failed for package 'mypackage'

I don't know what did cause that error, but it's likely something in the src directory of your package. What do you have there?

Duncan Murdoch

>     * removing 'C:/Rpackages/mypackage.Rcheck/mypackage'
>
> What I understood from the above is that it is something to do with the PATH variable. I had set the following PATH variable:
>
>     C:\Rtools\bin;C:\Rtools\MinGW\bin;C:\Program Files\R\R-2.13.0\bin;C:\Program Files\MiKTeX 2.9\miktex\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;C:\Program Files\HTML Help Workshop
>
> Can anybody suggest what possibly could have gone wrong?
>
> Thanks,
> BN Mandal
Re: [R] R CMD check problem
The cygwin warning should not be fatal. Is that what made you think there's a problem with your path?

Can you upload mypackage online? Two options would be GitHub, which hosts that sort of thing, or a tarball on any file hosting service. I (and possibly others more skilled) would be happy to try it on my system if I had it. You should also be able to see exactly where in the build process it failed from the log.

Cheers,

Josh

On Aug 2, 2011, at 2:26, Baidya Nath Mandal mandal.s...@gmail.com wrote:

> ERROR: compilation failed for package 'mypackage'
>
> What I understood from the above is that it is something to do with the PATH variable.
Re: [R] Plotting question
Andrew McCulloch wrote:
> I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable which is categorical with 10 levels. I have been plotting x against y and using gray scales to distinguish the level of the categorical variable for each point. It looks ok to me but a journal reviewer says this is not any use. I cannot afford to pay for colour prints. Any ideas on what is the best way to distinguish 10 groups on an xy scatter plot?

How about having *10* scatterplots, with an identical grid in each plot? Try example(coplot) for an idea of how it could look (ignore the marginal plots). Of course, do use the lattice or the ggplot2 package, not the coplot function. Too bad you have 10 groups and not 9 (or 12), BTW ... :-/

-- Karl Ove Hufthammer
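A minimal sketch of the ten-panel idea, assuming the lattice package is installed. The data frame d and its columns x, y and g are made up for illustration; nothing here comes from Andrew's actual data.

```r
library(lattice)

# Made-up data: 100 points, 10 groups of 10 (an assumption for illustration)
set.seed(1)
d <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  g = factor(rep(LETTERS[1:10], each = 10))
)

# One panel per level of g; lattice uses the same axis limits in every panel
# by default, so the groups stay comparable without any colour.
p <- xyplot(y ~ x | g, data = d, layout = c(5, 2), col = "black", pch = 16)
```

Calling print(p) (or just typing p at the prompt) draws the figure. A 5 x 2 layout is one way to live with 10 groups rather than 9 or 12.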
Re: [R] Is R the right choice for simulating first passage times of random walks?
Dear Dennis and Steve,

On Sunday, 31.07.2011, at 23:32 -0400, Steve Lianoglou wrote:
> […]
> How about trying to write this `f4` function below using the Rcpp/inline combo. The C/C++ you will need to write looks to be quite trivial, let's change f4 to accept an x argument as a vector:
>
> I've defined f4 in the same way as Dennis did:
>
> f4 <- function() {
>   x <- sample(c(-1L, 1L), 1)
>   if (x >= 0) {
>     return(1)
>   } else {
>     csum <- x
>     len <- 1
>     while (csum < 0) {
>       csum <- csum + sample(c(-1, 1), 1)
>       len <- len + 1
>     }
>   }
>   len
> }
>
> Now, let's do some inline/C++ mojo:
>
> library(inline)
> inc <- '
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
> '
> fxx <- cxxfunction(includes=inc, plugin="Rcpp", body='
>   int len = 1;
>   int x = ((rand() % 2) == 0) ? 1 : -1;
>   int csum = x;
>   while (csum < 0) {
>     x = ((rand() % 2) == 0) ? 1 : -1;
>     len++;
>     csum = csum + x;
>   }
>   return wrap(len);
> ')
>
> Assuming I've faithfully translated this into C++, the timings aren't all that comparable. Doing 500 replicates with the pure R version:
>
> set.seed(123)
> system.time(out <- replicate(500, f4()))
>    user  system elapsed
>  31.525   0.120  32.510
>
> Doing 10,000 replicates using the fxx function doesn't even break a sweat:
>
> system.time(outxx <- replicate(10000, fxx()))
>    user  system elapsed
>   0.371   0.001   0.373
>
> range(out)
> [1]       1 1994308
> range(outxx)
> [1]        1 11909394

Thank you very much for your suggestions. This is indeed a nice speed.

1. I first had that implemented in FORTRAN (and Python) too, but turned to R for two reasons. First, I wanted to use other distributions later on and thought that this would be easier with R, and that R would have them implemented as fast as possible. Second, I thought that R would also run faster given the right vectorization and using `csum()`. But I guess it is difficult to find a good model that uses the advantages of R. In particular, looking at `top` while running this example, the CPU is used 100 % but memory only 40 MB of 2 GB. So if one could use another data structure, maybe the calculations could be done on more walks at once.

2. It is indeed possible that the walk never returns to zero, so I should make sure that I abort the while loop after a certain length.

3. Looking at the data types I am wondering if some integer overflow(?) could happen. I could make the length variable unsigned I suppose [1]. But still `csum` could go from `-len` to 0, and for the normal random walk unsigned should not be a problem either, except that the logic/checks have to be adapted. For integrated random walks, `ccsum += csum`, `ccsum` would go from -(ccsum**2)/2 up to 0. So later on I should probably use the 64 bit data type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those problems. Memory does not seem to be a problem. Also I need to add an additional check for the height and length in the while loop like the following.

(csum < 0) && (csum > -ULONG_MAX) && (len <= ULONG_MAX)

So I came up with the following, and to use unsigned I only consider that the random walk stays positive instead of negative.

---8<--- code --->8---
library(inline)
inc <- '
#include <climits>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
'
f9 <- cxxfunction(includes=inc, plugin="Rcpp", body='
  unsigned long len = 1;
  if ((rand() % 2) == 0) {
    return wrap(len);
  }
  unsigned long x = 1;
  for (unsigned long csum = x; csum > 0; csum = ((rand() % 2) == 0) ? csum + 1 : csum - 1) {
    len++;
    if ((csum == ULONG_MAX) || (len == ULONG_MAX)) {
      return wrap(len);
    }
  }
  return wrap(len);
')
---8<--- code --->8---

I do not know if the compiler would have optimized it that way anyway and if there is any difference (besides the overflow checks).

set.seed(1); system.time( z9_1 <- replicate(1000, f9()) )
   user  system elapsed
  0.076   0.004   0.084
range(z9_1)
[1]       1 1449034
length(z9_1)
[1] 1000

Thanks,

Paul

[1] https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types
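On point 1, one way to lean on vectorization in plain R is to draw the steps in chunks and let cumsum() do the walking. This is only a rough sketch and has not been benchmarked against the Rcpp versions; first_passage, max_len and chunk are names made up here, and the cap is one possible form of the abort mentioned in point 2.

```r
# First passage length of a +/-1 random walk, following the same convention
# as f4: a first step of +1 counts as length 1, otherwise walk until 0 is hit.
first_passage <- function(max_len = 1000000L, chunk = 10000L) {
  x <- sample(c(-1L, 1L), 1)
  if (x >= 0L) return(1L)
  csum <- x
  len <- 1L
  while (len < max_len) {
    # Draw a whole chunk of steps at once instead of one step per iteration
    steps <- sample(c(-1L, 1L), chunk, replace = TRUE)
    path <- csum + cumsum(steps)
    hit <- which(path >= 0L)
    if (length(hit) > 0L) return(len + hit[1L])
    csum <- path[chunk]
    len <- len + chunk
  }
  NA_integer_  # gave up: the walk did not return within roughly max_len steps
}

set.seed(1)
z <- replicate(200, first_passage(max_len = 10000L, chunk = 1000L))
```

Walks that hit the cap come back as NA, so they can be counted or dropped explicitly. The chunk size trades memory for fewer R-level loop iterations.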
[R] Memory limit in Aggregate()
Dear all, I am trying to aggregate a table (divided in two lists here), but get a memory error. Here is the code I'm running : sessionInfo() print(paste(memory.limit() , memory.limit())) print(paste(memory.size() , memory.size())) print(paste(memory.size(TRUE) , memory.size(TRUE))) print(paste(size listX , object.size(listX))) print(paste(size listBy , object.size(listBy))) print(paste(length , object.size(nrow(listX tableAgg - aggregate(x = listX , by = listBy , FUN = max) It returns : R version 2.9.0 Patched (2009-05-09 r48513) i386-pc-mingw32 locale: LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 attached base packages: [1];stats;graphics;grDevices;utils;datasets;methods;base other attached packages: [1];RODBC_1.3-2;HarpTools_1.4;HarpReport_1.9 loaded via a namespace (and not attached): [1];tools_2.9.0 [1];memory.limit() 4095 [1];memory.size() 31.92 [1];memory.size(TRUE) 166.94 [1];size listX 218312 [1];size listBy 408552 [1];length 9083 Erreur in vector(list, prod(extent)) : cannot allocate vector of length 1224643220 (the last line is translated from the french error message impossible d'allouer un vecteur de longueur 1224643220 ) Why would R create such a long vector (my original lists , and is there a way to avoid this error ? Thank you for your help, Guillaume -- View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3711819.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
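The error message mentions vector("list", prod(extent)), which looks like an array being allocated with one cell per possible combination of the grouping levels, whether or not that combination occurs in the data. That reading is an assumption about the internals of that R version, not something confirmed in this thread. If it is right, one workaround is to collapse the grouping variables into a single key first, so that only observed combinations are ever materialised. A sketch on made-up data (by_vars, x and the "|" separator are hypothetical):

```r
set.seed(42)
n <- 1000
# Three made-up grouping variables with up to 200 levels each
by_vars <- data.frame(
  a = sample(200, n, replace = TRUE),
  b = sample(200, n, replace = TRUE),
  c = sample(200, n, replace = TRUE)
)
x <- runif(n)

# Number of cells a full cross-product of the levels would need
cells <- prod(sapply(by_vars, function(v) length(unique(v))))

# One key per observed combination; "|" is assumed not to occur in the values
key <- do.call(paste, c(by_vars, sep = "|"))
first <- !duplicated(key)

# Group-wise maximum over the single key, then re-attach the grouping columns
maxes <- tapply(x, key, max)
result <- cbind(by_vars[first, ], max = as.vector(maxes[key[first]]))
```

Here cells is in the millions while result has at most 1000 rows, which is the kind of gap that would explain a request for a vector of length 1224643220 from roughly 9000 input rows.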
[R] efficient way to reduce running time
Dear R users, Would you please tell me how to avoid the for loop below? I think there might be a better way to reduce running time.

## y1 and y2 are n*1 vectors
for (k in 1:n){
  ymax <- max( y1[k], y2[k] )
  i <- 0:ymax
  sums <- -lgamma(y1[k]-i+1)-lgamma(i+1)-lgamma(y2[k]-i+1)
  maxsums <- max(sums)
  sums <- sums - maxsums
  lsum <- log( sum(exp(sums)) ) + maxsums
  logbp[k] <- y1[k] + y2[k] + lsum
}

Any suggestion will be greatly appreciated. Regards, Kathryn Lord
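Because the inner vector 0:ymax has a different length for every k, the loop does not collapse into a single vectorized expression. One tidy option is to wrap the body in a function and map it over the pairs. This is a sketch under assumptions: y1 and y2 below are made-up non-negative integer counts, log_bp_one is a hypothetical name, and no speed-up is claimed beyond removing the explicit bookkeeping.

```r
# Body of the original loop for a single pair (a, b), using the same
# log-sum-exp trick to keep the sum numerically stable.
log_bp_one <- function(a, b) {
  i <- 0:max(a, b)
  s <- -lgamma(a - i + 1) - lgamma(i + 1) - lgamma(b - i + 1)
  m <- max(s)
  a + b + m + log(sum(exp(s - m)))
}

# Made-up example data
y1 <- c(3, 0, 5, 2)
y2 <- c(1, 4, 5, 0)

logbp <- mapply(log_bp_one, y1, y2)
```

If many (y1, y2) pairs repeat, computing log_bp_one once per unique pair and looking the result up would likely save more time than any rewrite of the loop itself.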
[R] Standard Deviation of a matrix
Hello, My R knowledge could not take me any further, so this request ! I have a matrix of dimensions (1185 X 1185). I want to calculate standard deviation of entire matrix. sd function of {stats} calculates standard deviation for each row/column, giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as result. Any ideas, how to do this ? TIA Chakri -- View this message in context: http://r.789695.n4.nabble.com/Standard-Deviation-of-a-matrix-tp3711991p3711991.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using Function
Hi, I have some simple statistics to calculate for a large number of variables. I created a simple function to apply to variables. I would like the variable name to be placed automatically. I tried the following function but is not working. desc = function(x){ media = mean(x, na.rm=T) desvio = sd(x, na.rm=T) cv = desvio/media*100 saida = cbind(media, desvio, cv) colnames(saida) = c(NULL, 'Média', 'Desvio', 'CV') rownames(saida) = c(x) saida } desc(Idade) Média Desvio CV Idade 44.04961 16.9388 38.4539 How do you get the variable name is placed as the first element? My objective is get something like: rbind( desc(Altura), desc(Idade), desc(IMC), desc(FC), desc(CIRCABD), desc(GLICOSE), desc(UREIA), desc(CREATINA), desc(CTOTAL), desc(CHDL), desc(CLDL), desc(CVLDL), desc(TRIG), desc(URICO), desc(SAQRS), desc(SOKOLOW_LYON), desc(CORNELL), desc(QRS_dur), desc(Interv_QT) ) Thanks a lot, -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Standard Deviation of a matrix
Hi! The sample below should give you what you want: M = matrix(runif(100), 10, 10) sd(as.numeric(M)) So the as.numeric command is the key. It transforms the matrix to a 1D vector. Or alternatively without using as.numeric: M = matrix(runif(100), 10, 10) M dim(M) = 100 M sd(M) Here I use the dim command to set the dimensions to a vector of 100 long. cheers, Paul On 08/02/2011 11:07 AM, chakri wrote: Hello, My R knowledge could not take me any further, so this request ! I have a matrix of dimensions (1185 X 1185). I want to calculate standard deviation of entire matrix. sd function of {stats} calculates standard deviation for each row/column, giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as result. Any ideas, how to do this ? TIA Chakri -- View this message in context: http://r.789695.n4.nabble.com/Standard-Deviation-of-a-matrix-tp3711991p3711991.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Errors, driving me nuts
On 08/01/2011 08:47 PM, Matt Curcio wrote: Greetings all, I am getting this error that is driving me nuts... (not a long trip, haha) I have a set of files and in these files I want to calculate ttests on rows 'compareA' and 'compareB' (these will change over time there I want a variable here). Also these files are in many different directories so I want a way filter out the junk... Anyway I don't believe that this is related to my errors but I mention it none the less. files_to_test - list.files (pattern = kegg.combine) for (i in 1:length (files_to_test)) { +raw_data - read.table (files_to_test[i], header=TRUE, sep= ) +tmpA - raw_data[,compareA] +tmpB - raw_data[,compareB] +tt - t.test (tmpA, tmpB, var.equal=TRUE) +tt_pvalue[i] - tt$p.value + } Error in tt_pvalue[i] - tt$p.value : object 'tt_pvalue' not found # I tried setting up a vector... # as.vector(tt_pvalue, mode=any) ### but NO GO ...an awesome alternative is to use ldply from the plyr package: library(plyr) files_to_test - list.files (pattern = kegg.combine) tt_pvalue - ldply(files_to_test, function(fname) { raw_data - read.table (files_to_test[i], header=TRUE, sep= ) tmpA - raw_data[,compareA] tmpB - raw_data[,compareB] tt - t.test (tmpA, tmpB, var.equal=TRUE) return(data.frame(fname = fname, pvalue = tt$p.value)) }, .progress = TRUE) This saves you some bookkeeping (no need to create tt_pvalue in advance and keep track of the iterator (i)) and you get a nice progress bar (good when loops take long). ldply (and other plyr functions) are what I use most when processing large amounts of information. cheers, Paul file.name = paste(ttest.results., compareA, compareB, ) setwd(save_to) write.table(tt_pvalue, file=file.name, sep=\t ) Error in inherits(x, data.frame) : object 'tt_pvalue' not found # No idea?? What is going wrong?? 
M Matt Curcio M: 401-316-5358 E: matt.curcio...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
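For the record, the original error is just that tt_pvalue is assigned into before it exists. Here is a self-contained sketch of two base-R fixes: preallocating the vector, or letting vapply() collect the results. The temporary files, the column names A and B, and the values of compareA and compareB are all made up so that the example runs on its own. Note that inside an anonymous function the file should be read via the function's own argument (fname here), not via a loop index i that no longer exists.

```r
# Hypothetical demo files written to a temporary directory
demo_dir <- tempfile("ttest_demo")
dir.create(demo_dir)
set.seed(1)
for (k in 1:3) {
  write.table(
    data.frame(A = rnorm(10), B = rnorm(10, mean = 0.5)),
    file = file.path(demo_dir, sprintf("kegg.combine.%d.txt", k)),
    row.names = FALSE
  )
}

compareA <- "A"   # assumed column names
compareB <- "B"
files_to_test <- list.files(demo_dir, pattern = "kegg.combine", full.names = TRUE)

# Option 1: create the result vector before the loop
tt_pvalue <- numeric(length(files_to_test))
for (i in seq_along(files_to_test)) {
  raw_data <- read.table(files_to_test[i], header = TRUE)
  tt <- t.test(raw_data[, compareA], raw_data[, compareB], var.equal = TRUE)
  tt_pvalue[i] <- tt$p.value
}

# Option 2: no bookkeeping, one p-value per file
tt_pvalue2 <- vapply(files_to_test, function(fname) {
  raw_data <- read.table(fname, header = TRUE)
  t.test(raw_data[, compareA], raw_data[, compareB], var.equal = TRUE)$p.value
}, numeric(1))
```

Either result can then be handed to write.table(), since the object now exists.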
Re: [R] Standard Deviation of a matrix
Hi

> Hi! The sample below should give you what you want:
>
> M = matrix(runif(100), 10, 10)
> sd(as.numeric(M))
>
> So the as.numeric command is the key. It transforms the matrix to a 1D vector. Or alternatively without using as.numeric:
>
> M = matrix(runif(100), 10, 10)
> M
> dim(M) = 100

or

dim(M) <- NULL

> M
> sd(M)
>
> Here I use the dim command to set the dimensions to a vector of 100 long.
>
> cheers, Paul
>
> On 08/02/2011 11:07 AM, chakri wrote:
>> Hello, My R knowledge could not take me any further, so this request ! I have a matrix of dimensions (1185 X 1185). I want to calculate standard deviation of entire matrix. sd function of {stats} calculates standard deviation for each row/column, giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as result. Any ideas, how to do this ? TIA Chakri
Re: [R] Standard Deviation of a matrix
On Aug 2, 2011, at 8:48 AM, Petr PIKAL wrote:
> Hi
>
>> Hi! The sample below should give you what you want:
>>
>> M = matrix(runif(100), 10, 10)
>> sd(as.numeric(M))
>>
>> So the as.numeric command is the key. It transforms the matrix to a 1D vector. Or alternatively without using as.numeric:
>>
>> M = matrix(runif(100), 10, 10)
>> M
>> dim(M) = 100
>
> or
>
> dim(M) <- NULL

shortest would surely be:

sd( c(M) )

-- David.

>> M
>> sd(M)
>>
>> Here I use the dim command to set the dimensions to a vector of 100 long.
>>
>> cheers, Paul
>>
>> On 08/02/2011 11:07 AM, chakri wrote:
>>> Hello, My R knowledge could not take me any further, so this request ! I have a matrix of dimensions (1185 X 1185). I want to calculate standard deviation of entire matrix. sd function of {stats} calculates standard deviation for each row/column, giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as result. Any ideas, how to do this ? TIA Chakri

David Winsemius, MD West Hartford, CT
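To put the three suggestions from this thread side by side, here is a small check on a made-up 10 x 10 matrix. All three flatten the matrix to a plain vector before sd() sees it, so they should agree; apply() is shown only for contrast with the per-column result chakri was getting.

```r
set.seed(1)
M <- matrix(runif(100), 10, 10)

s1 <- sd(as.numeric(M))   # Paul's first suggestion
s2 <- sd(c(M))            # David's shortest form

M2 <- M
dim(M2) <- NULL           # Petr's variant: drop the dim attribute
s3 <- sd(M2)

# Per-column standard deviations, for comparison
col_sds <- apply(M, 2, sd)
```

The single number in s1, s2 and s3 is the standard deviation of all 100 entries, while col_sds has one value per column.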
[R] Problem Installing/Uninstalling Rattle
Rattle won't install properly on my Windows 7 64 bit laptop. Here is what I've tried:

I've followed the instructions here: http://rattle.togaware.com/rattle-install-mswindows.html I had R installed already. I downloaded the GTK+ packages and unzipped the 32 bit one into c:\gtkwin32. I put c:\gtkwin32\bin in the system variables PATH. I launched R, installed the rattle package, called the rattle library, called rattle(). It told me RGtk2 could not be found and asked to install it. I let it download and install it, but still nothing. Restarting/reinstalling R has not helped.

And when I try remove.packages(rattle) I get the error:

Removing package(s) from ‘C:/Users/darwish/Documents/R/win-library/2.13’ (as ‘lib’ is unspecified)
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

I've restarted R multiple times before trying anything. From what I understand, I need to clean everything off and start anew. How do I remove rattle so I can start fresh? What did I do wrong in my steps? Thanks in advance.
[R] Functions for Sum of determinants of ranges of matrix subsets
Dear R-help list, Please, I have this problem. Suppose I have a matrix of size n x n, say, generated as follows:

z <- matrix(rnorm(n*n,0,1),nrow=n)

I want to write a function such that, for i in 1:n, I remove the row and column corresponding to i (so I am left with an (n-1) x (n-1) submatrix in each case). Now I need the sum of the determinants of these submatrices. As an example, if n=3, it means I will have det(1st row and 1st column removed) + det(2nd row and 2nd column removed) + det(3rd row and 3rd column removed). Any help will be appreciated. Thanks John
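One straightforward way to do what John describes is negative indexing: z[-i, -i] drops row i and column i in one step. A minimal sketch; sum_principal_minors is a made-up name, and drop = FALSE is there so the 2 x 2 case still yields 1 x 1 matrices rather than bare numbers.

```r
# Sum over i of det(z with row i and column i removed)
sum_principal_minors <- function(z) {
  stopifnot(is.matrix(z), nrow(z) == ncol(z))
  minors <- vapply(
    seq_len(nrow(z)),
    function(i) det(z[-i, -i, drop = FALSE]),
    numeric(1)
  )
  sum(minors)
}

# Small check case: for a diagonal matrix each minor is a product of the
# remaining diagonal entries, so the sum is 3*5 + 2*5 + 2*3.
d <- diag(c(2, 3, 5))
res_diag <- sum_principal_minors(d)

# The example from the question
set.seed(1)
n <- 3
z <- matrix(rnorm(n * n, 0, 1), nrow = n)
res_z <- sum_principal_minors(z)
```

This calls det() n times, which is fine for moderate n. For very large matrices a different approach might be worth looking into, but that is beyond this sketch.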
[R] execute r-code stored in a string variable
Dear all, I have a simple R question. How do I execute R code stored in a variable? E.g. if I have a variable which contains some R code:

c = "reg <- lm(sales$sales~sales$price)"

Is it possible to execute c, e.g. like Exec(c)? I hope someone can help. Thank you Kim Lillesøe
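The usual base-R answer is parse() followed by eval(): parse(text = ...) turns the string into an unevaluated expression and eval() runs it. A minimal sketch; the sales data frame is made up, and the variable is called code here rather than c so that it does not mask the c() function.

```r
# Made-up data so the example is self-contained
sales <- data.frame(
  price = c(1, 2, 3, 4, 5),
  sales = c(10.1, 8.2, 5.9, 4.1, 2.0)
)

code <- "reg <- lm(sales$sales ~ sales$price)"

# parse() builds the expression, eval() executes it in the calling environment
eval(parse(text = code))

slope <- coef(reg)[[2]]
```

After the eval() call, reg exists just as if the line had been typed at the prompt. Running code held in strings is easy to get wrong and hard to debug, so it may be worth checking whether a function or a formula object would do the job instead.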
[R] Odp: Using Function
Hi

> Hi, I have some simple statistics to calculate for a large number of variables. I created a simple function to apply to variables. I would like the variable name to be placed automatically. I tried the following function but is not working.
>
> desc = function(x){
>   media = mean(x, na.rm=T)
>   desvio = sd(x, na.rm=T)
>   cv = desvio/media*100
>   saida = cbind(media, desvio, cv)
>   colnames(saida) = c(NULL, 'Média', 'Desvio', 'CV')
>   rownames(saida) = c(x)
>   saida
> }

You are quite close. This seems to do what you want if I presume that your variables are located in a data frame:

desc = function(x){
  media = mean(x, na.rm=T)
  desvio = sd(x, na.rm=T)
  cv = desvio/media*100
  saida = data.frame(Media=media, Desvio=desvio, CV=cv)
  saida
}

iris4 <- iris[,1:4]
sapply(iris4, desc)
       Sepal.Length Sepal.Width Petal.Length Petal.Width
Media  5.843333     3.057333    3.758        1.199333
Desvio 0.8280661    0.4358663   1.765298     0.7622377
CV     14.17113     14.25642    46.97441     63.55511

If you want to switch rows and cols use t(sapply(iris4, desc))

Regards Petr

> desc(Idade)
>         Média  Desvio      CV
> Idade 44.04961 16.9388 38.4539
>
> How do you get the variable name is placed as the first element? My objective is get something like:
>
> rbind( desc(Altura), desc(Idade), desc(IMC), desc(FC), desc(CIRCABD), desc(GLICOSE), desc(UREIA), desc(CREATINA), desc(CTOTAL), desc(CHDL), desc(CLDL), desc(CVLDL), desc(TRIG), desc(URICO), desc(SAQRS), desc(SOKOLOW_LYON), desc(CORNELL), desc(QRS_dur), desc(Interv_QT) )
>
> Thanks a lot,
> -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346
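If the variables are free-standing vectors rather than columns of a data frame, the name of the argument can be captured inside the function with deparse(substitute(x)). This is a different technique from the sapply() approach above, sketched here with made-up values for Idade and Altura.

```r
desc <- function(x) {
  # Name of whatever was passed as x, e.g. "Idade" for desc(Idade)
  nome <- deparse(substitute(x))
  media <- mean(x, na.rm = TRUE)
  desvio <- sd(x, na.rm = TRUE)
  saida <- cbind(Media = media, Desvio = desvio, CV = desvio / media * 100)
  rownames(saida) <- nome
  saida
}

# Made-up data for illustration
Idade  <- c(23, 45, 67, 31, NA, 54)
Altura <- c(1.62, 1.75, 1.80, 1.68, 1.71, NA)

tab <- rbind(desc(Idade), desc(Altura))
```

Each call returns a one-row matrix whose row name is the variable name, so rbind() stacks them into the table Silvano describes. The capture only works when desc() is called directly with the variable; inside sapply() or lapply() it would see the internal argument name instead.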
Re: [R] Clean up a scatterplot with too much data
In addition to the other responses (all of which I liked), a couple of other alternatives to consider are 2D density plots (see ?kde2d in the MASS package, for example) or geom_tile() in the ggplot2 package, which you can think of as a 3D histogram projected to 2D with color corresponding to (relative) frequency, as suggested by Paul Hiemstra. geom_tile() is a discretized, gridded version of a hexbin plot, but I would start with the hexbin myself. I echo KOH's comment: make sure you remove the outliers first, especially that one in the upper left corner :) After looking at your plot, here's my question: why would you plot kills/minute vs. minutes played? Doesn't the first variable render the second one moot? Wouldn't kills vs. minutes played be a more relevant (scatter)plot? If you have information on the skill level of the players, you could incorporate that information into the plot as well. There are several nice ways to go if this is the case. If kills/minute is the more appropriate measure, a univariate density plot would make sense, or a histogram. HTH, Dennis On Mon, Aug 1, 2011 at 10:26 PM, DimmestLemming nicoadams...@gmail.com wrote: I'm working with a lot of data right now, but I'm new to R, and not very good with it, hence my request for help. What type of graph could I use to straighten out things like... http://r.789695.n4.nabble.com/file/n3711389/Untitled.png ...this? I want to see general frequencies. Should I use something like a 3D histogram, or is there an easier way like, say, shading? I'm sure these are both possible, but I don't know which is easiest or how to implement either of them. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/Clean-up-a-scatterplot-with-too-much-data-tp3711389p3711389.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] Clean up a scatterplot with too much data
On 08/02/2011 01:07 PM, Dennis Murphy wrote: In addition to the other responses (all of which I liked), a couple of other alternatives to consider are 2D density plots (see ?kde2d in the MASS package, for example) or geom_tile() in the ggplot2 package, which you can think of as a 3D histogram projected to 2D with color corresponding to (relative) frequency, as suggested by Paul Hiemstra. geom_tile() is a discretized, gridded version of a hexbin plot, but I When using geom_tile you need to bin the data yourself. I much prefer using stat_bin2d which does all the work for you. cheers, Paul would start with the hexbin myself. I echo KOH's comment: make sure you remove the outliers first, especially that one in the upper left corner :) After looking at your plot, here's my question: why would you plot kills/minute vs. minutes played? Doesn't the first variable render the second one moot? Wouldn't kills vs. minutes played be a more relevant (scatter)plot? If you have information on the skill level of the players, you could incorporate that information into the plot as well. There are several nice ways to go if this is the case. If kills/minute is the more appropriate measure, a univariate density plot would make sense, or a histogram. HTH, Dennis On Mon, Aug 1, 2011 at 10:26 PM, DimmestLemming nicoadams...@gmail.com wrote: I'm working with a lot of data right now, but I'm new to R, and not very good with it, hence my request for help. What type of graph could I use to straighten out things like... http://r.789695.n4.nabble.com/file/n3711389/Untitled.png ...this? I want to see general frequencies. Should I use something like a 3D histogram, or is there an easier way like, say, shading? I'm sure these are both possible, but I don't know which is easiest or how to implement either of them. Thanks! 
-- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
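To make the "bin the data yourself" step concrete, here is a base-R sketch of what a 2D binning amounts to: cut both variables into intervals and count the points per cell. The data are simulated stand-ins for minutes played and kills per minute, and the choice of 30 bins per axis is arbitrary.

```r
set.seed(1)
n <- 5000
# Simulated stand-ins for the poster's variables (assumptions, not real data)
minutes <- rexp(n, rate = 1 / 300)
kills_per_min <- rgamma(n, shape = 2, rate = 4)

# Cut each axis into 30 equal-width intervals and cross-tabulate
xb <- cut(minutes, breaks = 30)
yb <- cut(kills_per_min, breaks = 30)
counts <- table(xb, yb)
```

The counts table can be drawn with image(log1p(counts)) in base graphics, where the log scale keeps a few dense cells from washing out the rest. With ggplot2, stat_bin2d does the binning and the drawing in one go, as Paul says.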
Re: [R] Identifying US holidays
Now that I'm back at my computer, I'll actually suggest you do something else entirely. If you look at the code of holidayNYSE(), or call listHolidays() from the timeDate package, you'll see that there are many, many functions that get every conceivable holiday directly. I'll let you pick the holidays you want, but a simple script might look like this:

library(timeDate)
x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by="day")

GetHolidays <- function(x) {
  years = as.POSIXlt(x)$year+1900
  years = unique(years)
  holidays <- NULL
  for (y in years) {
    # If you don't need the if/then statements to include which years
    # something was a NYSE holiday, you should drop the loop since the
    # holiday functions are vectorized
    if (y >= 1885) holidays <- c(holidays, as.character(USNewYearsDay(y)))
    if (y >= 1885) holidays <- c(holidays, as.character(USIndependenceDay(y)))
    if (y >= 1885) holidays <- c(holidays, as.character(USThanksgivingDay(y)))
    if (y >= 1885) holidays <- c(holidays, as.character(USChristmasDay(y)))
  }
  holidays = as.Date(holidays, format="%Y-%m-%d")
  ans = x %in% holidays
  return(ans)
}

This should return a boolean vector indicating which dates fall on the selected holidays: feel free to add/delete holidays as you wish. To get the actual holiday dates, this should work: x[GetHolidays(x)]. If you want to identify things by holiday, you'll only have to modify the script slightly. Let me know if I can help further!

Michael Weylandt

On Mon, Aug 1, 2011 at 4:57 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
> To be specific, I only need to get rid of 2 NYSE holidays: Washington's Birthday and Good Friday. Is there a way to reduce the vector of NYSE holidays in timeDate by throwing out those two? Thank you! Dimitri
>
> On Mon, Aug 1, 2011 at 4:24 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
>> Don't know if this is sufficiently slick for this list (which never fails to impress me with quick and elegant solutions) but I would point out to you that GF is the only NYSE holiday falling in March or April so it shouldn't be hard to discard it if desired. Michael Weylandt
>>
>> On Aug 1, 2011, at 4:18 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
>>> Just to clarify - I realize that major is subjective here. Maybe I should say most common. But maybe there is a way for me to select from a list of all NYSE holidays and flag only some of them? Just not sure how to do it... Thanks! Dimitri
>>>
>>> On Mon, Aug 1, 2011 at 3:45 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
>>>> Hello! I am trying to identify which ones of a vector of dates are US holidays. And, ideally, which is which. And I do not know (a-priori) which dates those should be. I have, for example:
>>>>
>>>> x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by="day")
>>>> (x)
>>>>
>>>> I think chron should help me here - but maybe I am not using it properly:
>>>>
>>>> library(chron)
>>>> is.holiday(chron) # Says that none of those dates are holidays
>>>>
>>>> ?is.holiday says: holidays is an object that should be listing holidays. But I want to figure out which of my dates are US holidays and don't want to provide a list of
>>>>
>>>> Package timeDate does almost what I need:
>>>>
>>>> library(timeDate)
>>>> holidayNYSE(2008:2010)
>>>> holidayNYSE()
>>>>
>>>> However, I don't need all the NYSE holidays (like Good Friday). Just the major US holidays - New Years, MLK, Memorial Day, Independence Day, Labor Day, Halloween, Thanksgiving, Christmas. Is there any way to identify major US holidays? Thanks a lot!
>>>>
>>>> -- Dimitri Liakhovitski marketfusionanalytics.com
Re: [R] efficient way to reduce running time
Hi:

Could you please provide a reproducible example? In your code, (i) n is undefined; (ii) logbp is undefined. A description of what you want to do and/or a reproducible example with an expected outcome would be useful. As the bottom of each e-mail to R-help says:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Dennis

On Tue, Aug 2, 2011 at 4:05 AM, Kathie kathryn.lord2...@gmail.com wrote:

Dear R users, Would you plz tell me how to avoid this for loop below? I think there might be a better way to reduce running time.

    ## y1 and y2 are n*1 vectors
    for (k in 1:n) {
      ymax <- max(y1[k], y2[k])
      i <- 0:ymax
      sums <- -lgamma(y1[k]-i+1) - lgamma(i+1) - lgamma(y2[k]-i+1)
      maxsums <- max(sums)
      sums <- sums - maxsums
      lsum <- log(sum(exp(sums))) + maxsums
      logbp[k] <- y1[k] + y2[k] + lsum
    }

Any suggestion will be greatly appreciated. Regards, Kathryn Lord
Re: [R] Identifying US holidays
Thanks a lot, Michael - that's exactly what I was looking for! Dimitri
Re: [R] Functions for Sum of determinants of ranges of matrix subsets
Hi:

Try this:

    > z <- matrix(rnorm(100), nrow = 10)
    > sum(sapply(seq_len(nrow(z)), function(k) det(z[-k, -k])))
    [1] 1421.06

where

    > sapply(seq_len(nrow(z)), function(k) det(z[-k, -k]))
     [1]  432.11613   81.65449  516.95791   54.72775  804.32097 -643.35436
     [7] -411.15932  394.18780   84.13173  107.47665

HTH, Dennis

On Tue, Aug 2, 2011 at 5:18 AM, john james dnt...@yahoo.com wrote:

Dear R-help list, Pls I have this problem. Suppose I have a matrix of size n x n, say, generated as follows:

    z <- matrix(rnorm(n*n, 0, 1), nrow = n)

I want to write a function such that for i in 1:n, I will remove the row and column corresponding to i (so I will be left with an (n-1) x (n-1) submatrix in each case). Now I need the sum of the determinants of these submatrices. As an example, if n=3, it means I will have det(1st row and 1st column removed) + det(2nd row and 2nd column removed) + det(3rd row and 3rd column removed). Any help will be appreciated. Thanks, John
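A minimal sketch that wraps the one-liner above in a reusable function, as the original poster asked for. The name sum_minor_dets is made up for illustration, and drop = FALSE is an addition (not in Dennis's reply) so that a 2 x 2 input still produces 1 x 1 matrices that det() accepts.

```r
# Sum of determinants of the submatrices obtained by deleting
# row k and column k, for k = 1..n (hypothetical helper name).
sum_minor_dets <- function(z) {
  stopifnot(is.matrix(z), nrow(z) == ncol(z), nrow(z) >= 2)
  dets <- vapply(
    seq_len(nrow(z)),
    function(k) det(z[-k, -k, drop = FALSE]),  # keep matrix shape for n = 2
    numeric(1)
  )
  sum(dets)
}

# Easy-to-check case: for a diagonal matrix each minor is a product
# of the remaining diagonal entries: 2*3 + 1*3 + 1*2.
z <- diag(c(1, 2, 3))
sum_minor_dets(z)
```

vapply() is used instead of sapply() only so that the result type is fixed; the computation is the same.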
Re: [R] execute r-code stored in a string variable
Hi Kim,

You can use

    eval(parse(text = c))

Best, Ista

On Tue, Aug 2, 2011 at 8:22 AM, Kim Lillesøe k...@dataminds.dk wrote:

Dear all, I have a simple R question. How do I execute R code stored in a variable? E.g. if I have a variable which contains some R code:

    c = "reg <- lm(sales$sales~sales$price)"

Is it possible to execute c, e.g. like Exec(c)? I hope someone can help. Thank you, Kim Lillesøe

-- Ista Zahn, Graduate student, University of Rochester, Department of Clinical and Social Psychology, http://yourpsyche.org
Re: [R] Memory limit in Aggregate()
On Aug 2, 2011, at 11:45, Guillaume wrote:

Dear all, I am trying to aggregate a table (divided in two lists here), but get a memory error. Here is the code I'm running:

    sessionInfo()
    print(paste("memory.limit() ", memory.limit()))
    print(paste("memory.size() ", memory.size()))
    print(paste("memory.size(TRUE) ", memory.size(TRUE)))
    print(paste("size listX ", object.size(listX)))
    print(paste("size listBy ", object.size(listBy)))
    print(paste("length ", object.size(nrow(listX))))
    tableAgg <- aggregate(x = listX, by = listBy, FUN = max)

It returns:

    R version 2.9.0 Patched (2009-05-09 r48513)
    i386-pc-mingw32
    locale: LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
    attached base packages: [1];stats;graphics;grDevices;utils;datasets;methods;base
    other attached packages: [1];RODBC_1.3-2;HarpTools_1.4;HarpReport_1.9
    loaded via a namespace (and not attached): [1];tools_2.9.0
    [1];memory.limit() 4095
    [1];memory.size() 31.92
    [1];memory.size(TRUE) 166.94
    [1];size listX 218312
    [1];size listBy 408552
    [1];length 9083
    Erreur in vector("list", prod(extent)) : cannot allocate vector of length 1224643220

(the last line is translated from the French error message "impossible d'allouer un vecteur de longueur 1224643220")

Why would R create such a long vector (my original lists , and is there a way to avoid this error?

It would be easier if you described your data rather than just telling us their size, but as far as I can see, listX has about 50K columns and listBy has 100K. So you are trying to form a table of the max of 5 variables over the cartesian product of 10 classifiers? That's basically an infinite number of cells.

Thank you for your help, Guillaume

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark. Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg
[R] Help with aggregate syntax for a multi-column function please.
Dear R-experts:

I am using a function called AUC whose arguments are data, time, id, and dv. data is the name of the dataframe, time is the independent variable column name, id is the subject id and dv is the dependent variable. The function computes the area under the curve by the trapezoidal rule, for each subject id. I would like to embed this in aggregate to further subset by each Cycle, DoseDayNominal and Drug, but I can't seem to get the aggregate syntax correct. All the examples I can find use a single-column function such as mean, whereas this AUC function requires four arguments. Could someone kindly show me the syntax? This is what I've tried so far:

    AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                        function(x, tm, pt, conc) {AUC(x)},
                        tm=TimeBestEstimate, pt=Pt, conc=ConcentrationBQLzero)

    AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                        function(x) {AUC(x, TimeBestEstimate, Pt, ConcentrationBQLzero)})

AUC syntax is:

    > args(AUC)
    function (data, time = TIME, id = ID, dv = DV)

Thanks. Regards, Michael
[R] identifying weeks (dates) that certain days (dates) fall into
Hello! I have dates for the beginning of each week, e.g.:

    weekly <- data.frame(week=seq(as.Date("2010-04-01"), as.Date("2011-12-26"), by="week"))
    week  # each week starts on a Monday

I also have a vector of dates I am interested in, e.g.:

    july4 <- as.Date(c("2010-07-04", "2011-07-04"))

I would like to flag the weeks in my weekly$week that contain those 2 individual dates. I can only think of a very clumsy way of doing it:

    myrows <- c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
                which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
    weekly$flag <- 0
    weekly$flag[myrows] <- 1

It's clumsy because my actual vector of dates of interest (july4 above) is much longer. Is there maybe a more elegant way of doing it? Thank you!

-- Dimitri Liakhovitski marketfusionanalytics.com
[R] matrix indexing (igraph ?)
I realize that matrix indexing has been addressed in various flavors, but I'm stumped and didn't find anything in the archives. It's not clear if it is an igraph issue or a more general problem. Thanks in advance for your patience.

I am using igraph to read a gml file (http://www-personal.umich.edu/~mejn/netdata/football.zip). The gml file contains vertex attributes (conference and team) that are provided as character/integer values. I would like to build a matrix of dimension (length.team, length.conference) where the elements are zero except for 1's at the location of index [team, conference]. Here is a snippet of code that hopefully captures what I am trying to do:

    original <- read.graph("./Data/football/football.gml", format="gml")
    conf.list <- get.vertex.attribute(original, 'value', index=V(original))+1
    team.list <- get.vertex.attribute(original, 'id', index=V(original))+1
    temp <- matrix(0,115,12)
    temp[team.list, conf.list] <- 1

Unfortunately, temp[] is filled with 1's. However, if I try:

    c.list=c(1,3,5)
    t.list=c(2,4,6)
    temp[t.list,c.list] <- 1

then things work as I would expect. FWIW - I have tried as.integer(get.vertex.attribute(...)) with no luck. Thanks for any suggestions.

    > original <- read.graph("./Data/football/football.gml", format="gml")
    > conf.list <- get.vertex.attribute(original, 'value', index=V(original))+1
    > team.list <- get.vertex.attribute(original, 'id', index=V(original))+1
    > conf.list
      [1] 8 1 3 4 8 4 3 9 9 8 4 11 7 3 7 3 8 10 7 2 10 9 9 8 11 1 7 10 12 2 2 7 3 1 7 2 6
     [38] 1 7 3 4 8 6 7 5 1 12 3 5 12 11 9 4 12 7 2 10 5 12 11 3 7 10 11 3 10 5 12 9 11 10 7 4 12
     [75] 4 5 10 9 9 2 6 4 6 12 4 7 5 10 12 1 6 5 5 8 2 10 10 11 4 7 3 2 4 1 8 1 3 4 9 1 5
    [112] 9 5 10 12
    > team.list
      [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
     [28] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
     [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
     [82] 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
    [109] 109 110 111 112 113 114 115
    > length(conf.list)
    [1] 115
    > length(team.list)
    [1] 115
    > temp <- matrix(0,115,12)
    > r <- c(1,3,5)
    > col <- c(2,4,6)
    > temp[r,col] <- 1
    > temp[1:10,]
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
     [1,]    0    1    0    1    0    1    0    0    0     0     0     0
     [2,]    0    0    0    0    0    0    0    0    0     0     0     0
     [3,]    0    1    0    1    0    1    0    0    0     0     0     0
     [4,]    0    0    0    0    0    0    0    0    0     0     0     0
     [5,]    0    1    0    1    0    1    0    0    0     0     0     0
     [6,]    0    0    0    0    0    0    0    0    0     0     0     0
     [7,]    0    0    0    0    0    0    0    0    0     0     0     0
     [8,]    0    0    0    0    0    0    0    0    0     0     0     0
     [9,]    0    0    0    0    0    0    0    0    0     0     0     0
    [10,]    0    0    0    0    0    0    0    0    0     0     0     0
    > temp[team.list,conf.list] <- 1
    > temp[1:10,]
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
     [1,]    1    1    1    1    1    1    1    1    1     1     1     1
     [2,]    1    1    1    1    1    1    1    1    1     1     1     1
     [3,]    1    1    1    1    1    1    1    1    1     1     1     1
     [4,]    1    1    1    1    1    1    1    1    1     1     1     1
     [5,]    1    1    1    1    1    1    1    1    1     1     1     1
     [6,]    1    1    1    1    1    1    1    1    1     1     1     1
     [7,]    1    1    1    1    1    1    1    1    1     1     1     1
     [8,]    1    1    1    1    1    1    1    1    1     1     1     1
     [9,]    1    1    1    1    1    1    1    1    1     1     1     1
    [10,]    1    1    1    1    1    1    1    1    1     1     1     1
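The transcript above is consistent with how R treats two separate index vectors: temp[rows, cols] addresses every combination of the listed rows and columns, not the element-wise pairs. Since team.list covers all 115 rows and conf.list covers all 12 columns, every cell is set. A minimal sketch of pairwise indexing with a two-column index matrix follows; the small vectors are made-up stand-ins for the igraph attributes, not the real data.

```r
# Stand-in data: 4 teams, 3 conferences (hypothetical values).
team.list <- c(1, 2, 3, 4)
conf.list <- c(2, 1, 2, 3)

temp <- matrix(0, nrow = 4, ncol = 3)

# A two-column matrix index selects one cell per row of the index:
# (1,2), (2,1), (3,2), (4,3).
temp[cbind(team.list, conf.list)] <- 1
temp

# For comparison, two separate vectors select the full cross product.
full <- matrix(0, nrow = 4, ncol = 3)
full[team.list, conf.list] <- 1
full
```

With the poster's objects the same idea would read temp[cbind(team.list, conf.list)] <- 1, assuming both vectors hold valid positive integer indices of equal length.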
[R] merging lists within lists via time stamp
From multiple data.frames I created two lists, one with temperature data and one with gps data. With your help and lapply I managed to interpolate the timestamps of the gps and temperature data. Now I want to merge/join both lists via the time stamp, taking only times where both lists have data. For the single data frames that worked just fine with:

    both <- merge(gps, temp)

For the two lists of data.frames I first tried an lapply over both lists, something like

    both <- lapply(temp, gps, function(x){x <- merge

Then I found

    both <- merge.list(gps, temp)

but this doesn't work either. It just transfers the first list, gps, to both.

Thanks for any hint, Thomas
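One possible approach, sketched under assumptions the post does not confirm: both lists have the same length and order, and the data frames share a timestamp column of the same name (called time here purely for illustration). lapply() walks a single list, whereas Map() (a wrapper around mapply() with SIMPLIFY = FALSE) walks several lists in parallel, so merge() can be applied pair by pair.

```r
# Made-up example lists; column names are hypothetical.
gps <- list(
  data.frame(time = 1:3, lat = c(10, 11, 12)),
  data.frame(time = 1:2, lat = c(20, 21))
)
temp <- list(
  data.frame(time = 2:4, temperature = c(5.0, 5.5, 6.0)),
  data.frame(time = 2:3, temperature = c(7.0, 7.5))
)

# merge() joins on the shared column and, by default, keeps only
# rows present in both data frames (an inner join).
both <- Map(merge, gps, temp)
both
```

If the timestamp columns were named differently in the two lists, an anonymous function passing by.x and by.y to merge() would be needed instead.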
Re: [R] execute r-code stored in a string variable
Yes, you can use:

    eval(parse(text=c))

On the other hand, I would not recommend using c as a variable name, as it is the name of a very important function in the R language for combining data.

HTH, Samuel

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kim Lillesøe
Sent: 02 August 2011 13:22
To: r-help@R-project.org
Subject: [R] execute r-code stored in a string variable

Dear all, I have a simple R question. How do I execute R code stored in a variable? E.g. if I have a variable which contains some R code:

    c = "reg <- lm(sales$sales~sales$price)"

Is it possible to execute c, e.g. like Exec(c)? I hope someone can help. Thank you, Kim Lillesøe
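A small self-contained sketch of the eval(parse(text = ...)) idiom from the replies, using a variable name other than c as Samuel suggests. The built-in cars data set stands in for the poster's sales data, and the names code_string and fit are made up for the example.

```r
# R code held in a character string (hypothetical example).
code_string <- "fit <- lm(dist ~ speed, data = cars)"

# parse() turns the text into an unevaluated expression;
# eval() then runs it in the calling environment.
eval(parse(text = code_string))

# The assignment inside the string has taken effect.
coef(fit)
```

Because the string is executed as-is, this is only appropriate for text you control.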
Re: [R] identifying weeks (dates) that certain days (dates) fall into
The findInterval function should surely be tried in some form or another.

On Aug 2, 2011, at 10:36 AM, Dimitri Liakhovitski wrote:

Hello! I have dates for the beginning of each week, e.g.:

    weekly <- data.frame(week=seq(as.Date("2010-04-01"), as.Date("2011-12-26"), by="week"))
    week  # each week starts on a Monday

I also have a vector of dates I am interested in, e.g.:

    july4 <- as.Date(c("2010-07-04", "2011-07-04"))

I would like to flag the weeks in my weekly$week that contain those 2 individual dates.

    > findInterval(july4, weekly$week)
    [1] 14 66    # works out of the box

This provides an index you can use with weekly$week.

I can only think of a very clumsy way of doing it:

    myrows <- c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
                which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
    weekly$flag <- 0
    weekly$flag[myrows] <- 1

It's clumsy because my actual vector of dates of interest (july4 above) is much longer. Is there maybe a more elegant way of doing it?

-- David Winsemius, MD, West Hartford, CT
Re: [R] Help with aggregate syntax for a multi-column function please.
Michael,

The function aggregate() is not going to work for your situation. The function is applied to the individual columns of the subsetted data, not to the subsetted data frame as a whole. The help file reads: "Then, each of the variables (columns) in x is split into subsets of cases (rows) of identical combinations of the components of by, and FUN is applied to each such subset with further arguments in ... passed to it."

If you can rewrite your function so that it takes one argument, the data frame alone, then using the by() function should give you what you need. Here is a simple example:

    df <- data.frame(a=1:5, b=2:6, i=c(1, 1, 1, 2, 2))
    junk <- function(df) {
      sum(df$a^2) + prod(df$b)
    }
    data.frame(index=sort(unique(df$i)),
               results=as.vector(by(df[, c("a", "b")], df$i, junk)))

Hope this helps.

Jean

`·.,, (((º `·.,, (((º `·.,, (((º
Jean V. Adams, Statistician, U.S. Geological Survey, Great Lakes Science Center, 223 East Steinfest Road, Antigo, WI 54409 USA

From: Michael Karol mka...@syntapharma.com
To: r-help@r-project.org
Date: 08/02/2011 09:35 AM
Subject: [R] Help with aggregate syntax for a multi-column function please.
Sent by: r-help-boun...@r-project.org

Dear R-experts: I am using a function called AUC whose arguments are data, time, id, and dv. data is the name of the dataframe, time is the independent variable column name, id is the subject id and dv is the dependent variable. The function computes the area under the curve by the trapezoidal rule, for each subject id. I would like to embed this in aggregate to further subset by each Cycle, DoseDayNominal and Drug, but I can't seem to get the aggregate syntax correct. All the examples I can find use a single-column function such as mean, whereas this AUC function requires four arguments. Could someone kindly show me the syntax? This is what I've tried so far:

    AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                        function(x, tm, pt, conc) {AUC(x)},
                        tm=TimeBestEstimate, pt=Pt, conc=ConcentrationBQLzero)

    AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                        function(x) {AUC(x, TimeBestEstimate, Pt, ConcentrationBQLzero)})

AUC syntax is:

    > args(AUC)
    function (data, time = TIME, id = ID, dv = DV)

Thanks. Regards, Michael
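To make Jean's suggestion concrete for an AUC-like calculation, here is a hedged sketch. The poster's AUC() function is not shown in the thread, so auc_trap below is a made-up stand-in that applies the trapezoidal rule to one data-frame subset; the data frame pk and its column names are likewise invented for illustration.

```r
# Hypothetical one-argument function: trapezoidal area for one subset.
auc_trap <- function(d) {
  d <- d[order(d$time), ]                       # sort by time first
  sum(diff(d$time) * (head(d$conc, -1) + tail(d$conc, -1)) / 2)
}

# Made-up data: two cycles of one drug, three samples each.
pk <- data.frame(
  cycle = c(1, 1, 1, 2, 2, 2),
  drug  = "A",
  time  = c(0, 1, 2, 0, 1, 2),
  conc  = c(0, 2, 0, 0, 4, 0)
)

# by() hands each whole subset (all columns) to the function,
# which is what a multi-column calculation needs.
res <- by(pk, list(cycle = pk$cycle, drug = pk$drug), auc_trap)
as.vector(res)
```

With several grouping factors, combinations that never occur in the data can show up as empty cells in the result, so the output may need filtering.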
Re: [R] Loops to assign a unique ID to a column
How about this?

    > indx <- unique(cbind(Dates, Groups))
    > indx
         Dates        Groups
    [1,] "12/10/2010" "A"
    [2,] "12/10/2010" "B"
    [3,] "13/10/2010" "A"
    [4,] "13/10/2010" "B"
    [5,] "13/10/2010" "C"
    > indx <- data.frame(indx, id=1:nrow(indx))
    > indx
           Dates Groups id
    1 12/10/2010      A  1
    2 12/10/2010      B  2
    3 13/10/2010      A  3
    4 13/10/2010      B  4
    5 13/10/2010      C  5
    > newdata <- merge(data, indx)
    > newdata
           Dates Groups id
    1 12/10/2010      A  1
    2 12/10/2010      B  2
    3 12/10/2010      B  2
    4 13/10/2010      A  3
    5 13/10/2010      B  4
    6 13/10/2010      C  5

-- David L Carlson, Associate Professor of Anthropology, Texas A&M University, College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Chandra Salgado Kent
Sent: Tuesday, August 02, 2011 2:12 AM
To: r-help@r-project.org
Subject: [R] Loops to assign a unique ID to a column

Dear R help, I am fairly new in data management and programming in R, and am trying to write what is probably a simple loop, but am not having any luck. I have a dataframe with something like the following (but much bigger):

    Dates <- c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", "13/10/2010", "13/10/2010")
    Groups <- c("A","B","B","A","B","C")
    data <- data.frame(Dates, Groups)

I would like to create a new column in the dataframe, and give each distinct date by group a unique identifying number starting with 1, so that the resulting column would look something like:

    ID <- c(1,2,2,3,4,5)

The loop that I have started to write is something like this (but doesn't work!):

    data$ID <- as.number(c())
    for(i in unique(data$Dates)){
      for(j in unique(data$Groups)){
        data$ID[i,j] <- i
        i <- i+1
      }
    }

Am I on the right track? Any help on this is much appreciated! Chandra
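An alternative sketch that avoids both the loop and the merge, using the poster's own example data. It builds one key per row by pasting the two columns together and numbers the keys in order of first appearance with match(). The "_" separator is an assumption and should be a string that cannot occur inside the values themselves.

```r
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups)

# One key per row; identical date/group combinations share a key.
key <- paste(data$Dates, data$Groups, sep = "_")

# Position of each key among the distinct keys, in order of appearance.
data$ID <- match(key, unique(key))
data
```

Unlike merge(), this keeps the original row order of data.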
[R] vglm: warnings and errors
Hello,

I am using multinomial logit regression for the first time, and I am trying to understand the warnings and errors I get. My data consists of 200 to 600 samples with ~25 predictors (these are principal components). The response has three categories. I use the function vglm from the package VGAM, called as follows:

    fit1 <- vglm(fmla, data=tr, multinomial, weights=regwt, maxit=500)

regwt are Epanechnikov weights.

In general, the regression works, but

- often, one of the categories has posterior probability zero, but the remaining two probabilities are non-zero (although very small)

- I receive many warnings of the following type:

    in checkwz(wz, m = M , trace = trace, wzeps = control$wzepsilo): n elements replaced by 1.819e-12
    in tfun(mu = mu, y = y, w =w, res = FALSE, eta = eta, ...: fitted values close to 0 or 1 ...

  If I understand it correctly, these have to do with the variance of the predictions being too small?

- In some cases, I get an error:

    Error in devmu[smallmu] = smy * log(smu): NAs are not allowed in subscripted arguments

  Sometimes this error goes away when I decrease the size of the training set.

I would like to know if this is expected behavior for some types of data sets. The manual to VGAM states that multinomial is "prone to numerical difficulties if the groups are separable and/or fitted probabilities are close to 0 or 1", but does not explain why. The latter could be my case. I have to run the regression on 10,000s of data sets, so I would like to find a setting in which things go smoothly (i.e. without errors).

I realize that this is probably more of a methodological than a technical question, but maybe you can give some rules of thumb about a suitable number of samples/predictors, or point me to some literature that would help me understand my problems.

Thanks, Anna
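As a rough illustration of the numerical difficulty the VGAM manual mentions, the sketch below uses base R's glm() with a binomial family rather than vglm(), so it is an analogy and not a reproduction of the poster's setup. The tiny data set is made up so that the two groups are perfectly separated by x. In that situation the likelihood keeps improving as the slope grows, so no finite maximum exists, and the fitting routine ends up with fitted probabilities numerically indistinguishable from 0 or 1.

```r
# Perfectly separable toy data: y is 0 for x <= 3 and 1 for x >= 4.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)

# Collect warnings instead of printing them.
warn <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warn <<- c(warn, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

warn                  # messages raised during fitting
coef(fit)             # very large coefficients
range(fitted(fit))    # essentially 0 and 1
```

With about 25 predictors and a few hundred heavily down-weighted samples, exact or near separation of at least one category seems plausible, which would be in line with the warnings quoted above; that diagnosis is a guess, not something the thread confirms.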
Re: [R] Display/show the evaluation result of R commands automatically
R-help and Barry,

Thank you for your suggestions. It works. May I ask how I can do the opposite (disable the callback), so that I can control when to show and when to suppress the output? I would like to write functions to enable/disable the callback, similar to the following:

    enableOutput <- function() {
      h <- taskCallbackManager()
      h$add(function(expr, value, ok, visible) {if(!visible){print(value)};TRUE})
    }

    disableOutput <- function() {
    }

This show-output feature (with ";" to suppress the output) is the default behavior of Matlab, which I find quite useful (no need to type the variable name again every time to see the result of the expression). So I am just curious to know how to do it in R.

Best Regards, Anthony

On 31 July 2011 20:16, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:

On Sun, Jul 31, 2011 at 12:15 PM, Anthony Ching Ho Ng anthony.ch...@gmail.com wrote:

Hello R-help, I wonder if it is possible to configure R so that it will display/show the evaluation result of R commands automatically (similar to the behavior of Matlab), i.e. if I type x <- 8 it will print 8 at the command prompt, instead of my having to type x explicitly to show the result, and perhaps put a ";" at the end to suppress the output, i.e. x <- 8;

The first thing I think you can do by adding a task callback manager to print the value if the value would otherwise be invisible:

    h <- taskCallbackManager()
    h$add(function(expr, value, ok, visible) {if(!visible){print(value)};TRUE})

The semicolon thing would probably need rewriting bits of R at the C code level. I don't think many people would use it though. And my code above might break things. I don't use it.

Barry
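One way the enable/disable pair could be sketched, using the lower-level addTaskCallback() and removeTaskCallback() functions from base R instead of taskCallbackManager(). The callback name "autoPrint" and the helper variable are made up; registering the callback under a name is what makes it easy to remove later. This is an untested-in-anger sketch, and Barry's caveat that such a callback might break things still applies.

```r
.output_cb_name <- "autoPrint"   # hypothetical name for the callback

enableOutput <- function() {
  # Register only once, so repeated calls do not print values twice.
  if (!(.output_cb_name %in% getTaskCallbackNames())) {
    addTaskCallback(function(expr, value, ok, visible) {
      if (!visible) print(value)  # show values R would not auto-print
      TRUE                        # TRUE keeps the callback registered
    }, name = .output_cb_name)
  }
  invisible(TRUE)
}

disableOutput <- function() {
  # Callbacks can be removed by name (or by position).
  invisible(removeTaskCallback(.output_cb_name))
}
```

After enableOutput(), an assignment such as x <- 8 typed at the prompt should print 8; disableOutput() restores the default behavior.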
Re: [R] identifying weeks (dates) that certain days (dates) fall into
On Tue, Aug 2, 2011 at 10:36 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

Hello! I have dates for the beginning of each week, e.g.:

    weekly <- data.frame(week=seq(as.Date("2010-04-01"), as.Date("2011-12-26"), by="week"))
    week  # each week starts on a Monday

I also have a vector of dates I am interested in, e.g.:

    july4 <- as.Date(c("2010-07-04", "2011-07-04"))

I would like to flag the weeks in my weekly$week that contain those 2 individual dates. I can only think of a very clumsy way of doing it:

    myrows <- c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
                which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
    weekly$flag <- 0
    weekly$flag[myrows] <- 1

It's clumsy because my actual vector of dates of interest (july4 above) is much longer. Is there maybe a more elegant way of doing it? Thank you!

This gives myrows:

    as.numeric(july4 - weekly[1,1]) %/% 7 + 1

-- Statistics & Software Consulting, GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
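The two suggestions made in this thread, findInterval() and integer division of the day offset, can be put side by side on the poster's example data. The sketch below wraps the dates in as.numeric() as a precaution (an addition, not part of either reply). The integer-division version relies on the weeks being exactly 7 days apart, while findInterval() only needs weekly$week to be sorted.

```r
weekly <- data.frame(
  week = seq(as.Date("2010-04-01"), as.Date("2011-12-26"), by = "week")
)
july4 <- as.Date(c("2010-07-04", "2011-07-04"))

# Approach 1: index of the last week start that is <= each date.
idx_fi <- findInterval(as.numeric(july4), as.numeric(weekly$week))

# Approach 2: whole weeks elapsed since the first week start.
idx_div <- as.numeric(july4 - weekly$week[1]) %/% 7 + 1

# Flag the matching weeks; an index of 0 (date before the first week)
# is dropped so it cannot cause an indexing surprise.
weekly$flag <- 0
weekly$flag[idx_fi[idx_fi > 0]] <- 1

weekly[weekly$flag == 1, ]
```

Dates later than the final week start are assigned to the last row by findInterval(), so a range check may be worth adding for real data.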
Re: [R] Standard Deviation of a matrix
Thank you everyone for your kind input. I forgot to add that I have decimal points in my matrix! Enclosed are the input file (reduced to a 10 x 10 matrix), scripts and output for your suggestions:

Code 1:

    library(stats)
    Matrix <- read.table("test_input", head=T, sep=" ", dec=".")
    SD <- sd(as.numeric(Matrix))
    SD

Output 1:

    > library(stats)
    > Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
    > SD <- sd(as.numeric(Matrix))
    Error in sd(as.numeric(Matrix)) : (list) object cannot be coerced to type 'double'
    Execution halted

Code 2:

    library(stats)
    Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
    dim(Matrix) <- 1
    SD <- sd(Matrix)
    SD

Output:

    > library(stats)
    > Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
    > dim(Matrix) <- 1
    Error in dim(Matrix) <- 1 : dims [product 1] do not match the length of object [10]
    Execution halted

Code 3:

    library(stats)
    Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
    SD <- sd(c(Matrix))
    SD

Output:

    > library(stats)
    > Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
    > SD <- sd(c(Matrix))
    Error: is.atomic(x) is not TRUE
    Execution halted

Any ideas what I am missing here? TIA, chakri

Input file: http://r.789695.n4.nabble.com/file/n3712328/test_input test_input
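All three errors above are consistent with read.table() returning a data frame, which is a list of columns rather than a numeric matrix, so as.numeric(), dim<- and c() do not flatten it. A minimal sketch of two ways to get a single standard deviation over all cells, assuming every column is numeric; the small data frame stands in for the poster's file.

```r
# Stand-in for the result of read.table(): a data frame with decimals.
Matrix <- data.frame(a = c(1.5, 2.5), b = c(3.5, 4.5))

# Option 1: unlist() flattens the list of columns into one numeric vector.
sd_all_1 <- sd(unlist(Matrix))

# Option 2: convert to a real matrix first, then to a plain vector.
sd_all_2 <- sd(as.vector(as.matrix(Matrix)))

c(sd_all_1, sd_all_2)
```

If any column were read as character or factor (for example because of a wrong sep or dec), as.matrix() would give a character matrix and sd() would fail, so checking str(Matrix) first is worthwhile.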
[R] how to get the percentile of a number in a vector
I'm familiar with the quantile() command, but what if I have a specific number and want to know its location (its percentile) in a vector? For known distributions (for example the normal distribution) there are pnorm and qnorm, but how can I do this with an arbitrary vector? Thanks in advance
Re: [R] Memory limit in Aggregate()
Hi Peter, Thanks for your answer. I made a mistake in the script I copied, sorry! The description of the objects: listX has 3 columns, listBy has 4 columns, and they have 9000 rows:

  print(paste("ncol x ", length(listX)))
  print(paste("ncol By ", length(listBy)))
  print(paste("nrow ", length(listX[[1]])))
  [1] "ncol x 3"
  [1] "ncol By 4"
  [1] "nrow 9083"

It seems the large (=4) number of columns in listBy creates the trouble... Thanks, Guillaume
[R] My R code is not efficient
Dear R users, I have two n*1 integer vectors, y1 and y2, where n is very, very large. I'd like to compute

  elbp = 4^(y1) * 5^(y2) * sum_{i=0}^{max(y1, y2)} [{ (y1-i)! * (i)! * (y2-i)! }^(-1)];

that is, I need to compute elbp for each (y1, y2) pair. So I wrote the R code below, but I don't think it's efficient. Would you please tell me how to avoid this for loop?

  for (k in 1:n){
    ymax <- max( y1[k], y2[k] )
    i <- 0:ymax
    sums <- -lgamma(y1[k]-i+1)-lgamma(i+1)-lgamma(y2[k]-i+1)
    maxsums <- max(sums)
    sums <- sums - maxsums
    lsum <- log( sum(exp(sums)) ) + maxsums
    lbp[k] <- y1[k]*log(4) + y2[k]*log(5) + lsum
  }
  elbp <- exp(lbp)

Any suggestion will be greatly appreciated. Regards, Kathryn Lord
[R] Extract p value from coxme object
Dear R experts; I am trying to extract the p values from a coxme object (package coxme). I can see the value in the model output, but I wanted to have the result with a higher number of decimal places. I have searched the mailing list and followed equivalent suggestions for nlme/lme objects, but I wasn't successful. Thanks; Catarina
Re: [R] My R code is not efficient
?expand.grid

---
Jeff Newmiller, Research Engineer (Solar/Batteries/Software/Embedded Controllers) DCN:jdnew...@dcn.davis.ca.us
---
Sent from my phone. Please excuse my brevity.
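One way the loop in this thread might be shortened is sketched below. It is not the expand.grid approach hinted at above but a different technique: the inner sum is computed once per distinct (y1, y2) pair and then mapped back to all n rows. The helper names log_inner_sum and elbp_unique are hypothetical. The sketch also sums only up to min(y1, y2), on the assumption that terms beyond that point (factorials of negative integers) are meant to contribute zero.

```r
# Hypothetical helper: log of sum_{i=0}^{min(a,b)} 1 / ((a-i)! i! (b-i)!),
# computed with the same log-sum-exp trick as the original loop.
log_inner_sum <- function(a, b) {
  i <- 0:min(a, b)
  terms <- -lgamma(a - i + 1) - lgamma(i + 1) - lgamma(b - i + 1)
  m <- max(terms)
  m + log(sum(exp(terms - m)))
}

# Hypothetical wrapper: evaluate the inner sum once per distinct pair only.
elbp_unique <- function(y1, y2) {
  key   <- paste(y1, y2)          # one label per (y1, y2) pair
  first <- !duplicated(key)       # first occurrence of each distinct pair
  lsum_u <- mapply(log_inner_sum, y1[first], y2[first])
  lsum   <- lsum_u[match(key, key[first])]
  exp(y1 * log(4) + y2 * log(5) + lsum)
}

# Small made-up input; the real vectors would be much longer.
elbp <- elbp_unique(c(0, 1, 2, 1), c(0, 1, 1, 1))
```

This only pays off when n is much larger than the number of distinct pairs, which seems plausible for small integer counts but is an assumption about the data.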
Re: [R] how to get the percentile of a number in a vector
On Aug 2, 2011, at 10:14 AM, ראובן אברמוביץ wrote:
> I'm familiar with the quantile() command, but what if I have a specific number that I want to know its location in a vector? I know that in known distributions, (for example the normal distribution), there is pnorm and qnorm, but how can I do it with unknown vector?

?ecdf

-- David Winsemius, MD West Hartford, CT
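To make the ?ecdf pointer concrete, here is a minimal sketch on a made-up vector. ecdf() returns a function, so it plays the role that pnorm plays for the normal distribution.

```r
x <- c(3, 8, 5, 2, 9, 33, 21)   # made-up data

Fn <- ecdf(x)        # empirical CDF: Fn(q) is the proportion of x that is <= q

p7   <- Fn(7)        # location of the value 7 within x, as a proportion
pall <- Fn(x)        # location of every element of x

# quantile() goes the other way, from a proportion back to a value.
q50 <- quantile(x, 0.5)
```

Note that Fn counts values less than or equal to its argument, so ties are included.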
Re: [R] Standard Deviation of a matrix
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of chakri
> Sent: Tuesday, August 02, 2011 6:31 AM
> To: r-help@r-project.org
> Subject: Re: [R] Standard Deviation of a matrix
>
> Thank you everyone for your kind input, I forgot to add that I have decimal points in my matrix ! Enclosed input file (reduced to 10 X 10 matrix), scripts and output for your suggesions:
> Code 1:
>   library(stats)
>   Matrix <- read.table("test_input", head=T, sep="\t", dec=".")
>   SD <- sd(as.numeric(Matrix))
>   SD

First, your data attachment did not come through the list. Second, decimals are not a problem. Third, you don't have a matrix, you have a data frame (read.table produces data frames). As long as all columns are numeric you could do something like

  sd(c(as.matrix(m)))

You could also convert to a matrix on input if you really don't need a data frame for different column types.

Hope this is helpful,
Dan

Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
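A small sketch of the advice above, using a made-up data frame in place of the read.table() result (the attachment never reached the list, so the real data are unknown):

```r
# Made-up stand-in for the data frame that read.table() would return.
m <- data.frame(a = c(1.5, 2.5, 3.5), b = c(4.5, 5.5, 6.5))

# The approach only makes sense if every column is numeric.
stopifnot(all(sapply(m, is.numeric)))

# Flatten all cells into one numeric vector, then take a single standard deviation.
sd_matrix <- sd(c(as.matrix(m)))

# unlist() reaches the same vector without the intermediate matrix.
sd_unlist <- sd(unlist(m))
```

Both calls treat the whole table as one sample; sapply(m, sd) would give one standard deviation per column instead.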
Re: [R] how to get the percentile of a number in a vector
Would this work for you? If you want to know where the i-th element falls percentage-wise in the distribution of a vector:

  sum(x <= x[i])/length(x)

This could be turned into a function:

  pEmpirical <- function(i,x) {
    if (length(i) > 1) return(apply(as.matrix(i), 1, pEmpirical,x))
    r = sum(x <= x[i])/length(x)
    return(r)
  }

Michael Weylandt

2011/8/2 ראובן אברמוביץ gantk...@walla.com
> I'm familiar with the quantile() command, but what if I have a specific number that I want to know its location in a vector? I know that in known distributions, (for example the normal distribution), there is pnorm and qnorm, but how can I do it with unknown vector? thanks in advance
Re: [R] SSOAP chemspider
Has anyone got SSOAP working on anything besides KEGG? I just tried another 3 SOAP servers, both via the WSDL and by constructing the .SOAP call. Again the Perl and Ruby interfaces worked without any hitches.

Paul

  library(SSOAP)
  massBank <- processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
  Error in parse(text = paste(txt, collapse = "\n")) : text:1:29: unexpected input
  1: function(x, ..., obj = new( ‚
                                 ^
  In addition: Warning message:
  In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") : Ignoring additional serviceport ... elements

  metlin <- processWSDL("http://metlin.scripps.edu/soap/metlin.wsdl")
  Error in parse(text = paste(txt, collapse = "\n")) : text:1:29: unexpected input
  1: function(x, ..., obj = new( ‚
                                 ^

  pubchem <- processWSDL("http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl")
  Error in parse(text = paste(txt, collapse = "\n")) : text:1:29: unexpected input
  1: function(x, ..., obj = new( ‚
                                 ^

On 20 Jul 2011, at 01:54, Benton, Paul wrote:
> Dear all, I've been trying on and off for the past few months to get SSOAP to work with chemspider. First I tried the WSDL file:
>   cs <- processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL")
>   Error in parse(text = paste(txt, collapse = "\n")) : text:1:29: unexpected input
>   1: function(x, ..., obj = new( ‚
>                                  ^
>   In addition: Warning message:
>   In processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL") : Ignoring additional serviceport ... elements
> Next I've tried using just the pure .SOAP to call the database.
>   s <- SOAPServer("http://www.chemspider.com/MassSpecAPI.asmx")
>   csid <- .SOAP(s, "SearchByMass2", mass=89.04767, range=0.01,
>                 action = I("http://www.chemspider.com/SearchByMass2"),
>                 xmlns = c("http://www.chemspider.com"),
>                 .opts = list(verbose = TRUE))
> This seems to work and gives back a result. However, it isn't the right result: the mass seems to have been converted to 0. When I run a similar program in Perl I get the correct ids, so this isn't a server-side problem but an SSOAP one.
> Any thoughts or suggestions on other packages to use? Further information about the SearchByMass2 method and the XML it expects: http://www.chemspider.com/MassSpecAPI.asmx?op=SearchByMass2
> Cheers, Paul
> PS Placing a fake error in the .SOAP code I can look at the XML it's sending to the server:
>   Browse[1]> doc
>   <?xml version="1.0"?>
>   <SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
>     <SOAP-ENV:Body>
>       <ns:SearchByMass2 xmlns:ns="http://www.chemspider.com">
>         <ns:mass>89.04767</ns:mass>
>         <ns:range>0.01</ns:range>
>       </ns:SearchByMass2>
>     </SOAP-ENV:Body>
>   </SOAP-ENV:Envelope>
Re: [R] how to get the percentile of a number in a vector
Does this help?

  x <- c(3, 8, 5, 2, 9, 33, 21)
  # the 43rd percentile
  quantile(x, 0.43)
  # the proportion of the distribution that is less than 7
  mean(x < 7)

Jean

Jean V. Adams Statistician U.S. Geological Survey Great Lakes Science Center 223 East Steinfest Road Antigo, WI 54409 USA

From: ראובן אברמוביץ gantk...@walla.com
To: r-help@r-project.org
Date: 08/02/2011 10:51 AM
Subject: [R] how to get the percentile of a number in a vector
Sent by: r-help-boun...@r-project.org

> I'm familiar with the quantile() command, but what if I have a specific number that I want to know its location in a vector? I know that in known distributions, (for example the normal distribution), there is pnorm and qnorm, but how can I do it with unknown vector? thanks in advance
Re: [R] Memory limit in Aggregate()
On Aug 2, 2011, at 17:10 , Guillaume wrote:
> Hi Peter, Thanks for your answer. I made a mistake in the script I copied sorry ! The description of the object : listX has 3 column, listBy has 4 column, and

So what are the contents of listBy? If they are all factors with 100 levels, then you're looking at a table with 10^8 entries...

> they have 9000 rows :
>   print(paste("ncol x ", length(listX)))
>   print(paste("ncol By ", length(listBy)))
>   print(paste("nrow ", length(listX[[1]])))
>   [1] "ncol x 3"
>   [1] "ncol By 4"
>   [1] "nrow 9083"
> It seems the large (=4) number of columns in listBy creates the troubles... Thanks, Guillaume

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com Døden skal tape! --- Nordahl Grieg
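The arithmetic behind the 10^8 figure can be checked directly. The sketch below builds a made-up listBy with four factors of 100 levels each (the real contents are not shown in the thread) and compares the number of possible level combinations with the number that actually occur.

```r
set.seed(1)
n <- 9083   # row count reported in the thread

# Made-up grouping columns: four factors, each declared with 100 levels.
listBy <- setNames(
  data.frame(lapply(1:4, function(j) {
    factor(sample(100, n, replace = TRUE), levels = 1:100)
  })),
  paste0("by", 1:4)
)

# Size of the full cross-classification of all factor levels.
potential_cells <- prod(sapply(listBy, nlevels))

# Number of combinations that actually appear in the data (at most n).
observed_cells <- nrow(unique(listBy))
```

If potential_cells is orders of magnitude larger than observed_cells, any method that allocates the full table will need far more memory than the data themselves.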
Re: [R] Help with aggregate syntax for a multi-column function please.
Hi: Another way to do this is to use one of the summarization packages. The following uses the plyr package. The first step is to create a function that takes a data frame as input and outputs either a data frame or a scalar. In this case, the function returns a scalar, but if you want to carry along additional variables in the output, you can replace it with a data frame that returns the set of variables you want. You don't need to return the grouping variables, but no harm is done if you do.

  # This assumes the existence of a function AUC with the arguments
  # you stated in your post. I presume it returns a scalar value; if not,
  # you should modify it to return a data frame instead. It would probably
  # be better to modify AUC and call it in ddply() directly, but without the
  # function code there's not much one can do...
  myAUC <- function(df) AUC(df, 'TimeBestEstimate', 'Pt','ConcentrationBQLzero')
  library('plyr')
  ddply(PKdata, .(Cycle, DoseDayNominal, Drug), myAUC)

This is obviously untested, so caveat emptor. Both plyr and data.table can accept functions with multiple arguments and do the right thing. The trick in plyr is to write a function that takes a generic input object (e.g., a (sub)data frame) and then uses (the variables within) it to do the necessary calculations. Generally, you want the output of the function to be compatible with the type of output you want from the **ply() function. In this case, ddply() means data frame input, data frame output; alply() would mean array input and list output, etc. If this doesn't work, please provide a reproducible example.

HTH, Dennis

On Tue, Aug 2, 2011 at 7:32 AM, Michael Karol mka...@syntapharma.com wrote:
> Dear R-experts: I am using a function called AUC whose arguments are data, time, id, and dv. data is the name of the dataframe, time is the independent variable column name, id is the subject id and dv is the dependent variable. The function computes area under the curve by trapezoidal rule, for each subject id.
> I would like to embed this in aggregate to further subset by each Cycle, DoseDayNominal and Drug, but I can't seem to get the aggregate syntax correct. All the examples I can find use single column function such as mean, whereas this AUC function requires four arguments. Could someone kindly show me the syntax? This is what I've tried so far:
>   AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
>                       function(x,tm,pt,conc) {AUC(x)},
>                       tm="TimeBestEstimate", pt="Pt", conc="ConcentrationBQLzero" )
>   AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
>                       function(x) {AUC(x,"TimeBestEstimate", "Pt", "ConcentrationBQLzero" )} )
> AUC syntax is:
>   args(AUC)
>   function (data, time = "TIME", id = "ID", dv = "DV")
> thanks Regards, Michael
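The same "function of a sub-data-frame" idea can be sketched in base R with split() and sapply(). Everything below is illustrative: trap_auc is a hypothetical trapezoidal-rule helper standing in for the poster's AUC function (whose code was not posted), and the column names and values are made up.

```r
# Hypothetical stand-in for AUC(): trapezoidal rule for one concentration curve.
trap_auc <- function(time, conc) {
  o <- order(time)
  t <- time[o]
  y <- conc[o]
  sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)
}

# Made-up data: one curve per Cycle/Drug combination.
PKdata <- data.frame(
  Cycle = c(1, 1, 1, 2, 2, 2),
  Drug  = "A",
  Time  = c(0, 1, 2, 0, 1, 2),
  Conc  = c(0, 2, 0, 0, 4, 0)
)

# Split into one sub-data-frame per group, dropping empty combinations,
# then apply the multi-column function to each piece.
groups <- split(PKdata, list(PKdata$Cycle, PKdata$Drug), drop = TRUE)
auc_by_group <- sapply(groups, function(d) trap_auc(d$Time, d$Conc))
```

aggregate() passes one column at a time to its function, which is presumably why the attempts in the question fail; split() hands over whole sub-data-frames, as ddply() does.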
[R] lattice: index plot
Dear all, How can I make an index plot with lattice, that is plotting a vector simply against its particular index in the vector, i.e. something similar to

  y <- rnorm(10)
  plot(y)

I don't want to specify the x's manually, as this could become cumbersome when having multiple panels. I tried something like

  library(lattice)
  mp <- function(x, y, ...) {
    x <- 1:length(y)
    panel.xyplot(x, y, ...)
  }
  pp <- function(x, y, ...) {
    list(xlim = extendrange(1:length(y)), ylim = extendrange(y))
  }
  set.seed(123)
  y <- rnorm(10)
  xyplot(y ~ 1, panel = mp, prepanel = pp, xlab="Index")

but I was wondering whether there is a more straightforward way? By the way, if I do not specify the ylim in the prepanel function the plot is clipped, but reading Deepayan's book, p.140: "[...], so a user-specified prepanel function is not required to return all of these components [i.e. xlim, ylim, xat, yat, dx and dy]; any missing component will be replaced by the corresponding default." I'd understand that if I do not specify ylim it is calculated automatically? Not a big thing though, but it seems to me to be inconsistent. Any help appreciated. KR, -Thorn
Re: [R] Loops to assign a unique ID to a column
Whoa!

1. First and most important, there is very likely no reason you need to do this. R can handle multiple groupings automatically in fitting and plotting without creating artificial labels of the sort you appear to want to create. Please read an Intro to R and/or get help to see how.

2. The solution offered below is unnecessarily convoluted. Here is a simpler and faster one:

  z <- within(z, indx <- as.numeric(interaction(Dates, Groups, drop=TRUE, lex.order=TRUE)))

Explanation: interaction() produces all possible combinations of the individual groupings; drop=TRUE throws away any unused combinations, lex.order=TRUE lexicographically orders the levels as you indicated. ?interaction for details. By default, the result of the above is a factor, which as.numeric() converts to the numeric codes used in factor representations. ?factor . Finally, within() interprets and makes changes within z. The changed result is then assigned back to z so that it is not lost. ?within

Cheers, Bert

On Tue, Aug 2, 2011 at 8:36 AM, David L Carlson dcarl...@tamu.edu wrote:
> How about this?
>   > indx <- unique(cbind(Dates, Groups))
>   > indx
>        Dates        Groups
>   [1,] "12/10/2010" "A"
>   [2,] "12/10/2010" "B"
>   [3,] "13/10/2010" "A"
>   [4,] "13/10/2010" "B"
>   [5,] "13/10/2010" "C"
>   > indx <- data.frame(indx, id=1:nrow(indx))
>   > indx
>          Dates Groups id
>   1 12/10/2010      A  1
>   2 12/10/2010      B  2
>   3 13/10/2010      A  3
>   4 13/10/2010      B  4
>   5 13/10/2010      C  5
>   > newdata <- merge(data, indx)
>   > newdata
>          Dates Groups id
>   1 12/10/2010      A  1
>   2 12/10/2010      B  2
>   3 12/10/2010      B  2
>   4 13/10/2010      A  3
>   5 13/10/2010      B  4
>   6 13/10/2010      C  5
> -- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Chandra Salgado Kent
> Sent: Tuesday, August 02, 2011 2:12 AM
> To: r-help@r-project.org
> Subject: [R] Loops to assign a unique ID to a column
>
> Dear R help, I am fairly new in data management and programming in R, and am trying to write what is probably a simple loop, but am not having any luck. I have a dataframe with something like the following (but much bigger):
>   Dates <- c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", "13/10/2010", "13/10/2010")
>   Groups <- c("A","B","B","A","B","C")
>   data <- data.frame(Dates, Groups)
> I would like to create a new column in the dataframe, and give each distinct date by group a unique identifying number starting with 1, so that the resulting column would look something like:
>   ID <- c(1,2,2,3,4,5)
> The loop that I have started to write is something like this (but doesn't work!):
>   data$ID <- as.number(c())
>   for(i in unique(data$Dates)){
>     for(j in unique(data$Groups)){
>       data$ID[i,j] <- i
>       i <- i+1
>     }
>   }
> Am I on the right track? Any help on this is much appreciated! Chandra

-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204)
Bert Gunter Genentech Nonclinical Biostatistics
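For reference, the interaction() one-liner from this thread can be run on the example data from the original question as below. The second variant, based on match(), is an alternative not proposed in the thread; it numbers the groups in order of first appearance rather than in sorted order.

```r
Dates  <- c("12/10/2010", "12/10/2010", "12/10/2010",
            "13/10/2010", "13/10/2010", "13/10/2010")
Groups <- c("A", "B", "B", "A", "B", "C")
data   <- data.frame(Dates, Groups)

# Factor codes of the Dates x Groups combinations that actually occur.
# lex.order = TRUE makes Dates vary slowest; drop = TRUE removes unused levels.
data$ID <- as.numeric(interaction(data$Dates, data$Groups,
                                  drop = TRUE, lex.order = TRUE))

# Alternative: number each distinct pair by its first appearance in the data.
key <- paste(data$Dates, data$Groups)
data$ID2 <- match(key, unique(key))
```

The two columns agree here only because the example rows are already sorted. The dates are also compared as character strings, so the sorted order is alphabetical rather than chronological unless they are converted with as.Date() first.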
Re: [R] identifying weeks (dates) that certain days (dates) fall into
Hi: You could try the lubridate package:

  > library(lubridate)
  > week(weekly$week)
  > week(july4)
  [1] 27 27
  > week
  function (x) yday(x)%/%7 + 1
  <environment: namespace:lubridate>

which is essentially Gabor's code :)

HTH, Dennis
Re: [R] Inserting column in between -- better way?
On Aug 1, 2011, at 20:50 , David L Carlson wrote:
> Actually Sara's method fails if the insertion is after the first or before the last column:
>   x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
>   newcol <- 4:6
>   cbind(x[,1], newcol, x[,2:ncol(x)])

Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector:

  > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
  > newcol <- 4:6
  > cbind(x[1], newcol, x[2:ncol(x)])
    A newcol B C D E
  1 1      4 1 1 1 1
  2 2      5 2 2 2 2
  3 3      6 3 3 3 3

Also notice that there is a named form of cbind

  > cbind(x[1], foo=4:6, x[2:ncol(x)])
    A foo B C D E
  1 1   4 1 1 1 1
  2 2   5 2 2 2 2
  3 3   6 3 3 3 3

and that things will work (mostly) with matrices and data frames too:

  > newcol <- data.frame(x=4:6,y=6:4)
  > cbind(x[1], newcol, x[2:ncol(x)])
    A x y B C D E
  1 1 4 6 1 1 1 1
  2 2 5 5 2 2 2 2
  3 3 6 4 3 3 3 3
  > cbind(x[1], as.matrix(newcol), x[2:ncol(x)])
    A x y B C D E
  1 1 4 6 1 1 1 1
  2 2 5 5 2 2 2 2
  3 3 6 4 3 3 3 3

(The "mostly" bit refers to some slight oddness occurring if you cbind a matrix with no column names:

  > cbind(x[1], cbind(4:6,7:9), x[2:ncol(x)])
    A 1 2 B C D E
  1 1 4 7 1 1 1 1
  2 2 5 8 2 2 2 2
  3 3 6 9 3 3 3 3
)

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com Døden skal tape! --- Nordahl Grieg
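If the insertion position varies, the cbind idiom above could be wrapped in a small helper. The function insert_after below is hypothetical (not from the thread) and assumes a single new column supplied as a vector; it appends the column and then reorders the column indices with base R's append(), which also covers insertion before the first and after the last column.

```r
# Hypothetical helper: insert one new column after position `after`
# (after = 0 puts it first, after = ncol(df) puts it last).
insert_after <- function(df, newcol, name, after) {
  stopifnot(after >= 0, after <= ncol(df), length(newcol) == nrow(df))
  new_df <- setNames(data.frame(newcol), name)
  out <- cbind(df, new_df)                               # new column is last for now
  idx <- append(seq_len(ncol(df)), ncol(df) + 1, after = after)
  out[idx]                                               # reorder; list-style indexing keeps a data frame
}

x <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = 1:3, E = 1:3)
x_mid   <- insert_after(x, 4:6, "newcol", after = 1)
x_first <- insert_after(x, 4:6, "newcol", after = 0)
x_last  <- insert_after(x, 4:6, "newcol", after = 5)
```

Reordering by index avoids the special cases at either end that the slicing approach needs.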
[R] density plot for weighted data
I'm trying to create a density plot using census data, where the weights don't sum to 1.

  > plot(density(oh$FINCP,weights=oh$PWGTP))
  Warning message:
  In density.default(oh$FINCP, weights = oh$PWGTP) : sum(weights) != 1 -- will not get true density

How would I go about doing this? Thanks!
Re: [R] identifying weeks (dates) that certain days (dates) fall into
Thanks a lot, everyone! Dimitri

-- Dimitri Liakhovitski marketfusionanalytics.com
Re: [R] lattice: index plot
Does

  xyplot(y ~ seq_along(y), xlab = "Index")

do what you want?

Peter Ehlers
Re: [R] density plot for weighted data
On Aug 2, 2011, at 12:51 PM, r student wrote:
> I'm trying to create a density plot using census data, where the weights don't sum to 1.
>   plot(density(oh$FINCP,weights=oh$PWGTP))
>   Warning message: In density.default(oh$FINCP, weights = oh$PWGTP) : sum(weights) != 1 -- will not get true density
> How would I go about doing this?

Wouldn't you just divide by the sum?

-- David Winsemius, MD West Hartford, CT
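Dividing by the sum might look like the sketch below. The vectors income and wt are simulated stand-ins for oh$FINCP and oh$PWGTP, since the census data are not available here. Real survey data may also contain missing incomes, which would need to be removed together with their weights before normalizing.

```r
set.seed(42)
income <- rlnorm(200, meanlog = 10, sdlog = 1)   # made-up stand-in for oh$FINCP
wt     <- sample(1:50, 200, replace = TRUE)      # made-up stand-in for oh$PWGTP

# Rescale the weights so they sum to 1, as density() expects.
w <- wt / sum(wt)

d <- density(income, weights = w)
plot(d, main = "Weighted density (simulated data)")
```

The rescaling leaves the relative weights unchanged, so the shape of the curve is the same as with the raw weights; only the vertical scale becomes that of a proper density.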
[R] Data frame to matrix - revisited
Hi, I've tried to look through all the previous related threads/posts but can't find a solution to what's probably a simple question. I have a data frame comprised of three columns, e.g.:

  ID1 ID2 Value
  a   b   1
  b   d   1
  c   a   2
  c   e   1
  d   a   1
  e   d   2

I'd like to convert the data to a matrix, i.e.:

      a   b   c   d   e
  a n/a   1   2   1 n/a
  b   1 n/a n/a   1 n/a
  c   2 n/a n/a n/a   1
  d   1   1 n/a n/a   2
  e n/a n/a   1   2 n/a

Any help is much appreciated, Jagz
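One possible base R approach to the conversion asked about here is matrix indexing by dimnames, sketched below on the example data. It assumes each unordered pair of IDs occurs at most once and that the result should be symmetric, as in the desired output; n/a is represented by NA.

```r
edges <- data.frame(
  ID1   = c("a", "b", "c", "c", "d", "e"),
  ID2   = c("b", "d", "a", "e", "a", "d"),
  Value = c(1, 1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# All IDs that occur in either column, in sorted order.
ids <- sort(unique(c(edges$ID1, edges$ID2)))

# Start from an all-NA square matrix with the IDs as row and column names.
m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# A two-column character matrix indexes cells by (row name, column name).
m[cbind(edges$ID1, edges$ID2)] <- edges$Value
m[cbind(edges$ID2, edges$ID1)] <- edges$Value   # mirror to make it symmetric
```

Dropping the last line would give a directed (non-symmetric) matrix instead.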
Re: [R] Memory limit in Aggregate()
Hi Peter, Yes, I have a large number of factors in the listBy table. Do you mean that aggregate() creates a complete Cartesian product of the by columns (and so creates combinations of values that do not exist in the original by table, before removing them when returning the aggregated table)? Thanks a lot, Guillaume
Re: [R] Inserting column in between -- better way?
Thanks for this Peter:

> Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector:
>
> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
> newcol <- 4:6
> cbind(x[1], newcol, x[2:ncol(x)])
>   A newcol B C D E
> 1 1      4 1 1 1 1
> 2 2      5 2 2 2 2
> 3 3      6 3 3 3 3

Am I correct in saying that this is a bit subtle: x[1] and x[2:ncol(x)] are actually lists with vector components; so you're cbinding lists, which retain the labels, no?

If so, it's a nice subtlety to remember, anyway.

-- Bert

--
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] density plot for weighted data
On Aug 2, 2011, at 1:11 PM, r student wrote: Like below? plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) I don't understand why you are asking for approval. You are the one with the data and know where they came from. We have none of that background. -- David. On Tue, Aug 2, 2011 at 10:06 AM, David Winsemius dwinsem...@comcast.net wrote: On Aug 2, 2011, at 12:51 PM, r student wrote: I'm trying to create a density plot using census data, where the weights don't sum to 1. plot(density(oh$FINCP,weights=oh$PWGTP)) Warning message: In density.default(oh$FINCP, weights = oh$PWGTP) : sum(weights) != 1 -- will not get true density How would I go about doing this? Wouldn't you just divide by the sum? -- David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data frame to matrix - revisited
Jagz, Assuming that your data frame is called df, try this ... tapply(df$Value, list(df$ID1, df$ID2), mean) Jean `·.,, (((º `·.,, (((º `·.,, (((º Jean V. Adams Statistician U.S. Geological Survey Great Lakes Science Center 223 East Steinfest Road Antigo, WI 54409 USA 715-627-4317, ext. 3125 (Office) 715-216-8014 (Cell) 715-623-6773 (FAX) http://www.glsc.usgs.gov (GLSC web site) http://profile.usgs.gov/jvadams (My homepage) jvad...@usgs.gov (E-mail) From: Jagz Bell jagzb...@yahoo.com To: r-help@R-project.org r-help@r-project.org Date: 08/02/2011 12:13 PM Subject: [R] Data frame to matrix - revisited Sent by: r-help-boun...@r-project.org Hi, I've tried to look through all the previous related Threads/posts but can't find a solution to what's probably a simple question. I have a data frame comprised of three columns e.g.: ID1 ID2 Value a b 1 b d 1 c a 2 c e 1 d a 1 e d 2 I'd like to convert the data to a matrix i.e.: a b c d e a n/a 1 2 1 n/a b 1 n/a n/a 1 n/a c 2 n/a n/a n/a 1 d 1 1 n/a n/a 2 e n/a n/a 1 2 n/a Any help is much appreciated, Jagz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Inserting column in between -- better way?
On Aug 2, 2011, at 19:17, Bert Gunter wrote:

> Thanks for this Peter:
>
>> Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector:
>>
>> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
>> newcol <- 4:6
>> cbind(x[1], newcol, x[2:ncol(x)])
>>   A newcol B C D E
>> 1 1      4 1 1 1 1
>> 2 2      5 2 2 2 2
>> 3 3      6 3 3 3 3
>
> Am I correct in saying that this is a bit subtle: x[1] and x[2:ncol(x)] are actually lists with vector components; so you're cbinding lists, which retain the labels, no?

Well, to be precise they are obtained by indexing a data frame _as_ a list. The result of that is a data frame (always, which was the point). So you're cbind()-ing data frames, which is what you wanted to do all along.

> If so, it's a nice subtlety to remember, anyway.
>
> -- Bert

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg
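The distinction Peter draws can be checked directly. This is a minimal sketch using the data frame from the thread; the variable name `res` is only for illustration.

```r
x <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = 1:3, E = 1:3)
newcol <- 4:6

# List-style indexing (one index, no comma) returns a data frame.
class(x[1])

# Matrix-style indexing of a single column drops to a plain vector.
class(x[, 1])

# cbind() on data frames keeps the column names; the bare vector
# takes its name from the expression it was passed as.
res <- cbind(x[1], newcol, x[2:ncol(x)])
names(res)
```

If matrix-style indexing is preferred, `x[, 1, drop = FALSE]` also keeps the data frame.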
Re: [R] Memory limit in Aggregate()
On Aug 2, 2011, at 19:09, Guillaume wrote:

> Hi Peter,
> Yes I have a large number of factors in the listBy table. Do you mean that aggregate() creates a complete cartesian product of the by columns? (and creates combinations of values that do not exist in the original by table, before removing them when returning the aggregated table?)

Hm, at least in recent versions that shouldn't happen. The meat of aggregate.data.frame is

    ans <- lapply(split(e, grp), FUN, ...)

where grp is a numerical coding of the factor combination for each cell. That could conceivably contain some large values, but since it is numeric (and not a factor with levels, say, 0:(n1*n2*n3*n4-1)), split should not generate more groups than are present in data.

Some of this stuff was rewritten in Jan 2010. You might want to try a version which is later than yours from May 2009...

> Thanks a lot,
> Guillaume

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg
Re: [R] density plot for weighted data
Like below? plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) On Tue, Aug 2, 2011 at 10:06 AM, David Winsemius dwinsem...@comcast.net wrote: On Aug 2, 2011, at 12:51 PM, r student wrote: I'm trying to create a density plot using census data, where the weights don't sum to 1. plot(density(oh$FINCP,weights=oh$PWGTP)) Warning message: In density.default(oh$FINCP, weights = oh$PWGTP) : sum(weights) != 1 -- will not get true density How would I go about doing this? Wouldn't you just divide by the sum? -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extract names from vector according to their values
Dear helpers,

I can create a vector with the priority of the packages that came with R, like this:

installed.packages()[,"Priority"] -> my.vector
my.vector
         base          boot         class       cluster     codetools
       "base" "recommended" "recommended" "recommended" "recommended"
     compiler      datasets       foreign      graphics     grDevices
       "base"        "base" "recommended"        "base"        "base"
         grid    KernSmooth       lattice          MASS        Matrix
       "base" "recommended" "recommended" "recommended" "recommended"
      methods          mgcv          nlme          nnet         rpart
       "base" "recommended" "recommended" "recommended" "recommended"
      spatial       splines         stats        stats4      survival
"recommended"        "base"        "base"        "base" "recommended"
        tcltk         tools         utils
       "base"        "base"        "base"

How can I extract the names from this vector according to their priority? I.e. I want to create one vector with the names of the base packages, and another vector with the names of the recommended packages.

Thank you
Sverre
Re: [R] Data frame to matrix - revisited
Hi:

Here are a couple of ways. Since your data frame does not contain a 'c' in ID2, we redefine the factor to give it all five levels rather than the observed four:

df <- read.table(textConnection("
ID1 ID2 Value
a b 1
b d 1
c a 2
c e 1
d a 1
e d 2"), header = TRUE)
str(df)
'data.frame': 6 obs. of 3 variables:
 $ ID1  : Factor w/ 5 levels "a","b","c","d",..: 1 2 3 3 4 5
 $ ID2  : Factor w/ 4 levels "a","b","d","e": 2 3 1 4 1 3
 $ Value: int 1 1 2 1 1 2

df$ID2 <- factor(df$ID2, levels = letters[1:5])
str(df)
'data.frame': 6 obs. of 3 variables:
 $ ID1  : Factor w/ 5 levels "a","b","c","d",..: 1 2 3 3 4 5
 $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 2 4 1 5 1 4
 $ Value: int 1 1 2 1 1 2

Now we're good...

# (1) xtabs:
with(df, xtabs(Value ~ ID1 + ID2) + xtabs(Value ~ ID2 + ID1))
   ID2
ID1 a b c d e
  a 0 1 2 1 0
  b 1 0 0 1 0
  c 2 0 0 0 1
  d 1 1 0 0 2
  e 0 0 1 2 0

# (2) acast() in the reshape2 package:
library('reshape2')
v1 <- acast(df, ID1 ~ ID2, value_var = 'Value', drop = FALSE, fill = 0)
v2 <- acast(df, ID2 ~ ID1, value_var = 'Value', drop = FALSE, fill = 0)
v <- v1 + v2
v[v == 0L] <- NA
v
   a  b  c  d  e
a NA  1  2  1 NA
b  1 NA NA  1 NA
c  2 NA NA NA  1
d  1  1 NA NA  2
e NA NA  1  2 NA

HTH,
Dennis

On Tue, Aug 2, 2011 at 10:00 AM, Jagz Bell jagzb...@yahoo.com wrote:
> I have a data frame comprised of three columns [...] I'd like to convert the data to a matrix
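For comparison, the same symmetric matrix can be built in base R with character matrix indexing. This is only a sketch of an alternative, not something proposed in the thread; it assumes each unordered pair of IDs occurs at most once, as in Jagz's example, and the names `ids` and `m` are illustrative.

```r
df <- data.frame(
  ID1 = c("a", "b", "c", "c", "d", "e"),
  ID2 = c("b", "d", "a", "e", "a", "d"),
  Value = c(1, 1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# All IDs that appear in either column define the rows and columns.
ids <- sort(unique(c(df$ID1, df$ID2)))
m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# A two-column character matrix indexes cells by (row name, column name).
m[cbind(df$ID1, df$ID2)] <- df$Value
m[cbind(df$ID2, df$ID1)] <- df$Value   # mirror to make it symmetric
m
```

Cells that never occur stay NA, so no zero-to-NA conversion step is needed.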
Re: [R] Extract names from vector according to their values
On Aug 2, 2011, at 2:21 PM, Sverre Stausland wrote:

> I can create a vector with the priority of the packages that came with R, like this:
>
> installed.packages()[,"Priority"] -> my.vector
> [...]
> How can I extract the names from this vector according to their priority? I.e. I want to create a vector from this with the names of the base packages, and another vector with the names of the recommended packages.

names(my.vector[which(my.vector == "recommended")])
 [1] "boot"       "class"      "cluster"
 [4] "codetools"  "foreign"    "KernSmooth"
 [7] "lattice"    "MASS"       "Matrix"
[10] "mgcv"       "nlme"       "nnet"
[13] "rpart"      "spatial"    "survival"

Note that some people may tell you that the form below should be preferred because the `which` is superfluous. It is not. The `[` function returns all the NA's, for reasons that are unclear to me. It is wiser to use `which` so that you get numerical indexing.

names(my.vector[my.vector == "recommended"])

On my system it produces 493 items, most of them NA's.

--
David Winsemius, MD
West Hartford, CT
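The behaviour David describes comes from how logical subscripts treat NA: a comparison against NA yields NA, and an NA subscript returns an NA element. The sketch below uses a small made-up vector (the names `contribA` and `contribB` are hypothetical stand-ins for packages with no Priority entry) rather than real installed.packages() output.

```r
my.vector <- c(base = "base", boot = "recommended", contribA = NA,
               MASS = "recommended", contribB = NA)

# Logical indexing: NA in the comparison propagates into the result.
with_lgl <- names(my.vector[my.vector == "recommended"])
with_lgl

# which() keeps only the positions that are TRUE, dropping NA.
with_which <- names(my.vector[which(my.vector == "recommended")])
with_which

# %in% never returns NA, so it is another way to avoid the problem.
with_in <- names(my.vector)[my.vector %in% "recommended"]
with_in
```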
Re: [R] Extract names from vector according to their values
Sverre,

Try this:

my.list <- split(names(my.vector), my.vector)
my.list$base
my.list$recommended

Jean

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409 USA

From: Sverre Stausland john...@fas.harvard.edu
To: r-help@r-project.org
Date: 08/02/2011 01:24 PM
Subject: [R] Extract names from vector according to their values
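A quick illustration of what split() returns here, on a small made-up vector (the entry `contribA` is a hypothetical package with no Priority value). Elements whose grouping value is NA are dropped by split().

```r
my.vector <- c(base = "base", boot = "recommended", class = "recommended",
               stats = "base", contribA = NA)

# One list element per distinct priority, holding the package names.
my.list <- split(names(my.vector), my.vector)
my.list$base
my.list$recommended
```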
Re: [R] Extract names from vector according to their values
Hi:

One more possibility:

names(my.vector[grep('recommended', my.vector)])
 [1] "Matrix"     "boot"       "class"      "cluster"    "codetools"
 [6] "foreign"    "KernSmooth" "lattice"    "MASS"       "Matrix"
[11] "mgcv"       "nlme"       "nnet"       "rpart"      "spatial"
[16] "survival"
names(my.vector[grep('base', my.vector)])
 [1] "base"      "compiler"  "datasets"  "graphics"  "grDevices" "grid"
 [7] "methods"   "splines"   "stats"     "stats4"    "tcltk"     "tools"
[13] "utils"

HTH,
Dennis

On Tue, Aug 2, 2011 at 11:21 AM, Sverre Stausland john...@fas.harvard.edu wrote:
> How can I extract the names from this vector according to their priority? I.e. I want to create a vector from this with the names of the base packages, and another vector with the names of the recommended packages.
[R] Need to compute density as done by panel.histogram
Hi,

This might be a simple problem, but I don't know how to calculate the density of a random variable the way panel.histogram does before it draws the actual density rectangles. The documentation says that it uses the density function, but the actual code suggests that it uses the hist.constructor function (which does not seem to be easily accessible). Any suggestion for computing the density values of foo$x in the following example will be welcome.

require(lattice)
set.seed(12345)
foo1 <- data.frame(x=rnorm(100,0,0.1),grp=1,by=rep(1:2,each=50),by2=rep(1:2,times=50))
foo2 <- data.frame(x=rnorm(100,2,1),grp=2,by=rep(1:2,each=50),by2=rep(1:2,times=50))
foo <- rbind(foo1,foo2)
xplot <- histogram(~x,data=foo, type='density')

PS: the present question relates to a workaround for another problem previously submitted to the list (https://stat.ethz.ch/pipermail/r-help/attachments/20110727/5f0a8853/attachment.pl).
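One way to get histogram-style density values without touching lattice internals is to compute them with base hist() on explicitly chosen breaks. This is a sketch under an assumption: the bar heights will only match the lattice plot if the same breaks are also passed to histogram(), and the breaks chosen below (`brks`) are an arbitrary illustrative choice, not lattice's default.

```r
set.seed(12345)
x <- c(rnorm(100, 0, 0.1), rnorm(100, 2, 1))   # same values as foo$x

# Explicit, equally spaced breaks covering the data (illustrative choice).
brks <- pretty(range(x), n = 20)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(x, breaks = brks, plot = FALSE)

# Density per bin: count / (n * bin width).
dens <- h$density
manual <- h$counts / (length(x) * diff(h$breaks))

# The bar areas of a density histogram sum to 1.
total_area <- sum(dens * diff(h$breaks))
```

With lattice loaded, `histogram(~ x, type = "density", breaks = brks)` would then use the same bins.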
[R] 3D Bar Graphs in ggplot2?
Does anyone know how to create a 3D bar graph using ggplot2/qplot? I don't mean 3D as in x, y, z coordinates, just a 2D bar graph with 3D-shaped bars. See the attached Excel file for an example.

Before anyone asks: I know that 3D-looking bars don't add anything except prettiness.

http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx
Re: [R] density plot for weighted data
On Wed, Aug 3, 2011 at 5:11 AM, r student student...@gmail.com wrote: Like below? plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) Yes If you are doing lots of analyses with weighted data you might want to look at the survey package. It also has a density estimator, in svysmooth(), which works very much the same way as density() for weighted data, but doesn't complain about rescaling the weights. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
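The rescaling confirmed above can be sketched on simulated data. The vectors `income` and `wt` below are made-up stand-ins for oh$FINCP and oh$PWGTP, which are not available here. The bandwidth is passed explicitly because density() picks its default bandwidth without using the weights.

```r
set.seed(1)
income <- rnorm(500, mean = 50000, sd = 15000)   # stand-in for oh$FINCP
wt     <- sample(1:200, 500, replace = TRUE)     # stand-in for oh$PWGTP

# Rescale the survey weights so that they sum to 1.
w <- wt / sum(wt)

# Unweighted rule-of-thumb bandwidth, supplied explicitly.
d <- density(income, weights = w, bw = bw.nrd0(income))

# Trapezoidal check that the estimate integrates to about 1.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```

`plot(d)` then draws the weighted density without the sum(weights) warning.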
Re: [R] 3D Bar Graphs in ggplot2?
On 8/2/2011 11:39 AM, wwreith wrote:
> Does anyone know how to create a 3D Bargraph using ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a 2D bar graph with 3D shaped bars. See attached excel file for an example.

It is not possible.

> Before anyone asks I know that 3D looking bars don't add anything except prettiness.

That is being far too generous. Setting aside that prettiness may be subjective, if changing to a 3D effect only affected the prettiness, there would not be the negative reaction to it that there is. In fact, a 3D effect reduces the ability for a graph to be correctly understood. It distorts the data. When I see a 3D bar plot, I think "This person wants me to think I've been presented with information, but they have deliberately chosen a format that distorts the data. I wonder what they are hiding?"

At least you didn't ask about a 3D pie chart.

> http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health Science University
Re: [R] 3D Bar Graphs in ggplot2?
On 11-08-02 2:39 PM, wwreith wrote: Does anyone know how to create a 3D Bargraph using ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a 2D bar graph with a 3D shaped bard. See attached excel file for an example. Before anyone asks I know that 3D looking bars don't add anything except prettiness. If you want graphs like that, you should be using Excel, not R. Duncan Murdoch http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx -- View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3713305.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to find the parameter of a power function to fit simulation data to it for the tail?
Dear R folks,

having simulation data in a vector n2off, I know that they should be similar to a power function f [1],

    f(n) = n^(-1/r), r ∈ ℕ\{0},

and I want to find the value for r best fitting the simulation data. Furthermore I know that this is only true for big n, that means

    n2off(n) ~ f(n) ⇔ n2off(n)/f(n) → 1 for n → ∞.

(The vector n2off is considered a function n2off(n).)

I came up with the following example where I artificially munch(?) the values of a known function, n^(-½), and the fit should hopefully return r = 2, that means n^(-½).

n <- 1:10 # Should be more data points, but not useful for including into an email.
n
 [1]  1  2  3  4  5  6  7  8  9 10
n2 <- n**(-0.5)
n2
 [1] 1.0000000 0.7071068 0.5773503 0.5000000 0.4472136 0.4082483 0.3779645
 [8] 0.3535534 0.3333333 0.3162278
set.seed(1); n2off <- n2 + runif(1)/100 # for greater n the divisor should also be increased I guess.
n2off
 [1] 1.0026551 0.7097619 0.5800054 0.5026551 0.4498687 0.4109034 0.3806196
 [8] 0.3562085 0.3359884 0.3188829

Weighting fits(?) for larger n higher, or fitting only from a certain n on, for example n ≥ 100, is not considered in this example. And probably there are too few data points in this example.

I have to admit that I am new to this topic and I am just overwhelmed by what I have found when searching for »gafit« in the r-help archive [2] and »curve parameter fitting« on rseek.org [3]. Reading ?nlm, ?nlminb, ?optimize and ?optim, there are just too many options. The documentation of gafit [4] says that it is not maintained.

Additionally I am not sure if this could be turned into a linear model using log(n^(-1/r)) = -1/r log(n). Somewhere it said that linear regression models have to fulfill certain assumptions.

So if some of you experienced users could point me to the “best” function or package to use here and some literature regarding this issue (fit only for big n), that would be much appreciated.

Thank you in advance,
Paul

PS: Is that question too long for sending to the list and should I be less elaborate for further problems?

[1] https://secure.wikimedia.org/wikipedia/en/wiki/Power_function
[2] http://tolstoy.newcastle.edu.au/~rking/R/
[3] http://www.rseek.org/
[4] http://cran.r-project.org/web/packages/gafit/gafit.pdf
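Both ideas raised in the question can be tried in a few lines of base R. The following is a rough sketch on Paul's toy data, not a recommendation of a "best" method: (a) the log-linear form log f(n) = -(1/r) log(n) fitted by lm() through the origin, and (b) a one-dimensional least-squares search with optimize() restricted to the tail. The cutoff n >= 3 and the search interval are arbitrary assumptions for illustration.

```r
n <- 1:10
set.seed(1)
n2off <- n^(-0.5) + runif(1) / 100   # toy data from the post

# (a) Log-log regression without intercept: the slope estimates -1/r.
fit_lm <- lm(log(n2off) ~ 0 + log(n))
r_lm <- -1 / unname(coef(fit_lm))

# (b) Least squares on the original scale, using only the tail.
in_tail <- n >= 3                    # hypothetical cutoff for "big n"
sse <- function(r) sum((n2off[in_tail] - n[in_tail]^(-1 / r))^2)
r_ls <- optimize(sse, interval = c(0.5, 10))$minimum

c(loglinear = r_lm, least_squares = r_ls)
```

Both estimates should land close to r = 2 here. The two fits weight the points differently (relative errors on the log scale, absolute errors on the original scale), which matters when deciding how much the tail should count.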
Re: [R] Inverse of FAQ 7.31.
Thanks to Peter Dalgaard and to Baptiste Auguie (off-list) for the insights they provided. cheers, Rolf turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculate mean ignore null
I have the following:

Tout = c(".", ".",
  "-51.0", "-9.6", "-9.6", "-9.6", "-9.6", "-9.6", "-9.6",
  "-9.6", "-9.5", "-9.5", "-9.6", "-9.5", "-9.6", "-9.6",
  "-9.5", "-9.4", "-9.3", "-9.3", "-9.3", "-9.2", "-9.0",
  "-9.0", "-8.9", "-8.9", "-8.9")

How can I take the mean while ignoring the null values? I don't want to delete the ".", just ignore it. na.rm=TRUE does not work for this.

Jeffrey
Re: [R] Calculate mean ignore null
Are you sure it doesn't? na.rm=T works for me, so I think your problem is elsewhere. Specifically, the example given below consists of 27 character strings, not numbers, so there's no surprise R doesn't want to give you a mean -- to R, it's as logical as asking for the average of "a" and "Q5".

Try this:

mean(as.double(Tout), na.rm=T)

I get -11.048, with a warning message saying that some NA's were created in trying to coerce "." to a double.

Alternatively, if you didn't actually mean all those quote marks below (in which case I don't know what . is in R), try this:

mean(Tout[Tout != "."])

but again -- that's likely not your problem.

Michael Weylandt

On Tue, Aug 2, 2011 at 5:47 PM, Jeffrey Joh johjeff...@hotmail.com wrote:
> How can I take the mean while ignoring the null values? I don't want to delete the ., just ignore. na.rm=TRUE does not work for this.
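Michael's suggestion, written out as a self-contained sketch. The only additions are suppressWarnings(), which silences the expected coercion warning, and the intermediate name `Tnum`; the original `Tout` is left unchanged, so the "." entries are ignored rather than deleted.

```r
Tout <- c(".", ".",
          "-51.0", "-9.6", "-9.6", "-9.6", "-9.6", "-9.6", "-9.6",
          "-9.6", "-9.5", "-9.5", "-9.6", "-9.5", "-9.6", "-9.6",
          "-9.5", "-9.4", "-9.3", "-9.3", "-9.3", "-9.2", "-9.0",
          "-9.0", "-8.9", "-8.9", "-8.9")

# "." cannot be parsed as a number, so it becomes NA.
Tnum <- suppressWarnings(as.numeric(Tout))

# na.rm = TRUE now has NA values to skip.
m <- mean(Tnum, na.rm = TRUE)
m   # -11.048, the value reported in the thread
```

When the data come from a file, `read.table(..., na.strings = ".")` handles the same conversion at import time.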
Re: [R] Help with R
On Thu, 2011-07-28 at 11:58 -0400, Sarah Goslee wrote: Hi Mark, On Thu, Jul 28, 2011 at 10:44 AM, m...@statcourse.com wrote: 1. How can I plot the entire tree produced by rpart? What does plot() not do that you are expecting? Not do any labelling... ;-) text(tree) where `tree` is your fitted tree will add the labels after using `plot()` as per Sarah's reply. 2. How can I submit a vector of values to a tree produced by rpart and have it make an assignment? What does predict() not do that you are expecting? Indeed. G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
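The three calls mentioned in this exchange fit together as follows. This is a minimal sketch on the kyphosis data set that ships with rpart, not Mark's data; the object names `tree` and `new_obs` and the values in `new_obs` are made up for illustration.

```r
library(rpart)

# Fit a classification tree on the example data bundled with rpart.
tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# plot() draws only the branches; text() adds the split labels.
plot(tree, margin = 0.1)
text(tree, use.n = TRUE)

# predict() assigns a new observation to a class.
new_obs <- data.frame(Age = 60, Number = 4, Start = 10)
pred <- predict(tree, newdata = new_obs, type = "class")
pred
```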
[R] Wrong values when projecting LatLong in UTM
Hi R helpers,

I tried to convert a list of LatLong coordinates (in DD format) into UTM zone 11, NAD 27. I first tried this with PBSmapping:

library(PBSmapping)
LatLong <- cbind(c(56.85359, 56.85478), c(-118.4109, -118.4035))
colnames(LatLong) <- c("X", "Y")
attr(LatLong, "projection") <- "UTM"
attr(LatLong, "zone") <- 11
UTM <- convUL(LatLong)
# and that's what I get
UTM
          X         Y
1 -120.9799 -1.068699
2 -120.9799 -1.068632

Now, the UTM values are supposed to be around:

       X          Y
1 414040 6301764.2
2 414493 6301888.39

I don't know what is wrong, so I then tried with rgdal:

library(rgdal)
UTM <- project(LatLong, "+proj=utm +zone=11+ellps=NAD27")
# and that's what I get:
projected point not finite
projected point not finite

The errors might come from the fact that I have DD format in the LatLong file. Maybe I should be able to specify that it is zone 11 north. What do you think?

Colin
[R] xlsx error
Hey All,

I'm trying to use the xlsx package to read a series of Excel spreadsheets into R, but my code is failing at the first step. I setwd() into the directory with the spreadsheets and, as a test, ask for the first one:

read.xlsx(file = "Argentina Final.xls", sheetIndex = 1)

I promptly get an error message:

Error in .jcall(row[[ir]], "Lorg/apache/poi/xssf/usermodel/XSSFCell;", :
  method getCell with signature (I)Lorg/apache/poi/xssf/usermodel/XSSFCell; not found

Anyone have any idea how to fix this issue?

Thanks,
Andrew Winterman
Re: [R] [Rd] example package for devel newcomers
Em Segunda 01 Agosto 2011, você escreveu: Is there a preferred language you would like to use in your package development? I randomly downloaded packages until I found some that helped me along my way, and might be able to help you pick one. If you are just looking at building a package of R functions and data you have developed, possibly the following example will get you started till you feel comfortable with the Writing R Extensions documentation (http://cran.r-project.org/doc/manuals/R-exts.pdf): Dan, your message is cool. Well, here is what my project is about: it is a package to embed php into R. Named Rphp for now. It is mostly done from scratch. I have loved R-exts.pdf. Great stuff. Why embed php into R? My primary purpose is to use web content management systems (WCMS) ready and extensively tested code from R cgi scripts. Someone more experienced with php might think of other uses. My approach is RAD(ical) and innovative (IMextremelyHO :-D) because: a) *any* php based WCMS can be used from R code with no php or html coding; b) output fully compliant with the website appearance; c) WCMS automatic upgrades and interfaces changes (skins or themes) will be so unlikey to cause need for maintenance in R cgi scripts; d) R cgi scripts will not demand changes in php code; e) the builtin php session support obviates the need for any special session coding by R (likely non-web) programmers; f) potential for improved analysis of web databases and even of systems surveillance tasks. During my explorations of the R interface for extensions and the time spent in this tiny project, some questions emerged. 1. my code uses no recursion but I do not really know what is inside php code. Stack size could be a concern. Has any of you there ever needed to allocate a new stack for a package? Is it better to wait for complaints (if anyone ever would like to try this package...)? 2. 
Can R_registerRoutines be called more than once within the same library (the same DllInfo data) so that it can reconfigure itself on the fly? 3. Is it safe (I guess it is) to re-export a function pointer retrieved with R_GetCCallable? 4. When loading a second library (in this case libphp5.so), is it better to put it in the package library directory and load it using the 'char *path' member of DllInfo? Using a second library has implications: a) a given R setup can be limited to the user space without root access; b) on desktops where someone might use Rphp, most systems do not have libphp5.so installed by default, and installing it frequently means installing apache and all its (many) related packages; c) many sysadmins do not have root access but can compile their own php version; d) building libphp5.so may not be an easy task for many. 5. Similar to 3, is it safe to export functions of the second library? libphp5.so will not be registered to R and has some interesting functions that can be exported directly or as pointers within the Rphp library. A stub function can be used. 6. Related to 4, with the many machine architectures and operating systems around, I think it is neither desirable nor feasible to distribute precompiled libphp5.so versions; the package itself can download (wget and curl are everywhere) and compile php. Compiling php is not a lengthy task (6m12.9s on my quad-core desktop) but is quite tricky and demands several development packages not installed by default on desktop systems. Installing them would require root access. What is the suggested approach to deploy libphp5.so? 7. I do not know how to produce a version for Windows if requested. I have only an old MSC++ 97 and lcc (current), and have XP in a virtual box. This concern includes php. Can I get help regarding Windows on this list? It might mean actual work: adapting code, compiling, packaging, etc. Not sure what is needed. 8. 
System safety does not seem a concern regarding this use of php, but... Any suggestions? I guess some manual steps will be necessary because of potential security breaches related to the use of a second library. Patching php to produce a special build to be used as the package library would not be a trivial task and would demand updates at every new php version, something I cannot guarantee I can do. I would also have to distribute the whole php source code, and I still have to study the php licensing scheme. BTW, I copied Rdynpriv.h by hand to my include path to get access to the 'struct _DllInfo' definition. The R install process did not copy this file. Am I doing something wrong here? Sorry for the lengthy message. Thanks for your help. -- Alexandre -- Alexandre Santos Aguiar, MD, SCT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal,