Re: [R] Vim R plugin-2
Jakson A. Aquino jaksonaquino at gmail.com writes: Dear R users, people who use Vim in Linux/Unix may be interested in checking the plugin for R that I'm developing: http://www.vim.org/scripts/script.php?script_id=2628 The plugin includes omni completion for R objects, code indentation, and communication with R running in a terminal emulator (xterm or gnome-terminal). This last feature was already present in Johannes Ranke's plugin. I would like to know if you have any suggestions for improvements. Best regards,

Excellent work! I wonder if this could be made more portable (e.g., it depends on perl, plus R being on a tty terminal, which is not always the case). I'll try to look at it and see if I can port it so it works on Windows, but the only communication method I use there is the clipboard, so I'm not sure it'll be possible. Any alternative ways of sending info both ways between R and any open process (vim) on Windows? I have, and like, perl; the only thing that is missing is the tty support on Windows (I think!) -- Jose Quesada, PhD. Max Planck Institute, Center for Adaptive Behavior and Cognition -ABC-, Lentzeallee 94, office 224, 14195 Berlin http://www.josequesada.name/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] caching of the .Rprofile file
Hi Tom, It seems that if I make a change to the .Rprofile file in my working directory, it is not immediately reflected when the session is restarted. (I am using StatET and rJava.) Is that something I should expect?

No. Is your launch configuration of R in StatET configured such that it takes ${resource_loc} as working directory (Main tab of the launch configuration)? That way you can select the directory you want as the working directory in the Project Explorer and launch R directly in there. If you do not launch R that way, it will take a default directory and therefore not load the .Rprofile from the specific directory you want to be the working directory. HTH, Tobias
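A quick way to verify the working-directory explanation above, from inside the freshly launched R console (a minimal sketch; nothing StatET-specific involved):

```r
## If getwd() is not the project directory, R was started elsewhere and
## the project's .Rprofile was never read at startup.
getwd()                    # the directory R actually started in
file.exists(".Rprofile")   # TRUE only if a .Rprofile is visible there
```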
[R] (no subject)
Could you help me with a problem? I should put non-linear variables into a Zelig model; how can that be done? I'm dealing with air pollution data, trying to find out daily associations between mortality and air pollutants. Weather variables used as confounders are in some cases non-linear. Since smoothing is not an option I don't know how to proceed. Thanks, Jaana
Re: [R] new book on (Perl and) R for computational biology
It looks like the correct link is: http://www.crcpress.com/product/isbn/9781420069730 On Fri, May 8, 2009 at 6:49 PM, Gabriel Valiente valie...@lsi.upc.edu wrote: There is a new book on (Perl and) R for computational biology: G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R. Taylor & Francis/CRC Press (2009). http://www.crcpress.com/product/isbn/9781420063677 I hope it will be of much use to R developers and users. Gabriel
[R] Strip labels: use xyplot() to plot columns in parallel with outer=TRUE
The following tinkers with the strip labels, where the different panels are for different levels of a conditioning factor.

tau <- (0:5)/2.5; m <- length(tau); n <- 200; SD <- 2
x0 <- rnorm(n, mean=12.5, sd=SD)
matdf <- data.frame(
    x = as.vector(sapply((0:5)/2.5, function(s) x0 + rnorm(n, sd=2*s))),
    y = rep(15 + 2.5*x0, m),
    taugp = factor(rep(tau, rep(n, m))))
names(matdf) <- c("x", "y", "taugp")
lab <- c(list("0 (No error in x)"),
         lapply(tau[-1], function(x) substitute(A*s[z], list(A=x))))
xyplot(y ~ x | taugp, data=matdf,
       strip=strip.custom(strip.names=TRUE, var.name="Add error with SD",
                          sep=expression(" = "),
                          factor.levels=as.expression(lab)))

Is there any way to get custom labels when the same is done by plotting variables in parallel?

df <- unstack(matdf, x ~ taugp)
df$y <- 15 + 2.5*x0
lab2 <- c(list("0 (No error in x)"),
          lapply(tau[-1], function(x)
              substitute("Add error with SD" == A*s[z], list(A=x))))
form <- formula(paste("y ~ ", paste(paste("X", tau, sep=""), collapse="+")))
xyplot(form, data=df, outer=TRUE)

I'd hoped that the following would do the trick, but the first label is repeated in each panel, and the variable names are added:

xyplot(form, data=df, outer=TRUE,
       strip=strip.custom(strip.names=TRUE, var.name=as.expression(lab2)))

John Maindonald email: john.maindon...@anu.edu.au phone: +61 2 6125 3473 fax: +61 2 6125 5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200.
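One avenue worth exploring (an untested sketch: with outer=TRUE lattice builds an implicit conditioning factor from the variable names, so the per-panel labels may be the ones that factor.levels=, rather than var.name=, replaces):

```r
## Hypothetical variation on the last call above; whether factor.levels=
## is honoured for outer=TRUE panels would need checking.
xyplot(form, data = df, outer = TRUE,
       strip = strip.custom(factor.levels = as.expression(lab2)))
```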
[R] pdf transparency not working with Latex documents
Hello, I'm using the pdf() device with bg="transparent" to create plots to be used within a LaTeX (beamer) presentation. Later on, I see that the background of my pdf() graphics is solid white in the final presentation. I'm using R-2.6.0, and I have also tried to set the version argument in pdf() to 1.5 and 1.6. Later versions are not accepted. Has anyone used transparency successfully in this way? Thanks, and best regards, Javier
[R] pdf transparency not working with Latex documents. Solved
Hi, I've found that after the call to pdf() I had a later line, par(bg="white"), that was creating the white background. Setting this to "transparent" works fine. Thanks, Javier

Hello, I'm using the pdf() device with bg="transparent" to create plots to be used within a LaTeX (beamer) presentation. (snip)
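Putting the two posts together, a minimal sketch of the working recipe (the file name and plot are placeholders):

```r
## PDF version 1.4 or later supports alpha/transparency
pdf("figure.pdf", bg = "transparent", version = "1.4")
par(bg = "transparent")   # the key is NOT to set bg = "white" afterwards
plot(rnorm(100))
dev.off()
```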
Re: [R] Merging two data frames with 3 common variables makes duplicated rows
Thomas, You are very clever! The meil2 data frame has twice the common variable combinations:

> meil2
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
11   38    H  clas 02:37:36
12   38    H  free 01:59:35
13   45    F  clas 03:46:15
14   45    F  free 02:20:15
15   45    H  clas 02:30:07
16   45    H  free 01:59:36

Keeping unique combinations merged correctly with the next data frame. This merge() function is more subtle than I first thought. That means when merging two data frames, if the resulting data frame has more rows than either former data frame, there are duplicate combinations of the common variables in one or both of the data frames. Thank you very much, I will try to be more careful about this. Rock

Thomas Lumley wrote: On Fri, 8 May 2009, Rock Ouimet wrote: I am new to R (ex SAS user), and I cannot merge two data frames without getting duplicated rows in the results. How to avoid this happening without using the unique() function? 1. First data frame is called tmv with 6 variables and 239 rows:

> tmv[1:10,]
      temps       nom        prenom sexe dist style
1  01:59:36       Cyr         Steve    H   45  free
2  02:09:55  Gosselin         Erick    H   45  free
3  02:12:18 Desfosses         Sacha    H   45  free
4  02:12:23  Lapointe     Sebastien    H   45  free
5  02:12:52    Labrie        Michel    H   45  free
6  02:12:54   Leblanc        Michel    H   45  free
7  02:13:02 Thibeault       Sylvain    H   45  free
8  02:13:49    Martel      Stephane    H   45  free
9  02:14:03    Lavoie Jean-Philippe    H   45  free
10 02:14:05    Boivin   Jean-Claude    H   45  free

Its structure is:

> str(tmv)
'data.frame':   239 obs. of  6 variables:
 $ temps : Class 'times'  atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923 ...
  .. ..- attr(*, "format")= chr "h:m:s"
 $ nom   : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158 117 109 22 ...
 $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93 130 126 63 59 ...
 $ sexe  : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
 $ dist  : int  45 45 45 45 45 45 45 45 45 45 ...
 $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...

2. The second data frame is called meil2 with 4 variables and 16 rows:

> meil2[1:10,]
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46

Lines 9 and 1 appear to be the same in meil2, as do 2 and 10. If the 16 rows consist of two repeats of 8 rows, that would explain why you are getting two copies of each individual in the output. unique(meil2) would have just the distinct rows. -thomas

Thomas Lumley, Assoc. Professor, Biostatistics, tlum...@u.washington.edu, University of Washington, Seattle
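Thomas's point can be reproduced with a tiny self-contained example (hypothetical data):

```r
## Duplicated key combinations in one table multiply rows in merge().
left  <- data.frame(id = c("a", "b"), x = 1:2)
right <- data.frame(id = c("a", "a", "b"), y = c(10, 10, 20))  # key "a" duplicated
nrow(merge(left, right))           # 3 rows: "a" matched twice
nrow(merge(left, unique(right)))   # 2 rows once duplicate keys are dropped
```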
Re: [R] Gantt chart but slightly different
Beata Czyz wrote: Hello, I am new to this list and rather new to graphics with R. I would like to make a chart like a Gantt chart, something like that: ... but I would like to fill the different blocks of tasks with different patterns, i.e. the first blocks of Male 1 and Male 2 with pattern 1, the second blocks of Male 1 and Male 2 with pattern 2, etc. Any idea?

Hi Beata, This could be done by replacing the taskcolors argument in the gantt.chart function with an angle argument and passing that argument to the rect function that draws the bars. You could then get hatching of different directions instead of colors. Like this:

gantt.chart <- function(x=NULL, format="%Y/%m/%d", xlim=NULL,
    angle=c(45,45,90,90,135,135),
    priority.legend=FALSE, vgridpos=NULL, vgridlab=NULL,
    vgrid.format="%Y/%m/%d", half.height=0.25, hgrid=FALSE,
    main="", xlab="", cylindrical=FALSE) {

    ... (a great chunk of the gantt.chart function) ...

    rect(x$starts[x$labels==tasks[i]], topdown[i]-half.height,
         x$ends[x$labels==tasks[i]], topdown[i]+half.height,
         angle=angle[i], border=FALSE)

    ... (the rest of the gantt.chart function) ...
}

Jim
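As a stand-alone illustration of the hatching idea (a sketch independent of plotrix): note that in base graphics angle= only takes effect in rect() when density= is also supplied.

```r
## Two blocks with hatching at different angles instead of fill colours
plot(0:10, 0:10, type = "n", xlab = "", ylab = "")
rect(1, 1, 4, 3, density = 20, angle = 45)    # pattern 1
rect(5, 1, 8, 3, density = 20, angle = 135)   # pattern 2
```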
Re: [R] (no subject)
On May 9, 2009, at 5:39 AM, Jaana Kettunen wrote: Could you help me with a problem? I should put non-linear variables into a Zelig model, how can that be done? (snip)

You should search within the Zelig documentation for examples of regression splines. It is not difficult to find these. This document has many such examples and it was the first hit on a Google search: http://gking.harvard.edu/zelig/docs/zelig.pdf David Winsemius, MD, Heritage Laboratories, West Hartford, CT
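For example, a natural-spline term from the splines package can go directly in the model formula (a sketch assuming the zelig() interface from the cited documentation; deaths, pm10, temp and airdata are hypothetical names):

```r
library(splines)  # ns(): natural cubic splines
library(Zelig)

## temperature enters as a flexible (but non-smoothed) confounder term
z.out <- zelig(deaths ~ pm10 + ns(temp, df = 4),
               model = "poisson", data = airdata)
summary(z.out)
```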
Re: [R] Vim R plugin-2
Jakson Alves de Aquino wrote: Jose Quesada wrote: I'll try to look at it and see if I can port it so it works on Windows, but the only communication method I use there is the clipboard, so I'm not sure it'll be possible.

Unfortunately, I cannot help on the Windows environment.

Any alternative ways of sending info both ways between R and any open process (vim) on Windows?

Netbeans could be a fast and portable route for communication between R and vim without the need of using either pipes or files saved on disk as an intermediary. Typing :help netbeans in vim should show the netbeans documentation. R also has TCP support: write.socket(), read.socket(), etc. We could begin to explore this route...

Great find! I wish I had more time to dedicate to this. Have you seen: http://pyclewn.wiki.sourceforge.net/features+ In my view, R as a language is very good but the tools around it are not. When a Matlab person tries R, their first comments are always about how poor the environment is. Sure, one can have a debugger (with a crappy GUI in Tk), and there's some editor support, but it's kind of painful. Integrating an R debugger with something like pyclewn would be very good. Best, -jose -- Jose Quesada, PhD. Max Planck Institute, Center for Adaptive Behavior and Cognition -ABC-, Lentzeallee 94, office 224, 14195 Berlin http://www.josequesada.name/
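For reference, the base-R socket client functions mentioned above look like this (a sketch; assumes some process is listening on localhost port 7777):

```r
## Simple TCP round trip using the utils socket functions
s <- make.socket("localhost", 7777)
write.socket(s, "hello from R\n")
reply <- read.socket(s)
close.socket(s)
```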
Re: [R] Vim R plugin-2
Hi Jose, Jose Quesada wrote: snip In my view, R as a language is very good but the tools around it are not good. (snip) Integrating an R debugger with something like pyclewn would be very good.

There's no integrated debugger yet, but the StatET plugin for Eclipse is one example of a mature development environment for R. Moreover, it lets you leverage the Eclipse eco-system and its myriad of plug-ins. No painful experience at all for me. http://www.walware.de/goto/statet Best, Tobias P.S. When I try Matlab my first comment is always how poor the language is ;-)
Re: [R] Beyond double-precision?
Yes, all of the numbers are positive. I actually have a Bayesian posterior sample of log likelihoods [i.e. thousands of ln(likelihood) scores]. I want to calculate the harmonic mean of these likelihoods, which means I need to convert them back into likelihoods [i.e. e^ln(likelihood)], calculate the harmonic mean, and then take the log of the mean. I have done this before in Mathematica, but I have a simulation pipeline written almost entirely in R, so it would be nice if I could do these calculations in R. If R cannot handle such small values, then perhaps there's a way to calculate the harmonic mean from the log likelihood scores without converting back to likelihoods? I am a biologist, not a mathematician, so any recommendations are welcome! Thanks! -Jamie

spencerg wrote: Are all your numbers positive? If yes, have you considered using logarithms? I would guess it is quite rare for people to compute likelihoods. Instead I think most people use log(likelihoods). Most of the probability functions in R have an option of returning the logarithms. Hope this helps. Spencer

joaks1 wrote: I need to perform some calculations with some extremely small numbers (i.e. likelihood values on the order of 1.0E-16,000). Even when using the double() function, R is rounding these values to zero. Is there any way to get R to deal with such small numbers? For example, I would like to be able to calculate e^-1 (i.e. exp(-1)) without the result being rounded to zero. I know I can do it in Mathematica, but I would prefer to use R if I can. Any help would be appreciated! Many Thanks in Advance!
Re: [R] Citing R/Packages Question
I've had an email exchange with the authors of a recent paper in Nature who also made a good faith effort to cite both R and the quantreg package, and were told that the Nature house style didn't allow such citations, so they were dropped from the published paper and the supplementary material appearing on the Nature website. Since the CRAN website makes a special effort to make prior versions of packages available, it would seem to me to be much more useful to cite version numbers than access dates. There are serious questions about the ephemerality of URL citations, not all of which are adequately resolved by the Wayback Machine, and about access dating, but it would be nice to have some better standards for such contingent citations rather than leave authors at the mercy of copy editors. I would also be interested in suggestions by other contributors.

url: www.econ.uiuc.edu/~roger  Roger Koenker  email: rkoen...@uiuc.edu  Department of Economics  vox: 217-333-4558  University of Illinois  fax: 217-244-6678  Champaign, IL 61820

On May 8, 2009, at 5:27 PM, Derek Ogle wrote: I used R and the quantreg package in a manuscript that is currently in the proofs stage. I cited both R and quantreg as suggested by citation() and noted the version of R and quantreg that I used in the main text as "All tests were computed with the R v2.9.0 statistical programming language (R Development Core Team 2008). Quantile regressions were conducted with the quantreg v4.27 package (Koenker 2008) for R." The editor has asked me to also provide the date when the webpage was accessed for both R and quantreg. This does not seem like an appropriate request to me, as both R and the quantreg package are versioned. This request seems to me to be the same as asking someone when they purchased commercial package X version Y (which I don't think would be asked). Am I thinking about this correctly, or has the editor made a valid request? I would be interested in any comments or opinions.

Dr. Derek H. Ogle, Associate Professor of Mathematical Sciences and Natural Resources, Northland College, 1411 Ellis Avenue Box 112, Ashland, WI, 715.682.1300, www.ncfaculty.net/dogle/
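For what it's worth, the versioned citation text the manuscript used comes straight from R itself:

```r
citation()                  # how to cite R itself
citation("quantreg")        # citation for the installed version of the package
packageVersion("quantreg")  # the exact version number to report
```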
Re: [R] Vim R plugin-2
Any alternative ways of sending info both ways between R and any open process (vim) on Windows?

On Windows, I'd rather use OLE automation. A few years ago I successfully used this plugin: http://www.vim.org/scripts/script.php?script_id=889 I haven't used it since, though.
[R] for loop
Hi, I need your help. I have a vector of numbers reflecting the switch in the perception of a figure. For a certain period I have positive numbers (which reflect perception A), then the perception changes and I have negative numbers (perception B), and so on for 4 iterations. I need the rate of this switch for my analysis. Namely, I need a new vector with numbers which reflect how many digits follow in sequence before the change in perception (and then I have to take the reciprocal of these numbers in order to obtain the rate, but that is not a problem). For example, suppose that the new vector looks like this: new <- c(5,7,8,9), i.e. 5 positive numbers, then the perception changes and 7 negative numbers follow, then it changes again and 8 positive follow, and so on. In brief, I need to write a little script that detects the change in sign of the elements of the vector and counts how many positive and how many negative digits there are in sequence. I would use a for loop; I started but then I don't know how to continue:

rate <- vector()
for(i in 1:length(a))
    rate <- (a[i] > 0) ...

..can you help me? Alessandra
Re: [R] Rmysql linking to an old-ish mysql build
Jose Quesada wrote: Hi, I'm trying to get RMySQL to work on Windows Server 2008 64-bit. I have the latest build of MySQL installed (mysql-5.1.34-winx64).

Independent of the version number of MySQL (which is less than 6 months old): If you are talking about the RMySQL binary build on CRAN, it is built against a 32-bit version of MySQL. I am not sure if there is a safe way to build a binary that properly links against 64-bit MySQL given you are running 32-bit R. If there is, you have to install the package from sources yourself anyway. Best, Uwe Ligges

When trying to load RMySQL, I got a warning that RMySQL is linking to an old-ish MySQL build (5.0.67). I could do some basic stuff (the connection works) but it breaks when trying to read a large table. So I set out to use build 5.0.67, which RMySQL likes. 10 hrs later and after lots of sysadmin work, I have to call it quits. I couldn't make it work. Since MySQL 5.0.67 is pretty old, I was wondering if anyone has binaries for RMySQL that work with a more recent version. Maybe the authors of the package have plans to update it soon? I've tried the package on both R 2.9.0 and R 2.8.1. If nothing comes up, I'll try to spend a few more hours on getting the old version to work. Thanks! Best, -Jose
Re: [R] Beyond double-precision?
Dear Jamie: The harmonic mean is exp(mean(logs)). Therefore, log(harmonic mean) = mean(logs). Does this make sense? Best Wishes, Spencer

joaks1 wrote: (snip)
Re: [R] for loop
aledanda wrote: Hi, I need your help. I have a vector of numbers reflecting the switch in the perception of a figure. (snip) ..can you help me?

See ?sign and ?rle which together yield:

a <- c(-1, -2, -3, 1, 2, -1)
rle(sign(a))
# Run Length Encoding
#  lengths: int [1:3] 3 2 1
#  values : num [1:3] -1 1 -1

## or just the vector you want is:
rle(sign(a))$lengths
# [1] 3 2 1

Uwe Ligges
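Building on the rle(sign(a)) idea, the rates Alessandra asked for are just the reciprocals of the run lengths (a sketch with made-up data):

```r
a <- c(1, 2, 3, 1, 2, -1, -4, -2, 5, 7, 9)  # hypothetical perception vector
runs <- rle(sign(a))$lengths  # values in a row before each sign change
runs                          # 5 3 3
rate <- 1 / runs              # reciprocal = switch rate
rate
```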
Re: [R] Problem with package SNOW on MacOS X 10.5.5
Hi Greg, I don't know if this is related to your problem, but I get the same error (on both Ubuntu and Fedora Linux, R 2.9) and just found a very curious behaviour - the snowfall apply functions don't like the variable name c. E.g.:

c <- 1
sfLapply(1:10, exp)

issues the same error you had posted, while a subsequent

rm(c)
sfLapply(1:10, exp)

runs fine. Rainer

On Wed, 31 Dec 2008, Greg Riddick wrote: Hello All, I can run the lower level functions OK, but many of the higher level (e.g. parSApply) functions are generating errors. When running the example (from the snow help docs) for parApply on MacOSX 10.5.5, I get the following error:

cl <- makeSOCKcluster(c("localhost", "localhost"))
sum(parApply(cl, matrix(1:100, 10), 1, sum))
# Error in do.call(fun, lapply(args, enquote)) : could not find function "fun"

Any ideas? Do I possibly need MPI or PVM to run the Apply functions? Thanks,
[R] Histogram frequencies with a normal pdf curve overlay
Dear List, When I plot a histogram with 'freq=FALSE' and overlay the histogram with a normal pdf curve, everything looks as expected, as follows:

x <- rnorm(1000)
hist(x, freq=FALSE)
curve(dnorm(x), add=TRUE, col="blue")

What do I need to do if I want to show the frequencies (freq=TRUE) with the same normal pdf overlay, so that the plot would still look the same? Regards, Jacques

> R.version
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
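One way to keep freq=TRUE and still overlay the curve (a sketch): rescale the density by n times the bin width, so the curve lives on the count scale.

```r
set.seed(1)
x <- rnorm(1000)
h <- hist(x, freq = TRUE)
binwidth <- diff(h$breaks)[1]  # hist() uses equal-width bins by default
n <- length(x)                 # precompute: inside curve(), x is the plotting grid
curve(dnorm(x) * n * binwidth, add = TRUE, col = "blue")
```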
Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.
Sorry, but your professor offered me $500 NOT to do your assignments.
Re: [R] Beyond double-precision?
G'day all, On Sat, 09 May 2009 08:01:40 -0700 spencerg spencer.gra...@prodsyse.com wrote: The harmonic mean is exp(mean(logs)). Therefore, log(harmonic mean) = mean(logs). Does this make sense?

I think you are talking here about the geometric mean and not the harmonic mean. :) The harmonic mean is a bit more complicated. If x_i are positive values, then the harmonic mean is

    H = n / (1/x_1 + 1/x_2 + ... + 1/x_n)

so

    log(H) = log(n) - log( 1/x_1 + 1/x_2 + ... + 1/x_n )

Now log(1/x_i) = -log(x_i), so if log(x_i) is available, the logarithms of the individual terms are easily calculated. But we need to calculate the logarithm of a sum from the logarithms of the individual terms. At the C level R's API has the function logspace_add for such tasks, so it would be easy to do this at the C level. But one could also implement the equivalent of the C routine using R commands. The way to calculate log(x+y) from lx=log(x) and ly=log(y) according to logspace_add is:

    max(lx,ly) + log1p(exp(-abs(lx-ly)))

So the following function may be helpful:

logadd <- function(x){
    logspace_add <- function(lx, ly)
        max(lx, ly) + log1p(exp(-abs(lx-ly)))
    len_x <- length(x)
    if(len_x > 1){
        res <- logspace_add(x[1], x[2])
        if( len_x > 2 ){
            for(i in 3:len_x)
                res <- logspace_add(res, x[i])
        }
    }else{
        res <- x
    }
    res
}

R> set.seed(1)
R> x <- runif(50)
R> lx <- log(x)
R> log(1/mean(1/x)) ## logarithm of harmonic mean
[1] -1.600885
R> log(length(x)) - logadd(-lx)
[1] -1.600885

Cheers, Berwin

Berwin A Turlach, Dept of Statistics and Applied Probability, Faculty of Science, National University of Singapore, 6 Science Drive 2, Blk S16, Level 7, Singapore 117546. Tel.: +65 6515 4416 (secr), +65 6515 6650 (self), FAX: +65 6872 3919, e-mail: sta...@nus.edu.sg, http://www.stat.nus.edu.sg/~statba
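The pairwise loop in logadd() can also be written in vectorised form using the standard log-sum-exp trick (subtract the maximum before exponentiating), which gives the same result:

```r
## log(sum(exp(lx))) computed stably, even when exp(lx) would underflow
logsumexp <- function(lx) {
    m <- max(lx)
    m + log(sum(exp(lx - m)))
}

set.seed(1)
lx <- log(runif(50))
log(length(lx)) - logsumexp(-lx)  # log of the harmonic mean
# [1] -1.600885
```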
Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.
That's typical, my profs used to do this to me all the time. G. On Sat, May 9, 2009 at 6:17 PM, Carl Witthoft c...@witthoft.com wrote: Sorry, but your professor offered me $500 NOT to do your assignments. -- Gabor Csardi gabor.csa...@unil.ch UNIL DGM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beyond double-precision?
Dear Berwin: Thanks for the elegant correction. Spencer

Berwin A Turlach wrote: [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beyond double-precision?
The following packages support high-precision arithmetic (and the last two also support exact arithmetic):

bc - interface to the bc calculator, http://r-bc.googlecode.com
gmp - interface to gmp (GNU multiple precision), http://cran.r-project.org/web/packages/gmp
rSymPy - interface to the sympy computer algebra system, http://rsympy.googlecode.com
Ryacas - interface to the yacas computer algebra system, http://ryacas.googlecode.com

On Fri, May 8, 2009 at 4:54 PM, joaks1 joa...@gmail.com wrote: I need to perform some calculations with some extremely small numbers (i.e. likelihood values on the order of 1.0E-16,000). Even when using the double() function, R is rounding these values to zero. Is there any way to get R to deal with such small numbers? For example, I would like to be able to calculate e^-16,000 (i.e. exp(-16000)) without the result being rounded to zero. I know I can do it in Mathematica, but I would prefer to use R if I can. Any help would be appreciated! Many Thanks in Advance! -- View this message in context: http://www.nabble.com/Beyond-double-precision--tp23452471p23452471.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
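When only comparisons, products, or ratios of such tiny likelihoods are needed, a base-R alternative is to never leave the log scale at all; a minimal sketch:

```r
# A likelihood on the order of 1e-16000 underflows double precision:
loglik <- -16000 * log(10)   # log of 1e-16000
exp(loglik)                  # 0 in double precision

# but arithmetic on the log scale still works fine:
loglik2 <- loglik + log(2)   # log of twice the likelihood
loglik2 - loglik             # log of the likelihood ratio: log(2)
```

Multiplying likelihoods becomes adding log-likelihoods, so quantities like likelihood ratios survive even when the likelihoods themselves are unrepresentable.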
Re: [R] Histogram frequencies with a normal pdf curve overlay
On 09-May-09 16:10:42, Jacques Wagnor wrote: Dear List, When I plot a histogram with 'freq=FALSE' and overlay the histogram with a normal pdf curve, everything looks as expected, as follows:

x <- rnorm(1000)
hist(x, freq=FALSE)
curve(dnorm(x), add=TRUE, col="blue")

What do I need to do if I want to show the frequencies (freq=TRUE) with the same normal pdf overlay, so that the plot would still look the same? Regards, Jacques

Think first about how you would convert the histogram densities (heights of the bars on the density scale) into histogram frequencies: Density * (bin width) * N = frequency, where N = total number in sample. Then all you need to do is multiply the Normal density by the same factor. To find out the bin width, take the difference between successive values of the breaks component of the histogram. One way to do all this is

N <- 1000
x <- rnorm(N)
H <- hist(x, freq=TRUE) ## This will plot the histogram as well
dx <- min(diff(H$breaks))
curve(N*dx*dnorm(x), add=TRUE, col="blue")

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-May-09 Time: 17:31:03 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
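The density-to-frequency relationship can be checked numerically from the components hist() returns; a small sketch (plot = FALSE so no graphics device is needed):

```r
# Numerical check of: frequency = density * (bin width) * N
set.seed(42)
N  <- 1000
x  <- rnorm(N)
H  <- hist(x, plot = FALSE)
dx <- diff(H$breaks)                     # bin widths
max(abs(H$counts - H$density * dx * N))  # essentially zero
```

hist() defines density as counts / (N * bin width), so counts and density * dx * N agree to rounding error, which is exactly why scaling dnorm() by N * dx makes the curve match the frequency bars.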
Re: [R] Histogram frequencies with a normal pdf curve overlay
Thank you! On Sat, May 9, 2009 at 11:31 AM, Ted Harding ted.hard...@manchester.ac.uk wrote: [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rmysql linking to an old-ish mysql build
This topic is usually covered on R-sig-db, so its archives will give more information (and as I recall, so would the R-help archives, not least in pointing you to R-sig-db). On Sat, 9 May 2009, Uwe Ligges wrote: Jose Quesada wrote: Hi, I'm trying to get RMySQL to work on Windows Server 2008 64-bit. I have the latest build of MySQL installed (mysql-5.1.34-winx64). Independent of the version number of MySQL (which is less than 6 months old): If you are talking about the RMySQL binary build on CRAN: it is built against a 32-bit version of MySQL. I am not sure if there is a safe way to build a binary that properly links against 64-bit MySQL, given you are running 32-bit R. MySQL is a client-server system: this will work if you have a 32-bit MySQL client DLL and arrange for RMySQL to find it (a 32-bit client can talk to a 64-bit server). That client DLL needs to be more or less the same MySQL version as RMySQL was built against (and what 'more or less' means is determined by trial and error: there is no guarantee whatsoever that any other version will work, and even single patch-level differences have resulted in crashes). If there is, you have to install the package from sources yourself anyway. That is in any case the safest thing to do. Best, Uwe Ligges. When trying to load RMySQL, I got a warning that RMySQL is linking to an old-ish MySQL build (5.0.67). I could do some basic stuff (the connection works) but it breaks when trying to read a large table. So I set out to use build 5.0.67, which RMySQL likes. 10 hrs later, and after lots of sysadmin work, I have to call it quits: I couldn't make it work. Since MySQL 5.0.67 is pretty old, I was wondering if anyone has binaries for RMySQL that work with a more recent version. Maybe the authors of the package have plans to update it soon? I've tried the package on both R 2.9.0 and R 2.8.1. If nothing comes up, I'll try to spend a few more hours on getting the old version to work. Thanks!
Best, -Jose -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.
My guess is he might actually need production code but just didn't want to tell the truth here. In some software forums, this kind of thing happens all the time :-) On Fri, May 8, 2009 at 12:36 PM, Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote: Simon Pickett wrote: I bet at least a few people offered their services! It might be an undercover sting operation to weed out the unethical amongst us :-) ... written by some of the R core developers? vQ -- == WenSui Liu Acquisition Risk, Chase Blog: statcompute.spaces.live.com Tough Times Never Last. But Tough People Do. - Robert Schuller __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.
I hate to start a whole war about this, but isn't there some chance (not much, but non-zero) that she's willing to pay the $300.00 so that she can get a nice solution that she can then learn from? I'm definitely guilty of this behavior as a non-student, and to be honest I forget whether she was definitely a student, but I think she was. Again, not meaning to start a war, so no replies preferred, or at least they should be off-list? On May 9, 2009, Gábor Csárdi csa...@rmki.kfki.hu wrote: That's typical, my profs used to do this to me all the time. G. On Sat, May 9, 2009 at 6:17 PM, Carl Witthoft c...@witthoft.com wrote: Sorry, but your professor offered me $500 NOT to do your assignments. -- Gabor Csardi gabor.csa...@unil.ch UNIL DGM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Generating a conditional time variable
Hi everyone, Please forgive me if my question is simple and my code terrible, I'm new to R. I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!

## My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9 million id-year observations

I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in increments of 1 every year, but which resets during year(t) if event 'eif' occurred for this 'id' at year(t-1). It should also reset when we switch to a new 'id'. For example: dataframe = test

year id  eif conditional_time
1990 101 0   1
1991 101 0   2
1992 101 1   3
1993 101 0   1
1994 101 0   2
1995 101 0   3
1996 101 0   4
1997 101 1   5
1998 101 0   1
1999 101 0   2
2000 101 0   3
2001 101 0   4
2002 101 0   5
2003 101 0   6
1990 201 0   1
1991 201 0   2
1992 201 0   3
1993 201 0   4
1994 201 0   5
1995 201 0   6
1996 201 0   7
1997 201 0   8
1998 201 0   9
1999 201 0   10
2000 201 0   11
2001 201 1   12
2002 201 0   1
2003 201 0   2

-2- In a copy of the original dataframe, drop all id-year rows that correspond to years after a given id has experienced his first 'eif' event.

I have written the code below to take care of -1-, but it is incredibly inefficient. Given the size of my database, and considering how slow my computer is, I don't think it's practical to use it. Also, it depends on correct sorting of the dataframe, which might generate errors.

##
for (i in 1:nrow(test)) {
  if (i == 1) {                             # If first id-year
    cond_time <- 1
    test[i, 4] <- cond_time
  } else if (test[i-1, 2] != test[i, 2]) {  # If new id
    cond_time <- 1
    test[i, 4] <- cond_time
  } else {                                  # Same id as previous row
    if (test[i, 3] == 0) {
      test[i, 4] <- cond_time + 1
      cond_time <- test[i, 4]
    } else {
      test[i, 4] <- cond_time + 1
      cond_time <- 0
    }
  }
}

-- Vincent Arel M.A. Student, McGill University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] a general way to select a subset of matrix rows?
Dear fellow R users, I can't figure out how to do a simple thing properly: apply an operation to matrix columns on a selected subset of rows. Things go wrong when only one row is being selected. I am sure there's a way to do this properly. Here's an example:

# define a 3-by-4 matrix x
x <- matrix(runif(12), ncol=4)
str(x)
 num [1:3, 1:4] 0.568 0.217 0.309 0.859 0.651 ...

# calculate column means for selected rows
rows <- c(1,2)
apply(x[rows,], 2, mean)
[1] 0.3923531 0.7552746 0.3661532 0.1069531

# now the same thing, but the rows vector is actually just one row
rows <- c(2)
apply(x[rows,], 2, mean)
Error in apply(x[rows, ], 2, mean) : dim(X) must have a positive length

The problem is that while x[rows,] in the first case returned a matrix, in the second case, when only one row was selected, it returned a vector (and the apply obviously failed). Is there a general way to subset a matrix so it still returns a matrix even if it's one row? Unfortunately, doing as.matrix(x[rows,]) doesn't work either, as it returns a transposed matrix in the case of a single row. Is there a way to do this properly without writing out hideous if statements accounting for the single-row exception? thanks, -peter. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R package for estimating markov transition matrix from observations + confidence?
Dear R gurus, I have data for which I want to estimate the Markov transition matrix that generated the sequence, and preferably obtain some measure of confidence for that estimation. e.g., for a series such as

1 3 4 1 2 3 1 2 1 3 4 3 2 4 2 1 4 1 2 4 1 2 4 1 2 1 2 1 3 1

I would want to get an estimate of the matrix that generated it [[originally:

     [,1] [,2] [,3] [,4]
[1,] 0.00 0.33 0.33 0.33
[2,] 0.33 0.00 0.33 0.33
[3,] 0.33 0.33 0.00 0.33
[4,] 0.33 0.33 0.33 0.00

]] and the confidence in that estimation. I know that generating the cross-tab matrix is trivial, but if there is a package that does that and provides a likelihood as well, I'd appreciate knowing about it. Best, Uri __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
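The "trivial" cross-tab step mentioned above already gives the maximum-likelihood estimate once the rows are normalized; a base-R sketch using the posted sequence (confidence intervals would need more than this):

```r
# MLE of the transition matrix from the posted sequence
# (entry [i, j] estimates P(next state = j | current state = i)):
s <- c(1,3,4,1,2,3,1,2,1,3,4,3,2,4,2,1,4,1,2,4,1,2,4,1,2,1,2,1,3,1)
trans <- table(from = head(s, -1), to = tail(s, -1))  # transition counts
P <- prop.table(trans, margin = 1)                    # row-normalize the counts
round(P, 2)
```

The observed log-likelihood of any candidate matrix Q is sum(trans * log(Q)) over the nonzero counts, which is one way to compare the estimate against the generating matrix.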
[R] sqlSave()
Hi all: I have created an MS Access table named 'PredictedValues' through the statements below:

myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb", uid="admin", pwd="")
sqlSave(myDB, PredictedValues, rownames=FALSE)
close(myDB)

But if I run the code again with new values I get the message below:

Error in sqlSave(myDB, PredictedValues, rownames = FALSE) : table 'PredictedValues' already exists

and my new records don't get updated. I was under the impression that 'sqlSave' would copy new data on top of the existing data, or, if the table didn't exist, create one with the new values. I tried 'sqlUpdate' but my existing 'PredictedValues' didn't update. What am I doing wrong? Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
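For adding rows to an already-existing table, RODBC's sqlSave() takes an append argument; a sketch reusing the post's path and table name (untestable outside the original Access setup, so treat it as a starting point):

```r
# Append the new rows instead of trying to recreate the table; the path,
# uid/pwd, and the 'PredictedValues' data frame follow the original post.
library(RODBC)
myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb",
                          uid = "admin", pwd = "")
sqlSave(myDB, PredictedValues, rownames = FALSE, append = TRUE)
close(myDB)
```

With append = FALSE (the default), sqlSave() tries to CREATE the table, which is why a second run fails; sqlUpdate() is instead for modifying existing rows matched by key.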
[R] sqlSave()
Sorry, I'm resending it because I forgot to send my system info (below). Hi all: I have created an MS Access table named 'PredictedValues' through the statements below:

myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb", uid="admin", pwd="")
sqlSave(myDB, PredictedValues, rownames=FALSE)
close(myDB)

But if I run the code again with new values I get the message below:

Error in sqlSave(myDB, PredictedValues, rownames = FALSE) : table 'PredictedValues' already exists

and my new records don't get updated. I was under the impression that 'sqlSave' would copy new data on top of the existing data, or, if the table didn't exist, create one with the new values. I tried 'sqlUpdate' but my existing 'PredictedValues' didn't update. What am I doing wrong?

sessionInfo()
R version 2.9.0 (2009-04-17) i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages: [1] graphics grDevices datasets tools stats grid utils methods base
other attached packages: [1] RODBC_1.2-5 forecast_1.23 tseries_0.10-11 quadprog_1.4-10 zoo_1.3-1 hexbin_1.17.0 xtable_1.5-5 lattice_0.17-22 plyr_0.1.8 ggplot2_0.8.3 reshape_0.8.0 proto_0.3-7 [13] rcom_2.1-3 rscproxy_1.3-1

Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Overloading some non-dispatched S3 methods for new classes
Hello, I am building a package that creates a new kind of object not unlike a dataframe. However, it is not an extension of a dataframe, as the data themselves reside elsewhere. It only contains metadata. I would like to be able to retrieve data from my objects such as the number of rows, the number of columns, the colnames, etc. I --quite naively-- thought that ncol, nrow, colnames, etc. would be dispatched, so I would only need to create a, say, ncol.myclassname function so as to be able to invoke ncol directly and transparently. However, it is not the case. The only alternative I can think about is to create decorated versions of ncol, nrow, etc. to avoid naming conflicts. But I would still prefer my package users to be able to use the undecorated function names. Do I have a chance? Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
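There may be no need for decorated names: ncol(), nrow(), and colnames() are not themselves generic, but they are defined in terms of dim() and dimnames(), which are (internal) S3 generics. A sketch, where 'myclassname' and the metadata fields are invented for illustration:

```r
# Methods for dim() and dimnames() are enough for nrow/ncol/colnames to work,
# because those base functions delegate to dim() and dimnames():
myobj <- structure(list(meta_nrow = 10, meta_ncol = 3,
                        meta_colnames = c("a", "b", "c")),
                   class = "myclassname")
dim.myclassname      <- function(x) c(x$meta_nrow, x$meta_ncol)
dimnames.myclassname <- function(x) list(NULL, x$meta_colnames)

nrow(myobj)      # 10
ncol(myobj)      # 3
colnames(myobj)  # "a" "b" "c"
```

Functions that genuinely do not dispatch through any generic would still need either explicit generics of your own or decorated names, but for the dimension queries listed in the question, dim()/dimnames() methods cover it.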
Re: [R] Generating a conditional time variable
Assuming the year column has complete data and doesn't skip a year, the following should take care of 1)

# Simulated data frame: year from 1990 to 2003, for 5 different ids, each having one or two eif events
test <- data.frame(year=rep(1990:2003,5), id=gl(5,length(1990:2003)),
  eif=as.vector(sapply(1:5, function(z){a <- rep(0,length(1990:2003)); a[sample(1:length(1990:2003), sample(1:2,1))] <- 1; a})))

# Generate the conditional_time column.
test <- do.call(rbind, lapply(split(test, test$id), function(z){s <- 0; data.frame(z, cond_time=sapply(z$eif, function(i) ifelse(i==1, s <- 1, s <- s+1)))}))

Generally sapply, lapply, and apply are faster than for loops. split() will split your data frame by the $id column (second argument). lapply() loops through the resulting list and generates the cond_time variable, resetting when eif==1, otherwise incrementing the count, much as you have in your code. If I understand 2) correctly, the following should do the trick:

test2 <- test  # copy the data frame
test2 <- do.call(rbind, lapply(split(test, test$id), function(z) z[1:which(z$eif==1)[1],]))

Similar to the former, but sub-setting the rows of the data frame up to the first event, for each id. If the above is all you need, then 1) and 2) could be combined in a single call. Others will likely have a different approach. Cheers, -- Greg Finak Post-Doctoral Research Associate Computational Biology Unit Institut des Recherches Cliniques de Montreal Montreal, QC. On 09/05/09 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com wrote: Hi everyone, Please forgive me if my question is simple and my code terrible, I'm new to R. I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!
[...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generating a conditional time variable
That will teach me to post without a double-check. On 09/05/09 3:11 PM, Finak Greg greg.fi...@ircm.qc.ca wrote: Assuming the year column has complete data and doesn't skip a year, the following should take care of 1)

# Simulated data frame: year from 1990 to 2003, for 5 different ids, each having one or two eif events
test <- data.frame(year=rep(1990:2003,5), id=gl(5,length(1990:2003)),
  eif=as.vector(sapply(1:5, function(z){a <- rep(0,length(1990:2003)); a[sample(1:length(1990:2003), sample(1:2,1))] <- 1; a})))

# Generate the conditional_time column.
test <- do.call(rbind, lapply(split(test, test$id), function(z){s <- 0; data.frame(z, cond_time=sapply(z$eif, function(i) ifelse(i==1, s <- 1, s <- s+1)))}))

The above resets the count at eif==1 rather than after, and there's a local assignment to s which should be global. Thanks, David, for noting that.

do.call(rbind, lapply(split(test, test$id), function(z){s <- 0; data.frame(z, cond_time=sapply(z$eif, function(i) ifelse(i==1, {l <- s+1; s <<- 0; l}, {l <- s+1; s <<- s+1; l})))}))

Generally sapply, lapply, and apply are faster than for loops. split() will split your data frame by the $id column (second argument). lapply() loops through the resulting list and generates the cond_time variable, resetting when eif==1, otherwise incrementing the count, much as you have in your code. If I understand 2) correctly, the following should do the trick:

test2 <- test  # copy the data frame
test2 <- do.call(rbind, lapply(split(test, test$id), function(z) z[1:which(z$eif==1)[1],]))

Similar to the former, but sub-setting the rows of the data frame up to the first event, for each id. If the above is all you need, then 1) and 2) could be combined in a single call. Others will likely have a different approach. Cheers, -- Greg Finak Post-Doctoral Research Associate Computational Biology Unit Institut des Recherches Cliniques de Montreal Montreal, QC.
On 09/05/09 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com wrote: [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
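A base-R alternative for part 1) that avoids the explicit state variable, sketched under the thread's assumption of year-sorted data (cond_time_fn and the toy data are made up for illustration; column names follow the thread):

```r
# Reset the counter on the row *after* each eif == 1, separately per id:
cond_time_fn <- function(eif) {
  run <- cumsum(c(1, head(eif, -1)))        # run number bumps after each event
  ave(seq_along(eif), run, FUN = seq_along) # within-run year counter
}

test <- data.frame(year = rep(1990:1993, 2),
                   id   = rep(c(101, 201), each = 4),
                   eif  = c(0, 1, 0, 0,  0, 0, 1, 0))
test$conditional_time <- ave(test$eif, test$id, FUN = cond_time_fn)
test$conditional_time  # 1 2 1 2  1 2 3 1
```

cumsum() over the lagged event indicator labels the runs between events, and ave(..., FUN = seq_along) numbers the years within each run, so the whole column is built with vectorized calls instead of a row-by-row loop.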
[R] need help with chisq
I am very new to R. I have some data from a CSV stored in vdata with 4 columns labeled: X08, Y08, X09, Y09. I have created two new columns like so:

Z08 <- vdata$X08 - vdata$Y08
Z09 <- vdata$X09 - vdata$Y09

I would like to use chisq.test for each row and output the p-value for each in a stored variable. I don't know how to do it. Can you help? So far I have done it for one row (but I want it done automatically for all my data):

chidata <- rbind(c(vdata$Y08[1], Z08[1]), c(vdata$Y09[1], Z09[1]))
results <- chisq.test(chidata)
results$p.value

I tried removing the [1] and the c() but that didn't work... Any ideas? THANKS! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] need help with chisq
On May 9, 2009, at 4:53 PM, JC wrote: I am very new to R. I have some data from a CSV stored in vdata with 4 columns labeled: X08, Y08, X09, Y09. I have created two new columns like so: Z08 <- vdata$X08 - vdata$Y08; Z09 <- vdata$X09 - vdata$Y09. I would like to use chisq.test for each row

Of what?

and output the p-value for each in a stored variable. I don't know how to do it. Can you help? So far I have done it for one row (but I want it done automatically for all my data): chidata <- rbind(c(vdata$Y08[1], Z08[1]), c(vdata$Y09[1], Z09[1]))

Maybe I am dense, but I cannot figure out what hypothesis is being tested.

results <- chisq.test(chidata); results$p.value

Generally, using apply(vdata, 1, ...) would give you a row-by-row computation.

I tried removing the [1] and the c() but that didn't work... Any ideas?

As Jim Holtman's tag line says: What problem are you trying to solve? David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
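The apply() suggestion can be made concrete; a hedged sketch, assuming each row's 2x2 table is built from Y08, Z08, Y09, Z09 exactly as in the question (vdata here is simulated stand-in count data):

```r
# Simulated stand-in for the poster's CSV: four columns of counts.
set.seed(1)
vdata <- data.frame(X08 = rpois(5, 100), Y08 = rpois(5, 20),
                    X09 = rpois(5, 100), Y09 = rpois(5, 20))
vdata$Z08 <- vdata$X08 - vdata$Y08
vdata$Z09 <- vdata$X09 - vdata$Y09

# One chisq.test per row; pvals collects the p-values in row order.
pvals <- apply(vdata, 1, function(r)
  chisq.test(rbind(c(r["Y08"], r["Z08"]),
                   c(r["Y09"], r["Z09"])))$p.value)
```

apply(vdata, 1, ...) hands each row to the anonymous function as a named numeric vector, so r["Y08"] etc. pick out the cells of that row's 2x2 table; whether this table is the right one to test is, as noted above, a question for the poster.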
Re: [R] a general way to select a subset of matrix rows?
Yes, use the drop argument:

apply(x[rows, , drop = FALSE], 2, mean)

On Sat, May 9, 2009 at 2:33 PM, Peter Kharchenko <peter.kharche...@post.harvard.edu> wrote:

> Dear fellow R users, I can't figure out how to do a simple thing properly: apply an operation to matrix columns on a selected subset of rows. Things go wrong when only one row is being selected. I am sure there's a way to do this properly. Here's an example:
>
> # define a 3-by-4 matrix x
> x <- matrix(runif(12), ncol = 4)
> str(x)
>  num [1:3, 1:4] 0.568 0.217 0.309 0.859 0.651 ...
>
> # calculate column means for selected rows
> rows <- c(1, 2)
> apply(x[rows, ], 2, mean)
> [1] 0.3923531 0.7552746 0.3661532 0.1069531
>
> # now the same thing, but the rows vector is actually just one row
> rows <- c(2)
> apply(x[rows, ], 2, mean)
> Error in apply(x[rows, ], 2, mean) : dim(X) must have a positive length
>
> The problem is that while x[rows,] in the first case returned a matrix, in the second case, when only one row was selected, it returned a vector (and the apply obviously failed). Is there a general way to subset a matrix so it still returns a matrix even if it's one row? Unfortunately doing as.matrix(x[rows,]) doesn't work either, as it returns a transposed matrix in the case of a single row. Is there a way to do this properly without writing out hideous if statements accounting for the single-row exception?
>
> thanks,
> -peter.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
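A tiny demonstration of the difference (toy matrix, values arbitrary):

```r
x <- matrix(1:12, ncol = 4)
dim(x[2, ])                # NULL: single-row indexing drops to a vector
dim(x[2, , drop = FALSE])  # 1 4: still a matrix
apply(x[2, , drop = FALSE], 2, mean)  # works for any number of rows
```

With drop = FALSE the subset keeps its dim attribute, so apply() sees a matrix whether rows selects one row or many.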
Re: [R] Generating a conditional time variable
You might try the following function. First it identifies the last element in each run, then the length of each run, then calls sequence() to generate the within-run sequence numbers. my.sequence is a version of sequence() that is more efficient (less time, less memory) when there are lots of short runs: sequence() calls lapply, which makes a memory-consuming list, and then unlists it, and my.sequence avoids the big intermediate list. For your data, f(data) produces the same thing as data$conditional_time.

f <- function(data, use.my.sequence = FALSE) {
  n <- nrow(data)
  lastInRun <- with(data, eif | c(id[-1] != id[-n], TRUE))
  runLengths <- diff(c(0L, which(lastInRun)))
  if (use.my.sequence) {
    my.sequence <- function(nvec)
      seq_len(sum(nvec)) - rep.int(c(0L, cumsum(nvec[-length(nvec)])), nvec)
    my.sequence(runLengths)
  } else {
    sequence(runLengths)
  }
}

Bill Dunlap
Spotfire Division, TIBCO Software Inc.

> Hi everyone, Please forgive me if my question is simple and my code terrible, I'm new to R. I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!
>
> ## My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9 million id-year observations. I would like to do 2 things:
>
> -1- I want to create a 'conditional_time' variable, which increases in increments of 1 every year, but which resets during year(t) if event 'eif' occurred for this 'id' at year(t-1). It should also reset when we switch to a new 'id'.
> For example: dataframe = test
>
> year id  eif conditional_time
> 1990 101 0   1
> 1991 101 0   2
> 1992 101 1   3
> 1993 101 0   1
> 1994 101 0   2
> 1995 101 0   3
> 1996 101 0   4
> 1997 101 1   5
> 1998 101 0   1
> 1999 101 0   2
> 2000 101 0   3
> 2001 101 0   4
> 2002 101 0   5
> 2003 101 0   6
> 1990 201 0   1
> 1991 201 0   2
> 1992 201 0   3
> 1993 201 0   4
> 1994 201 0   5
> 1995 201 0   6
> 1996 201 0   7
> 1997 201 0   8
> 1998 201 0   9
> 1999 201 0   10
> 2000 201 0   11
> 2001 201 1   12
> 2002 201 0   1
> 2003 201 0   2
>
> -2- In a copy of the original dataframe, drop all id-year rows that correspond to years after a given id has experienced his first 'eif' event.
>
> I have written the code below to take care of -1-, but it is incredibly inefficient. Given the size of my database, and considering how slow my computer is, I don't think it's practical to use it. Also, it depends on correct sorting of the dataframe, which might generate errors.
>
> ##
> for (i in 1:nrow(test)) {
>   if (i == 1) {                              # If first id-year
>     cond_time <- 1
>     test[i, 4] <- cond_time
>   } else if (test[i - 1, 2] != test[i, 2]) { # If new id
>     cond_time <- 1
>     test[i, 4] <- cond_time
>   } else {                                   # Same id as previous row
>     if (test[i, 3] == 0) {
>       test[i, 4] <- cond_time + 1
>       cond_time <- test[i, 4]
>     } else {
>       test[i, 4] <- cond_time + 1
>       cond_time <- 0
>     }
>   }
> }
>
> --
> Vincent Arel
> M.A. Student, McGill University
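For what it's worth, the run numbering described above can also be sketched with ave(), grouping on id and on a run counter that advances on the row after each eif event (toy data in the shape of the poster's frame; the values are invented):

```r
# toy data standing in for the poster's frame (values invented)
test <- data.frame(id  = rep(c(101, 201), each = 4),
                   eif = c(0, 1, 0, 0,  0, 0, 1, 0))

# within each id, a run ends ON the eif == 1 row, so lag eif before cumsum
run <- ave(test$eif, test$id,
           FUN = function(e) cumsum(c(0, head(e, -1))))
# number the rows within each (id, run) group
test$conditional_time <- ave(seq_len(nrow(test)), test$id, run,
                             FUN = seq_along)
test$conditional_time  # 1 2 1 2 1 2 3 1
```

This is vectorised per group, so it avoids the row-by-row assignment that makes the for-loop slow on 1.9 million rows.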
[R] Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R? Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] clump of binary pixels on raster
Dear all, I have a set of 30,000 binary landscapes, which represent habitat and non-habitat cover. I need to generate images that identify neighbouring pixels (rule 8) as one patch ID, with a different patch ID for each clump of pixels. I coded it using labcon (adehabitat), but as some of my landscapes have so many patches, labcon does not finish and enters an eternal loop. Alternatively, I coded a solution using GRASS from R (r.clump), but it is so slow that, since I need to run it many times, I would need about 3 weeks to finish... I was wondering whether the raster package could do the job faster than R-GRASS. Below you can find a simulation of what I need. On the second image, each colour has a different value.

MyMatrix <- matrix(rep(0, 100), ncol = 10)
MyMatrix[2:4, 3:6] <- 1
MyMatrix[7:8, 1:3] <- 1
MyMatrix[8, 7:8] <- 1
MyMatrix[6:7, 8:9] <- 1
x11(800, 400)
par(mfrow = c(1, 2))
image(MyMatrix)

MyClusters <- matrix(rep(0, 100), ncol = 10)
MyClusters[2:4, 3:6] <- 1
MyClusters[7:8, 1:3] <- 2
MyClusters[8, 7:8] <- 4
MyClusters[6:7, 8:9] <- 4
image(MyClusters, col = c("transparent", 1, 3, 4, 5))

Regards a lot,
milton
brazil=toronto.
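For reference, here is a minimal base-R sketch of 8-neighbour clump labelling (a simple flood fill). It handles the toy example above; for 30,000 real landscapes a compiled routine such as raster's clump() should be far faster:

```r
# label 8-connected clumps of 1s in a binary matrix; returns an integer
# matrix with 0 for background and 1, 2, ... for each clump
label_clumps <- function(m) {
  lab <- matrix(0L, nrow(m), ncol(m))
  nxt <- 0L
  for (i in seq_len(nrow(m))) for (j in seq_len(ncol(m))) {
    if (m[i, j] == 1 && lab[i, j] == 0L) {
      nxt <- nxt + 1L          # start a new clump at this seed pixel
      stack <- list(c(i, j))
      while (length(stack)) {  # depth-first flood fill from the seed
        p <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL
        if (p[1] < 1 || p[1] > nrow(m) || p[2] < 1 || p[2] > ncol(m)) next
        if (m[p[1], p[2]] != 1 || lab[p[1], p[2]] != 0L) next
        lab[p[1], p[2]] <- nxt
        for (di in -1:1) for (dj in -1:1)     # push all 8 neighbours
          if (di || dj) stack[[length(stack) + 1L]] <- c(p[1] + di, p[2] + dj)
      }
    }
  }
  lab
}
```

On the simulated landscape the two small patches in the lower-right corner touch diagonally, so rule 8 merges them into one clump, giving three patch IDs in total.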
Re: [R] Histogram frequencies with a normal pdf curve overlay
Assuming a constant bin width, you need to multiply the density by n * binwidth, where the bin width is (obviously!) the width of the histogram bins.

Jacques Wagnor <jacques.wag...@gmail.com> 05/09/09 5:10 PM:

> Dear List, When I plot a histogram with 'freq=FALSE' and overlay the histogram with a normal pdf curve, everything looks as expected, as follows:
>
> x <- rnorm(1000)
> hist(x, freq = FALSE)
> curve(dnorm(x), add = TRUE, col = "blue")
>
> What do I need to do if I want to show the frequencies (freq=TRUE) with the same normal pdf overlay, so that the plot would still look the same?
>
> Regards, Jacques
>
> R version 2.8.0 (2008-10-20), i386-pc-mingw32
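A sketch of that scaling, on simulated data and assuming equal-width bins:

```r
set.seed(1)                    # reproducible simulated data
x <- rnorm(1000)
h <- hist(x, freq = TRUE)      # counts on the y axis
binwidth <- diff(h$breaks)[1]  # constant bin width assumed
# scale the density by n * binwidth so the curve matches the counts
curve(dnorm(x) * length(x) * binwidth, add = TRUE, col = "blue")
```

The expected count in a bin is n times the probability mass in that bin, which for a narrow bin is approximately density times binwidth, hence the n * binwidth factor.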
Re: [R] Reading large files quickly
You could try it with sqldf and see if that is any faster. It uses RSQLite/sqlite to read the data into a database without going through R, and from there it reads all or a portion, as specified, into R. It requires two lines of code of the form:

f <- file("myfile.dat")
DF <- sqldf("select * from f", dbname = tempfile())

with appropriate modification to specify the format of your file and possibly to indicate a portion only. See example 6 on the sqldf home page: http://sqldf.googlecode.com and ?sqldf

On Sat, May 9, 2009 at 12:25 PM, Rob Steele <freenx.10.robste...@xoxy.net> wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
Re: [R] Reading large files quickly
First, 'wc' and readLines are doing vastly different things. 'wc' is just reading through the file without having to allocate memory for it; 'readLines' is actually storing the data in memory. I have a 150MB file I was trying it on, and here is what 'wc' did on my Windows system:

/cygdrive/c: time wc tempxx.txt
  1055808  13718468 151012320 tempxx.txt
real    0m2.343s
user    0m1.702s
sys     0m0.436s

If I multiply that by 25 to extrapolate to a 3.5GB file, it should take a little less than one minute to process on my relatively slow laptop. 'readLines' on the same file takes:

system.time(x <- readLines('/tempxx.txt'))
   user  system elapsed
  37.82    0.47   39.23
object.size(x)
84814016 bytes

If I extrapolate that to 3.5GB, it would take about 16 minutes. Now considering that I only have 2GB on my system, I would not be able to read the whole file in at once. You never did specify what type of system you were running on and how much memory you had. Were you 'paging' due to lack of memory?

On Sat, May 9, 2009 at 12:25 PM, Rob Steele <freenx.10.robste...@xoxy.net> wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R? Thanks!

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
Re: [R] Generating a conditional time variable
Here is yet another way of doing it (always the case in R):

# Simulated data frame: year from 1990 to 2003, for 5 different ids,
# each having one or two eif events
test <- data.frame(year = rep(1990:2003, 5),
                   id = gl(5, length(1990:2003)),
                   eif = as.vector(sapply(1:5, function(z) {
                     a <- rep(0, length(1990:2003))
                     a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                     a
                   })))

# partition by 'id' and then by 'eif' changes
test.new <- do.call(rbind, lapply(split(test, test$id), function(.id) {
  # now by 'eif' changes
  do.call(rbind, lapply(split(.id, cumsum(.id$eif)), function(.eif) {
    # create new dataframe with a conditional_time column
    cbind(.eif, conditional_time = seq(nrow(.eif)))
  }))
}))

On Sat, May 9, 2009 at 1:40 PM, Vincent Arel-Bundock <vincent.a...@gmail.com> wrote:
> Hi everyone, Please forgive me if my question is simple and my code terrible, I'm new to R. I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
Re: [R] Generating a conditional time variable
Corrected version. I forgot that the count had to change *after* eif == 1:

# Simulated data frame: year from 1990 to 2003, for 5 different ids,
# each having one or two eif events
test <- data.frame(year = rep(1990:2003, 5),
                   id = gl(5, length(1990:2003)),
                   eif = as.vector(sapply(1:5, function(z) {
                     a <- rep(0, length(1990:2003))
                     a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                     a
                   })))

# partition by 'id' and then by 'eif' changes
test.new <- do.call(rbind, lapply(split(test, test$id), function(.id) {
  # now by 'eif' changes (a run ends on the row after eif == 1)
  do.call(rbind, lapply(split(.id, cumsum(c(0, diff(.id$eif) == -1))),
                        function(.eif) {
    cbind(.eif, conditional_time = seq(nrow(.eif)))
  }))
}))

On Sat, May 9, 2009 at 1:40 PM, Vincent Arel-Bundock <vincent.a...@gmail.com> wrote:
> Hi everyone, Please forgive me if my question is simple and my code terrible, I'm new to R. I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
Re: [R] Citing R/Packages Question
On Sat, 9 May 2009, roger koenker wrote:

> I've had an email exchange with the authors of a recent paper in Nature who also made a good faith effort to cite both R and the quantreg package, and were told that the Nature house style didn't allow such citations, so they were dropped from the published paper and the supplementary material appearing on the Nature website.

Interesting. Software manuals with an ISBN are not good enough for the Nature house style? I wonder what the problem with that could be...

> Since the CRAN website makes a special effort to make prior versions of packages available, it would seem to me to be much more useful to cite version numbers than access dates.

Definitely, yes. Current versions of R with current versions of quantreg, for example, yield:

  Roger Koenker (2009). quantreg: Quantile Regression.
  R package version 4.27. http://CRAN.R-project.org/package=quantreg

Even if 4.27 is not current anymore, it will be available under the archive link at the above URL. So an access date is not necessary. Pointing this out to the journal editors might help. If not, providing the access date (while keeping all other information) won't do any damage.

> There are serious questions about the ephemerality of url citations, not all of which are adequately resolved by the Wayback machine, and access dating, but it would be nice to have some better standards for such contingent citations rather than leave authors at the mercy of copy editors. I would also be interested in suggestions by other contributors.

I wouldn't be aware of good generally applicable standards for citing software. The default output of citation() has been chosen because repository+package+version uniquely identify which package was used (not unlike journal+volume+pages). Also, using the URL http://CRAN.R-project.org/package=quantreg has the advantage that it is independent of the physical location on CRAN.
So in case the structure of the package pages on CRAN changes in the future, the URL will still point to the relevant page with all necessary information.

Best, Z
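The citation text discussed above comes straight from citation(); a short sketch using the base-R entry (citation("quantreg") would produce the package entry, assuming quantreg is installed):

```r
ci <- citation()  # default: how to cite R itself
ci                # the human-readable citation text
toBibtex(ci)      # BibTeX form, handy for submission systems
```

The same object carries the version and URL fields that uniquely identify what was used, which is the point made in the message above.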
[R] Spatstat
Hi all, I am trying to install spatstat on openSUSE 11.1.

install.packages("spatstat", dependencies = TRUE)

fails because of various missing compilers (full message below). I have gcc version 4.3.2, which I thought should include gfortran and g++ - so I'm not sure what to do! Richard

* Installing *source* package ‘deldir’ ...
** libs
gfortran -fpic -O2 -c acchk.f -o acchk.o
make: gfortran: Command not found
make: *** [acchk.o] Error 127
ERROR: compilation failed for package ‘deldir’
* Removing ‘/home/richard/R/i686-pc-linux-gnu-library/2.9/deldir’
* Installing *source* package ‘spatstat’ ...
** libs
gcc -std=gnu99 -I/usr/lib/R/include -I/usr/local/include -fpic -O2 -c Kborder.c -o Kborder.o
gcc -std=gnu99 -I/usr/lib/R/include -I/usr/local/include -fpic -O2 -c Kwborder.c -o Kwborder.o
g++ -I/usr/lib/R/include -I/usr/local/include -fpic -O2 -c PerfectStrauss.cc -o PerfectStrauss.o
make: g++: Command not found
make: *** [PerfectStrauss.o] Error 127
ERROR: compilation failed for package ‘spatstat’
* Removing ‘/home/richard/R/i686-pc-linux-gnu-library/2.9/spatstat’

The downloaded packages are in ‘/tmp/RtmpdcNYyo/downloaded_packages’
Warning messages:
1: In install.packages(spatstat) : installation of package 'deldir' had non-zero exit status
2: In install.packages(spatstat) : installation of package 'spatstat' had non-zero exit status
Re: [R] Reading large files quickly
Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R?

I use statist to convert the fixed width data file into a csv file, because read.table() is considerably faster than read.fwf(). For example:

system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header = TRUE, as.is = TRUE)

The file collist is a text file whose lines contain the following information:

variable begin end

where variable is the column name, and begin and end are integer numbers indicating where in big.txt the columns begin and end. Statist can be downloaded from: http://statist.wald.intevation.org/

--
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil
Re: [R] I don't see libR.so in my installation directory
Tena Sakai wrote:
> I became aware of this as I was preparing for an installation of little r. The installation material stated to look for libR.so, and I want to make sure that the one I installed (2.9.0) is used by little r.

little r... do you mean the scripting front end for R? If so, the core utility Rscript is probably installed (it was added in 2.5.0, I believe) and provides the functionality of little r, including hash-bang lines. Check the bin folder in the R installation. If you are talking about something different, ignore this message :)

-Charlie

-----
Charlie Sharpsteen
Undergraduate Environmental Resources Engineering
Humboldt State University
Re: [R] Reading large files quickly
Thanks guys, good suggestions. To clarify, I'm running on a fast multi-core server with 16 GB RAM under 64-bit CentOS 5 and R 2.8.1. Paging shouldn't be an issue since I'm reading in chunks and not trying to store the whole file in memory at once. Thanks again.

Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
Re: [R] Reading large files quickly
Since you are reading it in chunks, I assume that you are writing out each segment as you read it in. How are you writing it out to save it? Is the time you are quoting for both the reading and the writing? If so, can you break down the differences in what these operations are taking?

How do you plan to use the data? Is it all numeric? Are you keeping it in a dataframe? Have you considered using 'scan' to read in the data and to specify what the columns are? If you would like some more help, the answers to these questions will help.

On Sat, May 9, 2009 at 10:09 PM, Rob Steele <freenx.10.robste...@xoxy.net> wrote:
> Thanks guys, good suggestions. To clarify, I'm running on a fast multi-core server with 16 GB RAM under 64-bit CentOS 5 and R 2.8.1. Paging shouldn't be an issue since I'm reading in chunks and not trying to store the whole file in memory at once. Thanks again.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
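A self-contained sketch of the chunked scan() approach suggested above; a tiny temp file stands in for the 3.5 GB one, and two numeric columns are assumed:

```r
# toy stand-in for the big file
tf <- tempfile()
writeLines(c("1 2", "3 4", "5 6"), tf)

con <- file(tf, open = "r")
total <- 0
repeat {
  # 'what' fixes the column types up front, which is what makes scan fast;
  # nmax limits each chunk to 2 records here (use something like 1e6 for real)
  chunk <- scan(con, what = list(x = 0, y = 0), nmax = 2, quiet = TRUE)
  if (length(chunk$x) == 0) break
  total <- total + sum(chunk$x, chunk$y)  # process the chunk here
}
close(con)
total  # 21
```

Because the connection is opened explicitly, each scan() call resumes where the previous one stopped, so the file is streamed through without ever being held in memory at once.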
[R] how to get design matrix?
If I am doing an ANOVA analysis, how can I get the design matrix R used?
Re: [R] how to get design matrix?
Got code?

On May 9, 2009, at 10:29 PM, linakpl wrote:
> If I was doing an ANOVA analysis how can I get the design matrix R used?

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
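If the poster has a fitted object, model.matrix() on the fit returns the design matrix R used; a minimal sketch with made-up two-group data:

```r
# hypothetical two-group data
d <- data.frame(y = c(1, 2, 3, 4),
                g = factor(c("a", "a", "b", "b")))
fit <- aov(y ~ g, data = d)
X <- model.matrix(fit)  # intercept + treatment-contrast column for g
X
```

With the default treatment contrasts, X has an intercept column of ones and a 0/1 indicator column for the second level of g.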
[R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
Windows XP, R 2.8.1

I am trying to use anova(fitCont, fitCat) to compare two Cox models (coxph), one in which age is entered as a continuous variable, and a second where age is entered as a three-level factor (young, middle, old). The Analysis of Deviance Table produced by anova does not give a p value. Is there any way to get anova to produce p values? Thank you, John Sorkin

ANOVA results are pasted below:

anova(fitCont, fitCat)
Analysis of Deviance Table
Model 1: Surv(Time30, Died) ~ Rx + Age
Model 2: Surv(Time30, Died) ~ Rx + AgeGrp
  Resid. Df Resid. Dev Df Deviance
1        62     147.38
2        61     142.38  1     5.00

The entire program including the original coxph models follows:

fitCont <- coxph(Surv(Time30, Died) ~ Rx + Age, data = GVHDdata)
summary(fitCont)
Call:
coxph(formula = Surv(Time30, Died) ~ Rx + Age, data = GVHDdata)
  n= 64
     coef exp(coef) se(coef)    z      p
Rx  1.375      3.96   0.5318 2.59 0.0097
Age 0.055      1.06   0.0252 2.19 0.0290

    exp(coef) exp(-coef) lower .95 upper .95
Rx       3.96      0.253      1.40     11.22
Age      1.06      0.946      1.01      1.11

Rsquare= 0.154 (max possible= 0.915)
Likelihood ratio test= 10.7 on 2 df, p=0.00483
Wald test            = 9.46 on 2 df, p=0.0088
Score (logrank) test = 10.2 on 2 df, p=0.00626

fitCat <- coxph(Surv(Time30, Died) ~ Rx + AgeGrp, data = GVHDdata)
summary(fitCat)
Call:
coxph(formula = Surv(Time30, Died) ~ Rx + AgeGrp, data = GVHDdata)
  n= 64
                  coef exp(coef) se(coef)    z     p
Rx                1.19      3.27    0.525 2.26 0.024
AgeGrp[T.(15,25]] 1.98      7.26    0.771 2.57 0.010
AgeGrp[T.(25,45]] 1.61      5.02    0.806 2.00 0.045

                  exp(coef) exp(-coef) lower .95 upper .95
Rx                     3.27      0.306      1.17      9.16
AgeGrp[T.(15,25]]      7.26      0.138      1.60     32.88
AgeGrp[T.(25,45]]      5.02      0.199      1.04     24.38

Rsquare= 0.217 (max possible= 0.915)
Likelihood ratio test= 15.7 on 3 df, p=0.00133
Wald test            = 12.0 on 3 df, p=0.0075
Score (logrank) test = 14.5 on 3 df, p=0.00232

anova(fitCont, fitCat)
Analysis of Deviance Table
Model 1: Surv(Time30, Died) ~ Rx + Age
Model 2: Surv(Time30, Died) ~ Rx + AgeGrp
  Resid. Df Resid. Dev Df Deviance
1        62     147.38
2        61     142.38  1     5.00

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
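For what it's worth, a p-value for the deviance table above can be computed by hand from the chi-square distribution, using the printed Df and change in deviance (this presumes the two models are treated as nested):

```r
# Df = 1, change in deviance = 5.00, read off the anova() output above
pval <- pchisq(5.00, df = 1, lower.tail = FALSE)
round(pval, 4)  # 0.0253
```

This is the same likelihood-ratio comparison anova() tabulates; only the final pchisq() step is missing from the printed table.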
Re: [R] I don't see libR.so in my installation directory
On 8 May 2009 at 16:17, Tena Sakai wrote:
| Maybe I know the answer to my own question.
| When I built R 2.9.0, I didn't say:
|
|   ./configure --enable-R-shlib
|
| I know I have given the --prefix flag, but that's
| the only flag I used.
|
| I would appreciate it, if someone would give me
| a definitive answer, however.

You found the answer. littler aka 'r' embeds R by loading the shared library --- the libR.so you were looking for. Unless you have R built with --enable-R-shlib, you will not be able to use r, or for that matter other users of embedded R.

Hope this helps, Dirk

| Regards,
|
| Tena Sakai
| tsa...@gallo.ucsf.edu
|
| -----Original Message-----
| From: r-help-boun...@r-project.org on behalf of Tena Sakai
| Sent: Fri 5/8/2009 4:07 PM
| To: r-help@r-project.org
| Subject: [R] I don't see libR.so in my installation directory
|
| Hi,
|
| I installed R 2.9.0 a couple of days ago on a
| linux machine. At the root of installation,
| I see 4 directories: bin, lib64, share, and src.
|
| I don't see libR.so anywhere. (In the following
| context, . (dot) indicates the root of my installation.)
| I do see:
|   ./lib64/R/lib/libRblas.so
|   ./lib64/R/lib/libRlapack.so
|
| I became aware of this as I was preparing for
| an installation of little r. The installation
| material stated to look for libR.so, and I want
| to make sure that the one I installed (2.9.0)
| is used by little r.
|
| Would someone please clue me in? Why don't I
| have libR.so, and yet when I execute ./bin/R
| it says:
|   R version 2.9.0 (2009-04-17)
|
| Regards,
|
| Tena Sakai
| tsa...@gallo.ucsf.edu
--
Three out of two people have difficulties with fractions.