Re: [Rd] Best practices for writing R functions
On Tue, 2011-07-26 at 15:19 -0700, Davor Cubranic wrote:
> On 2011-07-23, at 5:57 AM, Alireza Mahani wrote:
>
> > Another trick to reduce verbosity of code (and focus on algorithm logic rather than boilerplate code) is to maintain a global copy of variables (in the global environment) which makes them visible to all functions (where appropriate, of course). Once the development and testing is finished, one can tidy things up and modify the function prototypes, add lines for unpacking lists inside functions, etc.
>
> I think you'd be better off staying away from such tricks. It's asking for trouble later on, because unless you have really good unit tests it is very easy to miss a variable during "tidying up" and end up with code that works fine in your development environment but is full of bugs once you distribute it to others.

Isn't this specifically one of the things that environments are *for*? Have your package/script/functions create an environment, and store 'loose variables' there. Use get/assign to manage them. Don't clutter .GlobalEnv.

--
Brian

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
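The suggestion above (a dedicated environment managed with get/assign) could look roughly like the following. This is only a minimal sketch: the names `.state`, `set_var`, `get_var` and `scale_by_tol` are hypothetical and do not come from the thread.

```r
# Private environment for shared state. The names `.state`, `set_var`
# and `get_var` are hypothetical, chosen for this illustration only.
.state <- new.env(parent = emptyenv())

# Store a value under `name` in the private environment.
set_var <- function(name, value) {
  assign(name, value, envir = .state)
  invisible(value)
}

# Fetch a value, falling back to `default` if it was never set.
get_var <- function(name, default = NULL) {
  if (exists(name, envir = .state, inherits = FALSE)) {
    get(name, envir = .state, inherits = FALSE)
  } else {
    default
  }
}

# A function that reads shared state without creating variables
# such as `tol` in .GlobalEnv.
scale_by_tol <- function(x) x * get_var("tol", default = 1)

set_var("tol", 0.5)
scale_by_tol(10)
```

In a package, the environment would normally be created at the top level of the package code, so it lives in the package namespace rather than in the user's workspace.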
Re: [Rd] Best practices for writing R functions
On 2011-07-23, at 5:57 AM, Alireza Mahani wrote:
> Another trick to reduce verbosity of code (and focus on algorithm logic rather than boilerplate code) is to maintain a global copy of variables (in the global environment) which makes them visible to all functions (where appropriate, of course). Once the development and testing is finished, one can tidy things up and modify the function prototypes, add lines for unpacking lists inside functions, etc.

I think you'd be better off staying away from such tricks. It's asking for trouble later on, because unless you have really good unit tests it is very easy to miss a variable during "tidying up" and end up with code that works fine in your development environment but is full of bugs once you distribute it to others.

Davor
Re: [Rd] CRAN mirror size mushrooming; consider archiving some?
Or one could buy an iPod and host it from there ;-) 160 GB for US$250.

Uwe's plan is probably better though...

Jeff

On Tue, Jul 26, 2011 at 5:08 PM, Hadley Wickham wrote:
> >> I'm setting up a new CRAN mirror and filled up the disk space the server allotted me. I asked for more, then filled that up. Now the system administrators want me to buy an $800 fiber channel card and a storage device. I'm going to do that, but it does make me want to suggest to you that this is a problem.
> >
> > Why? Just for the mirror? That's nonsense. A 6 year old outdated desktop machine (say upgraded to 2GB RAM) with a 1T harddisc for 50$ should be fine for your first tries. The bottleneck will probably be your network connection rather than the storage.
>
> Another perspective is that it costs ~$10 / month to store 68 GB of data on amazon's S3. And then you pay 12c / GB for download.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/

--
Jeffrey Ryan
jeffrey.r...@lemnica.com

www.lemnica.com
www.esotericR.com
Re: [Rd] CRAN mirror size mushrooming; consider archiving some?
>> I'm setting up a new CRAN mirror and filled up the disk space the server allotted me. I asked for more, then filled that up. Now the system administrators want me to buy an $800 fiber channel card and a storage device. I'm going to do that, but it does make me want to suggest to you that this is a problem.
>
> Why? Just for the mirror? That's nonsense. A 6 year old outdated desktop machine (say upgraded to 2GB RAM) with a 1T harddisc for 50$ should be fine for your first tries. The bottleneck will probably be your network connection rather than the storage.

Another perspective is that it costs ~$10 / month to store 68 GB of data on amazon's S3. And then you pay 12c / GB for download.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
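To make the hosting arithmetic in the message above concrete, here is a rough sketch that takes the two quoted figures at face value (about $10 per month for storage, 12 cents per GB downloaded). The function name `s3_monthly_cost` and the download volumes are hypothetical; real pricing is tiered and has changed since 2011.

```r
# Figures quoted in the message above, treated as rough 2011 prices.
storage_per_month <- 10    # US$ per month for ~68 GB stored
download_per_gb   <- 0.12  # US$ per GB downloaded

# Monthly cost as a function of download volume (hypothetical helper).
s3_monthly_cost <- function(download_gb) {
  storage_per_month + download_per_gb * download_gb
}

# Cost at no traffic, 100 GB and 1000 GB of downloads per month.
s3_monthly_cost(c(0, 100, 1000))
```

Under these assumptions the download charge overtakes the storage charge once traffic exceeds roughly 83 GB per month, so for a busy mirror bandwidth rather than disk would dominate the bill.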
Re: [Rd] default par
For number 1, one option is to use the setHook function with the "plot.new" hook. With it you can register a function that is called every time a new plot is created; that function can then call par with the options that you want, which sets the parameters on all devices. However, this could cause problems if you ever wanted to change those values for a single plot: your call to par would be overwritten by the hook function.

For number 2, S-PLUS did default to warning when points were outside the plotting region. This was annoying when people intentionally used the limits to look at only part of the data, so I don't think it would be popular to bring back this behavior in general. You can use the zoomplot function in the TeachingDemos package to expand the range of your current plot to show data that was outside the limits, or I believe that if you use ggplot2 the plots will be expanded automatically to include all the data (unless you limit the range in the call). You could also write your own points or plot function that checks the range, gives a warning, and then calls the regular points or plot function.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of Berry Boessenkool
> Sent: Friday, July 22, 2011 7:47 AM
> To: r-devel@r-project.org
> Subject: [Rd] default par
>
> Hello dear R-developers,
>
> two questions on an otherwise magnificent program:
>
> 1)
> Is there a way to set defaults for par differently than R offers normally?
> I for example would like to have las default to 1. (or in the same style, sometimes type in plot() could be "l" per default).
>
> The following post describes pretty much exactly that:
> https://stat.ethz.ch/pipermail/r-help/2007-March/126646.html
> It was written four years ago, but it seems like there has been no real elegant solution.
> Did I just miss something there? If so, could someone give me an update?
> If not, is there a chance that such a feature would be added to future R-versions?
> I could live with the idea to assign the par$element default in Rprofile.site.
>
> 2)
> Would it appear sensible to have R give a warning, when points() is used, and some/all values are out of plotting range in the active device?
> It has happened some times that I needed quite a bit of time to figure out why nothing was plotted.
> Such a warning (or maybe even a beep?) would give users the clue to look at the values right away...
> (What I mean is this: plot(1:10) ; points(11,3) just in case it's unclear)
>
> Thanks ahead for pondering, and again: R is the most beautiful thing I discovered in the last three years.
> Keep up the good work!
>
> Berry
>
> -
> Berry Boessenkool
> University of Potsdam, Germany
> -
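Both suggestions in the reply above can be sketched in a few lines. This is only an illustration under stated assumptions: `outside_usr` and `points_checked` are hypothetical names, the range check assumes linear (non-log) axes, and the hook has the drawback already mentioned, namely that it overrides a `las` value set by hand before plotting.

```r
# Hook sketch: reapply a preferred par() setting on every new plot.
# This could go into Rprofile.site; it overrides manual par(las = ...) calls.
setHook("plot.new", function(...) par(las = 1), action = "append")

# Pure helper (hypothetical): which points fall outside the plotting
# region `usr`, given as c(x1, x2, y1, y2) like par("usr")?
# Assumes linear axes.
outside_usr <- function(x, y, usr) {
  x < min(usr[1:2]) | x > max(usr[1:2]) |
    y < min(usr[3:4]) | y > max(usr[3:4])
}

# Hypothetical wrapper around points(): warn first, then draw as usual.
points_checked <- function(x, y = NULL, ...) {
  xy <- xy.coords(x, y)
  out <- outside_usr(xy$x, xy$y, par("usr"))
  n_out <- sum(out, na.rm = TRUE)
  if (n_out > 0) {
    warning(n_out, " of ", length(out),
            " point(s) lie outside the plotting region")
  }
  points(xy$x, xy$y, ...)
}
```

With a plot open, `plot(1:10); points_checked(11, 3)` should then produce a warning instead of silently drawing nothing.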
Re: [Rd] R result objects as lists
On 26/07/2011 11:26 AM, Thorsten wrote:
> Hello List,
>
> I want to communicate between a minimalistic lisp that has only numbers, symbols (also used as strings) and lists as datatypes, and R.
>
> It should be no problem to send command strings from the lisp process to the R child process. I know, R is mostly implemented in Scheme,

No, the design of the original R interpreter was based on a Scheme interpreter, but it is mostly implemented in C.

> and I read recently, that these special return objects of R are really lists under the hood. Therefore my questions:
>
> 1. When I send a command from a lisp (that is not elisp) to an R subprocess, how can I receive the R result object as a list (and not a special R object)?
>
> 2. Apart from graphics - are all R result objects lists (or numbers or strings)? That is, is it safe to assume that the result of an R call will always be either a number, a string or a list (under the hood)?

No, you need to treat the results as C structures under the hood. Some are implemented as Lisp-like lists, but most are vectors with additional information about the type of object that is contained within (in a C-style array).

Duncan Murdoch
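Since results are typed vectors rather than Lisp lists, one way to bridge the gap is to serialize them to text on the R side before sending them back. The following is a rough sketch only: `to_sexp` is a hypothetical helper, the output syntax (bare numbers, quoted strings, `t`/`nil`, named elements as `(name value)`) is an arbitrary choice, and it ignores attributes and missing values.

```r
# Most results are typed vectors, not lists:
typeof(1:3)        # "integer"
typeof(list(1, 2)) # "list"

# Hypothetical helper: render an R object as an s-expression string.
to_sexp <- function(x) {
  if (is.null(x)) return("nil")
  if (is.list(x)) {
    # Recurse into list elements; named elements become (name value).
    items <- vapply(x, to_sexp, character(1), USE.NAMES = FALSE)
    nms <- names(x)
    if (!is.null(nms)) {
      named <- nzchar(nms)
      if (any(named)) {
        items[named] <- paste0("(", nms[named], " ", items[named], ")")
      }
    }
    return(paste0("(", paste(items, collapse = " "), ")"))
  }
  if (!is.atomic(x)) {
    # Functions, calls, etc.: fall back to the deparsed source as a string.
    return(encodeString(paste(deparse(x), collapse = " "), quote = '"'))
  }
  atoms <- if (is.character(x)) {
    encodeString(x, quote = '"')
  } else if (is.logical(x)) {
    ifelse(x, "t", "nil")
  } else if (is.numeric(x)) {
    as.character(x)
  } else {
    encodeString(as.character(x), quote = '"')
  }
  atoms <- unname(atoms)
  # Length-one vectors become a bare atom, longer ones a flat list.
  if (length(atoms) == 1L) atoms else paste0("(", paste(atoms, collapse = " "), ")")
}

to_sexp(list(a = 1, b = c("x", "y")))
```

The Lisp side would then only need its ordinary reader to parse the string written by `cat(to_sexp(result), "\n")`.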
Re: [Rd] plot.function documentation/export?
OK, I see that BDR did this on 2011-06-08 -- I was getting confused by looking at the code of the development version but running the release version.

Thanks.

Ben

On 07/26/2011 02:33 PM, Uwe Ligges wrote:
> Now I see the difference: I was using R-devel and that worked as you expected.
>
> Best,
> Uwe Ligges
Re: [Rd] plot.function documentation/export?
Now I see the difference: I was using R-devel and that worked as you expected.

Best,
Uwe Ligges

On 25.07.2011 19:01, Ben Bolker wrote:
> On 07/25/2011 12:55 PM, Uwe Ligges wrote:
>> On 25.07.2011 17:45, Ben Bolker wrote:
>>> I recently suggested to someone ( http://stackoverflow.com/questions/6789055/r-inconsistency-why-add-t-sometimes-works-and-sometimes-not-in-the-plot-functi/6789098#6789098 ) that they should use methods("plot") or methods(class="function") to locate the documentation on the plot method for objects of class "function", but they pointed out that these don't actually work. I can't figure out why not: src/library/graphics/man/curve.Rd contains the line
>>>
>>>   \method{plot}{function}(x, y = 0, to = 1, from = y, xlim = NULL, ylab = NULL, \dots)
>>>
>>> and src/library/graphics/DESCRIPTION contains
>>
>> you mean the following line is in NAMESPACE rather than DESCRIPTION.
>>
>>>   S3method(plot, "function")
>
> Yes, sorry.
>
>>> [presumably the extra quotes are in there because function is a reserved word?]
>>>
>>> I'm not sure where else the information should be. Searching around in the code tree for information on tail.function (which is listed in the methods):
>>>
>>>   > methods(class="function")
>>>   [1] as.list.function head.function*   print.function   tail.function*
>>>
>>> I find the same S3method syntax, so I guess the quotation marks aren't the problem ...
>>
>>   ?tail.function
>> tells us this one is from package "utils" and you can search for this function in the sources of the utils package.
>> Or you could ask for
>>   getAnywhere("tail.function")
>> and R tells you
>>   A single object matching tail.function was found
>>   It was found in the following places
>>     registered S3 method for tail from namespace utils
>>     namespace:utils
>>   [...]
>>
>> Best wishes,
>> Uwe
>
> Sorry, I didn't frame my question very clearly. I can find "tail.function" just fine, or I could if I wanted to. What I don't know is why methods("plot") and methods(class="function") don't list "plot.function" even though its documentation and setup seem to be similar to "tail.function", which *does* show up in methods(class="function") ...
>
> cheers
>   Ben Bolker
>
> No plot.function listing in either of these ...
>
>   > library("graphics")
>   > methods("plot")
>    [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
>    [4] plot.default        plot.dendrogram*    plot.density
>    [7] plot.ecdf           plot.factor*        plot.formula*
>   [10] plot.hclust*        plot.histogram*     plot.HoltWinters*
>   [13] plot.isoreg*        plot.lm             plot.medpolish*
>   [16] plot.mlm            plot.ppr*           plot.prcomp*
>   [19] plot.princomp*      plot.profile.nls*   plot.spec
>   [22] plot.spec.coherency plot.spec.phase     plot.stepfun
>   [25] plot.stl*           plot.table*         plot.ts
>   [28] plot.tskernel*      plot.TukeyHSD
>
>      Non-visible functions are asterisked
>
>   > methods(class="function")
>   [1] as.list.function head.function*   print.function   tail.function*
>
>      Non-visible functions are asterisked
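The lookups discussed in this thread can be tried directly in a session. The sketch below uses only functions from the base distribution (`methods`, `getS3method`, `getAnywhere`); what they report depends on the R version in use, which is exactly the release versus R-devel difference the thread ran into, so no particular output is assumed here.

```r
# Methods registered for class "function" in the running R version;
# the exact list differs between releases.
m <- methods(class = "function")
print(m)

# Look up a method directly; optional = TRUE returns NULL instead of
# signalling an error when no such method is registered.
pf <- getS3method("plot", "function", optional = TRUE)
is.function(pf)

# getAnywhere() also reports where an object lives, exported or not.
ga <- getAnywhere("tail.function")
ga$where
```

If `pf` is `NULL`, the running version has no registered `plot` method for functions, which would match the release behaviour described above.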
Re: [Rd] CRAN mirror size mushrooming; consider archiving some?
On 25.07.2011 19:47, Paul Johnson wrote:
> Hi, everybody
>
> I'm setting up a new CRAN mirror and filled up the disk space the server allotted me. I asked for more, then filled that up. Now the system administrators want me to buy an $800 fiber channel card and a storage device. I'm going to do that, but it does make me want to suggest to you that this is a problem.

Why? Just for the mirror? That's nonsense. A 6 year old outdated desktop machine (say upgraded to 2GB RAM) with a 1T harddisc for 50$ should be fine for your first tries. The bottleneck will probably be your network connection rather than the storage.

> CRAN now is about 68GB, and about 3/4 of that is in the bin folder, where one finds copies of compiled packages for macosx and windows. If the administrators of CRAN would move the packages for R before, say, 2.12, to long term storage, then mirror management would be a bit more, well, manageable.
>
> Moving the R for windows packages for, say, R 2.0 through 2.10 would save some space, and possibly establish a useful precedent for the long term.

That is right, but then users of R < 2.11.0 could no longer use install.packages() and friends. If we want to move stuff around in future, we may want to implement that in R first.

We thought about removing old binaries before, but then disk space increased roughly as exponentially as repository space in the past and we decided to stay with it as is.

> Here's the bin/windows folder. Note it is expanding exponentially (or nearly so)

And you see that quite a lot of effort was made during the last release cycles to reduce the amount of used memory (e.g. using better compression).

Best wishes,
Uwe

> $ du --max-depth=1 | sort
> 1012644   ./2.6
> 103504    ./1.7
> 122200    ./1.8
> 1239876   ./2.7
> 1487024   ./2.8
> 15220     ./ATLAS
> 167668    ./1.9
> 17921604  .
> 1866196   ./2.9
> 204392    ./2.0
> 2207708   ./2.10
> 2340120   ./2.13
> 2356272   ./2.12
> 2403176   ./2.11
> 298620    ./2.1
> 364292    ./2.2
> 438044    ./2.3
> 595920    ./2.4
> 698064    ./2.5
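The listing above is sorted as text, which hides the trend. A small sketch that re-sorts the same figures by R version and adds up what archiving everything before R 2.11 would free; it assumes `du` reported its default 1 KiB blocks and leaves out the ATLAS directory, which is not tied to an R version.

```r
# Sizes copied from the du listing above (assumed to be 1 KiB blocks),
# in the order in which they were posted.
sizes_kb <- c(
  "2.6" = 1012644, "1.7" = 103504,  "1.8" = 122200,   "2.7" = 1239876,
  "2.8" = 1487024, "1.9" = 167668,  "2.9" = 1866196,  "2.0" = 204392,
  "2.10" = 2207708, "2.13" = 2340120, "2.12" = 2356272, "2.11" = 2403176,
  "2.1" = 298620,  "2.2" = 364292,  "2.3" = 438044,   "2.4" = 595920,
  "2.5" = 698064
)
total_kb <- 17921604  # the "." line of the listing

# Order by version number rather than as text, so 2.10 sorts after 2.9.
v <- numeric_version(names(sizes_kb))
sizes_kb <- sizes_kb[order(v)]
v <- sort(v)

# Space used by the binaries for releases before R 2.11.
old <- v < numeric_version("2.11")
old_kb <- sum(sizes_kb[old])
share <- old_kb / total_kb

round(sizes_kb / 2^20, 2)  # per-version size in GiB
round(share, 2)            # fraction of bin/windows taken by R < 2.11
```

Read this way, the per-release size grows steadily up to about 2.11 and then levels off, which fits the remark above about better compression in the latest release cycles.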
[Rd] R result objects as lists
Hello List,

I want to communicate between a minimalistic lisp that has only numbers, symbols (also used as strings) and lists as datatypes, and R.

It should be no problem to send command strings from the lisp process to the R child process. I know, R is mostly implemented in Scheme, and I read recently, that these special return objects of R are really lists under the hood. Therefore my questions:

1. When I send a command from a lisp (that is not elisp) to an R subprocess, how can I receive the R result object as a list (and not a special R object)?

2. Apart from graphics - are all R result objects lists (or numbers or strings)? That is, is it safe to assume that the result of an R call will always be either a number, a string or a list (under the hood)?

Cheers
Thorsten
Re: [Rd] --max-vsize
Prof Brian Ripley writes:

> Point 1 is as documented: you have exceeded the maximum integer and it does say that it gives NA. So the only 'odd' is reporting that you did not read the documentation.

I'm sorry; I thought that my message made it clear that I was aware that the NA came from exceeding the maximum representable integer.

To belatedly address the other information I failed to provide, I use R on Linux, both 32-bit and 64-bit (with 64-bit R).

> Point 2 is R not using the correct units for --max-vsize (it used the number of Vcells, as was once documented), and I have fixed.

Thank you; I've read the changes and I think they meet my needs. (I will try to explain how/why I want to use larger-than-integer mem.limits() below. If there's a better or more supported way to achieve what I want, that'd be fine too.)

> But I do wonder why you are using --max-vsize: the documentation says it is very rarely needed, and I suspect that there are better ways to do this.

Here's the basic idea: I would like to be able to restrict R to a large amount of memory (say 4GB, for the sake of argument), but in a way such that I can increase that limit temporarily if it turns out to be necessary for some reason.

The desire for a restriction is that I have found it fairly difficult to predict in advance how much memory a given calculation or analysis is going to take. Part of that is my inexperience with R, leading to hilarious thinkos, but I think that part of that difficulty to predict is going to remain even as I gain experience. I use R both on multi-user systems and on single-user-multiple-use systems, and in both cases it is usually bad if my R session causes the machine to swap; usually that swapping is not the result of a desired computation -- most often, it's from a straightforward mistake -- but it can take substantial amounts of time for the machine to respond to aborts or kill requests, and usually if the process grows enough to touch swap it will continue growing beyond the swap limit too.

So, why not simply slap on an address-space ulimit instead (that being the kind of ulimit in Linux that actually works...)? Well, one reason is that it then becomes necessary to estimate at the start of an R session how much memory will be needed over the lifetime of that session; guess too low, and at some point later (maybe days or even weeks later) I might get a failure to allocate. My options at that stage would be to save the workspace and restart the session with a higher limit, or attempt to delete enough things from the existing workspace to allow the allocation to succeed. (Have I missed anything?) Saving and restarting will take substantial time (from writing ~4GB to disk) while deleting things from the existing session involves cognitive overhead that is irrelevant to my current investigation and may in any case not free enough memory.

So, being able to raise the limit to something generally large for a short time to perform a computation, get the results, and then lower the limit again allows me to protect myself in general from overwhelming the machine with mistaken computations, while also allowing in specific cases the ability to dedicate more resources to a particular computation.

> I don't find reporting values of several GB as bytes very useful, but then mem.limits() is not useful to me either

Ah, I'm not particularly interested in the reporting side of mem.limits() :-); the setting side, on the other hand, very much so.

Thank you again for the fixes.

Best,

Christophe
Re: [Rd] --max-vsize
Point 1 is as documented: you have exceeded the maximum integer and it does say that it gives NA. So the only 'odd' is reporting that you did not read the documentation.

Point 2 is R not using the correct units for --max-vsize (it used the number of Vcells, as was once documented), and I have fixed.

But I do wonder why you are using --max-vsize: the documentation says it is very rarely needed, and I suspect that there are better ways to do this.

Also, you ignored the posting guide and did not tell us the 'at a minimum' information requested: what OS was this, and was it a 32- or 64-bit R if a 64-bit OS?

I don't find reporting values of several GB as bytes very useful, but then mem.limits() is not useful to me either.

On Thu, 21 Jul 2011, Christophe Rhodes wrote:

> Hi,
>
> In both R 2.13 and the SVN trunk, I observe odd behaviour with the --max-vsize command-line argument:
>
> 1. passing a largeish value (about 260M or greater) makes mem.limits() report NA for the vsize limit; gc() continues to report a value...
>
> 2. ...but that value (and the actual limit) is wrong by a factor of 8.
>
> I attach a patch for issue 2, lightly tested. I believe that fixing issue 1 involves changing the return convention of do_memlimits -- not returning a specialized integer vector, but a more general numeric; I wasn't confident to do that.

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
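The two numbers in the report (a threshold of "about 260M" and a factor of 8) fit together if one assumes that the limit is counted in 8-byte Vcells but reported as a byte count held in a 32-bit R integer. That mechanism is an assumption for illustration and not a reading of the R sources; the sketch only reproduces the arithmetic.

```r
max_int <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
bytes_per_vcell <- 8L             # a Vcell is 8 bytes

# Largest Vcell count whose size in bytes still fits in an R integer.
max_vcells <- max_int %/% bytes_per_vcell
max_vcells / 2^20                 # just under 256 Mi Vcells

# At the threshold the byte count still fits ...
fits <- max_vcells * bytes_per_vcell

# ... but one more Vcell overflows, and integer arithmetic yields NA
# (normally with a warning, suppressed here).
overflow <- suppressWarnings((max_vcells + 1L) * bytes_per_vcell)
is.na(overflow)
```

Returning a double instead of an integer, as the original report suggests for do_memlimits, would avoid the overflow because doubles represent whole numbers exactly up to 2^53.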