Re: [Rd] CRAN policies
From: x...@yihui.name
Date: Tue, 27 Mar 2012 16:40:04 -0500
To: r-devel@r-project.org
Subject: Re: [Rd] CRAN policies

I have been wondering whether it is possible to automate the checking process to reduce human effort, e.g. automatically check the packages submitted to the FTP server and send the package maintainer an email in case of warnings or errors (otherwise just move the package to CRAN). Package maintainers could appeal for a manual check by the CRAN maintainers in case of false positives. As a package author, I really hate to bother the CRAN maintainers each time I upload a new version and it passes R CMD check successfully, in which case I should have received an automatic email instead of Kurt's hand-written "thanks, on CRAN now". Frankly speaking, it sometimes makes me feel guilty to update my packages, thinking of the other 3700 packages on CRAN and how much time you CRAN maintainers are spending on checking the packages.

Indeed it is a good summary of how I have felt for so long, and in particular of my recent experience, which involved Kurt, Brian, and Uwe. I think win-builder certainly helps, but is it feasible to have a Linux counterpart with a final say?

I do not know how many package authors actually read this mailing list, so these policies may not really reach some authors at all.

Certainly more colleagues read the list than have been revealed by the postings.

Kind regards,
Jing Hua

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] CRAN policies
On 28.03.2012 00:07, Hadley Wickham wrote:
> On Tue, Mar 27, 2012 at 6:52 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
>> CRAN has for some time had a policies page at
>> http://cran.r-project.org/web/packages/policies.html
>> and we would like to draw this to the attention of package maintainers. In particular, please
>
> Thanks for the pointer - I did not know that this page existed. In general, is there some easy way to track changes to this page and the R extension manual over time? It is difficult to keep track of the best practices.
>
> I'd also like to get clarification on
>
>> Packages should not write in the users' home filespace, nor anywhere else on the file system apart from the R session's temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up).
>
> - what is recommended practice for packages to maintain state across instances? Operating systems have standards for where applications can store settings (e.g. as described in http://pypi.python.org/pypi/appdirs/1.2.0). Is it acceptable for packages to follow these conventions?

The policy is meant to keep packages from overwriting user data and from generating loads of temporary files from examples that pollute, e.g., the working directory.

Uwe Ligges

> Hadley
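A minimal sketch of what the quoted policy sentence permits, assuming a hypothetical helper `write_scratch()` that is not part of any package: scratch files go under the session's temporary directory (base R's `tempdir()`) and are removed afterwards. It does not settle the question of persistent state across sessions, which the thread leaves open.

```r
# Hypothetical helper: write scratch output under the session temp dir only,
# never into the working directory or the user's home filespace.
write_scratch <- function(lines, name = "scratch.txt") {
  path <- file.path(tempdir(), name)
  writeLines(lines, path)
  path
}

path <- write_scratch(c("alpha", "beta"))
contents <- readLines(path)

# Clean up after use, as the policy asks.
unlink(path)
still_there <- file.exists(path)
```

Package examples and tests can follow the same pattern so that a check run leaves no files behind outside the temporary directory.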
Re: [Rd] CRAN policies
On 27.03.2012 20:33, Jeffrey Ryan wrote:
> Thanks Uwe for the clarification on what goes and what stays.
>
> Still fuzzy on the notion of "significant" though. Do you have an example or two for the list?

We have to look at those notes again and again in order to find out whether something important is noted, hence please always try to avoid all notes unless the effect is really intended!

Consider the Note "No visible binding for global variable". We cannot know whether your code intends to use such a global variable (which is undesirable in most cases), hence we would let it pass if it seems to be sensible.

Another Note such as "empty section" or "partial argument match" can quickly be fixed, hence just do it and don't waste our time.

Best,
Uwe Ligges

> Jeff
>
> P.S. I meant to also thank all of the CRAN volunteers for the momentous efforts involved, and it is nice to see some explanation of how we can help, as well as a peek into what goes on 'behind the curtain' ;-)
>
> On 3/27/12 1:19 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
>> On 27.03.2012 19:10, Jeffrey Ryan wrote:
>>> Is there a distinction as to NOTE vs. WARNING that is documented? I've always assumed (wrongly?) that NOTES weren't an issue with publishing on CRAN, but that they may change to WARNINGS at some point.
>>
>> We won't kick packages off CRAN for Notes (but we will if Warnings are not fixed), but we may not accept new submissions with significant Notes.
>>
>> Best,
>> Uwe Ligges
>>
>>> Is the process by which this happens documented somewhere?
>>>
>>> Jeff
>>>
>>> On 3/27/12 11:09 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
>>>> 2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:
>>>>> On 27.03.2012 17:09, Gabor Grothendieck wrote:
>>>>>> On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
>>>>>>> CRAN has for some time had a policies page at
>>>>>>> http://cran.r-project.org/web/packages/policies.html
>>>>>>> and we would like to draw this to the attention of package maintainers. In particular, please
>>>>>>>
>>>>>>> - always send a submission email to c...@r-project.org with the package name and version on the subject line. Emails sent to individual members of the team will result in delays at best.
>>>>>>>
>>>>>>> - run R CMD check --as-cran on the tarball before you submit it. Do this with the latest version of R possible: definitely R 2.14.2, preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are able to give better diagnostics, e.g. for compiled code and especially on Windows. They may also have extra checks for recently uncovered problems.)
>>>>>>>
>>>>>>> Also, please note that CRAN has a very heavy workload (186 packages were published last week) and to remain viable needs package maintainers to make its life as easy as possible.
>>>>>>
>>>>>> Regarding the part about "warnings or significant notes" in that page, it's impossible to know which notes are significant and which ones are not significant except by trial and error.
>>>>>
>>>>> Right, it needs human inspection to identify false positives. We believe most package maintainers are able to see whether they are hit by such a false positive.
>>>>
>>>> The problem is that a note is generated and the note is correct. It's not a false positive. But that does not tell you whether it's significant or not. There is no way to know. One can either try to remove all notes (which may not be feasible) or just upload the package and find out by trial and error whether it's accepted or not.
>>>>
>>>> --
>>>> Statistics Software Consulting
>>>> GKX Group, GKX Associates Inc.
>>>> tel: 1-877-GKX-GROUP
>>>> email: ggrothendieck at gmail.com
Re: [Rd] CRAN policies
On 27.03.2012 20:36, Gabor Grothendieck wrote:
> 2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:
>> On 27.03.2012 19:10, Jeffrey Ryan wrote:
>>> Is there a distinction as to NOTE vs. WARNING that is documented? I've always assumed (wrongly?) that NOTES weren't an issue with publishing on CRAN, but that they may change to WARNINGS at some point.
>>
>> We won't kick packages off CRAN for Notes (but we will if Warnings are not fixed), but we may not accept new submissions with significant Notes.
>
> Yes, I understand that, but it does not really address the problem that one has no idea whether a Note is significant or not, so the only way to determine its significance is to submit your package and see whether it's accepted or not.

We have to look at those notes again and again in order to find out whether something important is noted, hence please always try to avoid all notes unless the effect is really intended!

Consider the Note "No visible binding for global variable". We cannot know whether your code intends to use such a global variable (which is undesirable in most cases), hence we would let it pass if it seems to be sensible.

Another Note such as "empty section" or "partial argument match" can quickly be fixed, hence just do it and don't waste our time.

Best,
Uwe Ligges
Re: [Rd] CRAN policies
2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:
> On 27.03.2012 20:33, Jeffrey Ryan wrote:
>> Thanks Uwe for the clarification on what goes and what stays.
>>
>> Still fuzzy on the notion of "significant" though. Do you have an example or two for the list?
>
> We have to look at those notes again and again in order to find out whether something important is noted, hence please always try to avoid all notes unless the effect is really intended!
>
> Consider the Note "No visible binding for global variable". We cannot know whether your code intends to use such a global variable (which is undesirable in most cases), hence we would let it pass if it seems to be sensible.
>
> Another Note such as "empty section" or "partial argument match" can quickly be fixed, hence just do it and don't waste our time.
>
> Best,
> Uwe Ligges

What is the point of notes vs warnings if you have to get rid of both of them? Furthermore, if there are notes that you don't have to get rid of, it's not fair that package developers should have to waste their time on things that are actually acceptable. Finally, it makes the whole system arbitrary, since packages can be rejected based on undefined rules.

Either divide notes into significant notes and ordinary notes and clearly label them as such in the output of R CMD check, or else make the significant notes warnings, so one can know in advance whether the package passes R CMD check or not.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [Rd] serialization regression in 2.15.0 beta
Quoting Prof Brian Ripley rip...@stats.ox.ac.uk:
> On 27/03/2012 22:01, Ben Goodrich wrote:
>> In case anyone is concerned that this regression will affect them, the code was reverted to the 2.14.x behavior by
>>
>> r58842 | ripley | 2012-03-26 08:12:43 -0400 (Mon, 26 Mar 2012) | 1 line
>> Changed paths:
>>    M /branches/R-2-15-branch/doc/NEWS.Rd
>>    M /branches/R-2-15-branch/src/library/parallel/R/unix/forkCluster.R
>>    M /branches/R-2-15-branch/src/library/parallel/R/unix/mcfork.R
>>
>> revert to XDR serialization for 2.15.0
>
> But the underlying problem (in non-xdr binary unserialization) is AFAWK fixed: it was just that at this late stage there was too little time to test thoroughly before release.
>
> Please test R-devel on your own problem (we haven't: the issue was found using a different example from elsewhere).

Indeed, the issue seems to be fixed in r-devel for my example.

Thanks,
Ben

>>> I am experiencing a problem related to serialization behavior in 2.15.0 beta (binary installed from Debian unstable) and 2.16.0 (from svn) that is not present in 2.14.2 (binary from Debian testing). I don't fully understand the problem. Also, I tried but have not yet been able to create a small, self-contained example that reproduces the problem. However, I do have a large, not self-contained example, which requires an alpha version (not yet on CRAN) of the mi package (the mi package on CRAN would not exhibit this issue). Anyone interested in reproducing the problem can follow the readme.txt file in this directory:
>>>
>>> http://www.columbia.edu/~bg2382/mi/serialization/
>>>
>>> I track r-devel with git-svn and was able to git bisect to svn commit r58219
>>>
>>> commit 799102bd9d0266fe89c3120981decf0b1f17ef11
>>> Author: ripley <ripley at 00db46b3-68df-0310-9c12-caf00c1e9a41>
>>> Date: Sat Jan 28 15:02:34 2012 +
>>>
>>>     make use of non-xdr serialization
>>>
>>> although this commit could merely expose the problem rather than cause it.
>>>
>>> The problem occurs when the FUN called by mclapply() in the parallel package returns an S4 object that contains a slot (called X) that is a large matrix, specifically a model matrix similar to that produced by glm(). Some columns of this matrix get corrupted with wrong values (usually zero, but sometimes NaN or 10^300ish), which can be seen by examining X right before FUN returns (to mclapply()'s environment) and comparing it to the same X after mclapply() returns to the calling environment.
>>>
>>> Part of svn commit r58219 is this hunk
>>>
>>> diff --git a/src/library/parallel/R/unix/mcfork.R b/src/library/parallel/R/unix/mcfork.R
>>> index 8e27534..4f92193 100644
>>> --- a/src/library/parallel/R/unix/mcfork.R
>>> +++ b/src/library/parallel/R/unix/mcfork.R
>>> @@ -82,7 +82,8 @@ mckill <- function(process, signal = 2L)
>>>  ## used by mcparallel, mclapply
>>>  sendMaster <- function(what)
>>>  {
>>> -    if (!is.raw(what)) what <- serialize(what, NULL, FALSE)
>>> +    # This is talking to the same machine, so no point in using xdr.
>>> +    if (!is.raw(what)) what <- serialize(what, NULL, xdr = FALSE)
>>>      .Call(C_mc_send_master, what, PACKAGE = "parallel")
>>>  }
>>>
>>> Contrary to the comment, I have found that if I specify xdr = TRUE, I get the expected (non-corrupted X slot) behavior in 2.16.0, even though it is forking locally on my 64-bit Debian laptop with a little-endian i7 processor, whose specs are
>>>
>>> goodrich at CYBERPOWERPC:/tmp/serialization$ cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 42
>>> model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
>>> stepping : 7
>>> microcode : 0x17
>>> cpu MHz : 800.000
>>> cache size : 6144 KB
>>> physical id : 0
>>> siblings : 8
>>> core id : 0
>>> cpu cores : 4
>>> apicid : 0
>>> initial apicid : 0
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 13
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
>>> bogomips : 3990.83
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 36 bits physical, 48 bits virtual
>>> power management:
>>> ...
>>> processor : 7
>>> [same as processor 0]
>>>
>>> So, to summarize, I get the good behavior on R 2.14.2 when using mclapply(), on 2.15.0 beta when using lapply(), and on 2.16.0 when using mclapply() iff I patch in xdr = TRUE in sendMaster(). I get the bad behavior on 2.15.0 beta and unpatched 2.16.0 when using mclapply().
>>>
>>> My session info:
>>>
>>> sessionInfo()
>>> R version 2.15.0 beta (2012-03-16 r58769)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8
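The comparison described in the report (inspect the matrix before it is sent back, then again after `mclapply()` returns) can be approximated with a round-trip test. This is only a sketch under stated assumptions: the matrix `mat` and its dimensions are made up as a stand-in for the model matrix in slot X, and nothing here is claimed to reproduce the original corruption. It exercises `serialize()` with both `xdr = TRUE` and `xdr = FALSE`, then a forked round trip through `mclapply()`.

```r
library(parallel)

# Hypothetical stand-in for the large model matrix (sizes are arbitrary).
set.seed(1)
mat <- matrix(rnorm(2000 * 50), nrow = 2000)

# Round trip through both serialization formats.
raw_xdr    <- serialize(mat, NULL, xdr = TRUE)   # big-endian XDR format
raw_native <- serialize(mat, NULL, xdr = FALSE)  # native byte order
ok_xdr     <- identical(unserialize(raw_xdr), mat)
ok_native  <- identical(unserialize(raw_native), mat)

# Round trip through forked workers; forking is unavailable on Windows,
# where mclapply() only accepts mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:2, function(i) mat, mc.cores = cores)
ok_fork <- all(vapply(res, identical, logical(1), mat))
```

On an unaffected build all three flags should be `TRUE`; a `FALSE` would point at the transport rather than at the code that built the matrix.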
[Rd] --as-cran / BuildVignettes: false
I have packages where I know CRAN and other test platforms do not have all the resources to build the vignettes, for example, access to databases. Previously, I think putting "BuildVignettes: false" in the DESCRIPTION file resolved this, by preventing CRAN checks from attempting to run the vignette code. (If it was not this, then there was some other magic I don't understand.)

Now, when I specify --as-cran, the checks fail when attempting to check R code from vignettes, even though I have "BuildVignettes: false" in the DESCRIPTION file.

What is the mechanism for indicating that CRAN should not attempt to check this code? Perhaps it is intentionally difficult - I can see an argument for that. (For running tests there are environment variables, e.g. _R_CHECK_HAVE_MYSQL_, but using these really clutters up a vignette, and it did not seem necessary to use them before.)

(The difficulty also occurs on R-forge, possibly because it is using --as-cran like settings.)

Paul
Re: [Rd] --as-cran / BuildVignettes: false
On 28.03.2012 18:07, Paul Gilbert wrote:
> I have packages where I know CRAN and other test platforms do not have all the resources to build the vignettes, for example, access to databases. Previously, I think putting "BuildVignettes: false" in the DESCRIPTION file resolved this, by preventing CRAN checks from attempting to run the vignette code. (If it was not this, then there was some other magic I don't understand.)
>
> Now, when I specify --as-cran, the checks fail when attempting to check R code from vignettes, even though I have "BuildVignettes: false" in the DESCRIPTION file.

Paul, it says BuildVignettes rather than CheckVignettes. If you want CRAN to disable those checks for some very good reason, please tell the CRAN maintainers; they will move your package to the exclude list for vignette checking.

Best,
Uwe

> What is the mechanism for indicating that CRAN should not attempt to check this code? Perhaps it is intentionally difficult - I can see an argument for that. (For running tests there are environment variables, e.g. _R_CHECK_HAVE_MYSQL_, but using these really clutters up a vignette, and it did not seem necessary to use them before.)
>
> (The difficulty also occurs on R-forge, possibly because it is using --as-cran like settings.)
>
> Paul
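The environment-variable approach Paul mentions could look roughly like the sketch below. The variable name `_R_CHECK_HAVE_MYSQL_` comes from his message; everything else is an assumption made for illustration: the helpers `have_mysql()` and `run_if_db()` are hypothetical, and treating any non-empty value as "database available" is a guess at the convention, not documented behavior.

```r
# Hypothetical guard: TRUE when the check environment advertises a database.
# Assumption: any non-empty value of the variable means "available".
have_mysql <- function() {
  nzchar(Sys.getenv("_R_CHECK_HAVE_MYSQL_"))
}

# Hypothetical wrapper: run database-dependent code only when available.
run_if_db <- function(fun) {
  if (have_mysql()) fun() else invisible(NULL)
}

# Demonstration with the variable unset, then set.
Sys.unsetenv("_R_CHECK_HAVE_MYSQL_")
skipped <- run_if_db(function() "queried")

Sys.setenv("_R_CHECK_HAVE_MYSQL_" = "TRUE")
ran <- run_if_db(function() "queried")
Sys.unsetenv("_R_CHECK_HAVE_MYSQL_")
```

Wrapping each chunk this way is the clutter Paul objects to; the alternative Uwe offers is to ask the CRAN maintainers to exclude the package from vignette checking.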
Re: [Rd] CRAN policies
On 28.03.2012 16:30, Gabor Grothendieck wrote:
> 2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:
>> On 27.03.2012 20:33, Jeffrey Ryan wrote:
>>> Thanks Uwe for the clarification on what goes and what stays.
>>>
>>> Still fuzzy on the notion of "significant" though. Do you have an example or two for the list?
>>
>> We have to look at those notes again and again in order to find out whether something important is noted, hence please always try to avoid all notes unless the effect is really intended!
>>
>> Consider the Note "No visible binding for global variable". We cannot know whether your code intends to use such a global variable (which is undesirable in most cases), hence we would let it pass if it seems to be sensible.
>>
>> Another Note such as "empty section" or "partial argument match" can quickly be fixed, hence just do it and don't waste our time.
>>
>> Best,
>> Uwe Ligges
>
> What is the point of notes vs warnings if you have to get rid of both of them? Furthermore, if there are notes that you don't have to get rid of, it's not fair that package developers should have to waste their time on things that are actually acceptable. Finally, it makes the whole system arbitrary, since packages can be rejected based on undefined rules.
>
> Either divide notes into significant notes and ordinary notes and clearly label them as such in the output of R CMD check, or else make the significant notes warnings, so one can know in advance whether the package passes R CMD check or not.

I tried to make clear that we cannot decide this automatically: it takes human inspection and thought to determine whether a Note is significant or not. That is why we have not made them Warnings, which are for cases where we are sure things have to be fixed.

Please always try to avoid all notes unless the effect is really intended!

How hard can it be? If Notes could be completely ignored, they would not be Notes.

Uwe
Re: [Rd] CRAN policies
On Thu, Mar 29, 2012 at 3:30 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> 2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:
>> On 27.03.2012 20:33, Jeffrey Ryan wrote:
>>> Thanks Uwe for the clarification on what goes and what stays.
>>>
>>> Still fuzzy on the notion of "significant" though. Do you have an example or two for the list?
>>
>> We have to look at those notes again and again in order to find out whether something important is noted, hence please always try to avoid all notes unless the effect is really intended!
>>
>> Consider the Note "No visible binding for global variable". We cannot know whether your code intends to use such a global variable (which is undesirable in most cases), hence we would let it pass if it seems to be sensible.
>>
>> Another Note such as "empty section" or "partial argument match" can quickly be fixed, hence just do it and don't waste our time.
>>
>> Best,
>> Uwe Ligges
>
> What is the point of notes vs warnings if you have to get rid of both of them? Furthermore, if there are notes that you don't have to get rid of, it's not fair that package developers should have to waste their time on things that are actually acceptable. Finally, it makes the whole system arbitrary, since packages can be rejected based on undefined rules.

The notes are precisely the things for which clear rules can't be written. They are reported by CMD check because they are usually signs of coding errors, but are not warnings because their use is sometimes justified.

The 'No visible binding for global variable' Note is a good example. It found some bugs in my 'survey' package, which I removed. There is still one note of this type, which arises when I have to handle two different versions of the hexbin package with different internal structures. The note is a false positive because the use is guarded by an if(), but CMD check can't tell this.

So, it's a good idea to remove all Notes that can be removed without introducing other code problems, which is nearly all of them, but occasionally there may be a good reason for code that produces a Note.

But if you want a simple, unambiguous, mechanical rule for *your* packages, just eliminate all Notes.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
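To make the "No visible binding for global variable" Note concrete, here is a small sketch using the codetools package that ships with R. The two functions are hypothetical examples, not taken from any package in the thread, and the assumption is that `findGlobals()` gives a fair picture of what static analysis can and cannot see. In the first function the column name `x` is only resolved at run time inside `subset()`, so the analysis cannot tell it apart from a real global variable; the second function expresses the same filter with explicit indexing.

```r
library(codetools)

# Non-standard evaluation: `x` is looked up inside `df` at run time,
# but to static analysis it looks like an undefined global variable.
f_nse <- function(df) subset(df, x > 1)

# Explicit indexing: every symbol is an argument or a known function.
f_std <- function(df) df[df$x > 1, , drop = FALSE]

vars_nse <- findGlobals(f_nse, merge = FALSE)$variables
vars_std <- findGlobals(f_std, merge = FALSE)$variables

# Both forms select the same rows on a small test data frame.
d <- data.frame(x = c(0, 2, 3), y = c("a", "b", "c"))
n_nse <- nrow(f_nse(d))
n_std <- nrow(f_std(d))
```

Rewriting in the second style is one way to remove this kind of Note; a use guarded by `if()`, as in Thomas's hexbin case, is the sort of false positive that only a human reader can recognize.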