Re: [Rd] CRAN policies
Herewith comments on some replies to my earlier post. To avoid burying my own points, I'll briefly restate my views (which may have evolved a bit):

 - We should not be concocting yet more complicated rules to solve imaginary problems;

 - RCMD CHECK should have (i) Notes, which are up to the individual to ponder and are not CRAN's concern, and (ii) Warnings, which trigger rejection from CRAN. And Warnings should be for a really good reason. Then developers have clarity, and the CRANia have less to do. Surely the CRANia are on a hiding-to-nothing if they create work for themselves by continuing to require manual inspection; the torrent of packages is only going to get deeper.

 - Given the vast number of packages, the burden of work imposed by false positives from new Warnings (or significant Notes) should be very carefully considered against any benefits of true positives. I think the balance is going wrong.

Righto, here are my comments on responses, some of which overlap. Thanks for those; I've snipped heavily and paraphrased to save space, no offence intended.

 - Matthew D: It [all additions of Notes in RCMD CHECK] improves quality, surely. My comments in the final para were actually about Warnings sensu above, not Notes-- sorry if that was unclear. If someone is willing to add checks that lead just to Notes sensu above, then good on 'em. But, from my own experience and reports from others, I certainly do not consider that all Notes/Warnings really do indicate lack-of-quality (even excluding visible-bindings). I don't know exactly what's in QC.R, but one recent new thing did trigger a complaint from CRAN about a non-problem in 10-year-old code. The ensuing discussion cost me, and CRAN, time that neither of us have to spare. My job does not give me time to keep re-hitting a moving target.

As to Memos vs Notes vs Warnings: why wouldn't two categories do? Packaging rules are quite complicated enough already!
(The temptation to make them even more baroque just to try to stem the flood is understandable, but not laudable...)

Footnote: I've just glanced at the check results for mvbutils under R-devel. Another new and in my opinion unreasonable Warning has cropped up on 10-year-old perfectly functional code (beside others which may have a point). I'll start a separate thread, but this reinforces my view that fixing Notes or even Warnings doesn't necessarily improve quality-- and it's not limited to the visible-bindings case.

 - Spencer G: well, I didn't say RCMD CHECK is bad! I'm not advocating anarchy, merely pointing out that: there are limits to what RCMD CHECK can and should do, that it is fulfilling two different roles which are getting muddled, and that not everyone finds all of it useful. I'm honestly glad you do, but I don't (except Codoc, as I said), so one-size-does-not-fit-all. FWIW, my own pathway to efficient writing relies on (i) a good debugger (the debug package), and (ii) a really seamless method for editing my packages on-the-fly (one part of the mvbutils package).

 - Paul G: [300 Notes? Please explain!] Actually, I could explain the idiom quicker than I can write this paragraph, but I don't want to here because I'm opposed on principle to Notes requiring an explanation. This part of mvbutils has worked for 10 years. Someone has subsequently decided that code should look a certain way, and has added a check that isn't in the language itself-- but they haven't thought of everything, and of course they never could.

(That might be paranoid. Maybe they aren't trying to impose how things should look, and rather are just trying to be helpful, which would be fine. It depends on how Notes are being interpreted, which from this thread is no longer clear. The R-core line used to be Notes are just notes but now we seem to have significant Notes and vague threats about lots of Notes etc. Paranoia seems reasonable.)
However, anyone interested is welcome to look at the mvbutils package, as Bill did. The main idiom is clearly documented in ?mlocal, and the other cases are usually eval() I think. Since there is no reliable way for a static check to figure out where the eval() happens, it hasn't got a hope of assessing whether bindings exist. Dammit, now I'm explaining, which I didn't want to, but only so that someone can change the check, mind...

As to peace-of-mind from RCMD CHECK: well, I certainly don't have it! Two reasons:

(1) You don't have to look far on CRAN to find packages that are badly written (and more often badly documented) but pass their RCMD CHECK fine. I tried this just now and got a result on my first go.

(2) I know very well how to modify my code to evade the notes and most warnings, without changing any of what it does-- as could anyone with a bit of creativity. If I had inclination and time, I could do it. In no reasonable sense would it be better code, though.

So RCMD CHECK is neither a necessary nor
Re: [Rd] CRAN policies
Mark

I would like to clarify two specific points.

On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
> ... Someone has subsequently decided that code should look a certain way, and has added a check that isn't in the language itself-- but they haven't thought of everything, and of course they never could.

There is a large overlap between people writing the checks and people writing the interpreter. Even though your code may have been working, if your understanding of the language definition is not consistent with that of the people writing the interpreter, there is no guarantee that it will continue to work, and in some cases the way in which it fails could be that it produces spurious results. I am inclined to think of code checks as an additional way to be sure my understanding of the R language is close to that of the people writing the interpreter.

> It depends on how Notes are being interpreted, which from this thread is no longer clear. The R-core line used to be Notes are just notes but now we seem to have significant Notes and ...

My understanding, and I think that of a few other people, was incorrect, in that I thought some notes were intended always to remain as notes, and others were more serious in that they would eventually become warnings or errors. I think Uwe addressed this misunderstanding by saying that all notes are intended to become warnings or errors. In several cases the reason they are not yet warnings or errors is that the checks are not yet good enough, they produce too many false positives.

So, this means that it is very important for us to look at the notes and to point out the reasons for the false positives, otherwise they may become warnings or errors without being recognised as such.

...

Paul

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] CRAN policies
On Sat, Mar 31, 2012 at 9:57 AM, Paul Gilbert pgilbert...@gmail.com wrote:
> Mark
>
> I would like to clarify two specific points.
>
> On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
>> ... Someone has subsequently decided that code should look a certain way, and has added a check that isn't in the language itself-- but they haven't thought of everything, and of course they never could.
>
> There is a large overlap between people writing the checks and people writing the interpreter. Even though your code may have been working, if your understanding of the language definition is not consistent with that of the people writing the interpreter, there is no guarantee that it will continue to work, and in some cases the way in which it fails could be that it produces spurious results. I am inclined to think of code checks as an additional way to be sure my understanding of the R language is close to that of the people writing the interpreter.

The point is that it has been historically possible to push R in different directions even without the blessing of the core team, but if it's locked down too tightly then we lose that facility, and it's that loss or potential loss that is worrying. The idea of the package system is that it should be possible to extend R without having to modify the core of R itself.

>> It depends on how Notes are being interpreted, which from this thread is no longer clear. The R-core line used to be Notes are just notes but now we seem to have significant Notes and ...
>
> My understanding, and I think that of a few other people, was incorrect, in

I don't think so. I think it was changed on us and I think it ought to be changed back.

Some people on this thread seem to be framing this as a quality issue but it's nothing of the sort. The specifics cited make it clear that the current handling of Notes is not improving the quality of any package but is potentially causing thousands of package developers needless work on packages that have been working for years.
If the Notes are just there to be helpful that is one thing but changing the idea of Notes so that an undefined subset of them are arbitrarily imposed at the whim of the R core group is what is objectionable.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [Rd] CRAN policies
-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Paul Gilbert
Sent: March-31-12 9:57 AM
To: mark.braving...@csiro.au
Cc: r-de...@stat.math.ethz.ch
Subject: Re: [Rd] CRAN policies

Greetings all

> Mark
>
> I would like to clarify two specific points.
>
> On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
>> ... Someone has subsequently decided that code should look a certain way, and has added a check that isn't in the language itself-- but they haven't thought of everything, and of course they never could.
>
> There is a large overlap between people writing the checks and people writing the interpreter. Even though your code may have been working, if your understanding of the language definition is not consistent with that of the people writing the interpreter, there is no guarantee that it will continue to work, and in some cases the way in which it fails could be that it produces spurious results. I am inclined to think of code checks as an additional way to be sure my understanding of the R language is close to that of the people writing the interpreter.
>
>> It depends on how Notes are being interpreted, which from this thread is no longer clear. The R-core line used to be Notes are just notes but now we seem to have significant Notes and ...
>
> My understanding, and I think that of a few other people, was incorrect, in that I thought some notes were intended always to remain as notes, and others were more serious in that they would eventually become warnings or errors. I think Uwe addressed this misunderstanding by saying that all notes are intended to become warnings or errors. In several cases the reason they are not yet warnings or errors is that the checks are not yet good enough, they produce too many false positives.
>
> So, this means that it is very important for us to look at the notes and to point out the reasons for the false positives, otherwise they may become warnings or errors without being recognised as such.
I left the above intact as it nicely illustrates what much of this discussion reminds me of. Let me illustrate with the question of software development in one of my favourite languages: C++.

The first issue to consider is, What is the language definition and who decides? Believe it or not, there are two answers from two very different perspectives.

The first is favoured by language lawyers, who point to the ANSI standard, and who will argue incessantly about the finest of details. But to understand this, you have to understand what ANSI is: it is an industry organization and to construct the standard, they have industry representatives gathered, divided up into subcommittees each of which is charged with defining the language. And of course everyone knows that, being human, they can get it wrong, and thus ANSI standards evolve ever so slowly through time. To my mind, that is not much different from what R core or CRAN are involved in.

But the other answer comes from the perspective of a professional software developer, and that is, that the final arbiter of what the language is, is your compiler. If you want to get product out the door, it doesn't matter if the standard says 'X' if the compiler doesn't support it, or worse, implements it incorrectly.

Most compilers have warnings and errors, and I like the idea of extending that to have notes, but that is a matter of taste vs pragmatism. I know many software developers that choose to ignore warnings and fix only the errors. Their rationale is that it takes time they don't have to fix the warnings too. And I know others who treat all warnings as errors unless they have discovered that there is a compiler bug that generates spurious warnings of a particular kind (in which case that specific warning can usually be turned off). Guess which group has lower bug rates on average.
I tend to fall in the latter group, having observed that with many of these things, you either fix them now or you will fix them, at greater cost, later.

The second issue to consider is, What constitutes good code, and what is necessary to produce it? That I won't answer beyond saying, 'whatever works.' That is because it is ultimately defined by the end users' requirements. That is why we have software engineers who specialize in requirements engineering. These are bright people who translate the wish lists of non-technical users into functional and environmental requirements that the rest of us can code to.

But before we begin coding, we have QA specialists that design a variety of tests from finely focussed unit tests through integration tests to broadly focussed usability tests, ending with a suite of tests that basically confirm that the requirements defined for the product are satisfied. Standard practice in good software houses is that nothing gets added to the codebase unless the entire code base, with the new or revised code, compiles
Re: [Rd] CRAN policies
Hi, Ted:

Thank you for the most eloquent and complete description of the problem and opportunity I've seen in a while.

Might you have time to review the Wikipedia articles on Package development process and Software repository (http://en.wikipedia.org/wiki/Package_development_process; http://en.wikipedia.org/wiki/Software_repository) and share with me your reactions?

I wrote the Package development process article and part of the Software repository article, because the R package development process is superior to similar processes I've seen for other languages. However, I'm not a leading researcher on these issues, and your comments suggest that you know far more than I about this. Humanity might benefit from your review of these articles. (If you have any changes you might like to see, please make them or ask me to make them. Contributing to Wikipedia can be a very high leverage activity, as witnessed by the fact that the Wikipedia article on SOPA received a million views between the US holidays of Thanksgiving and Christmas last year.)

Thanks again,
Spencer

On 3/31/2012 8:29 AM, Ted Byers wrote:
> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Paul Gilbert
> Sent: March-31-12 9:57 AM
> To: mark.braving...@csiro.au
> Cc: r-de...@stat.math.ethz.ch
> Subject: Re: [Rd] CRAN policies
>
> Greetings all
>
>> Mark
>>
>> I would like to clarify two specific points.
>>
>> On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
>>> ... Someone has subsequently decided that code should look a certain way, and has added a check that isn't in the language itself-- but they haven't thought of everything, and of course they never could.
>>
>> There is a large overlap between people writing the checks and people writing the interpreter.
Re: [Rd] CRAN policies
-----Original Message-----
From: Spencer Graves [mailto:spencer.gra...@prodsyse.com]
Sent: March-31-12 1:56 PM
To: Ted Byers
Cc: 'Paul Gilbert'; mark.braving...@csiro.au; r-de...@stat.math.ethz.ch
Subject: Re: [Rd] CRAN policies

> Hi, Ted:
>
> Thank you for the most eloquent and complete description of the problem and opportunity I've seen in a while.

To paraphrase and flagrantly plagiarize a better scholar than I, 'If I have seen farther, it is because I stand on the shoulders of giants.' No really, I have been doing this since the stone age, when we used rocks, or marks cut into sticks, or knots tied in string made from hemp, as our computing devices. And the extent to which most of us could count was '1,2,3, many' ;-)

Might I suggest an additional essay for you about the place of documentation in quality software production? We all know the benefits of design documentation, but documentation intended for users is, in my view, critical. In my view, though, I have a successful interface if users find it so intuitive that they have no need for the wonderful documentation I write.

I'll say no more but to give an example of the best documentation of a software product I have seen in more than 30 years (no, I wrote neither it nor the software it describes): http://eigen.tuxfamily.org/dox/index.html. It is so nice to be able to commend someone who has done well! Eigen is a C++ library supporting very efficient and fast matrix algebra, and then some. GSL is another very good example: http://www.gnu.org/software/gsl/manual/html_node/ but not quite as good, in my view, as Eigen.

There is an SCM product, primarily Unix, though it does build under Cygwin, called Aegis. The last I looked, it had a nice explanation of the protocol of testing, and ensuring that everything builds and passes all tests before adding new or revised code to the codebase. There may be support for it in more recent products like Git or Subversion, but to be honest I haven't had the time to look.
For material on requirements gathering, and on using it to guide QA processes and the design of one of the several suites of tests a project usually needs, the best info is in the many references dealing with UML.

You have made a good start on those pages, but they need to be fleshed out. I do not recommend making either of them more than 50% longer than its current length. Rather, I suggest fleshing them out hypertext fashion, by adding (links to) pages dealing with different issues in more detail than is possible in an executive summary. But, overall, well done.

Cheers

Ted

> Might you have time to review the Wikipedia articles on Package development process and Software repository (http://en.wikipedia.org/wiki/Package_development_process; http://en.wikipedia.org/wiki/Software_repository) and share with me your reactions?
>
> I wrote the Package development process article and part of the Software repository article, because the R package development process is superior to similar processes I've seen for other languages. However, I'm not a leading researcher on these issues, and your comments suggest that you know far more than I about this. Humanity might benefit from your review of these articles. (If you have any changes you might like to see, please make them or ask me to make them. Contributing to Wikipedia can be a very high leverage activity, as witnessed by the fact that the Wikipedia article on SOPA received a million views between the US holidays of Thanksgiving and Christmas last year.)
>
> Thanks again,
> Spencer
>
> On 3/31/2012 8:29 AM, Ted Byers wrote:
>> -----Original Message-----
>> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of Paul Gilbert
>> Sent: March-31-12 9:57 AM
>> To: mark.braving...@csiro.au
>> Cc: r-de...@stat.math.ethz.ch
>> Subject: Re: [Rd] CRAN policies
>>
>> Greetings all
>>
>>> Mark
>>>
>>> I would like to clarify two specific points.
>>>
>>> On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
>>>> ...