Re: [Rd] CRAN policies

2012-03-31 Thread Mark.Bravington
Herewith comments on some replies to my earlier post. To avoid burying my own 
points, I'll briefly restate my views (which may have evolved a bit):

 - We should not be concocting yet more complicated rules to solve imaginary 
problems;

 - RCMD CHECK should have (i) Notes, which are up to the individual to ponder 
and are not CRAN's concern, and (ii) Warnings, which trigger rejection from 
CRAN. And Warnings should be for a really good reason. Then developers have 
clarity, and the CRANia have less to do. Surely the CRANia are on a 
hiding-to-nothing if they create work for themselves by continuing to require 
manual inspection; the torrent of packages is only going to get deeper.

 - Given the vast number of packages, the burden of work imposed by false 
positives from new Warnings (or "significant" Notes) should be very 
carefully considered against any benefits of true positives. I think the 
balance is going wrong.

Righto, here are my comments on responses, some of which overlap. Thanks for 
those; I've snipped heavily and paraphrased to save space, no offence intended.

 - Matthew D: It [all additions of Notes in RCMD CHECK] improves quality, 
surely. My comments in the final para were actually about Warnings sensu 
above, not Notes-- sorry if that was unclear. If someone is willing to add 
checks that lead just to Notes sensu above, then good on 'em. But, from my own 
experience and reports from others, I certainly do not consider that all 
Notes/Warnings really do indicate lack-of-quality (even excluding 
visible-bindings). I don't know exactly what's in QC.R, but one recent new 
thing did trigger a complaint from CRAN about a non-problem in 10-year-old 
code. The ensuing discussion cost me, and CRAN, time that neither of us has to 
spare. My job does not give me time to keep re-hitting a moving target. As to 
Memos vs Notes vs Warnings: why wouldn't two categories do? Packaging rules are 
quite complicated enough already!  (The temptation to make them even more 
baroque just to try to stem the flood is understandable, but not laudable...)

 Footnote: I've just glanced at the check results for mvbutils under R-devel. 
Another new and in my opinion unreasonable Warning has cropped up on 
10-year-old, perfectly functional code (besides others which may have a point). 
I'll start a separate thread, but this reinforces my view that fixing Notes or 
even Warnings doesn't necessarily improve quality-- and it's not limited to the 
visible-bindings case.

 - Spencer G: well, I didn't say RCMD CHECK is bad! I'm not advocating 
anarchy, merely pointing out that: there are limits to what RCMD CHECK can and 
should do, that it is fulfilling two different roles which are getting muddled, 
and that not everyone finds all of it useful. I'm honestly glad you do, but I 
don't (except Codoc, as I said), so one-size-does-not-fit-all. FWIW, my own 
pathway to efficient writing relies on (i) a good debugger (the debug package), 
and (ii) a really seamless method for editing my packages on-the-fly (one part 
of the mvbutils package).


 - Paul G: [300 Notes? Please explain!] Actually, I could explain the idiom 
quicker than I can write this paragraph, but I don't want to here because I'm 
opposed on principle to Notes requiring an explanation. This part of mvbutils 
has worked for 10 years. Someone has subsequently decided that code should look 
a certain way, and has added a check that isn't in the language itself-- but 
they haven't thought of everything, and of course they never could. (That might 
be paranoid. Maybe they aren't trying to impose how things should look, and 
rather are just trying to be helpful, which would be fine. It depends on how 
Notes are being interpreted, which from this thread is no longer clear. The 
R-core line used to be "Notes are just notes" but now we seem to have 
"significant" Notes and vague threats about lots of Notes etc. Paranoia seems 
reasonable.) However, anyone interested is welcome to look at the mvbutils 
package, as Bill did. The main idiom is clearly documented in ?mlocal, and the 
other cases are usually eval(), I think. Since there is no 
reliable way for a static check to figure out where the eval() happens, it 
hasn't got a hope of assessing whether bindings exist. Dammit, now I'm 
explaining, which I didn't want to, but only so that someone can change the 
check, mind...

  As to peace-of-mind from RCMD CHECK: well, I certainly don't have it! Two 
reasons:

  (1) You don't have to look far on CRAN to find packages that are badly 
written (and more often badly documented) but pass their RCMD CHECK fine. I 
tried this just now and got a result on my first go.

  (2) I know very well how to modify my code to evade the notes and most 
warnings, without changing any of what it does-- as could anyone with a bit of 
creativity. If I had inclination and time, I could do it. In no reasonable 
sense would it be better code, though.

 So RCMD CHECK is neither a necessary nor 

Re: [Rd] CRAN policies

2012-03-31 Thread Paul Gilbert

Mark

I would like to clarify two specific points.

On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
 ...

Someone has subsequently decided that code should look a certain way, and has 
added a check that isn't in the language itself-- but they haven't thought of 
everything, and of course they never could.


There is a large overlap between people writing the checks and people 
writing the interpreter. Even though your code may have been working, if 
your understanding of the language definition is not consistent with 
that of the people writing the interpreter, there is no guarantee that 
it will continue to work, and in some cases the way in which it fails 
could be that it produces spurious results. I am inclined to think of 
code checks as an additional way to be sure my understanding of the R 
language is close to that of the people writing the interpreter.



It depends on how Notes are being interpreted, which from this thread is no 
longer clear. The R-core line used to be "Notes are just notes" but now we seem 
to have "significant" Notes and ...


My understanding, and I think that of a few other people, was incorrect, 
in that I thought some notes were intended always to remain as notes, 
and others were more serious in that they would eventually become 
warnings or errors. I think Uwe addressed this misunderstanding by 
saying that all notes are intended to become warnings or errors. In 
several cases the reason they are not yet warnings or errors is that the 
checks are not yet good enough, they produce too many false positives. 
So, this means that it is very important for us to look at the notes and 
to point out the reasons for the false positives, otherwise they may 
become warnings or errors without being recognised as such.


 ...

Paul

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-31 Thread Gabor Grothendieck
On Sat, Mar 31, 2012 at 9:57 AM, Paul Gilbert pgilbert...@gmail.com wrote:
 Mark

 I would like to clarify two specific points.

 On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
 ...

 Someone has subsequently decided that code should look a certain way, and
 has added a check that isn't in the language itself-- but they haven't
 thought of everything, and of course they never could.


 There is a large overlap between people writing the checks and people
 writing the interpreter. Even though your code may have been working, if
 your understanding of the language definition is not consistent with that of
 the people writing the interpreter, there is no guarantee that it will
 continue to work, and in some cases the way in which it fails could be that
 it produces spurious results. I am inclined to think of code checks as an
 additional way to be sure my understanding of the R language is close to
 that of the people writing the interpreter.

The point is that it has been historically possible to push R in
different directions even without the blessing of the core team, but if
it's locked down too tightly then we lose that facility, and it's that
loss or potential loss that is worrying.  The idea of the package
system is that it should be possible to extend R without having to
modify the core of R itself.

 It depends on how Notes are being interpreted, which from this thread is
 no longer clear. The R-core line used to be "Notes are just notes" but now
 we seem to have "significant" Notes and ...

 My understanding, and I think that of a few other people, was incorrect, in

I don't think so.  I think it was changed on us and I think it ought
to be changed back.

Some people on this thread seem to be framing this as a quality issue
but it's nothing of the sort.  The specifics cited make it clear that
the current handling of Notes is not improving the quality of any
package but is potentially causing thousands of package developers
needless work on packages that have been working for years.  If the
Notes are just there to be helpful that is one thing but changing the
idea of Notes so that an undefined subset of them are arbitrarily
imposed at the whim of the R core group is what is objectionable.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] CRAN policies

2012-03-31 Thread Ted Byers
 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
 On Behalf Of Paul Gilbert
 Sent: March-31-12 9:57 AM
 To: mark.braving...@csiro.au
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
Greetings all

 Mark
 
 I would like to clarify two specific points.
 
 On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
   ...
  Someone has subsequently decided that code should look a certain way,
  and has added a check that isn't in the language itself-- but they
  haven't thought of everything, and of course they never could.
 
 There is a large overlap between people writing the checks and people
 writing the interpreter. Even though your code may have been working, if
 your understanding of the language definition is not consistent with that
 of the people writing the interpreter, there is no guarantee that it will
 continue to work, and in some cases the way in which it fails could be
 that it produces spurious results. I am inclined to think of code checks
 as an additional way to be sure my understanding of the R language is
 close to that of the people writing the interpreter.
 
  It depends on how Notes are being interpreted, which from this thread is
  no longer clear. The R-core line used to be "Notes are just notes" but
  now we seem to have "significant" Notes and ...
 
 My understanding, and I think that of a few other people, was incorrect,
 in that I thought some notes were intended always to remain as notes, and
 others were more serious in that they would eventually become warnings or
 errors. I think Uwe addressed this misunderstanding by saying that all
 notes are intended to become warnings or errors. In several cases the
 reason they are not yet warnings or errors is that the checks are not yet
 good enough, they produce too many false positives.
 So, this means that it is very important for us to look at the notes and
 to point out the reasons for the false positives, otherwise they may
 become warnings or errors without being recognised as such.

I left the above intact as it nicely illustrates what much of this
discussion reminds me of.  Let me illustrate with the question of software
development in one of my favourite languages: C++.

The first issue to consider is, What is the language definition and who
decides?  Believe it or not, there are two answers from two very different
perspectives.  The first is favoured by language lawyers, who point to the
ANSI standard, and who will argue incessantly about the finest of details.
But to understand this, you have to understand what ANSI is: it is an
industry organization and to construct the standard, they have industry
representatives gathered, divided up into subcommittees each of which is
charged with defining the language.  And of course everyone knows that,
being human, they can get it wrong, and thus ANSI standards evolve ever so
slowly through time.  To my mind, that is not much different from what
R-core or CRAN are involved in.  But the other answer comes from the
perspective of a professional software developer, and that is, that the
final arbiter of what the language is is your compiler.  If you want to get
product out the door, it doesn't matter if the standard says 'X' if the
compiler doesn't support it, or worse, implements it incorrectly.  Most
compilers have warnings and errors, and I like the idea of extending that to
have notes, but that is a matter of taste vs pragmatism.  I know many
software developers who choose to ignore warnings and fix only the errors.
Their rationale is that it takes time they don't have to fix the warnings
too.  And I know others who treat all warnings as errors unless they have
discovered that there is a compiler bug that generates spurious warnings of
a particular kind (in which case that specific warning can usually be turned
off).  Guess which group has lower bug rates on average.  I tend to fall in
the latter group, having observed that with many of these things, you either
fix them now or you will fix them, at greater cost, later.

The second issue to consider is, What constitutes good code, and what is
necessary to produce it?  That I won't answer beyond saying, 'whatever
works.'  That is because it is ultimately defined by the end users'
requirements.  That is why we have software engineers who specialize in
requirements engineering.  These are bright people who translate the wish
lists of non-technical users into functional and environmental requirements,
that the rest of us can code to.  But before we begin coding, we have QA
specialists that design a variety of tests from finely focussed unit tests
through integration tests to broadly focussed usability tests, ending with a
suite of tests that basically confirm that the requirements defined for the
product are satisfied.  Standard practice in good software houses is that
nothing gets added to the codebase unless the entire code base, with the new
or revised code,  compiles 

Re: [Rd] CRAN policies

2012-03-31 Thread Spencer Graves

Hi, Ted:


  Thank you for the most eloquent and complete description of the 
problem and opportunity I've seen in a while.



  Might you have time to review the Wikipedia articles on Package 
development process and Software repository 
(http://en.wikipedia.org/wiki/Package_development_process; 
http://en.wikipedia.org/wiki/Software_repository) and share with me your 
reactions?



  I wrote the Package development process article and part of the 
Software repository article, because the R package development process 
is superior to similar processes I've seen for other languages.  
However, I'm not a leading researcher on these issues, and your comments 
suggest that you know far more than I about this.  Humanity might 
benefit from your review of these articles.  (If you have any changes 
you might like to see, please make them or ask me to make them.  
Contributing to Wikipedia can be a very high leverage activity, as 
witnessed by the fact that the Wikipedia article on SOPA received a 
million views between the US holidays of Thanksgiving and Christmas last 
year.)



  Thanks again,
  Spencer



Re: [Rd] CRAN policies

2012-03-31 Thread Ted Byers
 -Original Message-
 From: Spencer Graves [mailto:spencer.gra...@prodsyse.com]
 Sent: March-31-12 1:56 PM
 To: Ted Byers
 Cc: 'Paul Gilbert'; mark.braving...@csiro.au; r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
 Hi, Ted:
 
 
 Thank you for the most eloquent and complete description of the problem
 and opportunity I've seen in a while.
 
To paraphrase and flagrantly plagiarize a better scholar than I, 'If I have
seen farther, it is because I stand on the shoulders of giants.'

No, really, I have been doing this since the stone age, when we used rocks,
or marks cut into sticks, or knots tied in string made from hemp, as our
computing devices.  And the extent to which most of us could count was
'1,2,3, many'  ;-)

Might I suggest an additional essay for you about the place of documentation
in quality software production?  We all know the benefits of design
documentation, but documentation intended for users is, in my view,
critical.  Even so, I consider an interface successful if users find
it so intuitive that they have no need for the wonderful documentation I
write.  I'll say no more except to give an example of the best documentation of
a software product I have seen in more than 30 years (no, I wrote neither it
nor the software it describes): http://eigen.tuxfamily.org/dox/index.html.
It is so nice to be able to commend someone who has done well!

Eigen is a C++ library supporting very efficient and fast matrix algebra,
and then some.

GSL is another very good example:
http://www.gnu.org/software/gsl/manual/html_node/ but not quite as good, in
my view, as Eigen.

There is an SCM product, primarily for Unix though it does build under Cygwin,
called Aegis.  The last I looked, it had a nice explanation of the protocol
of testing, and ensuring that everything builds and passes all tests before
adding new or revised code to the codebase.  There may be support for it in
more recent products like Git or Subversion, but to be honest I haven't had
the time to look.

For material on requirements gathering, and on using it to guide QA
processes and the design of the several suites of tests a project usually
needs, the best information is in the many references dealing with UML.

You have made a good start on those pages, but they need to be fleshed out.
I do not recommend making either of them more than 50% longer than it is
now.  Rather, I suggest fleshing them out in hypertext fashion, by adding
(links to) pages dealing with different issues in more detail than is
possible in an executive summary.

But, overall, well done.

Cheers

Ted

 