Re: [Rd] CRAN policies

2012-04-02 Thread Martin Maechler
 William Dunlap wdun...@tibco.com
 on Fri, 30 Mar 2012 16:07:52 + writes:

 It looks like you define a few functions that use substitute() or sys.call()
 or similar functions to look at the unevaluated argument list.  E.g.,

 cq <-
 function( ...) {
 # Saves putting in quotes!
 # E.G.: quoted( first, second, third) is the same as c( 'first', 'second', 'third')
 # wrapping by as.character means cq() returns character(0) not list()
 as.character( sapply( as.list( match.call( expand.dots=TRUE))[-1], as.character))
 }
 %such.that% and %SUCH.THAT% do similar things.

 Almost all the complaints from check involve calls to a
 handful of such functions.  If you could tell
 codetools:::checkUsage that these functions did
 nonstandard evaluation on all or some of their arguments
 then the complaints would go away and other checks for
 real errors like misspellings would still be done.

I agree very much with you, Bill.
Many (if not the majority) of my packages have given these false
positive notes for many months now... and I have to admit that
the effect indeed has been that I take notes much less seriously
nowadays.  This of course has never been the intention.

I'm pretty sure that most of us agree that it would be very
useful if not desirable to have a simple and robust way for
package authors to declare nonstandard evaluation to the
checkUsage() checks.
Maybe we should branch a new thread about this, for proposals on
how to go about this.
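
A minimal sketch of what such a declaration could look like, assuming nothing beyond base R and the codetools package. The helper nse_symbols() is hypothetical (it is not part of codetools); it collects the bare symbols passed to functions the author has declared as non-standard, and hands them to the existing suppressUndefined argument of checkUsage(), which accepts a character vector of names to leave unreported.

```r
library(codetools)

# cq() as quoted in this thread: returns its unevaluated arguments as strings.
cq <- function(...) {
  as.character(sapply(as.list(match.call(expand.dots = TRUE))[-1], as.character))
}

# Hypothetical helper (not part of codetools): collect the bare symbols passed
# directly to any function the author has declared as non-standard.
# Limitation: it does not handle calls with empty arguments such as x[, 1].
nse_symbols <- function(expr, nse_funs) {
  found <- character(0)
  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    fn <- e[[1]]
    if (is.name(fn) && as.character(fn) %in% nse_funs) {
      syms <- Filter(is.name, as.list(e)[-1])
      found <<- c(found, vapply(syms, as.character, character(1)))
    }
    for (part in as.list(e)) walk(part)
    invisible(NULL)
  }
  walk(expr)
  unique(unname(found))
}

f <- function(x) paste(x, cq(suffix), sep = ".")

# The "declaration": cq does non-standard evaluation on its arguments.
declared <- nse_symbols(body(f), nse_funs = "cq")

# Collect whatever checkUsage() reports instead of printing it.
notes <- character(0)
collect <- function(msg) notes <<- c(notes, msg)
checkUsage(f, name = "f", report = collect, suppressUndefined = declared)
```

One design caveat: suppressUndefined works by name, so a declared symbol is also suppressed where it is used in ordinary, evaluated code inside the same function. A real mechanism would need to be position-aware.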

Martin


 Another possible part of the problem is that if checkUsage
 is checking a function like

 f <- function(x) paste(x, cq(suffix), sep=".")
 it attributes the out-of-scope suffix problem to 'f' and doesn't mention
 that the immediate caller is 'cq', so you cannot easily filter output
 complaints about cq.  (CRAN would not do such filtering, but a developer
 might.)

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


 -Original Message-
 From: r-devel-boun...@r-project.org 
[mailto:r-devel-boun...@r-project.org] On Behalf
 Of mark.braving...@csiro.au
 Sent: Thursday, March 29, 2012 6:30 PM
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
 I'm concerned this thread is heading the wrong way, towards techno-fixes
 for imaginary problems. R package-building is already encumbered with a
 huge set of complicated rules, and more instructions/rules eg for metadata
 would make things worse not better.

 RCMD CHECK on the 'mvbutils' package generates over 300 Notes about no
 visible binding..., which inevitably I just ignore. They arise because
 RCMD CHECK is too stupid to understand one of my preferred coding idioms
 (I'm not going to explain what-- that's beside the point). And RCMD CHECK
 always will be too stupid to understand everything that a rich language
 like R might quite reasonably cause experienced coders to do.

 It should not be CRAN's business how I write my code, or even whether my
 code does what it is supposed to. It might be CRAN's business to try to
 work out whether my code breaks CRAN's policies, eg by causing R to crash
 horribly-- that's presumably what Warnings are for (but see below). And
 maybe there could be circumstances where an automatic check might be
 worried enough to alert the CRANia and require manual explanation and
 emails etc from a developer, but even that seems doomed given the growing
 deluge of packages.

 RCMD CHECK currently functions both as a sanitizer for CRAN, and as a
 developer-tool. But the fact that the one program does both things seems
 accidental to me, and I think this dual-use is muddying the discussion.
 There's a big distinction between (i) code-checks that developers
 themselves might or might not find useful-- which should be left to the
 developer, and will vary from person to person-- and (ii) code-checks that
 CRAN enforces for its own peace-of-mind. Maybe it's convenient to have both
 functions in the same place, and it'd be fine to use Notes for one and
 Warnings for the other, but the different purposes should surely be kept
 clear.

 Personally, in building over 10 packages (only 2 on CRAN), I haven't found
 RCMD CHECK to be of any use, except for the code-documentation and
 example-running bits. I know other people have different opinions, but
 that's the point: one-size-does-not-fit-all when it comes to coding tools.

 And wrt the Warnings themselves: I feel compelled to point out that it's
 logically impossible to fully check whether R code will do bad things. One
 has to wonder at what point adding new checks becomes futile or
 counterproductive. There must be over 2000 people who have written CRAN
 packages by now; every extra check and non-back

Re: [Rd] CRAN policies

2012-03-31 Thread Mark.Bravington
 sufficient condition for virtue. 
Inspection of a language as rich as R will never be foolproof. The user simply 
has to take it on trust that a package does what it claims, or otherwise decide 
not to use it. How the package does it, is up to the author. My experience of 
other people's software is: peace-of-mind starts with helpful documentation, 
and also depends on whether I get a sense from the archives that the author 
might actually help if I run into something odd. Several well-known packages 
fail these tests, so I avoid them. Automated checks, beyond a certain limited 
point which they have probably reached, seem to me to be playing the wrong game.

 Bill D: [proposal for additional documentation mechanism] Thanks for going 
to the trouble of looking at my code; I certainly appreciate the effort, but 
your proposal is exactly what I am against! The issue is with the check, not 
with my code, and as above I do not see why I should need to add elaborate 
justifications. For some, this particular check (visible-binding) is apparently 
useful. For others, it's not. So why not just leave it as a Note that people 
can worry about or not if they want? It should not be of concern to those very 
busy CRANia people.

 Joshua W: [CRAN can set its own rules, and if a package doesn't easily fit 
them, maybe it should be put elsewhere.] Certainly CRAN/R-core (the distinction 
is shadowy to me) can, and frequently does, decide to do whatever it wants, 
including decisions about what to host. But it does not follow that every 
decision taken is axiomatically a Good Thing for R. More effort now goes into R 
development from people outside R core than inside it (3000 packages). If a 
CRAN/Rcore decision entails a lot of work for others to amend code in ways that 
do not make the code work better, then it doesn't strike me as a good decision. 
Ditto if perfectly functional code is forced off CRAN, where it is (sort of) 
easy to find-- it becomes more difficult for the wider R community to get it, 
and of course it may not get *any* checks that way. NB I am not commenting here 
on individual aspects of RCMD CHECK etc-- this is a general point about mission 
creep, helps and hindrances, and balance of workload.


 Mark

Mark Bravington
CSIRO CMIS
Marine Lab
Hobart
Australia

From: Joshua Wiley [jwiley.ps...@gmail.com]
Sent: 31 March 2012 06:03
To: Kevin Wright
Cc: Bravington, Mark (CMIS, Hobart); r-de...@stat.math.ethz.ch
Subject: Re: [Rd] CRAN policies

On Fri, Mar 30, 2012 at 11:41 AM, Kevin Wright kw.s...@gmail.com wrote:
 I'll echo Mark's concerns.  R _used_ to be a language for turning ideas
 into software quickly.  Now it is more like prototyping ideas in software
 quickly, and then spend a substantial amount of time trying to follow
 administrative rules to package the code.

..if you want to submit to CRAN.  There are practically zero if you
host on your own website.  Of course developers are free to do
whatever they want and R core does not get to tell them what/how to do
it.  R core does get a say when you ask them to host your source and
build your package binaries.

 Quality has its costs.

So does using CRAN.  If it is not the best solution for your problem,
use something else.  Hadley uses github for development of ggplot2, and
with the devtools package, it is relatively easy for users to install
the source ggplot2 code.  Something like that might be appropriate for
code/packages where you just want to 'turn ideas into software
quickly'.  There is an extra step required for users to use it, but
that makes sense because it weeds out inept users from using code with
less quality control.


 Many of the code checks I find quite useful, but the no visible binding
 one generates lots of nuisance notes for me.  I must have a similar coding
 style to Mark.

 Kevin


 On Thu, Mar 29, 2012 at 8:29 PM, mark.braving...@csiro.au wrote:

 I'm concerned this thread is heading the wrong way, towards techno-fixes
 for imaginary problems. R package-building is already encumbered with a
 huge set of complicated rules, and more instructions/rules eg for metadata
 would make things worse not better.

 RCMD CHECK on the 'mvbutils' package generates over 300 Notes about no
 visible binding..., which inevitably I just ignore. They arise because
 RCMD CHECK is too stupid to understand one of my preferred coding idioms
 (I'm not going to explain what-- that's beside the point). And RCMD CHECK
 always will be too stupid to understand everything that a rich language
 like R might quite reasonably cause experienced coders to do.

 It should not be CRAN's business how I write my code, or even whether my
 code does what it is supposed to. It might be CRAN's business to try to
 work out whether my code breaks CRAN's policies, eg by causing R to crash
 horribly-- that's presumably what Warnings are for (but see below). And
 maybe there could be circumstances where an automatic check might

Re: [Rd] CRAN policies

2012-03-31 Thread Paul Gilbert

Mark

I would like to clarify two specific points.

On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
 ...

Someone has subsequently decided that code should look a certain way, and
has added a check that isn't in the language itself-- but they haven't
thought of everything, and of course they never could.


There is a large overlap between people writing the checks and people 
writing the interpreter. Even though your code may have been working, if 
your understanding of the language definition is not consistent with 
that of the people writing the interpreter, there is no guarantee that 
it will continue to work, and in some cases the way in which it fails 
could be that it produces spurious results. I am inclined to think of 
code checks as an additional way to be sure my understanding of the R 
language is close to that of the people writing the interpreter.



It depends on how Notes are being interpreted, which from this thread is
no longer clear.
 The R-core line used to be Notes are just notes but now we seem to have
 significant Notes and ...


My understanding, and I think that of a few other people, was incorrect, 
in that I thought some notes were intended always to remain as notes, 
and others were more serious in that they would eventually become 
warnings or errors. I think Uwe addressed this misunderstanding by 
saying that all notes are intended to become warnings or errors. In 
several cases the reason they are not yet warnings or errors is that the 
checks are not yet good enough, they produce too many false positives. 
So, this means that it is very important for us to look at the notes and 
to point out the reasons for the false positives, otherwise they may 
become warnings or errors without being recognised as such.


 ...

Paul

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-31 Thread Gabor Grothendieck
On Sat, Mar 31, 2012 at 9:57 AM, Paul Gilbert pgilbert...@gmail.com wrote:
 Mark

 I would like to clarify two specific points.

 On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
 ...

 Someone has subsequently decided that code should look a certain way, and
 has added a check that
 isn't in the language itself-- but they haven't thought of everything, and
 of course they never could.


 There is a large overlap between people writing the checks and people
 writing the interpreter. Even though your code may have been working, if
 your understanding of the language definition is not consistent with that of
 the people writing the interpreter, there is no guarantee that it will
 continue to work, and in some cases the way in which it fails could be that
 it produces spurious results. I am inclined to think of code checks as an
 additional way to be sure my understanding of the R language is close to
 that of the people writing the interpreter.

The point is that it has been historically possible to push R in
different directions even without the blessing of the core team but if
it's locked down too tightly then we lose that facility and it's that
loss or potential loss that is worrying.  The idea of the package
system is that it should be possible to extend R without having to
modify the core of R itself.

 It depends on how Notes are being interpreted, which from this thread is
 no longer clear.

 The R-core line used to be Notes are just notes but now we seem to have
 significant Notes and ...

 My understanding, and I think that of a few other people, was incorrect, in

I don't think so.  I think it was changed on us and I think it ought
to be changed back.

Some people on this thread seem to be framing this as a quality issue
but it's nothing of the sort.  The specifics cited make it clear that
the current handling of Notes is not improving the quality of any
package but is potentially causing thousands of package developers
needless work on packages that have been working for years.  If the
Notes are just there to be helpful that is one thing but changing the
idea of Notes so that an undefined subset of them are arbitrarily
imposed at the whim of the R core group is what is objectionable.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] CRAN policies

2012-03-31 Thread Ted Byers
 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
 On Behalf Of Paul Gilbert
 Sent: March-31-12 9:57 AM
 To: mark.braving...@csiro.au
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
Greetings all

 Mark
 
 I would like to clarify two specific points.
 
 On 12-03-31 04:41 AM, mark.braving...@csiro.au wrote:
   ...
  Someone has subsequently decided that code should look a certain way,
  and has added a check that isn't in the language itself-- but they
  haven't thought of everything, and of course they never could.

 There is a large overlap between people writing the checks and people
 writing the interpreter. Even though your code may have been working, if
 your understanding of the language definition is not consistent with that
 of the people writing the interpreter, there is no guarantee that it will
 continue to work, and in some cases the way in which it fails could be
 that it produces spurious results. I am inclined to think of code checks
 as an additional way to be sure my understanding of the R language is
 close to that of the people writing the interpreter.

  It depends on how Notes are being interpreted, which from this thread is
  no longer clear.
   The R-core line used to be Notes are just notes but now we seem to have
   significant Notes and ...

 My understanding, and I think that of a few other people, was incorrect,
 in that I thought some notes were intended always to remain as notes, and
 others were more serious in that they would eventually become warnings or
 errors. I think Uwe addressed this misunderstanding by saying that all
 notes are intended to become warnings or errors. In several cases the
 reason they are not yet warnings or errors is that the checks are not yet
 good enough, they produce too many false positives.
 So, this means that it is very important for us to look at the notes and
 to point out the reasons for the false positives, otherwise they may
 become warnings or errors without being recognised as such.

I left the above intact as it nicely illustrates what much of this
discussion reminds me of.  Let me illustrate with the question of software
development in one of my favourite languages: C++.

The first issue to consider is, What is the language definition and who
decides?  Believe it or not, there are two answers from two very different
perspectives.  The first is favoured by language lawyers, who point to the
ANSI standard, and who will argue incessantly about the finest of details.
But to understand this, you have to understand what ANSI is: it is an
industry organization and to construct the standard, they have industry
representatives gathered, divided up into subcommittees each of which is
charged with defining the language.  And of course everyone knows that,
being human, they can get it wrong, and thus ANSI standards evolve ever so
slowly through time.  To my mind, that is not much different from what
R/core or CRAN are involved in.  But the other answer comes from the
perspective of a professional software developer, and that is, that the
final arbiter of what the language is is your compiler.  If you want to get
product out the door, it doesn't matter if the standard says 'X' if the
compiler doesn't support it, or worse, implements it incorrectly.  Most
compilers have warnings and errors, and I like the idea of extending that to
have notes, but that is a matter of taste vs pragmatism.  I know many
software developers that choose to ignore warnings and fix only the errors.
Their rationale is that it takes time they don't have to fix the warnings
too.  And I know others who treat all warnings as errors unless they have
discovered that there is a compiler bug that generates spurious warnings of
a particular kind (in which case that specific warning can usually be turned
off).  Guess which group has lower bug rates on average.  I tend to fall in
the latter group, having observed that with many of these things, you either
fix them now or you will fix them, at greater cost, later.
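
For readers who want to try the stricter habit in R itself, here is a minimal sketch. It is an R analogue, not the C++ compiler setting described above: it relies on the base option warn = 2, which turns warnings into errors. The wrapper name strict() is hypothetical.

```r
# Hypothetical wrapper: evaluate an expression with warnings promoted to errors,
# then restore the caller's options.
strict <- function(expr) {
  old <- options(warn = 2)
  on.exit(options(old), add = TRUE)
  expr  # lazy evaluation: the argument is forced here, under warn = 2
}

# Lenient run: the coercion only warns (suppressed here) and yields NA.
lenient <- suppressWarnings(as.integer("not a number"))

# Strict run: the same warning is converted to an error and caught.
outcome <- tryCatch(
  strict(as.integer("not a number")),
  error = function(e) "stopped"
)
```

The trade-off is the one described above: the strict run stops at the first questionable result instead of carrying an NA forward to be discovered later.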

The second issue to consider is, What constitutes good code, and what is
necessary to produce it?  That I won't answer beyond saying, 'whatever
works.'  That is because it is ultimately defined by the end users'
requirements.  That is why we have software engineers who specialize in
requirements engineering.  These are bright people who translate the wish
lists of non-technical users into functional and environmental requirements,
that the rest of us can code to.  But before we begin coding, we have QA
specialists that design a variety of tests from finely focussed unit tests
through integration tests to broadly focussed usability tests, ending with a
suite of tests that basically confirm that the requirements defined for the
product are satisfied.  Standard practice in good software houses is that
nothing gets added to the codebase unless the entire code base, with the new
or revised code,  compiles

Re: [Rd] CRAN policies

2012-03-31 Thread Spencer Graves

Hi, Ted:


  Thank you for the most eloquent and complete description of the 
problem and opportunity I've seen in a while.



  Might you have time to review the Wikipedia articles on Package 
development process and Software repository 
(http://en.wikipedia.org/wiki/Package_development_process; 
http://en.wikipedia.org/wiki/Software_repository) and share with me your 
reactions?



  I wrote the Package development process article and part of the 
Software repository article, because the R package development process 
is superior to similar processes I've seen for other languages.  
However, I'm not a leading researcher on these issues, and your comments 
suggest that you know far more than I about this.  Humanity might 
benefit from your review of these articles.  (If you have any changes 
you might like to see, please make them or ask me to make them.  
Contributing to Wikipedia can be a very high leverage activity, as 
witnessed by the fact that the Wikipedia article on SOPA received a 
million views between the US holidays of Thanksgiving and Christmas last 
year.)



  Thanks again,
  Spencer


On 3/31/2012 8:29 AM, Ted Byers wrote:


Re: [Rd] CRAN policies

2012-03-31 Thread Ted Byers
 -Original Message-
 From: Spencer Graves [mailto:spencer.gra...@prodsyse.com]
 Sent: March-31-12 1:56 PM
 To: Ted Byers
 Cc: 'Paul Gilbert'; mark.braving...@csiro.au; r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
 Hi, Ted:
 
 
Thank you for the most eloquent and complete description of the
problem
 and opportunity I've seen in a while.
 
To paraphrase and flagrantly plagiarize a better scholar than I, 'If I have
seen farther, it is because I stand on the shoulders of giants.'

No really, I have been doing this since the stone age, when we used rocks,
or marks cut into sticks, or knots tied in string made from hemp, as our
computing devices.  And the extent to which most of us could count was
'1,2,3, many'  ;-)

Might I suggest an additional essay for you about the place of documentation
in quality software production?  We all know the benefits of design
documentation, but documentation intended for users is, in my view,
critical.  In my view, though, I have a successful interface if users find
it so intuitive that they have no need for the wonderful documentation I
write.  I'll say no more but to give an example of the best documentation of
a software product I have seen in more than 30 years (no, I wrote neither it
nor the software it describes): http://eigen.tuxfamily.org/dox/index.html.
It is so nice to be able to commend someone who has done well!

Eigen is a C++ library supporting very efficient and fast matrix algebra,
and then some.

GSL is another very good example:
http://www.gnu.org/software/gsl/manual/html_node/ but not quite as good, in
my view, as Eigen

There is a SCM product, primarily Unix, though it does build under Cygwin,
called Aegis.  The last I looked, it had a nice explanation of the protocol
of testing, and ensuring that everything builds and passes all tests before
adding new or revised code to the codebase.  There may be support for it in
more recent products like GIT or Subversion, but to be honest I haven't had
the time to look.

To gather material for requirements gathering, and use of that to guide QA
processes and the design of one of the several suites of tests a project
usually needs, the place where the best info is in the many references
dealing with UML.

You have made a good start on those pages, but it needs to be fleshed out.
I do not recommend making either of them longer than 50% more than their
current length.  Rather, I suggest fleshing it out hypertext fashion, by
adding (links to) pages dealing with different issues in more detail than is
possible in an executive summary.

But, overall, well done.

Cheers

Ted

 

Re: [Rd] CRAN policies

2012-03-30 Thread Matthew Dowle
Mark.Bravington at csiro.au writes:

 There must be over 2000 people who have written CRAN packages by now; every
 extra check and non-back-compatible additional requirement runs the risk of
 generating false-negatives and incurring many extra person-hours to fix
 non-problems. Plus someone needs to document and explain the check (adding
 to the rule mountain), plus there is the time spent in discussions like
 this..!

Not sure where you're coming from on that. For example, Prof Ripley has added 
quite a few new NOTEs to QC.R over the last few months. These caught things I 
wasn't aware of in the two packages I maintain and I was more than happy to fix 
them. It improves quality, surely.

There's only one particular NOTE causing an issue: 'no visible binding'. If it 
were made a MEMO, we can move on. All the other NOTEs can (and should) be 
fixed, can't they?

Matthew



Re: [Rd] CRAN policies

2012-03-30 Thread Claudia Beleites
Paul,

 One of the things I have noticed with the R 2.15.0 RC and --as-cran is 
 that I have to bump the version number of the working copy of my 
[snip]
 
 I am curious how other developers approach this.

Regardless of --as-cran I find it very useful to use the date as minor
part of the version number (e.g. hyperSpec 0.98-20120320), which I set
automatically.
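
A minimal sketch of how such a version string could be generated, assuming only base R. The helper name dated_version() is hypothetical; how the result is written into DESCRIPTION is left out.

```r
# Hypothetical helper: append the date, as YYYYMMDD, to a fixed major.minor part.
dated_version <- function(major_minor, date = Sys.Date()) {
  paste0(major_minor, "-", format(date, "%Y%m%d"))
}

v <- dated_version("0.98", as.Date("2012-03-20"))
v  # "0.98-20120320"

# The result is a valid R package version, and later dates sort as newer.
is_newer <- package_version(v) > package_version("0.98-20120101")
```

Because the date component always increases, a rebuilt working copy automatically gets a higher version number than the previous build.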

Claudia





-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399



Re: [Rd] CRAN policies

2012-03-30 Thread William Dunlap
It looks like you define a few functions that use substitute() or sys.call()
or similar functions to look at the unevaluated argument list.  E.g.,

  cq <-
  function( ...) {
  # Saves putting in quotes!
  # E.G.: quoted( first, second, third) is the same as c( 'first', 'second', 'third')
  # wrapping by as.character means cq() returns character(0) not list()
    as.character( sapply( as.list( match.call( expand.dots=TRUE))[-1], as.character))
  }
%such.that% and %SUCH.THAT% do similar things.

Almost all the complaints from check involve calls to a handful of such
functions.  If you could tell codetools:::checkUsage that these functions
did nonstandard evaluation on all or some of their arguments then the
complaints would go away and other checks for real errors like misspellings
would still be done.

Another possible part of the problem is that if checkUsage is checking a
function like
  f <- function(x) paste(x, cq(suffix), sep=".")
it attributes the out-of-scope suffix problem to 'f' and doesn't mention that
the immediate caller is 'cq', so you cannot easily filter output complaints
about cq.  (CRAN would not do such filtering, but a developer might.)
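
To make the point concrete, here is a minimal sketch of how such a quoting function behaves at run time. It uses the cq() definition and the toy wrapper f from this message; nothing in it is specific to mvbutils:

```r
# cq() captures its unevaluated arguments and returns them as strings.
cq <- function(...) {
  as.character(sapply(as.list(match.call(expand.dots = TRUE))[-1], as.character))
}

cq(first, second, third)   # the names are never looked up as variables
cq()                       # character(0), not list()

# 'suffix' is not defined anywhere, yet f() works, because cq() only
# quotes its argument.  A static checker that walks the body of f cannot
# know that and treats 'suffix' as an ordinary global variable.
f <- function(x) paste(x, cq(suffix), sep = ".")
f("a")
```

The code is correct as written; the complaint comes only from the checker not knowing that cq() never evaluates its arguments.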

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
 Behalf
 Of mark.braving...@csiro.au
 Sent: Thursday, March 29, 2012 6:30 PM
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
 I'm concerned this thread is heading the wrong way, towards techno-fixes
 for imaginary problems. R package-building is already encumbered with a
 huge set of complicated rules, and more instructions/rules eg for metadata
 would make things worse not better.
 
 RCMD CHECK on the 'mvbutils' package generates over 300 Notes about "no
 visible binding...", which inevitably I just ignore. They arise because
 RCMD CHECK is too stupid to understand one of my preferred coding idioms
 (I'm not going to explain what-- that's beside the point). And RCMD CHECK
 always will be too stupid to understand everything that a rich language
 like R might quite reasonably cause experienced coders to do.
 
 It should not be CRAN's business how I write my code, or even whether my
 code does what it is supposed to. It might be CRAN's business to try to
 work out whether my code breaks CRAN's policies, eg by causing R to crash
 horribly-- that's presumably what Warnings are for (but see below). And
 maybe there could be circumstances where an automatic check might be
 worried enough to alert the CRANia and require manual explanation and
 emails etc from a developer, but even that seems doomed given the growing
 deluge of packages.
 
 RCMD CHECK currently functions both as a sanitizer for CRAN, and as a
 developer-tool. But the fact that the one program does both things seems
 accidental to me, and I think this dual-use is muddying the discussion.
 There's a big distinction between (i) code-checks that developers
 themselves might or might not find useful-- which should be left to the
 developer, and will vary from person to person-- and (ii) code-checks that
 CRAN enforces for its own peace-of-mind. Maybe it's convenient to have both
 functions in the same place, and it'd be fine to use Notes for one and
 Warnings for the other, but the different purposes should surely be kept
 clear.
 
 Personally, in building over 10 packages (only 2 on CRAN), I haven't found
 RCMD CHECK to be of any use, except for the code-documentation and
 example-running bits. I know other people have different opinions, but
 that's the point: one-size-does-not-fit-all when it comes to coding tools.
 
 And wrto the Warnings themselves: I feel compelled to point out that it's
 logically impossible to fully check whether R code will do bad things. One
 has to wonder at what point adding new checks becomes futile or
 counterproductive. There must be over 2000 people who have written CRAN
 packages by now; every extra check and non-back-compatible additional
 requirement runs the risk of generating false-negatives and incurring many
 extra person-hours to fix non-problems. Plus someone needs to document
 and explain the check (adding to the rule mountain), plus there is the time
 spent in discussions like this..!
 
 Mark
 
 Mark Bravington
 CSIRO CMIS
 Marine Lab
 Hobart
 Australia
 
 From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On Behalf 
 Of
 Hadley Wickham [had...@rice.edu]
 Sent: 30 March 2012 07:42
 To: William Dunlap
 Cc: r-de...@stat.math.ethz.ch; Spencer Graves
 Subject: Re: [Rd] CRAN policies
 
  Most of that stuff is already in codetools, at least when it is checking
  functions with checkUsage().  E.g., arguments of ~ are not checked.  The
  expr argument to with() will not be checked if you add skipWith=TRUE to
  the call to checkUsage.
 
library(codetools

Re: [Rd] CRAN policies

2012-03-30 Thread Kevin Wright
I'll echo Mark's concerns.  R _used_ to be a language for turning ideas
into software quickly.  Now it is more like prototyping ideas in software
quickly, and then spending a substantial amount of time trying to follow
administrative rules to package the code.  Quality has its costs.

Many of the code checks I find quite useful, but the "no visible binding"
one generates lots of nuisance notes for me.  I must have a similar coding
style to Mark.

Kevin


On Thu, Mar 29, 2012 at 8:29 PM, mark.braving...@csiro.au wrote:

 I'm concerned this thread is heading the wrong way, towards techno-fixes
 for imaginary problems. R package-building is already encumbered with a
 huge set of complicated rules, and more instructions/rules eg for metadata
 would make things worse not better.

 RCMD CHECK on the 'mvbutils' package generates over 300 Notes about "no
 visible binding...", which inevitably I just ignore. They arise because
 RCMD CHECK is too stupid to understand one of my preferred coding idioms
 (I'm not going to explain what-- that's beside the point). And RCMD CHECK
 always will be too stupid to understand everything that a rich language
 like R might quite reasonably cause experienced coders to do.

 It should not be CRAN's business how I write my code, or even whether my
 code does what it is supposed to. It might be CRAN's business to try to
 work out whether my code breaks CRAN's policies, eg by causing R to crash
 horribly-- that's presumably what Warnings are for (but see below). And
 maybe there could be circumstances where an automatic check might be
 worried enough to alert the CRANia and require manual explanation and
 emails etc from a developer, but even that seems doomed given the growing
 deluge of packages.

 RCMD CHECK currently functions both as a sanitizer for CRAN, and as a
 developer-tool. But the fact that the one program does both things seems
 accidental to me, and I think this dual-use is muddying the discussion.
 There's a big distinction between (i) code-checks that developers
 themselves might or might not find useful-- which should be left to the
 developer, and will vary from person to person-- and (ii) code-checks that
 CRAN enforces for its own peace-of-mind. Maybe it's convenient to have both
 functions in the same place, and it'd be fine to use Notes for one and
 Warnings for the other, but the different purposes should surely be kept
 clear.

 Personally, in building over 10 packages (only 2 on CRAN), I haven't found
 RCMD CHECK to be of any use, except for the code-documentation and
 example-running bits. I know other people have different opinions, but
 that's the point: one-size-does-not-fit-all when it comes to coding tools.

 And wrto the Warnings themselves: I feel compelled to point out that it's
 logically impossible to fully check whether R code will do bad things. One
 has to wonder at what point adding new checks becomes futile or
 counterproductive. There must be over 2000 people who have written CRAN
 packages by now; every extra check and non-back-compatible additional
 requirement runs the risk of generating false-negatives and incurring many
 extra person-hours to fix non-problems. Plus someone needs to document
 and explain the check (adding to the rule mountain), plus there is the time
 spent in discussions like this..!

 Mark

 Mark Bravington
 CSIRO CMIS
 Marine Lab
 Hobart
 Australia
 
 From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On
 Behalf Of Hadley Wickham [had...@rice.edu]
 Sent: 30 March 2012 07:42
 To: William Dunlap
 Cc: r-de...@stat.math.ethz.ch; Spencer Graves
 Subject: Re: [Rd] CRAN policies

  Most of that stuff is already in codetools, at least when it is checking
  functions with checkUsage().  E.g., arguments of ~ are not checked.  The
  expr argument to with() will not be checked if you add skipWith=TRUE to
  the call to checkUsage.
 
    library(codetools)

    checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}))
      <anonymous>: no visible binding for global variable 'Num' (:1)
      <anonymous>: no visible binding for global variable 'Den' (:1)

    checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)

    checkUsage(function(dataFrame) with(DataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)
      <anonymous>: no visible binding for global variable 'DataFrame'
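
The same comparison can be run non-interactively by collecting the messages through the report argument of checkUsage(). This is only a sketch, assuming the codetools package is installed; the helper name collect_notes is hypothetical:

```r
library(codetools)

# Hypothetical helper: return checkUsage() messages as a character vector
# instead of printing them.
collect_notes <- function(fun, ...) {
  notes <- character(0)
  checkUsage(fun, name = "f",
             report = function(msg) notes <<- c(notes, msg), ...)
  notes
}

f <- function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred})

default_notes <- collect_notes(f)                   # Num and Den are flagged
skipped_notes <- collect_notes(f, skipWith = TRUE)  # code inside with() is skipped

print(default_notes)
print(length(skipped_notes))
```

Collected this way, the messages can be filtered with ordinary string tools, which is the kind of developer-side filtering discussed in this thread.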
 
  The only part that I don't see is the mechanism to add code-walker
  functions to the environment in codetools that has the standard list of
  them for functions with nonstandard evaluation:
    objects(codetools:::collectUsageHandlers, all=TRUE)
     [1] "$"         "$<-"       ".Internal"
     [4] "::"        ":::"       "@"
     [7] "@<-"       "{"         "~"
    [10] "<-"        "<<-"       "="
    [13] "assign"    "binomial"  "bquote"
    [16] "data"      "detach"    "expression"
    [19] "for"       "function"  "Gamma"
    [22] "gaussian"

Re: [Rd] CRAN policies

2012-03-30 Thread Joshua Wiley
On Fri, Mar 30, 2012 at 11:41 AM, Kevin Wright kw.s...@gmail.com wrote:
 I'll echo Mark's concerns.  R _used_ to be a language for turning ideas
 into software quickly.  Now it is more like prototyping ideas in software
 quickly, and then spend a substantial amount of time trying to follow
 administrative rules to package the code.

..if you want to submit to CRAN.  There are practically none if you
host on your own website.  Of course developers are free to do
whatever they want and R core does not get to tell them what/how to do
it.  R core does get a say when you ask them to host your source and
build your package binaries.

 Quality has its costs.

So does using CRAN.  If it is not the best solution for your problem,
use something else.  Hadley uses GitHub for development of ggplot2, and
with the devtools package, it is relatively easy for users to install
ggplot2 from source.  Something like that might be appropriate for
code/packages where you just want to 'turn ideas into software
quickly'.  There is an extra step required for users to use it, but
that makes sense because it weeds out inept users from using code with
less quality control.


 Many of the code checks I find quite useful, but the no visible binding
 one generates lots of nuisance notes for me.  I must have a similar coding
 style to Mark.

 Kevin



Re: [Rd] CRAN policies

2012-03-29 Thread Gabor Grothendieck
On Wed, Mar 28, 2012 at 11:52 PM, Thomas Lumley tlum...@uw.edu wrote:
 On Thu, Mar 29, 2012 at 3:30 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
 2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 27.03.2012 20:33, Jeffrey Ryan wrote:

 Thanks Uwe for the clarification on what goes and what stays.

 Still fuzzy on the notion of significant though.  Do you have an example
 or two for the list?



 We have to look at those notes again and again in order to find if something
 important is noted, hence please always try to avoid all notes unless the
 effect is really intended!


 Consider the Note "No visible binding for global variable".
 We cannot know if your code intends to use such a global variable (which is
 undesirable in most cases), hence would let it pass if it seems to be
 sensible.

 Another Note such as "empty section" or "partial argument match" can quickly
 be fixed, hence just do it and don't waste our time.

 Best,
 Uwe Ligges

 What is the point of notes vs warnings if you have to get rid of both
 of them?  Furthermore, if there are notes that you don't have to get
 rid of, it's not fair that package developers should have to waste their
 time on things that are actually acceptable.  Finally, it makes the
 whole system arbitrary since packages can be rejected based on
 undefined rules.


 The notes are precisely the things for which clear rules can't be
 written.  They are reported by CMD check because they are usually
 signs of coding errors, but are not warnings because their use is
 sometimes justified.

 The 'No visible binding for global variable' is a good example.  This
 found some bugs in my 'survey' package, which I removed. There is
 still one note of this type, which arises when I have to handle two
 different versions of the hexbin package with different internal
 structures.  The note is a false positive because the use is guarded
 by an if(), but  CMD check can't tell this.   So, it's a good idea to
 remove all Notes that can be removed without introducing other code
 problems, which is nearly all of them, but occasionally there may be a
 good reason for code that produces a Note.

 But if you want a simple, unambiguous, mechanical rule for *your*
 packages, just eliminate all Notes.

I think it would be more objective and also easiest for everyone if
notes were accepted.

It might be that over time some notes could be split into multiple
cases some of which are warnings and others continue to be notes.

That way package developers don't have to waste their time on getting
rid of notes which don't matter and the CRAN maintainers can turn the
task of reviewing notes over to the computer.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] CRAN policies

2012-03-29 Thread Brian G. Peterson
On Thu, 2012-03-29 at 16:52 +1300, Thomas Lumley wrote:
 The 'No visible binding for global variable' is a good example.  This
 found some bugs in my 'survey' package, which I removed. There is
 still one note of this type, which arises when I have to handle two
 different versions of the hexbin package with different internal
 structures.  The note is a false positive because the use is guarded
 by an if(), but  CMD check can't tell this.   So, it's a good idea to
 remove all Notes that can be removed without introducing other code
 problems, which is nearly all of them, but occasionally there may be a
 good reason for code that produces a Note.
 
'occasionally' seems like an understatement.

Here's an example:

data(cars)
lm(speed ~ dist,cars) #would produce global variables NOTE
lm("speed ~ dist",cars) # would not produce the NOTE

While the change required to avoid the CRAN NOTE is small, I can't think
of a single example or text on using formulas that recommends quoting
the formula as a best practice.  I'm not sure how users or package
authors are supposed to know that they should use a (non standard) way
of specifying the formula to avoid wasting their time, and the CRAN
volunteers time.  I'm certain that there are many other examples, but
this one was easy to demonstrate.

Regards,

   - Brian

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock



Re: [Rd] CRAN policies

2012-03-29 Thread peter dalgaard

On Mar 29, 2012, at 14:58 , Brian G. Peterson wrote:

 On Thu, 2012-03-29 at 16:52 +1300, Thomas Lumley wrote:
 The 'No visible binding for global variable' is a good example.  This
 found some bugs in my 'survey' package, which I removed. There is
 still one note of this type, which arises when I have to handle two
 different versions of the hexbin package with different internal
 structures.  The note is a false positive because the use is guarded
 by an if(), but  CMD check can't tell this.   So, it's a good idea to
 remove all Notes that can be removed without introducing other code
 problems, which is nearly all of them, but occasionally there may be a
 good reason for code that produces a Note.
 
 'occasionally' seems like an understatement.
 
 Here's an example:
 
 data(cars)
 lm(speed ~ dist,cars) #would produce global variables NOTE
 lm("speed ~ dist",cars) # would not produce the NOTE

Context, please. Where does this happen? (and why do you need data(cars)?)

I find it hard to believe that quoting the formula should be the solution to 
this issue. There must be tons of examples to the contrary.


 
 While the change required to avoid the CRAN NOTE is small, I can't think
 of a single example or text on using formulas that recommends quoting
 the formula as a best practice.  I'm not sure how users or package
 authors are supposed to know that they should use a (non standard) way
 of specifying the formula to avoid wasting their time, and the CRAN
 volunteers time.  I'm certain that there are many other examples, but
 this one was easy to demonstrate.
 
 Regards,
 
   - Brian
 
 -- 
 Brian G. Peterson
 http://braverock.com/brian/
 Ph: 773-459-4973
 IM: bgpbraverock
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] CRAN policies

2012-03-29 Thread Terry Therneau

On 03/29/2012 05:00 AM, r-devel-requ...@r-project.org wrote:

The 'No visible binding for global variable' is a good example.  This
found some bugs in my 'survey' package, which I removed. There is
still one note of this type, which arises when I have to handle two
different versions of the hexbin package with different internal
structures.  The note is a false positive because the use is guarded
by an if(), but  CMD check can't tell this.   So, it's a good idea to
remove all Notes that can be removed without introducing other code
problems, which is nearly all of them, but occasionally there may be a
good reason for code that produces a Note.

The survival package has a similar special case: the routines for 
expected population survival are set up to accept multiple types of date 
format so have lines like

if (class(x) == 'chron') { y <- as.numeric(x - chron("01/01/1960"))}

This leaves me with two extraneous "no visible binding" messages.  There 
used to be half a dozen but I've tried to remove as many as possible, 
for all the good reasons already articulated by the maintainers.


It still remains that 99/100 of the "no visible binding" messages I've 
seen over the years were misspelled variable names, and the message is a 
very welcome check.


Terry Therneau



Re: [Rd] CRAN policies

2012-03-29 Thread Dirk Eddelbuettel

On 29 March 2012 at 07:58, Brian G. Peterson wrote:
| On Thu, 2012-03-29 at 16:52 +1300, Thomas Lumley wrote:
|  The 'No visible binding for global variable' is a good example.  This
|  found some bugs in my 'survey' package, which I removed. There is
|  still one note of this type, which arises when I have to handle two
|  different versions of the hexbin package with different internal
|  structures.  The note is a false positive because the use is guarded
|  by an if(), but  CMD check can't tell this.   So, it's a good idea to
|  remove all Notes that can be removed without introducing other code
|  problems, which is nearly all of them, but occasionally there may be a
|  good reason for code that produces a Note.
|  
| 'occasionally' seems like an understatement.
| 
| Here's an example:
| 
| data(cars)
| lm(speed ~ dist,cars) #would produce global variables NOTE
| lm("speed ~ dist",cars) # would not produce the NOTE
| 
| While the change required to avoid the CRAN NOTE is small, I can't think
| of a single example or text on using formulas that recommends quoting
| the formula as a best practice.  I'm not sure how users or package
| authors are supposed to know that they should use a (non standard) way
| of specifying the formula to avoid wasting their time, and the CRAN
| volunteers time.  I'm certain that there are many other examples, but
| this one was easy to demonstrate.

And it's close to my personal favourite of 

with( cars,  ... some expression involving dist and / or speed ... )

which gives the same warning about dist and speed being unknown globals.
Punishment for good coding style -- gotta love it.


Now, we all want high-quality packages. 

We all strive to have as few false positives as possible. 

And we all understand that writing a parser is freaking hard.

One fudge-y way of helping with this may be via an overrides file. 

This is what Debian does to suppress known / tolerated violations of what the
'lintian' package checker picks up on.  For the R package, I have a fair
number of these: the file for the r-base-core binary is currently 83 lines
long and this ends on

  r-base-core: executable-not-elf-or-script usr/lib/R/bin/Rdiff
  r-base-core: image-file-in-usr-lib usr/lib/R/library/graphics/help/figures/mai.png
  r-base-core: image-file-in-usr-lib usr/lib/R/library/graphics/help/figures/oma.png
  r-base-core: image-file-in-usr-lib usr/lib/R/library/graphics/help/figures/pch.png
  r-base-core: executable-not-elf-or-script usr/lib/R/bin/Rd2pdf

two warnings on files with 755 modes in a non-PATH location (fine, that's how
R works) and idem with image files below /usr/lib (when the FHS probably
prefers them below /usr/share/).

You pipe the output of a lintian run into 'lintian-info' and you get longer
one or two paragraph descriptions with further pointers on the violations.

Does this sound like something worthwhile to add to the R CMD check system?

Should we consider allowing overrides to make known good exceptions go away?

Dirk

-- 
R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
See agenda, registration details and more at http://www.RinFinance.com



Re: [Rd] CRAN policies

2012-03-29 Thread Spencer Graves

On 3/29/2012 7:07 AM, Dirk Eddelbuettel wrote:

On 29 March 2012 at 07:58, Brian G. Peterson wrote:
| On Thu, 2012-03-29 at 16:52 +1300, Thomas Lumley wrote:
|  The 'No visible binding for global variable' is a good example.  This
|  found some bugs in my 'survey' package, which I removed. There is
|  still one note of this type, which arises when I have to handle two
|  different versions of the hexbin package with different internal
|  structures.  The note is a false positive because the use is guarded
|  by an if(), but  CMD check can't tell this.   So, it's a good idea to
|  remove all Notes that can be removed without introducing other code
|  problems, which is nearly all of them, but occasionally there may be a
|  good reason for code that produces a Note.
|
| 'occasionally' seems like an understatement.
|
| Here's an example:
|
| data(cars)
| lm(speed ~ dist,cars) #would produce global variables NOTE
| lm("speed ~ dist",cars) # would not produce the NOTE



Another example using library(ggplot2):


=qplot(time., value, data=X, geom='line',
facets=facets, color=variable, xlim=xlim, ylim=ylim,
xlab='days', ylab='displacement (inches)', ...),


  "value" and "variable" are columns of X.  If I knew how to list 
this in an overrides file, I would do so.  My experience is similar to 
what others mentioned:  99 percent of the "No visible binding" messages 
I've seen are my coding errors.  This one is not.  I don't recall for 
sure, but I think I tried putting "value" and "variable" in 
quotes, and it didn't work.



  The function that includes this call to qplot actually includes 
the definition of a global variable "time.", which is NOT used, because 
X has a column named "time.".  The global variable "time." is a 
character string, while X$time. is class POSIXct.



  I mention this, because this discussion suddenly told me how to 
get rid of this NOTE:  Precede this call to qplot with something like 
the following:



  value <- variable <- 'NOTE:  Define these variables to override the NOTE impulse in R CMD check'



  I haven't tried this with qplot, but it ignores the global 
variable Time. and uses the Time. column of X, so it should work.  
I just tried something similar with lm, and it ignored a global 
variable in favor of a column of X.  This is a silly kludge, but it's 
simple and does not require a modification to R CMD check.
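
A minimal sketch of the lm() check described above, using the built-in cars data rather than qplot. The placeholder string is only an illustration:

```r
# Define a global 'dist' that would break the fit if it were actually used.
dist <- "NOTE: placeholder defined only to silence 'no visible binding'"

# The formula variables are looked up in 'data' first, so the column wins.
fit_with_global <- lm(speed ~ dist, data = cars)

# Same fit with the columns pulled out explicitly, for comparison.
fit_reference <- lm(cars$speed ~ cars$dist)

all.equal(unname(coef(fit_with_global)), unname(coef(fit_reference)))
```

R 2.15.1 and later also provide utils::globalVariables() for declaring such names to the check, which avoids the dummy assignment.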



  Spencer





Re: [Rd] CRAN policies

2012-03-29 Thread William Dunlap
 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
 Behalf
 Of Terry Therneau
 Sent: Thursday, March 29, 2012 7:02 AM
 To: r-devel@r-project.org
 Subject: Re: [Rd] CRAN policies
 
 On 03/29/2012 05:00 AM, r-devel-requ...@r-project.org wrote:
  The 'No visible binding for global variable' is a good example.  This
  found some bugs in my 'survey' package, which I removed. There is
  still one note of this type, which arises when I have to handle two
  different versions of the hexbin package with different internal
  structures.  The note is a false positive because the use is guarded
  by an if(), but  CMD check can't tell this.   So, it's a good idea to
  remove all Notes that can be removed without introducing other code
  problems, which is nearly all of them, but occasionally there may be a
  good reason for code that produces a Note.
 The survival package has a similar special case: the routines for
 expected population survival are set up to accept multiple types of date
 format so have lines like
  if (class(x) == 'chron') { y <- as.numeric(x - chron("01/01/1960"))}
 This leaves me with two extraneous no visible binding messages.

Suppose we defined a function like
  NO_VISIBLE_BINDING <- function(expr) expr
and added an entry to the stuff in codetools so that it
would not check for misspelled object names in calls to
NO_VISIBLE_BINDING.  Then Terry could write that line as
  if (class(x) == "chron") { y <- as.numeric(x - NO_VISIBLE_BINDING(chron)("01/01/1960"))}
and the Notes would disappear.
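
As a sketch of the run-time side of this proposal: the marker is just the identity function, so wrapping a name changes nothing about evaluation. Note that codetools has no handler for NO_VISIBLE_BINDING; that part is hypothetical. as.Date stands in for chron here so the example runs without extra packages:

```r
# Identity function used purely as a marker for a (hypothetical) checker rule.
NO_VISIBLE_BINDING <- function(expr) expr

x <- as.Date("1960-01-11")

# Wrapping the function name leaves the computation unchanged.
y_plain  <- as.numeric(x - as.Date("1960-01-01"))
y_marked <- as.numeric(x - NO_VISIBLE_BINDING(as.Date)("1960-01-01"))

identical(y_plain, y_marked)
```

The cost is purely in readability, since every flagged name has to be wrapped by hand.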

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

 There
 used to be half a dozen but I've tried to remove as many as possible,
 for all the good reasons already articulated by the maintainers.
 
 It still remains that 99/100 of the no visible binding messages I've
 seen over the years were misspelled variable names, and the message is a
 very welcome check.
 
 Terry Therneau
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] CRAN policies

2012-03-29 Thread Matthew Dowle
William Dunlap wdunlap at tibco.com writes:

  -Original Message-
  The survival package has a similar special case: the routines for
  expected population survival are set up to accept multiple types of date
  format so have lines like
   if (class(x) == 'chron') { y <- as.numeric(x - chron("01/01/1960")) }
  This leaves me with two extraneous "no visible binding" messages.
 
 Suppose we defined a function like
   NO_VISIBLE_BINDING <- function(expr) expr
 and added an entry to the stuff in codetools so that it
 would not check for misspelled object names in calls to
 NO_VISIBLE_BINDING.  Then Terry could write that line as
  if (class(x) == "chron") { y <- as.numeric(x - NO_VISIBLE_BINDING(chron)("01/01/1960")) }
 and the Notes would disappear.
 

That's OK for package code, but what about test suites?  Say there was a test 
on the result of with(DF, a+b). You wouldn't want to change the test to 
with(DF, NO_VISIBLE_BINDING(a)+NO_VISIBLE_BINDING(b)), not just because that's 
long and onerous, but because that's *changing* the test, i.e. introducing a 
difference between what's tested and what user code will do.
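
To make the objection concrete, here is a small sketch (the data frame `DF` and its columns are made up for illustration). The wrapped call returns the same value, but the expression being tested is no longer the one a user would type.

```r
# Identity wrapper from earlier in the thread.
NO_VISIBLE_BINDING <- function(expr) expr

# Made-up data for illustration.
DF <- data.frame(a = 1:3, b = 4:6)

plain   <- with(DF, a + b)
wrapped <- with(DF, NO_VISIBLE_BINDING(a) + NO_VISIBLE_BINDING(b))

# Same result ...
stopifnot(identical(plain, wrapped))

# ... but a different expression, so the test no longer exercises user code.
plain_call   <- quote(with(DF, a + b))
wrapped_call <- quote(with(DF, NO_VISIBLE_BINDING(a) + NO_VISIBLE_BINDING(b)))
stopifnot(!identical(plain_call, wrapped_call))
```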

As others suggested, how about a new category: MEMO. The "no visible binding" 
NOTE would be downgraded to MEMO. CRAN maintainers could then ignore MEMOs more 
easily.

What I really like about NOTES is that when new checks are added to R then as a 
package maintainer you know you don't have to fix them straight away. If a new 
WARNING shows up on r-devel daily checks, however, then you've got some warning 
about the WARNING that you need to fix more urgently and may even accelerate a 
release. So it's not just about checks when submitting a package, but what 
happens afterwards as R itself (and packages in Depends) move on. In other 
words, you know you need to fix new NOTES but not as urgently as new WARNINGS.

Matthew



Re: [Rd] CRAN policies

2012-03-29 Thread William Dunlap




 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
 Behalf
 Of Matthew Dowle
 Sent: Thursday, March 29, 2012 10:41 AM
 To: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] CRAN policies
 
 William Dunlap wdunlap at tibco.com writes:
 
   -Original Message-
   The survival package has a similar special case: the routines for
   expected population survival are set up to accept multiple types of date
   format so have lines like
    if (class(x) == 'chron') { y <- as.numeric(x - chron("01/01/1960")) }
   This leaves me with two extraneous "no visible binding" messages.
 
  Suppose we defined a function like
    NO_VISIBLE_BINDING <- function(expr) expr
  and added an entry to the stuff in codetools so that it
  would not check for misspelled object names in calls to
  NO_VISIBLE_BINDING.  Then Terry could write that line as
   if (class(x) == "chron") { y <- as.numeric(x - NO_VISIBLE_BINDING(chron)("01/01/1960")) }
  and the Notes would disappear.
 
 
 That's OK for package code, but what about test suites?  Say there was a test
 on the result of with(DF, a+b). You wouldn't want to change the test to
 with(DF, NO_VISIBLE_BINDING(a)+NO_VISIBLE_BINDING(b)), not just because that's
 long and onerous, but because that's *changing* the test, i.e. introducing a
 difference between what's tested and what user code will do.

I don't know if test suites need to be checked for "no visible binding" Notes -
if there is a real problem the test ought to fail.

codetools should be able to do special checks for known functions that
do not follow the standard evaluation rules.  E.g., do not check any
arguments of `~`, do not check the 'expr' argument of with(), do not check
the subset or weights arguments of lm().

If a package writer introduces a new function with nonstandard evaluation,
perhaps the package could include some information about the matter
in a file that codetools could source before running its checks.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
 


Re: [Rd] CRAN policies

2012-03-29 Thread Spencer Graves

On 3/29/2012 11:29 AM, William Dunlap wrote:


I don't know if test suites need to be checked for "no visible binding" Notes -
if there is a real problem the test ought to fail.

codetools should be able to do special checks for known functions that
do not follow the standard evaluation rules.  E.g., do not check any
arguments of `~`, do not check the 'expr' argument of with(), do not check
the subset or weights arguments of lm().

If a package writer introduces a new function with nonstandard evaluation,
perhaps the package could include some information about the matter
in a file that codetools could source before running its checks.



  This gets my vote -- but I don't have the bandwidth nor authority 
to effect the change ;-)  Spencer




Re: [Rd] CRAN policies

2012-03-29 Thread William Dunlap
  codetools should be able to do special checks for known functions that
  do not follow the standard evaluation rules.  E.g., do not check any
  arguments of `~`, do not check the 'expr' argument of with(), do not check
  the subset or weights arguments of lm().
 
  If a package writer introduces a new function with nonstandard evaluation,
  perhaps the package could include some information about the matter
  in a file that codetools could source before running its checks.
 
 
This gets my vote -- but I don't have the bandwidth nor authority
 to effect the change ;-)  Spencer

Most of that stuff is already in codetools, at least when it is checking
functions with checkUsage().  E.g., arguments of ~ are not checked.  The expr
argument to with() will not be checked if you add skipWith=TRUE to the call
to checkUsage().

   > library(codetools)

   > checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}))
   <anonymous>: no visible binding for global variable 'Num' (:1)
   <anonymous>: no visible binding for global variable 'Den' (:1)

   > checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)

   > checkUsage(function(dataFrame) with(DataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)
   <anonymous>: no visible binding for global variable 'DataFrame'

The only part that I don't see is the mechanism to add code-walker functions
to the environment in codetools that has the standard list of them for
functions with nonstandard evaluation:
   > objects(codetools:::collectUsageHandlers, all=TRUE)
    [1] "$"             "$<-"           ".Internal"
    [4] "::"            ":::"           "@"
    [7] "@<-"           "{"             "~"
   [10] "<-"            "<<-"           "="
   [13] "assign"        "binomial"      "bquote"
   [16] "data"          "detach"        "expression"
   [19] "for"           "function"      "Gamma"
   [22] "gaussian"      "if"            "library"
   [25] "local"         "poisson"       "quasi"
   [28] "quasibinomial" "quasipoisson"  "quote"
   [31] "Quote"         "require"       "substitute"
   [34] "with"
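
For developers who want to filter these messages themselves, the checkUsage() calls shown above can be wrapped so that the messages are collected rather than printed. This is only a sketch: `collect_notes` is a hypothetical helper, and it relies on the `report` and `skipWith` arguments of checkUsage().

```r
library(codetools)

# Hypothetical helper: gather checkUsage() messages in a character vector
# by passing a custom 'report' function instead of the default cat().
collect_notes <- function(fun, ...) {
  notes <- character()
  checkUsage(fun, report = function(msg) notes <<- c(notes, msg), ...)
  notes
}

f <- function(dataFrame) with(dataFrame, Num / Den)

default_notes <- collect_notes(f)                   # 'expr' of with() is checked
skipped_notes <- collect_notes(f, skipWith = TRUE)  # 'expr' of with() is skipped

# Keep only the "no visible binding" complaints for separate review.
binding_notes <- grep("no visible binding", default_notes, value = TRUE)
print(binding_notes)
print(length(skipped_notes))
```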

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [Rd] CRAN policies

2012-03-29 Thread Hadley Wickham
 Most of that stuff is already in codetools, at least when it is checking
 functions with checkUsage().  E.g., arguments of ~ are not checked.  The expr
 argument to with() will not be checked if you add skipWith=TRUE to the call
 to checkUsage().

   > library(codetools)

   > checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}))
   <anonymous>: no visible binding for global variable 'Num' (:1)
   <anonymous>: no visible binding for global variable 'Den' (:1)

   > checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)

   > checkUsage(function(dataFrame) with(DataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)
   <anonymous>: no visible binding for global variable 'DataFrame'

 The only part that I don't see is the mechanism to add code-walker functions
 to the environment in codetools that has the standard list of them for
 functions with nonstandard evaluation:
   > objects(codetools:::collectUsageHandlers, all=TRUE)
    [1] "$"             "$<-"           ".Internal"
    [4] "::"            ":::"           "@"
    [7] "@<-"           "{"             "~"
   [10] "<-"            "<<-"           "="
   [13] "assign"        "binomial"      "bquote"
   [16] "data"          "detach"        "expression"
   [19] "for"           "function"      "Gamma"
   [22] "gaussian"      "if"            "library"
   [25] "local"         "poisson"       "quasi"
   [28] "quasibinomial" "quasipoisson"  "quote"
   [31] "Quote"         "require"       "substitute"
   [34] "with"

It seems like we really need a standard way to add metadata to functions:

attr(with, "special_args") <- "expr"
attr(lm, "special_args") <- c("formula", "weights", "subset")

This would be useful because it could automatically contribute to the
documentation.

Similarly,

attr(my.new.method, "s3method") <- c("my.new", "method")

could be useful.
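
A rough sketch of how such metadata might be attached and read back. The attribute name "special_args" is taken from the suggestion above; nothing in R or codetools recognises it today, and `my_with` and `special_args()` are hypothetical names used only for illustration.

```r
# Hypothetical function that evaluates its 'expr' argument nonstandardly.
my_with <- function(data, expr) eval(substitute(expr), data, parent.frame())

# Attach the proposed metadata; "special_args" is the name suggested above
# and is not something R or codetools currently looks for.
attr(my_with, "special_args") <- "expr"

# A checker (or documentation tool) could look the attribute up before
# deciding which arguments to leave unchecked.
special_args <- function(fun) {
  args <- attr(fun, "special_args")
  if (is.null(args)) character() else args
}

stopifnot(identical(special_args(my_with), "expr"))
stopifnot(identical(special_args(sum), character()))

# The attribute does not interfere with calling the function.
stopifnot(my_with(list(a = 1, b = 2), a + b) == 3)
```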

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] CRAN policies

2012-03-29 Thread Mark.Bravington
I'm concerned this thread is heading the wrong way, towards techno-fixes for 
imaginary problems. R package-building is already encumbered with a huge set of 
complicated rules, and more instructions/rules eg for metadata would make 
things worse not better.

RCMD CHECK on the 'mvbutils' package generates over 300 Notes about no visible 
binding..., which inevitably I just ignore. They arise because RCMD CHECK is 
too stupid to understand one of my preferred coding idioms (I'm not going to 
explain what-- that's beside the point). And RCMD CHECK always will be too 
stupid to understand everything that a rich language like R might quite 
reasonably cause experienced coders to do.

It should not be CRAN's business how I write my code, or even whether my code 
does what it is supposed to. It might be CRAN's business to try to work out 
whether my code breaks CRAN's policies, eg by causing R to crash horribly-- 
that's presumably what Warnings are for (but see below). And maybe there could 
be circumstances where an automatic check might be worried enough to alert 
the CRANia and require manual explanation and emails etc from a developer, but 
even that seems doomed given the growing deluge of packages.

RCMD CHECK currently functions both as a sanitizer for CRAN, and as a 
developer-tool. But the fact that the one program does both things seems 
accidental to me, and I think this dual-use is muddying the discussion. There's 
a big distinction between (i) code-checks that developers themselves might or 
might not find useful-- which should be left to the developer, and will vary 
from person to person-- and (ii) code-checks that CRAN enforces for its own 
peace-of-mind. Maybe it's convenient to have both functions in the same place, 
and it'd be fine to use Notes for one and Warnings for the other, but the 
different purposes should surely be kept clear. 

Personally, in building over 10 packages (only 2 on CRAN), I haven't found RCMD 
CHECK to be of any use, except for the code-documentation and example-running 
bits. I know other people have different opinions, but that's the point: 
one-size-does-not-fit-all when it comes to coding tools.

And w.r.t. the Warnings themselves: I feel compelled to point out that it's 
logically impossible to fully check whether R code will do bad things. One has 
to wonder at what point adding new checks becomes futile or counterproductive. 
There must be over 2000 people who have written CRAN packages by now; every 
extra check and non-back-compatible additional requirement runs the risk of 
generating false positives and incurring many extra person-hours to fix 
non-problems. Plus someone needs to document and explain the check (adding to 
the rule mountain), plus there is the time spent in discussions like this..!

Mark

Mark Bravington
CSIRO CMIS
Marine Lab
Hobart
Australia


Re: [Rd] CRAN policies

2012-03-29 Thread Paul Gilbert


On 12-03-29 09:29 PM, mark.braving...@csiro.au wrote:
 I'm concerned this thread is heading the wrong way, towards
 techno-fixes for imaginary problems. R package-building is already
 encumbered with a huge set of complicated rules, and more
 instructions/rules eg for metadata would make things worse not better.

 RCMD CHECK on the 'mvbutils' package generates over 300 Notes about
 no visible binding..., which inevitably I just ignore. They arise
 because RCMD CHECK is too stupid to understand one of my preferred
 coding idioms (I'm not going to explain what-- that's beside the
 point).

Actually, I think that is the point. If your code is generating that 
many notes then I think you should explain your idiom, so the checks can 
be made to accommodate it if it really is good. Otherwise, I'd be 
worried about the quality of your code.


 And RCMD CHECK always will be too stupid to understand everything
 that a rich language like R might quite reasonably cause experienced
 coders to do.

Possibly the interpreter is too stupid to understand it too?

 It should not be CRAN's business how I write my code, or even whether
 my code does what it is supposed to. It might be CRAN's business to
 try to work out whether my code breaks CRAN's policies, eg by causing 
 R to crash horribly-- that's presumably what Warnings are for (but

 see below). And maybe there could be circumstances where an automatic
 check might be worried enough to alert the CRANia and require manual
 explanation and emails etc from a developer, but even that seems
 doomed given the growing deluge of packages.

 RCMD CHECK currently functions both as a sanitizer for CRAN, and as
 a developer-tool. But the fact that the one program does both things
 seems accidental to me, and I think this dual-use is muddying the
 discussion. There's a big distinction between (i) code-checks that
 developers themselves might or might not find useful-- which should
 be left to the developer, and will vary from person to person--

I think this is a case of two heads are better than one. I did lots of
checks before the CRAN checks existed, but the CRAN checks still found 
bugs in code that I considered very mature, including bugs in code that has 
been running without noticeable problems for over 15 years. Despite all 
the noise today, most of us are only talking about a small inconvenience 
around the intended meaning of "note", not about whether quality control 
is a bad thing. I've found the errors and warnings are always valid, 
even though I do not always like having to fix the bugs, and the notes 
are most often valid too. But there are a few false positives, so the 
checks that give notes are not yet reliable enough to give warnings or 
errors. But they should be sometime, so one should usually consider 
fixing the package code.


   and (ii) code-checks that CRAN enforces for its own peace-of-mind.

I think of this as being for the peace of mind of your package users.

 Maybe it's convenient to have both functions in the same place, and
 it'd be fine to use Notes for one and Warnings for the other, but the
 different purposes should surely be kept clear.

 Personally, in building over 10 packages (only 2 on CRAN), I haven't
 found RCMD CHECK to be of any use, except for the code-documentation
 and example-running bits. I know other people have different
 opinions, but that's the point: one-size-does-not-fit-all when it
 comes to coding tools.

 And wrto the Warnings themselves: I feel compelled to point out that
 it's logically impossible to fully check whether R code will do bad
 things. One has to wonder at what point adding new checks becomes
 futile or counterproductive. There must be over 2000 people who have
 written CRAN packages by now; every extra check and non-back-
 compatible additional requirement runs the risk of generating false-
 negatives and incurring many extra person-hours to fix
 non-problems.
 Plus someone needs to document and explain the check (adding to the
 rule mountain), plus there is the time spent in discussions like
 this..!

Bugs in your packages will require users to waste a lot of time too, and 
possibly reach faulty results with much more serious consequences. Just 
because perfection may never be attained, this does not mean that 
progress should not be attempted, in small steps. Compared to Statlib, 
which basically followed your recommended approach, CRAN is a vast 
improvement.


Paul


Re: [Rd] CRAN policies

2012-03-29 Thread Spencer Graves


Re: [Rd] CRAN policies

2012-03-28 Thread jing hua zhao

 From: x...@yihui.name
 Date: Tue, 27 Mar 2012 16:40:04 -0500
 To: r-devel@r-project.org
 Subject: Re: [Rd] CRAN policies
 
 I have been wondering if it is possible to automate the checking
 process to reduce human efforts, e.g. automatically check the packages
 submitted to FTP, and send the package maintainer an email in case of
 warnings or errors (otherwise just move it to CRAN); package
 maintainers can appeal for a manual check by CRAN maintainers in case
 of false positives. As a package author, I really hate to bother CRAN
 maintainers each time I upload a new version and it passes R CMD check
 successfully, in which case I should have received an automatic email
 instead of Kurt's hand-written "thanks, on CRAN now". Frankly
 speaking, it makes me feel guilty sometimes to update my packages,
 thinking of the other 3700 packages on CRAN and how much time you CRAN
 maintainers are spending on checking the packages.
 


Indeed it is a good summary of how I felt for so long and in particular my 
recent experience, which involved Kurt, Brian,  and Uwe.



I think win-builder certainly helps, but is it feasible for a Linux 
counterpart to have the final say?



 I do not know how many package authors actually read this mailing
 list, so these policies may not really reach some authors at all.
 
Certainly more colleagues read the list than  have been revealed by the 
postings.

Kind regards,





Jing Hua



 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Phone: 515-294-2465 Web: http://yihui.name
 Department of Statistics, Iowa State University
 2215 Snedecor Hall, Ames, IA
 


Re: [Rd] CRAN policies

2012-03-28 Thread Uwe Ligges



On 28.03.2012 00:07, Hadley Wickham wrote:

On Tue, Mar 27, 2012 at 6:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk  wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.  In
particular, please


Thanks for the pointer - I did not know that this page existed. In
general, is there some easy way to track changes to this page and the
R extension manual over time?  It is difficult to keep track of the
best practices.

I'd also like to get clarification on "Packages should not write in
the users' home filespace, nor anywhere else on the file system apart
from the R session's temporary directory (or during installation in
the location pointed to by TMPDIR: and such usage should be cleaned
up)." What is the recommended practice for packages to maintain state
across instances?  Operating systems have standards for where
applications can store settings (e.g. as described in
http://pypi.python.org/pypi/appdirs/1.2.0).  Is it acceptable for
packages to follow these conventions?



The policy is meant to prevent packages from overwriting user data or 
generating loads of temporary files from examples and polluting, e.g., 
the working directory.
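
As an illustration of what the policy asks of examples and tests, here is a minimal sketch that writes scratch output only under the R session's temporary directory and removes it afterwards. The file name pattern and the data are made up.

```r
# Create a scratch file under the session's temporary directory.
out_file <- tempfile(pattern = "example-", fileext = ".csv")
write.csv(data.frame(x = 1:3), out_file, row.names = FALSE)

# The file lives in tempdir(), not in the working directory or home filespace.
stopifnot(file.exists(out_file))
stopifnot(identical(normalizePath(dirname(out_file)),
                    normalizePath(tempdir())))

# Clean up once the example is done.
unlink(out_file)
stopifnot(!file.exists(out_file))
```

This does not answer the question of where persistent settings should live; it only covers the temporary-file case the policy text describes.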


Uwe Ligges





Hadley





Re: [Rd] CRAN policies

2012-03-28 Thread Uwe Ligges



On 27.03.2012 20:33, Jeffrey Ryan wrote:

Thanks Uwe for the clarification on what goes and what stays.

Still fuzzy on the notion of significant though.  Do you have an example
or two for the list?



We have to look at those notes again and again in order to find if 
something important is noted, hence please always try to avoid all notes 
unless the effect is really intended!



Consider the Note "No visible binding for global variable".
We cannot know if your code intends to use such a global variable (which 
is undesirable in most cases), hence we would let it pass if it seems to be 
sensible.


Another Note such as "empty section" or "partial argument match" can 
quickly be fixed, hence just do it and don't waste our time.
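
For instance, the partial argument match Note refers to calls like the first one below, where an abbreviated argument name happens to match a formal argument. This is only a small sketch; the function `scale_value` and its arguments are invented for illustration.

```r
# Hypothetical function with a formal argument named `verbose`.
scale_value <- function(value, verbose = FALSE) {
  if (verbose) message("scaling ", value)
  value * 2
}

# Partial matching: `verb` is silently expanded to `verbose`.
partial <- scale_value(21, verb = TRUE)

# The fix is simply to spell the argument name out in full.
full <- scale_value(21, verbose = TRUE)

# Both calls return the same value; only the spelling of the call differs.
same_result <- identical(partial, full)
```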


Best,
Uwe Ligges




Jeff

P.S.
I meant to also thank all of CRAN volunteers for the momentous efforts
involved, and it is nice to see some explanation of how we can help, as
well as a peek into what goes on 'behind the curtain' ;-)

On 3/27/12 1:19 PM, Uwe Ligges lig...@statistik.tu-dortmund.de  wrote:




On 27.03.2012 19:10, Jeffrey Ryan wrote:

Is there a distinction as to NOTE vs. WARNING that is documented?  I've
always assumed (wrongly?) that NOTES weren't an issue with publishing on
CRAN, but that they may change to WARNINGS at some point.


We won't kick packages off CRAN for Notes (but we will if Warnings are
not fixed), but we may not accept new submissions with significant Notes.

Best,
Uwe Ligges




Is the process by which this happens documented somewhere?

Jeff

On 3/27/12 11:09 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:


2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:



On 27.03.2012 17:09, Gabor Grothendieck wrote:


On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:


CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line.  Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it.  Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.



Regarding the part about warnings or significant notes in that page,
it's impossible to know which notes are significant and which ones are
not significant except by trial and error.




Right, it needs human inspection to identify false positives. We believe
most package maintainers are able to see if he or she is hit by such a
false positive.


The problem is that a note is generated and the note is correct. It's
not a false positive.  But that does not tell you whether it's
significant or not.  There is no way to know.  One can either try to
remove all notes (which may not be feasible) or just upload it and by
trial and error find out if it's accepted or not.

--
Statistics   Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-28 Thread Uwe Ligges



On 27.03.2012 20:36, Gabor Grothendieck wrote:

2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:



On 27.03.2012 19:10, Jeffrey Ryan wrote:


Is there a distinction as to NOTE vs. WARNING that is documented?  I've
always assumed (wrongly?) that NOTES weren't an issue with publishing on
CRAN, but that they may change to WARNINGS at some point.



We won't kick packages off CRAN for Notes (but we will if Warnings are not
fixed), but we may not accept new submissions with significant Notes.


Yes, I understand that but that does not really address the problem
that one has no idea of whether a Note is significant or not so the
only way to determine its significance is to submit your package and
see if it's accepted or not.



We have to look at those notes again and again in order to find if 
something important is noted, hence please always try to avoid all notes 
unless the effect is really intended!



Consider the Note "No visible binding for global variable".
We cannot know if your code intends to use such a global variable (which 
is undesirable in most cases), hence we would let it pass if it seems to be 
sensible.


Another Note such as "empty section" or "partial argument match" can 
quickly be fixed, hence just do it and don't waste our time.


Best,
Uwe Ligges

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-28 Thread Gabor Grothendieck
2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 27.03.2012 20:33, Jeffrey Ryan wrote:

 Thanks Uwe for the clarification on what goes and what stays.

 Still fuzzy on the notion of significant though.  Do you have an example
 or two for the list?



 We have to look at those notes again and again in order to find if something
 important is noted, hence please always try to avoid all notes unless the
 effect is really intended!


 Consider the Note "No visible binding for global variable".
 We cannot know if your code intends to use such a global variable (which is
 undesirable in most cases), hence we would let it pass if it seems to be
 sensible.

 Another Note such as "empty section" or "partial argument match" can quickly
 be fixed, hence just do it and don't waste our time.

 Best,
 Uwe Ligges

What is the point of notes vs warnings if you have to get rid of both
of them?  Furthermore, if there are notes that you don't have to get
rid of, it's not fair that package developers should have to waste their
time on things that are actually acceptable.  Finally, it makes the
whole system arbitrary since packages can be rejected based on
undefined rules.

Either divide notes into significant notes and ordinary notes and
clearly label them as such in the output of   R CMD check   or else
make the significant notes warnings so one can know in advance whether
the package passes R CMD check or not.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-28 Thread Uwe Ligges



On 28.03.2012 16:30, Gabor Grothendieck wrote:

2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:



On 27.03.2012 20:33, Jeffrey Ryan wrote:


Thanks Uwe for the clarification on what goes and what stays.

Still fuzzy on the notion of significant though.  Do you have an example
or two for the list?




We have to look at those notes again and again in order to find if something
important is noted, hence please always try to avoid all notes unless the
effect is really intended!


Consider the Note "No visible binding for global variable".
We cannot know if your code intends to use such a global variable (which is
undesirable in most cases), hence we would let it pass if it seems to be
sensible.

Another Note such as "empty section" or "partial argument match" can quickly
be fixed, hence just do it and don't waste our time.

Best,
Uwe Ligges


What is the point of notes vs warnings if you have to get rid of both
of them?  Furthermore, if there are notes that you don't have to get
rid of, it's not fair that package developers should have to waste their
time on things that are actually acceptable.  Finally, it makes the
whole system arbitrary since packages can be rejected based on
undefined rules.

Either divide notes into significant notes and ordinary notes and
clearly label them as such in the output of   R CMD check   or else
make the significant notes warnings so one can know in advance whether
the package passes R CMD check or not.




I tried to make clear that we cannot decide that automatically: it 
needs human inspection and thinking to tell if some Note is significant 
or not. That is why we have not made them Warnings, where we are sure 
things have to be fixed.


Please always try to avoid all notes unless the effect is really 
intended! How hard can it be? If Notes could be completely ignored, they 
would not be Notes.


Uwe

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-28 Thread Thomas Lumley
On Thu, Mar 29, 2012 at 3:30 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 2012/3/28 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 27.03.2012 20:33, Jeffrey Ryan wrote:

 Thanks Uwe for the clarification on what goes and what stays.

 Still fuzzy on the notion of significant though.  Do you have an example
 or two for the list?



 We have to look at those notes again and again in order to find if something
 important is noted, hence please always try to avoid all notes unless the
 effect is really intended!


 Consider the Note "No visible binding for global variable".
 We cannot know if your code intends to use such a global variable (which is
 undesirable in most cases), hence we would let it pass if it seems to be
 sensible.

 Another Note such as "empty section" or "partial argument match" can quickly
 be fixed, hence just do it and don't waste our time.

 Best,
 Uwe Ligges

 What is the point of notes vs warnings if you have to get rid of both
 of them?  Furthermore, if there are notes that you don't have to get
 rid of, it's not fair that package developers should have to waste their
 time on things that are actually acceptable.  Finally, it makes the
 whole system arbitrary since packages can be rejected based on
 undefined rules.


The notes are precisely the things for which clear rules can't be
written.  They are reported by CMD check because they are usually
signs of coding errors, but are not warnings because their use is
sometimes justified.

The 'No visible binding for global variable' is a good example.  This
found some bugs in my 'survey' package, which I removed. There is
still one note of this type, which arises when I have to handle two
different versions of the hexbin package with different internal
structures.  The note is a false positive because the use is guarded
by an if(), but  CMD check can't tell this.   So, it's a good idea to
remove all Notes that can be removed without introducing other code
problems, which is nearly all of them, but occasionally there may be a
good reason for code that produces a Note.
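
A rough sketch of how such a guarded false positive arises, using codetools::checkUsage() directly. The function `bin_count` and the name `hypothetical_new_slot` are invented for illustration; how, or whether, R CMD check lets a package pass a list of names through to `suppressUndefined` is not something this thread settles.

```r
library(codetools)

# Hypothetical function: `hypothetical_new_slot` is not defined anywhere, but
# the branch that reads it only runs when the caller asks for it.
bin_count <- function(x, new_api = FALSE) {
  if (new_api) hypothetical_new_slot else length(x)
}

# Collect the messages instead of printing them.
notes <- character()
collect <- function(msg) notes <<- c(notes, msg)

checkUsage(bin_count, name = "bin_count", report = collect)
# `notes` now holds a "no visible binding for global variable" message,
# although bin_count(1:3) itself runs without error.

# Naming the variable in suppressUndefined silences just that message.
declared <- character()
checkUsage(bin_count, name = "bin_count",
           report = function(msg) declared <<- c(declared, msg),
           suppressUndefined = "hypothetical_new_slot")
```

The static check cannot see that the guarded branch is never taken in practice, which is why the message appears even though the code is correct.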

But if you want a simple, unambiguous, mechanical rule for *your*
packages, just eliminate all Notes.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] CRAN policies

2012-03-27 Thread Prof Brian Ripley

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers. 
 In particular, please


- always send a submission email to c...@r-project.org with the package
name and version on the subject line.  Emails sent to individual members 
of the team will result in delays at best.


- run R CMD check --as-cran on the tarball before you submit it.  Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were 
published last week) and to remain viable needs package maintainers to 
make its life as easy as possible.


Kurt Hornik
Uwe Ligges
Brian Ripley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Paul Gilbert
One of the things I have noticed with the R 2.15.0 RC and --as-cran is 
that I have to bump the version number of the working copy of my 
packages immediately after putting a version on CRAN, or I get a 
message about version suitability. This is probably a good thing for 
packages that I have changed, compared with my old habit of bumping the 
version number at arbitrary times, although the mechanics are a nuisance 
because I do not actually want to commit to the next version number at 
that point. For packages that I have not changed it is a bit worse, 
because I have to change the version number even though I have not yet 
made any changes to the package. This will mean, for example, that on 
R-forge it will look like there is a slightly newer version, even though 
there is not really.


I am curious how other developers approach this. Is it better to not 
specify --as-cran most of the time?  My feeling is that it is better to 
specify it all of the time so that I catch errors sooner rather than 
later, but maybe there is a better solution?


Paul

On 12-03-27 07:52 AM, Prof Brian Ripley wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Uwe Ligges



On 27.03.2012 16:17, Paul Gilbert wrote:

One of the things I have noticed with the R 2.15.0 RC and --as-cran is
that I have to bump the version number of the working copy of my
packages immediately after putting a version on CRAN, or I get a
message about version suitability. This is probably a good thing for
packages that I have changed, compared with my old habit of bumping the
version number at arbitrary times, although the mechanics are a nuisance
because I do not actually want to commit to the next version number at
that point. For packages that I have not changed it is a bit worse,
because I have to change the version number even though I have not yet
made any changes to the package. This will mean, for example, that on
R-forge it will look like there is a slightly newer version, even though
there is not really.

I am curious how other developers approach this. Is it better to not
specify --as-cran most of the time? My feeling is that it is better to
specify it all of the time so that I catch errors sooner rather than
later, but maybe there is a better solution?



--as-cran is modelled rather closely after the CRAN incoming checks. 
CRAN checks if a new version has a new version number. Of course, you 
can ignore its result if you do not want to submit. The idea of using 
--as-cran is to apply it before you actually submit. Some parts require 
network connection etc.
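
The version test can be approximated locally before running the full check. The following is only a sketch with made-up version strings; it is not the code the incoming check itself uses.

```r
# Hypothetical version strings: what is on CRAN and what is in the working copy.
cran_version <- "1.2-3"
working_version <- "1.2-3"

# package_version() parses the string so that comparison is component-wise,
# not lexical (so "1.10" counts as newer than "1.9").
needs_bump <- package_version(working_version) <= package_version(cran_version)

# compareVersion() expresses the same ordering as -1, 0 or 1.
ordering <- utils::compareVersion("1.10", "1.9")
```

Here `needs_bump` is TRUE because the working copy still carries the version already published, which is the situation described above.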


Uwe





Paul

On 12-03-27 07:52 AM, Prof Brian Ripley wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Gabor Grothendieck
On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 CRAN has for some time had a policies page at
 http://cran.r-project.org/web/packages/policies.html
 and we would like to draw this to the attention of package maintainers.  In
 particular, please

 - always send a submission email to c...@r-project.org with the package
 name and version on the subject line.  Emails sent to individual members of
 the team will result in delays at best.

 - run R CMD check --as-cran on the tarball before you submit it.  Do
 this with the latest version of R possible: definitely R 2.14.2,
 preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
 able to give better diagnostics, e.g. for compiled code and especially
 on Windows. They may also have extra checks for recently uncovered
 problems.)

 Also, please note that CRAN has a very heavy workload (186 packages were
 published last week) and to remain viable needs package maintainers to make
 its life as easy as possible.


Regarding the part about warnings or significant notes in that page,
it's impossible to know which notes are significant and which ones are
not significant except by trial and error.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Paul Gilbert



On 12-03-27 10:59 AM, Uwe Ligges wrote:



On 27.03.2012 16:17, Paul Gilbert wrote:

One of the things I have noticed with the R 2.15.0 RC and --as-cran is
that I have to bump the version number of the working copy of my
packages immediately after putting a version on CRAN, or I get a
message about version suitability. This is probably a good thing for
packages that I have changed, compared with my old habit of bumping the
version number at arbitrary times, although the mechanics are a nuisance
because I do not actually want to commit to the next version number at
that point. For packages that I have not changed it is a bit worse,
because I have to change the version number even though I have not yet
made any changes to the package. This will mean, for example, that on
R-forge it will look like there is a slightly newer version, even though
there is not really.

I am curious how other developers approach this. Is it better to not
specify --as-cran most of the time? My feeling is that it is better to
specify it all of the time so that I catch errors sooner rather than
later, but maybe there is a better solution?



--as-cran is modelled rather closely after the CRAN incoming checks.
CRAN checks if a new version has a new version number. Of course, you
can ignore its result if you do not want to submit. The idea of using
--as-cran is to apply it before you actually submit. Some parts require
network connection etc.

Uwe


Yes but, for example, will R-forge run checks with --as-cran, and thus 
give warnings for any package unchanged from the one on CRAN, or run 
without --as-cran, and thus not give a true indication of whether the 
package is good to submit?


(No doubt R-forge will customise more, but I am trying to work out a 
strategy for my own automatic testing.)


Paul






Paul

On 12-03-27 07:52 AM, Prof Brian Ripley wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Uwe Ligges



On 27.03.2012 17:22, Paul Gilbert wrote:



On 12-03-27 10:59 AM, Uwe Ligges wrote:



On 27.03.2012 16:17, Paul Gilbert wrote:

One of the things I have noticed with the R 2.15.0 RC and --as-cran is
that I have to bump the version number of the working copy of my
packages immediately after putting a version on CRAN, or I get a
message about version suitability. This is probably a good thing for
packages that I have changed, compared with my old habit of bumping the
version number at arbitrary times, although the mechanics are a nuisance
because I do not actually want to commit to the next version number at
that point. For packages that I have not changed it is a bit worse,
because I have to change the version number even though I have not yet
made any changes to the package. This will mean, for example, that on
R-forge it will look like there is a slightly newer version, even though
there is not really.

I am curious how other developers approach this. Is it better to not
specify --as-cran most of the time? My feeling is that it is better to
specify it all of the time so that I catch errors sooner rather than
later, but maybe there is a better solution?



--as-cran is modelled rather closely after the CRAN incoming checks.
CRAN checks if a new version has a new version number. Of course, you
can ignore its result if you do not want to submit. The idea of using
--as-cran is to apply it before you actually submit. Some parts require
network connection etc.

Uwe


Yes but, for example, will R-forge run checks with --as-cran, and thus
give warnings for any package unchanged from the one on CRAN, or run
without --as-cran, and thus not give a true indication of whether the
package is good to submit?



This is a question for the R-forge maintainer. I would not expect it 
to run checks with --as-cran, but I do not know.


Best,
Uwe




(No doubt R-forge will customise more, but I am trying to work out a
strategy for my own automatic testing.)

Paul






Paul

On 12-03-27 07:52 AM, Prof Brian Ripley wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Uwe Ligges



On 27.03.2012 17:09, Gabor Grothendieck wrote:

On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk  wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.  In
particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line.  Emails sent to individual members of
the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it.  Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to make
its life as easy as possible.



Regarding the part about warnings or significant notes in that page,
it's impossible to know which notes are significant and which ones are
not significant except by trial and error.



Right, it needs human inspection to identify false positives. We believe 
most package maintainers are able to see if he or she is hit by such a 
false positive.


Uwe Ligges

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Prof Brian Ripley

On 27/03/2012 15:17, Paul Gilbert wrote:

One of the things I have noticed with the R 2.15.0 RC and --as-cran is
that I have to bump the version number of the working copy of my
packages immediately after putting a version on CRAN, or I get a
message about version suitability. This is probably a good thing for
packages that I have changed, compared with my old habit of bumping the
version number at arbitrary times, although the mechanics are a nuisance
because I do not actually want to commit to the next version number at
that point. For packages that I have not changed it is a bit worse,
because I have to change the version number even though I have not yet
made any changes to the package. This will mean, for example, that on
R-forge it will look like there is a slightly newer version, even though
there is not really.

I am curious how other developers approach this. Is it better to not
specify --as-cran most of the time? My feeling is that it is better to
specify it all of the time so that I catch errors sooner rather than
later, but maybe there is a better solution?


Yes.  It is only recommended for use just before submission.  It is not 
used by the CRAN daily checks, for example.


All it does is set some environment variables that you can also set in 
~/.R/check.Renviron, scripts ... and that is what the CRAN team do.  We 
introduced --as-cran to make it easier to explain to submitters how to 
get the check results we reported [*].
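
To illustrate the format of such a settings file, here is a small sketch. The two variable names are assumptions taken from a reading of 'R Internals' and may differ between R versions; the file is written to the session's temporary directory purely for the demonstration, whereas R CMD check would read ~/.R/check.Renviron on its own.

```r
# Hypothetical settings file, written to the session temp dir for the demo;
# a real one would live at ~/.R/check.Renviron.
env_file <- file.path(tempdir(), "check.Renviron")
writeLines(c(
  "_R_CHECK_CRAN_INCOMING_=TRUE",   # assumed name of the 'incoming' check switch
  "_R_CHECK_FORCE_SUGGESTS_=FALSE"  # assumed name; see 'R Internals'
), env_file)

# readRenviron() loads NAME=value pairs into the current session's environment,
# which shows that the file is plain NAME=value lines and nothing more.
readRenviron(env_file)
incoming <- Sys.getenv("_R_CHECK_CRAN_INCOMING_")

# Remove the demo file again.
unlink(env_file)
```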


As for what the set is, read 'R Internals' or the code (it will vary by 
R version).


Given that we get several submissions per week with the same version 
number or name as a package already on CRAN, we do need submitters to 
run the 'incoming' check before submission.


[*] Since answering several emails a day about why their results were 
different was taking up far too much time.




Paul

On 12-03-27 07:52 AM, Prof Brian Ripley wrote:

CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers.
In particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members
of the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to
make its life as easy as possible.

Kurt Hornik
Uwe Ligges
Brian Ripley




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Gabor Grothendieck
2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 27.03.2012 17:09, Gabor Grothendieck wrote:

 On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
 rip...@stats.ox.ac.uk  wrote:

 CRAN has for some time had a policies page at
 http://cran.r-project.org/web/packages/policies.html
 and we would like to draw this to the attention of package maintainers.
 In particular, please

 - always send a submission email to c...@r-project.org with the package
 name and version on the subject line.  Emails sent to individual members
 of the team will result in delays at best.

 - run R CMD check --as-cran on the tarball before you submit it.  Do
 this with the latest version of R possible: definitely R 2.14.2,
 preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
 able to give better diagnostics, e.g. for compiled code and especially
 on Windows. They may also have extra checks for recently uncovered
 problems.)

 Also, please note that CRAN has a very heavy workload (186 packages were
 published last week) and to remain viable needs package maintainers to
 make its life as easy as possible.


 Regarding the part about warnings or significant notes in that page,
 it's impossible to know which notes are significant and which ones are
 not significant except by trial and error.



 Right, it needs human inspection to identify false positives. We believe
 most package maintainers are able to see if he or she is hit by such a false
 positive.

The problem is that a note is generated and the note is correct. It's
not a false positive.  But that does not tell you whether it's
significant or not.  There is no way to know.  One can either try to
remove all notes (which may not be feasible) or just upload it and by
trial and error find out if it's accepted or not.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN policies

2012-03-27 Thread Gabor Grothendieck
2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 27.03.2012 19:10, Jeffrey Ryan wrote:

 Is there a distinction as to NOTE vs. WARNING that is documented?  I've
 always assumed (wrongly?) that NOTES weren't an issue with publishing on
 CRAN, but that they may change to WARNINGS at some point.


 We won't kick packages off CRAN for Notes (but we will if Warnings are not
 fixed), but we may not accept new submissions with significant Notes.

Yes, I understand that, but it does not really address the problem
that one has no idea whether a Note is significant or not, so the
only way to determine its significance is to submit your package and
see if it's accepted or not.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] CRAN policies

2012-03-27 Thread Paul Gilbert
An associated problem, for the wish list, is that it would be nice for
package developers to have a way to automatically distinguish between
NOTEs that can usually be ignored (e.g. a package suggests a package
that is not available for cross reference checks - I have several cases
where the suggested package depends on the package being built, so this
NOTE occurs all the time), and NOTEs that are really pre-WARNINGS, so
that one can flag these and spend time fixing them before they become a
WARNING or ERROR. Perhaps two different kinds of notes?


(And, BTW, having been responsible for a certain amount of the
  [*] Since answering several emails a day about why their
  results were different was taking up far too much time.
I think --as-cran is great.)

Paul

On 12-03-27 02:19 PM, Uwe Ligges wrote:



On 27.03.2012 19:10, Jeffrey Ryan wrote:

Is there a distinction as to NOTE vs. WARNING that is documented? I've
always assumed (wrongly?) that NOTES weren't an issue with publishing on
CRAN, but that they may change to WARNINGS at some point.


We won't kick packages off CRAN for Notes (but we will if Warnings are
not fixed), but we may not accept new submissions with significant Notes.

Best,
Uwe Ligges




Is the process by which this happens documented somewhere?

Jeff

On 3/27/12 11:09 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:


2012/3/27 Uwe Ligges lig...@statistik.tu-dortmund.de:



On 27.03.2012 17:09, Gabor Grothendieck wrote:


On Tue, Mar 27, 2012 at 7:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:


CRAN has for some time had a policies page at
http://cran.r-project.org/web/packages/policies.html
and we would like to draw this to the attention of package maintainers. In
particular, please

- always send a submission email to c...@r-project.org with the package
name and version on the subject line. Emails sent to individual members of
the team will result in delays at best.

- run R CMD check --as-cran on the tarball before you submit it. Do
this with the latest version of R possible: definitely R 2.14.2,
preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
able to give better diagnostics, e.g. for compiled code and especially
on Windows. They may also have extra checks for recently uncovered
problems.)

Also, please note that CRAN has a very heavy workload (186 packages were
published last week) and to remain viable needs package maintainers to make
its life as easy as possible.


Regarding the part about warnings or significant notes in that page,
it's impossible to know which notes are significant and which ones are
not significant except by trial and error.


Right, it needs human inspection to identify false positives. We believe
most package maintainers are able to see if he or she is hit by such a
false positive.


The problem is that a note is generated and the note is correct. It's
not a false positive. But that does not tell you whether it's
significant or not. There is no way to know. One can either try to
remove all notes (which may not be feasible) or just upload it and by
trial and error find out if it's accepted or not.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] CRAN policies

2012-03-27 Thread Hadley Wickham
 I have been wondering if it is possible to automate the checking
 process to reduce human efforts, e.g. automatically check the packages
 submitted to FTP, and send the package maintainer an email in case of
 warnings or errors (otherwise just move it to CRAN); package
 maintainers can appeal for a manual check by CRAN maintainers in case
 of false positives.

I've started using win-builder before submitting to CRAN.  This often
picks up problems that I don't see locally.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] CRAN policies

2012-03-27 Thread Hadley Wickham
On Tue, Mar 27, 2012 at 6:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 CRAN has for some time had a policies page at
 http://cran.r-project.org/web/packages/policies.html
 and we would like to draw this to the attention of package maintainers.  In
 particular, please

Thanks for the pointer - I did not know that this page existed. In
general, is there some easy way to track changes to this page and the
R extension manual over time?  It is difficult to keep track of the
best practices.

I'd also like to get clarification on "Packages should not write in
the users' home filespace, nor anywhere else on the file system apart
from the R session's temporary directory (or during installation in
the location pointed to by TMPDIR: and such usage should be cleaned
up)." What is the recommended practice for packages to maintain state
across instances?  Operating systems have standards for where
applications can store settings (e.g. as described in
http://pypi.python.org/pypi/appdirs/1.2.0).  Is it acceptable for
packages to follow these conventions?
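
For illustration only, here is a minimal sketch of what the quoted policy text permits as written: state kept under the R session's temporary directory and removed afterwards. The names save_state(), load_state() and the "mypkg-state.rds" file name are hypothetical and not part of any package or of the policy. State stored this way is lost when the session ends, which is exactly the gap the question is about.

```r
# Hypothetical helpers: keep package state inside the session's temp directory.
# "mypkg-state.rds" is a placeholder file name chosen for this sketch.
state_file <- file.path(tempdir(), "mypkg-state.rds")

# Serialise an arbitrary R object to the state file.
save_state <- function(state, path = state_file) {
  saveRDS(state, path)
  invisible(path)
}

# Read the state back, or return NULL if nothing has been saved yet.
load_state <- function(path = state_file) {
  if (file.exists(path)) readRDS(path) else NULL
}

save_state(list(counter = 1L))
str(load_state())

# Clean up, as the policy text asks.
unlink(state_file)
```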

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] CRAN policies

2012-03-27 Thread Murray Stokely
Lots of very sensible policies here.  I have one request as someone
who has in several cases had to involve company lawyers over
intellectual property issues with packages on CRAN -- the first bullet
point on ownership of copyright and intellectual property rights could
be strengthened further.

To the existing text "The ownership of copyright and intellectual
property rights of all components of the package must be clear and
unambiguous (including from the authors specification in the
DESCRIPTION file). Where code is copied (or derived) from the work of
others (including from R itself), care must be taken that any
copyright statements are preserved and authorship is not
misrepresented. Trademarks must be respected."

I would add a few additional points :

1. The text of the license itself should be included in the package in
a LICENSE or COPYING file, as most of these licenses have things that
need to be filled in with names and other data, and just referencing a
license name in the DESCRIPTION file is not really a great way to deal
with licensing metadata when used exclusively (it's a great complement
to a full, filled-out license in the package itself).

2. Per-file copyright comment headers can help immensely with ensuring
compliance and avoiding the accidental incorporation of files under a
different license.  Comment header blocks with the author name and terms
of distribution could be recommended for all source files.
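
As a rough illustration of point 2, a per-file header in an R source file might look like the sketch below. The author, email address, package name and the choice of license are all placeholders invented for this example. They are not taken from the CRAN policies page.

```r
# Copyright (C) 2012 Jane Example <jane@example.org>   (hypothetical author)
#
# This file is part of the 'examplepkg' package (hypothetical name).
# It is distributed under the terms of the GNU General Public License,
# version 2 or later; see the COPYING file shipped with the package.
```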

   - Murray

On Tue, Mar 27, 2012 at 4:52 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 CRAN has for some time had a policies page at
 http://cran.r-project.org/web/packages/policies.html
 and we would like to draw this to the attention of package maintainers.  In
 particular, please

 - always send a submission email to c...@r-project.org with the package
 name and version on the subject line.  Emails sent to individual members of
 the team will result in delays at best.

 - run R CMD check --as-cran on the tarball before you submit it.  Do
 this with the latest version of R possible: definitely R 2.14.2,
 preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
 able to give better diagnostics, e.g. for compiled code and especially
 on Windows. They may also have extra checks for recently uncovered
 problems.)

 Also, please note that CRAN has a very heavy workload (186 packages were
 published last week) and to remain viable needs package maintainers to make
 its life as easy as possible.

 Kurt Hornik
 Uwe Ligges
 Brian Ripley
