Re: [Rd] Environment setting _R_CHECK_DEPENDS_ONLY_='true'

2021-10-23 Thread John Maindonald via R-devel
A footnote, following an off-list exchange with Prof Ripley, is that I needed
to have Suggested or other additional packages installed somewhere other than
.Library.

"The following variables control checks for undeclared/unconditional use of 
other packages. They work by setting up a temporary library directory and 
setting .libPaths() to just that and .Library, so are only effective if 
additional packages are installed somewhere other than .Library."
[I am not sure of the source of this quote.]

If vignettes make extensive use of Suggested packages, then exiting early from
a vignette at the point where a Suggested package would otherwise be required
[under knitr, one can use knitr::knit_exit()] can be an alternative to leaving
out the checking of vignettes in order to speed up initial testing.
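Under knitr, the early-exit pattern can be sketched as follows (the package tested for here is purely illustrative — substitute whatever Suggested package the vignette actually needs):

```r
## Early chunk in a vignette: if a Suggested package is missing,
## end the vignette gracefully instead of failing part-way through.
if (!requireNamespace("glmmTMB", quietly = TRUE)) {
  message("Package 'glmmTMB' is not installed -- ending vignette early.")
  knitr::knit_exit()   # chunks and text after this point are dropped
}
```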

On macOS Mojave with a bash shell
  env _R_CHECK_DEPENDS_ONLY_=true  R CMD check qra_0.2.4.tar.gz
works like a charm.  Some other Unix shells (e.g. csh's setenv) omit the '='.


John Maindonald email: john.maindon...@anu.edu.au


On 21/10/2021, at 02:31, Dirk Eddelbuettel <e...@debian.org> wrote:


On 20 October 2021 at 09:31, Sebastian Meyer wrote:
| If you set the environment variable inside a running R process, it will
| only affect that process and child processes, but not an independent R
| process launched from a shell like you seem to be doing here:

Yes. That is somewhat common, if obscure, knowledge among those bitten before.

Maybe a line or two could be / should be added to the docs to that effect?

| How to set environment variables is system-specific. On a Unix-like
| system, you could use the command
|
| _R_CHECK_DEPENDS_ONLY_=true  R CMD check qra_0.2.4.tar.gz
|
| to set the environment variable for this R process.
| See, e.g., https://en.wikipedia.org/wiki/Environment_variable.
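The parent/child distinction can be seen from inside R itself — a variable set with Sys.setenv() is inherited by processes launched from that session, but not by an independent shell. A sketch (tarball name as in the thread):

```r
## Visible to this R process and to its child processes only
Sys.setenv("_R_CHECK_DEPENDS_ONLY_" = "true")

## This check inherits the variable, because system2() starts a child
## of the current R session ...
system2("R", c("CMD", "check", "qra_0.2.4.tar.gz"))

## ... whereas a check launched later from a fresh terminal does not.
```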

R does have hooks for this, I had these for a few years now:

 ~/.R/check.Renviron
 ~/.R/check.Renviron-Rdevel
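For anyone unfamiliar with these hooks, R CMD check reads such a file as additional environment settings, one KEY=VALUE pair per line. A sketch of what ~/.R/check.Renviron might contain (the variable names are real check settings; the values are just examples):

```
## ~/.R/check.Renviron -- read by R CMD check
_R_CHECK_DEPENDS_ONLY_=true
_R_CHECK_FORCE_SUGGESTS_=false
```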

Again, might be worthwhile documenting it in the Inst+Admin manual (if it
isn't already, I don't recall right now).

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Environment setting _R_CHECK_DEPENDS_ONLY_='true'

2021-10-19 Thread John Maindonald via R-devel
Setting
Sys.setenv('_R_CHECK_DEPENDS_ONLY_'='true')
or Sys.setenv('_R_CHECK_DEPENDS_ONLY_'=TRUE)

(either appears to be acceptable) appears to have no effect when I do, e.g.

$ R CMD check qra_0.2.4.tar.gz
* using log directory ‘/Users/johnm1/pkgs/qra.Rcheck’
* using R version 4.1.1 (2021-08-10)
* using platform: x86_64-apple-darwin17.0 (64-bit)
* using session charset: UTF-8
. . .

(This should have failed.)

I’d have expected that the “On most systems . . .” mentioned in the Writing R
Extensions manual (1.1.3.1 Suggested packages) would include my setup.

Any insight on what I am missing will be welcome.

John Maindonald email: john.maindon...@anu.edu.au






Re: [R-pkg-devel] Namespace is imported from in YAML header, but attracts Note that it is not imported from

2021-09-24 Thread John Maindonald
That is very helpful --- thank you
John

On Fri, 24 Sept 2021 at 17:42, Maëlle SALMON  wrote:

> Hello,
>
> It's better to get rid of this NOTE, by listing bookdown in
> VignetteBuilder and Suggests, not Imports; see
> https://blog.r-hub.io/2020/06/03/vignettes/#infrastructure--dependencies-for-vignettes
> That's actually what you did in another package
> https://github.com/cran/gamclass/blob/master/DESCRIPTION (it's a
> coincidence I found a package of yours via code search in CRAN GitHub
> mirror :-) ).
>
> Maëlle.
>
>
> On Friday 24 September 2021 03:00:57 CEST, John H Maindonald <
> jhmaindon...@gmail.com> wrote:
>
>
>
>
>
> On the Atlas and Linux builds of my package `qra` that has just been
> posted
> on CRAN, I am getting the message:
>
> > Namespace in Imports field not imported from: ‘bookdown’
> >  All declared Imports should be used.
>
> This, in spite of the fact that the YAML header in two of the Rmd files for
> the vignettes has:
>
> > output:
> >  bookdown::html_document2:
> >    theme: cayman
>
> Do I need to worry about this?
>
> John Maindonald
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
>



[Rd] R CMD check message: The following files should probably not be installed

2015-01-27 Thread John Maindonald
Sorry.  This, and the description in the ‘Writing R Extensions’ manual,
leaves me completely mystified.  Is it that I have to remove the PDFs
that are created when I run ‘R CMD build’, and somehow ensure that
they are rebuilt when the package is installed?  Do I need a Makefile?


John Maindonald email: john.maindon...@anu.edu.au

phone : +61 2 (6125)3473    fax  : +61 2(6125)5549

Centre for Mathematics & Its Applications, Room 1194,

John Dedman Mathematical Sciences Building (Building 27)

Australian National University, Canberra ACT 0200.


On 26 Jan 2015, at 22:00, r-devel-requ...@r-project.org wrote:

From: Prof Brian Ripley rip...@stats.ox.ac.uk
Subject: Re: [Rd] R CMD check message: The following files should probably not 
be installed
Date: 26 January 2015 19:52:12 AEDT
To: r-devel@r-project.org


On 25/01/2015 23:25, John Maindonald wrote:
I am doing [R version 3.1.2 (2014-10-31) -- “Pumpkin Helmet”; Platform:
x86_64-apple-darwin10.8.0 (64-bit)]

R CMD build DAAGviz
R CMD check DAAGviz_1.0.3.tar.gz

Without a .Rinstignore file, I get:


The following files should probably not be installed:
  ‘figs10.pdf’, ‘figs11.pdf’, ‘figs12.pdf’, ‘figs13.pdf’, ‘figs14.pdf’,
  ‘figs5.pdf’, ‘figs6.pdf’, ‘figs9.pdf’

Consider the use of a .Rinstignore file: see ‘Writing R Extensions’,
or move the vignette sources from ‘inst/doc’ to ‘vignettes’.


The vignette sources were in ‘vignettes’ when DAAGviz_1.0.3.tar.gz was created.
 There was nothing in the ‘inst/doc’ directory.

If I have in my .Rinstignore file

  inst/doc/.*[.]pdf

That filters out more than the files warned about.  I guess you meant

inst/doc/figs.*[.]pdf

But the question has to be: how did the files get copied into inst/doc?  Maybe

'When R CMD build builds the vignettes, it copies these and the vignette 
sources from directory vignettes to inst/doc. To install any other files from 
the vignettes directory, include a file vignettes/.install_extras which 
specifies these as Perl-like regular expressions on one or more lines. (See the 
description of the .Rinstignore file for full details.)'

suggests how?

then I get:


* checking package vignettes in ‘inst/doc’ ... WARNING
Package vignettes without corresponding PDF/HTML:
. . .


What am I missing?  Can I ignore the “The following files should probably not
be installed” message?

Not if you want to submit the package to CRAN.



John Maindonald email: john.maindon...@anu.edu.au





Re: [Rd] R-devel Digest, Vol 143, Issue 25

2015-01-27 Thread John Maindonald
OK, I see now that I was supposed to twig that the reference was to putting the
‘.Rnw’ files back into the vignettes directory from the inst/doc directory
where they’d been placed in the course of creating the tar.gz file.  I am still
trying to work out what I need to put into ‘.Rinstignore’ so that
‘.install_extras’ is not installed.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 26 Jan 2015, at 22:00, r-devel-requ...@r-project.org wrote:

 From: Prof Brian Ripley rip...@stats.ox.ac.uk
 Subject: Re: [Rd] R CMD check message: The following files should probably 
 not be installed
 Date: 26 January 2015 19:52:12 AEDT
 To: r-devel@r-project.org
 
 
 On 25/01/2015 23:25, John Maindonald wrote:
 I am doing [R version 3.1.2 (2014-10-31) -- “Pumpkin Helmet”; Platform: 
 x86_64-apple-darwin10.8.0 (64-bit)]
 
 R CMD build DAAGviz
 R CMD check DAAGviz_1.0.3.tar.gz
 
 Without a .Rinstignore file, I get:
 
 
 The following files should probably not be installed:
   ‘figs10.pdf’, ‘figs11.pdf’, ‘figs12.pdf’, ‘figs13.pdf’, ‘figs14.pdf’,
   ‘figs5.pdf’, ‘figs6.pdf’, ‘figs9.pdf’
 
 Consider the use of a .Rinstignore file: see ‘Writing R Extensions’,
 or move the vignette sources from ‘inst/doc’ to ‘vignettes’.
 
 
 The vignette sources were in ‘vignettes’ when DAAGviz_1.0.3.tar.gz was 
 created.  There was nothing in the ‘inst/doc’ directory.
 
 If I have in my .Rinstignore file
 
   inst/doc/.*[.]pdf
 
 That filters out more than the files warned about.  I guess you meant
 
 inst/doc/figs.*[.]pdf
 
 But the question has to be: how did the files get copied into inst/doc?  Maybe
 
 'When R CMD build builds the vignettes, it copies these and the vignette 
 sources from directory vignettes to inst/doc. To install any other files from 
 the vignettes directory, include a file vignettes/.install_extras which 
 specifies these as Perl-like regular expressions on one or more lines. (See 
 the description of the .Rinstignore file for full details.)'
 
 suggests how?
 
 then I get:
 
 
 * checking package vignettes in ‘inst/doc’ ... WARNING
 Package vignettes without corresponding PDF/HTML:
 . . .
 
 
 What am I missing?  Can I ignore the “The following files should probably 
 not be installed” message?
 
 Not if you want to submit the package to CRAN.
 
 
 
 John Maindonald email: john.maindon...@anu.edu.au
 
 
 
 
 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK



[Rd] How do I prevent '.install_extras' from being installed?

2015-01-27 Thread John Maindonald
So now I have:
 
vignettes/.install_extras:

inst/doc/figs.*[.]Rnw$


.Rinstignore:

[.]DS_Store
inst/doc/.*[.]pdf$
inst/doc/Sweavel.sty$
inst/doc/[.]install_extras$


Everything is fine except that ‘R CMD check …’ generates the note:

Found the following hidden files and directories:
  inst/doc/.install_extras
These were most likely included in error. See section ‘Package
structure’ in the ‘Writing R Extensions’ manual.

John Maindonald email: john.maindon...@anu.edu.au



[Rd] R CMD check message: The following files should probably not be installed

2015-01-25 Thread John Maindonald
I am doing [R version 3.1.2 (2014-10-31) -- “Pumpkin Helmet”; Platform: 
x86_64-apple-darwin10.8.0 (64-bit)]

 R CMD build DAAGviz
 R CMD check DAAGviz_1.0.3.tar.gz

Without a .Rinstignore file, I get:


The following files should probably not be installed:
  ‘figs10.pdf’, ‘figs11.pdf’, ‘figs12.pdf’, ‘figs13.pdf’, ‘figs14.pdf’,
  ‘figs5.pdf’, ‘figs6.pdf’, ‘figs9.pdf’

Consider the use of a .Rinstignore file: see ‘Writing R Extensions’,
or move the vignette sources from ‘inst/doc’ to ‘vignettes’.


The vignette sources were in ‘vignettes’ when DAAGviz_1.0.3.tar.gz was created. 
 There was nothing in the ‘inst/doc’ directory.


If I have in my .Rinstignore file

  inst/doc/.*[.]pdf

then I get:


* checking package vignettes in ‘inst/doc’ ... WARNING
Package vignettes without corresponding PDF/HTML:
. . .


What am I missing?  Can I ignore the “The following files should probably not 
be installed” message?

John Maindonald email: john.maindon...@anu.edu.au



[Rd] Correction in help(factanal)

2014-11-13 Thread John Maindonald

Thus factor analysis is in essence a model for the correlation matrix of x,

Σ = Λ'Λ + Ψ


This should surely be Σ = ΛΛ' + Ψ

Also line 3 under “Details” says

for a p–element row-vector x, …


x is here surely a column vector, albeit the transpose of a row vector
from the data matrix.

cf page 322 of “Modern Applied Statistics with S”, 4th edn.
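The correction follows from writing out the model. With x a p-element column vector, Λ the p × k matrix of loadings, f the k common factors (unit variance, uncorrelated) and u the specific errors:

```latex
x = \Lambda f + u, \qquad
\operatorname{cov}(f) = I_k, \quad \operatorname{cov}(u) = \Psi
\;\Longrightarrow\;
\Sigma = \operatorname{cov}(x) = \Lambda\Lambda' + \Psi .
```

Note that Λ'Λ is k × k — the wrong dimensions for the covariance of p variables unless k = p.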

John Maindonald email: john.maindon...@anu.edu.au




Re: [Rd] no visible binding for global variable for data sets in a package

2014-08-27 Thread John Maindonald
Re solution 2, the following is in the function tabFarsDead() in
the latest (0.55) version of gamclass:

  data('FARS', package='gamclass', envir=environment())
  FARS <- get('FARS', envir=environment())

The second statement is, strictly, redundant, but it
makes the syntax checker happy.  Another possibility
might be:

  FARS <- NULL
  data('FARS', package='gamclass', envir=environment())

I do not know whether this passes.

An FAQ that offers preferred solutions to such chestnuts,
or a web page, or a blog, would seem to me useful.
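For readers hitting the same NOTE: the two workarounds above can be sketched as below. The body of tabFarsDead() is not reproduced here, so the use made of FARS is illustrative only; utils::globalVariables() (available from R 2.15.1) is a further option not discussed in this thread:

```r
## Workaround A: create a local binding first, then let data() overwrite it
tabFarsDead <- function() {
  FARS <- NULL   # placates the 'no visible binding' check
  data('FARS', package='gamclass', envir=environment())
  head(FARS)     # illustrative use of the data set
}

## Workaround B (R >= 2.15.1): declare the name once, at package top level
if (getRversion() >= "2.15.1") utils::globalVariables("FARS")
```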


John Maindonald email: john.maindon...@anu.edu.au


On 27 Aug 2014, at 20:00, r-devel-requ...@r-project.org wrote:

From: Martin Maechler maech...@stat.math.ethz.ch
Subject: Re: [Rd] no visible binding for global variable for data sets in a 
package
Date: 27 August 2014 19:24:36 AEST
To: Michael Friendly frien...@yorku.ca
Cc: r-devel r-devel@r-project.org
Reply-To: Martin Maechler maech...@stat.math.ethz.ch


Michael Friendly frien...@yorku.ca
    on Tue, 26 Aug 2014 17:58:34 -0400 writes:

I'm updating the Lahman package of baseball statistics to the 2013
release. In addition to
the main data sets, the package also contains several convenience
functions that make use
of these data sets.  These now trigger the notes below from R CMD check
run with
Win builder, R-devel.  How can I avoid these?

* using R Under development (unstable) (2014-08-25 r66471)
* using platform: x86_64-w64-mingw32 (64-bit)
...
* checking R code for possible problems ... NOTE
Label: no visible binding for global variable 'battingLabels'
Label: no visible binding for global variable 'pitchingLabels'
Label: no visible binding for global variable 'fieldingLabels'
battingStats: no visible binding for global variable 'Batting'
battingStats: no visible global function definition for 'mutate'
playerInfo: no visible binding for global variable 'Master'
teamInfo: no visible binding for global variable 'Teams'

One such function:

## function for accessing variable labels

Label <- function(var, labels=rbind(battingLabels, pitchingLabels,
                                    fieldingLabels)) {
    wanted <- which(labels[,1]==var)
    if (length(wanted)) labels[wanted[1],2] else var
}

and you are using the data sets you mentioned before,
(and the checking has been changed recently here).

This is a bit subtle:
Your data sets are part of your package (thanks to the default
lazyData), but *not* part of the namespace of your package.
Now, the reasoning goes as follows: if someone uses a function
from your package, say Label() above,
by
Lahman::Label(..)
and your package has not been attached to the search path,
your user will get an error, as the datasets are not found by
Label().

If you consider something like   Lahman::Label(..)
for a bit and the emphasis we put on R functions being the
primary entities, you can understand the current, i.e. new,
R CMD check warnings.

I see the following two options for you:

1) export all these data sets from your NAMESPACE
  For this (I think), you must define them in  Lahman/R/ possibly via a
  Lahman/R/sysdata.rda

2) rewrite your functions so as to ensure that the data sets are
  loaded when they are used.


2) actually works by adding
   stopifnot(require(Lahman, quietly=TRUE))
 as first line in Label() and other such functions.

It works in the sense that  Lahman::Label("yearID")  will
work even when Lahman is not in the search path,
but   R-devel CMD check   will still give the same NOTE,
though you can argue that that note is actually a false positive.

Not sure about another elegant way to make 2) work, apart from
using  data() on each of the datasets inside the
function.  As I haven't tried it, that may *still* give a
(false) NOTE..

This is a somewhat interesting problem, and I wonder if everyone
else has solved it with '1)' rather than a version of '2)'.

Martin




Re: [Rd] R-devel Digest, Vol 137, Issue 25

2014-07-28 Thread John Maindonald
Finding and not unnecessarily duplicating existing functionality is important 
also from a user perspective.  Negative binomial regression provides a somewhat 
extreme example of existing overlap between packages, with the scope that this 
creates for confusing users, especially as the notation is not consistent 
between these different implementations.

In addition to MASS::glm.nb(), note msme::negbinomial(), aod::negbin() and 
gamlss::gamlss().  The gamlss function fits two different types of NB model, 
either family = NBI (quadratic; var = mu(1 + sigma * mu)) as I think for all 
the functions above, or family=NBII (linear; var = mu(1 + sigma)).   Also note 
the somewhat special-purpose function glmnb.fit() in the statmod package, 
which requires preliminary setup steps.
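As a concrete instance of the quadratic parameterisation shared by most of these functions, a hedged sketch using MASS::glm.nb() on the quine data set that ships with MASS (MASS's theta corresponds to 1/sigma in the gamlss NBI notation var = mu(1 + sigma*mu)):

```r
library(MASS)
## NB regression with var = mu + mu^2/theta (quadratic in mu)
fit <- glm.nb(Days ~ Sex + Age + Eth, data = quine)
fit$theta       # estimated size parameter theta
1/fit$theta     # sigma in the gamlss NBI parameterisation
```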


John Maindonald email: john.maindon...@anu.edu.au


On 28/07/2014, at 8:00 pm, Darren Norris wrote:

From: Darren Norris doo...@hotmail.com
Subject: Re: [Rd] [Wishlist] a 'PackageDevelopment' Task View
Date: 28 July 2014 6:19:08 am AEST
To: r-devel@r-project.org


Hi Luca,
Based on previous comments seems like
1) there should be a multi-functional/general category to cover packages
like devtools
2) I think finding existing function code (e.g. in CRAN packages / GitHub)
is necessary and saves many hours in package development (no one wants to
develop a package and then discover they have just reinvented the wheel). So
including packages like sos seems justified and helpful.
Best,
Darren




--
View this message in context: 
http://r.789695.n4.nabble.com/Wishlist-a-PackageDevelopment-Task-View-tp4694537p4694625.html
Sent from the R devel mailing list archive at Nabble.com.




[Rd] Error in Writing R Extensions

2013-10-02 Thread John Maindonald
In Section 1.4.2 of Writing R Extensions
 %\VignetteEngine{knitr::knitr}
should be
 %\VignetteEngine{knitr::knit}

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

Is this sort of thing best reported here, or is a bug report in order?

John Maindonald email: john.maindon...@anu.edu.au



[Rd] Failure to get compactPDF to compact a pdf file

2012-01-24 Thread John Maindonald
I am failing to get compactPDF to make any change to a pdf file
that, a/c to the message from the CRAN upload site, can be very
substantially compacted.  Any ideas what may be wrong?

I have also tried recreating the pdf file.  I also tried
R CMD build --resave-data --compact-vignettes DAAG

The data files compact alright (but I get the 'significantly better compression'
warning message that might suggest that this is not happening), but the pdf
file appears to go into the package unmodified.


> tools::compactPDF('/Users/johnm/packages/DAAG/inst/doc/', gs_quality = "ebook")
> dir('/Users/johnm/packages/DAAG/inst/doc/')
[1] "rockArt.pdf"
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.14.1


From the Unix command line:
jhm:doc johnm$ ls -lt /Users/johnm/packages/DAAG/inst/doc
total 1368
-rw-r--r--@ 1 johnm  staff  696762  2 Aug 12:35 rockArt.pdf

Message from the CRAN upload site:

* checking sizes of PDF files under ‘inst/doc’ ... NOTE
 ‘gs’ made some significant size reductions:
compacted ‘rockArt.pdf’ from 680Kb to 58Kb
 consider running tools::compactPDF(gs_quality = "ebook") on these files

John Maindonald email: john.maindon...@anu.edu.au



Re: [Rd] Failure to get compactPDF to compact a pdf file

2012-01-24 Thread John Maindonald
Quoting from the R-2.14.1 help page for compactPDF:

This by default makes use of 'qpdf', available from URL:
http://qpdf.sourceforge.net/ (including as a Windows binary) and
included with the CRAN Mac OS X distribution of R.  If 'gs_cmd' is
non-empty, GhostScript will be used instead.


The defaults are:
compactPDF(paths, qpdf = Sys.getenv("R_QPDF", "qpdf"),
           gs_cmd = Sys.getenv("R_GSCMD", ""),
           gs_quality = c("printer", "ebook", "screen"),
           gs_extras = character())

> Sys.getenv("R_QPDF", "qpdf")
[1] "/Library/Frameworks/R.framework/Resources/bin/qpdf"
> Sys.getenv("R_GSCMD", "")
[1] ""

Thus, as far as I can see, compactPDF is set up (on my system) to use qpdf to 
compress.

I take it then that the Writing R Extensions manual [2.14.1 (2011-12-22)] is 
anticipating what is in R-devel:

The --compact-vignettes option will run tools::compactPDF over the PDF files in 
inst/doc (and its subdirectories) to losslessly compress them. This is not 
enabled by default (it can be selected by environment variable 
_R_BUILD_COMPACT_VIGNETTES_) and needs qpdf (http://qpdf.sourceforge.net/) to be 
available.


John Maindonald email: john.maindon...@anu.edu.au

On 24/01/2012, at 11:22 PM, Prof Brian Ripley wrote:

 On 24/01/2012 08:30, John Maindonald wrote:
 I am failing to get compactPDF to make any change to a pdf file
 that, a/c to the message from the CRAN upload site, can be very
 substantially compacted.  Any ideas what may be wrong?
 
 AFAICS you are quoting a message from R-devel, which tries to find 'gs' for 
 you.  In R 2.14.1 you need to tell compactPDF where it is (assuming you do 
 have it installed): see its help.
 
 I have also tried recreating the pdf file.  I also tried
 R CMD build --resave-data --compact-vignettes DAAG
 
 Again, in R-devel you can do R CMD build --compact-vignettes=gs (assuming 
 that is in your path or R_GSCMD is set), but not in R 2.14.1.
 
 But I have already told you directly (and been ignored) that the problem is 
 the excessive resolution of the embedded bitmap image which needs to be 
 down-sampled.
 
 
 The data files compact alright (but I get the 'significantly better 
 compression'
 warning message that might suggest that this is not happening), but the pdf
 file appears to go into the package unmodified.
 
 
 > tools::compactPDF('/Users/johnm/packages/DAAG/inst/doc/', gs_quality = "ebook")
 > dir('/Users/johnm/packages/DAAG/inst/doc/')
 [1] "rockArt.pdf"
 > sessionInfo()
 R version 2.14.1 (2011-12-22)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 loaded via a namespace (and not attached):
 [1] tools_2.14.1
 
 
 From the Unix command line:
 jhm:doc johnm$ ls -lt /Users/johnm/packages/DAAG/inst/doc
 total 1368
 -rw-r--r--@ 1 johnm  staff  696762  2 Aug 12:35 rockArt.pdf
 
 Message from the CRAN upload site:
 
 * checking sizes of PDF files under ‘inst/doc’ ... NOTE
 ‘gs’ made some significant size reductions:
compacted ‘rockArt.pdf’ from 680Kb to 58Kb
  consider running tools::compactPDF(gs_quality = "ebook") on these files
 
 John Maindonald email: john.maindon...@anu.edu.au
 
 
 
 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [Rd] R-devel Digest, Vol 100, Issue 28

2011-06-29 Thread John Maindonald
I get the same style of path as Hadley.  This is on Windows 7 Home Premium with 
SP1.
I start R by clicking on the R-2.13.0 icon.

I'd assumed that it was a change that came with R-2.13.0!
(On 32-bit Windows XP, which I have just checked, I do indeed get the 8.3 
paths.)

> R.home()
[1] "C:/Programs/R/R-2.13.0"
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252 
[2] LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats graphics  grDevices utils datasets 
[6] methods   base 

John Maindonald email: john.maindon...@anu.edu.au

 From: Duncan Murdoch murdoch.dun...@gmail.com
 Date: 29 June 2011 10:17:46 AM AEST
 To: Hadley Wickham had...@rice.edu
 Cc: Simon Urbanek simon.urba...@r-project.org, r-devel@r-project.org
 Subject: Re: [Rd] Small bug in install.packages?
 
 
 On 28/06/2011 5:42 PM, Hadley Wickham wrote:
 Isn't R.home() 8.3 path anyway?
 
 I don't think so:
 
 > R.home("bin")
 [1] "C:/Program Files/R/R-2.13.0/bin/i386"
 
 Weird.  Like others, I see 8.3 pathnames.  R gets those from a Windows call; 
 what version of Windows are you using?
 
 Duncan Murdoch
 




Re: [Rd] anova.lm fails with test=Cp

2011-05-10 Thread John Maindonald
For unknown sigma^2, a version that is a modification of AIC
may be preferred, i.e.

n log(RSS/n) + 2p - n

I notice that this is what is given in Maindonald and Braun (2010),
Data Analysis & Graphics Using R, 3rd edition.
Cf: Venables and Ripley, MASS, 4th edn, p.174.  VR do however
stop short of actually saying that Cp should be modified in the same
way as AIC when sigma^2 has to be estimated.  

Better still, perhaps, give the AIC statistic.  This would make the
output consistent with dropterm(), drop1() and add1().  Or if Cp
is to stay, allow AIC as a further test.
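For reference, with RSS the residual sum of squares of a model with p coefficients fitted to n observations:

```latex
C_p = \frac{\mathrm{RSS}}{\hat\sigma^2} + 2p - n ,
\qquad
C_p^{*} = n \log(\mathrm{RSS}/n) + 2p - n ,
```

so the modified statistic simply replaces RSS/σ̂² by n log(RSS/n), the term that appears in AIC = n log(RSS/n) + 2p (up to an additive constant).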

John Maindonald email: john.maindon...@anu.edu.au

On 08/05/2011, at 6:15 PM, peter dalgaard wrote:

 
 On May 8, 2011, at 09:25 , John Maindonald wrote:
 
 Here is an example, modified from the help page to use test=Cp:
 
 
 > fit0 <- lm(sr ~ 1, data = LifeCycleSavings)
 > fit1 <- update(fit0, . ~ . + pop15)
 > fit2 <- update(fit1, . ~ . + pop75)
 > anova(fit0, fit1, fit2, test="Cp")
 Error in `[.data.frame`(table, , "Resid. Dev") : 
 undefined columns selected
 
 Yes, the Resid. Dev column is only there in analysis of deviance tables. 
 For the lm() case, it looks like you should have RSS. 
 
 This has probably been there forever. Just goes to show how often people 
 use these things...
 
 Also, now that I'm looking at it, are we calculating it correctly in any 
 case? We have
 
   cbind(table, Cp = table[, "Resid. Dev"] + 2 * scale * 
   (n - table[, "Resid. Df"]))
 
 whereas all the references I can find have Cp=RSS/MS-N+2P, so the above would 
 actually be scale*Cp+N. 
 
 
 -- 
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com
 



[Rd] anova.lm fails with test=Cp

2011-05-08 Thread John Maindonald
Here is an example, modified from the help page to use test="Cp":


> fit0 <- lm(sr ~ 1, data = LifeCycleSavings)
> fit1 <- update(fit0, . ~ . + pop15)
> fit2 <- update(fit1, . ~ . + pop75)
> anova(fit0, fit1, fit2, test="Cp")
Error in `[.data.frame`(table, , "Resid. Dev") : 
 undefined columns selected
> sessionInfo()
R version 2.13.0 Patched (2011-04-28 r55678)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.13.0

--

The help page says for 'test':
a character string specifying the test statistic to be used.
Can be one of "F", "Chisq" or "Cp", with partial matching allowed,
or NULL for no test.

test="Cp" is, according to the help page, intended to work?  Setting
the scale parameter does not help.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [Rd] Compression of largish expression array files in the DAAGbio/inst/doc directory?

2011-04-09 Thread John Maindonald
Thanks.  That seems to work.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 09/04/2011, at 4:58 PM, Prof Brian Ripley wrote:

 As far as I can see read.maimages is built on top of R's own file-reading 
 facilities, and they all read compressed (but not zipped) files as from R 
 2.10.0.
 
 So simply use
 
 gzip -9 coral55?.spot
 
 and rename the files back to *.spot.
 
 If you need more compression, use xz -9e.  (You can also do this in R: 
 readLines() on the file, writeLines() using gzfile or xzfile.)
 
 You will need to make the package 'Depends: R (>= 2.10)'.
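[Editorial sketch of the in-R route just mentioned; the file name is illustrative, and readLines(), writeLines() and gzfile() are base R as stated above.]

```r
## Recompress one .spot file in place, keeping its original name.
## "coral551.spot" is an illustrative file name.
txt <- readLines("coral551.spot")
con <- gzfile("coral551.spot.gz", "w")   # xzfile() compresses harder
writeLines(txt, con)
close(con)
file.rename("coral551.spot.gz", "coral551.spot")
```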
 
 On Sat, 9 Apr 2011, John Maindonald wrote:
 
 The inst/doc directory of the DAAG package has 6 files coral551.spot, ... 
 that
 are around 0.85 MB each.  It would be useful to be able to zip them, but that
 as matters stand interferes with the use of the Sweave file that uses them to
 demonstrate input of expression array data that is in the spot format.  
 They
 do not automatically get unzipped when required.  I have checked that
 read.maimages (in limma) does not, unless I have missed something, have
 an option for reading zipped files.  Is there any way to get around this 
 without
 substantially complicating the exposition in marray-notes.pdf (also in the
 inst/doc subdirectory)?
 
 John Maindonald email: john.maindon...@anu.edu.au
 phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.
 http://www.maths.anu.edu.au/~johnm
 
 
 
 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



[Rd] Compression of largish expression array files in the DAAGbio/inst/doc directory?

2011-04-08 Thread John Maindonald
The inst/doc directory of the DAAG package has 6 files coral551.spot, ... that
are around 0.85 MB each.  It would be useful to be able to zip them, but that
as matters stand interferes with the use of the Sweave file that uses them to
demonstrate input of expression array data that is in the spot format.  They
do not automatically get unzipped when required.  I have checked that 
read.maimages (in limma) does not, unless I have missed something, have 
an option for reading zipped files.  Is there any way to get around this without
substantially complicating the exposition in marray-notes.pdf (also in the
inst/doc subdirectory)?

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [Rd] Standardized Pearson residuals

2011-03-16 Thread John Maindonald
One can easily test for the binary case and not give the statistic in that case.

A general point is that if one gave no output that was not open to abuse,
there'd be nothing given at all!  One would not be giving any output at all
from poisson or binomial models, given that data that really calls for 
quasi links (or a glmm with observation level random effects) is in my
experience the rule rather than the exception!

At the very least, why not a function dispersion() or pearsonchisquare()
that gives this information?
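[Editorial note: a minimal sketch of such a helper; the name dispersion() and its return value are hypothetical, not an existing function.]

```r
## Hypothetical helper: Pearson chi-square and the dispersion
## estimate (chi-square over residual df) for a fitted glm.
dispersion <- function(fit) {
  chisq <- sum(residuals(fit, type = "pearson")^2)
  c(pearson.chisq = chisq, dispersion = chisq / df.residual(fit))
}
```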

Apologies that I misattributed this.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 16/03/2011, at 12:41 AM, peter dalgaard wrote:

 
 On Mar 15, 2011, at 13:42 , John Maindonald wrote:
 
 Peter Dalgaard: It would also be nice for teaching purposes if glm or 
 summary.glm had a
 pearsonchisq component and a corresponding extractor function, but I
 can imagine that there might be arguments against it that haven't
 occured to me.  Plus, I doubt that anyone wants to touch glm unless it's
 to repair a bug. If I'm wrong about all that though, ...
 
 
 Umm, that was Brett, actually.

 This would remedy what I have long judged a deficiency in summary.glm().
 The information is important for diagnostic purposes.  One should not have
 to fit a model with a quasi error, or suss out how to calculate the Pearson
 chi square from the glm model object, to discover that the information in the
 model object is inconsistent with simple binomial or poisson assumptions.
 
 It could be somewhere between useless and misleading in cases like binary 
 logistic regression though. (Same thing goes for the test against the 
 saturated model: Sometimes it makes sense and sometimes not.)
 
 
 John Maindonald email: john.maindon...@anu.edu.au
 phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.
 http://www.maths.anu.edu.au/~johnm
 
 On 15/03/2011, at 10:00 PM, r-devel-requ...@r-project.org wrote:
 
 From: Brett Presnell presn...@stat.ufl.edu
 Date: 15 March 2011 2:40:29 PM AEDT
 To: peter dalgaard pda...@gmail.com
 Cc: r-devel@r-project.org
 Subject: Re: [Rd] Standardized Pearson residuals
 
 
 
 Thanks Peter.  I have just a couple of minor comments, and another
 possible feature request, although it's one that I don't think will be
 implemented.
 
 peter dalgaard pda...@gmail.com writes:
 
 On Mar 14, 2011, at 22:25 , Brett Presnell wrote:
 
 
 Is there any reason that rstandard.glm doesn't have a pearson option?
 And if not, can it be added?
 
 Probably... I have been wondering about that too. I'm even puzzled why
 it isn't the default. Deviance residuals don't have quite the
 properties that one might expect, e.g. in this situation, the absolute
 residuals sum pairwise to zero, so you'd expect that the standardized
 residuals be identical in absolute value
 
 y <- 1:4
 r <- c(0,0,1,1)
 c <- c(0,1,0,1)
 rstandard(glm(y~r+c,poisson))
   1  2  3  4 
 -0.2901432  0.2767287  0.2784603 -0.2839995 
 
 in comparison,
 
 i <- influence(glm(y~r+c,poisson))
 i$pear.res/sqrt(1-i$hat)
   1  2  3  4 
 -0.2817181  0.2817181  0.2817181 -0.2817181 
 
 The only thing is that I'm always wary of tampering with this stuff,
 for fear of finding out the hard way why things are the way they
 are
 
 I'm sure that's wise, but it would be nice to get it in as an option,
 even if it's not the default
 
 Background: I'm currently teaching an undergrad/grad-service course from
 Agresti's Introduction to Categorical Data Analysis (2nd edn) and
 deviance residuals are not used in the text.  For now I'll just provide
 the students with a simple function to use, but I prefer to use R's
 native capabilities whenever possible.
 
 Incidentally, chisq.test will have a stdres component in 2.13.0 for
 much the same reason.
 
 Thank you.  That's one more thing I won't have to provide code for
 anymore.  Coincidentally, Agresti mentioned this to me a week or two ago
 as something that he felt was missing, so that's at least two people who
 will be happy to see this added.
 
 It would also be nice for teaching purposes if glm or summary.glm had a
 pearsonchisq component and a corresponding extractor function, but I
 can imagine that there might be arguments against it that haven't
 occured to me.  Plus, I doubt that anyone wants to touch glm unless it's
 to repair a bug. If I'm wrong about all that though, ...
 
 BTW, as I go along I'm trying to collect a lot of the datasets from the
 examples and exercises in the text into an R package (icda).  It's far
 from complete

Re: [Rd] Standardized Pearson residuals

2011-03-15 Thread John Maindonald
 Peter Dalgaard: It would also be nice for teaching purposes if glm or 
 summary.glm had a
 pearsonchisq component and a corresponding extractor function, but I
 can imagine that there might be arguments against it that haven't
 occured to me.  Plus, I doubt that anyone wants to touch glm unless it's
 to repair a bug. If I'm wrong about all that though, ...

This would remedy what I have long judged a deficiency in summary.glm().
The information is important for diagnostic purposes.  One should not have
to fit a model with a quasi error, or suss out how to calculate the Pearson
chi square from the glm model object, to discover that the information in the
model object is inconsistent with simple binomial or poisson assumptions.
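[Editorial note: the hand calculation alluded to above is short; the sketch below assumes fit is any fitted binomial or poisson glm.]

```r
## Pearson chi-square from a fitted glm; a ratio to the residual
## degrees of freedom well above 1 suggests overdispersion.
pearson.chisq <- sum(residuals(fit, type = "pearson")^2)
pearson.chisq / df.residual(fit)
```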

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 15/03/2011, at 10:00 PM, r-devel-requ...@r-project.org wrote:

 From: Brett Presnell presn...@stat.ufl.edu
 Date: 15 March 2011 2:40:29 PM AEDT
 To: peter dalgaard pda...@gmail.com
 Cc: r-devel@r-project.org
 Subject: Re: [Rd] Standardized Pearson residuals
 
 
 
 Thanks Peter.  I have just a couple of minor comments, and another
 possible feature request, although it's one that I don't think will be
 implemented.
 
 peter dalgaard pda...@gmail.com writes:
 
 On Mar 14, 2011, at 22:25 , Brett Presnell wrote:
 
 
 Is there any reason that rstandard.glm doesn't have a pearson option?
 And if not, can it be added?
 
 Probably... I have been wondering about that too. I'm even puzzled why
 it isn't the default. Deviance residuals don't have quite the
 properties that one might expect, e.g. in this situation, the absolute
 residuals sum pairwise to zero, so you'd expect that the standardized
 residuals be identical in absolute value
 
 y <- 1:4
 r <- c(0,0,1,1)
 c <- c(0,1,0,1)
 rstandard(glm(y~r+c,poisson))
 1  2  3  4 
 -0.2901432  0.2767287  0.2784603 -0.2839995 
 
 in comparison,
 
 i <- influence(glm(y~r+c,poisson))
 i$pear.res/sqrt(1-i$hat)
 1  2  3  4 
 -0.2817181  0.2817181  0.2817181 -0.2817181 
 
 The only thing is that I'm always wary of tampering with this stuff,
 for fear of finding out the hard way why things are the way they
 are
 
 I'm sure that's wise, but it would be nice to get it in as an option,
 even if it's not the default
 
 Background: I'm currently teaching an undergrad/grad-service course from
 Agresti's Introduction to Categorical Data Analysis (2nd edn) and
 deviance residuals are not used in the text.  For now I'll just provide
 the students with a simple function to use, but I prefer to use R's
 native capabilities whenever possible.
 
 Incidentally, chisq.test will have a stdres component in 2.13.0 for
 much the same reason.
 
 Thank you.  That's one more thing I won't have to provide code for
 anymore.  Coincidentally, Agresti mentioned this to me a week or two ago
 as something that he felt was missing, so that's at least two people who
 will be happy to see this added.
 
 It would also be nice for teaching purposes if glm or summary.glm had a
 pearsonchisq component and a corresponding extractor function, but I
 can imagine that there might be arguments against it that haven't
 occured to me.  Plus, I doubt that anyone wants to touch glm unless it's
 to repair a bug. If I'm wrong about all that though, ...
 
 BTW, as I go along I'm trying to collect a lot of the datasets from the
 examples and exercises in the text into an R package (icda).  It's far
 from complete and what is there needs tidying up, but I hope
 eventually to round it into shape and put it on CRAN, assuming that
 Agresti approves and that there are no copyright issues.
 
 I think something along the following lines should do it:
 
 rstandard.glm <-
 function(model,
  infl=influence(model, do.coef=FALSE),
  type=c("deviance", "pearson"), ...)
 {
 type <- match.arg(type)
 res <- switch(type, pearson = infl$pear.res, infl$dev.res)
 res <- res/sqrt(1-infl$hat)
 res[is.infinite(res)] <- NaN
 res
 }
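[Editorial sketch: quick usage for the function above; the model and the base-R esoph dataset are illustrative, not from the original message.]

```r
## With the rstandard.glm above in scope, standardized Pearson
## residuals become available via the type argument:
fit <- glm(cbind(ncases, ncontrols) ~ agegp, family = binomial,
           data = esoph)
rstandard(fit, type = "pearson")
```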
 





[Rd] keep.source when semicolons separate statements on the one line

2011-02-03 Thread John Maindonald
The following is 'semicolon.Rnw'

 \SweaveOpts{engine=R, keep.source=TRUE}
 
 <<xycig-A, eval=f, echo=f>>=
 library(SMIR); data(bronchit); library(KernSmooth)
 @ %
 
 Code for panel A is
 <<code-xycig-A, eval=f, echo=t>>=
 <<xycig-A>>
 @ %

Sweave("semicolon") yields the following 'semicolon.tex'

 Code for panel A is
 \begin{Schunk}
 \begin{Sinput}
 library(SMIR); data(bronchit); library(KernSmooth)
 library(SMIR); data(bronchit); library(KernSmooth)
 library(SMIR); data(bronchit); library(KernSmooth)
 \end{Sinput}
 \end{Schunk}

(I have omitted three blank lines at the start)

With keep.source=FALSE, the commands are split onto 
separate lines, and there is no repetition.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



[Rd] keep.source when semicolons separate statements on the one line; PS

2011-02-03 Thread John Maindonald
I forgot to add the sessionInfo() information:

> sessionInfo()
R version 2.12.1 Patched (2011-01-22 r54081)
Platform: x86_64-pc-mingw32/x64 (64-bit)
. . .


The following is 'semicolon.Rnw'

 \SweaveOpts{engine=R, keep.source=TRUE}
 
 <<xycig-A, eval=f, echo=f>>=
 library(SMIR); data(bronchit); library(KernSmooth)
 @ %
 
 Code for panel A is
 <<code-xycig-A, eval=f, echo=t>>=
 <<xycig-A>>
 @ %

Sweave("semicolon") yields the following 'semicolon.tex'

 Code for panel A is
 \begin{Schunk}
 \begin{Sinput}
 library(SMIR); data(bronchit); library(KernSmooth)
 library(SMIR); data(bronchit); library(KernSmooth)
 library(SMIR); data(bronchit); library(KernSmooth)
 \end{Sinput}
 \end{Schunk}

(I have omitted three blank lines at the start)

With keep.source=FALSE, the commands are split onto 
separate lines, and there is no repetition.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



[Rd] Bug report 14459 -- procedure for handling follow-up issues

2010-12-21 Thread John Maindonald
Although the specific behaviour that was reported has been fixed, bugs 
remain in Sweave's processing of comment lines when keep.source=TRUE.

This is in some senses a follow-up from earlier bugs.  Hence the query --
what is the preferred procedure, to submit a new bug report?  (Another option 
might be to add a comment to the web page for bug 14459.)

Is there now a preference to submit via the web page, rather than send a message
to r-b...@r-project.org?  If so, the relevant paragraph in the FAQ surely 
requires 
updating:


On Unix-like systems a bug report can be generated using the function 
bug.report(). This automatically includes the version information and sends the 
bug to the correct address. Alternatively the bug report can be emailed to 
r-b...@r-project.org or submitted to the Web page at 
http://bugs.R-project.org/. Please try including results of sessionInfo() in 
your bug report.


I have posted files test10.Rnw, test11.Rnw, and test12.Rnw that demonstrate the 
bugs at 
http://www.maths.anu.edu.au/~johnm/r/issues/
The output files test10.tex, test11.tex and test12.tex are from r53870 on 
x86_64-apple-darwin9.8.0/x86_64 (64-bit)

test10.Rnw has a code chunk that begins and ends with a comment.  
An NA appears following the final comment.  This disappears if I
remove the initial comment line.

test11.Rnw follows a comment line with a named code chunk.  The
comment line does not appear in the output.

test12.Rnw places a line of code between the comment line and the
named code chunk.  The comment line does now appear in the output.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [Rd] Bug report 14459 -- procedure for handling follow-up issues

2010-12-21 Thread John Maindonald
Thanks.  It is useful to have a list of items that are outstanding.
I will experiment a bit more, but may revert to using R-2.11.1 for
running Sweave().  Did any of these issues arise for R-2.11.1?

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 22/12/2010, at 3:14 AM, Duncan Murdoch wrote:

 On 21/12/2010 3:23 AM, John Maindonald wrote:
 Although the specific behaviour that was reported has been fixed, bugs
 remain in Sweave's processing of comment lines when keep.source=TRUE.
 
 This is in some senses a follow-up from earlier bugs.  Hence the query --
 what is the preferred procedure, to submit a new bug report?  (Another option
 might be to add a comment to the web page for bug 14459.)
 
 Is there now a preference to submit via the web page, rather than send a 
 message
 to r-b...@r-project.org?  If so, the relevant paragraph in the FAQ surely 
 requires
 updating:
 
 
 On Unix-like systems a bug report can be generated using the function 
 bug.report(). This automatically includes the version information and sends 
 the bug to the correct address. Alternatively the bug report can be emailed 
 to r-b...@r-project.org or submitted to the Web page at 
 http://bugs.R-project.org/. Please try including results of sessionInfo() in 
 your bug report.
 
 
 I have posted files test10.Rnw, test11.Rnw, and test12.Rnw that demonstrate 
 the bugs at
 http://www.maths.anu.edu.au/~johnm/r/issues/
 The output files test10.tex, test11.tex and test12.tex are from r53870 on
 x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 test10.Rnw has a code chunk that begins and ends with a comment.
 An NA appears following the final comment.  This disappears if I
 remove the initial comment line.
 
 This is now fixed.  It was a different bug than 14459.
 
 test11.Rnw follows a comment line with a named code chunk.  The
 comment line does not appear in the output.
 
 test12.Rnw places a line of code between the comment line and the
 named code chunk.  The comment line does now appear in the output.
 
 These look like a different issue, and are still unfixed, and are unlikely to 
 be fixed soon.
 
 The problem is that the handling of source references in Sweave is messy, and 
 needs a major cleanup, which takes time.  Between now and at least 
 mid-February I won't have the time it would take, and I don't know anyone 
 else who would do it.  So I would not bet on these fixes getting done before 
 2.13.0.
 
 The problems I know about are these:
 
 - if you use a named chunk <<chunkname>> in another, you won't get
 leading and trailing comments on the named chunk.
 
 - if you mix named chunks and \SweaveInput, you won't get the original source 
 at all in the expanded chunks.
 
 Your examples look like the first of these.  I had thought the comments had 
 to be in the chunk to get lost, but apparently not.
 
 Just to make priorities clear:  in the short term I will fix bugs where NAs 
 show up inappropriately.  I will not fix bugs involving dropping leading or 
 trailing comments when there are simple workarounds.  (The workaround in your 
 case is not to use the named chunk.)
 
 Duncan Murdoch
 
 
 
 John Maindonald email: john.maindon...@anu.edu.au
 phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.
 http://www.maths.anu.edu.au/~johnm
 
 



[Rd] Wishlist for plot.lm() (PR#13560)

2009-02-28 Thread john . maindonald
Full_Name: John Maindonald
Version: R-2.8.1
OS: MacOS X 10.5.6
Submission from: (NULL) (203.173.3.75)


The following code demonstrates an annoyance with plot.lm():

library(DAAGxtras)
x11(width=3.75, height=4)
nihills.lm <- lm(log(time) ~ log(dist) + log(climb), data = nihills)
plot(nihills.lm, which=5)

OR try the following
xy <- data.frame(x=c(3,1:5), y=c(-2, 1:5))
plot(lm(y ~ x, data=xy), which=5)

The Cook's distance text overplots the label for the point with the smallest
residual.  This is an issue when the size of the plot is much less than the
default, and the pointsize is not reduced proportionately.


I suggest the following:
 xx <- hii
 xx[xx >= 1] <- NA
## Insert new code
 fracht <- (1.25*par()$cin[2])/par()$pin[2]
 ylim[1] <- ylim[1] - diff(ylim)*max(0, fracht-0.04)
## End insert new code
 plot(xx, rsp, xlim = c(0, max(xx, na.rm = TRUE)),
  ylim = ylim, main = main, xlab = "Leverage",
  ylab = ylab5, type = "n", ...)

Then, about 15 lines further down, replace
   legend("bottomleft", legend = "Cook's distance",
  lty = 2, col = 2, bty = "n")

by
   legend("bottomleft", legend = "Cook's distance",
  lty = 2, col = 2, text.col=2, bty = "n", y.intersp=0.5)
 # This changes the legend color to agree with the line color

Another possibility, suggested by John Fox, is to replace the caption by Cook's
distance contours, and omit the legend entirely.  

Both John Fox and myself are comfortable with either of these fixes.

Test the changes with:
x11()
nihills.lm <- lm(log(time) ~ log(dist) + log(climb), data = nihills)
plot(nihills.lm, which=5)
xy <- data.frame(x=c(3,1:5), y=c(-2, 1:5))
plot(lm(y ~ x, data=xy), which=5)
x11(width=3.75, height=4)
plot(nihills.lm, which=5)
plot(lm(y ~ x, data=xy), which=5)



Re: [Rd] plot.lm: Cook's distance label can overplot point labels

2009-02-19 Thread John Maindonald
Actually, the contours and the smooth are currently printed with
col=2.  This prints satisfactorily in grayscale.  Colours ("orange"
and "darkred", as well as col=2) are also used in termplot.


Does the stricture against colour extend to grayscale?  Does it  
apply to lines as well as text?


John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 19/02/2009, at 5:58 PM, Prof Brian Ripley wrote:


On Wed, 18 Feb 2009, John Fox wrote:


Dear John,


-Original Message-
From: John Maindonald [mailto:john.maindon...@anu.edu.au]
Sent: February-18-09 4:57 PM
To: John Fox
Cc: 'Martin Maechler'; r-devel@r-project.org
Subject: Re: [Rd] plot.lm: Cook's distance label can overplot point labels


Dear John -
The title above the graph is also redundant for the first of the
plots; do we want to be totally consistent?  I am not sure.


Why not? A foolish consistency is the hobgoblin of little minds,  
but maybe

this isn't a foolish consistency.



It occurs to me that the text "Cook's distance", as well as the
contours, might be in red.


That would provide a nice visual cue (for those who aren't colour  
blind).


Or using a black-and-white device.  We have not hitherto assumed a  
colour device in 'stats' graphics, and given how often they are  
printed I don't think we want to start.


As so often, it seems that what looks good is in the eye of the  
beholder.  If the two of you can agree on something that you both  
see is a definite improvement, please provide a patch and examples  
to try to persuade everyone else.  (As a Wishlist item on R-bugs, so  
it gets recorded.)




Best,
John


Regards
John.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 18/02/2009, at 12:27 PM, John Fox wrote:


Dear John,

It occurs to me that the title above the graph, "Residuals vs.
Leverage", is entirely redundant since the x-axis is labelled
"Leverage" and the y-axis "Studentized residuals". Why not use the
title above the graph for Cook's distance contours?

Regards,
John


-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
On Behalf Of John Maindonald
Sent: February-17-09 5:54 PM
To: r-devel@r-project.org
Cc: Martin Maechler
Subject: [Rd] plot.lm: Cook's distance label can overplot point
labels

The following code demonstrates an annoyance with plot.lm():

library(DAAGxtras)
x11(width=3.75, height=4)
nihills.lm <- lm(log(time) ~ log(dist) + log(climb), data = nihills)

plot(nihills.lm, which=5)

OR try the following
xy <- data.frame(x=c(3,1:5), y=c(-2, 1:5))
plot(lm(y ~ x, data=xy), which=5)

The Cook's distance text overplots the label for the point  
with the

smallest residual.  This is an issue when the size of the plot is
much
less than the default, and the pointsize is not reduced
proportionately.


I suggest the following:
   xx <- hii
   xx[xx >= 1] <- NA
## Insert new code
   fracht <- (1.25*par()$cin[2])/par()$pin[2]
   ylim[1] <- ylim[1] - diff(ylim)*max(0, fracht-0.04)
## End insert new code
   plot(xx, rsp, xlim = c(0, max(xx, na.rm = TRUE)),
ylim = ylim, main = main, xlab = "Leverage",
ylab = ylab5, type = "n", ...)

Then, about 15 lines further down, replace
  legend("bottomleft", legend = "Cook's distance",
 lty = 2, col = 2, bty = "n")

by
  legend("bottomleft", legend = "Cook's distance",
 lty = 2, col = 2, bty = "n", y.intersp=0.5)

If this second change is not made, then one wants
fracht <- (1.5*par()$cin[2])/par()$pin[2]
I prefer the "Cook's distance" text to be a bit closer to the x-axis,
as it separates it more clearly from any point labels.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595




Re: [Rd] plot.lm: Cook's distance label can overplot point labels

2009-02-18 Thread John Maindonald

Dear John -
The title above the graph is also redundant for the first of the  
plots; do we want to be totally consistent?  I am not sure.


It occurs to me that the text "Cook's distance", as well as the
contours, might be in red.

Regards
John.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 18/02/2009, at 12:27 PM, John Fox wrote:


Dear John,

It occurs to me that the title above the graph, "Residuals vs.
Leverage", is entirely redundant since the x-axis is labelled
"Leverage" and the y-axis "Studentized residuals". Why not use the
title above the graph for Cook's distance contours?

Regards,
John


-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
On Behalf Of John Maindonald
Sent: February-17-09 5:54 PM
To: r-devel@r-project.org
Cc: Martin Maechler
Subject: [Rd] plot.lm: Cook's distance label can overplot point  
labels


The following code demonstrates an annoyance with plot.lm():

library(DAAGxtras)
x11(width=3.75, height=4)
nihills.lm <- lm(log(time) ~ log(dist) + log(climb), data = nihills)
plot(nihills.lm, which=5)

OR try the following
xy <- data.frame(x=c(3,1:5), y=c(-2, 1:5))
plot(lm(y ~ x, data=xy), which=5)

The Cook's distance text overplots the label for the point with the
smallest residual.  This is an issue when the size of the plot is  
much
less than the default, and the pointsize is not reduced  
proportionately.



I suggest the following:
 xx <- hii
 xx[xx >= 1] <- NA
## Insert new code
 fracht <- (1.25*par()$cin[2])/par()$pin[2]
 ylim[1] <- ylim[1] - diff(ylim)*max(0, fracht-0.04)
## End insert new code
 plot(xx, rsp, xlim = c(0, max(xx, na.rm = TRUE)),
  ylim = ylim, main = main, xlab = "Leverage",
  ylab = ylab5, type = "n", ...)

Then, about 15 lines further down, replace
   legend("bottomleft", legend = "Cook's distance",
  lty = 2, col = 2, bty = "n")

by
   legend("bottomleft", legend = "Cook's distance",
  lty = 2, col = 2, bty = "n", y.intersp=0.5)

If this second change is not made, then one wants
fracht <- (1.5*par()$cin[2])/par()$pin[2]
I prefer the Cook's distance text to be a bit closer to the x-axis,
as it separates it more clearly from any point labels.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] plot.lm: Cook's distance label can overplot point labels

2009-02-17 Thread John Maindonald

The following code demonstrates an annoyance with plot.lm():

library(DAAGxtras)
x11(width=3.75, height=4)
nihills.lm <- lm(log(time) ~ log(dist) + log(climb), data = nihills)
plot(nihills.lm, which=5)

OR try the following
xy <- data.frame(x=c(3,1:5), y=c(-2, 1:5))
plot(lm(y ~ x, data=xy), which=5)

The Cook's distance text overplots the label for the point with the  
smallest residual.  This is an issue when the size of the plot is much  
less than the default, and the pointsize is not reduced proportionately.



I suggest the following:
  xx <- hii
  xx[xx >= 1] <- NA
## Insert new code
  fracht <- (1.25*par()$cin[2])/par()$pin[2]
  ylim[1] <- ylim[1] - diff(ylim)*max(0, fracht-0.04)
## End insert new code
  plot(xx, rsp, xlim = c(0, max(xx, na.rm = TRUE)),
       ylim = ylim, main = main, xlab = "Leverage",
       ylab = ylab5, type = "n", ...)

Then, about 15 lines further down, replace
legend("bottomleft", legend = "Cook's distance",
       lty = 2, col = 2, bty = "n")

by
legend("bottomleft", legend = "Cook's distance",
       lty = 2, col = 2, bty = "n", y.intersp=0.5)

If this second change is not made, then one wants
fracht <- (1.5*par()$cin[2])/par()$pin[2].
I prefer the Cook's distance text to be a bit closer to the x-axis,
as it separates it more clearly from any point labels.


John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel Digest, Vol 62, Issue 24

2008-04-24 Thread John Maindonald
The columns of the model matrix are all orthogonal.  So the problem  
lies with poly(), not with lm().

  x <- rep(1:5, 3)
  y <- rnorm(15)
  z <- model.matrix(lm(y ~ poly(x, 12)))

   round(crossprod(z), 15)
## [13 x 13 matrix; output condensed: 15 in the (Intercept, Intercept)
## cell, 1 everywhere else on the diagonal, and 0 in every off-diagonal
## cell -- i.e. the columns of z are mutually orthogonal]
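The orthogonality can be verified mechanically. Current versions of R refuse a degree greater than or equal to the number of distinct points, so this sketch uses degree 4; the seed is arbitrary:

```r
# Check that the columns of the model matrix from poly() are mutually
# orthogonal: crossprod(z) should be diagonal, with 15 (= n) for the
# all-ones intercept column and 1 for each orthonormal polynomial column.
set.seed(1)                          # arbitrary seed, for reproducibility
x <- rep(1:5, 3)
y <- rnorm(15)
z <- model.matrix(lm(y ~ poly(x, 4)))
xp <- round(crossprod(z), 10)
stopifnot(all(xp[upper.tri(xp)] == 0),   # off-diagonal entries are 0
          xp[1, 1] == 15,                # intercept column has length n
          all(diag(xp)[-1] == 1))        # polynomial columns are unit length
```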

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 24 Apr 2008, at 8:00 PM, [EMAIL PROTECTED] wrote:
 From: [EMAIL PROTECTED]
 Date: 24 April 2008 3:05:28 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: [Rd] poly() can exceed degree k - 1 for k distinct points  
 (PR#11251)


 The poly() function can create more variables than can be fitted when
 there are replicated values.  In the example below, 'x' has only 5
 distinct values, but I can apparently fit a 12th-degree polynomial  
 with
 no error messages or even nonzero coefficients:

 R> x = rep(1:5,3)
 R> y = rnorm(15)
 R> lm(y ~ poly(x, 12))

 Call:
 lm(formula = y ~ poly(x, 12))

 Coefficients:
    (Intercept)   poly(x, 12)1   poly(x, 12)2   poly(x, 12)3
       -0.27442        0.35822       -0.26412        2.11780
   poly(x, 12)4   poly(x, 12)5   poly(x, 12)6   poly(x, 12)7
        1.83117       -0.09260       -0.48572        1.94030
   poly(x, 12)8   poly(x, 12)9  poly(x, 12)10  poly(x, 12)11
       -0.88297       -1.04556        0.74289       -0.01422
  poly(x, 12)12
       -0.46548




 If I try the same with raw=TRUE, only a 4th-degree polynomial is  
 obtained:

 R> lm(y ~ poly(x, 12, raw=TRUE))

 Call:
 lm(formula = y ~ poly(x, 12, raw = TRUE))

 Coefficients:
                (Intercept)   poly(x, 12, raw = TRUE)1
                     9.7527                   -22.0971
   poly(x, 12, raw = TRUE)2   poly(x, 12, raw = TRUE)3
                    15.3293                    -4.1005
   poly(x

Re: [Rd] R-devel Digest, Vol 62, Issue 24

2008-04-24 Thread John Maindonald
Actually, this may be a useful feature!  It allows calculation of a
basis for the orthogonal complement of the space spanned by
model.matrix(lm(y ~ poly(x, 12))).  However, the default ought surely to
be to disallow df > k-1 in poly(x, df), where k = length(unique(x)).
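A sketch of the suggested guard follows. safe_poly() is a hypothetical wrapper, not part of R; more recent versions of R build an equivalent check into poly() itself:

```r
# Hypothetical wrapper enforcing df <= k - 1, where k = length(unique(x))
safe_poly <- function(x, degree, ...) {
  k <- length(unique(x))
  if (degree > k - 1)
    stop("'degree' must be at most length(unique(x)) - 1")
  poly(x, degree, ...)
}
x <- rep(1:5, 3)
p <- safe_poly(x, 4)                          # allowed: 4 <= 5 - 1
bad <- try(safe_poly(x, 12), silent = TRUE)   # refused with an error
stopifnot(ncol(p) == 4, inherits(bad, "try-error"))
```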

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 24 Apr 2008, at 8:00 PM, [EMAIL PROTECTED] wrote:

 From: [EMAIL PROTECTED]
 Date: 24 April 2008 3:05:28 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: [Rd] poly() can exceed degree k - 1 for k distinct points  
 (PR#11251)


 The poly() function can create more variables than can be fitted when
 there are replicated values.  In the example below, 'x' has only 5
 distinct values, but I can apparently fit a 12th-degree polynomial  
 with
 no error messages or even nonzero coefficients:

 R> x = rep(1:5,3)
 R> y = rnorm(15)
 R> lm(y ~ poly(x, 12))

 Call:
 lm(formula = y ~ poly(x, 12))

 Coefficients:
    (Intercept)   poly(x, 12)1   poly(x, 12)2   poly(x, 12)3
       -0.27442        0.35822       -0.26412        2.11780
   poly(x, 12)4   poly(x, 12)5   poly(x, 12)6   poly(x, 12)7
        1.83117       -0.09260       -0.48572        1.94030
   poly(x, 12)8   poly(x, 12)9  poly(x, 12)10  poly(x, 12)11
       -0.88297       -1.04556        0.74289       -0.01422
  poly(x, 12)12
       -0.46548

snip
snip

 [I thought I submitted this via the website yesterday, but I can  
 find no
 trace of it.  I apologize if this is a duplicate, but I don't think  
 it is.]
 -- 
 Russell V. Lenth, Professor
 Department of Statistics & Actuarial Science
 (319)335-0814   FAX (319)335-3017
 The University of Iowa   [EMAIL PROTECTED]
 Iowa City, IA 52242  USA http://www.stat.uiowa.edu/~rlenth/


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] html help fails for named vector objects (PR#9927)

2007-09-22 Thread John . Maindonald
   help(letters, htmlhelp=TRUE) fails.

Under the Mac OS X GUI, the message is 'Help for the topic a was not
found.'  Under the version documented below, and under Windows, the
message is

   No documentation for 'a' in specified packages and libraries:
repeated for all the elements of letters, then followed by
   you could try 'help.search("a")',
again repeated for all elements of letters.

The outcome seems similar for any character vector (including matrix)  
object, e.g. the matrix 'primateDNA' in the DAAGbio package.

The following have the expected result
   help("letters", htmlhelp=TRUE)
   help(letters, htmlhelp=FALSE)

The same result is obtained with R-2.5.1.


--please do not edit the information below--

Version:
platform = i386-apple-darwin8.10.1
arch = i386
os = darwin8.10.1
system = i386, darwin8.10.1
status = beta
major = 2
minor = 6.0
year = 2007
month = 09
day = 22
svn rev = 42941
language = R
version.string = R version 2.6.0 beta (2007-09-22 r42941)

Locale:
C

Search Path:
.GlobalEnv, package:testpkg, package:stats, package:graphics,  
package:grDevices, package:utils, package:datasets, package:methods,  
Autoloads, package:base

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] substitute and expression (Peter Dalgaard)

2007-07-19 Thread John Maindonald
In this connection, note the following

  a4 <- 4
  plotThis <- bquote(alpha == .(a), list(a = a4))
  do.call(plot, list(1:10, main = do.call(expression, c(plotThis))))
  do.call(plot, list(1:10, main = do.call(expression, plotThis)))
Error in do.call(expression, plotThis) : second argument must be a list

  ## Whereas plotThis has class "call", c(plotThis) has class "list"
  class(plotThis)
[1] "call"
  class(c(plotThis))
[1] "list"

  ## Thus, the following is possible:
  do.call(plot, list(1:10, main = do.call(expression, list(plotThis))))


Marc Schwartz pointed out to me, some considerable time ago,
that one could use bquote() and .() to create the elements of a
list object whose elements can be plotted in parallel as required,
e.g., for axis labels, thus:

  plot(1:2, 1:2, xaxt = "n")
  arg1 <- bquote("" < .(x), list(x = 1.5))
  arg2 <- bquote("" >= .(x), list(x = 1.5))
  axis(1, at = 1:2, labels = do.call(expression, list(arg1, arg2)))

For a unified approach to use of do.call(expression, ...), maybe
one should use bquote() and .()?

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 18 Jul 2007, at 8:00 PM, [EMAIL PROTECTED] wrote:

 From: Peter Dalgaard [EMAIL PROTECTED]
 Date: 18 July 2007 1:39:50 AM
 To: Deepayan Sarkar [EMAIL PROTECTED]
 Cc: R Development Mailing List [EMAIL PROTECTED]
 Subject: Re: [Rd] substitute and expression


 Deepayan Sarkar wrote:
 On 7/16/07, Peter Dalgaard [EMAIL PROTECTED] wrote:

 Deepayan Sarkar wrote:

 Hi,

 I'm trying to understand whether the use of substitute() is
 appropriate/documented for plotmath annotation. The following two
 calls give the same results:



 plot(1:10, main = expression(alpha == 1))
 do.call(plot, list(1:10, main = expression(alpha == 1)))


 But not these two:



 plot(1:10, main = substitute(alpha == a, list(a = 2)))
 do.call(plot, list(1:10, main = substitute(alpha == a, list(a = 2))))


 Error in as.graphicsAnnot(main) : object "alpha" not found

 (as a consequence, xyplot(..., main = substitute(alpha)) doesn't
 currently work.)

 On the other hand, this works:



 foo - function(x) plot(1, main = x)
 foo(substitute(alpha))


 I'm not sure how to interpret ?plotmath; it says

  If the 'text' argument to one of the text-drawing functions
  ('text', 'mtext', 'axis', 'legend') in R is an expression, the
  argument is interpreted as a mathematical expression...

 and uses substitute() in its examples, but



 is.expression(substitute(alpha == a, list(a = 1)))


 [1] FALSE


 I think you need to take plotmath out of the equation and study the
 difference between objects of mode call and those of mode
 expression. Consider this:

 f <- function(...) match.call()
 do.call(f, list(1:10, main = substitute(alpha == a, list(a = 2))))
 function(...) match.call()
 (1:10, main = alpha == 2)
 do.call(list, list(1:10, main = substitute(alpha == a, list(a = 2))))
 Error in do.call(list, list(1:10, main = substitute(alpha == a,
 list(a = 2)))) :
 object "alpha" not found

 The issue is that the function ends up with an argument "alpha == 2"
 which it proceeds to evaluate (lazily), where a direct call sees
 substitute(.).  It is a general problem with the do.call mechanism
 that it effectively pre-evaluates the argument list, which can confuse
 functions that rely on accessing the original argument expression.
 Try, e.g., do.call(plot, list(airquality$Wind, airquality$Ozone)) and
 watch the axis labels.


 Right. Lazy evaluation was the piece I was missing.


 Does it work if you use something like

  main = substitute(quote(alpha == a), list(a = 2))?


 Not for xyplot, though I haven't figured out why. Turns out this also
 doesn't work:


 plot(y ~ x, data = list(x = 1:10, y = 1:10), main = substitute(alpha))

 Error in as.graphicsAnnot(main) : object "alpha" not found

 I'll take this to mean that the fact that substitute() works  
 sometimes
 (for plotmath) is an undocumented side effect of the implementation
 that should not be relied upon.

 Probably the correct solution is to use expression objects.  More or
 less the entire reason for their existence is this sort of surprise.

 plot(y ~ x, data = list(x = 1:10, y = 1:10), main =
 as.expression(substitute(alpha == a, list(a = 2))))

 I'm not going to investigate why this is necessary in connection with
 plot(), but the core issue is probably

 e <- quote(f(x)) ; e[[2]] <- quote(2+2)
 e
 f(2 + 2)
 f <- quote(f(2+2))
 identical(e, f)
 [1] TRUE

 notice that since the two calls are identical, there is no way for "e" to
 detect that it was called with "x" replaced by an object of mode "call".
 Or put differently, objects of mode "call" tend to lose their
 personality in connection with computing on the language.


 -- 
O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
   c

Re: [Rd] termplot - changes in defaults

2007-07-03 Thread John Maindonald
While termplot is under discussion, here's another proposal. I'd like to
change the default for partial.resid to TRUE, and for smooth to
panel.smooth.  I'd be surprised if those changes were to break existing
code.
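For comparison, the proposed behaviour can already be requested explicitly; the shipped defaults are partial.resid = FALSE and smooth = NULL. A minimal example on a standard dataset:

```r
# Explicitly request the proposed defaults: partial residuals plotted,
# with a lowess smooth (panel.smooth) drawn through them
fit <- lm(dist ~ speed, data = cars)
termplot(fit, partial.resid = TRUE, smooth = panel.smooth)
```

The proposal would simply make this the behaviour of a bare termplot(fit) call.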

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On Mon, 2 Jul 2007, [EMAIL PROTECTED] wrote:

 Precisely.  Thanks Brian.

 I did do something like this but not nearly so elegantly.

 I suggest this become the standard version in the next release.  I can't

Yes, that was the intention (to go into R-devel).
(It was also my intention to attach as plain text, but my Windows mailer
seems to have defeated that.)

 see that it can break any existing code.  It's a pity now we can't make
 ylim = "common" the default.

I suspect we could if I allow a way to get the previous behaviour
(ylim = "free", I think).

Brian

 Regards,
 Bill V.


 Bill Venables
 CSIRO Laboratories
 PO Box 120, Cleveland, 4163
 AUSTRALIA
 Office Phone (email preferred): +61 7 3826 7251
 Fax (if absolutely necessary):  +61 7 3826 7304
 Mobile: +61 4 8819 4402
 Home Phone: +61 7 3286 7700
 mailto:[EMAIL PROTECTED]
 http://www.cmis.csiro.au/bill.venables/

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Monday, 2 July 2007 7:55 PM
 To: Venables, Bill (CMIS, Cleveland)
 Cc: [EMAIL PROTECTED]
 Subject: Re: [Rd] termplot with uniform y-limits

 Is the attached the sort of thing you are looking for?
 It allows ylim to be specified, including as "common".

 On Mon, 2 Jul 2007, [EMAIL PROTECTED] wrote:

 Does anyone have, or has anyone ever considered making, a version of
 'termplot' that allows the user to specify that all plots should have
 the same y-limits?

 This seems a natural thing to ask for, as the plots share a y-scale.
 If
 you don't have the same y-axes you can easily misread the comparative
 contributions of the different components.

 Notes: the current version of termplot does not allow the user to
 specify ylim.  I checked.

   the plot tools that come with mgcv do this by default.  Thanks
 Simon.


 Bill Venables
 CSIRO Laboratories
 PO Box 120, Cleveland, 4163
 AUSTRALIA
 Office Phone (email preferred): +61 7 3826 7251
 Fax (if absolutely necessary):  +61 7 3826 7304
 Mobile: +61 4 8819 4402
 Home Phone: +61 7 3286 7700
 mailto:[EMAIL PROTECTED]
 http://www.cmis.csiro.au/bill.venables/

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Levels attribute in integer columns created by model.frame()

2007-05-01 Thread John Maindonald
I get

 worms.glm <- glm(cbind(deaths, (20-deaths)) ~ sex + doselin,
+                 data = worms, family = binomial)
  attr(worms.glm, "dataClasses")
NULL

But maybe the result from somewhere within predict.lm() or model.frame()
is different.

Surely the levels attribute has no relevance to glm's computations with
the doselin term.  It has treated it as numeric.  In my view, either
predict() should maintain the stance (pretence?) that it is numeric, or
else the call to .checkMFClasses() that follows on the use of glm()
should report at least a warning.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 1 May 2007, at 4:33 PM, Prof Brian Ripley wrote:

 Stripping attributes from a column in model.frame would be highly
 undesirable.

 The mistake was using 'unclass' when the intention was to remove
 the levels (I presume).  The new variable given is correctly
 reported as not matching that used during fitting.

 Use of traceback() would have shown that the error is not reported
 from model.frame (as claimed) but from

 4: .checkMFClasses(cl, m)
 3: predict.lm(object, newdata, se.fit, scale = 1,
        type = ifelse(type == "link", "response", type),
        terms = terms, na.action = na.action)
 2: predict.glm(worms.glm, new = data.frame(sex = 1, doselin = 6))
 1: predict(worms.glm, new = data.frame(sex = 1, doselin = 6))

 The reason the class is reported as "other" is clear from
 attr(worms.glm, "dataClasses").  This comes from .MFclass.


 On Tue, 1 May 2007, John Maindonald wrote:

 The following is evidence of what is surely an undesirable feature.
 The issue is the handling, in calls to model.frame(), of  an
 explanatory variable that has been derived as an unclassed
 factor. (Ross Darnell drew this to my attention.)

 He has already filed a bug report on it, without saying what he  
 thinks the bug is.

 ## Data are slightly modified from p.191 of MASS
  worms <- data.frame(sex = gl(2,6), Dose = factor(rep(2^(0:5), 2)),
 +                    deaths = c(1,4,9,13,18,20,0,2,6,10,12,16))
  worms$doselin <- unclass(worms$Dose)
  class(worms$doselin)
 [1] "integer"
  attributes(worms$doselin)
 $levels
 [1] "1"  "2"  "4"  "8"  "16" "32"

  worms.glm <- glm(cbind(deaths, (20-deaths)) ~ sex + doselin,
 +                 data = worms, family = binomial)
  predict(worms.glm, new = data.frame(sex = 1, doselin = 6))
 Error: variable 'doselin' was fitted with class "other" but class
 "numeric" was supplied
 In addition: Warning message:
 variable 'doselin' is not a factor in: model.frame.default(Terms,
 newdata, na.action = na.action, xlev = object$xlevels)


 The error is reported in the call to model.frame() from predict.lm()
 which is called by predict.glm().  It is not clear to me why this
 call to model.frame identifies the class that should be expected as
 "other".

 The problem might be fixed by stripping the levels attribute from
 any column created by model.frame() that is integer or numeric.

  ###
 
  ## Note the following
  mframe <- model.frame(cbind(deaths, (20-deaths)) ~ sex + doselin,
 +                      data = worms)
  class(mframe$doselin)
 [1] "integer"
  attributes(mframe$doselin)
 $levels
 [1] "1"  "2"  "4"  "8"  "16" "32"


 John Maindonald email: [EMAIL PROTECTED]
 phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Levels attribute in integer columns created by model.frame()

2007-04-30 Thread John Maindonald
The following is evidence of what is surely an undesirable feature.
The issue is the handling, in calls to model.frame(), of  an
explanatory variable that has been derived as an unclassed
factor. (Ross Darnell drew this to my attention.)

## Data are slightly modified from p.191 of MASS
  worms <- data.frame(sex = gl(2,6), Dose = factor(rep(2^(0:5), 2)),
+                     deaths = c(1,4,9,13,18,20,0,2,6,10,12,16))
  worms$doselin <- unclass(worms$Dose)
  class(worms$doselin)
[1] "integer"
  attributes(worms$doselin)
$levels
[1] "1"  "2"  "4"  "8"  "16" "32"

  worms.glm <- glm(cbind(deaths, (20-deaths)) ~ sex + doselin,
+                  data = worms, family = binomial)
  predict(worms.glm, new = data.frame(sex = 1, doselin = 6))
Error: variable 'doselin' was fitted with class "other" but class
"numeric" was supplied
In addition: Warning message:
variable 'doselin' is not a factor in: model.frame.default(Terms,
newdata, na.action = na.action, xlev = object$xlevels)


The error is reported in the call to model.frame() from predict.lm()
which is called by predict.glm().  It is not clear to me why this call to
model.frame identifies the class that should be expected as "other".

The problem might be fixed by stripping the levels attribute from
any column created by model.frame() that is integer or numeric.

  ###
 
  ## Note the following
  mframe <- model.frame(cbind(deaths, (20-deaths)) ~ sex + doselin,
+                       data = worms)
  class(mframe$doselin)
[1] "integer"
  attributes(mframe$doselin)
$levels
[1] "1"  "2"  "4"  "8"  "16" "32"
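Pending any change in model.frame(), the symptom can be sidestepped on the user side. This is a hedged workaround, not a fix to model.frame() itself: coerce with as.vector() so that no levels attribute survives in the derived column.

```r
# Workaround sketch: as.vector() (or as.numeric()) drops all attributes,
# so the derived column is a plain integer vector with no stray levels
worms <- data.frame(sex = gl(2, 6), Dose = factor(rep(2^(0:5), 2)),
                    deaths = c(1,4,9,13,18,20,0,2,6,10,12,16))
worms$doselin <- as.vector(unclass(worms$Dose))  # instead of bare unclass()
stopifnot(is.null(attributes(worms$doselin)))    # no levels attribute left
```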


John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] read.table() errors with tab as separator (PR#9061)

2006-07-05 Thread John . Maindonald
(1) read.table(), with sep="\t", identifies 13 out of 1400 records,
in a file with 1400 records of 3 fields each, as having only 2 fields.
This happens under version 2.3.1 for Windows as well as with
R 2.3.1 for Mac OS X, and with R-devel under Mac OS X.
[R version 2.4.0 Under development (unstable) (2006-07-03 r38478)]

(2) Using read.table() with sep="\t", the first 1569 records only
of a 1821 record file are input.  The file has exactly two fields
in each record, and the minimum length of the second field is
1 character.  If however I extract lines 1561 to 1650 from the
file (the file short.txt below), all 90 lines are input.

  webtwo <- "http://www.maths.anu.edu.au/~johnm/testfiles/twotabs.txt"
  xy <- read.table(url(webtwo), sep = "\t")
Warning message:
number of items read is not a multiple of the number of columns
  z <- count.fields(url(webtwo), sep = "\t")
  table(z)
z
   2    3
  13 1387
  table(sapply(strsplit(readLines(url(webtwo)), split = "\t"), length))

   3
1400
  readLines(url(webtwo))[z == 2][9:13]  # last 5 as a sample (shorter lines)
[1] "865\tlinear model (lm)! Cook's distance\t152"
[2] "1019\tlinear model (lm)! Cook's distance\t177"
[3] "1048\tlinear model (lm)! Cook's distance\t183"
[4] "1082\tlinear model (lm)! Cook's distance\t187"
[5] "1220\tlinear model (lm)! Cook's distance\t214"
  weblong <- "http://www.maths.anu.edu.au/~johnm/testfiles/long.txt"
  webshort <- "http://www.maths.anu.edu.au/~johnm/testfiles/short.txt"
  xyLong <- read.table(url(weblong), sep = "\t")
  dim(xyLong)    # Should be 1821 x 2
[1] 1569    2
  xyShort <- read.table(url(webshort), sep = "\t")
  dim(xyShort)   # Should be, and will be, 90 x 2
[1] 90  2
  long <- readLines(url(weblong))
  short <- readLines(url(webshort))
  length(long)
[1] 1821
  length(short)
[1] 90
  all(long[1561:1650] == short)  # short is lines 1561:1650 of long
[1] TRUE
  ## Moreover strsplit() can pick up the \t's correctly
  lsplit <- strsplit(long, "\t")
  table(sapply(lsplit, length))

   2
1821
  # Try also table(sapply(lsplit, function(x) x[2]))
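A likely explanation, offered here as a hedged guess since the report does not state one: the misparsed records all contain an apostrophe ("Cook's distance"), and read.table() and count.fields() treat both ' and " as quote characters by default, so an unmatched apostrophe swallows the following tab. Disabling quoting avoids the symptom:

```r
# Reproduce the symptom on a tiny two-record file containing an apostrophe
tmp <- tempfile()
writeLines(c("865\tlinear model (lm)! Cook's distance\t152",
             "866\tordinary line\t153"), tmp)
bad  <- suppressWarnings(count.fields(tmp, sep = "\t"))  # default quoting
good <- count.fields(tmp, sep = "\t", quote = "")        # quoting disabled
# with quoting on, the apostrophe runs the first record into the second;
# with quote = "", both records show the expected 3 fields
stopifnot(length(good) == 2, all(good == 3))
```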

--please do not edit the information below--

Version:
platform = powerpc-apple-darwin8.6.0
arch = powerpc
os = darwin8.6.0
system = powerpc, darwin8.6.0
status =
major = 2
minor = 3.1
year = 2006
month = 06
day = 01
svn rev = 38247
language = R
version.string = Version 2.3.1 (2006-06-01)

Locale:
C

Search Path:
.GlobalEnv, package:lattice, package:methods, package:stats,  
package:graphics, package:grDevices, package:utils, package:datasets,  
Autoloads, package:base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Citation of R packages

2006-02-10 Thread John Maindonald
On 5 Feb 2006, at 2:27 AM, [EMAIL PROTECTED] wrote:

 On Mon, 30 Jan 2006 10:06:52 +1100 (EST),
 John Maindonald (JM) wrote:

 The bibtex citations provided by citation() do not
 work all that well in cases where there is no printed
 document to reference:

 That's why there is a warning at the end that they will need manual
 editing ... IMHO they at least save you some typing effort in many
 cases.

They are certainly a useful start.

 (1) A version field is needed, as the note field is
 required for other purposes, currently trying to
 sort out nuances that cannot be sorted out in the
 author list (author, compiler, implementor of R version,
 contributor, ...) and maybe giving a cross-reference
 to a book or paper that is somehow relevant.

 Why should a reference cross-reference another reference? Could you
 give an example?

Where there is a published paper or a book (such as MASS), or a manual
for which a url can be given, my decision was to include that in the
main list of references, but not to include there references to the
package itself, which as you suggest below can be a reference to
the concatenated help pages.

It seemed anyway useful to have a separate list of packages.  For
consistency, these were always references to the package, with a
cross-reference to any relevant document in the references to papers.

 (2) Maybe the author field should be more nuanced, or
 maybe ...

 author fields of bibtex entries have a strict format (names separated
 by and), what do you mean by more nuanced?

Those named in the list of authors may be any combination of: the
authors of an R package, the authors of an original S version, the
person or persons responsible for an R port, the authors of the Fortran
code, compiler(s), and contributors of ideas.

For John Fox's car, citation() gives the following:
 author = {John Fox. I am grateful to Douglas Bates and David  
Firth and Michael Friendly and Gregor Gorjanc and Georges Monette and  
Henric Nilsson and Brian Ripley and Sanford Weisberg and and Achim  
Zeleis for various suggestions and contributions.},

For Rcmdr:
 author = {John Fox and with contributions from Michael Ash and  
Philippe Grosjean and Martin Maechler and Dan Putler and and Peter  
Wolf.},

For car, maybe John Fox should be identified as author.  For Rcmdr,  
maybe the other persons that are named should be added?

For leaps:
 author = {Thomas Lumley using Fortran code by Alan Miller},

It seems reasonable to cite Lumley and Miller as authors.  Should
there be a note that identifies Miller as the contributor of the
Fortran code?

Should the name(s) of porters (usually from S) be included as
author(s)?  Or should their contribution be acknowledged in the note
field?  Or ...

Possibilities are to cite all those individuals as author, or to cite
John Fox only, with any combination of no additional information in the
note field, or using the note field to explain who did what.  The
citation() function leaves it unclear who are to be acknowledged as
authors, and in fact

 (3) In compiling a list of packages, name order seems
 preferable, and one wants the title first (achieved by
 relocating the format.title field in the manual FUNCTION
 in the .bst file
 (4) manual seems not an ideal name for the class, if
 there is no manual.

 A package always has a reference manual, the concatenated help pages
 certainly qualify as such and can be downloaded in PDF format from
 CRAN. The ISBN rules even allow to assign an ISBN number to the online
 help of a software package which also can serve as the ISBN number of
 the *software itself* (which we did for base R).

I'd prefer some consistency in the way that R packages are referenced.
Thus, if reference for one package is to the concatenated help pages,
do it that way for all of them.

 Maybe what is needed is a "package" or suchlike class,
 and several alternative .bst files that handle the needed
 listings.

 I know at least one other person who is wrestling with
 this, and others on this list must be wrestling with it.

 I am certainly open to discussion and any suggestions for
 improvements, but it must be within the standard bibtex entry types;
 we cannot write our own entry types and .bst files. Many journals
 require the usage of their own (or standard) bibtex styles, and the
 entries we produce must work with those. If R creates nonstandard
 bibtex entries, even more manual work will be necessary in many
 cases.

 I have no definitive bibtex reference at hand, but the natbib style
 files (a very popular collection of bibtex styles, at least I
 definitely want to be compatible with those) define

  article
  book
  booklet
  conference  (= alias for inproceedings)
  inbook
  incollection
  inproceedings
  manual
  mastersthesis
  misc
  phdthesis
  proceedings
  techreport
  unpublished

 which coincide with the choices the emacs bibtex mode offers. Out of
 these only "manual", "misc"

Re: [Rd] Citation of R packages

2006-02-10 Thread John Maindonald
Even if a CITATION file is included, there is an issue of what to put
in it.  Authorship of a book or paper is not always the simple matter
that it might appear; with an R package, it can be far from simple.
We are, surely, trying to adapt a tool that was designed for different
purposes.

1. I'd like to see the definition of a new BibTeX entry type that has  
fields for
additional author details and version number. There is surely some
mechanism for getting agreement on a new entry type.

2. In any case, there's a message here for maintainers of packages:
include CITATION files that reflect what they want to appear in any
citation, with citation("lattice") perhaps a suitable model.
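A minimal sketch of what such a CITATION file might contain, using the
citEntry() helper available at the time (bibentry() is its later
replacement); every name and field value here is invented for
illustration, not taken from any real package:

```r
## Hypothetical CITATION file contents -- package name, author, and
## version are illustrative only.
citEntry(entry  = "Manual",
         title  = "examplePkg: An Example Package",
         author = "A. N. Author",
         year   = "2006",
         note   = "R package version 1.0-0",
         textVersion =
             paste("A. N. Author (2006).",
                   "examplePkg: An Example Package.",
                   "R package version 1.0-0."))
```

Putting the version in the note field is one way to record it while
staying within the standard BibTeX entry types.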

John.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Mathematical Sciences Institute, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 11 Feb 2006, at 5:36 AM, [EMAIL PROTECTED] wrote:

 On Fri, 10 Feb 2006 21:01:44 +1100,
 John Maindonald (JM) wrote:

 [...]

 Where there is a published paper or a book (such as MASS), or a
 manual for which a url can be given, my decision was to include
 that in the main list of references, but not to include references
 there that were references to the package itself, which as you
 suggest below can be a reference to the concatenated help pages.

 The CITATION file of a package may contain as many entries as the
 author wants, including both a reference to the help pages and to the
 book (or whatever).


 It seemed anyway useful to have a separate list of packages.  For
 consistency, these were always references to the package, with a
 cross-reference to any relevant document in the references to papers.

 (2) Maybe the author field should be more nuanced, or
 maybe ...

 author fields of bibtex entries have a strict format (names
 separated by "and"), what do you mean by "more nuanced"?

 Those named in the list of authors may be any combination of: the
 authors of an R package, the authors of an original S version, the
 person or persons responsible for an R port, the authors of the
 Fortran code, compiler(s), and contributors of ideas.

 For John Fox's car, citation() gives the following:
  author = {John Fox. I am grateful to Douglas Bates and David
 Firth and Michael Friendly and Gregor Gorjanc and Georges Monette and
 Henric Nilsson and Brian Ripley and Sanford Weisberg and and Achim
 Zeleis for various suggestions and contributions.},

 For Rcmdr:
  author = {John Fox and with contributions from Michael Ash and
 Philippe Grosjean and Martin Maechler and Dan Putler and and Peter
 Wolf.},

 For car, maybe John Fox should be identified as author.  For Rcmdr,
 maybe the other persons that are named should be added?

 For leaps:
  author = {Thomas Lumley using Fortran code by Alan Miller},

 It seems reasonable to cite Lumley and Miller as authors.  Should
 there be a note that identifies Miller as the contributor of the
 Fortran code?

 Should the name(s) of porters (usually from S) be included as
 author(s)?  Or should their contribution be acknowledged in the note field?
 Or ...

 Possibilities are to cite all those individuals as author, or to cite
 John Fox only,
 with any combination of no additional information in the note field,
 or using the
 note field to explain who did what.  The citation() function leaves
 it unclear who
 are to be acknowledged as authors, and in fact


 Umm, the problem there is not the citation() function, but that the
 authors of all those packages obviously have not included a CITATION
 file in their package which overrides the default (extracted from the
 DESCRIPTION file).

 E.g., package flexclust has DESCRIPTION

 Package: flexclust
 Version: 0.8-1
 Date: 2006-01-11
 Author: Friedrich Leisch, parts based on code by Evgenia Dimitriadou

 but

 
 R> citation("flexclust")

 To cite package 'flexclust' in publications use:

   Friedrich Leisch. A Toolbox for K-Centroids Cluster Analysis.
   Computational Statistics and Data Analysis, 2006. Accepted for
   publication.

 A BibTeX entry for LaTeX users is

   @Article{,
 author = {Friedrich Leisch},
 title = {A Toolbox for K-Centroids Cluster Analysis},
 journal = {Computational Statistics and Data Analysis},
 year = {2006},
 note = {Accepted for publication},
   }
 

 because the CITATION file overrides the DESCRIPTION file. Writing a
 CITATION file is of course also intended for those cases where a
 proper reference cannot be auto-generated from the DESCRIPTION file.


 (3) In compiling a list of packages, name order seems
 preferable, and one wants the title first (achieved by
 relocating the format.title field in the manual FUNCTION
 in the .bst file).
 (4) "manual" seems not an ideal name for the class, if
 there is no manual.

 A package always has a reference manual, the concatenated help  
 pages
 certainly qualify

[Rd] Citation of R packages

2006-01-29 Thread John Maindonald
The bibtex citations provided by citation() do not
work all that well in cases where there is no printed
document to reference:
(1) A version field is needed, as the note field is
required for other purposes, currently trying to
sort out nuances that cannot be sorted out in the
author list (author, compiler, implementor of R version,
contributor, ...) and maybe giving a cross-reference
to a book or paper that is somehow relevant.
(2) Maybe the author field should be more nuanced, or
maybe ...
(3) In compiling a list of packages, name order seems
preferable, and one wants the title first (achieved by
relocating the format.title field in the manual FUNCTION
in the .bst file).
(4) "manual" seems not an ideal name for the class, if
there is no manual.

Maybe what is needed is a "package" or suchlike class,
and several alternative .bst files that handle the needed
listings.

I know at least one other person who is wrestling with
this, and others on this list must be wrestling with it.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Mathematical Sciences Institute, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bugs/issues with model.tables() (PR#8275)

2005-11-01 Thread john.maindonald

> unique(predict.lm(bal1.aov, type="terms", se=TRUE)$se)
        trt
1 0.3054198

II (d) In the interests of brevity (sic!), I will limit attention
to means:

bdes <-
   structure(list(trt = structure(as.integer(c(1, 2, 1, 3, 1, 4,
                      2, 3, 2, 4, 3, 4)),
                      .Label = c("a", "b", "c", "d"), class = "factor"),
                  blk = structure(as.integer(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,
                      6, 6)),
                      .Label = c("A", "B", "C", "D", "E", "F"),
                      class = "factor"),
                  y = c(0.8, -1.1, 4.5, 3.3, 4.3, 4.9, 0.6, 3.9, 4.6, 9.4,
                        3.7, 5.7)),
             .Names = c("trt", "blk", "y"),
             row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
                           "11", "12"),
             class = "data.frame")
> # Crude block means; these are meaningless
> tapply(bdes$y, bdes$blk, mean)
    A     B     C     D     E     F
-0.15  3.90  4.60  2.25  7.00  4.70
> # Crude treatment means
> tapply(bdes$y, bdes$trt, mean)
   a    b    c    d
3.20 1.37 3.63 6.67
> ## aov fits
> bdes.aov <- aov(y ~ blk + trt, data=bdes)          # Blocks first
> bdes_trtFirst.aov <- aov(y ~ trt + blk, data=bdes) # trt first
> model.tables(bdes.aov, type="means")[["tables"]][["blk"]]
blk
    A     B     C     D     E     F
-0.15  3.90  4.60  2.25  7.00  4.70
# Observe that these agree with the above crude block means
> model.tables(bdes_trtFirst.aov, type="means")[["tables"]][["trt"]]
trt
   a    b    c    d
3.20 1.37 3.63 6.67

> ## Treatment means, when blocks are taken first
> model.tables(bdes.aov, type="means")[["tables"]][["trt"]]
trt
       a        b        c        d
4.133333 2.050000 3.733333 4.950000
> ## Also, note their differences
> diff(model.tables(bdes.aov, type="means")[["tables"]][["trt"]])
trt
        b         c         d
-2.083333  1.683333  1.216667

> ## Treatment means, from the usual least squares analysis
> dummy.coef(bdes.aov)[["(Intercept)"]]
(Intercept)
     1.4125
> dummy.coef(bdes.aov)[["(Intercept)"]] + mean(dummy.coef(bdes.aov)$blk) +
+     dummy.coef(bdes.aov)$trt
       a        b        c        d
4.341667 1.216667 3.741667 5.566667
> diff(dummy.coef(bdes.aov)$trt)
     b      c      d
-3.125  2.525  1.825
> diff(model.tables(bdes.aov, type="means")[["tables"]][["trt"]]) /
+     diff(dummy.coef(bdes.aov)$trt)
trt
        b         c         d
0.6666667 0.6666667 0.6666667
   # This factor is some simple function of the BIB design parameters,
   # which I am too lazy or too busy to work out.
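The factor in question can be worked out from the textbook
balanced-incomplete-block relations; the following is a sketch derived
from standard BIBD formulas (not part of the original report), using the
design above with t = 4 treatments, b = 6 blocks of size k = 2, and
r = 3 replicates:

```latex
% Pairwise concurrence \lambda and efficiency factor E of the BIBD:
\lambda = \frac{r(k-1)}{t-1} = \frac{3(2-1)}{4-1} = 1,
\qquad
E = \frac{\lambda t}{rk} = \frac{1 \cdot 4}{3 \cdot 2}
  = \frac{2}{3} \approx 0.6666667
```

which matches the printed ratio of the two sets of treatment-mean
differences.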


--please do not edit the information below--

Version:
platform = powerpc-apple-darwin7.9.0
arch = powerpc
os = darwin7.9.0
system = powerpc, darwin7.9.0
status =
major = 2
minor = 2.0
year = 2005
month = 10
day = 06
svn rev = 35749
language = R

Locale:
C

Search Path:
.GlobalEnv, cuckoohosts, file:~/r/ch2/.RData, file:../.RData,  
package:methods, package:stats, package:graphics, package:grDevices,  
package:utils, package:datasets, Autoloads, package:base



John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.



Re: [Rd] plot(lm): new behavior in R-2.2.0 alpha

2005-09-17 Thread John Maindonald
Martin -
Thanks for your efforts in initiating and managing this
discussion.

As for the issue of deprecating the plot.lm() pictures in
the published books, surely this will have great benefits
for the authors. It will help them to sell the new editions
of their books that will in due course appear replete with
the new plots!

For 2.2.0, I have nothing more to add to the comments
others have made.  I hope we can in due course agree,
as a minimum, to put some version of John Fox's vif(),
and something akin to Werner Stahel's smooths for up to
20 simulated data sets, into 2.3.0.
John Maindonald.

On 18 Sep 2005, at 1:29 AM, Martin Maechler wrote:

>>>>> "Wst" == Werner Stahel <[EMAIL PROTECTED]>
>>>>>     on Fri, 16 Sep 2005 09:37:02 +0200 writes:


 Wst Dear Martin, dear Johns Thanks for including me into
 Wst your discussion.

 Wst I am a strong supporter of Residuals vs. Hii


 One remaining problem I'd like to address is the
 balanced AOV situation, ...


 Wst In order to keep the plots consistent, I suggest to
 Wst draw a histogram. Other alternatives will or can be
 Wst interesting in the general case and therefore are not a
 Wst suitable substitute for this plot.

 hmm, but all other 3 default plots have
  (standardized / sqrt) residuals  on the y-axis.
 I'd very much like to keep that for any fourth plot.
 So would we want a horizontal histogram?  And do we really want
 a histogram when we've already got the QQ plot?

 We need a decent proposal for a 4th plot
 {instead of  R_i vs h_ii  , when  h_ii are constant}
 REAL SOON NOW  since it's feature
 freeze on Monday.
 Of course the current state can be declared a bug and still be
 fixed, but that was not the intention...

 Also, there are now at least 2 book authors among R-core (and
 more book authors more generally!) in whose books there are
 pictures with the old-default 4th plot.
 So I'd like to have convincing reasons for ``deprecating'' all
 the plot.lm() pictures in the published books.

 At the moment, I'd still  go for

  R_i  vs i
 or  sqrt|R_i| vs i  -- possibly with type = 'h'

 which could be used to check an important kind of temporal
 auto-correlation.

 the latter, because in a 2 x 2 plot arrangement, this gives the
 same y-axis as default plot 3.
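The two candidate panels can be sketched roughly as follows; 'fit' is
an arbitrary lm object chosen for illustration, not code from the
thread:

```r
## Candidate 4th panels: (standardized) residuals against observation
## index, drawn as spikes (type = "h") to help spot temporal
## auto-correlation.
fit <- lm(dist ~ speed, data = cars)   # any lm fit will do
r <- rstandard(fit)
plot(r, type = "h", xlab = "Index", ylab = "Standardized residuals")
plot(sqrt(abs(r)), type = "h", xlab = "Index",
     ylab = "sqrt(|Standardized residuals|)")
```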

 Wst 

 Wst Back to currently available methods:

 Wst John Maindonald discusses different contours. I like
 Wst the implementation I get currently in R-devel: contours
 Wst of Cook's distances, since they are popular, and we can
 Wst then argue that the plot of D_i vs. i is no longer
 Wst needed.

 what about John's proposal of different contour levels than
 c(0.5, 1)?  -- note that these *have* been added as arguments to
 plot.lm() that a user could modify.
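In released versions of R this surfaced as the cook.levels argument of
plot.lm(); a hedged usage sketch (in later releases, which = 5 selects
the residuals-versus-leverage panel):

```r
## Illustrative only: override the default Cook's distance contour
## levels on the residuals-vs-leverage plot.
fit <- lm(dist ~ speed, data = cars)
plot(fit, which = 5, cook.levels = c(0.25, 0.5, 1))
```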

 Wst For most plots, I like to see a smoother along with the
 Wst points.  I suggest adding the option to include
 Wst smoothers, not only as an argument to plot.lm, but even
 Wst as an option().  I have heard of the intense
 Wst discussions about options().  With Martin, we arrived
 Wst at the conclusion that options() should never influence
 Wst calculations and results, but is suitable to adjust
 Wst outputs (numerical: digits=, graphical: smooth=) to the
 Wst user's taste.

 {and John Fox agreed, `in general'}

 That could be a possibility, for 2.2.0  only applied to
 plot.lm() in any case, where plot.lm() would get a new argument

 add.smooth = getOption("plot.add.smooth")

 What do people think about the name?
 it would ``stick with us'' -- so we better choose it well..


 (4) Are there other diagnostics that ought to be
 included in stats? (perhaps in a function other than
 plot.lm(), which risks being overloaded).  One strong
 claimant is vif() (variance inflation factor),


...
...
...
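For readers wanting the idea behind vif() without the car package, a
from-first-principles sketch (not John Fox's implementation; data set
and formula are illustrative):

```r
## VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j
## on the other predictors; equivalently, the diagonal of the inverse
## correlation matrix of the predictors.
X <- model.matrix(~ hp + wt + disp, data = mtcars)[, -1]  # drop intercept
diag(solve(cor(X)))
```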


 Wst As we focus on plots, my plot method includes the
 Wst option (default) to add smooths for 20 simulated
 Wst datasets (according to the fitted model).

 this and others are really nice.

 However not for R 2.2.x in any case.

 I agree that one should rather provide `single-plot'
 functions and have plot.lm() just call a few of them, instead of
 having everything be part of plot.lm().
 There's the slight advantage that you can guarantee some
 consistency (e.g. in the definition of standardized residuals)
 and save some computations when everything is in one function,
 but consistency should be possible otherwise as well...
 Anyway this is for 2.3.0 or later.

 Martin

