Re: [Rd] Suggestion: help()
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> > on Tue, 07 Jun 2005 12:12:57 -0400 writes: . >>> The current .Rd files don't just document functions, they also document >>> data objects and classes. >>> >>> But the main point here is that it's not good to have multiple >>> disconnected sets of documentation for a package. Users should be able >>> to say the equivalent of "give me help on foo", and get help on foo, >>> whether it's a function, a data object, a package, a method, a class, or >>> whatever. It's a bad design to force them to ask for the same sort of >>> thing in different ways depending on the type of thing they're asking for. ... On 6/7/2005 11:59 AM, Robert Gentleman wrote: >> >> Hi Duncan and others, >> I think they are linked. There are tools available both in R and in >> Bioconductor and some pop things up and some don't. It doesn't take much >> work to add vignettes to the windows menu bar - as we have done in BioC >> for some time now - it would be nice if this was part of R, but no one >> seems to have been interested in achieving that. Fixing the help system >> to deal with more diverse kinds of help would be nice as well - but >> taking one part of it and saying, "now everyone must do it this way" is >> not that helpful. >> I respectfully disagree about the main point. My main point is, I >> don't want more things imposed on me; dealing with R CMD check is >> enough of a burden in its current version, without someone deciding that >> it would be nice to have a whole bunch more requirements. Folks should >> feel entirely free to do what they want - but a little less free to tell >> me what I should be doing. Duncan> And I disagree pretty strenuously about that. One Duncan> of the strengths of R is that it does impose Duncan> standards on contributed packages, and these make Duncan> them easier to use, less likely to conflict with Duncan> each other, and so on. Duncan> We shouldn't impose things lightly, but if they do Duncan> make packages better, we should feel no reason not Duncan> to tell you what you should be doing. As Kurt mentioned early in this thread, we currently have the auto-generated information from either help(package = )# or (equivalently!) library(help = ) which shows DESCRIPTION + (user-written/auto-generated) INDEX + mentions vignettes and other contents in inst/doc/ Now if Duncan would write some R code that produces a man/.Rd file from the above information -- and as he mentioned also added some of that functionality to package.skeleton(), I think everyone could become "happy", i.e., we could improve the system in the future with only a very light burden on the maintainers of currently existing packages: You'd have to run the new R function only once for every package you maintain. Also, the use of a user-written INDEX file could eventually completely be abandoned in favor of maintaining man/.Rd, which is much nicer; I'd welcome such a direction quite a bit. And as much as I do like (and read) the vignettes that are available, I also do agree that writing one other *.Rd file is easier for many new package authors than to write a vignette -- the package author already had to learn *.Rd syntax anyway -- and it's nice to be able to produce something where hyperlinks to the other existing reference material (ie. help pages) just works out of the box. OTOH, we should still keep in mind that it's worth to try to get bi-directional linking between (PDF) vignettes and help files (assuming all relevant files are installed by R CMD INSTALL of course). 
Martin

    Duncan> Currently R has 3 types of help: the .Rd files in the man
    Duncan> directory (which are converted into plain text, HTML, compiled
    Duncan> HTML, LaTeX, DVI, PDF, etc), the vignettes, and unstructured
    Duncan> files in inst/doc. We currently require .Rd files for every
    Duncan> function and data object. Adding a requirement to also document
    Duncan> the package that way is not all that much of a burden, and will
    Duncan> automatically give all those output formats I listed above. It
    Duncan> will help to solve the often-complained-about problem of
    Duncan> packages that contain no overview at all. (Requiring a vignette
    Duncan> and giving a way to display it would also do that, but I think
    Duncan> requiring a .Rd file is less of a burden, and for anyone who has
    Duncan> gone to the trouble of creating a vignette, gives a natural
    Duncan> place for a link to it. Vignettes aren't used as much as they
    Duncan> should be, because they are hidden away where users don't see
    Duncan> them.)

    Duncan> Duncan Murdoch

    >> Best wishes,
    >> Robert
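For readers unfamiliar with the facilities discussed above, these are the
two equivalent calls Martin mentions; "stats" here merely stands in for any
installed package name:

    ## Both display the DESCRIPTION plus the package INDEX:
    help(package = "stats")
    library(help = "stats")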
Re: [Rd] Re: [R] p-value > 1 in fisher.test()
>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>> on Sat, 04 Jun 2005 11:43:34 +0200 writes: UweL> (Ted Harding) wrote: >> On 03-Jun-05 Ted Harding wrote: >> >>> And on mine >>> >>> (A: PII, Red Had 9, R-1.8.0): >>> >>> ff <- c(0,10,250,5000); dim(ff) <- c(2,2); >>> >>> 1-fisher.test(ff)$p.value >>> [1] 1.268219e-11 >>> >>> (B: PIII, SuSE 7.2, R-2.1.0beta): >>> >>> ff <- c(0,10,250,5000); dim(ff) <- c(2,2); >>> >>> 1-fisher.test(ff)$p.value >>> [1] -1.384892e-12 >> >> >> I have a suggestion (maybe it should also go to R-devel). >> >> There are many functions in R whose designated purpose is >> to return the value of a probability (or a probability >> density). This designated purpose is in the mind of the >> person who has coded the function, and is implicit in its >> usage. >> >> Therefore I suggest that every such function should have >> a built-in internal check that no probability should be >> less than 0 (and if the primary computation yields such >> a value then the function should set it exactly to zero), >> and should not exceed 1 (in which case the function should >> set it exactly to 1). [And, in view of recent echanges, >> I would suggest exactly +0, not -0!] >> >> Similar for any attempts to return a negative probability >> density; while of course a positive value can be allowed >> to be anything. >> >> All probabilities would then be guaranteed to be "clean" >> and issues like the Fisher exact test above would no longer >> be even a tiny problem. >> >> Implementing this in the possibly many cases where it is >> not already present is no doubt a long-term (and tedious) >> project. >> >> Meanwhile, people who encounter problems due to its absence >> can carry out their own checks and adjustments! UweL> [moved to R-devel] UweL> Ted, my (naive?) objection: UweL> Many errors in the underlying code have been detected by a function UweL> returning a nonsensical value, but if the probability is silently set to UweL> 0 or 1 ... UweL> Hence I would agree to do so in special cases where it makes sense UweL> because of numerical issues, but please not globally. I agree very much with Uwe's point. Further to fisher.test(): This whole thread is re-hashing a pretty recent bug report on fisher.test() { "negative p-values from fisher's test (PR#7801)", April '05} I think that only *because* of the obviously wrong P-values have we found and confirmed that the refereed and published code underlying fisher.test() is bogous. Such knowledge would have been harder to gain if the P-values would have been cut into [0,1]. Martin Maechler UweL> Uwe Ligges >> Best wishes to all, >> Ted. >> >> >> >> E-Mail: (Ted Harding) <[EMAIL PROTECTED]> >> Fax-to-email: +44 (0)870 094 0861 >> Date: 04-Jun-05 Time: 00:02:32 >> -- XFMail -- __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] print()ing of raw matrices (PR#7912)
Not a bug in a very strict sense, but still something to be fixed eventually:

    > s <- sapply(0:7, function(i) rawShift(charToRaw("my text"), i))
    > s
    ... nothing is printed at all
    > str(s)
     raw [1:7, 1:8] 6d 79 20 74 ...
    > c(s)
     [1] 6d 79 20 74 65 78 74 da f2 40 e8 ca f0 e8 b4 e4 80 d0 94 e0 d0 68 c8 00 a0
    [26] 28 c0 a0 d0 90 00 40 50 80 40 a0 20 00 80 a0 00 80 40 40 00 00 40 00 00 80
    [51] 80 00 00 80 00 00
    >

and similar behavior for arrays, e.g.,

    > dim(s) <- c(7,4,2)
    > s
    , , 1
    , , 2
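As a stop-gap until print() handles raw matrices, one can convert to
character while keeping the layout; a small sketch (a user-side workaround,
not a fix in R itself):

    s <- sapply(0:7, function(i) rawShift(charToRaw("my text"), i))
    ## print the hex representation with the matrix layout preserved:
    print(matrix(as.character(s), nrow(s), ncol(s)))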
[Rd] 1/tan(-0) != 1/tan(0)
Testing the code that Morten Welinder suggested for improving extreme tail
behavior of qcauchy(), I found what you can read in the subject: namely that
with the tan() floating-point implementation on all four different versions
of Red Hat Linux I have access to, on i686 and amd64 architectures,

    > 1/tan(c(-0, 0))

gives

    -Inf Inf

and of course, that can well be considered a feature, since after all, the
tan() function does jump from -Inf to +Inf at 0. I was still surprised that
this even happens at the R level, and I wonder if this distinction between
"-0" and "0" shouldn't be mentioned in some place(s) of the R documentation.

For the real problem in the R source (in C), it's simple to work around the
fact that qcauchy(0, log=TRUE) for Morten's code proposal gives -Inf instead
of +Inf.

Martin

>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Wed, 1 Jun 2005 08:57:18 +0200 (CEST) writes:

>>>>> "Morten" == Morten Welinder <[EMAIL PROTECTED]>
>>>>> on Fri, 27 May 2005 20:24:36 +0200 (CEST) writes:
.
    Morten> Now that pcauchy has been fixed, it is becoming clear that
    Morten> qcauchy suffers from the same problems.

    Morten> qcauchy(pcauchy(1e100,0,1,FALSE,TRUE),0,1,FALSE,TRUE)
    Morten> should yield 1e100 back, but I get 1.633178e+16. The code below
    Morten> does much better. Notes:

    Morten> 1. p need not be finite. -Inf is ok in the log_p case and
    Morten> R_Q_P01_check already checks things.

    MM> yes

    Morten> 2. No need to disallow scale=0 and infinite location.

    MM> yes

    Morten> 3. The code below uses isnan and finite directly. It needs to
    Morten> be adapted to the R way of doing that.

    MM> I've done this, and started testing the new code; a version will be
    MM> put into the next version of R.
    MM> Thank you for the suggestions.

    >>> double
    >>> qcauchy (double p, double location, double scale, int lower_tail, int log_p)
    >>> {
    >>>     if (isnan(p) || isnan(location) || isnan(scale))
    >>>         return p + location + scale;
    >>>     R_Q_P01_check(p);
    >>>     if (scale < 0 || !finite(scale)) ML_ERR_return_NAN;
    >>>     if (log_p) {
    >>>         if (p > -1)
    >>>             lower_tail = !lower_tail, p = -expm1 (p);
    >>>         else
    >>>             p = exp (p);
    >>>     }
    >>>     if (lower_tail) scale = -scale;
    >>>     return location + scale / tan(M_PI * p);
    >>> }
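The distinction Martin observed is IEEE-754 signed zero surfacing at the R
level; a quick way to see it (the division trick is a common idiom for
telling the two zeros apart, since comparisons cannot):

    x <- -0; y <- 0
    x == y           # TRUE: comparisons cannot distinguish the zeros
    1/x              # -Inf
    1/y              #  Inf
    1/tan(c(-0, 0))  # -Inf Inf, since tan() preserves the sign of zero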
Re: [Rd] qcauchy accuracy (PR#7902)
> "Morten" == Morten Welinder <[EMAIL PROTECTED]> > on Fri, 27 May 2005 20:24:36 +0200 (CEST) writes: Morten> Full_Name: Morten Welinder Version: 2.1.0 OS: src Morten> only Submission from: (NULL) (216.223.241.212) Morten> Now that pcauchy has been fixed, it is becoming Morten> clear that qcauchy suffers from the same problems. Morten> Morten> qcauchy(pcauchy(1e100,0,1,FALSE,TRUE),0,1,FALSE,TRUE) Morten> should yield 1e100 back, but I get 1.633178e+16. Morten> The code below does much better. Notes: Morten> 1. p need not be finite. -Inf is ok in the log_p Morten> case and R_Q_P01_check already checks things. yes Morten> 2. No need to disallow scale=0 and infinite Morten> location. yes Morten> 3. The code below uses isnan and finite directly. Morten> It needs to be adapted to the R way of doing that. I've done this, and started testing the new code; a version will be put into the next version of R. Thank you for the suggestions. >> double >> qcauchy (double p, double location, double scale, int lower_tail, int log_p) >> { >> if (isnan(p) || isnan(location) || isnan(scale)) >> return p + location + scale; >> R_Q_P01_check(p); >> if (scale < 0 || !finite(scale)) ML_ERR_return_NAN; >> if (log_p) { >> if (p > -1) >>lower_tail = !lower_tail, p = -expm1 (p); >> else >>p = exp (p); >> } >> if (lower_tail) scale = -scale; >> return location + scale / tan(M_PI * p); >> } __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] ?strptime ambiguity (PR#7907)
> "tobias" == tobias verbeke <[EMAIL PROTECTED]> > on Mon, 30 May 2005 11:37:31 +0200 (CEST) writes: tobias> Full_Name: Tobias Verbeke Version: 2.1.0 OS: tobias> GNU/Linux Submission from: (NULL) (81.247.252.229) tobias> Would it be possible to use a non-ambiguous example tobias> of expressing a day according to the ISO 8601 tobias> international standard in the first sentence of the tobias> Note section of ?strptime, e.g. "2001-04-18" instead tobias> of "2001-02-03" ? yes, it would be possible... More seriously: Thank you for the suggestion; it will be in R-devel. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Reversing axis in a log plot (PR#7894)
> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> > on Fri, 27 May 2005 11:42:54 +0200 (CEST) writes: UweL> Please find attached my proposal for a bugfix. The UweL> experts might to better, though. UweL> Uwe Ligges Thank you, Uwe! Yes, your fix works, and yes, it can be improved (e.g., no need for the 'atr'; no 'reversed' when in the linear case, ..) Also, there won't be any warning either anymore, since plot(1:3, exp(1:3), xlim = c(30,1)) also gives no warning. You'll find the fixed plot.c in tomorrow's snapshot. Martin BTW: The whole thing {axis reversion} doesn't work in S-plus {doing a non-sensical plot, with tons of warnings} __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Rout for library/base/R-ex/Extract.data.frame.R
>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>> on Wed, 25 May 2005 11:08:18 +0200 writes: UweL> Vadim Ogranovich wrote: >> Hi, >> >> I am writing a light-weight data frame class and want to >> borrow the test cases from the standard data frame. I >> found the test cases in >> library/base/R-ex/Extract.data.frame.R, but surprisingly >> no corresponding .Rout files. In fact there is no *.Rout >> file in the entire tarball. Not that I cann't generate >> them, but I am just curious why they are not there? How >> does the base package get tested? >> >> Thanks, Vadim UweL> The base packages have their test cases in ...R/tests UweL> rather than R/src/library/packagename yes, and the *examples* from the help pages are just run, and not compared to prespecified output in *.Rout.save (sic!) files. In an *installed* (not the source!) version of R or an R package you find the R code for all the examples from the help pages in /R-ex/*.R. That's the same for all R packages, not just the standard packages. Martin Maechler __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-exts.texi: nuke-trailing-whitespace has changed name (PR#7888)
Thank you, Bjørn-Helge,

>>>>> "BjøHM" == Bjørn-Helge Mevik <[EMAIL PROTECTED]>
>>>>> on Sun, 22 May 2005 22:56:49 +0200 (CEST) writes:
..
    BjøHM> In Appendix B "R coding standards" of the Writing R Extensions
    BjøHM> manual, Emacs/ESS users are encouraged to use
    BjøHM> ..
    BjøHM> However, as of ess-5.2.4 (current is 5.2.8),
    BjøHM> `nuke-trailing-whitespace' has changed name to
    BjøHM> `ess-nuke-trailing-whitespace'.

    BjøHM> In addition: by default, ess-nuke-trailing-whitespace is a no-op
    BjøHM> (as was nuke-trailing-whitespace). To `activate' it one must set
    BjøHM> ess-nuke-trailing-whitespace-p to t or 'ask (default is nil),
    BjøHM> e.g. (setq ess-nuke-trailing-whitespace-p t)

Thank you. I've now changed it to

    (add-hook 'local-write-file-hooks
              (lambda () (ess-nuke-trailing-whitespace)))
    (setq ess-nuke-trailing-whitespace-p 'ask)
    ;; or even
    ;; (setq ess-nuke-trailing-whitespace-p t)
Re: Fwd: Re: [Rd] Implementation of the names attribute of attribute lists
>>>>> "Gabriel" == Gabriel Baud-Bovy <[EMAIL PROTECTED]> >>>>> on Tue, 10 May 2005 19:00:53 +0200 writes: Gabriel> Hi Martin, Gabriel> Thanks for your reply. I am responding on r-devel to Gabriel> provide some examples of outputs of the function that Gabriel> I had list in the post-scriptum of my previous Gabriel> email (BTW, did my post went through the list? I Gabriel> subscribed only after mailing it). Gabriel> You wrote: >> Just to ask the obvious: >> >> Why is using str() not sufficient for you and instead, >> you use 'print.object' {not a good name, BTW, since it looks like a >> print() S3 method but isn't one} ? Gabriel> Would printObject or printSEXP a better name? definitely better because not interfering with the S3 pseudo-OO convention... Still not my taste though : every R object is an object (:-) -- and a SEXP internally -- and we don't use 'fooObject' for other function names even though their arguments are R objects My taste would rather lead to something like 'displayStructure' (or 'dissectInternal' ;-) or a shorter version of those. >> The very few cases I found it was insufficient, >> certainly dput() was, possibly even using it as >> dput(. , control = ). Gabriel> As I wrote in my email, I might have reinvented Gabriel> the wheel. I did not know str! (amazingly ... ;-) Gabriel> The output of str and print.object is quite similar Gabriel> for atomic and list objects. I might look at this Gabriel> function to change the argument names of the Gabriel> print.object function. Gabriel> However, the output of str is quite different Gabriel> for language expressions and does not show as well Gabriel> the their list-like strcuture since it respects Gabriel> the superficial C-like syntax of the R language Gabriel> (at the textual level). Ok, thanks for clarifying this aspect, and the difference to both str() and dput() here. < much omitted ...> Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Re: [R] Unbundling gregmisc (was: loading gap package)
> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> > on Wed, 4 May 2005 16:29:33 +0100 (BST) writes: ... BDR> .. we need some education about how to use the BDR> power of *.packages (and we need to get the MacOS BDR> versions in place). and maybe we should even consider adding regression tests for the *.packages() functions so chances increase they will work on all platforms Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] (PR#7826) Re: ... print.POSIXct .. infinite recursion
Thank you, Jskud. I can reproduce your problem, though not the seg.fault;
see below.

>>>>> "Jskud" == Jskud <[EMAIL PROTECTED]>
>>>>> on Sat, 30 Apr 2005 09:04:03 +0200 (CEST) writes:

    Jskud> In attempting to build R using
    Jskud>     rpmbuild --rebuild R-2.1.0-0.fdr.2.fc3.src.rpm
    Jskud> on a fairly up-to-date RedHat 9 system (that is, with patches
    Jskud> installed through May 1 2004), it failed at the make check-all
    Jskud> step.

    Jskud> The problem was reproducible by going into the tests directory
    Jskud> and
    Jskud>     make test-Segfault

<>

    Jskud> I was able to reproduce the problem (a segfault) as the
    Jskud> following simple transcript demonstrates:

    Jskud> LC_ALL=C SRCDIR=. R_DEFAULT_PACKAGES= ../bin/R --vanilla
    Jskud> R : Copyright 2005, The R Foundation for Statistical Computing
    Jskud> Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0

<>

    >> unusual_but_ok <- c.POSIXlt(character(0))
    >> unusual_but_ok
    Jskud> character(0)
    >> unusual_and_faults <- c.POSIXct(character(0))
    >> unusual_and_faults
    Jskud> Segmentation fault

    Jskud> Running this test program under gdb, we find that we're running
    Jskud> off the end of the stack, with 4222 stack frames showing --
    Jskud> apparently in an infinite recursion -- "as.character" shows up
    Jskud> every 69 function calls:

<>

    Jskud> So it would seem that *printing* the unusual POSIXct value is
    Jskud> suspect.

that's correct.

    Jskud> Looking at an R-1.8.1 install, we find these definitions in
    Jskud> base/R/base:

    Jskud> print.POSIXct <- function(x, ...) {
    Jskud>     print(format(x, usetz=TRUE), ...)
    Jskud>     invisible(x)
    Jskud> }
    Jskud> print.POSIXlt <- function(x, ...) {
    Jskud>     print(format(x, usetz=TRUE), ...)
    Jskud>     invisible(x)
    Jskud> }

    Jskud> However, looking at the 2.1.0 src file
    Jskud> R-2.1.0/src/library/base/R/datetime.R, we find

    Jskud> print.POSIXct <- function(x, ...) {
    Jskud>     print(format(x, usetz=TRUE, ...), ...)
    Jskud>     invisible(x)
    Jskud> }
    Jskud> print.POSIXlt <- function(x, ...) {
    Jskud>     print(format(x, usetz=TRUE), ...)
    Jskud>     invisible(x)
    Jskud> }

    Jskud> Note the suspicious definition of print.POSIXct using *two* sets
    Jskud> of ellipses, and that the print.POSIXct and print.POSIXlt
    Jskud> definitions no longer match.

well, passing the "..." to both format() and print() is probably on purpose
-- and I assume even fixes another bug. You are right, however, in
wondering why this is done only in print.*ct() and not in print.*lt().

The infinite recursion, BTW, happens with format(), not print()...: here is
the end of the stack you get from traceback(), after e.g.
options(expressions = 50):

    13: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
            names = names(x))
    12: format.POSIXct(x, ...)
    11: format(x, ...)
    10: as.character.POSIXt(x)
     9: as.character(x)
     8: strptime(x, f)
     7: fromchar(x)
     6: as.POSIXlt(x, tz)
     5: inherits(x, "POSIXlt")
     4: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
     3: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
            names = names(x))
     2: format.POSIXct(unusual_and_faults, usetz = TRUE)
     1: format(unusual_and_faults, usetz = TRUE)

- - -

Unfortunately, I must do less fun things at the moment than fixing such a
bug... but of course it *will* be fixed rather sooner than later.

Martin Maechler, ETH Zurich
Re: [Rd] Enhanced version of plot.lm()
>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> >>>>> on 27 Apr 2005 16:54:02 +0200 writes: PD> Martin Maechler <[EMAIL PROTECTED]> writes: >> I'm about to commit the current proposal(s) to R-devel, >> **INCLUDING** changing the default from >> 'which = 1:4' to 'which = c(1:3,5) >> >> and ellicit feedback starting from there. >> >> One thing I think I would like is to use color for the Cook's >> contours in the new 4th plot. PD> Hmm. First try running example(plot.lm) with the modified function and PD> tell me which observation has the largest Cook's D. With the suggested PD> new 4th plot it is very hard to tell whether obs #49 is potentially or PD> actually influential. Plots #1 and #3 are very close to conveying the PD> same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; "The" Truth, etc,etc) but I believe there is no unique way to define "actually influential" (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that "regression users" should really know {without consulting a text book} and that are generalizable {e.g. to "linear smoothers" such as gam()s (for "non-estimated" smoothing parameter)}. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Tue, 26 Apr 2005 12:13:38 +0200 writes: >>>>> "JMd" == John Maindonald <[EMAIL PROTECTED]> >>>>> on Tue, 26 Apr 2005 15:44:26 +1000 writes: JMd> The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ JMd> now includes files: JMd> plot.lm.RData: Image for file for plot6.lm, a version of plot.lm in JMd> which JMd> David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6. JMd> The tick labels are in units of leverage, and the contour labels are JMd> in units of absolute values of the standardized residual. JMd> plot6.lm.Rd file: A matching help file JMd> Comments will be welcome. MM> Thank you John! MM> The *.Rd has the new references and a new example but MM> is not quite complete: the \usage{} has only 4 captions, MM> \arguments{ .. \item{which} ..} only mentions '1:5' --- but MM> never mind. MM> One of the new examples is MM> ## Replace Cook's distance plot by Residual-Leverage plot MM> plot(lm.SR, which=c(1:3, 5)) MM> and -- conceptually I'd really like to change the default from MM> 'which = 1:4' to the above MM> 'which=c(1:3, 5))' MM> This would be non-compatible though for all those that have MM> always used the current default 1:4. MM> OTOH, "MASS" or Peter Dalgaard's book don't mention plot( ) MM> or at least don't show it's result. MM> What do others think? MM> How problematic would a change be in the default plots that MM> plot.lm() produces? JMd> Another issue, discussed recently on r-help, is that when the model JMd> formula is long, the default sub.caption=deparse(x$call) is broken JMd> into multiple text elements and overwrites. MM> good point! JMd> The only clean and simple way that I can see to handle JMd> is to set a default that tests whether the formula is JMd> broken into multiple text elements, and if it is then JMd> omit it. Users can then use their own imaginative JMd> skills, and such suggestions as have been made on JMd> r-help, to construct whatever form of labeling best JMd> suits their case, their imaginative skills and their JMd> coding skills. MM> Hmm, yes, but I think we (R programmers) could try a bit harder MM> to provide a reasonable default, e.g., something along MM> cap <- deparse(x$call, width.cutoff = 500)[1] MM> if((nc <- nchar(cap)) > 53) MM> cap <- paste(substr(cap, 1, 50), "", substr(cap, nc-2, nc)) MM> {untested; some of the details will differ; MM> and the '53', '50' could depend on par("..") measures} In the mean time, I came to quite a nice way of doing this: if(is.null(sub.caption)) { ## construct a default: cal <- x$call if (!is.na(m.f <- match("formula", names(cal { cal <- cal[c(1, m.f)] names(cal)[2] <- "" # drop " formula = " } cc <- deparse(cal, 80) nc <- nchar(cc[1]) abbr <- length(cc) > 1 || nc > 75 sub.caption <- if(abbr) paste(substr(cc[1], 1, min(75,nc)), "...") else cc[1] } I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. Martin <.. lots deleted ..> __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] smooth.spline(): residuals(), fitted(),...
It has bothered me for quite some time that a smoothing spline fit doesn't
allow access to residuals or fitted values in general, since after

    fit <- smooth.spline(x, y, *)

the resulting fit$x is really equal to the unique (up to 1e-6 precision)
sorted original x values, and fit$yin (and $y) are accordingly.

There are several possible ways to implement the missing feature. My
current implementation would add a new argument 'keep.data' which, when set
to TRUE, would make sure that the original (x, y, w) are kept, such that
fitted values and (weighted or unweighted) residuals are sensibly available
from the result.

My main RFC (:= request for comments) is about the acceptance of the new
behavior to become the *default* (i.e. 'keep.data = TRUE' would be the
default), such that by default

    residuals(smooth.spline(...))

will work. The drawback of the new default behavior would be that
potentially a 'fit' can become quite a bit larger than previously, e.g. in
the following extremely artificial example

    x0 <- seq(0, 1, by = 0.1)
    x  <- sort(sample(x0, 1000, replace = TRUE))
    ff <- function(x) 10*(x - 1/4)^2 + sin(7*pi*x)
    y  <- ff(x) + rnorm(x) / 2
    fit <- smooth.spline(x, y)

but typically the size increase will only be less than about 40%.

Comments are welcome.

Martin Maechler, ETH Zurich
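Under the proposal, the idiom would simply be the following -- a sketch
assuming 'keep.data = TRUE' behaves as described, using the x and y from
the artificial example just above:

    fit  <- smooth.spline(x, y)  # keep.data = TRUE by default (proposed)
    res  <- residuals(fit)       # (weighted) residuals on the original data
    fhat <- fitted(fit)          # fitted values at the original x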
Re: [Rd] Enhanced version of plot.lm()
> "JMd" == John Maindonald <[EMAIL PROTECTED]> > on Tue, 26 Apr 2005 15:44:26 +1000 writes: JMd> The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ JMd> now includes files: JMd> plot.lm.RData: Image for file for plot6.lm, a version of plot.lm in JMd> which JMd> David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6. JMd> The tick labels are in units of leverage, and the contour labels are JMd> in units of absolute values of the standardized residual. JMd> plot6.lm.Rd file: A matching help file JMd> Comments will be welcome. Thank you John! The *.Rd has the new references and a new example but is not quite complete: the \usage{} has only 4 captions, \arguments{ .. \item{which} ..} only mentions '1:5' --- but never mind. One of the new examples is ## Replace Cook's distance plot by Residual-Leverage plot plot(lm.SR, which=c(1:3, 5)) and -- conceptually I'd really like to change the default from 'which = 1:4' to the above 'which=c(1:3, 5))' This would be non-compatible though for all those that have always used the current default 1:4. OTOH, "MASS" or Peter Dalgaard's book don't mention plot( ) or at least don't show it's result. What do others think? How problematic would a change be in the default plots that plot.lm() produces? JMd> Another issue, discussed recently on r-help, is that when the model JMd> formula is long, the default sub.caption=deparse(x$call) is broken JMd> into multiple text elements and overwrites. good point! JMd> The only clean and simple way that I can see to handle JMd> is to set a default that tests whether the formula is JMd> broken into multiple text elements, and if it is then JMd> omit it. Users can then use their own imaginative JMd> skills, and such suggestions as have been made on JMd> r-help, to construct whatever form of labeling best JMd> suits their case, their imaginative skills and their JMd> coding skills. Hmm, yes, but I think we (R programmers) could try a bit harder to provide a reasonable default, e.g., something along cap <- deparse(x$call, width.cutoff = 500)[1] if((nc <- nchar(cap)) > 53) cap <- paste(substr(cap, 1, 50), "", substr(cap, nc-2, nc)) {untested; some of the details will differ; and the '53', '50' could depend on par("..") measures} JMd> John Maindonald. JMd> On 25 Apr 2005, at 8:00 PM, David Firth wrote: >> From: David Firth <[EMAIL PROTECTED]> >> Date: 24 April 2005 10:23:51 PM >> To: John Maindonald <[EMAIL PROTECTED]> >> Cc: r-devel@stat.math.ethz.ch >> Subject: Re: [Rd] Enhanced version of plot.lm() >> >> >> On 24 Apr 2005, at 05:37, John Maindonald wrote: >> >>> I'd not like to lose the signs of the residuals. Also, as >>> plots 1-3 focus on residuals, there is less of a mental >>> leap in moving to residuals vs leverage; residuals vs >>> leverage/(1-leverage) would also be in the same spirit. >> >> Yes, I know what you mean. Mental leaps are a matter of >> taste...pitfalls, etc, come to mind. >> >>> >>> Maybe, one way or another, both plots (residuals vs >>> a function of leverage, and the plot from Hinkley et al) >>> should go in. The easiest way to do this is to add a >>> further which=6. I will do this if the consensus is that >>> this is the right way to go. In any case, I'll add the >>> Hinkley et al reference (author of the contribution that >>> includes p.74?) to the draft help page. 
    >> Sorry, I should have given the full reference, which (in BibTeX
    >> format from CIS) is
    >>
    >> @inproceedings{Firt:gene:1991,
    >>   author    = {Firth, D.},
    >>   title     = {Generalized Linear Models},
    >>   year      = {1991},
    >>   booktitle = {Statistical Theory and Modelling. In Honour of Sir
    >>                David Cox, FRS},
    >>   editor    = {Hinkley, D. V. and Reid, N. and Snell, E. J.},
    >>   publisher = {Chapman \& Hall Ltd},
    >>   pages     = {55--82},
    >>   keywords  = {Analysis of deviance; Likelihood}
    >> }
    >>
    >> David

    JMd> John Maindonald             email: [EMAIL PROTECTED]
    JMd> phone: +61 2 (6125)3473     fax: +61 2 (6125)5549
    JMd> Centre for Bioinformation Science, Room 1194,
    JMd> John Dedman Mathematical Sciences Building (Building 27)
    JMd> Australian National University, Canberra ACT 0200.
Re: [Rd] Speeding up library loading
> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> > on Mon, 25 Apr 2005 18:51:50 +0200 writes: UweL> Ali - wrote: >> (1) When R tries to load a library, does it load 'everything' in the >> library at once? UweL> No, see ?lazyLoad are you sure Ali is talking about *package*s. He did use the word "library" though, and most of us (including Uwe!) know the difference... >> (2) Is there any options to 'load as you go'? UweL> Well, this is the way R does it for packages yes, because of lazyloading, as Uwe mentioned above. For libraries, (you know: the things you get from compiling and linking C code ..), it may be a bit different. What do you really mean, packages or libraries, Ali? __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Overloading methods in R
>>>>> "Ali" == Ali - <[EMAIL PROTECTED]> >>>>> on Wed, 20 Apr 2005 15:45:09 + writes: Ali> Thanks a lot Tony. I am trying to apply the overloading Ali> to the methods created by R.oo package and, Ali> unfortunately, R.oo uses S3-style classes; so I cannot Ali> use the features of S4 methods as you described. On the Ali> other hand, I caouldn't find a decent OO package which Ali> is based on S4 AND comes with the official release of Ali> R. Ali, maybe we R-core members are not decent enough. But we strongly believe that we don't want to advocate yet another object system additionally to the S3 and S4 one, and several of us have given talks and classes, even written books on how to do "decent" object oriented programming `just' with the S3 and/or S4 object system. No need of additional "oo" in our eyes. Your main problem is that you assume what "oo" means {which may well be true} but *additionally* you also assume that OO has to be done in the same way you know it from Python, C++, or Java.. Since you are new, please try to learn the S4 way, where methods belong to (generic) functions more than to classes in some way, particularly if you compare with other OO systems where methods belong entirely to classes. This is NOT true for R (and S-plus) and we don't want this to change {and yes, we do know about C++, Python, Java,... and their way to do OO}. Please also read in more details the good advice given by Tony Plate and Sean Davis. Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: hexadecimal constants and decimal points
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> >>>>> on Sun, 17 Apr 2005 12:38:10 +0100 (BST) writes: BDR> These are some points stimulated by reading about C history (and BDR> related in their implementation). <.> BDR> 2) R does not have integer constants. It would be BDR> convenient if it did, and I can see no difficulty in BDR> allowing the same conversions when parsing as when BDR> coercing. This would have the side effect that 100 BDR> would be integer (but the coercion rules would come BDR> into play) but 20 would be double. And BDR> x <- 0xce80 would be valid. Hmm, I'm not sure if this (parser change, mainly) is worth the potential problems. Of course you (Brian) know better than anyone here that, when that change was implemented for S-plus, I think Mathsoft (the predecessor of 'Insightful') did also change all their legacy S code and translate all '' to '.' just in order to make sure that things stayed back compatible. And, IIRC, they recommended users to do so similarly with their own S source files. I had found this extremely ugly at the time, but it was mandated by the fact they didn't want to break existing code which in some places did assume that e.g. '0' was a double but became an integer in the new version of S-plus {and e.g., as.double(.) became absolutely mandated before passing things to C --- of course, using as.double(.) ``everywhere'' before passing to C has been recommended for a long time which didn't prevent people to rely on the current behavior (in R) that almost all numbers are double}. We (or rather the less sophisticated members of the R community) may get into similar problems when, e.g., matrix(0, 3,4) suddenly produces an integer matrix instead of a double precision one. BDR> 3) We do allow setting LC_NUMERIC, but it partially breaks R if the BDR> decimal point is not ".". (I know of no locale in which it is not "." or BDR> ",", and we cannot allow "," as part of numeric constants when parsing.) BDR> E.g.: >> Sys.setlocale("LC_NUMERIC", "fr_FR") BDR> [1] "fr_FR" BDR> Warning message: BDR> setting 'LC_NUMERIC' may cause R to function strangely in: BDR> setlocale(category, locale) >> x <- 3.12 >> x BDR> [1] 3 >> as.numeric("3,12") BDR> [1] 3,12 >> as.numeric("3.12") BDR> [1] NA BDR> Warning message: BDR> NAs introduced by coercion BDR> We could do better by insisting that "." was the decimal point in all BDR> interval conversions _to_ numeric. Then the effect of setting LC_NUMERIC BDR> would primarily be on conversions _from_ numeric, especially printing and BDR> graphical output. (One issue would be what to do with scan(), which has a BDR> `dec' argument but is implemented assuming LC_NUMERIC=C. I would hope to BDR> continue to have `dec' but perhaps with a locale-dependent default.) The BDR> resulting asymmetry (R would not be able to parse its own output) would be BDR> unhappy, but seems inevitable. (This could be implemented easily by having BDR> a `dec' arg to EncodeReal and EncodeComplex, and using LC_NUMERIC to BDR> control that rather than actually setting the local category. For BDR> example, deparsing needs to be done in LC_NUMERIC=C.) Yes, I like this quite a bit: - Only allow "." as decimal point in conversions to numeric. - Allowing "," (or other locale settings if there are) for conversions _from_ numeric will be very attractive to some (not to me) and will make the use of R's ``reporting facility' much more natural to them. 
The asymmetry is a bit unhappy -- and that will be a good reason to
advocate (to the user community) that using "," for the decimal point may
be a bad idea in general.

Martin Maechler, ETH Zurich

    BDR> All of these could be implemented by customized versions of
    BDR> strtod/strtol.

    BDR> --
    BDR> Brian D. Ripley, [EMAIL PROTECTED]
    BDR> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
    BDR> University of Oxford,       Tel: +44 1865 272861 (self)
    BDR> 1 South Parks Road,              +44 1865 272866 (PA)
    BDR> Oxford OX1 3TG, UK          Fax: +44 1865 272595
Re: [Rd] RFC: hexadecimal constants and decimal points
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> > on Mon, 18 Apr 2005 03:33:42 -0400 (EDT) writes: >> On Sun, 17 Apr 2005, Jan T. Kim wrote: >> >>> On Sun, Apr 17, 2005 at 12:38:10PM +0100, Prof Brian Ripley wrote: These are some points stimulated by reading about C history (and related in their implementation). 1) On some platforms > as.integer("0xA") [1] 10 but not all (not on Solaris nor Windows). We do not define what is allowed, and rely on the OS's implementation of strtod (yes, not strtol). It seems that glibc does allow hex: C99 mandates it but C89 seems not to allow it. I think that was a mistake, and strtol should have been used. Then C89 does mandate the handling of hex constants and also octal ones. So changing to strtol would change the meaning of as.integer("011"). >>> >>> I think interpretation of a leading "0" as a prefix indicating an octal >>> representation should indeed be avoided. People not familiar to C will >>> have a hard time understanding and getting used to this concept, and >>> in addition, it happens way too often that numeric data are provided >>> left- >>> padded with zeros. Duncan> I agree with this: 011 should be 11, it should not be 9. I agree (with Duncan and Jan). I'm sure the current (decimal) behavior is implicitly used in many places of people's code that reads text files and manipulates it. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] ESS 5.2.7 released
Dear ESS users, {BCC'ed to RPM and Debian maintainers of ESS}

We have now released ESS 5.2.7. This is a bug fix release against 5.2.6,
where
 - the new UTF-8 "support" gave problems for XEmacs, and
 - accidentally, 'auto-fill-mode' was activated for *.R buffers,
with a few new features, see "New Features" below, notably some extended
Sweave support, originally contributed by David Whiting.

I'm crossposting to R-devel just to make you aware that R 2.1.0, bound to
be released today, comes with UTF-8 (unicode) support, and that this
doesn't work correctly in ESS versions prior to 5.2.6.

Downloads from the ESS site http://ESS.R-project.org/ or directly
http://ess.r-project.org/downloads/ess/ as *.zip and *.tar.gz files.
Hopefully, *.deb and *.rpm will also be made available in due time.

For the ESS core team,
Martin Maechler, ETH Zurich.

--- ANNOUNCE ---

ANNOUNCING ESS
**
The ESS Developers proudly announce the release of ESS 5.2.7

Emacs Speaks Statistics (ESS) provides an intelligent, consistent interface
between the user and the software. ESS interfaces with S-PLUS, R, SAS, BUGS
and other statistical analysis packages under the Unix, Microsoft Windows,
and Apple Mac OS operating systems. ESS is a package for the GNU Emacs and
XEmacs text editors whose features ESS uses to streamline the creation and
use of statistical software. ESS knows the syntax and grammar of
statistical analysis packages and provides consistent display and editing
features based on that knowledge. ESS assists in interactive and batch
execution of statements written in these statistical analysis languages.

ESS is freely available under the GNU General Public License (GPL). Please
read the file COPYING, which comes with the distribution, for more
information about the license. For more detailed information, please read
the README files that come with ESS.

Getting the Latest Version
==

The latest released version of ESS is always available on the web at:
ESS web page (http://ess.r-project.org) or StatLib
(http://lib.stat.cmu.edu/general/ESS/)

The latest development version of ESS is available via
`https://svn.R-project.org/ESS/', the ESS Subversion repository. If you
have a Subversion client (see `http://subversion.tigris.org/'), you can
download the sources using:

    % svn checkout https://svn.r-project.org/ESS/trunk PATH

which will put the ESS files into directory PATH. Later, within that
directory, `svn update' will bring that directory up to date. Windows-based
tools such as TortoiseSVN are also available for downloading the files.
Alternatively, you can browse the sources with a web browser at: ESS SVN
site (https://svn.r-project.org/ESS/trunk). However, please use a
subversion client instead to minimize the load when retrieving.

If you remove other versions of ESS from your emacs load-path, you can then
use the development version by adding the following to .emacs:

    (load "/path/to/ess-svn/lisp/ess-site.el")

Note that https is required, and that the SSL certificate for the
Subversion server of the R project is

    Certificate information:
    - Hostname: svn.r-project.org
    - Valid: from Jul 16 08:10:01 2004 GMT until Jul 14 08:10:01 2014 GMT
    - Issuer: Department of Mathematics, ETH Zurich, Zurich, Switzerland, CH
    - Fingerprint: c9:5d:eb:f9:f2:56:d1:04:ba:44:61:f8:64:6b:d9:33:3f:93:6e:ad

(currently, there is no "trusted certificate"). You can accept this
certificate permanently and will not be asked about it anymore.
Current Features


* Languages Supported:
  * S family (S 3/4, S-PLUS 3.x/4.x/5.x/6.x/7.x, and R)
  * SAS
  * BUGS
  * Stata
  * XLispStat including Arc and ViSta
* Editing source code (S family, SAS, BUGS, XLispStat)
  * Syntactic indentation and highlighting of source code
  * Partial evaluation of code
  * Loading and error-checking of code
  * Source code revision maintenance
  * Batch execution (SAS, BUGS)
  * Use of imenu to provide links to appropriate functions
* Interacting with the process (S family, SAS, XLispStat)
  * Command-line editing
  * Searchable command history
  * Command-line completion of S family object names and file names
  * Quick access to object lists and search lists
  * Transcript recording
  * Interface to the help system
* Transcript manipulation (S family, XLispStat)
  * Recording and saving transcript files
  * Manipulating and editing saved transcripts
  * Re-evaluating commands from transcript files
* Help File Editing (R)
  * Syntactic indentation and highlighting of source code.
  * Sending Examples to running ESS process.
  * Previewing

Requirement
[Rd] recent spam on mailing lists
Probably not many of you have noticed, since I assume you have your own
active spam filters: but we have recently (first time on Friday, April 15)
had problems with spam filtering on our mail server. The spamassassin
daemon (spamd) has "died" for no apparent reason, and hence the mail has
been passed through without spam filtering.

We have had a look at the log files and haven't got a real clue about the
reasons. As a stop-gap measure we now have a "nanny script" that tries to
see if 'spamd' lives, and restarts it in case it isn't there anymore.

Martin Maechler
ETH Zurich
[Rd] exists("loadings.default") ...
Paul Gilbert asked me the following, about a topic that was dealt with here
(on R-devel) a few weeks ago (~ March 21):

>>>>> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]>
>>>>> on Mon, 11 Apr 2005 10:35:03 -0400 writes:

    PaulG> Martin, a while ago you suggested:
    >> For S3, it's a bit uglier, but I think you could still do -- in your
    >> package --
    >>     if(!exists("loadings.default", mode="function")) {
    >>         loadings.default <- loadings
    >>         loadings <- function(x, ...) UseMethod("loadings")
    >>     }

    PaulG> I don't think exists() works properly here if namespaces are
    PaulG> used and loadings.default is not exported (i.e. it always gives
    PaulG> FALSE). I can redefine loadings and loadings.default, but I
    PaulG> can't guard against the possibility that those might actually be
    PaulG> defined in stats someday.

Yes, you are correct; one cannot easily use exists() for this when
namespaces are involved. For S3 methods, instead of exists(), I think one
should use something like

    > !is.null(getS3method("loadings", "default", optional = TRUE))
    [1] FALSE
    > !is.null(getS3method("predict", "ppr", optional = TRUE))
    [1] TRUE

Apart from the need to mention something along this line on the 'exists'
help page, I wonder if we shouldn't even consider providing an
existsS3method() wrapper, or, alternatively and analogously to
getAnywhere(), an existsAnywhere() function.

Martin
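The wrapper Martin wonders about is essentially a one-liner; a sketch of
what such an existsS3method() could look like (the name is the one proposed
above, not an existing function):

    existsS3method <- function(generic, class)
        !is.null(getS3method(generic, class, optional = TRUE))

    existsS3method("predict", "ppr")       # TRUE
    existsS3method("loadings", "default")  # FALSE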
Re: [Rd] strange error with rw2010dev
> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> > on 11 Apr 2005 09:46:11 +0200 writes: . MM> Thanks again for the report; this should be fixable MM> before release. PD> Preferably before code freeze! (today) PD> I think we (Thomas L.?) got it analysed once before: The PD> issue is that summary.matrix is passing PD> data.frame(object) back to summary.data.frame without PD> removing the AsIs class. PD> I don't a simple unclass() will do here. or, a bit more cautiously, summary.matrix <- function(object, ...) summary.data.frame(data.frame(if(inherits(object,"AsIs")) unclass(object) else object), ...) That does cure the problem in the Kjetil's example and the equivalent ## short 1-liner: summary(df <- data.frame(mat = I(matrix(1:8, 2 I'm currently make-checking the above. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strange error with rw2010dev
> "Kjetil" == Kjetil Brinchmann Halvorsen <[EMAIL PROTECTED]> > on Sun, 10 Apr 2005 14:00:52 -0400 writes: Kjetil> The error reported below still occurs in todays Kjetil> (2005-04-08) rw2010beta, should I file a formal bug Kjetil> report? Thank you, Kjetil. It seems nobody has found time to look at this in the mean time. However, I can confirm the bug on quite a different platform (Linux Redhat 64-bit on AMD 64). The problem is infinite recursion which you see more easily, when you set something like options(expressions=500). Further note that the bug is not new, it also happens in previous versions of R ( -> i.e. no reason to stop using "R 2.1.0 beta"!) Here's a ``pure script'' testmat <- matrix(1:80, 20,4) dim(testmat) # testframe <- data.frame(testmat=I(testmat), x=rnorm(20), y=rnorm(20), z=sample(1:20)) str(testframe) options(expressions=100) summary(testframe) ##> Error: evaluation nested too deeply: infinite recursion / options(expression=)? ## -- or -- ##> Error: protect(): protection stack overflow ### In the second case, I at least get a useful trace back: traceback() ## longish output, shows the infinite recursion: .. ... 17: summary.data.frame(data.frame(object), ...) 16: summary.matrix(object, digits = digits, ...) 15: summary.default(X[[1]], ...) 14: FUN(X[[1]], ...) 13: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...) 12: summary.data.frame(data.frame(object), ...) 11: summary.matrix(object, digits = digits, ...) 10: summary.default(X[[1]], ...) 9: FUN(X[[1]], ...) 8: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...) 7: summary.data.frame(data.frame(object), ...) 6: summary.matrix(object, digits = digits, ...) 5: summary.default(X[[1]], ...) 4: FUN(X[[1]], ...) 3: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...) 2: summary.data.frame(testframe) 1: summary(testframe) Thanks again for the report; this should be fixable before release. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] orphaning CRAN packages
> "Ted" == Ted Harding <[EMAIL PROTECTED]> > on Sat, 09 Apr 2005 13:02:22 +0100 (BST) writes: Ted> On 09-Apr-05 Uwe Ligges wrote: >> [EMAIL PROTECTED] wrote: >> >>> Dear R Developers, >>> >>> the following CRAN packages do not cleanly pass R CMD >>> check for quite some time now and did not have any >>> updates since the time given. Several attempts by the >>> CRAN admins to contact the package maintainers had no >>> success. >>> >>> norm, 1.0-9, 2002-05-07, WARN Ted> It would be serious if 'norm' were to lapse, since it Ted> is part of the 'norm+cat+mix+pan' family, and people Ted> using any of these are likely to have occasion to use Ted> the others. Indeed! I had a very similar thought but couldn't afford your offer (below), so thanks a lot ! Ted> I'd offer to try to clean up 'norm' myself if only I Ted> were up-to-date on R itself (I'm waiting for 2.1.0 to Ted> come out, which I understand is scheduled to happen Ted> soon, yes?). yes, as Uwe has already confirmed. Since R 2.1.0 is now in beta testing, we consider it very stable, and having less bugs than any other version of R, so please ("everyone!") follow Uwe's advice and install R 2.1.0"beta" Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NaN and linear algebra
>>>>> "Bill" == Bill Northcott <[EMAIL PROTECTED]> >>>>> on Wed, 23 Mar 2005 10:19:22 +1100 writes: Bill> On 23/03/2005, at 12:55 AM, Simon Urbanek wrote: >>> As I see it, the MacOS X behaviour is not IEEE-754 compliant. >>> >>> I had a quick look at the IEEE web site and it seems quite clear that >>> NaNs should not cause errors, but propagate through calculations to >>> be tested at some appropriate (not too frequent) point. >> >> This is not quite correct and in fact irrelevant to the problem you >> describe. NaNs may or may not signal, depending on how they are used. >> Certain operations on NaN must signal by the IEEE-754 standard. The >> error you get is not a trap, and it's not a result of a signal check, >> either. The whole problem is that depending on which algorithm is >> used, the NaNs will be used different ways and thus may or may not use >> signaling operations. Bill> It may not violate the letter of IEEE-754 because matrix calculations Bill> are not covered, but it certainly violates the spirit that arithmetic Bill> should be robust and programs should not halt on these sorts of errors. >> >> I don't consider the `solve' error a bug - in fact I would rather get >> an error telling me that something is wrong (although I agree that the >> error is misleading - the error given in Linux is a bit more helpful) >> than getting a wrong result. Bill> You may prefer the error, but it is not in the sprit of robust Bill> arithmetic. ie >> d<-matrix(NaN,3,3) >> f<-solve(d) Bill> Error in solve.default(d) : Lapack routine dgesv: system is exactly Bill> singular >> f Bill> Error: Object "f" not found >> If I would mark something in your example as a bug that would be >> det(m)=0, because it should return NaN (remember, NaN==NaN is FALSE; >> furthermore if det was calculated inefficiently using Laplace >> expansion, the result would be NaN according to IEEE rules). det=0 is >> consistent with the error given, though. Should we check this in R >> before calling Lapack - if the vector contains NaNs, det/determinant >> should return NaN right away? Bill> Clearly det(d) returning 0 is wrong. As a result based on a Bill> computation including a NaN, it should return NaN. The spirit of Bill> IEEE-754 is that the programmer should choose the appropriate point at Bill> which to check for NaNs. I would interpret this to mean the R Bill> programmer not the R library developer. Surely that is why R provides Bill> the is.nan function. >> d Bill> [,1] [,2] [,3] Bill> [1,] NaN NaN NaN Bill> [2,] NaN NaN NaN Bill> [3,] NaN NaN NaN >> is.nan(solve(d)) Bill> Error in solve.default(d) : Lapack routine dgesv: system is exactly Bill> singular Bill> This is against the spirit of IEEE-754 because it halts the program. >> is.nan(det(d)) Bill> [1] FALSE Bill> That is plain wrong. >> >> Many functions in R will actually bark at NaN inputs (e.g. qr, eigen, >> ...) - maybe you're saying that we should check for NaNs in solve >> before proceeding and raising an error? Bill> However, this problem is in the Apple library not R. Bill> Bill Northcott Indeed! I pretty much entirely agree with your points, Bill, and would tend to declare that this Apple library is ``broken'' for building a correctly running R. Let me ask one question I've been wondering about now for a while: Did you run "make check" after building R, and "make check" ran to completion without an error? 
If yes (which I doubt quite a bit), there *is* a bug in R's quality
control / quality assurance tools -- and I would want to add a check for
the misbehavior you've mentioned.

Martin Maechler, ETH Zurich
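Until the underlying library is fixed, user code can adopt the IEEE-754
spirit Bill describes by checking inputs itself; safe_solve() below is an
illustrative wrapper, not an R API:

    safe_solve <- function(a, ...) {
        if (any(!is.finite(a)))           # NaN or Inf anywhere in the matrix
            return(matrix(NaN, nrow(a), ncol(a)))
        solve(a, ...)
    }
    safe_solve(matrix(NaN, 3, 3))  # propagates NaN instead of halting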
[Rd] sub('^', .....) bugs (PR#7742)
> "David" == David Forrest <[EMAIL PROTECTED]> > on Tue, 22 Mar 2005 15:02:20 -0600 (CST) writes: David> According to help(sub), the ^ should match the David> zero-length string at the beginning of a string: yes, indeed. David> sub('^','var',1:3) # "1" "2" "3" David> sub('$','var',1:3) # "1var" "2var" "3var" David> # This generates what I expected from the first case: David> sub('^.','var',11:13) # "var1" "var2" "var3" there are even more fishy things here: 1) In your cases, the integer 'x' argument is auto-coerced to character, however that fails as soon as 'perl = TRUE' is used. > sub('^','v_', 1:3, perl=TRUE) Error in sub.perl(pattern, replacement, x, ignore.case) : invalid argument {one can argue that this is not a bug, since the help file asks for 'x' to be a character vector; OTOH, we have as.character(.) magic in many other places, i.e. quite naturally here; at least perl=TRUE and perl=FALSE should behave consistently.} 2) The 'perl=TRUE' case behaves even more problematically here: > sub('^','v_', LETTERS[1:3], perl=TRUE) [1] "A\0e" "B\0J" "C\0S" > sub('^','v_', LETTERS[1:3], perl=TRUE) [1] "A\0J" "B\0P" "C\0J" > sub('^','v_', LETTERS[1:3], perl=TRUE) [1] "A\0\0" "B\0\0" "C\0m" > i.e., the result is random nonsense. Note that this happens both for R-patched (2.0.1) and R-devel (2.1.0 alpha). ==> "forwarded" as bug report to R-bugs __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] loadings generic?
> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]> > on Sun, 20 Mar 2005 10:37:29 -0500 writes: PaulG> Can loadings in stats be made generic? It becomes a (S4) generic automagically when you define an S4 method for it. ;-) {yes, I know this isn't the answer you wanted to hear; but really, maybe it's worth considering to use S4 classes and methods ?} For S3, it's a bit uglier, but I think you could still do -- in your package -- if(!exists("loadings.default", mode="function")) { loadings.default <- loadings loadings <- function(x, ...) UseMethod("loadings") } loadings. <- function(x, ...) { . } and S3-export these. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Buglet in install.packages warning message
> "Seth" == Seth Falcon <[EMAIL PROTECTED]> > on Sun, 20 Mar 2005 18:34:13 -0800 writes: Seth> I've been experimenting with install.packages and it's Seth> new ability to track down dependencies from a list of Seth> repositories and encountered this: Seth> install.packages(c("foo", "bar"), Seth> repos="http://cran.r-project.org";, Seth> dependencies=c("Depends", "Suggests")) Seth> dependencies 'foo' are not availabledependencies 'bar' Seth> are not available Seth> With the following change (see below) I get what I Seth> suspect is the intended warning message: Seth>dependencies 'foo', 'bar' are not available Indeed. Thank you Seth! - I've committed your change to be in '` R-alpha of 2005-03-22 '' Apropos: Please, all users of R-2.1.0 (alpha) {aka "R-devel"}: ``keep your eyes open'' for not quite correctly formatted error messages, or even other problems in error and warning messages. The large amount of work that was put in (mostly by Prof Brian Ripley) rationalizing these messages in order to make them more consistent (for translation, e.g.!) may have lead to a few typos that are unavoidable when changing those thousand of lines of code efficiently. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] package.skeleton
Thanks a lot, Jim, yes, I can confirm the behavior; clearly a bug in R-devel (only!) Martin Maechler >>>>> "JimMcD" == James MacDonald <[EMAIL PROTECTED]> >>>>> on Fri, 18 Mar 2005 13:28:18 -0500 writes: >> R.version.string JimMcD> [1] "R version 2.1.0, 2005-03-17" JimMcD> I don't see anything in either JimMcD> https://svn.r-project.org/R/trunk/NEWS or in the JimMcD> Changes file for R-2.1.0 about changes in JimMcD> package.skeleton() (nor in the help page), but when JimMcD> I run this function, all the .Rd files produced are JimMcD> of the data format even if all I have in my JimMcD> .GlobalEnv are functions. JimMcD> A trivial example is to run the examples from the JimMcD> package.skeleton() help page. I believe there should JimMcD> be two data type and two function type .Rd files, JimMcD> but instead they are all of the data type. JimMcD> Best, JimMcD> Jim __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
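A minimal reproduction along the lines of Jim's report (object names are made up; this writes to a temporary directory):

    f <- function(x, y) x + y
    d <- data.frame(a = 1:3)
    package.skeleton(name = "anExample", list = c("f", "d"), path = tempdir())
    ## with the bug, man/f.Rd comes out in the data-set template
    ## instead of the function template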
Re: [Rd] Small suggestion for stripchart help
Thank you, Kevin. I've just committed your improvement (to /src/library/graphics/man/stripchart.Rd). Martin > "KevinW" == Kevin Wright <[EMAIL PROTECTED]> > on Mon, 14 Mar 2005 12:39:50 -0800 (PST) writes: KevinW> I needed to look up information about the 'jitter' KevinW> argument of stripchart. When I opened the help page KevinW> for jitter I found: KevinW> jitter when jittering is used, jitter gives the amount of jittering applied. KevinW> which is slightly confusing/self-referential if you KevinW> are lazy and don't read the entire page. KevinW> It might be clearer to say KevinW> jitter when \code{method="jitter"} is used, jitter KevinW> gives the amount of jittering applied. KevinW> Just my opinion. Thanks for listening. KevinW> Kevin Wright __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Use of htest class for different tests
>>>>> "Torsten" == Torsten Hothorn <[EMAIL PROTECTED]> >>>>> on Mon, 14 Mar 2005 13:43:32 +0100 (CET) writes: Torsten> On Sun, 13 Mar 2005, Gorjanc Gregor wrote: >> Hello! >> >> First of all I must appologize if this has been raised >> previously, but search provided by Robert King at the >> University of Newcastle seems to be down these >> days. Additionally let me know if such a question should >> be sent to R-help. >> >> I did a contribution to function hwe.hardy in package >> 'gap' during the weekend. That functions performs >> Hardy-Weinberg equilibrium test using MCMC. The return of >> the function does not have classical components for htest >> class so I was afcourse not successfull in using >> it. However, I managed to copy and modify some part of >> print.htest to accomplish the same task. >> >> Now my question is what to do in such cases? Just copy >> parts of print.htest and modify for each test or anything >> else. Are such cases rare? If yes, then mentioned >> approach is probably the easiest. >> Torsten> you can use print.htest directly for the components Torsten> which _are_ elements of objects of class `htest' Torsten> and provide your one print method for all Torsten> others. If your class `foo' (essentially) extends Torsten> `htest', a simple version of `print.foo' could by Torsten> print.foo <- function(x, ...) { Torsten> Torsten> # generate an object of class `htest' Torsten> y <- x Torsten> class(y) <- "htest" Torsten> # maybe modify some thinks like y$method Torsten> ... Torsten> # print y using `print.htest' without copying code Torsten> print(y) Torsten> Torsten> # and now print additional information Torsten> cat(x$whatsoever) Torsten> Torsten> } and if you want to really `comply to standards' you should end your print method with invisible(x) Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Re: Packages and Libraries (was: Re: lme4 "package" etc ..)
>>>>> "tony" == A J Rossini <[EMAIL PROTECTED]> >>>>> on Tue, 8 Feb 2005 13:33:23 +0100 writes: tony> For OBVIOUS reasons, is there any chance that we could introduce tony> "package()" and deprecate "library()"? This idea is not new {as you must surely have guessed}. In fact, there's a much longer standing proposition of "usePackage()" (IIRC, or "use.package()" ?). However, we (R-core) always had wanted to also provide a ``proper'' class named "package" along with this, but for several reasons didn't get around to it.. yet. -- I've diverted to R-devel now that we are really talking about desired future behavior of R tony> (well, I'll also ask if we could deprecate "=" for assignment, but tony> that's hopeless). :-) tony> On Tue, 8 Feb 2005 11:49:39 +0100, Martin Maechler tony> <[EMAIL PROTECTED]> wrote: >> >>>>> "Pavel" == Pavel Khomski <[EMAIL PROTECTED]> >> >>>>> on Tue, 08 Feb 2005 10:20:03 +0100 writes: >> Pavel> this is a question, how can i specify the random part Pavel> in the GLMM-call (of the lme4 library) for compound Pavel> matrices just in the the same way as they defined in Pavel> the lme-Call (of the nlme library). >> >> ``twice in such a short paragraph -- yikes !!'' ... I'm getting >> convulsive... >> >> There is NO lme4 library nor an nlme one ! >> There's the lme4 *PACKAGE* and the nlme *PACKAGE* -- please -- >> >> __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] barplot: space makes beside=F (PR#7668)
Hi Ondrej, can you give a very small *REPRODUCIBLE* example of R code that worked in R 1.5.1 and doesn't work the same in R 2.0.1. I know that we made some changes for barplot() on purpose, documented it, announced it in NEWS, etc, etc. So I'm sure it's not a bug. { I'm also sure that your ``It's impossible now '' must be wrong. R is a full-fledged programming language, and in principle everything is possible :-) } > "Ondrej" == o medek <[EMAIL PROTECTED]> > on Mon, 7 Feb 2005 21:03:19 +0100 (CET) writes: Ondrej> Full_Name: Ondrej Medek Ondrej> Version: 2.0.1 Ondrej> OS: Linux/Debian Sarge Ondrej> Submission from: (NULL) (147.32.127.204) Ondrej> Hi, I had a R version 1.5.1 and I used a 'barplot' Ondrej> with 'beside=T' and 'space' has been vector of 8 Ondrej> numbers 'space=c(1,0.5,rep(c(0.5,-0.5),3))'. Then I Ondrej> upgraded to the R 2.0.1 and my graphs are broken. If Ondrej> I use any vector of more than 2 elements for 'space' Ondrej> then the graph is drawn as 'beside=F' even if I Ondrej> specify 'beside=T'. Ondrej> In the previous version my graph was a graph of Ondrej> groups of eight bars separated by a big Ondrej> spaces. Every group consisted of 4 pairs of bars Ondrej> separated by a small space. It's impossible now. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel daily snapshots
> "Kurt" == Kurt Hornik <[EMAIL PROTECTED]> > on Tue, 25 Jan 2005 21:57:32 +0100 writes: > apjaworski writes: >> I just noticed that as of January 22, the daily snapshots >> of the R-devel tree (in >> ftp://ftp.stat.math.ethz.ch/Software/R/) are only about >> 1Mb (instead of about 10Mb). When the January 25 file is >> downloaded and uncompressed, it seems to be missing the >> src directory. Kurt> We are working on this. Building the daily snapshot Kurt> for R-devel now requires Makeinfo 4.7, and the system Kurt> creating the tarball currently only has 4.5 installed. There's now a new one in ftp://ftp.stat.math.ethz.ch/Software/R/ Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] p.adjust(s), was "Re: [BioC] limma and p-values"
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Mon, 17 Jan 2005 22:02:39 +0100 writes: >>>>> "GS" == Gordon Smyth <[EMAIL PROTECTED]> >>>>> on Sun, 16 Jan 2005 19:55:35 +1100 writes: <..> GS> 7. The 'n' argument is removed. Setting this argument GS> for any methods other than "none" or "bonferroni" make GS> the p-values indeterminate, and the argument seems to be GS> seldom used. GS> (It isn't used in the R default distribution.) that's only any indication it *might* be seldom used... we really have to *know*, because not allowing it anymore will break all code calling p.adjust(p, meth, n = *) GS> I think trying to combine this argument with NAs would get you GS> into lots of hot water. For example, what does GS> p.adjust(c(NA,NA,0.05),n=2) mean? Which 2 values GS> should be adjusted? The case where n < length(p) should simply give an error which should bring you into cool water... MM> I agree that I don't see a good reason to allow specifying 'n' MM> as argument unless e.g. for "bonferroni". MM> What do other think ? no reaction yet. I've thought a bit more in the mean time: Assume someone has 10 P values and knows that he only want to adjust the smallest ones. Then, only passing the ones to adjust and setting 'n = 10' can be useful and will certainly work for "bonferroni" but I think it can't work in general for any other method. In sum, I still tend to agree that the argument 'n' should be dropped -- but maybe with "deprecation" -- i.e. still allow it for 2.1.x giving a deprecation warning. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] p.adjust(s), was "Re: [BioC] limma and p-values"
> "GS" == Gordon Smyth <[EMAIL PROTECTED]> > on Sun, 16 Jan 2005 19:55:35 +1100 writes: GS> I append below a suggested update for p.adjust(). thank you. GS> 1. A new method "yh" for control of FDR is included which is GS> valid for any dependency structure. Reference is GS> Benjamini, Y., and Yekutieli, D. (2001). The control of GS> the false discovery rate in multiple testing under GS> dependency. Annals of Statistics 29, 1165-1188. good, thanks! GS> 2. I've re-named the "fdr" method to "bh" but kept "fdr" GS> as a synonym for backward compatability. ok GS> 3. Upper case values for method "BH" or "YH" are also GS> accepted. I don't see why we'd want this. The S language is case-sensitive and we don't want to lead people to believe that case wouldn't matter. GS> 4. p.adust() now preserves attributes like names for GS> named vectors (as does cumsum and friends for example). good point; definitely desirable!! GS> 5. p.adjust() now works columnwise on numeric GS> data.frames (as does cumsum and friends). well, "cusum and friends" are either generic or groupgeneric (for the "Math" group) -- there's a Math.data.frame group method. This is quite different for p.adjust which is not generic and I'm not (yet?) convinced it should become so. People can easily use sapply(d.frame, p.adjust, method) if needed; In any case it's not in the spirit of R's OO programming to special case "data.frame" inside a function such as p.adjust GS> 6. method="hommel" now works correctly even for n=2 ok, thank you (but as said, in R 2.0.1 the behavior was much more problematic) GS> 7. The 'n' argument is removed. Setting this argument GS> for any methods other than "none" or "bonferroni" make GS> the p-values indeterminate, and the argument seems to be GS> seldom used. (It isn't used in the R default GS> distribution.) I think trying to combine this argument GS> with NAs would get you into lots of hot water. For GS> example, what does p.adjust(c(NA,NA,0.05),n=2) mean? GS> Which 2 values should be adjusted? I agree that I don't see a good reason to allow specifying 'n' as argument unless e.g. for "bonferroni". What do other think ? GS> 8. NAs are treated in na.exclude style. This is the GS> correct approach for most applications. The only other GS> consistent thing you could do would be to treat the NAs GS> as if they all had value=1. But then you would have to GS> explain clearly that the values being returned are not GS> actually the correct adjusted p-values, which are GS> unknown, but are the most conservative possible values GS> assuming the worst-case for the missing values. This GS> would become arbitrarily unreasonable as the number of GS> NAs increases. I now agree that your proposed default behavior is more sensible than my proposition. I'm not sure yet if it wasn't worth to allow for other NA treatment, like the "treat as if 1" {which my code proposition was basically doing} or rather mre sophisticated procedure like "integrating" over all P ~ U[0,1] marginals for each missing value, approximating the integral possibly by "Monte-Carlo" even quasi random numbers. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] p.adjust(s), was "Re: [BioC] limma and p-values"
> "GS" == Gordon Smyth <[EMAIL PROTECTED]> > on Sun, 16 Jan 2005 19:44:26 +1100 writes: GS> The new committed version of p.adjust() contains some GS> problems: >> p.adjust(c(0.05,0.5),method="hommel") GS> [1] 0.05 0.50 GS> No adjustment! yes, but that's still better than what the current version of R 2.0.1 does, namely to give NA NA + two warnings .. GS> I can't see how the new treatment of NAs can be GS> justified. One needs to distinguish between NAs which GS> represent missing p-values and NAs which represent GS> unknown p-values. In virtually all applications giving GS> rise to NAs, the NAs represent missing p-values which GS> could not be computed because of missing data. In such GS> cases, the observed p-values should definitely be GS> adjusted as if the NAs weren't there, because NAs GS> represent p-values which genuinely don't exist. hmm, "definitely" being a bit strong. One could argue that ooonoe should use multiple imputation of the underlying missing data, or .. other scenarios. I'll reply to your other, later, more detailed message separately and take the liberty to drop the other points here... Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] p.adjust(s), was "Re: [BioC] limma and p-values"
I've thought more and made experiments with R code versions and just now committed a new version of p.adjust() to R-devel --> https://svn.r-project.org/R/trunk/src/library/stats/R/p.adjust.R which does sensible NA handling by default and *additionally* has an "na.rm" argument (set to FALSE by default). The extended 'Examples' section on the help page https://svn.r-project.org/R/trunk/src/library/stats/man/p.adjust.Rd shows how the new NA handling is typically much more sensible than using "na.rm = TRUE". Martin >>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 17:19:23 +0100 writes: >>>>> "GS" == Gordon K Smyth <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes: MM> <.> GS> p.adjust() unfortunately gives incorrect results when GS> 'p' includes NAs. The results from topTable are GS> correct. topTable() takes care to remove NAs before GS> passing the values to p.adjust(). MM> There's at least one bug in p.adjust(): The "hommel" MM> method currently does not work at all with NAs (and I MM> have an uncommitted fix ready for this bug). OTOH, the MM> current version of p.adjust() ``works'' with NA's, apart MM> from Hommel's method, but by using "n = length(p)" in MM> the correction formulae, i.e. *including* the NAs for MM> determining sample size `n' {my fix to "hommel" would do MM> this as well}. MM> My question is what p.adjust() should do when there are MM> NA's more generally, or more specifically which `n' to MM> use in the correction formula. Your proposal amounts to MM> ``drop NA's and forget about them till the very end'' (where they are wanted in the result), i.e., your sample MM> size `n' would be sum(!is.na(p)) instead of length(p). MM> To me it doesn't seem obvious that this setting "n = MM> #{non-NA observations}" is desirable for all P-value MM> adjustment methods. One argument for keeping ``n = #{all MM> observations}'' at least for some correction methods is MM> the following "continuity" one: MM> If only a few ``irrelevant'' (let's say > 0.5) P-values MM> are replaced by NA, the adjusted relevant small P-values MM> shouldn't change much, ideally not at all. I'm really MM> no scholar on this topic, but e.g. for "holm" I think I MM> would want to keep ``full n'' because of the above MM> continuity argument. BTW, for "fdr", I don't see a MM> straightforward way to achieve the desired continuity. MM> Of course, p.adjust() could offer the possibility of MM> choosing how NA's should be treated, e.g. by another MM> argument ``use.na = TRUE/FALSE'' and hence allow both MM> versions. MM> Feedback very welcome, particularly from ``P-value MM> experts'' ;-) MM> Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] p.adjust(s), was "Re: [BioC] limma and p-values"
>>>>> "GS" == Gordon K Smyth <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes: <.> GS> p.adjust() unfortunately gives incorrect results when GS> 'p' includes NAs. The results from topTable are GS> correct. topTable() takes care to remove NAs before GS> passing the values to p.adjust(). There's at least one bug in p.adjust(): The "hommel" method currently does not work at all with NAs (and I have an uncommitted fix ready for this bug). OTOH, the current version of p.adjust() ``works'' with NA's, apart from Hommel's method, but by using "n = length(p)" in the correction formulae, i.e. *including* the NAs for determining sample size `n' {my fix to "hommel" would do this as well}. My question is what p.adjust() should do when there are NA's more generally, or more specifically which `n' to use in the correction formula. Your proposal amounts to ``drop NA's and forget about them till the very end'' (where they are wanted in the result), i.e., your sample size `n' would be sum(!is.na(p)) instead of length(p). To me it doesn't seem obvious that this setting "n = #{non-NA observations}" is desirable for all P-value adjustment methods. One argument for keeping ``n = #{all observations}'' at least for some correction methods is the following "continuity" one: If only a few ``irrelevant'' (let's say > 0.5) P-values are replaced by NA, the adjusted relevant small P-values shouldn't change much, ideally not at all. I'm really no scholar on this topic, but e.g. for "holm" I think I would want to keep ``full n'' because of the above continuity argument. BTW, for "fdr", I don't see a straightforward way to achieve the desired continuity. 5D Of course, p.adjust() could adopt the possibility of chosing how NA's should be treated e.g. by another argument ``use.na = TRUE/FALSE'' and hence allow both versions. Feedback very welcome, particularly from ``P-value experts'' ;-) Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Object Memory-limits in base and its help document (PR#7468)
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> >>>>> on Tue, 4 Jan 2005 10:51:40 + (GMT) writes: BDR> On Tue, 4 Jan 2005 [EMAIL PROTECTED] wrote: >> Full_Name: Shigeru Mase Version: 2.0.1 OS: Linux (Debian) >> Submission from: (NULL) (222.149.162.192) >> >> >> help.search("Mem") shows there is an object >> "Memory-limits" in the base package. BDR> No. It shows there is a *help* document BDR> "Memory-limits": BDR>Help files with alias or concept or title matching BDR> 'Mem' using regular expression matching: BDR>Memory-limits(base) Memory Limits in R BDR> Try help("Memory-limits"), and see ?help. >> But commads "Memory-limits", or "Memory-limits()", causes >> an error message: Error: Object "Memory" not found. In >> addition, help(Memory-limits) displays the document for >> Arithmetic. BDR> As is should, for that is an arithmetical operation. well, but it doesn't do that in R 2.0.1 (proper or patched) both on Debian or Redhat [in a terminal:] > help(Memory-limits) No documentation for 'Memory - limits' in specified packages and libraries: you could try 'help.search("Memory - limits")' whereas in ESS, it actually does what Shigero Mase expected, since ESS has it's own {not entirely correct either!} magic handling of help(). Martin Maechler __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] sorry to have broken R-devel snapshot
I'm sorry to report that I had accidentally broken last night's R-devel snapshot "R-devel_2004-12-28...". If for some reason, you are interested in fixing that manually, add one "\" at the end of line 649 in file src/main/array.c. It may have bad consequences for automatic daily builds (with R-devel only), possibly including the CRAN and Bioconductor package check results. Martin Maechler, ETH Zurich __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R's IO speed
> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> > on Sun, 26 Dec 2004 10:03:30 + (GMT) writes: BDR> R-devel now has some improved versions of read.table BDR> and write.table. For a million-row data frame BDR> containing one number, one factor with few levels and BDR> one logical column, a 56Mb object. BDR> generating it takes 4.5 secs. BDR> calling summary() on it takes 2.2 secs. BDR> writing it takes 8 secs and an additional 10Mb. BDR> saving it in .rda format takes 4 secs. BDR> reading it naively takes 28 secs and an additional BDR> 240Mb BDR> reading it carefully (using nrows, colClasses and BDR> comment.char) takes 16 secs and an additional 150Mb BDR> (56Mb of which is for the object read in). (The BDR> overhead of read.table over scan was about 2 secs, BDR> mainly in the conversion back to a factor.) BDR> loading from .rda format takes 3.4 secs. BDR> [R 2.0.1 read in 23 secs using an additional 210Mb, and BDR> wrote in 50 secs using an additional 450Mb.] Excellent! Thanks a lot Brian (for this and much more)! I wish you continued merry holidays! Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] No window graphics on new binary installation for Mac OS 10.3 (PR#7428)
>>>>> "BillC" == wcleveland <[EMAIL PROTECTED]> >>>>> on Fri, 17 Dec 2004 17:00:06 +0100 (CET) writes: BillC> I loaded the binary for R, and most things work fine. BillC> The plot command defaults to a postscript file. If I BillC> type X11(), I get a message the X11 failed to load. BillC> However if I type capabilities(what="X11") I get TRUE BillC> back. BillC> What do I need to do to get X11 to load properly or BillC> to get screen plots? - quartz() is I think what the mac lovers love. - For X11() you need to start an X server on the Mac for which there's a button to click. I know this primarily from reading the R-SIG-Mac mailing list, a mailing list for (development/testing) of Mac specifics. Tiag Magalhaes also recently had your problem and solved and posted a summary / "installation instruction" of it. It's in the archives at https://stat.ethz.ch/pipermail/r-sig-mac/2004-December/001487.html BTW: This is not bug of R (``in our sense'' ;-) Martin Maechler, ETH Zurich __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Re: [R] Is k equivalent to k:k ?
>>>>> "RichOK" == Richard A O'Keefe <[EMAIL PROTECTED]> >>>>> on Mon, 13 Dec 2004 10:56:48 +1300 (NZDT) writes: RichOK> I asked: >> In this discussion of seq(), can anyone explain to me >> _why_ seq(to=n) and seq(length=3) have different types? RichOK> Martin Maechler <[EMAIL PROTECTED]> RichOK> replied: well, the explantion isn't hard: look at RichOK> seq.default :-) RichOK> That's the "efficient cause", I was after the "final RichOK> cause". That is, I wasn't asking "what is it about RichOK> the system which MAKES this happen" but "why does RichOK> anyone WANT this to happen"? sure, I did understand you quite well -- I was trying to joke and used the " :-) " to point the joking .. MM> now if that really makes your *life* simpler, MM> what does that tell us about your life ;-) :-) { even more " :-) " !! } RichOK> It tells you I am revising someone else's e-book RichOK> about S to describe R. The cleaner R is, the easier RichOK> that part of my life gets. of course, and actually I do agree for my life too, since as you may believe, parts of my life *are* influenced by R. Apologize for my unsuccessful attempts to joking.. RichOK> seq: from, to, by, length[.out], along[.with] MM> I'm about to fix this (documentation, not code). RichOK> Please don't. There's a lot of text out there: RichOK> tutorials, textbooks, S on-inline documentation, &c RichOK> which states over and over again that the arguments RichOK> are 'along' and 'with'. you meant 'along' and 'length' yes. And everyone can continue to use the abbreviated form as I'm sure nobody will introduce a 'seq' method that uses *multiple* argument names starting with "along" or "length" (such that the partial argument name matching could become a problem). RichOK> Change the documentation, and people will start RichOK> writing length.out, and will that port to S-Plus? RichOK> (Serious question: I don't know.) yes, as Peter has confirmed already. Seriously, I think we wouldn't even have started using the ugly ".with" or ".out" appendices, wouldn't it have been for S-plus compatibility {and Peter has also given the explanation why there *had* been a good reason for these appendices in the past}. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Re: [R] Is k equivalent to k:k ?
>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> >>>>> on Fri, 10 Dec 2004 08:38:34 -0500 writes: Duncan> On Fri, 10 Dec 2004 09:32:14 +0100, Martin Maechler Duncan> <[EMAIL PROTECTED]> wrote : RichOK> If you want to pass seq(length=n) to a .C or RichOK> .Fortran call, it's not helpful that you can't tell RichOK> what the type is until you know n! It would be nice RichOK> if seq(length=n) always returned the same type. I RichOK> use seq(length=n) often instead of 1:n because I'd RichOK> like my code to work when n == 0; it would make life RichOK> simpler if seq(length=n) and 1:n were the same type. >> >> now if that really makes your *life* simpler, what does that >> tell us about your life ;-) :-) >> >> But yes, you are right. All should return integer I think. Duncan> Yes, it should be consistent, and integer makes sense here. the R-devel version now does; and so does seq(along = <.>) Also ?seq {or ?seq.default} now has the value section as > Value: > The result is of 'mode' '"integer"' if 'from' is (numerically > equal to an) integer and, e.g., only 'to' is specified, or also if > only 'length' or only 'along.with' is specified. which is correct {and I hope does not imply that it gives *all* cases of an integer result}. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Re: [R] Is k equivalent to k:k ?
I'm diverting to R-devel, where this is really more appropriate. > "RichOK" == Richard A O'Keefe <[EMAIL PROTECTED]> > on Fri, 10 Dec 2004 14:37:16 +1300 (NZDT) writes: RichOK> In this discussion of seq(), can anyone explain to RichOK> me _why_ seq(to=n) and seq(length=3) have different RichOK> types? well, the explanation isn't hard: look at seq.default :-) RichOK> In fact, it's worse than that (R2.0.1): >> storage.mode(seq(length=0)) RichOK> [1] "integer" >> storage.mode(seq(length=1)) RichOK> [1] "double" { str(.) is shorter than storage.mode(.) } RichOK> If you want to pass seq(length=n) to a .C or RichOK> .Fortran call, it's not helpful that you can't tell RichOK> what the type is until you know n! It would be nice RichOK> if seq(length=n) always returned the same type. I RichOK> use seq(length=n) often instead of 1:n because I'd RichOK> like my code to work when n == 0; it would make life RichOK> simpler if seq(length=n) and 1:n were the same type. now if that really makes your *life* simpler, what does that tell us about your life ;-) :-) But yes, you are right. All should return integer, I think. BTW --- since this is now on R-devel where we discuss R development: In the future, we really might want to have a new type, some "long integer" or "index", which would be used both in R and in C's R-API for indexing into large objects where 32-bit integers overflow. I assume we will keep the R "integer" == C "int" == 32-bit int forever, but need something with more bits rather sooner than later. But in any case, by then, some things might have to change in the storage type used for indexing in R (and C's R-API). RichOK> Can anyone explain to me why the arguments of seq.default are RichOK> "from", "to", "by", "length.out", "along.with" RichOK> ^ RichOK> when the help page for seq documents them as RichOK> "from", "to", "by", "length", and "along"? Well, I can explain why this wasn't caught by R's builtin QA (quality assurance) checks: The base/man/seq.Rd page uses both \synopsis{} and \usage{}, which allows putting things on the help page that are not checked to coincide with the code... I'm about to fix this (documentation, not code). Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
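(For later readers: seq_len(), which arrived in a subsequent R release, is the type-stable idiom for exactly this use-case:

    storage.mode(seq_len(0))   # "integer"
    storage.mode(seq_len(1))   # "integer"
)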
Re: [Rd] wishlist -- names gives slotnames (PR#7410)
> "ElizP" == Elizabeth Purdom <[EMAIL PROTECTED]> > on Thu, 9 Dec 2004 19:28:40 +0100 (CET) writes: ElizP> Full_Name: Elizabeth Purdom Version: 1.9.1 OS: ElizP> Windows XP Submission from: (NULL) (171.64.102.199) ElizP> It would be nice if names(obj) would give slot names ElizP> as well. Since for many people slots are new, the ElizP> first thing that happens is you try to access what's ElizP> in them and can't find how to do it. Thank you for your thoughts,.. but ``As with everything, use str() '' --- but you need at least R 2.0.0; your R 1.9.1 is too old for this (and probably, in general for posting to R-bugs !) E.g. library(stats4) example(mle) str(fit2) gives Formal class 'mle' [package "stats4"] with 8 slots ..@ call : language mle(minuslogl = ll2) ..@ coef : Named num [1:2] 3.22 1.12 .. ..- attr(*, "names")= chr [1:2] "lymax" "lxhalf" ..@ fullcoef : Named num [1:2] 3.22 1.12 .. ..- attr(*, "names")= chr [1:2] "lymax" "lxhalf" <...> Now if you don't know much about S4 classes, you see the word "slot" in the first line of str()'s output and hopefully try help(slot) This will tell you about slotNames(). ElizP> If you don't know that slotNames() exists, it can be ElizP> very frustrating. Moreover, if you don't even know ElizP> that the objects has slots (or that such things ElizP> exist), it's extremely confusing. I agree that it might be confusing {but do use str() .. } ElizP> exist), it's extremely confusing. It just looks like ElizP> nothing is there (you get NULL). The same happens if you do length(..) of an S4 object; that gives 0; so at least names() and length() are consistent ;-) I'm not so sure if inames() should be extended to S4 classes that way; in any case if it's done, length() should also give the same as length(names()). I'm CC'ing John Chambers, the masterminder of S4, to make sure we get his comments on this. ElizP> If needed, you could have a message that says that ElizP> these are slots and should be accessed with "@". It seems you are thinking about list()s and their names. Note that atomic vector have names too and these are not accessed with "$" either. So I wouldn't see the need for such a message. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Problems when printing *large* R objects
> "Simon" == Simon Urbanek <[EMAIL PROTECTED]> > on Sun, 5 Dec 2004 19:39:07 -0500 writes: Simon> On Dec 4, 2004, at 9:50 PM, [EMAIL PROTECTED] Simon> wrote: >> Source code leading to crash: >> >> library(cluster) >> data(xclara) >> plot(hclust(dist(xclara))) >> >> This leads to a long wait where the application is frozen >> (spinning status bar going the entire time), a quartz >> window displays without any content, and then the >> following application crash occurs: Simon> Please post this to the maintainers of the cluster Simon> library (if at all), Well, this is a *package*, not a library {please, please!} And really, that has nothing to do with the 'cluster' package (whose maintainer I am), as David only uses its data set. hclust() and dist() are in the standard 'stats' package. Btw, this can be accomplished more cleanly, i.e., without attaching "cluster", by data(xclara, package = "cluster") Simon> this has nothing to do Simon> with the GUI (behaves the same in X11). The above Simon> doesn't make much sense anyway - you definitely want Simon> to use cutree before you plot that dendogram ... Indeed! A bit more explicitly for David: xclara has 3000 observations, i.e. 3000*2999/2 ~= 4.5 Mio distances {i.e., a bit more than 36 MBytes to keep in memory and about 48 mio characters to display when you use default options(digits=7)}. I don't think you can really make much of printing these many numbers onto your console as you try with David> dist(xclara) -> xclara.dist David> Works okay, though when attempting to display those results it freezes David> up the entire system, probably as the result of memory David> threshing/starvation if the top results are any indicator: David> 1661 R 8.5% 9:36.12 392 567 368M+ 3.88M 350M- 828M "freezes up the entire system" when trying to print something too large actually has something to do with user interface. AFAIK, it doesn't work 'nicely' on any R console, but at least in ESS on Linux, it's just that one Emacs, displaying the "wrist watch" (and I can easily tell emacs "not to wait" by pressing Ctrl g"). Also, just trying it now {on a machine with large amounts of RAM}: After pressing return, it at least starts printing (displaying to the *R* buffer) after a bit more than 1 minute.. and that does ``seem'' to never finish. I can signal a break (via the [Signals] Menu or C-c C-c in Emacs), and still have to wait about 2-3 minutes for the output stops; but it does, and I can work on.. {well, in theory; my Emacs seems to have become v..e..r..y s...l...ow} We only recently had a variation on this theme in the ESS-help mailing list, and several people were reporting they couldn't really stop R from printing and had to kill the R process... So after all, there's not quite a trivial problem "hidden" in David's report : What should happen if the user accidentally wants to print a huge object to console... how to make sure R can be told to stop. And as I see it now, there's even something like an R "bug" (or "design infelicity") here: I've now done it again {on a compute server Dual-Opteron with 4 GB RAM}: After stopping, via the ESS [Signals] [Break (C-c C-c)] menu, Emacs stops immediately, but R doesn't return quickly, and rather, watching "top" {the good ol' unix process monitor} I see R using 99.9% CPU and it's memory footage ("VIRT" and "SHR") increasing and increasing..., upto '1081m', a bit more than 1 GB, when R finally returns (displays the prompt) after only a few minutes --- but then, as said, this is on a remote 64bit machine with 4000 MB RAM. 
BTW, when I then remove the 'dist' (and hclust) objects in R, and type gc() (or maybe do some other things in R), the R process has about halved its apparent memory usage to 500-something MB. more stats: during printing: 798m after "break": 798m, for ~5 seconds, then starting to grow; slowly (in my top, in steps of ~ 10m) up to 1076m; then the R prompt is displayed and top shows "1081m". It stays there, until I do > gc() where it goes down to VIRT 841m (RES 823m) and, after removing the large distance object and gc() again, it lowers to 820m (RES 790m) and stays there. Probably this thread should be moved to R-devel -- and hence I crosspost for once. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
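Later R versions grew a guard for exactly this accident, the "max.print" option; with it set, an accidental print of a huge dist object truncates instead of flooding the console:

    options(max.print = 1000)
    d <- dist(matrix(rnorm(3000 * 2), 3000))
    d    # output stops once getOption("max.print") entries are reached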
Re: [Rd] write.table inconsistency (PR#7403)
>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> >>>>> on Sat, 04 Dec 2004 09:17:26 -0500 writes: Duncan> On Sat, 4 Dec 2004 13:51:55 +0100, Martin Maechler Duncan> <[EMAIL PROTECTED]> wrote: >>>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> on >>>>>>> Sat, 4 Dec 2004 01:55:26 +0100 (CET) writes: >> Duncan> There's an as.matrix() call in write.table that Duncan> means the formatting of numeric columns changes Duncan> depending on whether there are any non-numeric Duncan> columns in the table or not. >> yes, I think I had seen this (a while ago in the source >> code) and then wondered if one shouldn't have used >> data.matrix() instead of as.matrix() - something I >> actually do advocate more generally, as "good programming >> style". It also does solve the problem in the example >> here -- HOWEVER, the lines *before* as.matrix() have >> >> ## as.matrix might turn integer or numeric columns into a >> complex matrix cmplx <- sapply(x, is.complex) >> if(any(cmplx) && !all(cmplx)) x[cmplx] <- >> lapply(x[cmplx], as.character) x <- as.matrix(x) >> >> which makes you see that write.table(.) should also work >> when the data frame has complex variables {or some other >> kinds of non-numeric as you've said above} -- something >> which data.matrix() can't handle As soon as you have >> a complex or a character variable (together with others) >> in your data.frame, as.matrix() will have to return >> "character" and apply format() to the numeric variables, >> as well... >> >> So, to make this consistent in your sense, >> i.e. formatting of a column shouldn't depend on the >> presence of other columns, we can't use as.matrix() nor >> data.matrix() but have to basically replicate an altered >> version of as.matrix inside write.table. >> >> I propose to do this, but expose the altered version as >> something like as.charMatrix(.) >> >> and replace the 4 lines {of code in write.table()} above >> by the single line as.charMatrix(x) Duncan> That sounds good. Which version of the formatting Duncan> would you choose, leading spaces or not? My Duncan> preference would be to leave off the leading spaces, mine too, very strong preference, actually: The behavior should be such that each column is formatted ___ as if it was the only column of that data frame ___ Duncan> in the belief that write.table is usually used for Duncan> data storage rather than data display, but it is Duncan> sometimes used for data display (e.g. in Duncan> utils::upgrade.packageStatus, which would not be Duncan> affected by your choice). __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] write.table inconsistency (PR#7403)
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> > on Sat, 4 Dec 2004 01:55:26 +0100 (CET) writes: Duncan> There's an as.matrix() call in write.table that means the formatting Duncan> of numeric columns changes depending on whether there are any Duncan> non-numeric columns in the table or not. yes, I think I had seen this (a while ago in the source code) and then wondered if one shouldn't have used data.matrix() instead of as.matrix() - something I actually do advocate more generally, as "good programming style". It also does solve the problem in the example here -- HOWEVER, the lines *before* as.matrix() have ## as.matrix might turn integer or numeric columns into a complex matrix cmplx <- sapply(x, is.complex) if(any(cmplx) && !all(cmplx)) x[cmplx] <- lapply(x[cmplx], as.character) x <- as.matrix(x) which makes you see that write.table(.) should also work when the data frame has complex variables {or some other kinds of non-numeric as you've said above} -- something which data.matrix() can't handle As soon as you have a complex or a character variable (together with others) in your data.frame, as.matrix() will have to return "character" and apply format() to the numeric variables, as well... So, to make this consistent in your sense, i.e. formatting of a column shouldn't depend on the presence of other columns, we can't use as.matrix() nor data.matrix() but have to basically replicate an altered version of as.matrix inside write.table. I propose to do this, but expose the altered version as something like as.charMatrix(.) and replace the 4 lines {of code in write.table()} above by the single line as.charMatrix(x) -- Martin Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] regex to match word boundaries
> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]> > on Wed, 1 Dec 2004 21:05:59 -0500 (EST) writes: Gabor> Can someone verify whether or not this is a bug. Gabor> When I substitute all occurrence of "\\B" with "X" R Gabor> seems to correctly place an X at all non-word Gabor> boundaries (whether or not I specify perl) but "\\b" Gabor> does not seem to act on all complement positions: >> gsub("\\b", "X", "abc def") # nothing done Gabor> [1] "abc def" >> gsub("\\B", "X", "abc def") # as expected, I think Gabor> [1] "aXbXc dXeXf" >> gsub("\\b", "X", "abc def", perl = TRUE) # not as >> expected Gabor> [1] "abc Xdef" >> gsub("\\B", "X", "abc def", perl = TRUE) # as expected Gabor> [1] "aXbXc dXeXf" >> R.version.string # Windows 2000 Gabor> [1] "R version 2.0.1, 2004-11-27" I agree this looks "unfortunate". Just to confirm: 1) I get the same on a Linux version 2) the real perl does behave differently and as you (and I) would have expected: $ echo 'abc def'| perl -pe 's/\b/X/g' XabcX XdefX $ echo 'abc def'| perl -pe 's/\B/X/g' aXbXc dXeXf Also, from what I see, "\b" should behave the same independently of perl = TRUE or FALSE. -- Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] a better "source(echo=TRUE)" {was "....how to pause...."}
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> > on Sun, 28 Nov 2004 10:25:24 -0500 writes: Duncan> <> Duncan> <> Duncan> We already have code to source() from the clipboard, and it could Duncan> address the problems above, but: Duncan> - Source with echo=T doesn't echo, it deparses, so some comments are Duncan> lost, formatting is changed, etc. yes, and we would have liked to have an alternative "source()" for a *very* long time... Examples where I "hate" the non-echo (i.e. the loss of all comments and own-intended formatting) is when you use it for demos, etc, notably in R's own demo() and example() functions. But to do this might be more tricky than at first thought: Of course you can readLines() the source file and writeLines() them to whatever your console is. The slightly difficult thing is to "see" which junks to ``send to R'' , i.e. to parse() and eval(). The basic problem seems to see when expressions are complete. Maybe we should / could think about enhancing parse() {or a new function with extended behavior} such that it would not only return the parse()d expressions, but also indices (byte or even line counters) to the source text, indicating where each of the expression started and ended. That way I could see a way to proceed. Martin Duncan> <> Duncan> <> __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] pausing between plots - waiting for graphics input
{I have changed the subject to match this interesting side thread} > "TL" == Thomas Lumley <[EMAIL PROTECTED]> > on Mon, 29 Nov 2004 09:15:27 -0800 (PST) writes: TL> On Sun, 28 Nov 2004, Duncan Murdoch wrote: >> >> Another that deals only with the original graphics problem is to have >> par(ask=T) wait for input to the graphics window, rather than to the >> console. This has the advantage that the graphics window probably has >> the focus, so a simple Enter there could satisfy it. >> TL> I like this one. I have often found it irritating that TL> I have to switch the focus back to the console (which TL> means uncovering the console window) to get the next TL> graph. I agree. Note that this is not really Windows-specific. Rather, this should be applicable to all devices which support minimal mouse interaction, i.e. at least those that support locator(), ideally just all those listed in dev.interactive(). However, I'm not at all sure that this should be done with par(ask = TRUE), which works on all devices, not just interactive ones. Rather, we probably should add a new par() {and gpar() for grid !} option for the new feature, maybe something like [g]par(wait_mouseclick = TRUE) Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
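On locator()-capable devices, the behaviour can already be emulated by hand, which also shows why it feels natural (a device-side "ask" later arrived in R as devAskNewPage()):

    plot(1:10)
    if (dev.interactive())
        locator(1)    # returns once the user clicks in the graphics window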
Re: [Rd] \link{} to help pages in Debian
> "Iago" == Iago Mosqueira <[EMAIL PROTECTED]> > on Mon, 29 Nov 2004 08:41:03 + writes: Iago> Hello, Iago> In my Debian 3.0 systems, packages are installed in two different Iago> places, namely /usr/lib/R/library and /usr/local/lib/R/site-library, Iago> depending on whether they come from debian packages or CRAN ones. Help Iago> pages for my own packages, installed in the second location, cannot find Iago> help pages from, for example, the base package via \link{}. I also tried Iago> specifying the package with \link[pkg]{page}. Iago> Is the only solution to force the system to use a single library folder? not at all! We have been working with several libraries "forever", and I think I have n't seen your problem ever. For instance, I never install extra packages into the "standard" library (the one where "base" is in); have all CRAN packages in one library, bioconductor in another library, etc,etc. Iago> Can I force the help system to look in both places? Actually you forgot to specify which interface to the help system you are using. But I assume you mean the help.start() {webbrowser-"HTML"} one (which I very rarely use, since ESS and "C-c C-v" is faster; to follow links in ESS help buffers, after selection, often "h" " is sufficient -- ah reminds me of an ESS improvement I've wanted to implement...) For me, help.start() works fine including links between pages from packages in different libraries. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: (PR#7393) Re: [Rd] dhyper() does not allow non-integer values for
> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> > on 24 Nov 2004 18:32:15 +0100 writes: PD> [EMAIL PROTECTED] writes: >> > > dhyper() does not allow non-integer values for input >> parameters m and n. >> > >> > this is in contrast to the other functions in the >> _hyper() "family", >> >> I would argue that the bug was in the other functions. If >> not, there is a= =20 bug in the documentation, which >> gives no way to tell what the result=20 should mean for >> non-integer m, n, k. PD> My initial reaction too (and surely it is not a bug that PD> functions behave inconsistently in regions where they PD> are not documented to work at all), but on the other PD> hand, noninteger m,n do appear to give a well-defined PD> distribution, and perhaps there's a way of making sense PD> of it? I wouldn't think it corresponds to noncentral PD> hypergeometric distributions. I'd tend to pretty much agree here. Incidentally (slightly related, but prompted by something completely different), just these hours I've been extending choose(r, k) to work not just for integer but all positive 'n'. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Building Packages on Windows using .Rbuildignore (PR#7379)
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> > on Thu, 18 Nov 2004 16:36:03 +0100 (CET) writes: Duncan> On Thu, 18 Nov 2004 00:38:47 + (UTC), Gabor Duncan> Grothendieck <[EMAIL PROTECTED]> wrote : >> DIFFERENCE BETWEEN USING .RBUILDIGNORE AND NOT >> >> The reason that the processing is different according to >> whether one uses .Rbuildignore or not is that R CMD build >> takes the .Rbuildignore file into account but R CMD >> install R CMD check R CMD build --binary do not take >> .Rbuildignore into account. Duncan> Okay, now I understand. I think I'd call the last Duncan> of those a bug, and it would seem to me that the Duncan> install and check scripts should also respect this Duncan> directive. I've now copied this to the r-bugs list. Duncan> (This was reported for Windows; I don't know if it Duncan> applies to other platforms as well.) Yes it does (*), but I think it is NOT a bug but a feature, at least for "check" and "install" (*) and very much desired in some cases : For instance, the package developer may want more regression tests in /tests/ : 1) Have extra *.Rout.save files that are architecture dependent and hence not for general distribution of the package, but important for the package developer in order to assure that functionality doesn't change when the package is extended, reorganized, 2) Have more tests/*.R files that take a long time to run. Time that the package developer wants to spend, but doesn't dare to put on the daily CRAN or Bioconductor test suites. 3) similarly for vignettes 4) similar issues for experimental R/*.R files or man/*.Rd files for these. One I thing that would be quite useful and would even solve Gabor's problem: The introduction of a new command line switch, say "--build-ignore", to the commands 'R CMD check' and 'R CMD install' (*) I do agree that "R CMD build --binary" probably really should follow the ".Rbuildignore" file ``directives'' if it doesn't currently. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Row labels are skewed in 'heatmap' (PR#7358)
Hi Peter, thank you for trying, but that's not yet a useful bug report: We can't reproduce what you did! If you put your data on a web site, and replace exp <- read.table("graph/1933672048.cluster.data") by exp <- read.table("http://<your_adress>/mycluster.data") it will hopefully become reproducible and thus potentially useful. Though even that is doubtful, since you are using an outdated version of R, and I know that heatmap() has been improved since then! Martin Maechler >>>>> "pfh" == pfh <[EMAIL PROTECTED]> >>>>> on Thu, 11 Nov 2004 14:01:23 +0100 (CET) writes: pfh> I've made a script look like this: pfh> exp <- read.table("graph/1933672048.cluster.data") pfh> exp <- as.matrix(exp) pfh> postscript("graph/1933672048.cluster.data.ps") pfh> heatmap(exp,scale="none",cexCol=0.4,cexRow=0.2,col=custom,margins=c(5,5))
Re: [Rd] StructTS (PR#7353)
>>>>> "Claus" == Claus Dethlefsen <[EMAIL PROTECTED]> >>>>> on Tue, 9 Nov 2004 08:39:22 +0100 (CET) writes: Claus> Dear R-bugs (whom did you mean? :-) Claus> I have been studying the StructTS function (in Claus> package 'stats') and functions supplied with it. I Claus> think I have found a few minor bugs in the Claus> documentation. Claus> I am referring to the version of StructTS supplied with the release R 2.0.0. Claus> 1) In the help page under "fixed" it is explained (file: Claus> R-2.0.0/src/library/stats/man/StructTS.Rd): Claus> "If supplied, only non-\code{NA} entries in Claus> \code{fixed} will be varied." Claus> Either I misunderstand this or there may be a bug in Claus> the documentation. As I understand it, setting fixed Claus> to the value, eg. c(NA,0,NA,NA) sets the variance of Claus> the slope component to 0 and varies the other Claus> variances to find the maximum likelihood Claus> estimates. If I am right, I suggest the docs changed Claus> to: "If supplied, only \code{NA} entries in Claus> \code{fixed} will be varied." ("non-" has been Claus> deleted). yes. thank you. You are entirely correct. Claus> E.g. try the following after invoking R: R> StructTS(log10(UKgas), type = "BSM", fixed=c(0,NA,NA,NA)) Claus> Observe, that the first variance is fixed to zero, Claus> while the remaining three are varied to obtain the Claus> maximum likelihood estimates. Claus> 2) There is a minor typo in the explanation of 'fitted' - a parenthesis is Claus> missing. I suggest changing Claus> "(that is at time \eqn{t} and not at the end of the series." Claus> to Claus> "(that is at time \eqn{t} and not at the end of the series)." indeed, thank you. Claus> 3) In the documentation, one of the values returned in the list is called Claus> "convergence", while in the code, it is called "code". I suggest either to Claus> change the documentation from Claus> "\item{convergence}{the value returned by \code{\link{optim}}.}" Claus> to Claus> "\item{code}{the value returned by \code{\link{optim}}.}" Claus> or to change the code. Changing the code has the potential to break other code relying on StrucTS() {and we are in deep-freeze for R 2.0.1} --> I'm changing the documentation. Claus> Thank you for supplying a great set of functions. and thank you for the bug report! Martin Maechler, ETH Zurich __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] boxplot() defaults {was "boxplot in extreme cases"}
AndyL> Try: AndyL> x <- list(x1=rep(c(0,1,2),c(10,20,40)), x2=rep(c(0,1,2),c(10,40,20))) AndyL> boxplot(x, pars=list(medpch=20, medcex=3)) AndyL> (Cf ?bxp, pointed to from ?boxplot.) Good! Thank you, Andy. However, this is not the first time it has crossed my mind that R's default settings for drawing boxplot()s are not quite ok -- and that's why I've diverted to R-devel. Keeping Tufte's considerations in mind (and not really wanting to follow S-plus), shouldn't we consider slightly changing R's boxplot()ing such that boxplot(list(x1=rep(c(0,1,2),c(10,20,40)), x2=rep(c(0,1,2),c(10,40,20)))) will *not* give two identical-looking boxplots? Also, the median should be emphasized more by default anyway. {The lattice function bwplot() does it by only drawing a large black ball as in Andy's example (and not drawing a line at all)} One possibility I'd see is to use a default 'medlwd = 3' either in boxplot() or in bxp(.) and hence, what you currently get by boxplot(list(x1=rep(c(0,1,2),c(10,20,40)), x2=rep(c(0,1,2),c(10,40,20))), medlwd=3) would become the default plotting in boxplot(). Of course a smaller value "medlwd=2" would work too, but I'd prefer a bit more (3). Martin > From: Erich Neuwirth > > I noticed the following: > the 2 datasets > rep(c(0,1,2),c(10,20,40)) and > rep(c(0,1,2),c(10,40,20)) > produce identical boxplots despite the fact that the medians are > different. The reason is that the median in one case > coincides with the > first quartile, and in the second case with the third quartile. > Is there a recommended way of displaying the median visibly in these > cases? Setting notch=TRUE displays the median, but does look strange. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
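So with the proposed default, one would get already today, via the explicit argument:

    x <- list(x1 = rep(c(0,1,2), c(10,20,40)), x2 = rep(c(0,1,2), c(10,40,20)))
    boxplot(x, medlwd = 3)   # medians stay visible even when they coincide
                             # with a quartile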
Re: [Rd] idea (PR#7345)
DanB> I really don't understand the negative and condescending DanB> culture that seems to pervade R-dev. It pervades replies to *Bug reports* about non-bugs!! I thought you had read in the meantime what R bug reports should be, and that the things you have been posting as bug reports were posted **WRONGLY**. PLEASE: 1) All these suggestions were perfectly fit to be posted to R-devel 2) All of them were completely NOT fit to be sent as bug reports Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Bug report (PR#7341)
>>>>> "Dan" == Dan Bolser <[EMAIL PROTECTED]> >>>>> on Fri, 5 Nov 2004 12:32:39 + (GMT) writes: Dan> On Fri, 5 Nov 2004, Martin Maechler wrote: >>>>>>> "dan" == dan <[EMAIL PROTECTED]> on Thu, 4 Nov 2004 >>>>>>> 19:08:08 +0100 (CET) writes: >> dan> Full_Name: Dan B Version: na OS: na Submission from: dan> (NULL) (80.6.127.185) >> dan> I can't log into the bug tracker (I can't find where to dan> register / login). >> [that's not what you should do. Dan> OK, I just wanted to post a 'follow up' to the original Dan> 'bug'. I see: In this case, just use e-mail and make sure to keep the *original* 'PR#<...>' in the 'Subject:' line of the mail. As Brian Ripley has once remarked, it may make sense to move the 'PR#' to the beginning of the subject line since that is often wrapped when it becomes somewhat long. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Bug report (PR#7341)
>>>>> "dan" == dan <[EMAIL PROTECTED]> >>>>> on Thu, 4 Nov 2004 19:08:08 +0100 (CET) writes: dan> Full_Name: Dan B Version: na OS: na Submission from: dan> (NULL) (80.6.127.185) dan> I can't log into the bug tracker (I can't find where to dan> register / login). [that's not what you should do. Have you read on this in the FAQ or help(bug.report) ? Please, please, do. ] dan> In this way I can't add the following context diff dan> (hopefully in the right order) for my changes to the dan> matrix.Rd... dan> Hmm... I guess this should be a separate report dan> anyway... No, this is really not a bug report __AT ALL__ You had all this long discussion about how the documentation can/could/should/{is_hard_to} be improved and end up sending a *bug report* ? Really! Whereas I value your contribution for improving the matrix help page -- and I do think both changes are worthwhile --- there is no bug, and hence a bug report is *WRONG*! Sending this to R-devel [instead! - not automagically via the bug report] would have been perfectly fine and helpful... dan> The first diff explains how the dimnames list should dan> work, and the second diff gives an example of using the dan> dimnames list. (no equivelent example exists, and where dan> better than the matrix man page to show this off). agreed. I'll put in a version of your proposed improvement, but please do try more to understand what's appropriate for bug reports. Regards, Martin Maechler, ETH Zurich __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Getting the bare R version number {was "2.0.1 buglets"}
> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]> > on Thu, 04 Nov 2004 20:26:04 -0500 writes: >> If you want the R version, that is 'R --version'. >> >> PaulG> I've been using this, but to make it into a file name PaulG> I need to do some stripping out of the extra PaulG> stuff. (And tail and head on Solaris do not use -n PaulG> whereas Linux wants this, so it becomes difficult to PaulG> be portable.) Is there a way to get just "R-2.0.1" PaulG> from R or R CMD something? yes, by applying Brian's advice and good ol' "sed" : R --version | sed -n '1s/ /-/; 1s/ .*//p' Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 2.0.1 buglets
> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> > on 04 Nov 2004 23:17:45 +0100 writes: PD> Prof Brian Ripley <[EMAIL PROTECTED]> writes: >> On Thu, 4 Nov 2004, Paul Gilbert wrote: >> >> > With R-2.0.1 (patched) on Linux rhlinux 2.4.21-4.ELsmp >> > >> > when I configure get > ... > checking whether C >> runtime needs -D__NO_MATH_INLINES... no > checking for >> xmkmf... /usr/bin/X11/xmkmf > Usage: which [options] [--] >> programname [...] > Options: --version, -[vV] Print >> version and exit successfully. > --help, Print this help >> and exit successfully. > --skip-dot Skip directories in >> PATH that start with a dot. > ... >> > >> > but everything seems to configure and make ok. Should >> this message be > expect or is this a bug? >> >> It is unexpected. Is it new in 2.0.1 beta? You have >> told us your kernel, not your distro. This looks like a >> bug, but not in R. PD> I've seen it whizz by occasionally but never got around PD> to investigate. me too. IIRC, also in some of my current Linux setups. I think it's showing unfortunate behavior of configure .. (ie. a "buglet" in the configure tools used to produce 'configure', and not in our 'configure.ac' source). PD> As said, it doesn't actually affect the result of configure. my experience as well. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] pgamma discontinuity (PR#7307)
>>>>> "Morten" == Morten Welinder <[EMAIL PROTECTED]> >>>>> on Mon, 25 Oct 2004 12:04:08 -0400 (EDT) writes: Morten> A little code study, formula study and experimentation reveals that the Morten> situation is mostly fixable: Morten> 1. Get rid of the explicit alpha limit. (A form of Morten> it is implicit in (2) and (3) below.) Morten> 2. Use the series formula when Morten> (x < alph + 10 && x < 0.99 * (alph + 1000)) Morten> This guarantees that the sum converges reasonably Morten> fast. (For extremely large x and alpha that will Morten> take about -53/log2(0.99) iterations for 53 Morten> significant bits, i.e., about 3700 iterations.) Morten> 3. Use the continued fraction formula when Morten> (alph < x && alph - 100 < 0.99 * x) Morten> Aka, you don't want to use the formula either near Morten> the critical point where alpha/x ~ 1 unless the Morten> numbers are small. Morten> 4a. Go to a library and figure out how Temme does it Morten> for alpha near x, both large. In this case the 0.99 Morten> from above could probably be lowered a lot for Morten> faster convergence. Morten> or Morten> 4b. Use the pnorm approximation. It seems to do a Morten> whole lot better for alpha near x than it did for Morten> the 10:1 case I quoted. Morten> Comments, please. Hi Morten, thanks a lot for your investigation. I have spent quite a few working days exploring pgamma() and also some alternatives. The discontinuity is "well know". I vaguely remember my conclusions were a bit different - at least over all problems (not only the one mentioned), it needed more fixing. I think I had included Temme's paper (and others) in my study. But really, I'm just talking from the top of my head; I will take the time to look into my old testing scripts and alternative C code; but just not this week (release of bioconductor upcoming; other urgencies). I'll be glad to also communicate with you offline on this topic {and about pbeta() !}. But "just not now". Martin Maechler, ETH Zurich __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: OT: Debian/Ubuntu on amd64 (Re: [Rd] 64 Bit)
> "Dirk" == Dirk Eddelbuettel <[EMAIL PROTECTED]> > on Fri, 22 Oct 2004 09:45:05 -0500 writes: Dirk> On Fri, Oct 22, 2004 at 12:24:56PM +0100, Prof Brian Dirk> Ripley wrote: >> If you want a prebuilt version you are out of luck except >> for Debian Linux on Alpha or ia64, from a quick glance. Dirk> amd64 as well. It is not "fully official" but almost, Dirk> see http://www.debian.org/ports/amd64/ yes, indeed. We have one compute server (dual opteron) that runs a nice 64-bit Debian {"sid" aka "unstable" though} and for which I've used 'aptitude' (the "new" kid on the block replacement for 'apt-get') to install r-base-recommended (and more) -- all prebuilt [Of course I still mainly work with hand-compiled versions of R]. Dirk> <..> Dirk> <..> (that note about "Ubuntu" was very interesting to read; thanks Dirk!) Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] error in plot.dendrogram (PR#7300)
>>>>> "Witold" == Witold Eryk Wolski <[EMAIL PROTECTED]> >>>>> on Thu, 21 Oct 2004 16:52:26 +0200 writes: Witold> Hi, Witold> First. If you should not do it like you write here. You will get an Witold> error loading the rda file. It is a binary format. Witold> Error in load(dFile) : input has been corrupted, with LF replaced by CR no. I don't get such an error. Witold> Hence, you should specify mode="wb" for downloading binary formats. ok, at least to be on the safe side. {Though I *never* had the need till now, and it's not the first time I download.file( "./*.rda" ) .} Witold> Try this code. Witold> file.remove("hres.rda") Witold> dFile <- paste(getwd(),"/hres.rda",sep="") Witold> if(!file.exists(dFile)) Witold> download.file(url ="http://www.molgen.mpg.de/~wolski/hres.rda";, dest= dFile,mode="wb") Witold> load(dFile) Witold> hresd Witold> plot(hresd) Witold> In one think you are right, the X axis. But there is an ERROR. Witold> On my Machine (see previous mail) not the complete dendrogram is drawn. Witold> It just draws the first 100 leavs out of 380 when I run Witold> Second. Even if I increase the size of the window maximally the Witold> dendrogram is not plotted. Witold> And I checked it just again on a linux box. And it does not work there Witold> either. well, it does work on mine... (as I said, I quickly saw how unuseful single-linkage is for this data/dissimilarity ..) ok, ok, -- now I see why: I "just knew" that for largish dendrograms one had to do something like options(expressions = 1) before anything reasonable can happen.. ;-) So, yes, I finally see your problem. The real R bug there is that no error message is *printed*, for me at least, but the recursive function calls to plotNode() just bail out. If you do str(hresd) you at least get an error message that helps you find out about options(expressions = ..)... Witold> How the dendrogram should look like you can figure out looking at the Witold> following. Witold> hclustObj <- paste(getwd(),"/hress.rda",sep="") Witold> if(!file.exists(dFile)) Witold> download.file(url ="http://www.molgen.mpg.de/~wolski/hress.rda";, dest= hclustObj ,mode="wb") Witold> load(dFile) (well, that doesn't work since you have replaced "dFile" by "hclustObj" in some places but not all. [[[ why on earth do you call a file name an object ?]]] ) Witold> hress Witold> plot(hress) Witold> hressd<-as.dendrogram(hress) Witold> plot(hressd) Witold> Third. I have expected this comment about if it is meaningfull or not. Witold> First for what I need it is it meaningfull. Second it is a valid Witold> dendrogram generated by as.dendrogram from a vaild hclust object. I do Witold> not need a plot routine which teaches me what she thinks is meaningfull Witold> or not. agreed on that. Note that originally in your bug report, you were telling about a missing x-axis and that set off the whole track since, as we now agree, it's not about an x-axis at all... Witold> Martin Maechler wrote: >>>>>>> "Eryk" == Eryk Wolski <[EMAIL PROTECTED]> >>>>>>> on Thu, 21 Oct 2004 13:41:29 +0200 (CEST) writes: >>>>>>> >>>>>>> >> Eryk> Hi, >> Eryk> hres <- hclust(smatr,method="single") Eryk> hresd<-as.dendrogram(hres) Eryk> as.dendrogram(hres) Eryk> `dendrogram' with 2 branches and 380 members total, at height 2514.513 Eryk> plot(hresd,leaflab="none") #<-error here. >> >> definitely no error here.. maybe your graphic window is too >> small or otherwise unable to show all the leaf labels? >> Eryk> #the plotted dendrogram is incomplete. The x axis is not drawn. >> >> ha! 
and why should this be a bug? >> Have you RTFHP and looked at its example?? >> There's never an x-axis in such a plot! >> >> [You really don't want an x-axis overlaid over all the labels] >> Eryk> #The interested reader can download the >> Eryk> save(hresd,file="hres.rda") >> Eryk> #from the following location Eryk> www.molgen.mpg.de/~wolski/hres.rda >> >> If you send a bug report (and please rather don't..), >> it should be reproducible, i.e., I've just
[Rd] untrace() failing {when "methods" are active} (PR#7301)
This happens in R-2.0.0 (or R-patched or R-devel of this night): > trace(axis) > untrace(axis) Error in .assignOverBinding(what, newFun, whereF, global) : Object "newFun" not found > traceback() 6: .assignOverBinding(what, newFun, whereF, global) 5: methods::.TraceWithMethods(axis, where = , untrace = TRUE) 4: eval(expr, envir, enclos) 3: eval(expr, p) 2: eval.parent(call) 1: untrace(axis) --- I've quickly tried to run R without "methods" and there, untrace() doesn't fail {inside untrace(), the other if()-branch is used}. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] error in plot.dendrogram (PR#7300)
>>>>> "Eryk" == Eryk Wolski <[EMAIL PROTECTED]> >>>>> on Thu, 21 Oct 2004 13:41:29 +0200 (CEST) writes: Eryk> Hi, Eryk> hres <- hclust(smatr,method="single") Eryk> hresd<-as.dendrogram(hres) Eryk> as.dendrogram(hres) Eryk> `dendrogram' with 2 branches and 380 members total, at height 2514.513 Eryk> plot(hresd,leaflab="none") #<-error here. definitely no error here.. maybe your graphic window is too small or otherwise unable to show all the leaf labels? Eryk> #the plotted dendrogram is incomplete. The x axis is not drawn. ha! and why should this be a bug Have you RTFHP and looked at its example?? There's never an x-axis in such a plot! [You really don't want an x-axis overlayed over all the labels] Eryk> #The interested reader can download the Eryk> save(hresd,file="hres.rda") Eryk> #from the following loacation Eryk> www.molgen.mpg.de/~wolski/hres.rda If you send a bug report (and please rather don't..), it should be reproducible, i.e., I've just wasted my time for dFile <- "/u/maechler/R/MM/Pkg-ex/stats/wolski-hres.rda" if(!file.exists(dFile)) download.file(url ="http://www.molgen.mpg.de/~wolski/hres.rda";, dest= dFile) load(dFile) hresd plot(hresd) If you look at this plot I hope you rather see that "single" has been an extremly unuseful clustering method for this data / dissimilarities, and you'd rather tried other methods than to which for an x-axis. If you really want one (just to see that it doesn't make sense), you can always add axis(1, fg = "red") Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] splinefun bug (PR#7290)
>>>>> "kjetil" == kjetil halvorsen <[EMAIL PROTECTED]> >>>>> on Fri, 15 Oct 2004 18:49:09 +0200 (CEST) writes: kjetil> The following reliably bombs R, or at least rw2000 (windows XP): kjetil> test <- splinefun(vector(length=0), vector(length=0)) kjetil> test(1) yes, that's a reproducible bug, thank you. The embarassing thing is that we've fixed this bug already for approxfun() -- and some of us know that spline* and approx* have originally been very much in parallel... BTW: No need to give the function a name: S can nicely work with anonymous functions. Consequently, you can get the same effect from splinefun(vector(length=0), vector(length=0)) (1) or, even a bit more compactly: splinefun(1[0], 1[0]) (1) Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug in power.t.test( ) (PR#7245)
>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>> on Fri, 24 Sep 2004 08:43:56 +0200 (CEST) writes: UweL> [EMAIL PROTECTED] wrote: >> Full_Name: Mai Zhou Version: 1.9.1 OS: Win XP >> Professional Submission from: (NULL) (12.222.227.93) >> >> >> >>> power.t.test(n=25, delta=0.1, sig.level=1.1, >>> strict=TRUE, type="one.sample") >> >> >> One-sample t test power calculation >> >> n = 25 delta = 0.1 sd = 1 sig.level = 1.1 power = >> 1.088311 alternative = two.sided >> >> ### power can never be over one! Of course, sig.level >> should not take value > 1 ### either. ### Possible >> solution: A check in the input to truncate sig.level into >> [0, 1]?? UweL> Well, an error (or at least warning) message seems to UweL> be more appropriate rather than silently changing some UweL> values, e.g. somehwere at the top of the functions UweL> body: UweL> if(any(sig.level < 0 | sig.level > 1)) UweL> stop("sig.level must be in [0,1]") yes, in principle; thank you, Uwe! Since sig.level can also be NULL (which works with the way you constructed the test - on purpose?), I'd use the test a bit differently. BTW, did you know that e.g., sig.level or delta can be *vectors* giving vectorized results - at least in some cases... That will hopefully leed to more documentation / code updates, but not for 2.0.0 I presume. Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] rw2000dev: problems with library(foreign)
> "Kjetil" == Kjetil Brinchmann Halvorsen <[EMAIL PROTECTED]> > on Fri, 24 Sep 2004 10:10:39 -0400 writes: Kjetil> I get the following >> library(foreign) Kjetil> Error in namespaceExport(ns, exports) : undefined Kjetil> exports: write.foreign Error in library(foreign) : Kjetil> package/namespace load failed for 'foreign' Kjetil> with rw2000dev as of (2004-09-17 Does > system.file(package="foreign") give the same initial path as > system.file(package="base") ? If yes, I cannot help further; if no, this explains the problem: you're picking up a wrong version of the foreign package. Regards, Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Cannot build cluster_1.9.6 under R 2.0.0 beta Sep 21
> "Dirk" == Dirk Eddelbuettel <[EMAIL PROTECTED]> > on Thu, 23 Sep 2004 21:31:50 -0500 writes: Dirk> And another follow-up -- this may well be related to Dirk> cluster as mgcv-1.1.2 builds fine. Well, thanks, Dirk, for all these. As maintainer of cluster, I should be interested.. But then, I just see I did successfully "R CMD check cluster" on Sep 21 (your snapshot's date) on several Linux platforms, and have just now tried to install from the local Sep.24 snapshot i.e. ftp://ftp.stat.math.ethz.ch/Software/R/R-devel_2004-09-24.tar.gz (which needs 'tools/rsync-recommended' after unpacking) without a problem. I'll try the /src/base-prerelease/R-2.0.0-beta-20040924.tgz one, subsequently. Are you sure it's not a problem just with your copy of something? Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] algorithm reference for sample()
Hi Vadim, >>>>> "Vadim" == Vadim Ogranovich <[EMAIL PROTECTED]> >>>>> on Thu, 23 Sep 2004 17:48:45 -0700 writes: Vadim> Hi, Don't know if it belongs to r-devel or r-help, Vadim> but since I am planning to alter some of R's internal code i.e., you will propose a patch to the R sources eventually ? Vadim> I am sending it here. good choice. Also, since you are talking about internal (non-API) C code from R - which I would deem inappropriate on R-help. Vadim> The existing implementation of the sample() function, Vadim> when the optional 'prob' argument is given, is quite Vadim> inefficient. The complexity is O(sampleSize * Vadim> universeSize), see ProbSampleReplace() and Vadim> ProbSampleNoReplace() in random.c. This makes the Vadim> function impractical for the vector sizes I use. I'm interested: What problem are you solving where sample() is the bottleneck (rather than what you *do* with the sample ..) Vadim> I want to re-code these functions and I "think" I can Vadim> come up with a more efficient algorithm. I agree. It's a kind of table lookup, that definitely can be made faster e.g. by bisection ideas. Vadim> However before I go and reinvent the wheel I wonder if there Vadim> is a published description of an efficient sampling Vadim> algorithm with user-specified probabilities? I've got some ideas, but maybe would first want to get a reply to the current ones. Vadim> Thanks, Vadim Vadim> [[alternative HTML version deleted]] ^^^^ (you *did* read the posting guide or just the general instructions on http://www.R-project.org/mail.html ? ) Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
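The "table lookup ... by bisection" idea is easy to sketch at the R level: precompute the cumulative weights once, then locate each uniform draw by binary search. findInterval() does the bisection; the function name here is illustrative, not a proposed API:

  ## weighted sampling with replacement in O(n log m), m = length(prob)
  sample.prob <- function(n, prob) {
    cw <- cumsum(prob) / sum(prob)   # cumulative distribution, in [0,1]
    findInterval(runif(n), cw) + 1   # bisection lookup, 1-based index
  }
  set.seed(1)
  table(sample.prob(1000, prob = c(1, 2, 7)))  # roughly 10% / 20% / 70%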
Re: [Rd] attaching to position 1
>>>>> "PatBurns" == Patrick Burns <[EMAIL PROTECTED]> >>>>> on Wed, 22 Sep 2004 18:30:10 +0100 writes: PatBurns> If an attempt is made to attach to position 1, it PatBurns> appears to work (not even a warning) but in fact PatBurns> it doesn't work as many would expect. "search" PatBurns> thinks that it gets placed in position 2, but PatBurns> nothing is actually there (according to "ls"). PatBurns> This is guaranteed to be confusing (and annoying) PatBurns> to people who are used to attaching to position 1 PatBurns> in S-PLUS. yes; thanks for bringing this up! PatBurns> I'm not clear on all of the implications of PatBurns> changing this, but my first inclination would be PatBurns> to make it an error to attach to position 1. The PatBurns> help file says that you can't do it. and has done so for a long time AFAIR. PatBurns> At the very least there should be a warning . My PatBurns> guess is that it is rare for someone to attach to PatBurns> position 1 and not attempt to modify what is being PatBurns> attached. Hence (together with the arguments further above), I think an error would be more appropriate [if there's only a warning and the user's code continues on the wrong assumption, more problems lay ahead]. OTOH, in the current "beta" phase I can think of a case where an error would be too "hard": The worst I can see is an R script that has attach(*, pos=1) which doesn't attach at all {as you say, it *seems* to attach to position 2 but doesn't really provide the object}, but for some reason still continues to produce reasonable things. Hene, for 2.0.0 in "deep freeze", I'd propose to give a warning only. However, we wouldn't the database' to search()[2] "seemingly" only, and this could be a problem if a user's script does a detach(..) later. I.e., we should attach() to pos=2 *properly* (instead of "seemingly") only. At the latest for 2.1.0, we should rather make the warning an error. In any case, this looks like a very simple fix (to the C source); Martin Maechler >> attach('foo.RData') >> search() PatBurns> [1] ".GlobalEnv""file:foo.RData""package:methods" PatBurns> [4] "package:stats" "package:graphics" "package:grDevices" PatBurns> [7] "package:utils" "package:datasets" "Autoloads" PatBurns> [10] "package:base" >> ls(2) PatBurns> [1] "jj" >> jj PatBurns> [1] 1 2 3 4 5 6 7 8 9 >> detach() >> search() PatBurns> [1] ".GlobalEnv""package:methods" "package:stats" PatBurns> [4] "package:graphics" "package:grDevices" "package:utils" PatBurns> [7] "package:datasets" "Autoloads" "package:base" >> attach('foo.RData', pos=1) >> search() PatBurns> [1] ".GlobalEnv""file:foo.RData""package:methods" PatBurns> [4] "package:stats" "package:graphics" "package:grDevices" PatBurns> [7] "package:utils" "package:datasets" "Autoloads" PatBurns> [10] "package:base" >> ls(2) PatBurns> character(0) PatBurns> _ PatBurns> platform i386-pc-mingw32 PatBurns> arch i386 PatBurns> os mingw32 PatBurns> system i386, mingw32 PatBurns> status Under development (unstable) PatBurns> major2 PatBurns> minor0.0 PatBurns> year 2004 PatBurns> month09 PatBurns> day 17 PatBurns> language R __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] "Namespace dependencies not required" message
> "GB" == Göran Broström <[EMAIL PROTECTED]> > on Mon, 20 Sep 2004 22:28:08 +0200 writes: GB> On Mon, Sep 20, 2004 at 04:05:52PM -0400, Warnes, GB> Gregory R wrote: >> I'm still working to add namespace support to some of my >> packages. I've removed the 'Depends' line from the >> DESCRIPTION file, and created an appropriate NAMESPACE >> files. >> >> Strangely, whenever I use 'importFrom(package, function)' >> R CMD check generates "Namespace dependencies not >> required" warnings . Without the import statements, the >> warning does not occur, but the code tests fail with the >> expected object not found errors. >> >> This occurs with both R 1.9.1 and R 2.0.0. >> >> So, should are these errors just bogus and I just ignore >> these errors, or is there something I've done wrong with >> the NAMESPACE or elsewhere? GB> I had the same problem. I think you must keep the GB> 'Depends' field in DESCRIPTION file. yes, definitely. And since you are (rightly, thank you!) working with 2.0.0-beta, please consider http://developer.r-project.org/200update.txt which mentions more things on 'Depends:', 'Suggests:' etc. Also, don't forget to use 'Writing R Extensions' of 2.0.0. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
RE: [Rd] R-2.0.0 Install problem for pkg bundle w inter-dependent namespaces
>>>>> "Greg" == Warnes, Gregory R <[EMAIL PROTECTED]> >>>>> on Mon, 20 Sep 2004 15:10:32 -0400 writes: >> -Original Message- From: Martin Maechler >> [mailto:[EMAIL PROTECTED] Greg> [...] So, what is the proper way to handle this? Is Greg> there some way to manually specify the package install Greg> order? >> Well, isn't the order in the 'Contains:' field of the >> bundle DESCRIPTION file used? If not, please consider >> sending patches for src/scripts/INSTALL.in >> Greg> OK, that's the simple thing that I had been Greg> overlooking. Changing the the Contains line to provide Greg> the packages in the order that they should be Greg> installed fixed the problem. Greg> May I suggest that the significance of the ordering in Greg> the Contains: field be added to the (extremely brief) Greg> description of in "Writing R Extensions"? Greg> Perhaps change the text to: Greg> The 'Contains' field lists the packages, which Greg> should be contained in separate subdirectories with Greg> the names given. During buiding and installation, Greg> packages will be installed in the order specified. Be Greg> sure to order this list so that dependencies are Greg> appropriately met. Greg> The packages contained in a bundle are standard Greg> packages in all respects except that the 'DESCRIPTION' Greg> file is replaced by a 'DESCRIPTION.in' file which just Greg> contains fields additional to the 'DESCRIPTION' file Greg> of the bundle, for example ... Good idea, thank you! I just committed this (with a typo corrected). Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Namespace problem
>>>>> "GB" == Göran Broström <[EMAIL PROTECTED]> >>>>> on Mon, 20 Sep 2004 11:00:57 +0200 writes: GB> On Mon, Sep 20, 2004 at 10:43:44AM +0200, Martin Maechler wrote: >> >>>>> "GB" == Göran Broström <[EMAIL PROTECTED]> >> >>>>> on Sun, 19 Sep 2004 18:51:49 +0200 writes: GB> [...] GB> I've checked that section, but I am not adding methods to generics, >> >> sure? >> Aren't you trying to export mlreg.fit >> which looks like a 'fit' S3 method for the 'mlreg' generic? GB> But it isn't. I just have found '.' to be a convenient separator in GB> variable names, since '_' (my C favourite) wasn't available. So what you GB> are suggesting no!! I'm not. GB> that I have to change all the variable names with dots in GB> them. Or add 'S3metod(...' for each of them. I guess that the former is GB> preferable. no, really neither should be required. We do encourage not using "." for new function names because of the reason above, but it's definitely not a requirement. In the case where 'foo' is an S3 generic function name, we however recommend quite strongly not to use 'foo.bar' as function name since it looks "too much" like an S3 method. Is this the case for you? GB> But how is this problem connected to using C/Fortran code? only via "namespace magic". E.g., for packages with namespaces and R 2.0.0, it' will become recommended to *NOT* use the 'PACKAGE = "foobar"' argument to .C(.) or .Fortran() calls because then, the package version can be taken into account, since NEWS for 2.0.0 has >> C-LEVEL FACILITIES >> >> oThe PACKAGE argument for .C/.Call/.Fortran/.External can be >> omitted if the call is within code within a package with a >> namespace. This ensures that the native routine being called >> is found in the DLL of the correct version of the package if >> multiple versions of a package are loaded in the R session. >> Using a namespace and omitting the PACKAGE argument is >> currently the only way to ensure that the correct version is >> used. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-2.0.0 Install problem for pkg bundle w inter-dependent namespaces
>>>>> "Greg" == Warnes, Gregory R <[EMAIL PROTECTED]> >>>>> on Fri, 17 Sep 2004 14:18:29 -0400 writes: Greg> I have a revised version of the gregmisc package, Greg> which I've converted into a package bundle each of Greg> which has a namespace: gplots, gmodels, gdata, Greg> gtoools. Of course, there are interdependencies among Greg> these namespaces: Greg> gsun374: /tmp [11]> cd gregmisc/ Greg> gsun374: gregmisc [12]> grep import */NAMESPACE Greg> gdata/NAMESPACE:importFrom(gtools, odd, invalid, mixedsort) Greg> gmodels/NAMESPACE:importFrom(MASS, ginv) Greg> gplots/NAMESPACE:importFrom(gtools, invalid) Greg> gplots/NAMESPACE:importFrom(gtools, odd) Greg> gplots/NAMESPACE:importFrom(gdata, nobs) since nobody else has answered yet (and a considerable portion of R-core is traveling this week) : If I understand correctly, your basic package 'gtools' and the dependency you need is gplots --> gdata --> gtools \->/ Have you made sure to use the proper 'Depends: ' entries in the DESCRIPTION(.in) files of your bundle packages ? This works fine if the packages are *not* in a bundle, right? Greg> Under R-1.9.1, this package bundle passes R CMD check Greg> and installs happily. However, under yesterday's Greg> R-2.0.0-alpha, the package fails to install (& hence Greg> pass CMD CHECK) with the error Greg> ** preparing package for lazy loading Greg> Error in loadNamespace(i[[1]], c(lib.loc, .libPaths()), keep.source) Greg> : Greg> There is no package called 'gdata' Greg> Execution halted Greg> ERROR: lazy loading failed for package 'gplots' Greg> because the gdata package is the last in the bundle to Greg> be installed, so it is not yet present. Greg> So, what is the proper way to handle this? Is there Greg> some way to manually specify the package install order? Well, isn't the order in the 'Contains:' field of the bundle DESCRIPTION file used? If not, please consider sending patches for src/scripts/INSTALL.in There are not too many bundles AFAIK, and conceptually (inspite of the recommended VR one) the improved package management tools that we (and the bioconductor project) have been adding to R for a while noe really aim for "R package objects" and clean version / dependency handling of inidividual packages in many different concepts. If bundle installation etc could rely entirely on the package tools, bundles would "work automagically". But probably, for this a bundle would have to be treated as a "package repository" which it isn't currently AFAIK. Regards, Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Namespace problem
>>>>> "GB" == Göran Broström <[EMAIL PROTECTED]> >>>>> on Sun, 19 Sep 2004 18:51:49 +0200 writes: GB> Now I try to add some C and Fortan code to my package, so the NAMESPACE GB> file is GB> useDynLib(eha) GB> importFrom(survival, Surv) GB> export(mlreg.fit, risksets) GB> but I get GB> . GB> * checking R files for library.dynam ... OK GB> * checking S3 generic/method consistency ... WARNING GB> Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc, character.only = TRUE, verbose = FALSE) : GB> package/namespace load failed for 'eha' GB> Execution halted GB> See section 'Generic functions and methods' of the 'Writing R Extensions' GB> manual. GB> . GB> I've checked that section, but I am not adding methods to generics, sure? Aren't you trying to export mlreg.fit which looks like a 'fit' S3 method for the 'mlreg' generic? In that case you need to add S3method(mlreg, fit) to the NAMESPACE file. GB> I'm not writing new generic functions. GB> If I remove useDynLib(eha), I get no errors or warnings, except that the GB> example in mlreg.fit.Rd doesn't work (of course). GB> So what's wrong? (see above)? Regards, Martin Martin Maechler <[EMAIL PROTECTED]> http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 <>< __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Covergence FLAG in glm (PR#7235)
>>>>> "daniel" == daniel jeske <[EMAIL PROTECTED]> >>>>> on Sat, 18 Sep 2004 04:10:40 +0200 (CEST) writes: daniel> Full_Name: Daniel R Jeske Version: 1.8.1 OS: Windows daniel> 2000 Submission from: (NULL) (138.23.228.79) daniel> We have just noticed that when you use glm() it daniel> seems the logical output 'converged' is always TRUE. daniel> The same data set that shows FALSE in version 1.7.1 daniel> shows TRUE in 1.8.1. And I know that FALSE is the daniel> correct answer...so it seems like we cannot trust daniel> the 'converged' flag for glm() in version 1.8.1 daniel> ? I'm pretty sure you're wrong: The NEWS file for 1.8.0 contains >>oThe defaults for glm.control(epsilon=1e-8, maxit=25) have been >> tightened: this will produce more accurate results, slightly slower. Hence, compared to 1.7.1, glm() in your (still very outdated!!) version 1.8.1 will by default use more iterations and may well converge in cases it didn't in 1.7.1 If there weren't such cases, the change wouldn't have been worth! --- Note that even if you were right, your bug report would be basically unusable as bug report and we ask to PLEASE only send bug reports (to [EMAIL PROTECTED] that is) if you know how to do it -- i.e. if you have read that section in the R FAQ. Regards, Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] cor() fails with big dataframe
> "Mayeul" == Mayeul KAUFFMANN <[EMAIL PROTECTED]> > on Thu, 16 Sep 2004 01:23:09 +0200 writes: Mayeul> Hello, Mayeul> I have a big dataframe with *NO* na's (9 columns, 293380 rows). Mayeul> # doing Mayeul> memory.limit(size = 10) Mayeul> cor(x) Mayeul> #gives Mayeul> Error in cor(x) : missing observations in cov/cor Mayeul> In addition: Warning message: Mayeul> NAs introduced by coercion "by coercion" means there were other things *coerced* to NAs! One of the biggest problem with R users (and other S users for that matter) is that if they get an error, they throw hands up and ask for help - assuming the error message to be non-intelligible. Whereas it *is* intelligible (slightly ? ;-) more often than not ... Mayeul> #I found the obvious workaround: Mayeul> COR <- matrix(rep(0, 81),9,9) Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor (x[,i],x[,j])} Mayeul> #which works fine, with no warning Mayeul> #looks like a "cor()" bug. quite improbably. The following works flawlessly for me and the only things that takes a bit of time is construction of x, not cor(): > n <- 30 > set.seed(1) > x <- as.data.frame(matrix(rnorm(n*9), n,9)) > cx <- cor(x) > str(cx) num [1:9, 1:9] 1.0 -0.00039 0.00113 0.00134 -0.00228 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:9] "V1" "V2" "V3" "V4" ... ..$ : chr [1:9] "V1" "V2" "V3" "V4" ... Mayeul> #I checked absence of NA's by Mayeul> x <- x[complete.cases(x),] Mayeul> summary(x) Mayeul> apply(x,2, function (x) (sum(is.na(x Mayeul> #I use R 1.9.1 What does sapply(x, function(u)all(is.finite(u))) return ? __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] is it a typo?
> "AndyL" == Liaw, Andy <[EMAIL PROTECTED]> > on Tue, 14 Sep 2004 10:28:31 -0400 writes: AndyL> In ?options: AndyL> 'warning.expression': an R code expression to be called if a AndyL> warning is generated, replacing the standard message. If AndyL> non-null is called irrespective of the value of option AndyL> 'warn'. AndyL> Is there a missing `it' between `non-null' and `is'? yes, thank you -- now fixed. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] reorder [stats] and reorder.factor [lattice]
> "DeepS" == Deepayan Sarkar <[EMAIL PROTECTED]> > on Mon, 13 Sep 2004 14:54:52 -0500 writes: DeepS> Before it's too late for R 2.0.0, do we have a final decision yet on DeepS> having a reorder method for "factor" in stats? Since the thread is quite a bit old, (and I have been in vacation back then), could you summarize what you think about it? When skimming through the thread I got the impression that, yes, it was worth to "centralize" such a method in 'stats' rather than have different slightly incompatible versions in different other packages. This is of tangential interest to me as I have been slightly involved with reorder.dendrogram() Regards, Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] author field in Rd files
Since nobody else has reacted yet: >>>>> "Timothy" == Timothy H Keitt <[EMAIL PROTECTED]> >>>>> on Wed, 25 Aug 2004 16:53:39 -0500 writes: Timothy> I noticed in the extension manual that the Timothy> \author{} entry should refer to the author of the Timothy> Rd file and not the code documented. I had always Timothy> interpreted it as the author of the code, not the Timothy> documentation. I wonder if others also find this Timothy> ambiguous. I tend to agree with you. Very often the author means both the author of the R object and the help page. In the few other cases, for me, I was the help page author (rather than the other way around) and I think I usually have done what you suggest: Showed the author of the code and sometimes also mentioned myself (as docu-author), but typically only if I had also improved on the code. Timothy> Its generally not an issue, except when there is a Timothy> third party writing documentation. It looks like Timothy> they wrote all the code. Would it make sense to Timothy> have two entries, one for the documentation author Timothy> and one for the code author if different? I think in such a case \author{..} should contain both the code and documentation authors. In a package with many help pages, a possibility is also to specify \author{..} and \references{} in only a few help pages and for the others, inside the \seealso{...} section have a sentence pointing to the main help page(s), such as \seealso{ .. For references etc, \code{\link{}}. } Regards, Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] No is.formula()
> "tony" == A J Rossini <[EMAIL PROTECTED]> > on Wed, 25 Aug 2004 14:33:23 -0700 writes: tony> "Warnes, Gregory R" tony> <[EMAIL PROTECTED]> writes: >> There appears to be no "is.formula()" function in >> R-1.9.1. May I suggest that >> >> is.formula <- function(x) inherits(x, "formula") >> >> be added to base, since formula is a fundimental R type? tony> why not just tony> is(x,"formula") tony> ? because the latter needs the methods package and base functions must work independently of "methods". The question is what "fundamental R type" would be exactly. But I tend to agree with Greg, since formulae are constructed via the .Primitive '~' operator. Apropos, I believe we should move the is.primitive function from "methods" to "base". Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible Latex problem in R-devel?
> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> > on Mon, 23 Aug 2004 21:14:44 +0100 (BST) writes: BDR> On Mon, 23 Aug 2004, Jeff Gentry wrote: >> > What version of perl? >> >> Ack, didn't realize it was this ancient. Version 5.005_03, which is what >> comes with FreeBSD 4.9 apparently. I did install the /usr/ports version >> of perl is 5.8.2, although there seems to be other problems here (which >> are most likely related to my system, will track that down before bringing >> that issue up - it appears to be a mismatch on my libraries between the >> two versions). BDR> Yes, that version of perl has a lot of bugs, but in theory we support it. BDR> (It seems worse than either 5.004 or 5.005_04.) >> > print $latexout &latex_link_trans0($blocks{"name"}); >> > will probably solve it for you. >> >> Yup, this works, thanks. BDR> I've changed the sources, to be defensive. I don't like reading Perl BDR> like that, but it does work more portably. I'm glad for the change. Our redhat enterprise version of perl (5.8.0) also couldn't deal with the other syntax. Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] legend's arguments "angle", "density" & "lwd" (PR#7023)
>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>> on Fri, 20 Aug 2004 19:44:40 +0200 writes: UweL> Martin Maechler wrote: >>>>>>> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> >>>>>>> on Fri, 20 Aug 2004 17:01:13 +0200 writes: >> UweL> Paul Murrell wrote [on 2002-03-14 with Subject: "filled bars with UweL> patterns" in reply to Arne Mueller] >> >> >> Hi >> >> >> >> >> >> >> >>> I'd also like to have the filled boxes in the legend to be striped. The >> >>> legend function has a 'density' attribute, but unfortunately this does't >> >>> seem to do anything >> >>> >> >>> following the above example >> >>> >> >>> legend(3.4, 5, c('Axx','Bxx','Cxx','Dxx'), fill = c('red', 'blue', >> >>> 'green', 'orange')) >> >>> >> >>> is the same as >> >>> >> >>> legend(3.4, 5, c('Axx','Bxx','Cxx','Dxx'), density=10, >> >>>fill = c('red', 'blue', 'green', 'orange'), >> >>>density=c(10,-1,20, 200)) >> >> This appears to be a bug. Can you file a bug report for this please? UweL> [SNIP; I cannot find any related bug report in the repository] UweL> I'm just reviewing bug reports and other stuff re. legend() and found UweL> this old message in one of my Next-Week-To-Do-folders. >> UweL> Well, the point mentioned above is not really a bug, UweL> because one has to specify BOTH arguments, angle AND UweL> density in legend(). Is there any point not to make UweL> angle = 45 the default, as it already is for polygon() UweL> and rect()? MM> This seems like a good idea, MM> but we'll wait for your many other patches to legend.R and MM> legend.Rd :-) UweL> Just three rather than many issues I'm trying to address, the third one UweL> is just closing a bug report. ;-) UweL> Here the two suggested patches in merged form. UweL> Uwe <... snipping Uwe's patches .> This has now lead to more: I've just added to NEWS (and the C and R sources of course) o plot.xy(), the workhorse function of points, lines and plot.default now has 'lwd' as explicit argument instead of implicitly in "...", and now recycles lwd where it makes sense, i.e. for line based plot symbols. such that Uwe's proposed new argument to legend(), pt.lwd is also recycled and now can default to 'lwd', the line width of the line segments in legend(). Hence, Leslie's original feature request (PR#7023) of June 25 is now also fulfilled using only 'lwd' and not both 'lwd' and 'pt.lwd'. I.e., the following now works x <- 1:10 ; y <- rnorm(10,10,5) ; y2 <- rnorm(10,8,4) plot(x, y, bty="l", lwd=3, type="o", col=2, ylim = range(y,y2), cex=1.5) points(x, y2, lwd=3, lty=8, col=4, type="o", pch=2, cex=1.5) legend(10, max(y, y2), legend=c("Method 1","Method 2"), col=c(2, 4), lty=c(1, 8), pch=c(1, 2), xjust=1, yjust=1, pt.cex=1.5, lwd=3) [note that I've used 'ylim = range(y,y2)' which is slightly better than 'ylim = c(min(y,y2),max(y,y2))'] With thanks to Uwe! Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] legend's arguments "angle" and "density"
> "UweL" == Uwe Ligges <[EMAIL PROTECTED]> > on Fri, 20 Aug 2004 17:01:13 +0200 writes: UweL> Paul Murrell wrote [on 2002-03-14 with Subject: "filled bars with UweL> patterns" in reply to Arne Mueller] >> Hi >> >> >> >>> I'd also like to have the filled boxes in the legend to be striped. The >>> legend function has a 'density' attribute, but unfortunately this does't >>> seem to do anything >>> >>> following the above example >>> >>> legend(3.4, 5, c('Axx','Bxx','Cxx','Dxx'), fill = c('red', 'blue', >>> 'green', 'orange')) >>> >>> is the same as >>> >>> legend(3.4, 5, c('Axx','Bxx','Cxx','Dxx'), density=10, fill = c('red', >>> 'blue', 'green', 'orange'), >>> density=c(10,-1,20, 200)) >> >> >> >> This appears to be a bug. Can you file a bug report for this please? UweL> [SNIP; I cannot find any related bug report in the repository] UweL> I'm just reviewing bug reports and other stuff re. legend() and found UweL> this old message in one of my Next-Week-To-Do-folders. UweL> Well, the point mentioned above is not really a bug, because one has to UweL> specify BOTH arguments, angle AND density in legend(). Is there any UweL> point not to make angle = 45 the default, as it already is for polygon() UweL> and rect()? This seems like a good idea, but we'll wait for your many other patches to legend.R and legend.Rd :-) Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] text(x, y, labels) - recycling problems and RFC (PR#7084)
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> >>>>> on Thu, 19 Aug 2004 17:50:13 +0100 (BST) writes: BDR> On Thu, 19 Aug 2004 [EMAIL PROTECTED] wrote: >> I didn't get any feedback on this posting, >> >> so I will commit my proposal to recycle the coordinates (x,y) to >> the length of 'labels' if the latter is longer (instead of >> silently dropping the extra labels[] entries). BDR> I'd suggest only doing non-fractional recycling (or at the very least BDR> warning against fractional recycling). I would expect almost all BDR> occurrences of your example would be unintended. well, as said, both "grid" and S-plus do recycle (w/o warning) in this situation. Do we have precedence cases of "recycling but warn if fractional" ? Martin >> >>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >> >>>>> on Tue, 13 Jul 2004 18:22:05 +0200 (CEST) writes: >> MM> Not a bug necessarily, in text(), but at least an inconsistency, MM> and a need for more documentation: Contrary to e.g., plot(), MM> text(x,y,labels) *does* recycle it's arguments to some extent -- MM> and probably has always in S. >> MM> However it doesn't do all I think it should, i.e., >> MM> plot(1:7); text(1:2, 1+ 1:3, LETTERS[1:4]) >> MM> does recycle 'x' to c(1:2, 1) {length 3} to match 'y' MM> but doesn't recycle to length 4 in order to match 'labels'. >> MM> While one can well accept this, I believe it should give a MM> warning since it silently 'drops' the "d". >> MM> However, I'm proposing to consider S(-plus) compatibility here. MM> In S-PLUS 6.1, the result of the above is MM> identical to MM> plot(1:7); text(rep(1:2,length=4), rep(1+ 1:3, length=4), LETTERS[1:4]) MM> i.e. (x,y) is recycled to length 4, the length of 'labels'. >> MM> Further note that in MM> plot(1:7); text(1:2, 1+ 1:3, LETTERS[1:2], col=2:6) MM> the 'labels' *are* recycled to length 3, matching (x,y) -- but MM> not to length 5 of 'col' which is fine -- just not the other way around. >> MM> I'd propose that R should recycle all three (x,y,labels) MM> [but not more] to common length. >> MM> BTW, "grid" graphics do recycle as well, at least MM> grid.text(labels, x, y) does --- and as I see it does also MM> recycle at least the 'rotation'. >> MM> Martin Maechler >> MM> __ MM> [EMAIL PROTECTED] mailing list MM> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel >> >> __ >> [EMAIL PROTECTED] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> BDR> -- BDR> Brian D. Ripley, [EMAIL PROTECTED] BDR> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ BDR> University of Oxford, Tel: +44 1865 272861 (self) BDR> 1 South Parks Road, +44 1865 272866 (PA) BDR> Oxford OX1 3TG, UKFax: +44 1865 272595 BDR> __ BDR> [EMAIL PROTECTED] mailing list BDR> https://stat.ethz.ch/mailman/listinfo/r-devel __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Unbalanced parentheses printed by warnings() crash text editor
>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> >>>>> on 20 Aug 2004 12:01:39 +0200 writes: PD> Duncan Murdoch <[EMAIL PROTECTED]> writes: >> >I could have sent this to the ESS or Xemacs devel list, but ESS & Xemacs' >> >attempt to find balanced parentheses accross many lines seems sensible, >> >and is needed with very long functions. >> >> Yes, it's sensible to try, but it is a bug that they don't fail >> gracefully. PD> (Actually, it is not sensible; ESS should try harder to figure out PD> what is actually R code. Much as I love ESS, it is a persistent fly in PD> the ointment when the opening blurb comes up with "for" and "in" in PD> magenta.) I'm chiming in, since I have been addressed explicitly here (for whatever reason): Yes, yes, and yes to Duncan's and Peter's notes: - This should have gone to the ESS-help mailing list - it's no bug in R and a bug in ESS/Xemacs (actually a bug in Xemacs combined with a missing feature in ESS). Martin Maechler For the sake of ESS-help, here's the original message as well: >>>>> "Mayeul" == Mayeul KAUFFMANN <[EMAIL PROTECTED]> >>>>> on Thu, 19 Aug 2004 23:32:51 +0200 writes: Mayeul> ... Hope it is the good place for this Mayeul> (I discuss the question of the right place below). Mayeul> Most of the time, warnings are more than 1000 [?? you probably mean something like '100', not '1000', right?] Mayeul> characters long and thus are truncated. Most of the Mayeul> time, this generates printouts with unbalanced parentheses. Mayeul> Intelligent text editors which do parentheses Mayeul> highlighting get very confused with this. After too Mayeul> many warnings, they give errors, and may even crash. crashing *must* be a bug of the editor setup (ESS - XEmacs - Windows), not of R. Mayeul> Specifically, I use ESS and XEmacs for Windows Users Mayeul> of R (by John Fox) which is advised to do at Mayeul> http://ess.r-project.org/ with a buffer for text Mayeul> editing and an inferior ESS (R) buffer. (I Mayeul> downloaded the latest Xemacs and ESS a month ago). Mayeul> After too many warnings (with unbalanced Mayeul> parentheses), Xemacs swithes to an ESS-error buffer Mayeul> which says "error Nesting too deep for parser". In Mayeul> some case, when back in R buffer, typing any letter Mayeul> switches back to the ESS-error Buffer. In other Mayeul> case, it simply takes ages (until you kill Xemacs) Mayeul> or it crashes. In most case, the R process is lost. Mayeul> I could have sent this to the ESS or Xemacs devel Mayeul> list, but ESS & Xemacs' attempt to find balanced Mayeul> parentheses accross many lines seems sensible, and Mayeul> is needed with very long functions. Mayeul> A workaround would be to change the function that print warnings. Mayeul> Instead of, for instance, Mayeul> "error message xx in: function.yy(z,zzz, ..." Mayeul> It may print Mayeul> "error message xx in: function.yy(z,zzz, ...)" Mayeul> The function should truncate the error message, find Mayeul> how many parenthesis and brackets are open in the Mayeul> remaining part, substract the number of closing Mayeul> parenthesis and brackets, and add that many Mayeul> parenthesis at the end. (Xemacs parentheses Mayeul> highligher regards "(" and "[" as equivalent) Mayeul> Mayeul KAUFFMANN Mayeul> Univ. Pierre Mendes France Mayeul> Grenoble - France __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Springer: New Series Announcement "UseR!"
[on behalf of John Kimmel and Kurt Hornik:] [PDF version for nice printing attached at the end] NEW SERIES ANNOUNCEMENT and REQUEST FOR BOOK PROPOSALS Springer announces a series of books called UseR! edited by Robert Gentleman, Kurt Hornik, and Giovanni Parmigiani. This series of inexpensive and focused books on R will publish shorter books aimed at practitioners. Books can discuss the use of R in a particular subject area (e.g., epidemiology, econometrics, psychometrics) or as it relates to statistical topics (e.g., missing data, longitudinal data). In most cases, books are to be written as combinations of LaTeX and R so that all the code for figures and tables can be put on a website. Authors should assume a background as supplied by Dalgaard's "Introductory Statistics with R" so that each book does not repeat basic material. Springer will supply a LaTeX style file, all books will be reviewed and copyedited, and faster production schedules will be used. To propose a book, please contact John Kimmel Executive Editor, Statistics Springer 24494 Alta Vista Dr. Dana Point, CA 92629 [EMAIL PROTECTED] Telephone: 949-487-1216 Fax: 949-240-4321 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] text(x, y, labels) - recycling problems and RFC (PR#7084)
I didn't get any feedback on this posting, so I will commit my proposal to recycle the coordinates (x,y) to the length of 'labels' if the latter is longer (instead of silently dropping the extra labels[] entries). Martin Maechler >>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Tue, 13 Jul 2004 18:22:05 +0200 (CEST) writes: MM> Not a bug necessarily, in text(), but at least an inconsistency, MM> and a need for more documentation: Contrary to e.g., plot(), MM> text(x,y,labels) *does* recycle it's arguments to some extent -- MM> and probably has always in S. MM> However it doesn't do all I think it should, i.e., MM> plot(1:7); text(1:2, 1+ 1:3, LETTERS[1:4]) MM> does recycle 'x' to c(1:2, 1) {length 3} to match 'y' MM> but doesn't recycle to length 4 in order to match 'labels'. MM> While one can well accept this, I believe it should give a MM> warning since it silently 'drops' the "d". MM> However, I'm proposing to consider S(-plus) compatibility here. MM> In S-PLUS 6.1, the result of the above is MM> identical to MM> plot(1:7); text(rep(1:2,length=4), rep(1+ 1:3, length=4), LETTERS[1:4]) MM> i.e. (x,y) is recycled to length 4, the length of 'labels'. MM> Further note that in MM> plot(1:7); text(1:2, 1+ 1:3, LETTERS[1:2], col=2:6) MM> the 'labels' *are* recycled to length 3, matching (x,y) -- but MM> not to length 5 of 'col' which is fine -- just not the other way around. MM> I'd propose that R should recycle all three (x,y,labels) MM> [but not more] to common length. MM> BTW, "grid" graphics do recycle as well, at least MM> grid.text(labels, x, y) does --- and as I see it does also MM> recycle at least the 'rotation'. MM> Martin Maechler MM> __ MM> [EMAIL PROTECTED] mailing list MM> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] weighted data (PR#7181)
>>>>> "hoefler" == hoefler <[EMAIL PROTECTED]> >>>>> on Mon, 16 Aug 2004 17:15:20 +0200 (CEST) writes: hoefler> dear ladies and gentlemen, hoefler> i am currently working on the revised version of a paper entitled hoefler> "on the use of statistical weights to account for non-participation". this is not at all a bug report!! By this action of misuse, you produce unnecessary work load for R-core members (maintaining the R-bugs repository). Reason for some of us to not even considering to answer your questions... **PLEASE** do read http://www.R-project.org/mail.html carefully including its link, "the posting guide" ! Martin Maechler __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bus error /segmentation fault from 'approx' (PR#7177)
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Mon, 16 Aug 2004 14:19:16 +0200 (CEST) writes: >>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> >>>>> on 16 Aug 2004 12:01:20 +0200 writes: PD> [EMAIL PROTECTED] writes: >>> follow up to ID 7166. something like >>> >>> approx(c(1,2),c(NA,NA),1.5,rule=2) >>> >>> crashes 1.9.1 on both systems (MacOS 10.3.5.: bus error, >>> SunOS 5.9: segmentation fault) even if xout is within >>> given x range (as in example above) where rule=2 seems >>> not be relevant anyway. PD> Yes, this is a silly bug in the R driver routine: PD> if (nx < 2 && method == "linear") PD>stop("approx requires at least two values to interpolate") PD> if (any(na <- is.na(x) | is.na(y))) { PD>ok <- !na PD>x <- x[ok] PD>y <- y[ok] PD>nx <- length(x) PD> } PD> You want to do the check after removing NAs! PD> Also, we should probably have a check for (nx == 0 && method != "linear") MM> yes (2 times) *and* 'method' is not a string anymore at that MM> moment because it has been pmatch()ed. MM> I'm going to match.arg() it instead. (no, I didn't. It's not worth it, since the C code needs the integer version anyway.) MM> And I see there's more cleanup possible, since, MM> xy.coords() already makes sure to return 2 equal length numeric MM> components. MM> When I do this now, it breaks back compatibility since things MM> like (PR#6809) MM> approx(list(x=rep(NaN, 9), y=1:9), xout=NaN) MM> would give an error instead of (NaN, NaN). MM> OTOH, the stop("") message you cite above was obviously MM> *intended* to be triggered for such a case. MM> I presume we rather intend to be back compatible? I'm not so sure anymore: For approxfun() {which must return a function() in non-error cases} this would need extra code, just for a somewhat non-sensical case. And then, approx() should stay entirely "parallel" to approxfun(). S-plus also simply gives an error in these cases {but AFAIR, R's approx has tried to be better here...} So, I propose to change behavior for all these cases and produce an error, for approx() from the following code part : if (nx <= 1) { if(method == 1)# linear stop("need at least two non-NA values to interpolate") if(nx == 0) stop("zero non-NA points") } Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bus error /segmentation fault from 'approx' (PR#7177)
> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> > on 16 Aug 2004 12:01:20 +0200 writes: PD> [EMAIL PROTECTED] writes: >> follow up to ID 7166. something like >> >> approx(c(1,2),c(NA,NA),1.5,rule=2) >> >> crashes 1.9.1 on both systems (MacOS 10.3.5.: bus error, SunOS 5.9: >> segmentation fault) even if xout is within given x range (as in example above) >> where rule=2 seems not be relevant anyway. PD> Yes, this is a silly bug in the R driver routine: PD> if (nx < 2 && method == "linear") PD> stop("approx requires at least two values to interpolate") PD> if (any(na <- is.na(x) | is.na(y))) { PD> ok <- !na PD> x <- x[ok] PD> y <- y[ok] PD> nx <- length(x) PD> } PD> You want to do the check after removing NAs! PD> Also, we should probably have a check for (nx == 0 && method != "linear") yes (2 times) *and* 'method' is not a string anymore at that moment because it has been pmatch()ed. I'm going to match.arg() it instead. And I see there's more cleanup possible, since, xy.coords() already makes sure to return 2 equal length numeric components. When I do this now, it breaks back compatibility since things like (PR#6809) approx(list(x=rep(NaN, 9), y=1:9), xout=NaN) would give an error instead of (NaN, NaN). OTOH, the stop("") message you cite above was obviously *intended* to be triggered for such a case. I presume we rather intend to be back compatible? Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] interaction.plot
> "ChrBu" == Christoph Buser <[EMAIL PROTECTED]> > on Fri, 6 Aug 2004 10:24:40 +0200 writes: ChrBu> Dear R core team I've a proprosal to improve the ChrBu> function interaction.plot. It should be allowed to ChrBu> use type = "b". thank you for the suggestion. I've implemented the above for R-devel several days ago. ChrBu> This can be done by changing the function's header from ChrBu> function( , type = c("l", "p"), ) ChrBu> to ChrBu> function( , type = c("l", "p", "b"), ) ChrBu> Then it works. well, as I mentioned to you privately, it also needs a change in the legend() call subsequently. ChrBu> This type = "b" is useful, if the second level of the ChrBu> x.factor is missing for some level of the ChrBu> trace.factor. With type= "l" you loose first level of ChrBu> the x.factor, too (because you can't draw the line to ChrBu> the missing second level). With type = "p" so see ChrBu> this first level, but you have no lines at all (just ChrBu> chaos with points). With type = "b", you get all ChrBu> existing levels plus the lines between two contiguous ChrBu> levels (if they both exist). ChrBu> There is a second point. Using interaction.plot with ChrBu> the additional argument main creates a warning: ChrBu> parameter "main" couldn't be set in high-level plot() function ChrBu> The reason is that "..." is used twice inside of ChrBu> interaction.plot, in fact in ChrBu> matplot( ,...) ChrBu> and in ChrBu> axis( ,...) ChrBu> axis can do nothing with this argument main and ChrBu> creates this warning. You could replace ,... in the ChrBu> axis function by inserting all reasonable arguments ChrBu> of axis in the functions header (interaction.plot) ChrBu> and give over those arguments to axis. Then you ChrBu> shouldn't get this warning anymore. yes, indeed. Note however that this warning also happens with other such plotting functions and I find it is not a real blemish. Your proposed solution is not so obvious or easy, since axis() really has its own ``intrinsic'' ... argument and conceptually does accept many more possible "graphical parameters" in addition to its specific ones. Hence I believe it would need quite a large extent of extra code in order to 1) keep the current potential functionality 2) always properly separate arguments to be passed to matplot() from those to be passed to axis(). -- and as ``we all know'' we should really use lattice package functions rather than interaction.plot() {but then I'm still not the role model here ... ;-( } Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting time series
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> >>>>> on Tue, 10 Aug 2004 09:11:39 +0100 (BST) writes: BDR> On Tue, 10 Aug 2004, Martin Maechler wrote: >> >>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]> >> >>>>> on Tue, 10 Aug 2004 05:47:28 +0100 (BST) writes: >> BDR> On Tue, 10 Aug 2004, Ross Ihaka wrote: >> >> Rob Hyndman wrote: >> >> > When part of a time series is extracted, the time series component is >> >> > lost. e.g., >> >> > x <- ts(1:10) >> >> > x[1:4] >> >> > >> >> > It would be nice if there was a subsetting function [.ts to avoid this >> >> > problem. However, it is beyond my R-coding ability to produce such a >> >> > thing. Is someone willing to do it? >> BDR> There is a [.ts, in src/library/stats/R/ts.R, and it is documented BDR> (?"[.ts"). >> >> >> Have you had a look at "window"? The problem with "[" >> >> its that it can produce non-contiguous sets of values. >> BDR> Yes. >> >> indeed. window() is what we have been advocation for a long >> time now ... (but see below). >> BDR> If you look in the sources for [.ts you will see, BDR> commented, the code that was once there to handle cases BDR> where the index was evenly spaced. But it was removed BDR> long ago in favour of window(). I tried to consult the BDR> logs, but it seems that in the shift from CVS to SVN BDR> recently I can't get at them. I think the rationale BDR> was that x[ind] should always produce an object of the BDR> same class. >> >> well, that can't have been the only rationale since now >> x[ind] is *not* of the same class - when the "ts" property is >> lost in any case. BDR> `always of the same class' : for all (non-trivial) values of ind. aah! please excuse my mis-interpretation of meaning "same class as original x". Martin __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel