Re: [Rd] Write unix format files on windows and vice versa

2012-04-24 Thread Ted Harding
On 24-Apr-2012 17:23:00 Andrew Redd wrote:
 I go back and forth between windows and linux, and find myself
 running into problem with line endings. Is there a way to
 control the line ending conversion when writing files, such as
 write and cat? More explicitly I want to be able to write files
 with LF line endings rather than CRLF line ending on windows;
 and CRLF line endings instead of LF on linux, and I want to be
 able to control when the conversion is made and/or choose the
 line endings that I want.
 
 As far as I can tell the conversion is not optional and buried
 deep in compiled code. Is there a possible work around?
 
 Thanks,
 Andrew

Rather than write the Linux version out in Windows (or the other
way round in Linux), you might find it more useful to use an
external conversion utility.

One such is unix2dos/dos2unix (the same program in either case,
whose function depends on what name you call it by). This used
to be standard on older Linux systems, but may need to be explicitly
installed on your system. It seems there may also be a version for
Windows -- see:

  http://download.cnet.com/Unix2DOS/3000-2381_4-10488164.html


Then, when you create a file in Windows, you could transfer it to
Linux and convert it to Unix format; and also keep it as it is for
use on Windows. Conversely, when you create a file in Linux, you
could convert it to Windows format and transfer it to Windows; and
also keep it as it is for use on Linux.

It is also possible to do the CRLF->LF or LF->CRLF conversion using
'sed' in Linux.
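
Within R itself, one workaround (a minimal sketch -- it relies only
on the fact that a connection opened in binary mode, "wb", suppresses
the automatic CRLF translation on Windows) is to write through a
binary connection and give the line terminator explicitly:

  # Force LF endings, whatever the platform:
  con <- file("out_unix.txt", open="wb")
  writeLines(c("line 1","line 2"), con, sep="\n")
  close(con)

  # Force CRLF endings (e.g. writing a file for Windows from Linux):
  con <- file("out_dos.txt", open="wb")
  writeLines(c("line 1","line 2"), con, sep="\r\n")
  close(con)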

For some further detail see

   http://en.wikipedia.org/wiki/Unix2dos

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 24-Apr-2012  Time: 18:56:21
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NA vs. NA

2012-04-05 Thread Ted Harding
On 05-Apr-2012 11:03:15 Adrian Dusa wrote:
 Dear All,
 
 I assume this is an R-devel issue, apologies if I missed something
 obvious. I have a dataframe where the row names are country codes,
 based on ISO 3166, something like this:
 
 
     v1 v2
 UK   1  2
 NA   2  3
 
 
 It happens that NA is the country code for Namibia, and that
 creates problems on using this data within a package due to this:
 
 Error in read.table(zfile, header = TRUE, as.is = FALSE) :
   missing values in 'row.names' are not allowed
 
 I realise that NA is reserved in R, but I assumed that when quoted it
 would be usable.
 For the moment I simply changed the country code, but I wonder if
 there's any (other) solution to circumvent this issue.
 
 Thanks very much in advance,
 Adrian

Hi Adrian,
The default in read.table() for the na.strings parameter is

  na.strings = "NA"

So, provided you have no NA in the data portion of your file
(or e.g. any missing values are simply blank) you could use
something like:

read.table(zfile, header = TRUE, as.is = FALSE, na.strings = "OOPS")

which should avoid the problem.
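
For instance (a minimal sketch, with a small hypothetical file
standing in for your zfile):

  writeLines(c("v1 v2", "UK 1 2", "NA 2 3"), "countries.txt")
  ## With the default na.strings="NA", the Namibia row name becomes
  ## a missing value and read.table() stops with the error above:
  ## read.table("countries.txt", header=TRUE)
  ## Declaring some other string as the NA marker keeps "NA" literal:
  d <- read.table("countries.txt", header=TRUE, na.strings="OOPS")
  rownames(d)
  # [1] "UK" "NA"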

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 05-Apr-2012  Time: 12:21:57
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Julia

2012-03-01 Thread Ted Harding
http://julialang.org/blog

Then click on Stanford Talk Video.
Then click on available here.

Ted.

On 01-Mar-2012 Kjetil Halvorsen wrote:
 Can somebody post a link to the video? I can't find it; searching
 "Julia" on the YouTube Stanford channel gives nothing.
 
 Kjetil
 
 On Thu, Mar 1, 2012 at 11:37 AM, Douglas Bates ba...@stat.wisc.edu wrote:
 On Thu, Mar 1, 2012 at 11:20 AM, Jeffrey Ryan jeffrey.r...@lemnica.com
 wrote:
 Doug,

 Agreed on the interesting point - looks like it has some real promise.
 I think the spike in interest could be attributable to Mike
 Loukides's tweet on Feb 20. (editor at O'Reilly)

 https://twitter.com/#!/mikeloukides/status/171773229407551488

 That is exactly the moment I stumbled upon it.

 I think Jeff Bezanson attributes the interest to a blog posting by
 Viral Shah, another member of the development team, that hit Reddit.
 He said that, with Viral now in India, it all happened overnight for
 those in North America and he awoke the next day to find a firestorm
 of interest. I ran across Julia in the Release Notes of LLVM and
 mentioned it to Dirk Eddelbuettel who posted about it on Google+ in
 January. (Dirk, being much younger than I, knows about these
 new-fangled social media things and I don't.)

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-Mar-2012  Time: 20:47:42
This message was sent by XFMail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug in sum() on integer vector

2011-12-13 Thread Ted Harding
 of decimal (or binary)
digits. An example of this is R (and most other numerical
programs). This may be 32 bits or 64 bits. Any result of a
computation which involves more than this number of bits
is inevitably an approximation.

Provided the user is aware of this, there is no need for
your "It should always return the correct value or fail."
It will return the correct value if the integers are not
too large; otherwise it will return the best approximation
that it can cope with in the fixed finite storage space
for which it has been programmed.

There is an implicit element of the arbitrary in this. You
can install 32-bit R on a 64-bit-capable machine, or a
64-bit version. You could re-program R so that it can
work to, say, 128 bits or 256 bits even on a 32-bit machine
(using techniques like those that underlie 'bc'), but
that would be an arbitrary choice. However, the essential
point is that some choice is unavoidable, since if you push
it too far the Universe will run out of particles -- and the
computer industry will run out of transistors long before
you hit the Universe limit!

So you just have to accept the limits. Provided you are aware
of the approximations which may set in at some point, you can
cope with the consequences, so long as you take account of
some concept of adequacy in the inevitable approximations.
Simply to fail is far too unsophisticated a result!
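
To illustrate (a small sketch of the two behaviours; the exact
wording of the warning may vary between R versions):

  x <- c(.Machine$integer.max, 1L)
  sum(x)              # NA, with an integer-overflow warning
  sum(as.numeric(x))  # 2147483648: exact, since it still fits a double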

Hoping this is useful,
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 14-Dec-11   Time: 00:52:49
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] array extraction

2011-09-27 Thread Ted Harding
Somewhat out of my depth here (I have only 2 arms, but am
swimming in waters which require 3): My interpretation is
that a and M are basically vectors, with dimension attributes,
accessed down the columns.

The array 'a' consists of 30 elements 1:30 in that order,
accessed by each of 3 rows for each of 5 columns in each
of two layers, in that order of precedence.

The matrix M consists of 15 elements, accessed by each
of 3 rows for each of 5 columns.

Thus a[M] treats M as a selection vector, ignoring dimension,
and reads along a and at the same time along M, selecting
according to TRUE or FALSE. Then, when it gets to the end
of the first layer in 'a' it re-cycles M.
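
A small sketch of that recycling (same objects as in the original
posting, apart from writing the matrix dimensions out directly):

  a <- array(1:30, c(3,5,2))
  M <- (matrix(1:15, 3, 5) %% 4) > 2     # 15 logicals against 30 cells
  identical(a[M], a[rep(as.vector(M), length.out=length(a))])
  # [1] TRUE -- M is flattened and recycled along 'a'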

When I tried a[M,] I got

  a[M,]
  # Error in a[M, ] : incorrect number of dimensions

I infer that it is treating the M as a vector, so there
are only 2 dimensions in a[M,] instead of 3. So try:

  a[M,,]
  # Error: (subscript) logical subscript too long

which is not surprising since one is applying a 15-long
selector to the first dimension of 'a', which has only
length 3, just as in

  a[rep(TRUE,15),,]
  # Error: (subscript) logical subscript too long

which, interestingly, differs from

  a[rep(1,15),,]
  # (Output, which is what you'd expect, omitted)

(Hmm, out of my depth again ... ).

Well, maybe two arms is not enough when you need three
to swim in these waters; but it seems that one long swishy
tail will do nicely. That being said, I still find the
water quite murky!
Ted.

On 27-Sep-11 22:12:26, robin hankin wrote:
 thank you Simon.
 
 I find a[M] working to be unexpected, but consistent with (a close
 reading of) Extract.Rd
 
 Can we reproduce a[,M]?
 
 [I would expect this to extract a[,j,k] where M[j,k] is TRUE]
 
 try this:
 
 
 a <- array(1:30,c(3,5,2))
 M <- matrix(1:10, 5, 2) %% 3 == 1
 a[M]
  [1]  1  4  7 10 11 14 17 20 21 24 27 30
 
 This is not doing what I would want a[,M] to do.
 
 
 
 
 I'll checkout afill() right now
 
 best wishes
 
 
 Robin
 
 
 On Wed, Sep 28, 2011 at 10:39 AM, Simon Knapp sleepingw...@gmail.com
 wrote:
 a[M] gives the same as your `cobbled together' code.

 On Wed, Sep 28, 2011 at 6:35 AM, robin hankin hankin.ro...@gmail.com
 wrote:

 hello everyone.

 Look at the following R idiom:

  a <- array(1:30,c(3,5,2))
  M <- (matrix(1:15,c(3,5)) %% 4) > 2
  a[M,] <- 0

 Now, I think that a[M,] has an unambiguous meaning (to a human).
 However, the last line doesn't work as desired, but I expected it
 to...and it recently took me an indecent amount of time to debug an
 analogous case. Just to be explicit, I would expect a[M,] to extract
 a[i,j,] where M[i,j] is TRUE. (Extract.Rd is perfectly clear here,
 and R
 is
 behaving as documented).

 The best I could cobble together was the following:

  ind <- which(M, arr.ind=TRUE)
  n <- 3
  ind <-
  cbind(kronecker(ind,rep(1,dim(a)[n])), rep(seq_len(dim(a)[n]),nrow(ind)))
  a[ind] <- 0


 but the intent is hardly clear, certainly compared to a[M,]

 I've been pondering how to implement such indexing, and its
 generalization.

 Suppose 'a' is a seven-dimensional array, and M1 a matrix and M2 a
 three-dimensional array (both Boolean). Then a[,M1,,M2] is a
 natural generalization of the above. I would want a[,M1,,M2] to
 extract a[i1,i2,i3,i4,i5,i6,i7] where M1[i2,i3] and M2[i5,i6,i7] are
 TRUE.

 One would need all(dim(a)[2:3] == dim(M1)) and all(dim(a)[5:7] ==
 dim(M2)) for consistency.

 Can any R-devel subscribers advise?




 --
 Robin Hankin
 Uncertainty Analyst
 hankin.ro...@gmail.com

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 
 
 
 -- 
 Robin Hankin
 Uncertainty Analyst
 hankin.ro...@gmail.com
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-11   Time: 00:28:37
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish there were a strict mode for R interpreter. What

2011-04-09 Thread Ted Harding
On 09-Apr-11 20:37:28, Duncan Murdoch wrote:
 On 11-04-09 3:51 PM, Paul Johnson wrote:
 Years ago, I did lots of Perl programming. Perl will let you be lazy
 and write functions that refer to undefined variables (like R does),
 but there is also a strict mode so the interpreter will block anything
 when a variable is mentioned that has not been defined. I wish there
 were a strict mode for checking R functions.

 Here's why. We have a lot of students writing R functions around here
 and they run into trouble because they use the same name for things
 inside and outside of functions. When they call functions that have
 mistaken or undefined references to names that they use elsewhere,
 then variables that are in the environment are accidentally used. Know
 what I mean?

 dat <- whatever

 someNewFunction <- function(z, w){
 # do something with z and w and create a new dat
 # but forget to name it dat
  lm(y ~ x, data=dat)
 # lm just used wrong data
 }

 I wish R had a strict mode to return an error in that case. Users
 don't realize they are getting nonsense because R finds things to fill
 in for their mistakes.

 Is this possible?  Does anybody agree it would be good?

 
 It would be really bad, unless done carefully.
 
 In your function the free (undefined) variables are dat and lm.  You 
 want to be warned about dat, but you don't want to be warned about lm. 
 What rule should R use to determine that?
 
 (One possible rule would work in a package with a namespace.  In that 
 case, all variables must be found in declared dependencies, the search 
 could stop before it got to globalenv().  But it seems unlikely that 
 your students are writing packages with namespaces.)
 
 Duncan Murdoch

I'm with Duncan on this one! On the other hand, I can understand the
issues that Paul's students might encounter.

I think the right thing to do is to introduce the students to the
basics of scoping, early in the process of learning R.

Thus, when there is a variable (such as 'lm' in the example) which
you *expect* to already be out there (since 'lm' is in 'stats'
which is pre-loaded by default), then you can go ahead and use it.

But when your function uses a variable (e.g. 'dat') which just
*happened* to be out there when you first wrote the function,
then when you re-use the same function definition in a different
context things are likely to go wrong. So teach them that variables
which occur in functions, which might have any meaning in whatever
the context of use may be, should either be named arguments in
the argument list, or should be specifically defined within the
function, and not assumed to already exist unless that is already
guaranteed in every context in which the function would be used.

This is basic good practice which, once routinely adopted, should
ensure that the right thing is done every time!
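
A partial mechanical check does exist, by the way: the 'codetools'
package (shipped with recent R, and used by R CMD check) can list a
function's free variables and warn about bindings it cannot find.
A minimal sketch:

  library(codetools)
  f <- function(z, w) z + w + dat   # 'dat' is free: not an argument,
                                    # not defined locally
  findGlobals(f)   # c("+", "dat"): every name not defined in 'f'
  checkUsage(f)    # "f: no visible binding for global variable 'dat'"
                   # (wording may vary; silent if 'dat' exists)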

Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 09-Apr-11   Time: 22:08:10
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Surprising behavior of letters[c(NA, NA)]

2010-12-17 Thread Ted Harding
On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
 Consider this:
 
 letters[c(2, 3)]
 [1] "b" "c"
 letters[c(2, NA)]
 [1] "b" NA
 letters[c(NA, 3)]
 [1] NA  "c"
 letters[c(NA, NA)]
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [26] NA
 
 The result is a 2-vector in each case until we get to c(NA, NA) and
 then it unexpectedly changes from returning a 2-vector to returning a
 26-vector.  I think most people would have expected that the answer
 would be c(NA, NA).

I'm not sure that it is suprising! Consider
  letters[NA]
which returns exactly the same result. Then consider that 'letters' is
simply a 26-element character vector c("a","b",...). Now consider

  x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
  x[NA]
  # [1] NA NA NA NA NA NA NA NA NA NA NA NA NA

In other words, x[NA] for any vector x will test each index 1:length(x)
against NA, and will find that it's NA, since it doesn't know whether
the index matches or not. Therefore it returns NA for that index, and
will do the same for every index. So it's telling you: "For each of my
elements a,b,c,d,e,f,... I have to tell you that I don't know whether
you want it or not." You also get similar behavior for x==NA.

If anything might be surprising (though that also admits a logical
explanation), is the result

  letters[c(2, NA)]
  # [1] "b" NA

since the result being asked for by the first element of c(2,NA) is
definite -- so far so good -- but then you would expect it to have the
same problem with what is being asked for by NA. This time, it seems
that because the 2-element vector c(2,NA) is being submitted, its
length over-rides the length of the response that would be given for
x[NA]: You asked for a 2-element extraction from letters; I can see
what you want for the first, but not for the second.

However, that logic does not work for letters[c(NA,NA)] which still
returns the 26-element result!

After all that, I'm inclined to the view that letters[NA] should
return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA),
etc.; and that the same should apply to all vectors accessed by [].
The above behaviour seems to contradict [what I can understand from]
what is said in ?"[":

NAs in indexing:
 When extracting, a numerical, logical or character 'NA' index
 picks an unknown element and so returns 'NA' in the corresponding
 element of a logical, integer, numeric, complex or character
 result, and 'NULL' for a list.  (It returns '00' for a raw
  result.)

since that seems to imply that x[c(NA,NA)] should return c(NA,NA)
and not rep(NA,length(x))!
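
One further observation accounts for the whole pattern (a small
illustration; the key is the *type* of the index). A bare NA is of
type logical, and a logical index is recycled to the length of the
vector, whereas an integer NA index picks exactly one element:

  typeof(NA)             # [1] "logical" -- so letters[NA] recycles
  typeof(c(NA, NA))      # [1] "logical" -- recycled too: 26 NAs
  typeof(c(2, NA))       # [1] "double"  -- a 2-element numeric index
  letters[NA_integer_]   # [1] NA        -- one unknown element
  letters[c(NA_integer_, NA_integer_)]
  # [1] NA NA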

Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 17-Dec-10   Time: 15:40:03
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Logical vectors

2010-11-04 Thread Ted Harding
On 04-Nov-10 08:56:42, Gerrit Eichner wrote:
 On Thu, 4 Nov 2010, Stephen Liu wrote:
 [snip]
 In;

 2.4 Logical vectors
 http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

 It states:-

 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 

 ==   # exact equality
 !=   # inequality
 
 [snip]
 
 
 Hello, Stephen,
 in my understanding of the sentence
 
 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 
 
 the phrase "exact equality" refers to the operator ==, i.e. to the
 last element == in the enumeration (<, <=, >, >=, ==), and not to its
 first.
 
   Regards  --  Gerrit

This indicates that the sentence can be mis-read. It should be
cured by a small change in punctuation (hence I copy to R-devel):

  The logical operators are <, <=, >, >=; == for exact equality;
  and != for inequality 

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-10   Time: 09:08:37
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-22 Thread Ted Harding
On 22-Aug-10 18:31:46, Paul Johnson wrote:
 Hey Ben:
 One of my colleagues bought your book and was reading it during a
 faculty meeting last Tuesday.  Everybody kept asking "what's that?"
 
 If you know how to put it in the wiki, would you please do it and let
 us know where it is.  I was involved with an R wiki about 5 or 6 years
 ago, but completely lost track of it.
 
 I'm going to keep pushing for sharp, clear guidelines.  If I could get
 R-help set up as a template for messages like  bugzilla, I think it
 would be awesome! To disagree with Ted: people won't mind giving us
 what we need, as long as we tell them exactly what to do. We save
 ourselves the circle of "give us sessionInfo()" and "give us your
 LANG" and 'what version is that' and so forth.

People won't mind? If R-help ends up telling me exactly what to do,
I shall leave the list. I mean it. For good.
Ted.

[the rest snipped]


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10   Time: 19:49:41
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] No RTFM?

2010-08-21 Thread Ted Harding
 is not part of R proper
 and you should address questions about Bioconductor to their
 support framework.

Presumably you mean the various special lists for Bioconductor?
In such a case, that is fair enough, since anyone using Bioconductor
is likely to be subscribed to such lists anyway. (But they wouldn't
need to be told about this in the general Posting Guide).

[***]
There are many packages, however, such as combinat, boot, cluster,
etc. which are not part of R-base (though many of them are in
recommended) which are in frequent use, and questions about them
to R-help are perfectly in order (and the individual author of
such a package does not want to be the sole recipient of frequent
questions about it -- the R-help community can carry the load
better with its distributed model).

 C. If you are writing code for R itself, or if you are developing a
package, send your question to r-devel, rather than r-help.

It depends on the question!

 D. For operating-system or R interface questions, there are dedicated
 lists. See R-sig-Mac, R-sig-Debian, R-sig-Fedora, etc.

Fair enough (in most cases).

 ==
 
 It will be necessary to add, toward the end, the part about
 be polite when posting.

And, since a reply is also a posting, be polite when responding!

 And along the lines of the No RTFM policy, I think we should say
 All RTFM answers should include an exact document and section
 number. It is certainly insulting to answer a question about plot
 with the one line
 
 ?plot
 
 but it is not insulting to say In ?plot, check the Details
 section and run the example code.

Fair enough -- and indeed this is an example of a helpful and
sufficient (for most people) response, and therefore is not in
the RTFM category.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-10   Time: 03:13:22
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] suggestion for ?factor

2010-08-09 Thread Ted Harding
Hi Folks.
May I suggest a small addition to ?factor -- this is not explicit
about how to force the ordering of levels in an ordered factor.

One can guess it, but I think it would be a good idea to spell it out.
For example, one can do:

  X <- c("Alf","Bert","Chris","Dave")
  F <- factor(X, levels=c("Dave","Alf","Chris","Bert"), ordered=TRUE)
  F
  # [1] Alf   Bert  Chris Dave 
  # Levels: Dave < Alf < Chris < Bert

So, where ?factor says:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by
  'as.character(x)', sorted into increasing order of 'x'.
  Note that this set can be smaller than 'sort(unique(x))'.

it could include a bit extra so as to read:

  levels: an optional vector of the values that 'x' might have taken.
  The default is the unique set of values taken by
  'as.character(x)', sorted into increasing order of 'x'.
  Note that this set can be smaller than 'sort(unique(x))'.
  If 'levels=...' is present and 'ordered=TRUE', then the
  levels are ordered according to the order in which they
  are listed in 'levels'.
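
Perhaps a line or two showing the consequence of the ordering would
also be worth adding (a small sketch):

  sort(F)          # sorts by the declared order, Dave first
  # [1] Dave  Alf   Chris Bert 
  # Levels: Dave < Alf < Chris < Bert
  F > "Alf"        # comparisons also use the declared order
  # [1] FALSE  TRUE  TRUE FALSE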

With thanks,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-10   Time: 16:37:50
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] results of pnorm as either NaN or Inf

2010-05-13 Thread Ted Harding
On 13-May-10 20:04:50, efree...@berkeley.edu wrote:
 I stumbled across this and I am wondering if this is unexpected
 behavior or if I am missing something.
 
 pnorm(-1.0e+307, log.p=TRUE)
 [1] -Inf
 pnorm(-1.0e+308, log.p=TRUE)
 [1] NaN
 Warning message:
 In pnorm(q, mean, sd, lower.tail, log.p) : NaNs produced
 pnorm(-1.0e+309, log.p=TRUE)
 [1] -Inf
 
 I don't know C and am not that skilled with R, so it would be hard
 for me to look into the code for pnorm. If I'm not just missing
 something, I thought it may be of interest.
 
 Details:
 I am using Mac OS X 10.5.8. I installed a precompiled binary version.
 Here is the output from sessionInfo(), requested in the posting guide:
 R version 2.11.0 (2010-04-22) 
 i386-apple-darwin9.8.0 
 
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base   
 
 loaded via a namespace (and not attached):
 [1] tools_2.11.0
 
 Thank you very much,
 
 Eric Freeman
 UC Berkeley

This is probably platform-independent. I get the same results with
R on Linux. More to the point:

You are clearly pushing the envelope here. First, have a look
at what R makes of your inputs to pnorm():

  -1.0e+307
  # [1] -1e+307
  -1.0e+308
  # [1] -1e+308
  -1.0e+309
  # [1] -Inf


So, somewhere between -1e+308 and -1.0e+309, the envelope burst!
Given -1.0e+309, R returns -Inf (i.e. R can no longer represent
this internally as a finite number).
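
The point where it bursts is just the largest finite double
(a quick check):

  .Machine$double.xmax
  # [1] 1.797693e+308  -- the largest representable finite double
  -1.0e+308            # still inside the range
  # [1] -1e+308
  -1.0e+309            # outside: overflows to -Inf on input
  # [1] -Inf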

Now look at

  pnorm(-Inf,log.p=TRUE)
  # [1] -Inf

So, R knows how to give the correct answer (an exact 0, or -Inf
on the log scale) if you feed pnorm() with -Inf. So you're OK
with -1e+N where N >= 309.

For smaller powers, e.g. -1e+(200:306), the arguments reaching
pnorm() are still representable finite numbers, and presumably R's
algorithm (which I haven't studied either ... ) returns an exact 0
for pnorm() (-Inf on the log scale), as it should to the available
internal accuracy.

So, up to pnorm(-1.0e+307, log.p=TRUE) = -Inf. All is as it should be.
However, at -1e+308, the envelope is about to burst, and something
may occur within the algorithm which results in a NaN.

So there is nothing anomalous about your results except at -1e+308,
which is where R is at a critical point.

That's how I see it, anyway!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-10   Time: 00:01:27
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] y ~ X -1 , X a matrix

2010-03-17 Thread Ted Harding
On 17-Mar-10 23:32:41, Ross Boylan wrote:
 While browsing some code I discovered a call to lm that used
 a formula y ~ X - 1, where X was a matrix.
 
 Looking through the documentation of formula, lm, model.matrix
 and maybe some others I couldn't find this useage (R 2.10.1).
 Is it anything I can count on in future versions?  Is there
 documentation I've overlooked?
 
 For the curious: model.frame on the above equation returns a
 data.frame with 2 columns.  The second column is the whole X
 matrix. model.matrix on that object returns the expected matrix,
 with the transition from the odd model.frame to the regular
 matrix happening in an .Internal call.
 
 Thanks.
 Ross
 
 P.S. I would appreciate cc's, since mail problems are preventing
 me from seeing list mail.

Hmmm ... I'm not sure what is the problem with what you describe.
Code:

  set.seed(54321)
  X  <- matrix(rnorm(50), ncol=2)
  Y  <- 1*X[,1] + 2*X[,2] + 0.25*rnorm(25)
  LM <- lm(Y ~ X - 1)

  summary(LM)
  # Call:
  # lm(formula = Y ~ X - 1)
  # Residuals:
  #  Min   1Q   Median   3Q  Max 
  # -0.39942 -0.13143 -0.02249  0.11662  0.61661 
  # Coefficients:
  #    Estimate Std. Error t value Pr(>|t|)
  # X1  0.97707    0.04159   23.49   <2e-16 ***
  # X2  2.09152    0.06714   31.15   <2e-16 ***
  # ---
  # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
  # Residual standard error: 0.2658 on 23 degrees of freedom
  # Multiple R-squared: 0.9863, Adjusted R-squared: 0.9851 
  # F-statistic: 826.6 on 2 and 23 DF,  p-value: < 2.2e-16 

  model.frame(LM)
  #  Y  X.1  X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358

  str(model.frame(LM))
  #  $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  #  $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]

  model.frame(Y ~ X-1)
  #  Y  X.1  X.2
  # 1   0.04936244 -0.178900750  0.051420078
  # 2  -0.54224173 -0.928044132 -0.027963292
  # [...]
  # 24  1.54196979  0.312332806  0.602009497
  # 25 -0.16928420 -1.285559427  0.394790358
  ## (Identical to above)

  str(model.frame(Y ~ X-1))
  # $ Y: num  0.0494 -0.5422 -0.7295 -3.4422 -3.1296 ...
  # $ X: num [1:25, 1:2] -0.179 -0.928 -0.784 -1.651 -0.408 ...
  # [...]
  ## (Identical to above)

Maybe the clue (admittedly somewhat obtuse) can be found in ?lm:

  lm(formula, data, subset, weights, na.action,
 method = qr, model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
 singular.ok = TRUE, contrasts = NULL, offset, ...)
  [...]

  data: an optional data frame, list or environment (or object
coercible by 'as.data.frame' to a data frame) containing the
variables in the model.  If not found in 'data', the variables
are taken from 'environment(formula)', typically the
 environment from which 'lm' is called.

So, in the example the variables are taken from X, "coercible
by 'as.data.frame'" ... "taken from 'environment(formula)'".

Hence (I guess) X is found in the environment and is coerced
into a dataframe with 2 columns, and X.1, X.2 are taken from there.
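
A quick check of that guess, continuing the code above (it bears out
Ross's description: the matrix goes into the model frame whole, and
is only expanded to columns by model.matrix()):

  mf <- model.frame(Y ~ X - 1)
  dim(mf)       # [1] 25  2 -- Y, plus X counted as ONE column
  class(mf$X)   # "matrix" -- X is carried whole, not split up
  colnames(model.matrix(Y ~ X - 1))
  # [1] "X1" "X2" -- the expansion happens here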

R Gurus: Please comment! (I'm only guessing by plausibility).
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-Mar-10   Time: 00:57:20
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Possible bug in fisher.test() (PR#14196)

2010-01-27 Thread Ted Harding
On 27-Jan-10 17:30:10, nhor...@smith.edu wrote:
# is there a bug in the calculation of the odds ratio in fisher.test?
# Nicholas Horton, nhor...@smith.edu Fri Jan 22 08:29:07 EST 2010
 
 x1 = c(rep(0, 244), rep(1, 209))
 x2 = c(rep(0, 177), rep(1, 67), rep(0, 169), rep(1, 40))
 
 or1 = sum(x1==1 & x2==1)*sum(x1==0 & x2==0)/
  (sum(x1==1 & x2==0)*sum(x1==0 & x2==1))
 
 library(epitools)
 or2 = oddsratio.wald(x1, x2)$measure[2,1]
 
 or3 = fisher.test(x1, x2)$estimate
 
# or1=or2 = 0.625276, but or3=0.6259267!
 
 I'm running R 2.10.1 under Mac OS X 10.6.2.
 Nick

Not so. Look closely at ?fisher.test:

Value:
[...]
estimate: an estimate of the odds ratio.  Note that the
  _conditional_ Maximum Likelihood Estimate (MLE)
  rather than the unconditional MLE (the sample
  odds ratio) is used. Only present in the 2 by 2 case.

Your or1 (and presumably the epitools value also) is the sample OR.

The conditional MLE is the value of rho (the OR) that maximises
the probability of the table *conditional* on the margins.

In this case it differs slightly from the sample OR (by 0.1%).
For smaller tables it will tend to differ even more, e.g.

  M1 <- matrix(c(4,7,17,18), nrow=2)
  M1
  #      [,1] [,2]
  # [1,]    4   17
  # [2,]    7   18

  (4*18)/(17*7)
  # [1] 0.605042

  fisher.test(M1)$estimate
  # odds ratio 
  # 0.6116235  ## (1.1% larger than sample OR)

  M2 <- matrix(c(1,2,4,5), nrow=2)
  M2
  #      [,1] [,2]
  # [1,]    1    4
  # [2,]    2    5

  (1*5)/(4*2)
  # [1] 0.625

  fisher.test(M2)$estimate
  # odds ratio 
  # 0.649423  ## (3.9% larger than sample OR)

The probability of a table matrix(c(a,b,c,d),nrow=2) given
the marginals (a+b), (a+c), (b+d) -- and hence also (c+d) -- is
a function of the odds ratio only. Again see ?fisher.test:

  given all marginal totals fixed, the first element of
   the contingency table has a non-central hypergeometric
   distribution with non-centrality parameter given by
   the odds ratio (Fisher, 1935).

The value of the odds ratio which maximises this (for given
observed 'a') is not the sample OR.
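
The conditional MLE can be reproduced directly (a sketch, for the M1
example above; the weights come from the non-central hypergeometric
distribution just quoted, dropping terms constant in the odds ratio):

  condLogLik <- function(rho, a=4, r1=21, r2=25, c1=11) {
    # r1, r2: row totals of M1; c1: first column total; a: [1,1] cell
    u <- max(0, c1 - r2):min(r1, c1)  # possible values of the [1,1] cell
    a*log(rho) -
      log(sum(exp(lchoose(r1, u) + lchoose(r2, c1 - u) + u*log(rho))))
  }
  optimize(condLogLik, c(0.01, 10), maximum=TRUE)$maximum
  # [1] 0.6116... -- agreeing with fisher.test(M1)$estimate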

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 27-Jan-10   Time: 18:14:57
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Benefit of treating NA and NaN differently for numerics

2009-12-31 Thread Ted Harding
On 31-Dec-09 20:43:43, Saptarshi Guha wrote:
 Hello,
 I notice in main/arithmetic.c that NA and NaN are encoded
 differently (since every numeric NA comes from R_NaReal, which is
 defined via ValueOfNA).
 What is the benefit of treating these two differently? Why can't NA
 be a synonym for NaN?
 
 Thank you
 Saptarshi
 (R-2.9)

Because they are used to represent different things. Others will be
able to give you a much more comprehensive account than I can of
their uses in R, but essentially:

NaN represents a result which is not valid (i.e. Not a Number)
in the domain of quantities being evaluated. For example, R does
its arithmetic by default in the domain of "double", i.e. the
machine representation of real numbers. In this domain, sqrt(-1)
does not exist -- it is not a number in the domain of real numbers.
Hence:

  sqrt(-1)
  # [1] NaN
  # Warning message:
  # In sqrt(-1) : NaNs produced

In order to obtain a result which does exist, you need to switch
domain to complex numbers:

   sqrt(as.complex(-1))
  # [1] 0+1i

NA, on the other hand, represents a value (in whatever domain:
double, logical, character, ...) which is not known, which is why
it is typically used to represent missing data. It would be a valid
entity in the current domain if its value were known, but the value
is not known. Hence the result of any expression involving NA
quantities is NA, since the value of the expression would depend on
the unknown elements, and hence the value of the expression is unknown.

This distinction is important and useful, so it should not be done
away with by merging NaN and NA!
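
The distinction is also directly visible (a small illustration):

  x <- c(1, NA, NaN)
  is.na(x)    # [1] FALSE  TRUE  TRUE -- NaN counts as missing too
  is.nan(x)   # [1] FALSE FALSE  TRUE -- only NaN is "not a number"
  NA + 1      # [1] NA  -- unknown in, unknown out
  NaN + 1     # [1] NaN -- invalid stays invalid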

Best wishes,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Dec-09   Time: 21:05:06
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] a little bug for the function 'sprintf' (PR#14161)

2009-12-21 Thread Ted Harding
On 21-Dec-09 17:36:12, Peter Dalgaard wrote:
 baoli...@gmail.com wrote:
 Dear R-ers,
 I am a graduate student from South China University of Technology.
 I found the function 'sprintf' in R 2.10.1 has a little bug(?):
 
 When you type in the example codes:
 
 sprintf("%s is %f feet tall\n", "Sven", 7.1)
 
 and R returns:
 
 [1] "Sven is 7.10 feet tall\n"
 
 this is very different from the 'sprintf' function in C/C++,
 for in C/C++, the format string "\n" usually represents a new line,
 but here, just the plain text "\n"!
 
 No, this is exactly the same as in C/C++. If you compare the result
 of sprintf to "Sven is 7.10 feet tall\n" with strcmp() in C,
 they will compare equal.
 
   > s <- sprintf("%s is %f feet tall\n", "Sven", 7.1)
   > s
 [1] "Sven is 7.10 feet tall\n"
   > nchar(s)
 [1] 27
   > substr(s,27,27)
 [1] "\n"
 
 The thing that is confusing you is that strings are DISPLAYED
 using the same escape-character mechanisms as used for input.
 Compare
 
   > cat(s)
 Sven is 7.10 feet tall
  
 
 
 Is it a bug, or a deliberate design?
 
 Design, not bug (and please don't file as bug when you are in doubt.)

And another confusion is that the C/C++ function sprintf() indeed
creates a string AND ASSIGNS IT TO A NAMED VARIABLE, according to
the syntax

  int sprintf(char *str, const char *format, ...);

as in

  char X[100];  /* a buffer large enough for the result */
  sprintf(X, "%s is %f feet tall\n", "Sven", 7.1);

as a result of which the string X will have the value
  "Sven is 7.10 feet tall\n"

R's sprintf does not provide for the parameter char *str, here X,
and so RETURNS the string as the value of the function.

This is NOT TO BE CONFUSED with the behaviour of the C/C++ functions
printf() and fprintf(), both of which create the string and then
send it to either stdout or to a file:

  int printf(const char *format, ...);
  int fprintf(FILE *stream, const char *format, ...);

Therefore, if you programmed

  printf("%s is %f feet tall\n", "Sven", 7.1);

you would see on-screen (stdout) the string 

  Sven is 7.10 feet tall

(followed by a line-break due to the \n), while

  mystream = fopen("myoutput.txt", "a");
  fprintf(mystream, "%s is %f feet tall\n", "Sven", 7.1);

would append

  Sven is 7.10 feet tall

(followed by a line-break) to myoutput.txt

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-Dec-09   Time: 18:14:57
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Binning of integers with hist() function odd results (PR#14047)

2009-11-07 Thread Ted Harding
On 06-Nov-09 23:30:12, g...@fnal.gov wrote:
 Full_Name: Gerald Guglielmo
 Version: 2.8.1 (2008-12-22)
 OS: OSX Leopard
 Submission from: (NULL) (131.225.103.35)
 
 When I attempt to use the hist() function to bin integers the
 behavior seems very odd, as the bin boundary seems inconsistent
 across the various bins. For some bins the upper boundary includes
 the next integer value, while in others it does not. If I add 0.1
 to every value, then the hist() binning behavior is what I would
 normally expect.
 
 h1 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
 h1$mids
 [1] 1.5 2.5 3.5 4.5
 h1$counts
 [1] 3 3 4 5
 h2 <- hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
 h2$mids
 [1] 1.5 2.5 3.5 4.5 5.5
 h2$counts
 [1] 1 2 3 4 5
 
 Naively I would have expected the same distribution of counts in the
 two cases, but clearly that is not happening. This is a simple example
 to illustrate the behavior, originally I noticed this while binning a
 large data sample where I had set the breaks=c(0,24,1).

This is the correct intended behaviour. By default, values which are
exactly on the boundary between two bins are counted in the bin which
is just below the boundary value. Except that the bottom-most break
will count values on it into the bin just above it.

Hence 1,2,2 all go into the [1,2] bin; 3,3,3 into (2,3];
4,4,4,4 into (3,4]; and 5,5,5,5,5 into (4,5]. Hence the counts 3,3,4,5.

Since you did not set breaks in
  h1 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)),
they were set using the default method, and you can see what they are
with

  h1$breaks
  [1] 1 2 3 4 5

When you add 0.1 to each value, you push the values on the boundaries
up into the next bin. Now each value is inside its bin, and not on
any boundary. Hence 1.1 is in (1,2]; 2.1,2.1 in (2,3];
3.1,3.1,3.1 in (3,4]; 4.1,4.1,4.1,4.1 in (4,5]; and
5.1,5.1,5.1,5.1,5.1 in (5,6], giving counts 1,2,3,4,5 as you observe.

The default behaviour described above is defined by the default options

  include.lowest = TRUE, right = TRUE

where:

include.lowest: logical; if 'TRUE', an 'x[i]' equal to the 'breaks'
  value will be included in the first (or last, for 'right =
  FALSE') bar.  This will be ignored (with a warning) unless
  'breaks' is a vector.

   right: logical; if 'TRUE', the histograms cells are right-closed
  (left open) intervals.

See '?hist'. You can change this behaviour by changing the options.
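
For example (a small sketch; with these data the default breaks are
1,2,3,4,5 either way):

  x <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)
  hist(x, plot=FALSE)$counts
  # [1] 3 3 4 5  -- right-closed bins, as described above
  hist(x, right=FALSE, plot=FALSE)$counts
  # [1] 1 2 3 9  -- left-closed bins: boundary values move up, and
  #               the 5's stay in the last bin via include.lowest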

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Nov-09   Time: 13:57:07
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] basename returns . not in filename (PR#13958)

2009-09-18 Thread Ted Harding
On 18-Sep-09 19:08:15, Jens Oehlschlägel wrote:
 Mmh,
  Point is, I gather, that trailing slashes are removed, e.g.,
 
  viggo:~/> basename foo/
  foo
 
  So, not a bug.
 
 This unfortunately means that we cannot distinguish between 
 1) a path with a filename
 2) a path without a filename
 
 For example in the next version of the ff-package we allow a user to
 specify a 'pattern' for all files of a ff dataframe which is path
 together with a fileprefix, the above means we cannot specify an empty
 prefix  for the current working directory, because 
 
 dirname("./.")
 [1] "."
 basename("./.")
 [1] "."
 dirname("./")
 [1] "."
 basename("./")
 [1] "."
 
 Jens Oehlschlägel

I am getting confused by this discussion. At least on Unixoid systems,
and I believe it holds for Windows systems too, "." stands for the
current directory (working directory).

Moreover, "./" means exactly the same as ".": if you list the files
in "./" you will get exactly the same as if you list the files in ".".

Further, "any directory/." means the same as "any directory/"
and the same as "any directory", so (on the same basis) "./." also
means exactly the same as ".".

Therefore the second "." in "./." is not a filename.

What the above examples of dirname and basename usage are returning
is simply a specific representation of the current working directory.
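
If the distinction matters (a path with, versus without, a file
part), one can test for the trailing slash before splitting.
A minimal sketch, with a hypothetical helper name:

  split_path <- function(p) {
    # a trailing "/" means: directory only, empty file prefix
    if (grepl("/$", p)) c(dir=sub("/+$", "", p), file="")
    else c(dir=dirname(p), file=basename(p))
  }
  split_path("./")      # dir "."  file ""
  split_path("./foo")   # dir "."  file "foo"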

Forgive me if I have not seen the point, but what I think I have seen
boils down to the interpretation I have given above.

Best wishes,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-Sep-09   Time: 22:35:34
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 10:38:40, Liviu Andronic wrote:
 Dear R developers,
 Currently many (all?) test functions in R describe the alternative
 hypothesis, but not the the null hypothesis being tested. For example,
 cor.test:
 require(boot)
 data(mtcars)
 with(mtcars, cor.test(mpg, wt, met="kendall"))
 
   Kendall's rank correlation tau
 
 data:  mpg and wt
 z = -5.7981, p-value = 6.706e-09
 alternative hypothesis: true tau is not equal to 0
 sample estimates:
  tau
 -0.72783
 
 Warning message:
 In cor.test.default(mpg, wt, met = "kendall") :
   Cannot compute exact p-value with ties
 
 
 In this example,
 H0: (not printed)
 Ha: true tau is not equal to 0
 
 This should be fine for the advanced users and expert statisticians,
 but not for beginners. The help page will also often not explicitely
 state the null hypothesis. Personally, I often find myself in front of
 an htest object guessing what the null should have reasonably sounded
 like.
 
 Are there compelling reasons for not printing out the null being
 tested, along with the rest of the results? Thank you
 Liviu

I don't know about *compelling* reasons! But (as a general rule)
if the Alternative Hypothesis is stated, then the Null Hypothesis
is simply its negation. So, in your example, you can infer

  H0: true tau equals 0
  Ha: true tau is not equal to 0.

I don't think one needs to be an advanced user or expert statistician
to see this -- it is part of the basic understanding of hypothesis
testing.

Some people might regard the H0 statement as simply redundant!

The Ha statement is, however, essential, since different alternatives
may be adopted depending on the application, such as

  Ha: true tau is greater than 0
  (implicit: true tau <= 0)

or
  Ha: true tau is less than 0
  (implicit: true tau >= 0)

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-09   Time: 11:55:15
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Printing the null hypothesis

2009-08-16 Thread Ted Harding
On 16-Aug-09 14:06:18, Liviu Andronic wrote:
 Hello,
 On 8/16/09, Ted Harding ted.hard...@manchester.ac.uk wrote:
 I don't know about *compelling* reasons! But (as a general rule)
  if the Alternative Hyptohesis is stated, then the Null Hypothesis
  is simply its negation. So, in your example, you can infer

   H0: true tau equals 0
   Ha: true tau is not equal to 0.

 Oh, I had a slightly different H0 in mind. In the given example,
 cor.test(..., met=kendall) would test H0: x and y are independent,
 but cor.test(..., met=pearson) would test: H0: x and y are not
 correlated (or `are linearly independent') .

Ah, now you are playing with fire! What the Pearson, Kendall and
Spearman coefficients in cor.test measure is *association*. OK, if
the results clearly indicate association, then the variables are
not independent. But it is possible to have two variables x, y
which are definitely not independent (indeed one is a function of
the other) which yield zero association by any of these measures.

Example:
  x <- (-10:10); y <- x^2 - mean(x^2)
  cor.test(x, y, method="pearson")
  #   Pearson's product-moment correlation
  # t = 0, df = 19, p-value = 1
  # alternative hypothesis: true correlation is not equal to 0 
  # sample estimates: cor 0
  cor.test(x, y, method="kendall")
  #   Kendall's rank correlation tau
  # z = 0, p-value = 1
  # alternative hypothesis: true tau is not equal to 0 
  # sample estimates: tau 0
  cor.test(x, y, method="spearman")
  #   Spearman's rank correlation rho
  # S = 1540, p-value = 1
  # alternative hypothesis: true rho is not equal to 0 
  # sample estimates: rho 0

If you wanted, for instance, that the method="kendall" should
announce that it is testing "H0: x and y are independent" then
it would seriously mislead the reader!

 To take a different example, a test of normality.
 shapiro.test(mtcars$wt)
 
   Shapiro-Wilk normality test
 
 data:  mtcars$wt
 W = 0.9433, p-value = 0.09265
 
 Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
 least to beginners, these things are not always perfectly clear (even
 after reading the documentation), and when interpreting the results it
 can prove useful to have on-screen information about the null.
 
 Thank you for answering
 Liviu

This is possibly a more discussable point, in that even if you know
what the Shapiro-Wilk statistic is, it is not obvious what it is
sensitive to, and hence what it might be testing for. But I doubt
that someone would be led to try the Shapiro-Wilk test in the
first place unless they were aware that it was a test for normality,
and indeed this is announced in the first line of the response.
The alternative, therefore, is non-normality.

As to the contrast between absence of an Ha statement for the
Shapiro-Wilk, and its presence in cor,test(), this comes back to
the point I made earlier: cor.test() offers you three alternatives
to choose from: "two.sided" (default), "greater", "less". This
distinction can be important, and when cor.test() reports Ha it
tells you which one was used.

On the other hand, as far as Shapiro-Wilk is concerned there is
no choice of alternatives (nor of anything else except the data x).
So there is nothing to tell you! And, further, departure from
normality has so many dimensions that alternatives like "two
sided", "greater" or "less" would make no sense. One can think of
tests targeted at specific kinds of alternative such as "distribution
is excessively skew" or "distribution has excessive kurtosis" or
"distribution is bimodal" or "distribution is multimodal", and so on.
But any of these can be detected by Shapiro-Wilk, so it is not
targeted at any specific alternative.

Best wishes,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-09   Time: 16:26:48
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] identical(0, -0)

2009-08-07 Thread Ted Harding
On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
 Martin Maechler wrote:
 William Dunlap wdun...@tibco.com
 on Thu, 6 Aug 2009 15:06:08 -0700 writes:
  -Original Message- From:
  r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of
  Giovanni Petris Sent: Thursday, August 06, 2009 3:00 PM
  To: milton.ru...@gmail.com Cc: r-h...@r-project.org;
  daniel.gerl...@geodecapital.com Subject: Re: [R] Why is 0
  not an integer?
  
  
  I ran an instant experiment...
  
   > typeof(0)
   [1] "double"
   > typeof(-0)
   [1] "double"
   > identical(0, -0)
   [1] TRUE
  
  Best, Giovanni

  But 0.0 and -0.0 have different reciprocals

  1.0/0.0
 [1] Inf
  1.0/-0.0
 [1] -Inf

  Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap
  tibco.com

 yes.  {finally something interesting in this boring thread !}
 --- diverting to R-devel

 In April, I've had a private e-mail communication with John
 Chambers [father of S, notably S4, which also brought identical()]
 and Bill about the topic,
 where I had started suggesting that  R  should be changed such
 that
 identical(-0. , +0.)
 would return FALSE.
 Bill did mention that it does so for (newish versions of) S+
 and that he'd prefer that, too,
 and John said

   I agree on having a preference for a bitwise comparison for
   identical()---that's what the name means after all.  But since
   someone implemented the numerical case as the C == it's probably
   going to be more hassle than it's worth to change it.  But we
   should make the implementation clear in the documentation.

 so in principle, we all agreed that R's identical() should be
 changed here, namely by using something like  memcmp() instead
 of simple '==' ,  however we haven't bothered to actually 
 *implement* this change.

  I am currently testing a patch which would lead to
  identical(0, -0) returning FALSE.
   
 I don't think that would be a good idea.  Other expressions besides
 -0 
 calculate the zero with the negative sign bit, e.g. the following
 sequence:
 
 pos <- 1
 neg <- -1
 zero <- 0
 y <- zero*pos
 z <- zero*neg
 identical(y, z)
 
 I think most R users would expect the last expression there to be
 TRUE based on the previous two lines, given that pos and neg both
 have finite values. In a simple case like this y == z would be a
 better test to use, but if those were components of a larger
 structure, identical() is all we've got, and people would waste a
 lot of time tracking down why structures differing only in the
 sign of zero were not identical, even though every element tested
 equal.
 
 Duncan Murdoch
 Martin Maechler, ETH Zurich

My own view of this is that there may in certain circumstances be an
interest in distinguishing between 0 and (-0), yet normally most
users will simply want to compare the numerical values.

Therefore I am in favour of revising identical() so that it can so
distinguish; but also of taking the opportunity to give it a parameter
say

  identical(x,y,sign.bit=FALSE)

so that the default behaviour would be to see 0 and (-0) as identical,
but with sign.bit=TRUE it would see the difference.

However, I put this forward in ignorance of
a) Any difficulties that this may present in re-coding identical();
b) Any complications that may arise when applying this new form
   to complex objects.
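
Meanwhile, the sign bit can already be detected from within R
(a small sketch, using the reciprocal trick from earlier in the
thread; the helper name is hypothetical):

  is_neg_zero <- function(x) x == 0 & 1/x == -Inf
  is_neg_zero(c(0, -0, 0*-1))
  # [1] FALSE  TRUE  TRUE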

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-09   Time: 14:49:51
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] read.csv

2009-06-14 Thread Ted Harding
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
 If read.csv's colClasses= argument is NOT used then read.csv accepts
 double quoted numerics:
 
 > read.csv(stdin())
 0: "A","B"
 1: "1","1"
 2: "2","2"
 3:
   A B
 1 1 1
 2 2 2
 
 However, if colClasses is used then it seems that it does not:
 
 > read.csv(stdin(), colClasses = "numeric")
 0: "A","B"
 1: "1","1"
 2: "2","2"
 3:
 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
 na.strings,  :
   scan() expected 'a real', got '"1"'
 
 Is this really intended?  I would have expected that a csv file
 in which each field is surrounded with double quotes is acceptable
 in both cases. This may be documented as is, yet it seems undesirable
 from both a consistency viewpoint and the viewpoint that it should
 be possible to double quote fields in a csv file.

Well, the default for colClasses is NA, for which ?read.csv says:
  [...]
  Possible values are 'NA' (when 'type.convert' is used),
  [...]
and then ?type.convert says:
  This is principally a helper function for 'read.table'. Given a
  character vector, it attempts to convert it to logical, integer,
  numeric or complex, and failing that converts it to factor unless
  'as.is = TRUE'.  The first type that can accept all the non-missing
  values is chosen.

It would seem that type 'logical' won't accept integer (naively one
might expect "1" -> TRUE, but see experiment below), so the first
acceptable type for "1" is integer, and that is what happens.
So it is indeed documented (in the R[ecursive] sense of documented :))

However, presumably when colClasses is used then type.convert() is
not called, in which case R sees itself being asked to assign a
character entity to a destination which it has been told shall be
integer, and therefore, since the default for as.is is
  as.is = !stringsAsFactors
but for this ?read.csv says that stringsAsFactors is overridden
bu [sic] 'as.is' and 'colClasses', both of which allow finer
control., so that wouldn't come to the rescue either.

Experiment:
  X <- logical(10)
  class(X)
  # [1] "logical"
  X[1] <- 1
  X
  # [1] 1 0 0 0 0 0 0 0 0 0
  class(X)
  # [1] "numeric"
so R has converted X from class 'logical' to class 'numeric'
on being asked to assign a number to a logical; but in this
case its hands were not tied by colClasses.

Or am I missing something?!!
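
As a practical workaround (a sketch): read everything as character,
which accepts the quoted fields, and convert explicitly afterwards:

  d <- read.csv(textConnection('"A","B"\n"1","1"\n"2","2"'),
                colClasses="character")
  d[] <- lapply(d, as.numeric)
  str(d)
  # 'data.frame': 2 obs. of 2 variables, both numeric, as intended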

Ted.




E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-09   Time: 21:21:22
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Closed-source non-free ParallelR ?

2009-04-23 Thread Ted Harding
 and a lot of these things  
 are
 covered in the GPL FAQs, including the reporting of violations.

 The GPL FAQs are the FSF's interpretation.  The R Foundation is not
 obliged to have the same interpretation, and of course the FSF cannot
 enforce licenses given by the R Foundation.
 
 Underlying all of your comments seems to be a presumption that the R  
 Foundation can disentangle themselves from the FSF vis-a-vis the GPL.
 
 Keep in mind that it is the FSF that is the copyright holder of the
 GPL.
 
 The R Foundation may be the copyright holder to R, but they are  
 distributing it under a license which they did not write.
 
 Thus, I cannot envision any reasonable circumstances under which the R 
 Foundation would place themselves in a position of legal risk in  
 deviating from the interpretations of the GPL by the FSF. It would be  
 insane legally to do so.
 
 The key issue is the lack of case law relative to the GPL and that  
 leaves room for interpretation. One MUST therefore give significant  
 weight to the interpretations of the FSF as it will likely be the FSF  
 that will be involved in any legal disputes over the GPL and its  
 application. You would want them on your side, not be fighting them.
 
 A parallel here is why most large U.S. public corporations legally  
 incorporate in the state of Delaware, even though they may not have  
 any material physical presence in that state. It is because the  
 overwhelming majority of corporate case law in the U.S. has been  
 decided under the laws of Delaware and the interpretations of said  
 laws. If I were to start a company (which I have done in the past) and 
 feared that I should find myself facing litigation at some future  
 date, I would want that huge database of case law behind me. A small  
 company (such as I had) may be less concerned about this and be  
 comfortable with the laws of their own state, which I was. But if I  
 were to be looking to build a big company with investors, etc. and  
 perhaps look to go public at a future date, you bet I would look to  
 incorporate in Delaware. It would be the right fiduciary decision to  
 make in the interest of all parties.
 
 Unfortunately, we have no such archive of case law yet of the GPL.  
 Thus at least from a legally enforceable perspective, all is grey and  
 the FSF has to be the presumptive leader here.
 
 HTH,
 
 Marc Schwartz
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Apr-09   Time: 01:54:11
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Gamma funtion(s) bug

2009-03-30 Thread Ted Harding
On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote:
 With R 2.8.1 on ubuntu I get:
 gamma(-1)
 [1] NaN
 Warning message:
 In gamma(-1) : NaNs produced
 lgamma(-1)
 [1] Inf
 Warning message:
 value out of range in 'lgamma'
 
 Is'nt the first one right, and the second one (lgamma)
 should also be NaN?
 Kjetil

That is surely correct! Since lim[x->(-1)+] gamma(x) = -Inf,
while lim[x->(-1)-] gamma(x) = +Inf, at gamma(-1) one cannot
choose between +Inf and -Inf, so surely it is NaN.
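
One can check the opposite signs on either side of the pole (a quick
sketch):

  gamma(-0.99)   # large negative (about -100)
  gamma(-1.01)   # large positive (about +100)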

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09   Time: 19:55:33
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Gamma funtion(s) bug

2009-03-30 Thread Ted Harding
On 30-Mar-09 20:37:51, Duncan Murdoch wrote:
 On 3/30/2009 2:55 PM, (Ted Harding) wrote:
 On 30-Mar-09 18:40:03, Kjetil Halvorsen wrote:
 With R 2.8.1 on ubuntu I get:
 gamma(-1)
 [1] NaN
 Warning message:
 In gamma(-1) : NaNs produced
 lgamma(-1)
 [1] Inf
 Warning message:
 value out of range in 'lgamma'
 
 Is'nt the first one right, and the second one (lgamma)
 should also be NaN?
 Kjetil
 
 That is surely correct! Since lim[x->(-1)+] gamma(x) = -Inf,
 while lim[x->(-1)-] gamma(x) = +Inf, at gamma(-1) one cannot
 choose between +Inf and -Inf, so surely it is NaN.
 
 But lgamma(x) is log(abs(gamma(x))), so it looks okay to me.
 
 Duncan Murdoch

Oops, yes! That's what comes of talking off the top of my head
(I don't think I've ever had occasion to evaluate lgamma(x)
for negative x, so never consciously checked in ?lgamma).
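
For the record, a quick check of what ?lgamma documents:

  lgamma(-0.5)
  # [1] 1.265512
  log(abs(gamma(-0.5)))
  # [1] 1.265512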

Thanks, Duncan!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Mar-09   Time: 22:28:52
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error in help file for quantile()

2009-03-29 Thread Ted Harding
On 29-Mar-09 20:43:33, Rob Hyndman wrote:
 For some reason, the help file on quantile() says Missing values are
 ignored in the description of the x argument. Yet this is only true
 if na.rm=TRUE. I suggest the help file is amended to remove the words
 Missing values are ignored.
 Rob

True enough -- in that if, as in the default, na.rm == FALSE,
then applying quantile() to a vector with NAs yields the error
message:

quantile(X1)
# Error in quantile.default(X1) : 
#   missing values and NaN's not allowed if 'na.rm' is FALSE

So either you have na.rm==TRUE, in which case it doesn't need
saying that Missing values are ignored (unless you really
want to spell it out in the form Missing values are ignored if
called with na.rm=TRUE; otherwise an error message is produced),
or you have na.rm==FALSE, in which case you get the error message
and know where you stand.
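
For completeness, the na.rm=TRUE side of it (a minimal sketch, X1
being any vector containing NAs):

  X1 <- c(1, 2, NA)
  quantile(X1, na.rm=TRUE)
  #   0%  25%  50%  75% 100%
  # 1.00 1.25 1.50 1.75 2.00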

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Mar-09   Time: 23:08:50
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Semantics of sequences in R

2009-02-24 Thread Ted Harding
On 24-Feb-09 13:14:36, Berwin A Turlach wrote:
 G'day Dimitris,
 
 On Tue, 24 Feb 2009 11:19:15 +0100
 Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote:
 
 in my opinion the point of the whole discussion could be summarized
 by the question, what is a design flaw? This is totally subjective,
 and it happens almost everywhere in life. [...]
 
 Beautifully summarised and I completely agree.  Not surprisingly,
 others don't.
 
 [...]
 To close I'd like to share with you a Greek saying (maybe also a
 saying in other parts of the world) that goes, for every rule there
 is an exception. [...]
 
 As far as I know, the same saying exist in English.  It definitely
 exist in German.  Actually, in German it is every rule has its
 exception including this rule.

Or, as my mother used to say, Moderation in all things!
To which, as I grew up, I adjoined  ... including moderation.
Ted.

 In German there is one grammar rule
 that does not have an exception.  At least there used to be one; I am
 not really sure whether that rule survived the recent reform of the
 German grammar rules.
 
 Cheers,
   Berwin


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Feb-09   Time: 14:44:21
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] open-ended plot limits?

2009-02-05 Thread Ted Harding
Hi Folks,
Maybe I've missed it already being available somehow,
but if the following isn't available I'd like to suggest it.

If you're happy to let plot() choose its own limits,
then of course plot(x,y) will do it.

If you know what limits you want, then
  plot(x,y,xlim=c(x0,x1),ylim=c(y0,y1))
will do it.

But sometimes one would like to
a) make sure that (e.g.) the y-axis has a lower limit (say) 0
b) let plot() choose the upper limit.

In that case, something like

  plot(x,y,ylim=c(0,NA))

would be a natural way of specifying it. But of course that
does not work.

I would like to suggest that this possibility should be available.
What do people think?
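
(In the meantime one can, of course, compute the upper limit oneself;
a workaround sketch:

  plot(x, y, ylim = c(0, max(y, na.rm = TRUE)))

but the suggested ylim=c(0,NA) would be tidier.)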

Best wishes,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 05-Feb-09   Time: 20:48:30
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] open-ended plot limits?

2009-02-05 Thread Ted Harding
Thanks, everyone, for all the responses!
Ted.

On 05-Feb-09 20:48:33, Ted Harding wrote:
 Hi Folks,
 Maybe I've missed it already being available somehow,
 but if the following isn't available I'd like to suggest it.
 
 If you're happy to let plot() choose its own limits,
 then of course plot(x,y) will do it.
 
 If you know what limits you want, then
   plot(x,y,xlim=c(x0,x1),ylim=c(y0,y1))
 will do it.
 
 But sometimes one would like to
 a) make sure that (e.g.) the y-axis has a lower limit (say) 0
 b) let plot() choose the upper limit.
 
 In that case, something like
 
   plot(x,y,ylim=c(0,NA))
 
 would be a natural way of specifying it. But of course that
 does not work.
 
 I would like to suggest that this possibility should be available.
 What do people think?
 
 Best wishes,
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 05-Feb-09   Time: 20:48:30
 -- XFMail --
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 05-Feb-09   Time: 23:08:25
-- XFMail --


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 06-Feb-09   Time: 00:21:44
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cat: ./R/Copy: No such file or directory

2008-11-01 Thread Ted Harding
Just guessing here, but it looks as though an attempt has been made
to execute the following command:

  cat ./R/Copy of create.fourier.basis.R

This may have arisen because there is indeed a file named

  Copy of create.fourier.basis.R

but the command was issued to the OS without quotes; or (perhaps
less likely, because of the presence of the ./R/) the line

  Copy of create.fourier.basis.R

was intended as a comment in the file, but somehow got read as
a substantive part of the code.
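
If the former -- a stray file with spaces in its name sitting in the
R/ directory -- then removing or renaming it should let the check
proceed. A sketch, with the path merely a guess:

  file.remove("R/Copy of create.fourier.basis.R")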

Good luck!
Ted.

On 01-Nov-08 15:40:33, Spencer Graves wrote:
 Hello: 
 
   What do you recommend I do to get past cryptic error messages
 from 
 R CMD check, complaining No such file or directory?  The package is
 under SVN control, and I reverted to a version that passed R CMD 
 check, without eliminating the error.  The 00install.out file is 
 short and bitter: 
 
 
 cat: ./R/Copy: No such file or directory
 cat: of: No such file or directory
 cat: create.fourier.basis.R: No such file or directory
 make: *** [Rcode0] Error 1
 
 -- Making package fda 
   adding build stamp to DESCRIPTION
   installing NAMESPACE file and metadata
 make[2]: *** No rule to make target `R/Copy', needed by 
 `D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda/R/fda'. 
 Stop.
 make[1]: *** [all] Error 2
 make: *** [pkg-fda] Error 2
 *** Installation of fda failed ***
 
 Removing 'D:/spencerg/statmtds/splines/fda/RForge/fda/fda.Rcheck/fda'
 
  
   Thanks for any suggestions.  I have a *.tar.gz file that I hope
 may not have this problem, but I'm not sure. 
 
   Thanks,
   Spencer
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 01-Nov-08   Time: 16:19:56
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] non user-friendly error for chol2inv functions

2008-08-29 Thread Ted Harding
On 29-Aug-08 13:00:01, Martin Maechler wrote:
 cd == christophe dutang [EMAIL PROTECTED]
 on Fri, 29 Aug 2008 14:28:42 +0200 writes:
 
 cd Yes, I do not cast the first argument as a matrix with
 cd as.matrix function.
 cd Maybe we could detail the error message if the first
 cd argument
 cd is a numeric?
 
 cd error(_("'a' is a numeric and must be coerced to a numeric
 cd matrix"));
 
 Merci, Christophe.   Yes, we *could* do that.
 Alternatively, I think I will just make it work in that case,
 since I see that 
   qr(), chol(), svd(), solve()  all 
 treat a numeric (of length 1) as a  1 x 1 matrix automatically.

I was about to draw attention to this inconsistency!
While one is about it, might it not be useful also to do
the converse: Treat a 1x1 matrix as a scalar in appropriate
contexts?

E.g.

  a <- 4
  A <- matrix(4,1,1)
  B <- matrix(c(1,2,3,4),2,2)
  a*B
#      [,1] [,2]
# [1,]    4   12
# [2,]    8   16

  a+B
#      [,1] [,2]
# [1,]    5    7
# [2,]    6    8

  A*B
# Error in A * B : non-conformable arrays
  A+B
# Error in A + B : non-conformable arrays
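
(The usual workaround, for what it's worth, is to drop the 1x1
dimensions explicitly -- a sketch:

  drop(A)*B
  drop(A)+B

which reproduce a*B and a+B above.)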

Ted. 


 
 cd Thanks for your answer
 De rien!
 Martin
 
 cd 2008/8/29 Martin Maechler [EMAIL PROTECTED]
 
   cd == christophe dutang [EMAIL PROTECTED]
   on Fri, 29 Aug 2008 12:44:18 +0200 writes:
  
 cd Hi,
 cd In function chol2inv with the option LINPACK set to false
 (default),
  it
 cd raises an error when the matrix is 1x1 matrix (i.e. just a
 real)
  saying
  
 cd 'a' must be a numeric matrix
  
  It is very helpful, but you have to read and understand it.
  I'm pretty sure you did not provide a  1 x 1 matrix.
  
  Here's an example showing how things works :
  
   m <- matrix(4,1,1)
   cm <- chol(m)
   cm
        [,1]
   [1,]    2
   chol2inv(cm)
        [,1]
   [1,] 0.25
  
  
  Martin Maechler, ETH Zurich
  
  
 cd This error is raised by the underlying C function
 (modLa_chol2inv in
 cd function Lapack.c). Everything is normal, but I wonder if we
 could
  have
 cd another behavior when we pass a 1x1 matrix. I spent time this
  morning
 cd finding where was the error, and it was this problem.
  
 cd Thanks in advance
  
 cd Christophe
  
 cd [[alternative HTML version deleted]]
  
 cd __
 cd R-devel@r-project.org mailing list
 cd https://stat.ethz.ch/mailman/listinfo/r-devel
  
 
 cd [[alternative HTML version deleted]]
 
 cd __
 cd R-devel@r-project.org mailing list
 cd https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 29-Aug-08   Time: 14:16:03
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: add a warning in the help-file of unique()

2008-04-17 Thread Ted Harding
On 17-Apr-08 10:44:32, Matthieu Stigler wrote:
 Hello
 
 I'm sorry if this suggestion/correction was already made
 but after a search in devel list I did not find any mention
 of it. I would just suggest to add a warning or an exemple
 for the help-file of the function unique() like
 
 Note that unique() compares only identical values. Values
 which are printed equally but in fact are not identical
 will be treated as different.
 
 
   a <- c(0.2, 0.3, 0.2, 0.4-0.1)
   a
 [1] 0.2 0.3 0.2 0.3
   unique(a)
 [1] 0.2 0.3 0.3
 
 Well this is just the idea and the sentence could be made better
 (my poor english...). Maybe a reference to RFAQ 7.31 could be made.
 Maybe is this behaviour clear and logical for experienced users,
 but I don't think it is for beginners. I personnaly spent two
 hours to see that the problem in my code came from this.

The above is potentially a useful suggestion, and I would be
inclined to support it. However, for your other suggestion:

 I was thinking about modify the function unique() to introduce
 a tol argument which allows to compare with a tolerance level
 (with default value zero to keep unique consistent) like all.equal(),
 but it seemed too complicated with my little understanding.
 
 Bests regards and many thanks for what you do for R!
 Matthieu Stigler

What is really complicated about it is that the results may
depend on the order of elements. When unique() eliminates only
values which are strictly identical to values which have been
scanned earlier, there is no problem.

But suppose you set tol=0.11 in

unique(c(20.0, 30.0, 30.1, 30.2, 40.0))
# 20.0, 30.0, 40.0
[30.1 rejected because within 0.11 of previous 30.0;
 30.2 rejected because within 0.11 of previous 30.1]
and compare with

unique(c(20.0, 30.0, 30.2, 30.1, 40.0))
# 20.0, 30.0, 30.2, 40.0
[30.2 accepted because not within 0.11 of any previous;
 30.1 rejected because within 0.11 of previous 30.2 or 30.0]

This kind of problem is always present in situations where
there are potential chained tolerances.

You cannot see the difference between the position of the
hour-hand of a clock now, and one minute later.

But you may not chain this logic, for, if you could:

If A is indistinguishable from B, and B is indistinguishable
  from C, then A is indistinguishable from C.

10:00 is indistinguishable from 10:01 (on the hour-hand)
10:[n] is indistinguishable from 10:[n+1]

Hence, by induction, 10:00 is indistinguishable from 11:00

Which you do not want!
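
One chain-free compromise is to round onto a grid first and then
deduplicate (a sketch only -- this is binning, not what unique()
does, and it moves the values onto multiples of tol):

  uniqueTol <- function(x, tol) unique(round(x/tol)*tol)
  uniqueTol(c(0.2, 0.3, 0.2, 0.4-0.1), tol=1e-8)
  # [1] 0.2 0.3

Unlike a chained tolerance, the result does not depend on the order
of the elements.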

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-08   Time: 14:54:19
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Warnings generated by log2()/log10() are really large/t

2008-02-27 Thread Ted Harding
On 27-Feb-08 13:39:47, Gabor Grothendieck wrote:
 On Wed, Feb 27, 2008 at 5:50 AM, Henrik Bengtsson
 [EMAIL PROTECTED] wrote:
 On Wed, Feb 27, 2008 at 12:56 AM, Prof Brian Ripley
 [EMAIL PROTECTED] wrote:
  On Wed, 27 Feb 2008, Martin Maechler wrote:
 
Thank you Henrik,
   
HenrikB == Henrik Bengtsson [EMAIL PROTECTED]
on Tue, 26 Feb 2008 22:03:24 -0800 writes:
   
{with many superfluous empty statements ( i.e., trailing ; ):
 
   Indeed!

 I like to add a personal touch to the code I'm writing ;)

 Seriously, I added them here as a bait in order to get a chance to say
 that I finally found a good reason for adding the semicolons.  If you
 cut'n'paste code from certain web pages it may happen that
 newlines/carriage returns are not transferred and all code is pasted
 into the same line at the R prompt.  With semicolons you still get a
 valid syntax.  I cannot remember under what conditions this happened -
 
 I have seen that too and many others have as well since in some forums
 (not related to R) its common to indent all source lines by two spaces.
 Any line appearing without indentation must have been wrapped.

A not-so-subtle solution to this (subtle or not) problem.

NEVER paste from a browser (or a Word doc, or anything similar)
into the R command interface. Paste only from pure plain text.

Therefore, if you must paste, then paste first into a window
where a pure-plain-text editor is running.

Then you can see what you're getting, and can clean it up.
After that, you can paste from this directly into R, or can
save the file and source() it.

Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 27-Feb-08   Time: 14:36:02
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 0.45<0.45 = TRUE (PR#10744)

2008-02-13 Thread Ted Harding
On 13-Feb-08 12:40:48, Barry Rowlingson wrote:
 hadley wickham wrote:
 
 It's more than that though, as floating point addition is
 no longer guaranteed to be commutative or associative, and
 multiplication does not distribute over addition. Many concepts
 that are clear cut in pure math become fuzzy in floating point
 math - equality, singularity of matrices etc etc.
 
   I've just noticed that R doesn't calculate e^pi - pi as equal to 20:
 
exp(pi)-pi == 20
   [1] FALSE
 
   See: http://www.xkcd.com/217/
 
 Barry

Barry,
These things fluctuate. Once upon a time (sometime in 1915 will do)
you could get $[US]4.81 for £1.00 sterling.

One of the rare brief periods when the folks on opposite sides
of the Atlantic saw i^i (to within .Machine$double.eps, which
at the time was about 0.001, if you were lucky and didn't
make a slip of the pen).

R still gets it approximately right:

  1/(1i^1i)
  [1] 4.810477+0i

$i^i = £1

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 13-Feb-08   Time: 15:57:02
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 0.45<0.45 = TRUE (PR#10744)

2008-02-12 Thread Ted Harding
On 12-Feb-08 14:53:19, Gavin Simpson wrote:
 On Tue, 2008-02-12 at 15:35 +0100, [EMAIL PROTECTED] wrote:
 Dear developer,
 
 in my version of R (2.4.0) as well as in a more recent version
 (2.6.0) on different computers, we found this problem :
 
 No problem in R. This is the FAQ of all FAQs (Type III SS is
 probably up there as well).

I'm thinking (by now quite strongly) that there is a place
in Introduction to R (and maybe other basic documentation)
for an account of arithmetic precision in R (and in digital
computation generally).

A section Arithmetic Precision in R near the beginning
would alert people to this issue (there is nothing about it in
Introduction to R, R Language Definition, or R internals).

Once upon a time, people who did arithmetic knew about this
from hands-on experience (just when do you break out of the
loop when you are dividing 1 by 3 on a sheet of paper?) -- but
now people press buttons on black boxes, and when they find
that 1/3 calculated in two mathematically equivalent ways
comes out with two different values, they believe that there
is a bug in the software.

It would not occur to them, spontaneously, that the computer
is doing the right thing and that they should look in a FAQ
for an explanation of how they do not understand!
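
The sort of example such a section would open with (a minimal sketch
of 1/3 calculated in two mathematically equivalent ways):

  a <- 1/3
  b <- 1 - 2/3
  a == b                     # FALSE on a typical machine
  all.equal(a, b)            # TRUE
  sprintf("%.17f", c(a, b))
  # [1] "0.33333333333333331" "0.33333333333333337"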

I would be willing to contribute to such an explanation;
and probably many others would too. But I feel it should be
coordinated by people who are experts in the internals
of how R handles such things.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 12-Feb-08   Time: 15:31:26
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] When 1+2 != 3 (PR#9895)

2007-09-03 Thread Ted Harding
On 03-Sep-07 15:12:06, Henrik Bengtsson wrote:
 On 9/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 [...]
 If it may be useful, I have written two small functions
 (Unique and isEqual)
 which can deal with this problem of the double numbers.
 
 Quiz: What about utility functions equalsE() and equalsPi()?
 ...together with examples illustrating when they return TRUE and when
 they return FALSE.
 
 Cheers
 
 /Henrik

Well, if you guys want a Quiz: ... My favourite example
of something which will probably never work on R (or any
machine which implements fixed-length binary real arithmetic).

An iterated function scheme on [0,1] is defined by

  if 0 <= x <= 0.5 then next x = 2*x

  if 0.5 < x <= 1  then next x = 2*(1 - x)

in R:

  nextX <- function(x){ifelse(x <= 0.5, 2*x, 2*(1-x))}

and try, e.g.,

 x <- 3/7; for(i in (1:60)){x <- nextX(x); print(c(i,x))}

x = 0 is an absorbing state.
x = 1 -> x = 0
x = 1/2 -> 1 -> 0
...
(these work in R)

If K is an odd integer, and 0 < r < K, then

x = r/K -> ... leads into a periodic set.

E.g. (see above) 3/7 -> 6/7 -> 2/7 -> 4/7 -> 6/7 -> ...

All other numbers x outside these sets generate non-periodic
sequences.

Apart from the case where initial x = 1/2^k, none of the
above is true in R (e.g. the example above).

So can you devise an isEqual function which will make this
work?

It's only Monday .. plenty of time!
Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 17:32:38
-- XFMail --


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 18:50:23
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] When 1+2 != 3 (PR#9895)

2007-09-03 Thread Ted Harding
On 03-Sep-07 19:25:58, Gabor Grothendieck wrote:
 Not sure if this counts but using the Ryacas package

Gabor, I'm afraid it doesn't count! (Though I didn't
exclude it explicitly). I'm not interested in the behaviour
of the sequence with denominator = 7 particularly.
The system is in fact an example of simulating chaotic
systems on a computer.

For instance, one of the classic illustrations is

  next x = 2*x*(1-x)

for any real x. The question is, how does a finite-length
binary representation behave?

Petr Savicky [privately] sent me a similar example:
Starting with r/K:

nextr <- function(r){ifelse(r <= K/2, 2*r, 2*(K-r))}

  For K = 7 and r = 3, this yields r = 3,  6,  2,  4,  6, ...
   Dividing this by K=7, one gets the correct period with
   approximately correct numbers.
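
That is, carry out the map in exact integer arithmetic and divide by
K only for display (a sketch):

  K <- 7
  nextr <- function(r){ifelse(r <= K/2, 2*r, 2*(K-r))}
  r <- 3
  for(i in (1:9)){r <- nextr(r); cat(r, "")}
  # 6 2 4 6 2 4 6 2 4  -- the exact period

Small integers are represented exactly in doubles, so no rounding
error ever enters.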

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 03-Sep-07   Time: 21:02:27
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SWF animation method

2007-08-08 Thread Ted Harding
On 08-Aug-07 13:52:59, Mike Lawrence wrote:
 Hi all,
 
 Just thought I'd share something I discovered last night. I was  
 interested in creating animations consisting of a series of plots and  
 after finding very little in the usual sources regarding animation in  
 R directly, and disliking the imagemagick method described here 
 (http://tolstoy.newcastle.edu.au/R/help/05/10/13297.html), I  
 discovered that if one exports the plots to a multipage pdf, it is  
 relatively trivial to then use the pdf2swf command in SWFTools  
 (http://www.swftools.org/download.html; mac install instructions  
 here: http://9mmedia.com/blog/?p=7).

Thanks so much for sharing your discovery, Mike! Out of the blue!
(Unexpected bonus for being on the R list).
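
For the archive, the R side of the recipe is just a multi-page pdf
device (a sketch; the final step assumes SWFTools is installed):

  pdf("anim.pdf", onefile = TRUE)
  for(i in 1:20) plot(sin(seq(0, 2*pi, length.out = 100) + i/5),
                      type = "l")
  dev.off()
  # then, outside R:  pdf2swf anim.pdf -o anim.swf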

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 08-Aug-07   Time: 15:25:44
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 13:28:50, Gabor Grothendieck wrote:
 The formula attribute of the builtin CO2 dataset seems a bit strange:
 
 formula(CO2)
 Plant ~ Type + Treatment + conc + uptake
 
 What is one supposed to do with that?  Certainly its not suitable
 for input to lm and none of the examples in ?CO2 use the above.

I think one is supposed to ignore it! (Or maybe be inspired to
write a mail to the list ... ).

I couldn't find anything that looked like the above formula from
str(CO2). But I did spot that the order of terms in the formula:
Plant, Type, treatment, conc, uptake, is the same as the order
of the columns in the dataframe.

So I tried:

  D <- data.frame(x=(1:10),y=(1:10))

  formula(D)
  x ~ y

So, lo and behold, D has a formula!

Or does it? Maybe if you give formula() a dataframe, it simply
constructs one from the columns.

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 14:57:28
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 13:57:56, Ted Harding wrote:
 On 16-Jul-07 13:28:50, Gabor Grothendieck wrote:
 The formula attribute of the builtin CO2 dataset seems a bit strange:
 
 formula(CO2)
 Plant ~ Type + Treatment + conc + uptake
 
 What is one supposed to do with that?  Certainly its not suitable
 for input to lm and none of the examples in ?CO2 use the above.
 
 I think one is supposed to ignore it! (Or maybe be inspired to
 write a mail to the list ... ).
 
 I couldn't find anything that looked like the above formula from
 str(CO2). But I did spot that the order of terms in the formula:
 Plant, Type, treatment, conc, uptake, is the same as the order
 of the columns in the dataframe.
 
 So I tried:
 
   D <- data.frame(x=(1:10),y=(1:10))
 
   formula(D)
   x ~ y
 
 So, lo and behold, D has a formula!
 
 Or does it? Maybe if you give formula() a dataframe, it simply
 constructs one from the columns.

Now that I think about it, I can see a use for this phenomenon:

   formula(CO2)
  Plant ~ Type + Treatment + conc + uptake
   formula(CO2[,2:5])
  Type ~ Treatment + conc + uptake
   formula(CO2[,3:5])
  Treatment ~ conc + uptake
   formula(CO2[,4:5])
  conc ~ uptake
   formula(CO2[,c(5,1,2,3,4)])
  uptake ~ Plant + Type + Treatment + conc


Could save a lot of typing!

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 15:14:38
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 14:16:10, Gabor Grothendieck wrote:
 Following up on your comments it seems formula.data.frame just creates
 a formula whose lhs is the first column name and whose rhs is made up
 of the remaining column names.  It ignores the formula attribute.
 
 In fact, CO2 does have a formula attribute but its not extracted by
 formula.data.frame:
 
 [EMAIL PROTECTED]
 uptake ~ conc | Plant
 formula(CO2)
 Plant ~ Type + Treatment + conc + uptake

Indeed! And, following up yet again on my own follow-up comment:

library(combinat)

for(j in (1:4)){
  for(i in combn((1:4),j,simplify=FALSE)){
    print(formula(CO2[,c(5,i)]))
  }
}
uptake ~ Plant
uptake ~ Type
uptake ~ Treatment
uptake ~ conc
uptake ~ Plant + Type
uptake ~ Plant + Treatment
uptake ~ Plant + conc
uptake ~ Type + Treatment
uptake ~ Type + conc
uptake ~ Treatment + conc
uptake ~ Plant + Type + Treatment
uptake ~ Plant + Type + conc
uptake ~ Plant + Treatment + conc
uptake ~ Type + Treatment + conc
uptake ~ Plant + Type + Treatment + conc

opening the door to automated fitting of all possible models
(without interactions)!

Now if only I could find out how to do the interactions as well,
I would never need to think again!
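
Actually, update() on a formula will supply the interactions (a
sketch):

  f <- formula(CO2[,c(5,2,3)])   # uptake ~ Type + Treatment
  update(f, . ~ .^2)
  # uptake ~ Type + Treatment + Type:Treatment

so perhaps the never-think-again programme is feasible after all.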

best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 15:40:36
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] formula(CO2)

2007-07-16 Thread Ted Harding
On 16-Jul-07 14:42:19, Gabor Grothendieck wrote:
 Note that the formula uptake ~. will do the same thing so its
 not clear how useful this facility really is.

Hmmm... Do you mean something like

  lm(uptake ~ . , data=CO2[,i])

where i is a subset of (1:4) as in my code below? In which case
I agree!
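
Putting the two together, all the main-effects models via ~ .
(a sketch):

  library(combinat)
  fits <- lapply(combn((1:4), 2, simplify=FALSE),
                 function(i) lm(uptake ~ ., data=CO2[,c(5,i)]))

and similarly for the other subset sizes.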

Ted.


 
 On 7/16/07, Ted Harding [EMAIL PROTECTED] wrote:
 On 16-Jul-07 14:16:10, Gabor Grothendieck wrote:
  Following up on your comments it seems formula.data.frame just
  creates
  a formula whose lhs is the first column name and whose rhs is made
  up
  of the remaining column names.  It ignores the formula attribute.
 
  In fact, CO2 does have a formula attribute but its not extracted by
  formula.data.frame:
 
  [EMAIL PROTECTED]
  uptake ~ conc | Plant
  formula(CO2)
  Plant ~ Type + Treatment + conc + uptake

 Indeed! And, following up yet again on my own follow-up comment:

 library(combinat)

 for(j in (1:4)){
   for(i in combn((1:4),j,simplify=FALSE)){
     print(formula(CO2[,c(5,i)]))
   }
 }
 uptake ~ Plant
 uptake ~ Type
 uptake ~ Treatment
 uptake ~ conc
 uptake ~ Plant + Type
 uptake ~ Plant + Treatment
 uptake ~ Plant + conc
 uptake ~ Type + Treatment
 uptake ~ Type + conc
 uptake ~ Treatment + conc
 uptake ~ Plant + Type + Treatment
 uptake ~ Plant + Type + conc
 uptake ~ Plant + Treatment + conc
 uptake ~ Type + Treatment + conc
 uptake ~ Plant + Type + Treatment + conc

 opening the door to automated fitting of all possible models
 (without interactions)!

 Now if only I could find out how to do the interactions as well,
 I would never need to think again!

 best wishes,
 Ted.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 16-Jul-07   Time: 15:40:36
 -- XFMail --

 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 16:13:15
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bounding box in PostScript

2006-04-17 Thread Ted Harding
On 17-Apr-06 Prof Brian Ripley wrote:
 On Mon, 17 Apr 2006, David Allen wrote:
 
 When a graph is saved as PostScript, the bounding box is often too
 big.
 A consequence is that when the graph is included in a LaTeX document,
 the spacing does not look good.
 Is this a recognized problem? Is someone working on it? Could I help?
 
 It's not really true.  The bounding box is of the figure region, and
 it is a design feature.  If it is the figure `region' that is too big
 for your purposes, you need to adjust the margins.
 
 Alternatively, there are lots of ways to set a tight bounding box: I
 tend to use the bbox device in gs.  But that often ends up with
 figures which do not align, depending on whether text has descenders.

Ghostscript's BBox resources tend to suffer from this problem,
since, as Brian says, for some reason it may not properly allow
for the true sizes of characters. I once tweaked ps2eps to
correct (more or less) for this.

But nowadays, to get it exactly right, I use gv (ghostview
front end) to open the EPS file, and set watch mode:

  gv -watch filename.eps

If it doesn't automatically give the BoundingBox view, you
can always select this from the top menu. This ensures that
the frame of the gv window is the same as the BoundingBox.

Then open the EPS file for editing, and look for the

  %%BoundingBox: llx lly urx ury

line (it may be near the beginning, or at the end -- which
will be flagged by a line

  %%BoundingBox: (atend)

near the beginning).

Here, llx, lly, urx, ury are integers giving the coordinates
(in points relative to the page origin) of the lower left,
and upper right, corners of the bounding box.

Now you can change the values of these. Each time you change,
save the edited file. Because gv is in watch mode, it will
re-draw the page whenever the file changes. Thus you can adjust
the bounding box until it is just as you want it.
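
From within R itself one can also keep the figure region tight in the
first place, which reduces the need for such surgery (a sketch):

  postscript("fig.eps", horizontal = FALSE, onefile = FALSE,
             paper = "special", width = 5, height = 4)
  par(mar = c(4, 4, 1, 1))   # trim the generous default margins
  plot(1:10)
  dev.off()

The EPS then has a bounding box of exactly 5in x 4in, with only the
margins asked for.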

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-06   Time: 23:05:29
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] pbinom with size argument 0 (PR#8560)

2006-02-03 Thread Ted Harding
On 03-Feb-06 [EMAIL PROTECTED] wrote:
 Full_Name: Uffe Høgsbro Thygesen
 Version: 2.2.0
 OS: linux
 Submission from: (NULL) (130.226.135.250)
 
 
 Hello all.
 
   pbinom(q=0,size=0,prob=0.5)
 
 returns the value NaN. I had expected the result 1. In fact any
 value for q seems to give an NaN.

Well, NaN can make sense since q=0 refers to a single sampled
value, and there is no value which you can sample from size=0;
i.e. sampling from size=0 is a non-event. I think the probability
of a non-event should be NaN, not 1! (But maybe others might argue
that if you try to sample from an empty urn you necessarily get
zero successes, so p should be 1; but I would counter that you
also necessarily get zero failures so q should be 1. I suppose
it may be a matter of whether you regard the r of the binomial
distribution as referring to the identities of the outcomes
rather than to how many you get of a particular type. Hmmm.)

 Note that
 
   dbinom(x=0,size=0,prob=0.5)
 
 returns the value 1.

That is probably because the .Internal code for pbinom may do
a preliminary test for x >= size. This also makes sense, for
the cumulative pdist for any dist with a finite range,
since the answer must then be 1 and a lot of computation would
be saved (likewise returning 0 when x < 0). However, it would
make even more sense to have a preceding test for size=0
and return NaN in that case since, for the same reasons as
above, the result is the probability of a non-event.

(But it depends on your point of view, as above ... However,
surely the two  should be consistent with each other.)

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06   Time: 14:34:28
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] pbinom with size argument 0 (PR#8560)

2006-02-03 Thread Ted Harding
On 03-Feb-06 Peter Dalgaard wrote:
 (Ted Harding) [EMAIL PROTECTED] writes:
 
 On 03-Feb-06 [EMAIL PROTECTED] wrote:
  Full_Name: Uffe Høgsbro Thygesen
  Version: 2.2.0
  OS: linux
  Submission from: (NULL) (130.226.135.250)
  
  
  Hello all.
  
pbinom(q=0,size=0,prob=0.5)
  
  returns the value NaN. I had expected the result 1. In fact any
  value for q seems to give an NaN.
 
 Well, NaN can make sense since q=0 refers to a single sampled
 value, and there is no value which you can sample from size=0;
 i.e. sampling from size=0 is a non-event. I think the probability
 of a non-event should be NaN, not 1! (But maybe others might argue
 that if you try to sample from an empty urn you necessarily get
 zero successes, so p should be 1; but I would counter that you
 also necessarily get zero failures so q should be 1. I suppose
 it may be a matter of whether you regard the r of the binomial
 distribution as referring to the identities of the outcomes
 rather than to how many you get of a particular type. Hmmm.)
 
  Note that
  
dbinom(x=0,size=0,prob=0.5)
  
  returns the value 1.
 
 That is probably because the .Internal code for pbinom may do
 a preliminary test for x >= size. This also makes sense, for
 the cumulative pdist for any dist with a finite range,
 since the answer must then be 1 and a lot of computation would
 be saved (likewise returning 0 when x < 0). However, it would
 make even more sense to have a preceding test for size=0
 and return NaN in that case since, for the same reasons as
 above, the result is the probability of a non-event.
 
 Once you get your coffee, you'll likely realize that you got
 your p's and d's mixed up...

You're right about the mix-up! (I must mend the pipeline.)

 I think Uffe is perfectly right: The result of zero experiments will
 be zero successes (and zero failures) with probability 1, so the
 cumulative distribution function is a step function with one step at
 zero ( == as.numeric(x >= 0) ).

I'm perfectly happy with this argument so long as it leads to
dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1
(which seems to be what you are arguing too). And I think there
are no traps if p=0 or p=1.
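
Spelled out as code, the convention being agreed on (a sketch):

  pbinom0 <- function(q) as.numeric(q >= 0)   # CDF for size = 0
  pbinom0(c(-1, 0, 3))
  # [1] 0 1 1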

 (But it depends on your point of view, as above ... However,
 surely the two  should be consistent with each other.)

Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06   Time: 15:07:49
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Typo [Was: Rd and guillemots]

2005-09-17 Thread Ted Harding
On 17-Sep-05 Prof Brian Ripley wrote:
 On Fri, 16 Sep 2005 [EMAIL PROTECTED] wrote:
 
 On 16-Sep-05 Duncan Murdoch wrote:
 [...]
 This seems to happen in Rdconv.pm, around here:

  ## avoid conversion to guillemots
   $c =~ s/<</<\{\}</;
   $c =~ s/>>/>\{\}>/;

 The name of the continental quotation mark « is guillemet.

 The R Development Core Team must have had some bird on the brain
 at the time ...
 
 I don't think any authority agrees with Ted here. There are two 
 characters, left and right.

Agreed, I only gave one instance. Either « or » is a guillemet.

As to any authority, it depends what you mean by authority.

1. Take any good French dictionary (e.g. Collins Robert).
   Look up [Fr]guillemet: -- [En]quotation mark, inverted comma.
   Look up [En]quotation mark: -- [Fr]guillemet.

   There is a phrase entre guillemets: -- in quotation marks
   or in quotes, and vice versa.

   Look up [Fr]guillemot: -- [En]guillemot and vice versa.

2. Take a good book on printing/typographical matters, e.g. The
   Chicago Manual of Style which is very comprehensive.

   Index: guillemets [the entry is in the plural]: - 9.22-26
   Small angle marks called guillemets («») are used for quotation
   marks ...

   Index: guillemot: -- nothing found.

It's not as straightforward as that, however! In French, guillemet
is in fact used generically for quotation mark and, typographically,
includes not only the marks « and » we are talking about, but also
the marks used for similar purposes in non-continental typography.

So the opening double quote `` (e.g. in Times Roman) and closing ''
(sorry, can't make these marks in email) are also guillemets.
Indeed we have [note the singular] guillemet anglais ouvrant (``),
guillemet anglais fermant (''), as well as guillemet français
ouvrant («), guillemet français fermant (»); not to mention the
fact that a guillemet français e.g. « consists of two chevrons
and one can also have a chevron ouvrant consisting of just one
of these (can't do this either) which is also called a guillemet
français simple ouvrant (in PostScript guilsinglleft), etc. And
there is (as in Courier font) the guillemet dactylographique
= typewriter quotation mark ("). And lots of other variants.

Rather than sink in the morass of French-speaking usage, we might
be better off referring to an authority closer to the sort of usage
that concerns us, So I've had a look at the Unicode Standard,
specifically

  http://www.unicode.org/Public/UNIDATA/NamesList.txt

where one can find

  00AB    LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
  = LEFT POINTING GUILLEMET
  = chevrons (in typography)
  * usually opening, sometimes closing

  00BB    RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
  = RIGHT POINTING GUILLEMET
  * usually closing, sometimes opening

  2039    SINGLE LEFT-POINTING ANGLE QUOTATION MARK
  = LEFT POINTING SINGLE GUILLEMET
  * usually opening, sometimes closing

  203A    SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
  = RIGHT POINTING SINGLE GUILLEMET
  * usually closing, sometimes opening

but no guillemots!

 Collectively it seems agreed they are called guillemets, but the
 issue is over the names of the single characters, and the character
 shown is the left guillem[eo]t.

See above ...

 Adobe says these are left and right guillemot.  It seems that the 
 majority opinion does not agree, but there is a substantial usage 
 following Adobe.

That is certainly a matter of fact! And it is certainly thus in
Adobe's PostScript Language Reference Manual (see e.g. Standard
Roman Character Set in Appendix E, Standard Character Sets and
Encoding Vectors). So that is what must be used when invoking
them in PostScript. However, I am firmly of the view that Adobe
made an error when they gave these things the names guillemotleft
and guillemotright.

 I had already changed the R source code, so please Ted and others
 follow the advice in the posting guide and
 
 *** check the current sources before posting alleged bugs ***

Easier said than done ... However, I apologise!

Nevertheless, apart from the issue of a possible R bug, I think
it is worth putting the record straight on the general issue of
nomenclature.

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 17-Sep-05   Time: 10:51:17
-- XFMail --

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel