RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-15 Thread Simon Fear
I think the thread ended up with several people (not only me)
feeling certain they didn't like `is.na<-` but with the
developers defending it and me not really understanding
why.

Uwe Ligges was going to come up with an example of
`<- NA` going wrong (sorry Brian R, I mean behaving
unexpectedly), but never did, and I think the problem
has been fixed. It was apparently a problem with assigning
NAs to an existing factor, but the code for `[<-.factor`
looks pretty robust to me [not that I'm at all qualified to say
that, be warned]. Interestingly, at some point both methods
for `is.na<-` perform this operation: x[value] <- NA. Ahem.

By the way, `is.na(x) <- FALSE` will leave x unchanged (including
leaving it as NA ! how bad is that ?!)
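The remark above can be checked in a couple of lines. This is a minimal sketch that assumes current R, where (as noted above) the default method of `is.na<-` amounts to `x[value] <- NA`. You can inspect it with ``base:::`is.na<-.default` ``. Assigning FALSE therefore selects no elements, and nothing changes.

```r
# Assumes current R: is.na(x) <- value behaves like x[value] <- NA
x <- c(1, NA, 3)

y <- x
is.na(y) <- FALSE        # FALSE selects no elements, so nothing is assigned
print(y)                 # the NA in position 2 is not "un-set"

z <- x
z[FALSE] <- NA           # the equivalent explicit subassignment
print(identical(y, x))
print(identical(z, x))
```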

> -----Original Message-----
> From: Paul Lemmens [mailto:[EMAIL PROTECTED]
> Sent: 14 October 2003 16:10
> To: [EMAIL PROTECTED]
> Subject: RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation
> fault)

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-15 Thread Paul Lemmens
Hello Simon,

--On Wednesday 15 October 2003 10:08 +0100 Simon Fear
[EMAIL PROTECTED] wrote:

> By the way, `is.na(x) <- FALSE` will leave x unchanged (including
> leaving it as NA ! how bad is that ?!)
Twilight Zone (Golden Earring). But with that remark I'm getting off topic, 
so thank you for your summary. I've already memorized the is.na() 
construct, so I should be safe for the time being :

kind regards,
Paul


--
Paul Lemmens
NICI, University of Nijmegen      ASCII Ribbon Campaign /\
Montessorilaan 3 (B.01.03)        Against HTML Mail     \ /
NL-6525 HR Nijmegen                                      X
The Netherlands                                         / \
Phonenumber    +31-24-3612648
Fax            +31-24-3616066


RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-14 Thread Paul Lemmens
By accident I'm also toying around with NAs, so I started reading up on
this thread but failed to find a 'concluding' remark or advice. As a naive
R user I would have loved to see a comment saying "do it like this".

The prevailing opinion seemed to be that is.na() might be better (safer)
but x <- NA is much clearer to understand. Can I relatively safely use the
easy form, or is it better to remember (the hard way) the safer version?
Has the discussion continued privately or just stopped here?

Personally I still find the fragments below (taken from the thread) very
counterintuitive, not to say scary.

x <- 1:10
is.na(x) <- 1:5

and

is.na(x) <- FALSE

It's very hard to understand what happens (as a layman) because the
assignment seems to reverse in meaning in the first example (it actually
takes indices 1:5 of x and assigns those the value NA), whereas in the
second case it's not obvious what happens to x: will it get the value FALSE,
or will the original value remain(*)?
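The first fragment can be compared directly with the subassignment it corresponds to. This is a minimal sketch assuming current R. `y` is an extra copy introduced only for the comparison. The right-hand side of `is.na(x) <-` is used as an index into x, not as a logical value for x.

```r
x <- 1:10
is.na(x) <- 1:5          # 1:5 is treated as an index into x
print(x)                 # NA NA NA NA NA 6 7 8 9 10

y <- 1:10
y[1:5] <- NA             # the explicit form
print(identical(x, y))
```

Reading the right-hand side as an index also explains the second fragment. `FALSE` as an index selects nothing, so x is left as it was.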

IMHO the <- NA construct is much easier to understand and should be made
safe in all possible situations (whatever the underlying safety problem or
other difficulties might be).

kind regards,
Paul
(*) Such a remark will probably lead to some kind of reprimand because it's 
probably somewhere within the 10e6 manual pages but I'm trying my luck here.



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-09 Thread Simon Fear

> -----Original Message-----
> From: Richard A. O'Keefe [mailto:[EMAIL PROTECTED]
<snip>

> The very existence of an is.na<- which accepts a logical
> vector containing FALSE as well as TRUE ...

And don't forget this is not the only usage of is.na<-. In fact it is
designed to take any valid indexing value. For example:

> a <- 1:10
> is.na(a) <- 1:5
> a
 [1] NA NA NA NA NA  6  7  8  9 10

Wow. I really hate that. Someone tell me again why this is
better than a[1:5] <- NA??
 



Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Uwe Ligges
Richard A. O'Keefe wrote:

> Can someone give an example of is.na()<- and <-NA working differently
> with a factor?  I just tried it:
> > x <- factor(c(3,1,4,1,5,9))
> > y <- x
> > is.na(x) <- x==1
> > y[y==1] <- NA
> > x
> [1] 3    <NA> 4    <NA> 5    9
> Levels: 1 3 4 5 9
> > y
> [1] 3    <NA> 4    <NA> 5    9
> Levels: 1 3 4 5 9
>
> Both approaches seem to have given the same answer.  What did I miss?


As mentioned in another mail to R-help, I'm pretty sure there was (is?)
a problem with character (and/or factor) vectors and assignment of NAs, but I
cannot (re)produce an example. I think something in the x <- NA case
has been fixed during the last year.
What stops me from thinking I'm completely confused is that the is.na()<-
usage is proposed in ?NA, S Programming, the R Language Definition
manual and R's NEWS file, although I cannot find it in the green book right now.

Uwe Ligges



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Simon Fear
Note this behaviour:

> a <- "a"
> a <- NA
> mode(a)
[1] "logical"
> a <- "a"
> is.na(a) <- T
> mode(a)
[1] "character"

However, after either way of assigning NA to a, is.na(a) is true,
and it prints as NA, so I can't see it's ever likely to matter. [Why
do I say these things? Expect the usual flood of examples where it
does matter.]

Also, if a is a character vector, a[2] <- NA coerces the NA to
as.character(NA); again, just as one would hope/expect.
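The mode difference and the coercion of NA on element assignment can both be reproduced in current R. This is a sketch. The vector `b` is an illustrative extra, not from the message.

```r
a <- "a"
a <- NA                   # replaces the whole variable with a logical NA
print(mode(a))            # "logical"

a <- "a"
is.na(a) <- TRUE          # element-wise, like a[TRUE] <- NA
print(mode(a))            # "character"

b <- c("x", "y", "z")
b[2] <- NA                # the NA is coerced to NA_character_
print(mode(b))            # "character"
print(identical(b[2], NA_character_))
```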

I have to echo Richard O'K's remark: if <- NA can ever go wrong,
is that not a bug rather than a feature?
 



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Prof Brian Ripley
On Wed, 8 Oct 2003, Simon Fear wrote:

> I have to echo Richard O'K's remark: if <- NA can ever go wrong,
> is that not a bug rather than a feature?

I don't think it can ever `go wrong', but it can do things other than the
user intends.  The intention of is.na<- is clearer, and so perhaps user
error is less likely?  That is the thinking behind the function, anyway.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Simon Fear
Well, that's a convincing argument, but maybe 
it's the name that's worrying some of us. Maybe it would be 
more intuitive if called set.na (sorry, I mean setNA).

Also, is.na<- cannot be used to create a new variable of
NAs, so it is not a universal method, which is a shame for its
advocates.

I note also that for a vector you can assign a new NA using 
either TRUE or FALSE:

> a <- 1:3
> is.na(a[4]) <- F
> a
[1]  1  2  3 NA

For a list, assigning F leaves the new element set to NULL.

Mind you, I suspect this would be a particularly stupid thing 
to do, so I'm not going to lose any sleep over R's reaction to it.
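The three observations above can be reproduced in current R. This is a sketch. `never_defined` is a hypothetical name, chosen on the assumption that no object of that name exists. The extension cases behave as shown because `is.na(a[4]) <- v` reads `a[4]` first (NA for an atomic vector, NULL inside a list), applies `is.na<-` to that, and writes the result back.

```r
# (1) is.na<- cannot create a variable: the target must already exist
res <- tryCatch({
  is.na(never_defined) <- TRUE
  "created"
}, error = function(e) "error")
print(res)                # "error"

# (2) Extending an atomic vector: FALSE still leaves NA in the new slot
a <- 1:3
is.na(a[4]) <- FALSE
print(a)                  # 1 2 3 NA

# (3) Lists: FALSE leaves the new element NULL, TRUE makes it NA
l <- list(1, 2, 3)
is.na(l[4]) <- FALSE
print(is.null(l[[4]]))    # TRUE
m <- list(1, 2, 3)
is.na(m[4]) <- TRUE
print(is.na(m[[4]]))      # TRUE
```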

 
> -----Original Message-----
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> I don't think it can ever `go wrong', but it can do things other than
> the user intends.  The intention of is.na<- is clearer, and so
> perhaps user error is less likely?  That is the thinking behind the
> function, anyway.
 



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Gabor Grothendieck


Also, presumably is.na<- could be redefined by the user for particular
classes, so if you got into the habit of setting NAs that way it would
generalize better.
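The generalization argument can be sketched with a made-up S3 class. Everything here is hypothetical and for illustration only: a class `sentinel` that records missing values as the code -999, with its own `is.na` and `is.na<-` methods. Code that sets missingness through `is.na<-` automatically follows the class's convention. Plain `x[i] <- NA` bypasses it.

```r
# Hypothetical class "sentinel": missing values are stored as -999, not NA
is.na.sentinel <- function(x) unclass(x) == -999

`is.na<-.sentinel` <- function(x, value) {
  cls <- oldClass(x)
  x <- unclass(x)
  x[value] <- -999            # record missingness with the sentinel code
  class(x) <- cls
  x
}

s <- structure(c(10, 20, 30), class = "sentinel")
is.na(s) <- 2                 # dispatches to `is.na<-.sentinel`
print(unclass(s))             # 10 -999 30
print(is.na(s))               # FALSE TRUE FALSE

# Plain subassignment knows nothing about the convention
u <- structure(c(10, 20, 30), class = "sentinel")
u[2] <- NA                    # stores a real NA instead of -999
print(is.na(u))               # FALSE NA FALSE
```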

--- 
Date: Wed, 8 Oct 2003 11:49:29 +0100 (BST) 
From: Prof Brian Ripley [EMAIL PROTECTED]

I don't think it can ever `go wrong', but it can do things other than the 
user intends. The intention of is.na<- is clearer, and so perhaps user 
error is less likely? That is the thinking behind the function, anyway.





RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Richard A. O'Keefe
Simon Fear [EMAIL PROTECTED] suggested that

> a <- "a"
> a <- NA
> mode(a)
[1] "logical"
> a <- "a"
> is.na(a) <- T
> mode(a)
[1] "character"

might be a relevant difference between assigning NA and using is.na.
But the analogy is flawed:  is.na(x) <- operates on the _elements_ of
x, while x <- affects the variable x.  When you assign NA to
_elements_ of a vector, the mode does not change:

> a <- "a"
> is.na(a) <- TRUE
> mode(a)
[1] "character"
> b <- "b"
> b[TRUE] <- NA
> mode(b)
[1] "character"
> c <- "c"
> c[1] <- NA
> mode(c)
[1] "character"



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Richard A. O'Keefe
Concerning  x[i] <- NA  vs  is.na(x[i]) <- TRUE
Brian Ripley wrote:

> I don't think it can ever `go wrong', but it can do things other
> than the user intends.

If the user writes x[i] <- NA, the user has clearly indicated his intention
that the i element(s) of x should become NA.  There isn't any clearer way to
say that.  The only way it could ever do something other than the user
intends is if the mode of x changes or the selected elements are set to
something other than NA.

The ?NA help page *hints* that this might be the case, but does not give
an example.

The question remains, *WHAT* can x[i] <- NA do that might be other than
what the user intends?  An example (especially one added to the ?NA help)
would be very useful.

> The intention of is.na<- is clearer,

I find this extremely puzzling.  x[i] <- NA is an extremely clear and
obvious way of saying "I want the i element(s) of x to become NA".
is.na(x) <- ... is not only an indirect way of doing this, it is a way
which is confusing and error-prone.

Bear in mind that one way of implementing something like is.na() would be
to associate a bit with each element of a vector; is.na() would test and
is.na<-() would set that bit.  It would be possible to have a language
exactly like R -except- that

x <- 1
is.na(x) <- TRUE
x
=>  NA
is.na(x) <- FALSE
x
=>  1

would work.  The very existence of an is.na<- which accepts a logical
vector containing FALSE as well as TRUE strongly suggests this.  But it
doesn't work like that.  As I've pointed out,
    is.logical(m) && length(m) == length(x) && done{is.na(x) <- m}
    =>  identical(is.na(x), m)
is the kind of behaviour that has been associated with well-behaved
sinister function calls for several decades, and yet this is not a fact
about R.
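The invariant in the preceding paragraph can be tested directly. This is a sketch: `sinister_holds()` is a hypothetical helper name. It applies `is.na(x) <- m` to a local copy of x and reports whether `is.na(x)` then equals m. In current R the invariant holds when x has no NAs to begin with, and fails as soon as an existing NA sits where m is FALSE.

```r
# Hypothetical helper: after is.na(x) <- m, does is.na(x) equal m?
sinister_holds <- function(x, m) {
  stopifnot(is.logical(m), length(m) == length(x))
  is.na(x) <- m            # modifies the local copy only
  identical(is.na(x), m)
}

print(sinister_holds(c(1, 2, 3), c(FALSE, FALSE, TRUE)))   # TRUE
print(sinister_holds(c(1, NA, 3), c(FALSE, FALSE, TRUE)))  # FALSE
```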

> and so perhaps user error is less likely?

I see no reason to believe this; the bad behaviour of is.na- surely
makes user error *more* likely rather than *less* likely.
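The hypothetical language described in this message can be imitated in R with a small S3 class. In that language is.na() tests, and is.na<- sets, a separate bit per element, so `is.na(x) <- FALSE` would bring back the stored value. This is only a sketch. The class `masked`, the constructor `masked()`, the helper `shown()` and both methods are hypothetical names, not part of R, and base R vectors do not behave this way.

```r
# Hypothetical class "masked": values are kept, missingness is a separate mask
masked <- function(v) {
  structure(v, mask = rep(FALSE, length(v)), class = "masked")
}

# is.na() reports the mask
is.na.masked <- function(x) attr(x, "mask")

# is.na<- replaces the whole mask, so FALSE really switches NA off again
`is.na<-.masked` <- function(x, value) {
  attr(x, "mask") <- rep_len(as.logical(value), length(x))
  x
}

# What a user would see: NA where masked, the stored value elsewhere
shown <- function(x) {
  v <- as.vector(unclass(x))   # plain vector without the mask attribute
  v[attr(x, "mask")] <- NA
  v
}

x <- masked(1)
is.na(x) <- TRUE
print(shown(x))    # NA
is.na(x) <- FALSE
print(shown(x))    # 1
```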



Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Duncan Murdoch
<Tongue in cheek>

But surely 

 is.na(x) <- is.na(x)

is clearer than

 x[is.na(x)] <- NA

(neither of which is a no-op).

</Tongue in cheek>
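Here is one concrete case where neither statement leaves x unchanged. It illustrates the parenthetical but is not necessarily what was meant. is.na() is TRUE for NaN as well as NA, so for a double vector containing NaN both statements overwrite the NaN with NA. This is a sketch assuming current R.

```r
x <- c(1, NaN, NA)
print(is.na(x))           # FALSE TRUE TRUE: is.na() is TRUE for NaN too
print(is.nan(x))          # FALSE TRUE FALSE

y <- x
is.na(y) <- is.na(y)
print(is.nan(y))          # the NaN has become NA

z <- x
z[is.na(z)] <- NA
print(is.nan(z))          # same result with the explicit form
```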



is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-07 Thread Richard A. O'Keefe
I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says
 Function `is.na<-' may provide a safer way to set missingness. It
 behaves differently for factors, for example.

However, "MAY provide" is a bit scary, and it doesn't say WHAT the
difference in behaviour is.

I must say that is.na(x) <- ... is rather repugnant, because it doesn't
work.  What do I mean?  Well, as the designers of SETL who many years ago
coined the term "sinister function call" to talk about f(...) <- ...,
pointed out, if you do
    f(x) <- y
then afterwards you expect
    f(x) == y
to be true.  So let's try it:

> x <- c(1,NA,3)
> is.na(x) <- c(FALSE,FALSE,TRUE)
> x
[1]  1 NA NA
> is.na(x)
[1] FALSE  TRUE  TRUE
                        v
So I _assigned_ c(FALSE,FALSE,TRUE) to is.na(x),
but I _got_ c(FALSE,TRUE, TRUE) instead.
                    ^
That is not how a well behaved sinister function call should work,
and it's enough to scare someone off is.na()<- forever.

The obvious way to set elements of a variable to missing is ... <- NA.
Wouldn't it be better if that just plain worked?

Can someone give an example of is.na()<- and <-NA working differently
with a factor?  I just tried it:

> x <- factor(c(3,1,4,1,5,9))
> y <- x
> is.na(x) <- x==1
> y[y==1] <- NA
> x
[1] 3    <NA> 4    <NA> 5    9
Levels: 1 3 4 5 9
> y
[1] 3    <NA> 4    <NA> 5    9
Levels: 1 3 4 5 9

Both approaches seem to have given the same answer.  What did I miss?
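In current R the two results can be compared with identical() rather than by eye. This is a sketch assuming current R, where the factor method of `is.na<-` and `[<-.factor` both store an NA code and keep the original levels, including the now-unused level "1".

```r
x <- factor(c(3, 1, 4, 1, 5, 9))
y <- x
is.na(x) <- x == 1        # factor method of is.na<-
y[y == 1] <- NA           # [<-.factor
print(identical(x, y))
print(is.na(x))           # FALSE TRUE FALSE TRUE FALSE FALSE
print(levels(x))          # "1" "3" "4" "5" "9"
```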
