RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
I think the thread ended up with several people (not only me) feeling certain they didn't like `is.na<-`, but with the developers defending it and me not really understanding why.

Uwe Ligges was going to come up with an example of `<- NA` going wrong (sorry Brian R, I mean "behaving unexpectedly"), but never did, and I think the problem has been fixed. It was apparently a problem with assigning NAs to an existing factor, but the code for `[<-.factor` looks pretty robust to me [not that I'm at all qualified to say that, be warned].

Interestingly, at some point both methods for `is.na<-` perform this operation: x[value] <- NA. Ahem.

By the way, `is.na(x) <- FALSE` will leave x unchanged (including leaving it as NA! how bad is that?!)

-----Original Message-----
From: Paul Lemmens
Sent: 14 October 2003 16:10
Subject: RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

[snip]
> The prevailing opinion seemed to be that is.na() might be better (safer)
> but x <- NA is much clearer to understand. Can I relatively safely use the
> easy form, or is it better to remember (the hard way) the safer version?
> Has the discussion continued privately or just stopped here?
[snip]

Simon Fear
Senior Statistician, Syne qua non Ltd
web: http://www.synequanon.com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
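The claim that `is.na<-` ends up doing x[value] <- NA can be checked from the outside. The following is only a sketch: `set_na` is a hypothetical helper written for this comparison, not a function that exists in R, and the check covers just a plain numeric vector.

```r
# Hypothetical helper (not part of R): mark elements as NA by plain indexing.
set_na <- function(x, value) {
  x[value] <- NA
  x
}

x <- c(1, NA, 3)

# Route 1: the replacement function discussed in this thread.
a <- x
is.na(a) <- c(FALSE, FALSE, TRUE)

# Route 2: ordinary indexed assignment through the helper.
b <- set_na(x, c(FALSE, FALSE, TRUE))

identical(a, b)   # both routes mark element 3; element 2 was NA and stays NA

# A FALSE index selects nothing, so x is left exactly as it was.
d <- x
is.na(d) <- FALSE
identical(d, x)
```

On this reading, FALSE never means "make it non-missing"; it only means "do not select this element".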
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Hello Simon,

--On Wednesday 15 October 2003 10:08 +0100 Simon Fear wrote:

> By the way, `is.na(x) <- FALSE` will leave x unchanged (including leaving
> it as NA! how bad is that?!)

Twilight Zone (Golden Earring). But with that remark I'm getting off topic, so thank you for your summary. I've already memorized the is.na() construct, so I should be safe for the time being :)

kind regards,
Paul

--
Paul Lemmens
NICI, University of Nijmegen
Montessorilaan 3 (B.01.03), NL-6525 HR Nijmegen, The Netherlands
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
By accident I'm also toying around with NAs, so I started reading up on this thread but failed to find a 'concluding' remark or advice. As a naive R user I would have loved to see a comment along the lines of "do it like this". The prevailing opinion seemed to be that is.na() might be better (safer) but x <- NA is much clearer to understand. Can I relatively safely use the easy form, or is it better to remember (the hard way) the safer version? Has the discussion continued privately or just stopped here?

Personally I still find the fragments below (taken from the thread) very counterintuitive, not to say scary.

    x <- 1:10
    is.na(x) <- 1:5

and

    is.na(x) <- FALSE

It's very hard to understand what happens (as a layman) because the assignment seems to reverse in meaning in the first example (actually taking indices 1:5 of x and assigning those the value NA), whereas in the second case it's not obvious what happens to x: will it get the value FALSE or will the original value remain (*)?

IMHO the <- NA construct is much easier to understand and should be made safe in all possible situations (whatever the underlying safety problem or other difficulties might be).

kind regards,
Paul

(*) Such a remark will probably lead to some kind of reprimand because it's probably somewhere within the 10e6 manual pages, but I'm trying my luck here.

--
Paul Lemmens
NICI, University of Nijmegen
Montessorilaan 3 (B.01.03), NL-6525 HR Nijmegen, The Netherlands
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
-----Original Message-----
From: Richard A. O'Keefe

[snip]
> The very existence of an is.na<- which accepts a logical vector containing
> FALSE as well as TRUE ...

And don't forget this is not the only usage of is.na<-. In fact it is designed to take any valid indexing value. For example:

    a <- 1:10
    is.na(a) <- 1:5
    a
     [1] NA NA NA NA NA  6  7  8  9 10

Wow. I really hate that. Someone tell me again why this is better than a[1:5] <- NA ??

Simon Fear
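To make "any valid indexing value" concrete, here is a small sketch that feeds `is.na<-` positions, names, negative positions and a logical condition, and compares each result with direct indexed assignment. The named vector and the variable names are made up for the illustration.

```r
a <- c(one = 1, two = 2, three = 3, four = 4)

# Four ways of selecting the first two elements through is.na<-.
by_position <- a; is.na(by_position) <- 1:2
by_name     <- a; is.na(by_name)     <- c("one", "two")
by_negative <- a; is.na(by_negative) <- -(3:4)
by_logical  <- a; is.na(by_logical)  <- a < 3

# The plain form asked about above.
direct <- a
direct[1:2] <- NA

# Each comparison checks one is.na<- variant against the plain form.
sapply(list(by_position, by_name, by_negative, by_logical),
       identical, direct)
```

In each case the right-hand side is used as an index into the vector, not as a vector of missingness values to be stored.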
Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Richard A. O'Keefe wrote:

> I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.
[snip]
> Can someone give an example of is.na()<- and <-NA working differently
> with a factor?
[snip]
> Both approaches seem to have given the same answer. What did I miss?

As mentioned in another mail to R-help: I'm pretty sure there was (is?) a problem with character (and/or factor) and assignment of NAs, but I cannot (re)produce an example. I think something for the "x <- NA" case has been fixed during the last year.

What keeps me from thinking I'm completely confused is that the is.na()<- usage is proposed in ?NA, S Programming, the R Language Definition manual, and R's News file, but I cannot find it in the green book right now.

Uwe Ligges
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Note this behaviour:

    a <- "a"
    a <- NA
    mode(a)
    [1] "logical"
    a <- "a"
    is.na(a) <- T
    mode(a)
    [1] "character"

However, after either way of assigning NA to a, is.na(a) is true, and it prints as NA, so I can't see it's ever likely to matter. [Why do I say these things? Expect usual flood of examples where it does matter.]

Also, if a is a character vector, a[2] <- NA coerces the NA to as.character(NA); again, just as one would hope/expect.

I have to echo Richard O'K's remark: if <- NA can ever go wrong, is that not a bug rather than a feature?

Simon Fear
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
On Wed, 8 Oct 2003, Simon Fear wrote:

[snip]
> I have to echo Richard O'K's remark: if <- NA can ever go wrong, is that
> not a bug rather than a feature?

I don't think it can ever `go wrong', but it can do things other than the user intends. The intention of is.na<- is clearer, and so perhaps user error is less likely? That is the thinking behind the function, anyway.

--
Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Well, that's a convincing argument, but maybe it's the name that's worrying some of us. Maybe it would be more intuitive if called set.na (sorry, I mean setNA).

Also, is.na<- cannot be used to create a new variable of NAs, so it is not a universal method, which is a shame for its advocates.

I note also that for a vector you can "assign" a new NA using either TRUE or FALSE:

    a <- 1:3
    is.na(a[4]) <- F
    a
    [1]  1  2  3 NA

For a list, assigning F leaves the new element set to NULL. Mind you, I suspect this would be a particularly stupid thing to do, so I'm not going to lose any sleep over R's reaction to it.

-----Original Message-----
From: Prof Brian Ripley

> I don't think it can ever `go wrong', but it can do things other than the
> user intends. The intention of is.na<- is clearer, and so perhaps user
> error is less likely? That is the thinking behind the function, anyway.

Simon Fear
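Both points in this message can be tried directly. This is a sketch only: `no_such_variable` is a deliberately undefined, made-up name, and the list case mentioned above is not reproduced here.

```r
a <- 1:3
# a[4] does not exist yet, so it reads as NA; a FALSE index then changes
# nothing, and that NA is written back into position 4.
is.na(a[4]) <- FALSE
a

# is.na<- needs an existing object to modify; an unused name fails.
res <- tryCatch({
  is.na(no_such_variable) <- TRUE
  "created"
}, error = function(e) "error")
res
```

So the new NA in position 4 comes from extending the vector, not from the FALSE on the right-hand side.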
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Also, presumably is.na<- could be redefined by the user for particular classes, so if you got in the habit of setting NAs that way it would generalize better.

---
Date: Wed, 8 Oct 2003 11:49:29 +0100 (BST)
From: Prof Brian Ripley

> I don't think it can ever `go wrong', but it can do things other than the
> user intends. The intention of is.na<- is clearer, and so perhaps user
> error is less likely? That is the thinking behind the function, anyway.
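As a rough illustration of what a user-defined method might look like, the sketch below invents a class called "flagged" that keeps missingness in a separate logical mask instead of overwriting the stored values. The class, its constructor and the "missing" attribute are all hypothetical; nothing like this ships with R.

```r
# Hypothetical class "flagged": values plus a logical mask of missingness.
flagged <- function(values) {
  structure(values,
            missing = rep(FALSE, length(values)),
            class = "flagged")
}

# S3 replacement method: record missingness in the mask, keep the value.
`is.na<-.flagged` <- function(x, value) {
  mask <- attr(x, "missing")
  mask[value] <- TRUE
  attr(x, "missing") <- mask
  x
}

# S3 query method: report the mask.
is.na.flagged <- function(x) attr(x, "missing")

v <- flagged(c(10, 20, 30))
is.na(v) <- 2
is.na(v)         # element 2 is now flagged as missing
unclass(v)[2]    # the underlying value is still stored
```

Code written in terms of is.na(x) <- ... would work unchanged on such a class, whereas x[i] <- NA would bypass the mask. That appears to be the kind of generalization the message has in mind.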
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Simon Fear suggested that

    a <- "a"
    a <- NA
    mode(a)
    [1] "logical"
    a <- "a"
    is.na(a) <- T
    mode(a)
    [1] "character"

might be a relevant difference between assigning NA and using is.na<-. But the analogy is flawed: is.na(x) <- operates on the _elements_ of x, while x <- affects the variable x. When you assign NA to _elements_ of a vector, the mode does not change:

    a <- "a"
    is.na(a) <- TRUE
    mode(a)
    [1] "character"
    b <- "b"
    b[TRUE] <- NA
    mode(b)
    [1] "character"
    c <- "c"
    c[1] <- NA
    mode(c)
    [1] "character"
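The distinction between rebinding a variable and replacing its elements can be put side by side in a short sketch. The last case, using the typed constant NA_character_, is an addition that was not part of the discussion.

```r
a <- "a"
a <- NA            # rebinds the name to a fresh logical NA
mode(a)            # "logical"

b <- "b"
b[1] <- NA         # replaces an element; the vector stays character
mode(b)            # "character"

d <- "d"
is.na(d) <- TRUE   # also element-wise, so the mode is kept
mode(d)            # "character"

# Rebinding with a typed NA constant keeps the character mode as well.
e <- NA_character_
mode(e)            # "character"
```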
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Concerning x[i] <- NA vs is.na(x[i]) <- TRUE, Brian Ripley wrote:

> I don't think it can ever `go wrong', but it can do things other than the
> user intends.

If the user writes x[i] <- NA, the user has clearly indicated his intention that the i element(s) of x should become NA. There isn't any clearer way to say that. The only way it could ever do something other than the user intends is if the mode of x changes or the selected elements are set to something other than NA. The ?NA help page *hints* that this might be the case, but does not give an example.

The question remains, *WHAT* can x[i] <- NA do that might be other than what the user intends? An example (especially one added to the ?NA help) would be very useful.

> The intention of is.na<- is clearer,

I find this extremely puzzling. x[i] <- NA is an extremely clear and obvious way of saying "I want the i element(s) of x to become NA." is.na(x) <- ... is not only an indirect way of doing this, it is a way which is confusing and error-prone.

Bear in mind that one way of implementing something like is.na() would be to associate a bit with each element of a vector; is.na() would test and is.na<-() would set that bit. It would be possible to have a language exactly like R -except- that

    x <- 1
    is.na(x) <- TRUE
    x  => NA
    is.na(x) <- FALSE
    x  => 1

would work. The very existence of an is.na<- which accepts a logical vector containing FALSE as well as TRUE strongly suggests this. But it doesn't work like that. As I've pointed out,

    is.logical(m) && length(m) == length(x) && done{is.na(x) <- m}
        => identical(is.na(x), m)

is the kind of behaviour that has been associated with well-behaved sinister function calls for several decades, and yet this is not a fact about R.

> and so perhaps user error is less likely?

I see no reason to believe this; the bad behaviour of is.na<- surely makes user error *more* likely rather than *less* likely.
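The round-trip property described here can be written as a small checker. `round_trips` is a hypothetical name chosen for this sketch; it applies is.na<- to a copy of x and reports whether is.na() then returns exactly what was assigned.

```r
# Hypothetical checker for the round-trip property stated above.
round_trips <- function(x, m) {
  stopifnot(is.logical(m), length(m) == length(x))
  is.na(x) <- m              # modifies the local copy only
  identical(is.na(x), m)
}

no_na_before  <- round_trips(c(1, 2, 3),  c(FALSE, FALSE, TRUE))
has_na_before <- round_trips(c(1, NA, 3), c(FALSE, FALSE, TRUE))

no_na_before    # the property holds when x had no NA to begin with
has_na_before   # it fails when an existing NA is "assigned" FALSE
```

The property therefore holds only for vectors without missing values, or for masks that are TRUE wherever x is already NA.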
Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
<Tongue in cheek>
But surely

    is.na(x) <- is.na(x)

is clearer than

    x[is.na(x)] <- NA

(neither of which is a no-op).
</Tongue in cheek>
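One possible reading of "neither of which is a no-op" involves NaN. This is an assumption, since the message does not say what it has in mind. is.na() is TRUE for NaN as well as NA, so either statement rewrites NaN elements as NA:

```r
x <- c(1, NaN, NA)
is.nan(x)              # only the second element is NaN

y <- x
y[is.na(y)] <- NA      # the NaN is selected too and overwritten with NA

z <- x
is.na(z) <- is.na(z)   # same selection, same effect

any(is.nan(y))
any(is.nan(z))
```

After either statement the vector still reports the same is.na() pattern, but the NaN is gone.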
is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says

    Function `is.na<-' may provide a safer way to set missingness.
    It behaves differently for factors, for example.

However, "MAY provide" is a bit scary, and it doesn't say WHAT the difference in behaviour is.

I must say that "is.na(x) <- ..." is rather repugnant, because it doesn't work. What do I mean? Well, as the designers of SETL, who many years ago coined the term "sinister function call" to talk about f(...) <- ..., pointed out, if you do f(x) <- y then afterwards you expect f(x) == y to be true. So let's try it:

    x <- c(1,NA,3)
    is.na(x) <- c(FALSE,FALSE,TRUE)
    x
    [1]  1 NA NA
    is.na(x)
    [1] FALSE  TRUE  TRUE

So I _assigned_ c(FALSE,FALSE,TRUE) to is.na(x), but I _got_ c(FALSE,TRUE,TRUE) instead; the second element differs. That is not how a well-behaved sinister function call should work, and it's enough to scare someone off is.na()<- forever.

The obvious way to set elements of a variable to missing is ... <- NA. Wouldn't it be better if that just plain worked?

Can someone give an example of is.na()<- and <-NA working differently with a factor? I just tried it:

    x <- factor(c(3,1,4,1,5,9))
    y <- x
    is.na(x) <- x==1
    y[y==1] <- NA
    x
    [1] 3    <NA> 4    <NA> 5    9
    Levels: 1 3 4 5 9
    y
    [1] 3    <NA> 4    <NA> 5    9
    Levels: 1 3 4 5 9

Both approaches seem to have given the same answer. What did I miss?
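The factor comparison can be made mechanical instead of being judged from printed output. The sketch below reruns the same example and compares values and levels separately; `same_values` and `same_levels` are names introduced here.

```r
x <- factor(c(3, 1, 4, 1, 5, 9))
y <- x

is.na(x) <- x == 1    # replacement-function route
y[y == 1] <- NA       # indexed-assignment route

# Compare element values (as labels) and the level sets separately.
same_values <- identical(as.character(x), as.character(y))
same_levels <- identical(levels(x), levels(y))

same_values
same_levels
levels(x)             # the level "1" is kept although no element uses it
```

This only confirms that the two routes agree for this one factor in the R version at hand; it says nothing about the older behaviour that the ?NA wording may have referred to.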