RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
I think the thread ended with several people (not only me) feeling certain they didn't like `is.na<-`, but with the developers defending it and me not really understanding why. Uwe Ligges was going to come up with an example of `<- NA` going wrong (sorry Brian R, I mean behaving unexpectedly), but never did, and I think the problem has since been fixed. It was apparently a problem with assigning NAs to an existing factor, but the code for `[<-.factor` looks pretty robust to me [not that I'm at all qualified to say that, be warned].

Interestingly, at some point both methods for `is.na<-` perform this operation:

    x[value] <- NA

Ahem.

By the way, `is.na(x) <- FALSE` will leave x unchanged (including leaving it as NA if it already was! How bad is that?!)

-----Original Message-----
From: Paul Lemmens [mailto:[EMAIL PROTECTED]
Sent: 14 October 2003 16:10
To: [EMAIL PROTECTED]
Subject: RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

> By accident I'm also toying around with NAs, so I started reading up on this thread but failed to find a concluding remark or piece of advice. [...]
> Can I relatively safely use the easy form, or is it better to remember (the hard way) the safer version? [...]

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
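To make Simon's two observations concrete, here is a small sketch run against a current R release (the 2003 internals may have differed in detail). It relies only on the documented fact that the right-hand side of `is.na<-` is used as an index, so `FALSE` selects nothing and an element that is already NA stays NA.

```r
# is.na<- uses its right-hand side as an index into x,
# in the same way as x[value] <- NA
x <- c(1, NA, 3)

y <- x
is.na(y) <- FALSE            # FALSE selects no elements
stopifnot(identical(y, x))   # unchanged: element 2 is still NA

z <- x
z[FALSE] <- NA               # the equivalent plain subassignment
stopifnot(identical(z, y))
```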
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Hello Simon,

--On Wednesday 15 October 2003 10:08 +0100 Simon Fear [EMAIL PROTECTED] wrote:

> By the way, `is.na(x) <- FALSE` will leave x unchanged (including leaving it as NA! How bad is that?!)

Twilight Zone (Golden Earring). But with that remark I'm getting off topic, so thank you for your summary. I've already memorized the is.na() construct, so I should be safe for the time being :-)

kind regards,
Paul
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
By accident I'm also toying around with NAs, so I started reading up on this thread but failed to find a concluding remark or piece of advice. As a naive R user I would have loved to see a comment like "do it like this". The prevailing opinion seemed to be that is.na()<- might be better (safer), but x <- NA is much clearer to understand. Can I relatively safely use the easy form, or is it better to remember (the hard way) the safer version? Has the discussion continued privately, or did it just stop here?

Personally I still find the fragments below (taken from the thread) very counterintuitive, not to say scary:

    x <- 1:10
    is.na(x) <- 1:5

and

    is.na(x) <- FALSE

As a layman, it's very hard to understand what happens, because the assignment seems to reverse its meaning in the first example (it actually takes indices 1:5 of x and assigns those elements the value NA), whereas in the second case it's not obvious what happens to x: will it get the value FALSE, or will the original value remain (*)?

IMHO the <- NA construct is much easier to understand and should be made safe in all possible situations (whatever the underlying safety problem or other difficulties might be).

kind regards,
Paul

(*) Such a remark will probably earn some kind of reprimand, because the answer is probably somewhere within the 10e6 manual pages, but I'm trying my luck here.

-- 
Paul Lemmens
NICI, University of Nijmegen      ASCII Ribbon Campaign against HTML Mail
Montessorilaan 3 (B.01.03)
NL-6525 HR Nijmegen
The Netherlands
Phone: +31-24-3612648
Fax: +31-24-3616066
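Paul's two fragments can be checked directly. The sketch below, assuming a current R release, compares `is.na(x) <- 1:5` with the ordinary subassignment `x[1:5] <- NA`, and then applies `is.na(x) <- FALSE` to show that the original values remain.

```r
# First fragment: the right-hand side is an index, not a value
x <- 1:10
is.na(x) <- 1:5
print(x)   # elements 1 to 5 are now NA

# The same result written as an ordinary subassignment
y <- 1:10
y[1:5] <- NA
stopifnot(identical(x, y))

# Second fragment: FALSE selects nothing, so x keeps its values
is.na(x) <- FALSE
stopifnot(identical(x, y))
```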
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
-----Original Message-----
From: Richard A. O'Keefe [mailto:[EMAIL PROTECTED]
[snip]

> The very existence of an is.na<- which accepts a logical vector containing FALSE as well as TRUE ...

And don't forget this is not the only usage of is.na<-. In fact it is designed to take any valid indexing value. For example:

    a <- 1:10
    is.na(a) <- 1:5
    a
    [1] NA NA NA NA NA  6  7  8  9 10

Wow. I really hate that. Someone tell me again why this is better than

    a[1:5] <- NA

??

Simon Fear
Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Richard A. O'Keefe wrote:

> I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA. [...]
> Can someone give an example of is.na()<- and <- NA working differently with a factor? [...]
> Both approaches seem to have given the same answer. What did I miss?

As mentioned in another mail to R-help, I'm pretty sure there was (is?) a problem with character (and/or factor) vectors and assignment of NAs, but I cannot (re)produce an example. I think something in the x <- NA case was fixed during the last year. What keeps me from thinking I'm completely confused is that the is.na()<- usage is recommended in ?NA, in S Programming, in the R Language Definition manual and in R's NEWS file, although I cannot find it in the green book right now.

Uwe Ligges
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Note this behaviour:

    a <- "a"
    a <- NA
    mode(a)
    [1] "logical"

    a <- "a"
    is.na(a) <- T
    mode(a)
    [1] "character"

However, after either way of assigning NA to a, is.na(a) is TRUE and a prints as NA, so I can't see it's ever likely to matter. [Why do I say these things? Expect the usual flood of examples where it does matter.]

Also, if a is a character vector, a[2] <- NA coerces the NA to as.character(NA); again, just as one would hope/expect.

I have to echo Richard O'K's remark: if <- NA can ever go wrong, is that not a bug rather than a feature?

Simon Fear
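A quick way to see Simon's point and its limits, again assuming a current R release: replacing the whole object changes its mode, while assigning NA into existing elements (by either route) keeps it.

```r
# Whole-object assignment replaces the variable, so its mode follows NA
a <- "a"
a <- NA
stopifnot(mode(a) == "logical")

# is.na<- assigns into the existing element, so the mode is kept
a <- "a"
is.na(a) <- TRUE
stopifnot(mode(a) == "character", is.na(a))

# Element assignment on a character vector coerces NA to NA_character_
b <- c("x", "y", "z")
b[2] <- NA
stopifnot(identical(b[2], NA_character_))
```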
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
On Wed, 8 Oct 2003, Simon Fear wrote:

> Note this behaviour: [...]
> I have to echo Richard O'K's remark: if <- NA can ever go wrong, is that not a bug rather than a feature?

I don't think it can ever `go wrong', but it can do things other than the user intends. The intention of is.na<- is clearer, and so perhaps user error is less likely? That is the thinking behind the function, anyway.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Well, that's a convincing argument, but maybe it's the name that's worrying some of us. Maybe it would be more intuitive if it were called set.na (sorry, I mean setNA). Also, is.na<- cannot be used to create a new variable of NAs, so it is not a universal method, which is a shame for its advocates.

I note also that for a vector you can assign a new NA using either TRUE or FALSE:

    a <- 1:3
    is.na(a[4]) <- F
    a
    [1]  1  2  3 NA

For a list, assigning F leaves the new element set to NULL. Mind you, I suspect this would be a particularly stupid thing to do, so I'm not going to lose any sleep over R's reaction to it.

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]

> I don't think it can ever `go wrong', but it can do things other than the user intends. The intention of is.na<- is clearer, and so perhaps user error is less likely? That is the thinking behind the function, anyway.

Simon Fear
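Simon's out-of-range example can be reproduced as below, assuming a current R release. The comments describe how R evaluates a replacement call of the form `is.na(a[4]) <- value`: it extracts `a[4]`, applies `is.na<-` to it, and writes the result back with `a[4] <- ...`.

```r
# a[4] is out of range, so it is extracted as NA; is.na<- with FALSE
# leaves that NA alone, and writing it back grows a to length 4.
a <- 1:3
is.na(a[4]) <- FALSE
print(a)
stopifnot(length(a) == 4, is.na(a[4]))

# For a list, the extracted l[4] is list(NULL), so FALSE leaves a
# NULL element rather than an NA.
l <- list(1, 2, 3)
is.na(l[4]) <- FALSE
stopifnot(length(l) == 4, is.null(l[[4]]))
```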
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Also, presumably is.na<- could be redefined by the user for particular classes, so if you got into the habit of setting NAs that way it would generalize better.

---
Date: Wed, 8 Oct 2003 11:49:29 +0100 (BST)
From: Prof Brian Ripley [EMAIL PROTECTED]

> I don't think it can ever `go wrong', but it can do things other than the user intends. The intention of is.na<- is clearer, and so perhaps user error is less likely? That is the thinking behind the function, anyway.
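As an illustration of that point, the sketch below defines a hypothetical class, `coded`, that stores missing values as the code -999 instead of NA, together with `is.na` and `is.na<-` methods for it. The class and both methods are invented for this example. Code that sets missingness with `is.na(v) <- i` picks up the class-specific method automatically, whereas `v[i] <- NA` would simply write an NA.

```r
# Hypothetical class "coded": missing values are stored as -999, not NA.
# Neither method below belongs to any package; both are illustrative.
coded <- function(x) structure(x, class = "coded")

# is.na() is internally generic, so this method is used for "coded" objects
is.na.coded <- function(x) unclass(x) == -999

# Replacement method: write the missing-value code instead of NA
"is.na<-.coded" <- function(x, value) {
  y <- unclass(x)
  y[value] <- -999
  coded(y)
}

v <- coded(c(5, 6, 7))
is.na(v) <- 2   # dispatches to is.na<-.coded
stopifnot(identical(unclass(v), c(5, -999, 7)),
          identical(is.na(v), c(FALSE, TRUE, FALSE)))
```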
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Simon Fear [EMAIL PROTECTED] suggested that

    a <- "a"
    a <- NA
    mode(a)
    [1] "logical"

    a <- "a"
    is.na(a) <- T
    mode(a)
    [1] "character"

might be a relevant difference between assigning NA and using is.na<-. But the analogy is flawed: is.na(x) <- ... operates on the _elements_ of x, while x <- ... replaces the variable x itself. When you assign NA to _elements_ of a vector, the mode does not change:

    a <- "a"
    is.na(a) <- TRUE
    mode(a)
    [1] "character"

    b <- "b"
    b[TRUE] <- NA
    mode(b)
    [1] "character"

    c <- "c"
    c[1] <- NA
    mode(c)
    [1] "character"
RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
Concerning

    x[i] <- NA    vs    is.na(x[i]) <- TRUE

Brian Ripley wrote:

> I don't think it can ever `go wrong', but it can do things other than the user intends.

If the user writes x[i] <- NA, the user has clearly indicated his intention that the i element(s) of x should become NA. There isn't any clearer way to say that. The only way it could ever do something other than the user intends is if the mode of x changes or the selected elements are set to something other than NA. The ?NA help page *hints* that this might be the case, but does not give an example. The question remains: *WHAT* can x[i] <- NA do that might be other than what the user intends? An example (especially one added to the ?NA help page) would be very useful.

> The intention of is.na<- is clearer,

I find this extremely puzzling. x[i] <- NA is an extremely clear and obvious way of saying "I want the i element(s) of x to become NA". is.na(x) <- ... is not only an indirect way of doing this, it is a way which is confusing and error-prone.

Bear in mind that one way of implementing something like is.na() would be to associate a bit with each element of a vector; is.na() would test and is.na<-() would set that bit. It would be possible to have a language exactly like R -except- that

    x <- 1
    is.na(x) <- TRUE     => x = NA
    is.na(x) <- FALSE    => x = 1

would work. The very existence of an is.na<- which accepts a logical vector containing FALSE as well as TRUE strongly suggests this. But it doesn't work like that. As I've pointed out,

    is.logical(m) && length(m) == length(x) && done{is.na(x) <- m}
        => identical(is.na(x), m)

is the kind of behaviour that has been associated with well-behaved sinister function calls for several decades, and yet this is not a fact about R.

> and so perhaps user error is less likely?

I see no reason to believe this; the bad behaviour of is.na<- surely makes user error *more* likely rather than *less* likely.
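Richard's property can be written as an ordinary R check. His `done{...}` notation is not R, so the helper below, `check_sinister()`, is a hypothetical translation: it reports whether `is.na(x)` equals `m` after `is.na(x) <- m`, in a current R release.

```r
# Hypothetical helper: does is.na(x) <- m leave is.na(x) identical to m?
check_sinister <- function(x, m) {
  stopifnot(is.logical(m), length(m) == length(x))
  is.na(x) <- m            # modifies the local copy of x only
  identical(is.na(x), m)
}

check_sinister(c(1, 2, 3), c(FALSE, FALSE, TRUE))   # TRUE: no prior NAs
check_sinister(c(1, NA, 3), c(FALSE, FALSE, TRUE))  # FALSE: element 2 stays NA
```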
Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
<Tongue in cheek>
But surely

    is.na(x) <- is.na(x)

is clearer than

    x[is.na(x)] <- NA

(neither of which is a no-op).
</Tongue in cheek>
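One concrete reason neither statement is a no-op: `is.na()` is TRUE for NaN as well as NA, so both forms replace NaN with a plain NA. A small check, assuming double vectors in a current R release:

```r
x <- c(1, NaN, NA)
y <- x

is.na(x) <- is.na(x)   # index is c(FALSE, TRUE, TRUE)
y[is.na(y)] <- NA      # same index, same effect

# The NaN has become NA in both, so neither line was a no-op
stopifnot(identical(x, y), !any(is.nan(x)), sum(is.na(x)) == 2)
```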
[R] Beginner's query - segmentation fault
I am dealing with a huge matrix in R (20 columns, 54000 rows) and have lots of missing values within the dataset, which are currently displayed as the value -999.00. I am trying to create a new matrix (or change the existing one) to display these values as NA so that I can then perform the necessary analysis on the columns within the matrix. The matrix name is temp and the column names are t1 to t20 inclusive.

I have tried the following command:

    temp$t1[temp$t1 == -999.00] <- NA

and it returns a segmentation fault. Can someone tell me what I am doing wrong?

Thanks
Laura
Re: [R] Beginner's query - segmentation fault
Laura Quinn [EMAIL PROTECTED] writes:

> I am dealing with a huge matrix in R (20 columns, 54000 rows) [...]
> I have tried the following command:
>
>     temp$t1[temp$t1 == -999.00] <- NA
>
> and it returns a segmentation fault, can someone tell me what I am doing wrong?

Not telling us which system and which version you are using, and not giving us a reproducible example... OK, the latter can be tricky, but does it happen all the time? Only after doing X? Also if you deal with a subset of the data?

The command as such should work as far as I can see, and segmentation faults should basically not happen unless the user has been messing about at the C code level. (BTW, that's a data frame, not a matrix, I assume.)

-- 
   O__    Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics  2200 Cph. N
 (*) \(*) -- University of Copenhagen Denmark   Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                       FAX: (+45) 35327907
Re: [R] Beginner's query - segmentation fault
On Tue, 7 Oct 2003, Laura Quinn wrote:

> [...] I have tried the following command:
>
>     temp$t1[temp$t1 == -999.00] <- NA
>
> and it returns a segmentation fault, can someone tell me what I am doing wrong?

Well, R should not segfault, so there is a bug here somewhere. However, I don't think what you have described can actually work. Is temp really a matrix? If so, temp$t1 will return NULL, and you should get an error message.

If temp is a matrix,

    temp[temp == -999.00] <- NA

will do what you want.

If, as is more likely, temp is a data frame with all columns numeric, there are several ways to do this, e.g.

    temp[] <- lapply(temp, function(x) ifelse(x == -999, NA, x))
    temp[as.matrix(temp) == -999] <- NA   # only in recent versions of R

as well as explicit looping over columns.

-- 
Brian D. Ripley
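The two data frame recipes can be tried on a small stand-in for Laura's data. `temp_df` below is made-up data, not hers; the checks confirm that both recipes leave the same values, and that a numeric matrix needs only logical indexing.

```r
# Made-up stand-in data: -999 marks a missing value
temp_df <- data.frame(t1 = c(1.5, -999, 3), t2 = c(-999, 2, 4))

# Recipe 1: rebuild every column with ifelse()
df1 <- temp_df
df1[] <- lapply(df1, function(x) ifelse(x == -999, NA, x))

# Recipe 2: index the data frame with a logical matrix
df2 <- temp_df
df2[as.matrix(df2) == -999] <- NA

stopifnot(isTRUE(all.equal(df1, df2)), sum(is.na(df1)) == 2)

# For a genuine numeric matrix, logical indexing is enough
temp_mat <- as.matrix(temp_df)
temp_mat[temp_mat == -999] <- NA
stopifnot(sum(is.na(temp_mat)) == 2)
```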
RE: [R] Beginner's query - segmentation fault
I cannot explain the segmentation fault, but try this instead (which works for matrices):

    temp[which(temp == -999, arr.ind = T)] <- NA

Are you sure temp is a matrix and not a data frame? Use class(temp) to find out.

Also, if you are getting these -999.00 values because you have read files containing them, it might be easier to code the missing values when reading in. Try

    read.table(file = "lala.txt", na.strings = "-999.00")

-- 
Adaikalavan Ramasamy

-----Original Message-----
From: Laura Quinn [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 8:04 PM
To: [EMAIL PROTECTED]
Subject: [R] Beginner's query - segmentation fault

> I am dealing with a huge matrix in R (20 columns, 54000 rows) [...]
> I have tried the following command:
>     temp$t1[temp$t1 == -999.00] <- NA
> and it returns a segmentation fault, can someone tell me what I am doing wrong?
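Adaikalavan's read-time suggestion can be tried without a file using the `text=` argument that later versions of `read.table()` accept. The inline data is invented for the example. Note that `na.strings` is compared with each field as written, so a file that writes `-999` instead of `-999.00` would need that spelling too.

```r
# Invented file contents standing in for "lala.txt"
file_text <- "t1 t2
1.5 -999.00
-999.00 2.0
3.0 4.0"

temp <- read.table(text = file_text, header = TRUE, na.strings = "-999.00")

# The coded values arrive as NA and the columns stay numeric
stopifnot(sum(is.na(temp)) == 2, is.numeric(temp$t1), is.numeric(temp$t2))
```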
Re: [R] Beginner's query - segmentation fault
Laura Quinn wrote:

> [...] I have tried the following command:
>
>     temp$t1[temp$t1 == -999.00] <- NA
>
> and it returns a segmentation fault, can someone tell me what I am doing wrong?

The crash for this inappropriate usage has already been fixed for R-1.7.1, so you are using an outdated version, I guess.

1. If temp is a matrix, you have to use matrix indexing, not data.frame or list indexing; see the manuals. Now we have the (still wrong) line

    temp[temp[, "t1"] == -999.00, "t1"] <- NA

2. Use is.na(x) <- TRUE instead of x <- NA:

    is.na(temp[temp[, "t1"] == -999.00, "t1"]) <- TRUE

Or change all values -999 to NA in the whole matrix by

    is.na(temp[temp == -999.00]) <- TRUE

Uwe Ligges
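Uwe's matrix forms, plus Adaikalavan's `which(..., arr.ind = TRUE)` variant, can be compared on a small made-up matrix with columns `t1` and `t2` (a stand-in, not Laura's data). In this sketch all of the whole-matrix routes produce the same result.

```r
# Made-up numeric matrix with column names t1 and t2
temp <- matrix(c(1.5, -999, 3, -999, 2, 4), ncol = 2,
               dimnames = list(NULL, c("t1", "t2")))

# One column, using matrix indexing rather than temp$t1
m1 <- temp
is.na(m1[m1[, "t1"] == -999, "t1"]) <- TRUE

# The whole matrix via is.na<-
m2 <- temp
is.na(m2[m2 == -999]) <- TRUE

# The whole matrix via plain assignment
m3 <- temp
m3[m3 == -999] <- NA

# The whole matrix via (row, column) index pairs from which()
m4 <- temp
m4[which(m4 == -999, arr.ind = TRUE)] <- NA

stopifnot(sum(is.na(m1)) == 1, identical(m2, m3), identical(m3, m4))
```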
Re: [R] Beginner's query - segmentation fault
Thanks, I have used

    temp[temp == 0] <- NA

and this seems to have worked, though it won't let me access individual columns (i.e. temp$t1 etc.) to work on. Is there any real advantage in using a matrix, or would I be better advised to deal with data frames? (I have double-checked, and temp is currently a matrix.)

On Tue, 7 Oct 2003, Prof Brian Ripley wrote:

> Well, R should not segfault, so there is a bug here somewhere. [...]
> If temp is a matrix,
>     temp[temp == -999.00] <- NA
> will do what you want. [...]
Re: [R] Beginner's query - segmentation fault
Adaikalavan RAMASAMY wrote:

> I cannot explain the segmentation fault, but try this instead (which works for matrices):
>
>     temp[which(temp == -999, arr.ind = T)] <- NA

No! Please *do* use is.na()<- !!!

Uwe Ligges

> Are you sure temp is a matrix and not a data frame? Use class(temp) to find out. [...]
Re: [R] Beginner's query - segmentation fault
On Tue, 7 Oct 2003, Laura Quinn wrote:

> Thanks, I have used temp[temp == 0] <- NA and this seems to have worked, though it won't let me access individual columns (i.e. temp$t1 etc.) to work on. Is there any real advantage in using a matrix, or would I be better advised to deal with data frames? (I have double-checked, and temp is currently a matrix.)

Things are going to be a lot faster for a numerical matrix than a data frame: the advantage of data frames is that the columns can be of different types.

BTW, you should really use temp[, "t1"] for a data frame or a matrix: temp$t1 works for data frames `by the back door' and has a number of bugs (including failing to detect errors which corrupt the data frame) prior to 1.8.0 (to be).

-- 
Brian D. Ripley
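A short check of Brian's point about column access, using a made-up two-column matrix: `[, "t1"]` works for both a matrix and a data frame, while `$` is list and data frame syntax. The error branch reflects current R; as Brian noted earlier in the thread, the versions of 2003 returned NULL instead.

```r
mat <- matrix(c(1, 2, 3, 4), ncol = 2, dimnames = list(NULL, c("t1", "t2")))
dat <- as.data.frame(mat)

# Extraction by column name behaves the same for both
stopifnot(identical(mat[, "t1"], dat[, "t1"]))

# `$` on a matrix (an atomic vector with dimensions) signals an error here
res <- tryCatch(mat$t1, error = function(e) "error")
stopifnot(identical(res, "error"))

# On the data frame, `$` works
stopifnot(identical(dat$t1, c(1, 2)))
```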
Re: [R] Beginner's query - segmentation fault
Laura Quinn wrote:

> Thanks, I have used
>     temp[temp == 0] <- NA

Please use

    is.na(temp[temp == 0]) <- TRUE

> and this seems to have worked, though it won't let me access individual columns (i.e. temp$t1 etc.)

No! temp$t1 is a list element or a column of a data.frame, but not a column of a matrix. *PLEASE* read the manuals, help pages, or books on R on how to index / extract elements. Please read my previous answer on how to access individual columns.

> to work on. Is there any real advantage in using a matrix, or would I be better advised to deal with data frames? (I have double-checked, and temp is currently a matrix.)

Working on matrices is supposed to be faster. But matrices have the restriction of one data type for all columns (e.g. numeric).

Uwe Ligges
is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says

    Function `is.na<-' may provide a safer way to set missingness.
    It behaves differently for factors, for example.

However, "MAY provide" is a bit scary, and it doesn't say WHAT the difference in behaviour is.

I must say that is.na(x) <- ... is rather repugnant, because it doesn't work. What do I mean? Well, as the designers of SETL, who many years ago coined the term "sinister function call" to talk about f(...) <- ..., pointed out, if you do

    f(x) <- y

then afterwards you expect f(x) == y to be true. So let's try it:

    x <- c(1, NA, 3)
    is.na(x) <- c(FALSE, FALSE, TRUE)
    x
    [1]  1 NA NA
    is.na(x)
    [1] FALSE  TRUE  TRUE

So I _assigned_ c(FALSE, FALSE, TRUE) to is.na(x), but I _got_ c(FALSE, TRUE, TRUE) instead. That is not how a well-behaved sinister function call should work, and it's enough to scare someone off is.na()<- forever.

The obvious way to set elements of a variable to missing is ... <- NA. Wouldn't it be better if that just plain worked?

Can someone give an example of is.na()<- and <- NA working differently with a factor? I just tried it:

    x <- factor(c(3, 1, 4, 1, 5, 9))
    y <- x
    is.na(x) <- x == 1
    y[y == 1] <- NA
    x
    [1] 3    <NA> 4    <NA> 5    9
    Levels: 1 3 4 5 9
    y
    [1] 3    <NA> 4    <NA> 5    9
    Levels: 1 3 4 5 9

Both approaches seem to have given the same answer. What did I miss?
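Richard's factor comparison can be rerun as an assertion rather than by eye. This sketch assumes a current R release, in which `is.na<-` has a factor method that keeps the existing levels; it confirms that both routes give identical factors, including the now-unused level "1".

```r
x <- factor(c(3, 1, 4, 1, 5, 9))
y <- x

is.na(x) <- x == 1   # is.na<- route (factor method)
y[y == 1] <- NA      # plain subassignment route ([<-.factor)

# Same codes, same levels, same class
stopifnot(identical(x, y))

# Neither route drops the level "1", even though no element uses it now
stopifnot("1" %in% levels(x), "1" %in% levels(y))
```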