RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-15 Thread Simon Fear
I think the thread ended up with several people (not only me)
feeling certain they didn't like `is.na<-`, but with the
developers defending it and me not really understanding
why.

Uwe Ligges was going to come up with an example of
`<- NA` going wrong (sorry Brian R, I mean behaving
unexpectedly), but never did, and I think the problem
has been fixed. It was apparently a problem with assigning
NAs to an existing factor, but the code for `[<-.factor`
looks pretty robust to me [not that I'm at all qualified to say
that, be warned]. Interestingly, at some point both methods
for `is.na<-` perform this operation: x[value] <- NA. Ahem.

By the way, `is.na(x) <- FALSE` will leave x unchanged (including
leaving it as NA! How bad is that?!)
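A minimal base-R check of the claim above (the vector `x` is made up for illustration, assuming a current R version): a length-one `FALSE` used as an index selects nothing, so an existing NA is not cleared.

```r
x <- c(1, NA, 3)
y <- x
is.na(y) <- FALSE            # FALSE as an index selects no elements
unchanged <- identical(x, y) # TRUE: y[2] is still NA, nothing was "un-missed"
print(unchanged)
print(y)
```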

 -Original Message-
 From: Paul Lemmens [mailto:[EMAIL PROTECTED]
 Sent: 14 October 2003 16:10
 To: [EMAIL PROTECTED]
 Subject: RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)
 
 
 
 
 By accident I'm also toying around with NAs, so I started reading up on
 this thread but failed to find a 'concluding' remark or advice. As a naive
 R user I would have loved to see a comment saying "do it like this."
 
 The prevailing opinion seemed to be that is.na() might be better (safer)
 but x <- NA is much clearer to understand. Can I relatively safely use the
 easy form, or is it better to remember (the hard way) the safer version?
 Has the discussion continued privately or just stopped here?
 
 Personally I still find the fragments below (taken from the thread) very
 counter-intuitive, not to say scary.
 
 x <- 1:10
 is.na(x) <- 1:5
 
 and
 
 is.na(x) <- FALSE
 
 
 It's very hard to understand what happens (as a layman) because the
 assignment seems to reverse in meaning in the first example (actually
 taking indices 1:5 of x and assigning those the value NA) whereas in the
 second case it's not obvious what happens to x: will it get the value FALSE
 or will the original value remain(*)?
 
 IMHO the <- NA construct is much easier to understand and should be made
 safe in all possible situations (whatever the underlying safety problem or
 other difficulties might be).
 
 
 kind regards,
 Paul
 
 (*) Such a remark will probably lead to some kind of reprimand because it's
 probably somewhere within the 10e6 manual pages but I'm trying my luck here.
 
 
 -- 
 Paul Lemmens
 NICI, University of Nijmegen  ASCII Ribbon Campaign /\
 Montessorilaan 3 (B.01.03)Against HTML Mail \ /
 NL-6525 HR Nijmegen  X
 The Netherlands / \
 Phonenumber+31-24-3612648
 Fax+31-24-3616066
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help

 

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 69
Fax: +44 (0) 1379 65
email: [EMAIL PROTECTED]
web: http://www.synequanon.com
 
Number of attachments included with this message: 0
 
This message (and any associated files) is confidential and\...{{dropped}}



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-15 Thread Paul Lemmens
Hello Simon,

--On Wednesday 15 October 2003 10:08 +0100 Simon Fear
[EMAIL PROTECTED] wrote:

 By the way, `is.na(x) <- FALSE` will leave x unchanged (including
 leaving it as NA! How bad is that?!)
Twilight Zone (Golden Earring). But with that remark I'm getting off topic, 
so thank you for your summary. I've already memorized the is.na() 
construct, so I should be safe for the time being :

kind regards,
Paul




RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-14 Thread Paul Lemmens
By accident I'm also toying around with NAs, so I started reading up on
this thread but failed to find a 'concluding' remark or advice. As a naive
R user I would have loved to see a comment saying "do it like this."

The prevailing opinion seemed to be that is.na() might be better (safer)
but x <- NA is much clearer to understand. Can I relatively safely use the
easy form, or is it better to remember (the hard way) the safer version?
Has the discussion continued privately or just stopped here?

Personally I still find the fragments below (taken from the thread) very
counter-intuitive, not to say scary.

x <- 1:10
is.na(x) <- 1:5

and

is.na(x) <- FALSE

It's very hard to understand what happens (as a layman) because the
assignment seems to reverse in meaning in the first example (actually
taking indices 1:5 of x and assigning those the value NA) whereas in the
second case it's not obvious what happens to x: will it get the value FALSE
or will the original value remain(*)?

IMHO the <- NA construct is much easier to understand and should be made
safe in all possible situations (whatever the underlying safety problem or
other difficulties might be).

kind regards,
Paul
(*) Such a remark will probably lead to some kind of reprimand because it's
probably somewhere within the 10e6 manual pages but I'm trying my luck here.



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-09 Thread Simon Fear

 -Original Message-
 From: Richard A. O'Keefe [mailto:[EMAIL PROTECTED]
<snip>

 The very existence of an is.na<- which accepts a logical
 vector containing FALSE as well as TRUE ...

And don't forget this is not the only usage of is.na<-. In fact it is
designed to take any valid indexing value. For example:

> a <- 1:10
> is.na(a) <- 1:5
> a
 [1] NA NA NA NA NA  6  7  8  9 10

Wow. I really hate that. Someone tell me again why this is
better than a[1:5] <- NA??
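A quick sketch comparing the two forms on the same data (assuming a current R version, where the default `is.na<-` method essentially performs `x[value] <- NA`): they produce identical results.

```r
a <- 1:10
b <- 1:10
is.na(a) <- 1:5   # value is treated as an index: elements 1 to 5 become NA
b[1:5] <- NA      # the direct subassignment form
same <- identical(a, b)
print(same)       # TRUE
print(a)
```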
 



Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Uwe Ligges
Richard A. O'Keefe wrote:

I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says
 Function `is.na<-' may provide a safer way to set missingness. It
 behaves differently for factors, for example.
However, "MAY provide" is a bit scary, and it doesn't say WHAT the
difference in behaviour is.
I must say that is.na(x) <- ... is rather repugnant, because it doesn't
work.  What do I mean?  Well, as the designers of SETL who many years ago
coined the term "sinister function call" to talk about f(...) <- ...,
pointed out, if you do
f(x) <- y
then afterwards you expect
f(x) == y
to be true.  So let's try it:
> x <- c(1,NA,3)
> is.na(x) <- c(FALSE,FALSE,TRUE)
> x
[1]  1 NA NA
> is.na(x)
[1] FALSE  TRUE  TRUE
So I _assigned_ c(FALSE,FALSE,TRUE) to is.na(x),
but I _got_ c(FALSE,TRUE, TRUE) instead (look at the second element).
That is not how a well-behaved sinister function call should work,
and it's enough to scare someone off is.na()<- forever.
The obvious way to set elements of a variable to missing is ... <- NA.
Wouldn't it be better if that just plain worked?
Can someone give an example of is.na()<- and <- NA working differently
with a factor?  I just tried it:
> x <- factor(c(3,1,4,1,5,9))
> y <- x
> is.na(x) <- x==1
> y[y==1] <- NA
> x
[1] 3    <NA> 4    <NA> 5    9   
Levels: 1 3 4 5 9
> y
[1] 3    <NA> 4    <NA> 5    9   
Levels: 1 3 4 5 9


As mentioned in another mail to R-help, I'm pretty sure there was (is?)
a problem with character (and/or factor) and assignment of NAs, but I
cannot (re)produce an example. I think something for the x <- NA case
has been fixed during the last year.
What prevents me from thinking I'm completely confused is that the is.na()<-
usage is proposed in: ?NA, S Programming, the R Language Definition
manual, and R's NEWS file, but I cannot find it in the green book right now.

Uwe Ligges



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Simon Fear
Note this behaviour:

> a <- "a"
> a <- NA
> mode(a)
[1] "logical"
> a <- "a"
> is.na(a) <- T
> mode(a)
[1] "character"

However, after either way of assigning NA to a, is.na(a) is TRUE,
and it prints as NA, so I can't see it's ever likely to matter. [Why
do I say these things? Expect the usual flood of examples where it
does matter.]

Also, if a is a character vector, a[2] <- NA coerces the NA to
as.character(NA); again, just as one would hope/expect.

I have to echo Richard O'K's remark: if <- NA can ever go wrong,
is that not a bug rather than a feature?
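A small base-R sketch of the distinction being drawn (variable names are illustrative): rebinding the whole variable with `<- NA` yields a logical NA, whereas setting an element, by either route, keeps the character mode and stores `NA_character_`.

```r
a <- "a"
a <- NA                 # rebinds a: now a logical NA
b <- "b"
is.na(b) <- TRUE        # sets the element: b stays character
v <- c("x", "y", "z")
v[2] <- NA              # the logical NA is coerced to NA_character_
print(c(mode(a), mode(b), mode(v)))    # "logical" "character" "character"
print(identical(v[2], NA_character_))  # TRUE
```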
 



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Prof Brian Ripley
On Wed, 8 Oct 2003, Simon Fear wrote:

 Note this behaviour:
 
 > a <- "a"
 > a <- NA
 > mode(a)
 [1] "logical"
 > a <- "a"
 > is.na(a) <- T
 > mode(a)
 [1] "character"
 
 However, after either way of assigning NA to a, is.na(a) is TRUE,
 and it prints as NA, so I can't see it's ever likely to matter. [Why
 do I say these things? Expect the usual flood of examples where it
 does matter.]
 
 Also, if a is a character vector, a[2] <- NA coerces the NA to
 as.character(NA); again, just as one would hope/expect.
 
 I have to echo Richard O'K's remark: if <- NA can ever go wrong,
 is that not a bug rather than a feature?

I don't think it can ever `go wrong', but it can do things other than the
user intends.  The intention of is.na<- is clearer, and so perhaps user
error is less likely?  That is the thinking behind the function, anyway.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Simon Fear
Well, that's a convincing argument, but maybe
it's the name that's worrying some of us. Maybe it would be
more intuitive if called set.na (sorry, I mean setNA).

Also, is.na<- cannot be used to create a new variable of
NAs, so it is not a universal method, which is a shame for its
advocates.

I note also that for a vector you can assign a new NA using
either TRUE or FALSE:

> a <- 1:3
> is.na(a[4]) <- F
> a
[1]  1  2  3 NA

For a list, assigning F leaves the new element set to NULL.
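The behaviour above can be reproduced in a current R as sketched below; the list case and the attempt on an undefined variable (`brand_new_var`, a hypothetical name chosen only for this example) illustrate the two limitations mentioned.

```r
a <- 1:3
is.na(a[4]) <- FALSE   # a[4] is out of range, hence NA, and FALSE keeps it NA
print(a)               # 1 2 3 NA

l <- list(1, 2, 3)
is.na(l[4]) <- FALSE   # l[4] is list(NULL); FALSE selects nothing
print(length(l))       # 4
print(is.null(l[[4]])) # TRUE

# is.na<- needs an existing object, so it cannot create a new variable
res <- tryCatch({ is.na(brand_new_var) <- 1; "created" },
                error = function(e) "error")
print(res)             # "error"
```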

Mind you, I suspect this would be a particularly stupid thing 
to do, so I'm not going to lose any sleep over R's reaction to it.

 
 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 I don't think it can ever `go wrong', but it can do things other than
 the user intends.  The intention of is.na<- is clearer, and so
 perhaps user error is less likely?  That is the thinking behind the
 function, anyway.
 



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Gabor Grothendieck


Also, presumably is.na<- could be redefined by the user for particular
classes, so if you got in the habit of setting NAs that way it would
generalize better.
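To illustrate the point, here is a sketch of a class-specific method. The class name `sentinel` and its convention of storing missing values as -999 are hypothetical, invented only for this example; code written as `is.na(x) <- i` picks up such a method automatically, while `x[i] <- NA` falls through to the default subassignment and stores a real NA.

```r
# Hypothetical class "sentinel": missing values are stored as -999, not NA.
`is.na<-.sentinel` <- function(x, value) {
  cls <- class(x)
  x <- unclass(x)
  x[value] <- -999       # record missingness with the sentinel code
  class(x) <- cls
  x
}
is.na.sentinel <- function(x) unclass(x) == -999

s <- structure(c(5, 6, 7), class = "sentinel")
is.na(s) <- 2            # dispatches to is.na<-.sentinel
print(unclass(s))        # 5 -999 7
print(is.na(s))          # FALSE TRUE FALSE

u <- s
u[2] <- NA               # default [<- : stores a real NA, bypassing the convention
print(unclass(u))        # 5 NA 7
```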

--- 
Date: Wed, 8 Oct 2003 11:49:29 +0100 (BST) 
From: Prof Brian Ripley [EMAIL PROTECTED]

I don't think it can ever `go wrong', but it can do things other than the
user intends. The intention of is.na<- is clearer, and so perhaps user
error is less likely? That is the thinking behind the function, anyway.





RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Richard A. O'Keefe
Simon Fear [EMAIL PROTECTED] suggested that

> a <- "a"
> a <- NA
> mode(a)
[1] "logical"
> a <- "a"
> is.na(a) <- T
> mode(a)
[1] "character"

might be a relevant difference between assigning NA and using is.na.
But the analogy is flawed:  is.na(x) <- operates on the _elements_ of
x, while x <- affects the variable x.  When you assign NA to
_elements_ of a vector, the mode does not change:

> a <- "a"
> is.na(a) <- TRUE
> mode(a)
[1] "character"
> b <- "b"
> b[TRUE] <- NA
> mode(b)
[1] "character"
> c <- "c"
> c[1] <- NA
> mode(c)
[1] "character"



RE: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Richard A. O'Keefe
Concerning  x[i] <- NA  vs  is.na(x[i]) <- TRUE
Brian Ripley wrote:

I don't think it can ever `go wrong', but it can do things other
than the user intends.

If the user writes x[i] <- NA, the user has clearly indicated his intention
that the i element(s) of x should become NA.  There isn't any clearer way to
say that.  The only way it could ever do something other than the user
intends is if the mode of x changes or the selected elements are set to
something other than NA.

The ?NA help page *hints* that this might be the case, but does not give
an example.

The question remains, *WHAT* can x[i] <- NA do that might be other than
what the user intends?  An example (especially one added to the ?NA help)
would be very useful.

The intention of is.na<- is clearer,

I find this extremely puzzling.  x[i] <- NA is an extremely clear and
obvious way of saying "I want the i element(s) of x to become NA".
is.na(x) <- ... is not only an indirect way of doing this, it is a way
which is confusing and error-prone.

Bear in mind that one way of implementing something like is.na() would be
to associate a bit with each element of a vector; is.na() would test and
is.na<-() would set that bit.  It would be possible to have a language
exactly like R -except- that

x <- 1
is.na(x) <- TRUE
x
=>  NA
is.na(x) <- FALSE
x
=>  1

would work.  The very existence of an is.na<- which accepts a logical
vector containing FALSE as well as TRUE strongly suggests this.  But it
doesn't work like that.  As I've pointed out,
is.logical(m) && length(m) == length(x) && done{is.na(x) <- m}
=>  identical(is.na(x), m)
is the kind of behaviour that has been associated with well-behaved
sinister function calls for several decades, and yet this is not a fact
about R.
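O'Keefe's round-trip property can be written as a small check. `roundtrips` is a hypothetical helper for illustration, not part of R; in a current R the property holds only when `x` has no NAs at the positions where `m` is `FALSE`.

```r
# Hypothetical helper: does is.na(x) <- m leave is.na(x) equal to m?
roundtrips <- function(x, m) {
  stopifnot(is.logical(m), length(m) == length(x))
  is.na(x) <- m          # modifies a local copy of x
  identical(is.na(x), m)
}

ok1 <- roundtrips(c(1, 2, 3), c(FALSE, FALSE, TRUE))   # no pre-existing NA
ok2 <- roundtrips(c(1, NA, 3), c(FALSE, FALSE, TRUE))  # the old NA survives
print(c(ok1, ok2))     # TRUE FALSE
```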

and so perhaps user error is less likely?

I see no reason to believe this; the bad behaviour of is.na<- surely
makes user error *more* likely rather than *less* likely.



Re: is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-08 Thread Duncan Murdoch
<Tongue in cheek>

But surely

 is.na(x) <- is.na(x)

is clearer than

 x[is.na(x)] <- NA

(neither of which is a no-op).

</Tongue in cheek>
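Duncan may have had other cases in mind; this sketch shows one concrete sense in which neither line is a no-op in a current R: `is.na()` is also `TRUE` for NaN, so both lines overwrite NaN with a plain NA.

```r
x <- c(1, NaN, NA)
y <- x
y[is.na(y)] <- NA        # NaN is replaced by NA
z <- x
is.na(z) <- is.na(z)     # same effect via the replacement function
print(is.nan(x))         # FALSE  TRUE FALSE
print(is.nan(y))         # FALSE FALSE FALSE
print(identical(y, z))   # TRUE
```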



Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Peter Dalgaard BSA
Laura Quinn [EMAIL PROTECTED] writes:

 I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
 lots of missing values within the dataset which are currently displayed as
 the value -999.00. I am trying to create a new matrix (or change the
 existing one) to display these values as NA so that I can then perform
 the necessary analysis on the columns within the matrix.
 
 The matrix name is temp and the column names are t1 to t20 inclusive.
 
 I have tried the following command:
 
 temp$t1[temp$t1 == -999.00] <- NA
 
 and it returns a segmentation fault, can someone tell me what I am doing
 wrong?

Not telling us which system and which version you are using, and not
giving us a reproducible example... OK, the latter can be tricky, but
does it happen all the time? Only after doing X? Also if you deal with
a subset of data? 

The command as such should work as far as I can see, and segmentation
faults should basically not happen unless the user has been messing
about at the C code level. 

(BTW, that's a data frame, not a matrix, I assume.)

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Prof Brian Ripley
On Tue, 7 Oct 2003, Laura Quinn wrote:

 I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
 lots of missing values within the dataset which are currently displayed as
 the value -999.00. I am trying to create a new matrix (or change the
 existing one) to display these values as NA so that I can then perform
 the necessary analysis on the columns within the matrix.
 
 The matrix name is temp and the column names are t1 to t20 inclusive.
 
 I have tried the following command:
 
 temp$t1[temp$t1 == -999.00] <- NA
 
 and it returns a segmentation fault, can someone tell me what I am doing
 wrong?

Well, R should not segfault, so there is a bug here somewhere.  However, I
don't think what you have described can actually work. Is temp really a
matrix?  If so, temp$t1 will return NULL, and you should get an error
message.


If temp is a matrix,

temp[temp == -999.00] <- NA

will do what you want.


If, as is more likely, temp is a data frame with all columns numeric,
there are several ways to do this, e.g.

temp[] <- lapply(temp, function(x) ifelse(x == -999, NA, x))

temp[as.matrix(temp) == -999] <- NA  # only in recent versions of R

as well as explicit looping over columns.
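A runnable sketch of the recipes above on a tiny stand-in for `temp` (two columns named as in the question; the values are invented, and a current R version is assumed):

```r
m <- matrix(c(1, -999, 3, 4, 5, -999), ncol = 2,
            dimnames = list(NULL, c("t1", "t2")))
df <- as.data.frame(m)

# Matrix: logical-matrix indexing replaces every -999
m[m == -999] <- NA

# Data frame, recipe 1: column-wise ifelse via lapply, keeping the structure
df1 <- df
df1[] <- lapply(df1, function(x) ifelse(x == -999, NA, x))

# Data frame, recipe 2: index the data frame with a logical matrix
df2 <- df
df2[as.matrix(df2) == -999] <- NA

print(sum(is.na(m)))            # 2
print(isTRUE(all.equal(df1, df2)))
```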



RE: [R] Beginner's query - segmentation fault

2003-10-07 Thread Adaikalavan RAMASAMY
I cannot explain the segmentation fault, but try this instead (which
works for matrices):

temp[which(temp==-999, arr.ind=T)] <- NA

Are you sure temp is a matrix and not a data frame? Use class(temp) to
find out.

Also, if you are getting these -999.00 values because you have read files
containing them, it might just be easier to code the missing values when
reading in. Try read.table(file="lala.txt", na.strings="-999.00").
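A sketch of the `na.strings` approach. To stay self-contained it reads from an inline string through the `text` argument of `read.table` (available in later R versions than the one discussed here) instead of a file such as `lala.txt`. `na.strings` matches the field text exactly, so a file that also writes the code as `-999` needs that spelling listed too.

```r
txt <- "t1 t2
1.00 -999.00
-999.00 2.50"
# Both spellings listed, in case the file is inconsistent
d <- read.table(text = txt, header = TRUE, na.strings = c("-999", "-999.00"))
print(d)
print(sapply(d, is.numeric))  # both columns are numeric
```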

--
Adaikalavan Ramasamy 



-Original Message-
From: Laura Quinn [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2003 8:04 PM
To: [EMAIL PROTECTED]
Subject: [R] Beginner's query - segmentation fault


I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
lots of missing values within the dataset which are currently displayed
as the value -999.00. I am trying to create a new matrix (or change the
existing one) to display these values as NA so that I can then perform
the necessary analysis on the columns within the matrix.

The matrix name is temp and the column names are t1 to t20 inclusive.

I have tried the following command:

temp$t1[temp$t1 == -999.00] <- NA

and it returns a segmentation fault, can someone tell me what I am doing
wrong?

Thanks
Laura



Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Uwe Ligges
Laura Quinn wrote:

I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
lots of missing values within the dataset which are currently displayed as
the value -999.00. I am trying to create a new matrix (or change the
existing one) to display these values as NA so that I can then perform
the necessary analysis on the columns within the matrix.
The matrix name is temp and the column names are t1 to t20 inclusive.

I have tried the following command:

temp$t1[temp$t1 == -999.00] <- NA

and it returns a segmentation fault, can someone tell me what I am doing
wrong?

The crash for this inappropriate usage has already been fixed for
R-1.7.1, so you are using an outdated version, I guess.

1. If temp is a matrix, you have to use matrix indexing, not data.frame
or list indexing; see the manuals.

Now we have got the (still wrong) line
  temp[temp[ ,"t1"] == -999.00, "t1"] <- NA
2. Use is.na(x) <- TRUE instead of x <- NA:
  is.na(temp[temp[ ,"t1"] == -999.00, "t1"]) <- TRUE
Or change all values -999 to NA in the whole matrix by
  is.na(temp[temp == -999.00]) <- TRUE
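Uwe's two forms can be checked on a small made-up matrix with a `t1` column (the data are invented for illustration):

```r
temp <- matrix(c(1, -999, 3, -999, 5, 6), ncol = 2,
               dimnames = list(NULL, c("t1", "t2")))

a <- temp
is.na(a[a[, "t1"] == -999, "t1"]) <- TRUE   # only column t1
b <- temp
is.na(b[b == -999]) <- TRUE                 # every -999 in the matrix

print(a)   # t1 has an NA in row 2; t2 still contains -999
print(b)   # both -999 values are now NA
```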
Uwe Ligges


Thanks
Laura


Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Laura Quinn
thanks, have used

temp[temp==0] <- NA

and this seems to have worked, though it won't let me access individual
columns (i.e. temp$t1 etc.) to work on - is there any real advantage in using
a matrix, or would I be better advised to deal with data frames? (I have
double-checked and temp is currently a matrix.)



On Tue, 7 Oct 2003, Prof Brian Ripley wrote:

 On Tue, 7 Oct 2003, Laura Quinn wrote:

  I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
  lots of missing values within the dataset which are currently displayed as
  the value -999.00. I am trying to create a new matrix (or change the
  existing one) to display these values as NA so that I can then perform
  the necessary analysis on the columns within the matrix.
 
  The matrix name is temp and the column names are t1 to t20 inclusive.
 
  I have tried the following command:
 
  temp$t1[temp$t1 == -999.00] <- NA
 
  and it returns a segmentation fault, can someone tell me what I am doing
  wrong?

 Well, R should not segfault, so there is a bug here somewhere.  However, I
 don't think what you have described can actually work. Is temp really a
 matrix?  If so, temp$t1 will return NULL, and you should get an error
 message.


 If temp is a matrix,

 temp[temp == -999.00] <- NA

 will do what you want.


 If, as is more likely, temp is a data frame with all columns numeric,
 there are several ways to do this, e.g.

 temp[] <- lapply(temp, function(x) ifelse(x == -999, NA, x))

 temp[as.matrix(temp) == -999] <- NA  # only in recent versions of R

 as well as explicit looping over columns.



Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Uwe Ligges
Adaikalavan RAMASAMY wrote:
I cannot explain the segmentation fault, but try this instead (which
works for matrices):

temp[which(temp==-999, arr.ind=T)] <- NA

No! Please *do* use is.na()<- !!!

Uwe Ligges

Are you sure temp is a matrix and not a data frame? Use class(temp) to
find out.
Also, if you are getting these -999.00 values because you have read files
containing them, it might just be easier to code the missing values when
reading in. Try read.table(file="lala.txt", na.strings="-999.00").



-Original Message-
From: Laura Quinn [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2003 8:04 PM
To: [EMAIL PROTECTED]
Subject: [R] Beginner's query - segmentation fault

I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
lots of missing values within the dataset which are currently displayed
as the value -999.00. I am trying to create a new matrix (or change the
existing one) to display these values as NA so that I can then perform
the necessary analysis on the columns within the matrix.
The matrix name is temp and the column names are t1 to t20 inclusive.

I have tried the following command:

temp$t1[temp$t1 == -999.00] <- NA

and it returns a segmentation fault, can someone tell me what I am doing
wrong?
Thanks
Laura


Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Prof Brian Ripley
On Tue, 7 Oct 2003, Laura Quinn wrote:

 thanks, have used
 
 temp[temp==0] <- NA
 
 and this seems to have worked, though it won't let me access individual
 columns (ie temp$t1 etc) to work on - is there any real advantage in using
 a matrix, or would i be better advised to deal with dataframes? (I have
 double checked and temp is currently a matrix).

Things are going to be a lot faster for a numerical matrix than a data
frame: the advantage of data frames is that the columns can be of
different types.

BTW, you should really use temp[, "t1"] for a data frame or a matrix:
temp$t1 works for data frames `by the back door' and has a number of bugs
(including failing to detect errors which corrupt the data frame) prior to
1.8.0 (to be).
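A small sketch of the indexing advice (the matrix is made up). The outcome of `temp$t1` on a matrix has changed over time: the message above describes it returning NULL, while current R signals an error, so the sketch just captures whichever happens.

```r
temp <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("t1", "t2")))
col1 <- temp[, "t1"]          # works for a matrix
dollar <- tryCatch(temp$t1, error = function(e) conditionMessage(e))
print(col1)                   # 1 2 3
print(dollar)                 # NULL or an error message, depending on R version

df <- as.data.frame(temp)
print(identical(df[, "t1"], df$t1))   # the same column either way for a data frame
```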


 
 
 
 On Tue, 7 Oct 2003, Prof Brian Ripley wrote:
 
  On Tue, 7 Oct 2003, Laura Quinn wrote:
 
   I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
   lots of missing values within the dataset which are currently displayed as
   the value -999.00. I am trying to create a new matrix (or change the
   existing one) to display these values as NA so that I can then perform
   the necessary analysis on the columns within the matrix.
  
   The matrix name is temp and the column names are t1 to t20 inclusive.
  
   I have tried the following command:
  
   temp$t1[temp$t1 == -999.00] <- NA
  
   and it returns a segmentation fault, can someone tell me what I am doing
   wrong?
 
  Well, R should not segfault, so there is a bug here somewhere.  However, I
  don't think what you have described can actually work. Is temp really a
  matrix?  If so, temp$t1 will return NULL, and you should get an error
  message.
 
 
  If temp is a matrix,
 
  temp[temp == -999.00] <- NA
 
  will do what you want.
 
 
  If, as is more likely, temp is a data frame with all columns numeric,
  there are several ways to do this, e.g.
 
  temp[] <- lapply(temp, function(x) ifelse(x == -999, NA, x))
 
  temp[as.matrix(temp) == -999] <- NA  # only in recent versions of R
 
  as well as explicit looping over columns.
 


Re: [R] Beginner's query - segmentation fault

2003-10-07 Thread Uwe Ligges
Laura Quinn wrote:

thanks, have used

temp[temp==0] <- NA

Please use
  is.na(temp[temp==0]) <- TRUE

and this seems to have worked, though it won't let me access individual
columns (ie temp$t1 etc) 
No! temp$t1 is a list element or a column of a data.frame, but not a
column of a matrix. *PLEASE* read the manuals, help pages, or books on R
on how to index / extract elements.

Please read my previous answer on how to access individual columns.


to work on - is there any real advantage in using
a matrix, or would i be better advised to deal with dataframes? (I have
double checked and temp is currently a matrix).
Working on matrices is supposed to be faster. But matrices have the 
restriction of one data type for all columns (e.g. numeric).

Uwe Ligges


On Tue, 7 Oct 2003, Prof Brian Ripley wrote:


On Tue, 7 Oct 2003, Laura Quinn wrote:


I am dealing with a huge matrix in R (20 columns, 54000 rows) and have
lots of missing values within the dataset which are currently displayed as
the value -999.00. I am trying to create a new matrix (or change the
existing one) to display these values as NA so that I can then perform
the necessary analysis on the columns within the matrix.
The matrix name is temp and the column names are t1 to t20 inclusive.

I have tried the following command:

temp$t1[temp$t1 == -999.00] <- NA

and it returns a segmentation fault, can someone tell me what I am doing
wrong?
Well, R should not segfault, so there is a bug here somewhere.  However, I
don't think what you have described can actually work. Is temp really a
matrix?  If so, temp$t1 will return NULL, and you should get an error
message.
If temp is a matrix,

temp[temp == -999.00] <- NA

will do what you want.

If, as is more likely, temp is a data frame with all columns numeric,
there are several ways to do this, e.g.
temp[] <- lapply(temp, function(x) ifelse(x == -999, NA, x))

temp[as.matrix(temp) == -999] <- NA  # only in recent versions of R

as well as explicit looping over columns.



is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

2003-10-07 Thread Richard A. O'Keefe
I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says
 Function `is.na<-' may provide a safer way to set missingness. It
 behaves differently for factors, for example.

However, "MAY provide" is a bit scary, and it doesn't say WHAT the
difference in behaviour is.

I must say that is.na(x) <- ... is rather repugnant, because it doesn't
work.  What do I mean?  Well, as the designers of SETL who many years ago
coined the term "sinister function call" to talk about f(...) <- ...,
pointed out, if you do
f(x) <- y
then afterwards you expect
f(x) == y
to be true.  So let's try it:

> x <- c(1,NA,3)
> is.na(x) <- c(FALSE,FALSE,TRUE)
> x
[1]  1 NA NA
> is.na(x)
[1] FALSE  TRUE  TRUE

So I _assigned_ c(FALSE,FALSE,TRUE) to is.na(x),
but I _got_ c(FALSE,TRUE, TRUE) instead (look at the second element).

That is not how a well-behaved sinister function call should work,
and it's enough to scare someone off is.na()<- forever.

The obvious way to set elements of a variable to missing is ... <- NA.
Wouldn't it be better if that just plain worked?

Can someone give an example of is.na()<- and <- NA working differently
with a factor?  I just tried it:

> x <- factor(c(3,1,4,1,5,9))
> y <- x
> is.na(x) <- x==1
> y[y==1] <- NA
> x
[1] 3    <NA> 4    <NA> 5    9   
Levels: 1 3 4 5 9
> y
[1] 3    <NA> 4    <NA> 5    9   
Levels: 1 3 4 5 9

Both approaches seem to have given the same answer.  What did I miss?
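The factor comparison above can be rerun as a self-contained check. In a current R both routes give identical objects and both keep the now-unused level `1`; whether older versions differed, as the thread suspects, is not tested here.

```r
x <- factor(c(3, 1, 4, 1, 5, 9))
y <- x
is.na(x) <- x == 1        # dispatches to the factor method of is.na<-
y[y == 1] <- NA           # dispatches to the factor method of [<-
print(identical(x, y))    # TRUE
print(levels(x))          # "1" "3" "4" "5" "9": level "1" is kept though unused
```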
