From the NEWS file for R-patched:
o A field containing just a sign is no longer regarded as numeric
(it was on all platforms in 2.7.0, but not on most in earlier
versions of R).
So the default behaviour has already been changed. The right way to
overcome this was (as you discovered) to say what type you want the
variables to be rather than let R guess for you. But you can also update
your R (as the posting guide suggests).
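
A minimal sketch of declaring the column types up front, rather than letting
R guess. The two records are invented for illustration and are passed through
the text argument of read.table() instead of a file; the six classes assume
the BED-like layout described in the original message.

```r
# Two BED-like records made up for this example (not the poster's file).
bed_text <- "chr1 100088396 100088446 seq1 0 +
chr1 100088764 100088814 seq2 0 -"

# Declare every column's class so no type guessing takes place.
a <- read.table(
  text = bed_text,
  colClasses = c("character", "numeric", "numeric",
                 "character", "numeric", "character")
)

# Inspect the resulting column types; V6 holds the strand symbols.
str(a)
```

With the classes declared this way, the strand column is kept as character
regardless of how a given R version would have guessed its type.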
On Wed, 28 May 2008, Margherita wrote:
Great R people,
I have noticed a strange behaviour in read.delim() and friends in the R 2.7.0
version. I will describe the problem and also the solution I already
found, just to be sure it is an expected behaviour and also to tell people,
who may experience the same difficulty, a way to overcome it.
And also to see if it is a proper behaviour or maybe a correction is needed.
Here is the problem:
I have some genomic coordinates files (bed files, a standard format, for
example) containing a column (Strand) in which there is either a "+" or a
"-".
In R-2.6.2patched (and every past version I have used) I never had problems
in reading them in, as for example:
a <- read.table("coords.bed", skip=1)
disp(a)
class data.frame
dimensions are 38650 6
first rows:
V1 V2 V3 V4 V5 V6
1 chr1 100088396 100088446 seq1 0 +
2 chr1 100088764 100088814 seq2 0 -
If I do exactly the same command on the same file in R-2.7.0 the result I
obtain is:
a <- read.table("coords.bed", skip=1)
disp(a)
class data.frame
dimensions are 38650 6
first rows:
V1 V2 V3 V4 V5 V6
1 chr1 100088396 100088446 seq1 0 0
2 chr1 100088764 100088814 seq2 0 0
and I completely lose the strand information, they are all zeros! I have
also tried to put quotes around "+" and "-" in the file before reading it, to
set stringsAsFactors=FALSE in the read.table() call, and to set "encoding" to
a few different alternatives, but the result was always the same: they are
all transformed into 0.
Then I tried scan() and I saw it was reading the character "+" properly:
scan("coords.bed", skip=1, nlines=1, what="ch")
Read 6 items
[1] "chr1" "100088396" "100088446.00" "seq1" "0"
[6] "+"
...my conclusion is that the lone "+" or "-" are not taken as "character" in
the data frame creation step; they are taken as "numeric" but, since they
contain no digits, they are all converted to 0.
Is it correct that this also happens when they are surrounded by
quotes?
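
The guessing step can be probed in isolation with type.convert(), which
read.table() applies to columns that have no declared class. This is only a
sketch: the strand vector is made up, and it assumes a current R release in
which the change described in the NEWS entry is in place.

```r
# A made-up strand column, held as character the way scan() reads it.
strand <- c("+", "-", "+")

# Ask R to guess the type, as read.table() does without colClasses.
guessed <- type.convert(strand, as.is = TRUE)

# Show which class R settled on for the column.
class(guessed)
```

Running the same lines under R 2.7.0 would be one way to confirm that the
lone signs were being treated as numeric there.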
Anyway, my temporary solution (which works without the need of changing the
files) is:
a <- read.table("coords.bed", skip=1, colClasses=c("character", "numeric",
"numeric", "character", "numeric", "character"))
a[1:2,]
V1 V2 V3 V4 V5 V6
1 chr1 100088396 100088446 seq1 0 +
2 chr1 100088764 100088814 seq2 0 -
Another way to avoid losing strand information was to manually substitute
"R" for "-" and "F" for "+" in the file before reading it into R. But it is
much more cumbersome since the use of + and - is, for example, a standard
format in bed files accepted and generated by the Genome Browser and other
genome sites.
Please let me know what you think. P.S. I saw this first in the Fedora
version (rpm automatically updated), but it is reproduced also in the Windows
version.
Thank you all people for your work and for making R the wonderful tool it is!
Cheers,
Margherita
--
-----------------------------------------------------------------------------------
Margherita Mutarelli, PhD Seconda Universita' di Napoli
Dipartimento di Patologia Generale
via L. De Crecchio, 7
80138 Napoli - Italy
Tel/Fax. +39.081.5665802
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595