Hi people,
I have a tab-delimited text file like the one below:

snp_id     gene   chromosome  distance_from_gene_center  position    pop1       pop2       pop3       pop4       pop5       pop6       pop7
rs2129081  RAPT2  3           -129993                    "upstream"  0.439009   1.169210   NA         0.233020   0.093042   NA         -0.902596
rs1202698  RAPT2  3           -128695                    "upstream"  NA         1.815000   NA         0.399079   1.814270   1.382950   NA
rs1163207  RAPT2  3           -128224                    "upstream"  NA         NA         NA         NA         NA         NA         NA
rs1834127  RAPT2  3           -128106                    "upstream"  NA         NA         NA         NA         NA         NA         2.180670
rs2114211  RAPT2  3           -126738                    "upstream"  -0.468279  -1.447620  NA         0.010616   -0.414581  NA         0.550447
rs2113151  RAPT2  3           -124620                    "upstream"  -0.897660  -1.971020  NA         -0.920327  -0.764658  NA         0.337127
rs2524130  RAPT2  3           -123029                    "upstream"  -0.109795  -0.004646  -0.412059  1.116740   0.667567   -0.924529  0.962841
rs1381318  RAPT2  3           -12818                     "upstream"  -0.911662  -1.791580  NA         -0.945716  -1.239640  NA         0.004876
rs2113319  RAPT2  3           -122028                    "upstream"  -0.911662  -1.738610  NA         -0.945716  -1.240950  NA         -0.005318

When I use read.delim (or any other read function) on it, R appears to skip
the first column, and I don't understand why.

For example:
$ R
> data = read.delim('snp_file.txt', header=TRUE, sep='\t')

Now, I would expect data$snp_id to contain the SNP ids and data$gene to
contain the gene names, but that is not what I get:

> data$snp_id
[1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
Levels: RAPT2
> data$gene
[1] 3 3 3 3 3 3 3 3 3

> summary(data)
  snp_id       gene     chromosome      distance_from_gene_center
 RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
           1st Qu.:3   1st Qu.:-128224
           Median :3   Median :-126738
           Mean   :3   Mean   :-113806
           3rd Qu.:3   3rd Qu.:-123029
           Max.   :3   Max.   : -12818
....

> data$pop7
[1] NA NA NA NA NA NA NA NA NA


Notice that it did use snp_id as the header for the first column, but it
completely skips all the data from that column, and all the fields are
shifted one position, so the last column is filled with NA values.
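
A possible way to check for this kind of shift, as a minimal sketch: read.table and read.delim use the first column as row names when the header line has one fewer field than the data lines, which happens, for example, when each data line ends with a trailing tab. Whether that applies to the file above is only an assumption. The three-column miniature file below is hypothetical, and count.fields is used to compare the number of fields per line.

```r
# Hypothetical miniature of the file; the trailing tab on each data line is an
# assumption made only to reproduce the symptom.
tmp <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tpop7",
             "rs1\tRAPT2\t0.5\t",
             "rs2\tRAPT2\tNA\t"), tmp)

# Number of tab-separated fields on each line (header line first).
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# Default behaviour: when the header has one fewer field than the data,
# the first column is used as row names and the header names shift.
shifted <- read.delim(tmp, header = TRUE, sep = "\t")
print(rownames(shifted))

# Workaround sketch: read header and body separately, keep only as many
# columns as there are header names, then attach the names.
hdr <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)
dat <- read.delim(tmp, header = FALSE, skip = 1, sep = "\t")
dat <- dat[, seq_along(hdr)]
names(dat) <- hdr
print(dat$snp_id)
```

If count.fields reports the same number of fields on every line of the real file, the trailing-separator assumption does not hold and the cause lies elsewhere.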

What am I doing wrong? Could it be a problem with my data files? I have tried
modifying them a bit (adding new columns, etc.), but that didn't help.

I am running R from an Ubuntu system:
> sessionInfo()
R version 2.9.1 (2009-06-26)
i486-pc-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base




-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
