Hi people,

I have a text file like this one:

snp_id     gene   chromosome  distance_from_gene_center  position    pop1       pop2       pop3       pop4       pop5       pop6       pop7
rs2129081  RAPT2  3           -129993                    "upstream"  0.439009   1.169210   NA         0.233020   0.093042   NA         -0.902596
rs1202698  RAPT2  3           -128695                    "upstream"  NA         1.815000   NA         0.399079   1.814270   1.382950   NA
rs1163207  RAPT2  3           -128224                    "upstream"  NA         NA         NA         NA         NA         NA         NA
rs1834127  RAPT2  3           -128106                    "upstream"  NA         NA         NA         NA         NA         NA         2.180670
rs2114211  RAPT2  3           -126738                    "upstream"  -0.468279  -1.447620  NA         0.010616   -0.414581  NA         0.550447
rs2113151  RAPT2  3           -124620                    "upstream"  -0.897660  -1.971020  NA         -0.920327  -0.764658  NA         0.337127
rs2524130  RAPT2  3           -123029                    "upstream"  -0.109795  -0.004646  -0.412059  1.116740   0.667567   -0.924529  0.962841
rs1381318  RAPT2  3           -12818                     "upstream"  -0.911662  -1.791580  NA         -0.945716  -1.239640  NA         0.004876
rs2113319  RAPT2  3           -122028                    "upstream"  -0.911662  -1.738610  NA         -0.945716  -1.240950  NA         -0.005318
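A quick way to inspect a file like this before reading it is to count the tab-separated fields on each line with count.fields(). What follows is only a minimal sketch: the temporary file and its three-column contents are hypothetical, made up for illustration with a trailing tab on the data lines, and are not my real file. I have not confirmed that my file looks like this. The reason for the check is that ?read.table documents that when the header line has one fewer field than the data lines, the first column is used for the row names.

```r
# Hypothetical three-column file, NOT my real data: each data line ends
# with a trailing tab, so it has one more field than the header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "snp_id\tgene\tpop1",
  "rs0000001\tGENE1\t0.5\t",
  "rs0000002\tGENE1\tNA\t"
), tmp)

# Count the tab-separated fields on every line of the file.
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# TRUE when every data line has exactly one field more than the header,
# the case in which read.table() takes the first column as row names.
header_one_short <- all(n_fields[-1] == n_fields[1] + 1L)
print(header_one_short)

unlink(tmp)
```

On the real file the same check would be count.fields('snp_file.txt', sep = '\t'), comparing the first value (the header) with the rest.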
When I use read.delim (or any other read function) on it, R skips the first column, and I don't understand why.

For example:

$: R
> data = read.delim('snp_file.txt', head=T, sep='\t')

Now, I would expect data$snp_id to contain the SNP ids and data$gene to contain the gene names, but that is not what I get:

> data$snp_id
[1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
Levels: RAPT2
> data$gene
[1] 3 3 3 3 3 3 3 3 3
> summary(data)
   snp_id       gene     chromosome      distance_from_gene_center
 RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
           1st Qu.:3   1st Qu.:-128224
           Median :3   Median :-126738
           Mean   :3   Mean   :-113806
           3rd Qu.:3   3rd Qu.:-123029
           Max.   :3   Max.   : -12818
 ....
> data$pop7
[1] NA NA NA NA NA NA NA NA NA

Notice that it did use snp_id as the header for the first column, but it completely skips all the data from that column. All the fields are shifted one position to the left, so the last column is filled with NA values.

What am I doing wrong? Could it be a problem with my data files? I have tried modifying them a bit (adding new columns, etc.), but it didn't help.

I am running R on an Ubuntu system:

> sessionInfo()
R version 2.9.1 (2009-06-26)
i486-pc-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help