Re: [R] read.delim skips first column (why?)

2009-07-15 Thread Paolo Sonego

This should work:


junk <- read.table("fd0edfab.txt", sep="", header=TRUE, fill=FALSE, quote="")
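
A minimal sketch of why sep="" can help, assuming (as diagnosed elsewhere in this thread) that the file carries stray trailing tabs. The three-line file below is a made-up stand-in for fd0edfab.txt, not the real data. With sep="" read.table splits on runs of whitespace and ignores whitespace at the end of a line, so the trailing tabs should no longer create extra empty fields:

```r
# Hypothetical miniature of the problem file: 3 real columns,
# every line ends with one or two stray tabs.
lines <- c(
  "snp_id\tgene\tchromosome\t",
  "rs1\tRAPT2\t3\t\t",
  "rs2\tRAPT2\t3\t"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# sep = "" means "any run of whitespace"; trailing tabs are ignored,
# so the header and the data lines all have three fields.
ok <- read.table(tmp, sep = "", header = TRUE, fill = FALSE, quote = "")

print(names(ok))
print(ok$snp_id)
```

Note that this relies on no field being empty or containing spaces. In the posted example missing values are written as NA, so that assumption seems to hold there.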

HIH

Paolo Sonego

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.delim skips first column (why?)

2009-07-14 Thread Giovanni Marco Dall'Olio
Hi,
I have uploaded a copy of the file here:
- http://pastebin.com/fd0edfab

The file has also been passed through the Unix command-line tool unexpand,
but that doesn't solve the problem.

Using head=TRUE instead of head=T has the same effect.

The output of print(names(ngly)) is:
> print(names(ngly), quote=TRUE)
 [1] "snp"                       "gene"
 [3] "chromosome"                "distance_from_gene_center"
 [5] "position"                  "ame"
 [7] "csasia"                    "easia"
 [9] "eur"                       "mena"
[11] "oce"                       "ssafr"
[13] "X"                         "X.1"
[15] "X.2"

Thank you to everyone who replied to my personal address, but I
haven't been able to solve the problem yet.


On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholt...@gmail.com> wrote:

> Can you send your file as an attachment since it is impossible to see
> where the separator characters are.

-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it


Re: [R] read.delim skips first column (why?)

2009-07-14 Thread Petr PIKAL
Hi

> str(read.table("test.txt", header=TRUE))
'data.frame':   9 obs. of  12 variables:
 $ snp        : Factor w/ 9 levels "rs1113188","rs1113397",..: 9 5 7 8 3 4 6 1 2
 $ gene       : Factor w/ 1 level "TRP2": 1 1 1 1 1 1 1 1 1
 $ chromosome : int  3 3 3 3 3 3 3 3 3

It can sometimes be tricky to load files into R. If read.delim fails, I
would recommend trying read.table, which makes fewer assumptions, and
setting its parameters (header, sep, dec) to read your file correctly.
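
To make the difference between the two functions concrete, here is a small sketch that prints the defaults each one uses for the parameters mentioned above. It only inspects the functions with formals() from base R and does not touch any file; the variable names are made up for illustration.

```r
# Compare the default values of header, sep and dec in read.delim
# and read.table. formals() returns a function's argument list.
args_of_interest <- c("header", "sep", "dec")

delim_defaults <- formals(utils::read.delim)[args_of_interest]
table_defaults <- formals(utils::read.table)[args_of_interest]

# read.delim presets a tab separator and expects a header line;
# read.table defaults to whitespace separation and no header.
str(delim_defaults)
str(table_defaults)
```

Since read.delim presets sep to a tab, every tab counts as a field boundary, including tabs at the end of a line.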

Regards
Petr



Re: [R] read.delim skips first column (why?)

2009-07-14 Thread Gabor Grothendieck
Try

count.fields("myfile.txt", sep = "\t")

read.delim uses sep = "\t" but there are trailing tabs
on some lines.

The first line, i.e. with the headers, has three trailing tabs
so it thinks that there are 15 columns rather than 12.

The 5th line of the file (4th line of data) has 4 trailing
tabs so it thinks that there are up to 16 fields in each
data line.

Since it now believes that there are 16 fields of data and
15 fields of headers it assumes the extra field, i.e. the
first one, is the row names.
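
The mechanism described above can be reproduced on a tiny made-up file. This is only a sketch: the three lines below are invented for illustration (3 real columns instead of 12), and the final clean-up step with sub() is an assumption of mine, not something proposed in this thread.

```r
# Made-up miniature of the problem: 3 real columns, with stray trailing tabs.
lines <- c(
  "snp_id\tgene\tchromosome\t",   # header: 3 names + 1 trailing tab  = 4 fields
  "rs1\tRAPT2\t3\t\t",            # data:   3 values + 2 trailing tabs = 5 fields
  "rs2\tRAPT2\t3\t"               # data:   3 values + 1 trailing tab  = 4 fields
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Diagnosis: the number of tab-separated fields differs between lines.
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# One data line has exactly one more field than the header, so
# read.delim takes the first column as row names and shifts the rest.
shifted <- read.delim(tmp, header = TRUE, sep = "\t")
print(rownames(shifted))   # the SNP ids end up here
print(shifted$snp_id)      # the gene names end up under snp_id

# One possible repair (an assumption, see above): strip trailing tabs
# from every line before parsing.
cleaned <- sub("\t+$", "", readLines(tmp))
fixed <- read.delim(text = cleaned, header = TRUE, sep = "\t")
print(fixed$snp_id)
```

Stripping trailing tabs would also drop genuinely empty cells at the end of a line, so it is only safe when missing values are spelled out (as NA in the posted data).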



[R] read.delim skips first column (why?)

2009-07-13 Thread Giovanni Marco Dall'Olio
Hi people,
I have a text file like this one posted:

snp_id     gene   chromosome  distance_from_gene_center  position  pop1       pop2       pop3       pop4       pop5       pop6       pop7
rs2129081  RAPT2  3           -129993                    upstream  0.439009   1.169210   NA         0.233020   0.093042   NA         -0.902596
rs1202698  RAPT2  3           -128695                    upstream  NA         1.815000   NA         0.399079   1.814270   1.382950   NA
rs1163207  RAPT2  3           -128224                    upstream  NA         NA         NA         NA         NA         NA         NA
rs1834127  RAPT2  3           -128106                    upstream  NA         NA         NA         NA         NA         NA         2.180670
rs2114211  RAPT2  3           -126738                    upstream  -0.468279  -1.447620  NA         0.010616   -0.414581  NA         0.550447
rs2113151  RAPT2  3           -124620                    upstream  -0.897660  -1.971020  NA         -0.920327  -0.764658  NA         0.337127
rs2524130  RAPT2  3           -123029                    upstream  -0.109795  -0.004646  -0.412059  1.116740   0.667567   -0.924529  0.962841
rs1381318  RAPT2  3           -12818                     upstream  -0.911662  -1.791580  NA         -0.945716  -1.239640  NA         0.004876
rs2113319  RAPT2  3           -122028                    upstream  -0.911662  -1.738610  NA         -0.945716  -1.240950  NA         -0.005318

When I use read.delim (or any other read function) on it, R skips the first
column, and I don't understand why.

For example:
$ R
> data = read.delim('snp_file.txt', head=T, sep='\t')

Now, I would expect data$snp_id to contain SNP ids and data$gene to contain
gene names, but that is not what I get:

> data$snp_id
[1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
Levels: RAPT2
> data$gene
[1] 3 3 3 3 3 3 3 3 3

> summary(data)
   snp_id       gene     chromosome      distance_from_gene_center
   RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
             1st Qu.:3   1st Qu.:-128224
             Median :3   Median :-126738
             Mean   :3   Mean   :-113806
             3rd Qu.:3   3rd Qu.:-123029
             Max.   :3   Max.   : -12818

> data$pop7
[1] NA NA NA NA NA NA NA NA NA


Notice that it did use snp_id as the header for the first column, but it
completely skips all the data from that column, and all the fields are
shifted, so the last column is filled with NA values.

What am I doing wrong? Could it be a problem with my data files? I have tried
to modify them a bit (adding new columns, etc.), but it didn't work.

I am running R from an Ubuntu system:
> sessionInfo()
R version 2.9.1 (2009-06-26)
i486-pc-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base




-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it
