Re: [R] strsplit help

2012-04-12 Thread alison waller
Thanks - I checked through and it looks as if all of the geneids are 
formatted similarly, so I don't know which one would be causing an error.
Interestingly, your sapply method works on the same data.  So I'm happy 
although still confused, because the strsplit method worked the other 
day with a similarly generated dataset.

I dumped my entire data frame below, in case anyone wants to investigate.

Alison

Rumino_Reps_agreeWalign$geneid.prefix <- sapply(gene.list, "[", 1)
Rumino_Reps_agreeWalign$geneid.suffix <- sapply(gene.list, "[", 2)
  dput(Rumino_Reps_agreeWalign)
structure(list(geneid = c(657313.locus_tag:RTO_08940, 457412.251848018,
657314.locus_tag:CK5_20630, 657323.locus_tag:CK1_33060, 
657313.locus_tag:RTO_09690,
471875.197297106, 411470.DS231493.G14, 411459.149830627,
657313.locus_tag:RTO_09720, 411460.145845997, 411459.149831369,
657321.locus_tag:RBR_01830, 411460.145846414, 457412.251848805,
657321.locus_tag:RBR_08030, 471875.197296907, 457412.251847995,
657314.locus_tag:CK5_20840, 411460.145846423, 
657314.locus_tag:CK5_25030,
457412.251847990, 471875.197297117, 471875.197299322, 
411459.149831093,
411459.149831815, 411460.145846434, 213810.locus_tag:RUM_09700,
657314.locus_tag:CK5_09460, 657323.locus_tag:CK1_18840, 
471875.197297108,
411460.145846680, 411459.149831368, 657314.locus_tag:CK5_19120,
657321.locus_tag:RBR_09560, 411460.145846435, 
657323.locus_tag:CK1_11530,
457412.251850723, 213810.locus_tag:RUM_12960, 
213810.locus_tag:RUM_14740,
213810.locus_tag:RUM_07030, 471875.197296936, 411459.149831092,
471875.197297110, 471875.197298135, 411460.145846430, 
657314.locus_tag:CK5_20370,
657313.locus_tag:RTO_09790, 657323.locus_tag:CK1_33050, 
411460.145846407,
457412.251849909, 411460.145846340, 657313.locus_tag:RTO_14810,
457412.251848010, 457412.251850599, 657323.locus_tag:CK1_33200,
657323.locus_tag:CK1_33190, 213810.locus_tag:RUM_03050, 
657314.locus_tag:CK5_09880,
213810.locus_tag:RUM_15180, 657313.locus_tag:RTO_14610, 
657313.locus_tag:RTO_23930,
411459.149830473, 657313.locus_tag:RTO_18090, 
657323.locus_tag:CK1_27940,
657314.locus_tag:CK5_20720, 411459.149831855, 471875.197297691,
411459.149833320, 457412.251849358, 657321.locus_tag:RBR_13130,
411459.149831077, 471875.197297272, 657314.locus_tag:CK5_09370,
457412.251847994, 411459.149831080, 657314.locus_tag:CK5_20730,
457412.251850579, 213810.locus_tag:RUM_14870, 
657321.locus_tag:RBR_01750,
657313.locus_tag:RTO_09660, 657314.locus_tag:CK5_28910, 
411460.145846907,
657313.locus_tag:RTO_09860, 457412.251847996, 
657323.locus_tag:CK1_38480,
411460.145846417, 471875.197297592, 411459.149831814, 
457412.251848016,
411459.149831804, 657323.locus_tag:CK1_32880, 
657321.locus_tag:RBR_08130,
411460.145846429, 657313.locus_tag:RTO_09880, 
213810.locus_tag:RUM_03410,
657313.locus_tag:RTO_09740, 657313.locus_tag:RTO_09840, 
457412.251848009,
657323.locus_tag:CK1_33090, 657323.locus_tag:CK1_25000, 
411459.149831095,
411459.149830934, 457412.251847970, 457412.251848000, 
657314.locus_tag:CK5_20680,
411459.149831088, 657323.locus_tag:CK1_19350, 
657321.locus_tag:RBR_08670,
471875.197299547, 411459.149831081, 657323.locus_tag:CK1_32550,
411459.149831091, 657313.locus_tag:RTO_24580, 457412.251848004,
471875.197297195, 411460.145846602, 657321.locus_tag:RBR_06200,
213810.locus_tag:RUM_19570, 411460.145846361, 411459.149833804,
657323.locus_tag:CK1_32930, 471875.197296906, 411459.149831078,
657321.locus_tag:RBR_09900, 411460.145846496, 
657321.locus_tag:RBR_08260,
411459.149833021, 657313.locus_tag:RTO_02600, 
657323.locus_tag:CK1_33030,
657313.locus_tag:RTO_09750, 213810.locus_tag:RUM_14790, 
457412.251848017,
457412.251848806, 457412.251847640, 657314.locus_tag:CK5_20620,
411459.149830474, 657323.locus_tag:CK1_11750, 
213810.locus_tag:RUM_09690,
457412.251847999, 657321.locus_tag:RBR_05870, 411460.145846409,
657313.locus_tag:RTO_16220, 657321.locus_tag:RBR_10630, 
411459.149833026,
457412.251847997, 657313.locus_tag:RTO_09650, 471875.197297129,
471875.197297112, 213810.locus_tag:RUM_14720, 457412.251847991,
657313.locus_tag:RTO_09730, 471875.197297132, 
657313.locus_tag:RTO_14650,
411470.DS231491.G186, 457412.251849520, 657323.locus_tag:CK1_04710,
657323.locus_tag:CK1_04510, 411460.145846182, 411460.145846883,
657321.locus_tag:RBR_08040, 411459.149833983, 457412.251849519,
471875.197297124, 457412.251849906, 657321.locus_tag:RBR_08010,
657321.locus_tag:RBR_03380, 657323.locus_tag:CK1_20230, 
471875.197297115,
657323.locus_tag:CK1_13100, 657323.locus_tag:CK1_32950, 
411460.145846428,
471875.197297120, 213810.locus_tag:RUM_13040, 
657314.locus_tag:CK5_25080,
411459.149831096, 411459.149831090, 411459.14981, 
411459.149831370,
657313.locus_tag:RTO_26330, 411459.149833340, 
657314.locus_tag:CK5_20590,
411460.145846458, 471875.197297290, 657313.locus_tag:RTO_09850,
213810.locus_tag:RUM_12130, 657323.locus_tag:CK1_32910, 
213810.locus_tag:RUM_09770,
657313.locus_tag:RTO_09640, 657313.locus_tag:RTO_09830, 
457412.251849013,
411460.145847544, 

Re: [R] strsplit help

2012-04-12 Thread Jean V Adams
Alison,

You've got two geneids with two periods (instead of just one period).

gene.list <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalign[sapply(gene.list, length) != 2, ]
                  geneid count_Conser count_NonCons count_ConsSubst count_NCSubst
7    411470.DS231493.G14            1             0               0             0
154 411470.DS231491.G186            1             2               0             1

Your method had an error, because it couldn't deal with the different 
lengths of vectors in the list created by strsplit.
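
To see where the rbind step goes wrong, here is a minimal sketch using three geneids from the data above, the last of which has two periods. The names n.pieces and taxid are only for illustration, and the warning is suppressed just so the result can be inspected.

```r
# Three geneids from the thread; the last one contains two periods.
geneid <- c("657313.locus_tag:RTO_08940",
            "457412.251848018",
            "411470.DS231493.G14")
gene.list <- strsplit(geneid, "\\.")

# How many pieces each id was split into.
n.pieces <- sapply(gene.list, length)
print(n.pieces)

# rbind() makes as many columns as the longest piece vector and recycles
# the shorter vectors to fill them, which is what the warning is about.
taxid <- suppressWarnings(do.call(rbind, gene.list))
print(taxid)
```

In the printed matrix, the rows that had only two pieces get their first piece repeated in the third column, so the result looks complete but should not be trusted there.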

My method ran without error because it just pulled off the first two parts 
of the geneid,
   411470.DS231493   and   411470.DS231491
and ignored the third part of the geneid
   G14   and   G186.

However, you may need to come up with a different method or a workaround, 
if you want the full geneid from these two records.
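
One possible workaround, sketched here under the assumption that the prefix is everything before the first period and the suffix is everything after it, is to split at the first period only. The names first.dot, prefix and suffix are made up for this example.

```r
geneid <- c("657313.locus_tag:RTO_08940",
            "457412.251848018",
            "411470.DS231493.G14")

# Position of the first literal period in each id (-1 if there is none).
first.dot <- as.integer(regexpr(".", geneid, fixed = TRUE))

# Everything before the first period, and everything after it.
prefix <- substr(geneid, 1, first.dot - 1)
suffix <- substr(geneid, first.dot + 1, nchar(geneid))

print(data.frame(geneid, prefix, suffix))
```

An id with no period at all would give an empty prefix and the whole id as suffix, so it may be worth checking for first.dot == -1 before relying on the result.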

Jean




[R] strsplit help

2012-04-11 Thread alison waller

Dear all,

I want to use string split to parse column names, however, I am having 
some errors that I don't understand.

I see a problem when I try to rbind the output from strsplit.

please let me know if I'm missing something obvious,

thanks,
alison

here are my commands:
strsplit <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")

Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign,
    taxid = do.call(rbind, strsplit))

Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 1)


here is my data:

 head(Rumino_Reps_agreeWalign)
                      geneid count_Conser count_NonCons count_ConsSubst
1 657313.locus_tag:RTO_08940            7             5               5
2           457412.251848018            1             4               3
3 657314.locus_tag:CK5_20630            2             4               1
4 657323.locus_tag:CK1_33060            1             0               1
5 657313.locus_tag:RTO_09690            3             0               3
6           471875.197297106            0             2               1
  count_NCSubst
1             1
2             0
3             0
4             0
5             1
6             1

here are the results from strsplit:
 head(strsplit)
[[1]]
[1] "657313"              "locus_tag:RTO_08940"

[[2]]
[1] "457412"    "251848018"

[[3]]
[1] "657314"              "locus_tag:CK5_20630"

[[4]]
[1] "657323"              "locus_tag:CK1_33060"

[[5]]
[1] "657313"              "locus_tag:RTO_09690"

[[6]]
[1] "471875"    "197297106"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strsplit help

2012-04-11 Thread Jean V Adams
Alison,

Your code works fine on the first six lines of the data that you provided.

Rumino_Reps_agreeWalign <- data.frame(
    geneid = c("657313.locus_tag:RTO_08940",
        "457412.251848018",
        "657314.locus_tag:CK5_20630",
        "657323.locus_tag:CK1_33060",
        "657313.locus_tag:RTO_09690",
        "471875.197297106"),
    count_Conser = c(7, 1, 2, 1, 3, 0),
    count_NonCons = c(5, 4, 4, 0, 0, 2),
    count_ConsSubst = c(5, 3, 1, 1, 3, 1),
    count_NCSubst = c(1, 0, 0, 0, 1, 1))
gene.list <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign,
    taxid = do.call(rbind, gene.list))

Perhaps in later rows of the data there are cases where there is no "." in 
geneid?  If not, can you provide a subset of your data that results in the 
warning?  Use the dput() function.
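
As a rough illustration of what such a subset could look like, the sketch below builds a small stand-in data frame (df, n.pieces and suspect are made-up names, and only two of the real columns are included), picks out the rows whose geneid does not split into exactly two pieces, and prints them with dput().

```r
# A small stand-in for the real data frame, using ids from the thread.
df <- data.frame(
  geneid = c("657313.locus_tag:RTO_08940",
             "457412.251848018",
             "411470.DS231493.G14"),
  count_Conser = c(7, 1, 1),
  stringsAsFactors = FALSE
)

# Rows whose geneid does not split into exactly two pieces.
n.pieces <- sapply(strsplit(df$geneid, "\\."), length)
suspect <- df[n.pieces != 2, ]

# dput() prints R code that recreates the object, ready to paste into a post.
dput(suspect)
```

Anyone on the list can then assign the printed structure(...) call to a name and get the same rows back, which is usually easier to work with than printed head() output.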

It's not a good idea to create an object named strsplit.  That will only 
mask the function strsplit() in later runs.

If time is an issue, a slightly faster way to do this, after the 
strsplit() function is:
Rumino_Reps_agreeWalign$geneid.prefix <- sapply(gene.list, "[", 1)
Rumino_Reps_agreeWalign$geneid.suffix <- sapply(gene.list, "[", 2)
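
For what it's worth, here is a small sketch of how that extraction behaves when the pieces are not all the same length. The list below is made up (the "noperiod" entry is hypothetical), and vapply() is shown only as an optional stricter variant, not as part of the suggestion above.

```r
# A made-up split result: two pieces, three pieces, and one piece.
gene.list <- list(c("657313", "locus_tag:RTO_08940"),
                  c("411470", "DS231493", "G14"),
                  "noperiod")

# "[" is applied to each element with the index as its argument.
prefix <- sapply(gene.list, "[", 1)
suffix <- sapply(gene.list, "[", 2)

# Same extraction, but vapply() insists on one character value per element.
suffix.checked <- vapply(gene.list, "[", character(1), 2)

print(prefix)
print(suffix)
```

Extra pieces are silently dropped and a missing second piece comes back as NA, so no warning is raised even when the ids are irregular.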

Jean




Re: [R] strsplit help

2012-04-11 Thread David Winsemius


On Apr 11, 2012, at 2:01 PM, Jean V Adams wrote:


 It's not a good idea to create an object named strsplit.  That will only
 mask the function strsplit() in later runs.


There is no problem with masking the function unless the name is  
assigned a function object (which wasn't the case here). The  
potential confusion is in the minds of users. When R evaluates a call  
such as strsplit(...), it skips any object of that name that is not a  
function, so you can have a data object named 'strsplit' and it will  
not mask the function 'strsplit'.


--
David.




David Winsemius, MD
West Hartford, CT



Re: [R] strsplit help

2012-04-11 Thread Jean V Adams
David,

Right you are!  Thanks for pointing that out.

strsplit <- 1:10
strsplit("With spaces", NULL)
strsplit
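
For comparison, a function assigned to the same name does get in the way. A small sketch follows; the dummy function returning "masked" and the variable names are made up purely for illustration.

```r
# A data object called strsplit does not get in the way of the function:
strsplit <- 1:10
pieces <- strsplit("411470.DS231493", ".", fixed = TRUE)[[1]]

# Assigning a function to the name is different; this one is found first.
strsplit <- function(...) "masked"
masked <- strsplit("411470.DS231493", ".", fixed = TRUE)

# Removing the object makes base::strsplit visible again.
rm(strsplit)
restored <- strsplit("411470.DS231493", ".", fixed = TRUE)[[1]]

print(pieces)
print(masked)
print(restored)
```

Calling base::strsplit() explicitly would also sidestep the masking without removing anything.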

Jean

