Re: [R] strsplit help
Thanks. I checked through, and it looks as if all of the geneids are formatted similarly, so I don't know which one would be causing the error. Interestingly, your sapply method works on the same data. So I'm happy, although still confused, because the strsplit method worked the other day with a similarly generated dataset. I dumped my entire data frame below, in case anyone wants to investigate.

Alison

Rumino_Reps_agreeWalign$geneid.prefix <- sapply(gene.list, "[", 1)
Rumino_Reps_agreeWalign$geneid.suffix <- sapply(gene.list, "[", 2)
dput(Rumino_Reps_agreeWalign)
structure(list(geneid = c(657313.locus_tag:RTO_08940, 457412.251848018, 657314.locus_tag:CK5_20630, 657323.locus_tag:CK1_33060, 657313.locus_tag:RTO_09690, 471875.197297106, 411470.DS231493.G14, 411459.149830627, 657313.locus_tag:RTO_09720, 411460.145845997, 411459.149831369, 657321.locus_tag:RBR_01830, 411460.145846414, 457412.251848805, 657321.locus_tag:RBR_08030, 471875.197296907, 457412.251847995, 657314.locus_tag:CK5_20840, 411460.145846423, 657314.locus_tag:CK5_25030, 457412.251847990, 471875.197297117, 471875.197299322, 411459.149831093, 411459.149831815, 411460.145846434, 213810.locus_tag:RUM_09700, 657314.locus_tag:CK5_09460, 657323.locus_tag:CK1_18840, 471875.197297108, 411460.145846680, 411459.149831368, 657314.locus_tag:CK5_19120, 657321.locus_tag:RBR_09560, 411460.145846435, 657323.locus_tag:CK1_11530, 457412.251850723, 213810.locus_tag:RUM_12960, 213810.locus_tag:RUM_14740, 213810.locus_tag:RUM_07030, 471875.197296936, 411459.149831092, 471875.197297110, 471875.197298135, 411460.145846430, 657314.locus_tag:CK5_20370, 657313.locus_tag:RTO_09790, 657323.locus_tag:CK1_33050, 411460.145846407, 457412.251849909, 411460.145846340, 657313.locus_tag:RTO_14810, 457412.251848010, 457412.251850599, 657323.locus_tag:CK1_33200, 657323.locus_tag:CK1_33190, 213810.locus_tag:RUM_03050, 657314.locus_tag:CK5_09880, 213810.locus_tag:RUM_15180, 657313.locus_tag:RTO_14610, 657313.locus_tag:RTO_23930, 411459.149830473,
657313.locus_tag:RTO_18090, 657323.locus_tag:CK1_27940, 657314.locus_tag:CK5_20720, 411459.149831855, 471875.197297691, 411459.149833320, 457412.251849358, 657321.locus_tag:RBR_13130, 411459.149831077, 471875.197297272, 657314.locus_tag:CK5_09370, 457412.251847994, 411459.149831080, 657314.locus_tag:CK5_20730, 457412.251850579, 213810.locus_tag:RUM_14870, 657321.locus_tag:RBR_01750, 657313.locus_tag:RTO_09660, 657314.locus_tag:CK5_28910, 411460.145846907, 657313.locus_tag:RTO_09860, 457412.251847996, 657323.locus_tag:CK1_38480, 411460.145846417, 471875.197297592, 411459.149831814, 457412.251848016, 411459.149831804, 657323.locus_tag:CK1_32880, 657321.locus_tag:RBR_08130, 411460.145846429, 657313.locus_tag:RTO_09880, 213810.locus_tag:RUM_03410, 657313.locus_tag:RTO_09740, 657313.locus_tag:RTO_09840, 457412.251848009, 657323.locus_tag:CK1_33090, 657323.locus_tag:CK1_25000, 411459.149831095, 411459.149830934, 457412.251847970, 457412.251848000, 657314.locus_tag:CK5_20680, 411459.149831088, 657323.locus_tag:CK1_19350, 657321.locus_tag:RBR_08670, 471875.197299547, 411459.149831081, 657323.locus_tag:CK1_32550, 411459.149831091, 657313.locus_tag:RTO_24580, 457412.251848004, 471875.197297195, 411460.145846602, 657321.locus_tag:RBR_06200, 213810.locus_tag:RUM_19570, 411460.145846361, 411459.149833804, 657323.locus_tag:CK1_32930, 471875.197296906, 411459.149831078, 657321.locus_tag:RBR_09900, 411460.145846496, 657321.locus_tag:RBR_08260, 411459.149833021, 657313.locus_tag:RTO_02600, 657323.locus_tag:CK1_33030, 657313.locus_tag:RTO_09750, 213810.locus_tag:RUM_14790, 457412.251848017, 457412.251848806, 457412.251847640, 657314.locus_tag:CK5_20620, 411459.149830474, 657323.locus_tag:CK1_11750, 213810.locus_tag:RUM_09690, 457412.251847999, 657321.locus_tag:RBR_05870, 411460.145846409, 657313.locus_tag:RTO_16220, 657321.locus_tag:RBR_10630, 411459.149833026, 457412.251847997, 657313.locus_tag:RTO_09650, 471875.197297129, 471875.197297112, 213810.locus_tag:RUM_14720, 
457412.251847991, 657313.locus_tag:RTO_09730, 471875.197297132, 657313.locus_tag:RTO_14650, 411470.DS231491.G186, 457412.251849520, 657323.locus_tag:CK1_04710, 657323.locus_tag:CK1_04510, 411460.145846182, 411460.145846883, 657321.locus_tag:RBR_08040, 411459.149833983, 457412.251849519, 471875.197297124, 457412.251849906, 657321.locus_tag:RBR_08010, 657321.locus_tag:RBR_03380, 657323.locus_tag:CK1_20230, 471875.197297115, 657323.locus_tag:CK1_13100, 657323.locus_tag:CK1_32950, 411460.145846428, 471875.197297120, 213810.locus_tag:RUM_13040, 657314.locus_tag:CK5_25080, 411459.149831096, 411459.149831090, 411459.14981, 411459.149831370, 657313.locus_tag:RTO_26330, 411459.149833340, 657314.locus_tag:CK5_20590, 411460.145846458, 471875.197297290, 657313.locus_tag:RTO_09850, 213810.locus_tag:RUM_12130, 657323.locus_tag:CK1_32910, 213810.locus_tag:RUM_09770, 657313.locus_tag:RTO_09640, 657313.locus_tag:RTO_09830, 457412.251849013, 411460.145847544,
Re: [R] strsplit help
Alison,

You've got two geneids with two periods (instead of just one).

gene.list <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalign[sapply(gene.list, length) != 2, ]
                  geneid count_Conser count_NonCons count_ConsSubst count_NCSubst
7    411470.DS231493.G14            1             0               0             0
154 411470.DS231491.G186            1             2               0             1

Your method gave the warning because rbind() couldn't deal with the different lengths of the vectors in the list created by strsplit(). My method ran without error because it just pulled off the first two parts of the geneid, 411470.DS231493 and 411470.DS231491, and ignored the third part, G14 and G186. However, you may need to come up with a different method or a workaround if you want the full geneid from these two records.

Jean

alison waller <alison.wal...@embl.de> wrote on 04/12/2012 03:00:26 AM:

> Thanks. I checked through, and it looks as if all of the geneids are
> formatted similarly, so I don't know which one would be causing the error.
> Interestingly, your sapply method works on the same data. So I'm happy,
> although still confused, because the strsplit method worked the other day
> with a similarly generated dataset.
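One possible workaround for the two records above, sketched under assumptions: the three sample geneids below are copied from the thread, but the idea of splitting at the first period only (so the suffix keeps any later periods) is an illustration, not something proposed by the posters. It uses sub() from base R rather than strsplit().

```r
# Three sample geneids from the thread; the last one contains two periods.
geneid <- c("657313.locus_tag:RTO_08940",
            "457412.251848018",
            "411470.DS231493.G14")

# Split on every period and flag entries that do not yield exactly two parts.
gene.list <- strsplit(geneid, "\\.")
n.parts <- sapply(gene.list, length)
odd.ids <- geneid[n.parts != 2]

# Split at the first period only: the prefix is everything before it,
# the suffix is everything after it (later periods are kept).
geneid.prefix <- sub("\\..*$", "", geneid)
geneid.suffix <- sub("^[^.]*\\.", "", geneid)
```

With this approach every record yields exactly one prefix and one suffix, so the results can be assigned to two data frame columns without any recycling. Whether the trailing part (G14, G186) belongs in the suffix is a question about the data that the thread does not settle.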
[R] strsplit help
Dear all,

I want to use string split to parse column names; however, I am having some errors that I don't understand. I see a problem when I try to rbind the output from strsplit. Please let me know if I'm missing something obvious.

Thanks,
Alison

Here are my commands:

strsplit <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign, taxid = do.call(rbind, strsplit))
Warning message:
In function (..., deparse.level = 1) :
  number of columns of result is not a multiple of vector length (arg 1)

Here is my data:

head(Rumino_Reps_agreeWalign)
                      geneid count_Conser count_NonCons count_ConsSubst
1 657313.locus_tag:RTO_08940            7             5               5
2           457412.251848018            1             4               3
3 657314.locus_tag:CK5_20630            2             4               1
4 657323.locus_tag:CK1_33060            1             0               1
5 657313.locus_tag:RTO_09690            3             0               3
6           471875.197297106            0             2               1
  count_NCSubst
1             1
2             0
3             0
4             0
5             1
6             1

Here are the results from strsplit:

head(strsplit)
[[1]]
[1] "657313"              "locus_tag:RTO_08940"

[[2]]
[1] "457412"    "251848018"

[[3]]
[1] "657314"              "locus_tag:CK5_20630"

[[4]]
[1] "657323"              "locus_tag:CK1_33060"

[[5]]
[1] "657313"              "locus_tag:RTO_09690"

[[6]]
[1] "471875"    "197297106"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] strsplit help
Alison,

Your code works fine on the first six lines of the data that you provided.

Rumino_Reps_agreeWalign <- data.frame(
    geneid = c("657313.locus_tag:RTO_08940", "457412.251848018",
        "657314.locus_tag:CK5_20630", "657323.locus_tag:CK1_33060",
        "657313.locus_tag:RTO_09690", "471875.197297106"),
    count_Conser = c(7, 1, 2, 1, 3, 0),
    count_NonCons = c(5, 4, 4, 0, 0, 2),
    count_ConsSubst = c(5, 3, 1, 1, 3, 1),
    count_NCSubst = c(1, 0, 0, 0, 1, 1))
gene.list <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign,
    taxid = do.call(rbind, gene.list))

Perhaps in later rows of the data there are cases where there is no "." in geneid? If not, can you provide a subset of your data that results in the warning? Use the dput() function.

It's not a good idea to create an object named strsplit. That will only mask the function strsplit() in later runs.

If time is an issue, a slightly faster way to do this, after the strsplit() call, is:

Rumino_Reps_agreeWalign$geneid.prefix <- sapply(gene.list, "[", 1)
Rumino_Reps_agreeWalign$geneid.suffix <- sapply(gene.list, "[", 2)

Jean

alison waller wrote on 04/11/2012 08:23:29 AM:

> Dear all,
> I want to use string split to parse column names; however, I am having
> some errors that I don't understand. I see a problem when I try to rbind
> the output from strsplit.
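A minimal sketch of where the warning comes from, assuming a hypothetical two-element list shaped like strsplit() output in which one geneid has split into three parts. The list contents and the tryCatch() wrapper are illustration only; the wrapper is there just to capture the warning text instead of printing it.

```r
# Hypothetical ragged list: the second geneid contained two periods.
gene.list <- list(c("657313", "locus_tag:RTO_08940"),
                  c("411470", "DS231493", "G14"))

# rbind() needs rows of equal length. Here the result has three columns,
# which is not a multiple of the first vector's length (2), so R warns.
# tryCatch() stops at the warning and returns its message as a string.
rbind.message <- tryCatch(do.call(rbind, gene.list),
                          warning = function(w) conditionMessage(w))

# Element-wise extraction only reads the requested position of each vector,
# so differing lengths never come into play.
geneid.prefix <- sapply(gene.list, "[", 1)
geneid.suffix <- sapply(gene.list, "[", 2)
```

Checking sapply(gene.list, length) before calling rbind() is a cheap way to see whether every element has the expected number of parts.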
Re: [R] strsplit help
On Apr 11, 2012, at 2:01 PM, Jean V Adams wrote:

> It's not a good idea to create an object named strsplit. That will only
> mask the function strsplit() in later runs.

There is not a problem with masking the function unless the name is reassigned to a function (which wasn't the case here). The potential confusion is in the minds of users. Function names are looked up separately from non-function object names, so you can have a data object named 'strsplit' and it will not mask the function 'strsplit'.

--
David.

David Winsemius, MD
West Hartford, CT
Re: [R] strsplit help
David,

Right you are! Thanks for pointing that out.

strsplit <- 1:10
strsplit("With spaces", NULL)
strsplit

Jean

David Winsemius <dwinsem...@comcast.net> wrote on 04/11/2012 01:17:07 PM:

> ... you can have a data object named 'strsplit' and it will not mask the
> function 'strsplit'.
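To round out the point about masking, here is a short sketch of the opposite case, where the name is assigned a function. The replacement function and the variable names are hypothetical, chosen only for illustration; base::strsplit() and rm() are standard R.

```r
# A non-function object with the same name does not block the call:
# in call position R skips objects that are not functions.
strsplit <- 1:10
parts <- strsplit("With spaces", " ")[[1]]   # still base::strsplit

# Assigning a function to the name is different: it is found first.
strsplit <- function(...) "masked"
masked.result <- strsplit("With spaces", " ")

# The original stays reachable through its namespace, and rm() removes
# the local object so the plain name resolves to base again.
base.result <- base::strsplit("With spaces", " ")[[1]]
rm(strsplit)
restored <- strsplit("With spaces", " ")[[1]]
```

Even when nothing is masked, reusing a function's name for data makes code harder to read, which is the confusion "in the minds of users" mentioned above.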