Re: [R] Convert CSV file to FASTA
On 08/31/2011 01:36 AM, anyone wrote:
> Hi there,
>
> I have large excel files which I can save as CSV files.
>
> Each excel file contains two columns. One contains the chromosome number and
> the second contains a DNA sequence.
>
> I need to convert this into a fasta file that looks like this
>
> >chromosomenumber
> CGTCGAGCGTCGAGCGGAGCG
>
> Can anyone show me an R script to do this?

Hi x,

Something along the lines of

    csv = read.csv("foo.csv", stringsAsFactors = FALSE)
    fa = character(2 * nrow(csv))
    fa[c(TRUE, FALSE)] = sprintf(">%s", csv$chr)
    fa[c(FALSE, TRUE)] = csv$seq
    writeLines(fa, "foo.fasta")

Perhaps use scan() instead of read.csv() if your input file is simple enough.

The Bioconductor project has many packages for dealing with sequence data, including Biostrings and ShortRead; both would enable this, e.g.,

    ## once only
    source("http://bioconductor.org/biocLite.R")
    biocLite("Biostrings")
    ## then...
    library(Biostrings)
    seq = csv$seq
    names(seq) = csv$chr
    dna = DNAStringSet(seq)
    write.XStringSet(dna, "foo.fasta")

which would be worth the effort if you wanted to use R for more than an awk replacement.

Martin

> Many thanks
>
> x
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html
> Sent from the R help mailing list archive at Nabble.com.

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
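For anyone wanting to try the interleaving approach from the reply above end to end, here is a minimal self-contained sketch. The helper name `csv_to_fasta`, the column names `chr` and `seq`, and the made-up input rows are assumptions for illustration; the original post does not say what the columns are called.

```r
# Hypothetical helper: convert a two-column CSV (assumed columns: chr, seq)
# to FASTA by interleaving header lines and sequence lines.
csv_to_fasta <- function(csv_file, fasta_file) {
  # stringsAsFactors = FALSE keeps the sequences as character strings
  csv <- read.csv(csv_file, stringsAsFactors = FALSE)
  fa <- character(2 * nrow(csv))
  # The logical index c(TRUE, FALSE) is recycled: odd positions get headers
  fa[c(TRUE, FALSE)] <- sprintf(">%s", csv$chr)
  # c(FALSE, TRUE) selects the even positions, which get the sequences
  fa[c(FALSE, TRUE)] <- csv$seq
  writeLines(fa, fasta_file)
  invisible(fa)
}

# Small made-up input to try it on
csv_file <- tempfile(fileext = ".csv")
writeLines(c("chr,seq", "chr1,CGTCGAGCGTCGAGCGGAGCG", "chr2,ACGTACGT"), csv_file)
fasta_file <- tempfile(fileext = ".fasta")
csv_to_fasta(csv_file, fasta_file)
```

The whole table is held in memory here, which is usually fine for files that Excel could open in the first place.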
Re: [R] Convert CSV file to FASTA
> Date: Wed, 31 Aug 2011 01:36:51 -0700
> From: oliviacree...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Convert CSV file to FASTA
>
> Hi there,
>
> I have large excel files which I can save as CSV files.
>
> Each excel file contains two columns. One contains the chromosome number and
> the second contains a DNA sequence.
>
> I need to convert this into a fasta file that looks like this
>
> >chromosomenumber
> CGTCGAGCGTCGAGCGGAGCG
>
> Can anyone show me an R script to do this?

If you can post a few lines of your "csv", someone can probably give you a bash script to do it. It may be possible in R, but sed/awk would probably work better. IIRC, fasta is just a name line followed by the sequence. If your csv looks like "name, XX", it may be possible to change the comma to a space and use awk with something like

    print ">"$1"\n"$2

etc.

> Many thanks
>
> x
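The line-by-line idea behind the awk suggestion can also be sketched in R, which may matter if the files are too large to load comfortably. This is only a sketch under stated assumptions: the helper name `stream_csv_to_fasta` and the `chunk_size` parameter are made up for illustration, and it assumes each line is a plain "name,sequence" pair with no quoted fields or commas inside the name.

```r
# Hypothetical helper: stream a "name,sequence" CSV to FASTA in chunks,
# so the whole file never has to sit in memory (assumes no quoted fields).
stream_csv_to_fasta <- function(csv_file, fasta_file, header = TRUE,
                                chunk_size = 10000L) {
  input <- file(csv_file, open = "r")
  on.exit(close(input), add = TRUE)
  output <- file(fasta_file, open = "w")
  on.exit(close(output), add = TRUE)

  if (header) readLines(input, n = 1L)  # skip the column-name line

  repeat {
    lines <- readLines(input, n = chunk_size)
    if (length(lines) == 0L) break      # end of file
    lines <- lines[nzchar(lines)]       # ignore blank lines
    # Same idea as awk's print ">"$1"\n"$2, splitting on the first comma
    records <- sub("^([^,]*),[[:space:]]*(.*)$", ">\\1\n\\2", lines)
    writeLines(records, output)
  }
  invisible(fasta_file)
}

# Usage on a small made-up file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("chr,seq", "chr1,CGTCGAGCGTCGAGCGGAGCG", "chr2, ACGT"), csv_file)
fasta_file <- tempfile(fileext = ".fasta")
stream_csv_to_fasta(csv_file, fasta_file, chunk_size = 1L)
```

Splitting on the comma directly avoids the intermediate comma-to-space step, at the cost of not handling quoted CSV fields the way read.csv() does.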
[R] Convert CSV file to FASTA
Hi there,

I have large excel files which I can save as CSV files.

Each excel file contains two columns. One contains the chromosome number and the second contains a DNA sequence.

I need to convert this into a fasta file that looks like this

>chromosomenumber
CGTCGAGCGTCGAGCGGAGCG

Can anyone show me an R script to do this?

Many thanks

x