Re: [R] Convert CSV file to FASTA

Martin Morgan Wed, 31 Aug 2011 06:03:25 -0700

On 08/31/2011 01:36 AM, anyone wrote:

Hi there,


I have large Excel files which I can save as CSV files.

Each Excel file contains two columns. One contains the chromosome number and
the second contains a DNA sequence.

I need to convert this into a fasta file that looks like this

>chromosomenumber

CGTCGAGCGTCGAGCGGAGCG....

Can anyone show me an R script to do this?


Hi x

along the lines of

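  ## assumes foo.csv has a header row with columns named chr and seq
  csv = read.csv("foo.csv", stringsAsFactors = FALSE)  # keep sequences as character
  fa = character(2 * nrow(csv))
  fa[c(TRUE, FALSE)] = sprintf(">%s", csv$chr)  # header lines: 1, 3, 5, ...
  fa[c(FALSE, TRUE)] = csv$seq                  # sequence lines: 2, 4, 6, ...
  writeLines(fa, "foo.fasta")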
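
To make the interleaving idiom concrete, here is a minimal sketch on a small in-memory data frame. The data and the column names chr and seq are made up for illustration; a real file may use different names. The point is only that the logical vectors c(TRUE, FALSE) and c(FALSE, TRUE) are recycled along fa, so they pick out the odd and even positions.

```r
## Toy stand-in for read.csv("foo.csv"); the column names chr and seq
## are assumptions, not something the original file is known to use
csv = data.frame(chr = c("chr1", "chr2"),
                 seq = c("CGTCGAGC", "GGAGCGTT"),
                 stringsAsFactors = FALSE)

## two output lines per record: a header line and a sequence line
fa = character(2 * nrow(csv))

## c(TRUE, FALSE) is recycled along fa and selects positions 1, 3, ...
fa[c(TRUE, FALSE)] = sprintf(">%s", csv$chr)

## c(FALSE, TRUE) selects positions 2, 4, ...
fa[c(FALSE, TRUE)] = csv$seq

print(fa)
## [1] ">chr1"     "CGTCGAGC"  ">chr2"     "GGAGCGTT"
```

Passing fa to writeLines() then gives one header line followed by one sequence line per record.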

Perhaps use scan() instead of read.csv() if your input file is simple enough.
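
A rough sketch of the scan() route, assuming a comma-separated file with exactly two fields per line and a single header line to skip. The temporary files and the names chr and seq are only there to make the example self-contained.

```r
## Write a small example file; the header line and two-column layout
## are assumptions about what the Excel export looks like
tmp = tempfile(fileext = ".csv")
writeLines(c("chr,seq", "chr1,CGTCGAGC", "chr2,GGAGCGTT"), tmp)

## what= supplies one template per column; "" means read as character.
## skip = 1 drops the header line.
cols = scan(tmp, what = list(chr = "", seq = ""), sep = ",",
            skip = 1, quiet = TRUE)

## same interleaving as before, now from a plain list of two vectors
fa = character(2 * length(cols$chr))
fa[c(TRUE, FALSE)] = sprintf(">%s", cols$chr)
fa[c(FALSE, TRUE)] = cols$seq

out = tempfile(fileext = ".fasta")
writeLines(fa, out)
```

scan() returns a list of plain character vectors here, so there is no data frame overhead and no factor conversion to worry about.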

The Bioconductor project has many packages for dealing with sequence data, including Biostrings and ShortRead; both would enable this, e.g.,


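   ## once only (current Bioconductor releases use
   ## BiocManager::install("Biostrings") instead of biocLite)
   source("http://bioconductor.org/biocLite.R")
   biocLite("Biostrings")

   ## then...
   library(Biostrings)
   seq = as.character(csv$seq)
   names(seq) = csv$chr          # names become the FASTA header lines
   dna = DNAStringSet(seq)
   ## named writeXStringSet() in current Biostrings releases
   write.XStringSet(dna, "foo.fasta")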

which would be worth the effort if you wanted to use R for more than an awk replacement.


Martin


Many thanks

x

--
View this message in context: 
http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

