On 08/31/2011 01:36 AM, anyone wrote:
Hi there,
I have large Excel files which I can save as CSV files.
Each Excel file contains two columns. One contains the chromosome number and
the second contains a DNA sequence.
I need to convert this into a FASTA file that looks like this:
>chromosomenumber
CGTCGAGCGTCGAGCGGAGCG....
Can anyone show me an R script to do this?
Hi x
along the lines of
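csv = read.csv("foo.csv", stringsAsFactors = FALSE)  # keep sequences as character
fa = character(2 * nrow(csv))
fa[c(TRUE, FALSE)] = sprintf(">%s", csv$chr)  # header lines: odd positions
fa[c(FALSE, TRUE)] = csv$seq                  # sequence lines: even positions
writeLines(fa, "foo.fasta")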
perhaps use scan() instead of read.csv if your input file is simple enough.
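A rough sketch of the scan() variant, in case it helps. It assumes a comma-separated file with one header row and exactly two columns; the names chr and seq, the toy sequences, and the temporary file paths are made up for illustration and are not taken from your data.

```r
## Hypothetical two-column input with a header row (column names assumed).
csv_path = tempfile(fileext = ".csv")
writeLines(c("chr,seq", "chr1,CGTCGAGCG", "chr2,TTGACA"), csv_path)

## scan() reads each column straight into a character vector;
## skip = 1 drops the header row.
cols = scan(csv_path, what = list(chr = "", seq = ""), sep = ",",
            skip = 1, quiet = TRUE)

## Interleave header and sequence lines. The logical index is recycled,
## so c(TRUE, FALSE) selects positions 1, 3, 5, ... and
## c(FALSE, TRUE) selects positions 2, 4, 6, ...
fa = character(2 * length(cols$chr))
fa[c(TRUE, FALSE)] = sprintf(">%s", cols$chr)
fa[c(FALSE, TRUE)] = cols$seq

fasta_path = tempfile(fileext = ".fasta")
writeLines(fa, fasta_path)
```

Because scan() returns plain character vectors, there is no factor conversion to worry about, but it will not cope with ragged rows or extra columns the way read.csv does.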
The Bioconductor project has many packages for dealing with sequence
data, including Biostrings and ShortRead; both would enable this, e.g.,
## once only
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")
## then...
library(Biostrings)
seq = as.character(csv$seq)
names(seq) = csv$chr   # same column used for the headers above
dna = DNAStringSet(seq)
writeXStringSet(dna, "foo.fasta")
which would be worth the effort if you wanted to use R for more than an
awk replacement.
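As a small illustration of what "more than an awk replacement" might look like, here is a sketch that assumes Biostrings is already installed (it does nothing otherwise). The two toy sequences and their names are invented for the example.

```r
## Skipped entirely when Biostrings is not available.
if (requireNamespace("Biostrings", quietly = TRUE)) {
    suppressPackageStartupMessages(library(Biostrings))
    ## Hypothetical input: a named character vector of DNA sequences.
    dna = DNAStringSet(c(chr1 = "AACG", chr2 = "TTGACA"))
    widths = width(dna)                         # sequence lengths
    rc = as.character(reverseComplement(dna))   # reverse complements
}
```

DNAStringSet() also rejects letters outside the DNA alphabet, so a stray character in the spreadsheet shows up as an error in R instead of ending up silently in the FASTA file.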
Martin
Many thanks
x
--
View this message in context:
http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793