Hi Yuxin, you can also try out the phyloGenerator tool to download specific sequences from GenBank (http://willpearse.github.io/phyloGenerator/index.html). You only need a species list to do so; see the manual for more details.
Best,
Eugen

On 14.07.2014 12:00, r-sig-phylo-requ...@r-project.org wrote:
> Today's Topics:
>
>    1. use R to download the DNA barcode sequence of a list of
>       species from GenBank (cheny...@hotmail.com)
>    2. Re: use R to download the DNA barcode sequence of a list of
>       species from GenBank (Karolis Ramanauskas)
>    3. Fw: Re: use R to download the DNA barcode sequence of a list
>       of species from GenBank (cheny...@hotmail.com)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 13 Jul 2014 23:39:13 +0800
> From: "cheny...@hotmail.com" <cheny...@hotmail.com>
> To: r-sig-phylo <r-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] use R to download the DNA barcode sequence of a
>     list of species from GenBank
> Message-ID: <blu436-smtp22689d7abb1c7022fa6b08382...@phx.gbl>
> Content-Type: text/plain
>
> Dear All,
>
> I have a list of about 2,000 plant species, and want to construct a
> phylogenetic tree for them. I'd like to use the DNA barcode data available
> in GenBank, so I first need to download these DNA sequences from the
> Internet. I know that read.GenBank in package "ape" can do this if I have
> the GenBank accession numbers, but all I have now is the species names.
> Does anybody know which R function can batch-process a GenBank download
> with only species names?
>
> Many thanks in advance.
> Yuxin
>
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> cheny...@gmail.com or cheny...@mail2.sysu.edu.cn
>
> ------------------------------
>
> Message: 2
> Date: Sun, 13 Jul 2014 12:31:03 -0500
> From: Karolis Ramanauskas <kram...@uic.edu>
> To: cheny...@hotmail.com, R-sig-phylo@r-project.org
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence
>     of a list of species from GenBank
> Message-ID:
>     <CACT_pJHyfaTqX=Ft3g+g=MZ85jxf=_xf_ame4ynkwcyws+j...@mail.gmail.com>
> Content-Type: text/plain
>
> Good day,
>
> I understand you have done some work already, but you may want to try my
> PhyloMill pipeline. It will do exactly what you need. It is written in
> Python, not R. You will need to give it the names of ingroup and outgroup
> taxa and which loci you want to use. If the loci you want to use are not
> predefined in PhyloMill, I can create the definitions; just let me know
> which loci you want to use.
>
> PhyloMill will actually do a lot more than just download and align
> sequences: it will filter mislabeled sequences, reverse-complement if
> needed, etc. It will also create a consensus sequence when multiple GI
> accessions are available for that taxon and locus.
>
> https://github.com/karolisr/krpy
>
> Peace,
> Karolis Ramanauskas
> Department of Biological Sciences
> University of Illinois at Chicago
> 840 W. Taylor St. SEL 4093 M/C 067
> Chicago, IL 60607
> E-Mail: kram...@uic.edu
>
> ------------------------------
>
> Message: 3
> Date: Mon, 14 Jul 2014 14:19:29 +0800
> From: "cheny...@hotmail.com" <cheny...@hotmail.com>
> To: kraman2 <kram...@uic.edu>
> Cc: R-sig-phylo <R-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] Fw: Re: use R to download the DNA barcode
>     sequence of a list of species from GenBank
> Message-ID: <blu436-smtp20941d6c07e3af5b0bec72782...@phx.gbl>
> Content-Type: text/plain
>
> Hi Karolis,
>
> Thank you for providing the guides.
>
> David's "rentrez" R package works quite well for my problem (I have copied
> his reply below), and I am not familiar with Python. But thank you all the
> same.
>
> Cheers,
> Yuxin
>
> From: cheny...@hotmail.com
> Date: 2014-07-14 14:10
> To: David Winter
> Subject: Re: Re: [R-sig-phylo] use R to download the DNA barcode sequence of
>     a list of species from GenBank
>
> Hi David,
>
> Thanks for your reply.
>
> The package "rentrez" is really wonderful.
> I have already tried to search my species list with your function
> fetch_gene, and it worked.
> But there is one small question. What is the "BOLD[all]" for? Sorry, I am
> just a beginner in phylogeny and am not familiar with this term yet. When I
> included it in my case, the search result was empty, but when I excluded
> this term and kept the others the same, it worked.
>
> Thank you again,
> Yuxin
>
> From: David Winter
> Date: 2014-07-14 02:52
> To: cheny...@hotmail.com
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a
>     list of species from GenBank
>
> Hi Yuxin,
>
> If you specifically want to get at the Barcode of Life records in
> GenBank, then you can try using the NCBI's Entrez tools
> (http://www.ncbi.nlm.nih.gov/books/NBK25500/). If you want to do it in
> R, you can use rentrez, a library that I maintain as part of rOpenSci
> (https://github.com/ropensci/rentrez).
>
> Taking a quick look at some of the BOLD records, it seems they are not
> consistently tagged in a way that makes them easy to search for.
> Here's what I came up with for a solution:
>
> library(rentrez)
> # search the nucleotide database; "BOLD[all]" matches BOLD in any field
> nuc_search <- entrez_search(db="nuccore",
>     term="Solanum[Organism] rbcl[gene] BOLD[all]", retmax=40)
> head(nuc_search$ids)
> ##[1] "409977017" "409977015" "379134037" "379134035" "379133963" "326394567"
>
> Those ids can then be passed to read.GenBank or entrez_fetch to
> retrieve records.
>
> If you want to do this for a bunch of genes, you might want to wrap the
> whole process up in a function:
>
> fetch_gene <- function(organism_name, gene_name, file_format="fasta",
>                        max_recs=50){
>     # build the search term, find matching ids, then download the records
>     sterm <- sprintf("%s[organism] %s[gene] BOLD[all]", organism_name,
>                      gene_name)
>     nuc_search <- entrez_search(db="nuccore", term=sterm, retmax=max_recs)
>     return(entrez_fetch(db="nuccore", id=nuc_search$ids, rettype=file_format))
> }
> genera <- c("Solanum", "Terminalia")
> recs <- lapply(genera, fetch_gene, gene_name="rbcl")
>
> This will give you a list of character strings, each representing a FASTA
> file. You can check that you have the right number of records, etc., if
> you want:
>
> library(stringr)
> # each FASTA record starts with ">", so counting ">" counts records
> sapply(recs, str_count, pattern=">")
>
> Almost every language you might otherwise use in bioinformatics has a
> wrapper for the Entrez API, so you could easily adapt this to Your
> Favourite Language if you wanted to.
>
> Hope that's some help to you
>
> David

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
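
For the batch case raised in this thread (about 2,000 species names, no accession numbers, and a "BOLD[all]" filter that sometimes returns nothing), the rentrez approach quoted above could be extended roughly as follows. This is only a sketch under some assumptions: build_term and fetch_species are hypothetical helper names and not part of rentrez, the fallback to an unfiltered search is one possible way to handle empty results, and the pause value is a guess that should keep unauthenticated use within NCBI's request limits.

```r
# Hypothetical helpers for illustration; neither is part of rentrez.

# Build an Entrez query for one species and one gene.
# `extra` is an optional additional filter such as "BOLD[all]";
# NULL (the default) leaves it out.
build_term <- function(species, gene, extra = NULL) {
  parts <- c(sprintf('"%s"[Organism]', species),
             sprintf("%s[Gene]", gene),
             extra)
  paste(parts, collapse = " AND ")
}

# Try the filtered query first and fall back to the unfiltered one if it
# returns no ids. Needs the rentrez package and network access.
fetch_species <- function(species, gene, extra = "BOLD[all]",
                          max_recs = 5, pause = 0.4) {
  terms <- unique(c(build_term(species, gene, extra),
                    build_term(species, gene)))
  for (term in terms) {
    hits <- rentrez::entrez_search(db = "nuccore", term = term,
                                   retmax = max_recs)
    Sys.sleep(pause)  # stay under NCBI's limit of 3 requests per second
    if (length(hits$ids) > 0) {
      return(rentrez::entrez_fetch(db = "nuccore", id = hits$ids,
                                   rettype = "fasta"))
    }
  }
  NA_character_  # nothing found for this species
}

# Example (not run here, since it queries NCBI):
# species <- c("Solanum nigrum", "Terminalia catappa")
# recs <- vapply(species, fetch_species, character(1), gene = "rbcL")
# names(recs)[is.na(recs)]   # species with no hits
```

Keeping the query construction in its own function makes it easy to print and inspect the terms for a few species before sending 2,000 requests, and the NA entries give a list of names to check by hand for synonyms or spelling variants.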