Hi Yuxin,

You can also try the phyloGenerator tool to download specific
sequences from GenBank
(http://willpearse.github.io/phyloGenerator/index.html). All you need is a
species list; see the manual for details.

Best,

Eugen

On 14.07.2014 12:00, r-sig-phylo-requ...@r-project.org wrote:
> Today's Topics:
> 
>    1. use R to download the DNA barcode sequence of a list of
>       species from GenBank (cheny...@hotmail.com)
>    2. Re: use R to download the DNA barcode sequence of a list of
>       species from GenBank (Karolis Ramanauskas)
>    3. Fw: Re: use R to download the DNA barcode sequence of a list
>       of species from GenBank (cheny...@hotmail.com)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sun, 13 Jul 2014 23:39:13 +0800
> From: "cheny...@hotmail.com" <cheny...@hotmail.com>
> To: r-sig-phylo <r-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] use R to download the DNA barcode sequence of a
>       list of species from GenBank
> Message-ID: <blu436-smtp22689d7abb1c7022fa6b08382...@phx.gbl>
> Content-Type: text/plain
> 
> Dear All,
> 
> I have a list of about 2,000 plant species and want to construct a
> phylogenetic tree for them. I'd like to use the DNA barcode data available in
> GenBank, so I first need to download these DNA sequences. I know that
> read.GenBank in package "ape" can do this if I have the GenBank accession
> numbers, but all I have now is the species names. Does anybody know an R
> function that can batch-download sequences from GenBank using only species
> names?
> 
> Many thanks in advance.
> Yuxin 
> 
> 
> 
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> cheny...@gmail.com or cheny...@mail2.sysu.edu.cn 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 13 Jul 2014 12:31:03 -0500
> From: Karolis Ramanauskas <kram...@uic.edu>
> To: cheny...@hotmail.com, R-sig-phylo@r-project.org
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence
>       of a list of species from GenBank
> Message-ID:
>       <CACT_pJHyfaTqX=Ft3g+g=MZ85jxf=_xf_ame4ynkwcyws+j...@mail.gmail.com>
> Content-Type: text/plain
> 
> Good day,
> 
> I understand you have done some work already, but you may want to try my
> PhyloMill pipeline. It will do exactly what you need. It is written in
> Python, not R. You will need to give it the names of ingroup and outgroup
> taxa and which loci you want to use. If the loci you want to use are not
> predefined in PhyloMill, I can create the definitions, just let me know
> which loci you want to use.
> 
> PhyloMill does a lot more than just download and align sequences: it
> filters mislabeled sequences, reverse-complements them if needed, etc. It
> also creates a consensus sequence when multiple GI accessions are
> available for a taxon and locus.
> 
> https://github.com/karolisr/krpy
> 
> Peace,
> Karolis Ramanauskas
> Department of Biological Sciences
> University of Illinois at Chicago
> 840 W. Taylor St. SEL 4093 M/C 067
> Chicago, IL 60607
> E-Mail: kram...@uic.edu
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 14 Jul 2014 14:19:29 +0800
> From: "cheny...@hotmail.com" <cheny...@hotmail.com>
> To: kraman2 <kram...@uic.edu>
> Cc: R-sig-phylo <R-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] Fw: Re: use R to download the DNA barcode
>       sequence of a list of species from GenBank
> Message-ID: <blu436-smtp20941d6c07e3af5b0bec72782...@phx.gbl>
> Content-Type: text/plain
> 
> Hi Karolis,
> 
> Thank you for providing the guides. 
> 
> David's "rentrez" R package works quite well for my problem (I have copied
> his reply below), and I am not familiar with Python. But thank you all the
> same.
> 
> Cheers,
> Yuxin
> 
>  
> From: cheny...@hotmail.com
> Date: 2014-07-14 14:10
> To: David Winter
> Subject: Re: Re: [R-sig-phylo] use R to download the DNA barcode sequence of 
> a list of species from GenBank
> Hi David, 
> 
> Thanks for your reply.
> 
> The package "rentrez" is really wonderful. I have already tried searching my
> species list with your function fetch_gene and it worked.
> But I have one small question: what is the "BOLD[all]" for? Sorry, I am
> just a beginner in phylogenetics and am not familiar with this term yet. When I
> included it in my case, the search result was empty, but when I excluded this
> term and kept the others the same, it worked.
> 
> Thank you again,
> Yuxin
> 
>  
> From: David Winter
> Date: 2014-07-14 02:52
> To: cheny...@hotmail.com
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a 
> list of species from GenBank
> Hi Yuxin,
>  
> If you specifically want to get at the Barcode of Life records in
> GenBank, you can try using the NCBI's Entrez tools
> (http://www.ncbi.nlm.nih.gov/books/NBK25500/). If you want to do it in
> R you can use rentrez, a library that I maintain as part of rOpenSci
> (https://github.com/ropensci/rentrez).
>  
> Taking a quick look at some of the BOLD records, it seems they are not
> consistently tagged in a way that makes them easy to search for.
> Here's what I came up with as a solution:
>  
> library(rentrez)
> # Search the nucleotide database for Solanum rbcl records that mention BOLD
> nuc_search <- entrez_search(db="nuccore",
>     term="Solanum[Organism] rbcl[gene] BOLD[all]", retmax=40)
> head(nuc_search$ids)
> ##[1] "409977017" "409977015" "379134037" "379134035" "379133963" "326394567"
>  
> Those ids can then be passed to ape's read.GenBank or to entrez_fetch to
> retrieve records.
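
As a rough sketch of that last step, the ids could be fetched as FASTA text with entrez_fetch, written to disk, and read back with ape. The helper names save_fasta and ids_to_dnabin are made up for illustration and are not part of rentrez or ape. I am also assuming that reading the downloaded file with ape::read.FASTA is an acceptable substitute for read.GenBank here.

```r
# Hypothetical helper: write FASTA text (a single string) to a file
# and return the path of that file.
save_fasta <- function(fasta_text, path = tempfile(fileext = ".fasta")) {
  cat(fasta_text, file = path)
  path
}

# Hypothetical helper: download the records for a vector of ids and load
# them as an ape DNAbin object. Needs the rentrez and ape packages and
# network access to NCBI.
ids_to_dnabin <- function(ids) {
  fasta_text <- rentrez::entrez_fetch(db = "nuccore", id = ids,
                                      rettype = "fasta")
  ape::read.FASTA(save_fasta(fasta_text))
}

# Usage (not run here because it contacts NCBI):
# seqs <- ids_to_dnabin(nuc_search$ids)
```

Writing to a file first also leaves you with a copy of the raw download, which is handy if you want to align the sequences with an external program.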
>  
> If you want to do this for a bunch of genes you might want to wrap the
> whole process up in a function:
>  
> fetch_gene <- function(organism_name, gene_name, file_format="fasta",
>                        max_recs=50){
>     # Build the Entrez query, e.g. "Solanum[organism] rbcl[gene] BOLD[all]"
>     sterm <- sprintf("%s[organism] %s[gene] BOLD[all]", organism_name,
>                      gene_name)
>     nuc_search <- entrez_search(db="nuccore", term=sterm, retmax=max_recs)
>     # Download the matching records as one string in the requested format
>     return(entrez_fetch(db="nuccore", id=nuc_search$ids, rettype=file_format))
> }
> genera <- c("Solanum", "Terminalia")
> recs <- lapply(genera, fetch_gene, gene_name="rbcl")
>  
> This will give you a list of character strings, each representing a FASTA
> file. You can check that you have the right number of records etc. if you want:
>  
> library(stringr)
> sapply(recs, str_count, pattern=">")
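
If you would rather not depend on stringr, the same check can be sketched in base R. count_records is a name I made up for this example. It counts lines that start with ">" rather than every ">" character, which avoids overcounting when a ">" happens to appear inside a header line.

```r
# Hypothetical helper: count the FASTA records in one string by counting
# the header lines (lines beginning with ">"). Returns 0 for NA input.
count_records <- function(fasta_text) {
  if (is.na(fasta_text)) return(0L)
  lines <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
  sum(startsWith(lines, ">"))
}

# Usage, assuming recs is the list of FASTA strings built above:
# vapply(recs, count_records, integer(1))
```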
>  
> Almost every language you might otherwise use in bioinformatics has a
> wrapper for the Entrez API, so you could easily adapt this to Your Favourite
> Language if you wanted to.
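
Two practical points for a long species list, sketched below under some assumptions. First, it was reported earlier in this thread that including BOLD[all] gave empty results for some searches, so the sketch makes that filter optional. Second, I have not checked how entrez_fetch behaves when it is given no ids, so the sketch returns NA instead of fetching in that case. build_term and fetch_gene_safe are hypothetical names, not rentrez functions, and the one-second pause in the usage note is a conservative guess meant to keep the request rate low for NCBI.

```r
# Hypothetical helper: build the Entrez query string. The BOLD filter is
# off by default because it can leave a search with no hits.
build_term <- function(organism_name, gene_name, bold_only = FALSE) {
  term <- sprintf("%s[Organism] AND %s[Gene]", organism_name, gene_name)
  if (bold_only) term <- paste(term, "AND BOLD[all]")
  term
}

# Hypothetical variant of fetch_gene: returns NA when the search finds
# nothing, otherwise the records as a single string.
# Needs the rentrez package and network access to NCBI.
fetch_gene_safe <- function(organism_name, gene_name, bold_only = FALSE,
                            file_format = "fasta", max_recs = 50) {
  sterm <- build_term(organism_name, gene_name, bold_only)
  res <- rentrez::entrez_search(db = "nuccore", term = sterm,
                                retmax = max_recs)
  if (length(res$ids) == 0) return(NA_character_)
  rentrez::entrez_fetch(db = "nuccore", id = res$ids, rettype = file_format)
}

# Usage (not run here because it contacts NCBI):
# genera <- c("Solanum", "Terminalia")
# recs <- lapply(genera, function(g) {
#   Sys.sleep(1)  # pause between taxa to keep the request rate low
#   fetch_gene_safe(g, "rbcl")
# })
```

Returning NA for taxa without hits keeps the result list the same length as the input, so it is easy to see afterwards which names found nothing.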
>  
> Hope that's some help to you
>  
> David
>  
>

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
