Hi Marc,

Thanks a lot for your help. I will keep you updated on this.

Tengfei

On Mon, Feb 11, 2013 at 4:21 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:

> Hi Tengfei,
>
> Ugh. It seems that they have cooked up yet another new way to represent
> that same kind of data inside of a gff file. :( I am sad to say that
> this is exactly the sort of thing that I was worried about.
>
> If you can't specify a field from your gff attribute field that contains
> the exon rank information (and in this case it looks like you can't),
> then the software will try to infer it for you (and it will warn you
> that it is being forced to do this). But the inference is not magic, of
> course; it is just going to do the simplest possible thing: assume that
> the order of the exons along the chromosome is the correct rank. So for
> something like soybeans, I definitely think you should extract those
> exon ranks and use them instead...
>
> But how best to proceed with this very weird file?
>
> If I were in your shoes I would probably look at doing a substitution.
> You could use a substitution to convert attributes (things in the final
> column) that look like ".exon.1" into things that look like
> ".exon.1;exonRank=1", using a regular expression so that the "1" is
> preserved in the output. A couple of global substitutions like this
> would effectively add an attribute to the file for all the rows that
> contain a CDS or exon. You could do this substitution in R, for example,
> and then save out a modified file. Then you could just feed that
> modified file right into the makeTranscriptDbFromGFF() function and pass
> "exonRank" as the argument to exonRankAttributeName...
>
> Also, I am just now checking in a solution to the other inconvenience
> that you reported earlier (to the devel branch). So look for an update
> to appear very soon (or DL it from svn if you are impatient). Please let
> me know if there are any more snags with this.
>
> Marc
>
> On 02/11/2013 12:01 PM, Tengfei Yin wrote:
>
> Hi Marc,
>
> Thanks a lot for your advice.
>
> As far as I know, the gff3 file is the only way I can get Gmax's latest
> build for annotation from phytozome (http://www.phytozome.net/). It is
> now publicly available:
>
> ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
>
> The reason I didn't provide 'exonRankAttributeName' is that there are
> no explicit numbers that indicate the exon rank directly in that gff3
> file. Example records look like this:
>
> Gm01 phytozome8_0 gene 27643 27977 . - . ID=Glyma01g00210;Name=Glyma01g00210
> Gm01 phytozome8_0 mRNA 27643 27977 . - . ID=PAC:26325839;Name=Glyma01g00210.1;pacid=26325839;longest=1;Parent=Glyma01g00210
> Gm01 phytozome8_0 exon 27913 27977 . - . ID=PAC:26325839.exon.1;Parent=PAC:26325839;pacid=26325839
> Gm01 phytozome8_0 CDS 27913 27977 . - 0 ID=PAC:26325839.CDS.1;Parent=PAC:26325839;pacid=26325839
> Gm01 phytozome8_0 exon 27643 27811 . - . ID=PAC:26325839.exon.2;Parent=PAC:26325839;pacid=26325839
> Gm01 phytozome8_0 CDS 27643 27811 . - 1 ID=PAC:26325839.CDS.2;Parent=PAC:26325839;pacid=26325839
>
> The ID attribute looks like it has information about the rank (I see
> *.exon.1, *.exon.2), so I guess I can extract that information as an
> extra column manually and specify it in 'makeTranscriptDbFromGFF'.
>
> By the way, is this required? It looks like GenomicFeatures tries to
> infer the exon rank if I don't provide that information, so I thought
> 'exonRankAttributeName' was optional at first.
>
> Thanks again
>
> Tengfei
>
> On Fri, Feb 8, 2013 at 6:08 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:
>
> Hi Tengfei,
>
> Yes that looks like an oversight. Thanks for reporting that! I will
> extend makeTxDbPackage so that it's more accommodating of these newer
> transcriptDbs. If you want to help me out, you could call saveDb() on
> your gmax189 object and send me the .sqlite file that you save it to.
>
> Also, if you have any alternate options for importing your data (other
> than using GFF or GTF), I think you probably should consider them. The
> file specifications for these filetypes are missing key details, so you
> can very easily get a "legal" GFF or GTF file that is actually missing
> important details from its contents. For example, they commonly lack
> information about the order of the exons for a given transcript, which
> can render them difficult (or impossible) to use for transcript work.
> But for these specifications, that information is "optional".
>
> Marc
>
> On 02/06/2013 09:46 PM, Tengfei Yin wrote:
>
> Dear all,
>
> I am trying to build a txdb object from gff3 for soybean data and make
> it a package. The code I used looks like this:
>
> gmax189 <- makeTranscriptDbFromGFF("~/Gmax_189_gene_exons.gff3",
>                                    format = "gff3",
>                                    species = "Glycine max",
>                                    dataSource = "http://www.phytozome.org/")
> makeTxDbPackage(txdb = gmax189,
>                 version = "0.9.1",
>                 maintainer = "Tengfei Yin",
>                 author = "Tengfei Yin",
>                 destDir = ".",
>                 license = "Artistic-2.0")
>
> Error message:
>
> Error in gsub("_", "", pkgName) :
>   error in evaluating the argument 'x' in selecting a method for function 'gsub': Error: object 'pkgName' not found
>
> It looks like my dataSource should be either BioMart or UCSC; otherwise
> no pkgname will be produced in the function .makePackageName?
>
> Or should I build the annotation package in some other way?
>
> Thanks a lot
>
> Tengfei
>
> my sessionInfo
>
> sessionInfo()
>
> R Under development (unstable) (2013-01-21 r61728)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.11.8 AnnotationDbi_1.21.10  Biobase_2.19.2
> [4] GenomicRanges_1.11.28  IRanges_1.17.31        BiocGenerics_0.5.6
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.15.0     Biostrings_2.27.10 bitops_1.0-5       BSgenome_1.27.1
>  [5] DBI_0.2-5          RCurl_1.95-3       Rsamtools_1.11.15  RSQLite_0.11.2
>  [9] rtracklayer_1.19.9 stats4_3.0.0       tools_3.0.0        XML_3.95-0.1
> [13] zlibbioc_1.5.0
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Tengfei Yin
> MCDB PhD student
> 1620 Howe Hall, 2274,
> Iowa State University
> Ames, IA,50011-2274

--
Tengfei Yin
MCDB PhD student
1620 Howe Hall, 2274,
Iowa State University
Ames, IA,50011-2274
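
Note on the substitution discussed in the thread: the following is only a minimal sketch of the approach Marc describes (turning IDs that end in ".exon.1" into ".exon.1;exonRank=1"), not code from either author. The helper name add_exon_rank and the file names in the comments are hypothetical, and the sketch assumes that every exon and CDS row carries an ID of the form "<something>.exon.N" or "<something>.CDS.N", as in the sample records quoted in the thread. Whether the CDS suffix really equals the exon rank for every transcript in the file is an assumption that should be checked.

```r
# Hypothetical helper (not part of GenomicFeatures): append an exonRank
# attribute to rows whose ID attribute ends in ".exon.N" or ".CDS.N".
add_exon_rank <- function(gff_lines) {
  # Group 1 captures the whole ID value, group 3 the trailing number.
  # The replacement re-emits the ID followed by ";exonRank=<number>".
  # Rows without a matching ID (gene, mRNA) are returned unchanged.
  sub("(ID=[^;]*\\.(exon|CDS)\\.([0-9]+))", "\\1;exonRank=\\3", gff_lines)
}

# Three records taken from the thread, with tab-separated GFF3 columns.
example <- c(
  "Gm01\tphytozome8_0\tmRNA\t27643\t27977\t.\t-\t.\tID=PAC:26325839;Name=Glyma01g00210.1;pacid=26325839;longest=1;Parent=Glyma01g00210",
  "Gm01\tphytozome8_0\texon\t27913\t27977\t.\t-\t.\tID=PAC:26325839.exon.1;Parent=PAC:26325839;pacid=26325839",
  "Gm01\tphytozome8_0\tCDS\t27643\t27811\t.\t-\t1\tID=PAC:26325839.CDS.2;Parent=PAC:26325839;pacid=26325839"
)

modified <- add_exon_rank(example)
print(modified)

# For a real file, the same helper could be applied line by line
# (both paths below are placeholders):
# lines <- readLines("Gmax_189_gene_exons.gff3")
# writeLines(add_exon_rank(lines), "Gmax_189_gene_exons.exonRank.gff3")
```

As suggested in Marc's message, the modified file would then be passed to makeTranscriptDbFromGFF() with "exonRank" given as exonRankAttributeName.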