Hi Marc,

Thanks a lot for your help. I will keep you updated on this.

Tengfei

On Mon, Feb 11, 2013 at 4:21 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:

> Hi Tengfei,
>
> Ugh.  It seems that they have cooked up yet another new way to represent
> that same kind of data inside of a gff file.  :(    I am sad to say that
> this is exactly the sort of thing that I was worried about.
>
> If you can't specify a field from your gff attribute field that contains
> the exon rank information (and in this case it looks like you can't), then
> the software will try to infer it for you (and it will warn you that it is
> being forced to do this).  But the inference is not magic of course and it
> is just going to do the simplest possible thing.  It is just going to
> assume that the order of the exons along the chromosome is the correct
> rank.  So for something like soybeans, I definitely think you should
> extract those exon ranks and use them instead...
>
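
A small sketch of why position alone can mislead, using the two exon rows of
the minus-strand transcript PAC:26325839 quoted further down in this thread.
It assumes a naive inference that ranks exons by ascending start coordinate;
how GenomicFeatures actually treats strand is not stated in this thread, so
this is only an illustration. The data frame and its column names are made up
for the example.

```r
# Two exons of one minus-strand transcript, copied from the gff3 rows
# quoted in this thread. Column names are illustrative only.
exons <- data.frame(
  id     = c("PAC:26325839.exon.1", "PAC:26325839.exon.2"),
  start  = c(27913L, 27643L),
  end    = c(27977L, 27811L),
  strand = c("-", "-"),
  stringsAsFactors = FALSE
)

# Rank encoded in the ID suffix (".exon.<n>").
exons$rank_from_id <- as.integer(sub("^.*\\.exon\\.([0-9]+)$", "\\1", exons$id))

# Rank under the ASSUMED naive rule: ascending start coordinate.
exons$rank_by_start <- as.integer(rank(exons$start))

print(exons[, c("id", "start", "rank_from_id", "rank_by_start")])
```

Under that assumed rule, the two columns disagree for this transcript, which
is the kind of mismatch that explicit exon ranks avoid.
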
> But how best to proceed with this very weird file?
>
> If I was in your shoes I would probably look at doing a substitution.  You
> could use a substitution to convert attributes (things in the final column)
> that look like ".exon.1" into things that look like ".exon.1;exonRank=1"
> while using a regular expression so that the "1" was preserved into the
> output.  A couple of global substitutions like this would effectively add
> an attribute to the file for all the rows that contain a CDS or exon.  You
> could do this substitution in R for example and then save out a modified
> file.  Then you could just feed that modified file right into the
> makeTranscriptDbFromGFF() function and pass "exonRank" as the argument to
> exonRankAttributeName...
>
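
The substitution described above could be sketched roughly as follows. This is
a minimal sketch, not a tested recipe for the whole file: add_exon_rank() and
both file names are hypothetical, and the final call uses
makeTranscriptDbFromGFF() and exonRankAttributeName exactly as they are named
in this thread, which may not match other GenomicFeatures versions.

```r
# Hypothetical helper: append ";exonRank=<n>" after IDs ending in
# ".exon.<n>" or ".CDS.<n>", keeping <n> via a regex backreference.
add_exon_rank <- function(lines) {
  sub("(\\.(exon|CDS)\\.([0-9]+))(;|$)", "\\1;exonRank=\\3\\4", lines)
}

# One attribute string taken from the thread, to show the effect.
example <- "ID=PAC:26325839.exon.2;Parent=PAC:26325839;pacid=26325839"
print(add_exon_rank(example))

# File names are assumptions; adjust to wherever the gff3 file lives.
in_file  <- "Gmax_189_gene_exons.gff3"
out_file <- "Gmax_189_gene_exons.exonRank.gff3"

if (file.exists(in_file)) {
  # Rewrite the file line by line; rows without a match pass through unchanged.
  writeLines(add_exon_rank(readLines(in_file)), out_file)

  library(GenomicFeatures)
  gmax189 <- makeTranscriptDbFromGFF(out_file,
                                     format = "gff3",
                                     exonRankAttributeName = "exonRank",
                                     species = "Glycine max",
                                     dataSource = "http://www.phytozome.org/")
}
```

Only rows whose attribute column contains ".exon.<n>" or ".CDS.<n>" are
changed, so gene and mRNA rows are written out as they were.
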
>
> Also, I am just now checking in a solution to the other inconvenience that
> you reported earlier (to the devel branch).  So look for an update to
> appear very soon (or DL it from svn if you are impatient).  Please let me
> know if there are any more snags with this.
>
>
>
>   Marc
>
>
>
>
> On 02/11/2013 12:01 PM, Tengfei Yin wrote:
>
> Hi Marc,
>
>  Thanks a lot for your advice.
>
>  As far as I know, the gff3 file is the only way I can get the latest
> Gmax annotation build from phytozome (http://www.phytozome.net/).
> It is now publicly available at
>
>
> ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Gmax/annotation/Gmax_189_gene_exons.gff3.gz
>
>  And the reason I didn't provide the 'exonRankAttributeName' is that
> there are no explicit numbers that indicate the exon rank directly
> in that gff3 file. Example rows look like this:
>
>  Gm01 phytozome8_0 gene 27643 27977 . - . ID=Glyma01g00210;Name=Glyma01g00210
>  Gm01 phytozome8_0 mRNA 27643 27977 . - . ID=PAC:26325839;Name=Glyma01g00210.1;pacid=26325839;longest=1;Parent=Glyma01g00210
>  Gm01 phytozome8_0 exon 27913 27977 . - . ID=PAC:26325839.exon.1;Parent=PAC:26325839;pacid=26325839
>  Gm01 phytozome8_0 CDS 27913 27977 . - 0 ID=PAC:26325839.CDS.1;Parent=PAC:26325839;pacid=26325839
>  Gm01 phytozome8_0 exon 27643 27811 . - . ID=PAC:26325839.exon.2;Parent=PAC:26325839;pacid=26325839
>  Gm01 phytozome8_0 CDS 27643 27811 . - 1 ID=PAC:26325839.CDS.2;Parent=PAC:26325839;pacid=26325839
>
>
>  The ID attribute looks like it has information about the rank: I see
> *.exon.1 and *.exon.2, so I guess I can extract that information into an
> extra attribute manually and specify it in the call to
> 'makeTranscriptDbFromGFF'.
>
>  btw, is this required? It looks like GenomicFeatures tries to infer the
> exon rank if I don't provide that information, so at first I thought
> 'exonRankAttributeName' was optional.
>
>  Thanks again
>
>  Tengfei
>
>
>
>
>
> On Fri, Feb 8, 2013 at 6:08 PM, Marc Carlson <mcarl...@fhcrc.org> wrote:
>
> Hi Tengfei,
>
> Yes that looks like an oversight.  Thanks for reporting that!  I will
> extend makeTxDbPackage so that it's more accommodating of these newer
> transcriptDbs.  If you want to help me out, you could call saveDb() on your
> gmax189 object and send me the .sqlite file that you save it to.
>
> Also, if you have any alternate options for importing your data (other
> than using GFF or GTF): I think you probably should consider it.  The file
> specifications for these filetypes are missing key details and so you can
> very easily get a "legal" GFF or GTF file that is actually missing
> important details from its contents.  For example, they can commonly lack
> information about the order of the exons for a given transcript, which can
> render them difficult (or impossible) to use for transcript work.   But for
> these specifications, that information is "optional".
>
>
>   Marc
>
>
>
>
> On 02/06/2013 09:46 PM, Tengfei Yin wrote:
>
> Dear all,
>
> I am trying to build a txdb object from gff3 for soybean data and then
> make it into a package. The code I used is:
>
> library(GenomicFeatures)
> gmax189 <- makeTranscriptDbFromGFF("~/Gmax_189_gene_exons.gff3",
>                                    format = "gff3",
>                                    species = "Glycine max",
>                                    dataSource = "http://www.phytozome.org/")
> makeTxDbPackage(txdb = gmax189,
>                  version = "0.9.1",
>                  maintainer = "Tengfei Yin",
>                  author = "Tengfei Yin",
>                  destDir=".",
>                  license="Artistic-2.0")
>
> Error message:
> Error in gsub("_", "", pkgName) :
>    error in evaluating the argument 'x' in selecting a method for function
> 'gsub': Error: object 'pkgName' not found
>
>
> Looks like my dataSource should be either BioMart or UCSC, otherwise no
> pkgname will be produced in function .makePackageName?
>
> Or should I build annotation package in some other ways?
>
> Thanks a lot
>
> Tengfei
>
> my sessionInfo
>
>  sessionInfo()
>
> R Under development (unstable) (2013-01-21 r61728)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.11.8 AnnotationDbi_1.21.10  Biobase_2.19.2
> [4] GenomicRanges_1.11.28  IRanges_1.17.31        BiocGenerics_0.5.6
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.15.0     Biostrings_2.27.10 bitops_1.0-5       BSgenome_1.27.1
>   [5] DBI_0.2-5          RCurl_1.95-3       Rsamtools_1.11.15  RSQLite_0.11.2
>   [9] rtracklayer_1.19.9 stats4_3.0.0       tools_3.0.0        XML_3.95-0.1
> [13] zlibbioc_1.5.0
>
>
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>  --
> Tengfei Yin
> MCDB PhD student
> 1620 Howe Hall, 2274,
> Iowa State University
> Ames, IA,50011-2274
>
>
>
>
>


-- 
Tengfei Yin
MCDB PhD student
1620 Howe Hall, 2274,
Iowa State University
Ames, IA,50011-2274


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
