Hi Marc,

On 09/22/2015 05:39 PM, Marc Carlson wrote:
Hervé is right. UCSC doesn't give us this information. And actually, I
think it's pretty rare to see exon names from anybody, so it's weird
to me that they are a default return value for this method.

Ensembl does provide exon names/ids, so any TxDb object that was created
with makeTxDbFromBiomart("ensembl", ...) should have them:

  library(GenomicFeatures)
  txdb <- makeTxDbFromBiomart("ensembl", dataset="celegans_gene_ensembl")
  exonsBy(txdb, use.names=TRUE)$Y74C9A.2a.2
  # GRanges object with 4 ranges and 3 metadata columns:
  #       seqnames         ranges strand |   exon_id          exon_name exon_rank
  #          <Rle>      <IRanges>  <Rle> | <integer>        <character> <integer>
  #   [1]        I [10413, 10585]      + |         1  WBGene00022276.e1         1
  #   [2]        I [11618, 11689]      + |         6  WBGene00022276.e6         2
  #   [3]        I [14951, 15160]      + |        11 WBGene00022276.e11         3
  #   [4]        I [16473, 16842]      + |        14 WBGene00022276.e14         4
  #   -------
  #   seqinfo: 7 sequences (1 circular) from an unspecified genome

Note that the *By() extractors don't currently let the user choose which
columns to return, which is why it was decided (a long time ago) to
return both the internal exon ids *and* the exon names (better more than less).
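To illustrate why a UCSC-based TxDb has nothing to put in exon_name: a knownGene row describes its exons only through the comma-separated exonStarts and exonEnds columns, with starts 0-based and ends 1-based as is usual for UCSC tables. The base-R sketch below decodes one such row into 1-based coordinates. The helper parse_ucsc_exons() and the coordinates are hypothetical, and this is not meant to reflect how GenomicFeatures actually imports the table.

```r
# Hypothetical helper (illustration only): decode the comma-separated
# exonStarts/exonEnds fields of one UCSC knownGene row into 1-based
# exon coordinates. UCSC starts are 0-based and ends are 1-based
# (half-open intervals), so only the starts are shifted by 1.
parse_ucsc_exons <- function(exonStarts, exonEnds, strand) {
  # Fields end with a trailing comma, e.g. "100,500,"; strsplit()
  # drops the empty piece after a trailing separator.
  starts <- as.integer(strsplit(exonStarts, ",", fixed = TRUE)[[1]]) + 1L
  ends <- as.integer(strsplit(exonEnds, ",", fixed = TRUE)[[1]])
  stopifnot(length(starts) == length(ends))
  # Exons are listed in ascending genomic order, so on the minus strand
  # the first listed exon is the last one transcribed.
  rank <- if (strand == "-") rev(seq_along(starts)) else seq_along(starts)
  # There is no exon name anywhere in the input to carry over.
  data.frame(start = starts, end = ends, exon_rank = rank)
}

# Usage with made-up coordinates:
ex <- parse_ucsc_exons("100,500,", "200,650,", "+")
print(ex)
```

The result has start, end and exon_rank columns but, like the source row, no exon names.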

H.


   Marc

On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:

    Hi Sonali,

    UCSC doesn't provide names for the exons of their gene models.
    See the table where this data is coming from:


    
https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema

    The exon information is all coming from the exonStarts and exonEnds
    columns. No exon names!

    H.

    PS: This would probably be better asked on the support site.


    On 09/22/2015 04:44 PM, Arora, Sonali wrote:

           Hi everyone,

        I was trying to get the exons by gene from a TxDb object, but I
        get NAs for all the exon_name values. Please advise.

          > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
          > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
          > ebg2 <- exonsBy(txdb, by="gene")
          >
          > ebg2
        GRangesList object of length 23459:
        $1
        GRanges object with 15 ranges and 2 metadata columns:
                 seqnames               ranges strand   |   exon_id
                    <Rle>            <IRanges>  <Rle>   | <integer>
             [1]    chr19 [58858172, 58858395]      -   |    250809
             [2]    chr19 [58858719, 58859006]      -   |    250810
             [3]    chr19 [58859832, 58860494]      -   |    250811
             [4]    chr19 [58860934, 58862017]      -   |    250812
             [5]    chr19 [58861736, 58862017]      -   |    250813
             ...      ...                  ...    ... ...       ...
            [11]    chr19 [58868951, 58869015]      -   |    250821
            [12]    chr19 [58869318, 58869652]      -   |    250822
            [13]    chr19 [58869855, 58869951]      -   |    250823
            [14]    chr19 [58870563, 58870689]      -   |    250824
            [15]    chr19 [58874043, 58874214]      -   |    250825
                   exon_name
                 <character>
             [1]        <NA>
             [2]        <NA>
             [3]        <NA>
             [4]        <NA>
             [5]        <NA>
             ...         ...
            [11]        <NA>
            [12]        <NA>
            [13]        <NA>
            [14]        <NA>
            [15]        <NA>

        $10
        GRanges object with 2 ranges and 2 metadata columns:
                seqnames               ranges strand | exon_id exon_name
            [1]     chr8 [18248755, 18248855]      + |  113603      <NA>
            [2]     chr8 [18257508, 18258723]      + |  113604      <NA>

        ...
        <23457 more elements>
        -------
        seqinfo: 93 sequences (1 circular) from hg19 genome
          > testgr <- unlist(ebg2)
          > table(is.na(mcols(testgr)$exon_name))

            TRUE
        272776
          > sessionInfo()
        R version 3.2.2 RC (2015-08-09 r68965)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows 7 x64 (build 7601) Service Pack 1

        locale:
        [1] LC_COLLATE=English_United States.1252
        [2] LC_CTYPE=English_United States.1252
        [3] LC_MONETARY=English_United States.1252
        [4] LC_NUMERIC=C
        [5] LC_TIME=English_United States.1252

        attached base packages:
        [1] stats4    parallel  stats     graphics  grDevices utils
        [7] datasets  methods   base

        other attached packages:
        [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
        [2] GenomicFeatures_1.21.29
        [3] AnnotationDbi_1.31.18
        [4] Biobase_2.29.1
        [5] GenomicRanges_1.21.28
        [6] GenomeInfoDb_1.5.16
        [7] IRanges_2.3.21
        [8] S4Vectors_0.7.18
        [9] BiocGenerics_0.15.6

        loaded via a namespace (and not attached):
           [1] XVector_0.9.4              zlibbioc_1.15.0
           [3] GenomicAlignments_1.5.17   BiocParallel_1.3.52
           [5] tools_3.2.2                SummarizedExperiment_0.3.9
           [7] DBI_0.3.1                  lambda.r_1.1.7
           [9] futile.logger_1.4.1        rtracklayer_1.29.27
        [11] futile.options_1.0.0       bitops_1.0-6
        [13] RCurl_1.95-4.7             biomaRt_2.25.3
        [15] RSQLite_1.0.0              Biostrings_2.37.8
        [17] Rsamtools_1.21.17          XML_3.98-1.3


    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fredhutch.org
    Phone: (206) 667-5791
    Fax: (206) 667-1319


    _______________________________________________
    Bioc-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel


