Yes, using BSgenome would help in this case. In the long run, though, I think it might be important to have this fixed, not necessarily for human, but for other species or genome builds for which there might not be a BSgenome package available; through AnnotationHub, all GTF and FASTA files would be available. Note also that the FaFiles from Ensembl do have the “correct” chromosome names, although I assume they were built from the same Ensembl FASTA files as the TwoBitFiles.
jo

> On 08 Jan 2016, at 22:49, Hervé Pagès <hpa...@fredhutch.org> wrote:
>
> On 01/08/2016 01:09 PM, Michael Lawrence wrote:
>> That is one solution. But everyone using that genome would need to
>> reset the seqlevels to the "standard" ones. In this specific case, is
>> there any reason not to just use the BSgenome for GRCh38?
>
> I agree. Maybe we don't need seqlevels<-,TwoBitFile for that particular
> use case. Just wanted to mention that the ability to rename the
> sequences in a TwoBitFile, FastaFile, or other file-based object that
> supports seqinfo() would be useful in general.
>
> H.
>
>>
>> On Fri, Jan 8, 2016 at 11:04 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>> Hi Jo, Michael,
>>>
>>> What about implementing a seqlevels() setter for TwoBitFile objects? All
>>> you need for this is an extra slot for storing the user-supplied
>>> seqlevels. Note that in general the seqlevels() setter allows more than
>>> renaming the seqlevels. It also allows dropping, adding, and shuffling
>>> them. But you don't need to support all that. Supporting renaming would
>>> already go a long way. See selectMethod("seqlevels<-", "TxDb") in
>>> GenomicFeatures for an example of a restricted "seqlevels<-" method.
>>>
>>> H.
>>>
>>>
>>> On 01/08/2016 09:50 AM, Rainer Johannes wrote:
>>>>
>>>> I agree, I would not modify the file content. At present it is however
>>>> not possible to use e.g. getSeq on these TwoBitFiles, since the
>>>> chromosome names in the submitted GRanges (e.g. 1) do not match the
>>>> seqnames/seqinfo of the TwoBitFile. I don’t know if a seqnames or
>>>> seqinfo method stripping off all but the first name-part would help
>>>> here...
>>>>
>>>> jo
>>>>
>>>>> On 08 Jan 2016, at 15:18, Sean Davis <seand...@gmail.com> wrote:
>>>>>
>>>>> I will make the small editorial comment to guard against modifying
>>>>> file content in transit into the hub object. On the client side
>>>>> (after getting such an object) I think a “fix” would be to have a
>>>>> quick seqnames method to strip off all but the first
>>>>> whitespace-delimited piece.
>>>>>
>>>>> Sean
>>>>>
>>>>>> On Jan 8, 2016, at 8:40 AM, Michael Lawrence <lawrence.mich...@gene.com>
>>>>>> wrote:
>>>>>>
>>>>>> This is perhaps something that could be handled when populating the
>>>>>> hub, but I'm not sure how rtracklayer could automatically derive the
>>>>>> chromosome names.
>>>>>>
>>>>>> On Fri, Jan 8, 2016 at 2:37 AM, Rainer Johannes
>>>>>> <johannes.rai...@eurac.edu> wrote:
>>>>>>>
>>>>>>> dear all,
>>>>>>>
>>>>>>> I just ran into a problem with a TwoBitFile I fetched from
>>>>>>> AnnotationHub. I was fetching a TwoBitFile with the genomic DNA
>>>>>>> sequence, as provided by Ensembl:
>>>>>>>
>>>>>>> > library(AnnotationHub)
>>>>>>> > ah <- AnnotationHub()
>>>>>>> > tbf <- ah[["AH50068"]]
>>>>>>>
>>>>>>>
>>>>>>> > head(seqnames(seqinfo(tbf)))
>>>>>>>
>>>>>>> [1] "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"
>>>>>>> [2] "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF"
>>>>>>> [3] "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
>>>>>>> [4] "12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF"
>>>>>>> [5] "13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF"
>>>>>>> [6] "14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF"
>>>>>>>
>>>>>>> Would be nice if the seqnames were really just the chromosome names
>>>>>>> and not the whole string from the FA file header. Is there a way I
>>>>>>> could fix the file myself or is this something that should be fixed
>>>>>>> in the rtracklayer or AnnotationHub package when the TwoBitFile is
>>>>>>> created?
>>>>>>>
>>>>>>> thanks, jo
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
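
The client-side workaround suggested in the thread (keeping only the first whitespace-delimited piece of each sequence name) can be sketched in base R. This is only an illustration, not what rtracklayer or AnnotationHub do: the variable names (full_names, short_names, short_to_full, query) are made up for the example, and the input strings are copied from the seqnames output quoted above.

```r
# Full sequence names, as reported by seqnames(seqinfo(tbf)) in the thread.
full_names <- c(
  "1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
  "10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF",
  "11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF"
)

# Keep only the first whitespace-delimited piece of each name.
short_names <- sub("[[:space:]].*$", "", full_names)

# Lookup table: short chromosome name -> name as stored in the file.
short_to_full <- setNames(full_names, short_names)

# Hypothetical query chromosomes, e.g. the seqlevels of a GRanges.
query <- c("1", "11")

# Translate the short names into the names the file actually uses.
unname(short_to_full[query])
```

Assuming the usual named-vector renaming behaviour of the seqlevels() setter for GRanges, a lookup like this could probably be used to rename the query ranges to the file's long names before calling getSeq(), which leaves the file content untouched. I have not tested that part against AH50068.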