In light of this, could we get a version of GRCh37 with only a single mitochondrial genome?
On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpa...@fredhutch.org> wrote:

> Hi Felix,
>
> On 8/13/20 21:43, Felix Ernst wrote:
> > Hi Leonard, Hi Herve,
> >
> > I followed your conversation, since I have noticed the same problem.
> > Thanks, Herve, for the explanation of the recent changes on hg19.
> >
> > The GRCh37.p13 report states in its last line:
> >
> > MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
> >
> > Since the last name is called "UCSC-style-name", wouldn't that mean
> > that chrM has to be renamed to MT and not chrMT?
>
> This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT
> is the same as hg19:chrMT, not hg19:chrM.
>
> hg19:chrM and hg19:chrMT are **not** the same sequences. The former is
> NC_001807 and has length 16571 and the latter is NC_012920.1 and has
> length 16569.
>
> Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
>
> Cheers,
> H.
>
> >
> > Thanks again for the explanation.
> >
> > Cheers,
> > Felix
> >
> > -----Original Message-----
> > From: Bioc-devel <bioc-devel-boun...@r-project.org> On Behalf Of Hervé Pagès
> > Sent: Friday, 14 August 2020 01:08
> > To: Leonard Goldstein <goldstein.leon...@gene.com>; bioc-devel@r-project.org
> > Cc: charlotte.sone...@fmi.ch
> > Subject: Re: [Bioc-devel] BSgenome changes
> >
> > Hi Leonard,
> >
> > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> >> Dear Bioc team,
> >>
> >> I'm following up on this recent GitHub issue
> >> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue for
> >> more details and code examples.
> >>
> >> It looks like changes in Bioc devel result in two copies of the
> >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named
> >> chrM like in previous package versions (length 16571) and one named
> >> chrMT (length 16569).
> >>
> >> When using seqlevelsStyle() to change chromosome names from UCSC to
> >> NCBI format, this results in new behavior -- in the past chrM was
> >> simply renamed MT, now the different sequence chrMT is used. Is this
> >> intended?
> >
> > Absolutely intended.
> >
> > There is a long story behind the unfortunate fate of the mitochondrial
> > chromosome in hg19. I'll try to keep it short.
> >
> > When the UCSC folks released the hg19 browser more than 10 years ago,
> > they based it on assembly GRCh37:
> >
> > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
> >
> > See sequence report for GRCh37:
> >
> > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
> >
> > For some mysterious reason GRCh37 didn't include the mitochondrial
> > chromosome so the UCSC folks decided to use mitochondrial sequence
> > NC_001807 and called it chrM.
> >
> > However, UCSC has recently decided to base hg19 on GRCh37.p13 instead
> > of GRCh37. A rather surprising move after many years of hg19 being
> > based on the latter.
> >
> > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> >
> > See sequence report for GRCh37.p13:
> >
> > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
> >
> > Note that GRCh37.p13 does include the mitochondrial chromosome. It's
> > called MT in the official sequence report above and chrMT in hg19.
> >
> > At the same time the UCSC folks decided to keep chrM so now hg19
> > contains 2 mitochondrial sequences: chrM and chrMT. Previously it had
> > only one: chrM.
> >
> > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
> > seqlevelsStyle(genome) is only reflecting this. In particular
> > seqlevelsStyle(genome) <- "NCBI" now does the following:
> >
> > - Rename chrMT -> MT.
> >
> > - chrM does NOT get renamed. There is no point in renaming this
> >   sequence because it has no equivalent in GRCh37.p13.
> >
> > Hope this helps,
> >
> > H.
> >>
> >> Leonard
> >>
> >> _______________________________________________
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpa...@fredhutch.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319

--
Best,
Kasper
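
For readers skimming the thread, the renaming rule described in the quoted message (chrMT becomes MT, chrM is left alone) can be sketched as a small toy model. This is only an illustration in Python, not the GenomeInfoDb implementation behind seqlevelsStyle(). The names `HG19_MITO_LENGTHS`, `UCSC_TO_NCBI`, `to_ncbi_style` and `keep_mapped_only` are made up for this example, and only the two sequence lengths are taken from the thread.

```python
# Toy model of the renaming described in the quoted thread. This is NOT
# the GenomeInfoDb code; all names here are hypothetical.

# Lengths quoted in the thread for the two hg19 mitochondrial sequences.
HG19_MITO_LENGTHS = {"chrM": 16571, "chrMT": 16569}

# Assumed UCSC -> NCBI lookup: only chrMT has a GRCh37.p13 equivalent (MT).
UCSC_TO_NCBI = {"chrMT": "MT"}


def to_ncbi_style(seqlengths):
    """Rename sequences that have an NCBI equivalent; leave the rest as-is."""
    return {UCSC_TO_NCBI.get(name, name): length
            for name, length in seqlengths.items()}


def keep_mapped_only(seqlengths):
    """Drop sequences without an NCBI equivalent and rename the others."""
    return {UCSC_TO_NCBI[name]: length
            for name, length in seqlengths.items()
            if name in UCSC_TO_NCBI}


renamed = to_ncbi_style(HG19_MITO_LENGTHS)    # chrM kept, chrMT becomes MT
single = keep_mapped_only(HG19_MITO_LENGTHS)  # only MT remains
print(renamed)
print(single)
```

The second helper shows one possible reading of "only a single mitochondrial genome": keep the sequence that has a GRCh37.p13 equivalent and drop the other. Whether a package built that way is what is being asked for is a question for the list.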