I tried an inline png but I think it was rejected by bioc-devel. Here's another try.
On Fri, Dec 13, 2019 at 11:40 AM Vincent Carey <st...@channing.harvard.edu> wrote:

> Thanks -- It is good to know more about the complications of adding
> seqlevelsStyle elements.
> I am not sure how pervasive this will be in SNP annotation in the future.
> The "new API" for dbSNP
> references SPDI annotation conventions.
>
> https://api.ncbi.nlm.nih.gov/variation/v0/
>
> at least one dbsnp build 152 resource uses this nomenclature. The one
> referenced below is the "go-to" resource for current rsid-coordinate
> correspondence, as far as I know.
>
> > library(VariantAnnotation)
> *0/0 packages newly attached/loaded, see sessionInfo() for details.*
> > mypar = GRanges("NC_000001.11", IRanges(100000,120000)) # note seqnames
> > nn = readVcf("ftp://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz",
> +    genome="GRCh38", param=mypar)
> > head(rowRanges(nn), 3)
> GRanges object with 3 ranges and 5 metadata columns:
>                    seqnames    ranges strand | paramRangeID            REF
>                       <Rle> <IRanges>  <Rle> |     <factor> <DNAStringSet>
>   rs1331956057 NC_000001.11    100000      * |         <NA>              C
>   rs1252351580 NC_000001.11    100036      * |         <NA>              T
>   rs1238523913 NC_000001.11    100051      * |         <NA>              T
>                               ALT      QUAL      FILTER
>                <DNAStringSetList> <numeric> <character>
>   rs1331956057                  T      <NA>           .
>   rs1252351580                  G      <NA>           .
>   rs1238523913                  C      <NA>           .
>   -------
>   seqinfo: 1 sequence from GRCh38 genome; no seqlengths
>
> On Fri, Dec 13, 2019 at 11:01 AM Robert Castelo <robert.cast...@upf.edu>
> wrote:
>
>> hi Hervé,
>>
>> i didn't know about this new sequence style until Vince posted his
>> message and we briefly talked about it at the European BioC meeting this
>> week in Brussels. however, i didn't know that the style was specific to
>> a particular assembly.
>> i have no use case of this at the moment,
>> i.e., i have not encountered myself any annotation or BAM file with
>> chromosome names written that way, so i don't know how pressing this
>> issue is, maybe Vince can tell us how widespread such chromosome naming
>> style may become in the near future.
>>
>> naively, i'd think that it would be a matter of adding a
>> reference-specific column, i.e., 'GRCh38.p13', 'GRCh37.p13', etc., but i
>> can imagine that maybe the "reference style" concept might not be the
>> appropriate placeholder to map all different chromosome names of all
>> different individual human genomes uploaded to NCBI. maybe we should
>> wait until we have a specific use case .. Vince?
>>
>> robert.
>>
>> On 12/11/19 10:06 PM, Pages, Herve wrote:
>> > Hi Vince, Robert,
>> >
>> > Looks like Vince wants the RefSeq accession e.g. NC_000017.11 for chrom
>> > 17 in the GRCh38.
>> >
>> > @Robert: Is this what you're also interested in?
>> >
>> > The problem is that the RefSeq accessions are specific to a particular
>> > assembly (e.g. NC_000017.11 for chrom 17 in GRCh38 but NC_000017.10 for
>> > the same chrom in GRCh37).
>> >
>> > Currently seqlevelsStyle() doesn't know how to distinguish between
>> > different assemblies of the same organism. Not saying it couldn't but it
>> > would require some thinking and some significant refactoring. It
>> > wouldn't be just a matter of adding a column to
>> > genomeStyles()$Homo_sapiens.
>> >
>> > H.
>> >
>> >
>> > On 12/10/19 14:19, Robert Castelo wrote:
>> >> I second this, and would suggest to name the style as 'GRC' for "Genome
>> >> Reference Consortium".
>> >>
>> >> thanks Vince for bringing this up, being able to easily switch between
>> >> genome styles is great.
>> >>
>> >> if 'paste0()' in R is one of the most influential contributions to
>> >> statistical computing
>> >>
>> >> https://simplystatistics.org/2013/01/31/paste0-is-statistical-computings-most-influential-contribution-of-the-21st-century
>> >>
>> >> i think that 'seqlevelsStyle()' from the GenomeInfoDb package is one of
>> >> the most influential contributions to human genetics, if you think about
>> >> the time invested by researchers in parsing and changing between
>> >> different styles of chromosome names :)
>> >>
>> >> robert.
>> >>
>> >> On 06/12/2019 15:03, Vincent Carey wrote:
>> >>> I raised this issue previously with little response.
>> >>>
>> >>> I'd propose that we add a column or two to genomeStyles()$Homo_sapiens
>> >>>
>> >>> > head(genomeStyles()$Homo_sapiens, 2)
>> >>>   circular auto   sex NCBI UCSC dbSNP Ensembl
>> >>> 1    FALSE TRUE FALSE    1 chr1   ch1       1
>> >>> 2    FALSE TRUE FALSE    2 chr2   ch2       2
>> >>>
>> >>> that includes the values for "NCBI reference sequence names"
>> >>>
>> >>> See
>> >>> https://www.ncbi.nlm.nih.gov/nuccore/568815581
>> >>> for one report on chr17,
>> >>> and
>> >>>
>> >>> https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39
>> >>>
>> >>> for a table that includes the Genbank labels.
>> >>>
>> >>> Should I just file a PR at
>> >>> https://github.com/Bioconductor/GenomeInfoDb/
>> >>> after testing?
>> >>>
>> >>
>> >> _______________________________________________
>> >> Bioc-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>
>> >
>>
>> --
>> Robert Castelo, PhD
>> Associate Professor
>> Dept. of Experimental and Health Sciences
>> Universitat Pompeu Fabra (UPF)
>> Barcelona Biomedical Research Park (PRBB)
>> Dr Aiguader 88
>> E-08003 Barcelona, Spain
>> telf: +34.933.160.514
>> fax: +34.933.160.550
>>
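
To make the assembly problem raised in the thread concrete, here is a minimal
base-R sketch of what an assembly-aware lookup might look like. It is not how
GenomeInfoDb works today and not a proposed implementation: the table
refseq_map and the helper map_seqnames() are hypothetical names invented for
illustration, and the table holds only four rows (the three accessions
mentioned in this thread plus NC_000001.10 for chr1 in GRCh37).

```r
# Hypothetical lookup table (NOT part of GenomeInfoDb). RefSeq accessions are
# assembly-specific, so the table needs one row per chromosome per assembly.
refseq_map <- data.frame(
  assembly = c("GRCh38", "GRCh38", "GRCh37", "GRCh37"),
  RefSeq   = c("NC_000001.11", "NC_000017.11", "NC_000001.10", "NC_000017.10"),
  UCSC     = c("chr1", "chr17", "chr1", "chr17"),
  NCBI     = c("1", "17", "1", "17"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate sequence names from one style to another,
# restricted to the rows of a single assembly.
map_seqnames <- function(x, assembly, from = "RefSeq", to = "UCSC",
                         map = refseq_map) {
  tab <- map[map$assembly == assembly, , drop = FALSE]
  idx <- match(x, tab[[from]])
  if (anyNA(idx)) {
    # Fail loudly instead of silently mixing assemblies.
    stop("no ", from, " entry for ", assembly, ": ",
         paste(unique(x[is.na(idx)]), collapse = ", "))
  }
  tab[[to]][idx]
}

map_seqnames(c("NC_000001.11", "NC_000017.11"), assembly = "GRCh38")
```

The last call returns "chr1" and "chr17", while asking for NC_000017.10 under
GRCh38 stops with an error. That extra assembly argument is the piece a single
additional column in genomeStyles()$Homo_sapiens cannot express. With a
GRanges object gr, the result could presumably be applied with
seqlevels(gr) <- map_seqnames(seqlevels(gr), "GRCh38").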