Hi Peter,

I just added support for the "dbSNP" seqlevels style for human (in
GenomeInfoDb 1.1.9, which will become available tomorrow):

  library(SNPlocs.Hsapiens.dbSNP.20120608)
  myrsids <- c("rs2639606", "rs75264089", "rs73396229", "rs55871206",
               "rs10932221", "rs56219727", "rs73709730", "rs55838886",
               "rs3734153", "rs79381275", "rs1516535")
  gr <- rsidsToGRanges(myrsids)

Then:

  > seqnames(gr)
  factor-Rle of length 11 with 11 runs
    Lengths:    1    1    1    1    1    1    1    1    1    1    1
    Values :  ch9  ch6 ch11 ch13  ch2  ch4  ch7  ch2  ch5 ch11  ch4
Levels(25): ch1 ch2 ch3 ch4 ch5 ch6 ch7 ... ch19 ch20 ch21 ch22 chX chY chMT

  > seqlevelsStyle(gr)
  [1] "dbSNP"

  > seqlevelsStyle(gr) <- "NCBI"

  > seqnames(gr)
  factor-Rle of length 11 with 11 runs
    Lengths:  1  1  1  1  1  1  1  1  1  1  1
    Values :  9  6 11 13  2  4  7  2  5 11  4
Levels(25): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT

  > seqlevelsStyle(gr) <- "UCSC"

  > seqnames(gr)
  factor-Rle of length 11 with 11 runs
    Lengths:     1     1     1     1     1     1     1     1     1     1     1
    Values :  chr9  chr6 chr11 chr13  chr2  chr4  chr7  chr2  chr5 chr11  chr4
Levels(25): chr1 chr2 chr3 chr4 chr5 chr6 ... chr20 chr21 chr22 chrX chrY chrM
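
For reference, the three styles shown above differ only in the prefix
and in the mitochondrial name ("chMT", "MT", "chrM"). Here is a minimal
base-R sketch of that mapping. Note that convert_seqlevels() is a
hypothetical helper written for illustration only, not how GenomeInfoDb
implements seqlevelsStyle(), and it only covers the 25 human names
listed in the output above.

```r
# Hypothetical helper (not part of GenomeInfoDb): converts human chromosome
# names between the dbSNP, NCBI and UCSC styles shown in this thread.
convert_seqlevels <- function(x, to = c("NCBI", "UCSC", "dbSNP")) {
  to <- match.arg(to)
  # Drop a leading "chr" or "ch" prefix, if any, to get the bare name.
  core <- sub("^(chr|ch)", "", x)
  # UCSC calls the mitochondrial genome "M"; normalize it to "MT".
  core[core == "M"] <- "MT"
  switch(to,
    NCBI  = core,
    dbSNP = paste0("ch", core),
    UCSC  = paste0("chr", ifelse(core == "MT", "M", core)))
}

dbsnp <- c("ch9", "ch6", "ch11", "chX", "chMT")
convert_seqlevels(dbsnp, "NCBI")   # "9" "6" "11" "X" "MT"
convert_seqlevels(dbsnp, "UCSC")   # "chr9" "chr6" "chr11" "chrX" "chrM"
```

On real GRanges objects, seqlevelsStyle() remains the thing to use,
since it also keeps the seqinfo in sync.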

Making the seqlevelsStyle() setter work directly on the
SNPlocs.Hsapiens.dbSNP.20120608 object itself will take more time,
though. It will be part of a larger SNPlocs refactoring that has been
on my list for a while now, and it won't happen for at least a couple
of months.

Cheers,
H.

On 06/17/2014 10:37 PM, Hervé Pagès wrote:
Hi Peter,

Yes, as Vince said, the chromosome names are those used by dbSNP. For
whatever reason, dbSNP, which is part of NCBI, felt the need to use
a different naming convention than the rest of NCBI :-/

On 06/17/2014 07:57 PM, Peter Hickey wrote:
Thanks for the explanation, Vincent. GenomeInfoDb has NCBI and UCSC
support, but doesn't seem to support the dbSNP format. Perhaps this
should be added?

The seqlevelsStyle() setter first requires that the seqlevels() setter
works on a SNPlocs object, which itself requires that the seqinfo()
setter works. Unfortunately, it doesn't at the moment:

   > library(SNPlocs.Hsapiens.dbSNP.20120608)

   > snps <- SNPlocs.Hsapiens.dbSNP.20120608

   > seqlevels(snps) <- sub("^ch", "chr", seqlevels(snps))
   Error in (function (classes, fdef, mtable)  :
     unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"SNPlocs"’

I'm adding this to my list.
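
To illustrate why a missing low-level setter blocks the higher-level
ones, here is a toy sketch using plain S4 from the methods package. All
class and function names in it (ToyGRanges, ToySNPlocs, toy_seqinfo,
toy_seqlevels) are hypothetical stand-ins, not the real GenomeInfoDb or
SNPlocs code; the sketch only mimics the dependency chain described
above.

```r
library(methods)

# Toy stand-ins for the real classes; all names here are hypothetical.
setClass("ToyGRanges", representation(levels = "character"))
setClass("ToySNPlocs", representation(levels = "character"))

# Generics playing the role of seqinfo() and `seqinfo<-`.
setGeneric("toy_seqinfo", function(x) standardGeneric("toy_seqinfo"))
setGeneric("toy_seqinfo<-", function(x, value) standardGeneric("toy_seqinfo<-"))

# Both classes have a getter ...
setMethod("toy_seqinfo", "ToyGRanges", function(x) x@levels)
setMethod("toy_seqinfo", "ToySNPlocs", function(x) x@levels)

# ... but only ToyGRanges has a setter.
setReplaceMethod("toy_seqinfo", "ToyGRanges", function(x, value) {
  x@levels <- value
  x
})

# The higher-level setter is written once, on top of the low-level one.
`toy_seqlevels<-` <- function(x, value) {
  toy_seqinfo(x) <- value
  x
}

# Works: ToyGRanges has the low-level setter.
gr <- new("ToyGRanges", levels = c("ch1", "ch2"))
toy_seqlevels(gr) <- sub("^ch", "chr", toy_seqinfo(gr))
toy_seqinfo(gr)          # "chr1" "chr2"

# Fails: no `toy_seqinfo<-` method exists for ToySNPlocs.
snps <- new("ToySNPlocs", levels = c("ch1", "ch2"))
err <- tryCatch({
  toy_seqlevels(snps) <- sub("^ch", "chr", toy_seqinfo(snps))
  NULL
}, error = function(e) e)
inherits(err, "error")   # TRUE
```

In this toy setup, adding a `toy_seqinfo<-` method for ToySNPlocs would
be enough to make toy_seqlevels<- work for it too.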

In the meantime, you can do the renaming on the GRanges objects
you extract with 'getSNPlocs(..., as.GRanges=TRUE)' or with
'rsidsToGRanges(...)'. It may not be very convenient to have to do
this each time you extract SNPs into a GRanges object, but OTOH it's
really easy these days now that we have seqlevelsStyle().

Hope this helps.

Cheers,
H.


seqlevelsStyle(seqnames(SNPlocs.Hsapiens.dbSNP.20120608))
Error in .guessSpeciesStyle(seqnames) :
   The style does not have a compatible entry for the species
supported by Seqname. Please
   see genomeStyles() for supported species/style

On 18/06/2014, at 12:40 PM, Vincent Carey <st...@channing.harvard.edu>
wrote:

it is the convention used in dbSNP, just propagated directly.  indeed
one typically has to relabel, but there
is seqnamesStyle infrastructure in GenomeInfoDb that may help.


On Tue, Jun 17, 2014 at 8:17 PM, Peter Hickey <hic...@wehi.edu.au>
wrote:
Is there a reason why the seqnames of SNPlocs.Hsapiens.dbSNP.20120608
(and possibly the other SNPlocs.*) use the prefix "ch" instead of
"chr"? E.g. "ch1" instead of "chr1". It doesn't seem to fit with any
standard way of naming chromosomes and means that these need to be
renamed to use with most other Bioconductor data sources.
Thanks,
Pete
--------------------------------
Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hic...@wehi.edu.au
http://www.wehi.edu.au



_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
