hi Hervé,

that's great, thanks for implementing quickly these bits.

robert.

On 01/26/2016 01:19 AM, Hervé Pagès wrote:
Hi Robert,

I've made the following changes to the seqlevelsStyle() getter and
setter:
1) No more warning when the getter returns more than 1 compatible
style.
2) When more than 1 style is supplied, the setter uses the 1st one
only with a warning.

That should address the issues you reported below. I'm sure the behavior
of the seqlevelsStyle() setter could be refined when more than 1 style
is supplied but the new behavior should get us going for now.

These changes are in GenomeInfoDb 1.6.3 (release) and 1.7.5 (devel).

Note that the Ensembl mappings were contributed by Jo last week (thanks
Jo) and they're indeed the same as the NCBI mappings for Human but they
differ for all the other organisms supported by GenomeInfoDb at the
moment. Anyway, generally speaking, it sounds like the user should be
able to do seqlevelsStyle(x) <- "Ensembl" independently of whether this
will result in seqlevels that are the same as if s/he had done
seqlevelsStyle(x) <- "NCBI".

Cheers,
H.

On 01/25/2016 02:39 AM, Robert Castelo wrote:
hi,

i would like to ask if current line #142 of
GenomeInfoDb/R/seqlevelsStyle.R:

message("warning! Multiple seqlevels styles found.")

could be replaced by

warning("Multiple seqlevels styles found.")

since, after all, the message is a warning.

the reason for my request is that a recent update on GenomeInfoDb added
the 'Ensembl' sequence style and this executes the previous line when i
try to figure out the sequence style of a BAM file produced from human
sequence data and GATK. for instance:

path2bam <- file.path(system.file("extdata",
package="VariantFiltering"), "NA12878.subset.bam")

hdr <- scanBamHeader(path2bam)
names(hdr[[1]]$targets)
[1] "1" "2" "3" "4" "5"
[6] "6" "7" "8" "9" "10"
[11] "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20"
[21] "21" "22" "X" "Y" "MT"
[26] "GL000207.1" "GL000226.1" "GL000229.1" "GL000231.1" "GL000210.1"
[31] "GL000239.1" "GL000235.1" "GL000201.1" "GL000247.1" "GL000245.1"
[36] "GL000197.1" "GL000203.1" "GL000246.1" "GL000249.1" "GL000196.1"
[41] "GL000248.1" "GL000244.1" "GL000238.1" "GL000202.1" "GL000234.1"
[46] "GL000232.1" "GL000206.1" "GL000240.1" "GL000236.1" "GL000241.1"
[51] "GL000243.1" "GL000242.1" "GL000230.1" "GL000237.1" "GL000233.1"
[56] "GL000204.1" "GL000198.1" "GL000208.1" "GL000191.1" "GL000227.1"
[61] "GL000228.1" "GL000214.1" "GL000221.1" "GL000209.1" "GL000218.1"
[66] "GL000220.1" "GL000213.1" "GL000211.1" "GL000199.1" "GL000217.1"
[71] "GL000216.1" "GL000215.1" "GL000205.1" "GL000219.1" "GL000224.1"
[76] "GL000223.1" "GL000195.1" "GL000212.1" "GL000222.1" "GL000200.1"
[81] "GL000193.1" "GL000194.1" "GL000225.1" "GL000192.1" "NC_007605"
[86] "hs37d5"

seqlevelsStyle(names(hdr[[1]]$targets))
warning! Multiple seqlevels styles found.
[1] "NCBI" "Ensembl"

this is all fine in interactive mode so that the user is aware that
he/she may have to take action on this fact. however, at a specific
place of my package VariantFiltering i'd like to take action without
telling anything to the user. for that purpose i would like to use
suppressWarnings() but does not work currently with the message() call.
i could use suppressMessages() but would not prefer to.

on a side note, both NCBI and Ensembl sequence style seem to be
identical:

genomeStyles()$Homo_sapiens
circular auto sex NCBI UCSC dbSNP Ensembl
1 FALSE TRUE FALSE 1 chr1 ch1 1
2 FALSE TRUE FALSE 2 chr2 ch2 2
3 FALSE TRUE FALSE 3 chr3 ch3 3
4 FALSE TRUE FALSE 4 chr4 ch4 4
5 FALSE TRUE FALSE 5 chr5 ch5 5
6 FALSE TRUE FALSE 6 chr6 ch6 6
7 FALSE TRUE FALSE 7 chr7 ch7 7
8 FALSE TRUE FALSE 8 chr8 ch8 8
9 FALSE TRUE FALSE 9 chr9 ch9 9
10 FALSE TRUE FALSE 10 chr10 ch10 10
11 FALSE TRUE FALSE 11 chr11 ch11 11
12 FALSE TRUE FALSE 12 chr12 ch12 12
13 FALSE TRUE FALSE 13 chr13 ch13 13
14 FALSE TRUE FALSE 14 chr14 ch14 14
15 FALSE TRUE FALSE 15 chr15 ch15 15
16 FALSE TRUE FALSE 16 chr16 ch16 16
17 FALSE TRUE FALSE 17 chr17 ch17 17
18 FALSE TRUE FALSE 18 chr18 ch18 18
19 FALSE TRUE FALSE 19 chr19 ch19 19
20 FALSE TRUE FALSE 20 chr20 ch20 20
21 FALSE TRUE FALSE 21 chr21 ch21 21
22 FALSE TRUE FALSE 22 chr22 ch22 22
23 FALSE FALSE TRUE X chrX chX X
24 FALSE FALSE TRUE Y chrY chY Y
25 TRUE FALSE FALSE MT chrM chMT MT

this leads to the previous use case with a BAM file, and also to this
other one:

library(BSgenome.Hsapiens.NCBI.GRCh38)

seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
warning! Multiple seqlevels styles found.
[1] "NCBI" "Ensembl"

note that now if you have a UCSC-style GRanges object like this one:

gr <- GRanges(c("chr1", "chr2"), IRanges(c(10, 20), c(30, 40)))
seqlevelsStyle(gr)
[1] "UCSC"

that you want to use with the BSgenome object, the following simple
operation will not work anymore:

seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
warning! Multiple seqlevels styles found.
Error in mapSeqlevels(x_seqlevels, value, drop = FALSE) :
the supplied seqname style must be a single string

of course, this has an easy fix:

seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)[1]
warning! Multiple seqlevels styles found.

but in my view, we loose simplicity in the grammar of moving back and
forth between sequence styles. one workaround to keep the former simpler
syntax would be that 'seqlevelsStyle()<-' takes the first value and
gives a warning instead of prompting an error.

in any case, just out of curiosity, what is the reason to have two
identical sequence styles under different names?


thanks!

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to