Re: [Bioc-devel] Bioc Git credentials

2021-05-06 Thread Leonard Goldstein
Great, thanks!

Leonard

On Mon, May 3, 2021 at 11:53 PM Nitesh Turaga 
wrote:

> Hi Leonard,
>
>
>
> I’ve changed your email now, you should be able to activate your account.
>
>
>
> The email listed was to a gene.com email address.
>
>
>
> Add new SSH keys if you don’t have the old key pairs.
>
>
>
> Best,
>
>
>
> Nitesh
>
>
>
> *From: *Bioc-devel  on behalf of
> Leonard Goldstein 
> *Date: *Sunday, May 2, 2021 at 10:06 PM
> *To: *bioc-devel@r-project.org 
> *Subject: *[Bioc-devel] Bioc Git credentials
>
> Hi,
>
> I'm maintaining the SGSeq Bioc package and recently changed the maintainer
> email address. I'm trying to activate a new account for Git credentials
> here
> <https://git.bioconductor.org/BiocCredentials/account_activation/>, but
> get
> the following error message:
>
> ldgoldst...@gmail.com is not associated with a maintainer of a
> Bioconductor
> package. Please check the spelling or contact bioc-devel@r-project.org for
> help.
>
> Thanks for your help.
>
> Leonard
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Bioc Git credentials

2021-05-02 Thread Leonard Goldstein
Hi,

I'm maintaining the SGSeq Bioc package and recently changed the maintainer
email address. I'm trying to activate a new account for Git credentials here
, but get
the following error message:

ldgoldst...@gmail.com is not associated with a maintainer of a Bioconductor
package. Please check the spelling or contact bioc-devel@r-project.org for
help.

Thanks for your help.

Leonard

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BSgenome changes

2020-08-18 Thread Leonard Goldstein via Bioc-devel
Thanks for the explanation Hervé.

Best wishes

Leonard


On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès  wrote:

> On 8/18/20 01:40, Kasper Daniel Hansen wrote:
> > In light of this, could we get a version of GRCh37 with only a single
> > mitochondrial genome?
>
> You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would
> contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without
> the hg19:chrM sequence?
>
> Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you
> stay away from this mess. I'm not sure that adding yet another BSgenome
> package would make the situation less confusing.
>
> >
> > On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès  > <mailto:hpa...@fredhutch.org>> wrote:
> >
> > Hi Felix,
> >
> > On 8/13/20 21:43, Felix Ernst wrote:
> >  > Hi Leonard, Hi Herve,
> >  >
> >  > I followed your conversation, since I have noticed the same
> > problem. Thanks, Herve, for the explanation of the recent changes on
> > hg19.
> >  >
> >  > The GRCh37.P13 report states in its last line:
> >  >
> >  > MTassembled-molecule  MT  Mitochondrion   J01415.2
> >  =   NC_012920.1 non-nuclear 16569   chrM
> >  >
> >  > Since the last name is called "UCSC-style-name", wouldn't that
> > mean that chrM has to be renamed to MT and not chrMT?
> >
> > This is a mistake in the sequence report for GRCh37.p13.
> GRCh37.p13:MT
> > is the same as hg19:chrMT, not hg19:chrM.
> >
> > hg19:chrM and hg19:chrMT are **not** the same sequences. The former
> is
> > NC_001807 and has length 16571 and the latter is NC_012920.1 and has
> > length 16569.
> >
> > Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
> >
> > Cheers,
> > H.
> >
> >  >
> >  > Thanks again for the explanation.
> >  >
> >  > Cheers,
> >  > Felix
> >  >
> >  > -Ursprüngliche Nachricht-
> >  > Von: Bioc-devel  > <mailto:bioc-devel-boun...@r-project.org>> Im Auftrag von Hervé
> Pagès
> >  > Gesendet: Freitag, 14. August 2020 01:08
> >  > An: Leonard Goldstein  > <mailto:goldstein.leon...@gene.com>>; bioc-devel@r-project.org
> > <mailto:bioc-devel@r-project.org>
> >  > Cc: charlotte.sone...@fmi.ch <mailto:charlotte.sone...@fmi.ch>
> >  > Betreff: Re: [Bioc-devel] BSgenome changes
> >  >
> >  > Hi Leonard,
> >  >
> >  > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> >  >> Dear Bioc team,
> >  >>
> >  >> I'm following up on this recent GitHub issue
> >  >>
> > <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ldg21
> >  >>
> >
>  
> _SGSeq_issues_5&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=n5bIFHTIgC1B4EdjWUDLIlVcRJdXScYvfbojaqTJZVg&s=Tfk-tDM99P63dnsvMydG2phv5WQPVbJzPk0hzi-_1SE&e=
> >  >. Please see the issue for more details and code examples.
> >  >>
> >  >> It looks like changes in Bioc devel result in two copies of the
> >  >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one
> > named
> >  >> chrM like in previous package versions (length 16571) and one
> named
> >  >> chrMT (length 16569).
> >  >>
> >  >> When using seqlevelsStyle() to change chromosome names from UCSC
> to
> >  >> NCBI format, this results in new behavior -- in the past chrM was
> >  >> simply renamed MT, now the different sequence chrMT is used. Is
> > this intended?
> >  >
> >  > Absolutely intended.
> >  >
> >  > There is a long story behind the unfortunate fate of the
> > mitochondrial chromosome in hg19. I'll try to keep it short.
> >  >
> >  > When the UCSC folks released the hg19 browser more than 10 years
> > ago, they based it on assembly GRCh37:
> >  >
> >  >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.ncbi.nlm.nih.gov_assembly_GCF-5F01405.13&d=DwIGaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=49jni5SmG_DH80nnPZXXqvFNceB5jkZtlb7eKEA8558&s=jWtgKVQGC-SQp6i4

[Bioc-devel] BSgenome changes

2020-08-12 Thread Leonard Goldstein via Bioc-devel
Dear Bioc team,

I'm following up on this recent GitHub issue
. Please see the issue for more
details and code examples.

It looks like changes in Bioc devel result in two copies of the
mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named chrM
like in previous package versions (length 16571) and one named chrMT
(length 16569).

When using seqlevelsStyle() to change chromosome names from UCSC to NCBI
format, this results in new behavior -- in the past chrM was simply renamed
MT, now the different sequence chrMT is used. Is this intended?

Leonard

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Bioc-devel changes and SummarizedExperiment class

2018-04-11 Thread Leonard Goldstein
Thanks Hervé!

Leonard


On Tue, Apr 10, 2018 at 11:03 AM, Hervé Pagès  wrote:

> Hi Leonard,
>
> This should be fixed in SGSeq 1.13.6 (see commit
> 5dc16968f7ea1a4b59595ebaabacca9a76699b80).
>
> Cheers,
> H.
>
>
> On 04/04/2018 09:23 AM, Leonard Goldstein wrote:
>
>> Hi Hervé,
>>
>> Some recent changes in bioc-devel are causing trouble with
>> SummarizedExperiment objects if the rowRanges slot inherits from
>> GRangesList. Please see example below.
>>
>> Thanks in advance for your help.
>>
>> Leonard
>>
>> --
>>
>>> library(SGSeq)
>>>
>>> ## SGVariants object inherits from GRangesList
>>>
>>
>>
>> is(sgv_pred)
>>>
>>   [1] "SGVariants" "GRangesList"
>>   [3] "Paths"  "GenomicRangesList"
>>   [5] "CompressedRangesList"   "GenomicRanges_OR_GRangesList"
>>   [7] "RangesList" "CompressedList"
>>   [9] "GenomicRanges_OR_GenomicRangesList" "List"
>> [11] "Vector" "list_OR_List"
>> [13] "Annotated"
>>
>>>
>>> ## example counts
>>>
>>
>>
>> counts <- matrix(1:2, ncol = 1)
>>>
>>> ## creating SummarizedExperiment object fails
>>>
>>
>>
>> SummarizedExperiment(assays = list(counts), rowRanges = sgv_pred)
>>>
>> class: RangedSummarizedExperiment
>> dim: 2 1
>> metadata(0):
>> assays(1): ''
>> Error in .local(object, ..., verbose) : unused argument (check = FALSE)
>>
>>>
>>> ## works after coercing to GRangestList
>>>
>>
>>
>> SummarizedExperiment(assays = list(counts), rowRanges = as(sgv_pred,
>>>
>> "GRangesList"))
>> class: RangedSummarizedExperiment
>> dim: 2 1
>> metadata(0):
>> assays(1): ''
>> rownames: NULL
>> rowData names(20): from to ... variantType variantName
>> colnames: NULL
>> colData names(0):
>>
>>>
>>> sessionInfo()
>>>
>> R Under development (unstable) (2017-10-20 r73567)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>
>> Matrix products: default
>> BLAS:
>> /gnet/is2/p01/apps/R/3.5.0-20171105-devel/x86_64-linux-2.6-
>> rhel6/lib64/R/lib/libRblas.so
>> LAPACK:
>> /gnet/is2/p01/apps/R/3.5.0-20171105-devel/x86_64-linux-2.6-
>> rhel6/lib64/R/lib/libRlapack.so
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4parallel  stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>>   [1] SGSeq_1.13.5SummarizedExperiment_1.9.16
>>   [3] DelayedArray_0.5.23 BiocParallel_1.13.3
>>   [5] matrixStats_0.53.1  Biobase_2.39.2
>>   [7] Rsamtools_1.31.3Biostrings_2.47.12
>>   [9] XVector_0.19.9  GenomicRanges_1.31.23
>> [11] GenomeInfoDb_1.15.5 IRanges_2.13.28
>> [13] S4Vectors_0.17.39   BiocGenerics_0.25.3
>>
>> loaded via a namespace (and not attached):
>>   [1] Rcpp_0.12.16  compiler_3.5.0
>>   [3] GenomicFeatures_1.31.10   prettyunits_1.0.2
>>   [5] bitops_1.0-6  tools_3.5.0
>>   [7] zlibbioc_1.25.0   progress_1.1.2
>>   [9] biomaRt_2.35.13   digest_0.6.15
>> [11] bit_1.1-13RSQLite_2.1.0
>> [13] memoise_1.1.0 lattice_0.20-35
>> [15] pkgconfig_2.0.1   igraph_1.2.1
>> [17] Matrix_1.2-13 DBI_0.8
>> [19] GenomeInfoDbData_1.1.0rtracklayer_1.39.9
>> [21] httr_1.3.1stringr_1.3.0
>> [23] bit64_0.9-8   grid_3.5.0
>> [25] R6_2.2.2  AnnotationDbi_1.41.4
>> [27] XML_3.98-1.10 blob_1.1.1
>> [29] magrittr_1.5  GenomicAlignments_1.15.13
>> [31] RUnit_0.4.31  assertthat_0.2.0
>> [33] stringi_1.1.7 RCurl_1.95-4.10
>>
>>>
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
>> hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt
>> 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=qp
>> gk0XNxCYLZKHMRS-PnHD2znDDwj1P-Eiu7P4aUSuI&s=qjwH7TgvfYKGtMDC
>> I77_VVUw8S-5PA6ctju8Jb3erUQ&e=
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Bioc-devel changes and SummarizedExperiment class

2018-04-04 Thread Leonard Goldstein
Hi Hervé,

Some recent changes in bioc-devel are causing trouble with
SummarizedExperiment objects if the rowRanges slot inherits from
GRangesList. Please see example below.

Thanks in advance for your help.

Leonard

--
> library(SGSeq)
>
> ## SGVariants object inherits from GRangesList


> is(sgv_pred)
 [1] "SGVariants" "GRangesList"
 [3] "Paths"  "GenomicRangesList"
 [5] "CompressedRangesList"   "GenomicRanges_OR_GRangesList"
 [7] "RangesList" "CompressedList"
 [9] "GenomicRanges_OR_GenomicRangesList" "List"
[11] "Vector" "list_OR_List"
[13] "Annotated"
>
> ## example counts


> counts <- matrix(1:2, ncol = 1)
>
> ## creating SummarizedExperiment object fails


> SummarizedExperiment(assays = list(counts), rowRanges = sgv_pred)
class: RangedSummarizedExperiment
dim: 2 1
metadata(0):
assays(1): ''
Error in .local(object, ..., verbose) : unused argument (check = FALSE)
>
> ## works after coercing to GRangestList


> SummarizedExperiment(assays = list(counts), rowRanges = as(sgv_pred,
"GRangesList"))
class: RangedSummarizedExperiment
dim: 2 1
metadata(0):
assays(1): ''
rownames: NULL
rowData names(20): from to ... variantType variantName
colnames: NULL
colData names(0):
>
> sessionInfo()
R Under development (unstable) (2017-10-20 r73567)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Matrix products: default
BLAS:
/gnet/is2/p01/apps/R/3.5.0-20171105-devel/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
LAPACK:
/gnet/is2/p01/apps/R/3.5.0-20171105-devel/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
 [1] SGSeq_1.13.5SummarizedExperiment_1.9.16
 [3] DelayedArray_0.5.23 BiocParallel_1.13.3
 [5] matrixStats_0.53.1  Biobase_2.39.2
 [7] Rsamtools_1.31.3Biostrings_2.47.12
 [9] XVector_0.19.9  GenomicRanges_1.31.23
[11] GenomeInfoDb_1.15.5 IRanges_2.13.28
[13] S4Vectors_0.17.39   BiocGenerics_0.25.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16  compiler_3.5.0
 [3] GenomicFeatures_1.31.10   prettyunits_1.0.2
 [5] bitops_1.0-6  tools_3.5.0
 [7] zlibbioc_1.25.0   progress_1.1.2
 [9] biomaRt_2.35.13   digest_0.6.15
[11] bit_1.1-13RSQLite_2.1.0
[13] memoise_1.1.0 lattice_0.20-35
[15] pkgconfig_2.0.1   igraph_1.2.1
[17] Matrix_1.2-13 DBI_0.8
[19] GenomeInfoDbData_1.1.0rtracklayer_1.39.9
[21] httr_1.3.1stringr_1.3.0
[23] bit64_0.9-8   grid_3.5.0
[25] R6_2.2.2  AnnotationDbi_1.41.4
[27] XML_3.98-1.10 blob_1.1.1
[29] magrittr_1.5  GenomicAlignments_1.15.13
[31] RUnit_0.4.31  assertthat_0.2.0
[33] stringi_1.1.7 RCurl_1.95-4.10
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] exonsBy dropping genes from TxDb

2017-10-30 Thread Leonard Goldstein
Thanks Hervé and others for looking into this.

Leonard

On Sun, Oct 29, 2017 at 10:19 PM, Hervé Pagès  wrote:

> Thanks Dario and Mike for looking into this.
>
> In the mean time I added makeTxDbFromEnsembl() for creating a TxDb
> object by querying directly an Ensembl MySQL server. Only lightly
> tested but so far seems to faster and more reliable than
> makeTxDbFromBiomart():
>
>   > library(GenomicFeatures)
>
>   > system.time(txdb <- makeTxDbFromEnsembl("Homo sapiens", server="
> useastdb.ensembl.org"))
>   Fetch transcripts and genes from Ensembl ... OK
>   Fetch exons and CDS from Ensembl ... OK
>   Fetch chromosome names and lengths from Ensembl ...OK
>   Gather the metadata ... OK
>   Make the TxDb object ... OK
>  user  system elapsed
>46.420   1.271  58.270
>
>   > txdb
>   TxDb object:
>   # Db type: TxDb
>   # Supporting package: GenomicFeatures
>   # Data source: Ensembl
>   # Organism: Homo sapiens
>   # Ensembl release: 90
>   # Ensembl database: homo_sapiens_core_90_38
>   # MySQL server: useastdb.ensembl.org
>   # transcript_nrow: 220144
>   # exon_nrow: 757782
>   # cds_nrow: 789614
>   # Db created by: GenomicFeatures package from Bioconductor
>   # Creation time: 2017-10-29 22:13:21 -0700 (Sun, 29 Oct 2017)
>   # GenomicFeatures version at creation time: 1.29.16
>   # RSQLite version at creation time: 2.0
>   # DBSCHEMAVERSION: 1.2
>
> It's in GenomicFeatures 1.29.16.
>
> Cheers,
> H.
>
>
> On 10/28/2017 03:32 PM, Mike Smith wrote:
>
>> My feeling is that this isn't related to RCurl, since the example
>> transcript is missing if you use httr to submit the query instead.  You
>> can
>> check this with the code at
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gist.gi
>> thub.com_grimbough_7e7a47b7a4f64915220ce35cc1ce8f39&d=
>> DwIGaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0W
>> YiZvSXAJJKaaPhzWA&m=njpXf5TkV-BLhepKQRWVTdAzBmWWCsDj7RIPgs7F
>> WBM&s=orkmK9y7kVCLbLB6tlMJfgT5V_GKzVysZ00DMevuAm4&e=
>>
>>
>> I wonder if this is related to BioMart's instability when you submit large
>> queries.  The web interface suggests no more than 500 entries when
>> filtering on something like a gene ID, but in these examples we're
>> applying
>> no filter at all.  Problems related to this have been reported before (
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__support
>> .bioconductor.org_p_86358_&d=DwIGaQ&c=eRAMFD45gAfqt84VtBcfh
>> Q&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=njpXf5TkV-
>> BLhepKQRWVTdAzBmWWCsDj7RIPgs7FWBM&s=CxVV4WArQc4xcf74Az8V8vfQ
>> Mr80Q8K2um6sKTggHw0&e=) and I patched the devel version
>>
>> of biomaRt to submit queries in batches of 500 - although this currently
>> has no effect if there isn't a set of 'values' to split into batches.
>>
>> Interestingly, if I first obtain a list of all gene IDs in Ensembl, and
>> then submit those in a batched query I get more exons than your first
>> approach.
>>
>> all_genes <- getBM(attributes = "ensembl_gene_id", mart=mart)
>> attributes1 <- c("ensembl_transcript_id", "ensembl_exon_id")
>> attributes2 <- c(attributes1, "rank", "genomic_coding_start", "cds_start",
>> "5_utr_start")
>> df3 <- getBM(attributes1, filters = "ensembl_gene_id", values =
>> all_genes[,1], mart=mart)
>> df4 <- getBM(attributes2, filters = "ensembl_gene_id", values =
>> all_genes[,1], mart=mart)
>>
>> dim(df3)
>>>
>> [1] 1309356   6
>>
>> The number of returned results is consistent for even with the larger
>> number of attributes
>>
>> identical(df3[,1], df4[,1])
>>>
>> [1] TRUE
>>
>> And some of these transcripts are clearly not present in the first query:
>>
>> length(which( !unique(df3[,"ensembl_transcript_id"]) %in% df1[,
>>>
>> "ensembl_transcript_id"] ))
>> [1] 28128
>>
>> It could be that my batched code is doing something wrong, but I'm
>> starting
>> to suspect that even the first query is dropping data silently!
>>
>> Mike
>>
>>
>> On 28 October 2017 at 13:00, Dario Strbenac 
>> wrote:
>>
>> Good day,
>>>
>>> I stepped through the code until execution reached the end of postForm in
>>> RCurl which is called by getBM and obtains the textual result from the
>>> server. If I check the contents of write$value(), the example missing
>>> transcript is not there.
>>>
>>> Browse[3]> grep("ENST0485971", write$value())
>>> integer(0)
>>>
>>> write$value is a weird function. It's prototype is function (collapse =
>>> "", ...) but its body contains code such as
>>>
>>>  if (is.null(collapse))
>>>  return(txt)
>>>
>>> I wonder where txt is created. It's not passed as an extra variable.
>>>
>>> Browse[7]> print(list(...))
>>> list()
>>>
>>> Searching the R code reveals that txt is created as a global variable in
>>> another function named dynCurlReader by the code statement txt <<-
>>> character().
>>>
>>> RCurl also uses functions that don't begin with a dot but are
>>> undocumented.
>>>
>>> ans = encode(ans)
>>> Browse[7]> ?encode
>>> No documentation for ‘encode’ in specified packages and libraries
>>>
>>> 

[Bioc-devel] exonsBy dropping genes from TxDb

2017-10-27 Thread Leonard Goldstein
Dear bioc-devel,

I noticed exonsBy is dropping a lot of genes when run on a TxDb object
created with makeTxDbFromBiomart (see below). Please also see related post
on the Bioconductor support site:

https://support.bioconductor.org/p/101951/#102160

Thanks for your help.

Leonard

--
> tx <- makeTxDbFromBiomart()
> txs_by_gene <- transcriptsBy(tx, "gene")
> exs_by_gene <- exonsBy(tx, "gene")
> length(txs_by_gene)
[1] 63967
> length(exs_by_gene)
[1] 36751
> subsetByOverlaps(txs_by_gene, GRanges("8", IRanges(127735434,127741434)))
GRangesList object of length 1:
$ENSG0136997
GRanges object with 9 ranges and 2 metadata columns:
  seqnames ranges strand | tx_id tx_name
  |  
  [1]8 [127735434, 127740477]  + |101876 ENST0259523
  [2]8 [127735473, 127735817]  + |101877 ENST0641252
  [3]8 [127735519, 127738772]  + |101878 ENST0517291
  [4]8 [127736046, 127736612]  + |101879 ENST0641036
  [5]8 [127736069, 127741434]  + |101880 ENST0621592
  [6]8 [127736084, 127741434]  + |101881 ENST0377970
  [7]8 [127736220, 127741372]  + |101882 ENST0524013
  [8]8 [127736231, 127738475]  + |101883 ENST0520751
  [9]8 [127736594, 127740958]  + |101884 ENST0613283

---
seqinfo: 555 sequences (1 circular) from an unspecified genome
> subsetByOverlaps(exs_by_gene, GRanges("8", IRanges(127735434,127741434)))
GRangesList object of length 0:
<0 elements>

---
seqinfo: 555 sequences (1 circular) from an unspecified genome
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicFeatures_1.28.5 AnnotationDbi_1.38.2   Biobase_2.36.2
[4] GenomicRanges_1.28.6   GenomeInfoDb_1.12.3IRanges_2.10.5
[7] S4Vectors_0.14.7   BiocGenerics_0.22.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13   XVector_0.16.0
 [3] GenomicAlignments_1.12.2   zlibbioc_1.22.0
 [5] BiocParallel_1.10.1bit_1.1-12
 [7] lattice_0.20-35rlang_0.1.2
 [9] blob_1.1.0 tools_3.4.1
[11] grid_3.4.1 SummarizedExperiment_1.6.5
[13] DBI_0.7matrixStats_0.52.2
[15] bit64_0.9-7digest_0.6.12
[17] tibble_1.3.4   Matrix_1.2-11
[19] GenomeInfoDbData_0.99.0rtracklayer_1.36.6
[21] bitops_1.0-6   RCurl_1.95-4.8
[23] biomaRt_2.32.1 memoise_1.1.0
[25] RSQLite_2.0DelayedArray_0.2.7
[27] compiler_3.4.1 Rsamtools_1.28.0
[29] Biostrings_2.44.2  XML_3.98-1.9
[31] pkgconfig_2.0.1
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes S4Vectors etc.

2017-02-09 Thread Leonard Goldstein
Hi Martin and Hervé,

The problem lies in the pre-built binaries. Installing SGSeq from source
fixes the installation. I bumped the version number for SGSeq to trigger a
rebuild, but I guess other packages are affected as well. Among SGSeq
dependencies it looks like Rsamtools needs a version bump too.

Thanks,

Leonard


On Thu, Feb 9, 2017 at 4:19 AM, Martin Morgan  wrote:

> On 02/09/2017 06:25 AM, Hervé Pagès wrote:
>
>> Hi Leonard,
>>
>> mmh... I can't reproduce this and it doesn't show up on the build report
>> either. But I got another strange error when I tried to *install*
>> SGSeq: something about TxFeatures not being able to extend GenomicRanges
>> because of incompatible type for the elementMetadata slot.
>>
>> I got rid of it by re-installing GenomicRanges from svn.
>> I guess I had some stalled class definition somewhere in a cache.
>>
>> Looks like maybe you also have a stalled class definition somewhere
>> in a cache for one of SGFeatureCounts's parent classes. Not sure which
>> one though :-/
>>
>> Maybe I forgot to bump a package version somewhere after I renamed
>> characterORNULL -> character_OR_NULL? Re-installing S4Vectors, IRanges,
>> GenomicRanges, and SummarizedExperiment directly from svn might help.
>>
>
> See also https://support.bioconductor.org/p/92201/#92205 where
>
> GeneSetCollection extends 'list', which extends 'vector'
>
> GSEABase Depends: on annotate which Depends: on AnnotationDbi which
> Imports: S4Vectors.
>
> S4Vectors defines a class union on vector_OR_factor (previously
> vectorORfactor). Consequently, GeneSetCollection extends vector_OR_factor.
>
> Class definitions are cached at package installation time, so GSEABase
> class definitions became out-of-date when S4Vectors was updated.
>
> A conservative approach is to bump the version of all packages that depend
> on S4Vectors, even GSEABase which is three steps away from S4Vectors. I
> believe that this involves
>
>   db = available.packages(repos=BiocInstaller::biocinstallRepos()[1])
>   revdeps = tools::package_dependencies("S4Vectors", db, recursive=TRUE,
>   reverse=TRUE)
>
> which is
>
> > length(revdeps$S4Vectors)
> [1] 662
>
> packages!
>
> Leonard could find candidates for re-installation with
>
> > mydep = tools::package_dependencies("SGSeq", db, recursive=TRUE)
> > mydep$SGSeq[mydep$SGSeq %in% revdeps$S4Vectors]
>  [1] "IRanges"  "GenomicRanges"    "Rsamtools"
>  [4] "SummarizedExperiment" "AnnotationDbi""Biostrings"
>  [7] "GenomicAlignments""GenomicFeatures"  "GenomeInfoDb"
> [10] "rtracklayer"  "XVector"  "biomaRt"
> [13] "DelayedArray"
>
> Martin
>
>
>
>> Cheers,
>> H.
>>
>> On 02/08/2017 04:18 PM, Leonard Goldstein wrote:
>>
>>> Hi Hervé,
>>>
>>> It looks like there have been some changes in bioc-devel (S4Vectors etc.)
>>> that break the SGSeq package (see below). I'm not sure whether this is
>>> something that needs to be addressed in SGSeq or its dependencies. I'd be
>>> grateful for any pointers. Thanks for your help.
>>>
>>> Leonard
>>>
>>> --
>>>
>>>> example(makeSGFeatureCounts, "SGSeq")
>>>>
>>>
>>> mkSGFC> sgfc <- makeSGFeatureCounts(sgf_pred, si,
>>> mkSGFC+   matrix(0L, length(sgf_pred), nrow(si)))
>>> Error in checkSlotAssignment(object, name, value) :
>>>   assignment of an object of class "NULL" is not valid for slot
>>> 'NAMES' in
>>> an object of class "SGFeatureCounts"; is(value, "characterORNULL") is not
>>> TRUE
>>>
>>>>
>>>> sessionInfo()
>>>>
>>> R Under development (unstable) (2017-02-06 r72129)
>>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>> Running under: OS X El Capitan 10.11.6
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats4parallel  stats graphics  grDevices utils datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>>  [1] SGSeq_1.9.1SummarizedExperiment_1.5.5
>>>  [3] DelayedArray_0.1.4 Biobase_2.35.0
>>>  [5] Rsamtools_1.27.12  Biostrings_2.43.4
>>>  [7] XVector_0.15.2 GenomicRanges_1.27.22
>>>  [9] G

[Bioc-devel] Changes S4Vectors etc.

2017-02-08 Thread Leonard Goldstein
Hi Hervé,

It looks like there have been some changes in bioc-devel (S4Vectors etc.)
that break the SGSeq package (see below). I'm not sure whether this is
something that needs to be addressed in SGSeq or its dependencies. I'd be
grateful for any pointers. Thanks for your help.

Leonard

--
> example(makeSGFeatureCounts, "SGSeq")

mkSGFC> sgfc <- makeSGFeatureCounts(sgf_pred, si,
mkSGFC+   matrix(0L, length(sgf_pred), nrow(si)))
Error in checkSlotAssignment(object, name, value) :
  assignment of an object of class "NULL" is not valid for slot 'NAMES' in
an object of class "SGFeatureCounts"; is(value, "characterORNULL") is not
TRUE
>
> sessionInfo()
R Under development (unstable) (2017-02-06 r72129)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
 [1] SGSeq_1.9.1SummarizedExperiment_1.5.5
 [3] DelayedArray_0.1.4 Biobase_2.35.0
 [5] Rsamtools_1.27.12  Biostrings_2.43.4
 [7] XVector_0.15.2 GenomicRanges_1.27.22
 [9] GenomeInfoDb_1.11.9IRanges_2.9.18
[11] S4Vectors_0.13.13  BiocGenerics_0.21.3

loaded via a namespace (and not attached):
 [1] igraph_1.0.1 Rcpp_0.12.9  AnnotationDbi_1.37.2

 [4] magrittr_1.5 zlibbioc_1.21.0
 GenomicAlignments_1.11.9
 [7] BiocParallel_1.9.5   lattice_0.20-34  tools_3.4.0

[10] grid_3.4.0   DBI_0.5-1digest_0.6.12

[13] Matrix_1.2-8 GenomeInfoDbData_0.99.0  rtracklayer_1.35.5

[16] bitops_1.0-6 RUnit_0.4.31 biomaRt_2.31.4

[19] RCurl_1.95-4.8   memoise_1.0.0RSQLite_1.1-2

[22] compiler_3.4.0   GenomicFeatures_1.27.6   XML_3.98-1.5

>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Bug in setdiff() for signature 'IRangesList, IRangesList'?

2016-11-07 Thread Leonard Goldstein
Thanks Michael!

Leonard



On Mon, Nov 7, 2016 at 11:35 AM, Michael Lawrence  wrote:

> I fixed this in 2.9.6 and 2.8.1.
>
> It brings to light an interesting pecularity of the recycling rule.
>
> Thanks,
> Michael
>
> On Mon, Nov 7, 2016 at 9:46 AM, Leonard Goldstein
>  wrote:
> > Dear Hervé et al.
> >
> > I noticed that setdiff(x, y) where 'x' is an IRangesList containing an
> > empty IRanges results in an error (see below). I would have expected the
> > function to return the IRangesList with an empty IRanges?
> >
> > Many thanks for your help.
> >
> > Leonard
> >
> > --
> >> x <- IRangesList(IRanges())
> >> y <- IRangesList(IRanges())
> >>
> >> setdiff(x, y)
> > Error in ans[] <- x : replacement has length zero
> >>
> >> sessionInfo()
> > R version 3.3.0 Patched (2016-05-09 r70594)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats4parallel  stats graphics  grDevices utils datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] IRanges_2.8.0   S4Vectors_0.12.0BiocGenerics_0.20.0
> >
> > loaded via a namespace (and not attached):
> > [1] zlibbioc_1.20.0 XVector_0.14.0
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Bug in setdiff() for signature 'IRangesList, IRangesList'?

2016-11-07 Thread Leonard Goldstein
Dear Hervé et al.

I noticed that setdiff(x, y) where 'x' is an IRangesList containing an
empty IRanges results in an error (see below). I would have expected the
function to return the IRangesList with an empty IRanges?

Many thanks for your help.

Leonard

--
> x <- IRangesList(IRanges())
> y <- IRangesList(IRanges())
>
> setdiff(x, y)
Error in ans[] <- x : replacement has length zero
>
> sessionInfo()
R version 3.3.0 Patched (2016-05-09 r70594)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] IRanges_2.8.0   S4Vectors_0.12.0BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.20.0 XVector_0.14.0

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] New argument 'use.names' in granges() function

2016-06-17 Thread Leonard Goldstein
Thanks Hervé.

Leonard

On Fri, Jun 17, 2016 at 2:11 AM, Hervé Pagès  wrote:

> Done in SGSeq 1.6.3 (release) and 1.7.4 (devel).
>
> Also done in SomaticSignatures 2.8.4 (release) and 2.9.4 (devel).
>
> I scanned the entire Rpacks folder and those are the only 2 packages
> I found that contain calls to granges() with 2 unnamed arguments.
>
> Sorry for the trouble.
>
> H.
>
>
> On 06/16/2016 11:55 PM, Hervé Pagès wrote:
>
>> Hi Leonard,
>>
>> I really like having 'use.names' before 'use.mcols'. Sorry for breaking
>> SGSeq, I'll fix it.
>>
>> FWIW I think it's good practice to always name this kind of arguments.
>> Not the first argument 'x' or the first 2 arguments 'x' and 'y' of a
>> unary or binary function, but the toggles that follow them like
>> 'use.names', 'use.mcols', 'ignore.strand' etc..
>> That makes the code a lot more readable.
>>
>> Cheers,
>> H.
>>
>>
>> On 06/16/2016 04:46 PM, Leonard Goldstein wrote:
>>
>>> Hi Hervé,
>>>
>>> I noticed a recent change in the release and development version of
>>> GenomicRanges, which introduces a new argument 'use.names' for granges()
>>> and other functions.
>>>
>>> The change breaks the SGSeq package, since it uses granges() relying on
>>> positional argument matching (and 'use.names' has been added preceding
>>> existing arguments). Should GenomicRanges be changed so that newly
>>> arguments occur after existing arguments? Otherwise I can fix the problem
>>> in SGSeq.
>>>
>>> Many thanks,
>>>
>>> Leonard
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] New argument 'use.names' in granges() function

2016-06-16 Thread Leonard Goldstein
Hi Hervé,

I noticed a recent change in the release and development version of
GenomicRanges, which introduces a new argument 'use.names' for granges()
and other functions.

The change breaks the SGSeq package, since it uses granges() relying on
positional argument matching (and 'use.names' has been added preceding
existing arguments). Should GenomicRanges be changed so that newly
arguments occur after existing arguments? Otherwise I can fix the problem
in SGSeq.

Many thanks,

Leonard

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] segfault when using RleList in DataFrames

2015-12-07 Thread Leonard Goldstein
Thanks Hervé!

Best wishes

Leonard


On Mon, Dec 7, 2015 at 12:08 AM, Hervé Pagès  wrote:
> Hi Leonard,
>
> Thanks for the report. This segfault was due to a long-standing bug
> in the "window" method for Rle objects when the object has length 0:
>
>   > window(Rle(), 1, 0)  # you might need to try many times before
>  # you get the segfault
>*** caught segfault ***
>   address 0x12f69d74, cause 'memory not mapped'
>
>   Traceback:
>1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>2: .Call2("Rle_getStartEndRunAndOffset", x, start, end, PACKAGE =
> "IRanges")
>3: getStartEndRunAndOffset(x, start(solved_SEW), end(solved_SEW))
>4: .local(x, ...)
>5: window(Rle(), 1, 0)
>6: window(Rle(), 1, 0)
>
>   Possible actions:
>   1: abort (with core dump, if enabled)
>   2: normal R exit
>   3: exit R without saving workspace
>   4: exit R saving workspace
>   Selection: 2
>   Save workspace image? [y/n/c]:
>
> It is now fixed in release (S4Vectors 0.8.4 and IRanges 2.4.5) and
> devel (S4Vectors 0.9.12 and IRanges 2.5.10).
>
> Cheers,
> H.
>
>
>
> On 12/05/2015 03:29 PM, Leonard Goldstein wrote:
>>
>> Hi all,
>>
>> I ran into problems when using an RleList as column in a DataFrame
>> object (see example below).
>>
>> Many thanks in advance for your help.
>>
>> Leonard
>>
>>
>>> sessionInfo()
>>
>> R version 3.2.2 (2015-08-14)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4parallel  stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] IRanges_2.4.4   S4Vectors_0.8.3 BiocGenerics_0.16.1
>>>
>>>
>>> df <- DataFrame(ID = 1:3)
>>> x <- RleList(IntegerList(vector("list", 3)))
>>> df$rle <- x
>>> df
>>
>> DataFrame with 3 rows and 2 columns
>>
>>   *** caught segfault ***
>> address 0x9b00364, cause 'memory not mapped'
>>
>> Traceback:
>>   1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>   2: .Call2("Rle_getStartEndRunAndOffset", x, start, end, PACKAGE =
>> "S4Vectors")
>>   3: S4Vectors:::getStartEndRunAndOffset(x, start(solved_SEW),
>> end(solved_SEW))
>>   4: .local(x, ...)
>>   5: window(x, start = 1L, width = n)
>>   6: window(x, start = 1L, width = n)
>>   7: .local(x, ...)
>>   8: head(x, 3)
>>   9: head(x, 3)
>>
>> Possible actions:
>> 1: abort (with core dump, if enabled)
>> 2: normal R exit
>> 3: exit R without saving workspace
>> 4: exit R saving workspace
>> Selection:
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] segfault when using RleList in DataFrames

2015-12-05 Thread Leonard Goldstein
Hi all,

I ran into problems when using an RleList as column in a DataFrame
object (see example below).

Many thanks in advance for your help.

Leonard


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] IRanges_2.4.4   S4Vectors_0.8.3 BiocGenerics_0.16.1
>
> df <- DataFrame(ID = 1:3)
> x <- RleList(IntegerList(vector("list", 3)))
> df$rle <- x
> df
DataFrame with 3 rows and 2 columns

 *** caught segfault ***
address 0x9b00364, cause 'memory not mapped'

Traceback:
 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
 2: .Call2("Rle_getStartEndRunAndOffset", x, start, end, PACKAGE = "S4Vectors")
 3: S4Vectors:::getStartEndRunAndOffset(x, start(solved_SEW), end(solved_SEW))
 4: .local(x, ...)
 5: window(x, start = 1L, width = n)
 6: window(x, start = 1L, width = n)
 7: .local(x, ...)
 8: head(x, 3)
 9: head(x, 3)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] error when combining objects inheriting from GRangesList

2015-12-04 Thread Leonard Goldstein
Hi Hervé,

OK thanks for the workaround and the quick reply.

Best wishes,

Leonard


On Fri, Dec 4, 2015 at 10:41 AM, Hervé Pagès  wrote:
> Hi Leonard,
>
>
> On 12/04/2015 09:43 AM, Leonard Goldstein wrote:
>>
>> Hi all,
>>
>> In the latest Bioc release (and devel) I encountered problems with
>> classes inheriting from GRangesList. Combining two or more objects
>> that belong to such classes can result in an error, as the combined
>> object is apparently not recognized as a valid instance of the class.
>> I included an example below. Thanks in advance for your help.
>>
>> Leonard
>>
>> --
>>>
>>> library(GenomicRanges)
>>>
>>> validNewClass <- function(object) {
>>
>> +
>> +   if (!"ID" %in% names(mcols(object))) {
>> +
>> + return("missing metadata column ID")
>> +
>> +   }
>> +
>> + }
>>>
>>>
>>> setClass(
>>
>> +   Class = "newClass",
>> +   contains = "GRangesList",
>> +   validity = validNewClass)
>>>
>>>
>>> gr <- GRanges(1, IRanges(1, 100))
>>> grl <- split(gr, 1)
>>> mcols(grl)$ID <- 1
>>> x <- new("newClass", grl)
>>>
>>> ## combining two instances of newClass results in an error
>>> c(x, x)
>>
>> Error in validObject(.Object) :
>>invalid class “newClass” object: missing metadata column ID
>
>
> A lot of code in the S4Vectors/IRanges/GenomicRanges infrastructure
> needs to create temporarily invalid objects for various reasons. At
> the lowest level we use S4Vectors::new2(..., check=FALSE) for that.
> Unlike methods::new(), which always validates the new object, new2()
> only validates it if 'check' is set to TRUE.
> However new2() is only able to to this if the validity method for the
> object was defined using setValidity2(), also defined in the S4Vectors
> package. If you build on top of the S4Vectors/IRanges/GenomicRanges
> infrastructure, please always use setValidity2():
>
>   library(GenomicRanges)
>
>   setClass("newClass", "GRangesList")
>
>   setValidity2("newClass",
> function(object) {
>   if (!("ID" %in% names(mcols(object
> return("missing metadata column ID")
>   NULL
> }
>   )
>
>   gr <- GRanges("1:1-100")
>   grl <- split(gr, 1)
>   mcols(grl)$ID <- 1
>   x <- new("newClass", grl)
>
> Then:
>
>   > c(x, x)
>
>   newClass object of length 2:
>   $1
>   GRanges object with 1 range and 0 metadata columns:
> seqnamesranges strand
>   
> [1] 1  [   1, 100]  *
>
>   $1
>   GRanges object with 1 range and 0 metadata columns:
> seqnamesranges strand
> [1]1 [ 1, 100]  *
>
>   ---
>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>>>
>>> ## but can create an instance after combining as GRangesLists
>>> new("newClass", c(as(x, "GRangesList"), as(x, "GRangesList")))
>>
>> newClass object of length 2:
>> $1
>> GRanges object with 1 range and 0 metadata columns:
>>seqnamesranges strand
>>  
>>[1]1  [1, 100]  *
>>
>> $1
>> GRanges object with 1 range and 0 metadata columns:
>>seqnames   ranges strand
>>[1]1 [1, 100]  *
>>
>> ---
>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>
> This works because the validity method for GRangesList objects is set
> with setValidity2().
>
> I wish new() had the extra 'check' argument so we wouldn't need to use
> the new2/setValidity2 hack.
>
> Hope this helps,
> H.
>
>>>
>>> sessionInfo()
>>
>> R version 3.2.2 (2015-08-14)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4parallel  stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.4
>> [4] S4Vectors_0.8.3  BiocGenerics_0.16.1
>>
>> loaded via a namespace (and not attached):
>> [1] zlibbioc_1.16.0 XVector_0.10.0
>>>
>>>
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] error when combining objects inheriting from GRangesList

2015-12-04 Thread Leonard Goldstein
Hi all,

In the latest Bioc release (and devel) I encountered problems with
classes inheriting from GRangesList. Combining two or more objects
that belong to such classes can result in an error, as the combined
object is apparently not recognized as a valid instance of the class.
I included an example below. Thanks in advance for your help.

Leonard

--
> library(GenomicRanges)
>
> validNewClass <- function(object) {
+
+   if (!"ID" %in% names(mcols(object))) {
+
+ return("missing metadata column ID")
+
+   }
+
+ }
>
> setClass(
+   Class = "newClass",
+   contains = "GRangesList",
+   validity = validNewClass)
>
> gr <- GRanges(1, IRanges(1, 100))
> grl <- split(gr, 1)
> mcols(grl)$ID <- 1
> x <- new("newClass", grl)
>
> ## combining two instances of newClass results in an error
> c(x, x)
Error in validObject(.Object) :
  invalid class “newClass” object: missing metadata column ID
>
> ## but can create an instance after combining as GRangesLists
> new("newClass", c(as(x, "GRangesList"), as(x, "GRangesList")))
newClass object of length 2:
$1
GRanges object with 1 range and 0 metadata columns:
  seqnamesranges strand

  [1]1  [1, 100]  *

$1
GRanges object with 1 range and 0 metadata columns:
  seqnames   ranges strand
  [1]1 [1, 100]  *

---
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.4
[4] S4Vectors_0.8.3  BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0 XVector_0.10.0
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] S4 classes extending GRangesList

2015-09-29 Thread Leonard Goldstein
Great, thank you Hervé.

Best wishes,

Leonard

On Tue, Sep 29, 2015 at 2:07 PM, Hervé Pagès  wrote:
> Hi Leonard,
>
> Fixed in IRanges 2.3.22 (devel) and 2.2.8 (release). Now the "c" method
> for CompressedList objects should always return an object of the same
> class as its 1st argument.
>
> Thanks for your patience and for the reminder.
> H.
>
>
> On 09/29/2015 10:35 AM, Leonard Goldstein wrote:
>>
>> Hi Hervé,
>>
>> This still looks broken in the current bioc-devel. Just wanted to
>> follow up in case it got missed. Thanks again.
>>
>> Leonard
>>
>>
>>
>> On Thu, Jun 11, 2015 at 11:59 AM, Leonard Goldstein 
>> wrote:
>>>
>>> Thanks Hervé.
>>>
>>> Best wishes,
>>>
>>> Leonard
>>>
>>>
>>> On Thu, Jun 11, 2015 at 10:58 AM, Hervé Pagès 
>>> wrote:
>>>>
>>>> Hi Leonard,
>>>>
>>>> It's a bug in the "c" method for "CompressedList" object. I'll fix
>>>> that. Thanks for the report.
>>>>
>>>> H.
>>>>
>>>>
>>>> On 06/11/2015 10:48 AM, Leonard Goldstein wrote:
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I noticed that when combining instances of a class that inherits from
>>>>> GRangesList, the result does not preserve the class (it is returned as
>>>>> a GRangesList instead). The class is preserved in other situations
>>>>> (e.g. when a class extends GRanges). See below for an example. Is
>>>>> there a reason why the class cannot be preserved in the first case?
>>>>> Thanks in advance for your help.
>>>>>
>>>>> Leonard
>>>>>
>>>>>
>>>>>> ## define a new class 'A' inheriting from GRanges
>>>>>> setClass(Class = "A", contains = "GRanges")
>>>>>>
>>>>>> ## combining two instances of class 'A' returns an object of class 'A'
>>>>>> gr <- GRanges("1", IRanges(1, 100))
>>>>>> a <- new("A", gr)
>>>>>> a
>>>>>
>>>>>
>>>>> A object with 1 range and 0 metadata columns:
>>>>> seqnamesranges strand
>>>>>   
>>>>> [1]1  [1, 100]  *
>>>>> ---
>>>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>>>
>>>>>>
>>>>>> c(a, a)
>>>>>
>>>>>
>>>>> A object with 2 ranges and 0 metadata columns:
>>>>> seqnamesranges strand
>>>>>   
>>>>> [1]1  [1, 100]  *
>>>>> [2]1  [1, 100]  *
>>>>> ---
>>>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>>>
>>>>>>
>>>>>>
>>>>>> ## define a new class 'b' inheriting from GRangesList
>>>>>> setClass(Class = "B", contains = "GRangesList")
>>>>>>
>>>>>> ## combining two instances of class 'B' returns a GRangesList
>>>>>> grl <- split(gr, 1)
>>>>>> b <- new("B", grl)
>>>>>> b
>>>>>
>>>>>
>>>>> B object of length 1:
>>>>> $1
>>>>> GRanges object with 1 range and 0 metadata columns:
>>>>> seqnamesranges strand
>>>>>   
>>>>> [1]1  [1, 100]  *
>>>>>
>>>>> ---
>>>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>>>
>>>>>>
>>>>>> c(b, b)
>>>>>
>>>>>
>>>>> GRangesList object of length 2:
>>>>> $1
>>>>> GRanges object with 1 range and 0 metadata columns:
>>>>> seqnamesranges strand
>>>>>   
>>>>> [1]1  [1, 100]  *
>>>>>
>>>>> $1
>>>>> GRanges object with 1 range and 0 metadata columns:
>>>>> seqnames   ranges strand
>>>>> [1]1 [1, 100]  *
>>

Re: [Bioc-devel] S4 classes extending GRangesList

2015-09-29 Thread Leonard Goldstein
Hi Hervé,

This still looks broken in the current bioc-devel. Just wanted to
follow up in case it got missed. Thanks again.

Leonard



On Thu, Jun 11, 2015 at 11:59 AM, Leonard Goldstein  wrote:
> Thanks Hervé.
>
> Best wishes,
>
> Leonard
>
>
> On Thu, Jun 11, 2015 at 10:58 AM, Hervé Pagès  wrote:
>> Hi Leonard,
>>
>> It's a bug in the "c" method for "CompressedList" object. I'll fix
>> that. Thanks for the report.
>>
>> H.
>>
>>
>> On 06/11/2015 10:48 AM, Leonard Goldstein wrote:
>>>
>>> Hi all,
>>>
>>> I noticed that when combining instances of a class that inherits from
>>> GRangesList, the result does not preserve the class (it is returned as
>>> a GRangesList instead). The class is preserved in other situations
>>> (e.g. when a class extends GRanges). See below for an example. Is
>>> there a reason why the class cannot be preserved in the first case?
>>> Thanks in advance for your help.
>>>
>>> Leonard
>>>
>>>
>>>> ## define a new class 'A' inheriting from GRanges
>>>> setClass(Class = "A", contains = "GRanges")
>>>>
>>>> ## combining two instances of class 'A' returns an object of class 'A'
>>>> gr <- GRanges("1", IRanges(1, 100))
>>>> a <- new("A", gr)
>>>> a
>>>
>>> A object with 1 range and 0 metadata columns:
>>>seqnamesranges strand
>>>  
>>>[1]1  [1, 100]  *
>>>---
>>>seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>
>>>> c(a, a)
>>>
>>> A object with 2 ranges and 0 metadata columns:
>>>seqnamesranges strand
>>>  
>>>[1]1  [1, 100]  *
>>>[2]1  [1, 100]  *
>>>---
>>>seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>
>>>>
>>>> ## define a new class 'b' inheriting from GRangesList
>>>> setClass(Class = "B", contains = "GRangesList")
>>>>
>>>> ## combining two instances of class 'B' returns a GRangesList
>>>> grl <- split(gr, 1)
>>>> b <- new("B", grl)
>>>> b
>>>
>>> B object of length 1:
>>> $1
>>> GRanges object with 1 range and 0 metadata columns:
>>>seqnamesranges strand
>>>  
>>>[1]1  [1, 100]  *
>>>
>>> ---
>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>
>>>> c(b, b)
>>>
>>> GRangesList object of length 2:
>>> $1
>>> GRanges object with 1 range and 0 metadata columns:
>>>seqnamesranges strand
>>>  
>>>[1]1  [1, 100]  *
>>>
>>> $1
>>> GRanges object with 1 range and 0 metadata columns:
>>>seqnames   ranges strand
>>>[1]1 [1, 100]  *
>>>
>>> ---
>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2014-11-04 r66932)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats4parallel  stats graphics  grDevices utils datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.21.15 GenomeInfoDb_1.5.7IRanges_2.3.11
>>> [4] S4Vectors_0.7.4   BiocGenerics_0.15.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] XVector_0.9.1
>>>>
>>>>
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] using covr with the parallel package

2015-08-17 Thread Leonard Goldstein
Hi Jim,

You are right -- the problem is not related to mc.preschedule but
simply arises when there are child processes. Thanks for pointing this
out.

I will make some changes to my package to make sure all *apply calls
are run in the current process in situations where there is no
parallelization.

Thanks for the covr package! It is extremely useful, as many others
have pointed out.

Leonard


On Mon, Aug 17, 2015 at 1:29 PM, Jim Hester  wrote:
> Leonard,
>
> Setting `mc.preschedule = TRUE` and `ncores < 2` results in `mclapply`
> (which `mcmlapply` calls) delegating to a simple `lapply` (see
> https://github.com/wch/r-source/blob/7578138223130d3c6197b95c3cbbe1e78a5cf230/src/library/parallel/R/unix/mclapply.R#L117).
> Since this `lapply` runs in the current process coverage works fine.
>
> `mc.preschedule = FALSE` calls `mcparallel` in all cases, which spawns child
> processes, and coverage is not tracked in child processes.
>
> I don't have any plans to support tracking coverage in child processes at
> the moment (although I suppose it is possible in the future).
>
> So the answer seems to be if you want test coverage to work with parallel
> code you need to run it with `ncores = 1` and `mc.preschedule = TRUE`.
>
> I will add something to that effect in the covr README.
>
> Thank you for reporting the issue!
>
> Jim
>
> (posting includes off-list email exchange when I neglected to include
> bioc-devel in the response)
>
> On Mon, Aug 17, 2015 at 2:34 PM, Leonard Goldstein
>  wrote:
>>
>> Hi Jim,
>>
>> Thanks for the quick reply!
>>
>> For example, predictTxFeaturesPerSample in
>>
>>
>> https://codecov.io/github/Bioconductor-mirror/SGSeq/R/features-prediction.R
>>
>> is called by predictTxFeatures in main.R (inside mcmapply with
>> mc.preschedule = FALSE)
>>
>> but it is considered untested.
>>
>> Thanks for looking into this.
>>
>> Leonard
>>
>>
>> On Mon, Aug 17, 2015 at 11:24 AM, Jim Hester 
>> wrote:
>> > Leonard,
>> >
>> > Can you link to an example where this behavior is occurring?  I can see
>> > evidence where the coverage is working as expected when using mclapply,
>> > e.g.
>> >
>> > (https://codecov.io/github/Bioconductor-mirror/SGSeq/R/graphs.R?ref=ff09041bd0fec09a6c138e3103bfee23f876f675#l-223),
>> > but not the reverse.
>> >
>> > Jim
>> >
>> > On Mon, Aug 17, 2015 at 2:13 PM, Leonard Goldstein
>> >  wrote:
>> >>
>> >> Hi Jim,
>> >>
>> >> I noticed that when covr calculates test coverage, functions called
>> >> inside mclapply or mcmapply with argument mc.preschedule = FALSE are
>> >> considered untested (even if unit tests exist)
>> >>
>> >> When running checks I only use one core. So an easy fix would be to
>> >> set mc.preschedule to TRUE if mc.cores = 1. But I was wondering if you
>> >> are aware of this behavior and whether there is a way to avoid it?
>> >>
>> >> Many thanks for your help.
>> >>
>> >> Leonard
>> >>
>> >> ___
>> >> Bioc-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>> >

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] using covr with the parallel package

2015-08-17 Thread Leonard Goldstein
Hi Jim,

I noticed that when covr calculates test coverage, functions called
inside mclapply or mcmapply with argument mc.preschedule = FALSE are
considered untested (even if unit tests exist)

When running checks I only use one core. So an easy fix would be to
set mc.preschedule to TRUE if mc.cores = 1. But I was wondering if you
are aware of this behavior and whether there is a way to avoid it?

Many thanks for your help.

Leonard

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] error using the DEXSeqDataSet function

2015-07-30 Thread Leonard Goldstein
Dear Alejandro,

DEXSeq is working fine now. Thanks very much for the quick fix.

Best wishes,

Leonard



On Thu, Jul 30, 2015 at 9:02 AM, Alejandro Reyes
 wrote:
> Dear Leonard,
>
> Thanks a lot for reporting this. It should be fixed in the version that I
> just committed to the svn (DEXSeq 1.5.10).
>
> While debugging the DEXSeq code, I noticed that summarizedOverlaps is giving
> me an error, which I think its a bug while creating the
> summarizedExperiments object that is returned. Here a reproducible example:
>
>>   library(GenomicRanges)
>>   library(GenomicFeatures)
>>   library(GenomicAlignments)
>>
>>   hse <- makeTxDbFromBiomart( biomart="ensembl",
> +   dataset="hsapiens_gene_ensembl" )
>
>
> Download and preprocess the 'transcripts' data frame ... OK
> Download and preprocess the 'chrominfo' data frame ... OK
> Download and preprocess the 'splicings' data frame ... OK
> Download and preprocess the 'genes' data frame ... OK
> Prepare the 'metadata' data frame ... OK
> Make the TxDb object ... OK
>>
>>   bamDir <- system.file(
> + "extdata", package="parathyroidSE", mustWork=TRUE )
>>   fls <- list.files( bamDir, pattern="bam$", full=TRUE )
>>
>>   bamlst <- BamFileList(
> + fls, index=character(),
> + yieldSize=10, obeyQname=TRUE )
>>
>>   exonicParts <- disjointExons( hse, aggregateGenes=FALSE )
>>
>>   SE <- summarizeOverlaps( exonicParts, bamlst,
> + mode="Union", singleEnd=FALSE,
> + ignore.strand=TRUE, inter.feature=FALSE, fragments=TRUE )
> Error in SummarizedExperiment(assays = SimpleList(counts = counts),
> rowRanges = features,  :
>  error in evaluating the argument 'assays' in selecting a method for
> function 'SummarizedExperiment': Error in validObject(.Object) :
>  invalid class “SimpleList” object: invalid object for slot "listData" in
> class "SimpleList": got class "matrix", should be or extend class "lis
> t"
>
> And the output of sessionInfo(),
>
>> sessionInfo()
> R Under development (unstable) (2015-07-25 r68744)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 15.04
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> [3] LC_TIME=de_DE.UTF-8LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=de_DE.UTF-8LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=de_DE.UTF-8   LC_NAME=C
> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4parallel  stats graphics  grDevices utils datasets
> [8] methods   base
>
> other attached packages:
> [1] GenomicAlignments_1.5.12   Rsamtools_1.21.14
> [3] Biostrings_2.37.2  XVector_0.9.1
> [5] GenomicFeatures_1.21.13AnnotationDbi_1.31.17
> [7] DEXSeq_1.15.10 DESeq2_1.9.26
> [9] RcppArmadillo_0.5.200.1.0  Rcpp_0.12.0
> [11] SummarizedExperiment_0.3.2 GenomicRanges_1.21.17
> [13] GenomeInfoDb_1.5.9 IRanges_2.3.15
> [15] S4Vectors_0.7.10   Biobase_2.29.1
> [17] BiocGenerics_0.15.3BiocParallel_1.3.42
>
> loaded via a namespace (and not attached):
> [1] genefilter_1.51.0statmod_1.4.21   locfit_1.5-9.1
> [4] reshape2_1.4.1   splines_3.3.0lattice_0.20-33
> [7] colorspace_1.2-6 rtracklayer_1.29.12  survival_2.38-3
> [10] XML_3.98-1.3 foreign_0.8-65   DBI_0.3.1
> [13] RColorBrewer_1.1-2   lambda.r_1.1.7   plyr_1.8.3
> [16] stringr_1.0.0zlibbioc_1.15.0  munsell_0.4.2
> [19] gtable_0.1.2 futile.logger_1.4.1  hwriter_1.3.2
> [22] latticeExtra_0.6-26  geneplotter_1.47.0   biomaRt_2.25.1
> [25] proto_0.3-10 acepack_1.3-3.3  xtable_1.7-4
> [28] scales_0.2.5 Hmisc_3.16-0 annotate_1.47.4
> [31] gridExtra_2.0.0  ggplot2_1.0.1digest_0.6.8
> [34] stringi_0.5-5grid_3.3.0   tools_3.3.0
> [37] bitops_1.0-6 magrittr_1.5 RCurl_1.95-4.7
> [40] RSQLite_1.0.0Formula_1.2-1cluster_2.0.3
> [43] futile.options_1.0.0 MASS_7.3-43  rpart_4.1-10
> [46] nnet_7.3-10
>
>
> Best regards,
> Alejandro Reyes
>
>
>
>
>
> On 29.07.2015 20:26, Leonard Goldstein wrote:
>
> Hi all,
>
> I'm having trouble creating a DEXSeqDataSet object (in the devel
> version of DEXSeq)
>
> Running the example included in the manual page results in the same
> error I get with my own data (see below)
>
> Many thanks for your help.
>
> Leonard
>
> library(DEXSeq)
> 

[Bioc-devel] error using the DEXSeqDataSet function

2015-07-29 Thread Leonard Goldstein
Hi all,

I'm having trouble creating a DEXSeqDataSet object (in the devel
version of DEXSeq)

Running the example included in the manual page results in the same
error I get with my own data (see below)

Many thanks for your help.

Leonard

> library(DEXSeq)
> countData <- matrix( rpois(1, 100), nrow=1000 )
> sampleData <- data.frame(
+ condition=rep( c("untreated", "treated"), each=5 ) )
> design <- formula( ~ sample + exon + condition:exon )
> groupID <- rep(
+ paste0("gene", 1:10),
+ each= 100 )
> featureID <- rep(
+ paste0("exon", 1:10),
+ times= 100 )
> DEXSeqDataSet( countData, sampleData, design,
+ featureID, groupID )
converting counts to integer mode

Error in `$<-.data.frame`(`*tmp*`, "dispersion", value = NA) :
  replacement has 1 row, data has 0
In addition: Warning message:
In DESeqDataSet(se, design, ignoreRank = TRUE) :
  900 duplicate rownames were renamed by adding numbers

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
 [1] DEXSeq_1.15.9  DESeq2_1.9.26
 [3] RcppArmadillo_0.5.200.1.0  Rcpp_0.12.0
 [5] SummarizedExperiment_0.3.2 GenomicRanges_1.21.17
 [7] GenomeInfoDb_1.5.9 IRanges_2.3.15
 [9] S4Vectors_0.7.10   Biobase_2.29.1
[11] BiocGenerics_0.15.3BiocParallel_1.3.41

loaded via a namespace (and not attached):
 [1] genefilter_1.51.0 statmod_1.4.21locfit_1.5-9.1
 [4] reshape2_1.4.1splines_3.2.1 lattice_0.20-33
 [7] colorspace_1.2-6  survival_2.38-3   XML_3.98-1.3
[10] foreign_0.8-65DBI_0.3.1 RColorBrewer_1.1-2
[13] lambda.r_1.1.7plyr_1.8.3stringr_1.0.0
[16] zlibbioc_1.15.0   Biostrings_2.37.2 munsell_0.4.2
[19] gtable_0.1.2  futile.logger_1.4.1   hwriter_1.3.2
[22] latticeExtra_0.6-26   geneplotter_1.47.0biomaRt_2.25.1
[25] AnnotationDbi_1.31.17 proto_0.3-10  acepack_1.3-3.3
[28] xtable_1.7-4  scales_0.2.5  Hmisc_3.16-0
[31] annotate_1.47.4   XVector_0.9.1 Rsamtools_1.21.14
[34] gridExtra_2.0.0   ggplot2_1.0.1 digest_0.6.8
[37] stringi_0.5-5 grid_3.2.1tools_3.2.1
[40] bitops_1.0-6  magrittr_1.5  RCurl_1.95-4.7
[43] RSQLite_1.0.0 Formula_1.2-1 cluster_2.0.3
[46] futile.options_1.0.0  MASS_7.3-43   rpart_4.1-10
[49] compiler_3.2.1nnet_7.3-10
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] S4 classes extending GRangesList

2015-06-11 Thread Leonard Goldstein
Thanks Hervé.

Best wishes,

Leonard


On Thu, Jun 11, 2015 at 10:58 AM, Hervé Pagès  wrote:
> Hi Leonard,
>
> It's a bug in the "c" method for "CompressedList" object. I'll fix
> that. Thanks for the report.
>
> H.
>
>
> On 06/11/2015 10:48 AM, Leonard Goldstein wrote:
>>
>> Hi all,
>>
>> I noticed that when combining instances of a class that inherits from
>> GRangesList, the result does not preserve the class (it is returned as
>> a GRangesList instead). The class is preserved in other situations
>> (e.g. when a class extends GRanges). See below for an example. Is
>> there a reason why the class cannot be preserved in the first case?
>> Thanks in advance for your help.
>>
>> Leonard
>>
>>
>>> ## define a new class 'A' inheriting from GRanges
>>> setClass(Class = "A", contains = "GRanges")
>>>
>>> ## combining two instances of class 'A' returns an object of class 'A'
>>> gr <- GRanges("1", IRanges(1, 100))
>>> a <- new("A", gr)
>>> a
>>
>> A object with 1 range and 0 metadata columns:
>>seqnamesranges strand
>>  
>>[1]1  [1, 100]  *
>>---
>>seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>> c(a, a)
>>
>> A object with 2 ranges and 0 metadata columns:
>>seqnamesranges strand
>>  
>>[1]1  [1, 100]  *
>>[2]1  [1, 100]  *
>>---
>>seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>>
>>> ## define a new class 'b' inheriting from GRangesList
>>> setClass(Class = "B", contains = "GRangesList")
>>>
>>> ## combining two instances of class 'B' returns a GRangesList
>>> grl <- split(gr, 1)
>>> b <- new("B", grl)
>>> b
>>
>> B object of length 1:
>> $1
>> GRanges object with 1 range and 0 metadata columns:
>>seqnamesranges strand
>>  
>>[1]1  [1, 100]  *
>>
>> ---
>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>> c(b, b)
>>
>> GRangesList object of length 2:
>> $1
>> GRanges object with 1 range and 0 metadata columns:
>>seqnamesranges strand
>>  
>>[1]1  [1, 100]  *
>>
>> $1
>> GRanges object with 1 range and 0 metadata columns:
>>seqnames   ranges strand
>>[1]1 [1, 100]  *
>>
>> ---
>> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>>
>>> sessionInfo()
>>
>> R Under development (unstable) (2014-11-04 r66932)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4parallel  stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.21.15 GenomeInfoDb_1.5.7IRanges_2.3.11
>> [4] S4Vectors_0.7.4   BiocGenerics_0.15.2
>>
>> loaded via a namespace (and not attached):
>> [1] XVector_0.9.1
>>>
>>>
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] S4 classes extending GRangesList

2015-06-11 Thread Leonard Goldstein
Hi all,

I noticed that when combining instances of a class that inherits from
GRangesList, the result does not preserve the class (it is returned as
a GRangesList instead). The class is preserved in other situations
(e.g. when a class extends GRanges). See below for an example. Is
there a reason why the class cannot be preserved in the first case?
Thanks in advance for your help.

Leonard


> ## define a new class 'A' inheriting from GRanges
> setClass(Class = "A", contains = "GRanges")
>
> ## combining two instances of class 'A' returns an object of class 'A'
> gr <- GRanges("1", IRanges(1, 100))
> a <- new("A", gr)
> a
A object with 1 range and 0 metadata columns:
  seqnamesranges strand

  [1]1  [1, 100]  *
  ---
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> c(a, a)
A object with 2 ranges and 0 metadata columns:
  seqnamesranges strand

  [1]1  [1, 100]  *
  [2]1  [1, 100]  *
  ---
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> ## define a new class 'b' inheriting from GRangesList
> setClass(Class = "B", contains = "GRangesList")
>
> ## combining two instances of class 'B' returns a GRangesList
> grl <- split(gr, 1)
> b <- new("B", grl)
> b
B object of length 1:
$1
GRanges object with 1 range and 0 metadata columns:
  seqnamesranges strand

  [1]1  [1, 100]  *

---
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> c(b, b)
GRangesList object of length 2:
$1
GRanges object with 1 range and 0 metadata columns:
  seqnamesranges strand

  [1]1  [1, 100]  *

$1
GRanges object with 1 range and 0 metadata columns:
  seqnames   ranges strand
  [1]1 [1, 100]  *

---
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> sessionInfo()
R Under development (unstable) (2014-11-04 r66932)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.21.15 GenomeInfoDb_1.5.7IRanges_2.3.11
[4] S4Vectors_0.7.4   BiocGenerics_0.15.2

loaded via a namespace (and not attached):
[1] XVector_0.9.1
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] scanBam() segfault error

2015-04-20 Thread Leonard Goldstein
Thanks Martin. I had one other test case that resulted in a segfault.
I just tried Rsamtools 1.21.1 and both cases work fine with the new
version. Thanks for your help and for fixing this so quickly.

Leonard


On Mon, Apr 20, 2015 at 10:59 AM, Martin Morgan  wrote:
> On 04/17/2015 09:12 AM, Martin Morgan wrote:
>>
>> On 04/17/2015 08:59 AM, Leonard Goldstein wrote:
>>>
>>> Hi all,
>>>
>>> I ran into a segfault error when trying to read paired-end RNA-seq
>>> alignments with scanBam (see below). This seems to be a problem
>>> introduced in Rsamtools version 1.19.35 (1.19.34 works fine). In case
>>> you need the BAM file for testing, please let me know. Many thanks for
>>
>>
>> Probably a BAM file would be a big help... Thanks, Martin
>
>
> Thanks Leonard, I think this is addressed in 1.20.1 / 1.21.1, which are not
> yet available via biocLite.
>
> I will likely revisit this code in the next week or so; please let me know
> if there are other issues.
>
>
> Martin
>
>>
>>> your help.
>>>
>>> Leonard
>>>
>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2014-11-04 r66932)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats4stats graphics  grDevices utils datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>> [1] Rsamtools_1.19.54 Biostrings_2.35.13XVector_0.7.4
>>> [4] GenomicRanges_1.19.57 GenomeInfoDb_1.3.20   IRanges_2.1.45
>>> [7] S4Vectors_0.5.23  BiocGenerics_0.13.11
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6compiler_3.2.0  tools_3.2.0 zlibbioc_1.13.3
>>>>
>>>>
>>>> gr <- GRanges("16", IRanges(13173149, 15234449), "+")
>>>>
>>>> file <- BamFile(file, asMates = TRUE)
>>>>
>>>> param <- ScanBamParam(what = scanBamWhat(), which = gr)
>>>>
>>>> tmp <- scanBam(file = file, param = param)
>>>
>>>
>>>   *** caught segfault ***
>>> address 0x14538b1e, cause 'memory not mapped'
>>>
>>> Traceback:
>>>   1: .Call(func, .extptr(file), space, flag, simpleCigar, tagFilter,
>>> ...)
>>>   2: doTryCatch(return(expr), name, parentenv, handler)
>>>   3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>>>   4: tryCatchList(expr, classes, parentenv, handlers)
>>>   5: tryCatch({.Call(func, .extptr(file), space, flag, simpleCigar,
>>> tagFilter, ...)}, error = function(err) {
>>> stop(conditionMessage(err), "\n  f\
>>> ile: ", path(file), "\n  index: ", index(file))})
>>>   6: .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file),
>>>tmpl, obeyQname(file), asMates(file), qnamePrefix, qnameSuffix,
>>> param = param)
>>>   7: scanBam(file = file, param = param)
>>>   8: scanBam(file = file, param = param)
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] scanBam() segfault error

2015-04-17 Thread Leonard Goldstein
Hi all,

I ran into a segfault error when trying to read paired-end RNA-seq
alignments with scanBam (see below). This seems to be a problem
introduced in Rsamtools version 1.19.35 (1.19.34 works fine). In case
you need the BAM file for testing, please let me know. Many thanks for
your help.

Leonard


> sessionInfo()
R Under development (unstable) (2014-11-04 r66932)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] Rsamtools_1.19.54 Biostrings_2.35.13XVector_0.7.4
[4] GenomicRanges_1.19.57 GenomeInfoDb_1.3.20   IRanges_2.1.45
[7] S4Vectors_0.5.23  BiocGenerics_0.13.11

loaded via a namespace (and not attached):
[1] bitops_1.0-6compiler_3.2.0  tools_3.2.0 zlibbioc_1.13.3
>
> gr <- GRanges("16", IRanges(13173149, 15234449), "+")
>
> file <- BamFile(file, asMates = TRUE)
>
> param <- ScanBamParam(what = scanBamWhat(), which = gr)
>
> tmp <- scanBam(file = file, param = param)

 *** caught segfault ***
address 0x14538b1e, cause 'memory not mapped'

Traceback:
 1: .Call(func, .extptr(file), space, flag, simpleCigar, tagFilter, ...)
 2: doTryCatch(return(expr), name, parentenv, handler)
 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 4: tryCatchList(expr, classes, parentenv, handlers)
 5: tryCatch({.Call(func, .extptr(file), space, flag, simpleCigar,
tagFilter, ...)}, error = function(err) {
stop(conditionMessage(err), "\n  f\
ile: ", path(file), "\n  index: ", index(file))})
 6: .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file),
  tmpl, obeyQname(file), asMates(file), qnamePrefix, qnameSuffix,
param = param)
 7: scanBam(file = file, param = param)
 8: scanBam(file = file, param = param)

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Leonard Goldstein
Sounds very sensible not to double count in the context of tallying
variants. I was more concerned with reducing which as the default
behavior for scanBam and other functions.

I wanted to bring up the samtools behavior as - for me at least -
inconsistencies between Rsamtools and samtools have been another
source of confusion in the past (e.g. different naming conventions for
fields like isize vs TLEN etc.)

Leonard


On Mon, Feb 23, 2015 at 11:22 AM, Michael Lawrence
 wrote:
> Maybe Rsamtools would want to follow this precedent. I think there might be
> a difference between fishing out alignments from a SAM/BAM, and deriving a
> summary (tallyVariants) from a BAM. It seems like an argument could be made
> for a tally set to not contain duplicates.
>
> On Mon, Feb 23, 2015 at 11:05 AM, Leonard Goldstein
>  wrote:
>>
>> Hi Michael and Thomas,
>>
>> I ran into the same problem in the past (i.e. when I started working
>> with functions like scanBam I expected them not to return the same
>> alignment multiple times)
>>
>> One thing to consider might be that returning alignments multiple
>> times is consistent with the behavior of the samtools view command.
>> Quoting from the samtools manual:
>>
>> “Important note: when multiple regions are given, some alignments may
>> be output multiple times if they overlap more than one of the
>> specified regions.”
>>
>> Maybe there is an argument for keeping things consistent with
>> samtools? As you said, if documented properly, the user can decide
>> whether to reduce regions specified in which or not.
>>
>> Leonard
>>
>>
>> On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
>>  wrote:
>> > We should at leaast try to avoid surprising the user. Seems like most
>> > people expect "which" to be a simple restriction, so I think for now I
>> > will
>> > just reduce the which, and if someone has a use case for separate
>> > queries,
>> > we can address it in the future.
>> >
>> > On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann
>> > 
>> > wrote:
>> >
>> >> Personally, I don't have a use case with "meaningful loci" worth
>> >> tracking,
>> >> so keeping it simple would work for me.
>> >>
>> >> Incidentally, would it be good to deal with the 'which' parameter in a
>> >> consistent way across different methods ? I just saw this recent post
>> >> on
>> >> the mailing list in which a used got confused by duplicate counts
>> >> returned
>> >> after passing 'which' to scanBamParam:
>> >>
>> >> https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
>> >>
>> >>
>> >> ---
>> >>
>> >> Thomas Sandmann, PhD
>> >> Computational biologist
>> >>
>> >> Genentech, Inc.
>> >> 1 DNA Way
>> >> South San Francisco, CA 94080
>> >> USA
>> >>
>> >> Phone: +1 650 225 6273
>> >> Fax: +1 650 225 5389
>> >> Email: sandmann.tho...@gene.com
>> >>
>> >> "If a man will begin with certainties, he shall end in doubts; but if
>> >> he
>> >> will be content to begin with doubts he shall end in certainties." --
>> >> Sir
>> >> Francis Bacon
>> >>
>> >>
>> >> On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence <
>> >> lawrence.mich...@gene.com> wrote:
>> >>
>> >>> We just have to decide which is the more useful interpretation of
>> >>> which
>> >>> -- as a simple restriction, or as a vector of meaningful locii, which
>> >>> will
>> >>> be analyzed individually? I would actually favor the first one (the
>> >>> same as
>> >>> yours), just because it's simpler. To keep track of the query ranges,
>> >>> we
>> >>> would need to add a new column to the returned object, which will more
>> >>> often than not just be clutter. I guess we could introduce a new
>> >>> parameter,
>> >>> "reduceWhich" which defaults to TRUE and reduces the which. If FALSE,
>> >>> it
>> >>> instead adds the column mapping back to the original which ranges.
>> >>>
>> >>>
>> >>> On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann <
>> >>> sandmann.tho...@gene.com> wrote:
>> >

Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Leonard Goldstein
Hi Michael and Thomas,

I ran into the same problem in the past (i.e. when I started working
with functions like scanBam I expected them not to return the same
alignment multiple times)

One thing to consider might be that returning alignments multiple
times is consistent with the behavior of the samtools view command.
Quoting from the samtools manual:

“Important note: when multiple regions are given, some alignments may
be output multiple times if they overlap more than one of the
specified regions.”

Maybe there is an argument for keeping things consistent with
samtools? As you said, if documented properly, the user can decide
whether to reduce regions specified in which or not.

Leonard


On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
 wrote:
> We should at leaast try to avoid surprising the user. Seems like most
> people expect "which" to be a simple restriction, so I think for now I will
> just reduce the which, and if someone has a use case for separate queries,
> we can address it in the future.
>
> On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann 
> wrote:
>
>> Personally, I don't have a use case with "meaningful loci" worth tracking,
>> so keeping it simple would work for me.
>>
>> Incidentally, would it be good to deal with the 'which' parameter in a
>> consistent way across different methods ? I just saw this recent post on
>> the mailing list in which a used got confused by duplicate counts returned
>> after passing 'which' to scanBamParam:
>>
>> https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
>>
>>
>> ---
>>
>> Thomas Sandmann, PhD
>> Computational biologist
>>
>> Genentech, Inc.
>> 1 DNA Way
>> South San Francisco, CA 94080
>> USA
>>
>> Phone: +1 650 225 6273
>> Fax: +1 650 225 5389
>> Email: sandmann.tho...@gene.com
>>
>> "If a man will begin with certainties, he shall end in doubts; but if he
>> will be content to begin with doubts he shall end in certainties." -- Sir
>> Francis Bacon
>>
>>
>> On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence <
>> lawrence.mich...@gene.com> wrote:
>>
>>> We just have to decide which is the more useful interpretation of which
>>> -- as a simple restriction, or as a vector of meaningful locii, which will
>>> be analyzed individually? I would actually favor the first one (the same as
>>> yours), just because it's simpler. To keep track of the query ranges, we
>>> would need to add a new column to the returned object, which will more
>>> often than not just be clutter. I guess we could introduce a new parameter,
>>> "reduceWhich" which defaults to TRUE and reduces the which. If FALSE, it
>>> instead adds the column mapping back to the original which ranges.
>>>
>>>
>>> On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann <
>>> sandmann.tho...@gene.com> wrote:
>>>
 Hi Michael,

 ah, I see. I hadn't realized that returning the pileups separately for
 each region could be a desired feature, but that makes sense. I agree, as
 it is easy for the user to 'reduce' the ranges beforehand your first option
 (e.g. returning the ID of the range) would be more flexible.

 Perhaps you would consider adding a sentence to the documentation of
 'which' on BamTallyParam's help page explaining that users might want to
 'reduce' their ranges beforehand if they are only interested in a single
 tally for each base ?

 Thanks a lot !
 Thomas


>>>
>>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Error when reading BAM files with readGAlignmentPairs

2014-12-12 Thread Leonard Goldstein
Hi Hervé,

Thanks for the quick reply - I didn't notice the build reports.

Best wishes

Leonard


On Fri, Dec 12, 2014 at 12:55 PM, Hervé Pagès  wrote:
> Hi Leonard,
>
> You can also see this error on the build report:
>
>
> http://bioconductor.org/checkResults/3.1/bioc-LATEST/GenomicAlignments/zin2-buildsrc.html
>
> This is a known issue that is the consequence of a recent change in
> Rsamtools. We'll sort it out.
>
> Thanks for your patience,
> H.
>
>
>
> On 12/12/2014 12:51 PM, Leonard Goldstein wrote:
>>
>> Dear Bioc developers,
>>
>> There seems to be an issue with GenomicAlignments or dependent
>> packages as I can no longer read BAM files with readGAlignmentPairs
>> that could be read without problems previously (see example below). I
>> assume this is a general problem but can provide test files if
>> necessary.
>>
>> Leonard
>>
>>
>>> readGAlignmentPairs(file)
>>
>> Error in .Call2("Integer_explode_bits", x, bitpos, PACKAGE = "S4Vectors")
>> :
>>'bitpos' must contain values >= 1
>>>
>>>
>>> sessionInfo()
>>
>> R Under development (unstable) (2014-11-04 r66932)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4parallel  stats graphics  grDevices utils datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] GenomicAlignments_1.3.14 Rsamtools_1.19.14Biostrings_2.35.6
>> [4] XVector_0.7.3GenomicRanges_1.19.21GenomeInfoDb_1.3.7
>> [7] IRanges_2.1.32   S4Vectors_0.5.14 BiocGenerics_0.13.3
>>
>> loaded via a namespace (and not attached):
>>   [1] base64enc_0.1-2BatchJobs_1.5  BBmisc_1.8
>> BiocParallel_1.1.9
>>   [5] bitops_1.0-6   brew_1.0-6 checkmate_1.5.0
>> codetools_0.2-9
>>   [9] DBI_0.3.1  digest_0.6.6   fail_1.2
>> foreach_1.4.2
>> [13] iterators_1.0.7RSQLite_1.0.0  sendmailR_1.2-1
>> stringr_0.6.2
>> [17] tools_3.2.0zlibbioc_1.13.0
>>>
>>>
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Error when reading BAM files with readGAlignmentPairs

2014-12-12 Thread Leonard Goldstein
Dear Bioc developers,

There seems to be an issue with GenomicAlignments or dependent
packages as I can no longer read BAM files with readGAlignmentPairs
that could be read without problems previously (see example below). I
assume this is a general problem but can provide test files if
necessary.

Leonard


> readGAlignmentPairs(file)
Error in .Call2("Integer_explode_bits", x, bitpos, PACKAGE = "S4Vectors") :
  'bitpos' must contain values >= 1
>
> sessionInfo()
R Under development (unstable) (2014-11-04 r66932)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
[1] GenomicAlignments_1.3.14 Rsamtools_1.19.14Biostrings_2.35.6
[4] XVector_0.7.3GenomicRanges_1.19.21GenomeInfoDb_1.3.7
[7] IRanges_2.1.32   S4Vectors_0.5.14 BiocGenerics_0.13.3

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2BatchJobs_1.5  BBmisc_1.8 BiocParallel_1.1.9
 [5] bitops_1.0-6   brew_1.0-6 checkmate_1.5.0codetools_0.2-9
 [9] DBI_0.3.1  digest_0.6.6   fail_1.2   foreach_1.4.2
[13] iterators_1.0.7RSQLite_1.0.0  sendmailR_1.2-1stringr_0.6.2
[17] tools_3.2.0zlibbioc_1.13.0
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] flank function with start argument a named vector

2014-03-28 Thread Leonard Goldstein
Dear Bioc developers,

I ran into problems when using the flank function with the start
argument set to a named vector.

This returns an IRanges with named starts and ends, which causes
problems downstream. Please see example below.

Thanks for your help

Leonard


> ir <- IRanges(1, 1)
> fl <- flank(ir, 1, c(a = FALSE))
> fl
IRanges of length 1
start end width
[1] 2   2 1
> start(fl)
a
2
> end(fl)
a
2
> findOverlaps(ir, fl)
Error in findOverlaps(query, IntervalTree(subject), maxgap = maxgap,
minoverlap = minoverlap,  :
  error in evaluating the argument 'subject' in selecting a method for
function 'findOverlaps': Error in validObject(from) :
  invalid class "IRanges" object: 'start(x)' must be an unnamed
integer vector with no NAs
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] IRanges_1.21.36BiocGenerics_0.9.3

loaded via a namespace (and not attached):
[1] stats4_3.1.0
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] readGAlignmentPairs function broken?

2014-03-11 Thread Leonard Goldstein
Hi Hervé

It works great. Thanks for optimizing the function, it will be very
helpful when processing larger data sets.

Best wishes

Leonard


On Fri, Mar 7, 2014 at 8:49 AM, Hervé Pagès  wrote:
> Hi Leonard,
>
> Sorry for the delay. I finally managed to spend some time optimizing
> readGAlignmentPairs(). In the latest GenomicAlignments (0.99.32),
> it's still using the new pairing algo but I got rid of some of the
> inefficiencies I introduced when I switched the code to using this
> algo. So now it's as fast as before the switch, but, thanks to the
> new pairing algo, it also uses less memory and supports reading the
> BAM file by chunk (via the 'yieldSize' arg of BamFile()).
>
> Cheers,
> H.
>
>
>
> On 12/20/2013 11:53 AM, Hervé Pagès wrote:
>>
>> Hi Leonard,
>>
>> What happened between GenomicAlignments 0.99.8 and
>> GenomicAlignments 0.99.10 is that I modified
>> readGAlignmentPairsFromBam() to use the new pairing algo
>> (implemented in scanBam()) instead of the pairing algo
>> implemented by findMateAlignment():
>>
>> 
>> r84106 | hpa...@fhcrc.org | 2013-12-09 23:49:43 -0800 (Mon, 09 Dec 2013)
>> | 3 lines
>>
>> Modify readGAlignmentPairsFromBam() to use
>> scanBam(BamFile(asMates=TRUE), ...)
>> instead of findMateAlignment() for the pairing.
>>
>> 
>>
>> The new pairing algo (implemented by Val and Martin) is fast and memory
>> efficient, and, last but not least, supports 'yieldSize' for reading the
>> BAM file by chunk. The same pairing algo is also used by
>> readGAlignmentsListFromBam().
>>
>> readGAlignmentPairsFromBam() actually calls readGAlignmentsListFromBam()
>> and then turns the returned GAlignmentsList object into a
>> GAlignmentPairs object. I need to do some timings but I suspect this
>> transformation is taking a little bit too long and there might be room
>> for optimization (like e.g. avoiding the GAlignmentsList intermediate
>> representation).
>>
>> I'll keep you updated.
>>
>> Thanks,
>> H.
>>
>>
>> On 12/19/2013 12:58 PM, Leonard Goldstein wrote:
>>>
>>> Hi Hervé
>>>
>>> Thanks for fixing this problem. It works fine now, however I did
>>> notice a drop in performance. For example reading in chromosome 1 can
>>> take about twice as long as previously (see below). Is this something
>>> that can be avoided?
>>>
>>> Many thanks again
>>>
>>> Leonard
>>>
>>> --
>>> GenomicAlignments 0.99.8
>>>
>>>> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>>>>
>>>> system.time(readGAlignmentPairs(file, param = param))
>>>
>>> user  system elapsed
>>> 161.282   7.812 169.442
>>>>
>>>> system.time(readGAlignmentPairs(file, param = param))
>>>
>>> user  system elapsed
>>>   91.614   7.256  99.065
>>>>
>>>> system.time(readGAlignmentPairs(file, param = param))
>>>
>>> user  system elapsed
>>>   89.285   7.461  96.940
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2013-12-03 r64376)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats graphics  grDevices utils datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] GenomicAlignments_0.99.8 Rsamtools_1.15.15Biostrings_2.31.5
>>> [4] GenomicRanges_1.15.17XVector_0.3.5IRanges_1.21.17
>>> [7] BiocGenerics_0.9.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>>>>
>>>>
>>>
>>> GenomicAlignments 0.99.10
>>>
>>>> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>>>>
>>>> system.time(readGAlignmentPa

Re: [Bioc-devel] readGAlignmentPairs function broken?

2013-12-19 Thread Leonard Goldstein
Hi Hervé

Thanks for fixing this problem. It works fine now, however I did
notice a drop in performance. For example reading in chromosome 1 can
take about twice as long as previously (see below). Is this something
that can be avoided?

Many thanks again

Leonard

--
GenomicAlignments 0.99.8

> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
161.282   7.812 169.442
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
 91.614   7.256  99.065
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
 89.285   7.461  96.940
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.8 Rsamtools_1.15.15Biostrings_2.31.5
[4] GenomicRanges_1.15.17XVector_0.3.5IRanges_1.21.17
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>

GenomicAlignments 0.99.10

> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
265.624   8.812 274.990
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
249.724   7.177 257.399
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
247.476   6.648 254.621
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.10 Rsamtools_1.15.15
[3] Biostrings_2.31.5 GenomicRanges_1.15.17
[5] XVector_0.3.5 IRanges_1.21.17
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>

On Thu, Dec 19, 2013 at 11:52 AM, Hervé Pagès  wrote:
> Hi Leonard,
>
> This should be fixed in GenomicAlignments 0.99.10, which will become
> available thru biocLite() in the next 24 hours or so. In the mean time
> you can grab it directly from svn:
>
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicAlignments
>
> Please let me know if you still run into problems with this.
> Thanks!
>
>
> H.
>
> On 12/19/2013 10:41 AM, Leonard Goldstein wrote:
>>
>> Hi Hervé
>>
>> You probably spotted this already but it looks like the problem is
>> introduced between GenomicAlignments revisions r84052 (0.99.8) and
>> r84106 (0.99.9)
>>
>> Best wishes
>>
>> Leonard
>>
>>
>> On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein 
>> wrote:
>>>
>>> Dear list,
>>>
>>> There seems to be a problem with the readGAlignmentPairs function:
>>>
>>> Querying genomic regions without any alignments using the which
>>> argument results in an error -- see (1) below. Reading in a whole
>>> chromosome takes indefinitely (or at least much longer than in
>>> previous versions) -- see (2) below. I suspect these issues are not
>>> specific to the BAM files I am working with but can provide test data
>>> if required.
>>>
>>> Many thanks for your help.
>>>
>>> Leonard
>>>
>>>
>>> --
>>> (1) Attempts to read an empty region results in an error.
>>>
>>>> gr <- GRanges("22", IRanges(100, 200))
>>>>
>>>> param <- ScanBamParam(which = gr)
>>>>
>>>> readGAlignmentPairs(file, param = param)
>>>
>>> Error in `elementMetadata<-`(x, ..., value = value) :
>>>replacement 'elementMetadata' value must be a DataTable object or NULL
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2013-12-03 r6

Re: [Bioc-devel] readGAlignmentPairs function broken?

2013-12-19 Thread Leonard Goldstein
Hi Hervé

You probably spotted this already but it looks like the problem is
introduced between GenomicAlignments revisions r84052 (0.99.8) and
r84106 (0.99.9)

Best wishes

Leonard


On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein  wrote:
> Dear list,
>
> There seems to be a problem with the readGAlignmentPairs function:
>
> Querying genomic regions without any alignments using the which
> argument results in an error -- see (1) below. Reading in a whole
> chromosome takes indefinitely (or at least much longer than in
> previous versions) -- see (2) below. I suspect these issues are not
> specific to the BAM files I am working with but can provide test data
> if required.
>
> Many thanks for your help.
>
> Leonard
>
>
> --
> (1) Attempts to read an empty region results in an error.
>
>> gr <- GRanges("22", IRanges(100, 200))
>>
>> param <- ScanBamParam(which = gr)
>>
>> readGAlignmentPairs(file, param = param)
> Error in `elementMetadata<-`(x, ..., value = value) :
>   replacement 'elementMetadata' value must be a DataTable object or NULL
>>
>> sessionInfo()
> R Under development (unstable) (2013-12-03 r64376)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15Biostrings_2.31.5
> [4] GenomicRanges_1.15.17XVector_0.3.5IRanges_1.21.16
> [7] BiocGenerics_0.9.2   BiocInstaller_1.13.3
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0zlibbioc_1.9.0
>>
>
> ... but works fine with previous version
>
>> gr <- GRanges("22", IRanges(100, 200))
>>
>> param <- ScanBamParam(which = gr)
>>
>> readGAlignmentPairs(file, param = param)
> GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
>  seqnames strand :ranges --ranges
>:  -- 
> ---
> seqlengths:
>   1  2  3 ... GL000247.1 GL000248.1 GL000249.1
>   249250621  243199373  198022430 ...  36422  39786  38502
>>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets  methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.14.2 Biostrings_2.30.1GenomicRanges_1.14.4
> [4] XVector_0.2.0IRanges_1.20.6   BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>
>
> (2) Use of the which argument covering chromosome 22 takes under one
> minute with an earlier version of readGAlignmentPairs
>
>> gr <- GRanges("22", IRanges(1, 51304566))
>>
>> param <- ScanBamParam(which = gr)
>>
>> system.time(gap <- readGAlignmentPairs(file, param = param))
>user  system elapsed
>  45.887   0.256  46.168
>>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets  methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.14.2 Biostrings_2.30.1GenomicRanges_1.14.4
> [4] XVector_0.2.0IRanges_1.20.6   BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>
>
> ... but at least twenty times as long with the current version.
>
>> gr <- GRanges("22", IR

[Bioc-devel] readGAlignmentPairs function broken?

2013-12-18 Thread Leonard Goldstein
Dear list,

There seems to be a problem with the readGAlignmentPairs function:

Querying genomic regions without any alignments using the which
argument results in an error -- see (1) below. Reading in a whole
chromosome takes indefinitely (or at least much longer than in
previous versions) -- see (2) below. I suspect these issues are not
specific to the BAM files I am working with but can provide test data
if required.

Many thanks for your help.

Leonard


--
(1) Attempts to read an empty region results in an error.

> gr <- GRanges("22", IRanges(100, 200))
>
> param <- ScanBamParam(which = gr)
>
> readGAlignmentPairs(file, param = param)
Error in `elementMetadata<-`(x, ..., value = value) :
  replacement 'elementMetadata' value must be a DataTable object or NULL
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15Biostrings_2.31.5
[4] GenomicRanges_1.15.17XVector_0.3.5IRanges_1.21.16
[7] BiocGenerics_0.9.2   BiocInstaller_1.13.3

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0zlibbioc_1.9.0
>

... but works fine with previous version

> gr <- GRanges("22", IRanges(100, 200))
>
> param <- ScanBamParam(which = gr)
>
> readGAlignmentPairs(file, param = param)
GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
 seqnames strand :ranges --ranges
   :  -- 
---
seqlengths:
  1  2  3 ... GL000247.1 GL000248.1 GL000249.1
  249250621  243199373  198022430 ...  36422  39786  38502
>
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.14.2 Biostrings_2.30.1GenomicRanges_1.14.4
[4] XVector_0.2.0IRanges_1.20.6   BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0


(2) Use of the which argument covering chromosome 22 takes under one
minute with an earlier version of readGAlignmentPairs

> gr <- GRanges("22", IRanges(1, 51304566))
>
> param <- ScanBamParam(which = gr)
>
> system.time(gap <- readGAlignmentPairs(file, param = param))
   user  system elapsed
 45.887   0.256  46.168
>
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.14.2 Biostrings_2.30.1GenomicRanges_1.14.4
[4] XVector_0.2.0IRanges_1.20.6   BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>

... but at least twenty times as long with the current version.

> gr <- GRanges("22", IRanges(1, 51304566))
>
> param <- ScanBamParam(which = gr)
>
> system.time(gap <- readGAlignmentPairs(file, param = param))

^C
Timing stopped at: 1108.041 35.998 1144.006
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15Biostrings_2.31.5
[4] GenomicRanges_1.15.17XVector_0.3.5IRanges_1.21.16
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>

__