[Bioc-devel] ShortRead::countLines integer overflow with large fastq files

2018-02-20 Thread Thomas Girke
Dear Martin,

countLines in ShortRead returns the line counts as integers, which appears
to create problems with large FASTQ files (>536.8 million reads, i.e. more
than about 2.1 billion lines) due to R's integer limit (2^31-1). When the
integer limit is reached or exceeded, countLines seems to return negative
values that no longer reflect the number of lines in a file. At least this
is what I learned after several users reported this problem and I then ran
some tests myself on large FASTQ files with line counts around the integer
limit. If my conclusion is correct and there aren't any strong reasons
against it, would it be possible to consider returning numeric values
instead, either by default or conditionally (e.g. when the count
is >= .Machine$integer.max), to lift this limit? If this is not possible,
then returning NAs instead of negative values would be a sensible
compromise.
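
The arithmetic behind this limit can be checked in base R. The sketch below is not ShortRead code: safe_total() is a hypothetical helper and the counts fed to it are made up. It only illustrates why accumulating counts as doubles, and converting back to integer when the total fits, would sidestep the overflow.

```r
# R integers are 32-bit signed, so the largest value is 2^31 - 1.
int_max <- .Machine$integer.max

# A FASTQ record spans 4 lines, so the limit corresponds to roughly
# 536.9 million records.
records_at_limit <- int_max / 4

# Integer arithmetic in R itself signals overflow with NA and a warning.
# Compiled code that adds plain ints typically wraps around instead, which
# would be consistent with the negative values described above.
overflowed <- suppressWarnings(int_max + 1L)

# Hypothetical helper: sum the counts as doubles (exact for whole numbers
# up to 2^53) and return an integer only when the total fits.
safe_total <- function(counts) {
  total <- sum(as.numeric(counts))
  if (total <= .Machine$integer.max) as.integer(total) else total
}

small <- safe_total(c(10L, 20L))          # stays an integer
large <- safe_total(c(int_max, int_max))  # returned as a double
```

Returning a double only when needed keeps existing callers that expect integers working for ordinary file sizes, at the cost of a return type that depends on the data.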

Thanks,

Thomas

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  utils     datasets  grDevices
[8] methods   base

other attached packages:
 [1] ShortRead_1.36.0           GenomicAlignments_1.14.1
 [3] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
 [5] matrixStats_0.52.2         Biobase_2.38.0
 [7] Rsamtools_1.30.0           GenomicRanges_1.30.0
 [9] GenomeInfoDb_1.14.0        Biostrings_2.46.0
[11] XVector_0.18.0             IRanges_2.12.0
[13] S4Vectors_0.16.0           BiocParallel_1.12.0
[15] BiocGenerics_0.24.0        setwidth_1.0-4
[17] colorout_1.1-3

loaded via a namespace (and not attached):
 [1] zlibbioc_1.24.0         lattice_0.20-35
 [3] hwriter_1.3.2           tools_3.4.2
 [5] grid_3.4.2              latticeExtra_0.6-28
 [7] Matrix_1.2-12           GenomeInfoDbData_0.99.1
 [9] RColorBrewer_1.1-2      bitops_1.0-6
[11] RCurl_1.95-4.8          compiler_3.4.2

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Hervé Pagès

On 02/20/2018 01:25 PM, Gabe Becker wrote:

Herve,

Thanks for the response. The looping over ranges that's still in there is:


dss = switch(seqtype,
             bp = DNAStringSet(lapply(ranges(srcs), function(x) origin[x])),
             aa = AAStringSet(lapply(ranges(srcs), function(x) origin[x])),
             stop("Unrecognized origin sequence type: ", seqtype)
             )

(Line 495 in genbankReader.R)


That was also fixed in genbankr 1.7.2. I replaced this with

  dss = extractAt(origin, ranges(srcs))

Do 'git show 340b0d4fac511f8171391fdeb2233ca6a410743d' to see
the details of the changes I made.
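
The difference between a per-range loop and a single vectorized call can be tried out without Bioconductor installed. The sketch below is a base R analogy, not genbankr code: origin, starts and ends are made-up stand-ins for the sequence and the ranges, and substring() plays the role that extractAt() plays for Biostrings objects.

```r
# Made-up stand-ins: a plain character string instead of a DNAString,
# and integer start/end vectors instead of an IRanges.
origin <- "ACGTACGTAC"
starts <- c(1L, 7L)
ends   <- c(3L, 10L)

# Looped form: one extraction call per range.
looped <- vapply(seq_along(starts),
                 function(i) substr(origin, starts[i], ends[i]),
                 character(1))

# Vectorized form: a single call handles every range at once and still
# yields one separate string per range.
vectorized <- substring(origin, starts, ends)

same_result <- identical(looped, vectorized)
```

Both forms return one element per range; the vectorized one simply never needs to turn the ranges into a list first.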

Cheers,
H.



srcs is a GRanges, making ranges(srcs) an IRanges, so this lapply fails.
I'm not sure what I'm meant to do here as there's not an already
vectorized version that I know of that does the right thing (I want
separate DNAStrings for each range, so origin[ranges(srcs)] doesn't work).


I mean, I can force the conversion to a list with
lapply(1:length(srcs), function(i) ranges(srcs)[i]) or similar, but that
seems pretty ugly...


As for the other issue with the build not working in release, that is a 
bug in the rentrez (which is on CRAN, not Bioc). I've submitted a PR to 
fix that, and we'll see what the response is as to whether I need to 
remove that integration or not.


~G






On Tue, Feb 20, 2018 at 10:48 AM, Hervé Pagès wrote:


Hi Gabe,

I made a couple of changes to genbankr (1.7.2) to avoid that kind of
looping, e.g. I replaced things like

     sapply(gr, width)

with

     width(gr)

I can't run a full 'R CMD build' + 'R CMD check' on the package though
because the code in the vignette seems to fail for reasons unrelated
to the recent changes to IRanges / GenomicRanges (I get the same error
with the release version, see release build report).

The previous behavior of as.list() on IRanges and GRanges objects will
be restored (with a deprecation warning) once all the packages that
need a fix get one (only 7 packages left on my list). I should be done
with them in the next couple of days.

H.


On 02/20/2018 09:41 AM, Gabe Becker wrote:

All,

I'm trying to track down the new failure in my genbankr package and it
appears to come down to the fact that i'm trying to lapply over an
IRanges, which fails in the IRanges to list (or List?) conversion. The
particular case that fails in my example is an IRanges of length 1 but that
does not appear to matter, as lapply fails over IRanges of length >1 as
well.

Is this intentional? If so, it seems a change of this magnitude would
warrant a deprecation cycle at least. If not, please let me know so I can
leave the code as is and wait for the fix.

> rng1 = IRanges(start = 1, end = 5)
> rng2 = IRanges(start = c(1, 7), end = c(3, 10))
> rng1
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5
> rng2
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         7        10         4
> lapply(rng1, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> lapply(rng2, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> sessionInfo()
R Under development (unstable) (2018-02-16 r74263)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3

loaded via a namespace (a


Re: [Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Gabe Becker
Herve,

Thanks for the response. The looping over ranges that's still in there
is:

  dss = switch(seqtype,
               bp = DNAStringSet(lapply(ranges(srcs), function(x) origin[x])),
               aa = AAStringSet(lapply(ranges(srcs), function(x) origin[x])),
               stop("Unrecognized origin sequence type: ", seqtype)
               )

(Line 495 in genbankReader.R)

srcs is a GRanges, making ranges(srcs) an IRanges, so this lapply fails.
I'm not sure what I'm meant to do here as there's not an already vectorized
version that I know of that does the right thing (I want separate
DNAStrings for each range, so origin[ranges(srcs)] doesn't work).

I mean, I can force the conversion to a list with lapply(1:length(srcs),
function(i) ranges(srcs)[i]) or similar, but that seems pretty ugly...

As for the other issue with the build not working in release, that is a bug
in the rentrez (which is on CRAN, not Bioc). I've submitted a PR to fix
that, and we'll see what the response is as to whether I need to remove
that integration or not.

~G






On Tue, Feb 20, 2018 at 10:48 AM, Hervé Pagès wrote:

> Hi Gabe,
>
> I made a couple of changes to genbankr (1.7.2) to avoid that kind of
> looping, e.g. I replaced things like
>
> sapply(gr, width)
>
> with
>
> width(gr)
>
> I can't run a full 'R CMD build' + 'R CMD check' on the package though
> because the code in the vignette seems to fail for reasons unrelated
> to the recent changes to IRanges / GenomicRanges (I get the same error
> with the release version, see release build report).
>
> The previous behavior of as.list() on IRanges and GRanges objects will
> be restored (with a deprecation warning) once all the packages that
> need a fix get one (only 7 packages left on my list). I should be done
> with them in the next couple of days.
>
> H.
>
>
> On 02/20/2018 09:41 AM, Gabe Becker wrote:
>
>> All,
>>
>> I'm trying to track down the new failure in my genbankr package and it
>> appears to come down to the fact  that i'm trying to lapply over an
>> IRanges, which fails in the IRanges to list (or List?) conversion. The
>> particular case that fails in my example is an IRanges of length 1 but
>> that
>> does not appear to matter, as lapply fails over IRanges of length >1 as
>> well.
>>
>> Is this intentional? If so, it seems a change of this magnitude would
>> warrant a deprecation cycle at least. If not, please let me know so I can
>> leave the code as is and wait for the fix.
>>
>> > rng1 = IRanges(start = 1, end = 5)
>> > rng2 = IRanges(start = c(1, 7), end = c(3, 10))
>> > rng1
>> IRanges object with 1 range and 0 metadata columns:
>>           start       end     width
>>       <integer> <integer> <integer>
>>   [1]         1         5         5
>> > rng2
>> IRanges object with 2 ranges and 0 metadata columns:
>>           start       end     width
>>       <integer> <integer> <integer>
>>   [1]         1         3         3
>>   [2]         7        10         4
>> > lapply(rng1, identity)
>> Error in (function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function ‘getListElement’ for
>>   signature ‘"IRanges"’
>> > lapply(rng2, identity)
>> Error in (function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function ‘getListElement’ for
>>   signature ‘"IRanges"’
>> > sessionInfo()
>> R Under development (unstable) (2018-02-16 r74263)
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> Running under: OS X El Capitan 10.11.6
>>
>> Matrix products: default
>> BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
>> LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.5.0 tools_3.5.0
>>
>>
>>
>> Best,
>> ~G
>>
>>
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



Re: [Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Hervé Pagès

Hi Gabe,

I made a couple of changes to genbankr (1.7.2) to avoid that kind of
looping, e.g. I replaced things like

sapply(gr, width)

with

width(gr)

I can't run a full 'R CMD build' + 'R CMD check' on the package though
because the code in the vignette seems to fail for reasons unrelated
to the recent changes to IRanges / GenomicRanges (I get the same error
with the release version, see release build report).

The previous behavior of as.list() on IRanges and GRanges objects will
be restored (with a deprecation warning) once all the packages that
need a fix get one (only 7 packages left on my list). I should be done
with them in the next couple of days.

H.

On 02/20/2018 09:41 AM, Gabe Becker wrote:

All,

I'm trying to track down the new failure in my genbankr package and it
appears to come down to the fact  that i'm trying to lapply over an
IRanges, which fails in the IRanges to list (or List?) conversion. The
particular case that fails in my example is an IRanges of length 1 but that
does not appear to matter, as lapply fails over IRanges of length >1 as
well.

Is this intentional? If so, it seems a change of this magnitude would
warrant a deprecation cycle at least. If not, please let me know so I can
leave the code as is and wait for the fix.


> rng1 = IRanges(start = 1, end = 5)
> rng2 = IRanges(start = c(1, 7), end = c(3, 10))
> rng1
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5
> rng2
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         7        10         4
> lapply(rng1, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> lapply(rng2, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> sessionInfo()
R Under development (unstable) (2018-02-16 r74263)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0



Best,
~G





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



[Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Gabe Becker
All,

I'm trying to track down the new failure in my genbankr package and it
appears to come down to the fact  that i'm trying to lapply over an
IRanges, which fails in the IRanges to list (or List?) conversion. The
particular case that fails in my example is an IRanges of length 1 but that
does not appear to matter, as lapply fails over IRanges of length >1 as
well.

Is this intentional? If so, it seems a change of this magnitude would
warrant a deprecation cycle at least. If not, please let me know so I can
leave the code as is and wait for the fix.

> rng1 = IRanges(start = 1, end = 5)
> rng2 = IRanges(start = c(1, 7), end = c(3, 10))
> rng1
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5
> rng2
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         7        10         4
> lapply(rng1, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> lapply(rng2, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> sessionInfo()
R Under development (unstable) (2018-02-16 r74263)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0



Best,
~G



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



Re: [Bioc-devel] gwaswloc class broken

2018-02-20 Thread Vincent Carey
I think this is settled by running updateObject(ebicat37) with a current
GenomicRanges etc.

I will do this and reserialize.
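
One plausible reading of the error quoted below is that the serialized object predates a slot that the current class definition expects. The sketch here reproduces that situation with base R's methods package only. Parent and Child are made-up classes, the "ANY" value is arbitrary, and the manual rebuild merely stands in for what updateObject() followed by reserializing would achieve; none of it is gwascat or GenomicRanges code.

```r
library(methods)

# Made-up classes mirroring the shape of the problem: "Child" extends
# "Parent", and the current "Parent" definition has an elementType slot.
setClass("Parent", representation(x = "numeric", elementType = "character"))
setClass("Child", representation(extractDate = "character"),
         contains = "Parent")

current <- new("Child", x = 1, elementType = "ANY",
               extractDate = "2018-02-20")

# Mimic an instance saved before the slot existed: slots are stored as
# attributes, so dropping the attribute leaves a stale object.
stale <- current
attr(stale, "elementType") <- NULL

# Accessing the missing slot fails with a "no slot of name" error.
stale_error <- tryCatch(stale@elementType,
                        error = function(e) conditionMessage(e))

# Hand-rolled repair: rebuild the object under the current definition,
# carrying over the slots the stale instance does have.
repaired <- new("Child", x = stale@x, extractDate = stale@extractDate,
                elementType = "ANY")
validObject(repaired)
```

Saving the repaired object again is what makes the fix permanent; otherwise every load of the old file would reproduce the stale instance.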

On Tue, Feb 20, 2018 at 8:01 AM, Robert Castelo 
wrote:

> thanks Vince for your quick response, indeed your intuition is right,
> coercing to 'GRanges' avoids the problem, i'm cc'ing bioc-devel so that
> people involved in 'GRanges' know about this:
>
> library(gwascat)
> data(ebicat37)
> ebicat37[1]
> Error in updateObject(x, check = FALSE) :
>   no slot of name "elementType" for this object of class "gwaswloc"
>
> ebicat37b <- as(ebicat37, "GRanges")
> ebicat37b[1]
> GRanges object with 1 range and 36 metadata columns:
>       seqnames    ranges strand | DATE ADDED TO CATALOG  PUBMEDID
>                                 |
>   [1]    chr11  41820450      * |           09-Jul-2015  26114229
>
> [ ... more output ...] ## i'm typing this myself to keep the email short
>
> PLATFORM [SNPS PASSING QC] CNV
> 
>   [1] Illumina [up to 5,616,481] (imputed)   N
> MAPPED_TRAIT MAPPED_TRAIT_URI
>
>   [1] post-traumatic stress disorder http://www.ebi.ac.uk/efo/EFO_0001358
>   ---
>   seqinfo: 23 sequences from GRCh37 genome
>
>
> cheers,
>
> robert.
>
> On 02/20/2018 01:51 PM, Vincent Carey wrote:
>
>> thanks robert   traveling but will tackle asap   if u can coerce to
>> granges it may help as the only purpose of gwaswloc is to have a concise
>> show method   but the coercion might fail too
>>
>> On Tue, Feb 20, 2018 at 6:43 AM Robert Castelo wrote:
>>
>> hi,
>>
>> the 'gwaswloc' class from the gwascat package seems to be broken in
>> devel, i suspect due to recent changes in the 'GRanges' class or some
>> other class upstream, because the definition of the 'gwaswloc' class in
>> gwascat/R/classes.R is:
>>
>> setClass("gwaswloc", representation(extractDate="character"),
>>  contains="GRanges")
>>
>> i paste below a minimal example, traceback and (very long)
>> corresponding
>> session info. i'm using it in the vignette of GenomicScores, so it's
>> not
>> so crucial but it would be nice to have it working again. thanks!!
>>
>> library(gwascat)
>>
>> data(ebicat37)
>>
>> class(ebicat37)
>> [1] "gwaswloc"
>> attr(,"package")
>> [1] "gwascat"
>>
>> ebicat37[1]
>> Error in updateObject(x, check = FALSE) :
>> no slot of name "elementType" for this object of class "gwaswloc"
>> 9: updateObject(x, check = FALSE)
>> 8: updateObject(x, check = FALSE)
>> 7: extractROWS(x, i)
>> 6: extractROWS(x, i)
>> 5: subset_along_ROWS(x, i, , ..., drop = drop)
>> 4: .nextMethod(x = x, i = i)
>> 3: callNextMethod()
>> 2: ebicat37[1]
>> 1: ebicat37[1]
>> sessionInfo()
>> R Under development (unstable) (2017-10-30 r73642)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: CentOS Linux 7 (Core)
>>
>> Matrix products: default
>> BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
>> LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
>>  [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
>>  [7] LC_PAPER=en_US.UTF8       LC_NAME=C
>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>  [1] gwascat_2.11.1
>>  [2] Homo.sapiens_1.3.1
>>  [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
>>  [4] org.Hs.eg.db_3.5.0
>>  [5] GO.db_3.5.0
>>  [6] OrganismDbi_1.21.1
>>  [7] GenomicFeatures_1.31.10
>>  [8] GenomicRanges_1.31.22
>>  [9] GenomeInfoDb_1.15.5
>> [10] AnnotationDbi_1.41.4
>> [11] IRanges_2.13.26
>> [12] S4Vectors_0.17.33
>> [13] Biobase_2.39.2
>> [14] BiocGenerics_0.25.3
>> [15] colorout_1.1-3
>>
>> loaded via a namespace (and not attached):
>>  [1] ggbeeswarm_0.6.0            colorspace_1.3-2
>>  [3] biovizBase_1.27.1           htmlTable_1.11.2
>>  [5] XVector_0.19.8              base64enc_0.1-3
>>  [7] dichromat_2.0-0             rstudioapi_0.7
>>  [9] bit64_0.9-7                 interactiveDisplayBase_1.17.0
>> [11] codetools_0.2-15            splines_3.5.0
>> [13] snpStats_1.29.1             ggbio_1.27.1
>> [15] doParallel_1.0.11           knitr_1.19
>> [17] Formula_1.2-2               jsonlite_1.5
>> [19] gQTLBase_1.11.0             Rsamtools_1.31.3
>> [21] cluster_2.0.6

Re: [Bioc-devel] R version check in BiocCheck

2018-02-20 Thread Shepherd, Lori
Depending on your reviewer, they MAY let you slide with a different version
dependency despite the BiocCheck WARNING... maybe...


However ...


It is strongly, strongly recommended that all new packages depend on the
version of R that they will be released under. New packages currently being
accepted will be released under Bioc 3.7, which will be associated with
R 3.5. So, as mentioned, users will eventually want to update to the latest
versions anyway. While new packages could be compatible with earlier Bioc
and R releases, we at Bioconductor cannot guarantee that, which is why we
generally insist, for consistency's sake, that the R requirement is the
latest version. R versions can have subtle differences that have a
cascading effect on packages, and the same holds for the packages used as
dependencies. Since the Bioconductor build system is set up to test only
under the latest versions, that is all we can guarantee for compatibility,
and it is why we check for this.


Perhaps others will want to chime in as well with further reasoning.



Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel on behalf of Alexey Sergushichev
Sent: Tuesday, February 20, 2018 4:26:35 AM
To: Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] R version check in BiocCheck

> It _is_ the developer's choice.  But a developer of packages for the
> Bioconductor project commits to using R-devel during certain pre-release
> phases, depending on proximity in time to a point release of R.  (See
> http://bioconductor.org/developers/how-to/useDevel/ for full details.)
> BiocCheck verifies that this commitment is met.

No, BiocCheck doesn't verify this; it just checks for the presence of a
dependency on R >= 3.5. It doesn't actually check whether I have R-devel
installed on my computer at all, how often I update R-devel and test my
package against it, or whether I do any manual tests (unit tests are
already run regularly and automatically by Bioconductor).
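
To make the distinction concrete: a check of this kind only needs the text of the DESCRIPTION file. The sketch below is not BiocCheck's code; declared_r_version() and meets_requirement() are hypothetical helpers and the DESCRIPTION contents are made up. It compares the declared minimum R version with the one tied to the release and never looks at the R installation the developer actually uses.

```r
# Hypothetical helper: extract the minimum R version declared in the
# Depends field of a DESCRIPTION file supplied as a vector of text lines.
declared_r_version <- function(description_lines) {
  con <- textConnection(description_lines)
  on.exit(close(con))
  depends <- read.dcf(con, fields = "Depends")[1, "Depends"]
  if (is.na(depends)) return(NULL)
  hit <- regmatches(depends,
                    regexec("\\bR *\\(>= *([0-9.]+)\\)", depends))[[1]]
  if (length(hit) < 2) return(NULL)
  package_version(hit[2])
}

# Hypothetical helper: TRUE only when a sufficiently recent R is declared.
meets_requirement <- function(description_lines, required = "3.5.0") {
  declared <- declared_r_version(description_lines)
  !is.null(declared) && declared >= package_version(required)
}

# Made-up DESCRIPTION contents for illustration.
old_style <- c("Package: mypkg", "Version: 0.99.0",
               "Depends: R (>= 3.4.0), methods")
new_style <- c("Package: mypkg", "Version: 0.99.0",
               "Depends: R (>= 3.5.0), methods")

old_ok <- meets_requirement(old_style)
new_ok <- meets_requirement(new_style)
```

Editing one line of DESCRIPTION flips the outcome, which is the point being made above: the declaration and the developer's actual testing habits are separate things.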

--
Alexey



On Mon, Feb 19, 2018 at 9:03 PM, Vincent Carey 
wrote:

>
>
> On Mon, Feb 19, 2018 at 11:27 AM, Alexey Sergushichev wrote:
>
>> Kevin,
>>
>> > It does not request users to make R-devel a _requirement_ of their
>> package.
>>
>> Sadly it does for new packages. New packages submitted to Bioconductor 3.7
>> are _required_ to have an R >= 3.5 dependency, otherwise BiocCheck will
>> result in a warning (
>> https://github.com/Bioconductor/BiocCheck/blob/be9cd6e36d95f8bf873b52427d2a97fce6fbb9b9/R/checks.R#L23)
>> and warnings aren't allowed for new package submissions.
>>
>> > Here, I think the decision boils down to how far back in terms of R
>> > versions the developer is willing to support the package. I suppose one
>> > could state R≥2.3 if they're confident about it.
>>
>> That's the problem: this is true for packages already in Bioconductor, but
>> it's not true for new package submissions.
>>
>> Aaron,
>>
>> > Personally, I haven't found it to be particularly difficult to update R,
>> > or to run R-devel in parallel with R 3.4, even without root privileges.
>>
>> I find it much harder for a normal user to install R-devel (and update it
>> properly, because it's a development version) than to run
>> 'devtools::install_github("blabla/my_package")'.
>>
>> > I think many people underappreciate the benefits of moving to the latest
>> > version of R.
>>
>> Don't you think it should be a developer's choice whether to use such new
>> features or ignore them and have a potentially bigger audience?
>>
>
> It _is_ the developer's choice.  But a developer of packages for the
> Bioconductor project commits to using R-devel during certain pre-release
> phases, depending on proximity in time to a point release of R.  (See
> http://bioconductor.org/developers/how-to/useDevel/ for full details.)
> BiocCheck verifies that this commitment is met.
>
>
>>
>> > Enforcing version consistency avoids heartache during release and
>> > debugging.
>>
>> But it's a developer's heartache. As I said, it can't even be attributed
>> to Bioconductor at all, as it's not possible to install the package from
>> bioc-devel unless you have the corresponding R version.
>>
>>
>> --
>> Alexey
>>
>>
>>
>> On Mon, Feb 19, 2018 at 6:38 PM, Aaron Lun  wrote:
>>
>> > I'll just throw in my two cents here.
>> >
>> > I think many people underappreciate the benefits of moving to the latest
>> > version of R. If you inspect the R-devel NEWS file, there's a couple of
>> > nice fixes/features that a developer might want to take advantage of:
>> >
>> > - sum() doesn't give NAs upon integer overflow anymore.
>> > - New ...elt(n) and ...length() functions for dealing with ellipses.

Re: [Bioc-devel] gwaswloc class broken

2018-02-20 Thread Robert Castelo
thanks Vince for your quick response, indeed your intuition is right, 
coercing to 'GRanges' avoids the problem, i'm cc'ing bioc-devel so that 
people involved in 'GRanges' know about this:


library(gwascat)
data(ebicat37)
ebicat37[1]
Error in updateObject(x, check = FALSE) :
  no slot of name "elementType" for this object of class "gwaswloc"

ebicat37b <- as(ebicat37, "GRanges")
ebicat37b[1]
GRanges object with 1 range and 36 metadata columns:
      seqnames    ranges strand | DATE ADDED TO CATALOG  PUBMEDID
                                |
  [1]    chr11  41820450      * |           09-Jul-2015  26114229

[ ... more output ...] ## i'm typing this myself to keep the email short

PLATFORM [SNPS PASSING QC] CNV

  [1] Illumina [up to 5,616,481] (imputed)   N
MAPPED_TRAIT MAPPED_TRAIT_URI
   
  [1] post-traumatic stress disorder http://www.ebi.ac.uk/efo/EFO_0001358
  ---
  seqinfo: 23 sequences from GRCh37 genome


cheers,

robert.

On 02/20/2018 01:51 PM, Vincent Carey wrote:
thanks robert   traveling but will tackle asap    if u can coerce to 
granges it may help as the only purpose of gwaswloc is to have a concise 
show method    but the coercion might fail too


On Tue, Feb 20, 2018 at 6:43 AM Robert Castelo wrote:


hi,

the 'gwaswloc' class from the gwascat package seems to be broken in
devel, i suspect due to recent changes in the 'GRanges' class or some
other class upstream, because the definition of the 'gwaswloc' class in
gwascat/R/classes.R is:

setClass("gwaswloc", representation(extractDate="character"),
     contains="GRanges")

i paste below a minimal example, traceback and (very long) corresponding
session info. i'm using it in the vignette of GenomicScores, so it's not
so crucial but it would be nice to have it working again. thanks!!

library(gwascat)

data(ebicat37)

class(ebicat37)
[1] "gwaswloc"
attr(,"package")
[1] "gwascat"

ebicat37[1]
Error in updateObject(x, check = FALSE) :
    no slot of name "elementType" for this object of class "gwaswloc"
9: updateObject(x, check = FALSE)
8: updateObject(x, check = FALSE)
7: extractROWS(x, i)
6: extractROWS(x, i)
5: subset_along_ROWS(x, i, , ..., drop = drop)
4: .nextMethod(x = x, i = i)
3: callNextMethod()
2: ebicat37[1]
1: ebicat37[1]
sessionInfo()
R Under development (unstable) (2017-10-30 r73642)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so

locale:
   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
   [7] LC_PAPER=en_US.UTF8       LC_NAME=C
   [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
   [1] gwascat_2.11.1
   [2] Homo.sapiens_1.3.1
   [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
   [4] org.Hs.eg.db_3.5.0
   [5] GO.db_3.5.0
   [6] OrganismDbi_1.21.1
   [7] GenomicFeatures_1.31.10
   [8] GenomicRanges_1.31.22
   [9] GenomeInfoDb_1.15.5
[10] AnnotationDbi_1.41.4
[11] IRanges_2.13.26
[12] S4Vectors_0.17.33
[13] Biobase_2.39.2
[14] BiocGenerics_0.25.3
[15] colorout_1.1-3

loaded via a namespace (and not attached):
    [1] ggbeeswarm_0.6.0              colorspace_1.3-2
    [3] biovizBase_1.27.1             htmlTable_1.11.2
    [5] XVector_0.19.8                base64enc_0.1-3
    [7] dichromat_2.0-0               rstudioapi_0.7
    [9] bit64_0.9-7                   interactiveDisplayBase_1.17.0
   [11] codetools_0.2-15              splines_3.5.0
   [13] snpStats_1.29.1               ggbio_1.27.1
   [15] doParallel_1.0.11             knitr_1.19
   [17] Formula_1.2-2                 jsonlite_1.5
   [19] gQTLBase_1.11.0               Rsamtools_1.31.3
   [21] cluster_2.0.6                 graph_1.57.1
   [23] shiny_1.0.5                   compiler_3.5.0
   [25] httr_1.3.1                    backports_1.1.2
   [27] assertthat_0.2.0              Matrix_1.2-12
   [29] lazyeval_0.2.1                limma_3.35.11
   [31] acepack_1.4.1                 htmltools_0.3.6
   [33] prettyunits_1.0.2             tools_3.5.0
   [35] bindrcpp_0.2                  gtable_0.2.0
   [37] glue_1.2.0                    GenomeInfoDbData_1.1.0
   [39] reshape2_1.4.3                dplyr_0.7.4
   

[Bioc-devel] gwaswloc class broken

2018-02-20 Thread Robert Castelo

hi,

the 'gwaswloc' class from the gwascat package seems to be broken in
devel, i suspect due to recent changes in the 'GRanges' class or some
other class upstream, because the definition of the 'gwaswloc' class in
gwascat/R/classes.R is:


setClass("gwaswloc", representation(extractDate="character"),
   contains="GRanges")

i paste below a minimal example, traceback and (very long) corresponding 
session info. i'm using it in the vignette of GenomicScores, so it's not 
so crucial but it would be nice to have it working again. thanks!!


library(gwascat)

data(ebicat37)

class(ebicat37)
[1] "gwaswloc"
attr(,"package")
[1] "gwascat"

ebicat37[1]
Error in updateObject(x, check = FALSE) :
  no slot of name "elementType" for this object of class "gwaswloc"
9: updateObject(x, check = FALSE)
8: updateObject(x, check = FALSE)
7: extractROWS(x, i)
6: extractROWS(x, i)
5: subset_along_ROWS(x, i, , ..., drop = drop)
4: .nextMethod(x = x, i = i)
3: callNextMethod()
2: ebicat37[1]
1: ebicat37[1]
sessionInfo()
R Under development (unstable) (2017-10-30 r73642)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] gwascat_2.11.1
 [2] Homo.sapiens_1.3.1
 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [4] org.Hs.eg.db_3.5.0
 [5] GO.db_3.5.0
 [6] OrganismDbi_1.21.1
 [7] GenomicFeatures_1.31.10
 [8] GenomicRanges_1.31.22
 [9] GenomeInfoDb_1.15.5
[10] AnnotationDbi_1.41.4
[11] IRanges_2.13.26
[12] S4Vectors_0.17.33
[13] Biobase_2.39.2
[14] BiocGenerics_0.25.3
[15] colorout_1.1-3

loaded via a namespace (and not attached):
  [1] ggbeeswarm_0.6.0            colorspace_1.3-2
  [3] biovizBase_1.27.1           htmlTable_1.11.2
  [5] XVector_0.19.8              base64enc_0.1-3
  [7] dichromat_2.0-0             rstudioapi_0.7
  [9] bit64_0.9-7                 interactiveDisplayBase_1.17.0
 [11] codetools_0.2-15            splines_3.5.0
 [13] snpStats_1.29.1             ggbio_1.27.1
 [15] doParallel_1.0.11           knitr_1.19
 [17] Formula_1.2-2               jsonlite_1.5
 [19] gQTLBase_1.11.0             Rsamtools_1.31.3
 [21] cluster_2.0.6               graph_1.57.1
 [23] shiny_1.0.5                 compiler_3.5.0
 [25] httr_1.3.1                  backports_1.1.2
 [27] assertthat_0.2.0            Matrix_1.2-12
 [29] lazyeval_0.2.1              limma_3.35.11
 [31] acepack_1.4.1               htmltools_0.3.6
 [33] prettyunits_1.0.2           tools_3.5.0
 [35] bindrcpp_0.2                gtable_0.2.0
 [37] glue_1.2.0                  GenomeInfoDbData_1.1.0
 [39] reshape2_1.4.3              dplyr_0.7.4
 [41] fastmatch_1.1-0             Rcpp_0.12.15
 [43] Biostrings_2.47.9           nlme_3.1-135.5
 [45] rtracklayer_1.39.9          iterators_1.0.9
 [47] ffbase_0.12.3               stringr_1.3.0
 [49] mime_0.5                    ensembldb_2.3.11
 [51] XML_3.98-1.10               AnnotationHub_2.11.2
 [53] zlibbioc_1.25.0             scales_0.5.0
 [55] BSgenome_1.47.5             VariantAnnotation_1.25.12
 [57] BiocInstaller_1.29.4        ProtGenerics_1.11.0
 [59] SummarizedExperiment_1.9.14 RBGL_1.55.0
 [61] AnnotationFilter_1.3.2      RColorBrewer_1.1-2
 [63] BBmisc_1.11                 yaml_2.1.16
 [65] curl_3.1                    memoise_1.1.0
 [67] gridExtra_2.3               ggplot2_2.2.1
 [69] rpart_4.1-12                biomaRt_2.35.10
 [71] latticeExtra_0.6-28         reshape_0.8.7
 [73] stringi_1.1.6               RSQLite_2.0
 [75] foreach_1.4.4               RMySQL_0.10.13
 [77] checkmate_1.8.5             BiocParallel_1.13.1
 [79] rlang_0.1.6                 pkgconfig_2.0.1
 [81] BatchJobs_1.7               GenomicFiles_1.15.2
 [83] matrixStats_0.53.1          bitops_1.0-6
 [85] lattice_0.20-35             purrr_0.2.4
 [87] bindr_0.1                   GenomicAlignments_1.15.12
 [89] htmlwidgets_1.0             bit_1.1-12
 [91] GGally_1.3.2                plyr_1.8.4
 [93] magrittr_1.5                sendmailR_1.2-1
 [95] R6_2.2.2                    Hmisc_4.1-1
 [97] erma_0.11.6                 DelayedArray_0.5.20
 [99] DBI_0.7                     foreign_0.8-70
[101] pillar_1.1.0                mgcv_1.8-23
[103] nnet_7.3-12                 survival_2.41-3
[105] RCurl_1.95-4.10             tibble_1.4.2
[107] plotly_4.7.1                progress_1.1.2
[109] grid_3.5.0                  data.table_1.10.4-3
[111] blob_1.1.0

Re: [Bioc-devel] R version check in BiocCheck

2018-02-20 Thread Alexey Sergushichev
> It _is_ the developer's choice.  But a developer of packages for the
> Bioconductor project commits to using R-devel during certain pre-release
> phases, depending on proximity in time to a point release of R.  (See
> http://bioconductor.org/developers/how-to/useDevel/ for full details.)
> BiocCheck verifies that this commitment is met.

No, BiocCheck doesn't verify this; it just checks for the presence of a
dependency on R >= 3.5. It doesn't actually check whether I have R-devel
installed on my computer at all, how often I update R-devel and test my
package against it, or whether I do any manual tests, since unit tests are
already run regularly and automatically by Bioconductor.
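
To make the distinction concrete, here is a minimal sketch of what a declaration-only check amounts to. It is not BiocCheck's actual code: the helper name 'declares_r_version' and its arguments are hypothetical, and it only inspects the text of a Depends field, which is exactly why it cannot tell which R version a developer really tests against.

```r
# Hypothetical helper, NOT the BiocCheck implementation: report whether a
# DESCRIPTION 'Depends' string declares R (>= min_version) or newer.
declares_r_version <- function(depends, min_version = "3.5") {
  # Split the comma-separated field into individual entries.
  fields <- trimws(strsplit(depends, ",", fixed = TRUE)[[1]])

  # Keep only the entry for R itself, e.g. "R (>= 3.5.0)".
  r_field <- grep("^R[[:space:]]*\\(", fields, value = TRUE)
  if (length(r_field) == 0L) return(FALSE)

  # Extract the version that follows the '>=' operator.
  m <- regmatches(r_field[1],
                  regexec(">=[[:space:]]*([0-9][0-9.]*)", r_field[1]))[[1]]
  if (length(m) < 2L) return(FALSE)

  package_version(m[2]) >= package_version(min_version)
}

ok_new  <- declares_r_version("R (>= 3.5.0), methods")
ok_old  <- declares_r_version("R (>= 3.4.0), methods")
ok_none <- declares_r_version("methods, stats")
```

A check of this kind passes as soon as the string is present in DESCRIPTION, regardless of what is installed on the developer's machine.
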

--
Alexey



On Mon, Feb 19, 2018 at 9:03 PM, Vincent Carey wrote:

>
>
> On Mon, Feb 19, 2018 at 11:27 AM, Alexey Sergushichev wrote:
>
>> Kevin,
>>
>> > It does not request users to make R-devel a _requirement_ of their
>> > package.
>>
>> Sadly it does for new packages. New packages submitted to Bioconductor 3.7
>> are _required_ to have an R >= 3.5 dependency, otherwise BiocCheck will
>> result in a warning (
>> https://github.com/Bioconductor/BiocCheck/blob/be9cd6e36d95f8bf873b52427d2a97fce6fbb9b9/R/checks.R#L23)
>> and warnings aren't allowed for new package submissions.
>>
>> > Here, I think the decision here boils down to how far back in terms of R
>> > versions the developer is willing to support the package. I suppose one
>> > could state R≥2.3 if they're confident about it.
>>
>> That's the problem: this is true for packages already in Bioconductor, but
>> it's not true for new package submissions.
>>
>> Aaron,
>>
>> > Personally, I haven't found it to be particularly difficult to update R,
>> > or to run R-devel in parallel with R 3.4, even without root privileges.
>>
>> I find it much harder for a normal user to install R-devel (and update it
>> properly, because it's a development version) than to run
>> 'devtools::install_github("blabla/my_package")'.
>>
>> > I think many people underappreciate the benefits of moving to the latest
>> > version of R.
>>
>> Don't you think it should be a developer's choice whether to use such new
>> features or ignore them and have a potentially bigger audience?
>>
>
> It _is_ the developer's choice.  But a developer of packages for the
> Bioconductor
> project commits to using R-devel during certain pre-release phases,
> depending
> on proximity in time to a point release of R.  (See
> http://bioconductor.org/developers/how-to/useDevel/
> for full details.)  BiocCheck verifies that this commitment is met.
>
>
>>
>> > Enforcing version consistency avoids heartache during release and
>> > debugging.
>>
>> But it's a developer's heartache. As I said, it can't even be attributed
>> to Bioconductor at all, as it's not possible to install the package from
>> bioc-devel unless you have the corresponding R version.
>>
>>
>> --
>> Alexey
>>
>>
>>
>> On Mon, Feb 19, 2018 at 6:38 PM, Aaron Lun  wrote:
>>
>> > I'll just throw in my two cents here.
>> >
>> > I think many people underappreciate the benefits of moving to the latest
>> > version of R. If you inspect the R-devel NEWS file, there's a couple of
>> > nice fixes/features that a developer might want to take advantage of:
>> >
>> > - sum() doesn't give NAs upon integer overflow anymore.
>> > - New ...elt(n) and ...length() functions for dealing with ellipses.
>> > - ALTREP support for 1:n sequences (wow!)
>> > - zero length subassignment in a non-zero index fails correctly.
>> >
>> > The previous 3.4.0 release also added support for more DLLs being loaded
>> > at once, which was otherwise causing headaches in workflows. And 3.4.2
>> > had a bug fix to LAPACK, which did result in a few user-level changes in
>> > some packages like edgeR. So there are considerable differences between
>> > the versions of R, especially if one is a package developer.
>> >
>> > Enforcing version consistency avoids heartache during release and
>> > debugging. There's a choice between users getting annoyed about having
>> > to update R, and then updating R, and everything working as a result; or
>> > everyone (developers/users) wasting some time figuring out whether a bug
>> > in a package is due to the code in the package itself or the version of
>> > R. The brief annoyance in the first option is better than the chronic
>> > grief of the second option, especially given that the solution to the
>> > problem in the second option would be to update R anyway.
>> >
>> > Personally, I haven't found it to be particularly difficult to update R,
>> > or to run R-devel in parallel with R 3.4, even without root privileges.
>> >
>> > -Aaron
>> >
>> > On 19/02/18 14:55, Kevin RUE wrote:
>> > > Hi Alexey,
>> > >
>> > > I do agree with you that there is no harm in testing against other
>> > > versions of R. In a way, that is even good practice, considering that
>> > > many HPC users do not always