Re: [Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

2013-08-23 Thread Valerie Obenchain

Hi Michael,

Martin and I have been discussing this. In addition to the fix you
suggest, what do you think of changing the default to compress=TRUE
for the RleList constructor? RleList is the only one of the AtomicLists
with a default of FALSE. Was there a reason for this when it was first
implemented?


Val



On 08/22/2013 07:34 PM, Maintainer wrote:

Hi,

SimpleLists are slow in this situation, basically because the underlying
seqselect is slow, due to this loop:

 x <- do.call(c, lapply(seq_len(length(ir)), function(i)
   window(x, start = start(ir)[i], width = width(ir)[i])))

Am I missing something or could this become a simple x[as.integer(ir)]?
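
A minimal base-R sketch of the two strategies, under stated assumptions: plain integer vectors `starts` and `widths` stand in for the IRanges object `ir`, an ordinary character vector stands in for `x`, and the helper names `loop_select` and `direct_select` are hypothetical. It only illustrates why expanding the ranges into one index vector and subsetting once gives the same result as the per-range loop; it is not the IRanges implementation.

```r
# Stand-ins (assumptions): a plain vector and integer starts/widths
# replace the Vector `x` and the IRanges `ir` from the thread.
x      <- letters
starts <- c(3L, 10L, 20L)
widths <- c(2L, 4L, 1L)

# Loop version: extract one window per range, then concatenate.
loop_select <- function(x, starts, widths) {
  do.call(c, lapply(seq_along(starts), function(i)
    x[seq.int(starts[i], length.out = widths[i])]))
}

# Direct version: expand all ranges into a single index vector
# and subset once.
direct_select <- function(x, starts, widths) {
  idx <- rep.int(starts, widths) + sequence(widths) - 1L
  x[idx]
}

# Both strategies select the same elements.
stopifnot(identical(loop_select(x, starts, widths),
                    direct_select(x, starts, widths)))
```

The loop makes one subsetting call per range, so its cost grows with the number of ranges; the direct version makes a single call regardless of how many ranges there are.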

In the meantime, using CompressedLists is the way to go. So for an
RleList, you need to pass compress=TRUE to the constructor.


On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong <jianhong...@umassmed.edu> wrote:

Hi,

When I use a big GRangesList, I find that it becomes very slow when
the metadata contain an AtomicList, e.g.:

 > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2),
 +                 someInfo=rep(RleList("*"), 500))
 > grr <- split(grll, 1:500)
 > grl <- as.list(grr)
 > system.time(grl <- grl[500:1])
    user  system elapsed
       0       0       0
 > system.time(grr <- grr[500:1])
    user  system elapsed
   1.622   0.013   1.635
 > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2))
 > grr <- split(grll, 1:500)
 > grl <- as.list(grr)
 > system.time(grl <- grl[500:1])
    user  system elapsed
       0       0       0
 > system.time(grr <- grr[500:1])
    user  system elapsed
   0.029   0.001   0.030
 > sessionInfo()
R Under development (unstable) (2013-07-23 r63392)
Platform: x86_64-apple-darwin12.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24
[4] BiocGenerics_0.7.3

loaded via a namespace (and not attached):
[1] stats4_3.1.0 tools_3.1.0

Is there any method to improve this?

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605


___
Bioc-devel@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





devteam-bioc mailing list
To unsubscribe from this mailing list send a blank email to
devteam-bioc-le...@lists.fhcrc.org
You can also unsubscribe or change your personal options at
https://lists.fhcrc.org/mailman/listinfo/devteam-bioc





Re: [Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

2013-08-23 Thread Michael Lawrence
On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain wrote:

> Hi Michael,
>
> Martin and I have been discussing this. In addition to the fix you
> suggest, what do you think of changing the default to compress=TRUE for
> the RleList constructor? RleList is the only one of the AtomicLists with
> a default of FALSE. Was there a reason for this when it was first
> implemented?
>
>
I'm guessing Patrick did that because we always used Rles for coverage, and
RleList for per-chromosome coverage. Also, there might be some overhead in
that Rle runs in the unlistData can cross list elements.

About my fix, the only downside would be if the range widths were much
larger than the size of the vector, e.g., a highly compressed Rle, selected
with chromosome-size ranges. Then the as.integer(ir) is big compared to the
data. Otherwise, it's way faster.
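
A small sketch of that worst case, assuming base R's `rle()` as a stand-in for the IRanges `Rle` class and a vector length of one million chosen purely for illustration. It compares how many runs the compressed form stores with how many positions a single full-width range expands to.

```r
# A highly compressible vector: the same value repeated n times.
n    <- 1000000L
runs <- rle(rep("*", n))

# The run-length form stores one run (one value, one length).
n_runs <- length(runs$lengths)

# A single range covering the whole vector expands to n positions.
idx <- seq_len(n)

c(runs = n_runs, index_length = length(idx))
```

Here the expanded index is a million times longer than the run-length representation it selects from, which is the situation where building the full index costs more than the data itself.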




Re: [Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

2013-08-27 Thread Valerie Obenchain

Thanks Jianhong for reporting this.

Changes implemented in IRanges 1.19.27:
- RleList() constructor now has default 'compress=TRUE'.
- seqselect,Vector-method lapply() loop was replaced with direct subset.

New timings:

library(GenomicRanges)
library(microbenchmark)

## generic subset function
fun0 <- function(x) x[500:1]

## GRangesList with RleList as metadata col
grll <- GRanges(seqnames="chr1",
                IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)
> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10

The median is now about 0.031 seconds, compared to the previous
elapsed time of 1.635 seconds:


  > system.time(grr<- grr[500:1])
 user  system elapsed
1.622   0.013   1.635
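
The arithmetic behind that comparison, as a sketch using only the two numbers reported in this thread. The variable names are hypothetical, and the old figure is a single system.time() run while the new one is a median over 10 runs, so the ratio is only a rough indication.

```r
old_elapsed_s <- 1.635     # system.time() elapsed, before the change
new_median_ms <- 30.58494  # microbenchmark median, after the change

# Convert the new median to seconds and take the ratio.
new_median_s <- new_median_ms / 1000
speedup      <- old_elapsed_s / new_median_s

round(new_median_s, 3)  # 0.031
speedup                 # roughly a 50-fold improvement
```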




Valerie








Re: [Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

2013-08-27 Thread Ou, Jianhong
Dear Valerie,

Great improvement. Thanks a lot for your work; I greatly appreciate it.

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605



