Re: [Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

2015-01-03 Thread Peter Haverty
There are a few other changes in there too, but profiling did identify
low-hanging fruit like the sapply calls. From there I have found a long list
of refactoring opportunities that offer ~2% improvements. These may not be
worth the risk of reversions, however. I'll be putting together a patch
proposal with the easy changes in the next few days.
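
Pete mentions that profiling pointed at the sapply calls. One way to get a similar picture is base R's sampling profiler, Rprof(), wrapped around a single library() call. This is a minimal sketch, not the setup used in this thread: `splines` is only a stand-in for the slow-loading package, the 5 ms sampling interval is an arbitrary choice, and the results mean most in a fresh R session where nothing has been loaded yet.

```r
# Minimal sketch: profile one package load with base R's Rprof().
# Assumption: "splines" is a placeholder; substitute the package under study.
pkg <- "splines"
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.005)   # start sampling every 5 ms
suppressPackageStartupMessages(library(pkg, character.only = TRUE))
Rprof(NULL)                          # stop sampling

# summaryRprof() may stop with an error if no samples were taken, which can
# happen when a package loads very quickly.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (is.null(prof)) {
  message("No samples recorded; the load was too fast to profile.")
} else {
  # Functions ranked by time spent in their own code ("self" time).
  print(head(prof$by.self, 10))
}
```

`prof$by.total` ranks functions by time including their callees, which helps trace a hot spot back to the code that calls it.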

Regards,

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Jan 2, 2015 at 1:58 AM, Michael Lawrence wrote:

> Pete Haverty is the one working on this. He has almost cut loading time in
> half by just changing some sapply and lapply calls to vapply calls. Most
> likely because allocating all of those list elements is expensive, and
> Martin's memory parameters also help with that.

Re: [Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

2015-01-02 Thread Michael Lawrence
Pete Haverty is the one working on this. He has almost cut loading time in
half by just changing some sapply and lapply calls to vapply calls. Most
likely because allocating all of those list elements is expensive, and
Martin's memory parameters also help with that.
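
The sapply-to-vapply change Michael describes can be illustrated on made-up data. vapply() is told the type and length of every result through FUN.VALUE, so it can fill an atomic vector directly instead of first collecting a list of results and simplifying it afterwards, and it stops with an error if any result does not match. This sketch only shows the mechanical rewrite; it does not measure the speed-up reported above.

```r
# Minimal sketch on made-up data: the same per-element computation written
# with sapply() and with vapply().
x <- lapply(1:1000, function(i) letters[seq_len(i %% 26 + 1)])

# sapply() runs lapply() to collect a list of results, then tries to
# simplify that list to a vector or matrix.
n_sapply <- sapply(x, length)

# vapply() is told each result is a single integer (FUN.VALUE = integer(1)),
# so it fills an integer vector directly and stops with an error if any
# result has a different type or length.
n_vapply <- vapply(x, length, FUN.VALUE = integer(1))

stopifnot(identical(n_sapply, n_vapply))
```

On a toy input like this the timing difference may be negligible; wrapping the real calls in system.time() is the way to check whether the change matters for a given package.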

On Wed, Dec 31, 2014 at 10:30 PM, Hervé Pagès  wrote:

> Hi Gordon,
>
> My guess is that it has to do with how many symbols get exported.
> For example on my machine, doing library(limma) in a fresh session
> takes 0.261s and triggers export of 292 symbols (as reported by
> ls(..., all.names=TRUE)). Doing library(GenomicRanges) in a fresh
> session takes 2.724s and triggers export of 1581 symbols (counting
> the symbols exported by all the packages that get loaded).
>
> Michael it's great to hear that somebody is working on speeding up
> the code in charge of this.
>
> Happy New Year everybody!
> H.

Re: [Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

2014-12-31 Thread Hervé Pagès

Hi Gordon,

My guess is that it has to do with how many symbols get exported.
For example on my machine, doing library(limma) in a fresh session
takes 0.261s and triggers export of 292 symbols (as reported by
ls(..., all.names=TRUE)). Doing library(GenomicRanges) in a fresh
session takes 2.724s and triggers export of 1581 symbols (counting
the symbols exported by all the packages that get loaded).
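
Hervé's figures pair a load time with a count of symbols reported by ls(..., all.names=TRUE). A rough way to collect both for any package is sketched below. The function name measure_library() is hypothetical, and the sketch assumes a fresh R session (dependencies that are already loaded are neither re-timed nor re-counted). It reports two counts because they can differ: symbols on the newly attached search() entries (the package plus its Depends:), and exports of every newly loaded namespace, which also covers Imports:.

```r
# Hypothetical helper: time library(pkg) and count the symbols it brings in.
# Assumes a fresh R session; packages that are already loaded or attached
# before the call are not counted again.
measure_library <- function(pkg) {
  attached_before <- search()
  loaded_before <- loadedNamespaces()

  timing <- system.time(
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  )

  # search() entries added by this call: the package and its Depends:
  new_attached <- setdiff(search(), attached_before)
  # Namespaces loaded by this call, whether via Depends: or Imports:
  new_loaded <- setdiff(loadedNamespaces(), loaded_before)

  list(
    elapsed = unname(timing["elapsed"]),
    attached = new_attached,
    n_attached_symbols = sum(vapply(
      new_attached,
      function(entry) length(ls(entry, all.names = TRUE)),
      integer(1)
    )),
    loaded = new_loaded,
    n_exported_symbols = sum(vapply(
      new_loaded,
      function(ns) length(getNamespaceExports(ns)),
      integer(1)
    ))
  )
}

# Example call; "splines" stands in for e.g. limma or GenomicRanges.
res <- measure_library("splines")
str(res)
```

Exact counts depend on package versions and on which of the two counts is meant, so they need not reproduce the figures quoted above.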

Michael, it's great to hear that somebody is working on speeding up
the code in charge of this.

Happy New Year everybody!
H.


On 12/31/2014 06:07 PM, Gordon K Smyth wrote:

> Hi Michael,
>
> What aspect of the methods package causes the slowness?
>
> There are many packages (limma for one) that depend on methods but load
> quickly.
>
> Regards
> Gordon


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

2014-12-31 Thread Gordon K Smyth

Hi Michael,

What aspect of the methods package causes the slowness?

There are many packages (limma for one) that depend on methods but load 
quickly.


Regards
Gordon



> Date: Wed, 31 Dec 2014 09:17:01 -0800
> From: Michael Lawrence
> To: Peng Yu
> Cc: Bioconductor Package Maintainer,
> "bioc-devel@r-project.org"
> Subject: Re: [Bioc-devel] [devteam-bioc] Use Imports instead of
> Depends in the DESCRIPTION files of bioconductor packages.
>
> The slowness is due to the methods package. We're working on it.
>
> Michael
>
> On Wed, Dec 31, 2014 at 8:47 AM, Peng Yu wrote:
>
>> On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan wrote:
>>
>>> On 12/24/2014 07:31 PM, Maintainer wrote:
>>>
>>>> Hi,
>>>>
>>>> Many bioconductor packages Depends on other packages but not Imports
>>>> other packages. (e.g., IRanges Depends on BiocGenerics.) Imports is
>>>> usually preferred to Depends.
>>>>
>>>> http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
>>>>
>>>> http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
>>>>
>>>> Could the unnecessary Depends be forced to be replaced by Imports?
>>>> This should improve the package load time significantly.
>>>
>>> R package symbols and other objects are collated at build time into a
>>> 'name space'. When used,
>>>
>>> - Import: loads the name space from disk.
>>> - Depends: loads the name space from disk, and attaches it to the
>>>   search() path.
>>>
>>> Attaching is very inexpensive compared to loading, so there is no speed
>>> improvement gained by Import'ing instead of Depend'ing.
>>
>> Yes. For example, changing Depends to Imports does not improve the
>> package load time much.
>>
>> But loading a package in 4 sec seems to be too long.
>>
>>     system.time(suppressPackageStartupMessages(library(MBASED)))
>>        user  system elapsed
>>       4.404   0.100   4.553
>>
>> For example, it only takes 10% of the time to load ggplot2. It seems
>> that many bioconductor packages have similar problems.
>>
>>     system.time(suppressPackageStartupMessages(library(ggplot2)))
>>        user  system elapsed
>>       0.394   0.036   0.460
>>
>>> The main reason to Depend: on a package is because the symbols defined by
>>> the package are needed by the end-user. Import'ing a package is
>>> appropriate when the package provides functionality only relevant to the
>>> package author.
>>
>> What causes the load time to be too long? Is it because exporting too
>> many functions from all dependent packages to the global namespace?
>>
>>> There are likely to be specific packages that mis-use Depends; packages
>>> such as IRanges, GenomicRanges, etc use Depends: as intended, to provide
>>> functions that are useful to the end user.
>>>
>>> Maintainers are certainly encouraged to think carefully about adding
>>> packages providing functionality irrelevant to the end-user to the
>>> Depends: field. The codetoolsBioC package (available from svn, see
>>> http://bioconductor.org/developers/how-to/source-control/) provides some
>>> mostly reliable hints to package authors about correctly formulating a
>>> NAMESPACE file to facilitate using Imports: instead of Depends:.
>>>
>>> General questions about Bioconductor packages should be addressed to the
>>> support forum https://support.bioconductor.org.
>>>
>>> Questions about Bioconductor development (such as this) should be
>>> addressed to the bioc-devel mailing list (subscription required)
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel.
>>>
>>> I have cc'd the bioc-devel mailing list; I hope that is ok.
>>
>> --
>> Regards,
>> Peng
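
Martin's distinction, quoted above, is that Import only loads a name space while Depends also attaches it to search(). The two steps can be separated by hand with loadNamespace() followed by library(), which makes it possible to time each one. This is a minimal sketch, using the base package splines purely as a placeholder that is not attached by default; timings vary by machine and package, so none are claimed here.

```r
# Minimal sketch: separate loading a namespace from attaching it.
# Assumption: "splines" is a placeholder; it ships with R and is not
# attached by default, so both steps do real work in a fresh session.
pkg <- "splines"
search_name <- paste0("package:", pkg)

# Step 1, roughly what an Imports: dependency triggers: load the namespace.
# Its functions are reachable as splines::fun, but search() is unchanged.
load_time <- system.time(loadNamespace(pkg))
stopifnot(pkg %in% loadedNamespaces(), !(search_name %in% search()))

# Step 2, what Depends: adds on top: attach the exports to search().
# The namespace is already loaded, so library() mostly just attaches it.
attach_time <- system.time(library(pkg, character.only = TRUE))
stopifnot(search_name %in% search())

# Compare user, system and elapsed seconds for the two steps.
rbind(load = load_time[1:3], attach = attach_time[1:3])
```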

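For maintainers who decide a dependency belongs under Imports:, the usual pattern is to list it in the Imports: field of DESCRIPTION (for instance `Imports: BiocGenerics` in place of `Depends: BiocGenerics`) and then declare in NAMESPACE what the package's own code uses. The fragment below is an illustrative sketch, not taken from any real package: `myFunction` is a hypothetical export, and which BiocGenerics generics to import depends entirely on the code. The codetoolsBioC package Martin mentions is meant to help work this out.

```r
# NAMESPACE (sketch): directives use R call syntax.
# Import just the generics this package's code calls ...
importFrom(BiocGenerics, combine, updateObject)
# ... or, alternatively, everything BiocGenerics exports:
# import(BiocGenerics)

# Hypothetical function this package offers to its own users.
export(myFunction)
```

With this arrangement, users who want BiocGenerics functions at the prompt must attach it themselves, which is the trade-off Martin describes between Imports: and Depends:.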
