Hi,

I agree with Laurent that we can't really play the semantic and concept
police. It's the responsibility of package authors to decide whether
it's appropriate to call the particular transformation they're
implementing "normalization".

However I hope that we all agree on the following rule regarding the
generics that make it into BiocGenerics:

  If foo() is a generic function defined in BiocGenerics, no
  BioC package should redefine the function (either as a generic
  or an ordinary function). It can only define methods for it,
  or move away and use a different name for this functionality.

Does that sound reasonable? Otherwise that would kind of defeat the
purpose of having the BiocGenerics package in the first place.
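
The rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the generic is defined locally as a stand-in for the one BiocGenerics would own (a real package would import it rather than define it), and the ToySet class is hypothetical.

```r
library(methods)

# Stand-in for the generic that BiocGenerics would own. Assumption: a real
# package would import it from BiocGenerics instead of defining it here.
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

# Hypothetical container class owned by the package.
setClass("ToySet", representation(values = "numeric"))

# Allowed under the rule: add a method to the existing generic.
setMethod("normalize", "ToySet", function(x, ...) {
  x@values <- x@values / max(x@values)
  x
})

# Not allowed under the rule: a second setGeneric("normalize", ...) or an
# ordinary function assigned to the name normalize in the package.

res <- normalize(new("ToySet", values = c(2, 4, 8)))
```

Because the package only adds a method, any other package doing the same for its own classes keeps working regardless of load order.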

To me, having 10 BioC packages defining a normalize() function is far
from ideal. I think having it defined in BiocGenerics would
improve things a little bit. Also, one potential positive side effect
I see is that it would give the authors of those 10 packages an
opportunity to reconsider whether they still want to ride the normalize()
pony or not. Maybe some of them won't and they'll pick another
name. Not something we can really decide for them...

H.


On 02/20/2013 09:47 AM, Laurent Gautier wrote:
On 2013-02-20 17:32, Schalkwyk, Leonard wrote:

Is this not just an indication that normalize is now a poor choice of
a function name?

If the package authors called the functions "normalize", this means either:
1- at least some of the package authors have named a function performing
an action that is inappropriately described as "normalize"
2- all functions "normalize" do perform an action that can be described
with that verb

Without more details, I'd vote for 2.

(more below)


LEo

On 20 Feb 2013, at 16:14, Wolfgang Huber wrote:

Hi

is it clear that all these different functions (methods) share
similar semantics and enough (conceptually) of their interface?

Playing the semantic and concept police would come after defining things
like ontologies of data processing; I am not sure this should be a
priority.
I'd see working out a minimal common signature that keeps everyone going
with minimal fuss as coming first.


Wouldn't the implication be that preemptively every possible string
of characters should already be defined as a generic function in
BiocGenerics?

No. Otherwise this would probably also mean that R's S4 system should in
fact define all possible strings as generics, which by extension would
also mean that generic functions do not need to be explicitly declared:
since all possible generics would be declared, it is more practical to
implicitly assume that any given function already has a generic declared.
S4 has notions about implicit generic functions; a starting point is the
man page for setGeneric().
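
As a rough illustration of the mechanism mentioned above, calling setGeneric() with only a name derives a generic from an existing ordinary function and keeps the old body as the default method. The names rescale and ToyVec are hypothetical and exist only for this sketch.

```r
library(methods)

# Hypothetical ordinary function; nothing declares it as a generic yet.
rescale <- function(x, ...) x / max(x)
was_generic <- isGeneric("rescale")

# With only a name, setGeneric() builds the generic from the existing
# function, which then serves as the default method.
setGeneric("rescale")

# Hypothetical class with its own method; it delegates to the default.
setClass("ToyVec", representation(values = "numeric"))
setMethod("rescale", "ToyVec", function(x, ...) rescale(x@values))

plain <- rescale(c(1, 2, 4))
boxed <- rescale(new("ToyVec", values = c(1, 2, 4)))
```

Existing callers that pass plain vectors are unaffected, since the original function still handles them.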




    Best wishes
    Wolfgang

On Feb 20, 2013, at 11:04 AM, Laurent Gatto
<lg...@cam.ac.uk> wrote:

On 19 February 2013 22:44, Hervé Pagès <hpa...@fhcrc.org> wrote:
Hi Laurent, and maintainers of packages with a normalize() function,


On 02/15/2013 04:28 AM, Laurent Gatto wrote:
A quick (and incomplete) manual search using
http://search.bioconductor.jp/ suggests the following usage of
normalize:

As a function:
xps::normalize
codelink::normalize
EBImage::normalize
diffGeneAnalysis::normalize

Defining a generic and methods:
oligo::normalize
flowCore::normalize
MSnbase::normalize
isobar::normalize

and

several normalize\.[*+] functions

Would it be reasonable to add a normalize generic definition to
BiocGenerics?  The generic definitions in the above packages differ,
however.

Sounds good to me.

However, since the various normalize() functions have different
signatures, we need to agree on what the signature of the generic
in BiocGenerics should be.

Here is a summary of the situation:

** xps package: normalize() is an ordinary function with the
    following arg list:

      normalize(xps.data, filename=character(0), filedir=getwd(),
                tmpdir="", update=FALSE, select="all", method="mean",
                option="transcript:all", logbase="0", exonlevel="",
                refindex=0, refmethod="mean", params=list(0.02, 0),
                add.data=TRUE, verbose=TRUE)

    The package also defines normalize.constant(), normalize.lowess(),
    normalize.quantiles(), normalize.supsmu(), which are also ordinary
    functions (not S3 methods), and have similar but slightly
    different arg lists.

** codelink package: Ordinary function with the following args:

      normalize(object, method="quantiles", log.it=TRUE,
                preserve=FALSE, weights=NULL, verbose=FALSE)

** EBImage package: Ordinary function with the following args:

      normalize(x, separate=TRUE, ft=c(0, 1))

** diffGeneAnalysis package: Ordinary function with the following
    args:

      normalize(rawdata, numSlides, ctrl, expm, ctrlbg=0.30,
                expmbg=0.30)

** deepSNV package: S4 generic with the following args:

      normalize(test, control, ...)

** isobar package: S4 generic with the following args:

      normalize(x, f=median, target="intensity", exclude.protein=NULL,
                   use.protein=NULL, f.doapply=TRUE, log=TRUE,
                   channels=NULL, na.rm=FALSE, per.file=TRUE, ...)

** affy package: S4 generic with the following args:

      normalize(object, ...)

** flowCore package: S4 generic with the following args:

      normalize(data, x, ...)

** MSnbase package: S4 generic with the following args:

      normalize(object, method, ...)

** oligo package: S4 generic with the following args:

      normalize(object, method=normalizationMethods(),
                copy=TRUE, subset=NULL,
                target='core', verbose=TRUE, ...)

So it looks like the greatest common factor is normalize(x, ...).
Not too surprising for a generic that covers such a wide range of
related but slightly different concepts / algorithms.
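
A minimal sketch of how a normalize(x, ...) generic can still accommodate the package-specific arguments listed above: because the generic has "...", each method may declare extra named arguments. The classes ToyImage and ToyArray and the method bodies are hypothetical; only the argument names ft and method are borrowed from the summary.

```r
library(methods)

# Stand-in for the proposed generic with the minimal common signature.
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

# Hypothetical classes standing in for two packages' containers.
setClass("ToyImage", representation(pixels = "numeric"))
setClass("ToyArray", representation(intensities = "numeric"))

# Methods may add their own named arguments because the generic has "...".
setMethod("normalize", "ToyImage", function(x, ft = c(0, 1), ...) {
  rng <- range(x@pixels)
  x@pixels <- ft[1] + (x@pixels - rng[1]) / (rng[2] - rng[1]) * (ft[2] - ft[1])
  x
})

setMethod("normalize", "ToyArray", function(x, method = "mean", ...) {
  center <- if (method == "mean") mean(x@intensities) else median(x@intensities)
  x@intensities <- x@intensities / center
  x
})

img <- normalize(new("ToyImage", pixels = c(10, 20, 30)), ft = c(0, 2))
arr <- normalize(new("ToyArray", intensities = c(1, 2, 3)), method = "median")
```

Dispatch happens on the class of x only, so the two methods never compete even though their extra arguments differ.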

One technical difficulty though is that, even though almost all these
functions seem to take an S4 object as their 1st arg, some of them
don't:

(a) For EBImage::normalize(), 'x' can be an ordinary array in
     addition to an Image object.

(b) For diffGeneAnalysis::normalize(), 'rawdata' is an ordinary
     matrix.

(c) For deepSNV::normalize(), 'test' can be an ordinary matrix
     in addition to a deepSNV object.

(d) For oligo::normalize(), 'object' can be an ordinary matrix
     in addition to a FeatureSet object.

So how can we disambiguate when the first arg is an ordinary matrix?
IMO normalize() should fail in that case i.e. no package should define
methods for ordinary arrays or matrices. Not ideal but better than the
current situation where what is returned depends on which package was
loaded last.
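
The behaviour proposed above can be sketched as follows, assuming a generic with no default method and a hypothetical S4 container (ToySlides) that wraps the matrix. With no method registered for ordinary matrices, the call fails instead of silently picking whichever definition was loaded last.

```r
library(methods)

# Stand-in for the proposed generic; it has no default method.
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

# Hypothetical S4 container wrapping an ordinary matrix.
setClass("ToySlides", representation(counts = "matrix"))

# Toy method: scale every column so that it sums to 1.
setMethod("normalize", "ToySlides", function(x, ...) {
  x@counts <- sweep(x@counts, 2, colSums(x@counts), "/")
  x
})

m <- matrix(c(1, 3, 2, 6), nrow = 2)

# No method exists for a bare matrix, so dispatch raises an error.
bare <- tryCatch(normalize(m), error = function(e) "no method for a bare matrix")

# The same data wrapped in the container dispatches unambiguously.
wrapped <- normalize(new("ToySlides", counts = m))
```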

I could put normalize(x, ...) in BiocGenerics if nobody objects, but
that's all. I don't have time to fix the 10 packages that this change
will affect. However, I'd rather wait until the beginning of the Bioc 2.13
devel cycle (April) for such a change. For some packages like
diffGeneAnalysis (which doesn't use S4 at all), that will probably
require a significant amount of changes since it will need to pass
the data to normalize in an S4 container instead of an ordinary
matrix.

Comments and suggestions are welcome.

Fine by me.

Laurent

Thanks,
H.

Best wishes,

Laurent

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319