[Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-04-30 Thread Ryan C. Thompson

Hi all,

I recently asked about ways to do non-standard read counting in 
summarizeOverlaps, and Martin Morgan directed me toward writing a custom 
function to pass as the "mode" parameter. I have now written the custom 
modes that I require for counting my ChIP-Seq reads, and I figured I 
would contribute them back in case there was interest in merging them.


The main three functions are "ResizeReads", "FivePrimeEnd", and 
"ThreePrimeEnd". The first allows you to directionally extend or shorten 
each read to the effective fragment length for the purpose of 
determining overlaps. For example, if each read represents the 5-prime 
end of a 150-bp fragment and you want to count these fragments using the 
Union mode, you could do:


summarizeOverlaps(mode=ResizeReads(mode=Union, width=150, 
fix="start"), ...)


Note that ResizeReads takes a mode argument. It returns a function (with 
a closure storing the passed arguments) that performs the resizing (by 
coercing reads to GRanges and calling "resize") and then dispatches to 
the provided mode. (It probably needs to add a call to "match.fun" 
somewhere.)


The other two functions are designed to count overlaps of only the read 
ends. They are implemented internally using "ResizeReads" with width=1.


The other three counting modes (the "*ExtraArgs" functions) are meant to 
be used to easily construct new counting modes. Each function takes any 
number of arguments and returns a counting mode that works like the 
standard one of the same name, except that those arguments are passed as 
extra args to "findOverlaps". For example, you could do Union mode with 
a requirement for a minimum overlap of 10:


summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)

Note that these can be combined or "nested". For instance, you might 
want a fragment length of 150 and a min overlap of 10:


myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10), 
width=150, fix="start")

summarizeOverlaps(mode=myCountingMode, ...)

Anyway, if you think any of these are worthy of inclusion for 
BioConductor, feel free to add them in. I'm not so sure about the 
"nesting" idea, though. Functions that return functions (with states 
saved in closures, which are then passed into another function) are 
confusing for people who are not programmers by trade. Maybe 
summarizeOverlaps should just gain an argument to pass args to findOverlaps.


-Ryan Thompson

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-04-30 Thread Valerie Obenchain

Hi Ryan,

These sound like great contributions. I didn't get an attachment - did 
you send one?


Thanks.
Valerie

On 04/30/2014 01:06 PM, Ryan C. Thompson wrote:

Hi all,

I recently asked about ways to do non-standard read counting in
summarizeOverlaps, and Martin Morgan directed me toward writing a custom
function to pass as the "mode" parameter. I have now written the custom
modes that I require for counting my ChIP-Seq reads, and I figured I
would contribute them back in case there was interest in merging them.

The main three functions are "ResizeReads", "FivePrimeEnd", and
"ThreePrimeEnd". The first allows you to directionally extend or shorten
each read to the effective fragment length for the purpose of
determining overlaps. For example, if each read represents the 5-prime
end of a 150-bp fragment and you want to count these fragments using the
Union mode, you could do:

 summarizeOverlaps(mode=ResizeReads(mode=Union, width=150,
fix="start"), ...)

Note that ResizeReads takes a mode argument. It returns a function (with
a closure storing the passed arguments) that performs the resizing (by
coercing reads to GRanges and calling "resize") and then dispatches to
the provided mode. (It probably needs to add a call to "match.fun"
somewhere.)

The other two functions are designed to count overlaps of only the read
ends. They are implemented internally using "ResizeReads" with width=1.

The other three counting modes (the "*ExtraArgs" functions) are meant to
be used to easily construct new counting modes. Each function takes any
number of arguments and returns a counting mode that works like the
standard one of the same name, except that those arguments are passed as
extra args to "findOverlaps". For example, you could do Union mode with
a requirement for a minimum overlap of 10:

 summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)

Note that these can be combined or "nested". For instance, you might
want a fragment length of 150 and a min overlap of 10:

 myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
width=150, fix="start")
 summarizeOverlaps(mode=myCountingMode, ...)

Anyway, if you think any of these are worthy of inclusion for
BioConductor, feel free to add them in. I'm not so sure about the
"nesting" idea, though. Functions that return functions (with states
saved in closures, which are then passed into another function) are
confusing for people who are not programmers by trade. Maybe
summarizeOverlaps should just gain an argument to pass args to
findOverlaps.

-Ryan Thompson

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-04-30 Thread Ryan C. Thompson

No, I forgot to attach the file. Here is the link:

https://www.dropbox.com/s/7qghtksl3mbvlsl/counting-modes.R

On Wed 30 Apr 2014 02:18:28 PM PDT, Valerie Obenchain wrote:

Hi Ryan,

These sound like great contributions. I didn't get an attachment - did
you send one?

Thanks.
Valerie

On 04/30/2014 01:06 PM, Ryan C. Thompson wrote:

Hi all,

I recently asked about ways to do non-standard read counting in
summarizeOverlaps, and Martin Morgan directed me toward writing a custom
function to pass as the "mode" parameter. I have now written the custom
modes that I require for counting my ChIP-Seq reads, and I figured I
would contribute them back in case there was interest in merging them.

The main three functions are "ResizeReads", "FivePrimeEnd", and
"ThreePrimeEnd". The first allows you to directionally extend or shorten
each read to the effective fragment length for the purpose of
determining overlaps. For example, if each read represents the 5-prime
end of a 150-bp fragment and you want to count these fragments using the
Union mode, you could do:

 summarizeOverlaps(mode=ResizeReads(mode=Union, width=150,
fix="start"), ...)

Note that ResizeReads takes a mode argument. It returns a function (with
a closure storing the passed arguments) that performs the resizing (by
coercing reads to GRanges and calling "resize") and then dispatches to
the provided mode. (It probably needs to add a call to "match.fun"
somewhere.)

The other two functions are designed to count overlaps of only the read
ends. They are implemented internally using "ResizeReads" with width=1.

The other three counting modes (the "*ExtraArgs" functions) are meant to
be used to easily construct new counting modes. Each function takes any
number of arguments and returns a counting mode that works like the
standard one of the same name, except that those arguments are passed as
extra args to "findOverlaps". For example, you could do Union mode with
a requirement for a minimum overlap of 10:

 summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)

Note that these can be combined or "nested". For instance, you might
want a fragment length of 150 and a min overlap of 10:

 myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
width=150, fix="start")
 summarizeOverlaps(mode=myCountingMode, ...)

Anyway, if you think any of these are worthy of inclusion for
BioConductor, feel free to add them in. I'm not so sure about the
"nesting" idea, though. Functions that return functions (with states
saved in closures, which are then passed into another function) are
confusing for people who are not programmers by trade. Maybe
summarizeOverlaps should just gain an argument to pass args to
findOverlaps.

-Ryan Thompson

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-05-01 Thread Valerie Obenchain

Thanks.

I have some concerns about the *ExtraArgs() functions. Passing flexible 
args to findOverlaps in the existing mode functions fundamentally 
changes the documented behavior. The modes were created to capture 
specific overlap situations pertaining to gene features which are 
graphically depicted in the vignette. Changing 'maxgap' or 'minoverlap' 
will produce a variety of results inconsistent with past behavior and 
difficult to document (e.g., under what circumstances will 
IntersectionNotEmpty now register a hit).


I agree that controlling the overlap args is appealing and I like the 
added ability to resize. I've created a 'chipseq' mode that combines 
these ideas and gives what ResizeReads() was doing but now in 'mode' 
form. If this implementation gives you the flexibility you were looking 
for I'll check it into devel.


A couple of questions:

- Do you want to handle paired-end reads? You coerce to a GRanges to 
resize but don't coerce back.


- Do you want to require strand info for all reads? Is this because of 
how resize() anchors "*" to 'start'?




chipseq <- function(features, reads, ignore.strand=FALSE, inter.feature=TRUE,
type="any", maxgap=0L, minoverlap=1L,
width=NULL, fix="start", use.names=TRUE)
{
reads <- as(reads, "GRanges")
if (any(strand(reads) == "*"))
stop("all reads must have strand")
if (!is.null(width))
reads <- do.call(resize(reads, width, fix=fix, use.names=use.names,
ignore.strand=ignore.strand))

ov <- findOverlaps(features, reads, type=type, ignore.strand=ignore.strand,
   maxgap=maxgap, minoverlap=minoverlap)
if (inter.feature) {
## Remove reads that overlap multiple features.
reads_to_keep <- which(countSubjectHits(ov) == 1L)
ov <- ov[subjectHits(ov) %in% reads_to_keep]
}
countQueryHits(ov)
}



To count the overlaps of 5' and 3' ends:

summarizeOverlaps(reads, features, chipseq, fix="start", width=1)
summarizeOverlaps(reads, features, chipseq, fix="end", width=1)


Valerie

On 04/30/2014 02:41 PM, Ryan C. Thompson wrote:

No, I forgot to attach the file. Here is the link:

https://www.dropbox.com/s/7qghtksl3mbvlsl/counting-modes.R

On Wed 30 Apr 2014 02:18:28 PM PDT, Valerie Obenchain wrote:

Hi Ryan,

These sound like great contributions. I didn't get an attachment - did
you send one?

Thanks.
Valerie

On 04/30/2014 01:06 PM, Ryan C. Thompson wrote:

Hi all,

I recently asked about ways to do non-standard read counting in
summarizeOverlaps, and Martin Morgan directed me toward writing a custom
function to pass as the "mode" parameter. I have now written the custom
modes that I require for counting my ChIP-Seq reads, and I figured I
would contribute them back in case there was interest in merging them.

The main three functions are "ResizeReads", "FivePrimeEnd", and
"ThreePrimeEnd". The first allows you to directionally extend or shorten
each read to the effective fragment length for the purpose of
determining overlaps. For example, if each read represents the 5-prime
end of a 150-bp fragment and you want to count these fragments using the
Union mode, you could do:

 summarizeOverlaps(mode=ResizeReads(mode=Union, width=150,
fix="start"), ...)

Note that ResizeReads takes a mode argument. It returns a function (with
a closure storing the passed arguments) that performs the resizing (by
coercing reads to GRanges and calling "resize") and then dispatches to
the provided mode. (It probably needs to add a call to "match.fun"
somewhere.)

The other two functions are designed to count overlaps of only the read
ends. They are implemented internally using "ResizeReads" with width=1.

The other three counting modes (the "*ExtraArgs" functions) are meant to
be used to easily construct new counting modes. Each function takes any
number of arguments and returns a counting mode that works like the
standard one of the same name, except that those arguments are passed as
extra args to "findOverlaps". For example, you could do Union mode with
a requirement for a minimum overlap of 10:

 summarizeOverlaps(mode=UnionExtraArgs(minoverlap=10), ...)

Note that these can be combined or "nested". For instance, you might
want a fragment length of 150 and a min overlap of 10:

 myCountingMode <- ResizeReads(mode=UnionExtraArgs(minoverlap=10),
width=150, fix="start")
 summarizeOverlaps(mode=myCountingMode, ...)

Anyway, if you think any of these are worthy of inclusion for
BioConductor, feel free to add them in. I'm not so sure about the
"nesting" idea, though. Functions that return functions (with states
saved in closures, which are then passed into another function) are
confusing for people who are not programmers by trade. Maybe
summarizeOverlaps should just gain an argument to pass args to
findOverlaps.

-Ryan Thompson

___
Bioc-devel@r-project.org mailing list
https

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-05-01 Thread Ryan

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).
Well, I wasn't so sure about those functions either. Obviously you can 
pass arguments that break things. They were mostly designed to be 
constructors for specific counting modes involving the minoverlap/maxgap 
arguments, but I decided I didn't need those modes after all. They're 
certainly not designed to be exposed to the user. I haven't carefully 
considered the interaction between the counting mode and 
maxgap/minoverlap, but I believe that it would be roughly equivalent to 
extending/shrinking the features/reads by the specified amount (with 
some differences for e.g. a feature/read smaller than 2*minoverlap). For 
example, with a read length of 100 and a minoverlap of 10 in Union 
counting mode, this would be the same as truncating the first and last 
10 (or mabe 9?) bases and operating in normal Union mode. As I said, 
though, there may be edge cases that I haven't thought of where 
unexpected things happen.


I agree that controlling the overlap args is appealing and I like the
added ability to resize. I've created a 'chipseq' mode that combines
these ideas and gives what ResizeReads() was doing but now in 'mode'
form. If this implementation gives you the flexibility you were
looking for I'll check it into devel.


This sounds nice, but if I use the 'chipseq' mode, how do I specify 
whether I want Union, IntersectionNotEmpty, or IntersectionStrict? It 
looks like it just does Union? IntersectionStrict would be useful for 
specifying that the read has to occur entirely within the bounds of a 
called peak, for example. This is why I implemented it as a "wrapper" 
that takes another mode as an argument, so that the resizing logic and 
the counting logic were independent. Maybe summarizeOverlaps could 
accept an optional "read modification function", and if this is 
provided, it will pass the reads through this before passing them to the 
counting function. The read modification function would have to take any 
valid reads argument and return another valid reads argument. It could 
be used for modifying the reads as well as filtering them. This would 
allow resizing without the awkward nesting method that I've used.


A couple of questions:

- Do you want to handle paired-end reads? You coerce to a GRanges to
resize but don't coerce back.
For paired end reads, there is no need to estimate the fragment length, 
because the pair gives you both ends of the fragment. So if I had 
paired-end ChIP-Seq data, I would use it as is with no resizing. I can't 
personally think of a reason to resize a paired-end fragment, but I 
don't know if others might need that.


I corece to GRanges because I know how GRanges work, but I'm not as 
familiar with GAlignments so I don't know how the resize function works 
on GAlignments and other classes. I'm sure you know better than I do how 
these work. If the coercion is superfluous, feel free to eliminate it.


- Do you want to require strand info for all reads? Is this because of
how resize() anchors "*" to 'start'?
Yes, I require strand info for all reads because the reads must be 
directionally extended, which requires strand info. Ditto for counting 
the 5-prime and 3-prime ends.


-Ryan





chipseq <- function(features, reads, ignore.strand=FALSE,
inter.feature=TRUE,
type="any", maxgap=0L, minoverlap=1L,
width=NULL, fix="start", use.names=TRUE)
{
reads <- as(reads, "GRanges")
if (any(strand(reads) == "*"))
stop("all reads must have strand")
if (!is.null(width))
reads <- do.call(resize(reads, width, fix=fix,
use.names=use.names,
ignore.strand=ignore.strand))

ov <- findOverlaps(features, reads, type=type,
ignore.strand=ignore.strand,
maxgap=maxgap, minoverlap=minoverlap)
if (inter.feature) {
## Remove reads that overlap multiple features.
reads_to_keep <- which(countSubjectHits(ov) == 1L)
ov <- ov[subjectHits(ov) %in% reads_to_keep]
}
countQueryHits(ov)
}




To count the overlaps of 5' and 3' ends:

summarizeOverlaps(reads, features, chipseq, fix="start", width=1)
summarizeOverlaps(reads, features, chipseq, fix="end", width=1)


Valerie

On 04/30/2014 02:41 PM, Ryan C. Thompson wrote:


No, I forgot to attach the file. Here is the link:

https://www.dropbox.com/s/7qghtksl3mbvlsl/counting-modes.R

On Wed 30 Apr 2014 02:18:28 PM PDT, Valerie Obenchain wrote:


Hi Ryan,

These sound like great contributions. I didn't get an attachment - did
you send one?

Thanks.
Valerie

On 04/30/2014 01:06 PM, Ry

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-05-01 Thread Valerie Obenchain

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the minoverlap/maxgap
arguments, but I decided I didn't need those modes after all. They're
certainly not designed to be exposed to the user. I haven't carefully
considered the interaction between the counting mode and
maxgap/minoverlap, but I believe that it would be roughly equivalent to
extending/shrinking the features/reads by the specified amount (with
some differences for e.g. a feature/read smaller than 2*minoverlap). For
example, with a read length of 100 and a minoverlap of 10 in Union
counting mode, this would be the same as truncating the first and last
10 (or mabe 9?) bases and operating in normal Union mode. As I said,
though, there may be edge cases that I haven't thought of where
unexpected things happen.


I agree that controlling the overlap args is appealing and I like the
added ability to resize. I've created a 'chipseq' mode that combines
these ideas and gives what ResizeReads() was doing but now in 'mode'
form. If this implementation gives you the flexibility you were
looking for I'll check it into devel.


This sounds nice, but if I use the 'chipseq' mode, how do I specify
whether I want Union, IntersectionNotEmpty, or IntersectionStrict? It
looks like it just does Union? IntersectionStrict would be useful for
specifying that the read has to occur entirely within the bounds of a
called peak, for example. This is why I implemented it as a "wrapper"
that takes another mode as an argument, so that the resizing logic and
the counting logic were independent.


'chipseq' didn't implement the standard modes b/c I wanted to avoid the 
clash of passing unconventional findOverlaps() args to those modes. The 
assumption was that if the user specified a mode they would expect a 
certain behavior ...


Maybe summarizeOverlaps could

accept an optional "read modification function", and if this is
provided, it will pass the reads through this before passing them to the
counting function. The read modification function would have to take any
valid reads argument and return another valid reads argument. It could
be used for modifying the reads as well as filtering them. This would
allow resizing without the awkward nesting method that I've used.


Good idea. Maybe a 'preprocess' or 'prefilter' arg to allow massaging 
before counting. I'll post back when it's done.


Valerie



A couple of questions:

- Do you want to handle paired-end reads? You coerce to a GRanges to
resize but don't coerce back.

For paired end reads, there is no need to estimate the fragment length,
because the pair gives you both ends of the fragment. So if I had
paired-end ChIP-Seq data, I would use it as is with no resizing. I can't
personally think of a reason to resize a paired-end fragment, but I
don't know if others might need that.

I corece to GRanges because I know how GRanges work, but I'm not as
familiar with GAlignments so I don't know how the resize function works
on GAlignments and other classes. I'm sure you know better than I do how
these work. If the coercion is superfluous, feel free to eliminate it.


- Do you want to require strand info for all reads? Is this because of
how resize() anchors "*" to 'start'?

Yes, I require strand info for all reads because the reads must be
directionally extended, which requires strand info. Ditto for counting
the 5-prime and 3-prime ends.

-Ryan





chipseq <- function(features, reads, ignore.strand=FALSE,
inter.feature=TRUE,
type="any", maxgap=0L, minoverlap=1L,
width=NULL, fix="start", use.names=TRUE)
{
reads <- as(reads, "GRanges")
if (any(strand(reads) == "*"))
stop("all reads must have strand")
if (!is.null(width))
reads <- do.call(resize(reads, width, fix=fix,
use.names=use.names,
ignore.strand=ignore.strand))

ov <- findOverlaps(features, reads, type=type,
ignore.strand=ignore.strand,
maxgap=maxgap, minoverlap=minoverlap)
if (inter.feature) {
## Remove reads that overlap multiple features.
reads_to_keep <- which(countSubjectHits(ov) == 1L)
ov <- ov[subjectHits(ov) %in% reads_to_keep]
}
countQueryHits(ov)
}




To count the overlaps of 5' and 3' ends:

summarizeOverlaps(reads, features, chipseq, fix="start", width=1)
summarizeOverlaps(reads, features, chipseq, fix="end", width=

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-05-01 Thread Valerie Obenchain
GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should 
be a function where the first argument is 'reads' and the return value 
is compatible with 'reads' in the pre-defined count modes.


I've used your ResizeReads() as an example in the man page. I think the 
ability to pre-filter and used a pre-defined mode will be useful. Thanks 
for the suggestion.


Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the minoverlap/maxgap
arguments, but I decided I didn't need those modes after all. They're
certainly not designed to be exposed to the user. I haven't carefully
considered the interaction between the counting mode and
maxgap/minoverlap, but I believe that it would be roughly equivalent to
extending/shrinking the features/reads by the specified amount (with
some differences for e.g. a feature/read smaller than 2*minoverlap). For
example, with a read length of 100 and a minoverlap of 10 in Union
counting mode, this would be the same as truncating the first and last
10 (or mabe 9?) bases and operating in normal Union mode. As I said,
though, there may be edge cases that I haven't thought of where
unexpected things happen.


I agree that controlling the overlap args is appealing and I like the
added ability to resize. I've created a 'chipseq' mode that combines
these ideas and gives what ResizeReads() was doing but now in 'mode'
form. If this implementation gives you the flexibility you were
looking for I'll check it into devel.


This sounds nice, but if I use the 'chipseq' mode, how do I specify
whether I want Union, IntersectionNotEmpty, or IntersectionStrict? It
looks like it just does Union? IntersectionStrict would be useful for
specifying that the read has to occur entirely within the bounds of a
called peak, for example. This is why I implemented it as a "wrapper"
that takes another mode as an argument, so that the resizing logic and
the counting logic were independent.


'chipseq' didn't implement the standard modes b/c I wanted to avoid the
clash of passing unconventional findOverlaps() args to those modes. The
assumption was that if the user specified a mode they would expect a
certain behavior ...

Maybe summarizeOverlaps could

accept an optional "read modification function", and if this is
provided, it will pass the reads through this before passing them to the
counting function. The read modification function would have to take any
valid reads argument and return another valid reads argument. It could
be used for modifying the reads as well as filtering them. This would
allow resizing without the awkward nesting method that I've used.


Good idea. Maybe a 'preprocess' or 'prefilter' arg to allow massaging
before counting. I'll post back when it's done.

Valerie



A couple of questions:

- Do you want to handle paired-end reads? You coerce to a GRanges to
resize but don't coerce back.

For paired end reads, there is no need to estimate the fragment length,
because the pair gives you both ends of the fragment. So if I had
paired-end ChIP-Seq data, I would use it as is with no resizing. I can't
personally think of a reason to resize a paired-end fragment, but I
don't know if others might need that.

I corece to GRanges because I know how GRanges work, but I'm not as
familiar with GAlignments so I don't know how the resize function works
on GAlignments and other classes. I'm sure you know better than I do how
these work. If the coercion is superfluous, feel free to eliminate it.


- Do you want to require strand info for all reads? Is this because of
how resize() anchors "*" to 'start'?

Yes, I require strand info for all reads because the reads must be
directionally extended, which requires strand info. Ditto for counting
the 5-prime and 3-prime ends.

-Ryan





chipseq <- function(features, reads, ignore.strand=FALSE,
inter.feature=TRUE,
type="any", maxgap=0L, minoverlap=1L,
width=NULL, fix="start", use.names=TRUE)
{
reads <- as(reads, "GRanges")
if (any(strand(reads) == "*"))
stop("all reads must have strand")
if (!is.null(width))
reads <- do.call(resize(reads, width, fix=fix,
use.names=use.names,
ignore.strand=ignore.strand))

ov <- findOverlaps(features, reads, type=type,
i

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-05 Thread Ryan C. Thompson

Hi Valerie,

I got really busy around May and never got a chance to thank you for 
adding this option to summarizeOverlaps! So thank you!


-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wrote:

GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should
be a function where the first argument is 'reads' and the return value
is compatible with 'reads' in the pre-defined count modes.

I've used your ResizeReads() as an example in the man page. I think
the ability to pre-filter and used a pre-defined mode will be useful.
Thanks for the suggestion.

Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the
minoverlap/maxgap
arguments, but I decided I didn't need those modes after all. They're
certainly not designed to be exposed to the user. I haven't carefully
considered the interaction between the counting mode and
maxgap/minoverlap, but I believe that it would be roughly equivalent to
extending/shrinking the features/reads by the specified amount (with
some differences for e.g. a feature/read smaller than 2*minoverlap).
For
example, with a read length of 100 and a minoverlap of 10 in Union
counting mode, this would be the same as truncating the first and last
10 (or mabe 9?) bases and operating in normal Union mode. As I said,
though, there may be edge cases that I haven't thought of where
unexpected things happen.


I agree that controlling the overlap args is appealing and I like the
added ability to resize. I've created a 'chipseq' mode that combines
these ideas and gives what ResizeReads() was doing but now in 'mode'
form. If this implementation gives you the flexibility you were
looking for I'll check it into devel.


This sounds nice, but if I use the 'chipseq' mode, how do I specify
whether I want Union, IntersectionNotEmpty, or IntersectionStrict? It
looks like it just does Union? IntersectionStrict would be useful for
specifying that the read has to occur entirely within the bounds of a
called peak, for example. This is why I implemented it as a "wrapper"
that takes another mode as an argument, so that the resizing logic and
the counting logic were independent.


'chipseq' didn't implement the standard modes b/c I wanted to avoid the
clash of passing unconventional findOverlaps() args to those modes. The
assumption was that if the user specified a mode they would expect a
certain behavior ...

Maybe summarizeOverlaps could

accept an optional "read modification function", and if this is
provided, it will pass the reads through this before passing them to
the
counting function. The read modification function would have to take
any
valid reads argument and return another valid reads argument. It could
be used for modifying the reads as well as filtering them. This would
allow resizing without the awkward nesting method that I've used.


Good idea. Maybe a 'preprocess' or 'prefilter' arg to allow massaging
before counting. I'll post back when it's done.

Valerie



A couple of questions:

- Do you want to handle paired-end reads? You coerce to a GRanges to
resize but don't coerce back.

For paired end reads, there is no need to estimate the fragment length,
because the pair gives you both ends of the fragment. So if I had
paired-end ChIP-Seq data, I would use it as is with no resizing. I
can't
personally think of a reason to resize a paired-end fragment, but I
don't know if others might need that.

I corece to GRanges because I know how GRanges work, but I'm not as
familiar with GAlignments so I don't know how the resize function works
on GAlignments and other classes. I'm sure you know better than I do
how
these work. If the coercion is superfluous, feel free to eliminate it.


- Do you want to require strand info for all reads? Is this because of
how resize() anchors "*" to 'start'?

Yes, I require strand info for all reads because the reads must be
directionally extended, which requires strand info. Ditto for counting
the 5-prime and 3-prime ends.

-Ryan





chipseq <- function(features, reads, ignore.strand=FALSE,
inter.feature=TRUE,
type="any", maxgap=0L, minoverlap=1L,
width=NULL, fix="start", use.names=TRUE)
{
reads <- as(reads, "GRanges")
if (any(strand(reads) == "*"))

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-05 Thread Ryan C. Thompson

Hi again,

I'm looking at the examples in the summarizeOverlaps help page here: 
http://www.bioconductor.org/packages/devel/bioc/manuals/GenomicAlignments/man/GenomicAlignments.pdf


And the examples fod preprocess.reads are a little confusing. One of 
the examples passes some additional "..." options to summarizeOverlaps, 
and implies that these will be passed along as the "..." arguments to 
the proprocess.reads function:


   summarizeOverlaps(grl, reads, mode=Union, 
preprocess.reads=ResizeReads, width=1, fix="end")


The width and fix arguments are implied to be passed through to 
ResizeReads, but I don't see it documented anywhere how this would be 
done. The summarizeOverlaps documentation for "..." says "Additional 
arguments for BAM file methods such as singleEnd, fragments or param 
that apply to the reading of records from a file (see below)." I don't 
see anything about passing through to preprocess.reads.


Incidentally, this is why my original example used a function that 
constructed a closure with these arguments already bound. I would write 
the example like this to ensure no ambiguity in argument passing (pay 
attention to parens):


   ResizeReads <- function(mode, ...) {
   resize.args <- list(...)
   function(reads) {
   reads <- as(reads, "GRanges")
   ## Need strandedness
   stopifnot(all(strand(reads) != "*"))
   do.call(resize, c(list(x=reads), resize.args))
   }
   }

   ## By default ResizeReads() counts reads that overlap on the 5 end:
   summarizeOverlaps(grl, reads, mode=Union, 
preprocess.reads=ResizeReads())


   ## Count reads that overlap on the 3 end by passing new values
   ## for width and fix:
   summarizeOverlaps(grl, reads, mode=Union, 
preprocess.reads=ResizeReads(width=1, fix="end"))


Anyway, I don't have the devel version of R handy to test this out, so 
I don't know if what I've described is a problem in practice. But I 
think that either the preprocess.reads function should be required to 
only take one argument, or else the method of passing through 
additional arguments to it should be documented.


-Ryan

On Tue 05 Aug 2014 05:12:41 PM PDT, Ryan C. Thompson wrote:

Hi Valerie,

I got really busy around May and never got a chance to thank you for
adding this option to summarizeOverlaps! So thank you!

-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wrote:

GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should
be a function where the first argument is 'reads' and the return value
is compatible with 'reads' in the pre-defined count modes.

I've used your ResizeReads() as an example in the man page. I think
the ability to pre-filter and used a pre-defined mode will be useful.
Thanks for the suggestion.

Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the
minoverlap/maxgap
arguments, but I decided I didn't need those modes after all. They're
certainly not designed to be exposed to the user. I haven't carefully
considered the interaction between the counting mode and
maxgap/minoverlap, but I believe that it would be roughly
equivalent to
extending/shrinking the features/reads by the specified amount (with
some differences for e.g. a feature/read smaller than 2*minoverlap).
For
example, with a read length of 100 and a minoverlap of 10 in Union
counting mode, this would be the same as truncating the first and last
10 (or mabe 9?) bases and operating in normal Union mode. As I said,
though, there may be edge cases that I haven't thought of where
unexpected things happen.


I agree that controlling the overlap args is appealing and I like the
added ability to resize. I've created a 'chipseq' mode that combines
these ideas and gives what ResizeReads() was doing but now in 'mode'
form. If this implementation gives you the flexibility you were
looking for I'll check it into devel.


This sounds nice, but if I use the 'chipseq' mode, how do I specify
whether I want Union, IntersectionNotEmpty, or IntersectionStrict? It
looks like it just does Union? IntersectionStrict would be useful for
specifying that the read has to occur entirely within the bounds of a
called peak, for example. This is why 

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-06 Thread Valerie Obenchain

Hi Ryan,

On 08/05/2014 05:47 PM, Ryan C. Thompson wrote:

Hi again,

I'm looking at the examples in the summarizeOverlaps help page here:
http://www.bioconductor.org/packages/devel/bioc/manuals/GenomicAlignments/man/GenomicAlignments.pdf


And the examples fod preprocess.reads are a little confusing. One of the
examples passes some additional "..." options to summarizeOverlaps, and
implies that these will be passed along as the "..." arguments to the
proprocess.reads function:

summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads, width=1, fix="end")

The width and fix arguments are implied to be passed through to
ResizeReads, but I don't see it documented anywhere how this would be
done.


This is standard use of '...'. See the ?'...' and ?dotsMethods man 
pages. I've added a sentence in \arguments clarifying that '...' can 
encompass arguments called by any subsequent function/method.


The summarizeOverlaps documentation for "..." says "Additional

arguments for BAM file methods such as singleEnd, fragments or param
that apply to the reading of records from a file (see below)." I don't
see anything about passing through to preprocess.reads.

Incidentally, this is why my original example used a function that
constructed a closure with these arguments already bound. I would write
the example like this to ensure no ambiguity in argument passing (pay
attention to parens):


The 'resize.args' variable below captures all variables that exist in 
the .dispatchOverlaps() helper, many of which don't need to be passed to 
resize(). The 'preprocess.reads' function can be written this way or it 
can have default values in the signature as I've done in the man page 
example.



Valerie




ResizeReads <- function(mode, ...) {
resize.args <- list(...)
function(reads) {
reads <- as(reads, "GRanges")
## Need strandedness
stopifnot(all(strand(reads) != "*"))
do.call(resize, c(list(x=reads), resize.args))
}
}

## By default ResizeReads() counts reads that overlap on the 5 end:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads())

## Count reads that overlap on the 3 end by passing new values
## for width and fix:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads(width=1, fix="end"))

Anyway, I don't have the devel version of R handy to test this out, so I
don't know if what I've described is a problem in practice. But I think
that either the preprocess.reads function should be required to only
take one argument, or else the method of passing through additional
arguments to it should be documented.

-Ryan

On Tue 05 Aug 2014 05:12:41 PM PDT, Ryan C. Thompson wrote:

Hi Valerie,

I got really busy around May and never got a chance to thank you for
adding this option to summarizeOverlaps! So thank you!

-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wrote:

GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should
be a function where the first argument is 'reads' and the return value
is compatible with 'reads' in the pre-defined count modes.

I've used your ResizeReads() as an example in the man page. I think
the ability to pre-filter and used a pre-defined mode will be useful.
Thanks for the suggestion.

Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing 'maxgap' or
'minoverlap' will produce a variety of results inconsistent with past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the
minoverlap/maxgap
arguments, but I decided I didn't need those modes after all. They're
certainly not designed to be exposed to the user. I haven't carefully
considered the interaction between the counting mode and
maxgap/minoverlap, but I believe that it would be roughly
equivalent to
extending/shrinking the features/reads by the specified amount (with
some differences for e.g. a feature/read smaller than 2*minoverlap).
For
example, with a read length of 100 and a minoverlap of 10 in Union
counting mode, this would be the same as truncating the first and last
10 (or mabe 9?) bases and operating in normal Union mode. As I said,
though, there may be edge cases that I haven't thought of where
unexpected things happen.


I agree that controlling the overlap args is appealing and I 

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-06 Thread Ryan
Ok, I had a look at the code, and I think understand now. The help text 
for the "..." argument says "Additional arguments for BAM file methods 
such assingleEnd,fragmentsorparamthat apply to the reading of records 
from a file (see below)." But this is actually referring to the fixed 
set of individual arguments listed below the dots. It doesn't apply to 
the arguments that actually get matched by "..." in a call to 
summarizeOverlaps. These actually get passed straight to 
preprocess.reads. Perhaps the documentaion for "..." should be updated 
to reflect this?


On Wed Aug  6 11:21:20 2014, Valerie Obenchain wrote:

Hi Ryan,

On 08/05/2014 05:47 PM, Ryan C. Thompson wrote:

Hi again,

I'm looking at the examples in the summarizeOverlaps help page here:
http://www.bioconductor.org/packages/devel/bioc/manuals/GenomicAlignments/man/GenomicAlignments.pdf



And the examples fod preprocess.reads are a little confusing. One of the
examples passes some additional "..." options to summarizeOverlaps, and
implies that these will be passed along as the "..." arguments to the
proprocess.reads function:

summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads, width=1, fix="end")

The width and fix arguments are implied to be passed through to
ResizeReads, but I don't see it documented anywhere how this would be
done.


This is standard use of '...'. See the ?'...' and ?dotsMethods man
pages. I've added a sentence in \arguments clarifying that '...' can
encompass arguments called by any subsequent function/method.

The summarizeOverlaps documentation for "..." says "Additional

arguments for BAM file methods such as singleEnd, fragments or param
that apply to the reading of records from a file (see below)." I don't
see anything about passing through to preprocess.reads.

Incidentally, this is why my original example used a function that
constructed a closure with these arguments already bound. I would write
the example like this to ensure no ambiguity in argument passing (pay
attention to parens):


The 'resize.args' variable below captures all variables that exist in
the .dispatchOverlaps() helper, many of which don't need to be passed
to resize(). The 'preprocess.reads' function can be written this way
or it can have default values in the signature as I've done in the man
page example.


Valerie




ResizeReads <- function(mode, ...) {
resize.args <- list(...)
function(reads) {
reads <- as(reads, "GRanges")
## Need strandedness
stopifnot(all(strand(reads) != "*"))
do.call(resize, c(list(x=reads), resize.args))
}
}

## By default ResizeReads() counts reads that overlap on the 5 end:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads())

## Count reads that overlap on the 3 end by passing new values
## for width and fix:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads(width=1, fix="end"))

Anyway, I don't have the devel version of R handy to test this out, so I
don't know if what I've described is a problem in practice. But I think
that either the preprocess.reads function should be required to only
take one argument, or else the method of passing through additional
arguments to it should be documented.

-Ryan

On Tue 05 Aug 2014 05:12:41 PM PDT, Ryan C. Thompson wrote:

Hi Valerie,

I got really busy around May and never got a chance to thank you for
adding this option to summarizeOverlaps! So thank you!

-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wrote:

GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should
be a function where the first argument is 'reads' and the return value
is compatible with 'reads' in the pre-defined count modes.

I've used your ResizeReads() as an example in the man page. I think
the ability to pre-filter and used a pre-defined mode will be useful.
Thanks for the suggestion.

Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were
created
to capture specific overlap situations pertaining to gene features
which are graphically depicted in the vignette. Changing
'maxgap' or
'minoverlap' will produce a variety of results inconsistent with
past
behavior and difficult to document (e.g., under what circumstances
will IntersectionNotEmpty now register a hit).

Well, I wasn't so sure about those functions either. Obviously
you can
pass arguments that break things. They were mostly designed to be
constructors for specific counting modes involving the
minoverlap/maxgap
arguments, but I decided I didn't need those modes after all.
They're
certainly not designed to be exposed to the user. I haven't
carefully
considered th

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-06 Thread Valerie Obenchain
The man page '...' section was updated in GenomicAlignments 1.1.24 in 
devel. I've now also updated it in 1.0.5 in release.


The '...' does not refer only to the fixed set of args listed below the 
dots. The '...' encompasses any argument(s) provided to 
summarizeOverlaps() not explicitly stated in the function signature. For 
example, if you passed FOO=3, then FOO would end up in '...'.


Any function/method called inside summarizeOverlaps() with a '...' will 
pass the arguments down; they continue to be passed down until they are 
explicitly stated in a function signature (e.g., 'width' and 'fix' in 
ResizeReads()).


Valerie

On 08/06/2014 11:35 AM, Ryan wrote:

Ok, I had a look at the code, and I think understand now. The help text
for the "..." argument says "Additional arguments for BAM file methods
such assingleEnd,fragmentsorparamthat apply to the reading of records
from a file (see below)." But this is actually referring to the fixed
set of individual arguments listed below the dots. It doesn't apply to
the arguments that actually get matched by "..." in a call to
summarizeOverlaps. These actually get passed straight to
preprocess.reads. Perhaps the documentaion for "..." should be updated
to reflect this?

On Wed Aug  6 11:21:20 2014, Valerie Obenchain wrote:

Hi Ryan,

On 08/05/2014 05:47 PM, Ryan C. Thompson wrote:

Hi again,

I'm looking at the examples in the summarizeOverlaps help page here:
http://www.bioconductor.org/packages/devel/bioc/manuals/GenomicAlignments/man/GenomicAlignments.pdf




And the examples fod preprocess.reads are a little confusing. One of the
examples passes some additional "..." options to summarizeOverlaps, and
implies that these will be passed along as the "..." arguments to the
proprocess.reads function:

summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads, width=1, fix="end")

The width and fix arguments are implied to be passed through to
ResizeReads, but I don't see it documented anywhere how this would be
done.


This is standard use of '...'. See the ?'...' and ?dotsMethods man
pages. I've added a sentence in \arguments clarifying that '...' can
encompass arguments called by any subsequent function/method.

The summarizeOverlaps documentation for "..." says "Additional

arguments for BAM file methods such as singleEnd, fragments or param
that apply to the reading of records from a file (see below)." I don't
see anything about passing through to preprocess.reads.

Incidentally, this is why my original example used a function that
constructed a closure with these arguments already bound. I would write
the example like this to ensure no ambiguity in argument passing (pay
attention to parens):


The 'resize.args' variable below captures all variables that exist in
the .dispatchOverlaps() helper, many of which don't need to be passed
to resize(). The 'preprocess.reads' function can be written this way
or it can have default values in the signature as I've done in the man
page example.


Valerie




ResizeReads <- function(mode, ...) {
resize.args <- list(...)
function(reads) {
reads <- as(reads, "GRanges")
## Need strandedness
stopifnot(all(strand(reads) != "*"))
do.call(resize, c(list(x=reads), resize.args))
}
}

## By default ResizeReads() counts reads that overlap on the 5 end:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads())

## Count reads that overlap on the 3 end by passing new values
## for width and fix:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads(width=1, fix="end"))

Anyway, I don't have the devel version of R handy to test this out, so I
don't know if what I've described is a problem in practice. But I think
that either the preprocess.reads function should be required to only
take one argument, or else the method of passing through additional
arguments to it should be documented.

-Ryan

On Tue 05 Aug 2014 05:12:41 PM PDT, Ryan C. Thompson wrote:

Hi Valerie,

I got really busy around May and never got a chance to thank you for
adding this option to summarizeOverlaps! So thank you!

-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wrote:

GenomicAlignments 1.1.8 has a 'preprocess.reads' argument. This should
be a function where the first argument is 'reads' and the return value
is compatible with 'reads' in the pre-defined count modes.

I've used your ResizeReads() as an example in the man page. I think
the ability to pre-filter and used a pre-defined mode will be useful.
Thanks for the suggestion.

Valerie


On 05/01/2014 02:29 PM, Valerie Obenchain wrote:

On 05/01/2014 02:05 PM, Ryan wrote:

Hi Valerie,

On Thu May 1 13:27:16 2014, Valerie Obenchain wrote:


I have some concerns about the *ExtraArgs() functions. Passing
flexible args to findOverlaps in the existing mode functions
fundamentally changes the documented behavior. The modes were
created
to capture specif

Re: [Bioc-devel] Additional summarizeOverlaps counting modes for ChIP-Seq

2014-08-06 Thread Ryan
Yes, I understand that the "..." doesn't only refer to the fixed set of 
arguments listed below. I was saying that the *documentation* for the 
dots argument in summarizeOverlaps was actually talking about the below 
arguments instead. I don't see it documented anywhere that the dots 
arguments passed to summarizeOverlaps are passed through to 
preprocess.reads. You'd have to read the code to figure that out. I 
guess my issue is that there are a number of internal functions that 
could conceivably receive the dots arguments that are passed to 
summarizeOverlaps, and it's not obvious to me that they will ultimately 
be passed down to preprocess.reads and not some other function or 
functions.


On Wed Aug  6 11:57:10 2014, Valerie Obenchain wrote:

The man page '...' section was updated in GenomicAlignments 1.1.24 in
devel. I've now also updated it in 1.0.5 in release.

The '...' does not refer only to the fixed set of args listed below
the dots. The '...' encompasses any argument(s) provided to
summarizeOverlaps() not explicitly stated in the function signature.
For example, if you passed FOO=3, then FOO would end up in '...'.

Any function/method called inside summarizeOverlaps() with a '...'
will pass the arguments down; they continue to be passed down until
they are explicitly stated in a function signature (e.g., 'width' and
'fix' in ResizeReads()).

Valerie

On 08/06/2014 11:35 AM, Ryan wrote:

Ok, I had a look at the code, and I think understand now. The help text
for the "..." argument says "Additional arguments for BAM file methods
such assingleEnd,fragmentsorparamthat apply to the reading of records
from a file (see below)." But this is actually referring to the fixed
set of individual arguments listed below the dots. It doesn't apply to
the arguments that actually get matched by "..." in a call to
summarizeOverlaps. These actually get passed straight to
preprocess.reads. Perhaps the documentaion for "..." should be updated
to reflect this?

On Wed Aug  6 11:21:20 2014, Valerie Obenchain wrote:

Hi Ryan,

On 08/05/2014 05:47 PM, Ryan C. Thompson wrote:

Hi again,

I'm looking at the examples in the summarizeOverlaps help page here:
http://www.bioconductor.org/packages/devel/bioc/manuals/GenomicAlignments/man/GenomicAlignments.pdf





And the examples fod preprocess.reads are a little confusing. One
of the
examples passes some additional "..." options to summarizeOverlaps,
and
implies that these will be passed along as the "..." arguments to the
proprocess.reads function:

summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads, width=1, fix="end")

The width and fix arguments are implied to be passed through to
ResizeReads, but I don't see it documented anywhere how this would be
done.


This is standard use of '...'. See the ?'...' and ?dotsMethods man
pages. I've added a sentence in \arguments clarifying that '...' can
encompass arguments called by any subsequent function/method.

The summarizeOverlaps documentation for "..." says "Additional

arguments for BAM file methods such as singleEnd, fragments or param
that apply to the reading of records from a file (see below)." I don't
see anything about passing through to preprocess.reads.

Incidentally, this is why my original example used a function that
constructed a closure with these arguments already bound. I would
write
the example like this to ensure no ambiguity in argument passing (pay
attention to parens):


The 'resize.args' variable below captures all variables that exist in
the .dispatchOverlaps() helper, many of which don't need to be passed
to resize(). The 'preprocess.reads' function can be written this way
or it can have default values in the signature as I've done in the man
page example.


Valerie




ResizeReads <- function(mode, ...) {
resize.args <- list(...)
function(reads) {
reads <- as(reads, "GRanges")
## Need strandedness
stopifnot(all(strand(reads) != "*"))
do.call(resize, c(list(x=reads), resize.args))
}
}

## By default ResizeReads() counts reads that overlap on the 5
end:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads())

## Count reads that overlap on the 3 end by passing new values
## for width and fix:
summarizeOverlaps(grl, reads, mode=Union,
preprocess.reads=ResizeReads(width=1, fix="end"))

Anyway, I don't have the devel version of R handy to test this out,
so I
don't know if what I've described is a problem in practice. But I
think
that either the preprocess.reads function should be required to only
take one argument, or else the method of passing through additional
arguments to it should be documented.

-Ryan

On Tue 05 Aug 2014 05:12:41 PM PDT, Ryan C. Thompson wrote:

Hi Valerie,

I got really busy around May and never got a chance to thank you for
adding this option to summarizeOverlaps! So thank you!

-Ryan

On Thu 01 May 2014 04:25:33 PM PDT, Valerie Obenchain wr