Re: [Bioc-devel] baySeq countData class bug?

2015-02-23 Thread Tom Hardcastle
Hi,

 From v2, baySeq has moved from storing data in matrices to arrays, to 
allow for multi-dimensional analyses. This looks like an unintended 
consequence of that change - I'll look into it and see if it's fixable, 
or whether it's indices only from now on.

Thanks for bringing it to my attention.

Cheers,

Tom

On 20/02/15 11:00, bioc-devel-requ...@r-project.org wrote:
 Message: 5
 Date: Fri, 20 Feb 2015 11:55:21 +0200
 From: Panagiotis Moulosmou...@fleming.gr
 To:bioc-devel@r-project.org  bioc-devel@r-project.org
 Subject: [Bioc-devel] baySeq countData class bug?
 Message-ID:54e70489.8050...@fleming.gr
 Content-Type: text/plain; charset=UTF-8

 Hello,

 In the previous Bioconductor release (2.14), a countData object from the
 baySeq package could be accessed by sample names. Now it seems that it
 can be accessed only by indices (version 2.0.50). Was this done on
 purpose or is it a bug unnoticed so far? I discovered this as baySeq
 fails with my package metaseqR. Some steps to reproduce:

 library(metaseqR)
 data(mm9.gene.data,package=metaseqR)
 result - metaseqr(
  counts=mm9.gene.counts,
  sample.list=sample.list.mm9,
  contrast=c(e14.5_vs_adult_8_weeks),
  libsize.list=libsize.list.mm9,
  annotation=embedded,
  id.col=4,
  gc.col=5,
  name.col=7,
  bt.col=8,
  org=mm9,
  count.type=gene,
  normalization=edger,
  statistics=bayseq,
  qc.plots=mds,
  pcut=0.05,
  fig.format=png,
  export.what=c(annotation,p.value,fold.change),
  export.scale=log2,
  export.values=normalized,
  export.stats=mean,
  export.where=~/bayseq_test,
  out.list=TRUE
 )

 crashes with: Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] :
 non-numeric argument to binary operator

 and the problem stems from the countData class:

 object - metaseqR::normalize.edger(mm9.gene.counts[,9:12],sample.list.mm9)
 CD - new(countData,data=object,replicates=c(
 e14.5,e14.5,adult_8_weeks,adult_8_weeks))
 baySeq::libsizes(CD) - baySeq::getLibsizes(CD)

 and then

 cd - CD[,c(e14.5_1,e14.5_2)]

 crashes with

 Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] :
 non-numeric argument to binary operator

 while

 cd - CD[,1:2]

 works. In the version released with Bioconductor 2.14, this was not
 happening and the countData object could be accessed by sample names.
 Thanks in advance for informing me whether this is a bug or if it will
 remain like this from now on so I can update my code.

 Panagiotis

 -- *Panagiotis Moulos* Post doctoral researcher Biomedical Sciences 
 Research Center 'Alexander Fleming' Fleming 34, 16672, Vari, Greece 
 e-mail: mou...@fleming.gr mailto:mou...@fleming.gr [[alternative 
 HTML version deleted]]


-- 
Dr. Thomas J. Hardcastle
Senior Research Associate

Department of Plant Sciences
University of Cambridge
Downing Street
Cambridge, CB2 3EA
United Kingdom


[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Confused by readGAlignments(scanBamParam( which ))

2015-02-23 Thread Leonardo Collado Torres
Hi,

It took me a while, but I was confused by getting different results
when using 'which' in reading BAM files. In my use case, I have a
GRanges with 15 ranges (different exons from the transcripts of a
gene) and I get much higher coverage values than simply using the
range of the 15 ranges.

I now realize that by using all 15 ranges I was duplicating the number
of alignments to use for calculating the coverage instead of what I
wanted: get all the reads that mapped to any of the 15 ranges counting
them once each.

You could say that as a user I was incorrectly using the function. Or
that it would be good to clarify that if you use overlapping ranges in
'which' then you could read a sequence read more than once.

Anyhow, I think that clarifying would be useful to others in the
future. Even a simple example could show this situation so others are
aware.

The code and output is available at
https://gist.github.com/lcolladotor/76f2efdd5d5108772696

Cheers,
Leo

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] GRanges to VRanges coercion

2015-02-23 Thread Valerie Obenchain

Hi Thomas, Michael,

makeVRangesFromGRanges has been added to 1.13.34. The 
methods-VRanges-class.R file was already large so I added 
makeVRangesFromGRanges.R.


A few notes:

- '...' was replaced with 'keep.extra.columns'

- row names are propagated automatically so I removed the explicit 
row.names(vr)


- Errors are thrown in 2 cases: when arguments are not a single string 
and if 'ref' can't be found. You may want to consider a warning if one 
of the 'field' args can't be found in the GRanges metadata.


- I have not added unit tests. If you have some I'll put them in.

- The function is documented in VRanges-class.Rd. If you want to make a 
stand alone man page similar to makeGRangesFromDataFrame I'll add that too.


Thanks for the contribution Thomas. Nice addition.

Valerie



On 02/20/2015 04:17 PM, Michael Lawrence wrote:



On Thu, Feb 19, 2015 at 12:46 PM, Thomas Sandmann
sandmann.tho...@gene.com mailto:sandmann.tho...@gene.com wrote:

Hi Valerie, hi Michael,

I find myself frequently moving back and forth between data.frames,
GRanges and VRanges objects.

The makeGRangesFromDataFrame function from the GenomicRanges makes
the coercion between the former straightforward, but I couldn't find
anything similar for the second step, coercsion from GRanges to VRanges.

There is a coercion method defined in the GenomicRanges package:

getMethod(coerce, c(GRanges, VRanges))
Method Definition:

function (from, to = VRanges, strict = TRUE)
{
 obj - new(VRanges)
 as(obj, GRanges) - from
 obj
}
environment: namespace:GenomicRanges

Signatures:
 from  to
target  GRanges VRanges
defined GRanges VRanges

but I haven't been able to get it to work (or find where it is
documented). The source code shown above doesn't indicate how the
coercion method would check for the presence of required / optional
VRanges columns, e.g. 'ref', 'alt', 'altDepth', etc.


This is just the default coercion method added by the methods package
for a conversion of a class to its parent class. It obviously will not
do the right thing, in general.


Would it be useful to add an explicit makeVRangesFromGRanges
function to the VariantAnnotation package ( and / or the
corresponding coercion method) ?

Then it would be easy to go from a data.frame to a VRanges object,
e.g. as in this pseudocode:

makeVRangesFromGRanges(
makeGRangesFromDataFrame( data.frame )
)

You can find a first attempt at implementing the
makeVRangesFromGRanges function here
https://gist.github.com/tomsing1/bf917b1b48afb952e1c5, which you
are welcome to use / modify if you find it useful.

If this functionality should already be available, I'd be happy to
learn about that, too !


Val, do you think you could review and incorporate Thomas's code? It
seems like a good addition to me.

Thanks,
Michael

Thank you,
Thomas


SessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils
datasets  methods   base

other attached packages:
  [1] VariantAnnotation_1.12.9 Rsamtools_1.18.2
Biostrings_2.34.1XVector_0.6.0GenomicRanges_1.18.4
  [6] GenomeInfoDb_1.2.4   IRanges_2.0.1
  S4Vectors_0.4.0  BiocGenerics_0.12.1
  BiocInstaller_1.16.1
[11] roxygen2_4.1.0   devtools_1.7.0

loaded via a namespace (and not attached):
  [1] AnnotationDbi_1.28.1base64enc_0.1-2 BatchJobs_1.5
   BBmisc_1.9  Biobase_2.26.0
  [6] BiocParallel_1.0.3  biomaRt_2.22.0  bitops_1.0-6
  brew_1.0-6  BSgenome_1.34.1
[11] checkmate_1.5.1 codetools_0.2-10DBI_0.3.1
 digest_0.6.8fail_1.2
[16] foreach_1.4.2   GenomicAlignments_1.2.1
GenomicFeatures_1.18.3  iterators_1.0.7 Rcpp_0.11.4
[21] RCurl_1.95-4.5  RSQLite_1.0.0
rtracklayer_1.26.2  sendmailR_1.2-1 stringr_0.6.2
[26] tools_3.1.2 XML_3.98-1.1zlibbioc_1.12.0




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Leonard Goldstein
Hi Michael and Thomas,

I ran into the same problem in the past (i.e. when I started working
with functions like scanBam I expected them not to return the same
alignment multiple times)

One thing to consider might be that returning alignments multiple
times is consistent with the behavior of the samtools view command.
Quoting from the samtools manual:

“Important note: when multiple regions are given, some alignments may
be output multiple times if they overlap more than one of the
specified regions.”

Maybe there is an argument for keeping things consistent with
samtools? As you said, if documented properly, the user can decide
whether to reduce regions specified in which or not.

Leonard


On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
lawrence.mich...@gene.com wrote:
 We should at leaast try to avoid surprising the user. Seems like most
 people expect which to be a simple restriction, so I think for now I will
 just reduce the which, and if someone has a use case for separate queries,
 we can address it in the future.

 On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann sandmann.tho...@gene.com
 wrote:

 Personally, I don't have a use case with meaningful loci worth tracking,
 so keeping it simple would work for me.

 Incidentally, would it be good to deal with the 'which' parameter in a
 consistent way across different methods ? I just saw this recent post on
 the mailing list in which a used got confused by duplicate counts returned
 after passing 'which' to scanBamParam:

 https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html


 ---

 Thomas Sandmann, PhD
 Computational biologist

 Genentech, Inc.
 1 DNA Way
 South San Francisco, CA 94080
 USA

 Phone: +1 650 225 6273
 Fax: +1 650 225 5389
 Email: sandmann.tho...@gene.com

 If a man will begin with certainties, he shall end in doubts; but if he
 will be content to begin with doubts he shall end in certainties. -- Sir
 Francis Bacon


 On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence 
 lawrence.mich...@gene.com wrote:

 We just have to decide which is the more useful interpretation of which
 -- as a simple restriction, or as a vector of meaningful locii, which will
 be analyzed individually? I would actually favor the first one (the same as
 yours), just because it's simpler. To keep track of the query ranges, we
 would need to add a new column to the returned object, which will more
 often than not just be clutter. I guess we could introduce a new parameter,
 reduceWhich which defaults to TRUE and reduces the which. If FALSE, it
 instead adds the column mapping back to the original which ranges.


 On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann 
 sandmann.tho...@gene.com wrote:

 Hi Michael,

 ah, I see. I hadn't realized that returning the pileups separately for
 each region could be a desired feature, but that makes sense. I agree, as
 it is easy for the user to 'reduce' the ranges beforehand your first option
 (e.g. returning the ID of the range) would be more flexible.

 Perhaps you would consider adding a sentence to the documentation of
 'which' on BamTallyParam's help page explaining that users might want to
 'reduce' their ranges beforehand if they are only interested in a single
 tally for each base ?

 Thanks a lot !
 Thomas





 [[alternative HTML version deleted]]

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Hervé Pagès

On 02/23/2015 11:05 AM, Leonard Goldstein wrote:

Hi Michael and Thomas,

I ran into the same problem in the past (i.e. when I started working
with functions like scanBam I expected them not to return the same
alignment multiple times)

One thing to consider might be that returning alignments multiple
times is consistent with the behavior of the samtools view command.
Quoting from the samtools manual:

“Important note: when multiple regions are given, some alignments may
be output multiple times if they overlap more than one of the
specified regions.”


Thanks Leonard for pointing this out. This is indeed the reason why all
the functions in Rsamtools and GenomicAlignments that take a 'which'
argument (via a ScanBamParam object) treat it the samtools way, that
is, as a vector of meaningful loci.

Most of them track the index of the individual loci via a which_label
metadata column. See for example Rsamtools::pileup() and all the
readGAlignment*() functions in the GenomicAlignments package.
FWIW the man page for the readGAlignment*() functions contains the
following note:

 Note that a given record is loaded one time for each region it
 belongs to (this is a scanBam() feature, readGAlignments()
 is based on scanBam()).

but maybe this should be emphasized a little bit more.

Cheers,
H.



Maybe there is an argument for keeping things consistent with
samtools? As you said, if documented properly, the user can decide
whether to reduce regions specified in which or not.

Leonard


On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
lawrence.mich...@gene.com wrote:

We should at leaast try to avoid surprising the user. Seems like most
people expect which to be a simple restriction, so I think for now I will
just reduce the which, and if someone has a use case for separate queries,
we can address it in the future.

On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann sandmann.tho...@gene.com
wrote:


Personally, I don't have a use case with meaningful loci worth tracking,
so keeping it simple would work for me.

Incidentally, would it be good to deal with the 'which' parameter in a
consistent way across different methods ? I just saw this recent post on
the mailing list in which a used got confused by duplicate counts returned
after passing 'which' to scanBamParam:

https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html


---

Thomas Sandmann, PhD
Computational biologist

Genentech, Inc.
1 DNA Way
South San Francisco, CA 94080
USA

Phone: +1 650 225 6273
Fax: +1 650 225 5389
Email: sandmann.tho...@gene.com

If a man will begin with certainties, he shall end in doubts; but if he
will be content to begin with doubts he shall end in certainties. -- Sir
Francis Bacon


On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence 
lawrence.mich...@gene.com wrote:


We just have to decide which is the more useful interpretation of which
-- as a simple restriction, or as a vector of meaningful locii, which will
be analyzed individually? I would actually favor the first one (the same as
yours), just because it's simpler. To keep track of the query ranges, we
would need to add a new column to the returned object, which will more
often than not just be clutter. I guess we could introduce a new parameter,
reduceWhich which defaults to TRUE and reduces the which. If FALSE, it
instead adds the column mapping back to the original which ranges.


On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann 
sandmann.tho...@gene.com wrote:


Hi Michael,

ah, I see. I hadn't realized that returning the pileups separately for
each region could be a desired feature, but that makes sense. I agree, as
it is easy for the user to 'reduce' the ranges beforehand your first option
(e.g. returning the ID of the range) would be more flexible.

Perhaps you would consider adding a sentence to the documentation of
'which' on BamTallyParam's help page explaining that users might want to
'reduce' their ranges beforehand if they are only interested in a single
tally for each base ?

Thanks a lot !
Thomas








 [[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-23 Thread Leonard Goldstein
Sounds very sensible not to double count in the context of tallying
variants. I was more concerned with reducing which as the default
behavior for scanBam and other functions.

I wanted to bring up the samtools behavior as - for me at least -
inconsistencies between Rsamtools and samtools have been another
source of confusion in the past (e.g. different naming conventions for
fields like isize vs TLEN etc.)

Leonard


On Mon, Feb 23, 2015 at 11:22 AM, Michael Lawrence
lawrence.mich...@gene.com wrote:
 Maybe Rsamtools would want to follow this precedent. I think there might be
 a difference between fishing out alignments from a SAM/BAM, and deriving a
 summary (tallyVariants) from a BAM. It seems like an argument could be made
 for a tally set to not contain duplicates.

 On Mon, Feb 23, 2015 at 11:05 AM, Leonard Goldstein
 goldstein.leon...@gene.com wrote:

 Hi Michael and Thomas,

 I ran into the same problem in the past (i.e. when I started working
 with functions like scanBam I expected them not to return the same
 alignment multiple times)

 One thing to consider might be that returning alignments multiple
 times is consistent with the behavior of the samtools view command.
 Quoting from the samtools manual:

 “Important note: when multiple regions are given, some alignments may
 be output multiple times if they overlap more than one of the
 specified regions.”

 Maybe there is an argument for keeping things consistent with
 samtools? As you said, if documented properly, the user can decide
 whether to reduce regions specified in which or not.

 Leonard


 On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
 lawrence.mich...@gene.com wrote:
  We should at leaast try to avoid surprising the user. Seems like most
  people expect which to be a simple restriction, so I think for now I
  will
  just reduce the which, and if someone has a use case for separate
  queries,
  we can address it in the future.
 
  On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann
  sandmann.tho...@gene.com
  wrote:
 
  Personally, I don't have a use case with meaningful loci worth
  tracking,
  so keeping it simple would work for me.
 
  Incidentally, would it be good to deal with the 'which' parameter in a
  consistent way across different methods ? I just saw this recent post
  on
  the mailing list in which a used got confused by duplicate counts
  returned
  after passing 'which' to scanBamParam:
 
  https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
 
 
  ---
 
  Thomas Sandmann, PhD
  Computational biologist
 
  Genentech, Inc.
  1 DNA Way
  South San Francisco, CA 94080
  USA
 
  Phone: +1 650 225 6273
  Fax: +1 650 225 5389
  Email: sandmann.tho...@gene.com
 
  If a man will begin with certainties, he shall end in doubts; but if
  he
  will be content to begin with doubts he shall end in certainties. --
  Sir
  Francis Bacon
 
 
  On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence 
  lawrence.mich...@gene.com wrote:
 
  We just have to decide which is the more useful interpretation of
  which
  -- as a simple restriction, or as a vector of meaningful locii, which
  will
  be analyzed individually? I would actually favor the first one (the
  same as
  yours), just because it's simpler. To keep track of the query ranges,
  we
  would need to add a new column to the returned object, which will more
  often than not just be clutter. I guess we could introduce a new
  parameter,
  reduceWhich which defaults to TRUE and reduces the which. If FALSE,
  it
  instead adds the column mapping back to the original which ranges.
 
 
  On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann 
  sandmann.tho...@gene.com wrote:
 
  Hi Michael,
 
  ah, I see. I hadn't realized that returning the pileups separately
  for
  each region could be a desired feature, but that makes sense. I
  agree, as
  it is easy for the user to 'reduce' the ranges beforehand your first
  option
  (e.g. returning the ID of the range) would be more flexible.
 
  Perhaps you would consider adding a sentence to the documentation of
  'which' on BamTallyParam's help page explaining that users might want
  to
  'reduce' their ranges beforehand if they are only interested in a
  single
  tally for each base ?
 
  Thanks a lot !
  Thomas
 
 
 
 
 
  [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel