Re: [Bioc-devel] specify the color in plotMutationSpectrum() of SomaticSignatures library

2016-02-18 Thread Wolfgang Huber
Dear Rebecca

can you please post this in the Bioconductor user forum? This is not really a
developer question.

> On Feb 17, 2016, at 20:33 GMT+1, sun  wrote:
> 
> Hi All,
> 
> How can I specify the color that I would like to use in
> plotMutationSpectrum()?
> 
> eg.
> 
> plotMutationSpectrum(sca_motifs, "study", normalize = TRUE)
> 
> I would like to use only the color "red" here. How should I do this?
> 
> Thanks,
> 
> Rebecca
> 
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Wolfgang

Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

wolfgang.hu...@embl.de
http://www.huber.embl.de



Re: [Bioc-devel] Strand-Awareness for Restrict Function

2016-02-18 Thread Hervé Pagès

On 02/18/2016 01:00 AM, Maintainer wrote:

> Hello,


Hi,

I'm putting this back to bioc-devel and setting the Subject to
"Strand-Awareness for Restrict Function" since this seems to be
a continuation of this thread (that you started here). I hope you
don't mind.



> Setdiff automatically reduces the resulting ranges. Is it intentional?


Absolutely. The "vector-wise set operations" (i.e. union, intersect,
setdiff) treat 'x' and 'y' as 2 sets of genomic positions (or sets of
integers when 'x' and 'y' are Ranges objects) so duplicated positions
are removed.


> I can't see any formal definition of what setdiff is supposed to do, so I'm not
> sure. For example,
>
> setdiff(IRanges(1:10, width = 1), IRanges(2, 4))
>
> reduces all of the adjacent ranges in the first variable. This is not
> documented.


The man page for set operations on Ranges and RangesList objects says:

  The 'union', 'intersect' and 'setdiff' methods for Ranges objects
  return a "normal" Ranges object representing the union, intersection
  and (asymmetric!) difference of the sets of integers represented by
  'x' and 'y'.

Admittedly this man page was too hard to find (you had to do
?`setdiff,Ranges,Ranges-method`, and ?`setdiff,GRanges,GRanges-method`
to get the man page for the methods for GRanges objects) and the above
explanation could probably be clearer. I've tried to remedy this (in
IRanges 2.5.32 and GenomicRanges 1.23.20) by adding aliases to these
man pages so they can be found with just ?setdiff (or ?union, or
?intersect).
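
As a minimal sketch of the behaviour described above (assuming the IRanges package is installed; the variable names are illustrative only), the example from the question can be checked directly:

```r
library(IRanges)

# Ten adjacent width-1 ranges covering the integers 1..10.
x <- IRanges(1:10, width = 1)
# A single range covering the integers 2..4.
y <- IRanges(2, 4)

# setdiff() operates on the sets of integers represented by x and y,
# so the result is "normal": sorted, non-overlapping, adjacent ranges merged.
d <- setdiff(x, y)
d            # two ranges: [1, 1] and [5, 10]
isNormal(d)  # TRUE
```

Because the result describes a set of positions, the boundaries of the ten input ranges are deliberately not preserved.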



> Also, it would be good if it had a strict.strand option


All the setops methods for genomic ranges have an 'ignore.strand'
option. Is that what you are asking for or do you have something
else in mind with 'strict.strand'?
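
For illustration only, here is a small sketch of what 'ignore.strand' changes, using toy ranges rather than the real genome and gene annotations (the objects 'region' and 'gene' are made up for this example, and the GenomicRanges package is assumed to be installed):

```r
library(GenomicRanges)

# A strandless region and a gene-like range on the minus strand (toy data).
region <- GRanges("chr1", IRanges(1, 100), strand = "*")
gene   <- GRanges("chr1", IRanges(20, 30), strand = "-")

# Default (ignore.strand = FALSE): "*" and "-" are handled as separate
# strands, so nothing is subtracted from 'region'.
strand_aware <- setdiff(region, gene)

# ignore.strand = TRUE: strand is disregarded and positions 20..30 are removed.
strand_blind <- setdiff(region, gene, ignore.strand = TRUE)

sum(width(strand_aware))  # 100
sum(width(strand_blind))  # 89
```

This mirrors the situation in the question, where a strandless genome minus stranded genes comes back unchanged unless strand is ignored.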


> and an explanation of which variables the ... option accepts.
> Currently, it does not have any effect and the documentation of ... is
> ambiguous.
>
> humanGenome
>
> GRanges object with 23 ranges and 0 metadata columns:
>       seqnames         ranges strand
>          <Rle>      <IRanges>  <Rle>
>   [1]     chr1 [1, 249250621]      *
>   [2]     chr2 [1, 243199373]      *
>   [3]     chr3 [1, 198022430]      *
>   [4]     chr4 [1, 191154276]      *
>   [5]     chr5 [1, 180915260]      *
>
> humanGenes
>
> GRanges object with 22980 ranges and 1 metadata column:
>        seqnames                 ranges strand |     gene_id
>           <Rle>              <IRanges>  <Rle> | <character>
>      1    chr19 [ 58858172,  58874214]      - |           1
>     10     chr8 [ 18248755,  18258723]      + |          10
>    100    chr20 [ 43248163,  43280376]      - |         100
>   1000    chr18 [ 25530930,  25757445]      - |        1000
>      1     chr1 [243651535, 244006886]      - |           1
>
> # Has no effect because of transcripts' strands.
> setdiff(humanGenome, humanGenes, strict.strand = FALSE)
>
> GRanges object with 23 ranges and 0 metadata columns:
>       seqnames         ranges strand
>          <Rle>      <IRanges>  <Rle>
>   [1]     chr1 [1, 249250621]      *
>   [2]     chr2 [1, 243199373]      *
>   [3]     chr3 [1, 198022430]      *
>   [4]     chr4 [1, 191154276]      *
>   [5]     chr5 [1, 180915260]      *
>
> --


The ellipsis was actually not needed (and was not used) in the
setops methods for genomic ranges so I removed it in GenomicRanges
1.23.20.

Back to your original question: setdiff() is not the right tool for
the strand-aware trimming you're trying to achieve but it occurs to
me that psetdiff() (element-wise setdiff) might be a suitable
alternative (although you have some work to do to prepare a 'y'
that is parallel to 'x' and contains the regions to trim). I don't
think it's going to be easier/simpler than the restrict-based solution
I gave you earlier though.
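
A rough sketch of the element-wise idea, assuming a 'y' that is already parallel to 'x' (both objects below are toy data invented for this example, not the ranges from the original question):

```r
library(GenomicRanges)

# Two ranges to trim, and a parallel 'y' holding the region to remove
# from each of them (same sequence and same strand, element by element).
x <- GRanges("chr1", IRanges(c(1, 50), c(40, 100)), strand = c("+", "-"))
y <- GRanges("chr1", IRanges(c(30, 40), c(45, 60)), strand = c("+", "-"))

# psetdiff() subtracts y[i] from x[i] and returns one range per element.
trimmed <- psetdiff(x, y)
trimmed  # expected: chr1 [1, 29] on "+", and chr1 [61, 100] on "-"
```

Note that psetdiff() returns exactly one range per element, so it cannot handle a 'y' range that falls strictly inside the corresponding 'x' range, since that would split 'x' in two.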

H.


> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



Re: [Bioc-devel] Efficient Random Sampling of Positions in GRanges

2016-02-18 Thread Hervé Pagès

Hi,

On 02/18/2016 03:00 AM, Dario Strbenac wrote:

> Good day,
>
> Thank you for these two suggestions. I have questions about both options.
>
> GPos seems to have a limit on the number of bases it can represent. I get an
> error: Error in GPos(samplingAreas) : too many genomic positions in 'pos_runs'.
> What exactly is this limit? Could it be added to the documentation?


It is documented. The man page for GPos says:

  Note:

 Like for any Vector derivative, the length of a GPos object cannot
 exceed ‘.Machine$integer.max’ (i.e. 2^31 on most platforms).
 GPos() will return an error if 'pos_runs' contains too many
 genomic positions.

I've started to believe that with GPos we might have a use case that is
strong enough to justify adding support for long Vector objects. This
is a big change to our infrastructure though and it won't happen before
the next release.
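
To put the limit in numbers, here is a base-R sketch. On typical platforms '.Machine$integer.max' is 2147483647 (2^31 - 1), whereas the whole human genome is roughly 3.1e9 positions (an approximate figure, stated here as an assumption). The helper 'sample_positions' is hypothetical and is not part of any package or of the approaches suggested in this thread. It draws positions from chromosome lengths using doubles, so nothing with one element per base is ever created:

```r
# Chromosome lengths copied from the 'humanGenome' output quoted in the
# "Strand-Awareness for Restrict Function" message (chr1-chr5 only).
chrom_lengths <- c(chr1 = 249250621, chr2 = 243199373, chr3 = 198022430,
                   chr4 = 191154276, chr5 = 180915260)

int_max <- .Machine$integer.max  # 2147483647, i.e. 2^31 - 1
total   <- sum(chrom_lengths)    # double arithmetic, so no integer overflow

# Hypothetical helper: draw n positions uniformly (with replacement) over
# the concatenated chromosomes, then map each global offset back to a
# chromosome name and a 1-based position within that chromosome.
sample_positions <- function(chrom_lengths, n) {
  ends    <- cumsum(unname(chrom_lengths))
  offsets <- floor(runif(n, min = 0, max = sum(chrom_lengths))) + 1
  idx     <- findInterval(offsets - 1, ends) + 1
  starts  <- c(0, ends)[idx]
  data.frame(seqnames = names(chrom_lengths)[idx],
             pos = offsets - starts,
             stringsAsFactors = FALSE)
}

set.seed(1)
head(sample_positions(chrom_lengths, 5))
```

The five chromosomes above already sum to 1062541960 positions, about half of the limit, which is why all human chromosomes together do not fit in a single GPos object.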


> I used all of the human chromosomes. There is also no documentation of the
> sample function for GPos objects.


That's because there is no sample() function for GPos objects:

  > sample
  function (x, size, replace = FALSE, prob = NULL)
  {
      if (length(x) == 1L && is.numeric(x) && x >= 1) {
          if (missing(size))
              size <- x
          sample.int(x, size, replace, prob)
      }
      else {
          if (missing(size))
              size <- length(x)
          x[sample.int(length(x), size, replace, prob)]
      }
  }

As you can see, sample() works on anything that has a length() and
is subsettable (a.k.a. "vector-like" object). See ?sample for more
information.
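
As a small base-R illustration of that point, the toy S3 class below (the class name 'toy' and its constructor are invented for this sketch) defines nothing but length() and `[`, and sample() works on it without any dedicated method:

```r
# A toy "vector-like" class: only length() and `[` are defined for it.
toy <- function(values) structure(list(values = values), class = "toy")
length.toy <- function(x) length(x$values)
`[.toy` <- function(x, i) toy(x$values[i])

x <- toy(seq(10, 100, by = 10))

set.seed(1)
s <- sample(x, 3)  # goes through length(x) and x[...], as in the code above
class(s)           # "toy"
length(s)          # 3
```

The same mechanism is what lets sample() work on a GPos object, or on any other Vector derivative, as long as it has a length and can be subsetted.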

H.



> Could regioneR be improved to consider strand? It generates regions with no
> strand. I would like the regions to have a strand, since I have a
> strand-specific sequencing dataset, and for regions to be possibly chosen on
> the opposite strand to a masked region, such as when the genome is masked by
> transcripts for the purpose of choosing intergenic sequences.
>
> --
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia






Re: [Bioc-devel] Efficient Random Sampling of Positions in GRanges

2016-02-18 Thread Dario Strbenac
Good day,

Thank you for these two suggestions. I have questions about both options.

GPos seems to have a limit on the number of bases it can represent. I get an 
error: Error in GPos(samplingAreas) : too many genomic positions in 'pos_runs'. 
What exactly is this limit? Could it be added to the documentation? I used 
all of the human chromosomes. There is also no documentation of the sample 
function for GPos objects.

Could regioneR be improved to consider strand? It generates regions with no 
strand. I would like the regions to have a strand, since I have a 
strand-specific sequencing dataset, and for regions to be possibly chosen on 
the opposite strand to a masked region, such as when the genome is masked by 
transcripts for the purpose of choosing intergenic sequences.

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia