Re: Filtering out loci downstream of CRMAv2 (Was: Re: [aroma.affymetrix] Re: AromaUnitTotalCnBinarySet Manipulation)

Henrik Bengtsson Thu, 17 Mar 2011 20:08:34 -0700

Hi,

you never got an answer for the following:

On Thu, Feb 24, 2011 at 10:44 AM,  <greg.d.w...@gmail.com> wrote:
> Doing more reading on the aroma forum I see question #2 stated above
> dovetails with the idea of "filtering" probes.
>
> I noticed this statement in two places in the Vignettes:
>
>> "The full CDF contains what the default one does plus more. We are always
>> using the full CDF. If we want to do filtering, we do it afterward."
>
> But I can not find any examples, or code, that performs this filtering.

True, there is actually no high-level API for doing this.  It also
depends on what downstream analysis you are doing.  For instance,
if you do segmentation, ideally you would tell the segmentation
algorithm which loci you wish it to ignore.  Depending on the
segmentation methods/algorithms (and their implementations by their
maintainers) this may or may not be possible.  For instance, about a
year ago or so, the DNAcopy package (the CBS method) was updated to
allow passing locus-specific weights, meaning that theoretically we
can filter by giving certain loci zero weights - even better is to
weight inversely by some quality score.  However, the aroma framework
has yet to provide a generic way to deal with such quality/filtering
scores.
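
As a rough illustration of the weighting idea, here is a minimal sketch
that calls DNAcopy directly on simulated data.  The data, the 'quality'
scores, the excluded loci and all variable names are made up for the
example; it assumes the DNAcopy package is installed and skips the
segmentation otherwise.  Note that DNAcopy documents its weights as
having to be positive, so in this sketch loci that should get zero
weight are dropped before the call.

```r
set.seed(1)
J <- 300                                   # number of loci (simulated)
x <- sort(sample.int(1e6, J))              # made-up genomic positions
# Simulated log-ratios with a gain in the middle third
y <- rnorm(J, mean = rep(c(0, 0.6, 0), each = 100), sd = 0.2)
quality <- runif(J, 0.5, 2)                # hypothetical noise scores (higher = noisier)
exclude <- c(10, 150, 151)                 # hypothetical loci to filter out

w <- 1 / quality^2                         # weight inversely to the noise score
w[exclude] <- 0                            # zero weight == filtered out

keep <- which(w > 0)                       # DNAcopy wants strictly positive weights
fit <- NULL
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  cna <- DNAcopy::CNA(genomdat = y[keep], chrom = rep(5, length(keep)),
                      maploc = x[keep], data.type = "logratio",
                      sampleid = "sim")
  fit <- DNAcopy::segment(cna, weights = w[keep], verbose = 0)
  print(fit$output)                        # one row per segment
}
```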

Having said this, you can of course pull out the CN data yourself from
the aroma framework and use the low-level API to the segmentation
methods for doing this kind of work.  For instance,

cbs <- CbsModel(cesN);
cn <- extractRawCopyNumbers(cbs, array=1, chromosome=5);

will give you an (in-memory) RawCopyNumbers object.  With this object
you can use fairly generic segmentByNnn() wrappers (provided by
aroma), e.g.

fit <- segmentByCBS(cn);
fit <- segmentByGLAD(cn);
fit <- segmentByHaarSeg(cn);

Each of the returned 'fit' objects will contain whatever the
underlying segmentation method returns, e.g. DNAcopy::segment().
These wrappers do indeed support weights, given that the
RawCopyNumbers object contains weights.  You can add weights to 'cn'
simply by:

cn$w <- weights;

where 'weights' should be a non-negative numeric vector of length J,
where J == nbrOfLoci(cn).  BTW, the genomic locations can be obtained by
getPositions(cn) and the CN signals as getSignals(cn).  You can also
construct a RawCopyNumbers object from scratch by for instance,

cn <- RawCopyNumbers(x=x, y=y, w=w, chromosome=3)

where x is the chromosomal position (each object can only hold one
chromosome), y is the CN signal, w are the weights.

Now, if you call any of the above segmentation methods, you will do
a weighted segmentation.  There is a special trick we do for
supporting at least zero-one weights for methods that do not handle
weights otherwise, e.g. GLAD.  Before calling the segmentation
method, we drop all loci with exactly zero weight.  Then, if the
remaining weights are all equal (e.g. 1), we can fall back to a
non-weighted segmentation of the remaining loci (which will work
with GLAD).  If the remaining loci have different weights, you will
get a warning saying the weights have been ignored (loci with zero
weights have still been dropped though).
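
To make the logic of that trick concrete, here is a minimal sketch in
plain R.  It is not the actual aroma code; the function name
prepareForSegmentation() and its 'supportsWeights' argument are
hypothetical and only illustrate the three steps described above.

```r
# Decide how to hand weighted data to a segmentation method.
# 'supportsWeights' says whether the method accepts locus-specific weights.
prepareForSegmentation <- function(y, x, w = NULL, supportsWeights = FALSE) {
  stopifnot(length(y) == length(x))
  if (is.null(w)) w <- rep(1, length(y))
  stopifnot(length(w) == length(y), all(is.finite(w)), all(w >= 0))

  # Step 1: loci with exactly zero weight are always dropped
  keep <- (w > 0)
  y <- y[keep]; x <- x[keep]; w <- w[keep]

  if (length(unique(w)) <= 1) {
    # Step 2: equal weights carry no information, so segment unweighted
    w <- NULL
  } else if (!supportsWeights) {
    # Step 3: unequal weights, but the method cannot use them
    warning("Weights were ignored (loci with zero weight were still dropped)")
    w <- NULL
  }
  list(y = y, x = x, w = w)
}

y <- c(0.1, 0.0, 2.5, -0.1, 0.2)
x <- c(100, 200, 300, 400, 500)
res <- prepareForSegmentation(y, x, w = c(1, 1, 0, 1, 1))
str(res)   # the third locus is gone and no weights remain
```

With zero-one weights like these, the result can be passed to any
method, weighted or not.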

Now back to the high-level API of the aroma framework.  You can
manually add NAs for the loci that you want to exclude.  This has to
be done for each of the data files.  This is not too hard to do,
especially not with an AromaUnitTotalCnBinarySet, but it is what I
consider a "nasty" trick.  Let me know if you wish to go down this
path and I can explain.

Finally, I'll add it to the todo list to add something like a
'unitsToFit' argument to, say, CbsModel.

>
> Why is the Full CDF recommended given the fact that Affy indicates several
> probes are of poor quality?

Here I have to say that this is my personal suggestion/recommendation.
 The main reason is that as the genome annotations are updated the set
of poor loci/SNPs may change somewhat.  So instead of filtering up
front, I prefer to postpone the decision on what to filter out until it
is really needed, e.g. when segmenting.  And even better, in general I
prefer using weights instead of throwing data out.  I also don't think
keeping "poor" loci/probes in the preprocessing will hurt you - they
may even help you (but if so probably only marginally).

> Is there a way to assess poor quality probes and
> filter them from the SNP 6.0 array?

Nothing automatic.  But you could for instance look at (a robust
estimate of) the variation of each locus across a large set of
samples.
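
A minimal sketch of that idea in plain R, on simulated data: the
matrix M (loci x samples) stands in for log-ratios you would extract
yourself, ideally from samples expected to be copy neutral so that real
aberrations do not inflate the spread.  The "noisy" loci, the cutoff
and all names are assumptions made for the example.

```r
set.seed(42)
J <- 1000; I <- 40                         # loci x samples (simulated)
M <- matrix(rnorm(J * I, sd = 0.2), nrow = J, ncol = I)
noisy <- c(5, 250, 999)                    # hypothetical poor loci
M[noisy, ] <- M[noisy, ] * 5               # make them five times noisier

# Robust per-locus spread across samples (MAD, scaled to estimate the SD)
sigma <- apply(M, MARGIN = 1, FUN = mad, na.rm = TRUE)

# Flag loci whose spread is far above that of a typical locus
# (the cutoff is arbitrary and should be tuned on real data)
cutoff <- median(sigma) + 5 * mad(sigma)
poor <- which(sigma > cutoff)
print(poor)

# Alternatively, turn the spread into weights instead of a hard filter
w <- 1 / sigma^2
w <- w / max(w)                            # rescale to (0, 1]
```

The weights 'w' could then be used as locus-specific weights in a
weighted segmentation, with zero weight for loci you want to exclude
completely.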

> Would this filtering take place after CRMAv2 has run?

So, yes, as argued above.

Hope this helps

/Henrik

>
> Thanks in advance for any clarity on these issues,
>
> Greg
>
>
> On Feb 23, 2011 4:02pm, Gregory W <greg.d.w...@gmail.com> wrote:
>> Hello,
>>
>>
>>
>> I was hoping to get some information about how to manipulate
>>
>> AromaUnitTotalCnBinarySets.
>>
>>
>>
>> I've already performed CRMAv2 on all the affy platforms in my study.
>>
>>
>>
>> I load the results into an R session with:
>>
>>
>>
>>
>>
>> >    tags
>> >    dsT
>> >    print(dsT)
>>
>> >    dsC
>> >    dsC$total
>>
>>
>>
>>
>> I have normal and tumor replicates for a particular patient.  I grab
>>
>> these data by:
>>
>>
>>
>> >    normals.indx
>> >    normals
>>
>>
>> And similarly for the tumors.  This is what normals and tumors
>>
>> contains:
>>
>>
>>
>> > normals
>>
>> .   AromaUnitTotalCnBinarySet:
>>
>> .   Name: MyStudy
>>
>> .   Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>
>> .   Full name: MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>
>> .   Number of files: 2
>>
>> .   Names: Normal_Rep1,   Normal_Rep2
>>
>> .   Path (to the first file): totalAndFracBData/MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
>>
>> .   Total file size: 14.35 MB
>>
>> .   RAM: 0.00MB
>>
>>
>>
>>
>>
>> > tumors
>>
>> .   AromaUnitTotalCnBinarySet:
>>
>> .   Name: MyStudy
>>
>> .   Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>
>> .   Full name: MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>
>> .   Number of files: 2
>>
>> .   Names: Tumors_Rep1,   Tumors_Rep2
>>
>> .   Path (to the first file): totalAndFracBData/MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
>>
>> .   Total file size: 14.38 MB
>>
>> .   RAM: 0.00MB
>>
>>
>>
>>
>>
>> I see there's a function getAverageFile that allows me to do this:
>>
>>
>>
>> >     normals.average
>> >     normals
>>
>>
>> and similarly for tumors.  Then I can take these results and pass them
>>
>> to CbsModel:
>>
>>
>>
>> >     cbs
>>
>>
>> HOWEVER, even though my replicates are from the same sample, they were
>>
>> processed using different protocols. So instead of averaging the
>>
>> normals and then averaging the tumors, i would like to take the
>>
>> respective ratios and then average the ratios.
>>
>>
>>
>> I can accomplish this in a round about way by using writeDataFrame and
>>
>> loading it back into R and then manipulating the data.  But then the
>>
>> data isn't in the right format for CbsModel since its been ratio-ized
>>
>> and is no longer two AromaUnitTotalCnBinarySets.
>>
>>
>>
>> 1-  Is there a way to perform an average of the ratios and then pass
>>
>> this into the CbsModel??
>>
>>
>>
>> Additionally, there are several Units (or probes) that I want to
>>
>> disqualify, for instance maybe by setting them to 1 for each tumor and
>>
>> normal replicate.
>>
>>
>>
>> 2- Is there a way to set
>>
>>
>>
>> > CN_473963
>>
>> > CN_473964
>>
>> > CN_473965
>>
>>
>>
>> to 1 for each file Normal_Rep1, Normal_Rep2, Tumor_Rep1 and Tumor_Rep2
>>
>> file in the AromaUnitTotalCnBinarySet??
>>
>>
>>
>> Thank you!
>>
>> Greg
>>
>>
>>
>>
>>
>>
>>
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

