Re: [aroma.affymetrix] An error message from running PSCBS

Henrik Bengtsson Sun, 15 Jan 2012 11:28:27 -0800

On Sun, Jan 15, 2012 at 11:20 AM, Henrik Bengtsson
<henrik.bengts...@aroma-project.org> wrote:
> On Sat, Jan 14, 2012 at 10:02 PM, Hoon Kim <wise...@gmail.com> wrote:
>> Dear Henrik,
>>
>> Thank you very much for your time to reply to my questions, and your
>> suggestions worked very well. By following your advice, I successfully
>> evaluated LOHs for all samples.
>>
>> I wonder if I can ask you two more questions, one related to your previous
>> email and the other related to the output of PSCBS.
>>
>> 1.  A question related to your previous email:
>> ------------------A part of your previous email--------
>> # Estimate the LOH *threshold* from autosomal chromosomes alone
>> DeltaLOH <- estimateDeltaLOH(extractChromosomes(fit, 1:22));
>> # Call LOH
>> fitC <- callLOH(fit, delta=DeltaLOH);
>> -----------------------------------------------------------------
>> In this case, do I also have to estimate the Alleleic Balance *threshold*
>> (deltaAB) from autosomal chromosomes alone in order to do "callAB"?
>
> You could do that, but since the AB caller relies on weaker
> assumptions it is more likely to work well anyway.  The advantage of
> estimating the DeltaAB and DeltaLOH threshold from only the autosomal
> chromosomes is that (ceteris paribus) the estimates will be the same
> regardless of the germline being XX or XY; include X and Y in those
> estimates will give a tiny (and negligible) bias between XXs and XYs.
>
>>
>> 2. Please, find the attached file that contains the PSCBS output for two
>> patients. In "Patient 1," most lohCalls are "NA" rather than "False," while
>> only a few of NAs are shown in "Patient 2."  I would greatly appreciate it
>> if you let me know why Patient 1 has many NAs rather than "False."
>> Unfortunately, I couldn't find a reference where I can find answer myself,
>> but I might have missed.
>
> See argument 'xorCalls' in callLOH(): "If @TRUE, a region already
> called AB, will not be called LOH."  The basic idea is that if a
> segment is already called AB, it cannot be LOH.  Thus, if AB = TRUE
> then LOH = FALSE, i.e. (AB,LOH)=(TRUE,FALSE).  However, we still run
> the LOH caller and gets the calls.  Now, to distinguish the cases
> where the AB and LOH agree from when they disagree, we report either
> (AB,LOH)=(TRUE,FALSE) or (TRUE,NA).  The former is reported when they
> agree and the latter when they disagree.  You can always override this
> yourself by the logic: LOH[AB] <- FALSE.


Forgot to add that (AB,LOH)=(TRUE,NA) call also be reported when the
LOH caller itself could not decide on either TRUE or FALSE, and hence
return NA, e.g. when there is too little evidence/data.

For your Patient #1, it is likely that the estimate of the DeltaLOH
threshold is not correct/sensible, which would explain your
(TRUE,FALSE) and (TRUE,NA) calls, as well as the (FALSE,FALSE) and
(FALSE,TRUE) calls.

/Henrik

>
> In the next release help("callLOH", package="PSCBS") will clarify this better.
>
> Cheers,
>
> Henrik
>
>>
>> Thank you very much.
>>
>> Best,
>>
>> Hoon
>>
>> On Fri, Jan 13, 2012 at 4:31 PM, Henrik Bengtsson
>> <henrik.bengts...@aroma-project.org> wrote:
>>>
>>> Hi,
>>>
>>> I could reproduce this using the PairedPSCBS object that you sent me;
>>>
>>> library("PSCBS");
>>>
>>> # Load Paired PSCBS results with AB calls already done
>>> fit <- loadObject("HoonKim_20120110.Rbin");
>>> fitC <- callLOH(fit);
>>> ## Error: muC1atNonAB < muC1atAB is not TRUE
>>>
>>> # The CN data (figure attached) looks good
>>> toPNG("HoonKim_20120110", tags="PSCBS", width=840, aspectRatio=0.8, {
>>> plotTracks(fit); })
>>>
>>> What causing the problem is basically that nothing is going on in the
>>> tumor, that is, it behaves as the germline.  No change points were
>>> detected on any chromomosome (Chr 1-22, 23 (X), 24 (Y), 25 (M)),
>>> resulting in one segment per chromosome.  Each such segment/chromosome
>>> was then correctly called to be in allelic balance (AB) by callAB()
>>> that you did earlier.  The exception is ChrY, which is non-AB.  The
>>> total CNs and the BAFs for Chr24 are probably "garbage", because this
>>> sample is an XX person, meaning we only observe noise on ChrY.
>>>
>>> One obvious solution is to exclude ChrY, either already when running
>>> the segmentation, or immediately afterward, e.g.
>>>
>>> df <- subset(df, chromosome %in% 1:23);
>>> fit   <- segmentByPairedPSCBS(df, verbose=verbose);
>>> fit <- callAB(fit);
>>> fit <- callLOH(fit);
>>>
>>> Alternatively, if you wish to keep ChrY, say for consistency with
>>> other samples, you can do as follows:
>>>
>>> # Estimate the LOH *threshold* from autosomal chromosomes alone
>>> DeltaLOH <- estimateDeltaLOH(extractChromosomes(fit, 1:22));
>>> # Call LOH
>>> fitC <- callLOH(fit, delta=DeltaLOH);
>>>
>>> In both cases you will get a warning that DeltaLOH was set to -Inf,
>>> which is alright.  See below.
>>>
>>>
>>> DETAILS:
>>> When calling LOH, we compare (a bootstrap quantile estimate of) the
>>> minor CN (C1) of each segment toward a threshold DeltaLOH.  See Eqn
>>> (7) in Olshen et al. (2011) [http://aroma-project.org/publications/]
>>> for details.  The main problem in calling LOH is to choose a good
>>> DeltaLOH.  As explained in the paper, this threshold is sample
>>> specific and requires rather strong assumptions and ad hoc estimators.
>>>  In the Section S1.2 of the Supplementary Materials, we propose on
>>> approach to estimate DeltaLOH from data.  The idea is to choose
>>> DeltaLOH as the midpoint between where we believe the true CN=1 and
>>> true CN=0 is.  We estimate the true CN=1 (mu1) from all segment that
>>> are in AB and with TCN near two.  That estimate is not that debatable
>>> and works for data sets.  However, estimating where the true CN=0
>>> (mu0) is, requires that we have some segments that have a zero minor
>>> CN.  The method for estimating DeltaLOH is estimateDeltaLOH().  Use it
>>> and you will get:
>>>
>>> DeltaLOH <- estimateDeltaLOH(fit);
>>> ## Error: muC1atNonAB < muC1atAB is not TRUE
>>>
>>> which confirms that it's there the error occurs.  BTW, the error is a
>>> sanity check asserting that our assumptions are correct and that we
>>> can safely go ahead and estimate DeltaLOH.
>>>
>>> First, when *all* segments are in AB, then all minor CNs are 1 (and so
>>> the major CNs).  In that case we cannot estimate mu0 and hence not
>>> DeltaLOH.  It is a non-identifiable setup.  However, when *all*
>>> segments are AB, we know that none are in LOH.  Thus, in that case we
>>> can use DeltaLOH = -Inf as a threshold (as in the above workarounds).
>>>
>>> However, in your case, ChrY is not called AB and hence it will try to
>>> estimate DeltaLOH as outlined in the paper.  It gets a good estimate
>>> of mu1, but not mu0.  It has to estimate mu0 based on the single
>>> non-AB segment, which is the data on ChrY.  This is what it looks like
>>> if you turn on the verbose messages:
>>>
>>> DeltaLOH <- estimateDeltaLOH(fit, verbose=-10);
>>>  ...
>>>  Weighted median of (corrected) C1 in allelic balance: 0.993
>>>  Smallest C1 among segments not in allelic balance: 1.15
>>>  There are 1 segments with in total 295 heterozygous SNPs with this level.
>>>  ...
>>>
>>> From this you see that the mu1 = 0.993 and mu0=1.15.  However, true
>>> CN=0 (mu0) should always be less than true CN=1 (mu1), which is why
>>> the sanity check fails.
>>>
>>> I will discuss with my PSCBS coauthors if it possible to automate the
>>> above, e.g. by detecting that ChrY is just noise with signal so that
>>> it can be excluded.  However, doing that introduces new assumptions,
>>> e.g. assuming it is a human genome that is segmented and so on.
>>>
>>> Thanks for the report
>>>
>>> Henrik
>>>
>>> On Tue, Jan 10, 2012 at 6:39 PM, Henrik Bengtsson
>>> <henrik.bengts...@aroma-project.org> wrote:
>>> > Hi,
>>> >
>>> > when you say "bad pair", what do you really mean?
>>> >
>>> > Please send me (offline; not to the mailing list) an example of the
>>> > 'fit' object where you get that error message, e.g.
>>> >
>>> > saveObject(fit, "HoonKim_20120110.Rbin");
>>> >
>>> > so that I can try to reproduce this.
>>> >
>>> > /Henrik
>>> >
>>> > On Mon, Jan 9, 2012 at 11:37 AM, wisekh6 <wise...@gmail.com> wrote:
>>> >> Dear Henrik,
>>> >> I have been doing paired PSCBS analysis on a Genome Wide SNP array 6.0
>>> >> data set in order to estimate LOH for 20 pairs of samples.
>>> >> The code on the aroma web site successfully estimated LOH for most
>>> >> pairs, while it fails to estimate LOH for several bad pairs.
>>> >> An error message I got from a bad pair is :
>>> >>>fit   <- callLOH(fit, verbose=verbose);
>>> >> Error: muC1atNonAB < muC1atAB is not TRUE
>>> >>
>>> >> The code, which is not exactly the same as my actual one, that I use
>>> >> is as follows:
>>> >> ===============================================
>>> >> library("aroma.affymetrix");
>>> >> verbose           <- Arguments$getVerbose(-10, timestamp=TRUE);
>>> >> library("PSCBS");
>>> >>
>>> >> res     <- doASCRMAv2(csR, verbose=verbose);
>>> >> data    <- extractPSCNArray(res$total);
>>> >> # Total CNs for the tumor relative to the matched normal
>>> >> CT      <- 2 * (data[,"total","T"] / data[,"total","N"]);
>>> >> # Allele B fractions for the tumor
>>> >> betaT   <- data[,"fracB","T"];
>>> >> # Allele B fractions for the normal
>>> >> betaN   <- data[,"fracB","N"];
>>> >> # Get (chromosome, position) annotation data
>>> >> ugp     <- getAromaUgpFile(res$total);
>>> >> chromosome <- ugp[,1,drop=TRUE];
>>> >> x       <- ugp[,2,drop=TRUE];
>>> >>
>>> >> # Setup data structure for Paired PSCBS
>>> >> df    <- data.frame(chromosome=chromosome, x=x, CT=CT, betaT=betaT,
>>> >> betaN=betaN);
>>> >>
>>> >> # Run PSCBS
>>> >> fit   <- segmentByPairedPSCBS(df, verbose=verbose);
>>> >> print("...segmentByPairedPSCBS")
>>> >>
>>> >> fit   <- callAB(fit, verbose=verbose);
>>> >> print("...callAB")
>>> >>
>>> >> fit   <- callLOH(fit, verbose=verbose);
>>> >> print("...callLOH")
>>> >> ===============================================
>>> >>
>>> >> I wonder if you provide me with any suggestion regarding this error
>>> >> message.
>>> >>
>>> >> Thank you,
>>> >>
>>> >> Best,
>>> >>
>>> >> Hoon
>>> >>
>>> >> sessionInfo() is as follows:
>>> >> ===============================================
>>> >>>             sessionInfo()
>>> >>
>>> >> R version 2.13.0 (2011-04-13)
>>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> >>
>>> >> locale:
>>> >>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> >>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>> >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> >>
>>> >> attached base packages:
>>> >> [1] splines   stats     graphics  grDevices utils     datasets
>>> >> methods
>>> >> [8] base
>>> >>
>>> >> other attached packages:
>>> >>  [1] Hmisc_3.8-3            survival_2.36-5        DNAcopy_1.26.0
>>> >>  [4] PSCBS_0.17.1           aroma.affymetrix_2.3.0 affxparser_1.24.0
>>> >>  [7] aroma.apd_0.2.0        R.huge_0.3.0           aroma.core_2.3.2
>>> >> [10] aroma.light_1.22.0     matrixStats_0.4.3      R.rsp_0.7.0
>>> >> [13] R.cache_0.5.2          R.filesets_1.1.3       digest_0.5.1
>>> >> [16] R.utils_1.9.6          R.oo_1.8.3             R.methodsS3_1.2.1
>>> >>
>>> >> loaded via a namespace (and not attached):
>>> >> [1] cluster_1.13.3  grid_2.13.0     lattice_0.19-23
>>> >> ===============================================
>>> >>
>>> >> --
>>> >> When reporting problems on aroma.affymetrix, make sure 1) to run the
>>> >> latest version of the package, 2) to report the output of sessionInfo() 
>>> >> and
>>> >> traceback(), and 3) to post a complete code example.
>>> >>
>>> >>
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups "aroma.affymetrix" group with website 
>>> >> http://www.aroma-project.org/.
>>> >> To post to this group, send email to aroma-affymetrix@googlegroups.com
>>> >> To unsubscribe and other options, go to
>>> >> http://www.aroma-project.org/forum/
>>>
>>> --
>>> When reporting problems on aroma.affymetrix, make sure 1) to run the
>>> latest version of the package, 2) to report the output of sessionInfo() and
>>> traceback(), and 3) to post a complete code example.
>>>
>>>
>>> You received this message because you are subscribed to the Google Groups
>>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>>> To post to this group, send email to aroma-affymetrix@googlegroups.com
>>> To unsubscribe and other options, go to
>>> http://www.aroma-project.org/forum/
>>
>>
>> --
>> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
>> version of the package, 2) to report the output of sessionInfo() and
>> traceback(), and 3) to post a complete code example.
>>
>>
>> You received this message because you are subscribed to the Google Groups
>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> To post to this group, send email to aroma-affymetrix@googlegroups.com
>> To unsubscribe and other options, go to http://www.aroma-project.org/forum/

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

Re: [aroma.affymetrix] An error message from running PSCBS

Reply via email to