[aroma.affymetrix] Re: Error in ACC using custom cdf

Henrik Bengtsson Tue, 03 Feb 2009 12:34:58 -0800

Hi,

I'm swamped and still haven't had a chance to create a real fix, but a
quick fix is to override the arguments that default to "-XY" by
replacing them with an integer/index vector specifying the *units* on
the autosomal chromosomes, that is, the ones that are expected to
be copy neutral (overall) across samples.


For example:

  acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")

is identical to:

  acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2", subsetToAvg="-XY")


When specified this way, all units but those on ChrX and ChrY are used
to fit the calibration model.  For the human genome, you can grab the
ChrX & ChrY units you wish to exclude as follows:

  gi <- getGenomeInformation(cdf);
  unitsSex <- getUnitsOnChromosome(gi, 23:24);
  unitsToFit <- setdiff(1:nbrOfUnits(cdf), unitsSex);

This is almost the same as doing:

  gi <- getGenomeInformation(cdf);
  unitsToFit <- getUnitsOnChromosome(gi, 1:22);

but the former also includes all units that are not mapped to the
genome, including control units.  It should make little difference
which one you use.  In your case, you need to specify which units are
autosomal/copy neutral in the mouse genome.  From what I understand
of your UGP files, those are the units on chromosomes 1:19.
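The difference between the two ways of building `unitsToFit` can be seen on a toy example in base R. This is a sketch, not aroma.affymetrix code: `unitChr` stands in for the per-unit chromosome annotation in the UGP file, the helper names are hypothetical, and coding mouse X/Y as 20/21 is an assumption (check what your own UGP uses). With real data, the autosome-only variant corresponds to `getUnitsOnChromosome(gi, 1:19)`.

```r
# Toy stand-in for the UGP annotation: one chromosome code per unit,
# NA for units not mapped to the genome (e.g. control units).
# ASSUMPTION: X and Y are coded 20 and 21 here; check your own UGP file.
unitChr <- c(1, 1, 2, 19, 20, 21, NA, NA, 5, 20)

# Approach 1 (setdiff): drop sex-chromosome units and keep everything else,
# including unmapped units.
unitsToFitBySetdiff <- function(unitChr, sexChr) {
  unitsSex <- which(unitChr %in% sexChr)
  setdiff(seq_along(unitChr), unitsSex)
}

# Approach 2 (autosomes only): keep units mapped to the given chromosomes.
# NA %in% autosomes is FALSE, so unmapped units are dropped.
unitsToFitByAutosomes <- function(unitChr, autosomes) {
  which(unitChr %in% autosomes)
}

fit1 <- unitsToFitBySetdiff(unitChr, sexChr = c(20, 21))
fit2 <- unitsToFitByAutosomes(unitChr, autosomes = 1:19)
print(fit1)
print(fit2)
print(setdiff(fit1, fit2))  # the unmapped units that only approach 1 keeps
```

Here the only difference is units 7 and 8, the unmapped ones, which is why the choice should matter little in practice.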

To use these units to *fit* the model (all units are still updated,
i.e. calibrated/normalized), do:

  acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2",
                                     subsetToAvg=unitsToFit);
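The split between the units used to *fit* and the units that are *updated* can be illustrated with a toy rescaling step in base R. This is only a conceptual sketch, assuming a plain mean-based rescaling toward a target average (the `targetAvg: num 2200` parameter appears in the `print(acc)` output quoted further down); the actual CRMAv2 calibration also estimates crosstalk and offsets, which is not shown, and `rescaleToTarget()` is a hypothetical name.

```r
# Hypothetical helper: estimate one scale factor from the fitting units
# only, then apply it to every unit.
rescaleToTarget <- function(y, fitUnits, targetAvg = 2200) {
  stopifnot(is.numeric(y), !anyNA(fitUnits),
            all(fitUnits >= 1), all(fitUnits <= length(y)))
  scale <- targetAvg / mean(y[fitUnits])  # fitted on the subset only
  y * scale                               # ...but applied to all units
}

# Toy intensities: units 1-4 autosomal (copy neutral), units 5-6 on the
# sex chromosomes, with lower signal as in a male sample.
y <- c(2000, 2400, 2100, 2300, 1100, 900)

yAll  <- rescaleToTarget(y, fitUnits = seq_along(y))  # fit on all units
yAuto <- rescaleToTarget(y, fitUnits = 1:4)           # fit on autosomes only
print(round(yAll))
print(yAuto)
```

Fitting on all units lets the sex-chromosome signal pull the scale factor, so samples of different sex would be scaled differently; fitting on the autosomal subset avoids that while every unit is still rescaled.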

For BasePositionNormalization and FragmentLengthNormalization you have to do:

  bpn <- BasePositionNormalization(csC, target="zero", unitsToFit=unitsToFit);

and

  fln <- FragmentLengthNormalization(ces, target="zero",
                                     subsetToFit=unitsToFit);

Hope this helps for now.

Henrik



On Tue, Feb 3, 2009 at 12:01 PM, David Rosenberg
<[email protected]> wrote:
>
> Do the ufl/ugp/acs files need to be built with the X and Y chromosomes
> included?  If so, this won't be too hard to fix.
>
>
> On Jan 30, 9:50 pm, David Rosenberg <[email protected]>
> wrote:
>> As I think about this further, it occurs to me that there are other
>> potential problems with this chip/cdf.  I was looking at the ugp/ufl
>> files and it appears that fragment length normalization etc. is
>> performed on a unit-by-unit basis.  The way the array/cdf is currently
>> structured, not all probes in a particular unit are precisely
>> co-located.  While the location differences within a unit are quite
>> small (100 bp or so), this does result in units whose probes hybridize
>> to multiple digestion fragments.  This definitely 'breaks' fragment
>> length normalization as I see it currently implemented.  Now, the cdf
>> can be restructured such that each unit maps to a single genomic
>> location, but that seems to preclude merging/summarizing further down
>> the analysis workflow.  If there were a way to perform these
>> normalization procedures on a per-probe basis rather than a per-unit
>> basis, this would be preferable.  Let me know your thoughts.
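The concern about units spanning several digestion fragments can be checked directly if probe-level genomic positions are available: assign each probe to a restriction fragment from its position relative to the enzyme's cut sites, then flag units whose probes land on more than one fragment. A base-R sketch; the probe positions, cut sites and the `unitsSpanningFragments()` helper are all hypothetical, and no aroma.affymetrix API is used. With two enzymes, as this chip has, the check would be repeated per enzyme.

```r
# Hypothetical probe-level annotation on one chromosome: unit id and position (bp).
probes <- data.frame(
  unit = c(1, 1, 1, 2, 2, 3, 3, 3),
  pos  = c(1050, 1090, 1120, 5010, 5080, 9950, 10020, 10060)
)
# Hypothetical cut sites (bp) for one enzyme on the same chromosome.
cutSites <- c(1000, 3000, 10000, 12000)

# Returns the ids of units whose probes fall on more than one fragment.
unitsSpanningFragments <- function(unit, pos, cutSites) {
  # findInterval() gives, for each probe, the index of the fragment it lies in
  frag <- findInterval(pos, sort(cutSites))
  # number of distinct fragments hit by each unit's probes
  nFrag <- tapply(frag, unit, function(f) length(unique(f)))
  as.numeric(names(nFrag)[nFrag > 1])
}

print(unitsSpanningFragments(probes$unit, probes$pos, cutSites))
```

In this toy data only unit 3 straddles a cut site (at 10000 bp), even though its probes lie within about 100 bp of each other.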
>>
>> On Jan 30, 2009, at 6:41 PM, Henrik Bengtsson wrote:
>>
>>
>>
>> > Hi,
>>
>> > could you please forward your UGP file to me?  I think I know what the
>> > problem is, but I guess it is easier for me to check it myself first.
>>
>> > BTW, although this looks like a custom CDF, if you want to, I can put
>> > up a group page specific to this chip type documenting the chip type
>> > (and either link to or host those annotation files).  It might be
>> > useful for a future fellow researcher.  It's your call.
>>
>> > /Henrik
>>
>> > On Fri, Jan 30, 2009 at 12:08 PM, David Rosenberg
>> > <[email protected]> wrote:
>>
>> >> I am receiving the following errors when attempting to perform
>> >> allelic crosstalk calibration on a dataset using a custom CDF.  I
>> >> don't know if this is indicative of errors in the internal structure
>> >> of the CDF, or if there are parameters that I must pass to
>> >> AllelicCrosstalkCalibration due to the properties of the CDF (e.g.
>> >> the number of chromosomes).
>>
>> >>> library("aroma.affymetrix")
>> >> Loading required package: R.utils
>> >> Loading required package: R.oo
>> >> Loading required package: R.methodsS3
>> >> R.methodsS3 v1.0.3 (2008-07-02) successfully loaded. See ?R.methodsS3 for help.
>> >> R.oo v1.4.6 (2008-08-11) successfully loaded. See ?R.oo for help.
>> >> R.utils v1.1.3 (2009-01-12) successfully loaded. See ?R.utils for help.
>> >> Loading required package: aroma.core
>> >> Loading required package: R.cache
>> >> R.cache v0.1.7 (2008-02-27) successfully loaded. See ?R.cache for help.
>> >> Loading required package: R.rsp
>> >> R.rsp v0.3.4 (2008-03-06) successfully loaded. See ?R.rsp for help.
>> >> Type browseRsp() to open the RSP main menu in your browser.
>> >> Loading required package: matrixStats
>> >> Loading required package: digest
>> >> Loading required package: aroma.light
>> >> aroma.light v1.11.1 (2009-01-12) successfully loaded. See ?aroma.light for help.
>> >> aroma.core v1.0.0 (2009-01-12) successfully loaded. See ?aroma.core for help.
>> >> Loading required package: affxparser
>> >> Loading required package: R.huge
>> >> R.huge v0.1.6 (2008-07-03) successfully loaded. See ?R.huge for help.
>> >> Loading required package: aroma.apd
>> >> aroma.apd v0.1.3 (2006-06-14) successfully loaded. See ?aroma.apd for help.
>> >> aroma.affymetrix v1.0.0 (2009-01-12) successfully loaded. See ?aroma.affymetrix for help.
>>
>> >>> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>>
>> >>> # Don't display too many decimals.
>> >>> options(digits=4)
>> >>> chipType="MOUSEDIVm520650"
>> >>> cdf<-AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
>> >>> print(cdf)
>> >> AffymetrixCdfFile:
>> >> Path: annotationData/chipTypes/MOUSEDIVm520650
>> >> Filename: MOUSEDIVm520650.CDF
>> >> Filesize: 463.91MB
>> >> Chip type: MOUSEDIVm520650
>> >> RAM: 0.00MB
>> >> File format: v4 (binary; XDA)
>> >> Dimension: 2572x2680
>> >> Number of cells: 6892960
>> >> Number of units: 973990
>> >> Cells per unit: 7.08
>> >> Number of QC units: 4
>> >>> gi<-getGenomeInformation(cdf)
>> >>> print(gi)
>> >> UgpGenomeInformation:
>> >> Name: MOUSEDIVm520650
>> >> Tags: DMR20090129
>> >> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.ugp
>> >> File size: 4.64MB
>> >> RAM: 0.00MB
>> >> Chip type: MOUSEDIVm520650
>> >>> si<-getSnpInformation(cdf)
>> >>> print(si)
>> >> UflSnpInformation:
>> >> Name: MOUSEDIVm520650
>> >> Tags: DMR20090129
>> >> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.ufl
>> >> File size: 3.72MB
>> >> RAM: 0.00MB
>> >> Chip type: MOUSEDIVm520650
>> >> Number of enzymes: 2
>> >>> acs<-AromaCellSequenceFile$byChipType(getChipType(cdf, fullname=FALSE))
>> >>> print(acs)
>> >> AromaCellSequenceFile:
>> >> Name: MOUSEDIVm520650
>> >> Tags: DMR20090129
>> >> Pathname: annotationData/chipTypes/MOUSEDIVm520650/MOUSEDIVm520650,DMR20090129.acs
>> >> File size: 170.91MB
>> >> RAM: 0.00MB
>> >> Number of data rows: 6892960
>> >> File format: v1
>> >> Dimensions: 6892960x26
>> >> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
>> >> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
>> >> Footer: <createdOn>20090129 12:45:11 CST</createdOn><platform>Affymetrix</platform><chipType>MOUSEDIVm520650</chipType>
>> >> Chip type: MOUSEDIVm520650
>> >> Platform: Affymetrix
>> >>> csR<-AffymetrixCelSet$byName("mDIV/testset", cdf=cdf)
>> >>> print(csR)
>> >> AffymetrixCelSet:
>> >> Name: testset
>> >> Tags:
>> >> Path: rawData/mDIV/testset/MOUSEDIVm520650
>> >> Platform: Affymetrix
>> >> Chip type: MOUSEDIVm520650
>> >> Number of arrays: 44
>> >> Names: SNP_mDIV_A10-10_081308, SNP_mDIV_A10-201_091708, ..., SNP_mDIV_A9-9_081308
>> >> Time period: 2008-08-13 15:39:47 -- 2008-09-18 00:20:33
>> >> Total file size: 2899.65MB
>> >> RAM: 0.06MB
>> >> There were 50 or more warnings (use warnings() to see the first 50)
>> >>> acc<-AllelicCrosstalkCalibration(csR, model="CRMAv2")
>> >>> print(acc)
>> >> AllelicCrosstalkCalibration:
>> >> Data set: testset
>> >> Input tags:
>> >> User tags: *
>> >> Asterisk ('*') tags: ACC,ra,-XY
>> >> Output tags: ACC,ra,-XY
>> >> Number of files: 44 (2899.65MB)
>> >> Platform: Affymetrix
>> >> Chip type: MOUSEDIVm520650
>> >> Algorithm parameters: (rescaleBy: chr "all", targetAvg: num 2200,
>> >> subsetToAvg: chr "-XY", mergeShifts: logi TRUE, B: int 1, flavor: chr
>> >> "sfit", algorithmParameters:List of 3, ..$ alpha: num [1:8] 0.1 0.075
>> >> 0.05 0.03 0.01 0.0025 0.001 0.0001, ..$ q: num 2, ..$ Q: num 98)
>> >> Output path: probeData/testset,ACC,ra,-XY/MOUSEDIVm520650
>> >> Is done: FALSE
>> >> RAM: 0.01MB
>> >>> csC<-process(acc, verbose=verbose)
>> >> 20090130 14:03:43|Calibrating data set for allelic cross talk...
>> >> Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
>> >>   missing value where TRUE/FALSE needed
>> >> 20090130 14:03:43|Calibrating data set for allelic cross talk...done
>> >>> traceback()
>> >> 13: readCdfCellIndices(pathname, ...)
>> >> 12: getCellIndicesChunk(getPathname(this), units = unitsChunk, ...,
>> >>       verbose = verbose)
>> >> 11: fcn(idxs[ii], ...)
>> >> 10: lapplyInChunks.numeric(units, function(unitsChunk) {
>> >>       cdfChunk <- getCellIndicesChunk(getPathname(this), units = unitsChunk,
>> >>           ..., verbose = verbose)
>> >>       res <- vector("list", length(unitsChunk))
>> >>       res[[1]] <- unlist(cdfChunk, use.names = useNames)
>> >>       res
>> >>   }, chunkSize = 1e+05, useNames = useNames, verbose = verbose)
>> >> 9: lapplyInChunks(units, function(unitsChunk) {
>> >>      cdfChunk <- getCellIndicesChunk(getPathname(this), units = unitsChunk,
>> >>          ..., verbose = verbose)
>> >>      res <- vector("list", length(unitsChunk))
>> >>      res[[1]] <- unlist(cdfChunk, use.names = useNames)
>> >>      res
>> >>  }, chunkSize = 1e+05, useNames = useNames, verbose = verbose)
>> >> 8: getCellIndices.AffymetrixCdfFile(cdf, units = subset, useNames = FALSE,
>> >>      unlist = TRUE)
>> >> 7: getCellIndices(cdf, units = subset, useNames = FALSE, unlist = TRUE)
>> >> 6: getSubsetToAvg.AllelicCrosstalkCalibration(this)
>> >> 5: getSubsetToAvg(this)
>> >> 4: getParameters.AllelicCrosstalkCalibration(this)
>> >> 3: getParameters(this)
>> >> 2: process.AllelicCrosstalkCalibration(acc, verbose = verbose)
>> >> 1: process(acc, verbose = verbose)
> >
>
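The final error message is easier to read once you note that R's `if()` fails with exactly "missing value where TRUE/FALSE needed" when its condition is `NA`, which is what `any(units < 1)` returns when `units` contains an `NA` and no value below 1. The base-R illustration below shows *how* the message arises; it does not establish where the `NA` came from in this data set, and `checkUnits()` is a hypothetical stricter check, not part of aroma.affymetrix.

```r
units <- c(10L, NA, 25L)  # toy unit indices containing a missing value
any(units < 1)            # NA: c(FALSE, NA, FALSE) has no TRUE, but an NA

# Reproduce the check from the traceback and capture its error message.
res <- tryCatch(
  if (any(units < 1)) stop("Argument 'units' contains non-positive indices."),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical check that reports the actual problem instead.
checkUnits <- function(units, nbrOfUnits) {
  if (anyNA(units)) stop("Argument 'units' contains missing values.")
  if (any(units < 1 | units > nbrOfUnits)) stop("Argument 'units' is out of range.")
  invisible(TRUE)
}
```

Testing for `NA` before comparing turns the cryptic message into one that points at the missing unit indices.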

--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---
