Aha. I have just discovered that the actual name of the CDF file makes a difference. Changing the CDF name back to Hs133P_Hs_REFSEQ.cdf makes it work without errors. I'm guessing that using a well-known name like "HG-U133_Plus_2" causes it to use Bioconductor's default version, in some cases. Am I right?
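One way to sidestep the name clash is to keep the custom CDF under its own chip type name and pass it to the CEL set explicitly, the way Steve's script further down does with cdf=cdf. The following is only a sketch under assumptions: the directory layout is inferred from the annotationData/chipTypes/... and rawData/... paths printed in this thread, and the data set name "all" is taken from the session above. The aroma.affymetrix calls only run if the package and the files are actually present.

```r
# Sketch only. The layout below is an assumption based on the paths
# printed elsewhere in this thread:
#   annotationData/chipTypes/<chipType>/<chipType>.cdf
#   rawData/<dataSet>/<chipType>/*.CEL
chipType <- "Hs133P_Hs_REFSEQ"   # custom CDF kept under its own name
dataSet  <- "all"

cdfPath <- file.path("annotationData", "chipTypes", chipType,
                     paste0(chipType, ".cdf"))
celDir  <- file.path("rawData", dataSet, chipType)

# Only touch aroma.affymetrix if the package and the files are really there.
if (requireNamespace("aroma.affymetrix", quietly = TRUE) &&
    file.exists(cdfPath) && dir.exists(celDir)) {
  library(aroma.affymetrix)

  # Same two calls as in Steve's script, with the custom chip type.
  cdf <- AffymetrixCdfFile$byChipType(chipType)
  cs  <- AffymetrixCelSet$byName(dataSet, cdf = cdf)

  # Compare the chip type reported by the CDF with the one the CEL set uses.
  print(cdf)
  print(getCdf(cs))
} else {
  message("Expected CDF at: ", cdfPath)
  message("Expected CEL files in: ", celDir)
}
```

If the two printed CDFs disagree on chip type or number of units, that would point to the kind of mix-up Mark suspected from the error message.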
-Randy

On Jan 18, 9:47 am, Randy Gobbel <randy.gob...@gmail.com> wrote:
> Hi Mark.
>
> Here's what I was running. I've cut out the verbose tracing, added in
> some info about the files. The CDF file is actually
> Hs133P_Hs_REFSEQ.cdf from the BrainArray site, renamed--I'm wondering
> if I need to do some more complex conversion on the CEL files.
>
> library(aroma.affymetrix)
> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>
> cs <- AffymetrixCelSet$byName('all', chipType='HG-U133_Plus_2')
> > getCdf(cs)
>
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/HG-U133_Plus_2
> Filename: HG-U133_Plus_2.CDF
> Filesize: 15.21MB
> Chip type: HG-U133_Plus_2
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 1164x1164
> Number of cells: 1354896
> Number of units: 25102
> Cells per unit: 53.98
> Number of QC units: 9
>
> cs1 <- getFile(cs, 1)
> cs1
>
> AffymetrixCelFile:
> Name: EA08034_98020_H133+_MCCW199
> Tags:
> Full name: EA08034_98020_H133+_MCCW199
> Pathname: rawData/all/HG-U133_Plus_2/EA08034_98020_H133+_MCCW199.CEL
> File size: 12.93 MB (13555928 bytes)
> RAM: 0.00 MB
> File format: v4 (binary; XDA)
> Platform: Affymetrix
> Chip type: HG-U133_Plus_2
> Timestamp: 2009-12-18 16:21:36
> > getCdf(cs1)
>
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/HG-U133_Plus_2
> Filename: HG-U133_Plus_2.CDF
> Filesize: 15.21MB
> Chip type: HG-U133_Plus_2
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 1164x1164
> Number of cells: 1354896
> Number of units: 25102
> Cells per unit: 53.98
> Number of QC units: 9
>
> bc <- RmaBackgroundCorrection(cs)
> csBC <- process(bc,verbose=verbose)
> qn <- QuantileNormalization(csBC)
> csN <- process(qn, verbose=verbose)
> plm <- RmaPlm(csN)
> fit(plm, verbose=verbose)
> ces <- getChipEffectSet(plm)
> gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)
> Error in list(`extractDataFrame(ces, units = NULL, addNames = TRUE)` =
> <environment>, :
>
> [2010-01-18 09:13:25] Exception: Range of argument 'indices'
> is out of range [1,30625]: [1,54675]
> at throw(Exception(...))
> at throw.default(sprintf("Range of argument '%s' is out of range [%s,%s]: [%s,%s]", .name, range[1], range[2], xrange[1], xrange[2]))
> at throw(sprintf("Range of argument '%s' is out of range [%s,%s]: [%s,%s]", .name, range[1], range[2], xrange[1], xrange[2]))
> at getNumerics.Arguments(static, ..., asMode = "integer", disallow = disallow)
> at getNumerics(static, ..., asMode = "integer", disallow = disallow)
> at getIntegers.Arguments(static, ..., range = range)
> at getIntegers(static, ..., range = range)
> at method(static, ...)
> at Arguments$getIndices(indices, range = c(1, nbrOfCells), disallow = "NaN")
> at readRawData.AffymetrixCelFile(this, ...)
> at readRawData(this, ...)
> at getData.AffymetrixCelFile(this, indices = map[, "cell"], fields = celFields[fields])
> at getData(this, indices = map[, "cell"], fields = celFields[fields])
> at wit
> > ces
>
> ChipEffectSet:
> Name: all
> Tags: RBC,QN,RMA
> Path: plmData/all,RBC,QN,RMA/HG-U133_Plus_2
> Platform: Affymetrix
> Chip type: HG-U133_Plus_2,monocell
> Number of arrays: 9
> Names: EA08034_98020_H133+_MCCW199, EA08034_98021_H133+_SKINW199, ..., EA08034_98031_H133+_PN-1NN2
> Time period: 2010-01-18 09:13:24 -- 2010-01-18 09:13:25
> Total file size: 2.63MB
> RAM: 0.02MB
> Parameters: (probeModel: chr "pm")
> > getCdf(ces)
>
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/HG-U133_Plus_2
> Filename: HG-U133_Plus_2,monocell.CDF
> Filesize: 4.44MB
> Chip type: HG-U133_Plus_2,monocell
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 175x175
> Number of cells: 30625
> Number of units: 25102
> Cells per unit: 1.22
> Number of QC units: 9
>
> On Jan 17, 1:59 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>
> > Hi Randy.
> >
> > From that error message, it looks like there was a mix of CDF files
> > being used (my guess is 54675 corresponds to the number of Affymetrix
> > probesets, whereas 30625 corresponds to the Refseq reorganization of
> > probesets). Can you post the code you ran?
> >
> > Cheers,
> > Mark
> >
> > On 16-Jan-10, at 11:41 AM, Randy Gobbel wrote:
> >
> > > I'm also trying to get gene-level expression values, using
> > > HG-U133_Plus_2 data. I downloaded the custom CDF that combines probes
> > > into probesets that correspond to RefSeq genes, linked from the
> > > aroma.affymetrix group page for this chip type (Hs133P_Hs_REFSEQ.cdf),
> > > and ran the same set of commands. It works up to the point of trying
> > > to extract expression values, then dies with:
> > >
> > > Exception: Range of argument 'indices' is out of range [1,30625]:
> > > [1,54675]
> > >
> > > At this point, I'm not sure what to do next. Suggestions? It looks
> > > like you were the creator of the CDF--is it the right one for this?
> > >
> > > -Randy
> > >
> > > On Jun 19 2009, 10:08 pm, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> > >> Hi Steve.
> > >>
> > >> I don't know how common this is. Basically, a colleague found a gene
> > >> that was very differentially expressed when analyzing using the
> > >> Affymetrix probesets definition and found virtually nothing when using
> > >> the custom CDF that bundles all the probes for a gene together. The
> > >> reason was simple. There were several probesets designed for this
> > >> gene and presumably they measure different isoforms. The probes for
> > >> the DE probeset showed the difference, but all the other probesets
> > >> didn't. When you use a robust linear model like RMA, outliers get
> > >> downweighted. Because the DE probes accounted for a small proportion
> > >> of the probes (I think there was 3 or 4 other probesets at this
> > >> locus), their effect got washed out.
> > >>
> > >> So, it's a tradeoff.
> > >> Sometimes (perhaps most of the time) you gain by lumping them all
> > >> together ... more information, more power to detect changes. But,
> > >> sometimes (perhaps rarely) it can mislead. I'm sure I'm not the only
> > >> one to observe such things. The probe-level data (usually?) doesn't
> > >> lie. But, since you are comparing across platforms, you will
> > >> undoubtedly find this as you go along. Different microarray designs
> > >> often measure slightly different things.
> > >>
> > >> One other thing. Be sure to convert your CDF to binary, if it is not
> > >> already, using affxparser's convertCdf(). Having this info stored in
> > >> binary format will make the processing much faster. I think the MBNI
> > >> custom CDFs are text.
> > >>
> > >> Cheers,
> > >> Mark
> > >>
> > >> On 20/06/2009, at 6:55 AM, Steve P wrote:
> > >>
> > >>> Mark,
> > >>>
> > >>> Thanks for the information. That is very helpful.
> > >>>
> > >>> I want to do the latter, which is to "combine probesets such that all
> > >>> probes for a given gene (by some definition -- RefSeq, Ensembl, etc)
> > >>> are used to arrive at the summarized value."
> > >>>
> > >>> I was able to obtain a custom CDF for the U133-A array. So I will try
> > >>> that approach. But part of the reason I want to do this is to be able
> > >>> to compare values across platforms, so I may need to find/build a
> > >>> custom CDF for the other platform.
> > >>>
> > >>> I would appreciate any cautionary advice you have about summarizing at
> > >>> the gene level.
> > >>>
> > >>> Regards,
> > >>> -Steve
> > >>>
> > >>> On Jun 17, 9:56 am, Steve Piccolo <steve.picc...@gmail.com> wrote:
> > >>>> Yesterday I posted this question to the list, but the spam blocker
> > >>>> didn't let it through. Below my question is a response from Mark
> > >>>> Robinson.
> > >>>>
> > >>>> ---------------------------------------------------------------------------
> > >>>>
> > >>>> Following the example provided at
> > >>>> http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array
> > >>>> ...
> > >>>> ,
> > >>>> I am running the following code:
> > >>>>
> > >>>> chipType <- "HT_HG-U133A"
> > >>>> dataSet = "myData"
> > >>>>
> > >>>> library(aroma.affymetrix)
> > >>>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> > >>>>
> > >>>> cdf <- AffymetrixCdfFile$byChipType(chipType)
> > >>>> cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)
> > >>>>
> > >>>> bc <- RmaBackgroundCorrection(cs)
> > >>>> csBC <- process(bc,verbose=verbose)
> > >>>> qn <- QuantileNormalization(csBC)
> > >>>> csN <- process(qn, verbose=verbose)
> > >>>>
> > >>>> plm <- RmaPlm(csN)
> > >>>> fit(plm, verbose=verbose)
> > >>>>
> > >>>> ces <- getChipEffectSet(plm)
> > >>>> gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)
> > >>>>
> > >>>> This seems to be working beautifully.
> > >>>>
> > >>>> However, I'm doing an analysis that requires my expression values to
> > >>>> be summarized at the gene level rather than the probeset level.
> > >>>>
> > >>>> In the gExprs object that results from the above analysis, I get a
> > >>>> data.frame object in which each row contains expression values for a
> > >>>> given probeset across all samples. What I would love to see in each
> > >>>> row is an expression value for a given gene. I believe RMA has the
> > >>>> ability to do this, but I'm not sure how to do it via
> > >>>> aroma.affymetrix.
> > >>>>
> > >>>> Any suggestions? I'm happy to provide any more details that would be
> > >>>> helpful.
> > >>>>
> > >>>> Regards,
> > >>>> -Steve
> > >>>>
> > >>>> ---------------------------------------------------------------------------
> > >>>>
> > >>>> Hi Steve.
> > >>>>
> > >>>> As to your question, it depends on what you need.
> > >>>> When you say you want every row to be a gene, do you just want to
> > >>>> know the gene name that goes with the probeset identifier, or do you
> > >>>> want to combine probesets such that all probes for a given gene (by
> > >>>> some definition -- RefSeq, Ensembl, etc) are used to arrive at the
> > >>>> summarized value (a la the MBNI CustomCDF)?
> > >>>>
> > >>>> If the former, then there are annotation packages within R.
> > >>>>
> > >>>> If the latter, I have a few cautionary tales of doing this, since the
> > >>>> different probesets for a given locus can be measuring different
> > >>>> variants. But if you still want to do this, we need to make a CDF
> > >>>> file specific to the annotation you want. For the standard HG-U133
> > >>>> arrays, I know the MBNI guys made the CDFs and we could use those
> > >>>> within aroma.affymetrix. I don't know if they build custom CDFs for
> > >>>> the HT- arrays.
> > >>>>
> > >>>> Hope that gets you started....
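
Mark's cautionary tale (one differentially expressed probeset getting washed out when all probes for a gene are lumped together) can be illustrated with a toy example. This is only a sketch with made-up numbers: it uses base R's medpolish() as a stand-in for a robust probe-level summary, and it is not what aroma.affymetrix's RmaPlm does internally. The 44 probes, the 11-probe DE probeset, and the 2-unit log2 shift are all invented for illustration.

```r
# Toy data, not real array values: 44 probes for one gene on 2 arrays,
# on the log2 scale. Probes 1-11 stand for the single probeset that is
# differentially expressed (DE); probes 12-44 stand for three others.
affinity <- 6 + (0:43) %% 5                    # made-up probe affinities
y <- cbind(control = affinity, treated = affinity)
y[1:11, "treated"] <- y[1:11, "treated"] + 2   # shift the DE probes only

# Array-level log2 difference (treated - control) from a median polish fit
# with probes as rows and arrays as columns.
arrayDiff <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  unname(fit$col["treated"] - fit$col["control"])
}

probesetDiff <- arrayDiff(y[1:11, ])   # DE probeset summarized on its own
geneDiff     <- arrayDiff(y)           # all 44 probes lumped together

cat("DE probeset alone:", probesetDiff, "\n")
cat("Gene-level lump:  ", geneDiff, "\n")
```

In this toy setup the DE probeset on its own shows a difference of 2, while the gene-level summary shows 0, because the DE probes are only a quarter of the rows and the column medians ignore them. Real data are noisier and the effect will rarely be this clean, but the mechanism is the one Mark describes.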