Hi.

On Mon, Jul 11, 2011 at 5:33 AM, Anita <anita.grigoria...@kcl.ac.uk> wrote:
> Dear group,
>
> I would like to extract a data.frame on the exon level and followed
> these commands (see below), surprisingly the data.frame had these
> dimensions:
>
> 326983 249
>
> while I expected to have a data.frame with
> 1.4 Mill 249 samples
>
> Could you please advise me how I can extract the data for all exons
> from this analysis?

You did indeed get all the data. You are using a custom CDF which defines (only) 22035 units (here "transcripts"):

> cdf
AffymetrixCdfFile:
Path: annotationData/chipTypes/HuEx-1_0-st-v2
Filename: HuEx-1_0-st-v2,U-Ensembl49,G-Affy.cdf
Filesize: 44.04MB
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2560x2560
Number of cells: 6553600
Number of units: 22035
Cells per unit: 297.42
Number of QC units: 1

Each unit contains one or more groups, which here correspond to "exons". For instance, for unit #34 there are 16 groups:

> data <- readUnits(cdf, units=34);
> print(names(data));
[1] "ENSG00000003096"
> print(names(data[["ENSG00000003096"]]$groups));
 [1] "4019161" "4019162" "4019163" "4019164" "4019167"
 [6] "4019169" "4019170" "4019173" "4019174" "4019175"
[11] "4019176" "4019177" "4019179" "4019180" "4019196"
[16] "4019197"

In other words, the data frame that you get in the end *when using this particular CDF* will contain one row per group, that is, 22035 transcripts * <avg number of exons per transcript>, which here becomes 326983 exons. One way to get this count without processing all your data is:

> library("aroma.affymetrix");
> chipType <- "HuEx-1_0-st-v2";
> cdf <- AffymetrixCdfFile$byChipType(chipType,tags="U-Ensembl49,G-Affy");
> nbrOfGroupsPerUnit <- getUnitSizes(cdf);
> sum(nbrOfGroupsPerUnit);
[1] 326983

To conclude, it is important that you pick the custom CDF you want, and that you understand the objectives it was designed for. If the above about units and groups in Affymetrix CDFs is unclear, I recommend that you read up on that too (see for instance the affxparser package).

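To make the units/groups bookkeeping concrete, here is a minimal base-R sketch on toy data. The nested list only mimics the general shape seen in the readUnits() output above (a list of units, each with a 'groups' element). The helper makeUnit() and the names geneA, e1, etc. are made up for illustration; they are not part of aroma.affymetrix or affxparser.

```r
# Hypothetical helper: build a toy "unit" with one empty placeholder per group.
# In a real CDF each group would hold cell indices; here only the count matters.
makeUnit <- function(groupNames) {
  groups <- lapply(groupNames, function(g) list())
  names(groups) <- groupNames
  list(groups = groups)
}

# Three toy units ("transcripts") with 3, 1 and 2 groups ("exons").
units <- list(
  geneA = makeUnit(c("e1", "e2", "e3")),
  geneB = makeUnit(c("e4")),
  geneC = makeUnit(c("e5", "e6"))
)

# Number of groups per unit, and their total.
nbrOfGroupsPerUnit <- sapply(units, function(u) length(u$groups))
nbrOfRows <- sum(nbrOfGroupsPerUnit)

print(nbrOfGroupsPerUnit)
print(nbrOfRows)
```

In this toy case the total is 6, so an exon-level data frame built from these units would have 6 rows, not 3. The same reasoning explains why the row count follows the CDF's unit and group definitions rather than the number of probesets on the array.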
Hope this helps

Henrik

> Many thanks for your advise in advance,
>
> best wishes,
>
> Anita
>
> library(aroma.affymetrix)
> library(Biobase)
> library(limma)
> library(affy)
> library(biomaRt)
>
> ##############################################################################
> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> chipType <- "HuEx-1_0-st-v2"
> cdf <- AffymetrixCdfFile$byChipType(chipType,tags="U-Ensembl49,G-Affy")
> print(cdf)
>
> cs <- AffymetrixCelSet$byName("Affy_Exon_June2011", cdf=cdf)
> print(cs)
>
> setCdf(cs,cdf)
>
> bc <- RmaBackgroundCorrection(cs,tag="U-Ensembl49,G-Affy")
> csBC <- process(bc,verbose=verbose)
>
> qn <- QuantileNormalization(csBC,typesToUpdate="pm")
> print(qn)
>
> csN <- process(qn,verbose=verbose)
>
> getCdf(csN)
>
> plmEx <- ExonRmaPlm(csN,mergeGroups=FALSE)
> print(plmEx)
>
> fit(plmEx,verbose=verbose)
>
> cesEx <- getChipEffectSet(plmEx)
> exFit <- extractDataFrame(cesEx,units=NULL,addNames=TRUE)
> dim(exFit)
> #326983 - 249 - exon based data
> ##############################################################################
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.6.0 aroma.affymetrix_2.1.0 aroma.apd_0.1.8
> [4] affxparser_1.22.1 R.huge_0.2.2 aroma.core_2.1.0
> [7] aroma.light_1.20.0 matrixStats_0.2.2 R.rsp_0.5.3
> [10] R.cache_0.4.2 R.filesets_1.0.1 digest_0.5.0
> [13] R.utils_1.7.5 R.oo_1.8.0 affy_1.28.1
> [16] R.methodsS3_1.2.1 limma_3.6.9 Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0 preprocessCore_1.12.0 RCurl_1.6-6
> [4] tcltk_2.12.1 tools_2.12.1 XML_3.4-0
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/