Hi. On Wed, Feb 23, 2011 at 4:02 PM, Gregory W <greg.d.w...@gmail.com> wrote: > Hello, > > I was hoping to get some information about how to manipulate > AromaUnitTotalCnBinarySets. > > I've already performed CRMAv2 on all the affy platforms in my study. > > I load the results into an R session with: > > >> tags <- "ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" ; >> dsT <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, >> chipType=chipType) ; >> print(dsT) >> dsC <- list() ; >> dsC$total <- dsT ; > > > I have normal and tumor replicates for a particular patient. I grab > these data by: > >> normals.indx <- match(normal.names,getNames(dsC$total)) >> normals <- extract(dsC$total, normals.indx) > > And similarly for the tumors. This is what normals and tumors > contains: > >> normals > . AromaUnitTotalCnBinarySet: > . Name: MyStudy > . Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY > . Full name: MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY > . Number of files: 2 > . Names: Normal_Rep1, Normal_Rep2 > . Path (to the first file): totalAndFracBData/MyStudy,ACC,ra,- > XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6 > . Total file size: 14.35 MB > . RAM: 0.00MB > > >> tumors > . AromaUnitTotalCnBinarySet: > . Name: MyStudy > . Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY > . Full name: MyStudy,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY > . Number of files: 2 > . Names: Tumors_Rep1, Tumors_Rep2 > . Path (to the first file): totalAndFracBData/MyStudy,ACC,ra,- > XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6 > . Total file size: 14.38 MB > . RAM: 0.00MB > > > I see there's a function getAverageFile that allows me to do this: > >> normals.average <- getAverageFile(normals) >> normals <- newInstance(normals, files=list(normals.average)) > > and similarly for tumors. Then I can take these results and pass them > to CbsModel: > >> cbs <- CbsModel(tumors, normals) > > HOWEVER, even though my replicates are from the same sample, they were > processed using different protocols. So instead of averaging the > normals and then averaging the tumors, i would like to take the > respective ratios and then average the ratios. > > I can accomplish this in a round about way by using writeDataFrame and > loading it back into R and then manipulating the data. But then the > data isn't in the right format for CbsModel since its been ratio-ized > and is no longer two AromaUnitTotalCnBinarySets. > > 1- Is there a way to perform an average of the ratios and then pass > this into the CbsModel??
There is not-yet-public approach. The reason why it is "not yet public" is that the directory and/or file structure may be changed in the future. However, here is how it goes: Given your two AromaUnitTotalCnBinarySet:s, you can export total CN ratios for each tumor-normal pair as: dsT <- tumors; dsN <- normals; dsTCN <- exportTotalCnRatioSet(dsT, dsN, logBase=NULL, verbose=verbose); which will generate a new AromaUnitTotalCnBinarySet (stored under "rawCnData/") where the fullnames of the files have format GSM226867,ref=GSM226872,ratio,total (indicating that the TCN ratios have been calculated for GSM226867 with GSM226872 as the reference). Note that this will export TCN on the non-log scale (we avoid logarithms as far as possible in aroma, especially in the file formats!). In order to calculate the average of the T/N samples in this TCN data set, you can do: dfAvg <- getAverageFile(dsTCN, verbose=-10); Importantly, each TCN file i=1,2,...,I contains TCNs calculated as C_ij = T_ij / N_ij for loci j=1,2,...,J. This means that when you do the above, the average is calculated as C_j = median_i(C_ij) = median_i(T_ij / N_ij). If you want to calculate this on the log-ratio scale(*), you need to do: dfAvgLog <- getAverageFile(dsTCN, g=log2, h=function(y) 2^y, verbose=-10); which will calculate the average as C_j = h(median_i(g(C_ij)) = 2^(median_i(log2(C_ij)). Footnote: (*) Note that since we are using the median to calculate the average, the two approaches will indeed give identical results, cf. > diff <- dfAvg[,1] - dfAvgLog[,1] > summary(diff) 1 Min. :0 1st Qu.:0 Median :0 Mean :0 3rd Qu.:0 Max. :0 In other words, you don't have to bother with g=..., and h=... approach. Now we're close. The 'dfAvg' object is a file and it has a internal fullname, e.g. .average-signals-median-mad,72ad79d42f44e60321fc0ad2fbe30a59. To change this name, do: # Change the fullname (using a fullname translator) setFullName(dfAvg, "TumorVsNormal"); Next, before you can pass it to CbsModel, you have wrap the 'dfAvg' file up in a (single-file) data set, e.g. dsAvg <- newInstance(dsT, list(dfAvg)); Then do: # Segmentation model cbs <- CbsModel(dsAvg, tags="*,TCN,avg"); cbs$.calculateRatios <- FALSE; and so on. /Henrik [snip] > > Thank you! > Greg > > > > -- > When reporting problems on aroma.affymetrix, make sure 1) to run the latest > version of the package, 2) to report the output of sessionInfo() and > traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google Groups > "aroma.affymetrix" group with website http://www.aroma-project.org/. > To post to this group, send email to aroma-affymetrix@googlegroups.com > To unsubscribe and other options, go to http://www.aroma-project.org/forum/ > -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/