[aroma.affymetrix] Re: GWS_6 hg19
Dear Henrik, On May 27, 9:03 pm, Henrik Bengtsson wrote: > On Fri, May 27, 2011 at 4:02 AM, cstratowa > > wrote: > > Dear Henrik, > > > Thank you for this clarification. > > > Since I am still using an older version of aroma.affymetrix which does > > not contain parameter 'lengthRange' I cannot specify the fragment > > length range. > > Just curious, what's the reason for not updating? It's should be a > matter of seconds. Due to our Firewall settings we have to download all packages manually. Furthermore, we have to install the packages on all of our servers and on the cluster, which I am using to distribute the GLAD jobs. > > > However, at the moment I do not see any disadvantage in > > not specifying lengthRange. When I compare the the CN ranges obtained > > with GLAD between na30 and na31 there are only slight differences due > > to the different annotations. What is your opinion? > > The UGP file that contain genomic positions are probably the same. > > If you are using the FragmentLengthNormalization with the na31 UFLs > provided online, I'm rather sure that you do need to set 'lengthRange' > otherwise you'll get an "identifiability" error, cf. thread 'CRMA v2 > failed at FragmentLengthNormalization for Mouse diversity array' > started on 2011-01-13: > > https://groups.google.com/forum/#!topic/aroma-affymetrix/iRJ7Fggi0JY > > Details on the fragment-lengths in na31 UFL provided online: > > > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6,Full"); > > ufl <- getAromaUflFile(cdf); > > ufl > > AromaUflFile: > Name: GenomeWideSNP_6 > Tags: Full,na31,hg19,HB20110328 > Full name: GenomeWideSNP_6,Full,na31,hg19,HB20110328 > Pathname: > annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl > File size: 7.18 MB (7526452 bytes) > RAM: 0.00 MB > Number of data rows: 1881415 > File format: v1 > Dimensions: 1881415x2 > Column classes: integer, integer > Number of bytes per column: 2, 2 > Footer: 20110328 19:41:33 > PDTAffymetrix rm>GenomeWideSNP_6,FullHenrik > Bengtsso > n[...]< > filename>GenomeWideSNP_6,Full.CDF493291745 sum>3fbe0f6e7c8a346105238a3f3d10d4ecGe > nomeWideSNP_6.na31.annot.csv1501563495 > 3f59efe386ffc1af656d01be0ab035e9Genome > WideSNP_6.cn.na31.annot.csv8792574806a > 21fef62c71fb56bc5dda4499b70a7f > Chip type: GenomeWideSNP_6,Full > Platform: Affymetrix > > > summaryOfUnits(ufl) > > snp cnp affxSnp other total > enzyme1-only 577 9609 0 0 10186 > enzyme2-only 502 23432 0 0 23934 > both 929322 912574 0 0 1841896 > missing 1545 211 3022 621 5399 > total 931946 945826 3022 621 1881415 > > The above summary show that there are very few SNPs that exists only > fragments generated by only one of the two enzymes. The estimator of > the fragment-length normalization model requires such SNPs. By > filtering by length using argument 'lengthRange' more and more SNPs > will have a length value on one enzyme only. Although I have updated my package to be able to use CRMAv2, I am still using CRMA for the SNP6.0 arrays, i.e. correcting for allelic cross-talk and for fragment length effects. The reason for this is that I did some testing by comparing CRMAv2 and CRMA for SNP6.0 with real datasets, and the only difference that I saw when looking e.g. at ChromosomeExplorer was that there was a slight decrease in CN values when using CRMAv2. Thus I preferred to stay with CRMA. I am not sure if the fragment length correction for SNP6.0 depends on my choice of using CRMA instead of CRMAv2. In any case not filtering for fragment length does not seem to affect my CN computations. Do you have any comments? Best regards Christian > > Hope this clarifies > > /Henrik > > > > > Best regards > > Christian > > > On May 26, 11:01 pm, Henrik Bengtsson > project.org> wrote: > >> Hi. > > >> On Thu, May 26, 2011 at 4:05 AM, cstratowa > > >> wrote: > >> > Dear Henrik, > > >> > Thank you for the update. > > >> > Meanwhile I have already created the ufl/ugp files by simply modifying > >> > your R-scripts GenomeWideSNP_6,UFL,na30.R and > >> > GenomeWideSNP_6,UGP,na30.R. > > >> > Thus, my question is: Did you do something special or do you still use > >> > these R-scripts? > > >> Nope, nothing special, except that I made sure that the file footer > >> reflects the fact that the date and my initials are correct. > > &g
[aroma.affymetrix] Re: GWS_6 hg19
Dear Henrik, Thank you for this clarification. Since I am still using an older version of aroma.affymetrix which does not contain parameter 'lengthRange' I cannot specify the fragment length range. However, at the moment I do not see any disadvantage in not specifying lengthRange. When I compare the the CN ranges obtained with GLAD between na30 and na31 there are only slight differences due to the different annotations. What is your opinion? Best regards Christian On May 26, 11:01 pm, Henrik Bengtsson wrote: > Hi. > > On Thu, May 26, 2011 at 4:05 AM, cstratowa > > wrote: > > Dear Henrik, > > > Thank you for the update. > > > Meanwhile I have already created the ufl/ugp files by simply modifying > > your R-scripts GenomeWideSNP_6,UFL,na30.R and > > GenomeWideSNP_6,UGP,na30.R. > > > Thus, my question is: Did you do something special or do you still use > > these R-scripts? > > Nope, nothing special, except that I made sure that the file footer > reflects the fact that the date and my initials are correct. > > One more thing specific to the na30 -> na31 transition: Affymetrix no > longer filter the PCR fragment-length data, which means that the new > UFL files now contains fragment lengths of a large range. You now > have to specify the range as FragmentLengthNormalization(..., > lengthRange=c(400,2000)). > > > > > BTW, on your new web-site I can no longer find these scripts. Are they > > still available? > > Yes, I did remove the links from the website because it is too much > work to also keep those links up-to-date with everything else. > Instead, I have added the archive of all build scripts (sorted by chip > type and NA revision) to the package itself so that they will be > accessible via: > > path <- system.file("buildScripts/", package="aroma.affymetrix"); > > This will be available in the next release of aroma.affymetrix. Until > then, you can also grab/browse them via: > > https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/ins... > > As you'll see, that for GenomeWideSNP_6 I am moving towards setting up > somewhat interactive user scripts for doing this. You may have to > install the R.menu package, i.e. > > source("http://aroma-project.org/hbLite.R";); > hbLite("R.menu"); > > Hope this helps > > Henrik > > > > > Best regards > > Christian > > > On May 26, 3:36 am, Henrik Bengtsson > project.org> wrote: > >> A late follow up/to close this thread: > > >> The requested na31/hg19 annotation files are now available: > > >>http://aroma-project.org/chipTypes/GenomeWideSNP_6 > > >> /Henrik > > >> On Mon, Mar 28, 2011 at 6:48 AM, cstratowa > > >> wrote: > >> > Dear Henrik, > > >> > I am also looking for the new (old) na31 annotation files for SNP6. > > >> > I have seen that for the Nsp/Sty arrays you have put the na31 (hg19) > >> > ufl/ugp files on the Chip types web-site for download. > > >> > However, the web-site for SNP6 still contains the old na30 (hg18) ufl/ > >> > ugp files. > >> > Is it possible to update this site to the new na31 (hg19) ufl/ugp > >> > files, too? > > >> > Best regards > >> > Christian > > >> > On Feb 2, 9:39 am, Mete Civelek wrote: > >> >> Hi All, > > >> >> A few weeks ago there was a discussion about ugp and ufl files based > >> >> on the hg19 annotation. Are those files available for share? > > >> >> Thank you > > >> > -- > >> > When reporting problems on aroma.affymetrix, make sure 1) to run the > >> > latest version of the package, 2) to report the output of sessionInfo() > >> > and traceback(), and 3) to post a complete code example. > > >> > You received this message because you are subscribed to the Google > >> > Groups "aroma.affymetrix" group with > >> > websitehttp://www.aroma-project.org/. > >> > To post to this group, send email to aroma-affymetrix@googlegroups.com > >> > To unsubscribe and other options, go > >> > tohttp://www.aroma-project.org/forum/ > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the latest > > version of the package, 2) to report the output of sessionInfo() and > > traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google Groups > > "aroma.affymetrix" group with websitehttp://www.aroma-project.org/. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe and other options, go tohttp://www.aroma-project.org/forum/ -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] Re: GWS_6 hg19
Dear Henrik, Thank you for the update. Meanwhile I have already created the ufl/ugp files by simply modifying your R-scripts GenomeWideSNP_6,UFL,na30.R and GenomeWideSNP_6,UGP,na30.R. Thus, my question is: Did you do something special or do you still use these R-scripts? BTW, on your new web-site I can no longer find these scripts. Are they still available? Best regards Christian On May 26, 3:36 am, Henrik Bengtsson wrote: > A late follow up/to close this thread: > > The requested na31/hg19 annotation files are now available: > > http://aroma-project.org/chipTypes/GenomeWideSNP_6 > > /Henrik > > On Mon, Mar 28, 2011 at 6:48 AM, cstratowa > > wrote: > > Dear Henrik, > > > I am also looking for the new (old) na31 annotation files for SNP6. > > > I have seen that for the Nsp/Sty arrays you have put the na31 (hg19) > > ufl/ugp files on the Chip types web-site for download. > > > However, the web-site for SNP6 still contains the old na30 (hg18) ufl/ > > ugp files. > > Is it possible to update this site to the new na31 (hg19) ufl/ugp > > files, too? > > > Best regards > > Christian > > > On Feb 2, 9:39 am, Mete Civelek wrote: > >> Hi All, > > >> A few weeks ago there was a discussion about ugp and ufl files based > >> on the hg19 annotation. Are those files available for share? > > >> Thank you > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the latest > > version of the package, 2) to report the output of sessionInfo() and > > traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google Groups > > "aroma.affymetrix" group with websitehttp://www.aroma-project.org/. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe and other options, go tohttp://www.aroma-project.org/forum/ -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] Re: GWS_6 hg19
Dear Henrik, I am also looking for the new (old) na31 annotation files for SNP6. I have seen that for the Nsp/Sty arrays you have put the na31 (hg19) ufl/ugp files on the Chip types web-site for download. However, the web-site for SNP6 still contains the old na30 (hg18) ufl/ ugp files. Is it possible to update this site to the new na31 (hg19) ufl/ugp files, too? Best regards Christian On Feb 2, 9:39 am, Mete Civelek wrote: > Hi All, > > A few weeks ago there was a discussion about ugp and ufl files based > on the hg19 annotation. Are those files available for share? > > Thank you -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Dear Henrik, Thank you, this discussion was very helpful for me. Your suggestion to compute getAverageFile() first is a good idea, I will try it when I find time to change my code again. It is also good to know that I can change ".Rcache" by using setCacheRootPath(). Best regards Christian On Aug 4, 10:21 am, Henrik Bengtsson wrote: > Hi Christian, > > On Wed, Aug 4, 2010 at 9:04 AM, cstratowa > > wrote: > > Dear Henrik, > > > Thank you for your suggestion to use ceRef directly. > > > Regarding your explanation of getAverageFile() the question is where > > the generated output will be saved. > > > As I have mentioned, each node creates first a plmData subdirectory, > > e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the > > normalized CEL-files located in "Prostate/plmData". Thus the output of > > getAverageFile() should be stored for each node separately. > > Ah, now I see; I've been reading it as you were linking the > directories, not the individual CEL files. > > > > > This seems indeed to be the case, since e.g. the subdirectory > > "Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/ > > Mapping250K_Nsp" contains the file ".average-intensities-median- > > mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a certain time > > while subdirectory "Prostate/Prostate8/plmData/Prostate,ACC,- > > XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp" contains a file with the same > > name, i.e. ".average-intensities-median- > > mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a different > > time. > > Yes. > > As I understand it now, you preprocess all of the data, and wait for > everything to be done (all *,chipEffects.CEL files to be generated) > before continuing with the above, correct? If so, I'd suggest that > you also wait for getAverageFile() to finish first. Then that average/ > results file be available to all your cluster nodes as well. I even > think you don't have to link each CEL file separately, because nothing > else should be written back to the data set. It should be enough to > link each data set directory, or even just plmData/ itself (not even > sure the need to split it up anymore). > > > > > As far as I understand these are the files created by getAverageFile() > > and thus each node creates its own file saved in its own subdirectory, > > so there will be no problem. > > Yes. Now I agree with you. > > > > > It seems that the problem was indeed the result of saveObject() stored > > in ".Rcache", which caused the race conditions. Since the removal of > > saveObject() I have until now experienced no problems. > > Yes. You are correct. > > Since caching is mainly done for memoization purposes, that is, to > load already calculated results that are computational expensive to > obtain from file, it is recommended to store the cache in a fast > place. In other words, it is better if the .Rcache directory is on > the local drive of the machine, rather than on a shared file system. > If you had done that, then each machine would had to have do those > calculations by themselves once, but when done the memoization would > be faster and you would not have had any race conditions accessing the > memoized results. The default ~/.Rcache/ can be changed, > cf.http://www.aroma-project.org/archive/GoogleGroups/web/caching. > > This was a useful conversation to me; it made me see other ways for > (unnecessary) race conditions to occur, and remind me how important it > is to not overlook the smallest details in scientific communication > since they can make big differences. > > Cheers, > > Henrik > > > > > Thank you for your help. > > Best regards > > Christian > > > On Aug 2, 2:54 pm, Henrik Bengtsson wrote: > > > > Hi. > > > > On Mon, Jul 26, 2010 at 12:00 PM, cstratowa > > > > wrote: > > > > Dear Henrik, > > > > > Maybe, my explanation was not clear enough: > > > > > I have created my own package based on S4 classes, where one subclass > > > > is "AromaSNP" with slots celset, normset, plmset, effectset as lists, > > > > and methods readSNPData(), normalizeSNPData(), computeCN(), > > > > computeRawCN(), among others. Furthermore, the package includes > > > > scripts batch.aroma.norm.R, batch.aroma.model.R, > > > > batch.aroma.combine.R, and a perl script which distributes these > > > > scripts to the different cluster nodes. > > > > > 1, Normalization: Script batch.
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Dear Henrik, Thank you for your suggestion to use ceRef directly. Regarding your explanation of getAverageFile() the question is where the generated output will be saved. As I have mentioned, each node creates first a plmData subdirectory, e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the normalized CEL-files located in "Prostate/plmData". Thus the output of getAverageFile() should be stored for each node separately. This seems indeed to be the case, since e.g. the subdirectory "Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/ Mapping250K_Nsp" contains the file ".average-intensities-median- mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a certain time while subdirectory "Prostate/Prostate8/plmData/Prostate,ACC,- XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp" contains a file with the same name, i.e. ".average-intensities-median- mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a different time. As far as I understand these are the files created by getAverageFile() and thus each node creates its own file saved in its own subdirectory, so there will be no problem. It seems that the problem was indeed the result of saveObject() stored in ".Rcache", which caused the race conditions. Since the removal of saveObject() I have until now experienced no problems. Thank you for your help. Best regards Christian On Aug 2, 2:54 pm, Henrik Bengtsson wrote: > Hi. > > On Mon, Jul 26, 2010 at 12:00 PM, cstratowa > > > > wrote: > > Dear Henrik, > > > Maybe, my explanation was not clear enough: > > > I have created my own package based on S4 classes, where one subclass > > is "AromaSNP" with slots celset, normset, plmset, effectset as lists, > > and methods readSNPData(), normalizeSNPData(), computeCN(), > > computeRawCN(), among others. Furthermore, the package includes > > scripts batch.aroma.norm.R, batch.aroma.model.R, > > batch.aroma.combine.R, and a perl script which distributes these > > scripts to the different cluster nodes. > > > 1, Normalization: Script batch.aroma.norm.R creates first the > > subdirectory structure which I have already described, and then does > > the normalization. All normalization steps run on one server and the > > results are saved as AromaSNP object "aroma" in "Prostate/ > > Prostate.Rdata". Furthermore, subdirectories "Prostate/probeData" and > > "Prostate/plmData" are created. > > > 2, GLAD: Script batch.aroma.norm.R is called from each node > > separately. For each node it creates first a plmData subdirectory, > > e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the > > normalized CEL-files located in "Prostate/plmData". Then it loads > > object "aroma" from "Prostate/Prostate.Rdata", whereby each node has a > > separate RAM of 2GB. Slot "ar...@effectset" contains the normalized > > data and is called from computeCN() as "cesList <- ar...@effectset". > > This "cesList" (which is in the RAM of each node) is passed to "model > > <- GladModel(cesList, refList)", and is thus used to compute > > getAverageFile(), if refList=NULL (which is the default). Since each > > node calls the same "cesList" object, function saveObject() writes to > > identical filenames which causes the clash. > > > Here is the relevant code for each cluster node: > > > cesList <- ar...@effectset > > refList <- NULL #but see below > > model <- GladModel(cesList, refList) > > ce <- ChromosomeExplorer(model, tags=tags) > > setParallelSafe(ce, status=TRUE) #suggested by you long time ago > > tmp <- process(ce,...) > > cnrs <- getRegions(model, ...) > > ar...@cnregion <- cnrs > > > Each node saves the result, i.e. object "aroma", in a different Rdata > > file, e.g. "Prostate/Prostate.GLAD.21.Rdata" and saves the images in > > its own reports subdirectory, e.g. "Prostate/Prostate21/reports". > > > 3, Script batch.aroma.combine.R combines the results of each Rdata > > file in a final file "Prostate/Prostate.GLAD.All.Rdata", combines all > > images from the different subdirectories in "Prostate/reports", and > > finally deletes all temporary Rdata files and subdirectories. > > > Since you mention that I should call getAverageFile() before calling > > GladModel(), here is one more information: > > All necessary information is stored in a text-file "pheno.txt", which > > is read initially and contains the names of the CEL-files, the > > l
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Dear Henrik, Maybe, my explanation was not clear enough: I have created my own package based on S4 classes, where one subclass is "AromaSNP" with slots celset, normset, plmset, effectset as lists, and methods readSNPData(), normalizeSNPData(), computeCN(), computeRawCN(), among others. Furthermore, the package includes scripts batch.aroma.norm.R, batch.aroma.model.R, batch.aroma.combine.R, and a perl script which distributes these scripts to the different cluster nodes. 1, Normalization: Script batch.aroma.norm.R creates first the subdirectory structure which I have already described, and then does the normalization. All normalization steps run on one server and the results are saved as AromaSNP object "aroma" in "Prostate/ Prostate.Rdata". Furthermore, subdirectories "Prostate/probeData" and "Prostate/plmData" are created. 2, GLAD: Script batch.aroma.norm.R is called from each node separately. For each node it creates first a plmData subdirectory, e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the normalized CEL-files located in "Prostate/plmData". Then it loads object "aroma" from "Prostate/Prostate.Rdata", whereby each node has a separate RAM of 2GB. Slot "ar...@effectset" contains the normalized data and is called from computeCN() as "cesList <- ar...@effectset". This "cesList" (which is in the RAM of each node) is passed to "model <- GladModel(cesList, refList)", and is thus used to compute getAverageFile(), if refList=NULL (which is the default). Since each node calls the same "cesList" object, function saveObject() writes to identical filenames which causes the clash. Here is the relevant code for each cluster node: cesList <- ar...@effectset refList <- NULL #but see below model <- GladModel(cesList, refList) ce <- ChromosomeExplorer(model, tags=tags) setParallelSafe(ce, status=TRUE) #suggested by you long time ago tmp <- process(ce,...) cnrs <- getRegions(model, ...) ar...@cnregion <- cnrs Each node saves the result, i.e. object "aroma", in a different Rdata file, e.g. "Prostate/Prostate.GLAD.21.Rdata" and saves the images in its own reports subdirectory, e.g. "Prostate/Prostate21/reports". 3, Script batch.aroma.combine.R combines the results of each Rdata file in a final file "Prostate/Prostate.GLAD.All.Rdata", combines all images from the different subdirectories in "Prostate/reports", and finally deletes all temporary Rdata files and subdirectories. Since you mention that I should call getAverageFile() before calling GladModel(), here is one more information: All necessary information is stored in a text-file "pheno.txt", which is read initially and contains the names of the CEL-files, the location of each CEL-file, an alias name for each CEL-file, and optionally columns "Reference" or "Pairs". By default "refList <- NULL", however, when I activate option "Reference" then all CEL-files with "1" in column "Reference" are used as reference. Here is the relevant code: cesList <- ar...@effectset refList <- list(); for (chiptype in names(cesList)) { ces <- cesList[[chiptype]]; refcol <- which(pheno[,reference] == 1); datcol <- which(pheno[,reference] == 0); if (reference == "Pairs") { cesList[[chiptype]] <- extract(ces, datcol); refList[[chiptype]] <- extract(ces, refcol); } else { cesRef <- extract(ces, refcol); ceRef <- getAverageFile(cesRef); ## convert single ceRef file into a set of identical files of same size/length as ces numarr <- nbrOfArrays(ces); cesRef <- rep(list(ceRef), numarr); cesRef <- newInstance(ces, cesRef, mergeStrands=ces $mergeStrands, combineAlleles=ces$combineAlleles); refList[[chiptype]] <- cesRef; }#if }#for model <- GladModel(cesList, refList); Thus, if really necessary, I can always create "pheno.txt" with Reference column "1" for all CEL-files in order to call getAverageFile(). However, I must admit that it is still not clear to me what the advantage would be, especially now that you have removed function saveObject(). I hope that this explanation could explain better what the different steps are. Best regards Christian On Jul 23, 4:35 pm, Henrik Bengtsson wrote: > Hi. > > On Jul 22, 10:24 am, cstratowa > > > ingelheim.com> wrote: > > Dear Henrik, > > > Thank you very much for changing the code for getAverageFile(), I will > > try it and let you know. > > > Thank you also for the explanation of writing to a temporary file, now > > I understand your intention. > > > Regarding race conditions: No, I do not assume that aroma.* takes care &
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Dear Henrik, Thank you very much for changing the code for getAverageFile(), I will try it and let you know. Thank you also for the explanation of writing to a temporary file, now I understand your intention. Regarding race conditions: No, I do not assume that aroma.* takes care of potential race conditions. Here is what I do: Assume that I have downloaded from GEO a prostate cancer dataset consisting of 40 CEL-files. Then I create a directory "Prostate" and subdirectories "Prostate/annotationData" and "Prostate/rawData" following your required file structure. However, starting with the 2nd CEL-file I create subdirectories "Prostate/Prostate2",...,"Prostate/Prostate40", each containing a symbolic link to "../annotationData" and "../rawData" from "Prostate". Thus when running GLAD each cluster node has its own directory to write to, e.g. "Prostate/Prostate21/reports" for creating the images. Only after all nodes have finished their computations, then I move the relevant files to the main directory, e.g. all images are moved to "Prostate/reports". Afterwards I delete the subdirectories "Prostate2",...,"Prostate40" and their contents. As you can see, using this setup there should not be any race conditions. The only remaining problem are the temporary files which you store in ".Rcache" in my home directory. I know that you store the monocell files in ".Rcache/ aroma.affymetrix", so that the monocell files have to be created only once. However, for the temporary files please allow me to suggest that you create a temporary directory in your file structure, e.g. "Prostate/tmp", where these files are stored. In my case this would definitely solve my problem since each subdirectory would contain its own temporary directory, e.g. "Prostate/Prostate21/tmp". I do not know if this change would break any code or cause any problems, it is only a naive suggestion. What is your opinion? Best regards Christian On Jul 21, 6:46 pm, Henrik Bengtsson wrote: > Hi Christian. > > On Wed, Jul 21, 2010 at 2:59 PM, cstratowa > > > > wrote: > > Dear Henrik, > > > Thank you for this extensive explanation and sorry for the late reply > > but I was pretty busy. > > > Yes, it did work before! As I mentioned with versions > > aroma.affymetrix_1.1.0 and earlier I have never had a problem doing > > the analyses on cluster nodes. > > > Looking at the source code of different versions of saveObject() I > > realize that using "saveObject(..,safe=FALSE)" would be the same as > > using saveObject() from R.utils_0.9.1. Thus in principle this could > > solve my problem. Is this correct? > > > Sadly, method AffymetrixCelSet::getAverageFile() in > > aroma.affymetrix_1.6.2 does not allow to pass parameter "safe=FALSE" > > to saveObject(). Is it possible for you to change it? > > I have decided to remove that debug code that calls saveObject(), > because it is not really needed anymore. The main reason why I remove > it is because it is obsolete code. The intention of that code snippet > in getAverageFile() was never to protect against race conditions (it > was just an unplanned side effect). > > Until next release, you can get a patched version as: > > library("aroma.affymetrix"); > downloadPackagePatch("aroma.affymetrix"); > > Note, as I said in my previous reply, by processing (=here calling > getAverageFile() on) the same data set on multiple hosts, you are > potentially running into race conditions resulting in corrupt data. > You should at least be aware of it and understand why this is the > case. > > > > > It is still not clear to me why you create first a temporary file > > which you then rename (although you mention power failures etc). > > However, would it be possible to add a random number to the temporary > > filename, e.g. "*.tmp.1948234", so that the problem with the existing > > temporary file could be avoided? > > The main purpose of writing to a temporary file and then renaming is > to make sure that the file is complete. If something happens while > writing the temporary file, the final file will not exist/be created. > If one would write to the final file from the beginning, there is no > way for us to know if the file was correctly created or not. So, > writing via a temporary file, we effectively have a way of creating > files in one atomic action. > > > > > Probably you only need to change line 59 to: > > > pathnameT <- sprintf("%s.tmp.%i", pathname, > > as.integer(runif(1,1,))) > > In order not to corrupt the temporary f
[aroma.affymetrix] Re: Problem with GLAD on linux cluster
Dear Henrik, Thank you for this extensive explanation and sorry for the late reply but I was pretty busy. Yes, it did work before! As I mentioned with versions aroma.affymetrix_1.1.0 and earlier I have never had a problem doing the analyses on cluster nodes. Looking at the source code of different versions of saveObject() I realize that using "saveObject(..,safe=FALSE)" would be the same as using saveObject() from R.utils_0.9.1. Thus in principle this could solve my problem. Is this correct? Sadly, method AffymetrixCelSet::getAverageFile() in aroma.affymetrix_1.6.2 does not allow to pass parameter "safe=FALSE" to saveObject(). Is it possible for you to change it? It is still not clear to me why you create first a temporary file which you then rename (although you mention power failures etc). However, would it be possible to add a random number to the temporary filename, e.g. "*.tmp.1948234", so that the problem with the existing temporary file could be avoided? Probably you only need to change line 59 to: pathnameT <- sprintf("%s.tmp.%i", pathname, as.integer(runif(1,1,))) Regarding your suggestion to wrap getAverageFile() in Mutex calls I have no idea if there exists an R-package for this purpose. Neither Rmpi nor snow seem to be suitable for this purpose (at least not without a complete re-write of my package). One other question: Is it allowed to delete the contents of directory .Rcache/ aroma.affymetrix/idChecks? Best regards Christian On Jul 2, 12:47 am, Henrik Bengtsson wrote: > Hi Christian. > > On Tue, Jun 29, 2010 at 3:39 PM, cstratowa > > > > wrote: > > Dear Henrik, > > > Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could > > run my analysis on our sge-cluster w/o any problems. > > > Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are > > curently testing with 8 chips whether my package based on > > aroma.affymetrix still works on the cluster. The normalization step on > > a server did run fine, howeever, distributing the 8 samples on the > > cluster to run GladModel() resulted in the problem that 3 of 8 cluster > > nodes did stop with the following error message: > > > Loading required package: GLAD > > ... > > Loading required package: RColorBrewer > > Loading required package: Cairo > > Error in list(`computeCN(aroma, model = model, arrays = arrays[i], > > chromosomes = 1:23, ref` = , : > > > [2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file > > already exists: ~/.Rcache/aroma.affymetrix/idChecks/ > > a1c33926939ee43fbed83ae69301d215.tmp > > at throw(Exception(...)) > > at throw.default("Cannot save to file. Temporary file already > > exists: ", pathn > > at throw("Cannot save to file. Temporary file already exists: ", > > pathnameT) > > at saveObject.default(list(key = key, keyIds = lapply(key, digest2), > > id = id), > > at saveObject(list(key = key, keyIds = lapply(key, digest2), id = > > id), idPathn > > at getAverageFile.AffymetrixCelSet(ces, force = force, verbose = > > less(verbose) > > at NextMethod(generic = "getAverageFile", object = this, indices = > > indices, .. > > at getAverageFile.ChipEffectSet(ces, force = force, verbose = > > less(verbose)) > > at NextMethod(generic = "getAverageFile", object = this, ...) > > at getAverageFile.SnpChipEffectSet(ces, force = force, verbose = > > less(verbose) > > at NextMethod(generic = "getAverageFile", object = this, ...) > > at getAverageFile.CnChipEffectS > > Calls: computeCN ... saveObject.default -> throw -> throw.default -> > > throw -> throw.Exception > > Execution halted > > > Interestingly, on the other 5 nodes GladModel() seems to run fine. > > > Do you have any idea what the reason for this problem might be? > > This seems to be due to a race condition, because several processes > calls getAverageFile() on the same data set (set of data files). It > has nothing to do with the GladModel - that is only calling > getAverageFile() in order to calculate the average signal across all > samples in the data set. > > More precisely, in this particular case it is saveObject() of R.utils > that detects that there already exist a temporary file (added file > name extension *.tmp) that is currently being created and written to > by another process. This temporary file is renamed to its final name > when done. The reason why didn't observe it before is most likely > because this additional feature was added to saveObject() in R.utils > v1.2.4: > > Version: 1.2.4 [2009-10-30] > o ROBUSTIFICATION: Lowered
[aroma.affymetrix] Problem with GLAD on linux cluster
Dear Henrik, Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could run my analysis on our sge-cluster w/o any problems. Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are curently testing with 8 chips whether my package based on aroma.affymetrix still works on the cluster. The normalization step on a server did run fine, howeever, distributing the 8 samples on the cluster to run GladModel() resulted in the problem that 3 of 8 cluster nodes did stop with the following error message: Loading required package: GLAD ... Loading required package: RColorBrewer Loading required package: Cairo Error in list(`computeCN(aroma, model = model, arrays = arrays[i], chromosomes = 1:23, ref` = , : [2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file already exists: ~/.Rcache/aroma.affymetrix/idChecks/ a1c33926939ee43fbed83ae69301d215.tmp at throw(Exception(...)) at throw.default("Cannot save to file. Temporary file already exists: ", pathn at throw("Cannot save to file. Temporary file already exists: ", pathnameT) at saveObject.default(list(key = key, keyIds = lapply(key, digest2), id = id), at saveObject(list(key = key, keyIds = lapply(key, digest2), id = id), idPathn at getAverageFile.AffymetrixCelSet(ces, force = force, verbose = less(verbose) at NextMethod(generic = "getAverageFile", object = this, indices = indices, .. at getAverageFile.ChipEffectSet(ces, force = force, verbose = less(verbose)) at NextMethod(generic = "getAverageFile", object = this, ...) at getAverageFile.SnpChipEffectSet(ces, force = force, verbose = less(verbose) at NextMethod(generic = "getAverageFile", object = this, ...) at getAverageFile.CnChipEffectS Calls: computeCN ... saveObject.default -> throw -> throw.default -> throw -> throw.Exception Execution halted Interestingly, on the other 5 nodes GladModel() seems to run fine. Do you have any idea what the reason for this problem might be? > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-unknown-linux-gnu locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Cairo_1.4-5RColorBrewer_1.0-2 GLAD_2.10.0 [4] biasnp_0.2.54 RODBC_1.3-1 aroma.affymetrix_1.6.2 [7] aroma.apd_0.1.7affxparser_1.20.0 R.huge_0.2.0 [10] aroma.core_1.6.2 aroma.light_1.16.0 matrixStats_0.2.1 [13] R.rsp_0.3.6R.cache_0.3.0 R.filesets_0.8.2 [16] digest_0.4.2 R.utils_1.4.2 R.oo_1.7.3 [19] R.methodsS3_1.2.0 loaded via a namespace (and not attached): [1] tools_2.11.1 Warning message: 'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible Best regards Christian P.S.: Is Google groups still the place to post questions? -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] Re: Question for custom CDF of ST-Array
Dear Henrik I have sent this offline to Affymetrix DevNet, so you will not find any note. I have already done this once for the MoGene and RaGene arrays where Affymetrix has deleted one AFFX control in the PGF-files but not in the corresponding annotation files, see: https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-August/029049.html However, this time they have added a few hundred probesets to the annotation file which are not present in the PGF-file, see their README file for the updated HuGene annotation file. BTW, I really hope that Affymetrix (and you) will soon update the annotation files for the SNP arrays to hg19, i.e. human genome build 37. Best regards Christian On Feb 8, 11:09 am, Henrik Bengtsson wrote: > Hi. > > On Mon, Feb 1, 2010 at 9:02 AM, cstratowa > > > > wrote: > > Dear Branko, > > > I have just seen your comment that you have an issue with XPS, which > > is my BioC package. However, you have never contacted me at > > nor have sent a mail to the BioC mailing list. I > > would really appreciate that you contact me first before making such a > > statement. > > > Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv, > > (which could be the reason of your problem with xps), Affymetrix had > > included probesets in this annotation file which were not included in > > the corresponding *.PGF file. For this reason I have contacted > > Affymetrix DevNet and asked them to remove the extra probesets, which > > they kindly did resulting in file HuGene-1_0-st- > > v1.na30.1.hg19.probeset._csv. > > Thanks for this note/update. Did you send this offline to Affymetrix > or did you use their Affymetrix Scientific Community Forums > [https://www.affymetrix.com/community/forums/index.jspa]? I could > find it in the latter, but there are quite a few forums too. > > Cheers, > > Henrik > > > > > Best regards > > Christian > > > On Jan 28, 4:58 pm, branko wrote: > >> Hi all, > > >> First to thank you Mark , Henrik for nice tool and also all others > >> for informative mailing list. > >> This is my first email so hopefully not so many dummy questions. > > >> I found this thread as it is linked to my first 2 questions: > >> Month ago I did some basic analysis with 300 chips where your tool > >> got very handy. And I'm starting again to analyze things ... > >> (Btw. I found about Aroma on BioC list as I had issue with XPS > >> package that I couldn’t resolve . I would like to have them both > >> working so Xps will come on board eventually.) > > >> Now regarding this tread. > >> 1) When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv. with > >> the Aroma CDF file there are > >> 33257 PSets in csv while Aroma has 33252 so 4 different. So > >> maybe the CDF update is needed in Aroma? > >> I also just saw some more recent update on Affymetrix to “na30.1” > >> and I see I have used “na30” csv file. > > >> Following seems weird , many ProbeSets have no gene symbol > >> annotation in “main“ category : > > >> tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" , > >> stringsAsFactors =F ,comment.char="#") > > >> sum(tr$gene_assignment=="---" & tr$category=="main") #6711 > >> sum(tr$unigene=="---" & tr$category=="main") #7376 > >> sum(tr$swissprot=="---" & tr$category=="main") # 8021 > > >> Affy 33257 Probesets are classified as: > >> TYPE TOTAL > >> control->affx 57 > >> control->bgp->antigenomic 45 > >> main 28829 > >> normgene->exon 1195 > >> normgene->intron 2904 > >> rescue->FLmRNA->unmapped 227 > > >> Do you maybe know why are there so many e.g above 6711 without > >> gene_assignment ? > > >> 2) One more related to 1st Q : How to run normalization only on > >> ProbeSet classified as “main” ? ( Affymetrix csv file says this > >> class has 28829 > >> Psets in main class) Idea is of course to minimize > >> normalization bias of non-main class Psets. > > >> 3) Few questions regarding Quality checks and basic data > >> manipulations in Aroma: > >> I would like to give meaningful labels to the Arrays in the e.g. > >> box-plots (instead of CEL file names, as said i have 300) . >
[aroma.affymetrix] Re: Question for custom CDF of ST-Array
Dear Henrik I have sent this offline to Affymetrix DevNet, so you will not find any note. I have already done this once for the MoGene and RaGene arrays where Affymetrix has deleted one AFFX control in the PGF-files but not in the corresponding annotation files, see: https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-August/029049.html However, this time they have added a few hundred probesets to the annotation file which are not present in the PGF-file, see their README file for the updated HuGene annotation file. BTW, I really hope that Affymetrix (and you) will soon update the annotation files for the SNP arrays to hg19, i.e. human genome build 37. Best regards Christian On Feb 8, 11:09 am, Henrik Bengtsson wrote: > Hi. > > On Mon, Feb 1, 2010 at 9:02 AM, cstratowa > > > > wrote: > > Dear Branko, > > > I have just seen your comment that you have an issue with XPS, which > > is my BioC package. However, you have never contacted me at > > nor have sent a mail to the BioC mailing list. I > > would really appreciate that you contact me first before making such a > > statement. > > > Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv, > > (which could be the reason of your problem with xps), Affymetrix had > > included probesets in this annotation file which were not included in > > the corresponding *.PGF file. For this reason I have contacted > > Affymetrix DevNet and asked them to remove the extra probesets, which > > they kindly did resulting in file HuGene-1_0-st- > > v1.na30.1.hg19.probeset._csv. > > Thanks for this note/update. Did you send this offline to Affymetrix > or did you use their Affymetrix Scientific Community Forums > [https://www.affymetrix.com/community/forums/index.jspa]? I could > find it in the latter, but there are quite a few forums too. > > Cheers, > > Henrik > > > > > Best regards > > Christian > > > On Jan 28, 4:58 pm, branko wrote: > >> Hi all, > > >> First to thank you Mark , Henrik for nice tool and also all others > >> for informative mailing list. > >> This is my first email so hopefully not so many dummy questions. > > >> I found this thread as it is linked to my first 2 questions: > >> Month ago I did some basic analysis with 300 chips where your tool > >> got very handy. And I'm starting again to analyze things ... > >> (Btw. I found about Aroma on BioC list as I had issue with XPS > >> package that I couldn’t resolve . I would like to have them both > >> working so Xps will come on board eventually.) > > >> Now regarding this tread. > >> 1) When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv. with > >> the Aroma CDF file there are > >> 33257 PSets in csv while Aroma has 33252 so 4 different. So > >> maybe the CDF update is needed in Aroma? > >> I also just saw some more recent update on Affymetrix to “na30.1” > >> and I see I have used “na30” csv file. > > >> Following seems weird , many ProbeSets have no gene symbol > >> annotation in “main“ category : > > >> tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" , > >> stringsAsFactors =F ,comment.char="#") > > >> sum(tr$gene_assignment=="---" & tr$category=="main") #6711 > >> sum(tr$unigene=="---" & tr$category=="main") #7376 > >> sum(tr$swissprot=="---" & tr$category=="main") # 8021 > > >> Affy 33257 Probesets are classified as: > >> TYPE TOTAL > >> control->affx 57 > >> control->bgp->antigenomic 45 > >> main 28829 > >> normgene->exon 1195 > >> normgene->intron 2904 > >> rescue->FLmRNA->unmapped 227 > > >> Do you maybe know why are there so many e.g above 6711 without > >> gene_assignment ? > > >> 2) One more related to 1st Q : How to run normalization only on > >> ProbeSet classified as “main” ? ( Affymetrix csv file says this > >> class has 28829 > >> Psets in main class) Idea is of course to minimize > >> normalization bias of non-main class Psets. > > >> 3) Few questions regarding Quality checks and basic data > >> manipulations in Aroma: > >> I would like to give meaningful labels to the Arrays in the e.g. > >> box-plots (instead of CEL file names, as said i have 300) . >
[aroma.affymetrix] Re: Question for custom CDF of ST-Array
Dear Branko, I have just seen your comment that you have an issue with XPS, which is my BioC package. However, you have never contacted me at nor have sent a mail to the BioC mailing list. I would really appreciate that you contact me first before making such a statement. Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv, (which could be the reason of your problem with xps), Affymetrix had included probesets in this annotation file which were not included in the corresponding *.PGF file. For this reason I have contacted Affymetrix DevNet and asked them to remove the extra probesets, which they kindly did resulting in file HuGene-1_0-st- v1.na30.1.hg19.probeset._csv. Best regards Christian On Jan 28, 4:58 pm, branko wrote: > Hi all, > > First to thank you Mark , Henrik for nice tool and also all others > for informative mailing list. > This is my first email so hopefully not so many dummy questions. > > I found this thread as it is linked to my first 2 questions: > Month ago I did some basic analysis with 300 chips where your tool > got very handy. And I'm starting again to analyze things ... > (Btw. I found about Aroma on BioC list as I had issue with XPS > package that I couldn’t resolve . I would like to have them both > working so Xps will come on board eventually.) > > Now regarding this tread. > 1) When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv. with > the Aroma CDF file there are > 33257 PSets in csv while Aroma has 33252 so 4 different. So > maybe the CDF update is needed in Aroma? > I also just saw some more recent update on Affymetrix to “na30.1” > and I see I have used “na30” csv file. > > Following seems weird , many ProbeSets have no gene symbol > annotation in “main“ category : > > tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" , > stringsAsFactors =F ,comment.char="#") > > sum(tr$gene_assignment=="---" & tr$category=="main") #6711 > sum(tr$unigene=="---" & tr$category=="main") #7376 > sum(tr$swissprot=="---" & tr$category=="main") # 8021 > > Affy 33257 Probesets are classified as: > TYPE TOTAL > control->affx 57 > control->bgp->antigenomic 45 > main 28829 > normgene->exon 1195 > normgene->intron 2904 > rescue->FLmRNA->unmapped 227 > > Do you maybe know why are there so many e.g above 6711 without > gene_assignment ? > > 2) One more related to 1st Q : How to run normalization only on > ProbeSet classified as “main” ? ( Affymetrix csv file says this > class has 28829 > Psets in main class) Idea is of course to minimize > normalization bias of non-main class Psets. > > 3) Few questions regarding Quality checks and basic data > manipulations in Aroma: > I would like to give meaningful labels to the Arrays in the e.g. > box-plots (instead of CEL file names, as said i have 300) . > How can I do that? > Also how can I sort them ? > I ask this silly questions because Using R commands like str() > doesn’t show me the > object fields etc. so I can’t use standard R matrix commands, > also help (“some Aroma command” ) doesn’t show enough information. > Sometimes it gives empty help page. > I could not find pdf manual in Aroma installed libraries nor in > the Google group. I see only html file showing me all the functions & > classes. > Is there easier way to look for functions than main html pages ? > Code of functions are not visible by just typing func.name() , I > guess I can always get source code and search but there is likely easy > way to do it. > > It is visible that Aroma uses different classes than > BioConductor. I assume there is a good reason for that, but maybe you > can give some link with explanation? > > Oh and thank you for putting clean instructions for basic > preprocessing steps to get Expression matrix out for further > analysis. > > One example of my attempt to searching things : > cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf) > print(cs) > str(cs) > #Classes 'AffymetrixCelSet', 'AffymetrixFileSet', > 'AromaPlatformInterface', 'AromaMicroarrayDataSet', > 'GenericDataFileSet', 'FullNameInterface', 'Object' atomic [1:1] NA > …. > ?classname put me on some track. > > So I looked a bit and I saw GenericDataFileSet class provides > function > getFullNames() and it does give me the my CEL filenames (btw. > help on ?getFullNames gives me no help .) > setNames() point to basic library… but before going deeper and > making some mess I hope you > can give me answer on above questions or some links ? > > 4) Last one , regarding QC issue with plotting … SO when doing Array > (pseudo) image plots my RGui in windows complains e.g.: > > If I do: cf <- getFile (cs, 1) > plotImage(cf, transform=list(log2), palette=rainbow > (256)) > >
[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0
Dear Henrik, Thank you for your reply. I must admit that I had removed all verbose statements from the code fragment I have sent you. From my verbose statements I knew already that the CEL set is empty but I did not understand why. After following your advice and stepping through each statement manually it turned out to be a problem with "chiptype": For certain parts of my code the chipname must be "GenomeWideSNP_6.Full", which is the correct chip name, but for the aroma.affymetrix parts chiptype must be "GenomeWideSNP_6". Thus in the paired case I forgot to remove ".Full". Best regards Christian On Jul 8, 2:22 am, Henrik Bengtsson wrote: > Hi, > > the only thing I can conclude is that nbrOfFiles(this) == 0 in: > > > at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this))) > > and from > > > at getUnitNamesFile.AffymetrixCelSet(this) > > I learn that 'this' refers to a CEL set (which is empty). > > However, it is impossible for me to know where/what causes this. I > suggest that you add some verbose statements to computeCN() to narrow > down at what lines of code the error occurs, or alternatively "unwrap" > it from computeCN() and step through each of its commands manually. > > This is exactly why I add so many verbose statements and sanity checks > in my code so that any errors/bugs are caught as soon as possible. > See my code how to add verbose statements. > > Cheers, > > /Henrik > > On Thu, Jul 2, 2009 at 6:24 AM, > > cstratowa wrote: > > > Dear Henrik, > > > Sadly, paired analysis with SNP6.0 arrays results still in the Error > > messages shown below. > > > Please note the following: > > > 1, With the new versions of aroma.* I have just successfully tested > > CEL-files from Mapping250K_Sty arrays both with and w/o paired > > analysis. > > > 2, As mentioned yesterday I have successfully tested CEL-files from > > SNP6.0 arrays w/o paired analysis. > > > 3, Since the error seems to occur in function "computeCN()" please > > note that I have already listed the relevant code in my initial > > message. > > > Here is the error message I get: > > > Loading required package: DNAcopy > > Error in list(`computeCN(aroma, model = model, arrays = arrays[i], > > chromosomes = 1:23, ref` = , : > > > [2009-07-02 10:21:14] Exception: Range of argument 'idx' is out of > > range [1,0]: [1,1] > > at throw(Exception(...)) > > at throw.default(sprintf("Range of argument '%s' is out of range [%s, > > %s]: [%s, > > at throw(sprintf("Range of argument '%s' is out of range [%s,%s]: > > [%s,%s]", .n > > at getNumerics.Arguments(static, ..., asMode = "integer", disallow = > > disallow) > > at getNumerics(static, ..., asMode = "integer", disallow = disallow) > > at getIntegers.Arguments(static, ..., range = range) > > at getIntegers(static, ..., range = range) > > at getIndices.Arguments(static, ..., length = length) > > at getIndices(static, ..., length = length) > > at method(static, ...) > > at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this))) > > at getFile.GenericDataFileSet(this, 1) > > at getFile(this, 1) > > at getUnitNamesFile.AffymetrixCelSet(this) > > at getUnitNamesFile(this) > > at getChipType.AffymetrixCelSet(X[[1]], ...) > > at FUN(X[[1]], ...) > > at lapply(X, FUN, ...) > > at sapply.default(csList, FUN = getChipTyp > > Calls: computeCN ... getNumerics.Arguments -> throw -> throw.default - > >> throw -> throw.Exception > > Execution halted > > > Do you have any idea what might be the reason? > > > Since I do not get this error with Sty arrays it must be special for > > SNP6.0 arrays when doing paired analysis. Could it be that there may > > be a problem with aroma functions "getNames()" or "extract()" for > > SNP6.0? > > > Here once again the sessionInfo: > >> sessionInfo() > > R version 2.8.1 (2008-12-22) > > x86_64-unknown-linux-gnu > > > locale: > > C > > > attached base packages: > > [1] tcltk stats graphics grDevices utils datasets > > methods > > [8] base > > > other attached packages: > > [1] GLAD_1.18.0 Cairo_1.4-5 > > DNAcopy_1.19.2 > > [4] biasnp_0.2.43 RODBC_1.2-5 > > aroma.affymetrix_1.1.0 > > [7] aroma.apd_0.1.3 R.huge_0.1.7 > > affxparser_1.14.2 > > [10] aroma.core_1.1.2 aroma.light_1.13.2 > > matrixStats_0.1.6 > &g
[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0
Dear Henrik, Sadly, paired analysis with SNP6.0 arrays results still in the Error messages shown below. Please note the following: 1, With the new versions of aroma.* I have just successfully tested CEL-files from Mapping250K_Sty arrays both with and w/o paired analysis. 2, As mentioned yesterday I have successfully tested CEL-files from SNP6.0 arrays w/o paired analysis. 3, Since the error seems to occur in function "computeCN()" please note that I have already listed the relevant code in my initial message. Here is the error message I get: Loading required package: DNAcopy Error in list(`computeCN(aroma, model = model, arrays = arrays[i], chromosomes = 1:23, ref` = , : [2009-07-02 10:21:14] Exception: Range of argument 'idx' is out of range [1,0]: [1,1] at throw(Exception(...)) at throw.default(sprintf("Range of argument '%s' is out of range [%s, %s]: [%s, at throw(sprintf("Range of argument '%s' is out of range [%s,%s]: [%s,%s]", .n at getNumerics.Arguments(static, ..., asMode = "integer", disallow = disallow) at getNumerics(static, ..., asMode = "integer", disallow = disallow) at getIntegers.Arguments(static, ..., range = range) at getIntegers(static, ..., range = range) at getIndices.Arguments(static, ..., length = length) at getIndices(static, ..., length = length) at method(static, ...) at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this))) at getFile.GenericDataFileSet(this, 1) at getFile(this, 1) at getUnitNamesFile.AffymetrixCelSet(this) at getUnitNamesFile(this) at getChipType.AffymetrixCelSet(X[[1]], ...) at FUN(X[[1]], ...) at lapply(X, FUN, ...) at sapply.default(csList, FUN = getChipTyp Calls: computeCN ... getNumerics.Arguments -> throw -> throw.default - > throw -> throw.Exception Execution halted Do you have any idea what might be the reason? Since I do not get this error with Sty arrays it must be special for SNP6.0 arrays when doing paired analysis. Could it be that there may be a problem with aroma functions "getNames()" or "extract()" for SNP6.0? Here once again the sessionInfo: > sessionInfo() R version 2.8.1 (2008-12-22) x86_64-unknown-linux-gnu locale: C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GLAD_1.18.0Cairo_1.4-5 DNAcopy_1.19.2 [4] biasnp_0.2.43 RODBC_1.2-5 aroma.affymetrix_1.1.0 [7] aroma.apd_0.1.3R.huge_0.1.7 affxparser_1.14.2 [10] aroma.core_1.1.2 aroma.light_1.13.2 matrixStats_0.1.6 [13] R.rsp_0.4.0R.httpd_0.1.1 R.filesets_0.5.2 [16] digest_0.3.1 R.cache_0.1.7 R.utils_1.1.6 [19] R.oo_1.4.6 R.methodsS3_1.0.3 Best regards Christian On Jul 1, 3:31 pm, cstratowa wrote: > Dear Henrik, > > Thank you for your suggestions. > > The answer to your first reply is that the security settings prevent > us from using biocLite() etc so we need to download all files and > usually run install scripts. > > Following your suggestion I have now installed DNAcopy v 1.19.2 which > caused some problems on the cluster because of a missing > libgfortran.so. (As I have mentioned in an earlier post, I am running > CBS and GLAD on cluster nodes in order to speed up processing.) > > I have just started another session and at the moment it seems to > work, the current output is: > > Loading required package: DNAcopy > Loading required package: Cairo > Loading required package: GLAD > [1] "Have fun with GLAD" > > However, I still need to test the paired analysis with the SNP6.0 > arrays, this will take some time. > > Here is how the sessionInfo is supposed to look like:> sessionInfo() > > R version 2.8.1 (2008-12-22) > x86_64-unknown-linux-gnu > > locale: > C > > attached base packages: > [1] tcltk stats graphics grDevices utils datasets > methods > [8] base > > other attached packages: > [1] GLAD_1.18.0 Cairo_1.4-5 > DNAcopy_1.19.2 > [4] biasnp_0.2.43 RODBC_1.2-5 > aroma.affymetrix_1.1.0 > [7] aroma.apd_0.1.3 R.huge_0.1.7 > affxparser_1.14.2 > [10] aroma.core_1.1.2 aroma.light_1.13.2 > matrixStats_0.1.6 > [13] R.rsp_0.4.0 R.httpd_0.1.1 > R.filesets_0.5.2 > [16] digest_0.3.1 R.cache_0.1.7 > R.utils_1.1.6 > [19] R.oo_1.4.6 R.methodsS3_1.0.3 > > Best regards > Christian > > On Jun 29, 4:34 pm, Henrik Bengtsson wrote: > > > Hi. > > > On Mon, Jun 29, 2009 at 5:14 AM, > > > cstratowa wrote: > > > > Dear Henrik, > > > > Meanwhile I have downloaded the latest versions from your repository > > > and installed them on R-2.8.1. Please note that due to our settings it
[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0
Dear Henrik, Thank you for your suggestions. The answer to your first reply is that the security settings prevent us from using biocLite() etc so we need to download all files and usually run install scripts. Following your suggestion I have now installed DNAcopy v 1.19.2 which caused some problems on the cluster because of a missing libgfortran.so. (As I have mentioned in an earlier post, I am running CBS and GLAD on cluster nodes in order to speed up processing.) I have just started another session and at the moment it seems to work, the current output is: Loading required package: DNAcopy Loading required package: Cairo Loading required package: GLAD [1] "Have fun with GLAD" However, I still need to test the paired analysis with the SNP6.0 arrays, this will take some time. Here is how the sessionInfo is supposed to look like: > sessionInfo() R version 2.8.1 (2008-12-22) x86_64-unknown-linux-gnu locale: C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GLAD_1.18.0Cairo_1.4-5 DNAcopy_1.19.2 [4] biasnp_0.2.43 RODBC_1.2-5 aroma.affymetrix_1.1.0 [7] aroma.apd_0.1.3R.huge_0.1.7 affxparser_1.14.2 [10] aroma.core_1.1.2 aroma.light_1.13.2 matrixStats_0.1.6 [13] R.rsp_0.4.0R.httpd_0.1.1 R.filesets_0.5.2 [16] digest_0.3.1 R.cache_0.1.7 R.utils_1.1.6 [19] R.oo_1.4.6 R.methodsS3_1.0.3 Best regards Christian On Jun 29, 4:34 pm, Henrik Bengtsson wrote: > Hi. > > On Mon, Jun 29, 2009 at 5:14 AM, > > > > cstratowa wrote: > > > Dear Henrik, > > > Meanwhile I have downloaded the latest versions from your repository > > and installed them on R-2.8.1. Please note that due to our settings it > > is NOT POSSIBLE for me to use hbInstall("aroma.affymetrix")!!! > > > Sadly, now CBS does no longer work at all. I get the message: > > > Loading required package: DNAcopy > > Loading required package: Cairo > > Error: package 'DNAcopy' does not have a name space > > In addition: Warning messages: > > 1: In log(theta/thetaRef, base = 2) : NaNs produced > > 2: In log(theta * thetaRef, base = 2) : NaNs produced > > Execution halted > > Output from traceback() would be helpful here. I can make a guess > where/why it occurs, but I'm not sure what you are doing. > > > > > > > Thus it seems that the new versions do not work with R-2.8.1, so I > > need to downgrade to the old versions. > > > Here is the sessionInfo() for the new versions: > >> sessionInfo() > > R version 2.8.1 (2008-12-22) > > x86_64-unknown-linux-gnu > > > locale: > > C > > > attached base packages: > > [1] tcltk stats graphics grDevices utils datasets > > methods > > [8] base > > > other attached packages: > > [1] biasnp_0.2.43 RODBC_1.2-5 > > aroma.affymetrix_1.1.0 > > [4] aroma.apd_0.1.3 R.huge_0.1.7 > > affxparser_1.14.2 > > [7] aroma.core_1.1.2 aroma.light_1.13.2 > > matrixStats_0.1.4 > > [10] R.rsp_0.4.0 R.httpd_0.1.1 > > R.filesets_0.5.2 > > [13] digest_0.3.1 R.cache_0.1.7 > > R.utils_1.1.6 > > [16] R.oo_1.4.6 R.methodsS3_1.0.3 > > Version of DNAcopy? > > So, I am quite sure the aroma.* code is working on much older > versions, but I cannot speak for external packages. > > However, in this case aroma.* do assume that DNAcopy has a namespace > and that was most likely introduced in v1.18.0. You might have to > upgrade DNAcopy to v1.18.0 or newer. I really recommend v 1.19.2 > because it is now much much faster (really!). For people who can use > our install scripts, this can be done by fooling bioc to think you > have a newer R version as follows: > > source("http://www.braju.com/R/hbLite.R";); > biocLite("DNAcopy", rver="2.10.0"); > > If you cannot use this, you probably have to download the file > *.tar.gz (*.zip) file and install it manually. > > Let us know if this solves the problem. > > /Henrik > > > > > Best regards > > Christian > > > P.S.: I know that this is OSS and everybody is very busy, but > > nevertheless I would like to mention, that I am not the only person > > who is not happy with the policy of many open source software systems > > to supply a new version every 6 months and no longer support older > > versions. This is no problem when running the software on one's > > personal PC but often causes problems in company settings where you > > need to upgrade a lot of systems and the philosophy is not to change a > > runn
[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0
Dear Henrik, Meanwhile I have downloaded the latest versions from your repository and installed them on R-2.8.1. Please note that due to our settings it is NOT POSSIBLE for me to use hbInstall("aroma.affymetrix")!!! Sadly, now CBS does no longer work at all. I get the message: Loading required package: DNAcopy Loading required package: Cairo Error: package 'DNAcopy' does not have a name space In addition: Warning messages: 1: In log(theta/thetaRef, base = 2) : NaNs produced 2: In log(theta * thetaRef, base = 2) : NaNs produced Execution halted Thus it seems that the new versions do not work with R-2.8.1, so I need to downgrade to the old versions. Here is the sessionInfo() for the new versions: > sessionInfo() R version 2.8.1 (2008-12-22) x86_64-unknown-linux-gnu locale: C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] biasnp_0.2.43 RODBC_1.2-5 aroma.affymetrix_1.1.0 [4] aroma.apd_0.1.3R.huge_0.1.7 affxparser_1.14.2 [7] aroma.core_1.1.2 aroma.light_1.13.2 matrixStats_0.1.4 [10] R.rsp_0.4.0R.httpd_0.1.1 R.filesets_0.5.2 [13] digest_0.3.1 R.cache_0.1.7 R.utils_1.1.6 [16] R.oo_1.4.6 R.methodsS3_1.0.3 Best regards Christian P.S.: I know that this is OSS and everybody is very busy, but nevertheless I would like to mention, that I am not the only person who is not happy with the policy of many open source software systems to supply a new version every 6 months and no longer support older versions. This is no problem when running the software on one's personal PC but often causes problems in company settings where you need to upgrade a lot of systems and the philosophy is not to change a running system. In our case this means to update the servers and the Linux clusters, too, since I have created a package which allows me to run CBS and GLAD on the cluster nodes. On Jun 26, 7:35 am, Henrik Bengtsson wrote: > 2009/6/22 cstratowa : > > > > > Does this update also work with R-2.8.1? > > I avoid increasing the dependency in the DESCRIPTION files unless I > know they have to. Currently I think it is roughly R 2.6.x that is > specified. It does not mean it is guaranteed to work. However, I > only support and run redundancy tests on the lastest stable version of > R (now R 2.9.x). > > Did you try? > > /Henrik > > > > > Best regards > > Christian > > > On Jun 18, 5:39 pm, Henrik Bengtsson wrote: > >> Hi, > > >> update aroma.affymetrix et al. and retry. > > >> /Henrik > > >> On Thu, Jun 18, 2009 at 5:01 AM, > > >> cstratowa wrote: > > >> > Dear all, > > >> > Below I show you the relevant code fragments of my function "computeCN > >> > ()" to compute copy-number regions using model CBS. This function can > >> > compute copy-number regions for the usual case (reference=NULL), for > >> > pairs of samples (reference="Pairs") or using reference samples > >> > (reference="Reference"). > > >> > I am using this function for a long time for 100K and 500K mapping > >> > arrays w/o any problems. However for the SNP6.0 array this function > >> > works only for the usual case (reference=NULL), but not for the Pairs/ > >> > Reference cases. > > >> > For SNP6.0 array and Pairs/Reference cases I get always the following > >> > error message: > > >> > Error in this$files[[1]] : subscript out of bounds > >> > Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf -> > >> > getCdf.AffymetrixCelSet -> getCdf > >> > Execution halted > > >> > Do you have any idea why I get these errors? > >> > Do I have to change the code for Pairs/Reference cases, and how? > > >> > Here is the relevant code of function "computeCN()": > > >> > "computeCN" <- function(object, pheno, arrays=NULL, reference=NULL, > >> > chromosomes=NULL) { > >> > ## get normalized data > >> > cesList <- effectSet(object); > > >> > ## get optional reference set > >> > if (is.null(reference)) { > >> > refList <- NULL; > >> > } else if (is.character(reference)) { > >> > ## extract reference samples (and data samples if paired) > >> > refList <- list(); > >> > for (chiptype in names(cesList)) { > >> > ces <- cesList[[chiptype]]; > >> > cesnames <- getNames(ces); > > >> > ## get row numbers containing r
[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0
Does this update also work with R-2.8.1? Best regards Christian On Jun 18, 5:39 pm, Henrik Bengtsson wrote: > Hi, > > update aroma.affymetrix et al. and retry. > > /Henrik > > On Thu, Jun 18, 2009 at 5:01 AM, > > cstratowa wrote: > > > Dear all, > > > Below I show you the relevant code fragments of my function "computeCN > > ()" to compute copy-number regions using model CBS. This function can > > compute copy-number regions for the usual case (reference=NULL), for > > pairs of samples (reference="Pairs") or using reference samples > > (reference="Reference"). > > > I am using this function for a long time for 100K and 500K mapping > > arrays w/o any problems. However for the SNP6.0 array this function > > works only for the usual case (reference=NULL), but not for the Pairs/ > > Reference cases. > > > For SNP6.0 array and Pairs/Reference cases I get always the following > > error message: > > > Error in this$files[[1]] : subscript out of bounds > > Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf -> > > getCdf.AffymetrixCelSet -> getCdf > > Execution halted > > > Do you have any idea why I get these errors? > > Do I have to change the code for Pairs/Reference cases, and how? > > > Here is the relevant code of function "computeCN()": > > > "computeCN" <- function(object, pheno, arrays=NULL, reference=NULL, > > chromosomes=NULL) { > > ## get normalized data > > cesList <- effectSet(object); > > > ## get optional reference set > > if (is.null(reference)) { > > refList <- NULL; > > } else if (is.character(reference)) { > > ## extract reference samples (and data samples if paired) > > refList <- list(); > > for (chiptype in names(cesList)) { > > ces <- cesList[[chiptype]]; > > cesnames <- getNames(ces); > > > ## get row numbers containing reference and data samples > > refcol <- which(pheno[,reference] == 1); > > datcol <- which(pheno[,reference] == 0); > > > if (reference == "Pairs") { > > cesList[[chiptype]] <- extract(ces, datcol); > > refList[[chiptype]] <- extract(ces, refcol); > > } else if (reference == "Reference") { > > cesRef <- extract(ces, refcol); > > ceRef <- getAverageFile(cesRef); > > > ## convert single ceRef file into a set of identical files > > of same size/length as ces > > numarr <- nbrOfArrays(ces); > > cesRef <- rep(list(ceRef), numarr); # becomes a list of > > CnChipEffectFile:s > > cesRef <- newInstance(ces, cesRef, mergeStrands=ces > > $mergeStrands, combineAlleles=ces$combineAlleles); > > refList[[chiptype]] <- cesRef; > > }#if > > }#for > > }#if > > > ## select model > > model <- CbsModel(cesList, refList); > > > ## get index of arrays > > if (is.null(arrays)) { > > arrays <- indexOfArrays(model); > > } else { > > arrays <- which(getNames(model)==arrays); > > }#if > > > ## fit model with Chromosome Explorer > > ce <- ChromosomeExplorer(model, tags=tags); > > setParallelSafe(ce, status=TRUE); #necessary for running on cluster > > tmp <- process(ce, arrays=arrays, chromosomes=chromosomes, > > force=force, verbose=verbose); > > > ## get fitted CN regions from model > > cnrs <- getRegions(model, arrays=arrays, chromosomes=chromosomes, > > organism="Human", hgVersion="hg18"); > > > return(object); > > }#computeCN > > > Here is my sessionInfo: > > >> sessionInfo() > > R version 2.8.1 (2008-12-22) > > x86_64-unknown-linux-gnu > > > locale: > > C > > > attached base packages: > > [1] tcltk stats graphics grDevices utils datasets > > methods > > [8] base > > > other attached packages: > > [1] biasnp_0.2.41 RODBC_1.2-5 > > aroma.affymetrix_1.0.1 > > [4] aroma.apd_0.1.3 R.huge_0.1.6 > > affxparser_1.14.2 > > [7] aroma.core_1.0.1 aroma.light_1.11.1 > > digest_0.3.1 > > [10] matrixStats_0.1.4 R.rsp_0.4.0 > > R.httpd_0.1.1 > > [13] R.cache_0.1.7 R.utils_1.1.3 > > R.oo_1.4.6 > > [16] R.methodsS3_1.0.3 > > > Thank you in advance for your help. > > > Best regards > > Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Problem with paired copynumber analysis for SNP6.0
Dear all, Below I show you the relevant code fragments of my function "computeCN ()" to compute copy-number regions using model CBS. This function can compute copy-number regions for the usual case (reference=NULL), for pairs of samples (reference="Pairs") or using reference samples (reference="Reference"). I am using this function for a long time for 100K and 500K mapping arrays w/o any problems. However for the SNP6.0 array this function works only for the usual case (reference=NULL), but not for the Pairs/ Reference cases. For SNP6.0 array and Pairs/Reference cases I get always the following error message: Error in this$files[[1]] : subscript out of bounds Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf -> getCdf.AffymetrixCelSet -> getCdf Execution halted Do you have any idea why I get these errors? Do I have to change the code for Pairs/Reference cases, and how? Here is the relevant code of function "computeCN()": "computeCN" <- function(object, pheno, arrays=NULL, reference=NULL, chromosomes=NULL) { ## get normalized data cesList <- effectSet(object); ## get optional reference set if (is.null(reference)) { refList <- NULL; } else if (is.character(reference)) { ## extract reference samples (and data samples if paired) refList <- list(); for (chiptype in names(cesList)) { ces <- cesList[[chiptype]]; cesnames <- getNames(ces); ## get row numbers containing reference and data samples refcol <- which(pheno[,reference] == 1); datcol <- which(pheno[,reference] == 0); if (reference == "Pairs") { cesList[[chiptype]] <- extract(ces, datcol); refList[[chiptype]] <- extract(ces, refcol); } else if (reference == "Reference") { cesRef <- extract(ces, refcol); ceRef <- getAverageFile(cesRef); ## convert single ceRef file into a set of identical files of same size/length as ces numarr <- nbrOfArrays(ces); cesRef <- rep(list(ceRef), numarr); # becomes a list of CnChipEffectFile:s cesRef <- newInstance(ces, cesRef, mergeStrands=ces $mergeStrands, combineAlleles=ces$combineAlleles); refList[[chiptype]] <- cesRef; }#if }#for }#if ## select model model <- CbsModel(cesList, refList); ## get index of arrays if (is.null(arrays)) { arrays <- indexOfArrays(model); } else { arrays <- which(getNames(model)==arrays); }#if ## fit model with Chromosome Explorer ce <- ChromosomeExplorer(model, tags=tags); setParallelSafe(ce, status=TRUE); #necessary for running on cluster tmp <- process(ce, arrays=arrays, chromosomes=chromosomes, force=force, verbose=verbose); ## get fitted CN regions from model cnrs <- getRegions(model, arrays=arrays, chromosomes=chromosomes, organism="Human", hgVersion="hg18"); return(object); }#computeCN Here is my sessionInfo: > sessionInfo() R version 2.8.1 (2008-12-22) x86_64-unknown-linux-gnu locale: C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] biasnp_0.2.41 RODBC_1.2-5 aroma.affymetrix_1.0.1 [4] aroma.apd_0.1.3R.huge_0.1.6 affxparser_1.14.2 [7] aroma.core_1.0.1 aroma.light_1.11.1 digest_0.3.1 [10] matrixStats_0.1.4 R.rsp_0.4.0 R.httpd_0.1.1 [13] R.cache_0.1.7 R.utils_1.1.3 R.oo_1.4.6 [16] R.methodsS3_1.0.3 Thank you in advance for your help. Best regards Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: Interpretation of GLAD results for SNP6.0
Dear Nicolas, Thank you for your comments, I will delete all segments with count=1. Since currently aroma.affymetrix supports both GLAD and CBS, I am routinely using both algorithms for CN analysis. Ultrasome looks interesting, but first I would need aroma.affymetrix anyhow for normalization of the raw copynumbers, and second it is free for academic use only. Currently I am more worried about the normalization step, I would wish that aroma.affymetrix could include the "reference alignment procedure" of S.Pounds et al. Best regards Christian On Apr 8, 12:11 am, stran...@broad.mit.edu wrote: > Hi Christian, > > On Mar 10, 10:53 am, cstratowa > ingelheim.com> wrote: > > Dear all, > > > I am currently determining copynumber regions for tumor cell lines > > using GLAD as model for the SNP6.0 array. (This takes about 13 hrs per > > array and per cluster node) > > Despite GLAD being a good segmentation algorithm, you may want to > consider alternatives that are less computationally intensive and give > equivalent results. CBS (package DNAcopy in bioconductor) runs in 1.5h > for a SNP6.0 array. A fairly new but very promising program is > Ultrasome (http://www.broad.mit.edu/ultrasome). It segments a SNP6.0 > array in 2 seconds (plus 1 minute to read the array). It's a > standalone program. > > > My problem is the interpretation of the results. > > Here are some examples: > > > 1. example: > > "chromosome" "start" "stop" "mean" "count" "call" > > 1 3674045 10539859 0.154251128612596 4517 > > "neutral" > > 1 10544961 10544961 1.57871123504559 1 > > "gain" > > 1 10546623 10546623 -2.21390332082091 1 > > "loss" > > 1 10548849 21751915 0.150400582947561 6872 > > "neutral" > > > This example has two regions where start=stop position (10544961, > > 10546623). > > How do you interpret this result? > > How should regions with count=1 be interpreted? > > Are these two regions considered to be outliers? > > SNP6.0 arrays are very noisy. You should be very careful when you > consider any segment that has a size < 10 probes, and in any case, you > should not believe any segment that has a size of one probe. In your > case, I would just delete these segments. They are likely to be marked > as "outliers" by GLAD. > > > 2. example: > > "chromosome" "start" "stop" "mean" "count" "call" > > 6 18253390 18253886 -1.33576824261601 2 > > "loss" > > 6 18254264 29945167 -0.0986937482773974 7921 > > "neutral" > > 6 29945167 29945167 1.47894372023 1 "gain" > > 6 29945294 30015940 0.225347608366043 42 > > "gain" > > > Here you see two regions with different start positions but identical > > stop positions (=29945167), where one region shows a gain but the > > other region is neutral. > > How do you interpret this result? > > Same problem, count is too small. > > > > > 3. example: > > "chromosome" "start" "stop" "mean" "count" "call" > > 8 15672606 15672655 -2.45103201175721 2 > > "loss" > > 8 15672695 15672695 -4.68547262667068 1 > > "loss" > > 8 15672695 15678118 -2.65573944523429 10 > > "loss" > > 8 15678687 15678880 -2.01991964356680 3 > > "loss" > > > In this case two regions have identical start positions (=15672695). > > How can this be the case? > > Could this be a bug, since it is assumed that two regions should not > > overlap? > > > 4. example: > > "chromosome" "start" "stop" "mean" "count" "call" > > 17 23813398 32995436 -0.272936124639797 5529 > > "neutral" > > 17 32999012 32999012 -2.26635873379834 1 > > "loss" > > 17 32999012 32999012 -8.28862000630996 1 > > "loss" > > 17 33000940 36676630
[aroma.affymetrix] Re: Problem understanding FIRMA
Dear Mark I have sent you the PNG file to your email. Best regards Christian On Mar 23, 9:44 am, "Mark Robinson" wrote: > Hi Christian. > > Thanks for this. That's great! > > Can you send me a copy of your PNG file to my email? > > I will add this > to:http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests > > Cheers, > Mark > > > > > Dear Mark, > > > I have now created the following function, which people can use to > > compute firma scores: > > > # - - - - - - - - - - - - - - - - - - - > > firma <- function(m, method="rlm") { > > ## function to compute FIRMA scores (adapted from aroma.affymetrix) > > ## m: dataframe containing normalized probe intensities (linear > > scale) > > ## with 1.column containing the transcript_cluster_id > > ## and 2. column containing the corresponding probeset_ids > > ## method: fitting model, one of "rlm" or "mdp" > > > ## convert expression levels to log2 > > y <- as.matrix(log2(m[,3:ncol(m)])); > > > ## dimensions > > K <- nrow(y); # number of probes > > I <- ncol(y); # number of arrays > > > ## fit log-additive model > > if (method == "rlm") { > > fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345, > > PACKAGE="preprocessCore"); > > } else if (method == "mdp") { > > mp <- medpolish(y, trace.iter=FALSE); > > fit <- list(Estimates = c(mp$overall + mp$col, mp$row), > > StdErrors = rep(0, length(c(mp$row, mp$col; > > } else { > > stop("method must be "); > > }#if > > > ## extract parameters > > est <- fit$Estimates; > > se <- fit$StdErrors; > > > ## chip effects > > beta <- est[1:I]; > > > ## probe affinities > > if (K == 1) { > > ## if only one probe must have affinity=1 since sum constraint > > alpha <- 0; > > } else { > > ## affinities sum to zero (on log scale) > > alpha <- est[(I+1):length(est)]; > > alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]); > > }#if > > > ## estimates on the intensity scale > > theta <- 2^beta; > > phi <- 2^alpha; > > > ## calculate residuals > > phi <- matrix(phi, nrow=K, ncol=I, byrow=FALSE); > > theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE); > > yhat <- phi *theta; > > eps <- (2^y)/yhat; # rma uses y/yhat > > > ## estimate of standard error > > u.mad <- mad(unlist(log2(eps)), center=0); > > > ## get probeset_ids > > id <- unique(m[,2]); > > > ## firma scores > > fs <- sapply(id, > > function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2, > > median)}); > > fs <- t(fs); > > rownames(fs) <- id; > > > return(fs); > > }#firma > > # - - - - - - - - - - - - - - - - - - - > > > Now I can apply this function to the normalized intensities of gene > > UNR: > > > > library(preprocessCore) > > > score.mdp <- firma(unr, "mdp") > > > score.rml <- firma(unr, "rlm") > > > Then I can plot the firma scores for robust fitting ("rlm") compared > > to > > median-polish ("mdp"): > > > > png(file="firmaplots6.png", width=540, height=800) > > > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2)) > > > for (i in 1:6) { > > > plot(score.rml[,i], type="l", ylim=c(-3,5), > > main=colnames(score.rml)[i], xlab="Probeset", ylab="Score") > > > abline(h=0, lty=3) > > > lines(score.mdp[,i], lty=2) > > > } > > > par(oldpar) > > > dev.off() > > > The attached plots compare "rlm" (solid lines) and "mdp" (dashed > > lines) > > and should reflect the firma scores for heart and muscle shown in > > Figure > > 3B of Purdom et al. > > As you can see the difference between"rlm" and "mdp" is pretty small. > > > Best regards > > Christian > > > P.S.: It seems that I cannot attach the png-file. > > > On Mar 18, 8:28 am, cstratowa > ingelheim.com> wrote: > >> Dear Mark, > > >> Probably, there is not much difference, maybe I will check it. > >> I will let you know but it will take some time. >
[aroma.affymetrix] Re: Problem understanding FIRMA
Dear Mark, I have now created the following function, which people can use to compute firma scores: # - - - - - - - - - - - - - - - - - - - firma <- function(m, method="rlm") { ## function to compute FIRMA scores (adapted from aroma.affymetrix) ## m: dataframe containing normalized probe intensities (linear scale) ## with 1.column containing the transcript_cluster_id ## and 2. column containing the corresponding probeset_ids ## method: fitting model, one of "rlm" or "mdp" ## convert expression levels to log2 y <- as.matrix(log2(m[,3:ncol(m)])); ## dimensions K <- nrow(y); # number of probes I <- ncol(y); # number of arrays ## fit log-additive model if (method == "rlm") { fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345, PACKAGE="preprocessCore"); } else if (method == "mdp") { mp <- medpolish(y, trace.iter=FALSE); fit <- list(Estimates = c(mp$overall + mp$col, mp$row), StdErrors = rep(0, length(c(mp$row, mp$col; } else { stop("method must be "); }#if ## extract parameters est <- fit$Estimates; se <- fit$StdErrors; ## chip effects beta <- est[1:I]; ## probe affinities if (K == 1) { ## if only one probe must have affinity=1 since sum constraint alpha <- 0; } else { ## affinities sum to zero (on log scale) alpha <- est[(I+1):length(est)]; alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]); }#if ## estimates on the intensity scale theta <- 2^beta; phi <- 2^alpha; ## calculate residuals phi <- matrix(phi, nrow=K, ncol=I, byrow=FALSE); theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE); yhat <- phi *theta; eps <- (2^y)/yhat; # rma uses y/yhat ## estimate of standard error u.mad <- mad(unlist(log2(eps)), center=0); ## get probeset_ids id <- unique(m[,2]); ## firma scores fs <- sapply(id, function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2, median)}); fs <- t(fs); rownames(fs) <- id; return(fs); }#firma # - - - - - - - - - - - - - - - - - - - Now I can apply this function to the normalized intensities of gene UNR: > library(preprocessCore) > score.mdp <- firma(unr, "mdp") > score.rml <- firma(unr, "rlm") Then I can plot the firma scores for robust fitting ("rlm") compared to median-polish ("mdp"): > png(file="firmaplots6.png", width=540, height=800) > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2)) > for (i in 1:6) { >plot(score.rml[,i], type="l", ylim=c(-3,5), main=colnames(score.rml)[i], xlab="Probeset", ylab="Score") >abline(h=0, lty=3) >lines(score.mdp[,i], lty=2) > } > par(oldpar) > dev.off() The attached plots compare "rlm" (solid lines) and "mdp" (dashed lines) and should reflect the firma scores for heart and muscle shown in Figure 3B of Purdom et al. As you can see the difference between"rlm" and "mdp" is pretty small. Best regards Christian P.S.: It seems that I cannot attach the png-file. On Mar 18, 8:28 am, cstratowa wrote: > Dear Mark, > > Probably, there is not much difference, maybe I will check it. > I will let you know but it will take some time. > > Best regards > Christian > > On Mar 17, 11:46 am, Mark Robinson wrote: > > > Hi Christian. > > > > However, in this respect I have also the following question: > > > How does using "median polish" compare to using > > > "R_rlm_rma_default_model"? > > > Are the final scores still of some use if you use medpol? > > > Short answer is I haven't investigated this too thoroughly. > > > But, my guess is that it wouldn't be "too" different. That prediction > > is based on the fact that the chip effects are in the same ballpark, > > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol) > > plot in the following thread: > > >http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/... > > > But, I'd be interested to hear more details if you do look into it more. > > > Cheers, > > Mark > > > > Best regards > > > Christian > > > > On Mar 16, 9:13 am, Mark Robinson wrote: > > >> Hi Christian. > > > >> From what I can tell looking at your code (rather quickly, i must > > >> admit), there will be 2 differences between aroma.affymetrix and what > > >> you have: > > > >> 1. We use the 'preprocessCore' codebase for
[aroma.affymetrix] Re: Problem understanding FIRMA
Dear Mark, Probably, there is not much difference, maybe I will check it. I will let you know but it will take some time. Best regards Christian On Mar 17, 11:46 am, Mark Robinson wrote: > Hi Christian. > > > > > However, in this respect I have also the following question: > > How does using "median polish" compare to using > > "R_rlm_rma_default_model"? > > Are the final scores still of some use if you use medpol? > > Short answer is I haven't investigated this too thoroughly. > > But, my guess is that it wouldn't be "too" different. That prediction > is based on the fact that the chip effects are in the same ballpark, > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol) > plot in the following thread: > > http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/... > > But, I'd be interested to hear more details if you do look into it more. > > Cheers, > Mark > > > > > Best regards > > Christian > > > On Mar 16, 9:13 am, Mark Robinson wrote: > >> Hi Christian. > > >> From what I can tell looking at your code (rather quickly, i must > >> admit), there will be 2 differences between aroma.affymetrix and what > >> you have: > > >> 1. We use the 'preprocessCore' codebase for the robust fitting of the > >> linear model (... but maybe you are just using median polish as an > >> illustration). For example, you might try: > > >> library(preprocessCore) > >> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0, > >> 1.345,PACKAGE="preprocessCore") > >> [... and piece together the alpha, beta, etc ...] > > >> 2. The "estimate of standard error" is calculated genewise, over > >> residuals from all probes/samples (i.e. u.mad should be a scalar > >> not a > >> vector). > > >> Hope that helps. > >> Mark > > >> On 16/03/2009, at 6:32 PM, cstratowa wrote: > > >>> Dear all, > > >>> After reading the FIRMA paper I would like to understand the > >>> implementation, but this is not easy since the source code is hard > >>> to > >>> read. Nevertheless, I tried and would like to know if this is > >>> correct. > > >>> According to the page on exon array analysis you do the following: > > >>> I, fit a summary of the entire transcript > >>>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) > >>>> fit(plmTr, verbose=verbose) > > >>> II, fit the FIRMA model for each exon > >>>> firma <- FirmaModel(plmTr) > >>>> fit(firma, verbose=verbose) > > >>> However, I would like to understand the underlying source code. > > >>> For this example let us assume that we have quantile-normalized > >>> intensities yTr for a transcript containing two exons: > >>>> yTr > >>> HeartA HeartB HeartC MuscleA MuscleB MuscleC > >>> 1 5.74954 18.0296 2.50436 15.5857 26.1744 31.0075 > >>> 2 9.59819 23.0093 22.01120 70.1742 32.8408 102.0080 > >>> 3 114.50800 87.1742 70.34080 312.3410 266.1740 601.3410 > >>> 4 66.34080 52.0075 67.34080 184.1740 266.1740 147.0080 > >>> 5 210.17400 142.0080 173.34100 514.5080 659.1740 509.6740 > >>> 6 104.00800 84.3408 70.34080 333.5080 324.1740 231.0080 > >>> 7 194.00800 124.5080 234.00800 443.6740 767.5080 716.8410 > >>> 8 319.34100 282.6740 283.50800 656.0080 807.6740 954.6740 > > >>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2. > > >>> I, fit a summary of the entire transcript > >>> To simplify issues I will fit the data using median polish: > >>> # 1. fit median polish > >>>> mp <- medpolish(log2(yTr)) > > >>> # 2. data set specific estimates (probe affinities) > >>>> beta <- mp$overall+mp$col > >>>> thetaTr <- 2^beta > > >>> # 3. array-specific estimates > >>>> alpha <- mp$row > >>>> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]) > >>>> phiTr <- 2^alpha > > >>> II, fit FIRMA model for each exon > >>> # 1. calculate residuals > >>>> phi <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr)) > >>>> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr), > >>> byrow=TRUE) > >>>> yhat &l
[aroma.affymetrix] Re: Problem understanding FIRMA
Dear Mark, Thank you for your fast reply. You are right: ad 2, "estimate of standard error", when I do: > u.mad <- mad(unlist(log2(eps)), center=0) > u.mad [1] 0.3797971 I get for the 1.probeset > y1 <- log2(eps[1:4,]) > F1 <- apply(y1/u.mad, 2, median) HeartA HeartB HeartC MuscleA MuscleB MuscleC -0.37628338 0.13465003 -0.63489610 0.06787988 -0.30099279 0.99667714 and for the 2.probeset > y2 <- log2(eps[5:8,]) > F2 <- apply(y2/u.mad, 2, median) HeartAHeartBHeartC MuscleA MuscleB MuscleC -9.719282e-02 -7.810612e-01 -3.951703e-01 -4.654384e-01 -2.844947e-16 -1.108233e+0 ad 1, Yes, for the moment I am using median polish as an illustration. However, in this respect I have also the following question: How does using "median polish" compare to using "R_rlm_rma_default_model"? Are the final scores still of some use if you use medpol? Best regards Christian On Mar 16, 9:13 am, Mark Robinson wrote: > Hi Christian. > > From what I can tell looking at your code (rather quickly, i must > admit), there will be 2 differences between aroma.affymetrix and what > you have: > > 1. We use the 'preprocessCore' codebase for the robust fitting of the > linear model (... but maybe you are just using median polish as an > illustration). For example, you might try: > > library(preprocessCore) > f <- .Call("R_rlm_rma_default_model", log2(yTr), 0, > 1.345,PACKAGE="preprocessCore") > [... and piece together the alpha, beta, etc ...] > > 2. The "estimate of standard error" is calculated genewise, over > residuals from all probes/samples (i.e. u.mad should be a scalar not a > vector). > > Hope that helps. > Mark > > On 16/03/2009, at 6:32 PM, cstratowa wrote: > > > > > > > Dear all, > > > After reading the FIRMA paper I would like to understand the > > implementation, but this is not easy since the source code is hard to > > read. Nevertheless, I tried and would like to know if this is correct. > > > According to the page on exon array analysis you do the following: > > > I, fit a summary of the entire transcript > >> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) > >> fit(plmTr, verbose=verbose) > > > II, fit the FIRMA model for each exon > >> firma <- FirmaModel(plmTr) > >> fit(firma, verbose=verbose) > > > However, I would like to understand the underlying source code. > > > For this example let us assume that we have quantile-normalized > > intensities yTr for a transcript containing two exons: > >> yTr > > HeartA HeartB HeartC MuscleA MuscleB MuscleC > > 1 5.74954 18.0296 2.50436 15.5857 26.1744 31.0075 > > 2 9.59819 23.0093 22.01120 70.1742 32.8408 102.0080 > > 3 114.50800 87.1742 70.34080 312.3410 266.1740 601.3410 > > 4 66.34080 52.0075 67.34080 184.1740 266.1740 147.0080 > > 5 210.17400 142.0080 173.34100 514.5080 659.1740 509.6740 > > 6 104.00800 84.3408 70.34080 333.5080 324.1740 231.0080 > > 7 194.00800 124.5080 234.00800 443.6740 767.5080 716.8410 > > 8 319.34100 282.6740 283.50800 656.0080 807.6740 954.6740 > > > Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2. > > > I, fit a summary of the entire transcript > > To simplify issues I will fit the data using median polish: > > # 1. fit median polish > >> mp <- medpolish(log2(yTr)) > > > # 2. data set specific estimates (probe affinities) > >> beta <- mp$overall+mp$col > >> thetaTr <- 2^beta > > > # 3. array-specific estimates > >> alpha <- mp$row > >> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]) > >> phiTr <- 2^alpha > > > II, fit FIRMA model for each exon > > # 1. calculate residuals > >> phi <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr)) > >> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr), > > byrow=TRUE) > >> yhat <- phi *theta > >> eps <- yTr/yhat # rma uses y/yhat > > > # 2. estimate of standard error > >> u.mad <- apply(log2(eps), 2, mad, center=0) > > > # 3. compute final score statisitc > > # for 1. exon > >> y1 <- log2(eps[1:4,]) > >> F1 <- apply(y1/u.mad, 2, median) > >> F1 > > HeartA HeartB HeartC MuscleA MuscleB > > MuscleC > > -0.89938777 -0.03792624 -0.69409936 0.11536565 -0.61385296 > > 1.08709568 > > > # for 2. exon > >> y2 <- log2(eps[5:8,]) > >> F2 <
[aroma.affymetrix] Problem understanding FIRMA
Dear all, After reading the FIRMA paper I would like to understand the implementation, but this is not easy since the source code is hard to read. Nevertheless, I tried and would like to know if this is correct. According to the page on exon array analysis you do the following: I, fit a summary of the entire transcript > plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) > fit(plmTr, verbose=verbose) II, fit the FIRMA model for each exon > firma <- FirmaModel(plmTr) > fit(firma, verbose=verbose) However, I would like to understand the underlying source code. For this example let us assume that we have quantile-normalized intensities yTr for a transcript containing two exons: > yTr HeartA HeartBHeartC MuscleA MuscleB MuscleC 1 5.74954 18.02962.50436 15.5857 26.1744 31.0075 2 9.59819 23.0093 22.01120 70.1742 32.8408 102.0080 3 114.50800 87.1742 70.34080 312.3410 266.1740 601.3410 4 66.34080 52.0075 67.34080 184.1740 266.1740 147.0080 5 210.17400 142.0080 173.34100 514.5080 659.1740 509.6740 6 104.00800 84.3408 70.34080 333.5080 324.1740 231.0080 7 194.00800 124.5080 234.00800 443.6740 767.5080 716.8410 8 319.34100 282.6740 283.50800 656.0080 807.6740 954.6740 Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2. I, fit a summary of the entire transcript To simplify issues I will fit the data using median polish: # 1. fit median polish > mp <- medpolish(log2(yTr)) # 2. data set specific estimates (probe affinities) > beta <- mp$overall+mp$col > thetaTr <- 2^beta # 3. array-specific estimates > alpha <- mp$row > alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]) > phiTr <- 2^alpha II, fit FIRMA model for each exon # 1. calculate residuals > phi <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr)) > theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr), byrow=TRUE) > yhat <- phi *theta > eps <- yTr/yhat# rma uses y/yhat # 2. estimate of standard error > u.mad <- apply(log2(eps), 2, mad, center=0) # 3. compute final score statisitc # for 1. exon > y1 <- log2(eps[1:4,]) > F1 <- apply(y1/u.mad, 2, median) > F1 HeartA HeartB HeartC MuscleA MuscleB MuscleC -0.89938777 -0.03792624 -0.69409936 0.11536565 -0.61385296 1.08709568 # for 2. exon > y2 <- log2(eps[5:8,]) > F2 <- apply(y2/u.mad, 2, median) > F2 HeartA HeartB HeartC MuscleA MuscleB MuscleC -0.02899616 -1.64645153 -0.70048533 -0.39996057 0.02666064 -1.46657055 Now my question is: Is this calculation of the final score statistic F1 for exon 1 and F2 for exon 2 correct? Did I miss something? Best regards Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Interpretation of GLAD results for SNP6.0
Dear all, I am currently determining copynumber regions for tumor cell lines using GLAD as model for the SNP6.0 array. (This takes about 13 hrs per array and per cluster node) My problem is the interpretation of the results. Here are some examples: 1. example: "chromosome""start" "stop" "mean" "count" "call" 1 3674045 10539859 0.154251128612596 4517 "neutral" 1 1054496110544961 1.578711235045591 "gain" 1 1054662310546623 -2.21390332082091 1 "loss" 1 1054884921751915 0.150400582947561 6872 "neutral" This example has two regions where start=stop position (10544961, 10546623). How do you interpret this result? How should regions with count=1 be interpreted? Are these two regions considered to be outliers? 2. example: "chromosome""start" "stop" "mean" "count" "call" 6 1825339018253886 -1.33576824261601 2 "loss" 6 1825426429945167 -0.0986937482773974 7921 "neutral" 6 2994516729945167 1.47894372023 1 "gain" 6 2994529430015940 0.225347608366043 42 "gain" Here you see two regions with different start positions but identical stop positions (=29945167), where one region shows a gain but the other region is neutral. How do you interpret this result? 3. example: "chromosome""start" "stop" "mean" "count" "call" 8 1567260615672655 -2.45103201175721 2 "loss" 8 1567269515672695 -4.68547262667068 1 "loss" 8 1567269515678118 -2.65573944523429 10 "loss" 8 1567868715678880 -2.01991964356680 3 "loss" In this case two regions have identical start positions (=15672695). How can this be the case? Could this be a bug, since it is assumed that two regions should not overlap? 4. example: "chromosome""start" "stop" "mean" "count" "call" 172381339832995436 -0.272936124639797 5529 "neutral" 173299901232999012 -2.26635873379834 1 "loss" 173299901232999012 -8.28862000630996 1 "loss" 173300094036676630 -0.295688057544771 1841 "neutral" This example seems to be a combination of examples 2 and 3: Here you have two identical regions with identical positions and start=stop (=32999012) How can two regions be identical but show a different mean (-2.266 vs -8.288)? Could this be a bug? In summary my questions are: How should I interpret the results? Would it be better to eliminate these regions? Thank you in advance. Best regards Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: Broken link to Mapping50K_Xba240,na26,HB20080916.ufl
Thank you, Christian On Feb 23, 4:52 pm, Henrik Bengtsson wrote: > Fixed. /Henrik > > On Mon, Feb 23, 2009 at 6:24 AM, cstratowa > > wrote: > > > Dear Henrik > > > On your web-page the link to: > > Mapping50K_Xba240,na26,HB20080916.ufl > > is broken > > > Best regards > > Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Broken link to Mapping50K_Xba240,na26,HB20080916.ufl
Dear Henrik On your web-page the link to: Mapping50K_Xba240,na26,HB20080916.ufl is broken Best regards Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: ufl and ugp files for na27?
p)...done Reading (unitName, fragmentLength+) from file...done 2. Running "importFrom(ufl, csv, verbose=-50)" for StyI has the following partial output: Reading AffymetrixNetAffxCsvFile...done Extracting fragment lengths from ([enzyme], lengths, start, stop)... Inferring if enzyme names are specified... Has enzyme names: TRUE Inferring if enzyme names are specified...done Identifying number of enzymes... nbrOfEnzymes 1 2 3 ==> 93436 144865 3 Max number of enzymes: 3 Identifying number of enzymes...done Splitting into subunits and padding with NAs... Splitting into subunits and padding with NAs...done Extracting enzyme names... Identified enzymes: NspI, StyI Extracting enzyme names...done Identifying the location of the fragment lengths... Offset: 3 Identifying the location of the fragment lengths...done Extracting fragment lengths... Summary of *all* fragment lengths: Min. 1st Qu. MedianMean 3rd Qu.Max.NA's 22 481 695 788 9662000 332341 Extracting fragment lengths...done Sorting data by enzyme... V1 V2 V3 Min. : 22 Min. : 115 Min. :NA 1st Qu.: 672 1st Qu.: 440 1st Qu.:NA Median : 1078 Median : 598 Median :NA Mean : 1082 Mean : 609 Mean : NaN 3rd Qu.: 1499 3rd Qu.: 771 3rd Qu.:NA Max. : 2000 Max. :1266 Max. :NA ==> NA's :93436 NA's : 607 NA's :238304 Sorting data by enzyme...done int [1:238304, 1] NA 1139 599 1872 1857 1465 382 1651 NA 1833 ... Extracting fragment lengths from ([enzyme], lengths, start, stop)...done Reading (unitName, fragmentLength+) from file...done As you see the higlighted lines for NspI and STyI are: V1 V2 V3 ==> NA's : 701.0 NA's :131564 NA's :262264 ==> NA's :93436 NA's : 607 NA's :238304 This means missing = 701 appears in V1 for NspI, but missing = 607 appears in V2 for StyI. What does this mean? Best regards Christian On Jan 15, 1:50 am, Henrik Bengtsson wrote: > Hi Christian, > > I'm quite swamped but I've added links the scripts that I used to > generate the existing NetAffx 26 (na26) UFL and UGP to each of the > different SNP & CN chip type pages, e.g. > > http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-m... > > More comments below. > > On Mon, Jan 12, 2009 at 6:24 AM, cstratowa > > wrote: > > > Dear Henrik > > > Meanwhile I have created ufl and ugp files for both 100K and 500K > > arrays but not for GenomeWideSNP_6 aray. > > The above scripts will show you how to do it for GenomeWideSNP_6. > Slightly more complicated since two NetAffx CSV files are involved. > > > > > > > Can you confirm that the following code, which I use for both 100K and > > 500K arrays, is correct: > > > # retrieving annotation files > > chiptypes <- c("Mapping50K_Hind240", "Mapping50K_Xba240") > > cdfs <- lapply(chiptypes, FUN=function(x){AffymetrixCdfFile$byChipType > > (x)}) > > names(cdfs) <- chiptypes > > print(cdfs) > > > # importing data from NetAffx CSV files > > csvs <- lapply(cdfs, FUN=function(cdf){AffymetrixNetAffxCsvFile > > $byChipType(getChipType(cdf), tags=".na27")}) > > print(csvs) > > > # allocating empty UFL (Unit Fragment Length) files > > ufls <- lapply(cdfs, FUN=function(cdf){AromaUflFile$allocateFromCdf > > (cdf, tags="na27,CS20090112")}) > > print(ufls) > > > # import SNP data > > units <- list(); > > for (chipType in names(ufls)) { > > ufl <- ufls[[chipType]]; > > csv <- csvs[[chipType]]; > > units[[chipType]] <- importFrom(ufl, csv, verbose=-50); > > } > > str(units) > > > # allocating empty UGP (Unit Genome Position) files > > ugps <- lapply(cdfs, FUN=function(cdf){AromaUgpFile$allocateFromCdf > > (cdf, tags="na27,CS20090112")}) > > print(ugps) > > > # import SNP data > > units <- list(); > > for (chipType in names(ugps)) { > > ugp <- ugps[[chipType]]; > > csv <- csvs[[chipType]]; > > units[[chipType]] <- importFrom(ugp, csv, verbose=-50); > > } > > str(units) > > This looks alright to me. You might want to check toward the posted > NA26 scripts as well, because they are more recent. > > > > > > > Here is the summary for the 100K arrays: > > # Summary 50K chips > >> str(units) > > List of 2 > > $ Mapping50K_Hind240: int [1:57244] 18632
[aroma.affymetrix] Re: ufl and ugp files for na27?
's :775.000 NA's : 775 Could it be that function summaryOfUnits() does not work as expected? For ufl it prints "enzyme1-only" instead of e.g. "NspI-only" For ugp it prints an error: no applicable method Is it correct that for na27 there are now 93436 missing SNPs for StyI compared to only 75 missing SNPs for na24 (as shown on your page about building UFL files)? Can I build the ufl and ugp files in the same way for GenomeWideSNP_6? I assume that I need to repeat everything done for ".na27" with ".cn.na27"? Will these files be identical to the files supplied by you or do you use the file "GenomeWideSNP_6_build36_SNPandCN.tab", which only you have (as decribed in your pages)? Here is the sessionInfo: > sessionInfo() R version 2.7.1 (2008-06-23) x86_64-unknown-linux-gnu locale: C attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] aroma.affymetrix_0.9.4 aroma.apd_0.1.3 R.huge_0.1.6 [4] affxparser_1.12.2 aroma.core_0.9.4 sfit_0.1.5 [7] aroma.light_1.8.1 digest_0.3.1 matrixStats_0.1.3 [10] R.rsp_0.3.4R.cache_0.1.7 R.utils_1.0.4 [13] R.oo_1.4.5 R.methodsS3_1.0.3 > Best regards Christian On Jan 8, 2:39 pm, cstratowa wrote: > Dear Henrik > > Would it be possible for you to supply the ufl and ugp files for the > new Affymetrix xxx.na27.annot.csv files for the mapping arrays? > > Thank you in advance > Best regards > Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] ufl and ugp files for na27?
Dear Henrik Would it be possible for you to supply the ufl and ugp files for the new Affymetrix xxx.na27.annot.csv files for the mapping arrays? Thank you in advance Best regards Christian --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---