Thanks a lot, Henrik. Very helpful and informative, as usual. Just for reference, removeDirectory() had to be called with recursive=TRUE to work:

removeDirectory(getPath(bc), recursive=TRUE, mustExist=FALSE)
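Base R's own unlink() behaves the same way: it does not delete directories at all unless recursive = TRUE is given. The minimal base-R sketch below demonstrates this on a throwaway directory under tempdir(). The directory and file names are made up for illustration, and the sketch does not call R.utils::removeDirectory() itself.

```r
# Throwaway "data set" directory with one file in it (names are made up).
root <- file.path(tempdir(), "unlinkDemo")
d <- file.path(root, "MyDataSet,RBC", "Mouse430_2")
dir.create(d, recursive = TRUE, showWarnings = FALSE)
writeLines("dummy", file.path(d, "array1.CEL"))

# Without recursive = TRUE, unlink() leaves directories in place.
unlink(d)
stillThere <- dir.exists(d)

# With recursive = TRUE, the directory and its contents are removed.
unlink(d, recursive = TRUE)
gone <- !dir.exists(d)

print(c(stillThere = stillThere, gone = gone))

# Clean up the remaining parent directories.
unlink(root, recursive = TRUE)
```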
_Taku

On Jan 24, 7:01 pm, Henrik Bengtsson <henrik.bengts...@aroma-project.org> wrote:
> Hi.
>
> On Mon, Jan 24, 2011 at 2:58 PM, Taku Tokuyasu <tok...@gmail.com> wrote:
> > I would like to know if there are any recommendations on re-running
> > 'aroma' on a subset of samples. In my case, the first pass through a
> > dataset (Affy Mouse expression) revealed an outlier array. I would
> > now like to re-run with the outlier array removed.
>
> I guess you use this as an example only. Dropping a single array
> shouldn't make a big difference on the other arrays, because the
> analysis steps used are all robust against outliers (as long as you
> have enough samples in your data set, say, more than 10). However, if
> you want to compare the results from using all samples with or without
> that outlier, below is how to do it.
>
> > 1) Is extract() the recommended way to do sample subsetting:
> > cs <- extract(cs, 2:length(cs)) ## first array is the outlier
> > Using brackets [] produced a list, so I presume that does not work.
>
> Yes, use extract() to subset a data set. (Correct, brackets do not do
> the same thing, nor are they part of the official API, simply
> because we haven't decided on what they should do.)
>
> > 2) An aroma re-run quickly returns at this point, because the output
> > files already exist. It appears necessary to remove the output files
> > first.
>
> Correct, if there already exist previous (intermediate and/or final)
> results that have the same data set full name (name plus tags), the
> aroma framework assumes the content is correct.
>
> Alt 1: The easiest way to force the rerun is to simply delete those
> intermediate results, which typically can be found in subdirectories
> of the following "root" directories: probeData/ and plmData/. Other
> directories may also be created, depending on the analysis you do.
>
> Alt 2: An alternative is to add a new tag to the *first* step of the
> analysis where you want to drop some samples.
> For instance, in your case it is sufficient to do it in the quantile
> normalization step, because the RMA-style background correction is a
> truly single-array method. So, you can do
> qn <- QuantileNormalization(csBC, typesToUpdate="pm", tags="*,v2").
> That will append your custom tag "v2" to the default ones (hence "*").
> Since any downstream steps will include tags from previous steps, this
> will also make sure new intermediate and final results are generated.
>
> More comments below:
>
> > The following code appears apropos (pulled from
> > http://www.agron-omics.eu/uploads/Tiling%20array%20files/agronomicsTo...
>
> Just FYI, that script contains lots of "tricks" for aroma.*, R.cache,
> R.utils etc., some of which I do not recommend others to use. There is
> also some code in there that the author of that script may want to fix
> (if they're listening on this channel).
>
> > >>>> CODE
> > force <- TRUE
> > bc <- RmaBackgroundCorrection(celSet);
> > if (force & !is.null(getOutputFiles(bc))){
> >   file.remove(getOutputFiles(bc))
> > }
>
> The easiest way to delete a data set from within R is to do:
>
> removeDirectory(getPath(bc), mustExist=FALSE);
>
> > csBC <- process(bc, verbose=verbose, force=force, overwrite=force);
> > qn <- QuantileNormalization(csBC, typesToUpdate="pm");
> > if (force) file.remove(getTargetDistributionPathname(qn))
>
> I do not encourage this. First, getTargetDistributionPathname() is
> not part of the public API. Second, it is not even necessary to delete
> the so-called target distribution file (here
> "getTargetDistributionPathname(qn)"), because it is calculated as the
> robust average of all arrays in the 'csBC' data set and its filename
> is generated using checksums such that the filename will be unique for
> any set of data files.
>
> > clearCache(qn)
>
> Again, an internal method is used and shouldn't be needed; I'm not
> sure why it is used here.
>
> > if (force & !is.null(getOutputFiles(qn))){
> >   file.remove(getOutputFiles(qn))
> > }
> > csN <- process(qn, verbose=verbose, force=force);
> > plm <- RmaPlm(csN);
> > fit(plm, unit=NULL, verbose=verbose, force=force)
> > <<< END OF CODE
>
> > I feel having the output files consistent (i.e. not mixed from
> > different runs) is a good idea.
>
> Yes.
>
> > The flip side is, analyzing subsets
> > of samples in parallel (e.g. before a strict decision on outliers has
> > been made) is probably best handled by treating each one as a separate
> > dataset, starting from the original CEL files.
>
> Or better, only at the first step that is a multi-array method, as
> suggested above.
>
> Thus, it is useful to understand how the models/methods/algorithms
> work so one can tell which are truly single-array methods and which
> are multi-array methods. (Yes, I've been considering annotating the
> methods/classes with this information. The problem is that some
> methods can be both, depending on which parameters are used. It is
> also a priority thing.)
>
> Hope this helps
>
> Henrik
>
> > Regards,
>
> > _Taku
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> > version of the package, 2) to report the output of sessionInfo() and
> > traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google Groups
> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe and other options, go to http://www.aroma-project.org/forum/
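Henrik's distinction between single-array and multi-array steps can be checked on a toy example. The base-R sketch below does not use aroma.affymetrix. quantileNormalize() is a hypothetical, simplified stand-in for a multi-array step (its target distribution is the row-wise median of the sorted arrays), not aroma's QuantileNormalization implementation. log2() stands in for a single-array step, and the 3 x 3 data matrix is made up.

```r
# Toy data: 3 probes (rows) x 3 arrays (columns); values are made up.
X <- cbind(a = c(5, 2, 3), b = c(4, 1, 6), c = c(30, 20, 10))

# Hypothetical, simplified quantile normalization: the target is the
# row-wise median of the sorted columns, and each value is replaced by
# the target value at its within-column rank.
quantileNormalize <- function(X) {
  target <- apply(apply(X, 2, sort), 1, median)
  apply(X, 2, function(x) target[rank(x, ties.method = "first")])
}

# Stand-in single-array transform: each column depends only on itself.
singleArray <- function(X) log2(X)

keep <- c("a", "b")  # drop array "c", playing the role of the outlier

# Single-array step: array "a" is identical with or without array "c".
sameSingle <- identical(singleArray(X)[, "a"], singleArray(X[, keep])[, "a"])

# Multi-array step: array "a" changes when array "c" is dropped.
qnAll <- quantileNormalize(X)[, "a"]
qnSub <- quantileNormalize(X[, keep])[, "a"]
print(rbind(all = qnAll, subset = qnSub))
```

With all three arrays, array "a" is normalized to 6, 2, 4. Without array "c" it becomes 5.5, 1.5, 3.5, while its log2 values are the same in both runs. Results from single-array steps can therefore be reused, and a new tag (or a fresh output directory) is only needed from the first multi-array step onward.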