[aroma.affymetrix] Re: Recommendations on re-running on a subset of samples

Taku Tokuyasu Fri, 28 Jan 2011 13:08:01 -0800

Thanks a lot, Henrik.  Very helpful and informative, as usual.

Just for reference, the removeDirectory() call had to be recursive to work:
 removeDirectory(getPath(bc), recursive=TRUE, mustExist=FALSE)
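
A minimal sketch of why the recursive flag matters, run on a throwaway directory rather than a real aroma output path. The directory layout and file name are made up for illustration, and it assumes the R.utils package (which provides removeDirectory()) is installed:

```r
library(R.utils)  # provides removeDirectory()

# Hypothetical stand-in for an output directory such as getPath(bc)
path <- file.path(tempdir(), "aroma-demo", "MyDataSet,RBC")
dir.create(path, recursive = TRUE)
file.create(file.path(path, "array1.CEL"))  # make the directory non-empty

# Without recursive=TRUE a non-empty directory is left in place;
# removeDirectory() is expected to signal an error, which is caught here
res <- tryCatch(removeDirectory(path, mustExist = FALSE),
                error = function(e) e)
stillThere <- dir.exists(path)

# With recursive=TRUE the directory and everything in it are deleted
removeDirectory(path, recursive = TRUE, mustExist = FALSE)
goneNow <- !dir.exists(path)
```

Since recursive removal deletes everything below the given path, it is probably worth printing getPath(bc) once before running this on real results.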


_Taku


On Jan 24, 7:01 pm, Henrik Bengtsson <henrik.bengts...@aroma-project.org> wrote:
> Hi.
>
> On Mon, Jan 24, 2011 at 2:58 PM, Taku Tokuyasu <tok...@gmail.com> wrote:
> > I would like to know if there are any recommendations on re-running
> > 'aroma' on a subset of samples.  In my case, the first pass through a
> > dataset (Affy Mouse expression) revealed an outlier array.  I would
> > now like to re-run with the outlier array removed.
>
> I guess you use this as an example only.  Dropping a single array
> shouldn't make a big difference on the other arrays, because the
> analysis steps used are all robust against outliers (as long as you
> have enough samples in your data set, say, more than 10).  However, if
> you want to compare the results with and without that outlier, below
> is how to do it.
>
>
>
> > 1) Is extract() the recommended way to do sample subsetting:
> >  cs <- extract(cs, 2:length(cs))  ## first array is the outlier
> > Using brackets [] produced a list, so I presume that does not work.
>
> Yes, use extract() to subset a data set.  (Correct, brackets do not
> do the same thing, nor are they part of the official API, simply
> because we haven't decided on what they should do).
>
>
>
> > 2) An aroma re-run quickly returns at this point, because the output
> > files already exist.  It appears necessary to remove the output files
> > first.
>
> Correct, if there already exist previous (intermediate and/or final)
> results that have the same data set full name (name plus tags), the
> aroma framework assumes the content is correct and reuses it.
>
> Alt 1: The easiest way to force the rerun is to simply delete those
> intermediate results, which typically can be found in subdirectories
> of the following "root" directories: probeData/ and plmData/.  Other
> directories may also be created, depending on the analysis you do.
>
> Alt 2: An alternative is to add a new tag to the *first* step of the
> analysis where you want to drop some samples.  For instance, in your
> case it is sufficient to do it in the quantile normalization step,
> because the RMA-style background correction is a truly single-array
> method.  So, you can do qn <- QuantileNormalization(csBC,
> typesToUpdate="pm", tags="*,v2").  That will append your custom tag
> "v2" to the default ones (hence "*").  Since all downstream steps
> inherit the tags from previous steps, this also ensures that new
> intermediate and final results are generated.
>
> More comments below:
>
> > The following code appears apropos (pulled from
> > http://www.agron-omics.eu/uploads/Tiling%20array%20files/agronomicsTo...
>
> Just FYI, that script contains lots of "tricks" for aroma.*, R.cache,
> R.utils etc, some of which I do not recommend others to use.  There is
> also some code in there that the author of that script may want to fix
> (if they're listening on this channel).
>
> >>>>  CODE
> > force <- TRUE
> > bc <- RmaBackgroundCorrection(celSet);
> > if (force & !is.null(getOutputFiles(bc))){
> >    file.remove(getOutputFiles(bc))
> > }
>
> The easiest way to delete a data set from within R, is to do:
>
> removeDirectory(getPath(bc), mustExist=FALSE);
>
> > csBC <- process(bc, verbose=verbose, force=force, overwrite=force);
> > qn <- QuantileNormalization(csBC, typesToUpdate="pm");
> > if (force) file.remove(getTargetDistributionPathname(qn))
>
> I do not encourage this.  First,
> getTargetDistributionPathname() is not part of the public API.
> Second, it is not even necessary to delete the so-called target
> distribution file (here "getTargetDistributionPathname(qn)"), because
> it is calculated as the robust average of all arrays in the 'csBC'
> data set and its filename is generated using checksums such that the
> filename will be unique for any set of data files.
>
> > clearCache(qn)
>
> Again, an internal method is used and shouldn't be needed; I'm not
> sure why it is used here.
>
> > if (force & !is.null(getOutputFiles(qn))){
> >    file.remove(getOutputFiles(qn))
> > }
> > csN <- process(qn, verbose=verbose, force=force);
> > plm <- RmaPlm(csN);
> > fit(plm, unit=NULL, verbose=verbose, force=force)
> > <<<   END OF CODE
>
> > I feel having the output files consistent (i.e. not mixed from
> > different runs) is a good idea.
>
> Yes.
>
> > The flip side is, analyzing subsets
> > of samples in parallel (e.g. before a strict decision on outliers has
> > been made) is probably best handled by treating each one as a separate
> > dataset, starting from the original CEL files.
>
> Or better, branch off only at the first step that is a multi-array
> method, as suggested above.
>
> Thus, it is useful to understand how the models/methods/algorithms
> work so one can tell which are truly single-array methods and which
> are multi-array methods.  (Yes, I've been considering annotating the
> methods/classes to contain this information.  The problem is that some
> methods can be both depending on which parameters are used.  It is
> also a priority thing).
>
> Hope this helps
>
> Henrik
>
>
>
> > Regards,
>
> > _Taku
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> > version of the package, 2) to report the output of sessionInfo() and
> > traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google Groups
> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
>
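
For reference, the "Alt 2" approach from the quoted message could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the function name rerunWithoutOutlier() and its arguments are hypothetical, and AffymetrixCelSet$byName() is assumed as the way the data set is set up. The remaining calls are the ones that appear in this thread.

```r
# Sketch only: needs aroma.affymetrix and a data set under rawData/.
rerunWithoutOutlier <- function(dataSet, chipType, outlier = 1L,
                                tag = "v2", verbose = FALSE) {
  library(aroma.affymetrix)

  cs <- AffymetrixCelSet$byName(dataSet, chipType = chipType)

  # RMA background correction is a single-array method, so existing
  # results from the first run can be reused as they are
  bc <- RmaBackgroundCorrection(cs)
  csBC <- process(bc, verbose = verbose)

  # Drop the outlier just before the first multi-array step
  keep <- setdiff(seq_len(length(csBC)), outlier)
  csSub <- extract(csBC, keep)

  # "*" keeps the default tags; the custom tag gives the new run
  # its own full name, so earlier results are not picked up
  qn <- QuantileNormalization(csSub, typesToUpdate = "pm",
                              tags = paste0("*,", tag))
  csN <- process(qn, verbose = verbose)

  # Downstream steps inherit the tag from the normalized data set
  plm <- RmaPlm(csN)
  fit(plm, verbose = verbose)
  plm
}
```

A call might look like rerunWithoutOutlier("MyDataSet", chipType = "Mouse430_2"), where both the data set name and the chip type are placeholders. Because the tagged run writes to its own output directories, the results from the first pass stay untouched and the two runs can be compared side by side.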

